TY - GEN
T1 - Marrying k-means with evidence accumulation in clustering analysis
AU - Zhang, Hongli
AU - Guo, Xiaoding
AU - Ye, Lin
AU - Li, Shang
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/12
Y1 - 2018/12
AB - Text clustering is becoming increasingly important to text mining and to the development of commercial applications. Previous research has mainly focused on applying a single clustering run to documents. Compared with cluster ensembles, partitions obtained from single clustering runs are less convincing in terms of accuracy and consistency. In this paper, we propose an approach based on evidence accumulation clustering (EAC) with k-means for text clustering problems. Our goal is to obtain a consistent, stable, and credible clustering scheme. First, we run the k-means algorithm multiple times, varying the number of clusters within an optimal range. Then, we construct a co-association matrix by integrating all the derived partitions. Finally, we obtain consistent clusters by applying a hierarchical clustering algorithm with single linkage to the co-association matrix. This process is equivalent to finding a minimum spanning tree (MST) of the complete graph defined by the co-association matrix. The algorithm was tested on four text data sets. Experimental results show that our method improves the accuracy of the final clustering.
KW - Evidence accumulation clustering
KW - Hierarchical clustering
KW - K-means
UR - https://www.scopus.com/pages/publications/85070819361
U2 - 10.1109/CompComm.2018.8780791
DO - 10.1109/CompComm.2018.8780791
M3 - Conference contribution
AN - SCOPUS:85070819361
T3 - 2018 IEEE 4th International Conference on Computer and Communications, ICCC 2018
SP - 2050
EP - 2056
BT - 2018 IEEE 4th International Conference on Computer and Communications, ICCC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 4th IEEE International Conference on Computer and Communications, ICCC 2018
Y2 - 7 December 2018 through 10 December 2018
ER -