Abstract
Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-ANMI, a new efficient algorithm for clustering categorical data. The k-ANMI algorithm works in a way that is similar to the popular k-means algorithm, and the goodness of clustering in each step is evaluated using a mutual information based criterion (namely, average normalized mutual information, ANMI) borrowed from cluster ensembles. The algorithm is easy to implement, requiring multiple hash tables as its only major data structure. Experimental results on real datasets show that the k-ANMI algorithm is competitive with state-of-the-art categorical data clustering algorithms with respect to clustering accuracy.
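The abstract does not spell out the ANMI criterion, so the following is only a minimal sketch under stated assumptions. It assumes the usual cluster-ensemble definition, NMI(a, b) = I(a; b) / sqrt(H(a) H(b)), and that each categorical attribute is treated as one partition of the objects, with ANMI being the mean NMI between a candidate clustering and those attribute partitions. The function names `entropy`, `nmi`, `anmi`, and `refine` are hypothetical. `Counter` objects stand in for the hash tables mentioned above, but the refinement loop naively recomputes the score for every trial move and should not be read as the paper's update scheme.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """Normalized mutual information I(a;b) / sqrt(H(a) * H(b)) (assumed form)."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))  # joint counts kept in a hash table
    mi = sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0 or h_b == 0:
        return 0.0  # a constant labeling shares no information
    return mi / sqrt(h_a * h_b)


def anmi(labels, rows):
    """Mean NMI between a clustering and each attribute column (assumption)."""
    columns = list(zip(*rows))
    return sum(nmi(labels, col) for col in columns) / len(columns)


def refine(rows, labels, k, max_passes=10):
    """k-means-like local search: move one object at a time if ANMI improves."""
    labels = list(labels)
    best = anmi(labels, rows)
    for _ in range(max_passes):
        moved = False
        for i in range(len(rows)):
            keep = labels[i]
            for c in range(k):
                if c == keep:
                    continue
                labels[i] = c
                score = anmi(labels, rows)  # naive full recomputation
                if score > best + 1e-12:
                    best, keep, moved = score, c, True
            labels[i] = keep  # settle on the best cluster found for object i
        if not moved:
            break
    return labels, best


rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
final_labels, final_score = refine(rows, [0, 1, 0, 1], k=2)
```

On this toy input the search starts from a clustering that is independent of both attributes and moves objects until the two groups of identical rows are separated. A practical implementation would update the joint counts incrementally when an object changes cluster rather than rescanning the data.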
| Original language | English |
|---|---|
| Pages (from-to) | 223-233 |
| Number of pages | 11 |
| Journal | Information Fusion |
| Volume | 9 |
| Issue number | 2 |
| State | Published - Apr 2008 |
Keywords
- Categorical data
- Cluster ensemble
- Clustering
- Data mining
- Mutual information
Title
k-ANMI: A mutual information based clustering algorithm for categorical data