
k-ANMI: A mutual information based clustering algorithm for categorical data

  • Zengyou He*
  • Xiaofei Xu
  • Shengchun Deng

* Corresponding author for this work

Affiliation: Harbin Institute of Technology

Research output: Contribution to journal, Article, peer-review

Abstract

Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-ANMI, a new efficient algorithm for clustering categorical data. The k-ANMI algorithm works in a way similar to the popular k-means algorithm, and the goodness of the clustering at each step is evaluated using a mutual information based criterion (namely, average normalized mutual information, ANMI) borrowed from cluster ensembles. The algorithm is easy to implement, requiring multiple hash tables as its only major data structure. Experimental results on real datasets show that the k-ANMI algorithm is competitive with state-of-the-art categorical data clustering algorithms with respect to clustering accuracy.
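The criterion described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions not stated on this page: each categorical attribute is treated as one partition of the objects, NMI is normalized by the square root of the product of the two entropies (the usual cluster ensemble definition), and the k-means-like step is approximated by a naive local search that recomputes ANMI from scratch instead of using the paper's hash tables. The names `entropy`, `mutual_information`, `nmi`, `anmi`, and `refine` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information; 0.0 if either labeling is constant."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0
    return mutual_information(a, b) / sqrt(h_a * h_b)


def anmi(rows, labels):
    """Average NMI between a clustering and every attribute column.

    Assumption: each categorical attribute acts as one partition.
    """
    columns = list(zip(*rows))
    return sum(nmi(col, labels) for col in columns) / len(columns)


def refine(rows, labels, k, max_iter=20):
    """Naive local search: move each object to the cluster that raises ANMI.

    Assumption: this recomputes ANMI from scratch for every trial move; it
    does not reproduce the hash-table bookkeeping mentioned in the abstract.
    """
    labels = list(labels)
    for _ in range(max_iter):
        moved = False
        for i in range(len(rows)):
            original = labels[i]
            best_label, best_score = original, anmi(rows, labels)
            for candidate in range(k):
                labels[i] = candidate
                score = anmi(rows, labels)
                if score > best_score + 1e-12:  # require a strict improvement
                    best_label, best_score = candidate, score
            labels[i] = best_label
            moved = moved or best_label != original
        if not moved:
            break
    return labels


# Toy data (made up for illustration): two attributes, four objects.
rows = [
    ("red", "small"),
    ("red", "small"),
    ("blue", "large"),
    ("blue", "large"),
]
refined = refine(rows, [0, 1, 1, 1], k=2)
```

On this toy data, a clustering that matches both attributes scores an ANMI close to 1, while one that is independent of them scores close to 0, so the local search has a clear objective to climb. Recomputing the score for every trial move is quadratic in practice and is only meant to show the shape of the loop.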

Original language: English
Pages (from-to): 223-233
Number of pages: 11
Journal: Information Fusion
Volume: 9
Issue number: 2
State: Published - Apr 2008

Keywords

  • Categorical data
  • Cluster ensemble
  • Clustering
  • Data mining
  • Mutual information
