
CNMBI: Determining the Number of Clusters Using Center Pairwise Matching and Boundary Filtering

  • Ruilin Zhang
  • Haiyang Zheng
  • Hongpeng Wang*
  • *Corresponding author for this work
  • Harbin Institute of Technology Shenzhen
  • Peng Cheng Laboratory

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

One of the main challenges in data mining is choosing the optimal number of clusters without prior information. Existing methods usually follow the philosophy of cluster validation and hence make underlying assumptions about the data distribution, which prevents their application to complex data such as large-scale images and high-dimensional real-world data. To address this, we propose an approach named CNMBI. Leveraging the distribution information inherent in the data space, we cast the target task as a dynamic comparison of the positional behavior of cluster centers, without relying on complete clustering results or designing a complex validity index as previous methods do. Bipartite graph theory is then employed to model this process efficiently. Additionally, we find that different samples have different confidence levels and therefore actively remove low-confidence ones, which, to our knowledge, is considered for the first time in cluster number determination. CNMBI is robust and allows more flexibility in the dimension and shape of the target data (e.g., CIFAR-10 and STL-10). Extensive comparisons with state-of-the-art competitors on various challenging datasets demonstrate the superiority of our method.
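
As a rough illustration of what pairwise matching between cluster centers can look like as a bipartite problem, the sketch below pairs two equal-sized sets of centers one-to-one so that the total Euclidean distance is minimal. This is not the CNMBI algorithm from the paper: the function name `match_centers`, the example coordinates, and the brute-force search over permutations are all assumptions made for illustration, and the brute-force search is only practical for a handful of centers.

```python
from itertools import permutations
from math import dist


def match_centers(centers_a, centers_b):
    """Return a minimum-cost one-to-one matching between two sets of centers.

    Hypothetical helper for illustration only (not the paper's method).
    Each set is one side of a bipartite graph; edge weights are Euclidean
    distances. The search is brute force, so keep the sets small.
    """
    if len(centers_a) != len(centers_b):
        raise ValueError("both center sets must have the same size")
    k = len(centers_a)
    best_cost, best_perm = float("inf"), None
    # Try every assignment of centers_b indices to centers_a indices.
    for perm in permutations(range(k)):
        cost = sum(dist(centers_a[i], centers_b[perm[i]]) for i in range(k))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    # Pairs are (index in centers_a, index in centers_b).
    return list(enumerate(best_perm)), best_cost


# Made-up centers from two hypothetical clustering runs with k = 3.
centers_run1 = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
centers_run2 = [(10.0, 1.0), (0.0, 1.0), (5.0, 4.0)]

pairs, total = match_centers(centers_run1, centers_run2)
print(pairs, total)
```

A small total matching cost means the two sets of centers occupy nearly the same positions; how CNMBI actually scores and compares centers is described in the paper itself. For larger sets, a polynomial-time assignment solver would be the usual replacement for the permutation loop.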

Original language: English
Title of host publication: Advanced Data Mining and Applications - 19th International Conference, ADMA 2023, Proceedings
Editors: Xiaochun Yang, Bin Wang, Heru Suhartanto, Guoren Wang, Jing Jiang, Bing Li, Huaijie Zhu, Ningning Cui
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 262-277
Number of pages: 16
ISBN (Print): 9783031466762
State: Published - 2023
Externally published: Yes
Event: 19th International Conference on Advanced Data Mining and Applications, ADMA 2023 - Shenyang, China
Duration: 21 Aug 2023 to 23 Aug 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14180 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th International Conference on Advanced Data Mining and Applications, ADMA 2023
Country/Territory: China
City: Shenyang
Period: 21/08/23 to 23/08/23

Keywords

  • Boundary filtering
  • Cluster center
  • Complex data
  • Number of clusters
  • Pairwise matching
