
Adaptive text classification based on topic similarity clustering

  • Yan Kang*
  • Qiyue Yang
  • Hao Li
  • Wentao Liang
  • Jinyuan Li
  • Guorong Cui
  • Peiyao Wang

*Corresponding author for this work

Yunnan University

Research output: Contribution to journal › Article › peer-review

Abstract

Traditional text classification methods use only a single model, so they tend to overlook feature words that overlap across categories, which degrades classification performance. To improve the accuracy of text classification, this paper proposes a text classification algorithm based on topic similarity clustering. The algorithm combines the CHI (chi-square) statistic with WordCount to extract category feature words. It then clusters with the K-means algorithm and extracts cluster feature words to construct a cluster feature word library. On this basis, the Adaptive Strategy algorithm adaptively selects the fastText, TextCNN, or RCNN model to produce the final classification result. Experimental results on the AG News dataset show that the proposed algorithm handles overlapping feature words across categories better and significantly improves text classification performance compared with the fastText, TextCNN, and RCNN models used alone.
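
The feature-extraction step can be illustrated with a minimal sketch. It assumes whitespace tokenization and scores each word by its CHI statistic multiplied by log(1 + in-category word count). That weighting, and the names `chi_square` and `category_feature_words`, are hypothetical choices made for illustration, since the abstract does not state how CHI and WordCount are combined.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """CHI statistic from a 2x2 table of document counts.

    a: docs in the category that contain the word
    b: docs outside the category that contain the word
    c: docs in the category that lack the word
    d: docs outside the category that lack the word
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def category_feature_words(docs, labels, top_k=3):
    """Return the top_k feature words per category.

    Assumption (not from the paper): score = CHI * log(1 + word count
    inside the category).
    """
    tokenized = [doc.lower().split() for doc in docs]
    token_sets = [set(toks) for toks in tokenized]
    vocab = set().union(*token_sets) if token_sets else set()
    result = {}
    for cat in sorted(set(labels)):
        in_cat = [lab == cat for lab in labels]
        n_in = sum(in_cat)
        n_out = len(labels) - n_in
        # Raw word counts inside this category (the "WordCount" part).
        counts = Counter()
        for toks, flag in zip(tokenized, in_cat):
            if flag:
                counts.update(toks)
        scores = {}
        for word in vocab:
            a = sum(1 for s, f in zip(token_sets, in_cat) if f and word in s)
            if a == 0:
                continue  # word never occurs in this category
            b = sum(1 for s, f in zip(token_sets, in_cat) if not f and word in s)
            chi = chi_square(a, b, n_in - a, n_out - b)
            scores[word] = chi * math.log1p(counts[word])
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[cat] = ranked[:top_k]
    return result
```

Calling `category_feature_words(docs, labels)` on a list of texts and their category labels returns a dictionary mapping each category to its highest-scoring words. Words that occur in several categories get a lower CHI value, which is one way to see the overlap problem the abstract describes.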

Original language: English
Pages (from-to): 93-98
Number of pages: 6
Journal: Jisuanji Gongcheng/Computer Engineering
Volume: 46
Issue number: 3
State: Published - 2020
Externally published: Yes

Keywords

  • Adaptive algorithm
  • CHI method
  • Feature extraction
  • K-means algorithm
  • Text classification
