Abstract
Traditional text classification methods use only a single model, so they tend to overlook feature words that overlap across categories, which degrades classification performance. To improve text classification accuracy, this paper proposes a text classification algorithm based on topic similarity clustering. The algorithm combines CHI with WordCount to extract category feature words. It then performs clustering with the K-means algorithm and extracts cluster feature words to construct a cluster feature word library. On this basis, the Adaptive Strategy algorithm adaptively chooses the fastText, TextCNN, or RCNN model for classification to obtain the final result. Experimental results on the AG News dataset show that the proposed algorithm handles feature words that overlap across categories better, and that it significantly improves text classification performance compared with the fastText, TextCNN, and RCNN models used alone.
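The category feature word step can be illustrated with a minimal sketch. The abstract does not say how CHI and WordCount are combined, so the choice below (a word-count threshold followed by ranking on the chi-square statistic) is an assumption made for illustration, not the paper's method. The names `chi_square`, `category_feature_words`, `top_k`, and `min_count` are hypothetical.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square (CHI) statistic for one (term, category) pair.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def category_feature_words(docs, labels, category, top_k=3, min_count=2):
    """Rank candidate feature words for one category.

    Assumption: words seen fewer than min_count times in the category
    are dropped (the word-count part), and the rest are ranked by CHI.
    """
    in_cat = [set(doc) for doc, y in zip(docs, labels) if y == category]
    out_cat = [set(doc) for doc, y in zip(docs, labels) if y != category]
    counts = Counter(
        word for doc, y in zip(docs, labels) if y == category for word in doc
    )

    scores = {}
    for word, count in counts.items():
        if count < min_count:
            continue
        a = sum(word in doc for doc in in_cat)
        b = sum(word in doc for doc in out_cat)
        c = len(in_cat) - a
        d = len(out_cat) - b
        # Keep only words positively associated with the category.
        if a * d <= b * c:
            continue
        scores[word] = chi_square(a, b, c, d)

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


docs = [
    ["stocks", "market", "rise"],
    ["market", "stocks", "fall"],
    ["team", "wins", "match"],
    ["team", "match", "draw"],
]
labels = ["business", "business", "sports", "sports"]
business_words = category_feature_words(docs, labels, "business", top_k=2)
```

In this toy corpus, `business_words` holds the two words that appear in every business document and in no sports document. The words selected this way would then feed the clustering step, which is not sketched here.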
| Original language | English |
|---|---|
| Pages (from-to) | 93-98 |
| Number of pages | 6 |
| Journal | Jisuanji Gongcheng/Computer Engineering |
| Volume | 46 |
| Issue number | 3 |
| DOIs | |
| State | Published - 2020 |
| Externally published | Yes |
Keywords
- Adaptive algorithm
- CHI method
- Feature extraction
- K-means algorithm
- Text classification