Abstract
The decision tree is a flexible and useful classification tool, but it runs into problems on high-dimensional data. Most current decision tree algorithms select and use only the single "best" feature when splitting a node. Because the remaining features are ignored, classification accuracy suffers. To address this problem, this paper uses a cluster tree for text categorization. Unlike familiar decision trees (e.g. CART, C4.5), the cluster tree uses clustering results as the splitting rule, so more features are considered at each split. The choice of clustering algorithm is therefore very important to the cluster tree. For better performance, a text clustering algorithm is proposed to enhance the cluster tree. Experiments show that the cluster tree copes with the high-dimensionality problem and outperforms C4.5 and CART on text data. In some cases it even does better than LibSVM, which may be the most powerful tool for text categorization.
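The idea of splitting a node by clustering, rather than by a single feature, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the proposed text clustering method, so a plain 2-means with cosine similarity over bag-of-words counts stands in for it. All names (`vectorize`, `two_means`, `build`, `predict`) and the toy documents are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def two_means(vectors, iters=10):
    """Assumed stand-in clustering: 2-means with cosine similarity.
    Seeds are the first document and the document least similar to it."""
    centers = [vectors[0], min(vectors, key=lambda v: cosine(vectors[0], v))]
    assign = None
    for _ in range(iters):
        new = [0 if cosine(v, centers[0]) >= cosine(v, centers[1]) else 1
               for v in vectors]
        if new == assign:
            break
        assign = new
        for k in (0, 1):
            members = [v for v, a in zip(vectors, assign) if a == k]
            if members:
                # Summed counts serve as the centroid (cosine ignores scale).
                centers[k] = sum(members, Counter())
    return centers, assign

def build(vectors, labels, depth=0, max_depth=5):
    """Grow the tree: each internal node splits by cluster membership,
    so every term contributes to the split, not one 'best' feature."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth >= max_depth or len(vectors) < 2:
        return {"label": majority}
    centers, assign = two_means(vectors)
    if len(set(assign)) < 2:  # clustering failed to separate the node
        return {"label": majority}
    children = []
    for k in (0, 1):
        vs = [v for v, a in zip(vectors, assign) if a == k]
        ls = [l for l, a in zip(labels, assign) if a == k]
        children.append(build(vs, ls, depth + 1, max_depth))
    return {"centers": centers, "children": children}

def predict(node, vector):
    """Route a document to the nearer cluster center until a leaf is reached."""
    while "label" not in node:
        c0, c1 = node["centers"]
        node = node["children"][0 if cosine(vector, c0) >= cosine(vector, c1) else 1]
    return node["label"]

# Toy, made-up training documents (not from the paper's experiments).
docs = [
    ("the team won the football match", "sport"),
    ("a late goal decided the football game", "sport"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell as interest rates rose", "finance"),
]
tree = build([vectorize(t) for t, _ in docs], [l for _, l in docs])
sport_guess = predict(tree, vectorize("football team scored a goal"))
finance_guess = predict(tree, vectorize("interest rates at the bank"))
```

Each internal node stores two cluster centers and routes a document to the nearer one, which is the contrast with CART or C4.5 drawn in the abstract: the split depends on all terms in the document rather than on a threshold over one feature.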
| Original language | English |
|---|---|
| Pages (from-to) | 3785-3790 |
| Number of pages | 6 |
| Journal | Procedia Engineering |
| Volume | 15 |
| DOIs | |
| State | Published - 2011 |
| Externally published | Yes |
| Event | 2011 International Conference on Advanced in Control Engineering and Information Science, CEIS 2011 - Dali, Yunnan, China. Duration: 18 Aug 2011 → 19 Aug 2011 |
Keywords
- Cluster tree
- Decision tree
- Text categorization