
A Cluster tree method for text categorization

  • Zhaocai Sun*
  • Yunming Ye
  • Weiru Deng
  • Zhexue Huang

*Corresponding author for this work

Harbin Institute of Technology Shenzhen

Research output: Contribution to journal › Conference article › peer-review

Abstract

The decision tree is a flexible and useful classification tool, but it runs into problems on high-dimensional data. Most current decision tree algorithms select and use only the single "best" feature when splitting a node. Because the remaining features are ignored, classification accuracy suffers. To address this problem, this paper uses a cluster tree for text categorization. Unlike familiar decision trees (e.g. CART, C4.5), the cluster tree uses clustering results as the splitting rule, so more features are considered at each split. The choice of clustering algorithm is therefore very important to the cluster tree, and a text clustering algorithm is proposed to improve its performance. Experiments show that the cluster tree handles the high-dimensionality problem and outperforms C4.5 and CART on text data. In some cases it may also do better than LibSVM, which may be the most powerful tool for text categorization.
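
The idea of splitting a node by clustering rather than by a single feature can be illustrated with a minimal sketch. It assumes dense numeric document vectors and uses a plain 2-means step with Euclidean distance as a stand-in; it is not the text clustering algorithm proposed in the paper, and the names `build`, `predict`, `max_depth`, and `min_size` are hypothetical.

```python
from collections import Counter

def _dist2(a, b):
    # Squared Euclidean distance over ALL features, not one selected feature.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _nearest(centers, x):
    # Index of the closest cluster center (ties go to the lower index).
    return min(range(len(centers)), key=lambda i: _dist2(centers[i], x))

def _mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def _two_means(X, iters=10):
    # Deterministic init (an assumption of this sketch): the first point
    # and the point farthest from it.
    c0 = X[0]
    c1 = max(X, key=lambda x: _dist2(x, c0))
    centers = [c0, c1]
    for _ in range(iters):
        groups = [[], []]
        for x in X:
            groups[_nearest(centers, x)].append(x)
        if not groups[0] or not groups[1]:
            break
        centers = [_mean(g) for g in groups]
    return centers

def build(X, y, depth=0, max_depth=5, min_size=2):
    # Leaf: pure node, depth limit reached, or too few documents.
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth >= max_depth or len(X) < min_size:
        return {"label": majority}
    # Splitting rule: assign each document to its nearest cluster center.
    centers = _two_means(X)
    parts = [([], []), ([], [])]
    for x, label in zip(X, y):
        i = _nearest(centers, x)
        parts[i][0].append(x)
        parts[i][1].append(label)
    if not parts[0][0] or not parts[1][0]:
        return {"label": majority}  # clustering failed to separate the node
    children = [build(px, py, depth + 1, max_depth, min_size) for px, py in parts]
    return {"centers": centers, "children": children}

def predict(node, x):
    # Route the document down the tree by nearest center until a leaf.
    while "label" not in node:
        node = node["children"][_nearest(node["centers"], x)]
    return node["label"]

# Toy term-count vectors; labels are illustrative only.
X = [[1, 0, 0], [2, 0, 1], [0, 1, 3], [0, 2, 2]]
y = ["sports", "sports", "tech", "tech"]
tree = build(X, y)
```

Calling `predict(tree, [3, 0, 0])` routes the new vector by its distance to each node's cluster centers, so every feature contributes to the decision at every split.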

Original language: English
Pages (from-to): 3785-3790
Number of pages: 6
Journal: Procedia Engineering
Volume: 15
State: Published - 2011
Externally published: Yes
Event: 2011 International Conference on Advanced in Control Engineering and Information Science, CEIS 2011 - Dali, Yunnan, China
Duration: 18 Aug 2011 to 19 Aug 2011

Keywords

  • Cluster tree
  • Decision tree
  • Text categorization
