Abstract
To address the data sparseness problem in question classification, this paper proposes an approach for classifying Chinese factoid questions that uses interrogative and focus words as the key cues. The approach first automatically identifies interrogative and focus words in the questions raised by users. If cue words are present, the question is classified with a nearest neighbor model; otherwise it is classified with a support vector machine (SVM) model. The SVM training set is extended automatically with questions mined from the Web, while the nearest neighbor model bases its classification decision solely on the sense distance of the cue words. The experimental results show that selecting the classifier according to question structure outperforms a single classification model, and that word sense distance and the extended training set alleviate training data sparseness, thereby improving classification performance.
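The two-stage dispatch described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy English tokens, the `SENSE_CODES` table standing in for a real thesaurus, the three-level `sense_distance` function, and the `fallback` callable that stands in for the trained SVM.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical labelled cue words (assumption): cue word -> question class.
LABELLED_CUES: Dict[str, str] = {
    "who": "PERSON",
    "where": "LOCATION",
    "city": "LOCATION",
    "year": "TIME",
}

# Hypothetical sense codes standing in for a thesaurus (assumption).
# The first character is a coarse category, the second a finer sense.
SENSE_CODES: Dict[str, str] = {
    "who": "A1",
    "where": "B1",
    "city": "B2",
    "town": "B2",
    "year": "C1",
    "month": "C2",
}


def sense_distance(a: str, b: str) -> float:
    """Toy distance: 0.0 for the same code, 0.5 for the same coarse category, else 1.0."""
    code_a, code_b = SENSE_CODES.get(a), SENSE_CODES.get(b)
    if code_a is None or code_b is None:
        return 1.0
    if code_a == code_b:
        return 0.0
    if code_a[0] == code_b[0]:
        return 0.5
    return 1.0


def find_cue_word(tokens: Sequence[str]) -> Optional[str]:
    """Return the first token that has a known sense code, or None if there is none."""
    for token in tokens:
        if token in SENSE_CODES:
            return token
    return None


def nearest_neighbor_class(cue: str) -> str:
    """Return the class of the labelled cue word closest to `cue` in sense distance."""
    nearest_word = min(LABELLED_CUES, key=lambda word: sense_distance(cue, word))
    return LABELLED_CUES[nearest_word]


def classify(
    tokens: Sequence[str],
    fallback: Callable[[Sequence[str]], str],
) -> Tuple[str, str]:
    """Use the nearest neighbor rule when a cue word exists, else the fallback model."""
    cue = find_cue_word(tokens)
    if cue is not None:
        return nearest_neighbor_class(cue), "nearest_neighbor"
    # `fallback` stands in for the SVM trained on the extended training set.
    return fallback(tokens), "fallback"
```

In this sketch, a question containing the unseen cue word "town" is still routed to the class of "city" because the two share a sense code, which is the kind of generalization that word sense distance is meant to provide when labelled questions are sparse.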
| Original language | English |
|---|---|
| Pages (from-to) | 111-118 |
| Number of pages | 8 |
| Journal | Gaojishu Tongxin/Chinese High Technology Letters |
| Volume | 19 |
| Issue number | 2 |
| State | Published - Feb 2009 |
Keywords
- Extension of training set
- Focus word
- Question classification
- Word sense distance
Fingerprint
Dive into the research topics of 'Chinese question classification based on identification of cue words and extension of training set'. Together they form a unique fingerprint.