Abstract
This paper first reviews existing smoothing methods for the N-gram model and implements the Absolute, Witten-Bell (W-B), and Katz smoothing algorithms. The traditional Katz algorithm fails to discount the probability when smoothing Chinese collocations. To resolve this problem, we construct a new discounting coefficient based on part-of-speech (POS) information. Under the new coefficient, the discount decreases as word frequency increases, and the larger the count of following words, the larger the discount; both properties meet the requirements of a smoothing method. Experimental results show that the improved Katz smoothing algorithm not only decreases the cross entropy of the language model but also increases the F-measure when applied to Chinese word segmentation.
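For orientation, the baseline that the abstract sets out to improve can be sketched as follows. This is a minimal sketch of the classical Good-Turing-based Katz discount ratio, not the POS-based coefficient proposed in the paper, whose form the abstract does not give. The function name `katz_discounts`, the cutoff `k`, and the fallback to a ratio of 1.0 when a count-of-counts is missing are assumptions made for illustration.

```python
from collections import Counter


def katz_discounts(ngram_counts, k=5):
    """Return {r: d_r} for 1 <= r <= k using the classical Katz formula.

    d_r is the ratio applied to a raw count r; counts above k are left
    undiscounted (d_r = 1) in the classical scheme. This is the textbook
    baseline, not the paper's POS-based coefficient.
    """
    # n[r] = number of distinct n-grams observed exactly r times
    n = Counter(ngram_counts.values())
    if n[1] == 0:
        raise ValueError("need at least one n-gram with count 1")

    # Shared correction term (k + 1) * n_{k+1} / n_1
    tail = (k + 1) * n[k + 1] / n[1]

    discounts = {}
    for r in range(1, k + 1):
        if n[r] == 0 or tail == 1.0:
            # Assumption: leave the count undiscounted when the
            # Good-Turing estimate is undefined for this r.
            discounts[r] = 1.0
            continue
        r_star = (r + 1) * n[r + 1] / n[r]  # Good-Turing adjusted count
        discounts[r] = (r_star / r - tail) / (1.0 - tail)
    return discounts
```

A ratio closer to 1 means less probability mass is removed, so in well-behaved data the ratio grows with the count `r`; this matches the requirement stated in the abstract that the discount should shrink as frequency increases.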
| Original language | English |
|---|---|
| Pages (from-to) | 1445-1448 |
| Number of pages | 4 |
| Journal | Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology |
| Volume | 39 |
| Issue number | 9 |
| State | Published - Sep 2007 |
| Externally published | Yes |
Keywords
- Data sparseness
- Katz smoothing
- N-gram model
- POS information