Improved Katz smoothing algorithms with POS information

  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

This paper first reviews existing smoothing methods for the N-gram model and implements the Absolute, Witten-Bell (W-B), and Katz smoothing algorithms. The traditional Katz algorithm fails to discount the probability when smoothing Chinese collocations. To resolve this problem, we construct a new discounting coefficient based on part-of-speech (POS) information. With the new coefficient, the discount decreases as word frequency increases, and the larger the count of following words, the larger the discount. Both properties meet the requirements of a smoothing method. Experimental results show that the improved Katz smoothing algorithm not only decreases the cross entropy of the language model but also increases the F-measure when applied to Chinese word segmentation.
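For orientation, the back-off structure that Katz smoothing relies on can be sketched for a bigram model as below. This is a minimal illustration of standard Katz back-off, not the paper's method: the abstract does not give the POS-based discounting coefficient, so the discount is passed in as a pluggable function. The names `train_counts`, `katz_bigram_prob`, the threshold `k=5`, and the constant discount ratio used in the usage note are all assumptions made for this sketch.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram and bigram counts from a token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def katz_bigram_prob(w_prev, w, unigrams, bigrams, discount, k=5):
    """Katz back-off estimate of P(w | w_prev).

    `discount(c)` returns a ratio in (0, 1] applied to bigrams seen c times;
    counts above `k` are treated as reliable and left undiscounted.
    A POS-aware coefficient would be plugged in here (hypothetical hook).
    """
    total = sum(unigrams.values())

    def p_uni(x):
        return unigrams[x] / total

    # Words observed after w_prev, with their bigram counts.
    followers = {b: c for (a, b), c in bigrams.items() if a == w_prev}
    hist = sum(followers.values())
    if hist == 0:
        # Unseen history: fall back to the unigram estimate.
        return p_uni(w)

    def p_disc(c):
        d = discount(c) if c <= k else 1.0
        return d * c / hist

    if w in followers:
        return p_disc(followers[w])

    # Probability mass freed by discounting, spread over unseen followers
    # in proportion to their unigram probabilities.
    left_over = 1.0 - sum(p_disc(c) for c in followers.values())
    unseen_mass = 1.0 - sum(p_uni(b) for b in followers)
    if unseen_mass <= 0:
        return 0.0
    return left_over * p_uni(w) / unseen_mass
```

Calling `katz_bigram_prob("c", "b", unigrams, bigrams, discount=lambda c: 0.75)` on counts from a toy corpus returns a nonzero probability for a bigram that was never observed, and the estimates over the whole vocabulary for a given history still sum to one. A real Katz implementation would derive the discount from Good-Turing count-of-counts rather than the fixed ratio assumed here.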

Original language: English
Pages (from-to): 1445-1448
Number of pages: 4
Journal: Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology
Volume: 39
Issue number: 9
State: Published - Sep 2007
Externally published: Yes

Keywords

  • Data sparseness
  • Katz smoothing
  • N-gram model
  • POS information
