Chinese named entity recognition: a CRF approach based on word triggers information

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, the conditional random field (CRF), a probabilistic model well suited to labeling sequence data, is introduced for the first time to the task of Chinese named entity recognition (CNER). Unlike generative models, a CRF does not spend effort on modeling the observations and can exploit rich, overlapping features; moreover, it avoids the label bias problem that affects other discriminative models. To perform CNER, special features are selected to capture more informative traits of the Chinese language. In addition, word triggers are integrated into the CRF to address the long-distance constraint problem, an approach that requires a smaller parameter space and less memory than mining parallel information from a large window or the whole sentence. Word triggers are selected in two steps: preparing candidate words and estimating the degree of correlation between two words. Two measures, average mutual information (AMI) and the χ² statistic, are used to estimate the degree of correlation. Experimental results on half a year of People's Daily text show that the CRF combined with word triggers extracted by the χ² method achieves state-of-the-art performance.
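The second trigger-selection step, estimating how strongly two words are correlated, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that co-occurrence is counted per sentence and that both measures are computed from a 2x2 contingency table using the standard Pearson χ² and average mutual information formulas; the abstract does not specify the counting window, candidate filtering, or thresholds. The function names `contingency`, `chi_square`, and `average_mutual_information` are hypothetical.

```python
import math


def contingency(sentences, a, b):
    """Count sentence-level co-occurrence of words a and b (assumed unit: one sentence).

    Returns (n11, n12, n21, n22): both present, only a, only b, neither.
    """
    n11 = n12 = n21 = n22 = 0
    for tokens in sentences:
        has_a = a in tokens
        has_b = b in tokens
        if has_a and has_b:
            n11 += 1
        elif has_a:
            n12 += 1
        elif has_b:
            n21 += 1
        else:
            n22 += 1
    return n11, n12, n21, n22


def chi_square(n11, n12, n21, n22):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = n11 + n12 + n21 + n22
    denom = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    if denom == 0:
        return 0.0  # a row or column is empty, so no evidence of correlation
    return n * (n11 * n22 - n12 * n21) ** 2 / denom


def average_mutual_information(n11, n12, n21, n22):
    """Average mutual information (in bits) over the four cells of the table."""
    n = n11 + n12 + n21 + n22
    cells = [
        (n11, n11 + n12, n11 + n21),
        (n12, n11 + n12, n12 + n22),
        (n21, n21 + n22, n11 + n21),
        (n22, n21 + n22, n12 + n22),
    ]
    ami = 0.0
    for joint, row, col in cells:
        if joint > 0:  # an empty cell contributes nothing (0 * log 0 := 0)
            ami += (joint / n) * math.log2(joint * n / (row * col))
    return ami


# Toy, made-up data: each sentence is a list of already segmented tokens.
toy_sentences = [["w1", "w2"], ["w1", "w2"], ["w3", "w4"], ["w3", "w4"]]
table = contingency(toy_sentences, "w1", "w2")
print(table, chi_square(*table), average_mutual_information(*table))
```

Under these assumptions, candidate word pairs would be ranked by either score and the highest-scoring pairs kept as triggers; how the selected triggers are then encoded as CRF features is not described in the abstract.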

Original language: English
Pages (from-to): 795-801
Number of pages: 7
Journal: Gaojishu Tongxin/Chinese High Technology Letters
Volume: 16
Issue number: 8
State: Published - Aug 2006
Externally published: Yes

Keywords

  • Chinese named entity recognition (CNER)
  • Conditional random fields (CRF)
  • Information extraction (IE)
  • Natural language processing (NLP)
  • Probabilistic model
  • Word triggers
