Abstract
In this paper, the conditional random field (CRF), a probabilistic model well suited to labeling sequence data, is introduced for the first time to the task of Chinese named entity recognition (CNER). Unlike generative models, a CRF spends no modeling effort on the observations and can exploit rich, overlapping features; moreover, it avoids the label bias problem that affects other discriminative models. To perform CNER, special features are selected to capture informative traits of the Chinese language. In addition, word triggers are integrated into the CRF to address the long-distance constraint problem; compared with mining parallel information in a large window or over the whole sentence, this requires a smaller parameter space and less memory. Word triggers are selected in two steps: preparing candidate words and estimating the degree of correlation between two words. Two measures, average mutual information (AMI) and the χ² statistic, are used to estimate the correlation degree. Experimental results on half a year of People's Daily text show that the CRF combined with word triggers extracted by the χ² method achieves state-of-the-art performance.
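The abstract does not state the exact formulas used for the two correlation measures, so the following is only a minimal sketch built on the standard textbook definitions of the χ² statistic and average mutual information over a 2×2 co-occurrence table. Counting co-occurrence at the sentence level, and the function names `contingency`, `chi_square`, and `average_mutual_information`, are assumptions made for illustration and are not taken from the paper.

```python
import math


def contingency(sentences, w1, w2):
    """Build a 2x2 table of sentence-level co-occurrence counts.

    `sentences` is an iterable of token collections (assumed to be
    already word-segmented). Returns (n11, n10, n01, n00):
    both words present, only w1, only w2, neither.
    """
    n11 = n10 = n01 = n00 = 0
    for sent in sentences:
        has1, has2 = w1 in sent, w2 in sent
        if has1 and has2:
            n11 += 1
        elif has1:
            n10 += 1
        elif has2:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def chi_square(n11, n10, n01, n00):
    """Pearson chi-square statistic for a 2x2 table (standard form)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # a word that is always or never present carries no signal
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def average_mutual_information(n11, n10, n01, n00):
    """Average mutual information in bits, summed over all four cells."""
    n = n11 + n10 + n01 + n00
    if n == 0:
        return 0.0
    row = (n11 + n10, n01 + n00)  # w1 present / absent
    col = (n11 + n01, n10 + n00)  # w2 present / absent
    cells = ((n11, 0, 0), (n10, 0, 1), (n01, 1, 0), (n00, 1, 1))
    ami = 0.0
    for count, i, j in cells:
        if count > 0:  # empty cells contribute nothing to the sum
            ami += (count / n) * math.log2(count * n / (row[i] * col[j]))
    return ami
```

Under these assumptions, candidate word pairs would be ranked by either score and the highest-scoring pairs kept as triggers; the threshold and the candidate-preparation step are left open here because the abstract does not specify them.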
| Original language | English |
|---|---|
| Pages (from-to) | 795-801 |
| Number of pages | 7 |
| Journal | Gaojishu Tongxin/Chinese High Technology Letters |
| Volume | 16 |
| Issue number | 8 |
| State | Published - Aug 2006 |
| Externally published | Yes |
Keywords
- Chinese named entity recognition (CNER)
- Conditional random fields (CRF)
- Information extraction (IE)
- Natural language processing (NLP)
- Probabilistic model
- Word triggers
Fingerprint
Dive into the research topics of 'Chinese named entity recognition: a CRF approach based on word triggers information'. Together they form a unique fingerprint.