
Classification-based Chinese collocation extraction

  • Ruifeng Xu*
  • Qin Lu
  • Kam Fai Wong
  • Wenjie Li
  • *Corresponding author for this work
  • Hong Kong Polytechnic University
  • Chinese University of Hong Kong

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Most collocation extraction algorithms apply a single set of criteria and a single threshold, which is not entirely appropriate because different types of collocations behave differently. This paper presents a window-based Chinese collocation extraction system that identifies different types of collocations separately. Chinese collocations are classified into four types by considering their compositional, non-substitutable, and non-modifiable properties as well as their statistical significance. A multi-stage extraction system is then designed to identify each type separately, using a different combination of features for each. Furthermore, heuristic rules based on dependency knowledge are applied to filter out some pseudo-collocations. Experiments show that the proposed system achieves better F1 performance than most existing algorithms for Chinese collocation extraction.
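As a rough illustration of what "window-based" extraction with a statistical-significance score can look like, the following is a minimal Python sketch. It is not the system described in the abstract: the window size, the function names `window_cooccurrences` and `pmi_scores`, and the choice of pointwise mutual information as the score are all assumptions made for illustration, and the paper's actual features, thresholds, and four-way classification are not given here.

```python
from collections import Counter
import math


def window_cooccurrences(tokens, window=5):
    """Count ordered (headword, co-word) pairs within +/- `window` positions.

    `tokens` is assumed to be an already word-segmented sentence or corpus.
    """
    pair_counts = Counter()
    for i, head in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_counts[(head, tokens[j])] += 1
    return pair_counts


def pmi_scores(tokens, window=5):
    """Score each co-occurring pair with pointwise mutual information.

    PMI is used here only as a stand-in association measure (an assumption);
    any other statistical-significance measure could be substituted.
    """
    word_counts = Counter(tokens)
    pair_counts = window_cooccurrences(tokens, window)
    total_words = len(tokens)
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (w1, w2), n in pair_counts.items():
        p_pair = n / total_pairs
        p1 = word_counts[w1] / total_words
        p2 = word_counts[w2] / total_words
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores


# Toy usage on a hypothetical token sequence.
toy_tokens = ["a", "b", "a", "b", "c"]
toy_counts = window_cooccurrences(toy_tokens, window=1)
toy_scores = pmi_scores(toy_tokens, window=1)
```

In a scheme like the one the abstract outlines, candidate pairs produced this way would then be judged with different feature combinations and thresholds per collocation type, rather than with one global cutoff.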

Original language: English
Title of host publication: IEEE NLP-KE 2007 - Proceedings of International Conference on Natural Language Processing and Knowledge Engineering
Pages: 308-315
Number of pages: 8
State: Published - 2007
Externally published: Yes
Event: International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007 - Beijing, China
Duration: 30 Aug 2007 to 1 Sep 2007

Publication series

Name: IEEE NLP-KE 2007 - Proceedings of International Conference on Natural Language Processing and Knowledge Engineering

Conference

Conference: International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE 2007
Country/Territory: China
City: Beijing
Period: 30/08/07 to 1/09/07
