
Automatic collocation extraction using web feedback data

  • Jian Fang Lin*
  • Cheng Niu
  • Sheng Li
  • De Quan Zheng
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

To improve the precision of collocation extraction, this paper proposes a new method based on Internet data. Because traditional corpus-based collocation extraction is constrained by the size of the corpus, we acquire collocations from the Web, which contains a wealth of information and knowledge. Three classical association measures (co-occurrence frequency, mutual information, and the χ²-test) are used to extract collocations automatically. Experimental results show that this Web-based approach outperforms the traditional approach in both precision and recall. Internet data may therefore be useful in many NLP applications.
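
The abstract names the three association measures but gives no formulas. As a rough illustration only, the sketch below scores one candidate word pair using the standard textbook definitions: raw co-occurrence count, pointwise mutual information, and Pearson's χ² on a 2×2 contingency table. The function name `association_scores` and its parameters are hypothetical, and the counts are assumed to come from some source such as a corpus or Web hit counts; the paper's exact formulation and data source may differ.

```python
import math


def association_scores(n_xy, n_x, n_y, n_total):
    """Score a candidate pair (x, y) with three association measures.

    n_xy    -- number of observations containing both x and y (assumed > 0)
    n_x     -- number of observations containing x
    n_y     -- number of observations containing y
    n_total -- total number of observations

    Returns (co-occurrence frequency, pointwise MI in bits, chi-square).
    These are textbook definitions, not necessarily the paper's own.
    """
    # Pointwise mutual information: log2( P(x, y) / (P(x) * P(y)) )
    pmi = math.log2((n_xy * n_total) / (n_x * n_y))

    # 2x2 contingency table for "contains x" versus "contains y"
    a = n_xy                          # x and y
    b = n_x - n_xy                    # x without y
    c = n_y - n_xy                    # y without x
    d = n_total - n_x - n_y + n_xy    # neither

    # Pearson's chi-square statistic for a 2x2 table
    chi2 = (n_total * (a * d - b * c) ** 2
            / ((a + b) * (c + d) * (a + c) * (b + d)))

    return n_xy, pmi, chi2


if __name__ == "__main__":
    # Made-up counts, for illustration only
    print(association_scores(20, 100, 50, 10000))
```

Under these definitions, a pair whose words co-occur exactly as often as independence predicts gets a mutual information of 0 and a χ² of 0; higher values of either suggest a stronger association.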

Original language: English
Pages (from-to): 281-285
Number of pages: 5
Journal: Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology
Volume: 42
Issue number: 2
State: Published - Feb 2010

Keywords

  • Co-occurrence frequency
  • Collocation
  • Corpora
  • Mutual information
  • Web
  • χ²-test
