Abstract
To improve the precision of collocation extraction, this paper proposes a new method based on Internet data. Traditional collocation extraction relies on a linguistic corpus and is therefore constrained by corpus scale; we instead acquire collocations from the Web, which contains abundant information and knowledge. Three classical association measures (co-occurrence frequency, mutual information, and the χ² test) are used to extract collocations automatically. Experimental results show that the Web-based approach outperforms the traditional approach in both precision and recall. This suggests that Internet data may be useful in many NLP applications.
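As an illustration of the three association measures named in the abstract, the sketch below scores one candidate word pair from raw counts. It uses the standard textbook definitions of pointwise mutual information and the 2×2 χ² statistic; the abstract does not give the paper's exact formulas, so the function name `association_scores`, its parameters, and the idea of treating the counts as Web hit counts are assumptions made for illustration only.

```python
import math


def association_scores(f_xy, f_x, f_y, n):
    """Score a candidate word pair from raw counts (illustrative sketch).

    f_xy: count of the two words occurring together
    f_x, f_y: counts of each word on its own
    n: total number of observations (hypothetical corpus or index size)
    """
    if not (0 < f_xy <= min(f_x, f_y) and f_x + f_y - f_xy <= n):
        raise ValueError("inconsistent counts")

    # Pointwise mutual information: log2 of observed over expected co-occurrence.
    pmi = math.log2(f_xy * n / (f_x * f_y))

    # Cells of the 2x2 contingency table for (x present?, y present?).
    a = f_xy                    # x and y
    b = f_x - f_xy              # x without y
    c = f_y - f_xy              # y without x
    d = n - f_x - f_y + f_xy    # neither

    # Pearson chi-square for a 2x2 table; 0.0 if a margin is empty.
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0

    return {"frequency": f_xy, "pmi": pmi, "chi2": chi2}
```

For example, `association_scores(20, 40, 50, 1000)` describes a pair that co-occurs ten times more often than independence would predict. Candidates would then typically be ranked by one of the three scores and cut off at a threshold, which is a design choice this sketch leaves open.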
| Original language | English |
|---|---|
| Pages (from-to) | 281-285 |
| Number of pages | 5 |
| Journal | Harbin Gongye Daxue Xuebao/Journal of Harbin Institute of Technology |
| Volume | 42 |
| Issue number | 2 |
| State | Published - Feb 2010 |
Keywords
- Co-occurrence frequency
- Collocation
- Corpora
- Mutual information
- Web
- χ²-test
Fingerprint
Dive into the research topics of 'Automatic collocation extraction using web feedback data'. Together they form a unique fingerprint.