
Research on automatic acquisition of domain terms

  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

To address natural language processing tasks more precisely, it is important to build a system that acquires domain terms automatically. This paper presents a method for automatically acquiring domain terms from raw, unsegmented text. The raw domain corpus is first pre-processed. Candidate words are then extracted automatically using information entropy and the log-likelihood ratio, and an open-domain lexicon is used to remove general words so that only domain terms remain. Finally, a confidence measure removes non-meaningful words from the candidate term set to improve acquisition accuracy, and the domain-specific lexicon is constructed. Experimental results show that this simple method extracts most of the domain terms efficiently. The extracted terms have been applied effectively in a personalized Chinese word segmentation system.
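As a rough illustration of the candidate-extraction step, the sketch below computes two statistics of the kind the abstract names: the entropy of the characters adjacent to a candidate string, and Dunning's log-likelihood ratio for a 2x2 co-occurrence table. The function names, the toy inputs, and the simple lexicon filter are assumptions made for illustration. The abstract does not give the paper's exact formulas, thresholds, or confidence measure, so none of those are reproduced here.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy (in bits) of a Counter of observed symbols."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def boundary_entropy(corpus, candidate):
    """Return (left, right) entropy of the characters adjacent to `candidate`.

    High entropy on both sides suggests the string appears in varied
    contexts and may be a free-standing word (hypothetical heuristic).
    """
    left, right = Counter(), Counter()
    start = corpus.find(candidate)
    while start != -1:
        if start > 0:
            left[corpus[start - 1]] += 1
        end = start + len(candidate)
        if end < len(corpus):
            right[corpus[end]] += 1
        start = corpus.find(candidate, start + 1)
    return _entropy(left), _entropy(right)


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table.

    k11: count of A followed by B, k12: A followed by not-B,
    k21: not-A followed by B,    k22: neither.
    """
    def h(*ks):
        n = sum(ks)
        return sum(k * math.log(k / n) for k in ks if k > 0)

    cells = h(k11, k12, k21, k22)
    rows = h(k11 + k12, k21 + k22)
    cols = h(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols)


def remove_general_words(candidates, general_lexicon):
    """Keep only candidates absent from an open-domain lexicon (assumed to be a set)."""
    return [c for c in candidates if c not in general_lexicon]
```

In a pipeline of this shape, a candidate would typically be kept only if both statistics clear some threshold before the lexicon filter is applied. Those thresholds would need tuning on the domain corpus and are left out of this sketch.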

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
Pages: 3026-3031
Number of pages: 6
State: Published - 2008
Event: 7th International Conference on Machine Learning and Cybernetics, ICMLC - Kunming, China
Duration: 12 Jul 2008 to 15 Jul 2008

Publication series

Name: Proceedings of the 7th International Conference on Machine Learning and Cybernetics, ICMLC
Volume: 5

Conference

Conference: 7th International Conference on Machine Learning and Cybernetics, ICMLC
Country/Territory: China
City: Kunming
Period: 12/07/08 to 15/07/08

Keywords

  • Automatic term extraction
  • Domain Terms
  • Information entropy
  • Log-likelihood ratio
  • Natural language processing
