
Minimum normalized google distance for unsupervised multilingual Chinese-English word sense disambiguation

  • Pengyuan Liu*
  • Yongzeng Xue
  • Shiqi Li
  • Shui Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper introduces the normalized Google distance into the study of word sense disambiguation and presents a novel unsupervised method of word sense disambiguation. The normalized Google distance is a theory of similarity between words and phrases, based on information distance and Kolmogorov complexity; it uses the World Wide Web as a database, with page counts derived from a search engine such as Google. The unsupervised method treats word sense disambiguation as a search for the minimum normalized Google distance between an n-gram and a translation or synonym of the target word, under the assumption of one sense per n-gram. Our system is tested on the Multilingual Chinese-English Lexical Sample task of SemEval-2007. Experimental results show that our method outperforms the best competing system. Our experiment on the nouns of this dataset also gives promising results.
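As an illustration of the idea in the abstract, here is a minimal sketch of a normalized Google distance computation followed by a minimum-distance sense choice. It uses the standard page-count form of the distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f gives page counts and N is the assumed number of indexed pages. The function names, the sense labels, and all counts below are hypothetical; how the authors build the n-grams and issue the queries is not described in this record, so this should not be read as their implementation.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google distance from page counts.

    fx, fy: page counts for each term alone; fxy: page count for both
    terms together; n: assumed total number of indexed pages.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # A term (or the pair) has no hits: treat as maximally distant.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def pick_sense(ngram_count, candidates, n):
    """Return the sense whose translation is closest to the n-gram.

    candidates maps a sense label to a pair:
    (page count of its translation, joint page count with the n-gram).
    """
    return min(
        candidates,
        key=lambda s: ngd(ngram_count, candidates[s][0], candidates[s][1], n),
    )


# Hypothetical page counts, for illustration only.
counts = {
    "river": (20_000, 50),
    "finance": (80_000, 2_000),
}
best = pick_sense(5_000, counts, n=10**9)
```

With these made-up counts, the translation that co-occurs more often with the n-gram relative to its own frequency yields the smaller distance and is selected.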

Original language: English
Title of host publication: Proceedings - 4th International Conference on Genetic and Evolutionary Computing, ICGEC 2010
Pages: 252-255
Number of pages: 4
State: Published - 2010
Event: 4th International Conference on Genetic and Evolutionary Computing, ICGEC 2010 - Shenzhen, China
Duration: 13 Dec 2010 to 15 Dec 2010

Publication series

Name: Proceedings - 4th International Conference on Genetic and Evolutionary Computing, ICGEC 2010

Conference

Conference: 4th International Conference on Genetic and Evolutionary Computing, ICGEC 2010
Country/Territory: China
City: Shenzhen
Period: 13/12/10 to 15/12/10

Keywords

  • Normalized Google distance
  • One sense per n-gram
  • Unsupervised word sense disambiguation
