
An improved method for semantic similarity calculation based on stop-words

  • Haodi Li*
  • Qingcai Chen
  • Xiaolong Wang
  • *Corresponding author for this work
  • Harbin Institute of Technology Shenzhen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Text similarity calculation has become a key issue in many applications, such as information retrieval, semantic disambiguation, and automatic question answering. There is an increasing need for similarity calculation at different levels, e.g. characters, vocabulary, syntactic structure, and semantics. Most existing semantic similarity algorithms can be categorized as statistics-based methods, rule-based methods, or combinations of the two. Statistical methods use knowledge bases to incorporate more comprehensive knowledge and can reduce knowledge noise, so they are able to achieve better performance. Nevertheless, because items are distributed unevenly in the knowledge base, semantic similarity performance for low-frequency words is usually poor. In this work, we propose a weight normalization method for semantic dimensions based on the distribution of stop-words. The proposed method uses the semantic independence of stop-words to avoid the semantic bias of the corpus in statistical methods, which further improves the accuracy of semantic similarity computation. Experimental comparisons with several existing algorithms show the effectiveness of the proposed method.

Original language: English
Title of host publication: Machine Learning and Cybernetics - 13th International Conference, Proceedings
Editors: Xizhao Wang, Qiang He, Patrick P.K. Chan, Witold Pedrycz
Publisher: Springer Verlag
Pages: 339-347
Number of pages: 9
ISBN (Electronic): 9783662456514
State: Published - 2014
Externally published: Yes
Event: 13th International Conference on Machine Learning and Cybernetics, ICMLC 2014 - Lanzhou, China
Duration: 13 Jul 2014 to 16 Jul 2014

Publication series

Name: Communications in Computer and Information Science
Volume: 481
ISSN (Print): 1865-0929

Conference

Conference: 13th International Conference on Machine Learning and Cybernetics, ICMLC 2014
Country/Territory: China
City: Lanzhou
Period: 13/07/14 to 16/07/14

Keywords

  • ESA
  • Semantic dimension normalization
  • Semantic similarity
  • Stop-words
