
Exploring wikipedia and query log's ability for text feature representation

  • Bing Li*
  • Qing Cai Chen
  • Daniel S. Yeung
  • Wing W.Y. Ng
  • Xiao Long Wang
  • *Corresponding author for this work
  • Harbin Institute of Technology Shenzhen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The rapid growth of internet technology calls for better management of web page content. Much text mining research has been conducted in areas such as text categorization, information retrieval, and text clustering. When machine learning methods or statistical models are applied to data at this scale, the first problem to solve is how to represent a text document in a form that computers can handle. Traditionally, single words are employed as features in the Vector Space Model, and together they make up the feature space for all text documents. This single-word representation assumes that words are independent and does not consider the relations between them, which may cause information loss. This paper proposes Wiki-Query segmented features for text classification, in the hope of making better use of the information in the text. The experimental results show a much better F1 value than that of the classical single-word text representation, which suggests that Wikipedia and query-log segmented features can better represent a text document.
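
To make the contrast in the abstract concrete, the following is a minimal Python sketch of single-word features versus phrase-segmented features. It is not the authors' method: the abstract does not describe how Wiki-Query segmentation works, so the greedy longest-match strategy, the function names, and the tiny `PHRASES` set (standing in for entries that might be drawn from Wikipedia and a query log) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical phrase dictionary (assumption). In the paper's setting such
# entries would presumably come from Wikipedia and a query log.
PHRASES = {("machine", "learning"), ("text", "classification")}
MAX_LEN = max(len(p) for p in PHRASES)


def single_word_features(text):
    """Classical bag of words: every token is an independent feature."""
    return Counter(text.lower().split())


def segmented_features(text, phrases=PHRASES, max_len=MAX_LEN):
    """Greedy longest-match segmentation: known multi-word phrases become
    single features; all other tokens stay as single-word features."""
    tokens = text.lower().split()
    features = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrases:
                features.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No known phrase starts here: keep the single word.
            features.append(tokens[i])
            i += 1
    return Counter(features)


doc = "machine learning for text classification"
print(single_word_features(doc))
print(segmented_features(doc))
```

In this sketch the single-word representation yields five unrelated features, while the segmented one keeps "machine learning" and "text classification" as units, which is the kind of word relation the abstract says the single-word model discards.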

Original language: English
Title of host publication: Proceedings of the Sixth International Conference on Machine Learning and Cybernetics, ICMLC 2007
Pages: 3343-3348
Number of pages: 6
State: Published - 2007
Externally published: Yes
Event: 6th International Conference on Machine Learning and Cybernetics, ICMLC 2007 - Hong Kong, China
Duration: 19 Aug 2007 - 22 Aug 2007

Publication series

Name: Proceedings of the Sixth International Conference on Machine Learning and Cybernetics, ICMLC 2007
Volume: 6

Conference

Conference: 6th International Conference on Machine Learning and Cybernetics, ICMLC 2007
Country/Territory: China
City: Hong Kong
Period: 19/08/07 - 22/08/07

Keywords

  • Query-log
  • Text feature representation
  • Wikipedia (Wiki)
  • Word-based model
