A new document representation using term frequency and vectorized graph connectionists with application to document retrieval

  • Tommy W.S. Chow*
  • Haijun Zhang
  • M. K.M. Rahman

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a new document representation that combines multiple vectorized features, including term frequency and term-connection-frequency. Each document is represented by an undirected graph and by a directed graph. Terms and vectorized graph connectionists are then extracted from these graphs using several feature extraction methods. This hybrid feature representation reflects the underlying semantics more accurately than the term histograms in current use, and it facilitates the matching of complex graphs. At the application level, we develop a document retrieval system based on the self-organizing map (SOM) to speed up the retrieval process. Extensive experimental verification suggests that the proposed method is computationally efficient and accurate for document retrieval.
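
As a rough illustration of the two feature types named in the abstract, the sketch below counts term frequencies and term-connection frequencies for a single document. The abstract does not specify how the graphs are built, so everything here is an assumption made for illustration: a connection is taken to be a pair of adjacent terms, the directed graph keeps word order, the undirected graph ignores it, and the names `tokenize` and `term_features` are hypothetical. It is not the authors' implementation.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"


def tokenize(text):
    """Hypothetical tokenizer: lowercase words with edge punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def term_features(text):
    """Return (term frequency, directed edge counts, undirected edge counts).

    Assumption: an edge links two terms that appear next to each other.
    The paper may define term connections differently.
    """
    tokens = tokenize(text)
    tf = Counter(tokens)

    # Directed graph: edge (a, b) means term a is immediately followed by b.
    directed = Counter(zip(tokens, tokens[1:]))

    # Undirected graph: merge (a, b) and (b, a) into one unordered edge.
    undirected = Counter()
    for (a, b), count in directed.items():
        undirected[frozenset((a, b))] += count

    return tf, directed, undirected


tf, directed, undirected = term_features("document graph, graph document retrieval")
print(tf["graph"])                                  # term frequency of "graph"
print(directed[("document", "graph")])              # ordered connection count
print(undirected[frozenset(("document", "graph"))]) # unordered connection count
```

Under these assumptions, the directed counts distinguish "document graph" from "graph document", while the undirected counts pool them, which is one way a graph-based representation can carry information that a plain term histogram discards.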

Original language: English
Pages (from-to): 12023-12035
Number of pages: 13
Journal: Expert Systems with Applications
Volume: 36
Issue number: 10
State: Published - Dec 2009
Externally published: Yes

Keywords

  • Document retrieval
  • Graph representation
  • Multiple features
  • Self-organizing map
