Topic detection by topic model induced distance using biased initiation

  • Yonghui Wu*
  • Yuxin Ding
  • Xiaolong Wang
  • Jun Xu
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Harbin Institute of Technology Shenzhen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Clustering is widely used in topic detection tasks. However, vector space (VS) model based distances, such as the cosine-like distance, yield low precision and recall when the corpus contains many related topics. In this paper, we propose a new distance measure: the Topic Model (TM) induced distance. Assuming that the word distribution differs from topic to topic, the documents can be treated as a sample from a mixture of k topic models, which can be estimated using expectation maximization (EM). We propose a biased initiation method for topic decomposition using EM, which generates a converged matrix from which the TM induced distance is derived. Collections of web news are clustered into classes using this TM induced distance. A series of experiments is described on a corpus of 5033 web news items from 30 topics. K-means clustering is run on test sets with different numbers of topics, and the clustering results obtained with the TM induced distance are compared with those obtained with the traditional cosine-like distance. The experimental results show that the proposed topic decomposition method using biased initiation is more effective than topic decomposition using random initial values. The TM induced distance generates more topical groups than the VS model based cosine-like distance. In web news collections containing related topics, the TM induced distance achieves better precision and recall.
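
The pipeline summarized in the abstract can be illustrated with a small sketch. The abstract does not give the paper's formulas, so everything below is an assumption: the topic model is taken to be a mixture of unigram distributions, "biased initiation" is read as seeding each topic's word distribution from one chosen document (the hypothetical `seed_docs` parameter) rather than from random values, and the TM induced distance is taken to be the Euclidean distance between rows of the converged document-by-topic posterior matrix. The function names `fit_topic_mixture` and `tm_distance` are illustrative and do not come from the paper.

```python
import math
import random


def fit_topic_mixture(docs, k, seed_docs=None, iters=30, smooth=0.01, rng_seed=0):
    """EM for a mixture of k unigram topic models (assumed model, not the paper's).

    docs: list of {word: count} dicts.
    seed_docs: optional list of k document indices used to bias the initial
        topic-word distributions (an assumed reading of "biased initiation");
        if None, random initial values are used instead.
    Returns the document-by-topic matrix of posteriors P(topic | doc).
    """
    vocab = sorted({w for d in docs for w in d})
    rng = random.Random(rng_seed)

    # Initialise one smoothed word distribution per topic.
    topics = []
    for t in range(k):
        if seed_docs is not None:
            raw = {w: docs[seed_docs[t]].get(w, 0) + smooth for w in vocab}
        else:
            raw = {w: rng.random() + smooth for w in vocab}
        total = sum(raw.values())
        topics.append({w: v / total for w, v in raw.items()})
    priors = [1.0 / k] * k

    post = []
    for _ in range(iters):
        # E-step: posterior over topics for each document, computed in log space.
        post = []
        for d in docs:
            logp = [
                math.log(priors[t])
                + sum(c * math.log(topics[t][w]) for w, c in d.items())
                for t in range(k)
            ]
            top = max(logp)
            weights = [math.exp(lp - top) for lp in logp]
            norm = sum(weights)
            post.append([w / norm for w in weights])

        # M-step: re-estimate topic-word distributions and topic priors.
        for t in range(k):
            raw = {w: smooth for w in vocab}
            for d, p in zip(docs, post):
                for w, c in d.items():
                    raw[w] += p[t] * c
            total = sum(raw.values())
            topics[t] = {w: v / total for w, v in raw.items()}
            priors[t] = (sum(p[t] for p in post) + smooth) / (len(docs) + k * smooth)
    return post


def tm_distance(post, i, j):
    """Assumed TM induced distance: Euclidean distance between posterior rows."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(post[i], post[j])))
```

Under these assumptions, the rows of the returned matrix could be passed to any K-means implementation in place of the usual term-frequency vectors, with `tm_distance` as the distance between documents. Calling `fit_topic_mixture(docs, k)` without `seed_docs` gives the random-initialization baseline that the abstract compares against.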

Original language: English
Title of host publication: Advances in Computer Science and Information Technology - AST/UCMA/ISA/ACN 2010 Conferences, Joint Proceedings
Pages: 310-323
Number of pages: 14
DOIs
State: Published - 2010
Externally published: Yes
Event: 2nd International Conference on Advanced Science and Technology - Miyazaki, Japan
Duration: 23 Jun 2010 → 25 Jun 2010

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 6059 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2nd International Conference on Advanced Science and Technology
Country/Territory: Japan
City: Miyazaki
Period: 23/06/10 → 25/06/10

Keywords

  • Topic detection
  • clustering
  • distance measure
  • topic model
