Subtopic segmentation of Chinese document: An adapted dotplot approach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

An adapted dotplot model based on the quantization of Chinese word senses is presented for finding subtopic boundaries in a document. Data reduction techniques from rough set theory are introduced to select the axis words of the word space. To discretize and filter the data in the information table, the mutual information between axis words and feature words is calculated. The adapted model is then constructed by replacing the counting of identical words with the calculation of sense similarity between feature words. Because the model is a submodule of our Chinese auto-summarization system "InsunAbs", its performance is evaluated indirectly through a quantitative evaluation of "InsunAbs". In test experiments, the adapted model outperforms both the baseline and the original dotplot model.
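The core idea of the abstract, swapping identical-word counting for a graded sense similarity inside a dotplot, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the toy English vocabulary, the hypothetical `SENSE_CLASS` lookup, the 0.8 score for same-class words, and the simplification to a single boundary chosen by minimizing the average similarity between the two sides of the split.

```python
from typing import Callable, List, Sequence

Similarity = Callable[[str, str], float]

def dotplot(blocks: Sequence[Sequence[str]], sim: Similarity) -> List[List[float]]:
    """Build a block-by-block matrix whose cell (i, j) is the mean
    pairwise similarity between the words of block i and block j."""
    n = len(blocks)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            scores = [sim(a, b) for a in blocks[i] for b in blocks[j]]
            matrix[i][j] = sum(scores) / len(scores) if scores else 0.0
    return matrix

def best_boundary(matrix: List[List[float]]) -> int:
    """Return the split index k (1 <= k < n) that minimizes the average
    similarity between blocks before k and blocks from k onward."""
    n = len(matrix)
    best_k, best_density = 1, float("inf")
    for k in range(1, n):
        cross = sum(matrix[i][j] for i in range(k) for j in range(k, n))
        density = cross / (k * (n - k))
        if density < best_density:
            best_k, best_density = k, density
    return best_k

def exact_match(a: str, b: str) -> float:
    """Original-style scoring: only identical words count."""
    return 1.0 if a == b else 0.0

# Hypothetical sense classes standing in for a real sense lexicon.
SENSE_CLASS = {
    "cat": "animal", "dog": "animal", "kitten": "animal", "puppy": "animal",
    "stock": "finance", "bond": "finance", "share": "finance", "equity": "finance",
}

def sense_similarity(a: str, b: str) -> float:
    """Adapted-style scoring: words in the same sense class are similar
    even when they are not identical (0.8 is an arbitrary toy value)."""
    if a == b:
        return 1.0
    class_a, class_b = SENSE_CLASS.get(a), SENSE_CLASS.get(b)
    return 0.8 if class_a is not None and class_a == class_b else 0.0

blocks = [["cat", "dog"], ["kitten", "puppy"], ["stock", "bond"], ["share", "equity"]]
exact_plot = dotplot(blocks, exact_match)
sense_plot = dotplot(blocks, sense_similarity)
boundary = best_boundary(sense_plot)
```

In this toy input no word repeats across blocks, so the exact-match plot carries no signal away from the diagonal, while the sense-based plot separates the first two blocks from the last two. A real system would need a proper sense lexicon and a multi-boundary search.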

Original language: English
Title of host publication: Proceedings of 2002 International Conference on Machine Learning and Cybernetics
Pages: 1571-1576
Number of pages: 6
State: Published - 2002
Externally published: Yes
Event: Proceedings of 2002 International Conference on Machine Learning and Cybernetics - Beijing, China
Duration: 4 Nov 2002 → 5 Nov 2002

Publication series

Name: Proceedings of 2002 International Conference on Machine Learning and Cybernetics
Volume: 3

Conference

Conference: Proceedings of 2002 International Conference on Machine Learning and Cybernetics
Country/Territory: China
City: Beijing
Period: 4/11/02 → 5/11/02

Keywords

  • Attribute reduction
  • Dotplot
  • Mutual information
  • Rough set
  • Subtopic segmentation
