
Short Text Clustering Enhanced by Semantic Matching Model

  • School of Computer Science and Technology, Harbin Institute of Technology
  • Harbin Institute of Technology Weihai
  • Qilu University of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the popularity of social networks, short text clustering has become an increasingly important and widely used task. It is challenging because short texts from social networks contain irregular words, considerable noise, and sparse features. We propose Short Text Clustering enhanced by Semantic Matching Model (abbreviated STCSMM). STCSMM transfers knowledge from a labeled text similarity dataset to short text clustering through a semantic matching model, thereby improving clustering performance. First, we train a semantic matching network on the dataset of the text similarity task; the network contains a feature extraction layer and a vector distance calculation layer. Then, we use the learned feature extraction layer to extract short text features, and use the vector distance calculation layer to replace the distance metrics commonly used in the traditional K-means algorithm, such as cosine distance and Euclidean distance. Finally, the text features obtained from the feature extraction layer are clustered with K-means based on the vector distance calculation layer. This improved K-means clustering (STCSMM) performs better on the microblog text clustering dataset than existing methods such as K-means clustering with LDA, LSI, or average word embedding feature vectors.
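
The final clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the mean-based centroid update, and the `stand_in_learned_distance` function are all assumptions made for illustration. In STCSMM the distance would come from the trained vector distance calculation layer, and the features from the trained feature extraction layer.

```python
import numpy as np


def kmeans_custom_distance(features, k, distance, n_iter=50, seed=0):
    """K-means in which `distance` replaces cosine/Euclidean distance.

    features: (n, d) array, assumed to come from a feature extraction layer.
    distance: callable taking an (n, d) array and a (d,) centroid and
              returning an (n,) array of distances (hypothetical interface).
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids by sampling k distinct data points.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    labels = np.full(len(features), -1)
    for _ in range(n_iter):
        # Assignment step: distance from every point to every centroid.
        dists = np.stack([distance(features, c) for c in centroids], axis=1)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so stop early
        labels = new_labels
        # Update step: mean of cluster members. This is an assumption;
        # the abstract does not say how centroids are updated.
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def stand_in_learned_distance(points, centroid, weights=None):
    """Hypothetical placeholder for the trained vector distance layer.

    Maps a weighted L1 difference into [0, 1); a real model would use
    parameters learned on the text similarity dataset.
    """
    diff = np.abs(points - centroid)
    if weights is None:
        weights = np.ones(points.shape[1])
    return 1.0 - np.exp(-(diff @ weights))
```

Calling `kmeans_custom_distance(X, k, stand_in_learned_distance)` on an `(n, d)` feature matrix `X` returns one cluster label per row plus the final centroids. Passing the distance as a callable keeps the K-means loop unchanged when a different learned distance is swapped in.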

Original language: English
Title of host publication: 2019 2nd International Conference on Information Systems and Computer Aided Education, ICISCAE 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 480-484
Number of pages: 5
ISBN (Electronic): 9781728130668
State: Published - Sep 2019
Externally published: Yes
Event: 2nd IEEE International Conference on Information Systems and Computer Aided Education, ICISCAE 2019 - Dalian, China
Duration: 28 Sep 2019 to 30 Sep 2019

Publication series

Name: 2019 2nd International Conference on Information Systems and Computer Aided Education, ICISCAE 2019

Conference

Conference: 2nd IEEE International Conference on Information Systems and Computer Aided Education, ICISCAE 2019
Country/Territory: China
City: Dalian
Period: 28/09/19 to 30/09/19

Keywords

  • K-means
  • STCSMM
  • Semantic matching model
  • Short text clustering
