Evaluating tag quality for blogger modelling via topic models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Because bloggers can annotate their posts with tags, tags have become one of the most important resources for describing blogger features. However, because tag quality is uneven, not all tags are appropriate for representing a blogger's preferences. Poor tags and spam tags blur the line between a user's actual preferences and spam terms, so they should be detected before tags are used directly to describe bloggers. This paper presents a detailed quantitative analysis of the categories of tag spam in the blogosphere. Taking advantage of the abundant text content in blog posts and the relatively stable semantic relationship between tags and their target posts, it proposes an unsupervised approach based on topic models to evaluate tag quality for blogger modelling. The latent interest topics of a blogger are mined through Latent Dirichlet Allocation (LDA) topic modelling. Each blog post of the blogger is represented as a distribution over latent topics, and each latent topic is a distribution over the words of the vocabulary. A tag is likewise expressed as a co-occurrence term vector. Finally, a scheme is devised to measure the similarity between each tag and its target blog post, and tags with low similarity values are identified as poor tags. Experimental results indicate that the proposed method achieves better performance than the baselines on datasets collected from Sina Blog, one of the largest Chinese blog platforms.
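The abstract does not spell out the similarity scheme, so the following is only a minimal sketch of how the last step might look, not the paper's actual method. It assumes LDA has already produced a topic distribution for one post (`theta`) and topic-word distributions (`phi`), mixes them into a word-level vector for the post, and compares that vector with each tag's co-occurrence term vector by cosine similarity. The function names, the toy vocabulary, the toy numbers, and the `threshold` value are all hypothetical.

```python
import math

def post_word_distribution(theta, phi):
    """Mix the topic-word distributions phi by the post's topic weights theta."""
    vocab_size = len(phi[0])
    return [sum(theta[k] * phi[k][w] for k in range(len(theta)))
            for w in range(vocab_size)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def flag_poor_tags(theta, phi, tag_vectors, threshold):
    """Score each tag against the post and list tags scoring below threshold."""
    post_vec = post_word_distribution(theta, phi)
    scores = {tag: cosine(vec, post_vec) for tag, vec in tag_vectors.items()}
    poor = sorted(tag for tag, score in scores.items() if score < threshold)
    return scores, poor

# Toy inputs (assumed for illustration, not taken from the paper).
vocab = ["travel", "hotel", "flight", "casino", "bonus"]
phi = [
    [0.4, 0.3, 0.3, 0.0, 0.0],  # topic 0: mostly travel words
    [0.0, 0.0, 0.0, 0.5, 0.5],  # topic 1: mostly gambling words
]
theta = [0.9, 0.1]              # the post is mainly about topic 0
tag_vectors = {                 # co-occurrence counts of each tag with vocab terms
    "holiday": [3, 2, 1, 0, 0],
    "free-money": [0, 0, 0, 4, 5],
}

scores, poor = flag_poor_tags(theta, phi, tag_vectors, threshold=0.5)
print(scores)
print(poor)
```

In this toy setup the tag whose co-occurrence terms match the post's dominant topic scores high, while the tag tied to the minor topic falls below the assumed threshold and is flagged. In practice `theta` and `phi` would come from a fitted LDA model, and the threshold would need to be tuned on real data.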

Original language: English
Title of host publication: 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2015
Editors: Zhuo Tang, Jiayi Du, Shu Yin, Renfa Li, Ligang He
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1770-1776
Number of pages: 7
ISBN (Electronic): 9781467376822
State: Published - 13 Jan 2016
Externally published: Yes
Event: 12th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2015 - Zhangjiajie, China
Duration: 15 Aug 2015 to 17 Aug 2015

Publication series

Name: 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2015

Conference

Conference: 12th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2015
Country/Territory: China
City: Zhangjiajie
Period: 15/08/15 to 17/08/15

Keywords

  • blog representation
  • semantic similarity
  • tag quality evaluation
  • topic model
