
A Chinese anti-spam filter approach based on Support Vector Machine

  • Xiu Li Pang*
  • Yu Qiang Feng
  • Wei Jiang
  • *Corresponding author for this work
  • School of Management, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper presents an anti-spam filter approach based on the Support Vector Machine (SVM). First, we adopt a tri-gram language model to perform word segmentation on Chinese email, applying the Absolute Discount Smoothing algorithm to overcome the sparse data problem. Second, the different factoid words are identified by an automaton, so as to capture the approximate syntactic and semantic usage of factoid words in the anti-spam filtering task. Third, we apply the SVM to filter spam, where emails may be written in more than one language, including Chinese and English. Experiments on large-scale cross-language corpora show that the SVM generalizes better than the baselines: its precision is 4.09% higher than that of Naïve Bayes (smoothed with the Lidstone algorithm) and 8.18% higher than that of the Maximum Entropy model.
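The smoothing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' formulation, so everything here is an assumption: the example uses bigrams rather than tri-grams to stay short, an interpolated form of absolute discounting, a discount of 0.75, and an add-one unigram fallback with one slot reserved for unknown words. The function name `train_absolute_discount` and the toy tokens are hypothetical.

```python
from collections import Counter, defaultdict


def train_absolute_discount(tokens, discount=0.75):
    """Return prob(w1, w2) for an interpolated absolute-discount bigram model.

    Illustrative only: the paper uses a tri-gram model; this sketch uses
    bigrams and an assumed discount value.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    # For each history word: total bigram count and the set of distinct followers.
    context_totals = Counter()
    followers = defaultdict(set)
    for (w1, w2), count in bigrams.items():
        context_totals[w1] += count
        followers[w1].add(w2)

    total = sum(unigrams.values())
    vocab_size = len(unigrams)

    def unigram_prob(word):
        # Add-one smoothing with one extra slot for any unknown word (assumption).
        return (unigrams[word] + 1) / (total + vocab_size + 1)

    def prob(w1, w2):
        n = context_totals[w1]
        if n == 0:
            # History never seen: fall back to the unigram estimate.
            return unigram_prob(w2)
        # Subtract a fixed discount from every seen bigram count ...
        discounted = max(bigrams[(w1, w2)] - discount, 0.0) / n
        # ... and hand the freed probability mass to the lower-order model.
        backoff_mass = discount * len(followers[w1]) / n
        return discounted + backoff_mass * unigram_prob(w2)

    return prob


# Toy, pre-segmented token sequence (hypothetical data).
tokens = ["我", "爱", "北京", "我", "爱", "中国"]
prob = train_absolute_discount(tokens)
seen = prob("我", "爱")      # bigram observed in the toy data
unseen = prob("我", "北京")  # bigram never observed, still gets non-zero mass
```

Unseen word pairs receive a small non-zero probability instead of zero, which is the sparse-data effect the abstract refers to; in a segmenter, such probabilities would be used to score candidate segmentations.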

Original language: English
Title of host publication: Proceedings of 2007 International Conference on Management Science and Engineering, ICMSE'07 (14th)
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 97-102
Number of pages: 6
ISBN (Print): 9787883580805
State: Published - 2007
Externally published: Yes
Event: 2007 International Conference on Management Science and Engineering, ICMSE'07 - Harbin, China
Duration: 20 Aug 2007 → 22 Aug 2007

Publication series

Name: Proceedings of 2007 International Conference on Management Science and Engineering, ICMSE'07 (14th)

Conference

Conference: 2007 International Conference on Management Science and Engineering, ICMSE'07
Country/Territory: China
City: Harbin
Period: 20/08/07 → 22/08/07

Keywords

  • Anti-spam filter
  • Maximum Entropy
  • Naïve Bayes
  • Support Vector Machine
