
Web site classification based on key resources

  • Zhi Ming Xu*
  • Xin Bo Gao
  • Meng Lei
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Automatic web site classification has broad application prospects, yet little research addresses it. Many methods represent a web site as ordinary text and apply standard text classification techniques. However, a web site is a collection of many web pages connected by hyperlinks, so text classification methods are not well suited to web sites. This paper proposes a new approach to web site classification. First, we obtain the key resources of a web site through a reasonable pruning strategy. We then extract the topic vector of the web site from those key resources, using both the site's structure information and its content information. To reflect the structure information of the web site, we represent the topic vector with an improved vector space model that includes both structure feature words and content feature words.
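
The two steps described in the abstract (pruning a site down to its key resources, then building a topic vector from structure and content feature words) might be sketched roughly as follows. This is an illustration only, not the authors' method: the abstract does not specify the pruning strategy or the weighting scheme, so the `Page` fields, the depth-based pruning rule, the `s:`/`c:` feature prefixes, and the `structure_weight` parameter are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field
import math
import re


@dataclass
class Page:
    """Hypothetical page record; the fields are assumptions for this sketch."""
    url: str
    depth: int        # link distance from the home page
    text: str         # visible body text
    title: str = ""
    anchor_texts: list = field(default_factory=list)  # anchor text of inbound links


def tokenize(s):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", s.lower())


def select_key_resources(pages, max_depth=1, min_words=3):
    """Assumed pruning rule: keep shallow pages that carry enough text."""
    return [p for p in pages
            if p.depth <= max_depth and len(tokenize(p.text)) >= min_words]


def topic_vector(key_pages, structure_weight=2.0):
    """Build one L2-normalized vector from content and structure words.

    Content words (body text) get the prefix "c:"; structure words
    (titles and anchor text) get the prefix "s:" and an assumed extra weight.
    Pages closer to the home page contribute more.
    """
    vec = Counter()
    for p in key_pages:
        page_weight = 1.0 / (1 + p.depth)
        for w in tokenize(p.text):
            vec["c:" + w] += page_weight
        for w in tokenize(p.title + " " + " ".join(p.anchor_texts)):
            vec["s:" + w] += structure_weight * page_weight
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


# Toy site: the deep legal page is pruned before the vector is built.
site = [
    Page("http://example.org/", 0, "university courses research admissions",
         "Example University"),
    Page("http://example.org/cs", 1, "computer science courses and research labs",
         "Computer Science", ["departments"]),
    Page("http://example.org/a/b/c", 3, "cookie policy legal notice text", "Legal"),
]
key = select_key_resources(site)
vec = topic_vector(key)
```

A site could then be assigned to the category whose reference vector has the highest `cosine` similarity to `vec`; that classification step is likewise an assumption, since the abstract does not name the classifier.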

Original language: English
Title of host publication: Proceedings of the 2009 International Conference on Machine Learning and Cybernetics
Pages: 3522-3526
Number of pages: 5
State: Published - 2009
Externally published: Yes
Event: 2009 International Conference on Machine Learning and Cybernetics - Baoding, China
Duration: 12 Jul 2009 to 15 Jul 2009

Publication series

Name: Proceedings of the 2009 International Conference on Machine Learning and Cybernetics
Volume: 6

Conference

Conference: 2009 International Conference on Machine Learning and Cybernetics
Country/Territory: China
City: Baoding
Period: 12/07/09 to 15/07/09

Keywords

  • Key resources
  • Topic vector of web site
  • Web site classification
