
Whunter: A focused web crawler – A tool for digital library

  • Shanghai Jiao Tong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A topic-driven web crawler, or focused crawler, is the key tool for online web information libraries. Achieving good performance efficiently with limited time and space resources is a challenging problem. This paper proposes wHunter, a focused web crawler that implements incremental, multi-strategy learning by combining the advantages of SVMs (support vector machines) and naïve Bayes. On the one hand, initial performance is guaranteed by an SVM classifier; on the other hand, once enough web pages have been obtained, the crawler switches to a naïve Bayes classifier so that online incremental learning is achieved. Experimental results show that the proposed algorithm is efficient and easy to implement.
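The abstract does not specify wHunter's features, SVM solver, or switching criterion, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes bag-of-words features, a linear SVM trained with Pegasos-style stochastic sub-gradient descent (swapped in here for whatever solver the paper used), a multinomial naive Bayes model with add-one smoothing, and a hypothetical page-count threshold `switch_after` as the switching rule. All class, function, and parameter names are hypothetical, and labels for newly fetched pages are assumed to come from some external judgement.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class PegasosSVM:
    """Linear SVM (no bias term) trained with Pegasos-style stochastic
    sub-gradient descent on the hinge loss; labels are +1 / -1."""

    def __init__(self, lam=0.01, epochs=20):
        self.lam, self.epochs = lam, epochs
        self.w = defaultdict(float)  # sparse weight vector: word -> weight

    def _score(self, x):
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def fit(self, docs, labels):
        t = 0
        for _ in range(self.epochs):
            for doc, y in zip(docs, labels):
                t += 1
                eta = 1.0 / (self.lam * t)  # Pegasos step size
                x = Counter(tokenize(doc))
                margin = y * self._score(x)  # uses weights before this step
                # Regularisation: shrink every weight by (1 - eta * lam).
                for f in list(self.w):
                    self.w[f] *= 1.0 - eta * self.lam
                # Hinge-loss sub-gradient step only on margin violations.
                if margin < 1:
                    for f, v in x.items():
                        self.w[f] += eta * y * v
        return self

    def predict(self, doc):
        return 1 if self._score(Counter(tokenize(doc))) > 0 else -1


class IncrementalNB:
    """Multinomial naive Bayes whose counts are updated one page at a time."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of pages
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def update(self, doc, y):
        words = tokenize(doc)
        self.doc_counts[y] += 1
        self.word_counts[y].update(words)
        self.vocab.update(words)

    def predict(self, doc):
        words = tokenize(doc)
        total = sum(self.doc_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y, n_docs in self.doc_counts.items():
            n_words = sum(self.word_counts[y].values())
            score = math.log(n_docs / total)  # log prior
            for w in words:
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log((self.word_counts[y][w] + 1) / (n_words + v))
            if score > best_score:
                best, best_score = y, score
        return best


class SwitchingClassifier:
    """Uses the SVM until `switch_after` labelled pages have been seen,
    then answers with naive Bayes, which keeps learning online."""

    def __init__(self, seed_docs, seed_labels, switch_after=100):
        self.svm = PegasosSVM().fit(seed_docs, seed_labels)
        self.nb = IncrementalNB()
        for d, y in zip(seed_docs, seed_labels):
            self.nb.update(d, y)
        self.seen = len(seed_docs)
        self.switch_after = switch_after

    @property
    def using_nb(self):
        return self.seen >= self.switch_after

    def is_relevant(self, doc):
        model = self.nb if self.using_nb else self.svm
        return model.predict(doc) == 1

    def learn(self, doc, y):
        # Each newly labelled page only increments naive Bayes counts.
        self.nb.update(doc, y)
        self.seen += 1
```

In this sketch the naive Bayes counts are maintained from the start, including the seed pages, so the switch itself costs nothing: the SVM is simply no longer consulted once the threshold is reached. Naive Bayes suits the online phase because adding a page only increments counts, whereas refitting the SVM would revisit every stored page.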

Original language: English
Title of host publication: Digital Libraries
Subtitle of host publication: International Collaboration and Cross-Fertilization - 7th International Conference on Asian Digital Libraries, ICADL 2004
Editors: Qihao Miao, Ee-peng Lim, Zhaoneng Chen, Yuxi Fu, Hsinchun Chen, Edward Fox
Publisher: Springer Verlag
Pages: 519-522
Number of pages: 4
ISBN (Print): 9783540240303
State: Published - 2005
Externally published: Yes
Event: 7th International Conference on Asian Digital Libraries, ICADL 2004 - Shanghai, China
Duration: 13 Dec 2004 to 17 Dec 2004

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3334 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Conference on Asian Digital Libraries, ICADL 2004
Country/Territory: China
City: Shanghai
Period: 13/12/04 to 17/12/04
