Keyword spotting in degraded document using mixed OCR and word shape coding

  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper presents a new method for keyword spotting in degraded document images. Two prevalent word indexing approaches, OCR and word shape coding, are combined on the basis of recognition confidence evaluation. The procedure is as follows. First, OCR candidates are used for OCR indexing. Second, a new stroke feature and a convex-concave feature of the word are adopted for word shape coding. Third, an indexing scheme based on recognition confidence is introduced, which adapts to image quality. Finally, inexact matching is used for word spotting. The method is evaluated on a collection of 1553 scanned document images from NLM. The results confirm the validity of the method.
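The confidence-based switch between the two indexes, followed by inexact matching, can be sketched as below. This is only an illustration under stated assumptions, not the paper's implementation: the ascender/descender code stands in for the stroke and convex-concave features, and the names `shape_code`, `build_index`, `spot`, the 0.8 confidence threshold, and the edit-distance tolerance of 1 are all hypothetical. The image-derived shape code of each word is assumed to be supplied by an upstream feature extractor.

```python
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_code(word):
    """Toy shape code: A = ascender/capital/digit, D = descender, x = x-height."""
    out = []
    for ch in word:
        if ch in DESCENDERS:
            out.append("D")
        elif ch in ASCENDERS or ch.isupper() or ch.isdigit():
            out.append("A")
        else:
            out.append("x")
    return "".join(out)


def edit_distance(a, b):
    """Levenshtein distance, used here as the inexact matching measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(words, threshold=0.8):
    """words: (ocr_text, ocr_confidence, image_shape_code) triples.

    Confident OCR results are indexed by text; the rest by shape code.
    """
    index = []
    for ocr_text, confidence, image_shape in words:
        if confidence >= threshold:
            index.append(("ocr", ocr_text.lower()))
        else:
            index.append(("shape", image_shape))
    return index


def spot(keyword, index, max_dist=1):
    """Return positions of indexed words that inexactly match the keyword."""
    hits = []
    for pos, (kind, key) in enumerate(index):
        query = keyword.lower() if kind == "ocr" else shape_code(keyword)
        if edit_distance(query, key) <= max_dist:
            hits.append(pos)
    return hits


# Hypothetical data: the second word is badly recognized, so it is
# indexed by its shape code instead of its OCR text.
words = [
    ("document", 0.95, "AxxxxxxA"),
    ("d0cnrnent", 0.30, "AxxxxxxA"),
    ("keyword", 0.90, "AxDxxxA"),
]
index = build_index(words)
hits = spot("document", index)
```

In this toy run the degraded second word is still retrieved through its shape code even though its OCR text is unusable, which is the motivation for mixing the two indexes.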

Original language: English
Title of host publication: Proceedings - 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, ICIS 2010
Pages: 411-414
Number of pages: 4
State: Published - 2010
Externally published: Yes
Event: 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, ICIS 2010 - Xiamen, China
Duration: 29 Oct 2010 – 31 Oct 2010

Publication series

Name: Proceedings - 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, ICIS 2010
Volume: 3

Conference

Conference: 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, ICIS 2010
Country/Territory: China
City: Xiamen
Period: 29/10/10 – 31/10/10

Keywords

  • Degraded imaged document
  • Keyword spotting
  • OCR indexing
  • Word shape coding
