
Scanned English document retrieval based on OCR and word shape coding

  • Yong Xia*
  • Ru Wei Dai
  • Bai Hua Xiao
  • Chun Heng Wang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Two commonly used methods for scanned document retrieval are analyzed: retrieval based on optical character recognition (OCR) and retrieval based on word shape coding. A new strategy that combines the two methods according to recognition confidence is proposed. Furthermore, a new word shape coding scheme based on typographic features and strokes is presented, which is tolerant to font variation. Experiments are conducted with different word indexing schemes, and the results verify the validity of the proposed method.
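The abstract does not spell out either scheme, so the following is only a rough illustration of the general idea. It assumes a conventional character-class word shape code (ascender, x-height, descender, dotted letters) rather than the typographic/stroke coding proposed in the paper, and a single confidence threshold that decides whether a word is indexed by its OCR text or by its shape code. The class table, `Token`, `build_index`, `search` and the 0.8 threshold are all hypothetical names and values chosen for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical character classes for word shape coding (a common convention,
# assumed here; NOT the typographic/stroke scheme proposed in the paper):
#   A = ascender, capital or digit   x = x-height letter
#   g = descender                    i = dotted x-height   j = dotted descender
SHAPE_CLASSES = {}
for ch in "bdfhklt" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "0123456789":
    SHAPE_CLASSES[ch] = "A"
for ch in "acemnorsuvwxz":
    SHAPE_CLASSES[ch] = "x"
for ch in "gpqy":
    SHAPE_CLASSES[ch] = "g"
SHAPE_CLASSES["i"] = "i"
SHAPE_CLASSES["j"] = "j"


def shape_code(word: str) -> str:
    """Map each character to its shape class; unknown characters are dropped."""
    return "".join(SHAPE_CLASSES[ch] for ch in word if ch in SHAPE_CLASSES)


@dataclass
class Token:
    ocr_text: str      # OCR hypothesis for one word image
    confidence: float  # recognition confidence, assumed to lie in [0, 1]
    image_shape: str   # shape code extracted from the word image (assumed given)


def build_index(docs, threshold=0.8):
    """Index a word by its OCR text when confident, otherwise by its shape code."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for tok in tokens:
            if tok.confidence >= threshold:
                index[("text", tok.ocr_text.lower())].add(doc_id)
            else:
                index[("shape", tok.image_shape)].add(doc_id)
    return index


def search(index, query):
    """Look the query up under both its text key and its shape-code key."""
    hits = set(index.get(("text", query.lower()), set()))
    hits |= index.get(("shape", shape_code(query)), set())
    return hits


if __name__ == "__main__":
    docs = {
        "d1": [Token("retrieval", 0.95, shape_code("retrieval"))],
        # Low-confidence misrecognition; its image shape code is simulated here.
        "d2": [Token("retrievai", 0.40, shape_code("retrieval"))],
        "d3": [Token("recognition", 0.97, shape_code("recognition"))],
    }
    index = build_index(docs)
    print(sorted(search(index, "retrieval")))  # ['d1', 'd2']
```

In this sketch a misrecognized word such as `retrievai` can still be found through its shape code, so low-confidence words are not lost from the index. The trade-off is that shape codes are ambiguous (many different words share one code), so a practical system would need additional ranking or verification.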

Original language: English
Pages (from-to): 488-493
Number of pages: 6
Journal: Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence
Volume: 22
Issue number: 3
State: Published - Jun 2009
Externally published: Yes

Keywords

  • Document retrieval
  • Evaluation of recognition confidence
  • Optical character recognition (OCR)
  • Word shape coding
