
Chinese document image retrieval based on recognition candidates

  • Xuhui Jia*
  • Yong Xia
  • Rui Zhou
  • Hongwei Liang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Because the recognition rate for degraded Chinese documents is low, retrieval based directly on OCR results performs poorly. This paper proposes an indexing method that combines n-grams with recognition candidates to improve retrieval performance. To simplify testing, it also presents a method for automatically generating the ground truth of document images, synthesized degraded document images, and the ground truth of recognition candidates. Several large-scale collections of synthesized document images are built and used, and the experimental results show that retrieval performance improves for collections with both high and low OCR error rates.
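
The abstract does not give the details of the indexing method, so the following is only a minimal sketch of the general idea, under stated assumptions: each character position carries a short list of OCR recognition candidates, and every bigram that can be formed from the candidates at adjacent positions is added to an inverted index. The function names (`candidate_ngrams`, `build_index`, `search`), the choice of bigrams, the Boolean AND matching, and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def candidate_ngrams(candidates, n=2):
    """Yield every n-gram obtainable by picking one candidate
    character at each of n adjacent positions."""
    for i in range(len(candidates) - n + 1):
        for combo in product(*candidates[i:i + n]):
            yield "".join(combo)


def build_index(docs, n=2):
    """Map each candidate n-gram to the set of documents containing it.

    docs: {doc_id: [[candidates for position 0], [position 1], ...]}
    """
    index = defaultdict(set)
    for doc_id, candidates in docs.items():
        for gram in candidate_ngrams(candidates, n):
            index[gram].add(doc_id)
    return index


def search(index, query, n=2):
    """Return documents whose index contains every n-gram of the query."""
    grams = [query[i:i + n] for i in range(len(query) - n + 1)]
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result


# Hypothetical OCR output: the first candidate at each position is the
# recognizer's top choice; later entries are lower-ranked alternatives.
docs = {
    "doc1": [["又", "文"], ["档", "挡"], ["图", "国"], ["像", "象"]],
    "doc2": [["又"], ["件"]],
}
index = build_index(docs)
hits = search(index, "文档图像")
```

In this sketch the top-ranked reading of `doc1` begins with a misrecognized character, yet the query still matches because the correct character survives as a lower-ranked candidate. The cost is a larger index, since a window of n positions with k candidates each can contribute up to k^n entries.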

Original language: English
Title of host publication: Proceedings - 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2012
Pages: 2892-2897
Number of pages: 6
State: Published - 2012
Externally published: Yes
Event: 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2012 - Chongqing, China
Duration: 29 May 2012 to 31 May 2012

Publication series

Name: Proceedings - 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2012

Conference

Conference: 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery, FSKD 2012
Country/Territory: China
City: Chongqing
Period: 29/05/12 to 31/05/12

Keywords

  • Chinese document image retrieval
  • indexing method with n-gram and recognition candidates
  • synthesized degraded document image
