Abstract
Two commonly used methods for scanned document retrieval are analyzed: retrieval based on optical character recognition (OCR) and retrieval based on word shape coding. A new strategy that combines the two methods according to recognition confidence is proposed. Furthermore, a new word shape coding scheme based on typographic features and strokes is presented, and it is tolerant to different fonts. Experiments with different word indexing schemes verify the validity of the proposed method.
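The idea of combining the two methods by recognition confidence can be illustrated with a minimal Python sketch. It is not the paper's method. The paper derives word shape codes from word images using typographic features and strokes. Here, the hypothetical helper `shape_code` maps already-known characters to coarse classes (ascender, descender, x-height, dotted), loosely following the classic character shape code idea. The `IndexedWord` record, the `search` function, the threshold of 0.8 and the confidence values are all assumptions for illustration. A query matches an indexed word by its OCR text when recognition confidence is high, and by shape code otherwise.

```python
from dataclasses import dataclass
import string

# Hypothetical character classes (an illustration only, NOT the paper's
# feature- and stroke-based coding): ascender / descender / x-height / dotted.
ASCENDER = set("bdfhklt") | set(string.ascii_uppercase) | set(string.digits)
DESCENDER = set("gpqy")


def shape_code(word: str) -> str:
    """Map each character of a word to a coarse typographic class."""
    out = []
    for ch in word:
        if ch in ASCENDER:
            out.append("A")      # rises above the x-height
        elif ch in DESCENDER:
            out.append("D")      # drops below the baseline
        elif ch in ("i", "j"):
            out.append(ch)       # dotted letters kept distinct ('j' also descends)
        elif ch in string.ascii_lowercase:
            out.append("x")      # stays within the x-height
        else:
            out.append("?")      # punctuation or unsupported symbol
    return "".join(out)


@dataclass
class IndexedWord:
    ocr_text: str      # OCR output for the word image
    confidence: float  # assumed OCR recognition confidence in [0, 1]
    shape: str         # shape code assumed to be extracted from the word image


def search(index: list[IndexedWord], query: str, threshold: float = 0.8) -> list[int]:
    """Return positions of indexed words that match the query.

    OCR text is trusted when confidence >= threshold; otherwise the
    shape codes are compared, which tolerates character-level OCR errors.
    """
    query_shape = shape_code(query)
    hits = []
    for pos, w in enumerate(index):
        if w.confidence >= threshold:
            if w.ocr_text.lower() == query.lower():
                hits.append(pos)
        elif w.shape == query_shape:
            hits.append(pos)
    return hits


# Toy index: entries 1 and 3 simulate low-confidence OCR errors.
index = [
    IndexedWord("retrieval", 0.95, shape_code("retrieval")),
    IndexedWord("rctrieval", 0.40, shape_code("retrieval")),
    IndexedWord("shape", 0.90, shape_code("shape")),
    IndexedWord("cading", 0.30, shape_code("coding")),
]
print(search(index, "retrieval"))  # [0, 1]
print(search(index, "coding"))     # [3]
```

Shape codes are many-to-one; in this sketch "coding" and "cading" both map to `xxAixD`. The fallback therefore trades some precision for robustness against recognition errors. This is why it is used only for words whose OCR confidence is low.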
| Original language | English |
|---|---|
| Pages (from-to) | 488-493 |
| Number of pages | 6 |
| Journal | Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence |
| Volume | 22 |
| Issue number | 3 |
| State | Published - Jun 2009 |
| Externally published | Yes |
Keywords
- Document retrieval
- Evaluation of recognition confidence
- Optical character recognition (OCR)
- Word shape coding
Title
Scanned English document retrieval based on OCR and word shape coding