Decoupling Recognition from Detection: Single Shot Self-Reliant Scene Text Spotter

  • Jingjing Wu
  • Pengyuan Lyu
  • Guangming Lu
  • Chengquan Zhang
  • Kun Yao
  • Wenjie Pei*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Typical text spotters follow a two-stage spotting strategy: first detect the precise boundary of a text instance, then perform text recognition within the located text region. While this strategy has achieved substantial progress, it has two underlying limitations. 1) The performance of text recognition depends heavily on the precision of text detection, which can propagate errors from detection to recognition. 2) The RoI cropping that bridges detection and recognition introduces background noise and leads to information loss when pooling or interpolating from feature maps. In this work we propose the single shot Self-Reliant Scene Text Spotter (SRSTS), which circumvents these limitations by decoupling recognition from detection. Specifically, we conduct text detection and recognition in parallel and bridge them by the shared positive anchor point. Consequently, our method is able to recognize text instances correctly even when the precise text boundaries are challenging to detect. Additionally, our method substantially reduces the annotation cost for text detection. Extensive experiments on regular-shaped and arbitrary-shaped benchmarks demonstrate that SRSTS compares favorably to previous state-of-the-art spotters in terms of both accuracy and efficiency.

Original language: English
Title of host publication: MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 1319-1328
Number of pages: 10
ISBN (Electronic): 9781450392037
DOIs
State: Published - 10 Oct 2022
Externally published: Yes
Event: 30th ACM International Conference on Multimedia, MM 2022 - Lisboa, Portugal
Duration: 10 Oct 2022 - 14 Oct 2022

Publication series

Name: MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia

Conference

Conference: 30th ACM International Conference on Multimedia, MM 2022
Country/Territory: Portugal
City: Lisboa
Period: 10/10/22 - 14/10/22

Keywords

  • ocr (optical character recognition)
  • text detection
  • text recognition
