TY - GEN
T1 - Global-Local Aware Scene Text Editing
AU - Yang, Fuxiang
AU - Su, Tonghua
AU - Di, Donglin
AU - Chen, Yin
AU - Wu, Xiangqian
AU - Wang, Zhongjie
AU - Fan, Lei
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Scene Text Editing (STE) involves replacing text in a scene image with new target text while preserving both the original text style and background texture. Existing methods suffer from two major challenges: inconsistency and length-insensitivity. They often fail to maintain coherence between the edited local patch and the surrounding area, and they struggle to handle significant differences in text length before and after editing. To tackle these challenges, we propose an end-to-end framework called Global-Local Aware Scene Text Editing (GLASTE), which simultaneously incorporates high-level global contextual information along with delicate local features. Specifically, we design a global-local combination structure, joint global and local losses, and enhance text image features to ensure consistency in text style within local patches while maintaining harmony between local and global areas. Additionally, we express the text style as a vector independent of the image size, which can be transferred to target text images of various sizes. We use an affine fusion to fill target text images into the editing patch while maintaining their aspect ratio unchanged. Extensive experiments on real-world datasets validate that our GLASTE model outperforms previous methods in both quantitative metrics and qualitative results and effectively mitigates the two challenges.
AB - Scene Text Editing (STE) involves replacing text in a scene image with new target text while preserving both the original text style and background texture. Existing methods suffer from two major challenges: inconsistency and length-insensitivity. They often fail to maintain coherence between the edited local patch and the surrounding area, and they struggle to handle significant differences in text length before and after editing. To tackle these challenges, we propose an end-to-end framework called Global-Local Aware Scene Text Editing (GLASTE), which simultaneously incorporates high-level global contextual information along with delicate local features. Specifically, we design a global-local combination structure, joint global and local losses, and enhance text image features to ensure consistency in text style within local patches while maintaining harmony between local and global areas. Additionally, we express the text style as a vector independent of the image size, which can be transferred to target text images of various sizes. We use an affine fusion to fill target text images into the editing patch while maintaining their aspect ratio unchanged. Extensive experiments on real-world datasets validate that our GLASTE model outperforms previous methods in both quantitative metrics and qualitative results and effectively mitigates the two challenges.
KW - Global-Local
KW - Harmony
KW - Scene Text Editing
UR - https://www.scopus.com/pages/publications/105022651797
U2 - 10.1109/ICME59968.2025.11209904
DO - 10.1109/ICME59968.2025.11209904
M3 - 会议稿件
AN - SCOPUS:105022651797
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2025 IEEE International Conference on Multimedia and Expo
PB - IEEE Computer Society
T2 - 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
Y2 - 30 June 2025 through 4 July 2025
ER -