Abstract
Language-guided image inpainting aims to fill the defective regions of an image under the guidance of text while keeping the non-defective regions unchanged. However, directly encoding the defective image tends to adversely affect the non-defective regions, producing distorted structures in those parts. To better adapt text guidance to the inpainting task, this paper proposes NÜWA-LIP, which comprises a defect-free VQGAN (DF-VQGAN) and a multi-perspective sequence-to-sequence module (MP-S2S). Specifically, DF-VQGAN introduces relative estimation to carefully control receptive spreading, as well as symmetrical connections to keep structural details unchanged. To embed text guidance harmoniously into the locally defective regions, MP-S2S aggregates complementary perspectives from low-level pixels, high-level tokens, and the text description. Experiments show that DF-VQGAN effectively aids the inpainting process while avoiding unexpected changes in non-defective regions. Results on three open-domain benchmarks demonstrate that our method outperforms state-of-the-art approaches. Our code, datasets, and model will be made publicly available.
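The constraint stated in the abstract (fill the defective regions, leave the non-defective regions unchanged) can be illustrated with a minimal mask-compositing sketch. This is not the DF-VQGAN or MP-S2S method, whose details are not given here; the function name `composite`, the nested-list image representation, and the convention that a mask value of 1 marks a defective pixel are all assumptions made for illustration.

```python
def composite(original, generated, mask):
    """Blend two equally sized 2D images (hypothetical helper).

    Takes the generated pixel where mask is 1 (assumed: defective)
    and the original pixel where mask is 0 (assumed: non-defective).
    """
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("inputs must have the same number of rows")
    result = []
    for orig_row, gen_row, mask_row in zip(original, generated, mask):
        if not (len(orig_row) == len(gen_row) == len(mask_row)):
            raise ValueError("rows must have the same length")
        # Per pixel: keep the original unless the mask marks it defective.
        result.append([g if m else o
                       for o, g, m in zip(orig_row, gen_row, mask_row)])
    return result


# Toy 2x2 grayscale example: only the top-right pixel is defective.
original = [[10, 20], [30, 40]]
generated = [[11, 99], [31, 41]]
mask = [[0, 1], [0, 0]]

inpainted = composite(original, generated, mask)
print(inpainted)  # [[10, 99], [30, 40]]
```

Compositing of this kind guarantees unchanged non-defective pixels only at the output stage. The abstract's concern is different: it is the encoding of the defective image that distorts non-defective regions, which is what DF-VQGAN is designed to address.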
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 |
| Publisher | IEEE Computer Society |
| Pages | 14183-14192 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798350301298 |
| ISBN (Print) | 9798350301298 |
| DOIs | |
| State | Published - 2023 |
| Event | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 - Vancouver, Canada. Duration: 18 Jun 2023 → 22 Jun 2023 |
Publication series
| Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
|---|---|
| Volume | 2023-June |
| ISSN (Print) | 1063-6919 |
Conference
| Conference | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 |
|---|---|
| Country/Territory | Canada |
| City | Vancouver |
| Period | 18/06/23 → 22/06/23 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
Keywords
- Multi-modal learning
Fingerprint
Dive into the research topics of 'NÜWA-LIP: Language-guided Image Inpainting with Defect-free VQGAN'. Together they form a unique fingerprint.