
PROMOTE: Prior-Guided Diffusion Model with Global-Local Contrastive Learning for Exemplar-Based Image Translation

  • Guojin Zhong
  • Yihu Guo
  • Jin Yuan*
  • Qianjun Zhang*
  • Weili Guan
  • Long Chen
  • *Corresponding author for this work
  • Hunan University
  • Southwest Jiaotong University
  • Harbin Institute of Technology Shenzhen
  • Hong Kong University of Science and Technology

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Exemplar-based image translation has garnered significant interest from researchers due to its broad applications in multimedia/multimodal processing. Existing methods primarily employ Euclidean-based losses to implicitly establish cross-domain correspondences between exemplar and conditional images, aiming to produce high-fidelity images. However, these methods often suffer from two challenges: 1) Insufficient excavation of domain-invariant features leads to low-quality cross-domain correspondences, and 2) Inaccurate correspondences result in errors propagated during the translation process due to a lack of reliable prior guidance. To tackle these issues, we propose a novel prior-guided diffusion model with global-local contrastive learning (PROMOTE), which is trained in a self-supervised manner. Technically, global-local contrastive learning is designed to align two cross-domain images within hyperbolic space and reduce the gap between their semantic correlation distributions using the Fisher-Rao metric, allowing the visual encoders to extract domain-invariant features more effectively. Moreover, a prior-guided diffusion model is developed that propagates the structural prior to all timesteps in the diffusion process. It is optimized by a novel prior denoising loss, mathematically derived from the transitions modified by prior information in a self-supervised manner, successfully alleviating the impact of inaccurate correspondences on image translation. Extensive experiments conducted across seven datasets demonstrate that our proposed PROMOTE significantly exceeds state-of-the-art performance in diverse exemplar-based image translation tasks. The source code is publicly available at http://github.com/zgj77/PROMOTE.
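
The abstract names two geometric notions, hyperbolic space and the Fisher-Rao metric, without giving formulas. The following is a minimal sketch of their textbook closed forms, assuming categorical (discrete) distributions for Fisher-Rao and the unit Poincaré ball as the hyperbolic model. Both assumptions, along with the function names, are illustrative only and are not taken from the paper or its released code.

```python
import math

def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between two discrete distributions.

    Assumes p and q are same-length sequences of non-negative numbers
    that each sum to 1 (hypothetical inputs for illustration).
    """
    # Bhattacharyya coefficient, clamped to [0, 1] against rounding error.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    def sq_norm(x):
        return sum(xi * xi for xi in x)
    diff = sq_norm([ui - vi for ui, vi in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - sq_norm(u)) * (1.0 - sq_norm(v))))

if __name__ == "__main__":
    # Identical distributions are at distance 0; disjoint ones are at distance pi.
    print(fisher_rao_distance([0.5, 0.5], [0.5, 0.5]))
    print(fisher_rao_distance([1.0, 0.0], [0.0, 1.0]))
    # Distance from the origin to a point at radius 0.5.
    print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
```

How PROMOTE actually combines these quantities in its global-local contrastive loss is specified in the paper, not in this sketch.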

Original language: English
Title of host publication: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 3313-3322
Number of pages: 10
ISBN (Electronic): 9798400706868
DOIs
State: Published - 28 Oct 2024
Externally published: Yes
Event: 32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, Australia
Duration: 28 Oct 2024 to 1 Nov 2024

Publication series

Name: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

Conference

Conference: 32nd ACM International Conference on Multimedia, MM 2024
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 to 1/11/24

Keywords

  • contrastive learning
  • diffusion model
  • exemplar-based image translation
  • prior
