TY - GEN
T1 - PROMOTE
T2 - 32nd ACM International Conference on Multimedia, MM 2024
AU - Zhong, Guojin
AU - Guo, Yihu
AU - Yuan, Jin
AU - Zhang, Qianjun
AU - Guan, Weili
AU - Chen, Long
N1 - Publisher Copyright: © 2024 ACM.
PY - 2024/10/28
Y1 - 2024/10/28
N2 - Exemplar-based image translation has garnered significant interest from researchers due to its broad applications in multimedia/multimodal processing. Existing methods primarily employ Euclidean-based losses to implicitly establish cross-domain correspondences between exemplar and conditional images, aiming to produce high-fidelity images. However, these methods often suffer from two challenges: 1) Insufficient excavation of domain-invariant features leads to low-quality cross-domain correspondences, and 2) Inaccurate correspondences result in errors propagated during the translation process due to a lack of reliable prior guidance. To tackle these issues, we propose a novel prior-guided diffusion model with global-local contrastive learning (PROMOTE), which is trained in a self-supervised manner. Technically, global-local contrastive learning is designed to align two cross-domain images within hyperbolic space and reduce the gap between their semantic correlation distributions using the Fisher-Rao metric, allowing the visual encoders to extract domain-invariant features more effectively. Moreover, a prior-guided diffusion model is developed that propagates the structural prior to all timesteps in the diffusion process. It is optimized by a novel prior denoising loss, mathematically derived from the transitions modified by prior information in a self-supervised manner, successfully alleviating the impact of inaccurate correspondences on image translation. Extensive experiments conducted across seven datasets demonstrate that our proposed PROMOTE significantly exceeds state-of-the-art performance in diverse exemplar-based image translation tasks. The source code is publicly available at http://github.com/zgj77/PROMOTE.
KW - contrastive learning
KW - diffusion model
KW - exemplar-based image translation
KW - prior
UR - https://www.scopus.com/pages/publications/85209789185
U2 - 10.1145/3664647.3680928
DO - 10.1145/3664647.3680928
M3 - Conference contribution
AN - SCOPUS:85209789185
T3 - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
SP - 3313
EP - 3322
BT - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 28 October 2024 through 1 November 2024
ER -