
SPGFusion: Semantic Prior Guided Infrared and visible image fusion via pretrained vision models

  • Huiqin Zhang
  • Shihan Yao
  • Jiayi Ma
  • Junjun Jiang
  • Yanduo Zhang
  • Huabing Zhou*
  • *Corresponding author for this work
  • Wuhan Institute of Technology
  • Wuhan University
  • School of Artificial Intelligence, Harbin Institute of Technology

Research output: Contribution to journal, Article, peer-review

Abstract

Image fusion integrates multiple images of the same scene into a single enhanced image, improving visual clarity and supporting high-level vision tasks. Existing infrared–visible image fusion methods increase semantic detail but rely heavily on labeled data, which limits their flexibility and leaves them unable to capture the distinctive features that different objects exhibit in each modality, features that are critical for human perception. These limitations hinder effective adaptive fusion. To address this, we propose SPGFusion, a Semantic Prior Guided infrared and visible image Fusion method that requires no manual annotations. SPGFusion uses the global semantic alignment capability of CLIP, which associates visual features with human natural-language knowledge, to understand the global semantic structure of each image. Concurrently, it uses DINO's ability to cluster semantically similar features to capture fine-grained local semantic details. Combining these complementary global and local semantic priors gives the model a comprehensive, detailed, and label-free semantic understanding of the source images, overcoming the annotation dependency of existing methods. The priors guide the fusion process through a specially designed Semantic Adaptive Fusion Network, enabling adaptive, semantically aware fusion that highlights modality-specific features. Finally, a visual feature decoder synthesizes the fused image, capturing critical semantic details from each source. By leveraging robust, label-free semantic priors, SPGFusion gains a deeper understanding of the infrared and visible source images, allowing it to adaptively fuse the essential features of each modality. Extensive evaluations on public datasets demonstrate that SPGFusion outperforms current state-of-the-art methods in both visual quality and semantic accuracy. The source code is available at https://github.com/Huiqin-Zhang/SPGFusion.
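
The idea of letting semantic priors steer how two modalities are weighted can be illustrated with a toy sketch. This is not the authors' Semantic Adaptive Fusion Network; the released repository is the reference for that. The function name `prior_guided_fusion`, the array shapes, and the dot-product scoring rule are all assumptions made for illustration. The `global_prior` vector stands in for an image-level embedding of the kind CLIP produces, and `local_prior` stands in for per-location features of the kind DINO produces.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def prior_guided_fusion(ir_feat, vis_feat, global_prior, local_prior):
    """Toy prior-guided fusion (hypothetical, not the published network).

    ir_feat, vis_feat: (C, H, W) feature maps of the two modalities.
    global_prior:      (C,) image-level vector (assumed CLIP-like stand-in).
    local_prior:       (C, H, W) per-location map (assumed DINO-like stand-in).
    Returns the fused (C, H, W) features and the (2, H, W) modality weights.
    """
    # Broadcast the global vector over all spatial positions and add it
    # to the local map to form one combined semantic prior.
    prior = local_prior + global_prior[:, None, None]

    # Score each modality at every location by its agreement with the prior
    # (channel-wise dot product). This scoring rule is an assumption.
    ir_score = (ir_feat * prior).sum(axis=0)
    vis_score = (vis_feat * prior).sum(axis=0)

    # Turn the two scores into per-location weights that sum to one.
    weights = softmax(np.stack([ir_score, vis_score]), axis=0)

    # Convex combination of the two modalities at every location.
    fused = weights[0] * ir_feat + weights[1] * vis_feat
    return fused, weights
```

Because the weights at each location are non-negative and sum to one, every fused value lies between the corresponding infrared and visible values. A learned network would replace the fixed dot-product score with trained layers.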

Original language: English
Article number: 103433
Journal: Information Fusion
Volume: 125
State: Published - Jan 2026
Externally published: Yes

Keywords

  • CLIP
  • DINO
  • Image fusion
  • Semantic adaptive fusion network
  • Semantic priors
