Abstract
Image fusion integrates multiple images of the same scene into a single enhanced image, improving visual clarity and supporting high-level vision tasks. Existing infrared–visible image fusion methods, while increasing semantic detail, rely heavily on labeled data, which limits their flexibility and prevents them from capturing the unique features of different objects across modalities, features that are critical for human perception. These limitations hinder effective adaptive fusion. To address this, we propose SPGFusion, a Semantic Prior Guided infrared and visible image Fusion method that requires no manual annotations. SPGFusion uses the global semantic alignment capability of CLIP, which associates visual features with human natural-language knowledge, to build a comprehensive understanding of the global semantic structure of the images. Concurrently, DINO's ability to cluster semantically similar features captures fine-grained local semantic details. Combining these complementary global and local semantic priors gives the model a comprehensive, detailed, and label-free semantic understanding of the source images, overcoming the annotation dependency of existing methods. These priors guide the fusion process through a specially designed Semantic Adaptive Fusion Network, enabling adaptive, semantically aware fusion that highlights modality-specific features. Finally, a visual feature decoder synthesizes the fused image, capturing critical semantic details from each source. By leveraging robust, label-free semantic priors, SPGFusion gains a deeper understanding of infrared and visible source images, allowing adaptive fusion of essential features across modalities. Extensive evaluations on public datasets demonstrate that SPGFusion outperforms current state-of-the-art methods in both visual quality and semantic accuracy. The source code is available at https://github.com/Huiqin-Zhang/SPGFusion.
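The general idea of letting semantic priors steer fusion can be illustrated with a small, dependency-free sketch. It is not the Semantic Adaptive Fusion Network described in the abstract, whose architecture is not specified here. It only assumes that some per-pixel prior score is already available for each modality (for example, derived from CLIP or DINO features) and blends the two aligned images with softmax weights. The names `fusion_weights`, `prior_guided_fusion`, and `temperature` are hypothetical.

```python
import math


def fusion_weights(score_ir, score_vis, temperature=1.0):
    """Two-way softmax over per-pixel prior scores; returns (w_ir, w_vis)."""
    # Subtract the larger score before exponentiating for numerical stability.
    m = max(score_ir, score_vis)
    e_ir = math.exp((score_ir - m) / temperature)
    e_vis = math.exp((score_vis - m) / temperature)
    total = e_ir + e_vis
    return e_ir / total, e_vis / total


def prior_guided_fusion(ir, vis, prior_ir, prior_vis, temperature=1.0):
    """Blend two aligned grayscale images (2D lists) pixel by pixel.

    prior_ir and prior_vis are assumed inputs: per-pixel scores saying how
    semantically important each modality is at that location.
    """
    fused = []
    for ir_row, vis_row, p_ir_row, p_vis_row in zip(ir, vis, prior_ir, prior_vis):
        row = []
        for a, b, s_a, s_b in zip(ir_row, vis_row, p_ir_row, p_vis_row):
            w_ir, w_vis = fusion_weights(s_a, s_b, temperature)
            row.append(w_ir * a + w_vis * b)
        fused.append(row)
    return fused
```

With equal prior scores the result is the plain average of the two images. As the score for one modality grows at a pixel, the fused value there approaches that modality's pixel, which is the sense in which the fusion is adaptive.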
| Original language | English |
|---|---|
| Article number | 103433 |
| Journal | Information Fusion |
| Volume | 125 |
| DOIs | |
| State | Published - Jan 2026 |
| Externally published | Yes |
Keywords
- CLIP
- DINO
- Image fusion
- Semantic adaptive fusion network
- Semantic priors
Title
SPGFusion: Semantic Prior Guided Infrared and visible image fusion via pretrained vision models