Abstract
Visible and infrared image fusion aims to integrate complementary information from the two modalities to produce high-quality fused images that benefit downstream computer vision tasks. However, existing fusion methods tend to fuse the images with equal weights, ignoring the spatially varying importance of the two modalities and the need to adapt the fusion strategy dynamically to different scenarios. To address these limitations, we propose a novel RGB-T fusion method built on semantic-guided attention and a dynamic gating mechanism, improving robustness across diverse scenarios. Specifically, guided by semantic attention importance maps, we propose a dual-modal semantic-driven feature alignment module, comprising a cross-modal query compensation module and an intra-modal query enhancement module, which exploits the varying significance of different spatial regions in the two input images. We then introduce a dynamic multi-path gating mechanism that lets the network adjust the weight of each module according to the input scene, further improving the robustness of the fusion algorithm across various scenarios. Comprehensive experiments on four benchmark datasets demonstrate that our approach achieves state-of-the-art performance in both qualitative and quantitative evaluations.
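The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the general idea behind a dynamic multi-path gate: a small gating head pools the input features into a descriptor, turns it into one softmax-normalised weight per candidate path, and blends the path outputs accordingly. All names (`dynamic_gate`, `W_gate`) and shapes are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_gate(feat, path_outputs, W_gate):
    """Blend candidate fusion paths with input-dependent weights.

    feat         : (B, C, H, W) shared feature map used to drive the gate
    path_outputs : list of P arrays, each (B, C, H, W), one per fusion path
    W_gate       : (C, P) linear gating head (hypothetical parameterisation)
    """
    # Global average pooling over spatial dims -> per-sample descriptor (B, C)
    desc = feat.mean(axis=(2, 3))
    # One logit per path, softmax-normalised so the weights sum to 1
    weights = softmax(desc @ W_gate, axis=-1)        # (B, P)
    # Weighted sum of the P path outputs
    stacked = np.stack(path_outputs, axis=1)         # (B, P, C, H, W)
    fused = np.einsum('bp,bpchw->bchw', weights, stacked)
    return fused, weights

# Toy usage: 2 samples, 4 channels, 3x3 spatial, 3 candidate paths.
rng = np.random.default_rng(0)
feat = rng.standard_normal((2, 4, 3, 3))
paths = [rng.standard_normal((2, 4, 3, 3)) for _ in range(3)]
W_gate = rng.standard_normal((4, 3))
fused, weights = dynamic_gate(feat, paths, W_gate)
```

Because the weights are recomputed per input, the same network can emphasise different fusion paths in different scenes, which is the robustness property the abstract claims.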
| Original language | English |
|---|---|
| Article number | 108779 |
| Journal | Neural Networks |
| Volume | 200 |
| DOIs | |
| State | Published - Aug 2026 |
| Externally published | Yes |
Keywords
- CLIP model
- Feature enhancement
- Saliency preservation
- Visible and infrared image fusion
- Vision transformer
Fingerprint
Research topics of 'SGAFuse: Semantic-guided adaptive fusion for RGB-thermal images via dynamic gating'.