
SGAFuse: Semantic-guided adaptive fusion for RGB-thermal images via dynamic gating

  • Chao Yang*
  • Deshui Miao*
  • Chao Tian
  • Guoqing Zhu
  • Zhenyu He*

*Corresponding author for this work

Harbin Institute of Technology

Research output: Contribution to journal, article, peer-reviewed

Abstract

Visible and infrared image fusion aims to integrate complementary information from both modalities to produce high-quality fused images that benefit downstream computer vision tasks. However, existing fusion methods tend to fuse the two images with equal weights, ignoring both the spatially varying importance of each modality and the need to adapt the fusion strategy to different scenarios. To address these limitations, we propose a novel RGB-T fusion method that combines semantic-guided attention with a dynamic gating mechanism to improve robustness across scenarios. Specifically, by incorporating semantic attention importance maps, we propose a dual-modal semantic-driven feature alignment module, comprising a cross-modal query compensation module and an intra-modal query enhancement module, which explores the varying significance of different spatial regions in the two input images. Subsequently, we introduce a dynamic multi-path gating mechanism that enables the network to adjust the weight of each module according to the input scenario, which improves the robustness of the fusion algorithm. Comprehensive experiments on four benchmark datasets demonstrate that our approach achieves state-of-the-art performance in both qualitative and quantitative evaluations.
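
The idea of input-dependent gating over several paths can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `gated_fusion`, the pooled descriptor, and the linear gate parameters `gate_w` and `gate_b` are all assumptions made for illustration. It only shows the general pattern of computing one softmax weight per path from the input and blending the path outputs with those weights, instead of averaging them equally.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D vector."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def gated_fusion(path_outputs, gate_w, gate_b):
    """Blend K path outputs with input-dependent weights (hypothetical sketch).

    path_outputs: array of shape (K, C, H, W), one feature map per path.
    gate_w:       array of shape (K, K * C), assumed linear gate weights.
    gate_b:       array of shape (K,), assumed gate bias.
    Returns the fused (C, H, W) map and the K gate weights.
    """
    k, c, _, _ = path_outputs.shape
    # Global average pooling summarises each path as a C-dimensional vector.
    descriptor = path_outputs.mean(axis=(2, 3)).reshape(k * c)
    # One logit per path; softmax turns the logits into weights that sum to 1.
    weights = softmax(gate_w @ descriptor + gate_b)
    # Weighted sum over the path axis.
    fused = np.tensordot(weights, path_outputs, axes=1)
    return fused, weights


# Usage with random stand-in features for three hypothetical paths.
rng = np.random.default_rng(0)
paths = rng.normal(size=(3, 4, 8, 8))
gate_w = 0.1 * rng.normal(size=(3, 12))
gate_b = np.zeros(3)
fused, weights = gated_fusion(paths, gate_w, gate_b)
```

With an all-zero gate the weights are uniform and the result reduces to the equal-weight average that the abstract criticises; a learned gate would instead shift weight towards whichever path suits the current input.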

Original language: English
Article number: 108779
Journal: Neural Networks
Volume: 200
State: Published - Aug 2026
Externally published: Yes

Keywords

  • CLIP model
  • Feature enhancement
  • Saliency preservation
  • Visible and infrared image fusion
  • Vision transformer
