
IAF-RTDETR: Illumination Evaluation-Driven Multimodal Object Detection Network for Infrared–Visible Dual-Source Fusion

  • Harbin Institute of Technology Weihai
  • School of Information Science and Engineering, Harbin Institute of Technology Weihai

Research output: Contribution to journal › Article › peer-review

Abstract

Infrared–visible multimodal object detection has attracted increasing attention for its robustness under challenging conditions such as low illumination, occlusion, and complex backgrounds. However, existing fusion methods often suffer from coarse illumination modeling and insufficient cross-modal semantic alignment, leading to performance degradation in scenes with strong illumination variations or modality imbalance. To address these issues, this paper proposes IAF-RTDETR (Illumination-Aware Fusion RT-DETR), an illumination-aware fusion real-time detection network built upon the RT-DETR framework. The proposed method introduces a progressive fusion pipeline composed of four key modules: (1) a Modality-Specific Feature Enhancer to recalibrate modality-dependent representations and suppress low-quality feature interference; (2) a lightweight Global Light Estimator that learns a continuous illumination score via self-supervised proxy supervision derived from RGB image statistics; (3) a Light-Aware Fusion module that dynamically adjusts multi-scale fusion weights of infrared and visible features according to the estimated illumination; and (4) a Cross-Layer Dual-Branch Interaction Module that alleviates cross-modal semantic shift through bidirectional attention-guided interaction and channel reweighting. Extensive experiments on the M3FD dataset demonstrate that the proposed method achieves consistent performance improvements under diverse lighting conditions, outperforming RGB-only and IR-only baselines by 7.4% and 16.1% in mAP@50, respectively, while maintaining real-time inference speed (≈17.3 ms). Further evaluations on the LLVIP dataset validate the robustness and generalization ability of IAF-RTDETR in real low-illumination scenarios. Moreover, compared with representative multimodal fusion methods such as TFDet and TarDAL, the proposed method achieves superior detection accuracy. 
Visualization and quantitative semantic consistency analyses further confirm the effectiveness of the proposed illumination-aware fusion and cross-layer interaction mechanisms. These results indicate that IAF-RTDETR provides an effective and practical solution for real-time infrared–visible object detection under complex lighting environments.
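
The illumination-driven weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states only that the Global Light Estimator is trained against a proxy derived from RGB image statistics and that the Light-Aware Fusion module adjusts infrared and visible weights from the estimated score. The choice of mean luma as the statistic, the sigmoid mapping, and the names `illumination_proxy`, `fusion_weights`, `fuse`, `midpoint`, and `sharpness` are all assumptions made for illustration.

```python
from math import exp


def illumination_proxy(rgb_pixels):
    """Return mean luma in [0, 1] for an iterable of (r, g, b) tuples in 0..255.

    Assumption: mean Rec. 601 luma stands in for the "RGB image statistics"
    mentioned in the abstract; the paper may use a different statistic.
    """
    total = 0.0
    count = 0
    for r, g, b in rgb_pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("rgb_pixels must contain at least one pixel")
    return total / (255.0 * count)


def fusion_weights(score, midpoint=0.5, sharpness=8.0):
    """Map an illumination score to (w_visible, w_infrared), summing to 1.

    Assumption: a sigmoid centred at `midpoint`; brighter scenes favour the
    visible branch, darker scenes favour the infrared branch.
    """
    w_visible = 1.0 / (1.0 + exp(-sharpness * (score - midpoint)))
    return w_visible, 1.0 - w_visible


def fuse(visible_feat, infrared_feat, score):
    """Blend two equal-length feature vectors using the illumination score."""
    if len(visible_feat) != len(infrared_feat):
        raise ValueError("feature vectors must have the same length")
    w_visible, w_infrared = fusion_weights(score)
    return [w_visible * v + w_infrared * i
            for v, i in zip(visible_feat, infrared_feat)]


if __name__ == "__main__":
    dark_scene = [(10, 10, 10)] * 4       # hypothetical night-time pixels
    bright_scene = [(240, 240, 240)] * 4  # hypothetical daytime pixels
    for name, pixels in (("dark", dark_scene), ("bright", bright_scene)):
        score = illumination_proxy(pixels)
        print(name, round(score, 3), fusion_weights(score))
```

In the paper the score is predicted by a learned network and the weights are applied to multi-scale feature maps; the sketch collapses both to scalars and flat vectors to show only the direction of the weighting.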

Original language: English
Article number: 1332
Journal: Electronics (Switzerland)
Volume: 15
Issue number: 6
DOIs
State: Published - Mar 2026
Externally published: Yes

Keywords

  • illumination perception
  • multimodal fusion
  • transformer-based object detection
