Abstract
Infrared–visible multimodal object detection has attracted increasing attention for its robustness under challenging conditions such as low illumination, occlusion, and complex backgrounds. However, existing fusion methods often suffer from coarse illumination modeling and insufficient cross-modal semantic alignment, leading to performance degradation in scenes with strong illumination variations or modality imbalance. To address these issues, this paper proposes IAF-RTDETR (Illumination-Aware Fusion RT-DETR), an illumination-aware fusion real-time detection network built upon the RT-DETR framework. The proposed method introduces a progressive fusion pipeline composed of four key modules: (1) a Modality-Specific Feature Enhancer to recalibrate modality-dependent representations and suppress low-quality feature interference; (2) a lightweight Global Light Estimator that learns a continuous illumination score via self-supervised proxy supervision derived from RGB image statistics; (3) a Light-Aware Fusion module that dynamically adjusts multi-scale fusion weights of infrared and visible features according to the estimated illumination; and (4) a Cross-Layer Dual-Branch Interaction Module that alleviates cross-modal semantic shift through bidirectional attention-guided interaction and channel reweighting. Extensive experiments on the M3FD dataset demonstrate that the proposed method achieves consistent performance improvements under diverse lighting conditions, outperforming RGB-only and IR-only baselines by 7.4% and 16.1% in mAP@50, respectively, while maintaining real-time inference speed (≈17.3 ms). Further evaluations on the LLVIP dataset validate the robustness and generalization ability of IAF-RTDETR in real low-illumination scenarios. Moreover, compared with representative multimodal fusion methods such as TFDet and TarDAL, the proposed method achieves superior detection accuracy. Visualization and quantitative semantic consistency analyses further confirm the effectiveness of the proposed illumination-aware fusion and cross-layer interaction mechanisms. These results indicate that IAF-RTDETR provides an effective and practical solution for real-time infrared–visible object detection under complex lighting environments.
| Original language | English |
|---|---|
| Article number | 1332 |
| Journal | Electronics (Switzerland) |
| Volume | 15 |
| Issue number | 6 |
| DOIs | |
| State | Published - Mar 2026 |
| Externally published | Yes |
Keywords
- illumination perception
- multimodal fusion
- transformer-based object detection
Fingerprint
Dive into the research topics of 'IAF-RTDETR: Illumination Evaluation-Driven Multimodal Object Detection Network for Infrared–Visible Dual-Source Fusion'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver