
Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios

  • Jialei Xu
  • Rui Li
  • Kai Cheng
  • Junjun Jiang
  • Xianming Liu*
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology
  • Northwestern Polytechnical University Xian
  • University of Science and Technology of China
  • Peng Cheng Laboratory

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Monocular depth estimation from RGB images plays a pivotal role in 3D vision. However, its accuracy can deteriorate in challenging environments such as nighttime or adverse weather conditions. While long-wave infrared cameras offer stable imaging in such conditions, they are inherently low-resolution and lack the rich texture and semantics delivered by RGB images. Current methods focus solely on a single modality because of the difficulty of identifying and integrating faithful depth cues from both sources. To address these issues, this paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework. Concretely, we independently compute coarse depth maps with separate networks, fully utilizing the individual depth cues from each modality. As the advantageous depth cues are spread across both modalities, we propose a novel confidence loss that steers a confidence predictor network to yield a confidence map specifying latent potential depth areas. With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner. With the proposed pipeline, our method achieves robust depth estimation in a variety of difficult scenarios. Experimental results on the challenging MS2 and ViViD++ datasets demonstrate the effectiveness and robustness of our method.
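
The role of the confidence map can be illustrated with a deliberately simplified sketch. The abstract describes a learned multi-modal fusion network; the code below is not that network. It assumes, purely for illustration, that the confidence map holds a per-pixel weight in [0, 1] for the RGB branch and that fusion is a convex combination of the two coarse depth maps. The function name `fuse_depth`, the weighting rule, and the toy values are all hypothetical.

```python
from typing import List

DepthMap = List[List[float]]  # row-major grid of per-pixel values


def fuse_depth(depth_rgb: DepthMap, depth_thermal: DepthMap,
               confidence: DepthMap) -> DepthMap:
    """Blend two coarse depth maps pixel by pixel.

    Assumption (not from the paper): confidence[i][j] in [0, 1] is the
    weight of the RGB branch; the thermal branch gets 1 - confidence[i][j].
    """
    fused = []
    for row_rgb, row_thr, row_conf in zip(depth_rgb, depth_thermal, confidence):
        fused_row = []
        for d_rgb, d_thr, c in zip(row_rgb, row_thr, row_conf):
            c = min(max(c, 0.0), 1.0)  # clamp so the blend stays convex
            fused_row.append(c * d_rgb + (1.0 - c) * d_thr)
        fused.append(fused_row)
    return fused


# Toy 2x2 depth maps in metres (made-up values, not dataset samples).
depth_rgb = [[10.0, 20.0], [30.0, 40.0]]
depth_thermal = [[12.0, 18.0], [30.0, 50.0]]
confidence = [[1.0, 0.0], [0.5, 0.25]]

fused = fuse_depth(depth_rgb, depth_thermal, confidence)
print(fused)  # [[10.0, 18.0], [30.0, 47.5]]
```

Where the confidence is 1 the fused depth equals the RGB estimate, where it is 0 it equals the thermal estimate, and intermediate values interpolate between the two. A learned fusion network can combine the inputs in far richer ways than this fixed rule.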

Original language: English
Title of host publication: 2025 IEEE International Conference on Robotics and Automation, ICRA 2025
Editors: Christian Ott, Henny Admoni, Sven Behnke, Stjepan Bogdan, Aude Bolopion, Youngjin Choi, Fanny Ficuciello, Nicholas Gans, Clement Gosselin, Kensuke Harada, Erdal Kayacan, H. Jin Kim, Stefan Leutenegger, Zhe Liu, Perla Maiolino, Lino Marques, Takamitsu Matsubara, Anastasia Mavromatti, Mark Minor, Jason O'Kane, Hae Won Park, Hae-Won Park, Ioannis Rekleitis, Federico Renda, Elisa Ricci, Laurel D. Riek, Lorenzo Sabattini, Shaojie Shen, Yu Sun, Pierre-Brice Wieber, Katsu Yamane, Jingjin Yu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6283-6290
Number of pages: 8
ISBN (Electronic): 9798331541392
State: Published - 2025
Externally published: Yes
Event: 2025 IEEE International Conference on Robotics and Automation, ICRA 2025 - Atlanta, United States
Duration: 19 May 2025 → 23 May 2025

Publication series

Name: Proceedings - IEEE International Conference on Robotics and Automation
ISSN (Print): 1050-4729

Conference

Conference: 2025 IEEE International Conference on Robotics and Automation, ICRA 2025
Country/Territory: United States
City: Atlanta
Period: 19/05/25 → 23/05/25
