
Parameter-efficient multimodal adaptation for adverse condition depth estimation

  • Harbin Institute of Technology
  • Inner Mongolia University
  • Hefei University of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Adverse Condition Depth Estimation (ACDE) has emerged as a critical task for enabling robust perception under adverse conditions, yet existing approaches face significant limitations. Traditional methods relying on generative models often require auxiliary target-domain images or computationally intensive domain adaptation modules, which increase deployment complexity and hinder real-world applicability. Furthermore, unlike vision-language models such as CLIP, where textual and visual features are inherently aligned, depth estimation frameworks lack explicit mechanisms to align multimodal features. To address these challenges, we propose Parameter-Efficient Multimodal Adaptation (PEMA) for ACDE, comprising a Prompt-Driven Domain Alignment (PDDA) module and a Visual-Text Consistent Contrastive Learning (VTCCL) module. Specifically, the PDDA module injects low-rank decomposition matrices into the self-attention layers of the depth estimator's image encoder and is optimized with a novel language-image discrepancy equivalence loss. PDDA ensures that semantic shifts in text prompts mirror shifts in visual features, capturing unseen target-domain visual representations under adverse conditions without requiring target-domain images. Moreover, the VTCCL module bridges diffusion-model visual features and CLIP text embeddings through hierarchical consistency constraints. It employs cross-modal contrastive alignment to cluster vision-text pairs of the same weather condition while dispersing mismatched pairs, alongside intra-modal consistency objectives that distinguish fine-grained weather variations within each modality. In extensive experiments, PEMA achieves state-of-the-art (SOTA) performance on the nuScenes and Oxford RobotCar datasets, e.g., 79.96% on nuScenes-night, 95.37% on nuScenes-rain, and 89.33% on RobotCar-night. PEMA also improves d1 by 1.44% on CityScapes-foggy compared with the baseline depth estimator. The code will be released soon.
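
The two generic building blocks named in the abstract, a low-rank update to a frozen projection weight and a cross-modal contrastive objective, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `lora_forward` and `cross_modal_contrastive_loss`, the scaling factor `alpha`, the temperature value, and the toy embeddings are all assumptions made for illustration, and the paper's language-image discrepancy equivalence loss and intra-modal objectives are not reproduced here.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha=1.0):
    """Apply a frozen weight W plus a low-rank update (alpha / r) * B @ A.

    Shapes (hypothetical): x (n, d_in), W (d_out, d_in), A (r, d_in), B (d_out, r).
    Only A and B would be trained; W stays fixed.
    """
    r = A.shape[0]
    return x @ (W + (alpha / r) * (B @ A)).T


def cross_modal_contrastive_loss(vis, txt, temperature=0.07):
    """Symmetric InfoNCE loss; row i of vis and row i of txt are the matched pair."""
    vis = vis / np.linalg.norm(vis, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = vis @ txt.T / temperature

    def matched_pair_cross_entropy(z):
        z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the vision-to-text and text-to-vision directions.
    return 0.5 * (matched_pair_cross_entropy(logits)
                  + matched_pair_cross_entropy(logits.T))


rng = np.random.default_rng(0)
d_in, d_out, r = 16, 8, 2
W = rng.normal(size=(d_out, d_in))       # frozen projection weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-initialized
x = rng.normal(size=(4, d_in))

y_frozen = x @ W.T
y_adapted = lora_forward(x, W, A, B)     # equals y_frozen while B is all zeros

# Toy embeddings: one orthogonal unit vector per (hypothetical) weather condition.
txt = np.eye(3, 8)
loss_aligned = cross_modal_contrastive_loss(txt, txt)
loss_mismatched = cross_modal_contrastive_loss(txt[[1, 2, 0]], txt)
```

With `B` initialized to zero, the adapted layer starts out identical to the frozen one, which is the usual motivation for that initialization in low-rank adaptation. The contrastive loss is near zero when each visual embedding matches its own text embedding and grows large when the pairs are permuted.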

Original language: English
Article number: 130600
Journal: Expert Systems with Applications
Volume: 304
DOIs
State: Published - 1 Apr 2026

Keywords

  • Adverse condition depth estimation
  • Cross-modal alignment
  • Low-rank decomposition matrices
