Abstract
Adverse Condition Depth Estimation (ACDE) has emerged as a critical task for enabling robust perception in adverse conditions, yet existing approaches face significant limitations. Traditional methods that rely on generative models often require auxiliary target-domain images or computationally intensive domain adaptation modules, which increase deployment complexity and hinder real-world applicability. Furthermore, unlike vision-language models such as CLIP, where textual and visual features are inherently aligned, depth estimation frameworks lack explicit mechanisms to align multimodal features. To address these challenges, we propose Parameter-Efficient Multimodal Adaptation (PEMA) for ACDE, which comprises a Prompt-Driven Domain Alignment (PDDA) module and a Visual-Text Consistent Contrastive Learning (VTCCL) module. Specifically, the PDDA module injects low-rank decomposition matrices into the self-attention layers of the depth estimator's image encoder and is optimized with a novel language-image discrepancy equivalence loss. PDDA ensures that semantic shifts in text prompts mirror visual feature shifts, capturing unseen target-domain visual representations under adverse conditions without requiring target-domain images. The VTCCL module, in turn, bridges the visual features of the diffusion model and CLIP text embeddings through hierarchical consistency constraints. It employs cross-modal contrastive alignment to cluster vision-text pairs of the same weather condition while dispersing mismatched pairs, alongside intra-modal consistency objectives that distinguish fine-grained weather variations within each modality. In extensive experiments, PEMA achieves state-of-the-art performance on the nuScenes and Oxford RobotCar datasets, e.g., 79.96% on nuScenes-night, 95.37% on nuScenes-rain, and 89.33% on RobotCar-night. Moreover, PEMA improves d1 by 1.44% on CityScapes-foggy compared with the baseline depth estimator. The code will be released soon.
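To make the idea of injecting low-rank decomposition matrices into a projection layer concrete, the following is a minimal, generic sketch of a low-rank update to a frozen weight matrix, not the authors' implementation. The class name `LowRankAdaptedLinear`, the `alpha / rank` scaling, and the zero initialisation of `b` are assumptions borrowed from common low-rank adaptation practice; the abstract specifies none of them.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LowRankAdaptedLinear:
    """Hypothetical frozen weight W (d_out x d_in) plus a trainable update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained projection (e.g. a query or value matrix)
        self.scale = alpha / rank     # assumed scaling convention
        # Assumed initialisation: A small and random, B zero, so the update starts at zero.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A) without modifying the frozen weight."""
        delta = matmul(self.b, self.a)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted projection to a vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_parameters(self):
        """Count only the entries of A and B; W stays frozen."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


# Usage: an 8 x 8 frozen projection adapted with rank 2.
d = 8
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
layer = LowRankAdaptedLinear(identity, rank=2)
print(layer.trainable_parameters(), "trainable entries versus", d * d, "frozen entries")
```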
| Original language | English |
|---|---|
| Article number | 130600 |
| Journal | Expert Systems with Applications |
| Volume | 304 |
| State | Published - 1 Apr 2026 |
Keywords
- Adverse condition depth estimation
- Cross-modal alignment
- Low-rank decomposition matrices
Fingerprint
Dive into the research topics of 'Parameter-efficient multimodal adaptation for adverse condition depth estimation'. Together they form a unique fingerprint.