Abstract
Incorporating both RGB and depth images has proven effective for improving semantic segmentation. However, existing RGB-D semantic segmentation methods tend to overlook the critical role of cross-modal difference information during fusion, which suppresses discriminative cues and prevents truly complementary cross-modal fusion. In this article, we propose the Difference-Aware Fusion Network (DFNet), a novel RGB-D semantic segmentation approach that uses multimodal information efficiently. To counter the suppression of cross-modal difference information, we design a dynamic frequency-spatial difference-aware fusion module that explicitly emphasizes cross-modal differences, captures salient features in the frequency domain, and uses them to aggregate the spatial context of multimodal features. We also present a novel soft-edge loss that handles complex scenes by supervising different regions separately. In addition, a progressive calibration context module enhances global contextual information by capturing multiscale multimodal representations. Extensive experiments on two public RGB-D datasets demonstrate that DFNet achieves highly competitive performance compared with state-of-the-art methods, making it well suited for assisting indoor robots.
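To make the core idea concrete, below is a minimal PyTorch sketch of a frequency-spatial difference-aware fusion block in the spirit the abstract describes: the RGB-depth difference is projected into attention weights so that disagreement between modalities is emphasized rather than averaged away, and a gate derived from the frequency-domain magnitude spectrum modulates the fused features. The class and layer names (`DifferenceAwareFusion`, `diff_proj`, `freq_gate`), the specific gating scheme, and all layer choices are illustrative assumptions, not the module actually proposed in the paper.

```python
import torch
import torch.nn as nn


class DifferenceAwareFusion(nn.Module):
    """Hypothetical sketch of a difference-aware RGB-D fusion block.

    Emphasizes cross-modal differences, derives channel gates from
    frequency-domain statistics, and fuses the two modalities.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Projects the explicit RGB-depth difference into spatial attention.
        self.diff_proj = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )
        # Maps frequency-domain magnitudes to per-channel gates.
        self.freq_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # Explicitly emphasize cross-modal difference information instead of
        # letting naive addition suppress it.
        diff = self.diff_proj(rgb - depth)
        rgb_enh = rgb + diff * depth    # inject depth cues where modalities disagree
        depth_enh = depth + diff * rgb  # and vice versa

        # Capture salient structure in the frequency domain: the magnitude
        # spectrum of the combined signal drives a channel-wise gate.
        spec = torch.fft.rfft2(rgb_enh + depth_enh, norm="ortho")
        gate = self.freq_gate(torch.abs(spec))

        fused = self.fuse(torch.cat([rgb_enh, depth_enh], dim=1))
        return fused * gate


if __name__ == "__main__":
    block = DifferenceAwareFusion(channels=64)
    rgb_feat = torch.randn(2, 64, 60, 80)    # backbone features from RGB
    depth_feat = torch.randn(2, 64, 60, 80)  # backbone features from depth
    print(block(rgb_feat, depth_feat).shape)  # torch.Size([2, 64, 60, 80])
```

A block like this would typically be applied at each encoder stage, with the fused features fed to the decoder; the paper's actual module may differ substantially in structure and in how frequency information is used.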
| Original language | English |
|---|---|
| Pages (from-to) | 7424-7434 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Industrial Informatics |
| Volume | 21 |
| Issue number | 10 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Cross-modal difference
- RGB-D fusion
- indoor scene
- multimodal semantic segmentation