Abstract
Fusing hyperspectral image (HSI) and light detection and ranging (LiDAR) data exploits complementary spectral and elevation information to improve land cover classification, but its effectiveness is limited by heterogeneous data distributions and by the frequent neglect of frequency-domain information. To address these limitations, we propose a novel multimodal spatial–frequency fusion and refinement network (MSFFRN), designed to capture and fuse multimodal features holistically. It employs a dual-branch architecture that combines wavelet-based frequency decomposition with convolutional neural network (CNN)-based spatial encoding for complementary feature extraction. A hierarchical fusion strategy based on adaptive gating and cross-attention mechanisms then dynamically integrates cross-modality and cross-domain information, while a dedicated feature refinement module (FRM) improves robustness against environmental distortions and preserves structural features. Extensive experiments on the Houston 2013 and MUUFL datasets demonstrate that MSFFRN achieves state-of-the-art performance, improving overall accuracy (OA) by 2.66% and 1.32%, respectively.
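As a rough illustration of two ingredients named in the abstract, the sketch below shows a single-level 2D wavelet decomposition and an adaptive gate that blends two feature vectors. It is not the authors' implementation: the abstract does not say which wavelet is used, so the Haar wavelet is assumed here, and the function names (`haar_dwt2`, `gated_fusion`) and the scalar gate parameters (`w_hsi`, `w_lidar`, `bias`) are hypothetical stand-ins for what would be learned layers in a real network.

```python
import math


def haar_dwt2(patch):
    """Single-level 2D Haar transform of an even-sized 2D list (assumed wavelet).

    Returns four half-resolution sub-bands: one approximation band and
    three detail bands (row difference, column difference, diagonal).
    """
    rows, cols = len(patch), len(patch[0])
    if rows % 2 or cols % 2:
        raise ValueError("patch height and width must both be even")
    approx, row_diff, col_diff, diag = [], [], [], []
    for i in range(0, rows, 2):
        r_a, r_r, r_c, r_d = [], [], [], []
        for j in range(0, cols, 2):
            a, b = patch[i][j], patch[i][j + 1]
            c, d = patch[i + 1][j], patch[i + 1][j + 1]
            r_a.append((a + b + c + d) / 2)  # low-pass in both directions
            r_r.append((a + b - c - d) / 2)  # top row minus bottom row
            r_c.append((a - b + c - d) / 2)  # left column minus right column
            r_d.append((a - b - c + d) / 2)  # diagonal difference
        approx.append(r_a)
        row_diff.append(r_r)
        col_diff.append(r_c)
        diag.append(r_d)
    return approx, row_diff, col_diff, diag


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(hsi_feat, lidar_feat, w_hsi=1.0, w_lidar=1.0, bias=0.0):
    """Blend two equal-length feature vectors with an element-wise gate.

    The gate g lies in (0, 1); the output is g * hsi + (1 - g) * lidar.
    The scalar weights are hypothetical placeholders for learned parameters.
    """
    fused = []
    for h, l in zip(hsi_feat, lidar_feat):
        g = sigmoid(w_hsi * h + w_lidar * l + bias)
        fused.append(g * h + (1.0 - g) * l)
    return fused
```

With the 1/2 normalization used above, the transform preserves signal energy, so the four sub-bands together carry the same information as the input patch. The gate reduces to a plain average when its weights and bias are zero, which gives a simple sanity check for the fusion step.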
| Original language | English |
|---|---|
| Article number | 5501905 |
| Journal | IEEE Geoscience and Remote Sensing Letters |
| Volume | 23 |
| State | Published - 2026 |
| Externally published | Yes |
Keywords
- Classification
- hyperspectral image (HSI)
- light detection and ranging (LiDAR) data
- multimodal
- wavelet transform
Fingerprint
Dive into the research topics of 'MSFFRN: Multimodal Spatial–Frequency Fusion and Refinement Network for Hyperspectral and LiDAR Data Classification'. Together they form a unique fingerprint.