
RingMamba: Remote Sensing Multisensor Pretraining With Visual State Space Model

  • Peijin Wang
  • Hao Chang
  • Huiyang Hu
  • Xin Li
  • Xiaorui Liu
  • Yu Liu
  • Zhaolong Zhang
  • Chen Chen
  • Yundu Li
  • Yingchao Feng
  • Wenhui Diao
  • Qingfang Zheng
  • Yaowei Wang*
  • Xian Sun*
  • *Corresponding author for this work
  • CAS - Aerospace Information Research Institute
  • University of Chinese Academy of Sciences
  • Pengcheng Laboratory

Research output: Contribution to journal › Article › peer-review

Abstract

Previous studies on remote sensing foundation models have demonstrated the representational ability of convolutional neural networks (CNNs) and vision transformers (ViTs). However, these models either focus on single-sensor data, overlooking the complementary information across sensors, or increase computational complexity in pursuit of a global receptive field. In this article, we introduce RingMamba, a novel multisensor remote sensing (RS) foundation model that excels in both performance and efficiency. To learn unified representations of multisensor data, we propose a multisensor self-supervised pretraining framework that integrates generative and contrastive learning strategies under texture feature constraints, enabling training on massive amounts of unlabeled multisensor data. Because instances are distributed across multiple angles and sizes, we also propose an asymmetric scan and scan couple (ASSC) block that performs multidirectional scanning and recognition. We validate the model on multimodal semantic segmentation, scene classification, object detection, and change detection tasks using nine public datasets, and the experimental results show leading performance across these tasks. In addition, we evaluate the computational cost of the model for classification under different input sizes. The results show that our model maintains leading performance with higher throughput and lower FLOPs, especially for high-resolution inputs, making it better suited to remote sensing applications.
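The abstract does not describe how the ASSC block orders tokens, so the following is only a rough, hypothetical illustration of the general idea of multidirectional scanning in visual state space models, in the style of the four-direction cross-scan used by VMamba's SS2D module. It flattens an H × W feature map into row-major and column-major sequences plus their reverses, runs a causal recurrence over each sequence, and maps the outputs back to grid positions. The function names (`multidirectional_scan`, `toy_ssm`, `merge_scans`), the fixed scalar recurrence standing in for Mamba's selective SSM, and the sum-based merge are all assumptions for illustration; this is not RingMamba's ASSC design.

```python
from typing import List

Grid = List[List[float]]


def multidirectional_scan(feat: Grid) -> List[List[float]]:
    """Flatten an H x W map into four 1-D sequences (hypothetical sketch).

    Order: row-major, column-major, reversed row-major, reversed column-major.
    """
    h, w = len(feat), len(feat[0])
    row_major = [feat[i][j] for i in range(h) for j in range(w)]
    col_major = [feat[i][j] for j in range(w) for i in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def toy_ssm(seq: List[float], a: float = 0.5, b: float = 1.0, c: float = 1.0) -> List[float]:
    """Toy linear recurrence: state = a*state + b*x, output = c*state.

    Stand-in only; real Mamba layers use input-dependent (selective) parameters.
    """
    state = 0.0
    out = []
    for x in seq:
        state = a * state + b * x
        out.append(c * state)
    return out


def merge_scans(seqs: List[List[float]], h: int, w: int) -> Grid:
    """Map each processed sequence back to its grid position and sum them."""
    row_seq, col_seq, row_rev, col_rev = seqs
    n = h * w
    out = [[0.0] * w for _ in range(h)]
    for k in range(n):
        # Row-major index k corresponds to grid cell (k // w, k % w);
        # the reversed sequence holds that cell at index n - 1 - k.
        out[k // w][k % w] += row_seq[k] + row_rev[n - 1 - k]
        # Column-major index k corresponds to grid cell (k % h, k // h).
        out[k % h][k // h] += col_seq[k] + col_rev[n - 1 - k]
    return out


if __name__ == "__main__":
    feat = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    seqs = [toy_ssm(s) for s in multidirectional_scan(feat)]
    print(merge_scans(seqs, h=2, w=3))
```

Each directional pass visits every token once, so the work grows linearly with the number of tokens H·W, whereas full self-attention compares all token pairs. This is the general reason state space backbones tend to scale well to high-resolution inputs; the paper's own throughput and FLOPs measurements are not reproduced here.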

Original language: English
Article number: 5640316
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
State: Published - 2025
Externally published: Yes

Keywords

  • Foundation model
  • Mamba
  • multisensor
  • remote sensing (RS)
  • self-supervised learning
