
Dynamics of Masked Image Modeling in Hyperspectral Image Classification

  • Chen Ma
  • Huayi Li*
  • Junjun Jiang*
  • Cesar Aybar
  • Jiaqi Yao
  • Gustau Camps-Valls
  • *Corresponding author for this work
  • School of Astronautics, Harbin Institute of Technology
  • School of Computer Science and Technology, Harbin Institute of Technology
  • University of Valencia
  • Tianjin Normal University

Research output: Contribution to journal, Article, peer-review

Abstract

Masked image modeling (MIM), a common self-supervised learning (SSL) technique, has been extensively studied for remote sensing (RS) image processing. Nevertheless, its effectiveness for hyperspectral imagery (HSI) remains underexplored due to the distinct data structures and high dimensionality. This article aims to provide a detailed understanding of MIM from different perspectives of representation learning and statistical analysis for HSI classification tasks. Our study reveals that the MIM paradigm injects inductive bias in the attention mechanism of the transformer model, which is advantageous for capturing the local discrepancies between the spectra. We also show that MIM can increase the diversity of the attention heads in every layer, which is beneficial for the model in extracting more discriminative features from different spectral bands. The similarity of representations from various layers further proves this. Furthermore, our investigation highlights how MIM introduces a dynamic perspective to spectral representations, enabling the model to learn more robust and discriminative features. The final numerical experiments indicate that a moderate mask ratio can enhance the performance of downstream tasks. This suggests that designing a more targeted masking strategy might be necessary to achieve higher and more stable gains in downstream task performance. Without bells and whistles, the vanilla MIM improves the overall classification accuracy by an average of 2.69% over its supervised learning (SL) counterpart. We hope that our findings can advance the understanding of MIM in HSI and inspire the design of a more stable SSL paradigm for HSI processing.

Original language: English
Article number: 5512815
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
State: Published - 2025
Externally published: Yes

Keywords

  • Hyperspectral imagery (HSI)
  • masked image modeling (MIM)
  • representation learning
  • self-supervised learning (SSL)
