PESAT: A Parameter-Efficient Spatiotemporal Adapter Tuning Framework for Satellite Video Scene Classification

  • School of Electronics and Information Engineering, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Satellite video scene classification (SVSC) is a critical task for dynamic earth observation. However, it remains challenging due to the distinct spatiotemporal characteristics of satellite videos and the scarcity of annotated data, both of which differ significantly from general video datasets. Although vision transformers (ViTs) have shown strong performance in general video classification, their direct application to SVSC often results in suboptimal performance and overfitting. To address these challenges, we propose PESAT, a novel parameter-efficient spatiotemporal adapter tuning framework specifically tailored for SVSC tasks. PESAT enables the effective adaptation of pretrained ViTs for SVSC by keeping the backbone model largely frozen and fine-tuning only a small number of strategically inserted adapter modules. Our framework incorporates three key innovations: an efficient temporal attention modeling (TAM) mechanism that reuses pretrained self-attention weights for temporal feature extraction without adding new parameters; a sensitivity-guided adapter insertion strategy that identifies optimal locations within the ViT to place adapters, maximizing their impact; and a hybrid gated adapter (HGA) module, which combines depthwise convolution and a dynamic gating mechanism to capture complex spatiotemporal contexts specific to satellite video data. Experimental results demonstrate the superior performance of the proposed method.
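The abstract names two mechanisms without giving their equations: temporal attention that reuses pretrained self-attention weights, and an adapter that combines depthwise convolution with a gate. The NumPy sketch below illustrates those two general ideas only; it is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`self_attention`, `temporal_attention`, `gated_adapter`), the single-head attention, the kernel size of 3, the sigmoid gate that blends convolved and unconvolved features, the ReLU, and the zero-initialized up-projection. Sensitivity-guided adapter placement is not covered.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_qkv, w_out):
    """Single-head attention over the second-to-last axis of x, shape (..., L, D)."""
    d = x.shape[-1]
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    attn = softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(d), axis=-1)
    return (attn @ v) @ w_out

def temporal_attention(x, w_qkv, w_out):
    """x: (T, N, D) = frames, tokens per frame, channels.

    Assumed form of weight reuse: the same frozen w_qkv / w_out that serve
    spatial attention are applied after swapping axes, so each spatial token
    attends across the T frames. No new parameters are introduced.
    """
    xt = np.swapaxes(x, 0, 1)                     # (N, T, D)
    return np.swapaxes(self_attention(xt, w_qkv, w_out), 0, 1)

def gated_adapter(x, w_down, w_dw, w_gate, w_up):
    """Hypothetical gated bottleneck adapter. x: (T, N, D).

    w_down: (D, r), w_dw: (3, r) depthwise temporal kernel,
    w_gate: (r, r), w_up: (r, D).
    """
    h = x @ w_down                                # down-project to (T, N, r)
    hp = np.pad(h, ((1, 1), (0, 0), (0, 0)))      # zero-pad along time
    # Depthwise convolution along time: each channel has its own 3-tap filter.
    conv = hp[:-2] * w_dw[0] + hp[1:-1] * w_dw[1] + hp[2:] * w_dw[2]
    gate = 1.0 / (1.0 + np.exp(-(h @ w_gate)))    # input-dependent gate in (0, 1)
    mixed = gate * conv + (1.0 - gate) * h        # blend convolved and raw features
    return x + np.maximum(mixed, 0.0) @ w_up      # residual connection

rng = np.random.default_rng(0)
T, N, D, r = 4, 6, 8, 2
x = rng.normal(size=(T, N, D))
w_qkv = rng.normal(size=(D, 3 * D)) * 0.1         # stands in for frozen pretrained weights
w_out = rng.normal(size=(D, D)) * 0.1
w_down = rng.normal(size=(D, r)) * 0.1            # trainable adapter weights
w_dw = rng.normal(size=(3, r)) * 0.1
w_gate = rng.normal(size=(r, r)) * 0.1
w_up = np.zeros((r, D))                           # zero init: adapter starts as identity

y = x + temporal_attention(x, w_qkv, w_out)
z = gated_adapter(y, w_down, w_dw, w_gate, w_up)
print(z.shape)  # (4, 6, 8)
```

In this sketch only `w_down`, `w_dw`, `w_gate`, and `w_up` would be trained, which is the sense in which such a scheme is parameter-efficient. Zero-initializing `w_up` is a common adapter convention that leaves the pretrained model's output unchanged at the start of fine-tuning; whether PESAT does the same is not stated in the abstract.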

Original language: English
Article number: 5644912
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 63
State: Published - 2025
Externally published: Yes

Keywords

  • Parameter-efficient fine-tuning (PEFT)
  • satellite video
  • scene classification
  • video classification
