Adaptive Spatial Tokenization Transformer for Salient Object Detection in Optical Remote Sensing Images

  • Lina Gao
  • Bing Liu*
  • Ping Fu
  • Mingzhu Xu
  • *Corresponding author for this work
  • School of Electronics and Information Engineering, Harbin Institute of Technology
  • Shandong University

Research output: Contribution to journal › Article › peer-review

Abstract

Convolutional neural network (CNN)-based salient object detection (SOD) models have achieved promising performance in optical remote sensing images (ORSIs) in recent years. However, because the sliding-window operation of CNNs is inherently local, many existing CNN-based ORSI SOD models still struggle to learn long-range relationships. To this end, a novel transformer framework is proposed for ORSI SOD, inspired by the ability of transformer networks to model global dependencies. This is the first attempt to explore global and local details using a transformer architecture for SOD in ORSIs. Concretely, we design an adaptive spatial tokenization transformer encoder to extract global-local features, which can accurately sparsify tokens for each input image and achieve competitive performance in ORSI SOD tasks. Then, a specific dense token aggregation decoder (DTAD) is proposed to generate saliency results; it comprises three cascaded decoders that integrate the global-local tokens and contextual dependencies. Extensive experiments indicate that the proposed model greatly surpasses 20 state-of-the-art (SOTA) SOD approaches on two standard ORSI SOD datasets under seven evaluation metrics. We also report comparison results to demonstrate the generalization capacity on the latest challenging ORSI datasets. In addition, we validate the contributions of different modules through a series of ablation analyses, especially the proposed adaptive spatial tokenization module (ASTM), which can halve the computational budget.

Original language: English
Article number: 5602915
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 61
State: Published - 2023
Externally published: Yes

Keywords

  • Adaptive tokenization
  • optical remote sensing images (ORSIs)
  • salient object detection (SOD)
  • transformer
