Small target detection in remote sensing images based on multi-scale self-attention aggregation and coordinate attention enhancement

  • Te Qi
  • Jing Tian
  • Zhengjun Liu
  • Hang Chen*
  • *Corresponding author for this work
  • Chinese Academy of Equipment Command and Technology
  • Beijing Institute of Remote Sensing Information

Research output: Contribution to journal › Article › peer-review

Abstract

Small targets in remote sensing images suffer from degraded texture and edges due to network pooling, with further challenges from scale variations and arbitrary orientations. SuperYOLO has two key limitations: its pixel-level symmetric Multi-modal Fusion (MF) module fails to distinguish the channel-wise semantic contributions to small targets, and its CSP modules lack explicit spatial coordinate encoding, leading to feature redundancy and localization deviations. This paper proposes two improvements: ① replacing MF with a Multi-scale Self-Attention Aggregation (MSAA) module that enhances key features (e.g., vehicle contours, thermal signals) via channel weight optimization; ② integrating Coordinate Attention (CA) into the CSP modules to strengthen position encoding through spatial-feature interaction, improving spatial discrimination under complex backgrounds. Experiments validate the approach: the model achieves 78.9% mAP50 on VEDAI (3.8% higher than SuperYOLO, with trucks and tractors improving by 18.9% and 5.8%, respectively), 61.94 mAP50 on AI-TOD (surpassing FFCA-YOLO and SuperYOLO), and 72.83 mAP50 on DOTA (exceeding the compared methods). Ablation studies confirm that combining MSAA and CA improves generalization.
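
As a rough illustration of the coordinate attention idea mentioned in the abstract, the sketch below pools a feature map separately along its height and width, passes the pooled descriptors through a shared channel-mixing step, and produces two direction-aware gates that rescale the input. It is not the authors' implementation: the function name `coordinate_attention`, the tensor sizes, and the random weights are assumptions made for illustration, the 1x1 convolutions are written as plain matrix products over the channel axis, and batch normalization is omitted with ReLU used as the nonlinearity.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def coordinate_attention(x, w_shared, w_h, w_w):
    """Simplified coordinate attention over one feature map (illustrative only).

    x:        (C, H, W) feature map
    w_shared: (C_mid, C) weights of the shared 1x1 convolution
    w_h, w_w: (C, C_mid) weights of the per-direction 1x1 convolutions
    """
    _, h, _ = x.shape
    # Directional pooling: average over width -> (C, H), over height -> (C, W).
    pooled_h = x.mean(axis=2)
    pooled_w = x.mean(axis=1)
    # Concatenate both descriptors and mix channels with the shared weights.
    y = np.concatenate([pooled_h, pooled_w], axis=1)   # (C, H + W)
    y = np.maximum(w_shared @ y, 0.0)                  # (C_mid, H + W), ReLU
    # Split back into the height and width parts.
    y_h, y_w = y[:, :h], y[:, h:]
    # Per-direction gates in (0, 1) that keep row and column position.
    a_h = sigmoid(w_h @ y_h)                           # (C, H)
    a_w = sigmoid(w_w @ y_w)                           # (C, W)
    # Rescale each location by the product of its row gate and column gate.
    return x * a_h[:, :, None] * a_w[:, None, :]


# Hypothetical sizes and random weights, chosen only to exercise the function.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
w_shared = rng.standard_normal((4, 8)) * 0.1
w_h = rng.standard_normal((8, 4)) * 0.1
w_w = rng.standard_normal((8, 4)) * 0.1
out = coordinate_attention(x, w_shared, w_h, w_w)
print(out.shape)
```

Because the two gates are indexed by row and by column rather than collapsed into a single global value per channel, the rescaling retains positional information along both axes, which is the property the abstract relies on for localization.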

Original language: English
Article number: 100724
Journal: Array
Volume: 29
State: Published - Mar 2026

Keywords

  • Coordinate attention
  • Multi-scale self-attention aggregation
  • Remote sensing image detection
  • Small target detection
