Abstract
Small targets in remote sensing images lose texture and edge detail through the pooling layers of detection networks, and they are further complicated by scale variation and arbitrary orientation. SuperYOLO has two key limitations. First, its pixel-level, symmetric Multi-modal Fusion (MF) module cannot distinguish how much each channel contributes semantically to small targets. Second, its CSP modules lack explicit spatial coordinate encoding, which leads to feature redundancy and localization deviations. This paper makes two improvements: ① it replaces MF with a Multi-scale Self-Attention Aggregation (MSAA) module that optimizes channel weights to enhance key features such as vehicle contours and thermal signals; ② it integrates Coordinate Attention (CA) into the CSP modules, strengthening position encoding through spatial-feature interaction and improving spatial discrimination against complex backgrounds. Experiments confirm the effectiveness of these changes: the method reaches 78.9% mAP50 on VEDAI (3.8% higher than SuperYOLO, with gains of 18.9% on trucks and 5.8% on tractors), 61.94% mAP50 on AI-TOD (surpassing FFCA-YOLO and SuperYOLO), and 72.83% mAP50 on DOTA (exceeding the compared methods). Ablation studies confirm that combining MSAA and CA improves generalization.
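The abstract does not specify how CA is configured inside the CSP modules. The sketch below follows the generic Coordinate Attention design (Hou et al., 2021) for a single (C, H, W) feature map. It shows how direction-aware pooling yields one attention weight per row and one per column, so each position is reweighted according to its coordinates. It is not the authors' implementation. Several details are assumptions: random untrained weights, omitted bias and BatchNorm, the reduction rule `max(8, C // r)`, and the helper names.

```python
import numpy as np

def h_swish(x):
    # Hard-swish non-linearity: x * ReLU6(x + 3) / 6
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w1, w_h, w_w):
    """Generic Coordinate Attention on one feature map x of shape (C, H, W).

    w1:  (C_mid, C) shared 1x1 conv weights (bias and BatchNorm omitted)
    w_h: (C, C_mid) 1x1 conv producing the height-wise attention
    w_w: (C, C_mid) 1x1 conv producing the width-wise attention
    """
    c, h, w = x.shape
    # Direction-aware pooling: average over width -> (C, H), over height -> (C, W).
    z_h = x.mean(axis=2)
    z_w = x.mean(axis=1)
    # Concatenate along the spatial axis and apply the shared 1x1 conv
    # (a 1x1 conv on a (C, L) tensor is a matrix product over channels).
    f = h_swish(w1 @ np.concatenate([z_h, z_w], axis=1))  # (C_mid, H + W)
    f_h, f_w = f[:, :h], f[:, h:]
    g_h = sigmoid(w_h @ f_h)  # (C, H): one weight per channel and row
    g_w = sigmoid(w_w @ f_w)  # (C, W): one weight per channel and column
    # Reweight every position (i, j) by its row weight and column weight.
    return x * g_h[:, :, None] * g_w[:, None, :]

# Hypothetical usage with random, untrained weights.
rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 12, 4
c_mid = max(8, C // r)  # assumed channel-reduction rule
x = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((c_mid, C))
w_h = 0.1 * rng.standard_normal((C, c_mid))
w_w = 0.1 * rng.standard_normal((C, c_mid))
y = coordinate_attention(x, w1, w_h, w_w)
print(y.shape)  # (16, 8, 12)
```

`g_h` varies along rows and `g_w` varies along columns. Two pixels in the same channel can therefore receive different weights depending on where they sit. This is the explicit positional encoding that purely channel-wise attention (one scalar per channel) cannot provide.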
| Original language | English |
|---|---|
| Article number | 100724 |
| Journal | Array |
| Volume | 29 |
| State | Published - Mar 2026 |
Keywords
- Coordinate attention
- Multi-scale self-attention aggregation
- Remote sensing image detection
- Small target detection
Fingerprint
Dive into the research topics of 'Small target detection in remote sensing images based on multi-scale self-attention aggregation and coordinate attention enhancement'. Together they form a unique fingerprint.