Abstract
Safe and efficient cooperative flight in cluttered, dynamic multi-target environments requires UAV swarms to repeatedly reassign targets while maintaining collision-free trajectories under partial observability. Existing methods often decouple target assignment and path planning into hand-engineered pipelines, which are effective in static, fully known settings but become brittle when targets move and perception is uncertain, leading to reduced cooperation efficiency and poor adaptability. This paper proposes DA-MAPPO, a dynamic assignment-aware multi-agent reinforcement learning framework built on PPO. DA-MAPPO integrates an online minimum-cost target allocation module into a unified perception–allocation–decision-making loop: real-time assignment outcomes are embedded into each agent’s local observation to form assignment-augmented states, enabling step-by-step coordination reconfiguration. In addition, a hierarchical cooperative reward is designed to jointly encourage individual safety (collision avoidance) and team-level mission efficiency. High-fidelity simulation results in Gazebo show that DA-MAPPO achieves 90%–99% mission success across dynamic multi-target scenarios with varying obstacle densities, outperforming representative baselines by up to 25 percentage points. When transferring from static to dynamic targets, DA-MAPPO exhibits negligible degradation in low- and medium-density environments and only about a 2% drop in the densest setting, while baselines degrade substantially. DA-MAPPO also produces shorter average trajectories and fewer decision steps, indicating higher mission efficiency with lower collision risk.
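The assignment-augmented loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the Euclidean-distance cost, the brute-force solver (practical only for small swarms), and the reward weights are all assumptions made for illustration. It shows a minimum-cost assignment recomputed at each step, the assigned target's relative position appended to an agent's local observation, and a reward that sums an individual safety term and a team-level term.

```python
from itertools import permutations
from math import dist, inf


def min_cost_assignment(uavs, targets):
    """Return a tuple a where a[i] is the target index assigned to UAV i.

    Assumption: cost is total Euclidean distance. Brute force over
    permutations, so this is only practical for a handful of agents.
    """
    if len(uavs) > len(targets):
        raise ValueError("need at least as many targets as UAVs")
    best, best_cost = None, inf
    for perm in permutations(range(len(targets)), len(uavs)):
        cost = sum(dist(uavs[i], targets[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best


def augment_observation(local_obs, uav_pos, target_pos):
    """Append the relative vector to the assigned target (hypothetical encoding)."""
    rel = [t - u for u, t in zip(uav_pos, target_pos)]
    return list(local_obs) + rel


def hierarchical_reward(collided, reached, team_progress,
                        w_collision=10.0, w_reach=1.0, w_team=1.0):
    """Individual safety/goal term plus a shared team term (weights are hypothetical)."""
    individual = -w_collision * float(collided) + w_reach * float(reached)
    team = w_team * team_progress
    return individual + team


# Two UAVs, two targets; the targets then swap places and the assignment follows.
uavs = [(0.0, 0.0), (10.0, 0.0)]
first = min_cost_assignment(uavs, [(9.0, 1.0), (1.0, 1.0)])
second = min_cost_assignment(uavs, [(1.0, 1.0), (9.0, 1.0)])
obs = augment_observation([0.5], uavs[0], (1.0, 1.0))
```

Because the assignment is recomputed every step and fed into the observation, a policy trained on such states can react to a reassignment without any change to its architecture. For larger swarms, a polynomial-time solver such as the Hungarian algorithm would replace the brute-force search.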
| Original language | English |
|---|---|
| Journal | IEEE Internet of Things Journal |
| State | Accepted/In press - 2026 |
| Externally published | Yes |
Keywords
- Cooperative Navigation
- Dynamic Target Allocation
- Multi-Agent Reinforcement Learning
- UAV Swarms
Title
Dynamic Target Assignment and Cooperative Decision-Making for UAV Swarms Based on Multi-Agent Reinforcement Learning