TY - GEN
T1 - A Collaborative Control Method for Spacecraft Clusters Based on Multi Agent Reinforcement Learning
AU - Liang, Xi
AU - Wei, Cheng
AU - Zhao, Jianbo
AU - Wang, Peng
AU - Cheng, Zihao
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
AB - Cluster intelligence refers to the emergence of collective behavior, such as collaborative detection, that compensates for individual limitations and accomplishes complex tasks through effective coordination among intelligent agents. A distributed strategy requires high autonomy from each spacecraft, with communication links between adjacent spacecraft enabling state exchange. First, the composition of the spacecraft cluster detection system is introduced. Then, a multi-agent reinforcement learning algorithm is applied to the associated multivariable sequential decision-making problem, which is divided into multiple time steps for modeling. Agents interact with the environment and receive reward feedback from it. Under the actor-critic algorithm, each agent's optimization goal is to maximize its expected cumulative reward: the actor learns the agent's policy function, while the critic learns a value function that evaluates the current state and guides the actor's policy optimization. Finally, scenario design rules and reward settings are defined for collaborative target detection by search spacecraft and tracking spacecraft. This enables collaborative control of both types of spacecraft, achieving 35 successful target tracks in line with task requirements.
KW - Actor Critic Algorithm
KW - Cluster Collaborative Control
KW - Multi-Agent
KW - Scene Simulation
UR - https://www.scopus.com/pages/publications/105006423522
U2 - 10.1007/978-981-96-2200-9_50
DO - 10.1007/978-981-96-2200-9_50
M3 - Conference contribution
AN - SCOPUS:105006423522
SN - 9789819621996
T3 - Lecture Notes in Electrical Engineering
SP - 517
EP - 526
BT - Advances in Guidance, Navigation and Control - Proceedings of 2024 International Conference on Guidance, Navigation and Control Volume 1
A2 - Yan, Liang
A2 - Duan, Haibin
A2 - Deng, Yimin
PB - Springer Science and Business Media Deutschland GmbH
T2 - International Conference on Guidance, Navigation and Control, ICGNC 2024
Y2 - 9 August 2024 through 11 August 2024
ER -