TY - GEN
T1 - Flocking control of UAV swarms with deep reinforcement learning approach
AU - Yan, Peng
AU - Bai, Chengchao
AU - Zheng, Hongxing
AU - Guo, Jifeng
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/11/27
Y1 - 2020/11/27
AB - The flocking control of UAV swarms has been studied extensively owing to its wide range of applications. In this paper, the UAV flocking control problem is formulated as a Partially Observable Markov Decision Process (POMDP) in which the constraints on each UAV's communication and perception ranges are taken into account. A deep reinforcement learning approach is proposed to solve this problem in a centralized-training, decentralized-execution manner. The experience collected by all UAVs is used to train a shared flocking control policy, and each UAV performs actions based on the local environment information it observes. To enable the UAV swarm to maintain a flock while navigating an environment with dense obstacles, a reward function is constructed that accounts for goal reaching, obstacle avoidance, and flocking maintenance. In particular, the flocking maintenance reward is designed using global information about the UAV swarm, which is available only during the training phase. Simulation results demonstrate that the policy trained with the flocking maintenance reward keeps the UAV swarm together as a flock when it encounters obstacles and generalizes well to different numbers of UAVs.
KW - Deep reinforcement learning
KW - Flocking control
KW - Obstacles avoidance
KW - UAV swarms
UR - https://www.scopus.com/pages/publications/85098941318
U2 - 10.1109/ICUS50048.2020.9274899
DO - 10.1109/ICUS50048.2020.9274899
M3 - Conference contribution
AN - SCOPUS:85098941318
T3 - Proceedings of 2020 3rd International Conference on Unmanned Systems, ICUS 2020
SP - 592
EP - 599
BT - Proceedings of 2020 3rd International Conference on Unmanned Systems, ICUS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd International Conference on Unmanned Systems, ICUS 2020
Y2 - 27 November 2020 through 28 November 2020
ER -