TY - GEN
T1 - Multi-UAV Automatic Dynamic Obstacle Avoidance with Experience-shared A2C
AU - Han, Xiao
AU - Wang, Jing
AU - Zhang, Qinyu
AU - Qin, Xue
AU - Sun, Meng
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - With the increasing use of UAVs in reconnaissance, agriculture, logistics, and entertainment, multi-UAV systems must automatically avoid dynamic obstacles to ensure the safety of the drones and of living beings in the environment. Automatic obstacle avoidance is a classic multi-agent decision-making problem. Traditional algorithms, limited in their methods of state classification and policy selection, are not applicable to such complex scenes, which involve randomly changing environments and cooperative decision-making. In this paper, the Advantage Actor-Critic (A2C) algorithm is introduced to train multiple UAVs to avoid obstacles automatically and to optimize the avoidance decision-making model. Deep Q-Learning, Actor-Critic (AC), and A2C are compared. To further improve performance, we adapt A2C to the multi-UAV setting by sharing experiences between UAVs to expedite training. Our experimental results show that the resulting Experience-shared A2C (ES-A2C) algorithm achieves higher performance and a shorter training period.
KW - advantage actor-critic
KW - moving obstacles avoidance
KW - multi-agent decision
KW - multi-UAV
KW - shared experience
UR - https://www.scopus.com/pages/publications/85077606634
U2 - 10.1109/WiMOB.2019.8923344
DO - 10.1109/WiMOB.2019.8923344
M3 - Conference contribution
AN - SCOPUS:85077606634
T3 - International Conference on Wireless and Mobile Computing, Networking and Communications
SP - 330
EP - 335
BT - 2019 International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2019
PB - IEEE Computer Society
T2 - 15th International Conference on Wireless and Mobile Computing, Networking and Communications, WiMob 2019
Y2 - 21 October 2019 through 23 October 2019
ER -