TY - GEN
T1 - Pursuit-Evasion Game of Multiple Unmanned Surface Vessels in Partially Observable Environments
AU - Zhuang, Yufei
AU - Qu, Delin
AU - Yao, Yiyi
AU - Huang, Haibin
AU - Wang, Runan
AU - Duan, Lianqi
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The pursuit-evasion game for unmanned surface vessels (USVs) is a topic of great interest in ocean engineering. Most previous studies focused only on ideal scenarios with fully observable environments, which are far removed from practical applications. In this paper, we present a reinforcement learning (RL) model of a real marine environment that includes signal shielding regions, depots, and obstacles. Inside the signal shielding regions, the vessels' positions are assumed to be unknown to one another, except to the leader pursuer, so the new environment model is only partially observable to ordinary USVs. A distributed multi-agent RL algorithm is then proposed, and a leader-follower implicit communication mechanism is introduced to train the pursuer's strategy. A new reward function for the pursuit-evasion game, which weights distance, number of captures, and other relevant parameters, is designed to optimize the pursuit strategy continuously. Simulation results show that the ordinary pursuers can predict the evader's movement direction and capture points with high accuracy, even when the evader is in a signal shielding area. This is attributed to implicit communication with the leader pursuer in the partially observable environment, which also effectively optimizes the pursuit strategy in the new environment.
AB - The pursuit-evasion game for unmanned surface vessels (USVs) is a topic of great interest in ocean engineering. Most previous studies focused only on ideal scenarios with fully observable environments, which are far removed from practical applications. In this paper, we present a reinforcement learning (RL) model of a real marine environment that includes signal shielding regions, depots, and obstacles. Inside the signal shielding regions, the vessels' positions are assumed to be unknown to one another, except to the leader pursuer, so the new environment model is only partially observable to ordinary USVs. A distributed multi-agent RL algorithm is then proposed, and a leader-follower implicit communication mechanism is introduced to train the pursuer's strategy. A new reward function for the pursuit-evasion game, which weights distance, number of captures, and other relevant parameters, is designed to optimize the pursuit strategy continuously. Simulation results show that the ordinary pursuers can predict the evader's movement direction and capture points with high accuracy, even when the evader is in a signal shielding area. This is attributed to implicit communication with the leader pursuer in the partially observable environment, which also effectively optimizes the pursuit strategy in the new environment.
KW - USVs
KW - deep reinforcement learning
KW - implicit communication
KW - partially observable environment
KW - pursuit-evasion game
UR - https://www.scopus.com/pages/publications/85189332184
U2 - 10.1109/CAC59555.2023.10451200
DO - 10.1109/CAC59555.2023.10451200
M3 - Conference contribution
AN - SCOPUS:85189332184
T3 - Proceedings - 2023 China Automation Congress, CAC 2023
SP - 8243
EP - 8248
BT - Proceedings - 2023 China Automation Congress, CAC 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 China Automation Congress, CAC 2023
Y2 - 17 November 2023 through 19 November 2023
ER -