TY - GEN
T1 - A Ranking Scheme for Trust Region Multi-agent Reinforcement Learning
AU - Gao, Ruichen
AU - Hu, Yi
AU - Zheng, Deqin
AU - Shao, Mengxuan
AU - Zhu, Haiqi
AU - Song, Chenyue
AU - Zhang, Wei
AU - Jiang, Feng
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - In multi-agent reinforcement learning (MARL), trust region (TR) methods are widely used because they effectively mitigate the nonstationarity of multi-agent systems and facilitate collaboration among diverse agent types. Based on the multi-agent advantage decomposition lemma, TR methods adopt a sequential update scheme (i.e., agents' policy networks are trained in a certain order). However, current TR methods lack a ranking scheme and train the agents in a random order, which results in suboptimal performance and large variance. To solve this issue, we formulate ranking criteria based on agents' observations (the inputs to their policy networks) and propose corresponding ranking schemes. Specifically, we avoid ranking agents with similar observations adjacent to each other for training, and we give higher priority to agents whose observations carry more information. We extend our schemes to popular TR methods and evaluate them on a series of StarCraft II, Google Football, and Multi-Agent MuJoCo tasks. The results show that our ranking schemes can improve current TR methods on many tasks in performance, efficiency, or stability, indicating their modeling capability on both homogeneous and heterogeneous agent tasks.
AB - In multi-agent reinforcement learning (MARL), trust region (TR) methods are widely used because they effectively mitigate the nonstationarity of multi-agent systems and facilitate collaboration among diverse agent types. Based on the multi-agent advantage decomposition lemma, TR methods adopt a sequential update scheme (i.e., agents' policy networks are trained in a certain order). However, current TR methods lack a ranking scheme and train the agents in a random order, which results in suboptimal performance and large variance. To solve this issue, we formulate ranking criteria based on agents' observations (the inputs to their policy networks) and propose corresponding ranking schemes. Specifically, we avoid ranking agents with similar observations adjacent to each other for training, and we give higher priority to agents whose observations carry more information. We extend our schemes to popular TR methods and evaluate them on a series of StarCraft II, Google Football, and Multi-Agent MuJoCo tasks. The results show that our ranking schemes can improve current TR methods on many tasks in performance, efficiency, or stability, indicating their modeling capability on both homogeneous and heterogeneous agent tasks.
KW - MARL
KW - Ranking
KW - Reinforcement Learning
KW - Trust Region
UR - https://www.scopus.com/pages/publications/105003868860
U2 - 10.1109/ICASSP49660.2025.10889127
DO - 10.1109/ICASSP49660.2025.10889127
M3 - Conference contribution
AN - SCOPUS:105003868860
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
BT - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
A2 - Rao, Bhaskar D
A2 - Trancoso, Isabel
A2 - Sharma, Gaurav
A2 - Mehta, Neelesh B.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Y2 - 6 April 2025 through 11 April 2025
ER -