
Correcting biased value estimation in mixing value-based multi-agent reinforcement learning by multiple choice learning

  • School of Electronics and Information Engineering, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Multi-agent reinforcement learning (MARL) has become increasingly popular in recent decades, and many value-based MARL methods have been proposed in the past few years. Neural networks play an important role in these methods: they predict the value of each state–action pair, i.e., the Q-value, and agents choose their actions based on it. However, inaccurate predictions by the neural network lead to biased Q-value estimates, which cause inefficient use of experience data and poor performance. Unlike ensemble methods that only reduce the variance of predictions, multiple choice learning (MCL) methods exploit the cooperation among all the candidate models. This paper corrects the biased Q-value by exploiting the collaboration between the ensemble model and MARL to obtain a more stable and more precise Q-value estimator. A new MARL method called Multiple Choice QMIX is developed to address the biased Q-value issue, which also extends the application scenarios of MCL methods. Specifically, we propose a voting network that learns the confidence level of each estimator and can therefore provide the best prediction by combining their results. We also propose a voting hindsight loss that encourages the voting network to overcome overestimation of the Q-value. We conduct experiments on four challenging tasks of the StarCraft II micromanagement benchmark. Experimental results show that our method converges faster and performs more stably in multi-agent tasks.
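
The abstract does not specify how the voting network combines the estimators' results, so the following is only an illustrative sketch of one plausible reading: each of K candidate estimators outputs a Q-value per action, a confidence score per estimator is turned into weights with a softmax, and the weighted sum gives the combined Q-values. The function names (`softmax`, `combine_q_estimates`, `greedy_action`), the softmax weighting, and the fixed `vote_logits` input are assumptions made for illustration, not the paper's method; in the paper the confidences are learned by a network.

```python
import math


def softmax(logits):
    """Turn raw confidence scores into weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_q_estimates(q_estimates, vote_logits):
    """Confidence-weighted combination of K estimators' Q-values.

    q_estimates: list of K lists, each holding one Q-value per action.
    vote_logits: list of K confidence scores (hypothetical stand-in for
                 the output of a learned voting network).
    """
    if len(q_estimates) != len(vote_logits):
        raise ValueError("need exactly one confidence score per estimator")
    weights = softmax(vote_logits)
    n_actions = len(q_estimates[0])
    return [
        sum(w * q[a] for w, q in zip(weights, q_estimates))
        for a in range(n_actions)
    ]


def greedy_action(q_values):
    """Index of the action with the highest combined Q-value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


# Two estimators that disagree about which of two actions is better.
q_estimates = [[1.0, 2.0], [3.0, 0.0]]
equal_vote = combine_q_estimates(q_estimates, [0.0, 0.0])
confident_vote = combine_q_estimates(q_estimates, [100.0, 0.0])
```

With equal confidences the combination reduces to a plain ensemble average; when one estimator receives a much higher score, the combined Q-values follow that estimator almost exactly, which is the qualitative behaviour a confidence-based vote is meant to provide.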

Original language: English
Article number: 105329
Journal: Engineering Applications of Artificial Intelligence
Volume: 116
State: Published - Nov 2022
Externally published: Yes

Keywords

  • Biased value estimation
  • Dec-POMDPs
  • Multi-agent reinforcement learning
  • Multiple choice learning
