
深度Q学习的二次主动采样方法

Translated title of the contribution: Twice Sampling Method in Deep Q-network
  • Ying Nan Zhao
  • Peng Liu
  • Wei Zhao*
  • Xiang Long Tang
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Deep Q-networks (DQN) are one way of implementing deep Q-learning. Experience replay trains a DQN by reusing transitions stored in a replay memory. However, an agent must interact with the environment many times to build the replay memory, which increases cost and risk. One way to reduce the number of interactions is to use the stored transitions more efficiently. The cumulative reward of the episode from which a transition is obtained affects the training of the DQN. Compared with a transition from an episode with a small cumulative reward, a transition from an episode with a large cumulative reward can accelerate the convergence of the DQN and improve the best policy found. In this paper, we develop a framework for a twice active sampling method in deep Q-learning. First, we sample episodes from the replay memory based on their cumulative reward. Then, we sample transitions from the selected episodes based on their temporal-difference error (TD-error). Finally, we train the DQN with these transitions. Because transitions are replayed based on both TD-error and cumulative reward, the proposed method not only accelerates the convergence of deep Q-learning but also leads to better policies. Experiments on Atari games show that our method achieves good results.
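The two sampling stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact sampling distributions, so everything specific here is an assumption made for illustration: the softmax weighting over cumulative rewards in stage 1, the (|TD-error| + eps)^alpha weighting in stage 2 (borrowed from prioritized experience replay), and all names (`Transition`, `Episode`, `sample_episodes`, `sample_transitions`, `twice_sample`, `temperature`, `alpha`, `eps`). It is not the authors' implementation.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Transition:
    state: int
    action: int
    reward: float
    next_state: int
    td_error: float  # most recent TD-error recorded for this transition


@dataclass
class Episode:
    transitions: list

    @property
    def cumulative_reward(self):
        # Undiscounted sum of rewards over the episode.
        return sum(t.reward for t in self.transitions)


def sample_episodes(episodes, k, rng, temperature=1.0):
    """Stage 1: draw k episodes (with replacement), favouring high cumulative reward.

    Assumption: softmax weights over cumulative rewards.
    """
    returns = [e.cumulative_reward for e in episodes]
    top = max(returns)  # subtract the maximum for numerical stability
    weights = [math.exp((r - top) / temperature) for r in returns]
    return rng.choices(episodes, weights=weights, k=k)


def sample_transitions(selected, batch_size, rng, alpha=0.6, eps=1e-3):
    """Stage 2: draw a minibatch from the selected episodes, favouring large |TD-error|.

    Assumption: weights proportional to (|td_error| + eps) ** alpha.
    """
    pool = [t for e in selected for t in e.transitions]
    weights = [(abs(t.td_error) + eps) ** alpha for t in pool]
    return rng.choices(pool, weights=weights, k=batch_size)


def twice_sample(episodes, n_episodes, batch_size, rng):
    """Run both stages and return the minibatch that would be used to train the DQN."""
    selected = sample_episodes(episodes, n_episodes, rng)
    return sample_transitions(selected, batch_size, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    memory = [
        Episode([Transition(0, 0, 1.0, 1, 0.5), Transition(1, 1, 4.0, 2, 2.0)]),
        Episode([Transition(0, 1, 0.0, 1, 0.1)]),
    ]
    batch = twice_sample(memory, n_episodes=2, batch_size=4, rng=rng)
    print(len(batch))
```

In a full training loop, the TD-errors of the replayed transitions would be refreshed after each gradient step so that later draws reflect the current network; that update is omitted from this sketch.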

Original language: Chinese (Traditional)
Pages (from-to): 1870-1882
Number of pages: 13
Journal: Zidonghua Xuebao/Acta Automatica Sinica
Volume: 45
Issue number: 10
State: Published - 1 Oct 2019
Externally published: Yes
