
Sample-efficient policy learning based on completely behavior cloning

  • Harbin Institute of Technology
  • Anhui University of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Direct policy search is one of the most important classes of algorithms in reinforcement learning. However, learning from scratch requires a large amount of experience data and is prone to poor local optima. To overcome these challenges, this paper proposes a training-free behavior cloning algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC transforms a Model Predictive Control (MPC) controller into a PieceWise Affine (PWA) function using multi-parametric programming, and uses a neural network to express this function. In this way, off-the-shelf deep reinforcement learning algorithms can be used to fine-tune the neural network. The experiments show that the method helps the agent learn in the high-reward region of the state space, and converge faster and to a better policy.
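The abstract does not say how the network is built from the PWA function, so the following is only a minimal sketch of the general idea that a PWA control law can be written exactly as a small network with no training. It assumes ReLU activations and a hypothetical one-dimensional saturated linear feedback law; the names `pwa_controller` and `relu_network`, the gain `k`, and the input bounds `u_min` and `u_max` are illustrative and are not taken from the paper.

```python
# Sketch only: a hypothetical 1-D PWA control law and an equivalent
# two-unit ReLU network. Not the construction used in the paper.

def pwa_controller(x, k=-2.0, u_min=-1.0, u_max=1.0):
    """Saturated linear feedback: three affine pieces (lower bound, k*x, upper bound)."""
    return min(max(k * x, u_min), u_max)


def relu(z):
    return max(z, 0.0)


def relu_network(x, k=-2.0, u_min=-1.0, u_max=1.0):
    """One hidden layer with two ReLU units whose weights are set by hand.

    Uses the identity clip(z, lo, hi) = lo + relu(z - lo) - relu(z - hi),
    so the network reproduces the PWA law without any training.
    """
    h1 = relu(k * x - u_min)   # becomes active once k*x exceeds the lower bound
    h2 = relu(k * x - u_max)   # becomes active once k*x exceeds the upper bound
    return u_min + h1 - h2     # output layer: weights (+1, -1), bias u_min


# Compare the two functions on a grid of states in [-2, 2].
xs = [i / 100.0 - 2.0 for i in range(401)]
max_error = max(abs(pwa_controller(x) - relu_network(x)) for x in xs)
print("max abs difference:", max_error)
```

In this toy case the weights follow directly from the affine pieces, which is the sense in which such a cloning step could be called training-free. A network initialized this way could then be handed to a standard reinforcement learning algorithm for fine-tuning.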

Original language: English
Title of host publication: 2019 IEEE International Conference on Systems, Man and Cybernetics, SMC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2543-2548
Number of pages: 6
ISBN (Electronic): 9781728145693
DOIs
State: Published - Oct 2019
Event: 2019 IEEE International Conference on Systems, Man and Cybernetics, SMC 2019 - Bari, Italy
Duration: 6 Oct 2019 to 9 Oct 2019

Publication series

Name: Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
Volume: 2019-October
ISSN (Print): 1062-922X

Conference

Conference: 2019 IEEE International Conference on Systems, Man and Cybernetics, SMC 2019
Country/Territory: Italy
City: Bari
Period: 6/10/19 to 9/10/19
