RVI reinforcement learning for Semi-Markov decision processes with average reward

  • Yanjie Li*
  • Fang Cao
  • *Corresponding author for this work
  • Harbin Institute of Technology Shenzhen
  • Beijing Jiaotong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Using the sensitivity-based approach, we study reinforcement learning for semi-Markov decision processes (SMDPs) under the average-reward criterion. We first derive a new Bellman optimality equation. Building on it, we propose a relative value iteration (RVI) reinforcement learning algorithm. The new RVI algorithm can avoid estimating the optimal average reward during learning and has a good convergence rate.
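To make the term "relative value iteration" concrete, here is a minimal model-based sketch. It is not the learning algorithm or the new Bellman optimality equation of the paper, neither of which is given in this record. It uses the classical textbook route instead: a data transformation that turns the SMDP into an equivalent average-reward MDP, followed by plain RVI, where the average reward falls out as the value at a reference state rather than being estimated separately. The function name `rvi_smdp`, its parameters, and the choice `eta = 0.5 * min(T)` are assumptions made for illustration.

```python
import numpy as np


def rvi_smdp(R, T, P, ref_state=0, tol=1e-10, max_iter=10_000):
    """Relative value iteration on a data-transformed SMDP (illustrative sketch).

    R[s, a]     expected reward earned over one sojourn in s under action a
    T[s, a]     expected sojourn time (must be strictly positive)
    P[s, a, s'] embedded-chain transition probabilities
    Assumes a unichain model; the names and defaults here are hypothetical.
    """
    R = np.asarray(R, dtype=float)
    T = np.asarray(T, dtype=float)
    P = np.asarray(P, dtype=float)
    n_states = R.shape[0]

    # Data transformation: rewards become reward rates, and transitions are
    # slowed down by adding self-loops so every step takes "eta" time units.
    eta = 0.5 * T.min()                          # any 0 < eta < min T works here
    identity = np.eye(n_states)[:, None, :]      # shape (S, 1, S)
    P_tilde = identity + (eta / T)[:, :, None] * (P - identity)
    R_tilde = R / T

    h = np.zeros(n_states)                       # relative values (transformed model)
    for _ in range(max_iter):
        q = R_tilde + P_tilde @ h                # shape (S, A)
        th = q.max(axis=1)
        gain = th[ref_state]                     # average-reward estimate
        h_new = th - gain                        # keep h[ref_state] pinned at 0
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    policy = q.argmax(axis=1)
    return gain, h, policy


# Two states, one action: the cycle earns 2 + 4 over 1 + 3 time units,
# so the average reward per unit time should be close to 1.5.
gain, h, policy = rvi_smdp(
    R=[[2.0], [4.0]],
    T=[[1.0], [3.0]],
    P=[[[0.0, 1.0]], [[1.0, 0.0]]],
)
print(gain, h, policy)
```

Subtracting the reference-state value at every sweep is what keeps the iterates bounded; without it, the values would grow linearly with the number of sweeps at a rate equal to the average reward.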

Original language: English
Title of host publication: 2010 8th World Congress on Intelligent Control and Automation, WCICA 2010
Pages: 1674-1679
Number of pages: 6
State: Published - 2010
Externally published: Yes
Event: 2010 8th World Congress on Intelligent Control and Automation, WCICA 2010 - Jinan, China
Duration: 7 Jul 2010 → 9 Jul 2010

Publication series

Name: Proceedings of the World Congress on Intelligent Control and Automation (WCICA)

Conference

Conference: 2010 8th World Congress on Intelligent Control and Automation, WCICA 2010
Country/Territory: China
City: Jinan
Period: 7/07/10 → 9/07/10

Keywords

  • Performance potential
  • Reinforcement learning
  • Relative value iteration
  • Semi-Markov decision processes
