TY - GEN
T1 - RVI reinforcement learning for Semi-Markov decision processes with average reward
AU - Li, Yanjie
AU - Cao, Fang
PY - 2010
Y1 - 2010
N2 - Based on the sensitivity-based approach, we discuss the reinforcement learning problem of semi-Markov decision processes (SMDPs) with average reward. First, we provide a new Bellman optimality equation. On this basis, we propose a relative value iteration (RVI) reinforcement learning algorithm. The new RVI reinforcement learning algorithm may avoid the estimation of optimal average reward in the process of learning and has a good convergence rate.
AB - Based on the sensitivity-based approach, we discuss the reinforcement learning problem of semi-Markov decision processes (SMDPs) with average reward. First, we provide a new Bellman optimality equation. On this basis, we propose a relative value iteration (RVI) reinforcement learning algorithm. The new RVI reinforcement learning algorithm may avoid the estimation of optimal average reward in the process of learning and has a good convergence rate.
KW - Performance potential
KW - Reinforcement learning
KW - Relative value iteration
KW - Semi-Markov decision processes
UR - https://www.scopus.com/pages/publications/77958114662
U2 - 10.1109/WCICA.2010.5554785
DO - 10.1109/WCICA.2010.5554785
M3 - Conference contribution
AN - SCOPUS:77958114662
SN - 9781424467129
T3 - Proceedings of the World Congress on Intelligent Control and Automation (WCICA)
SP - 1674
EP - 1679
BT - 2010 8th World Congress on Intelligent Control and Automation, WCICA 2010
T2 - 2010 8th World Congress on Intelligent Control and Automation, WCICA 2010
Y2 - 7 July 2010 through 9 July 2010
ER -