The policy gradient estimation for continuous-time partially observable Markovian decision processes

  • Bo Tang*
  • Yan Jie Li
  • Bao Qun Yin
  • *Corresponding author for this work
  • University of Science and Technology of China

Research output: Contribution to journal › Article › peer-review

Abstract

An algorithm for estimating the policy gradient is presented for the performance optimization of continuous-time partially observable Markovian decision processes (CTPOMDPs). The algorithm is obtained by extending the corresponding estimation algorithm for discrete-time partially observable Markovian decision processes (DTPOMDPs) using the conformity method. The convergence and error bound of the algorithm are analyzed, and a numerical example is provided to illustrate its application.

Original language: English
Pages (from-to): 805-808
Number of pages: 4
Journal: Kongzhi Lilun Yu Yingyong/Control Theory and Applications
Volume: 26
Issue number: 7
State: Published - Jul 2009
Externally published: Yes

Keywords

  • CTPOMDP
  • Conformity
  • Error bound
  • Policy gradient estimation
