Abstract
An algorithm for estimating the policy gradient is presented for the performance optimization of continuous-time partially observable Markovian decision processes (CTPOMDPs). The algorithm is obtained by extending the corresponding estimation algorithm for discrete-time partially observable Markovian decision processes (DTPOMDPs) using the conformity method. The convergence and the error bound of the algorithm are analyzed, and a numerical example is provided to illustrate its application.
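The abstract does not spell out the conformity method. As a minimal sketch, assuming it corresponds to the standard uniformization construction (an assumption, not confirmed by the abstract), a continuous-time generator matrix Q can be mapped to a discrete-time transition matrix P = I + Q / rate, which has the same stationary distribution. The function name `uniformize` and the example matrix are hypothetical and chosen only for illustration.

```python
def uniformize(Q, rate=None):
    """Map a generator matrix Q to a stochastic matrix P = I + Q / rate.

    Assumption: Q is a square list of lists whose off-diagonal entries are
    nonnegative and whose rows sum to zero. `rate` must be at least the
    largest exit rate max_i(-Q[i][i]); it defaults to exactly that value.
    """
    n = len(Q)
    max_exit = max(-Q[i][i] for i in range(n))
    if rate is None:
        rate = max_exit
    if rate <= 0 or rate < max_exit:
        raise ValueError("rate must be positive and >= the largest exit rate")
    # Add the identity to Q / rate, entry by entry.
    P = [
        [(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
        for i in range(n)
    ]
    return P, rate


# Hypothetical two-state generator, for illustration only.
Q = [[-2.0, 2.0],
     [1.0, -1.0]]
P, rate = uniformize(Q)
```

Under this assumption, a discrete-time estimator could be run on the chain defined by `P`; how the paper itself carries out the extension is not stated in the abstract.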
| Original language | English |
|---|---|
| Pages (from-to) | 805-808 |
| Number of pages | 4 |
| Journal | Kongzhi Lilun Yu Yingyong/Control Theory and Applications |
| Volume | 26 |
| Issue number | 7 |
| State | Published - Jul 2009 |
| Externally published | Yes |
Keywords
- CTPOMDP
- Conformity
- Error bound
- Policy gradient estimation
Fingerprint
Dive into the research topics of 'The policy gradient estimation for continuous-time partially observable Markovian decision processes'. Together they form a unique fingerprint.