Abstract
In this note, we discuss sample-path-based (on-line) estimation of performance gradients for Markov systems. Existing on-line performance gradient estimation algorithms generally require a standard importance sampling assumption; when this assumption does not hold, they may produce poor gradient estimates. We show that the assumption can be relaxed and propose algorithms that use multi-step sampling to estimate performance gradients without requiring it. Simulation examples illustrate the accuracy of the estimates.
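For orientation, here is the quantity such algorithms estimate in the performance-potential framework. It can be written as dη/dθ = π (dP/dθ) g. Here π is the stationary distribution, η = π f is the average reward, and g solves the Poisson equation (I − P + eπ) g = f.

A one-step sample-path estimator averages [dP(X_n, X_{n+1})/dθ] / P(X_n, X_{n+1}) · g(X_{n+1}) along the path. It is unbiased only if dP(i, j)/dθ ≠ 0 implies P(i, j) > 0, which we take here to be the standard importance sampling assumption mentioned above.

The Python/NumPy sketch below is illustrative only, and several of its pieces are hypothetical:
- the three-state chains `P0` and `P1`, the reward vector `f`, and the linear parameterization P(θ) = (1 − θ) P0 + θ P1;
- all function names.

The potentials are also computed exactly rather than estimated on-line, and the paper's multi-step sampling algorithms are not reproduced. The sketch shows the one-step estimator agreeing with the exact gradient at θ = 0.5 and failing at θ = 0. At θ = 0 the diagonal transitions of `P0` have zero probability but a nonzero derivative, so they are never sampled.

```python
import numpy as np


def stationary(P):
    """Stationary distribution pi of an ergodic chain: pi P = pi, sum(pi) = 1."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def potentials(P, f, pi):
    """Performance potentials g from the Poisson equation (I - P + e pi) g = f."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)


def exact_gradient(P, dP, f):
    """Model-based derivative d(eta)/d(theta) = pi (dP/dtheta) g."""
    pi = stationary(P)
    return pi @ dP @ potentials(P, f, pi)


def standard_assumption_holds(P, dP):
    """True if dP(i, j) != 0 only where P(i, j) > 0."""
    return bool(np.all((dP == 0) | (P > 0)))


def one_step_lr_estimate(P, dP, g, n_steps, seed=0):
    """Path average of dP(x, y) / P(x, y) * g(y) over simulated transitions x -> y.

    Uses exact potentials g for illustration (hypothetical simplification).
    """
    rng = np.random.default_rng(seed)
    cdf = np.cumsum(P, axis=1)
    cdf /= cdf[:, -1:]  # force each row's last entry to exactly 1.0
    x, total = 0, 0.0
    for _ in range(n_steps):
        # Inverse-CDF sampling never selects a zero-probability transition,
        # so terms with P(x, y) = 0 and dP(x, y) != 0 are silently missed.
        y = int(np.searchsorted(cdf[x], rng.random(), side="right"))
        total += dP[x, y] / P[x, y] * g[y]
        x = y
    return total / n_steps


# Hypothetical example chain; not taken from the paper.
P0 = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
P1 = np.array([[0.6, 0.2, 0.2], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]])
f = np.array([1.0, 2.0, 4.0])
dP = P1 - P0  # P(theta) = (1 - theta) P0 + theta P1, so dP/dtheta is constant

for theta in (0.0, 0.5):
    P = (1 - theta) * P0 + theta * P1
    g = potentials(P, f, stationary(P))
    print(theta, standard_assumption_holds(P, dP),
          exact_gradient(P, dP, f), one_step_lr_estimate(P, dP, g, 100_000))
```

Because every row of dP/dθ sums to zero, adding a constant to g leaves π (dP/dθ) g unchanged. Any solution of the Poisson equation therefore gives the same gradient.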
| Original language | English |
|---|---|
| Pages (from-to) | 3-17 |
| Number of pages | 15 |
| Journal | Discrete Event Dynamic Systems: Theory and Applications |
| Volume | 20 |
| Issue number | 1 |
| DOIs | |
| State | Published - Mar 2010 |
| Externally published | Yes |
Keywords
- Markov reward processes
- On-line estimation
- Performance potentials
- Policy gradient
Title
On-line policy gradient estimation with multi-step sampling