Evolutionary algorithm based reinforcement learning in the uncertain environments

Abstract
Reinforcement learning (RL) problems with uncertainty and hidden state pose significant obstacles to prevailing RL methods. In this paper, a novel approximate algorithm, called Memetic-Algorithm-based Q-Learning (MA-Q-Learning), is proposed to solve POMDP problems exhibiting such uncertainty. Policies are evolved with a memetic algorithm, while an improved Q-learning procedure produces predictive rewards that serve as the fitness of the evolved policies. To address the hidden-state problem, historical information is combined with the current belief state to guide the search for the optimal policy. Finally, search efficiency is improved by a hybrid search method in which an adjustment factor helps maintain population diversity and guides crossover, based on a combination of multiple crossover and mutation operators. Experiments on benchmark problems show that the proposed method outperforms other state-of-the-art approximate POMDP methods.
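The pipeline described in the abstract, evolving a population of policies while a value-based simulation supplies their fitness, can be sketched on a toy two-state POMDP. Everything below (the toy dynamics, the belief discretization, the operator choices, all names and parameters) is an illustrative assumption for this sketch, not the paper's actual algorithm or benchmark:

```python
import random

random.seed(0)

# Toy two-state POMDP (hypothetical): hidden state in {0, 1}, actions in
# {0, 1}; choosing the action that matches the hidden state yields reward 1.
# A policy is a lookup table indexed by a discretized (belief, last
# observation) pair, echoing the idea of combining the belief state with
# historical information.
N_BELIEF_BINS = 4
N_OBS = 2
ACTIONS = (0, 1)

def random_policy():
    return [random.choice(ACTIONS) for _ in range(N_BELIEF_BINS * N_OBS)]

def simulate(policy, episodes=30, horizon=10):
    """Average episode reward; stands in for the Q-learning-based fitness."""
    total = 0.0
    for _ in range(episodes):
        state = random.randint(0, 1)
        belief = 0.5          # current estimate of P(state == 1)
        last_obs = 0
        for _ in range(horizon):
            b_bin = min(int(belief * N_BELIEF_BINS), N_BELIEF_BINS - 1)
            action = policy[b_bin * N_OBS + last_obs]
            total += 1.0 if action == state else 0.0
            # Noisy observation of the hidden state (80% accurate).
            last_obs = state if random.random() < 0.8 else 1 - state
            # Bayes update of the belief from the observation.
            p_obs_1 = 0.8 if last_obs == 1 else 0.2
            p_obs_0 = 0.8 if last_obs == 0 else 0.2
            belief = (p_obs_1 * belief) / (
                p_obs_1 * belief + p_obs_0 * (1 - belief))
            # Hidden state persists with probability 0.9.
            if random.random() > 0.9:
                state = 1 - state
    return total / episodes

def crossover(a, b):
    # Single-point crossover; the paper combines several operator kinds.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(policy, rate=0.1):
    # Flip each action gene with a small probability.
    return [1 - g if random.random() < rate else g for g in policy]

def evolve(pop_size=20, generations=15):
    pop = [random_policy() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=simulate, reverse=True)
        elite = scored[: pop_size // 2]            # keep the fitter half
        children = [mutate(crossover(*random.sample(elite, 2)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=simulate)

best = evolve()
```

The sketch omits the paper's adjustment factor and local-search (memetic) refinement step; a plain elitist loop is used instead to keep the example short.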
| Original language | English |
|---|---|
| Pages (from-to) | 1356-1360 |
| Number of pages | 5 |
| Journal | Tien Tzu Hsueh Pao/Acta Electronica Sinica |
| Volume | 34 |
| Issue number | 7 |
| State | Published - Jul 2006 |
| Externally published | Yes |
Keywords
- Belief state
- Hidden state
- Memetic algorithm
- POMDP
- Q-learning