Abstract
Identifying solvable subclasses of partially observable Markov decision processes (POMDPs), both in theory and in practical applications, remains a challenge in reinforcement learning (RL). This paper introduces a solvable subclass of POMDPs, termed the neighborhood α-weakly revealing condition, and proposes the neighborhood optimality maximum likelihood estimation (NOMLE) framework under this condition. We derive an upper bound on the regret of this framework during training and address limitations arising from ignored symmetric substructures. Building on the theoretical properties of our condition, we further derive a variational lower bound for maximizing mutual information, which leads to the proposed distinguishing neighborhood-infer exploration (DNE) algorithm. Experimental results show that the NOMLE framework achieves superior environment understanding and exploration performance in POMDPs, while DNE learns features with well-structured data manifolds and robustly handles data distribution shifts. Most notably, by optimizing the neighborhood size, DNE trains policies with lower entropy and higher certainty.
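The NOMLE framework and DNE algorithm themselves are not reproduced here. As an illustration of the general technique the abstract names, a variational lower bound on mutual information (here an InfoNCE-style bound with a dot-product critic) can be sketched as follows; all function names, the critic choice, and the temperature scaling are illustrative assumptions, not the paper's method.

```python
import numpy as np

def infonce_lower_bound(scores):
    """InfoNCE-style variational lower bound on I(X; Z).

    scores[i, j] is a critic value f(x_i, z_j); diagonal entries
    score the jointly sampled (positive) pairs. The bound is
    mean_i [ scores[i, i] - logsumexp_j scores[i, j] ] + log K,
    and is always at most log K for a batch of size K.
    """
    K = scores.shape[0]
    # numerically stable log-sum-exp along each row
    m = scores.max(axis=1, keepdims=True)
    lse = (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))).ravel()
    return float(np.mean(np.diag(scores) - lse) + np.log(K))

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(256, d))
z_dep = x + 0.1 * rng.normal(size=(256, d))  # z carries information about x
z_ind = rng.normal(size=(256, d))            # z independent of x

def critic(x, z):
    # dot-product critic, scaled for numerical stability (an assumption)
    return (x @ z.T) / np.sqrt(d)

# the bound is larger when z is informative about x
print(infonce_lower_bound(critic(x, z_dep)))
print(infonce_lower_bound(critic(x, z_ind)))
```

Maximizing such a bound over the critic (and, in DNE's setting, over the policy generating the data) is what "maximizing mutual information via a variational lower bound" refers to; the bound saturates at log K, so batch size limits how much mutual information it can certify.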
| Original language | English |
|---|---|
| Article number | 132810 |
| Journal | Neurocomputing |
| Volume | 674 |
| DOIs | |
| State | Published - 14 Apr 2026 |
Keywords
- Exploration
- Partially observable Markov decision processes
- Reinforcement learning
- Solvable subclass
Fingerprint
Research topics of 'Neighborhood α-weakly revealing conditions and distinguishing neighborhood-infer exploration methods in reinforcement learning'.