Neighborhood α-weakly revealing conditions and distinguishing neighborhood-infer exploration methods in reinforcement learning

Research output: Contribution to journal › Article › peer-review

Abstract

Theoretical research and practical applications of solvable subclasses in partially observable Markov decision processes (POMDPs) remain challenging in reinforcement learning (RL). This paper introduces a solvable subclass of POMDPs, termed the neighborhood α-weakly revealing condition, and proposes the neighborhood optimality maximum likelihood estimation (NOMLE) framework under this condition. We derive an upper bound on the regret of this framework during training and address limitations arising from ignored symmetric substructures. Based on the theoretical properties of our condition, we further derive a variational lower bound for maximizing mutual information, which leads to the proposed distinguishing neighborhood-infer exploration (DNE) algorithm. Experimental results show that the NOMLE framework achieves superior environment understanding and exploration performance in POMDPs, while DNE learns features with well-structured data manifolds and robustly handles data distribution shifts. Most notably, by optimizing the neighborhood size, DNE trains policies with lower entropy and higher certainty.
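The abstract mentions deriving a variational lower bound for maximizing mutual information as the basis of the DNE algorithm. The paper's specific bound is not given here; as a hedged illustration only, the sketch below implements the standard InfoNCE variational lower bound, a common way to lower-bound mutual information with a learned critic. All names and the critic setup are assumptions for illustration, not the authors' method.

```python
import numpy as np

# Hypothetical illustration: the InfoNCE variational lower bound on mutual
# information, I(X; Z) >= log(K) + E[log softmax score of the positive pair].
# This is a generic, well-known estimator, NOT the bound derived in the paper.

def infonce_lower_bound(scores: np.ndarray) -> float:
    """scores[i, j] is a critic value f(x_i, z_j) for a batch of K samples;
    diagonal entries are positive (paired) samples, off-diagonals negatives."""
    k = scores.shape[0]
    # Row-wise log-softmax, numerically stabilised by subtracting the max.
    row_max = scores.max(axis=1, keepdims=True)
    log_probs = scores - row_max - np.log(
        np.exp(scores - row_max).sum(axis=1, keepdims=True))
    # Average log-probability the critic assigns to the true (diagonal) pair.
    return float(np.log(k) + np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
# A critic that scores paired samples higher yields a larger (tighter) bound.
good = rng.normal(0.0, 1.0, (8, 8)) + 5.0 * np.eye(8)
weak = rng.normal(0.0, 1.0, (8, 8))
print(infonce_lower_bound(good) > infonce_lower_bound(weak))  # expect True
```

Note that the InfoNCE estimate is capped at log K for a batch of size K, which is one reason variational bounds of this family are typically paired with large batches or alternative critics.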

Original language: English
Article number: 132810
Journal: Neurocomputing
Volume: 674
DOIs
State: Published - 14 Apr 2026

Keywords

  • Exploration
  • Partially observable Markov decision processes
  • Reinforcement learning
  • Solvable subclass
