Abstract
We study value function initialization for a reinforcement learning agent that faces a set of tasks varying in reward and sampled in a lifelong manner. Existing work on this transfer reinforcement learning setting typically assumes a uniform sampling of tasks in its experiments, while transferring knowledge optimistically by initializing with the maximum outcome seen so far. However, due to the uncertainty of the real world, infrequent events make the task distribution non-uniform. As a consequence, optimistic initialization becomes impractical: it assigns equally high importance to frequent and infrequent tasks, which increases sample complexity. We argue that, to overcome this limitation, the agent must be able to assess how optimism is influenced by its uncertainty and confidence, two intercorrelated notions that play a crucial role in decision-making. We therefore propose UCOI (Uncertainty and Confidence aware Optimistic Initialization), a novel approach that applies optimism only in adequate situations, and we show that it yields advantageous results over existing methods, especially for tasks drawn from a non-uniform distribution.
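To make the idea concrete, the following is a minimal sketch of confidence-gated optimistic value initialization. It is illustrative only and is not the paper's actual UCOI algorithm: the function name, the use of visit counts as a confidence proxy, and the zero fallback are all assumptions. The intent it captures is that states whose best seen return is well supported are initialized optimistically, while rarely observed states are not.

```python
import numpy as np

def init_q_values(n_states, n_actions, max_seen_return, visit_counts,
                  conf_threshold=5):
    """Hypothetical sketch of confidence-gated optimistic initialization.

    max_seen_return: per-state best return observed across past tasks.
    visit_counts: per-state observation counts, used here as a crude
    confidence proxy (an assumption, not the paper's formulation).
    """
    q = np.zeros((n_states, n_actions))
    confident = visit_counts >= conf_threshold
    # Optimistic transfer only where confidence is high; elsewhere the
    # neutral zero initialization avoids over-exploring rare tasks.
    q[confident, :] = max_seen_return[confident, None]
    return q

# Usage: state 1 was seen only once, so it is not initialized optimistically.
max_seen = np.array([10.0, 2.0, 5.0])
counts = np.array([8, 1, 6])
q = init_q_values(3, 2, max_seen, counts)
```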
| Field | Value |
|---|---|
| Original language | English |
| Article number | 111036 |
| Journal | Knowledge-Based Systems |
| Volume | 280 |
| DOIs | |
| State | Published - 25 Nov 2023 |
Keywords
- Knowledge transfer
- Lifelong learning
- Optimistic initialization
- Reinforcement learning
Title
Value function optimistic initialization with uncertainty and confidence awareness in lifelong reinforcement learning