Abstract
Reinforcement learning (RL) has shown strengths in challenging sequential decision-making problems. The reward function in RL is crucial to learning performance, as it quantifies the degree of task completion. In real-world problems, rewards are predominantly human-designed, which requires laborious tuning and leaves them susceptible to human cognitive biases. To achieve automatic auxiliary reward generation, we propose a novel representation learning approach that measures the "transition distance" between states. Building upon these representations, we introduce an auxiliary reward generation technique for both single-task and skill-chaining scenarios without the need for human knowledge. Furthermore, we theoretically show that the proposed auxiliary rewards maintain the policy invariance property, i.e., the generated rewards do not hurt policy optimality under the original rewards. In the experiments, we evaluate the proposed approach in both online and offline learning settings on a wide range of tasks, including robot manipulation and locomotion. The results demonstrate the effectiveness of measuring the transition distance and the improvement induced by the auxiliary rewards, which promote better learning efficiency and increase convergence stability.
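The abstract does not state how policy invariance is achieved. The classic route to that property is potential-based reward shaping (Ng et al., 1999), where the auxiliary term has the form F(s, s') = γΦ(s') − Φ(s). The sketch below assumes that form, with the potential Φ taken as the negative distance to the goal in a representation space. `embed`, `transition_distance`, `potential` and `shaped_reward` are hypothetical names, and `embed` is an identity placeholder standing in for a learned transition-distance encoder. This is an illustration under those assumptions, not the paper's method.

```python
import numpy as np


def embed(state) -> np.ndarray:
    # Hypothetical placeholder for a learned representation phi(s).
    # A real encoder would be trained so embedding distances track
    # how many transitions separate two states; here it is the identity.
    return np.asarray(state, dtype=float)


def transition_distance(s, g) -> float:
    # Assumption: Euclidean distance in embedding space approximates
    # the transition distance between state s and goal g.
    return float(np.linalg.norm(embed(s) - embed(g)))


def potential(s, goal) -> float:
    # States closer to the goal (in transitions) get a higher potential.
    return -transition_distance(s, goal)


def shaped_reward(r, s, s_next, goal, gamma=0.99, done=False) -> float:
    # Potential-based shaping: r + gamma * Phi(s') - Phi(s).
    # Terminal states get zero potential so episodic returns stay consistent.
    phi_next = 0.0 if done else potential(s_next, goal)
    return r + gamma * phi_next - potential(s, goal)
```

Because the shaping terms telescope, their discounted sum along any trajectory s_0, ..., s_T equals γ^T Φ(s_T) − Φ(s_0). It depends only on the endpoints and not on the actions taken, so it cannot change which policy is optimal. With γ = 1, moving from state 0 to state 1 toward a goal at 5 yields a bonus of 1.0 on top of the original reward.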
| Original language | English |
|---|---|
| Pages (from-to) | 13728-13740 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Automation Science and Engineering |
| Volume | 22 |
| State | Published - 2025 |
| Externally published | Yes |
Keywords
- Auxiliary rewards
- reinforcement learning
- representation learning
- skill chaining
Fingerprint
Dive into the research topics of 'Auxiliary Reward Generation With Transition Distance Representation Learning'. Together they form a unique fingerprint.