Abstract
In imitation learning, demonstration data are often assumed to be optimal, even though they are imperfect in practice. Imperfect demonstrations result from expert errors, large-scale demonstration data, and the non-convexity of the task's solution space. In this letter, we propose ranking-based Generative Adversarial Imitation Learning (RB-GAIL), which handles such imperfect datasets by using the generated experiences more efficiently and avoiding dependence on a large number of different expert demonstrations. Our rigorous mathematical analysis indicates that RB-GAIL can implicitly model the modes of the expert data by weighting multiple discriminators, and that a monotonically increasing positive activation function can help the model converge to the globally optimal solution. Experimental results show that our method surpasses other baseline methods with imperfect demonstrations (ours: increased by 4.7% to the optimal expert level in the Ant task; Trajectory-ranked Reward Extrapolation (T-REX): decreased by 12.2%; Unlabeled Imperfect Demonstrations in Adversarial Imitation Learning (UID): decreased by 31.01%; Wasserstein Adversarial Imitation Learning (WAIL): dropped by 96.0%). In physical manipulation experiments, our proposed method achieved a success rate of 100% (WAIL: under 90%).
| Original language | English |
|---|---|
| Pages (from-to) | 8967-8974 |
| Number of pages | 8 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 9 |
| Issue number | 10 |
| DOIs | |
| State | Published - 2024 |
Keywords
- Imitation learning
- learning from demonstration
Fingerprint
Dive into the research topics of 'Ranking-Based Generative Adversarial Imitation Learning'. Together they form a unique fingerprint.