TY - GEN
T1 - Learning to Solve Tasks with Exploring Prior Behaviours
AU - Zhu, Ruiqi
AU - Li, Siyuan
AU - Dai, Tianhong
AU - Zhang, Chongjie
AU - Celiktutan, Oya
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Demonstrations are widely used in Deep Reinforcement Learning (DRL) for facilitating solving tasks with sparse rewards. However, the tasks in real-world scenarios can often have varied initial conditions from the demonstration, which would require additional prior behaviours. For example, consider we are given the demonstration for the task of picking up an object from an open drawer, but the drawer is closed in the training. Without acquiring the prior behaviours of opening the drawer, the robot is unlikely to solve the task. To address this, in this paper we propose an Intrinsic Reward Driven Example-based Control (IRDEC). Our method can endow agents with the ability to explore and acquire the required prior behaviours and then connect to the task-specific behaviours in the demonstration to solve sparse-reward tasks without requiring additional demonstration of the prior behaviours. The performance of our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Codes are available at https://github.com/Ricky-Zhu/IRDEC.
AB - Demonstrations are widely used in Deep Reinforcement Learning (DRL) for facilitating solving tasks with sparse rewards. However, the tasks in real-world scenarios can often have varied initial conditions from the demonstration, which would require additional prior behaviours. For example, consider we are given the demonstration for the task of picking up an object from an open drawer, but the drawer is closed in the training. Without acquiring the prior behaviours of opening the drawer, the robot is unlikely to solve the task. To address this, in this paper we propose an Intrinsic Reward Driven Example-based Control (IRDEC). Our method can endow agents with the ability to explore and acquire the required prior behaviours and then connect to the task-specific behaviours in the demonstration to solve sparse-reward tasks without requiring additional demonstration of the prior behaviours. The performance of our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Codes are available at https://github.com/Ricky-Zhu/IRDEC.
UR - https://www.scopus.com/pages/publications/85182525925
U2 - 10.1109/IROS55552.2023.10342272
DO - 10.1109/IROS55552.2023.10342272
M3 - 会议稿件
AN - SCOPUS:85182525925
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 7501
EP - 7507
BT - 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2023
Y2 - 1 October 2023 through 5 October 2023
ER -