Abstract
In the field of robotic manipulation, traditional methods lack the flexibility required to meet the demands of diverse applications. Consequently, researchers have increasingly focused on developing more general techniques, particularly for long-horizon and gentle manipulation, to enhance the manipulation ability and adaptability of robots. In this study, we propose a framework called VLM-Driven Atomic Skills with Diffusion Policy Distillation (VASK-DP), which integrates tactile sensing to enable gentle control of robotic arms in long-horizon tasks. The framework trains atomic manipulation skills through reinforcement learning in simulated environments. The Visual Language Model (VLM) interprets RGB observations and natural language instructions to select and sequence atomic skills, guiding task decomposition, skill switching, and execution. It also generates expert demonstration datasets that serve as the basis for imitation learning. Subsequently, compliant long-horizon manipulation policies are distilled from these demonstrations using diffusion-based imitation learning. We evaluate multiple control modes, distillation strategies, and decision frameworks. Quantitative results across diverse simulation environments and long-horizon tasks validate the effectiveness of our approach. Furthermore, real robot deployment demonstrates successful task execution on physical hardware.
| Original language | English |
|---|---|
| Pages (from-to) | 2538-2545 |
| Number of pages | 8 |
| Journal | IEEE Robotics and Automation Letters |
| Volume | 11 |
| Issue number | 3 |
| State | Published - 2026 |
| Externally published | Yes |
Keywords
- Dexterous manipulation
- force and tactile sensing
- imitation learning
Article title: Gentle Manipulation of Long-Horizon Tasks Without Human Demonstrations