Zero6DOT: Zero-Shot 6D Object Pose Tracking With Monocular RGB Video

  • Bo Pang
  • Deming Zhai
  • Jianan Zhen
  • Long Wang
  • Xu Han
  • Guofeng Zhang
  • Xianming Liu*
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology
  • SenseTime
  • Zhejiang University
  • Peng Cheng Laboratory

Research output: Contribution to journal › Article › peer-review

Abstract

6D object tracking plays an important role in various applications, including robotic manipulation and virtual reality. While current methodologies have achieved significant advancements through the use of CAD models, multi-modal sensor data, and category-level assumptions, such resources are often inaccessible in open-world scenarios. Consequently, tracking 6D object poses using only RGB data in such scenarios remains a challenging task. In this paper, we introduce Zero6DOT, an innovative and efficient method for real-time tracking of unknown 6D object poses in monocular RGB video sequences at 8 Hz. Our approach requires only the mask of the initial frame, eliminating the need for additional data. The core of Zero6DOT lies in its ability to establish high-quality correspondences across images, from which accurate poses are derived. To achieve this, we employ a transformer-based neural network to predict initial long-term correspondences across frames and integrate a robust Dynamic Units System to refine these predictions. This combination facilitates precise pose tracking while maintaining both efficiency and robustness, even under challenging conditions such as object disappearance, reappearance, and handheld motion. The effectiveness of our approach has been rigorously evaluated through both qualitative and quantitative analyses on the OnePose, YCB-V, and RBOT datasets. The results demonstrate the potential of our proposed Zero6DOT to redefine 6D object pose tracking for real-world scenarios.

Original language: English
Pages (from-to): 12382-12395
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 12
State: Published - 2025
Externally published: Yes

Keywords

  • 6D pose tracking
  • monocular RGB video
  • zero-shot
