
Sequential spacecraft pose estimation via visual geometry grounded transformers and learnable token merging

  • State Key Laboratory of Micro-Spacecraft Rapid Design and Intelligent Cluster
  • Harbin Institute of Technology

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Monocular spacecraft pose estimation is pivotal for autonomous on-orbit servicing but remains challenging because of the harsh illumination, high contrast, and scale ambiguity inherent to the space environment. Visual Geometry Grounded Transformers (VGGT) offer powerful capabilities for multi-view analysis, but their high computational cost limits their performance, which motivated the development of FastVGGT to address this inefficiency. Building upon FastVGGT and Mixture of Experts (MoE), we propose an enhanced framework tailored for sequential spacecraft pose estimation. Our core innovation is a novel Learnable Token Merging (L-ToMe) mechanism designed to replace standard static merging strategies. By explicitly learning to identify and preserve salient regions, compress redundant space backgrounds, and route tokens to experts, our approach significantly reduces computational overhead without compromising geometric reconstruction fidelity. Experiments on the SwissCube dataset demonstrate that our method improves pose estimation accuracy by 21.6% compared to the baseline. These results validate the potential of our enhanced architecture for reliable monocular autonomous space navigation.
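
The token-merging idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' L-ToMe implementation: the function name `merge_tokens`, the `keep` parameter, and the toy inputs are hypothetical, and the saliency scores are simply passed in as an array, whereas the paper learns them. The Mixture of Experts routing step is not shown. The sketch keeps the most salient tokens and averages each remaining token into the kept token it most resembles (by cosine similarity), so the sequence gets shorter while background tokens are compressed instead of discarded.

```python
import numpy as np

def merge_tokens(tokens, saliency, keep):
    """Keep the `keep` most salient tokens and average every other token
    into its most similar kept token (cosine similarity).

    tokens:   (N, D) float array of token embeddings
    saliency: (N,) scores; assumed here to come from a learned scoring head
    Returns the merged (keep, D) tokens and the indices of the kept tokens.
    """
    order = np.argsort(-saliency, kind="stable")   # most salient first
    kept_idx = np.sort(order[:keep])
    drop_idx = np.sort(order[keep:])

    sums = tokens[kept_idx].astype(float)          # running sum per kept token
    counts = np.ones(keep)                         # tokens merged into each slot

    if drop_idx.size:
        def unit(x):
            return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

        # (n_drop, keep) cosine similarities between dropped and kept tokens
        sim = unit(tokens[drop_idx]) @ unit(tokens[kept_idx]).T
        target = sim.argmax(axis=1)                # best kept token per dropped one
        np.add.at(sums, target, tokens[drop_idx])  # unbuffered scatter-add
        np.add.at(counts, target, 1)

    return sums / counts[:, None], kept_idx

# Toy input: two "salient" tokens and two near-duplicates with low saliency.
tokens = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
saliency = np.array([0.9, 0.8, 0.1, 0.2])
merged, kept_idx = merge_tokens(tokens, saliency, keep=2)
```

In this toy case four tokens become two, and each low-saliency token is folded into its nearest salient neighbor. A static merging strategy would fix which tokens to merge by a hand-set rule; the learnable variant named in the abstract would instead train the scoring that decides what is preserved.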

Original language: English
Article number: 112310
Journal: Aerospace Science and Technology
Volume: 177
State: Published - Oct 2026

Keywords

  • Deep learning
  • Monocular vision
  • Spacecraft pose estimation
  • Token merging
  • Vision transformers
