Abstract
Monocular spacecraft pose estimation is pivotal for autonomous on-orbit servicing but remains challenging because of the harsh illumination, high contrast, and scale ambiguity inherent to the space environment. Visual Geometry Grounded Transformers (VGGT) offer powerful capabilities for multi-view analysis, but their high computational cost limits their performance, which motivated the development of FastVGGT. Building on FastVGGT and Mixture of Experts (MoE), we propose an enhanced framework tailored for sequential spacecraft pose estimation. Our core innovation is a Learnable Token Merging (L-ToMe) mechanism that replaces standard static merging strategies. By explicitly learning to identify and preserve salient regions, compress redundant space backgrounds, and route tokens to experts, our approach significantly reduces computational overhead without compromising geometric reconstruction fidelity. Experiments on the SwissCube dataset demonstrate that our method improves pose estimation accuracy by 21.6% compared to the baseline. These results validate the potential of our enhanced architecture for reliable monocular autonomous space navigation.
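The general idea of score-guided token merging can be illustrated with a minimal sketch. This is not the authors' L-ToMe implementation: the function names `cosine` and `merge_tokens`, the averaging rule, and the hand-written saliency scores are all assumptions made for illustration. In a learned variant the scores would presumably come from a trained scoring head, and the MoE routing step is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_tokens(tokens, scores, keep):
    """Keep the `keep` highest-scoring tokens and fold every other token
    into its most similar kept token by averaging (hypothetical rule)."""
    if not 0 < keep <= len(tokens):
        raise ValueError("keep must be between 1 and len(tokens)")
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:keep])  # preserve the original token order
    dropped = order[keep:]
    sums = {i: list(tokens[i]) for i in kept}
    counts = {i: 1 for i in kept}
    for j in dropped:
        # Similarity is measured against the original (unmerged) kept token.
        target = max(kept, key=lambda i: cosine(tokens[i], tokens[j]))
        sums[target] = [s + x for s, x in zip(sums[target], tokens[j])]
        counts[target] += 1
    return [[s / counts[i] for s in sums[i]] for i in kept]


# Four toy 2-D tokens; the scores are hand-picked stand-ins for learned saliency.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
scores = [0.9, 0.2, 0.8, 0.1]
merged = merge_tokens(tokens, scores, keep=2)
```

Here the two low-scoring tokens are absorbed by their nearest high-scoring neighbours, so the sequence shrinks from four tokens to two while the salient directions are retained.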
| Original language | English |
|---|---|
| Article number | 112310 |
| Journal | Aerospace Science and Technology |
| Volume | 177 |
| DOIs | |
| State | Published - Oct 2026 |
Keywords
- Deep learning
- Monocular vision
- Spacecraft pose estimation
- Token merging
- Vision transformers
Title
Sequential spacecraft pose estimation via visual geometry grounded transformers and learnable token merging