Abstract
With the advancement of space-based remote sensing technology, ultra-large-scale spacecraft have become crucial platforms for Earth observation and deep-space exploration missions. However, high-precision attitude control of these spacecraft is difficult because flexible structural vibrations are nonlinearly coupled with rigid-body attitude motion, and complex model uncertainties compound the problem. Additionally, attitude control methods that rely on chemical or electric propulsion consume substantial energy, making them unsuitable for long-duration missions. Consequently, achieving energy-optimal rigid-flexible coupling control under uncertain system dynamics remains a critical challenge in ultra-large-scale spacecraft control. Existing approaches, such as robust control, adaptive control, and performance-constrained control, have achieved attitude stabilization but generally neglect energy optimization objectives. To address these issues, this paper proposes an energy-optimal attitude control strategy based on off-policy reinforcement learning for rigid-flexible-coupling ultra-large-scale spacecraft with model uncertainties. First, a nonlinear dynamic model is established to capture the coupling between the vibration modes of the flexible appendages and the rigid-body motion. Model uncertainties, environmental disturbances, and internal nonlinear couplings are lumped into a single unknown term, which is estimated online by a neural network (NN)-based adaptive identifier. Then, an efficient online actor-critic learning strategy integrated with an experience replay mechanism is proposed to address the data-efficiency and stability limitations of traditional policy iteration methods.
The actor-critic structure is employed to solve the Bellman optimality equation for the energy-optimal control problem, where policy evaluation reuses historically collected data to improve learning efficiency. Furthermore, the off-policy learning scheme learns the optimal control strategy from past experiences, which strengthens stability and robustness in such a complex system. Theoretical analysis proves that the iterative sequences of the value function and the control strategy converge to their optimal counterparts. In addition, a rigorous Lyapunov stability proof shows that the closed-loop system is uniformly ultimately bounded (UUB), which ensures that the quaternion and angular velocity converge to a neighborhood of the equilibrium. Numerical simulation results based on a large-scale deployable truss spacecraft validate the effectiveness of the proposed strategy. The results highlight its potential for achieving efficient energy-optimal attitude stabilization of ultra-large-scale spacecraft.
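The idea of evaluating and improving a policy from reused, off-policy data can be illustrated on a much simpler problem than the one the abstract addresses. The sketch below is not the paper's algorithm: it assumes a hypothetical scalar, discrete-time linear system `x_next = a*x + b*u` with quadratic cost `q*x^2 + r*u^2`, uses an exact quadratic Q-function in place of actor and critic neural networks, and omits the uncertainty identifier entirely. All names (`features`, `collect_replay_buffer`, `evaluate_policy`, `off_policy_iteration`, `riccati_gain`) and the numerical values are illustrative assumptions.

```python
import numpy as np


def features(x, u):
    """Quadratic basis so that Q(x, u) = theta . [x^2, 2*x*u, u^2]."""
    return np.stack([x * x, 2.0 * x * u, u * u], axis=-1)


def collect_replay_buffer(a, b, n_samples, rng):
    """Store transitions generated by a random (behavior) policy.

    Assumption: a toy scalar system x_next = a*x + b*u, not a spacecraft model.
    """
    x = rng.uniform(-1.0, 1.0, n_samples)
    u = rng.uniform(-1.0, 1.0, n_samples)  # exploratory inputs, not the target policy
    return x, u, a * x + b * u


def evaluate_policy(buffer, k, q, r):
    """Policy evaluation for the target policy u = -k*x from stored data.

    Solves the Bellman equation Q(x, u) = cost(x, u) + Q(x_next, -k*x_next)
    in the least-squares sense over the whole replay buffer.
    """
    x, u, x_next = buffer
    regressors = features(x, u) - features(x_next, -k * x_next)
    stage_cost = q * x**2 + r * u**2
    theta, *_ = np.linalg.lstsq(regressors, stage_cost, rcond=None)
    return theta


def off_policy_iteration(buffer, q, r, k0=0.0, n_iter=20):
    """Alternate evaluation and improvement, reusing the same buffer each time."""
    k = k0  # k0 must be stabilizing; k0 = 0 works when |a| < 1
    for _ in range(n_iter):
        _, h_xu, h_uu = evaluate_policy(buffer, k, q, r)
        k = h_xu / h_uu  # greedy improvement: minimize Q(x, u) over u
    return k


def riccati_gain(a, b, q, r, n_iter=1000):
    """Model-based reference gain from the scalar discrete-time Riccati recursion."""
    p = q
    for _ in range(n_iter):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    buffer = collect_replay_buffer(a=0.9, b=0.5, n_samples=200, rng=rng)
    print("learned gain:  ", off_policy_iteration(buffer, q=1.0, r=1.0))
    print("reference gain:", riccati_gain(0.9, 0.5, 1.0, 1.0))
```

In this toy setting the data are collected once under random inputs and then reused in every iteration, which is the sense in which the learning is off-policy and replay-based. The learned gain can be compared against the model-based Riccati gain because the system is linear and known; no such closed-form reference is implied for the nonlinear spacecraft problem.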
| Original language | English |
|---|---|
| Title of host publication | IAF Astrodynamics Symposium - Held at the 76th International Astronautical Congress, IAC 2025 |
| Publisher | International Astronautical Federation, IAF |
| Pages | 69-75 |
| Number of pages | 7 |
| ISBN (Electronic) | 9798331329358 |
| DOIs | |
| State | Published - 2025 |
| Event | 2025 IAF Astrodynamics Symposium at the 76th International Astronautical Congress, IAC 2025 - Sydney, Australia; Duration: 29 Sep 2025 → 3 Oct 2025 |
Publication series
| Name | Proceedings of the International Astronautical Congress, IAC |
|---|---|
| Volume | 1-F219391 |
| ISSN (Print) | 0074-1795 |
Conference
| Conference | 2025 IAF Astrodynamics Symposium at the 76th International Astronautical Congress, IAC 2025 |
|---|---|
| Country/Territory | Australia |
| City | Sydney |
| Period | 29/09/25 → 3/10/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- Energy-Optimal Control
- Model Uncertainty Compensation
- Off-Policy Reinforcement Learning
- Rigid-Flexible Coupling
Fingerprint
Dive into the research topics of 'Energy-optimal Attitude Control of Ultra-large-scale Spacecraft: An Off-policy Actor-critic Approach'. Together they form a unique fingerprint.