Abstract
Autonomous spacecraft rendezvous poses significant challenges in increasingly complex space missions. Recently, Reinforcement Learning (RL) has proven effective for spacecraft rendezvous, owing to its high performance on complex continuous control tasks and its low online storage and computation cost. However, the lack of safety guarantees during the learning process restricts the application of RL to safety-critical control systems in real-world environments. To address this challenge, we introduce a safe reinforcement learning framework with optimization-based Run-time Assurance (RTA) for spacecraft rendezvous, in which safety-critical constraints are enforced by Control Barrier Functions (CBFs). First, we formulate a discrete-time CBF that implements dynamic obstacle avoidance in uncertain environments while also accounting for soft constraints on the spacecraft, including velocity, time, and fuel. Furthermore, we investigate the effect of RTA on RL training performance in terms of training efficiency, satisfaction of safety constraints, control efficiency, task efficiency, and training duration. Additionally, we evaluate our method on a spacecraft docking experiment conducted in a two-dimensional relative-motion reference frame during proximity operations. Simulation results and extended tests demonstrate the effectiveness of the proposed method: the framework employs RL algorithms to acquire high-performance controllers and uses CBF-based controllers to guarantee safety.
Supported by National Natural Science Foundation of China (62072233) and Chinese Aeronautical Establishment (201919052002).
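The abstract's core mechanism, a discrete-time CBF acting as a run-time assurance filter on the RL policy's actions, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a 2D double-integrator as a stand-in for the relative-motion dynamics, a keep-out-zone barrier h(x) = ||p − p_obs||² − r², the discrete CBF condition h(x_{k+1}) ≥ (1 − γ)·h(x_k), and an SLSQP solve in place of the paper's QP formulation. All parameter values and function names here are illustrative assumptions.

```python
# Minimal sketch of a discrete-time CBF run-time assurance (RTA) filter.
# Assumed (not from the paper): 2D double-integrator dynamics, obstacle at
# P_OBS with keep-out radius R, barrier h(x) = ||p - P_OBS||^2 - R^2, and
# the discrete CBF condition h(x_next) >= (1 - GAMMA) * h(x).
import numpy as np
from scipy.optimize import minimize

DT, GAMMA, R = 1.0, 0.2, 5.0        # step [s], CBF decay rate, keep-out radius [m]
P_OBS = np.array([0.0, 0.0])        # obstacle position (assumed)

def step(x, u):
    """Discrete double-integrator: x = [px, py, vx, vy], u = accel command."""
    v_next = x[2:] + DT * np.asarray(u)
    p_next = x[:2] + DT * v_next
    return np.concatenate([p_next, v_next])

def h(x):
    """Barrier: positive outside the keep-out zone, negative inside."""
    return np.sum((x[:2] - P_OBS) ** 2) - R ** 2

def rta_filter(x, u_rl, u_max=2.0):
    """Minimally modify the RL action so the discrete CBF condition holds."""
    def cbf_margin(u):
        return h(step(x, u)) - (1.0 - GAMMA) * h(x)
    if cbf_margin(u_rl) >= 0.0:
        return np.asarray(u_rl)      # RL action already safe: pass through
    res = minimize(lambda u: np.sum((u - u_rl) ** 2), u_rl,
                   constraints=[{"type": "ineq", "fun": cbf_margin}],
                   bounds=[(-u_max, u_max)] * 2)
    return res.x

# Spacecraft 10 m from the obstacle, closing at 2 m/s; the RL policy
# commands zero thrust, so the filter must intervene to brake.
x0 = np.array([10.0, 0.0, -2.0, 0.0])
u_safe = rta_filter(x0, np.array([0.0, 0.0]))
```

In this sketch the filter only overrides the policy when the one-step CBF condition would be violated, which mirrors the RTA pattern of leaving the learned controller untouched inside the safe set; the paper's soft constraints (velocity, time, fuel) are omitted here for brevity.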
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Xiao, Y., Yang, Z., Zhou, Y., Huang, Z. (2024). Run-Time Assured Reinforcement Learning for Safe Spacecraft Rendezvous with Obstacle Avoidance. In: Hermanns, H., Sun, J., Bu, L. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2023. Lecture Notes in Computer Science, vol 14464. Springer, Singapore. https://doi.org/10.1007/978-981-99-8664-4_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8663-7
Online ISBN: 978-981-99-8664-4