Run-Time Assured Reinforcement Learning for Safe Spacecraft Rendezvous with Obstacle Avoidance

  • Conference paper
  • In: Dependable Software Engineering. Theories, Tools, and Applications (SETTA 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14464)

Abstract

Autonomous spacecraft rendezvous poses significant challenges in increasingly complex space missions. Recently, Reinforcement Learning (RL) has proven effective for spacecraft rendezvous, owing to its strong performance on complex continuous control tasks and its low online storage and computation costs. However, the lack of safety guarantees during the learning process restricts the application of RL to safety-critical control systems in real-world environments. To mitigate this challenge, we introduce a safe reinforcement learning framework with optimization-based Run-Time Assurance (RTA) for spacecraft rendezvous, in which safety-critical constraints are enforced by Control Barrier Functions (CBFs). First, we formulate a discrete-time CBF that achieves dynamic obstacle avoidance in uncertain environments while also accounting for soft constraints on the spacecraft, including velocity, time, and fuel. Furthermore, we investigate the effect of RTA on RL training in terms of training efficiency, satisfaction of safety constraints, control efficiency, task efficiency, and training duration. Finally, we evaluate our method on a spacecraft docking task in a two-dimensional relative-motion reference frame during proximity operations. Simulations and extended tests demonstrate the effectiveness of the proposed method: the framework employs RL algorithms to obtain high-performance controllers and CBF-based controllers to guarantee safety.

Supported by the National Natural Science Foundation of China (62072233) and the Chinese Aeronautical Establishment (201919052002).
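To make the mechanism concrete, the sketch below shows one way the optimization-based RTA filter described above can be realized: at every control step it minimally perturbs the RL policy's action so that the discrete-time CBF condition h(x_{k+1}) - h(x_k) >= -gamma * h(x_k) continues to hold. This is an illustrative reconstruction under simplifying assumptions, not the authors' implementation; the planar double-integrator dynamics (standing in for the Clohessy-Wiltshire relative-motion model), the single static obstacle, and all parameter values are assumptions made for the example.

```python
# Minimal sketch of an optimization-based Run-Time Assurance (RTA) filter:
# a discrete-time Control Barrier Function keeps the chaser spacecraft outside
# a circular keep-out zone while staying as close as possible to the RL action.
import numpy as np
from scipy.optimize import minimize

DT = 1.0                          # control step [s] (assumed)
GAMMA = 0.2                       # CBF decay rate, 0 < GAMMA <= 1 (assumed)
OBS_POS = np.array([50.0, 0.0])   # obstacle position in the relative frame [m]
OBS_RAD = 10.0                    # keep-out radius [m] (assumed)

def dynamics(x, u):
    """Semi-implicit Euler step of a planar double integrator,
    x = [px, py, vx, vy], u = [ax, ay]; a simplified stand-in for
    the Clohessy-Wiltshire relative-motion dynamics."""
    v_next = x[2:] + DT * u
    p_next = x[:2] + DT * v_next
    return np.concatenate([p_next, v_next])

def h(x):
    """Barrier function: positive iff the chaser is outside the keep-out zone."""
    return np.sum((x[:2] - OBS_POS) ** 2) - OBS_RAD ** 2

def rta_filter(x, u_rl, u_max=0.1):
    """Return the action closest to u_rl that satisfies the discrete-time
    CBF condition h(f(x, u)) - h(x) >= -GAMMA * h(x)."""
    cbf = {"type": "ineq",
           "fun": lambda u: h(dynamics(x, u)) - (1.0 - GAMMA) * h(x)}
    res = minimize(lambda u: np.sum((u - u_rl) ** 2), u_rl,
                   method="SLSQP", constraints=[cbf],
                   bounds=[(-u_max, u_max)] * 2)
    # In a deployed RTA, an infeasible solve would hand control to a
    # verified backup controller instead of passing the raw action through.
    return res.x if res.success else u_rl

# The filtered action, not the raw RL action, is applied to the environment.
x = np.array([60.0, 8.0, -0.5, 0.0])   # chaser drifting toward the obstacle
u_safe = rta_filter(x, u_rl=np.array([-0.1, -0.1]))
print(u_safe)
```

When the learned action already satisfies the decay condition, the constraint is inactive and the filter returns the action unchanged; it intervenes only near the boundary of the safe set, which is what makes it meaningful to study how RTA affects RL training performance.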

Author information

Correspondence to Zhibin Yang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Xiao, Y., Yang, Z., Zhou, Y., Huang, Z. (2024). Run-Time Assured Reinforcement Learning for Safe Spacecraft Rendezvous with Obstacle Avoidance. In: Hermanns, H., Sun, J., Bu, L. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2023. Lecture Notes in Computer Science, vol 14464. Springer, Singapore. https://doi.org/10.1007/978-981-99-8664-4_17

  • DOI: https://doi.org/10.1007/978-981-99-8664-4_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8663-7

  • Online ISBN: 978-981-99-8664-4

  • eBook Packages: Computer Science, Computer Science (R0)
