Run-Time Assured Reinforcement Learning for Safe Spacecraft Rendezvous with Obstacle Avoidance

  • Conference paper
  • In: Dependable Software Engineering. Theories, Tools, and Applications (SETTA 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14464)

Abstract

Autonomous spacecraft rendezvous poses significant challenges in increasingly complex space missions. Recently, Reinforcement Learning (RL) has proven effective for spacecraft rendezvous, owing to its strong performance on complex continuous control tasks and its low online storage and computation costs. However, the lack of safety guarantees during the learning process restricts the application of RL to safety-critical control systems in real-world environments. To mitigate this challenge, we introduce a safe reinforcement learning framework with optimization-based Run-Time Assurance (RTA) for spacecraft rendezvous, in which safety-critical constraints are enforced by Control Barrier Functions (CBFs). First, we formulate a discrete-time CBF that achieves dynamic obstacle avoidance in uncertain environments while also accounting for soft constraints on the spacecraft, including velocity, time, and fuel. Furthermore, we investigate the effect of RTA on RL training in terms of training efficiency, satisfaction of safety constraints, control efficiency, task efficiency, and training duration. Finally, we evaluate our method on a spacecraft docking task in a two-dimensional relative-motion reference frame during proximity operations. Simulations and extended tests demonstrate the effectiveness of the proposed method: the framework employs RL algorithms to obtain high-performance controllers and CBF-based controllers to guarantee safety.

Supported by the National Natural Science Foundation of China (62072233) and the Chinese Aeronautical Establishment (201919052002).
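To make the mechanism concrete, the sketch below shows one way the optimization-based RTA filter described above can be realized: at every control step it minimally perturbs the RL policy's action so that the discrete-time CBF condition h(x_{k+1}) - h(x_k) >= -gamma * h(x_k) continues to hold. This is an illustrative reconstruction under simplifying assumptions, not the authors' implementation; the planar double-integrator dynamics (standing in for the Clohessy-Wiltshire relative-motion model), the single static obstacle, and all parameter values are assumptions made for the example.

```python
# Minimal sketch of an optimization-based Run-Time Assurance (RTA) filter:
# a discrete-time Control Barrier Function keeps the chaser spacecraft outside
# a circular keep-out zone while staying as close as possible to the RL action.
import numpy as np
from scipy.optimize import minimize

DT = 1.0                          # control step [s] (assumed)
GAMMA = 0.2                       # CBF decay rate, 0 < GAMMA <= 1 (assumed)
OBS_POS = np.array([50.0, 0.0])   # obstacle position in the relative frame [m]
OBS_RAD = 10.0                    # keep-out radius [m] (assumed)

def dynamics(x, u):
    """Semi-implicit Euler step of a planar double integrator,
    x = [px, py, vx, vy], u = [ax, ay]; a simplified stand-in for
    the Clohessy-Wiltshire relative-motion dynamics."""
    v_next = x[2:] + DT * u
    p_next = x[:2] + DT * v_next
    return np.concatenate([p_next, v_next])

def h(x):
    """Barrier function: positive iff the chaser is outside the keep-out zone."""
    return np.sum((x[:2] - OBS_POS) ** 2) - OBS_RAD ** 2

def rta_filter(x, u_rl, u_max=0.1):
    """Return the action closest to u_rl that satisfies the discrete-time
    CBF condition h(f(x, u)) - h(x) >= -GAMMA * h(x)."""
    cbf = {"type": "ineq",
           "fun": lambda u: h(dynamics(x, u)) - (1.0 - GAMMA) * h(x)}
    res = minimize(lambda u: np.sum((u - u_rl) ** 2), u_rl,
                   method="SLSQP", constraints=[cbf],
                   bounds=[(-u_max, u_max)] * 2)
    # In a deployed RTA, an infeasible solve would hand control to a
    # verified backup controller instead of passing the raw action through.
    return res.x if res.success else u_rl

# The filtered action, not the raw RL action, is applied to the environment.
x = np.array([60.0, 8.0, -0.5, 0.0])   # chaser drifting toward the obstacle
u_safe = rta_filter(x, u_rl=np.array([-0.1, -0.1]))
print(u_safe)
```

When the learned action already satisfies the decay condition, the constraint is inactive and the filter returns the action unchanged; it intervenes only near the boundary of the safe set, which is what makes it meaningful to study how RTA affects RL training performance.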

Author information

Correspondence to Zhibin Yang.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Xiao, Y., Yang, Z., Zhou, Y., Huang, Z. (2024). Run-Time Assured Reinforcement Learning for Safe Spacecraft Rendezvous with Obstacle Avoidance. In: Hermanns, H., Sun, J., Bu, L. (eds) Dependable Software Engineering. Theories, Tools, and Applications. SETTA 2023. Lecture Notes in Computer Science, vol 14464. Springer, Singapore. https://doi.org/10.1007/978-981-99-8664-4_17

  • DOI: https://doi.org/10.1007/978-981-99-8664-4_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8663-7

  • Online ISBN: 978-981-99-8664-4

  • eBook Packages: Computer Science, Computer Science (R0)
