DOI: 10.1145/3314221.3314638 · ACM Conference Proceedings (PLDI) · Research article · Public Access

An inductive synthesis framework for verifiable reinforcement learning

Published: 08 June 2019

Abstract

Despite the tremendous advances made in the last decade on developing useful machine-learning applications, their wider adoption has been hindered by the lack of strong assurance guarantees about their behavior. In this paper, we consider how formal verification techniques developed for traditional software systems can be repurposed to verify reinforcement-learning-enabled systems, a particularly important class of machine learning systems. Rather than enforcing safety by examining and altering the structure of a complex neural network implementation, our technique uses blackbox methods to synthesize deterministic programs: simpler, more interpretable approximations of the network that nonetheless guarantee desired safety properties are preserved, even when the network is deployed in unanticipated or previously unobserved environments. Our methodology frames the problem of neural network verification as a counterexample- and syntax-guided inductive synthesis procedure over these programs. The synthesis procedure searches for both a deterministic program and an inductive invariant over an infinite state transition system that represents a specification of an application's control logic. Additional specifications defining environment-based constraints can be provided to further refine the search space. Synthesized programs, deployed in conjunction with a neural network implementation, dynamically enforce safety conditions by monitoring and preventing potentially unsafe actions proposed by neural policies. Experimental results over a wide range of cyber-physical applications demonstrate that software-inspired formal verification techniques can realize trustworthy reinforcement learning systems with low overhead.
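The deployment scheme described in the abstract (run the neural policy, but check each proposed action against a verified invariant and fall back to the synthesized deterministic program when the check fails) can be sketched as follows. This is a toy illustration under assumed one-dimensional dynamics (x' = x + a); every function name and the invariant |x| <= 10 are hypothetical stand-ins, not the paper's actual system or API.

```python
# Hypothetical sketch of runtime shielding: a synthesized deterministic
# program and its inductive invariant guard a neural policy at runtime.
# Toy dynamics x' = x + a and invariant |x| <= 10 are assumptions.

def invariant_holds(x):
    """Assumed inductive invariant over the state space: |x| <= 10."""
    return abs(x) <= 10.0

def is_safe(x, a):
    """Check that taking action a from state x keeps the successor
    state inside the invariant (toy dynamics: x' = x + a)."""
    return invariant_holds(x + a)

def synthesized_program(x):
    """Deterministic fallback program; assumed verified so that its
    action always preserves the invariant (here: decay toward 0)."""
    return -0.5 * x

def neural_policy(x):
    """Stand-in for the learned policy; may propose unsafe actions."""
    return 8.0

def shielded_step(x):
    """Apply the neural action only if it passes the safety check;
    otherwise override it with the synthesized program's action."""
    a = neural_policy(x)
    if not is_safe(x, a):
        a = synthesized_program(x)
    return x + a

# From x = 4.0 the neural action 8.0 would reach 12.0, violating the
# invariant, so the shield substitutes -2.0 and the system moves to 2.0.
print(shielded_step(4.0))
```

Because the fallback program is itself verified against the invariant, every state reachable under the shielded composition stays inside the safe region, regardless of what the network proposes.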

Supplementary Material

WEBM File (p686-zhu.webm)



Published In

PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2019
1162 pages
ISBN:9781450367127
DOI:10.1145/3314221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. Invariant Inference
  2. Program Synthesis
  3. Program Verification
  4. Reinforcement Learning
  5. Runtime Shielding



Acceptance Rates

Overall Acceptance Rate 406 of 2,067 submissions, 20%


Cited By

  • Explain the Explainer: Interpreting Model-Agnostic Counterfactual Explanations of a Deep Reinforcement Learning Agent. IEEE Transactions on Artificial Intelligence 5(4): 1443-1457, Apr 2024. DOI: 10.1109/TAI.2022.3223892
  • Synergies Between Machine Learning and Reasoning - An Introduction by the Kay R. Amel group. International Journal of Approximate Reasoning, article 109206, Apr 2024. DOI: 10.1016/j.ijar.2024.109206
  • A survey on interpretable reinforcement learning. Machine Learning 113(8): 5847-5890, Apr 2024. DOI: 10.1007/s10994-024-06543-w
  • Safe Exploration in Reinforcement Learning by Reachability Analysis over Learned Models. Computer Aided Verification (CAV), 232-255, Jul 2024. DOI: 10.1007/978-3-031-65633-0_11
  • Interpretable and explainable logical policies via neurally guided symbolic abstraction. Proceedings of the 37th International Conference on Neural Information Processing Systems, 50838-50858, Dec 2023. DOI: 10.5555/3666122.3668334
  • Runtime Verification of Learning Properties for Reinforcement Learning Algorithms. Electronic Proceedings in Theoretical Computer Science 395: 205-219, Nov 2023. DOI: 10.4204/EPTCS.395.15
  • Formal Synthesis of Neural Barrier Certificates for Continuous Systems via Counterexample Guided Learning. ACM Transactions on Embedded Computing Systems 22(5s): 1-21, Oct 2023. DOI: 10.1145/3609125
  • Deep Reinforcement Learning Verification: A Survey. ACM Computing Surveys 55(14s): 1-31, May 2023. DOI: 10.1145/3596444
  • Symbolic Fixpoint Algorithms for Logical LTL Games. 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 698-709, Sep 2023. DOI: 10.1109/ASE56229.2023.00212
  • Risk-aware Shielding of Partially Observable Monte Carlo Planning Policies. Artificial Intelligence, article 103987, Aug 2023. DOI: 10.1016/j.artint.2023.103987
