
MAVIPER: Learning Decision Tree Policies for Interpretable Multi-agent Reinforcement Learning

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

Many recent breakthroughs in multi-agent reinforcement learning (MARL) require the use of deep neural networks, which are challenging for human experts to interpret and understand. On the other hand, existing work on interpretable reinforcement learning (RL) has shown promise in extracting more interpretable decision tree-based policies from neural networks, but only in the single-agent setting. To fill this gap, we propose the first set of algorithms that extract interpretable decision-tree policies from neural networks trained with MARL. The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting. We demonstrate that IVIPER learns high-quality decision-tree policies for each agent. To better capture coordination between agents, we propose a novel centralized decision-tree training algorithm, MAVIPER. MAVIPER jointly grows the trees of each agent by predicting the behavior of the other agents using their anticipated trees, and uses resampling to focus on states that are critical for its interactions with other agents. We show that both algorithms generally outperform the baselines and that MAVIPER-trained agents achieve better-coordinated performance than IVIPER-trained agents on three different multi-agent particle-world environments.
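For orientation, the sketch below illustrates the flavor of distillation the abstract describes: one decision tree per agent, refit on a DAgger-style aggregated dataset labeled by neural expert policies. This is not the paper's code; the environment transition, the stand-in expert policies, and all dimensions are hypothetical, and both VIPER's Q-value-weighted resampling and MAVIPER's joint tree growth are omitted for brevity.

```python
# Minimal, hypothetical sketch of an IVIPER-style distillation loop.
# The "experts" and "fake_env_step" are invented stand-ins; in the paper,
# the experts would be neural policies trained with a centralized-critic
# MARL method such as MADDPG, acting in a particle-world environment.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

N_AGENTS, OBS_DIM, N_ACTIONS = 2, 8, 5
rng = np.random.default_rng(0)

# Stand-in experts: map an observation to a discrete action.
experts = [
    lambda obs, i=i: int(np.argmax(obs[:N_ACTIONS]) + i) % N_ACTIONS
    for i in range(N_AGENTS)
]

def fake_env_step(obs, actions):
    """Hypothetical environment transition; a real multi-agent env goes here."""
    return [np.tanh(o + 0.1 * a + rng.normal(scale=0.1, size=OBS_DIM))
            for o, a in zip(obs, actions)]

def rollout(policies, steps=200):
    """Act with `policies`, but label every visited state with the experts."""
    per_agent = [([], []) for _ in range(N_AGENTS)]
    obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
    for _ in range(steps):
        actions = [policies[i](obs[i]) for i in range(N_AGENTS)]
        for i in range(N_AGENTS):
            per_agent[i][0].append(obs[i])
            per_agent[i][1].append(experts[i](obs[i]))  # expert label
        obs = fake_env_step(obs, actions)
    return per_agent

# DAgger-style outer loop: aggregate data, refit one tree per agent.
datasets = [([], []) for _ in range(N_AGENTS)]
trees = [DecisionTreeClassifier(max_depth=4, random_state=0)
         for _ in range(N_AGENTS)]
policies = list(experts)  # iteration 0 acts with the experts
for _ in range(3):
    batch = rollout(policies)
    for i in range(N_AGENTS):
        datasets[i][0].extend(batch[i][0])
        datasets[i][1].extend(batch[i][1])
        trees[i].fit(np.asarray(datasets[i][0]), np.asarray(datasets[i][1]))
    # Subsequent iterations act with the current decision trees.
    policies = [lambda o, t=trees[i]: int(t.predict(o[None, :])[0])
                for i in range(N_AGENTS)]
```

Per the abstract, MAVIPER's key departure from this independent refitting is that the trees are grown jointly: each agent's tree is expanded while predicting the other agents' actions with their anticipated trees, with resampling focused on coordination-critical states.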

S. Milani and Z. Zhang: Equal contribution.


Notes

  1. We use the PyTorch [31] implementation https://github.com/shariqiqbal2810/maddpg-pytorch.

References

  1. Abbeel, P., Ng, A.: Apprenticeship learning via inverse reinforcement learning. In: ICML (2004)

  2. Bastani, O., et al.: Verifiable reinforcement learning via policy extraction. In: NeurIPS (2018)

  3. Berner, C., et al.: Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680 (2019)

  4. Bhalla, S., et al.: Deep multi-agent reinforcement learning for autonomous driving. In: Canadian Conference on Artificial Intelligence (2020)

  5. Brittain, M., Wei, P.: Autonomous air traffic controller: a deep multi-agent reinforcement learning approach. arXiv preprint arXiv:1905.01303 (2019)

  6. Buciluǎ, C., et al.: Model compression. In: KDD (2006)

  7. Chen, Z., et al.: ReLACE: reinforcement learning agent for counterfactual explanations of arbitrary predictive models. arXiv preprint arXiv:2110.11960 (2021)

  8. Degris, T., et al.: Learning the structure of factored Markov decision processes in reinforcement learning problems. In: ICML (2006)

  9. Ernst, D., et al.: Tree-based batch mode reinforcement learning. JMLR 6 (2005)

  10. Foerster, J., et al.: Stabilising experience replay for deep multi-agent reinforcement learning. In: ICML (2017)

  11. Foerster, J., et al.: Counterfactual multi-agent policy gradients. In: AAAI (2018)

  12. Heuillet, A., et al.: Collective explainable AI: explaining cooperative strategies and agent contribution in multiagent reinforcement learning with Shapley values. IEEE Comput. Intell. Mag. 17, 59–71 (2022)

  13. Hinton, G., et al.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

  14. Iqbal, S., Sha, F.: Actor-attention-critic for multi-agent reinforcement learning. In: ICML (2019)

  15. Kazhdan, D., et al.: MARLeME: a multi-agent reinforcement learning model extraction library. In: IJCNN (2020)

  16. Li, S., et al.: Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. In: AAAI (2019)

  17. Li, W., et al.: SparseMAAC: sparse attention for multi-agent reinforcement learning. In: International Conference on Database Systems for Advanced Applications (2019)

  18. Lipton, Z.: The mythos of model interpretability. ACM Queue 16(3) (2018)

  19. Littman, M.: Markov games as a framework for multi-agent reinforcement learning. In: ICML (1994)

  20. Lowe, R., et al.: Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275 (2017)

  21. Luss, R., et al.: Local explanations for reinforcement learning. arXiv preprint arXiv:2202.03597 (2022)

  22. Malialis, K., Kudenko, D.: Distributed response to network intrusions using multiagent reinforcement learning. Eng. Appl. Artif. Intell. 40, 270–284 (2015)

  23. Matignon, L., et al.: Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. Knowl. Eng. Rev. 27(1), 1–31 (2012)

  24. McCallum, R.: Reinforcement learning with selective perception and hidden state. Ph.D. thesis, University of Rochester, Department of Computer Science (1997)

  25. Meng, Z., et al.: Interpreting deep learning-based networking systems. In: SIGCOMM (2020)

  26. Milani, S., et al.: A survey of explainable reinforcement learning. arXiv preprint arXiv:2202.08434 (2022)

  27. Mohanty, S., et al.: Flatland-RL: multi-agent reinforcement learning on trains. arXiv preprint arXiv:2012.05893 (2020)

  28. Molnar, C.: Interpretable Machine Learning (2019)

  29. Motokawa, Y., Sugawara, T.: MAT-DQN: toward interpretable multi-agent deep reinforcement learning for coordinated activities. In: ICANN (2021)

  30. Oliehoek, F., et al.: Optimal and approximate Q-value functions for decentralized POMDPs. JAIR 32, 289–353 (2008)

  31. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)

  32. Pyeatt, L.: Reinforcement learning with decision trees. In: Applied Informatics (2003)

  33. Pyeatt, L., Howe, A.: Decision tree function approximation in reinforcement learning. In: International Symposium on Adaptive Systems: Evolutionary Computation and Probabilistic Graphical Models (2001)

  34. Quinlan, J.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986)

  35. Rashid, T., et al.: QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. In: ICML (2018)

  36. Ross, S., et al.: A reduction of imitation learning and structured prediction to no-regret online learning. In: AISTATS (2011)

  37. Roth, A., et al.: Conservative Q-improvement: reinforcement learning for an interpretable decision-tree policy. arXiv preprint arXiv:1907.01180 (2019)

  38. Shapley, L.: Stochastic games. PNAS 39(10), 1095–1100 (1953)

  39. Son, K., et al.: QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. arXiv preprint arXiv:1905.05408 (2019)

  40. Strehl, A., et al.: Efficient structure learning in factored-state MDPs. In: AAAI (2007)

  41. Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 (2017)

  42. Topin, N., et al.: Iterative bounding MDPs: learning interpretable policies via non-interpretable methods. In: AAAI (2021)

  43. Tuyls, K., et al.: Reinforcement learning in large state spaces. In: Robot Soccer World Cup (2002)

  44. Uther, W., Veloso, M.: The Lumberjack algorithm for learning linked decision forests. In: International Symposium on Abstraction, Reformulation, and Approximation (2000)

  45. Vasic, M., et al.: MoĂ«T: interpretable and verifiable reinforcement learning via mixture of expert trees. arXiv preprint arXiv:1906.06717 (2019)

  46. Wang, T., et al.: Dataset distillation. arXiv preprint arXiv:1811.10959 (2018)

  47. Wang, X., et al.: Explanation of reinforcement learning model in dynamic multi-agent system. arXiv preprint arXiv:2008.01508 (2020)

  48. Yu, C., et al.: The surprising effectiveness of MAPPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955 (2021)


Acknowledgements

This material is based upon work supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program. This research was sponsored by the U.S. Army Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the funding agencies or government agencies. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

Author information


Corresponding author

Correspondence to Stephanie Milani.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 669 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Milani, S. et al. (2023). MAVIPER: Learning Decision Tree Policies for Interpretable Multi-agent Reinforcement Learning. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-26412-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26411-5

  • Online ISBN: 978-3-031-26412-2

  • eBook Packages: Computer Science (R0)
