
MAVIPER: Learning Decision Tree Policies for Interpretable Multi-agent Reinforcement Learning

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

Many recent breakthroughs in multi-agent reinforcement learning (MARL) require the use of deep neural networks, which are challenging for human experts to interpret and understand. On the other hand, existing work on interpretable reinforcement learning (RL) has shown promise in extracting more interpretable decision tree-based policies from neural networks, but only in the single-agent setting. To fill this gap, we propose the first set of algorithms that extract interpretable decision-tree policies from neural networks trained with MARL. The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting. We demonstrate that IVIPER learns high-quality decision-tree policies for each agent. To better capture coordination between agents, we propose a novel centralized decision-tree training algorithm, MAVIPER. MAVIPER jointly grows the trees of each agent by predicting the behavior of the other agents using their anticipated trees, and uses resampling to focus on states that are critical for its interactions with other agents. We show that both algorithms generally outperform the baselines and that MAVIPER-trained agents achieve better-coordinated performance than IVIPER-trained agents on three different multi-agent particle-world environments.
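For orientation, the sketch below illustrates the flavor of distillation the abstract describes: one decision tree per agent, refit on a DAgger-style aggregated dataset labeled by neural expert policies. This is not the paper's code; the environment transition, the stand-in expert policies, and all dimensions are hypothetical, and both VIPER's Q-value-weighted resampling and MAVIPER's joint tree growth are omitted for brevity.

```python
# Minimal, hypothetical sketch of an IVIPER-style distillation loop.
# The "experts" and "fake_env_step" are invented stand-ins; in the paper,
# the experts would be neural policies trained with a centralized-critic
# MARL method such as MADDPG, acting in a particle-world environment.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

N_AGENTS, OBS_DIM, N_ACTIONS = 2, 8, 5
rng = np.random.default_rng(0)

# Stand-in experts: map an observation to a discrete action.
experts = [
    lambda obs, i=i: int(np.argmax(obs[:N_ACTIONS]) + i) % N_ACTIONS
    for i in range(N_AGENTS)
]

def fake_env_step(obs, actions):
    """Hypothetical environment transition; a real multi-agent env goes here."""
    return [np.tanh(o + 0.1 * a + rng.normal(scale=0.1, size=OBS_DIM))
            for o, a in zip(obs, actions)]

def rollout(policies, steps=200):
    """Act with `policies`, but label every visited state with the experts."""
    per_agent = [([], []) for _ in range(N_AGENTS)]
    obs = [rng.normal(size=OBS_DIM) for _ in range(N_AGENTS)]
    for _ in range(steps):
        actions = [policies[i](obs[i]) for i in range(N_AGENTS)]
        for i in range(N_AGENTS):
            per_agent[i][0].append(obs[i])
            per_agent[i][1].append(experts[i](obs[i]))  # expert label
        obs = fake_env_step(obs, actions)
    return per_agent

# DAgger-style outer loop: aggregate data, refit one tree per agent.
datasets = [([], []) for _ in range(N_AGENTS)]
trees = [DecisionTreeClassifier(max_depth=4, random_state=0)
         for _ in range(N_AGENTS)]
policies = list(experts)  # iteration 0 acts with the experts
for _ in range(3):
    batch = rollout(policies)
    for i in range(N_AGENTS):
        datasets[i][0].extend(batch[i][0])
        datasets[i][1].extend(batch[i][1])
        trees[i].fit(np.asarray(datasets[i][0]), np.asarray(datasets[i][1]))
    # Subsequent iterations act with the current decision trees.
    policies = [lambda o, t=trees[i]: int(t.predict(o[None, :])[0])
                for i in range(N_AGENTS)]
```

Per the abstract, MAVIPER's key departure from this independent refitting is that the trees are grown jointly: each agent's tree is expanded while predicting the other agents' actions with their anticipated trees, with resampling focused on coordination-critical states.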

S. Milani and Z. Zhang: Equal contribution.


Notes

  1. We use the PyTorch [31] implementation https://github.com/shariqiqbal2810/maddpg-pytorch.

References

  1. Abbeel, P., Ng, A.: Apprenticeship learning via inverse reinforcement learning. In: ICML (2004)

  2. Bastani, O., et al.: Verifiable reinforcement learning via policy extraction. In: NeurIPS (2018)

  3. Berner, C., et al.: Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680 (2019)

  4. Bhalla, S., et al.: Deep multi-agent reinforcement learning for autonomous driving. In: Canadian Conference on Artificial Intelligence (2020)

  5. Brittain, M., Wei, P.: Autonomous air traffic controller: a deep multi-agent reinforcement learning approach. arXiv preprint arXiv:1905.01303 (2019)

  6. Buciluǎ, C., et al.: Model compression. In: KDD (2006)

  7. Chen, Z., et al.: ReLACE: reinforcement learning agent for counterfactual explanations of arbitrary predictive models. arXiv preprint arXiv:2110.11960 (2021)

  8. Degris, T., et al.: Learning the structure of factored Markov decision processes in reinforcement learning problems. In: ICML (2006)

  9. Ernst, D., et al.: Tree-based batch mode reinforcement learning. JMLR 6 (2005)

  10. Foerster, J., et al.: Stabilising experience replay for deep multi-agent reinforcement learning. In: ICML (2017)

  11. Foerster, J., et al.: Counterfactual multi-agent policy gradients. In: AAAI (2018)

  12. Heuillet, A., et al.: Collective explainable AI: explaining cooperative strategies and agent contribution in multiagent reinforcement learning with Shapley values. IEEE Comput. Intell. Mag. 17, 59–71 (2022)

  13. Hinton, G., et al.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)

  14. Iqbal, S., Sha, F.: Actor-attention-critic for multi-agent reinforcement learning. In: ICML (2019)

  15. Kazhdan, D., et al.: MARLeME: a multi-agent reinforcement learning model extraction library. In: IJCNN (2020)

  16. Li, S., et al.: Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. In: AAAI (2019)

  17. Li, W., et al.: SparseMAAC: sparse attention for multi-agent reinforcement learning. In: International Conference on Database Systems for Advanced Applications (2019)

  18. Lipton, Z.: The mythos of model interpretability. ACM Queue 16(3) (2018)

  19. Littman, M.: Markov games as a framework for multi-agent reinforcement learning. In: ICML (1994)

  20. Lowe, R., et al.: Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275 (2017)

  21. Luss, R., et al.: Local explanations for reinforcement learning. arXiv preprint arXiv:2202.03597 (2022)

  22. Malialis, K., Kudenko, D.: Distributed response to network intrusions using multiagent reinforcement learning. Eng. Appl. Artif. Intell. 40, 270–284 (2015)

  23. Matignon, L., et al.: Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. Knowl. Eng. Rev. 27(1), 1–31 (2012)

  24. McCallum, R.: Reinforcement learning with selective perception and hidden state. Ph.D. thesis, University of Rochester, Department of Computer Science (1997)

  25. Meng, Z., et al.: Interpreting deep learning-based networking systems. In: SIGCOMM (2020)

  26. Milani, S., et al.: A survey of explainable reinforcement learning. arXiv preprint arXiv:2202.08434 (2022)

  27. Mohanty, S., et al.: Flatland-RL: multi-agent reinforcement learning on trains. arXiv preprint arXiv:2012.05893 (2020)

  28. Molnar, C.: Interpretable Machine Learning (2019)

  29. Motokawa, Y., Sugawara, T.: MAT-DQN: toward interpretable multi-agent deep reinforcement learning for coordinated activities. In: ICANN (2021)

  30. Oliehoek, F., et al.: Optimal and approximate Q-value functions for decentralized POMDPs. JAIR 32, 289–353 (2008)

  31. Paszke, A., et al.: Automatic differentiation in PyTorch (2017)

  32. Pyeatt, L.: Reinforcement learning with decision trees. In: Applied Informatics (2003)

  33. Pyeatt, L., Howe, A.: Decision tree function approximation in reinforcement learning. In: International Symposium on Adaptive Systems: Evolutionary Computation and Probabilistic Graphical Models (2001)

  34. Quinlan, J.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986)

  35. Rashid, T., et al.: QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. In: ICML (2018)

  36. Ross, S., et al.: A reduction of imitation learning and structured prediction to no-regret online learning. In: AISTATS (2011)

  37. Roth, A., et al.: Conservative Q-improvement: reinforcement learning for an interpretable decision-tree policy. arXiv preprint arXiv:1907.01180 (2019)

  38. Shapley, L.: Stochastic games. PNAS 39(10), 1095–1100 (1953)

  39. Son, K., et al.: QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. arXiv preprint arXiv:1905.05408 (2019)

  40. Strehl, A., et al.: Efficient structure learning in factored-state MDPs. In: AAAI (2007)

  41. Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 (2017)

  42. Topin, N., et al.: Iterative bounding MDPs: learning interpretable policies via non-interpretable methods. In: AAAI (2021)

  43. Tuyls, K., et al.: Reinforcement learning in large state spaces. In: Robot Soccer World Cup (2002)

  44. Uther, W., Veloso, M.: The Lumberjack algorithm for learning linked decision forests. In: International Symposium on Abstraction, Reformulation, and Approximation (2000)

  45. Vasic, M., et al.: MoĂ«T: interpretable and verifiable reinforcement learning via mixture of expert trees. arXiv preprint arXiv:1906.06717 (2019)

  46. Wang, T., et al.: Dataset distillation. arXiv preprint arXiv:1811.10959 (2018)

  47. Wang, X., et al.: Explanation of reinforcement learning model in dynamic multi-agent system. arXiv preprint arXiv:2008.01508 (2020)

  48. Yu, C., et al.: The surprising effectiveness of MAPPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955 (2021)


Acknowledgements

This material is based upon work supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program. This research was sponsored by the U.S. Army Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the funding agencies or government agencies. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

Author information


Corresponding author

Correspondence to Stephanie Milani.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 669 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Milani, S. et al. (2023). MAVIPER: Learning Decision Tree Policies for Interpretable Multi-agent Reinforcement Learning. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-26412-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26411-5

  • Online ISBN: 978-3-031-26412-2

  • eBook Packages: Computer Science (R0)
