Abstract
Many recent breakthroughs in multi-agent reinforcement learning (MARL) require the use of deep neural networks, which are challenging for human experts to interpret and understand. On the other hand, existing work on interpretable reinforcement learning (RL) has shown promise in extracting more interpretable decision tree-based policies from neural networks, but only in the single-agent setting. To fill this gap, we propose the first set of algorithms that extract interpretable decision-tree policies from neural networks trained with MARL. The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting. We demonstrate that IVIPER learns high-quality decision-tree policies for each agent. To better capture coordination between agents, we propose a novel centralized decision-tree training algorithm, MAVIPER. MAVIPER jointly grows the trees of each agent by predicting the behavior of the other agents using their anticipated trees, and uses resampling to focus on states that are critical for its interactions with other agents. We show that both algorithms generally outperform the baselines and that MAVIPER-trained agents achieve better-coordinated performance than IVIPER-trained agents on three different multi-agent particle-world environments.
S. Milani and Z. Zhang: Equal contribution.
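To make the extraction loop sketched in the abstract concrete, the following is a minimal, hypothetical Python sketch of the per-agent (IVIPER-style) idea: a DAgger-style loop that rolls out the current student tree, relabels the visited states with the expert's action, and weights each sample by a critic-derived importance before fitting a scikit-learn decision tree. Here `env`, `expert_policies`, and `critic` are assumed stand-ins (a gym-like multi-agent environment, pretrained MADDPG experts, and a centralized critic); this is not the authors' released implementation.

```python
# Hypothetical sketch of a VIPER-style extraction loop for one agent.
# `env`, `expert_policies`, and `critic` are assumed stand-ins, not
# the paper's released code.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def extract_tree_policy(env, expert_policies, critic, agent_id,
                        n_iters=10, n_steps=1000, max_depth=6):
    """DAgger-style loop: roll out the current student tree, relabel
    visited states with the expert's action, and weight samples by a
    VIPER-style Q-gap before refitting the tree."""
    states, actions, weights = [], [], []
    tree = None
    for _ in range(n_iters):
        obs = env.reset()  # one observation per agent
        for _ in range(n_steps):
            # Other agents act with their experts; this agent acts with
            # its current tree (expert on the very first iteration).
            acts = [pi(o) for pi, o in zip(expert_policies, obs)]
            if tree is not None:
                acts[agent_id] = tree.predict([obs[agent_id]])[0]
            # Relabel: record the expert's action for the visited state.
            states.append(obs[agent_id])
            actions.append(expert_policies[agent_id](obs[agent_id]))
            # Weight by how much the action choice matters here: the gap
            # between the best and worst Q-value for this agent's actions.
            q = critic(obs, agent_id)
            weights.append(max(q) - min(q))
            obs, _, done, _ = env.step(acts)
            if all(done):
                obs = env.reset()
        tree = DecisionTreeClassifier(max_depth=max_depth)
        tree.fit(np.array(states), np.array(actions),
                 sample_weight=np.array(weights))
    return tree
```

MAVIPER, as described above, differs from this per-agent sketch in that the agents' trees are grown jointly: when one agent's tree is trained, the anticipated trees of the other agents predict their actions, and resampling emphasizes states critical to inter-agent coordination.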
Notes
1. We use the PyTorch [31] implementation https://github.com/shariqiqbal2810/maddpg-pytorch.
References
Abbeel, P., Ng, A.: Apprenticeship learning via inverse reinforcement learning. In: ICML (2004)
Bastani, O., et al.: Verifiable reinforcement learning via policy extraction. In: NeurIPS (2018)
Berner, C., et al.: Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680 (2019)
Bhalla, S., et al.: Deep multi-agent reinforcement learning for autonomous driving. In: Canadian Conference on Artificial Intelligence (2020)
Brittain, M., Wei, P.: Autonomous air traffic controller: a deep multi-agent reinforcement learning approach. arXiv preprint arXiv:1905.01303 (2019)
Buciluǎ, C., et al.: Model compression. In: KDD (2006)
Chen, Z., et al.: ReLACE: reinforcement learning agent for counterfactual explanations of arbitrary predictive models. arXiv preprint arXiv:2110.11960 (2021)
Degris, T., et al.: Learning the structure of factored Markov decision processes in reinforcement learning problems. In: ICML (2006)
Ernst, D., et al.: Tree-based batch mode reinforcement learning. JMLR 6 (2005)
Foerster, J., et al.: Stabilising experience replay for deep multi-agent reinforcement learning. In: ICML (2017)
Foerster, J., et al.: Counterfactual multi-agent policy gradients. In: AAAI (2018)
Heuillet, A., et al.: Collective explainable AI: explaining cooperative strategies and agent contribution in multiagent reinforcement learning with Shapley values. IEEE Comput. Intell. Mag. 17, 59–71 (2022)
Hinton, G., et al.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Iqbal, S., Sha, F.: Actor-attention-critic for multi-agent reinforcement learning. In: ICML (2019)
Kazhdan, D., et al.: MARLeME: a multi-agent reinforcement learning model extraction library. In: IJCNN (2020)
Li, S., et al.: Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient. In: AAAI (2019)
Li, W., et al.: SparseMAAC: sparse attention for multi-agent reinforcement learning. In: International Conference on Database Systems for Advanced Applications (2019)
Lipton, Z.: The mythos of model interpretability. ACM Queue 16(3) (2018)
Littman, M.: Markov games as a framework for multi-agent reinforcement learning. In: ICML (1994)
Lowe, R., et al.: Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275 (2017)
Luss, R., et al.: Local explanations for reinforcement learning. arXiv preprint arXiv:2202.03597 (2022)
Malialis, K., Kudenko, D.: Distributed response to network intrusions using multiagent reinforcement learning. Eng. Appl. Artif. Intell. 40, 270–284 (2015)
Matignon, L., et al.: Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. Knowl. Eng. Rev. 27(1), 1–31 (2012)
McCallum, R.: Reinforcement learning with selective perception and hidden state. Ph.D. thesis, Univ. Rochester, Dept. of Comp. Sci. (1997)
Meng, Z., et al.: Interpreting deep learning-based networking systems. In: SIGCOMM (2020)
Milani, S., et al.: A survey of explainable reinforcement learning. arXiv preprint arXiv:2202.08434 (2022)
Mohanty, S., et al.: Flatland-rl: multi-agent reinforcement learning on trains. arXiv preprint arXiv:2012.05893 (2020)
Molnar, C.: Interpretable Machine Learning (2019)
Motokawa, Y., Sugawara, T.: MAT-DQN: toward interpretable multi-agent deep reinforcement learning for coordinated activities. In: ICANN (2021)
Oliehoek, F., et al.: Optimal and approximate Q-value functions for decentralized POMDPs. JAIR 32, 289–353 (2008)
Paszke, A., et al.: Automatic differentiation in PyTorch (2017)
Pyeatt, L.: Reinforcement learning with decision trees. In: Appl. Informatics (2003)
Pyeatt, L., Howe, A.: Decision tree function approximation in reinforcement learning. In: Int. Symp. on Adaptive Syst.: Evol. Comput. and Prob. Graphical Models (2001)
Quinlan, J.: Induction of decision trees. Mach. Learn. 1, 81–106 (1986)
Rashid, T., et al.: QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. In: ICML (2018)
Ross, S., et al.: A reduction of imitation learning and structured prediction to no-regret online learning. In: AISTATS (2011)
Roth, A., et al.: Conservative q-improvement: reinforcement learning for an interpretable decision-tree policy. arXiv preprint arXiv:1907.01180 (2019)
Shapley, L.: Stochastic games. PNAS 39(10), 1095–1100 (1953)
Son, K., et al.: QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. arXiv preprint arXiv:1905.05408 (2019)
Strehl, A., et al.: Efficient structure learning in factored-state MDPs. In: AAAI (2007)
Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 (2017)
Topin, N., et al.: Iterative bounding mdps: learning interpretable policies via non-interpretable methods. In: AAAI (2021)
Tuyls, K., et al.: Reinforcement learning in large state spaces. In: Robot Soccer World Cup (2002)
Uther, W., Veloso, M.: The lumberjack algorithm for learning linked decision forests. In: International Symposium on Abstraction, Reformulation, and Approximation (2000)
Vasic, M., et al.: MoĂ«T: interpretable and verifiable reinforcement learning via mixture of expert trees. arXiv preprint arXiv:1906.06717 (2019)
Wang, T., et al.: Dataset distillation. arXiv preprint arXiv:1811.10959 (2018)
Wang, X., et al.: Explanation of reinforcement learning model in dynamic multi-agent system. arXiv preprint arXiv:2008.01508 (2020)
Yu, C., et al.: The surprising effectiveness of MAPPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955 (2021)
Acknowledgements
This material is based upon work supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program. This research was sponsored by the U.S. Army Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the funding agencies or government agencies. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Milani, S. et al. (2023). MAVIPER: Learning Decision Tree Policies for Interpretable Multi-agent Reinforcement Learning. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13716. Springer, Cham. https://doi.org/10.1007/978-3-031-26412-2_16
DOI: https://doi.org/10.1007/978-3-031-26412-2_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26411-5
Online ISBN: 978-3-031-26412-2