Abstract
Multi-agent deep reinforcement learning (MADRL) has made remarkable progress but usually requires careful and fragile reward engineering. Modeling other agents (MOA) is an effective way to compensate for the lack of informative reward signals. However, existing MOA methods often assume that a single agent models the other agents, which themselves do not learn. In this study, we propose continuous mutual modeling (CMM), in which every agent constantly models the other agents, which are also learning appropriate behaviors from their own viewpoints, to facilitate coordination among agents in complex MADRL environments. We then propose a CMM framework, referred to as predictor-actor-critic (PAC), in which every agent determines its actions by estimating those of the other agents through mutual modeling. We experimentally show that the proposed method enables agents to recognize other agents' activities and promotes the emergence of better-coordinated behaviors in the agent society.
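The abstract only outlines the PAC architecture, but a minimal sketch may help clarify the idea: each agent carries a predictor head that estimates the other agents' next actions, and its actor is conditioned on both its own observation encoding and those predictions, while a critic provides a value estimate for the policy update. The PyTorch sketch below is an illustrative assumption based solely on the abstract; the class name PACAgent, all layer sizes, and the way predictions are concatenated into the actor input are hypothetical and not taken from the paper.

# A minimal, hypothetical sketch of a predictor-actor-critic (PAC) agent,
# based only on the abstract's description: each agent first predicts the
# other agents' actions (mutual modeling) and then conditions its own
# policy on those predictions. Module names, sizes, and wiring are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class PACAgent(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, n_others: int, hidden: int = 64):
        super().__init__()
        self.n_others = n_others
        self.n_actions = n_actions
        # Shared observation encoder.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Predictor: estimates a distribution over each other agent's next action.
        self.predictor = nn.Linear(hidden, n_others * n_actions)
        # Actor: conditioned on the encoded observation plus the predictions.
        self.actor = nn.Linear(hidden + n_others * n_actions, n_actions)
        # Critic: state-value estimate for an advantage-based policy update.
        self.critic = nn.Linear(hidden, 1)

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        # Predicted action distributions of the other agents.
        pred_logits = self.predictor(h).view(-1, self.n_others, self.n_actions)
        pred_probs = pred_logits.softmax(dim=-1)
        # The actor sees its own encoding plus the (flattened) predictions.
        actor_in = torch.cat([h, pred_probs.flatten(start_dim=1)], dim=-1)
        policy_logits = self.actor(actor_in)
        value = self.critic(h)
        return policy_logits, value, pred_probs


# Usage: the predictor would presumably be trained with a cross-entropy loss
# against the other agents' actually observed actions, alongside the usual
# actor-critic objective.
agent = PACAgent(obs_dim=16, n_actions=5, n_others=3)
logits, value, preds = agent(torch.randn(8, 16))
action = torch.distributions.Categorical(logits=logits).sample()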
Acknowledgements
This work was partly supported by JSPS KAKENHI Grant Number 20H04245 and JST SPRING Grant Number JPMJSP2128.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bai, Y., Sugawara, T. (2023). Imbalanced Equilibrium: Emergence of Social Asymmetric Coordinated Behavior in Multi-agent Games. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13624. Springer, Cham. https://doi.org/10.1007/978-3-031-30108-7_26
DOI: https://doi.org/10.1007/978-3-031-30108-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30107-0
Online ISBN: 978-3-031-30108-7