Abstract
Multi-agent deep reinforcement learning (MADRL) has made remarkable progress but usually requires careful and fragile reward engineering. Modeling other agents (MOA) is an effective way to compensate for the lack of informative reward signals. However, existing MOA methods often assume that a single agent models the other agents, which themselves do not learn. In this study, we propose continuous mutual modeling (CMM), in which every agent constantly models the other agents, which are also learning appropriate behaviors from their own viewpoints, to facilitate coordination among agents in complex MADRL environments. We then propose a CMM framework, referred to as predictor-actor-critic (PAC), in which every agent determines its actions by estimating those of the other agents through mutual modeling. We experimentally show that the proposed method enables agents to recognize other agents' activities and promotes the emergence of better-coordinated behaviors in the agent society.
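The abstract only outlines the PAC architecture, but a minimal sketch may help clarify the idea: each agent carries a predictor head that estimates the other agents' next actions, and its actor is conditioned on both its own observation encoding and those predictions, while a critic provides a value estimate for the policy update. The PyTorch sketch below is an illustrative assumption based solely on the abstract; the class name PACAgent, all layer sizes, and the way predictions are concatenated into the actor input are hypothetical and not taken from the paper.

# A minimal, hypothetical sketch of a predictor-actor-critic (PAC) agent,
# based only on the abstract's description: each agent first predicts the
# other agents' actions (mutual modeling) and then conditions its own
# policy on those predictions. Module names, sizes, and wiring are
# illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class PACAgent(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, n_others: int, hidden: int = 64):
        super().__init__()
        self.n_others = n_others
        self.n_actions = n_actions
        # Shared observation encoder.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Predictor: estimates a distribution over each other agent's next action.
        self.predictor = nn.Linear(hidden, n_others * n_actions)
        # Actor: conditioned on the encoded observation plus the predictions.
        self.actor = nn.Linear(hidden + n_others * n_actions, n_actions)
        # Critic: state-value estimate for an advantage-based policy update.
        self.critic = nn.Linear(hidden, 1)

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        # Predicted action distributions of the other agents.
        pred_logits = self.predictor(h).view(-1, self.n_others, self.n_actions)
        pred_probs = pred_logits.softmax(dim=-1)
        # The actor sees its own encoding plus the (flattened) predictions.
        actor_in = torch.cat([h, pred_probs.flatten(start_dim=1)], dim=-1)
        policy_logits = self.actor(actor_in)
        value = self.critic(h)
        return policy_logits, value, pred_probs


# Usage: the predictor would presumably be trained with a cross-entropy loss
# against the other agents' actually observed actions, alongside the usual
# actor-critic objective.
agent = PACAgent(obs_dim=16, n_actions=5, n_others=3)
logits, value, preds = agent(torch.randn(8, 16))
action = torch.distributions.Categorical(logits=logits).sample()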
Acknowledgements
This work was partly supported by JSPS KAKENHI Grant Number 20H04245 and JST SPRING Grant Number JPMJSP2128.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Bai, Y., Sugawara, T. (2023). Imbalanced Equilibrium: Emergence of Social Asymmetric Coordinated Behavior in Multi-agent Games. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13624. Springer, Cham. https://doi.org/10.1007/978-3-031-30108-7_26
DOI: https://doi.org/10.1007/978-3-031-30108-7_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30107-0
Online ISBN: 978-3-031-30108-7