Abstract
When reinforcement learning is applied in a multi-agent environment, a credit-assignment problem arises, because it is often difficult to determine which agents actually contributed to a reward. If we reward all agents whenever a group of cooperating agents earns a reward, agents that did not contribute will also reinforce their policies. If, on the other hand, we reward only the obvious contributors, indirect contributions go unreinforced. As a first step toward easing this dilemma, we propose a classification of rewards and investigate its characteristics. We use a positioning task on the Soccer Server for our experiments. The empirical results show that direct reward takes effect faster and helps agents acquire individuality; in contrast, indirect reward takes effect more slowly, but agents tend to form a group and discover another effective positioning.
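The two reward schemes contrasted in the abstract can be sketched in code. The following is a minimal illustration, not the paper's actual formulation: the function names, the equal-sharing rule for indirect reward, and the tabular Q-learning update are all assumptions made for clarity.

```python
# Hypothetical sketch of "direct" vs. "indirect" credit assignment when a
# team of agents earns a shared reward. The equal-split rule for indirect
# reward is an illustrative assumption, not the paper's exact scheme.

def assign_rewards(team_reward, contributor, agents, scheme):
    """Split a team reward among agents under the given scheme."""
    if scheme == "direct":
        # Only the obvious contributor is rewarded; teammates get nothing,
        # so indirect contributions are never reinforced.
        return {a: (team_reward if a == contributor else 0.0) for a in agents}
    if scheme == "indirect":
        # Every agent shares the reward equally, including agents that
        # did not contribute, so non-contributors are also reinforced.
        share = team_reward / len(agents)
        return {a: share for a in agents}
    raise ValueError(f"unknown scheme: {scheme}")

def q_update(q, key, reward, next_max, alpha=0.1, gamma=0.9):
    """One-step tabular Q-learning update for a single (agent, state, action) key."""
    old = q.get(key, 0.0)
    q[key] = old + alpha * (reward + gamma * next_max - old)

# Example: agent "A" scores, but "B" and "C" created the opening.
agents = ["A", "B", "C"]
for scheme in ("direct", "indirect"):
    q = {}
    credits = assign_rewards(1.0, "A", agents, scheme)
    for agent, r in credits.items():
        q_update(q, (agent, "s0", "act"), r, next_max=0.0)
    print(scheme, q)
```

Under the direct scheme only agent A's Q-value moves; under the indirect scheme all three Q-values move by a smaller amount, which mirrors the trade-off the experiments examine.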
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Ohta, M. (2003). Direct Reward and Indirect Reward in Multi-agent Reinforcement Learning. In: Kaminka, G.A., Lima, P.U., Rojas, R. (eds) RoboCup 2002: Robot Soccer World Cup VI. RoboCup 2002. Lecture Notes in Computer Science(), vol 2752. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45135-8_31
Print ISBN: 978-3-540-40666-2
Online ISBN: 978-3-540-45135-8