Abstract
Reinforcement learning (RL) concerns the problem of a learning agent interacting with its environment to achieve a goal. Instead of being given examples of desired behavior, the learning agent must discover by trial and error how to behave in order to get the most reward. RL has become popular as an approach to artificial intelligence because of its simple algorithms and mathematical foundations (Watkins, 1989; Sutton, 1988; Bertsekas and Tsitsiklis, 1996) and because of a string of strikingly successful applications (e.g., Tesauro, 1995; Crites and Barto, 1996; Zhang and Dietterich, 1996; Nie and Haykin, 1996; Singh and Bertsekas, 1997; Baxter, Tridgell, and Weaver, 1998). An overall introduction to the field is provided by a recent textbook (Sutton and Barto, 1998). Here we summarize three stages in the development of the field, which we coarsely characterize as the past, present, and future of reinforcement learning.
The slides used in the talk corresponding to this extended abstract can be found at http://envy.cs.umass.edu/~rich/SEAL98/sld001.htm.
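To make the trial-and-error setting concrete, here is a minimal tabular Q-learning sketch in the spirit of Watkins (1989). The toy corridor environment, step size, discount factor, and epsilon-greedy exploration used below are illustrative assumptions for this summary, not details from the original abstract.

import random

N_STATES = 5          # states 0..4; state 4 is terminal and pays reward +1
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA = 0.1           # step-size (learning-rate) parameter
GAMMA = 0.95          # discount factor
EPSILON = 0.1         # exploration probability

def step(state, action):
    """Toy corridor dynamics: move left or right; reaching the right end pays +1."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

def epsilon_greedy(q, state):
    """Trial-and-error action selection: mostly greedy, occasionally random,
    with random tie-breaking so early exploration is unbiased."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=500):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state)
            next_state, reward, done = step(state, action)
            # One-step Q-learning update:
            # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q

if __name__ == "__main__":
    learned_q = q_learning()
    greedy_policy = [max(ACTIONS, key=lambda a: learned_q[(s, a)]) for s in range(N_STATES)]
    print("Greedy action per state:", greedy_policy)

Running the sketch prints the greedy action in each state after learning; on this toy task the agent discovers, from reward alone, that moving right toward the terminal state is best.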
References
Baxter, J., Tridgell, A., and Weaver, L. (1998). KnightCap: A chess program that learns by combining TD(λ) with game-tree search. Proceedings of the Fifteenth International Conference on Machine Learning, pp. 28–36.
Bertsekas, D. P., and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
Crites, R. H., and Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 9, pp. 1017–1023. MIT Press, Cambridge, MA.
McCallum, A. K. (1995). Reinforcement Learning with Selective Perception and Hidden State. Ph.D. thesis, University of Rochester.
Nie, J., and Haykin, S. (1996). A dynamic channel assignment policy through Q-learning. CRL Report 334, Communications Research Laboratory, McMaster University, Hamilton, Ontario.
Precup, D., and Sutton, R. S. (1998). Multi-time models for temporally abstract planning. In Advances in Neural Information Processing Systems 11. MIT Press, Cambridge, MA.
Singh, S. P., and Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems 10, pp. 974–980. MIT Press, Cambridge, MA.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9–44.
Sutton, R. S., and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
Sutton, R. S., Precup, D., and Singh, S. (1998). Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales. Technical Report 98-74, Department of Computer Science, University of Massachusetts.
Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58–68.
Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph.D. thesis, Cambridge University.
Zhang, W., and Dietterich, T. G. (1996). High-performance job-shop scheduling with a time delay TD(λ) network. In Advances in Neural Information Processing Systems 9, pp. 1024–1030. MIT Press, Cambridge, MA.
© 1999 Springer-Verlag Berlin Heidelberg
Cite this paper
Sutton, R.S. (1999). Reinforcement Learning: Past, Present and Future. In: McKay, B., Yao, X., Newton, C.S., Kim, JH., Furuhashi, T. (eds) Simulated Evolution and Learning. SEAL 1998. Lecture Notes in Computer Science(), vol 1585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48873-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65907-5
Online ISBN: 978-3-540-48873-6