Abstract
RRL is a relational reinforcement learning system based on Q-learning in relational state-action spaces. It aims to enable agents to learn how to act in an environment that has no natural representation as a tuple of constants. For relational reinforcement learning, the learning algorithm used to approximate the mapping between state-action pairs and their so-called Q(uality)-values must be both highly reliable and able to handle the relational representation of state-action pairs. In this paper we investigate the use of Gaussian processes to approximate the Q-values of state-action pairs. In order to employ Gaussian processes in a relational setting, we propose graph kernels as a covariance function between state-action pairs. The standard prediction mechanism for Gaussian processes requires a matrix inversion, which can become numerically unstable when the kernel matrix has low rank. These instabilities can be avoided by employing QR-factorization, which leads to better and more stable performance of the algorithm and to a more efficient incremental update mechanism. Experiments conducted in the blocks world and with the Tetris game show that Gaussian processes with graph kernels can compete with, and often improve on, regression trees and instance-based regression as a generalization algorithm for RRL.
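To make the graph-kernel idea concrete, the following is a minimal sketch of a truncated geometric walk kernel computed on the direct product of two unlabeled graphs, in the spirit of the walk-counting kernels the paper builds on. The function name `product_walk_kernel`, the decay `beta`, the truncation depth `n_steps`, and the restriction to unlabeled graphs are illustrative assumptions; the kernel actually used on the graph encodings of state-action pairs also accounts for vertex and edge labels.

```python
import numpy as np

def product_walk_kernel(A1, A2, beta=0.1, n_steps=8):
    """Truncated geometric walk kernel between two unlabeled graphs.

    A1, A2  : 0/1 adjacency matrices of the two graphs.
    beta    : decay weight for longer walks (illustrative value).
    n_steps : truncation depth of the geometric series (illustrative).

    Length-n walks in the direct product graph correspond to pairs of
    length-n walks, one in each factor graph, so summing the entries of
    I + (beta*Ax) + (beta*Ax)^2 + ... counts common walks of the graphs.
    """
    Ax = np.kron(A1, A2)            # adjacency matrix of the direct product graph
    term = np.eye(Ax.shape[0])      # (beta * Ax)^0
    total = np.zeros_like(term)
    for _ in range(n_steps + 1):
        total += term
        term = beta * (term @ Ax)
    return float(total.sum())       # sum over all start/end vertex pairs

# e.g. kernel value between a path on 3 vertices and a triangle
P2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
C3 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(product_walk_kernel(P2, C3))
```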
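The stability point can likewise be sketched in a few lines: rather than inverting the kernel matrix explicitly, factor it as QR and solve the resulting triangular system by back-substitution. This is only an illustration of the idea, not the paper's implementation; the `noise` regularization term and the function name are assumptions added here, and the paper's incremental update of the factorization is omitted.

```python
import numpy as np

def gp_q_mean(K, k_star, y, noise=1e-6):
    """GP mean prediction of a Q-value via QR factorization.

    K      : (n, n) kernel matrix over stored state-action pairs,
             e.g. computed with a graph kernel on their graph encodings
    k_star : (n,) kernel values between the query pair and stored pairs
    y      : (n,) observed Q-values
    noise  : assumed observation-noise variance; also regularizes a
             (near-)rank-deficient K

    The GP mean is k_*^T (K + noise*I)^{-1} y.  Instead of forming the
    inverse, factor A = Q R and solve R * alpha = Q^T y.
    """
    A = K + noise * np.eye(K.shape[0])
    Q, R = np.linalg.qr(A)               # A = Q R, Q orthogonal, R upper triangular
    alpha = np.linalg.solve(R, Q.T @ y)  # triangular solve; no explicit inverse
    return float(k_star @ alpha)
```

As a design note, a QR factorization costs somewhat more than a Cholesky decomposition, but it degrades gracefully when many state-action pairs are (nearly) kernel-equivalent and the kernel matrix is therefore close to singular; appending a new training example can then be handled by updating the existing factors rather than refactoring from scratch, which is consistent with the incremental update mechanism the abstract refers to.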
Cite this article
Driessens, K., Ramon, J. & Gärtner, T. Graph kernels and Gaussian processes for relational reinforcement learning. Mach Learn 64, 91–119 (2006). https://doi.org/10.1007/s10994-006-8258-y