Definition
Learning graphical models (see Graphical Models) means learning a graphical representation of either a causal or probabilistic model over the variables X_j ∈ {X_i}. Although graphical models include more than directed acyclic graphs (DAGs), the focus here is on learning DAGs, as that is where the majority of research and application takes place.
Definition 1 (Directed acyclic graph (DAG)).
A directed acyclic graph (DAG) is a set of variables (nodes, vertices) {X_i} and a set of directed arcs (edges) between them such that following the arcs in their given direction can never lead from a variable back to itself.
DAGs parameterized to represent probability distributions are otherwise known as Bayesian networks. Some necessary concepts and notation for discussing the learning of graphical models are given in Table 1.
Recommended Reading
The PC algorithm and its variants were first documented in Spirtes et al. (1993); the second edition (Spirtes, Glymour, & Scheines, 2000) covers more ground. Their TETRAD IV program is available from their website at http://www.phil.cmu.edu/projects/tetrad/. PC is included in TETRAD IV and is also available with the Weka machine learning platform at http://www.cs.waikato.ac.nz/ml/weka/.
Heckerman's well-known tutorial (1999; reprinted without change as Heckerman, 2008) is worth consulting for background on causal discovery and on parameterizing Bayesian networks. A more current review of many of the topics introduced here can be found in a forthcoming article (Daly, Shen, & Aitken, forthcoming). For other good treatments of parameterization, see Cowell, Dawid, Lauritzen, and Spiegelhalter (1999) or Neapolitan (2003).
There are a number of useful anthologies in the area of learning graphical models. Learning in Graphical Models (Jordan, 1999) is one of the best, including Heckerman's tutorial and a variety of excellent reviews of causal discovery methods, such as Markov chain Monte Carlo search techniques.
Textbooks treating the learning of Bayesian networks include Borgelt and Kruse (2002), Neapolitan (2003), Korb and Nicholson (2004), and Koller and Friedman (2009).
Aliferis, C. F., Statnikov, A., Tsamardinos, I., Mani, S., & Koutsoukos, X. D. (2010a). Local causal and Markov blanket induction for causal discovery and feature selection for classification. Part I: Algorithms and empirical evaluation. Journal of Machine Learning Research, 11, 171–234.
Aliferis, C. F., Statnikov, A., Tsamardinos, I., Mani, S., & Koutsoukos, X. D. (2010b). Local causal and Markov blanket induction for causal discovery and feature selection for classification. Part II: Analysis and extensions. Journal of Machine Learning Research, 11, 235–284.
Borgelt, C., & Kruse, R. (2002). Graphical models: Methods for data analysis and mining. New York: Wiley.
Bouckaert, R. (1993). Probabilistic network construction using the minimum description length principle. Lecture Notes in Computer Science, 747, 41–48.
Chickering, D. M. (1995). A transformational characterization of equivalent Bayesian network structures. In P. Besnard & S. Hanks (Eds.), Proceedings of the 11th conference on uncertainty in artificial intelligence, San Francisco (pp. 87–98).
Chickering, D. M., & Heckerman, D. (1997). Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables. Machine Learning, 29, 181–212.
Chickering, D. M., Heckerman, D., & Meek, C. (2004). Large-sample learning of Bayesian networks is NP-hard. Journal of Machine Learning Research, 5, 1287–1330.
Chickering, D. M., & Meek, C. (2002). Finding optimal Bayesian networks. In Proceedings of the 18th annual conference on uncertainty in AI, San Francisco (pp. 94–102).
Chow, C., & Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14, 462–467.
Cowell, R. G., Dawid, A. P., Lauritzen, S. L., & Spiegelhalter, D. J. (1999). Probabilistic networks and expert systems. New York: Springer.
Cruz-Ramírez, N., Acosta-Mesa, H. G., Barrientos-Martínez, R. E., & Nava-Fernández, L. A. (2006). How good are the Bayesian information criterion and the Minimum Description Length principle for model selection? A Bayesian network analysis. Lecture Notes in Computer Science, 4293, 494–504.
Daly, R., Shen, Q., & Aitken, S. (forthcoming). Learning Bayesian networks: Approaches and issues. The Knowledge Engineering Review.
Dash, D., & Cooper, G. F. (2004). Model averaging for prediction with discrete Bayesian networks. Journal of Machine Learning Research, 5, 1177–1203.
Guyon, I., Aliferis, C., Cooper, G., Elisseeff, A., Pellet, J.-P., Spirtes, P., et al. (Eds.) (2008). Causation and prediction challenge (WCCI 2008). JMLR workshop and conference proceedings (Vol. 3). Journal of Machine Learning Research.
Heckerman, D. (1999). A tutorial on learning with Bayesian networks. In M. Jordan, (Ed.), Learning in graphical models (pp. 301–354). Cambridge: MIT.
Heckerman, D. (2008). A tutorial on learning with Bayesian networks. In Innovations in Bayesian networks (pp. 33–82). Berlin: Springer Verlag.
Heckerman, D., Geiger, D., & Chickering, D. M. (1994). Learning Bayesian networks: The combination of knowledge and statistical data. In R. López de Mántaras & D. Poole (Eds.), Proceedings of the tenth conference on uncertainty in artificial intelligence, San Francisco (pp. 293–301).
Jordan, M. I. (1999). Learning in graphical models. Cambridge, MA: MIT.
Koller, D., & Friedman, N. (2009). Probabilistic graphical models: Principles and techniques. Cambridge, MA: MIT.
Korb, K. B., & Nicholson, A. E. (2004). Bayesian artificial intelligence. Boca Raton, FL: CRC.
Larrañaga, P., Kuijpers, C. M. H., Murga, R. H., & Yurramendi, Y. (1996). Learning Bayesian network structures by searching for the best ordering with genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics, Part A, 26, 487–493.
Margaritis, D., & Thrun, S. (2000). Bayesian network induction via local neighborhoods. In S. A. Solla, T. K. Leen, & K. R. Müller (Eds.), Advances in neural information processing systems (Vol. 12, pp. 505–511). Cambridge: MIT.
Nägele, A., Dejori, M., & Stetter, M. (2007). Bayesian substructure learning – Approximate learning of very large network structures. In Proceedings of the 18th European conference on machine learning; Lecture notes in AI (Vol. 4701, pp. 238–249). Warsaw, Poland.
Neapolitan, R. E. (2003). Learning Bayesian networks. Upper Saddle River, NJ: Prentice Hall.
Neil, J. R., & Korb, K. B. (1999). The evolution of causal models. In N. Zhong & L. Zhou (Eds.), Third Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-99), Beijing, China (pp. 432–437). New York: Springer.
O’Donnell, R., Nicholson, A., Han, B., Korb, K., Alam, M., & Hope, L. (2006). Causal discovery with prior information. In Australasian joint conference on artificial intelligence, (pp. 1162–1167). New York: Springer.
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction and search. Lecture notes in statistics (Vol. 81). New York: Springer.
Spirtes, P., Glymour, C., & Scheines, R. (2000). Causation, prediction and search (2nd ed.). Cambridge: MIT.
Suzuki, J. (1996). Learning Bayesian belief networks based on the minimum description length principle. In L. Saitta (Ed.), Proceedings of the 13th international conference on machine learning, San Francisco (pp. 462–470). Morgan Kaufmann.
Suzuki, J. (1999). Learning Bayesian belief networks based on the MDL principle: An efficient algorithm using the branch and bound technique. IEICE Transactions on Information and Systems, 82, 356–367.
Tsamardinos, I., Brown, L. E., & Aliferis, C. F. (2006). The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1), 31–78.
Verma, T. S., & Pearl, J. (1990). Equivalence and synthesis of causal models. In Proceedings of the sixth conference on uncertainty in AI, Boston (pp. 220–227). San Mateo, CA: Morgan Kaufmann.
Wallace, C. S., Korb, K. B., & Dai, H. (1996). Causal discovery via MML. In L. Saitta (Ed.), Proceedings of the 13th international conference on machine learning, San Francisco (pp. 516–524). Morgan Kaufmann.
Wong, M. L., Lam, W., & Leung, K. S. (1999). Using evolutionary programming and minimum description length principle for data mining of Bayesian networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(2), 174–178.