Abstract
The purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method, with emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing an introduction to the remaining papers of this special issue. Lastly, it discusses interesting new research horizons.
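As context for the MCMC building blocks mentioned above, the sketch below illustrates a basic random-walk Metropolis-Hastings sampler, the core mechanism underlying most MCMC algorithms. This example is not taken from the paper; the function metropolis_hastings, its parameters, and the Gaussian-mixture target are hypothetical choices made only for illustration, assuming a one-dimensional target density known up to a normalising constant.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples, proposal_std=1.0, rng=None):
    """Random-walk Metropolis-Hastings sampler for a 1-D target density.

    log_target: log of the (possibly unnormalised) target density.
    x0: initial state of the chain.
    proposal_std: standard deviation of the symmetric Gaussian proposal.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty(n_samples)
    x, log_p = x0, log_target(x0)
    for i in range(n_samples):
        # Propose a candidate from a symmetric Gaussian random walk.
        x_new = x + proposal_std * rng.standard_normal()
        log_p_new = log_target(x_new)
        # Accept with probability min(1, p(x_new) / p(x)); work in log space.
        if np.log(rng.uniform()) < log_p_new - log_p:
            x, log_p = x_new, log_p_new
        samples[i] = x
    return samples

# Illustrative target: an unnormalised, equal-weight mixture of two Gaussians.
log_target = lambda x: np.logaddexp(-0.5 * (x + 2.0) ** 2, -0.5 * (x - 2.0) ** 2)
samples = metropolis_hastings(log_target, x0=0.0, n_samples=10_000)
print(samples.mean(), samples.std())
```

Because the proposal is symmetric, the acceptance ratio reduces to the ratio of target densities; tuning proposal_std trades off acceptance rate against how quickly the chain explores the target.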