Abstract
This paper proposes a simple, practical, and efficient MCMC algorithm for Bayesian analysis of big data. The proposed algorithm divides the big dataset into smaller subsets and provides a simple method to aggregate the subset posteriors to approximate the full-data posterior. To further speed up computation, it employs the population stochastic approximation Monte Carlo algorithm, a parallel MCMC algorithm, to simulate from each subset posterior. Since the algorithm involves two levels of parallelism, data parallelism and simulation parallelism, it is coined "Double-Parallel Monte Carlo." The validity of the proposed algorithm is justified both mathematically and numerically.
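The two levels described above can be sketched in code. The following is a minimal illustration, not the paper's implementation: a random-walk Metropolis sampler stands in for the population SAMC algorithm, a Gaussian mean model with a flat prior stands in for the model of interest, and the subset draws are naively pooled into a mixture, whereas the paper specifies its own aggregation rule.

```python
import numpy as np

def metropolis(log_post, theta0, n_iter=6000, step=0.1, rng=None):
    """Random-walk Metropolis sampler (a stand-in for pop-SAMC)."""
    rng = np.random.default_rng() if rng is None else rng
    theta, lp = theta0, log_post(theta0)
    draws = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.standard_normal()
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # accept/reject step
            theta, lp = prop, lp_prop
        draws[t] = theta
    return draws

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=10_000)      # the "big" dataset
subsets = np.array_split(x, 10)            # data-parallel level: k = 10 subsets

def log_subposterior(xj):
    # Flat prior, N(theta, 1) likelihood; each subset posterior is sampled
    # independently (in practice, on separate machines).
    return lambda th: -0.5 * np.sum((xj - th) ** 2)

subset_draws = [metropolis(log_subposterior(xj), xj.mean(), rng=rng)
                for xj in subsets]

# Aggregation: pool the post-burn-in subset draws into a mixture that
# approximates the full-data posterior.
mixture = np.concatenate([d[1000:] for d in subset_draws])
print(mixture.mean())   # close to the full-data posterior mean, x.mean()
```

In a real deployment the list comprehension over subsets would be replaced by genuinely parallel workers, and each worker would itself run a parallel population of chains, which is the second level of parallelism.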
Acknowledgements
Liang’s research was supported in part by Grants DMS-1545202, DMS-1612924 and R01-GM117597. The authors thank the Editor, Associate Editor, and two referees for their constructive comments, which have led to significant improvement of this paper.
Appendix A Proof of Theorem 1
Under conditions A1 and A2, we can expand the mean of each subposterior at the corresponding MLE, \(\hat{\varvec{\theta }}^{(j)}\), as follows:
\[
\mathrm{E}_{\tilde{\pi }_j}(\varvec{\theta }) = \hat{\varvec{\theta }}^{(j)} + \frac{1}{m}\hat{I}^{(j)-1}\left[ \frac{\partial \log g(\varvec{\theta })}{\partial \varvec{\theta }}\Big |_{\hat{\varvec{\theta }}^{(j)}} - \frac{1}{2}\hat{H}^{(j)}\hat{I}^{(j)-1}\right] + O_p(m^{-2}),
\]
where \(\tilde{\pi }_j=\tilde{\pi }({\varvec{\theta }}|{\varvec{X}}_{[j]})\), \(\hat{I}^{(j)}=-\frac{1}{m} \frac{\partial ^{2} \log f({\varvec{X}}_{[j]}|\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^{T}}|_{\varvec{\theta }=\hat{\varvec{\theta }}^{(j)}}\), \(\hat{H}^{(j)}=-\frac{1}{m}\frac{\partial ^{3} \log f({\varvec{X}}_{[j]}|\varvec{\theta })}{\partial \varvec{\theta } \partial \varvec{\theta }^{T}\partial \varvec{\theta }}|_{\varvec{\theta }=\hat{\varvec{\theta }}^{(j)}}\), and \(\hat{H}^{(j)}\hat{I}^{(j)-1}\) is a vector whose rth element equals \(\sum _{st}\hat{H}_{rst}^{(j)}\hat{I}_{st}^{(j)-1}\). To simplify the notation, we denote \(\hat{I}^{(j)-1}[\partial \log g({\varvec{\theta }})/\partial {\varvec{\theta }}|_{\hat{{\varvec{\theta }}}^{(j)}} -\frac{1}{2}\hat{H}^{(j)}\hat{I}^{(j)-1}]\) by \(\varvec{\nu }^{(j)}\), so the expansion reads \(\mathrm{E}_{\tilde{\pi }_j}(\varvec{\theta }) = \hat{\varvec{\theta }}^{(j)} + \frac{1}{m}\varvec{\nu }^{(j)} + O_p(m^{-2})\). We note that although A1 and A2 are stated for the Laplace approximation of the full posterior, they also imply the conditions that guarantee the Laplace approximation of each subposterior, as long as m goes to \(\infty \) with n.
Moreover, for each \(\hat{\varvec{\theta }}^{(j)}\), we have
\[
\hat{\varvec{\theta }}^{(j)} = \varvec{\theta }^{*} + \frac{1}{\sqrt{m}}\varvec{\xi }^{(j)} + O_p(m^{-1}),
\]
where
\[
\varvec{\xi }^{(j)} = \frac{1}{\sqrt{m}} I^{-1} \sum _{X_i \in {\varvec{X}}_{[j]}} \frac{\partial \log f(X_{i}|\varvec{\theta }^{*})}{\partial \varvec{\theta }}.
\]
Therefore, the mean of the mixture distribution \(\tilde{\pi }(\varvec{\theta }|{\varvec{X}})\) is given by
\[
\mathrm{E}_{\tilde{\pi }}(\varvec{\theta }|{\varvec{X}}) = \frac{1}{k}\sum _{j=1}^{k} \mathrm{E}_{\tilde{\pi }_j}(\varvec{\theta }) = \frac{1}{k}\sum _{j=1}^{k}\hat{\varvec{\theta }}^{(j)} + \frac{1}{km}\sum _{j=1}^{k}\varvec{\nu }^{(j)} + O_p(m^{-2}).
\]
We can also implement a similar procedure on the full data posterior \(\pi (\varvec{\theta }|{\varvec{X}})\) and obtain
\[
\mathrm{E}_{\pi }(\varvec{\theta }|{\varvec{X}}) = \hat{\varvec{\theta }} + \frac{1}{n}\varvec{\nu } + O_p(n^{-2}),
\]
where \(\varvec{\nu }=\hat{I}^{-1}[\partial \log g({\varvec{\theta }})/\partial {\varvec{\theta }}|_{\hat{{\varvec{\theta }}} } -\frac{1}{2}\hat{H}\hat{I}^{-1}]\), \(\varvec{\xi }=\frac{1}{\sqrt{n}}I^{-1}\sum _{i=1}^{n}\frac{\partial \log f(X_{i}|\varvec{\theta }^{*})}{\partial \varvec{\theta }}\), \(\hat{I}=-\frac{1}{n} \frac{\partial ^{2} \log f({\varvec{X}}|\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^{T}}|_{\varvec{\theta }=\hat{\varvec{\theta }}}\), \(\hat{H}=-\frac{1}{n}\frac{\partial ^{3} \log f({\varvec{X}}|\varvec{\theta })}{\partial \varvec{\theta } \partial \varvec{\theta }^{T}\partial \varvec{\theta }}|_{\varvec{\theta }=\hat{\varvec{\theta }}}\), and \(\hat{\varvec{\theta }}\) denotes the MLE of \(\varvec{\theta }\) calculated with the full dataset. By noting that \(\frac{\varvec{\xi }}{\sqrt{n}}=\frac{1}{k}\sum _{j=1}^{k}\frac{\varvec{\xi }^{(j)}}{\sqrt{m}}\), that \(\varvec{\nu }^{(j)}\) and \(\varvec{\nu }\) are O(1), and that \(m<n\), Eq. (4) in Theorem 1 is thereby verified:
\[
\mathrm{E}_{\tilde{\pi }}(\varvec{\theta }|{\varvec{X}}) - \mathrm{E}_{\pi }(\varvec{\theta }|{\varvec{X}}) = \frac{1}{k}\sum _{j=1}^{k}\hat{\varvec{\theta }}^{(j)} - \hat{\varvec{\theta }} + \frac{1}{km}\sum _{j=1}^{k}\varvec{\nu }^{(j)} - \frac{1}{n}\varvec{\nu } + O_p(m^{-2}) = O_p\left( \frac{1}{m}\right) .
\]
The variance of each subposterior can be approximated as follows:
\[
\mathrm{Var}_{\tilde{\pi }_j}(\varvec{\theta }) = \frac{1}{m}\hat{I}^{(j)-1} + O_p(m^{-2}).
\]
Therefore, the variance of the mixture distribution \(\tilde{\pi }(\varvec{\theta }|{\varvec{X}})\) is given by
\[
\mathrm{Var}_{\tilde{\pi }}(\varvec{\theta }|{\varvec{X}}) = \frac{1}{k}\sum _{j=1}^{k}\mathrm{Var}_{\tilde{\pi }_j}(\varvec{\theta }) + \frac{1}{k}\sum _{j=1}^{k}\left[ \mathrm{E}_{\tilde{\pi }_j}(\varvec{\theta })-\mathrm{E}_{\tilde{\pi }}(\varvec{\theta }|{\varvec{X}})\right] \left[ \mathrm{E}_{\tilde{\pi }_j}(\varvec{\theta })-\mathrm{E}_{\tilde{\pi }}(\varvec{\theta }|{\varvec{X}})\right] ^{T}.
\]
In addition, we have
\[
\mathrm{Var}_{\pi }(\varvec{\theta }|{\varvec{X}}) = \frac{1}{n}\hat{I}^{-1} + O_p(n^{-2}).
\]
Since \(\hat{I}^{(j)}=I+o_{p}(1)\) and \(\hat{I}=I+o_{p}(1)\), we further have \(\hat{I}^{(j)-1}=I^{-1}+o_{p}(1)\) and \(\hat{I}^{-1}=I^{-1}+o_{p}(1)\). Equation (5) in Theorem 1 is thereby verified:
\[
\mathrm{Var}_{\tilde{\pi }}(\varvec{\theta }|{\varvec{X}}) - \mathrm{Var}_{\pi }(\varvec{\theta }|{\varvec{X}}) = O_p\left( \frac{1}{m}\right) .
\]
Finally, for the square of the Wasserstein distance of order 2, we have
where the last equality follows from the fact \(n=o(m^{2})\). This verifies Eq. (6) in Theorem 1.
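As a quick numerical sanity check on the orders appearing in the proof, consider the conjugate Gaussian case, where every quantity is available in closed form. The flat prior and N(\(\theta\), 1) likelihood below are illustrative assumptions, not the paper's setting: the \(\varvec{\xi }\)-type terms cancel exactly in the mixture mean, while the mixture variance deviates from the full-data posterior variance at order 1/m.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100_000, 20
m = n // k
x = rng.normal(2.0, 1.0, size=n)
subsets = x.reshape(k, m)

# Flat prior, N(theta, 1) likelihood: subset posterior j is exactly
# N(mean(x_j), 1/m); the full-data posterior is N(mean(x), 1/n).
sub_means = subsets.mean(axis=1)

# Mixture mean equals the full posterior mean exactly (the xi terms cancel).
print(abs(sub_means.mean() - x.mean()))

# Mixture variance = within-component part + between-component part,
# which exceeds the full posterior variance 1/n by a term of order 1/m.
mix_var = 1.0 / m + sub_means.var()
print(mix_var - 1.0 / n)
```

The second printed quantity is roughly 2/m here: the within-component variance contributes 1/m and the spread of the subset means contributes another term of the same order, matching the O_p(1/m) variance gap derived above.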
Xue, J., Liang, F. Double-Parallel Monte Carlo for Bayesian analysis of big data. Stat Comput 29, 23–32 (2019). https://doi.org/10.1007/s11222-017-9791-1