Abstract
When multiple data owners possess records on different subjects with the same set of attributes—known as horizontally partitioned data—the data owners can improve analyses by concatenating their databases. However, concatenation of data may be infeasible because of confidentiality concerns. In such settings, the data owners can use secure computation techniques to obtain the results of certain analyses on the integrated database without sharing individual records. We present secure computation protocols for Bayesian model averaging and model selection for both linear regression and probit regression. Using simulations based on genuine data, we illustrate the approach for probit regression, and show that it can provide reasonable model selection outputs.
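As a rough illustration of the idea of analysing the integrated database without sharing records, the sketch below pools additive summary statistics across owners using a ring-style secure summation with a random mask. It is not the protocol of this paper: the simple one-predictor least-squares fit, the fixed-point encoding, and all names (`MODULUS`, `SCALE`, `secure_sum`, `local_stats`, `pooled_ols`) are assumptions made for illustration, and the code simulates all owners in one process rather than exchanging messages.

```python
import random

MODULUS = 2**61 - 1   # assumed public modulus, far larger than any true total
SCALE = 10**6         # assumed fixed-point scale so real values become integers


def encode(value):
    """Map a real number to an integer residue modulo MODULUS."""
    return round(value * SCALE) % MODULUS


def decode(residue):
    """Invert encode(); residues above MODULUS // 2 represent negatives."""
    if residue > MODULUS // 2:
        residue -= MODULUS
    return residue / SCALE


def secure_sum(local_values, rng):
    """Ring-style secure summation (simulated in a single process).

    The first owner adds a secret random mask to its value; each later
    owner adds its own value to the masked running total; the first
    owner finally removes the mask. No owner sees another's raw value.
    """
    mask = rng.randrange(MODULUS)  # known only to the first owner
    running = (mask + encode(local_values[0])) % MODULUS
    for value in local_values[1:]:
        running = (running + encode(value)) % MODULUS
    return decode((running - mask) % MODULUS)


def local_stats(xs, ys):
    """Additive summaries one owner computes on its own records."""
    return [
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(x * y for x, y in zip(xs, ys)),
    ]


def pooled_ols(owners, rng=None):
    """Least-squares intercept and slope from securely pooled summaries."""
    rng = rng or random.SystemRandom()
    stats = [local_stats(xs, ys) for xs, ys in owners]
    n, sx, sy, sxx, sxy = (
        secure_sum([s[j] for s in stats], rng) for j in range(5)
    )
    slope = (n * sxy - sx * sy) / (n * sxx - sx**2)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Two owners hold different subjects measured on the same attributes (x, y).
owners = [([0, 1], [1, 3]), ([2, 3], [5, 7])]
print(pooled_ols(owners, random.Random(0)))  # (1.0, 2.0)
```

The design choice here is that only sums cross owner boundaries: because these summaries are additive over records, the pooled fit equals the fit on the concatenated data. A real deployment would need an agreed communication channel and a threat model (for example, this simple scheme does not protect against colluding owners).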
Cite this article
Ghosh, J., Reiter, J.P. Secure Bayesian model averaging for horizontally partitioned data. Stat Comput 23, 311–322 (2013). https://doi.org/10.1007/s11222-011-9312-6
DOI: https://doi.org/10.1007/s11222-011-9312-6