Abstract
This work describes a fully asynchronous, scalable, and privacy-preserving committee machine for distributed data mining in peer-to-peer systems. Regularization neural networks are used both for the Peer classifiers and for the combiner committee, in an embedded architecture. The proposed method builds the committee machine from the large amounts of training data distributed over the peers, without moving the data and with little centralized coordination. At the end of the training phase, no Peer knows anything beyond its own local data. This privacy-preserving requirement is challenging for trainable combiners but crucial in real-world applications. Only classifiers are transmitted to other peers, which validate them on their own data and send back average accuracy rates in a classical asynchronous peer-to-peer execution cycle. Here the validation set for one classifier becomes the training set of the other, and vice versa. From this entirely distributed and privacy-preserving mutual validation, a coarse-grained asymmetric mutual validation matrix can be formed that maps all Peer members. We demonstrate that this matrix can be exploited to efficiently train another regularization network as the combiner committee machine.
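The mutual validation step can be illustrated with a minimal single-process sketch. It rests on several assumptions that do not come from the abstract: each Peer classifier is modelled as a Gaussian-kernel regularization network fitted by solving (K + λI)c = y, the peers hold synthetic two-class data, and all names (`RegularizationNetwork`, `make_peers`, `mutual_validation_matrix`, `weighted_vote`) are hypothetical. The abstract does not say how the combiner network is trained from the matrix, so the final step substitutes a plain accuracy-weighted vote as a stand-in; it is not the paper's combiner. Nothing is actually transmitted here, and the sketch does not address how a classifier would be shipped without exposing local data.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


class RegularizationNetwork:
    """Kernel regularization network: solve (K + lam * I) c = y (assumed form)."""

    def __init__(self, sigma=1.0, lam=0.1):
        self.sigma, self.lam = sigma, lam

    def fit(self, X, y):
        K = gaussian_kernel(X, X, self.sigma)
        self.centers = X
        self.coef = np.linalg.solve(K + self.lam * np.eye(len(X)), y)
        return self

    def predict(self, X):
        scores = gaussian_kernel(X, self.centers, self.sigma) @ self.coef
        return np.where(scores >= 0, 1.0, -1.0)


def make_peers(n_peers=4, n_per_peer=60, seed=0):
    """Synthetic stand-in for local peer data: two Gaussian blobs, labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    peers = []
    for _ in range(n_peers):
        y = rng.choice([-1.0, 1.0], size=n_per_peer)
        X = rng.normal(size=(n_per_peer, 2)) + 1.5 * y[:, None]
        peers.append((X, y))
    return peers


def mutual_validation_matrix(models, peers):
    """A[i, j] = accuracy of peer i's classifier on peer j's local data.

    The diagonal is left at zero because a classifier is not
    validated on its own training set.
    """
    P = len(peers)
    A = np.zeros((P, P))
    for i, model in enumerate(models):
        for j, (Xj, yj) in enumerate(peers):
            if i != j:
                A[i, j] = np.mean(model.predict(Xj) == yj)
    return A


def weighted_vote(models, A, X):
    """Stand-in combiner (NOT the paper's): vote weighted by mean off-diagonal accuracy."""
    weights = A.sum(axis=1) / (len(models) - 1)
    votes = np.stack([m.predict(X) for m in models])  # shape (P, n_samples)
    return np.where(weights @ votes >= 0, 1.0, -1.0)


if __name__ == "__main__":
    peers = make_peers()
    models = [RegularizationNetwork().fit(X, y) for X, y in peers]
    A = mutual_validation_matrix(models, peers)
    print(np.round(A, 2))
```

In this sketch only accuracy rates cross peer boundaries in `mutual_validation_matrix`, which mirrors the exchange described above; the matrix is generally asymmetric because `A[i, j]` and `A[j, i]` involve different classifiers and different data.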
Acknowledgments
The authors would like to thank the anonymous reviewers for their useful suggestions, which helped improve the presentation and clarity of this paper.
Cite this article
Kokkinos, Y., Margaritis, K.G. A distributed privacy-preserving regularization network committee machine of isolated Peer classifiers for P2P data mining. Artif Intell Rev 42, 385–402 (2014). https://doi.org/10.1007/s10462-013-9418-7
DOI: https://doi.org/10.1007/s10462-013-9418-7