Abstract
We focus on the problem of data mining over large-scale fully distributed databases, where each node stores only one data record. We assume that a data record is never allowed to leave the node where it is stored. Possible motivations for this assumption include privacy concerns or the lack of a centralized infrastructure. To tackle this problem, we earlier proposed the generic gossip learning framework (GoLF), but so far we have studied only basic linear algorithms. In this paper we implement the well-known boosting technique in GoLF. Boosting techniques have attracted growing attention in machine learning due to their outstanding performance in many practical applications. Here, we present an implementation of a boosting algorithm that is based on FilterBoost. Our main algorithmic contribution is the derivation of a purely online multi-class version of FilterBoost, so that it can be employed in GoLF. We also propose improvements to GoLF with the aim of maximizing the diversity of the evolving models gossiped in the network, a feature that we show to be important. We evaluate the robustness and the convergence speed of the algorithm empirically over three benchmark databases. We compare the algorithm with the sequential AdaBoost algorithm, and we test its performance in a failure scenario involving message drop, message delay, and node churn.
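To make the gossip learning setting concrete, the following is a minimal, self-contained sketch of the idea the abstract describes: each node holds exactly one labeled record that never leaves it, and a model takes a random walk over the nodes, receiving one online update per hop. The class names, the synthetic data, and the use of a plain logistic-loss SGD step (rather than the paper's online multi-class FilterBoost) are all illustrative assumptions, not the authors' actual protocol.

```python
import math
import random

random.seed(0)


def make_data(n):
    """Synthetic fully distributed data: one (x, y) record per node,
    two well-separated clusters with labels in {-1, +1}."""
    data = []
    for _ in range(n):
        y = random.choice([-1, 1])
        x = [y + random.gauss(0, 0.5), y + random.gauss(0, 0.5)]
        data.append((x, y))
    return data


class Model:
    """A linear model updated online, one record at a time."""

    def __init__(self, dim):
        self.w = [0.0] * dim

    def predict(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if s >= 0 else -1

    def update(self, x, y, eta=0.1):
        # One SGD step on the logistic loss log(1 + exp(-y * <w, x>)).
        s = sum(wi * xi for wi, xi in zip(self.w, x))
        g = -y / (1.0 + math.exp(y * s))  # d/ds of the loss
        for i in range(len(self.w)):
            self.w[i] -= eta * g * x[i]


def gossip(data, hops):
    """The model random-walks over the nodes; data never moves.
    Picking a uniformly random node stands in for one hop of the walk."""
    model = Model(2)
    for _ in range(hops):
        x, y = random.choice(data)
        model.update(x, y)
    return model


data = make_data(200)
model = gossip(data, 2000)
acc = sum(model.predict(x) == y for x, y in data) / len(data)
```

In the actual framework, many such models circulate concurrently and nodes may combine the models passing through them; this sketch follows a single walk only, to isolate the core privacy property: training touches every record without any record leaving its node.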
M. Jelasity was supported by the Bolyai Scholarship of the Hungarian Academy of Sciences. This work was partially supported by the FET programme FP7-COSI-ICT of the European Commission through project QLectives (grant no.: 231200) and by the ANR-2010-COSI-002 grant of the French National Research Agency.
References
Filelist (2005), http://www.filelist.org
Asuncion, A.U., Smyth, P., Welling, M.: Asynchronous distributed estimation of topic models for document analysis. Statistical Methodology 8(1), 3–17 (2011)
Babenko, B., Yang, M., Belongie, S.: A family of online boosting algorithms. In: Computer Vision Workshops (ICCV Workshops), pp. 1346–1353 (2009)
Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: Intl. Conf. on Computational Statistics, vol. 19, pp. 177–187 (2010)
Bradley, J., Schapire, R.: FilterBoost: Regression and classification on large datasets. In: Advances in Neural Information Processing Systems, vol. 20. The MIT Press (2008)
Collins, M., Schapire, R., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. Machine Learning 48, 253–285 (2002)
Datta, S., Bhaduri, K., Giannella, C., Wolff, R., Kargupta, H.: Distributed data mining in peer-to-peer networks. IEEE Internet Comp. 10(4), 18–26 (2006)
Fan, W., Stolfo, S.J., Zhang, J.: The application of AdaBoost for distributed, scalable and on-line learning. In: Proc. 5th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, pp. 362–366 (1999)
Frank, A., Asuncion, A.: UCI machine learning repository (2010)
Freund, Y., Schapire, R.E.: Experiments with a new boosting algorithm. In: Machine Learning: Proc. Thirteenth Intl. Conf., pp. 148–156 (1996)
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. of Comp. and Syst. Sci. 55, 119–139 (1997)
Friedman, J.: Stochastic gradient boosting. Computational Statistics and Data Analysis 38(4), 367–378 (2002)
Jelasity, M., Canright, G., Engø-Monsen, K.: Asynchronous Distributed Power Iteration with Gossip-Based Normalization. In: Kermarrec, A.-M., Bougé, L., Priol, T. (eds.) Euro-Par 2007. LNCS, vol. 4641, pp. 514–525. Springer, Heidelberg (2007)
Jelasity, M., Montresor, A., Babaoglu, O.: Gossip-based aggregation in large dynamic networks. ACM Trans. on Computer Systems 23(3), 219–252 (2005)
Kégl, B., Busa-Fekete, R.: Boosting products of base classifiers. In: Intl. Conf. on Machine Learning, Montreal, Canada, vol. 26, pp. 497–504 (2009)
Kempe, D., Dobra, A., Gehrke, J.: Gossip-based computation of aggregate information. In: Proc. 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2003), pp. 482–491. IEEE Computer Society (2003)
Kowalczyk, W., Vlassis, N.: Newscast EM. In: 17th Advances in Neural Information Processing Systems (NIPS), pp. 713–720. MIT Press, Cambridge (2005)
Luo, P., Xiong, H., Lü, K., Shi, Z.: Distributed classification in peer-to-peer networks. In: Proc. 13th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (KDD 2007), pp. 968–976. ACM, New York (2007)
Ormándi, R., Hegedűs, I., Jelasity, M.: Asynchronous Peer-to-Peer Data Mining with Stochastic Gradient Descent. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011, Part I. LNCS, vol. 6852, pp. 528–540. Springer, Heidelberg (2011)
Ormándi, R., Hegedűs, I., Jelasity, M.: Efficient P2P ensemble learning with linear models on fully distributed data. CoRR abs/1109.1396 (2011)
Oza, N., Russell, S.: Online bagging and boosting. In: Proc. Eighth Intl. Workshop on Artificial Intelligence and Statistics (2001)
Park, B.H., Kargupta, H.: Distributed data mining: Algorithms, systems, and applications. In: Ye, N. (ed.) The Handbook of Data Mining. CRC Press (2003)
PeerSim, http://peersim.sourceforge.net/
Schapire, R.E., Singer, Y.: Improved boosting algorithms using confidence-rated predictions. Machine Learning 37(3), 297–336 (1999)
Siersdorfer, S., Sizov, S.: Automatic Document Organization in a P2P Environment. In: Lalmas, M., MacFarlane, A., Rüger, S.M., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds.) ECIR 2006. LNCS, vol. 3936, pp. 265–276. Springer, Heidelberg (2006)
Tölgyesi, N., Jelasity, M.: Adaptive Peer Sampling with Newscast. In: Sips, H., Epema, D., Lin, H.-X. (eds.) Euro-Par 2009. LNCS, vol. 5704, pp. 523–534. Springer, Heidelberg (2009)
Widrow, B., Hoff, M.E.: Adaptive Switching Circuits. In: 1960 IRE WESCON Convention Record, vol. 4, pp. 96–104 (1960)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Hegedűs, I., Busa-Fekete, R., Ormándi, R., Jelasity, M., Kégl, B. (2012). Peer-to-Peer Multi-class Boosting. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_39
DOI: https://doi.org/10.1007/978-3-642-32820-6_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6