Abstract
The field of Distributed Data Mining (DDM) deals with the problem of analyzing data while paying careful attention to distributed computing, storage, communication, and human-factor resources. Unlike traditional centralized systems, DDM offers a fundamentally distributed solution that analyzes data without necessarily requiring that the data be collected at a single central site. This chapter presents an introduction to distributed data mining for continuous streams. It focuses on situations where the data observed at different locations change over time. The chapter surveys the literature and illustrates the behavior of this class of algorithms by exploring two very different types of techniques: one for the peer-to-peer environment and another for the hierarchical distributed environment. The chapter also briefly discusses several applications of these algorithms.
Copyright information
© 2007 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Bhaduri, K., Das, K., Sivakumar, K., Kargupta, H., Wolff, R., Chen, R. (2007). Algorithms for Distributed Data Stream Mining. In: Aggarwal, C.C. (eds) Data Streams. Advances in Database Systems, vol 31. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-47534-9_14
DOI: https://doi.org/10.1007/978-0-387-47534-9_14
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-28759-1
Online ISBN: 978-0-387-47534-9
eBook Packages: Computer Science, Computer Science (R0)