Abstract
This chapter surveys the field of Big Data analysis from a machine learning perspective. In particular, it contrasts Big Data analysis with data mining, which is based on machine learning, reviews its achievements and discusses its impact on science and society. The chapter concludes with a summary of the book’s contributing chapters divided into problem-centric and domain-centric essays.
Notes
1. While this application was originally considered a success, it subsequently produced disappointing results and is now being improved [4].
2. Graphs were sometimes considered in traditional data mining (e.g., as structures of chemical compounds), but they were much smaller than the graphs considered today.
References
Abiteboul, S.: Querying semi-structured data. In: ICDT ’97 Proceedings of the 6th International Conference on Database Theory, pp. 1–18 (1997)
An interview with Michael Jordan: Why Big Data Could Be a Big Fail. IEEE Spectrum. (Posted by Lee Gomes, 20 Oct 2014)
Anderson, C.: The End of Theory. The data deluge makes the scientific method obsolete. Wired Magazine, 16/07 (2008, June 23)
Auerbach, D.: The Mystery of the Exploding Tongue. How reliable is Google Flu Trends? Slate Web page. http://www.slate.com/articles/technology/bitwise/2014/03/google_flu_trends_reliability_a_new_study_questions_its_methods.html (2014)
Azzara, M.: Big Data Ethics: Transparency, Privacy, and Identity. Blog cmo.com. (Retrieved 2015)
Barbaro, M., Zeller, Jr, T.: A Face Is Exposed for AOL Searcher No. 4417749. The New York Times Magazine. (August 9, 2006)
Barbier, G., Liu, H.: Data Mining in Social Media. In: Aggarwal, C. (ed.) Social Network Data Analytics, pp. 327–352. Kluwer Academic Publishers, Springer (2011)
Bekkerman, R., Bilenko, M., Langford, J.: Scaling Up Machine Learning. Parallel and Distributed Approaches. Cambridge University Press, Cambridge (2011)
Berkeley Data Analysis Stack. https://amplab.cs.berkeley.edu/software/
Beyer, M.A., Laney, D.: The importance of "Big Data": a definition. Gartner Publications, pp. 1–9 (2012). See also: http://www.gartner.com/it-glossary/big-data
Bifet, A., Holmes, G., Kirkby, R., Pfahringer, B.: MOA: massive online analysis. J. Mach. Learn. Res. 11, 1601–1604 (2010)
Billion Price Project. http://bpp.mit.edu/
Boyd, D., Crawford, K.: Six provocations for Big Data. Presented at "A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society" Oxford Internet Institute, Sept 21 (2011)
Boyd, D., Crawford, K.: Critical questions for big data. Inf. Commun. Soc. 15(5), 662–679 (2012)
Che, D., Safran, M., Peng, Z.: From big data to big data mining: challenges, issues and opportunities. In: Hong, B., et al. (eds.) DASFAA Workshops, Springer LNCS 7827, pp. 1–15 (2013)
Chen, M., Mao, S., Liu, Y.: Big data: a survey. Mobile Netw. Appl. 19, 171–209 (2014)
Dai, C., Lin, D., Bertino, E., Kantarcioglu, M.: An approach to evaluate data trustworthiness based on data provenance. In: Proceedings of the 5th VLDB Workshop on Secure Data Management, pp. 82–98 (2008)
Davidson, S., Freire, J.: Provenance and scientific workflows: challenges and opportunities. In: Proceedings of the SIGMOD’08 (2008)
Davis, K.: Ethics of Big Data. Balancing Risk and Innovation. O’Reilly (2012)
De Mauro, A., Greco, M., Grimaldi, M.: What is big data? a consensual definition and a review of key research topics. In: Proceedings of 4th Conference on Integrated Information (2014)
Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 71–80 (2000)
Einav, L., Levin, J.D.: The data revolution and economic analysis. National Bureau of Economic Research Working Paper, no. 19035 (2013)
Fan, W., Bifet, A.: Mining big data: current status, and forecast to the future. SIGKDD Explor. Newsl. 12(2), 1–5 (2013)
Frontiers in Massive Data Analysis. The National Research Council, the National Academy of Sciences, USA (2013)
Future Attribute Screening Technology. Wikipedia article. https://en.wikipedia.org/wiki/Future_Attribute_Screening_Technology
Gaber, M., Zaslavsky, A., Krishnaswamy, S.: Mining data streams: a review. ACM Sigmod Record 34(2), 18–26 (2005)
Gama, J.: Knowledge Discovery from Data Streams, 1st edn. Chapman & Hall/CRC (2010)
Ghoting, A., Kambadur, P., Pednault, E., Kannan, R.: NIMBLE: A toolkit for the implementation of parallel data mining and machine learning algorithms on mapreduce. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD 2011, pp. 334–342 (2011)
Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., Brilliant, L.: Detecting influenza epidemics using search engine query data. Nature 457(7232), 1012–1014 (19 Feb 2009)
Glavic, B.: Big Data provenance: challenges and implications for benchmarking. In: Specifying Big Data Benchmarks, pp. 72–80. Springer (2014)
Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.L.: Understanding individual human mobility patterns. Nature 453, 779–782 (2008)
Hadoop. http://hadoop.apache.org
Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. San Francisco, Morgan Kaufmann (2005)
Harford, T.: Big Data: are we making a big mistake? Financial Times, March 28 (2014)
Hashem, I., Yaqoob, I., Anuar, N., Mokhtar, S., Gani, A., Khan, S.: The rise of big data on cloud computing: review and open research issues. Inf. Syst. 47, 98–115 (2015)
How big data analysis helped increase Walmart’s sales turnover. DeZyre Web page (23 May 2015)
Kang, U., Faloutsos, C.: Big graph mining: algorithms and discoveries. ACM SIGKDD Explor. Newsl. 14(2), 29–36 (2012)
Kraska, T., Talwalkar, A., Duchi, J.C., Griffith, R., Franklin, M.J., Jordan, M.I.: MLbase: A distributed machine-learning system. In: Proceedings of Sixth Biennial Conference on Innovative Data Systems Research (2013)
Krempl, G., Zliobaite, I., Brzezinski, D., Hullermeier, E., Last, M., Lemaire, V., Noack, T., Shaker, A., Sievi, S., Spiliopoulou, M., Stefanowski, J.: Open challenges for data stream mining research. ACM SIGKDD Explor. 16(1), 1–10 (June 2014)
Mahout software. http://mahout.apache.org/
Maimon, O., Rokach, L. (eds.): The Data Mining and Knowledge Discovery Handbook. Springer (2005)
Mannila, H.: Data mining: machine learning, statistics, and databases. In: Proceedings of the Eighth International Conference on Scientific and Statistical Database Management. Stockholm June 18–20, pp. 1–8 (1996)
Manning, C., Schutze, H.: Foundations of Statistical Natural Language Processing. MIT Press (1999)
Marcus, G., Davis, E.: Eight (No, Nine!) Problems With Big Data. New York Times (Apr 6, 2014)
Matwin, S.: Privacy-preserving data mining techniques: survey and challenges. In: Custers, B., Calders, T., Schermer, B., Zarsky, T. (eds.) Discrimination and Privacy in the Information Society. Springer Series on Studies in Applied Philosophy, Epistemology and Rational Ethics, vol. 3, pp. 209–221 (2013)
Matwin, S.: Machine learning: four lessons and what is next? Bull. Pol. AI Soc. 2, 2–7 (2013)
Mayer-Schonberger, V., Cukier, K.: Big Data: A Revolution That Will Transform How We Live, Work and Think. Eamon Dolan/Houghton Mifflin Harcourt (2013)
Morales, G., Bifet, A.: SAMOA: scalable advanced massive online analysis. J. Mach. Learn. Res. 16, 149–153 (2015)
Narayanan, A., Shmatikov, V.: Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset). In: Proceedings of the 2008 IEEE Symposium on Security and Privacy SP’08, pp. 111–125 (2008)
Piatetsky-Shapiro, G., Matheus, C. (eds.): Knowledge Discovery in Databases. AAAI/MIT Press (1991)
Pietsch, W.: Big Data? The New Science of Complexity. In: 6th Munich-Sydney-Tilburg Conference on Models and Decisions (Munich; 10–12 April 2013)
Reinventing Society in the Wake of Big Data—Edge’s interview with Alex "Sandy" Pentland (Posted August 30, 2012)
Ritter, D.: When to act on a correlation and when not to. Harvard Business Review, March 19 (2014)
Roddick, J., Hornsby, K., Spiliopoulou, M.: An updated bibliography of temporal, spatial, and spatio-temporal data mining research. Lect. Notes Comput. Sci. 2007, 147–163 (2001)
Rudin, C., Passonneau, R., Radeva, A., Jerome, S., Issac, D.: 21st-century data miners meet 19th-century electrical cables. IEEE Comput. 103–105 (June 2011)
Rudin, C., et al.: Machine learning for the New York city power grid. IEEE Trans. Pattern Anal. Mach. Intell. 34(2), 328–345 (2012)
Shekhar, S.: What is special about mining spatial and spatio-temporal datasets? Tutorial (2014). http://www-users.cs.umn.edu/~shekhar/talk/sdm2.html
Simmhan, Y., Plale, B., Gannon, D.: A survey on data provenance techniques. Technical Report Indiana University, IUB-CS-TR618 (2005)
Singh, D., Reddy, C.: A survey on platforms for Big Data analytics. J. Big Data 1(8), 2–20 (2014)
Sloan Digital Sky Survey. Wikipedia article. https://en.wikipedia.org/wiki/Sloan_Digital_Sky_Survey
Sun, Y., Han, J.: Mining Heterogeneous Information Networks: Principles and Methodologies. Morgan & Claypool Publishers (2012)
The h2o software. http://0xdata.com/h2o
Thompson, C.: What Is IBM’s Watson? The New York Times Magazine, June 16 (2010)
Tufekci, Z.: Big Data: Pitfalls, methods and concepts for an emergent field. SSRN (March 2013). http://dx.doi.org/10.2139/ssrn.2229952
Venkateswara Rao, K., Govardhan, A., Chalapati, Rao K.V.: Spatiotemporal data mining: issues, tasks and applications. Int. J. Comput. Sci. & Eng. Surv. (IJCSES) 3(1) (Feb 2012)
Vucetic, S., Obradovic, Z.: Discovering homogeneous regions in spatial data through competition. In: Proceedings of the 17th International Conference of Machine Learning ICML, pp. 1095–1102 (2000)
Zhou, Z.H., Chawla, N., Jin, Y., Williams, G.: Big Data opportunities and challenges: discussions from data analytics perspectives. IEEE Comput. Intell. Mag. 9(4), 62–74 (2014)
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Japkowicz, N., Stefanowski, J. (2016). A Machine Learning Perspective on Big Data Analysis. In: Japkowicz, N., Stefanowski, J. (eds) Big Data Analysis: New Algorithms for a New Society. Studies in Big Data, vol 16. Springer, Cham. https://doi.org/10.1007/978-3-319-26989-4_1
DOI: https://doi.org/10.1007/978-3-319-26989-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26987-0
Online ISBN: 978-3-319-26989-4
eBook Packages: Engineering, Engineering (R0)