Chapter - 1 Introduction To Big Data - r1

The document provides an introduction to Big Data and Big Data Analytics, covering its evolution, sources, characteristics (6 Vs), tools, and applications. It discusses the transition from traditional data processing to handling vast amounts of data generated by both humans and machines, emphasizing the challenges and solutions, particularly through Hadoop. Additionally, it outlines various types of analytics, including descriptive, diagnostic, predictive, prescriptive, exploratory, cognitive, and real-time analytics.

Uploaded by

desta melkamu

Welcome

Chapter -1
Introduction to Big Data and
Big Data Analytics

_________________________________________
© 2024 FTVT Institute All Rights Reserved
Contents
§ Evolution of Big Data
§ Sources of Big Data
§ What is Big Data?
§ Characteristics of Big Data (6 Vs)
§ Tools used in Big Data
§ Introduction to Big Data analytics
§ Big Data analytics goals
§ Applications/use cases of Big Data analytics
§ Challenges of Big Data
§ How Hadoop solves the Big Data problem
ICT Department TVTI 21/11/24
Evolution of Big Data

— The model of generating and consuming data has changed

Old model: a few companies generate data; all others consume it

New model: all of us generate data, and all of us consume it


Unit of data size   Exact size           Approximate size                Examples
KB (kilobyte)       2^10 (1024) bytes    10^3 (one thousand) bytes       A typical joke = 1 KB
MB (megabyte)       2^20 bytes           10^6 (one million) bytes        Complete works of Shakespeare = 5 MB
GB (gigabyte)       2^30 bytes           10^9 (one billion) bytes        Ten yards of books on a shelf = 1 GB
TB (terabyte)       2^40 bytes           10^12 (one trillion) bytes      All the X-rays for a large hospital = 1 TB; tweets created daily = 121 TB
PB (petabyte)       2^50 bytes           10^15 (one quadrillion) bytes   All U.S. academic research libraries = 2 PB; data processed in a day by Google = 24 PB
EB (exabyte)        2^60 bytes           10^18 (one quintillion) bytes   Total global data created in 2006 = 161 EB
ZB (zettabyte)      2^70 bytes           10^21 (one sextillion) bytes    Total global data created in 2012 = 2.7 ZB, with 44 ZB expected by 2020
YB (yottabyte)      2^80 bytes           10^24 (one septillion) bytes
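The gap between the exact (binary) and approximate (decimal) size of each unit widens as the units get larger. A minimal sketch in Python that recomputes both columns of the table; the function and variable names are illustrative only and not part of the original slides:

```python
# Exact (binary) vs approximate (decimal) size of each unit in the table.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_sizes():
    """Return {unit: (exact_bytes, approx_bytes)} for KB through YB."""
    sizes = {}
    for i, unit in enumerate(UNITS, start=1):
        exact = 2 ** (10 * i)    # 2^10, 2^20, ..., 2^80
        approx = 10 ** (3 * i)   # 10^3, 10^6, ..., 10^24
        sizes[unit] = (exact, approx)
    return sizes


if __name__ == "__main__":
    for unit, (exact, approx) in unit_sizes().items():
        gap = (exact - approx) / approx * 100
        print(f"{unit}: exact={exact:,} approx={approx:,} gap={gap:.1f}%")
```

The gap is about 2.4% for a kilobyte and grows to roughly 21% for a yottabyte, which is why the two columns should not be treated as interchangeable at large scales.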


Evolution of Big Data by technology


Evolution of Big Data by Internet Of Things


Evolution of Big Data by Social Media


Evolution of Big Data by other factors


Big Data sources

— Human Generated Data

¡ includes emails, documents, photos and tweets. We are generating this data faster than ever. Just imagine the number of videos uploaded to YouTube and the tweets in circulation. This data can be Big Data too.

— Machine Generated Data

¡ is a new breed of data. This category consists of sensor data and logs generated by 'machines', such as email logs, click stream logs, etc. Machine generated data is orders of magnitude larger than human generated data.


Big Data sources
— Web Data
¡ Social media data : sites like Facebook, Twitter and LinkedIn generate a large amount of data
¡ Click stream data : when users navigate a website, their clicks are logged for further analysis (such as navigation patterns). Click stream data is important in online advertising and e-commerce
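
To make "navigation patterns" concrete, the sketch below counts page-to-page transitions in a click stream log. The log format, user IDs and page paths are hypothetical and chosen only for illustration; a real click stream would be far larger and would typically be processed on a cluster.

```python
from collections import Counter

# Hypothetical click stream log: (user_id, page) in the order the clicks occurred.
CLICKS = [
    ("u1", "/home"), ("u1", "/products"), ("u1", "/cart"),
    ("u2", "/home"), ("u2", "/products"),
    ("u3", "/home"), ("u3", "/search"), ("u3", "/products"),
]


def count_transitions(clicks):
    """Count page-to-page transitions per user (a simple navigation pattern)."""
    last_page = {}          # most recent page seen for each user
    transitions = Counter()
    for user, page in clicks:
        if user in last_page:
            transitions[(last_page[user], page)] += 1
        last_page[user] = page
    return transitions


if __name__ == "__main__":
    for (src, dst), n in count_transitions(CLICKS).most_common():
        print(f"{src} -> {dst}: {n}")
```

In this sample the most frequent transition is from `/home` to `/products`, the kind of pattern an online store might use to decide where to place advertising.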

12+ TBs of tweet data every day

25+ TBs of log data every day

? TBs of data every day


Big Data sources
— Sensor data : e.g. sensors embedded in roads to monitor traffic, among others

¡ 30 billion RFID tags today (1.3B in 2005)
¡ 4.6 billion camera phones worldwide
¡ 100s of millions of GPS enabled devices sold annually
¡ 2+ billion people on the Web by end 2011
¡ 76 million smart meters in 2009, 200M by 2014


What is Big Data?

Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications.

Real world examples of Big Data

— Facebook : has 40 PB of data and captures 100 TB / day
— Yahoo : 60 PB of data
— Twitter : 8 TB / day
— eBay : 40 PB of data, captures 50 TB / day


Characteristics of Big Data (3 Vs of Big Data)


Characteristics of Big Data (6 Vs of Big Data)

Volume, Velocity, Variety, Veracity, Value, Variability


V-Volume Characteristic of Big Data (6 Vs of Big Data)

— 1st V-Volume

¡ Data Volume
• 44x increase from 2009 to 2020: from 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially
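
The 44x figure can be checked from the two endpoints on the slide. The sketch below also derives the annual growth rate those endpoints would imply, assuming (as an illustration only; the slides do not say this) that growth compounds steadily over the 11 years:

```python
# Figures from the slide: 0.8 ZB in 2009, 35 ZB in 2020.
START_ZB, END_ZB = 0.8, 35.0
START_YEAR, END_YEAR = 2009, 2020


def growth_factor(start, end):
    """How many times larger `end` is than `start`."""
    return end / start


def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by the start and end values."""
    return (end / start) ** (1 / years) - 1


factor = growth_factor(START_ZB, END_ZB)
rate = annual_growth_rate(START_ZB, END_ZB, END_YEAR - START_YEAR)
print(f"growth factor: {factor:.2f}x")
print(f"implied annual growth: {rate:.1%}")
```

The factor comes out just under 44, and the implied growth is roughly 41% per year. A constant percentage increase each year is what "exponential" growth means here.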


V-Velocity Characteristic of Big Data (6 Vs of Big Data)

— 2nd V-Velocity: data is being generated every minute


V-Variety Characteristic of Big Data (6 Vs of Big Data)

— 3rd V-Variety: different kinds of data generated from various sources


V-Veracity Characteristic of Big Data (6 Vs of Big Data)

— 4th V-Veracity: uncertainties and inconsistencies in big data


V-Value Characteristic of Big Data (6 Vs of Big Data)

— 5th V-Value: mechanism to bring correct meaning out of the data

Note: Use the volume of data to deliver new value.


V-Variability Characteristic of Big Data (6 Vs of Big Data)

— 6th V-Variability

Traditional DB vs Big Data

Aspect      Traditional database/data warehouse   Big Data
Data        TB to PB                              PB to ZB
            Only structured                       Structured and unstructured
Hardware    Big central servers                   Computer clusters
            Expensive                             Cost effective
            Hardware reliability                  Unreliable hardware
            Limited scalability                   Scales further
Software    Centralized                           Distributed
            Schema based                          Not schema based
            Oracle/MySQL/SQL Server               Hadoop
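
The "centralized" versus "distributed" contrast can be illustrated with a word count that processes each data partition independently and then merges the partial results. This is a single-process sketch of the map-and-reduce idea only: it does not use Hadoop's API, and the partitions and function names are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical input, already split into partitions as if stored on separate nodes.
PARTITIONS = [
    ["big data is big", "data is everywhere"],
    ["hadoop stores big data"],
]


def map_partition(lines):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(a, b):
    """Reduce step: merge two partial word counts."""
    return a + b


def word_count(partitions):
    # On a cluster, each partition would be mapped on a different machine.
    partials = [map_partition(p) for p in partitions]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(word_count(PARTITIONS))
```

Because each partition is counted without reference to the others, adding more machines lets more partitions be processed at the same time, which is the sense in which a cluster "scales further" than one big central server.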


Big data tools


What is Big data analytics


Stages in Big data analytics


Big data analytics goals

1. Making organizations smarter and more efficient


Big data analytics application domains


Big data analytics use cases

— IBM Big data analytics – Big data collected by smart meters
— IBM Big data analytics – problem with smart meter big data
— IBM Big data analytics – how smart meter big data is analysed
— IBM Big data analytics – IBM smart meter solution


Types of Big data analytics

What happened?

Descriptive Analysis:
¡ Examines historical trends
¡ Leverages internal data sources


Types of Big data analytics - Descriptive

Descriptive analytics helps answer what happened

E.g.:

Average Café Metrics

Average wait for customers: 3.4 minutes
Average customers serviced: 135

Recent Café Metrics

Average wait for customers: 4.8 minutes (41% higher)
Total customers serviced: 112 (17% lower)
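
The percentages in the café example follow from comparing each recent figure with its average. A small sketch that reproduces them (the function name is illustrative):

```python
def percent_change(baseline, recent):
    """Relative change of `recent` against `baseline`, in percent."""
    return (recent - baseline) / baseline * 100


# Figures from the café example above.
wait_change = percent_change(3.4, 4.8)
served_change = percent_change(135, 112)

print(f"wait time: {wait_change:+.0f}%")            # +41%
print(f"customers serviced: {served_change:+.0f}%")  # -17%
```

This is descriptive analytics in its simplest form: it summarizes what happened without explaining why or predicting what comes next.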


Types of Big data analytics

Predictive analytics helps answer what might happen next

Predictive Analytics: estimates the probability of future events

If this happens, then that will happen
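
One of the simplest predictive techniques is fitting a straight-line trend to historical data and extending it forward. The sketch below does this with ordinary least squares in plain Python. The monthly sales figures are hypothetical and not from the slides, and a real forecast would normally use a dedicated library and report its uncertainty.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales for months 1 to 5; not taken from the slides.
months = [1, 2, 3, 4, 5]
sales = [110, 118, 131, 142, 149]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept   # extend the trend to month 6
print(f"forecast for month 6: {forecast:.1f}")
```

The forecast is only as good as the assumption that the past trend continues, which is why predictive analytics speaks of probabilities rather than certainties.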


Types of Big data analytics

Prescriptive analytics helps answer what should be done

Prescriptive Analysis:
¡ Creates predictions
¡ Offers recommendations
¡ Builds an optimization model
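
The three steps listed above can be sketched for a pricing decision: predict demand at each price, treat revenue as the quantity to optimize, and recommend the best candidate. The demand model and the candidate prices are hypothetical, chosen only to keep the example small.

```python
def predicted_demand(price):
    """Prediction step: hypothetical model where demand falls linearly with price."""
    return max(0, 200 - 20 * price)


def revenue(price):
    """Objective of the optimization model: predicted revenue at a given price."""
    return price * predicted_demand(price)


def recommend_price(candidates):
    """Recommendation step: the candidate price with the highest predicted revenue."""
    return max(candidates, key=revenue)


candidates = [2, 3, 4, 5, 6, 7, 8]
best = recommend_price(candidates)
print(f"recommended price: {best}, predicted revenue: {revenue(best)}")
```

Here the search is a simple scan over a handful of candidates; larger problems with many variables and constraints are where dedicated solvers come in.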


Types of Big data analytics

Diagnostic analytics helps answer why something happened
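
A common first step in diagnosing why a metric moved is to check how strongly it is related to other measurements. The sketch below computes the Pearson correlation between waiting time and customers serviced. The daily figures are hypothetical, and a strong correlation only points to a candidate cause; it does not prove one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily figures: average wait (minutes) and customers serviced.
wait_minutes = [3.0, 3.5, 4.0, 4.5, 5.0]
customers = [140, 133, 126, 118, 110]

r = pearson(wait_minutes, customers)
print(f"correlation between wait time and customers serviced: {r:.2f}")
```

A value close to -1, as in this sample, suggests that longer waits go together with fewer customers serviced, which would be a reasonable lead to investigate further.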


Major Differences Between Various Big Data Analytics

Descriptive Analytics
  Focus: What happened?
  Purpose: Summarizes past data to identify patterns or trends.
  Techniques/Methods: Data aggregation, summarization
  Examples: Monthly sales report, website traffic stats
  Tools: Tableau, Power BI, Excel

Diagnostic Analytics
  Focus: Why did it happen?
  Purpose: Identifies the causes behind past outcomes.
  Techniques/Methods: Root cause analysis, correlations
  Examples: Analyzing why sales dropped, customer churn reasons
  Tools: Python, R, SQL, SAS

Predictive Analytics
  Focus: What will happen?
  Purpose: Forecasts future outcomes using historical data and machine learning models.
  Techniques/Methods: Regression, forecasting, ML models
  Examples: Sales forecasting, predicting customer churn
  Tools: TensorFlow, PyTorch, IBM SPSS

Prescriptive Analytics
  Focus: What should we do?
  Purpose: Provides actionable recommendations based on predictions and simulations.
  Techniques/Methods: Optimization, simulation, AI models
  Examples: Supply chain optimization, pricing strategy
  Tools: Gurobi, CPLEX, SAS, ML frameworks

Exploratory Analytics
  Focus: What can we discover?
  Purpose: Uncovers new patterns, relationships, or insights without predefined hypotheses.
  Techniques/Methods: Data visualization, clustering
  Examples: Identifying new customer segments
  Tools: Python (matplotlib, seaborn), R

Cognitive Analytics
  Focus: How can we think like humans?
  Purpose: Mimics human cognition to interpret complex, unstructured data.
  Techniques/Methods: AI, natural language processing
  Examples: Sentiment analysis, chatbot development
  Tools: IBM Watson, Microsoft Azure AI

Real-Time Analytics
  Focus: What is happening now?
  Purpose: Processes data in real-time to provide immediate insights.
  Techniques/Methods: Streaming analytics, event processing
  Examples: Monitoring network security, live stock market trends
  Tools: Apache Kafka, Splunk, AWS Kinesis

Note: Look into the supplementary document for more details
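
Real-time analytics differs from the other types mainly in that results are updated as each event arrives. The sketch below keeps a running count of events seen in the last 60 seconds, a basic building block of streaming analytics. It is plain Python rather than the API of Kafka, Splunk or Kinesis, the event timestamps are hypothetical, and it assumes events arrive in time order.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events seen within the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record an event and return the number of events in the current window."""
        self.timestamps.append(timestamp)
        # Drop events that have fallen out of the window.
        while self.timestamps[0] <= timestamp - self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps)


# Hypothetical login-failure events, timestamps in seconds.
events = [0, 1, 2, 30, 61, 62, 63, 64]
counter = SlidingWindowCounter(window_seconds=60)
counts = [counter.add(t) for t in events]
print(counts)
```

A monitoring system could raise an alert as soon as the returned count crosses a threshold, instead of waiting for a daily report.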


Challenges/problems with Big data


HADOOP is a solution to Big data problems


Introduction to Big data and Analytics

ANY QUESTIONS / DOUBTS

???