Welcome
Chapter 1
Introduction to Big Data and Big Data Analytics
© 2024 FTVT Institute. All Rights Reserved.
Contents
• Evolution of Big Data
• Sources of Big Data
• What is Big Data?
• Characteristics of Big Data (6 Vs)
• Tools used in Big Data
• Introduction to Big Data analytics
• Big Data analytics goals
• Applications/use cases of Big Data analytics
• Challenges of Big Data
• How Hadoop solves the Big Data problem
ICT Department TVTI 21/11/24
Evolution of Big Data
The Model of Generating/Consuming Data has Changed
Old model: a few companies generate data, and everyone else consumes it.
New model: all of us generate data, and all of us consume data.
Unit of data | Exact size | Approximate size | Examples
KB (kilobyte) | 2^10 = 1,024 bytes | 10^3 (one thousand) bytes | A typical joke = 1 KB
MB (megabyte) | 2^20 bytes | 10^6 (one million) bytes | Complete works of Shakespeare = 5 MB
GB (gigabyte) | 2^30 bytes | 10^9 (one billion) bytes | Ten yards of books on a shelf = 1 GB
TB (terabyte) | 2^40 bytes | 10^12 (one trillion) bytes | All the X-rays for a large hospital = 1 TB; tweets created daily = 121 TB
PB (petabyte) | 2^50 bytes | 10^15 (one quadrillion) bytes | All U.S. academic research libraries = 2 PB; data processed in a day by Google = 24 PB
EB (exabyte) | 2^60 bytes | 10^18 (one quintillion) bytes | Total global data created in 2006 = 161 EB
ZB (zettabyte) | 2^70 bytes | 10^21 (one sextillion) bytes | Total global data created in 2012 = 2.7 ZB, expected to reach 44 ZB by 2020
YB (yottabyte) | 2^80 bytes | 10^24 (one septillion) bytes |
(On the slide, the rows from PB to YB are bracketed with the label "BIG DATA".)
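The "exact" (powers of 2) and "approximate" (powers of 10) columns drift further apart as the units grow. A minimal Python sketch that computes both columns and formats a byte count; `UNITS`, `unit_sizes` and `human_readable` are hypothetical helpers written for this example, not part of any library.

```python
# Unit symbols in increasing order, as in the table above.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_sizes():
    """Map each unit to (exact binary size, approximate decimal size) in bytes."""
    return {
        unit: (2 ** (10 * (i + 1)), 10 ** (3 * (i + 1)))
        for i, unit in enumerate(UNITS)
    }


def human_readable(n_bytes, binary=False):
    """Format a byte count using the largest unit whose scaled value is at least 1."""
    base = 1024 if binary else 1000
    value, unit = float(n_bytes), "B"
    for next_unit in UNITS:
        if value < base:
            break
        value /= base
        unit = next_unit
    return f"{value:.2f} {unit}"


if __name__ == "__main__":
    for unit, (exact, approx) in unit_sizes().items():
        # Ratio of the binary size to the decimal size for each unit.
        print(f"{unit}: binary/decimal = {exact / approx:.3f}")
    print(human_readable(2.7e21))  # 2012 global data from the table -> 2.70 ZB
```

The ratio grows from 1.024 at KB to about 1.209 at YB, so at the top of the table the binary unit is roughly 21% larger than the decimal one.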
Evolution of Big Data by technology
Evolution of Big Data by the Internet of Things
Evolution of Big Data by Social Media
Evolution of Big Data by other factors
Big Data sources
Human-Generated Data
• Includes emails, documents, photos, and tweets. We are generating this data faster than ever: just imagine the number of videos uploaded to YouTube and the tweets swirling around. This data can be Big Data too.
Machine-Generated Data
• A new breed of data. This category consists of sensor data and logs generated by "machines", such as email logs, click-stream logs, etc.
• Machine-generated data is orders of magnitude larger than human-generated data.
Web Data
• Social media data: sites like Facebook, Twitter, and LinkedIn generate large amounts of data.
• Click-stream data: when users navigate a website, their clicks are logged for later analysis (e.g., of navigation patterns). Click-stream data is important in online advertising and e-commerce.
Figure labels: 12+ TBs of tweet data every day; 25+ TBs of log data every day; ? TBs of data every day
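Analyzing navigation patterns from click-stream logs often starts by counting which page users move to next within a session. A minimal Python sketch, assuming each hypothetical log record is a `(session_id, timestamp, page)` tuple; real click-stream formats vary and would need parsing first.

```python
from collections import Counter, defaultdict

# Hypothetical click-stream log records: (session_id, timestamp, page).
CLICKS = [
    ("s1", 1, "/home"), ("s1", 2, "/products"), ("s1", 3, "/cart"),
    ("s2", 1, "/home"), ("s2", 3, "/cart"), ("s2", 2, "/products"),
    ("s3", 1, "/home"), ("s3", 2, "/search"),
]


def navigation_transitions(records):
    """Count page-to-page moves within each session, in timestamp order."""
    sessions = defaultdict(list)
    for session_id, ts, page in records:
        sessions[session_id].append((ts, page))
    transitions = Counter()
    for events in sessions.values():
        events.sort()  # log lines may arrive out of order
        pages = [page for _, page in events]
        transitions.update(zip(pages, pages[1:]))
    return transitions


if __name__ == "__main__":
    for (src, dst), n in navigation_transitions(CLICKS).most_common():
        print(f"{src} -> {dst}: {n}")
```

Sorting per session matters because the `s2` records arrive out of order; without it, `/home -> /cart` would be counted instead of `/home -> /products`.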
Sensor Data
• Sensors embedded in roads to monitor traffic, and for miscellaneous other purposes.
Figure labels:
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 2+ billion people on the Web by end of 2011
• 76 million smart meters in 2009… 200M by 2014
What is Big Data?
Big Data is the term for a collection of data sets so large and complex that they become difficult to process using traditional data-processing applications.
Real-world examples of Big Data:
• Facebook: 40 PB of data; captures 100 TB/day
• Yahoo: 60 PB of data
• Twitter: 8 TB/day
• eBay: 40 PB of data; captures 50 TB/day
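A quick calculation shows why a single traditional server struggles with these volumes. The sketch below estimates how long it takes just to read Facebook's 100 TB/day; the 100 MB/s per-disk read speed and the 1,000-disk cluster are assumptions for illustration, and the model ignores network and coordination overhead by assuming perfectly even parallel reads.

```python
TB = 10 ** 12  # decimal terabyte, in bytes
MB = 10 ** 6   # decimal megabyte, in bytes


def scan_hours(total_bytes, bytes_per_sec_per_disk, disks=1):
    """Hours to read total_bytes if the work is split perfectly evenly across disks."""
    return total_bytes / (bytes_per_sec_per_disk * disks) / 3600


daily_ingest = 100 * TB  # Facebook's daily capture, from the list above
disk_speed = 100 * MB    # assumption: ~100 MB/s sequential read per disk

one_disk = scan_hours(daily_ingest, disk_speed)
cluster = scan_hours(daily_ingest, disk_speed, disks=1000)  # hypothetical cluster size

print(f"1 disk:     {one_disk:.1f} hours")        # 277.8 hours
print(f"1000 disks: {cluster * 60:.1f} minutes")  # 16.7 minutes
```

One disk would need more than 11 days to read a single day's data, while spreading the same data across many machines brings the scan down to minutes, which is the motivation for cluster-based tools.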
Characteristics of Big Data (3 Vs of Big Data)
Characteristics of Big Data (6 Vs of Big Data)
Volume, Velocity, Variety, Veracity, Value, Variability
V - Volume Characteristic of Big Data (6 Vs of Big Data)
1st V - Volume
• Data volume
  • 44x increase from 2009 to 2020: from 0.8 zettabytes (ZB) to 35 ZB
  • Data volume is increasing exponentially
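"Exponential" growth can be made concrete with the slide's own figures. A minimal Python sketch that computes the compound annual growth rate (CAGR) and doubling time implied by going from 0.8 ZB in 2009 to 35 ZB in 2020; it assumes a constant yearly growth rate, which real data growth only approximates.

```python
import math


def cagr(start, end, years):
    """Compound annual growth rate that turns start into end over the given years."""
    return (end / start) ** (1 / years) - 1


start_zb, end_zb = 0.8, 35  # 2009 and 2020 figures from the slide
years = 2020 - 2009

growth = cagr(start_zb, end_zb, years)
doubling_years = math.log(2) / math.log(1 + growth)

print(f"overall factor: {end_zb / start_zb:.2f}x")  # 43.75x, i.e. the ~44x on the slide
print(f"annual growth:  {growth:.1%}")
print(f"doubling time:  {doubling_years:.1f} years")
```

Under the constant-rate assumption the figures imply roughly 41% growth per year, so the volume doubles about every two years.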
V - Velocity Characteristic of Big Data (6 Vs of Big Data)
2nd V - Velocity: the speed at which data is generated; new data arrives every minute
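High-velocity data is usually measured over a moving time window rather than all at once. A minimal Python sketch of a sliding-window counter that reports how many events arrived in the last 60 seconds; the class name and the event times are hypothetical, and it assumes events arrive in time order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window` seconds; timestamps must be non-decreasing."""

    def __init__(self, window=60):
        self.window = window
        self.timestamps = deque()

    def add(self, ts):
        self.timestamps.append(ts)
        self._evict(ts)

    def count(self, now):
        self._evict(now)
        return len(self.timestamps)

    def _evict(self, now):
        # Drop events that fall outside the window (now - window, now].
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()


if __name__ == "__main__":
    counter = SlidingWindowCounter(window=60)
    for ts in [0, 10, 30, 59, 61, 100]:  # hypothetical event times in seconds
        counter.add(ts)
    print(counter.count(now=100))  # 3 events in (40, 100]
```

A `deque` keeps both appending new events and evicting old ones at O(1) per event, so memory stays proportional to the window rather than to the whole stream.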
V - Variety Characteristic of Big Data (6 Vs of Big Data)
3rd V - Variety: different kinds of data (structured, semi-structured, and unstructured) generated from various sources
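Variety means one analysis may have to combine data in very different shapes. A minimal Python sketch that normalizes a CSV table (structured), a JSON document (semi-structured) and a free-text sentence (unstructured) into one record format; the sample payment data and the regex are hypothetical and only fit this example.

```python
import csv
import io
import json
import re

# Hypothetical payment data arriving in three different shapes.
STRUCTURED = "id,amount\n1,20.5\n2,13.0\n"                        # CSV with a fixed schema
SEMI_STRUCTURED = '{"id": 3, "amount": 7.25, "tags": ["promo"]}'  # JSON, flexible fields
UNSTRUCTURED = "Customer 4 paid 9.75 dollars in cash."            # free text


def unified_records():
    """Normalize all three sources into {'id': int, 'amount': float} records."""
    for row in csv.DictReader(io.StringIO(STRUCTURED)):
        yield {"id": int(row["id"]), "amount": float(row["amount"])}

    doc = json.loads(SEMI_STRUCTURED)
    yield {"id": doc["id"], "amount": doc["amount"]}

    # Free text needs extraction; this regex only fits this one sentence pattern.
    match = re.search(r"Customer (\d+) paid ([\d.]+)", UNSTRUCTURED)
    if match:
        yield {"id": int(match.group(1)), "amount": float(match.group(2))}


if __name__ == "__main__":
    records = list(unified_records())
    print(len(records), sum(r["amount"] for r in records))  # 4 50.5
```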
V - Veracity Characteristic of Big Data (6 Vs of Big Data)
4th V - Veracity: the uncertainties and inconsistencies in Big Data, which determine how far the data can be trusted
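In practice, veracity problems show up as duplicates, missing values and impossible readings. A minimal Python sketch of a data-quality check over hypothetical sensor readings; the plausible temperature range of -50 to 60 °C is an assumed rule for this example, not a standard.

```python
# Hypothetical sensor readings with typical quality problems.
READINGS = [
    {"sensor": "A", "ts": 1, "temp_c": 21.4},
    {"sensor": "A", "ts": 1, "temp_c": 21.4},   # duplicate delivery
    {"sensor": "B", "ts": 2, "temp_c": None},   # missing value
    {"sensor": "C", "ts": 3, "temp_c": 999.0},  # physically implausible
    {"sensor": "D", "ts": 4, "temp_c": 19.8},
]


def quality_report(rows, low=-50.0, high=60.0):
    """Sort rows into ok / duplicate / missing / out_of_range buckets.

    The plausible range [low, high] is an assumed rule for this example.
    """
    report = {"ok": [], "duplicate": [], "missing": [], "out_of_range": []}
    seen = set()
    for row in rows:
        key = (row["sensor"], row["ts"])
        if key in seen:
            report["duplicate"].append(row)
            continue
        seen.add(key)
        if row["temp_c"] is None:
            report["missing"].append(row)
        elif not low <= row["temp_c"] <= high:
            report["out_of_range"].append(row)
        else:
            report["ok"].append(row)
    return report


if __name__ == "__main__":
    for bucket, rows in quality_report(READINGS).items():
        print(bucket, len(rows))
```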
V - Value Characteristic of Big Data (6 Vs of Big Data)
5th V - Value: the mechanism for bringing correct, useful meaning out of the data
Note: Use the volume of data to deliver new value.
V - Variability Characteristic of Big Data (6 Vs of Big Data)
6th V - Variability: the rate at which data flows, and the meaning of the data itself, can change over time
Traditional DB vs Big Data
Aspect | Traditional database / data warehouse | Big Data
Data | TB to PB | PB to ZB
Data | Structured only | Structured and unstructured
Hardware | Big central servers | Computer clusters
Hardware | Expensive | Cost-effective
Hardware | Depends on reliable hardware | Tolerates unreliable hardware
Hardware | Limited scalability | Scales further
Software | Centralized | Distributed
Software | Schema-based | Not schema-based
Software | Oracle / MySQL / SQL Server | Hadoop
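"Not schema-based" is often described as schema-on-read: raw data is stored as it arrives, and a schema is applied only when the data is queried, whereas a traditional database enforces its schema when data is written. A minimal Python sketch of the schema-on-read idea over hypothetical JSON event lines; the defaulting and type-coercion choices are assumptions made for this example.

```python
import json

# Raw events stored exactly as they arrived; nothing was validated on write.
RAW_LINES = [
    '{"user": "u1", "amount": "12.50"}',
    '{"user": "u2", "amount": 3, "coupon": "X1"}',
    '{"user": "u3"}',
]


def read_with_schema(lines):
    """Apply a (user: str, amount: float) schema at read time.

    Missing amounts default to 0.0 and extra fields are ignored; both are
    choices made for this example.
    """
    for line in lines:
        doc = json.loads(line)
        yield {"user": str(doc["user"]), "amount": float(doc.get("amount", 0.0))}


if __name__ == "__main__":
    rows = list(read_with_schema(RAW_LINES))
    print(sum(r["amount"] for r in rows))  # 15.5
```

The same raw lines could later be read with a different schema (for example, one that keeps `coupon`) without rewriting the stored data.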
Big data tools
What is Big Data analytics?
Stages in Big data analytics
Big data analytics goals
1. Making organizations smarter and more efficient
Big data analytics application domains
Big data analytics use cases
IBM Big data analytics – Big data collected by smart meters
IBM Big data analytics – problem with smart meter big data
IBM Big data analytics – how smart meter big data analysed
IBM Big data analytics – IBM smart meter solution
Types of Big data analytics
What happened?
Descriptive Analysis:
• Examines historical trends
• Leverages internal data sources
Types of Big data analytics - Descriptive
Descriptive analytics helps answer "what happened?"
Example:
Average café metrics
Average wait for customers: 3.4 minutes
Average customers serviced: 135
Recent café metrics
Average wait for customers: 4.8 minutes (41% higher)
Total customers serviced: 112 (17% lower)
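The percentages in the café example are simple percentage changes from the baseline. A minimal Python sketch that reproduces them from the slide's figures; `pct_change` is a hypothetical helper name.

```python
def pct_change(baseline, recent):
    """Percentage change from a baseline value to a recent value."""
    return (recent - baseline) / baseline * 100


# Figures from the café example above.
baseline = {"avg_wait_minutes": 3.4, "customers_serviced": 135}
recent = {"avg_wait_minutes": 4.8, "customers_serviced": 112}

for metric, base_value in baseline.items():
    change = pct_change(base_value, recent[metric])
    print(f"{metric}: {change:+.0f}%")
# avg_wait_minutes: +41%
# customers_serviced: -17%
```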
Types of Big data analytics
Predictive analytics helps answer "what might happen next?"
Predictive Analytics: estimates the probability of future events
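Regression-based forecasting is one of the predictive techniques listed later in this chapter. A minimal Python sketch that fits an ordinary least-squares line to hypothetical monthly sales and extrapolates one month ahead; note that this gives a single point estimate, not a probability, and real predictive models would also quantify uncertainty.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical monthly sales history.
months = [1, 2, 3, 4, 5]
sales = [100, 108, 121, 129, 142]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept
print(f"month 6 forecast: {forecast:.1f}")  # 151.5
```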
Types of Big data analytics
If this happens, then that will happen.
Types of Big data analytics
Prescriptive analytics helps answer "what should be done?"
Prescriptive Analysis:
• Creates predictions
• Offers recommendations
• Builds an optimization model
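Prescriptive analytics combines a prediction with an optimization step to recommend an action. A minimal Python sketch that picks the price with the highest predicted profit; the linear demand model, unit cost and candidate prices are all hypothetical, and exhaustive search stands in for the solvers (such as Gurobi or CPLEX) that larger problems would need.

```python
def predicted_demand(price):
    """Hypothetical demand model, standing in for the output of predictive analytics."""
    return max(0.0, 200 - 8 * price)


def recommend_price(candidate_prices, unit_cost):
    """Pick the candidate price with the highest predicted profit (exhaustive search)."""
    def profit(price):
        return (price - unit_cost) * predicted_demand(price)

    best = max(candidate_prices, key=profit)
    return best, profit(best)


best_price, best_profit = recommend_price([10, 12, 14, 16, 18], unit_cost=6)
print(best_price, best_profit)  # 16 720
```

The prediction alone says how many units each price would sell; the optimization step turns that into a concrete recommendation.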
Types of Big data analytics
Diagnostic analytics helps answer "why did something happen?"
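Correlation analysis is a common first step when diagnosing why a metric changed. A minimal Python sketch that computes the Pearson correlation between staff on shift and average wait time, using hypothetical daily café data; a strong correlation points to a candidate cause but does not by itself prove causation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical daily café data: staff on shift vs. average wait (minutes).
staff = [4, 4, 3, 3, 2]
wait = [3.2, 3.5, 4.4, 4.6, 5.8]

r = pearson(staff, wait)
print(f"r = {r:.2f}")  # r = -0.99
```

A value close to -1 means fewer staff goes with longer waits on these days, which would be worth investigating as a root cause.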
Major Differences Between Various Big Data Analytics
Type | Focus | Purpose | Techniques/Methods | Examples | Tools
Descriptive Analytics | What happened? | Summarizes past data to identify patterns or trends. | Data aggregation, summarization | Monthly sales report, website traffic stats | Tableau, Power BI, Excel
Diagnostic Analytics | Why did it happen? | Identifies the causes behind past outcomes. | Root cause analysis, correlations | Analyzing why sales dropped, customer churn reasons | Python, R, SQL, SAS
Predictive Analytics | What will happen? | Forecasts future outcomes using historical data and machine learning models. | Regression, forecasting, ML models | Sales forecasting, predicting customer churn | TensorFlow, PyTorch, IBM SPSS
Prescriptive Analytics | What should we do? | Provides actionable recommendations based on predictions and simulations. | Optimization, simulation, AI models | Supply chain optimization, pricing strategy | Gurobi, CPLEX, SAS, ML frameworks
Exploratory Analytics | What can we discover? | Uncovers new patterns, relationships, or insights without predefined hypotheses. | Data visualization, clustering | Identifying new customer segments | Python (matplotlib, seaborn), R
Cognitive Analytics | How can we think like humans? | Mimics human cognition to interpret complex, unstructured data. | AI, natural language processing | Sentiment analysis, chatbot development | IBM Watson, Microsoft Azure AI
Real-Time Analytics | What is happening now? | Processes data in real time to provide immediate insights. | Streaming analytics, event processing | Monitoring network security, live stock market trends | Apache Kafka, Splunk, AWS Kinesis
Note: Look into the supplementary document for more details
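Clustering, listed under exploratory analytics, groups similar records without predefined labels. A minimal Python sketch of plain k-means (Lloyd's algorithm) that splits hypothetical customers into two segments by visits per month and average spend; the fixed starting centroids are an assumption that keeps the result deterministic, whereas practical implementations choose them more carefully.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep a centroid in place if its cluster ends up empty.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical customers: (visits per month, average spend).
customers = [(1, 10), (2, 12), (1, 11), (8, 50), (9, 55), (10, 52)]
centroids, clusters = kmeans(customers, centroids=[(1, 10), (10, 52)])
for centroid, members in zip(centroids, clusters):
    print(centroid, len(members))
```

The two resulting groups (occasional low spenders and frequent high spenders) are the kind of "new customer segments" the table mentions.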
Challenges/problems with Big data
Hadoop is a solution to Big Data problems
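Hadoop's processing model, MapReduce, splits a job into a map phase that runs on each block of data, a shuffle that groups intermediate results by key, and a reduce phase that combines each group. A minimal single-machine Python simulation of that data flow for the classic word-count job; it only models the three phases and is not Hadoop's actual (Java) API, and the input splits are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all counts for one word."""
    return key, sum(values)


# In Hadoop each split would be a block of a file stored on a different node.
splits = ["big data needs big clusters", "Hadoop splits big data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["big"], counts["data"], counts["hadoop"])  # 3 2 1
```

Because each mapper only needs its own split and each reducer only needs one key's values, both phases can run in parallel across a cluster of machines.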
Introduction to Big data and Analytics
ANY QUESTIONS / DOUBTS
???