Introduction to Big Data
and Big Data Analytics
12/9/24
Contents
Evolution of Big Data
Sources of Big Data
What is Big Data?
Characteristics of Big Data (5 Vs)
Tools used in Big Data
Introduction to Big Data analytics
Big Data analytics goals
Applications/use cases of Big Data analytics
Challenges of Big Data
How Hadoop solves the Big Data problem
Evolution of Big Data
The Model of Generating/Consuming Data has Changed
Old Model: a few companies generate data; all others consume it
New Model: all of us generate data, and all of us consume it
Unit of data size   Exact size          Approximate size                 Examples
KB (kilobyte)       2^10 (1024) bytes   10^3 (one thousand) bytes        A typical joke = 1 KB
MB (megabyte)       2^20 bytes          10^6 (one million) bytes         Complete works of Shakespeare = 5 MB
GB (gigabyte)       2^30 bytes          10^9 (one billion) bytes         Ten yards of books on a shelf = 1 GB
TB (terabyte)       2^40 bytes          10^12 (one trillion) bytes       All the X-rays for a large hospital = 1 TB; tweets created daily = 121 TB
PB (petabyte)       2^50 bytes          10^15 (one quadrillion) bytes    All U.S. academic research libraries = 2 PB; data processed in a day by Google = 24 PB
EB (exabyte)        2^60 bytes          10^18 (one quintillion) bytes    Total global data created in 2006 = 161 EB
ZB (zettabyte)      2^70 bytes          10^21 (one sextillion) bytes     Total global data created in 2012 = 2.7 ZB; 44 ZB expected by 2020
YB (yottabyte)      2^80 bytes          10^24 (one septillion) bytes
Dr vasu pinnti ICT
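The exact (power of two) and approximate (power of ten) sizes in the table above follow a simple pattern: each step up multiplies the exact size by 2^10 and the approximate size by 10^3. A minimal Python sketch of that arithmetic is shown here; the names `UNITS` and `unit_sizes` are illustrative and not from the slides.

```python
# Exact (binary) versus approximate (decimal) size of each unit in the table.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_sizes():
    """Return a list of (unit, exact_bytes, approx_bytes) tuples."""
    rows = []
    for i, unit in enumerate(UNITS, start=1):
        exact = 2 ** (10 * i)    # KB = 2^10, MB = 2^20, ...
        approx = 10 ** (3 * i)   # KB ~ 10^3, MB ~ 10^6, ...
        rows.append((unit, exact, approx))
    return rows


for unit, exact, approx in unit_sizes():
    # Relative gap between the exact and the approximate size, in percent.
    gap = (exact - approx) / approx * 100
    print(f"{unit}: exact={exact} approx={approx} gap={gap:.1f}%")
```

The gap between the two conventions widens with each unit, which is why the "approximate" column becomes a looser estimate for the larger units.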
Evolution of Big Data by technology
Evolution of Big Data by Internet Of Things
Evolution of Big Data by Social Media
Evolution of Big Data by other factors
Big Data sources
Human Generated Data
• Consists of emails, documents, photos and tweets. We are generating
this data faster than ever. Just imagine the number of videos
uploaded to YouTube and tweets swirling around. This data
can be Big Data too.
Machine Generated Data
• A new breed of data. This category consists of sensor data
and logs generated by 'machines', such as email logs and
click stream logs.
• Machine generated data is orders of magnitude larger than
human generated data.
Web Data
• Social media data: sites like Facebook, Twitter and LinkedIn
generate a large amount of data.
• Click stream data: when users navigate a website, the clicks
are logged for further analysis (such as navigation patterns).
Click stream data is important in online advertising and
e-commerce.
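As an illustration of what analysing navigation patterns in click stream data can look like, here is a minimal Python sketch that counts page-to-page transitions per user. The log format, the sample records, and the function name `count_transitions` are assumptions made for this example, not part of the slides.

```python
from collections import Counter

# Hypothetical click log: (user_id, page) pairs in the order the clicks occurred.
clicks = [
    ("u1", "/home"), ("u1", "/products"), ("u1", "/cart"),
    ("u2", "/home"), ("u2", "/products"),
    ("u3", "/home"), ("u3", "/cart"),
]


def count_transitions(click_log):
    """Count how often users move from one page directly to another."""
    last_page = {}            # most recent page seen for each user
    transitions = Counter()   # (from_page, to_page) -> number of occurrences
    for user, page in click_log:
        if user in last_page:
            transitions[(last_page[user], page)] += 1
        last_page[user] = page
    return transitions


patterns = count_transitions(clicks)
for (src, dst), n in patterns.most_common():
    print(f"{src} -> {dst}: {n}")
```

On a real site the log would be far too large for a single in-memory pass like this, which is where distributed tools come in; the counting logic itself stays the same.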
12+ TB of tweet data every day
25+ TB of log data every day
? TB of data every day
• Sensor data: sensors embedded in roads to monitor traffic, and other miscellaneous sensors.
• 4.6 billion camera phones worldwide
• 30 billion RFID tags today (1.3B in 2005)
• 100s of millions of GPS enabled devices sold annually
• 2+ billion people on the Web by end 2011
• 76 million smart meters in 2009… 200M by 2014
What is Big Data?
Big data is the term for a collection of data sets so large and complex
that it becomes difficult to process using traditional data processing
applications.
Real world examples of Big Data
• Facebook: 40 PB of data; captures 100 TB/day
• Yahoo: 60 PB of data
• Twitter: 8 TB/day
• eBay: 40 PB of data; captures 50 TB/day
Characteristics of Big Data (5 Vs of Big Data)
1st V - Volume
• Data volume
  • 44x increase from 2009 to 2020, from 0.8 zettabytes to 35 ZB
  • Data volume is increasing exponentially
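The 44x figure can be checked directly from the two volumes quoted on the slide, and the same numbers give an implied yearly growth rate. The Python sketch below does both; it assumes a constant annual growth rate between 2009 and 2020, which is a simplification introduced for this example, and the function name `growth` is illustrative.

```python
# Figures taken from the slide: 0.8 ZB in 2009, 35 ZB in 2020.
start_zb, end_zb = 0.8, 35.0
start_year, end_year = 2009, 2020


def growth(start, end, years):
    """Return (overall growth factor, implied compound annual growth rate)."""
    factor = end / start
    # Assumption: growth compounds at the same rate every year.
    annual_rate = factor ** (1 / years) - 1
    return factor, annual_rate


factor, rate = growth(start_zb, end_zb, end_year - start_year)
print(f"growth factor: {factor:.2f}x")
print(f"implied annual growth: {rate:.1%}")
```

The factor works out to 43.75, which the slide rounds to 44x. Under the constant-rate assumption this corresponds to roughly 41% growth per year, which is what "increasing exponentially" means in practice.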
2nd V - Velocity: data is being generated every minute
3rd V - Variety: different kinds of data are generated from various sources
4th V - Veracity: uncertainties and inconsistencies in big data
5th V - Value: mechanisms to extract correct meaning from the data
Traditional DB vs Big Data
            Traditional database / data warehouse   Big Data
Data        TB to PB                                PB to ZB
            Only structured                         Structured and unstructured
Hardware    Big central servers                     Computer clusters
            Expensive                               Cost effective
            Hardware reliability                    Unreliable hardware
            Limited scalability                     Scales further
Software    Centralized                             Distributed
            Schema based                            Not schema based
            Oracle / MySQL / SQL Server             Hadoop
Big data tools
What is Big data analytics
Stages in Big data analytics
Big data analytics goals
1. Making organizations smarter and more efficient
Big data analytics application domains
Big data analytics use cases
IBM Big data analytics – Big data collected by smart meters
IBM Big data analytics – problem with smart meter big data
IBM Big data analytics – how smart meter big data is analysed
IBM Big data analytics – IBM smart meter solution
Types of Big data analytics
Challenges/problems with Big data
Hadoop is a solution to Big Data problems
Introduction to Big data and Analytics
Any questions / doubts?