Basic Concepts in Big Data
ChengXiang (Cheng) Zhai
Department of Computer Science
University of Illinois at Urbana-Champaign
http://www.cs.uiuc.edu/homes/czhai
czhai@illinois.edu
What is big data?
"Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization" (Gartner 2012)
Complicated (intelligent) analysis of data may make a small data set appear to be big
Bottom line: Any data that exceeds our current capability of processing can be regarded as big
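To illustrate how a complicated analysis can make a small data set behave like a big one, here is a minimal sketch. It assumes, purely for illustration, an analysis that compares every record with every other record; the function name `pairwise_comparisons` and the record counts are hypothetical and not taken from the slides.

```python
from math import comb

def pairwise_comparisons(n_records: int) -> int:
    """Number of unordered record pairs an all-pairs analysis must examine."""
    return comb(n_records, 2)

# Assumed record counts, chosen only to show how quickly the work grows.
for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>12,} records -> {pairwise_comparisons(n):,} pairs")
```

The work grows roughly with the square of the number of records, so a data set that fits easily in memory can still exceed the available processing capability.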
Why is big data a big deal?
Government
Obama administration announced big data initiative
Many different big data programs launched
Private Sector
Walmart handles more than 1 million customer transactions
every hour, which is imported into databases estimated to
contain more than 2.5 petabytes of data
Facebook handles 40 billion photos from its user base.
Falcon Credit Card Fraud Detection System protects 2.1 billion
active accounts world-wide
Science
Large Synoptic Survey Telescope will generate 140 terabytes
of data every 5 days.
Biomedical computation, such as decoding the human genome and
personalized medicine
Social science revolution
Lifecycle of Data: 4 As
Acquisition, Aggregation, Analysis, Application
Computational View of Big Data
[Figure: layered view of big data processing. Layers: Data Visualization; Data Access; Data Understanding; Data Analysis; Data Integration; Formatting, Cleaning; Storage; Data]
Big Data & Related Topics/Courses
[Figure: the layered view annotated with related topics and courses.
Layers: Data Visualization; Data Access; Data Understanding; Data Analysis; Data Integration; Formatting, Cleaning; Storage; Data
Related topics/courses: Human-Computer Interaction; CS199; Databases; Information Retrieval; Computer Vision; Speech Recognition; Natural Language Processing; Machine Learning; Data Mining; Data Warehousing; Signal Processing; Information Theory
Callout: Many Applications!]
Some Data Analysis Techniques
Visualization
Classification
Time Series
Predictive Modeling
Clustering
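Example of Analysis: Clustering & Latent Factor Analysis
[Figure: user-by-movie rating matrix. Rows: User1, User2, ..., User n, with user groups U1 and U2. Columns: Movie 1, Movie 2, ..., Movie m, with movie groups M1 and M2. Sample ratings shown: 3.5 and 5.]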
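The grouping of users and movies in the rating matrix above can be sketched in code. The slides do not say which method is used, so this is only one possible approach under stated assumptions: a small, fully observed toy matrix (`RATINGS`, hypothetical values) and a single latent factor found by power iteration on the mean-centered ratings. The function name `top_latent_factor` is likewise hypothetical.

```python
# Hypothetical toy ratings: rows are users, columns are movies (fully observed).
RATINGS = [
    [5, 5, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
]

def top_latent_factor(ratings, iterations=50):
    """Return (user_scores, movie_loadings) for the dominant latent factor.

    Runs power iteration on the mean-centered rating matrix, which converges
    to its leading pair of singular vectors.
    """
    n_users, n_movies = len(ratings), len(ratings[0])
    mean = sum(sum(row) for row in ratings) / (n_users * n_movies)
    centered = [[r - mean for r in row] for row in ratings]

    def score_users(loadings):
        # Project every user's centered ratings onto the movie loadings.
        return [sum(c * v for c, v in zip(row, loadings)) for row in centered]

    movie_loadings = [1.0] + [0.0] * (n_movies - 1)  # arbitrary start vector
    for _ in range(iterations):
        user_scores = score_users(movie_loadings)
        movie_loadings = [
            sum(centered[u][m] * user_scores[u] for u in range(n_users))
            for m in range(n_movies)
        ]
        norm = sum(v * v for v in movie_loadings) ** 0.5
        movie_loadings = [v / norm for v in movie_loadings]
    return score_users(movie_loadings), movie_loadings

user_scores, movie_loadings = top_latent_factor(RATINGS)

# Split users and movies into two groups by the sign of the latent factor.
user_groups = [score > 0 for score in user_scores]
movie_groups = [loading > 0 for loading in movie_loadings]
print(user_groups)   # [True, True, False, False]
print(movie_groups)  # [True, True, False, False]
```

On this toy matrix the first two users fall into one group and the last two into another, and the movies split the same way. Real rating data is sparse, so a practical method would also have to handle missing entries, which this sketch does not.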
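Example of Analysis: Predictive Modeling
[Figure: the same user-by-movie rating matrix. Rows: User1, User2, ..., User n, with user groups U1 and U2. Columns: Movie 1, Movie 2, ..., Movie m, with movie groups M1 and M2. Sample ratings shown: 3.5 and 5; one entry is marked "=?" as the value to predict.]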
Does user2 like movie m? (Binary) Classification
What rating is user2 likely to give movie m? Regression
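The two questions above differ only in the kind of answer they ask for, which a small sketch can make concrete. Everything here is an assumption for illustration: the ratings in `neighbor_ratings` are invented, the "like" threshold of 4.0 is arbitrary, and averaging or voting over similar users is just one simple predictor, not the method the slides prescribe.

```python
from statistics import mean

# Hypothetical ratings that users similar to user2 gave movie m.
neighbor_ratings = [4.5, 5.0, 3.5, 4.0, 2.0]

LIKE_THRESHOLD = 4.0  # assumption: a rating of 4 or more counts as "like"

def predict_rating(ratings):
    """Regression: return a numeric rating estimate (the neighbors' average)."""
    return mean(ratings)

def predict_like(ratings, threshold=LIKE_THRESHOLD):
    """Binary classification: majority vote over like/dislike labels."""
    likes = sum(r >= threshold for r in ratings)
    return likes * 2 > len(ratings)

print(predict_rating(neighbor_ratings))  # numeric answer
print(predict_like(neighbor_ratings))    # yes/no answer
```

With these invented numbers the regression estimate is about 3.8 while the classifier answers "like", which shows that the two formulations are separate prediction problems and not one answer rounded two ways.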
Some topics we'll cover