7 Classification
Concepts and Techniques
What is classification?
What is prediction?
Classification vs. Prediction
Classification:
predicts categorical class labels
typical applications: target marketing, medical diagnosis
Classification—A Two-Step Process
Model construction: describing a set of predetermined classes
The model is represented as classification rules, decision trees, or mathematical formulas
Model usage: classifying future or unseen data

Figure: using the model in classification. The classifier is applied to the testing data below, then to unseen data such as the tuple (Jeff, Professor, 4), asking “Tenured?”

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
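The two-step process can be illustrated with a toy model for the table above. The rule below is hypothetical: a model of the kind step 1 might produce, applied in step 2 to an unseen tuple.

```python
# Hypothetical model (step 1 output): a classification rule of the form
# IF rank = 'Professor' OR years > 6 THEN tenured = 'yes'.
def tenured(name, rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

# Step 2 (model usage): classify the unseen tuple (Jeff, Professor, 4).
print(tenured("Jeff", "Professor", 4))  # "yes"
```

Note that on the testing data this rule misclassifies Merlisa (Associate Prof, 7 years, not tenured), which is why predictive accuracy is estimated on held-out test data before the model is used.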
Supervised vs. Unsupervised
Learning
Supervised learning (classification)
Supervision: The training data (observations,
measurements, etc.) are accompanied by
labels indicating the class of the observations
New data is classified based on the training
set
Unsupervised learning (clustering)
The class labels of the training data are unknown
Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
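The distinction can be made concrete with a minimal sketch (pure Python, hypothetical 1-D data): the supervised learner consults class labels, while the unsupervised one must discover the grouping itself.

```python
# Hypothetical 1-D data: the supervised learner sees class labels,
# the unsupervised one must discover the grouping on its own.
labeled = [(1.0, "A"), (1.2, "A"), (5.0, "B"), (5.3, "B")]
unlabeled = [x for x, _ in labeled]

def classify(x):
    """Supervised: label a new point by its nearest labeled neighbor (1-NN)."""
    return min(labeled, key=lambda p: abs(p[0] - x))[1]

def cluster(xs):
    """Unsupervised: split the points around the midpoint of their range."""
    mid = (min(xs) + max(xs)) / 2
    return [x for x in xs if x < mid], [x for x in xs if x >= mid]

print(classify(1.1))        # "A"
print(cluster(unlabeled))   # ([1.0, 1.2], [5.0, 5.3])
```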
Issues regarding classification and prediction
Issues regarding classification and prediction (1): Data Preparation
Data cleaning
Preprocess data in order to reduce noise and
handle missing values
Relevance analysis (feature selection)
Remove the irrelevant or redundant attributes
Data transformation
Generalize and/or normalize data
Issues regarding classification and prediction (2): Evaluating Classification Methods
Predictive accuracy
correctly predict the class of new data
Speed and scalability
time to construct the model
Robustness
handling noise and missing values
Scalability
efficiency in disk-resident databases
Interpretability
understanding and insight provided by the model
Goodness of rules
decision tree size
Classification by decision tree induction
Classification by Decision Tree Induction
Decision tree
A flow-chart-like tree structure
At start, all the training examples are at the root
Partition examples recursively based on selected attributes
Tree pruning
Identify and remove branches that reflect noise or outliers
Use of decision tree: Classifying an unknown sample
Test the attribute values of the sample against the decision tree
Training Dataset

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
Output: A Decision Tree for “buys_computer”

Figure: the induced decision tree. The root tests age?: the <=30 branch leads to a student? test (no → no, yes → yes), the 31…40 branch is a “yes” leaf, and the >40 branch leads to a credit_rating? test (excellent → no, fair → yes).
Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm)
Tree is constructed in a top-down recursive divide-and-conquer manner
At start, all the training examples are at the root
Attributes are categorical (if continuous-valued, they are discretized in advance)
Examples are partitioned recursively based on selected attributes
Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
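A compact sketch of this greedy top-down procedure, assuming categorical attributes (helper names are hypothetical; attribute selection here minimizes expected entropy, which is equivalent to maximizing information gain):

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy of the class distribution of `target` over the given rows."""
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def build_tree(rows, attrs, target):
    """Top-down recursive divide-and-conquer induction (ID3-style sketch)."""
    labels = {r[target] for r in rows}
    if len(labels) == 1:                 # all examples in one class: leaf
        return labels.pop()
    if not attrs:                        # no attributes left: majority vote
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    def expected_entropy(a):             # weighted entropy after splitting on a
        parts = [[r for r in rows if r[a] == v] for v in {r[a] for r in rows}]
        return sum(len(p) / len(rows) * entropy(p, target) for p in parts)
    best = min(attrs, key=expected_entropy)
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}

toy = [{"age": "young", "buys": "no"},
       {"age": "old",   "buys": "yes"}]
print(build_tree(toy, ["age"], "buys"))
```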
Attribute Selection Measure
Information gain
All attributes are assumed to be categorical
Information Gain
Assume two classes, P and N, with p and n samples: the expected information needed to classify a sample is
I(p, n) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))
Information Gain in Decision Tree Induction
If attribute A partitions the samples into sets with pi positive and ni negative samples each, the expected entropy after the split is
E(A) = ∑_i ((pi + ni)/(p + n)) I(pi, ni)
and the information gained by branching on A is
Gain(A) = I(p, n) − E(A)
Attribute Selection by Information Gain Computation

Class P: buys_computer = “yes”
Class N: buys_computer = “no”
I(p, n) = I(9, 5) = 0.940

Compute the entropy for age:

age    pi  ni  I(pi, ni)
<=30   2   3   0.971
31…40  4   0   0
>40    3   2   0.971

E(age) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.694

Hence
Gain(age) = I(p, n) − E(age) = 0.246

Similarly
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048
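These numbers can be reproduced directly from the definition of I(p, n). A short Python check; the quoted 0.246 for Gain(age) comes from subtracting the rounded values 0.940 − 0.694:

```python
import math

def I(p, n):
    """Expected information I(p, n) for a two-class sample."""
    total = p + n
    return sum(-c / total * math.log2(c / total) for c in (p, n) if c)

print(f"{I(9, 5):.3f}")   # 0.940

# Expected entropy of age, weighted over the three partitions:
E_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)
print(f"{E_age:.3f}")     # 0.694

# Gain(age) = I(p, n) - E(age); 0.246 on the slide reflects intermediate rounding.
print(f"{I(9, 5) - E_age:.3f}")   # 0.247
```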
Extracting Classification Rules from Trees
Attribute construction
Create new attributes based on existing ones
This reduces fragmentation, repetition, and replication
Classification in Large Databases
Why decision tree induction in data mining?
relatively faster learning speed (than other classification methods)
convertible to simple and easy to understand classification rules
can use SQL queries for accessing databases
Scalable Decision Tree Induction Methods in Data Mining Studies
SLIQ
builds an index for each attribute; only the class list and the current attribute list reside in memory
PUBLIC
integrates tree splitting and tree pruning: stop growing the tree earlier
Classification trees: semantic interpretation problems
Presentation of Classification Results
Bayesian Classification
Bayesian Classification: Why?
Naive Bayesian Classifier (simple Bayesian classifier)
Bayesian classification
The classification problem may be formalized using a-posteriori probabilities:
P(C|X) = probability that the sample tuple X = <x1,…,xk> is of class C
E.g. P(class=N | outlook=sunny, windy=true,…)
Bayes theorem:
P(C|X) = P(X|C)·P(C) / P(X)
P(X) is constant for all classes
P(C) = relative frequency of class C samples
Idea: choose the class C such that P(C|X) is maximum, i.e. the C such that P(X|C)·P(C) is maximum
Problem: computing P(X|C) is unfeasible!
The independence hypothesis…
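The independence (naive) hypothesis assumes the attributes are conditionally independent given the class, so P(X|C) = ∏ P(xi|C), which makes the computation feasible. A minimal sketch on hypothetical weather data:

```python
from collections import Counter

# Hypothetical training data: (attribute dict, class) pairs.
train = [({"outlook": "sunny", "windy": "true"},  "N"),
         ({"outlook": "sunny", "windy": "false"}, "N"),
         ({"outlook": "rain",  "windy": "false"}, "P"),
         ({"outlook": "rain",  "windy": "true"},  "P")]

def classify(x):
    """Pick C maximizing P(C) * prod_i P(xi|C), with probabilities
    estimated as relative frequencies from the training data."""
    class_counts = Counter(c for _, c in train)
    best, best_score = None, -1.0
    for c, nc in class_counts.items():
        score = nc / len(train)                      # P(C)
        for attr, val in x.items():                  # independence: product of P(xi|C)
            matches = sum(1 for t, tc in train if tc == c and t.get(attr) == val)
            score *= matches / nc
        if score > best_score:
            best, best_score = c, score
    return best

print(classify({"outlook": "sunny", "windy": "true"}))  # "N"
```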
Bayesian Belief Networks
A Bayesian belief network allows a subset of the variables to be conditionally independent
A graphical model of causal relationships
Several cases of learning Bayesian belief networks:
Given both the network structure and all the variables: easy
Given the network structure but only some of the variables
When the network structure is not known in advance
Classification by backpropagation
Neural Networks
Advantages
prediction accuracy is generally high
robust, works when training examples contain errors
output may be discrete, real-valued, or a vector of several discrete or real-valued attributes
Criticism
long training time
difficult to understand the learned function (weights)
A Neuron

Figure: a single neuron. The inputs x0 … xn, weighted by w0 … wn, feed a summation ∑; a bias −µk is applied and an activation function f produces the output y.

For each unit
Compute the net input to the unit as a linear combination of all the inputs to the unit
Compute the output value using the activation function
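These two steps for a single unit, with illustrative weights and a sigmoid as the activation function:

```python
import math

def neuron(xs, ws, mu):
    """One unit: weighted sum of inputs minus bias, passed through sigmoid f."""
    net = sum(x * w for x, w in zip(xs, ws)) - mu   # net input: linear combination minus bias
    return 1.0 / (1.0 + math.exp(-net))             # output value via activation f

# Illustrative (hypothetical) inputs, weights, and bias:
print(round(neuron([1.0, 0.5], [0.4, 0.2], 0.1), 3))  # 0.599
```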
Multi-Layer Perceptron

Input vector: xi (input nodes)
Net input to a hidden or output unit j: I_j = ∑_i w_ij O_i + θ_j
Output of unit j (sigmoid): O_j = 1 / (1 + e^(−I_j))
Error at an output node j: Err_j = O_j (1 − O_j)(T_j − O_j)
Error at a hidden node j: Err_j = O_j (1 − O_j) ∑_k Err_k w_jk
Weight update (learning rate l): w_ij = w_ij + (l) Err_j O_i
Bias update: θ_j = θ_j + (l) Err_j
Output vector: the values O_j at the output nodes
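One training step implementing these equations for a tiny 2-1-1 network (initial weights and learning rate l are illustrative assumptions):

```python
import math

sig = lambda z: 1.0 / (1.0 + math.exp(-z))

l = 0.5                        # learning rate (illustrative)
x = [1.0, 0.0]                 # input vector x_i
w_in, th_h = [0.2, -0.3], 0.1  # input->hidden weights and hidden bias (illustrative)
w_out, th_o = 0.4, -0.2        # hidden->output weight and output bias (illustrative)
T = 1.0                        # target value T_j

# Forward pass: I_j = sum_i w_ij O_i + theta_j ; O_j = 1 / (1 + e^-I_j)
O_h = sig(sum(wi * xi for wi, xi in zip(w_in, x)) + th_h)
O_o = sig(w_out * O_h + th_o)

# Backward pass: output error uses (T_j - O_j); hidden error sums Err_k w_jk
err_o = O_o * (1 - O_o) * (T - O_o)
err_h = O_h * (1 - O_h) * err_o * w_out

# Updates: w_ij = w_ij + (l) Err_j O_i ; theta_j = theta_j + (l) Err_j
w_out += l * err_o * O_h
th_o += l * err_o
w_in = [wi + l * err_h * xi for wi, xi in zip(w_in, x)]
th_h += l * err_h
```

After this single step the network's output moves toward the target T.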
Classification based on concepts from
association rule mining
Association-Based Classification
Other Classification Methods
Instance-Based Methods
Instance-based learning:
Store training examples and delay the processing (“lazy evaluation”) until a new instance must be classified, e.g. by its k nearest neighbors
Distance-weighted nearest neighbor algorithm
Weight the contribution of each of the k neighbors according to their distance to the query point xq, giving greater weight to closer neighbors:
w ≡ 1 / d(xq, xi)²
Similarly, for real-valued target functions, take the distance-weighted average of the neighbors’ values
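A sketch of the distance-weighted rule on hypothetical 1-D data:

```python
# Hypothetical 1-D training set: (value, class) pairs.
train = [(1.0, "A"), (2.0, "A"), (6.0, "B"), (7.0, "B")]

def classify(xq, k=3):
    """Vote among the k nearest neighbors, each weighted by 1 / d(xq, xi)^2."""
    nearest = sorted(train, key=lambda p: abs(p[0] - xq))[:k]
    votes = {}
    for xi, label in nearest:
        d = abs(xi - xq)
        w = float("inf") if d == 0 else 1.0 / d ** 2   # exact match dominates
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)

print(classify(2.5))  # "A": weights ~4.44 for A vs ~0.08 for B
```

For a real-valued target function the same weights would form a weighted average of the neighbors' values instead of a vote.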
Fuzzy Set Approaches
What Is Prediction?
Prediction: Categorical Data