Data Mining Alternative Classification Notes
Rule: (Condition) → y
– where
◆ Condition is a conjunction of attribute tests
◆ y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
– Examples of classification rules:
© Tan,Steinbach, Kumar Introduction to Data Mining ‹#›
◆ (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
◆ (Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No
Rule-based Classifier (Example)
Name Blood Type Give Birth Can Fly Live in Water Class
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds
Rule Coverage and Accuracy
– Coverage: fraction of records that satisfy the antecedent of a rule
– Accuracy: fraction of records satisfying the antecedent that also satisfy the consequent
– Example: rule (Status=Single) → No has Coverage = 40%, Accuracy = 50%
How does Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
Exhaustive rules
– Classifier has exhaustive coverage if it accounts
for every possible combination of attribute
values
– Each record is covered by at least one rule
Classification Rules
[Decision tree: Refund? Yes → NO; No → Marital Status: {Single, Divorced} → Taxable Income (<80K: NO, >80K: YES); {Married} → NO]
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No
Rules are mutually exclusive and exhaustive
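Because the rule set above is mutually exclusive and exhaustive, applying it is a simple cascade of tests. A minimal sketch (the function name `classify` and the income-in-thousands convention are assumptions, not from the slides):

```python
def classify(refund, marital_status, taxable_income):
    """Apply the mutually exclusive, exhaustive rule set derived from
    the decision tree. taxable_income is in thousands (80 means 80K)."""
    if refund == "Yes":
        return "No"                                   # (Refund=Yes) ==> No
    if marital_status in {"Single", "Divorced"}:
        # (Refund=No, Status={Single,Divorced}) splits on Taxable Income
        return "Yes" if taxable_income > 80 else "No"
    return "No"                                       # (Refund=No, Status={Married}) ==> No
```

Exactly one rule fires for every record, which is what mutual exclusivity and exhaustiveness guarantee.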
Rule-based ordering
– Individual rules are ranked based on their quality
Class-based ordering
– Rules that belong to the same class appear together
Direct Method:
◆ Extract rules directly from data
◆ e.g.: RIPPER, CN2, Holte’s 1R
Aspects of Sequential Covering:
– Rule Growing
– Rule Evaluation
– Instance Elimination
– Stopping Criterion
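The four aspects above fit together in one loop. A minimal sketch, where `learn_one_rule` is a hypothetical helper (not from the slides) that performs rule growing and evaluation and returns the learned rule plus the indices of covered records:

```python
def sequential_covering(records, labels, classes, learn_one_rule):
    """Sequential covering sketch: repeatedly grow one rule for a class,
    then remove the training records that rule covers."""
    rules = []
    data = list(zip(records, labels))
    for c in classes:
        while any(lbl == c for _, lbl in data):
            rule, covered = learn_one_rule(data, c)  # Rule Growing + Evaluation
            if rule is None:                         # Stopping Criterion
                break
            rules.append((rule, c))
            # Instance Elimination: drop the records this rule covers
            data = [d for i, d in enumerate(data) if i not in covered]
    return rules
```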
Rule Growing (general-to-specific): grow a rule by adding candidate conjuncts such as Refund=No, Status=Single, Status=Divorced, Status=Married, ..., Income>80K
RIPPER Algorithm:
– Start from an empty rule: {} => class
– Add conjuncts that maximize FOIL's information gain measure:
◆ R0: {} => class (initial rule)
◆ R1: {A} => class (rule after adding conjunct)
◆ Gain(R0, R1) = t × [ log2( p1/(p1+n1) ) − log2( p0/(p0+n0) ) ]
◆ where:
  t: number of positive instances covered by both R0 and R1
  p0: number of positive instances covered by R0
  n0: number of negative instances covered by R0
  p1: number of positive instances covered by R1
  n1: number of negative instances covered by R1
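The gain formula above can be computed directly. A minimal sketch (the helper name `foil_gain` is an assumption; t is taken as p1, assuming R1 specializes R0 so every positive instance R1 covers is also covered by R0):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for extending rule R0 to R1.

    p0, n0: positive/negative instances covered by R0
    p1, n1: positive/negative instances covered by R1
    """
    t = p1  # positives covered by both rules, since R1 specializes R0
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))
```

For example, specializing a rule covering 50 positives and 50 negatives into one covering 50 positives and 0 negatives yields a gain of 50 × (log2(1) − log2(0.5)) = 50.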
Metrics:
– Accuracy = nc / n
  (e.g., rule R1 has accuracy = 50/55; R2 has accuracy = 2/2)
– Laplace = (nc + 1) / (n + k)
– Likelihood ratio statistic: R = 2 Σ (i=1..k) fi log( fi / ei )
where:
  n: number of instances covered by the rule
  nc: number of covered instances belonging to the rule's class
  k: number of classes
  fi: observed frequency of class i among the instances covered by the rule
  ei: expected frequency of class i
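The accuracy and Laplace metrics are simple enough to compare directly. A minimal sketch (function names are assumptions), using the R1 vs. R2 example above with k = 2 classes:

```python
def accuracy(nc, n):
    # Fraction of covered instances the rule classifies correctly.
    return nc / n

def laplace(nc, n, k):
    # Laplace-smoothed accuracy; k = number of classes.
    return (nc + 1) / (n + k)
```

R2 has perfect raw accuracy (2/2 = 1.0) but covers only 2 instances, so its Laplace estimate (3/4 = 0.75) falls below R1's (51/57 ≈ 0.89): the smoothing penalizes rules with little support.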
Stopping criterion
– Compute the gain
– If gain is not significant, discard the new rule
Rule Pruning
– Similar to post-pruning of decision trees
– Reduced Error Pruning:
◆ Remove one of the conjuncts in the rule
◆ Compare error rate on validation set before and
after pruning
◆ If error improves, prune the conjunct
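The three steps above can be sketched as a greedy loop. This is a minimal sketch, not RIPPER's actual pruning routine; `error_on_validation` is a hypothetical callback supplied by the caller:

```python
def prune_rule(conjuncts, error_on_validation):
    """Reduced error pruning: repeatedly drop the conjunct whose
    removal most reduces validation-set error.

    conjuncts: list of attribute tests forming the rule antecedent
    error_on_validation: maps a conjunct list to a validation error rate
    """
    best = list(conjuncts)
    best_err = error_on_validation(best)
    improved = True
    while improved and best:
        improved = False
        for i in range(len(best)):
            candidate = best[:i] + best[i + 1:]       # remove one conjunct
            if error_on_validation(candidate) < best_err:
                best_err = error_on_validation(candidate)
                best = candidate                       # error improves: prune
                improved = True
                break
    return best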
Indirect Methods
[Decision tree: root split on P; P=No → split on Q (No: −, Yes: +); P=Yes → split on R (R=No: +, R=Yes: split on Q — No: −, Yes: +)]
Rule Set:
r1: (P=No, Q=No) ==> −
r2: (P=No, Q=Yes) ==> +
r3: (P=Yes, R=No) ==> +
r4: (P=Yes, R=Yes, Q=No) ==> −
r5: (P=Yes, R=Yes, Q=Yes) ==> +
Examples:
– Rote-learner
◆ Memorizes entire training data and performs
classification only if attributes of record match one of
the training examples exactly
– Nearest neighbor
◆ Uses k “closest” points (nearest neighbors) for
performing classification
Basic idea:
– If it walks like a duck, quacks like a duck, then
it’s probably a duck
Euclidean distance:
d(p, q) = sqrt( Σi (pi − qi)² )
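The distance formula plugs straight into nearest-neighbor classification. A minimal sketch, assuming majority vote among the k closest training points (function names are assumptions, not from the slides):

```python
import math
from collections import Counter

def euclidean(p, q):
    # d(p, q) = sqrt( sum_i (p_i - q_i)^2 )
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(train, x, k=3):
    """Classify x by majority vote among its k nearest training points.

    train: list of (point, label) pairs
    """
    neighbors = sorted(train, key=lambda t: euclidean(t[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```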
Scaling issues
– Attributes may have to be scaled to prevent
distance measures from being dominated by
one of the attributes
– Example:
◆ height of a person may vary from 1.5m to 1.8m
◆ weight of a person may vary from 90lb to 300lb
◆ income of a person may vary from $10K to $1M
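Without rescaling, income differences (spanning nearly $1M) would dwarf height differences (spanning 0.3m) in the Euclidean distance. One common fix is min-max scaling; a minimal sketch (the function name is an assumption):

```python
def min_max_scale(values):
    """Rescale one attribute's values to [0, 1] so that no single
    attribute (e.g. income in dollars) dominates the distance measure."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```

After scaling, a height of 1.8m and an income of $1M both map to 1.0, so each attribute contributes comparably to the distance.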
Nearest neighbor Classification…
Bayes Theorem:
P(C | A) = P(A | C) P(C) / P(A)
Example of Bayes Theorem
Given:
– A doctor knows that meningitis causes stiff neck 50% of the
time
– Prior probability of any patient having meningitis is 1/50,000
– Prior probability of any patient having stiff neck is 1/20
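Plugging the three numbers above into Bayes theorem gives the probability that a patient with a stiff neck has meningitis:

```python
p_s_given_m = 0.5       # meningitis causes stiff neck 50% of the time
p_m = 1 / 50000         # prior probability of meningitis
p_s = 1 / 20            # prior probability of stiff neck

# Bayes theorem: P(M | S) = P(S | M) P(M) / P(S)
p_m_given_s = p_s_given_m * p_m / p_s   # = 0.0002
```

Even though meningitis causes stiff neck half the time, the tiny prior means a stiff neck raises the probability of meningitis to only 0.02%.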
How to Estimate Probabilities from Data?
Goal: estimate P(A1, A2, ..., An | C) and P(C) from training records
Class prior: P(C) = Nc / N
– e.g., P(No) = 7/10, P(Yes) = 3/10
For discrete attributes: P(Ai | C) = Nic / Nc

Tid Refund Marital Status Taxable Income Evade
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
How to Estimate Probabilities from Data?
For continuous attributes, assume a normal distribution:
P(Ai | cj) = 1 / sqrt(2π σij²) × exp( −(Ai − μij)² / (2 σij²) )
– One distribution for each (Ai, ci) pair
– For (Income, Class=No): sample mean μ = 110, sample variance σ² = 2975 (σ ≈ 54.54)
P(Income=120 | No) = 1 / (sqrt(2π) × 54.54) × exp( −(120−110)² / (2 × 2975) ) ≈ 0.0072

Smoothing the probability estimates:
Original: P(Ai | C) = Nic / Nc
Laplace: P(Ai | C) = (Nic + 1) / (Nc + c)
m-estimate: P(Ai | C) = (Nic + m×p) / (Nc + m)
where:
  Nc: number of examples that are from class C
  Nic: number of class-C examples with attribute value Ai
  c: number of classes
  p: prior probability
  m: parameter
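The continuous-attribute estimate above can be checked numerically. Using the sample mean 110 and variance 2975 for (Income, Class=No):

```python
import math

def normal_density(x, mean, var):
    # P(Ai | cj) = 1/sqrt(2*pi*var) * exp( -(x - mean)^2 / (2*var) )
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Parameters estimated from the Class=No records of the training table
p = normal_density(120, 110.0, 2975.0)   # P(Income=120 | No), approx 0.0072
```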
Bayes Classifier (Example)
A: attributes; M: mammals; N: non-mammals

Name Give Birth Can Fly Live in Water Have Legs Class
human yes no no yes mammals
python no no no no non-mammals
salmon no no yes no non-mammals
whale yes no yes no mammals
frog no no sometimes yes non-mammals
komodo no no no yes non-mammals
bat yes yes no yes mammals
pigeon no yes no yes non-mammals
cat yes no no yes mammals
leopard shark yes no yes no non-mammals
turtle no no sometimes yes non-mammals
penguin no no sometimes yes non-mammals
porcupine yes no no yes mammals
eel no no yes no non-mammals
salamander no no sometimes yes non-mammals
gila monster no no no yes non-mammals
platypus no no no yes mammals
owl no yes no yes non-mammals
dolphin yes no yes no mammals
eagle no yes no yes non-mammals

Test record A: Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no
P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
P(A | M) P(M) = 0.06 × 7/20 = 0.021
P(A | N) P(N) = 0.0042 × 13/20 = 0.0027
Since P(A | M) P(M) > P(A | N) P(N), the record is classified as Mammals
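The computation above can be reproduced with exact fractions (the counts come from the 20-animal table: 7 mammals, 13 non-mammals):

```python
from fractions import Fraction as F

# Test record A: Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no
# Conditional counts among the 7 mammals and 13 non-mammals:
p_a_given_m = F(6, 7) * F(6, 7) * F(2, 7) * F(2, 7)       # approx 0.06
p_a_given_n = F(1, 13) * F(10, 13) * F(3, 13) * F(4, 13)  # approx 0.0042

score_m = p_a_given_m * F(7, 20)    # P(A|M) P(M)
score_n = p_a_given_n * F(13, 20)   # P(A|N) P(N)
# score_m > score_n, so Naive Bayes predicts Mammals
```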