Lec4 PDF
Rule-based Classifiers
➢ Evaluating a rule
➢ Algorithms
    PRISM
    Incremental reduced-error pruning
    RIPPER
➢ Handling
    Missing values
    Numeric attributes
➢ Rule-based classifiers versus decision trees → C4.5rules, PART
Classification Rules
• “if…then…” rules
    (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
    (Taxable_Income < 50K) ∧ (Refund=Yes) → Evade=No
• Rule: (Condition) → y
  – where
    ➢ Condition is a conjunction of attribute tests
        (A1 = v1) and (A2 = v2) and … and (An = vn)
    ➢ y is the class label
  – LHS: rule antecedent or condition
  – RHS: rule consequent
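The rule structure above can be sketched in code. This is a minimal illustration, not a reference implementation: the `Rule` class and its `covers` method are hypothetical names, and only equality tests of the form (A = v) are modelled, so a numeric test such as Taxable_Income < 50K would need an extension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A classification rule: a conjunction of attribute tests plus a class label."""
    conditions: tuple  # antecedent (LHS): tuple of (attribute, value) equality tests
    label: str         # consequent (RHS): the class label y

    def covers(self, record: dict) -> bool:
        # The antecedent holds only if every attribute test is satisfied.
        return all(record.get(attr) == value for attr, value in self.conditions)


# (Blood Type = Warm) AND (Lay Eggs = Yes) -> Birds
birds_rule = Rule(conditions=(("Blood Type", "warm"), ("Lay Eggs", "yes")), label="birds")

# Two illustrative records (only the attributes the rule tests are listed).
pigeon = {"Blood Type": "warm", "Lay Eggs": "yes"}
python_snake = {"Blood Type": "cold", "Lay Eggs": "yes"}

print(birds_rule.covers(pigeon))        # True
print(birds_rule.covers(python_snake))  # False
```

A record that satisfies the antecedent is said to be covered by the rule; the rule then predicts its label for that record.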
TNM033: Introduction to Data Mining 2
Rule-based Classifier (Example)
Name            Blood Type  Give Birth  Can Fly  Live in Water  Class
human           warm        yes         no       no             mammals
python          cold        no          no       no             reptiles
salmon          cold        no          no       yes            fishes
whale           warm        yes         no       yes            mammals
frog            cold        no          no       sometimes      amphibians
komodo          cold        no          no       no             reptiles
bat             warm        yes         yes      no             mammals
pigeon          warm        no          yes      no             birds
cat             warm        yes         no       no             mammals
leopard shark   cold        yes         no       yes            fishes
turtle          cold        no          no       sometimes      reptiles
penguin         warm        no          no       sometimes      birds
porcupine       warm        yes         no       no             mammals
eel             cold        no          no       yes            fishes
salamander      cold        no          no       sometimes      amphibians
gila monster    cold        no          no       no             reptiles
platypus        warm        no          no       no             mammals
owl             warm        no          yes      no             birds
dolphin         warm        yes         no       yes            mammals
eagle           warm        no          yes      no             birds
Motivation
[Figure: decision tree with test nodes A, B, C and D, branch values 1, 2, 3, and leaves labelled Y or N]
Name            Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk            warm        no          yes      no             ?
grizzly bear    warm        yes         no       no             ?
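Example rule: (Status = Single) → No
    Coverage = 40%, Accuracy = 50%
  – Coverage: fraction of the n records that satisfy the rule's antecedent
  – Accuracy: fraction of the covered records whose class matches the rule's consequent
    (n is the number of records in our sample)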
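As a worked illustration of coverage and accuracy, the sketch below evaluates a rule on the vertebrate table shown earlier. It assumes the usual definitions: coverage is the fraction of all n records that satisfy the antecedent, and accuracy is the fraction of those covered records whose class matches the consequent. The rule (Give Birth = no) ∧ (Can Fly = yes) → birds and the function name `coverage_and_accuracy` are chosen here for illustration only.

```python
FIELDS = ("Name", "Blood Type", "Give Birth", "Can Fly", "Live in Water", "Class")
ROWS = [
    ("human", "warm", "yes", "no", "no", "mammals"),
    ("python", "cold", "no", "no", "no", "reptiles"),
    ("salmon", "cold", "no", "no", "yes", "fishes"),
    ("whale", "warm", "yes", "no", "yes", "mammals"),
    ("frog", "cold", "no", "no", "sometimes", "amphibians"),
    ("komodo", "cold", "no", "no", "no", "reptiles"),
    ("bat", "warm", "yes", "yes", "no", "mammals"),
    ("pigeon", "warm", "no", "yes", "no", "birds"),
    ("cat", "warm", "yes", "no", "no", "mammals"),
    ("leopard shark", "cold", "yes", "no", "yes", "fishes"),
    ("turtle", "cold", "no", "no", "sometimes", "reptiles"),
    ("penguin", "warm", "no", "no", "sometimes", "birds"),
    ("porcupine", "warm", "yes", "no", "no", "mammals"),
    ("eel", "cold", "no", "no", "yes", "fishes"),
    ("salamander", "cold", "no", "no", "sometimes", "amphibians"),
    ("gila monster", "cold", "no", "no", "no", "reptiles"),
    ("platypus", "warm", "no", "no", "no", "mammals"),
    ("owl", "warm", "no", "yes", "no", "birds"),
    ("dolphin", "warm", "yes", "no", "yes", "mammals"),
    ("eagle", "warm", "no", "yes", "no", "birds"),
]
DATA = [dict(zip(FIELDS, row)) for row in ROWS]


def coverage_and_accuracy(conditions, label, data):
    """Return (coverage, accuracy) of the rule `conditions -> label` on `data`."""
    covered = [r for r in data if all(r[a] == v for a, v in conditions.items())]
    correct = [r for r in covered if r["Class"] == label]
    coverage = len(covered) / len(data)
    # A rule that covers nothing has no meaningful accuracy; 0.0 is used here.
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy


cov, acc = coverage_and_accuracy({"Give Birth": "no", "Can Fly": "yes"}, "birds", DATA)
print(f"coverage = {cov:.0%}, accuracy = {acc:.0%}")  # coverage = 15%, accuracy = 100%
```

The rule covers pigeon, owl and eagle (3 of 20 records), and all three are birds.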
Name            Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur           warm        yes         no       no             ?
turtle          cold        no          no       sometimes      ?
dogfish shark   cold        yes         no       yes            ?
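The three unlabelled records above show what can happen when a rule set is applied: a record may be covered by exactly one rule, by several rules, or by none. The sketch below illustrates this with an ordered rule list. The five rules are an assumption made for this example (they do not appear in the text above), and `classify` and `matching_labels` are hypothetical helper names.

```python
# Hypothetical ordered rule list (an assumption for illustration only).
RULES = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "birds"),
    ({"Give Birth": "no", "Live in Water": "yes"}, "fishes"),
    ({"Give Birth": "yes", "Blood Type": "warm"}, "mammals"),
    ({"Give Birth": "no", "Can Fly": "no"}, "reptiles"),
    ({"Live in Water": "sometimes"}, "amphibians"),
]

# The unlabelled records from the table above.
TEST_RECORDS = {
    "lemur": {"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "no"},
    "turtle": {"Blood Type": "cold", "Give Birth": "no", "Can Fly": "no", "Live in Water": "sometimes"},
    "dogfish shark": {"Blood Type": "cold", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "yes"},
}


def matching_labels(record, rules):
    """Labels of all rules whose antecedent the record satisfies, in rule order."""
    return [label for conds, label in rules
            if all(record.get(a) == v for a, v in conds.items())]


def classify(record, rules, default=None):
    """Ordered rule list: the first matching rule decides; otherwise use the default."""
    labels = matching_labels(record, rules)
    return labels[0] if labels else default


for name, record in TEST_RECORDS.items():
    print(name, matching_labels(record, RULES), classify(record, RULES, default="unknown"))
# lemur ['mammals'] mammals
# turtle ['reptiles', 'amphibians'] reptiles
# dogfish shark [] unknown
```

Under these assumed rules, the lemur triggers one rule, the turtle triggers two conflicting rules (resolved here by rule order), and the dogfish shark triggers none, so a default class is needed.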
• Direct Method
  ➢ Extract rules directly from data
  ➢ e.g.: RIPPER, Holte’s 1R (OneR)
• Indirect Method
  ➢ Extract rules from other classification models (e.g. decision trees, etc.)
  ➢ e.g.: C4.5rules
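The indirect method can be illustrated by turning each root-to-leaf path of a decision tree into one rule. The sketch below is only that first step: the toy tree and the nested-dict representation are assumptions made for this example, and the rule simplification that C4.5rules performs afterwards is not shown.

```python
def tree_to_rules(node, path=()):
    """Yield (conditions, label) for every root-to-leaf path of the tree."""
    if not isinstance(node, dict):
        # Leaf: the node is a class label; the path so far is the antecedent.
        yield path, node
        return
    for value, child in node["branches"].items():
        yield from tree_to_rules(child, path + ((node["attribute"], value),))


# Hypothetical toy tree (not taken from the text above).
TREE = {
    "attribute": "Give Birth",
    "branches": {
        "yes": "mammals",
        "no": {
            "attribute": "Can Fly",
            "branches": {"yes": "birds", "no": "reptiles"},
        },
    },
}

for conditions, label in tree_to_rules(TREE):
    lhs = " ∧ ".join(f"({attr} = {value})" for attr, value in conditions)
    print(f"{lhs} → {label}")
# (Give Birth = yes) → mammals
# (Give Birth = no) ∧ (Can Fly = yes) → birds
# (Give Birth = no) ∧ (Can Fly = no) → reptiles
```

Rules extracted this way are mutually exclusive and exhaustive, because every record follows exactly one path through the tree.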
Two issues:
– How to choose the best test? Which attribute to choose?
– When to stop building a rule?
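One common answer to both questions, in the style of PRISM, is to add the attribute test with the highest accuracy p/t on the records still covered (p correct out of t covered, ties broken by larger p) and to stop once the rule covers only the target class. The sketch below applies this to the vertebrate table shown earlier; `grow_rule` is a hypothetical name and the code is a simplified illustration rather than the WEKA implementation.

```python
FIELDS = ("Name", "Blood Type", "Give Birth", "Can Fly", "Live in Water", "Class")
ROWS = [
    ("human", "warm", "yes", "no", "no", "mammals"),
    ("python", "cold", "no", "no", "no", "reptiles"),
    ("salmon", "cold", "no", "no", "yes", "fishes"),
    ("whale", "warm", "yes", "no", "yes", "mammals"),
    ("frog", "cold", "no", "no", "sometimes", "amphibians"),
    ("komodo", "cold", "no", "no", "no", "reptiles"),
    ("bat", "warm", "yes", "yes", "no", "mammals"),
    ("pigeon", "warm", "no", "yes", "no", "birds"),
    ("cat", "warm", "yes", "no", "no", "mammals"),
    ("leopard shark", "cold", "yes", "no", "yes", "fishes"),
    ("turtle", "cold", "no", "no", "sometimes", "reptiles"),
    ("penguin", "warm", "no", "no", "sometimes", "birds"),
    ("porcupine", "warm", "yes", "no", "no", "mammals"),
    ("eel", "cold", "no", "no", "yes", "fishes"),
    ("salamander", "cold", "no", "no", "sometimes", "amphibians"),
    ("gila monster", "cold", "no", "no", "no", "reptiles"),
    ("platypus", "warm", "no", "no", "no", "mammals"),
    ("owl", "warm", "no", "yes", "no", "birds"),
    ("dolphin", "warm", "yes", "no", "yes", "mammals"),
    ("eagle", "warm", "no", "yes", "no", "birds"),
]
DATA = [dict(zip(FIELDS, row)) for row in ROWS]
ATTRIBUTES = ("Blood Type", "Give Birth", "Can Fly", "Live in Water")


def grow_rule(data, target, attributes):
    """Greedily add the test with the best accuracy p/t until the rule is pure."""
    conditions = {}
    covered = list(data)
    # Stop when only the target class is covered or every attribute has been used.
    while any(r["Class"] != target for r in covered) and len(conditions) < len(attributes):
        best = None  # (accuracy, positives, attribute, value)
        for attr in attributes:
            if attr in conditions:
                continue
            for value in sorted({r[attr] for r in covered}):
                subset = [r for r in covered if r[attr] == value]
                p = sum(r["Class"] == target for r in subset)
                candidate = (p / len(subset), p, attr, value)
                # Prefer higher accuracy; break ties by the larger number of positives.
                if best is None or candidate[:2] > best[:2]:
                    best = candidate
        _, _, attr, value = best
        conditions[attr] = value
        covered = [r for r in covered if r[attr] == value]
    return conditions, covered


conditions, covered = grow_rule(DATA, "mammals", ATTRIBUTES)
print(conditions)    # {'Give Birth': 'yes', 'Blood Type': 'warm'}
print(len(covered))  # 6
```

On this table the first test chosen is Give Birth = yes (6 of 7 covered records are mammals); adding Blood Type = warm removes the leopard shark and the rule becomes pure. The platypus is left uncovered and would need a further rule.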
If (x > 1.2)
then class = a
Available in WEKA
• Other metrics:
• The data set is split into a training set and a prune set
• Reduced Error Pruning
  1. Remove one of the conjuncts in the rule
  2. Compare error rate on the prune set before and after pruning
  3. If error improves, prune the conjunct
• Available in WEKA
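The three pruning steps above can be sketched as follows. Several details are assumptions made for this example: the error rate treats the rule as a binary classifier for its class (a record is an error if it is covered but belongs to another class, or is not covered but belongs to the rule's class), the prune set is a small hypothetical one, and `prune_rule` is a hypothetical name.

```python
def covers(conditions, record):
    return all(record[attr] == value for attr, value in conditions)


def error_rate(conditions, label, data):
    # Error: covered with the wrong class, or uncovered although of the rule's class.
    errors = sum(covers(conditions, r) != (r["Class"] == label) for r in data)
    return errors / len(data)


def prune_rule(conditions, label, prune_set):
    conditions = list(conditions)
    current = error_rate(conditions, label, prune_set)
    while len(conditions) > 1:
        # Step 1: try removing each conjunct in turn.
        candidates = [conditions[:i] + conditions[i + 1:] for i in range(len(conditions))]
        # Step 2: compare error rates on the prune set.
        best = min(candidates, key=lambda c: error_rate(c, label, prune_set))
        best_error = error_rate(best, label, prune_set)
        # Step 3: keep the removal only if the error improves.
        if best_error >= current:
            break
        conditions, current = best, best_error
    return conditions


# Hypothetical rule learned on the training set.
RULE = [("Give Birth", "no"), ("Can Fly", "yes"), ("Live in Water", "no")]

# Hypothetical prune set (held out from training).
PRUNE_SET = [
    {"Give Birth": "no", "Can Fly": "yes", "Live in Water": "no", "Class": "birds"},
    {"Give Birth": "no", "Can Fly": "yes", "Live in Water": "sometimes", "Class": "birds"},
    {"Give Birth": "yes", "Can Fly": "yes", "Live in Water": "no", "Class": "mammals"},
    {"Give Birth": "no", "Can Fly": "no", "Live in Water": "yes", "Class": "fishes"},
    {"Give Birth": "yes", "Can Fly": "no", "Live in Water": "no", "Class": "mammals"},
]

print(error_rate(RULE, "birds", PRUNE_SET))  # 0.2
print(prune_rule(RULE, "birds", PRUNE_SET))  # [('Give Birth', 'no'), ('Can Fly', 'yes')]
```

Dropping (Live in Water = no) lets the rule cover the second bird and lowers the prune-set error from 0.2 to 0.0; removing either remaining conjunct would raise the error again, so pruning stops.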
Example
Name            Give Birth  Lay Eggs  Can Fly  Live in Water  Have Legs  Class
human           yes         no        no       no             yes        mammals
python          no          yes       no       no             no         reptiles
salmon          no          yes       no       yes            no         fishes
whale           yes         no        no       yes            no         mammals
frog            no          yes       no       sometimes      yes        amphibians
komodo          no          yes       no       no             yes        reptiles
bat             yes         no        yes      no             yes        mammals
pigeon          no          yes       yes      no             yes        birds
cat             yes         no        no       no             yes        mammals
leopard shark   yes         no        no       yes            no         fishes
turtle          no          yes       no       sometimes      yes        reptiles
penguin         no          yes       no       sometimes      yes        birds
porcupine       yes         no        no       no             yes        mammals
eel             no          yes       no       yes            no         fishes
salamander      no          yes       no       sometimes      yes        amphibians
gila monster    no          yes       no       no             yes        reptiles
platypus        no          yes       no       no             yes        mammals
owl             no          yes       yes      no             yes        birds
dolphin         yes         no        no       yes            no         mammals
eagle           no          yes       yes      no             yes        birds
– Available in WEKA
• Available in WEKA: Prism, Ripper, PART, OneR