
Data Mining

Rule-based Classifiers

• What is a classification rule (Section 5.1 of the course book)
• Evaluating a rule
• Algorithms
  – PRISM
  – Incremental reduced-error pruning
  – RIPPER
• Handling
  – Missing values
  – Numeric attributes
• Rule-based classifiers versus decision trees → C4.5rules, PART


Classification Rules

• “if…then…” rules
  (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
  (Taxable_Income < 50K) ∧ (Refund=Yes) → Evade=No

• Rule: (Condition) → y
  – where
    – Condition is a conjunction of attribute tests
      (A1 = v1) and (A2 = v2) and … and (An = vn)
    – y is the class label
  – LHS: rule antecedent or condition
  – RHS: rule consequent
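To make the structure concrete, here is a minimal Python sketch (mine, not from the slides) of a rule as a conjunction of attribute tests plus a class label, with a check of whether it covers an instance:

from dataclasses import dataclass

@dataclass
class Rule:
    conditions: list   # LHS, e.g. [("Give Birth", "yes"), ("Blood Type", "warm")]
    label: str         # RHS, e.g. "mammals"

    def covers(self, instance: dict) -> bool:
        # the rule covers an instance if every (attribute, value) test holds
        return all(instance.get(a) == v for a, v in self.conditions)

r3 = Rule([("Give Birth", "yes"), ("Blood Type", "warm")], "mammals")
print(r3.covers({"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no"}))  # True
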
Rule-based Classifier (Example)
Name Blood Type Give Birth Can Fly Live in Water Class
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds


R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Motivation

• Consider the rule set
  – Attributes A, B, C, and D can have values 1, 2, and 3

  A = 1 ∧ B = 1 → Class = Y
  C = 1 ∧ D = 1 → Class = Y
  Otherwise, Class = N

• How to represent it as a decision tree?
  – The rules need a common attribute

  A = 1 ∧ B = 1 → Class = Y
  A = 1 ∧ B = 2 ∧ C = 1 ∧ D = 1 → Class = Y
  A = 1 ∧ B = 3 ∧ C = 1 ∧ D = 1 → Class = Y
  A = 2 ∧ C = 1 ∧ D = 1 → Class = Y
  A = 3 ∧ C = 1 ∧ D = 1 → Class = Y
  Otherwise, Class = N



Motivation

[Figure: the equivalent decision tree. Attribute A is tested at the root; the subtree that tests C and then D has to be repeated under several branches, so the tree is considerably larger than the original rule set.]


Application of Rule-Based Classifier

• A rule r covers an instance x if the attributes of the instance satisfy the condition (LHS) of the rule

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name          Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk          warm        no          yes      no             ?
grizzly bear  warm        yes         no       no             ?

The rule R1 covers the hawk ⇒ Class = Birds
The rule R3 covers the grizzly bear ⇒ Class = Mammals



Rule Coverage and Accuracy
• The quality of a classification rule can be evaluated by
  – Coverage: fraction of records that satisfy the antecedent of the rule
  – Accuracy: fraction of records covered by the rule that belong to the class on the RHS
    (n is the number of records in our sample)

Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

(Status = Single) → No
Coverage = 40%, Accuracy = 50%
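As a check on these numbers, a small snippet (illustrative only, not part of the slides) that computes coverage and accuracy of (Status = Single) → No on the ten records:

# Coverage and accuracy of the rule (Marital Status = Single) -> Class = No
records = [
    {"Tid": 1,  "Refund": "Yes", "Status": "Single",   "Income": 125, "Class": "No"},
    {"Tid": 2,  "Refund": "No",  "Status": "Married",  "Income": 100, "Class": "No"},
    {"Tid": 3,  "Refund": "No",  "Status": "Single",   "Income": 70,  "Class": "No"},
    {"Tid": 4,  "Refund": "Yes", "Status": "Married",  "Income": 120, "Class": "No"},
    {"Tid": 5,  "Refund": "No",  "Status": "Divorced", "Income": 95,  "Class": "Yes"},
    {"Tid": 6,  "Refund": "No",  "Status": "Married",  "Income": 60,  "Class": "No"},
    {"Tid": 7,  "Refund": "Yes", "Status": "Divorced", "Income": 220, "Class": "No"},
    {"Tid": 8,  "Refund": "No",  "Status": "Single",   "Income": 85,  "Class": "Yes"},
    {"Tid": 9,  "Refund": "No",  "Status": "Married",  "Income": 75,  "Class": "No"},
    {"Tid": 10, "Refund": "No",  "Status": "Single",   "Income": 90,  "Class": "Yes"},
]

covered = [r for r in records if r["Status"] == "Single"]           # antecedent satisfied: Tid 1, 3, 8, 10
coverage = len(covered) / len(records)                              # 4/10 = 0.40
accuracy = sum(r["Class"] == "No" for r in covered) / len(covered)  # 2/4  = 0.50
print(coverage, accuracy)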


How does Rule-based Classifier Work?

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds


R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name Blood Type Give Birth Can Fly Live in Water Class
lemur warm yes no no ?
turtle cold no no sometimes ?
dogfish shark cold yes no yes ?

A lemur triggers rule R3 ⇒ class = mammals
A turtle triggers both R4 and R5 → use voting or rule ordering
A dogfish shark triggers none of the rules → use a default rule
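A sketch (assumed, not from the slides) of how an ordered rule list plus a default class resolves these three cases — the first matching rule wins, and the default fires when nothing matches:

# First-match classification with an ordered rule list and a default class.
rules = [
    ([("Give Birth", "no"),  ("Can Fly", "yes")],       "birds"),       # R1
    ([("Give Birth", "no"),  ("Live in Water", "yes")], "fishes"),      # R2
    ([("Give Birth", "yes"), ("Blood Type", "warm")],   "mammals"),     # R3
    ([("Give Birth", "no"),  ("Can Fly", "no")],        "reptiles"),    # R4
    ([("Live in Water", "sometimes")],                  "amphibians"),  # R5
]

def classify(instance, rules, default="mammals"):   # default class chosen arbitrarily here
    for conditions, label in rules:
        if all(instance.get(a) == v for a, v in conditions):
            return label          # first rule that covers the instance wins
    return default                # no rule fires: fall back to the default class

turtle  = {"Blood Type": "cold", "Give Birth": "no",  "Can Fly": "no", "Live in Water": "sometimes"}
dogfish = {"Blood Type": "cold", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "yes"}
print(classify(turtle, rules))    # "reptiles": R4 is ordered before R5
print(classify(dogfish, rules))   # no rule covers it, so the default class is returned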



Building Classification Rules

• Direct Method
  – Extract rules directly from data
  – e.g., RIPPER, Holte’s 1R (OneR)

• Indirect Method
  – Extract rules from other classification models (e.g., decision trees)
  – e.g., C4.5rules


A Direct Method: Sequential Covering

• Let E be the training set
  – Extract rules one class at a time

For each class C
  1. Initialize set S with E
  2. While S contains instances in class C:
  3.   Learn one rule R for class C
  4.   Remove from S the training records covered by the rule R

Goal: create rules that cover many examples of class C and none (or very few) examples of other classes (a sketch of the loop follows below)
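A minimal Python rendering of this loop, assuming a learn_one_rule(S, C) helper (the rule-growing step described on the following slides) and a covers(R, x) test; the class attribute name "Class" is also an assumption:

def sequential_covering(E, classes, learn_one_rule, covers):
    rules = []
    for C in classes:
        S = list(E)                                   # 1. initialize S with E
        while any(x["Class"] == C for x in S):        # 2. while S still has class-C instances
            R = learn_one_rule(S, C)                  # 3. learn one rule for class C
            rules.append((R, C))
            S = [x for x in S if not covers(R, x)]    # 4. remove the records covered by R
    return rules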



A Direct Method: Sequential Covering

• How to learn a rule for a class C?
  1. Start from an empty rule: {} → class = C
  2. Grow the rule by adding a test (a = v) to the LHS
  3. Repeat step (2) until a stopping criterion is met

Two issues:
  – How to choose the best test? Which attribute to choose?
  – When to stop building a rule?


Example: Generating a Rule


[Figure: three scatter plots of the training instances (classes a and b) over attributes x and y, showing the region covered by the rule as it is grown: first the empty rule, then x > 1.2, then x > 1.2 and y > 2.6.]

If {}                       then class = a
If (x > 1.2)                then class = a
If (x > 1.2) and (y > 2.6)  then class = a

• Possible rule set for class “b”:
  If (x ≤ 1.2) then class = b
  If (x > 1.2) and (y ≤ 2.6) then class = b

• Could add more rules to get a “perfect” rule set


Simple Covering Algorithm

• Goal: choose a test that improves a quality measure for the rules
  – e.g., maximize the rule’s accuracy
• Similar to the situation in decision trees: the problem of selecting an attribute to split on
  – Decision tree algorithms maximize overall purity
• Each new test reduces the rule’s coverage
  [Figure: the space of examples, the region covered by the rule so far, and the smaller region covered after adding a new term]
  – t : total number of instances covered by the rule
  – p : positive examples of the class predicted by the rule
  – t – p : number of errors made by the rule
  – Rule’s accuracy = p/t


When to Stop Building a Rule

• When the rule is perfect, i.e. accuracy = 1
• When the increase in accuracy falls below a given threshold
• When the training set cannot be split any further



PRISM Algorithm
For each class C
  Initialize E to the training set
  While E contains instances in class C
    Create a rule R with an empty left-hand side that predicts class C
    Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        Consider adding the condition A = v to the left-hand side of R
      Select A and v so as to maximize the accuracy p/t
        (break ties by choosing the condition with the largest p)
      Add A = v to R
    Remove the instances covered by R from E

(The inner block, from “Create a rule R” down to “Add A = v to R”, is the “learn one rule” step.)

Available in WEKA
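A compact Python sketch of the pseudocode above, for categorical attributes only; the function and variable names are mine (not WEKA’s), and instances are assumed to be dicts with the class label under "Class":

def covers(conditions, instance):
    """True if the instance satisfies every (attribute, value) test."""
    return all(instance[a] == v for a, v in conditions)

def learn_one_rule(instances, attributes, target_class, class_attr="Class"):
    """Grow one rule for target_class, greedily adding the test with the best p/t."""
    conditions = []                 # empty LHS: {} -> target_class
    covered = list(instances)
    while True:
        if all(x[class_attr] == target_class for x in covered):
            break                   # rule is perfect
        used = {a for a, _ in conditions}
        unused = [a for a in attributes if a not in used]
        if not unused:
            break                   # no more attributes to test
        best = None                 # ((accuracy, p), attribute, value)
        for a in unused:
            for v in {x[a] for x in covered}:
                subset = [x for x in covered if x[a] == v]
                t = len(subset)
                p = sum(1 for x in subset if x[class_attr] == target_class)
                score = (p / t, p)  # maximize p/t, break ties by largest p
                if best is None or score > best[0]:
                    best = (score, a, v)
        _, a, v = best
        conditions.append((a, v))
        covered = [x for x in covered if x[a] == v]
    return conditions

def prism(instances, attributes, classes, class_attr="Class"):
    """attributes must not include the class attribute; classes is the list of labels."""
    rules = []
    for c in classes:
        remaining = list(instances)
        while any(x[class_attr] == c for x in remaining):
            rule = learn_one_rule(remaining, attributes, c, class_attr)
            rules.append((rule, c))
            remaining = [x for x in remaining if not covers(rule, x)]
    return rules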


Rule Evaluation in PRISM


The test-selection measure is the rule’s accuracy p/t, where
  t : number of instances covered by the rule
  p : number of instances covered by the rule that belong to the positive class

• Produces rules that don’t cover negative instances, as quickly as possible
• Disadvantage: may produce rules with very small coverage
  – Special cases or noise? (overfitting)



Rule Evaluation

• Other metrics compare the accuracy after a new test is added with the accuracy before the test is added
• These measures take into account the coverage/support count of the rule
  – t : number of instances covered by the rule
  – p : number of instances covered by the rule that are positive examples
  – k : number of classes
  – f : prior probability of the positive class
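The formulas themselves did not survive the slide-to-text conversion. Standard measures that use exactly the quantities listed above (stated here as a reminder; whether these are the slide’s originals is an assumption) are:

  Laplace estimate = (p + 1) / (t + k)
  m-estimate       = (p + k·f) / (t + k)

Both tend towards the plain accuracy p/t when t is large, but are pulled towards the class prior for rules with small coverage. A measure that instead compares the accuracy after a new test is added with the accuracy before it (used by RIPPER, see below) is FOIL’s information gain:

  Gain = p1 × ( log2(p1 / t1) − log2(p0 / t0) )

where p0, t0 are the positive/total counts covered before the new test is added and p1, t1 are the counts after.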


Missing Values and Numeric attributes

• Common treatment of missing values
  – The algorithm delays decisions until a test involving an attribute with no missing values emerges

• Numeric attributes are treated just as they are in decision trees (see the sketch below)
  1. Sort the instances according to the attribute’s value
  2. For each possible threshold, consider a binary less-than/greater-than test
  – Choose the test A < v that produces the highest accuracy
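A sketch of this threshold search (my names; instances are assumed to be dicts with numeric values stored directly on them). Candidate thresholds are taken as midpoints between consecutive distinct values, which is one common choice:

def best_threshold_test(instances, attr, target_class, class_attr="Class"):
    values = sorted({x[attr] for x in instances})
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None    # (accuracy, test description)
    for v in thresholds:
        for desc, keep in ((f"{attr} < {v}",  lambda x, v=v: x[attr] < v),
                           (f"{attr} >= {v}", lambda x, v=v: x[attr] >= v)):
            covered = [x for x in instances if keep(x)]
            if not covered:
                continue
            p = sum(1 for x in covered if x[class_attr] == target_class)
            if best is None or p / len(covered) > best[0]:
                best = (p / len(covered), desc)
    return best    # the binary test with the highest accuracy on these instances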



Rule Pruning

• The PRISM algorithm tries to get perfect rules, i.e. rules with accuracy = 1 on the training set
  – These rules can be overspecialized → overfitting
  – Solution: prune the rules

• Two main strategies:
  – Incremental pruning: simplify each rule as soon as it is built
  – Global pruning: build the full rule set and then prune it


Incremental Pruning: Reduced Error Pruning

• The data set is split into a training set and a prune set
• Reduced Error Pruning (sketched below)
  1. Remove one of the conjuncts in the rule
  2. Compare the error rate on the prune set before and after pruning
  3. If the error improves, prune the conjunct

• Sampling with stratification is advantageous
  – Training set (to learn the rules)
    – Validation set (to prune the rules)
  – Test set (to determine model accuracy)
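A minimal sketch of reduced-error pruning for a single rule, assuming a rule is a list of conjuncts and an error_rate(rule, prune_set) helper (hypothetical) that returns the rule’s error rate on the held-out prune set:

def reduced_error_prune(rule, prune_set, error_rate):
    rule = list(rule)
    while rule:
        base = error_rate(rule, prune_set)
        # try removing each conjunct; accept the first removal that does not
        # increase the error on the prune set (ties favour the simpler rule)
        for i in range(len(rule)):
            candidate = rule[:i] + rule[i + 1:]
            if error_rate(candidate, prune_set) <= base:
                rule = candidate
                break
        else:
            break    # no single removal helps: stop pruning
    return rule
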
Direct Method: RIPPER

• For a 2-class problem, choose one of the classes as the positive class and the other as the negative class
  – Learn rules for the positive class
  – The negative class will be the default class

• For a multi-class problem
  – Order the classes according to increasing class prevalence (fraction of instances that belong to a particular class)
  – Learn the rule set for the smallest class first, treating the rest as the negative class
  – Repeat with the next smallest class as the positive class

• Available in WEKA


Direct Method: RIPPER

• Learn one rule:
  – Start from an empty rule
  – Add conjuncts as long as they improve FOIL’s information gain
  – Stop when the rule no longer covers negative examples
    – Build rules with accuracy = 1 (if possible)
  – Prune the rule immediately using reduced error pruning
  – Measure for pruning: W(R) = (p − n) / (p + n)
    – p: number of positive examples covered by the rule in the validation set
    – n: number of negative examples covered by the rule in the validation set
  – Pruning starts from the last test added to the rule (see the example below)
    – May create rules that cover some negative examples (accuracy < 1)

• A global optimization (pruning) strategy is also applied
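A small numeric illustration of the pruning measure (the numbers are invented): suppose a rule covers p = 20 positive and n = 5 negative examples of the validation set, so W(R) = (20 − 5)/(20 + 5) = 0.60. If deleting its last test changes the counts to p = 22, n = 6, then W = (22 − 6)/(22 + 6) ≈ 0.57, which is worse, so the test is kept; had W increased, RIPPER would drop the test and continue pruning from the new last test.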



Indirect Method: C4.5rules

• Extract rules from an unpruned decision tree
• For each rule r: A → c, consider pruning the rule
• Use class ordering
  – Each subset is a collection of rules with the same rule consequent (class)
  – Classes described by simpler sets of rules tend to appear first


Example
Name Give Birth Lay Eggs Can Fly Live in Water Have Legs Class
human yes no no no yes mammals
python no yes no no no reptiles
salmon no yes no yes no fishes
whale yes no no yes no mammals
frog no yes no sometimes yes amphibians
komodo no yes no no yes reptiles
bat yes no yes no yes mammals
pigeon no yes yes no yes birds
cat yes no no no yes mammals
leopard shark yes no no yes no fishes
turtle no yes no sometimes yes reptiles
penguin no yes no sometimes yes birds
porcupine yes no no no yes mammals
eel no yes no yes no fishes
salamander no yes no sometimes yes amphibians
gila monster no yes no no yes reptiles
platypus no yes no no yes mammals
owl no yes yes no yes birds
dolphin yes no no yes no mammals
eagle no yes yes no yes birds



C4.5 versus C4.5rules versus RIPPER
[Figure: the C4.5 decision tree. Give Birth? = Yes → Mammals; = No → Live in Water? = Yes → Fishes, = Sometimes → Amphibians, = No → Can Fly? = Yes → Birds, = No → Reptiles.]

C4.5rules:
(Give Birth=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
( ) → Amphibians

RIPPER:
(Live in Water=Yes) → Fishes
(Have Legs=No) → Reptiles
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
(Can Fly=Yes, Give Birth=No) → Birds
( ) → Mammals


Indirect Method: PART

• Combines the divide-and-conquer strategy of decision tree learning with the separate-and-conquer strategy of rule learning
  1. Build a partial decision tree on the current set of instances
  2. Create a rule from the decision tree
     – The leaf with the largest coverage is made into a rule
  3. Discard the decision tree
  4. Remove the instances covered by the rule
  5. Go to step 1

• Available in WEKA



Advantages of Rule-Based Classifiers

• As highly expressive as decision trees
• Easy to interpret
• Easy to generate
• Can classify new instances rapidly
• Performance comparable to decision trees
• Can easily handle missing values and numeric attributes

• Available in WEKA: Prism, Ripper, PART, OneR

