
Data Mining

Classification: Alternative Techniques


Lecture Notes for Chapter 5

Introduction to Data Mining


by
Tan, Steinbach, Kumar
Rule-Based Classifier

Classify records by using a collection of “if…then…” rules

Rule: (Condition) → y
– where
◆ Condition is a conjunction of attributes
◆ y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
– Examples of classification rules:
◆ (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
◆ (Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No
Rule-based Classifier (Example)
Name Blood Type Give Birth Can Fly Live in Water Class
human warm yes no no mammals
python cold no no no reptiles
salmon cold no no yes fishes
whale warm yes no yes mammals
frog cold no no sometimes amphibians
komodo cold no no no reptiles
bat warm yes yes no mammals
pigeon warm no yes no birds
cat warm yes no no mammals
leopard shark cold yes no yes fishes
turtle cold no no sometimes reptiles
penguin warm no no sometimes birds
porcupine warm yes no no mammals
eel cold no no yes fishes
salamander cold no no sometimes amphibians
gila monster cold no no no reptiles
platypus warm no no no mammals
owl warm no yes no birds
dolphin warm yes no yes mammals
eagle warm no yes no birds

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Application of Rule-Based Classifier

A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name Blood Type Give Birth Can Fly Live in Water Class
hawk warm no yes no ?
grizzly bear warm yes no no ?
The rule R1 covers a hawk => Bird
The rule R3 covers the grizzly bear => Mammal
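As a concrete illustration, here is a minimal Python sketch of the coverage check; the dictionary representation of rules and records is an assumption for illustration, not part of the original notes:

# A rule is a (condition, class) pair; the condition maps attribute -> required value.
# A rule covers a record if every attribute test in the condition is satisfied.
R1 = ({"Give Birth": "no", "Can Fly": "yes"}, "Birds")
R3 = ({"Give Birth": "yes", "Blood Type": "warm"}, "Mammals")

def covers(rule, record):
    condition, _ = rule
    return all(record.get(attr) == value for attr, value in condition.items())

hawk = {"Blood Type": "warm", "Give Birth": "no", "Can Fly": "yes", "Live in Water": "no"}
grizzly = {"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "no"}

print(covers(R1, hawk))      # True  -> hawk is classified as Birds
print(covers(R3, grizzly))   # True  -> grizzly bear is classified as Mammals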

Rule Coverage and Accuracy

Coverage of a rule:
– Fraction of records that satisfy the antecedent of a rule
Accuracy of a rule:
– Fraction of records that satisfy both the antecedent and consequent of a rule

Tid Refund Marital Status Taxable Income Class
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes

Example: (Status=Single) → No
Coverage = 40%, Accuracy = 50%
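To make the two definitions concrete, a short Python sketch computing coverage and accuracy of (Status=Single) → No on the ten records above (the tuple encoding of the table is assumed for illustration):

# Records from the table: (Refund, Marital Status, Taxable Income in K, Class)
records = [
    ("Yes", "Single",   125, "No"), ("No", "Married", 100, "No"),
    ("No",  "Single",    70, "No"), ("Yes", "Married", 120, "No"),
    ("No",  "Divorced",  95, "Yes"), ("No", "Married",  60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single",    85, "Yes"),
    ("No",  "Married",   75, "No"), ("No", "Single",    90, "Yes"),
]

# Rule: (Status=Single) -> No
covered = [r for r in records if r[1] == "Single"]
correct = [r for r in covered if r[3] == "No"]

coverage = len(covered) / len(records)   # 4/10 = 40%
accuracy = len(correct) / len(covered)   # 2/4  = 50%
print(coverage, accuracy)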
How does Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name Blood Type Give Birth Can Fly Live in Water Class
lemur warm yes no no ?
turtle cold no no sometimes ?
dogfish shark cold yes no yes ?

A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
This example illustrates two IMPORTANT PROPERTIES of rule-based classifiers: what happens when a record triggers more than one rule, and what happens when it triggers no rule

Characteristics of Rule-Based Classifier

Mutually exclusive rules
– Classifier contains mutually exclusive rules if the rules are independent of each other
– Every record is covered by at most one rule

Exhaustive rules
– Classifier has exhaustive coverage if it accounts for every possible combination of attribute values
– Each record is covered by at least one rule

From Decision Trees To Rules

[Decision tree: Refund? (Yes → NO; No → Marital Status); Marital Status ({Single, Divorced} → Taxable Income; {Married} → NO); Taxable Income (< 80K → NO; > 80K → YES)]

Classification Rules:
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No

Rules are mutually exclusive and exhaustive
Rule set contains as much information as the tree

Rules Can Be Simplified


[Decision tree: Refund? (Yes → NO; No → Marital Status); Marital Status ({Single, Divorced} → Taxable Income; {Married} → NO); Taxable Income (< 80K → NO; > 80K → YES)]

Tid Refund Marital Status Taxable Income Cheat
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes

Initial Rule: (Refund=No) ∧ (Status=Married) → No
Simplified Rule: (Status=Married) → No
Effect of Rule Simplification

Rules are no longer mutually exclusive
– A record may trigger more than one rule
– Solution?
◆ Ordered rule set (order by accuracy, coverage, etc.)
◆ Unordered rule set – use voting schemes

Rules are no longer exhaustive
– A record may not trigger any rule
– Solution?
◆ Use a default class
Ordered Rule Set

Rules are rank ordered according to their priority
– An ordered rule set is known as a decision list

When a test record is presented to the classifier
– It is assigned to the class label of the highest-ranked rule it has triggered
– If none of the rules fired, it is assigned to the default class

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name Blood Type Give Birth Can Fly Live in Water Class
turtle cold no no sometimes ?
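A small Python sketch of how an ordered rule set (decision list) might be applied, assuming the ranking R1 > R2 > … > R5 and a default class; the data structures are illustrative assumptions:

# Rules listed in rank order; the first rule whose condition matches fires.
decision_list = [
    ({"Give Birth": "no",  "Can Fly": "yes"},       "Birds"),
    ({"Give Birth": "no",  "Live in Water": "yes"}, "Fishes"),
    ({"Give Birth": "yes", "Blood Type": "warm"},   "Mammals"),
    ({"Give Birth": "no",  "Can Fly": "no"},        "Reptiles"),
    ({"Live in Water": "sometimes"},                "Amphibians"),
]

def classify(record, rules, default="Unknown"):
    for condition, label in rules:
        if all(record.get(a) == v for a, v in condition.items()):
            return label            # highest-ranked triggered rule wins
    return default                  # no rule fired -> default class

turtle = {"Blood Type": "cold", "Give Birth": "no",
          "Can Fly": "no", "Live in Water": "sometimes"}
print(classify(turtle, decision_list))   # R4 fires before R5 -> "Reptiles"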



Rule Ordering Schemes

Rule-based ordering
– Individual rules are ranked based on their quality

Class-based ordering
– Rules that belong to the same class appear together



Rule-based Ordering:
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No

Class-based Ordering:
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Married}) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes

Building Classification Rules

Direct Method:
◆ Extract rules directly from data
◆ e.g.: RIPPER, CN2, Holte’s 1R



Indirect Method:
◆ Extract rules from other classification models (e.g., decision trees, neural networks, etc.)
◆ e.g: C4.5rules
Direct Method: Sequential Covering

1. Start from an empty rule
2. Grow a rule using the Learn-One-Rule function
3. Remove training records covered by the rule
4. Repeat Steps (2) and (3) until the stopping criterion is met
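A compact Python sketch of this loop; learn_one_rule, covers, and good_enough are hypothetical placeholders for the rule-growing, coverage, and evaluation steps:

def sequential_covering(records, learn_one_rule, covers, good_enough):
    """Sketch of sequential covering: grow rules one at a time and remove
    the training records covered by each accepted rule."""
    rules = []
    remaining = list(records)
    while remaining:                                  # stop when no records remain
        rule = learn_one_rule(remaining)              # step 2: grow one rule
        if not good_enough(rule, remaining):          # step 4: stopping criterion
            break
        rules.append(rule)
        remaining = [r for r in remaining if not covers(rule, r)]   # step 3
    return rules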
Example of Sequential Covering

[Figure: sequential covering – (i) Original Data; (ii) Step 1; (iii) Step 2, rule R1 found; (iv) Step 3, rules R1 and R2]


Aspects of Sequential Covering

Rule Growing

Rule Evaluation

Instance Elimination

Stopping Criterion



Rule Pruning



Rule Growing

Two common strategies

[Figure: rule growing from the empty rule {} (Yes: 3, No: 4); candidate conjuncts such as Refund=No, Status=Single, Status=Divorced, Status=Married, …, Income>80K are evaluated]
Rule Evaluation

RIPPER Algorithm:
– Start from an empty rule: {} => class
– Add conjuncts that maximize FOIL’s information gain measure:
◆ R0: {} => class (initial rule)
◆ R1: {A} => class (rule after adding conjunct)
◆ Gain(R0, R1) = t [ log2(p1/(p1+n1)) – log2(p0/(p0+n0)) ]
◆ where
t: number of positive instances covered by both R0 and R1
p0: number of positive instances covered by R0
n0: number of negative instances covered by R0
p1: number of positive instances covered by R1
n1: number of negative instances covered by R1


Rule Evaluation (Through Foil’s Information Gain)

Suppose a training set contains 60 positive examples and 100 negative examples, and we are given two candidate rules:
– Rule r1: covers 50 positive examples and 5 negative examples
– Rule r2: covers 2 positive examples and no negative examples
◆ Gain(R0, r1) = 50 × (log2(50/55) – log2(60/160)) = 50 × (–0.137 + 1.415) ≈ 63.9
◆ Gain(R0, r2) = 2 × (log2(2/2) – log2(60/160)) ≈ 2.8
– Choose rule r1 as it has the higher gain
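A quick Python check of these numbers, assuming FOIL's gain with log base 2:

from math import log2

def foil_gain(p0, n0, p1, n1):
    # t = number of positives covered by both R0 and R1; here the grown rule's
    # positives are assumed to be a subset of R0's, so t = p1.
    t = p1
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# R0 = empty rule covering the whole training set: 60 positives, 100 negatives
print(foil_gain(60, 100, 50, 5))   # ~63.9 for rule r1
print(foil_gain(60, 100, 2, 0))    # ~2.8  for rule r2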
Rule Evaluation by Other Metrics

Metrics:
– Accuracy = nc / n
◆ Rule r1 has accuracy = 50/55; r2 has accuracy = 2/2
– Laplace = (nc + 1) / (n + k)
◆ r1: 51/57 = 89.47%; r2: 3/4 = 75%
– M-estimate = (nc + k·p) / (n + k)
◆ when p = 1/k, the M-estimate becomes the same as Laplace

where
n: number of instances covered by rule
nc: number of positive instances covered by rule
k: number of classes
p: prior probability for the positive class
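A minimal Python sketch of the three metrics for the same two rules (k = 2 classes, and p is assumed to be 1/k):

def accuracy(nc, n):            return nc / n
def laplace(nc, n, k):          return (nc + 1) / (n + k)
def m_estimate(nc, n, k, p):    return (nc + k * p) / (n + k)

k = 2
for name, nc, n in [("r1", 50, 55), ("r2", 2, 2)]:
    print(name, accuracy(nc, n), laplace(nc, n, k), m_estimate(nc, n, k, 1 / k))
# r1: accuracy 0.909, Laplace 51/57 ~= 0.895, m-estimate equals Laplace when p = 1/k
# r2: accuracy 1.0,   Laplace 3/4  = 0.75,   m-estimate 0.75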

Rule Evaluation Metric (Likelihood ratio)

R = 2 Σ(i=1..k) fi log2( fi / ei )

where
k is the number of classes
fi is the observed frequency of class i examples covered by the rule
ei is the expected frequency of class i

(For example, for Rule r1: f1 = 50, f2 = 5, e1 = 55×60/160 = 20.625, e2 = 55×100/160 = 34.375)
R statistic for Rule r1: R(r1) = 99.9
R statistic for Rule r2: R(r2) = 5.66
So Rule r1 is a better rule than r2
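A short Python check of the likelihood-ratio numbers (log base 2 assumed, matching the worked values):

from math import log2

def likelihood_ratio(observed, total_pos, total_neg):
    """observed = (f_pos, f_neg) covered by the rule; expected counts are
    proportional to the class distribution of the whole training set."""
    f1, f2 = observed
    n = f1 + f2
    e1 = n * total_pos / (total_pos + total_neg)
    e2 = n * total_neg / (total_pos + total_neg)
    r = 2 * f1 * log2(f1 / e1)
    if f2 > 0:                      # a zero count contributes nothing
        r += 2 * f2 * log2(f2 / e2)
    return r

print(likelihood_ratio((50, 5), 60, 100))   # ~99.9 for rule r1
print(likelihood_ratio((2, 0), 60, 100))    # ~5.66 for rule r2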
Instance Elimination
Why do we need to eliminate instances?
– Otherwise, the next rule is identical to the previous rule

Why do we remove positive instances?
– Ensure that the next rule is different

Why do we remove negative instances?
– Prevent underestimating the accuracy of a rule
– Compare rules R2 and R3 in the diagram

[Figure: rules R1, R2, R3 covering overlapping regions of positive (class = +) and negative (class = –) instances]

Before starting: accu(R1) = 12/15 (80%), accu(R2) = 7/10 (70%), accu(R3) = 8/12 (66.7%)
First choose R1, as it has the highest accuracy; remove both the positive and negative instances covered by it, then recalculate the accuracy of R2 and R3 in the diagram.
Now accu(R2) = 70% but accu(R3) = 6/8 = 75%, so the better choice is R1 followed by R3.

Stopping Criterion and Rule Pruning

Stopping criterion
– Compute the gain
– If gain is not significant, discard the new rule

Rule Pruning
– Similar to post-pruning of decision trees
– Reduced Error Pruning:
◆ Remove one of the conjuncts in the rule
◆ Compare error rate on validation set before and after pruning
◆ If error improves, prune the conjunct
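A rough Python sketch of reduced error pruning for one rule, assuming a rule is represented as a list of conjuncts and an error_rate(rule, validation_set) helper is available (both are hypothetical):

def reduced_error_prune(conjuncts, validation_set, error_rate):
    """Repeatedly drop a conjunct whenever its removal lowers validation error."""
    best = list(conjuncts)
    improved = True
    while improved and len(best) > 1:
        improved = False
        base_error = error_rate(best, validation_set)
        for i in range(len(best)):
            candidate = best[:i] + best[i + 1:]            # remove one conjunct
            if error_rate(candidate, validation_set) < base_error:
                best, improved = candidate, True           # keep the pruned rule
                break
    return best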


Summary of Direct Method

Grow a single rule

Remove Instances from rule

Prune the rule (if necessary)

Add rule to Current Rule Set



Repeat
Direct Method: RIPPER

For a 2-class problem, choose one of the classes as the positive class and the other as the negative class
– Learn rules for the positive class
– Negative class will be the default class

For a multi-class problem
– Order the classes according to increasing class prevalence (fraction of instances that belong to a particular class)
– Learn the rule set for the smallest class first, treat the rest as the negative class
– Repeat with the next smallest class as the positive class
Direct Method: RIPPER
Growing a rule:
– Start from empty rule
– Add conjuncts as long as they improve FOIL’s information gain
– Stop when rule no longer covers negative examples
– Prune the rule immediately using incremental reduced error pruning
– Measure for pruning: v = (p − n) / (p + n)
◆ p: number of positive examples covered by the rule in the validation set
◆ n: number of negative examples covered by the rule in the validation set
◆ If the metric v improves after pruning, then the conjunct is removed
– Pruning method: delete any final sequence of conditions that maximizes v
– For example, for rule ABCD → y, RIPPER checks whether D should be removed first, followed by CD, then BCD, etc.
– The original rule covers only positive examples, whereas the pruned rule may cover some negative examples from the training set
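A small Python sketch of this pruning step, assuming a counts(prefix) helper (hypothetical) that returns (p, n) on the validation set for a rule using only the given prefix of conjuncts:

def ripper_prune(conjuncts, counts):
    """Delete the final sequence of conditions that maximizes v = (p - n) / (p + n)."""
    def v(prefix):
        p, n = counts(prefix)
        return (p - n) / (p + n) if (p + n) > 0 else float("-inf")

    best_len, best_v = len(conjuncts), v(conjuncts)
    # For A B C D: first consider dropping D, then CD, then BCD.
    for keep in range(len(conjuncts) - 1, 0, -1):
        candidate_v = v(conjuncts[:keep])
        if candidate_v > best_v:
            best_len, best_v = keep, candidate_v
    return conjuncts[:best_len]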

Indirect Methods

[Decision tree: P? (No → Q; Yes → R); left Q (No → -, Yes → +); R (No → +, Yes → Q); right Q (No → -, Yes → +)]

Rule Set:
r1: (P=No, Q=No) ==> -
r2: (P=No, Q=Yes) ==> +
r3: (P=Yes, R=No) ==> +
r4: (P=Yes, R=Yes, Q=No) ==> -
r5: (P=Yes, R=Yes, Q=Yes) ==> +

Indirect Method: C4.5rules

Extract rules from an unpruned decision tree


For each rule, r: A → y,
– consider an alternative rule r': A' → y, where A' is obtained by removing one of the conjuncts in A
– Compare the pessimistic error rate for r against all r's
– Prune if one of the r's has a lower pessimistic error rate
– Repeat until we can no longer improve the generalization error


Indirect Method: C4.5rules

Instead of ordering the rules, order subsets of rules (class ordering)
– Each subset is a collection of rules with the same rule consequent (class)
– Compute the description length of each subset
◆ Description length = L(error) + g L(model)
◆ g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5)
◆ the g value is small if the rule set contains many redundant attributes
– The class that has the lowest description length is given the highest priority, as it contains the best set of rules


Example
Name Give Birth Lay Eggs Can Fly Live in Water Have Legs Class
human yes no no no yes mammals
python no yes no no no reptiles
salmon no yes no yes no fishes
whale yes no no yes no mammals
frog no yes no sometimes yes amphibians
komodo no yes no no yes reptiles
bat yes no yes no yes mammals
pigeon no yes yes no yes birds
cat yes no no no yes mammals
leopard shark yes no no yes no fishes
turtle no yes no sometimes yes reptiles
penguin no yes no sometimes yes birds
porcupine yes no no no yes mammals
eel no yes no yes no fishes
salamander no yes no sometimes yes amphibians
gila monster no yes no no yes reptiles
platypus no yes no no yes mammals
owl no yes yes no yes birds
dolphin yes no no yes no mammals
eagle no yes yes no yes birds

C4.5 versus C4.5rules versus RIPPER

[Decision tree (C4.5): Give Birth? (Yes → Mammals; No → Live In Water?); Live In Water? (Yes → Fishes; Sometimes → Amphibians; No → Can Fly?); Can Fly? (Yes → Birds; No → Reptiles)]

C4.5rules:
(Give Birth=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles
( ) → Amphibians

RIPPER:
(Live in Water=Yes) → Fishes
(Have Legs=No) → Reptiles
(Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles
(Can Fly=Yes, Give Birth=No) → Birds
( ) → Mammals

C4.5 versus C4.5rules versus RIPPER
C4.5 and C4.5rules:
PREDICTED CLASS
ACTUAL CLASS   Amphibians Fishes Reptiles Birds Mammals
Amphibians     2          0      0        0     0
Fishes         0          2      0        0     1
Reptiles       1          0      3        0     0
Birds          1          0      0        3     0
Mammals        0          0      1        0     6

RIPPER:
PREDICTED CLASS
ACTUAL CLASS   Amphibians Fishes Reptiles Birds Mammals
Amphibians     0          0      0        0     2
Fishes         0          3      0        0     0
Reptiles       0          0      3        0     1
Birds          0          0      1        2     1
Mammals        0          2      1        0     4

Advantages of Rule-Based Classifiers

As highly expressive as decision trees


Easy to interpret
Easy to generate
Can classify new instances rapidly
Performance comparable to decision trees



Instance-Based Classifiers

[Figure: Set of Stored Cases]
Instance Based Classifiers

Examples:
– Rote-learner
◆ Memorizes entire training data and performs classification only if the attributes of a record match one of the training examples exactly

– Nearest neighbor
◆ Uses k “closest” points (nearest neighbors) for performing classification



Nearest Neighbor Classifiers

Basic idea:
– If it walks like a duck, quacks like a duck, then it’s probably a duck



Nearest-Neighbor Classifiers
[Figure: an unknown record plotted among labeled training records]

Requires three things
– The set of stored records
– Distance metric to compute distance between records
– The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:
– Compute distance to other training records
– Identify k nearest neighbors
– Use class labels of nearest neighbors to determine the class label of the unknown record (e.g., by taking majority vote)


Definition of Nearest Neighbor

[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a record x]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x
Nearest Neighbor Classification

Compute distance between two points:
– Euclidean distance
d(p, q) = sqrt( Σi (pi − qi)² )

Determine the class from the nearest-neighbor list
– take the majority vote of class labels among the k nearest neighbors
– Weigh the vote according to distance
◆ weight factor, w = 1/d²
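A minimal Python sketch of k-NN classification with Euclidean distance and (optionally distance-weighted) majority voting; the toy training points are made up for illustration:

from collections import defaultdict
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train, x, k=3, weighted=False):
    """train is a list of (point, label) pairs; x is the query point."""
    neighbors = sorted(train, key=lambda pl: euclidean(pl[0], x))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        d = euclidean(point, x)
        votes[label] += 1.0 / (d * d) if (weighted and d > 0) else 1.0
    return max(votes, key=votes.get)

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"), ((5.5, 4.5), "B")]
print(knn_classify(train, (1.1, 1.0), k=3))   # -> "A"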
Nearest Neighbor Classification…

Choosing the value of k:


– If k is too small, sensitive to noise points
– If k is too large, the neighborhood may include points from other classes


Nearest Neighbor Classification…

Scaling issues
– Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
– Example:
◆ height of a person may vary from 1.5m to 1.8m
◆ weight of a person may vary from 90lb to 300lb
◆ income of a person may vary from $10K to $1M
Nearest neighbor Classification…

k-NN classifiers are lazy learners
– They do not build models explicitly
– Unlike eager learners such as decision tree induction and rule-based systems
– Classifying unknown records is relatively expensive
Bayes Classifier

A probabilistic framework for solving classification problems

Conditional Probability:
P(C | A) = P(A, C) / P(A)
P(A | C) = P(A, C) / P(C)

Bayes theorem:
P(C | A) = P(A | C) P(C) / P(A)
Example of Bayes Theorem

Given:
– A doctor knows that meningitis causes stiff neck 50% of the time
– Prior probability of any patient having meningitis is 1/50,000
– Prior probability of any patient having stiff neck is 1/20

If a patient has a stiff neck, what is the probability that he/she has meningitis?

P(M | S) = P(S | M) P(M) / P(S) = (0.5 × 1/50000) / (1/20) = 0.0002
Bayesian Classifiers

Consider each attribute and class label as random variables

Given a record with attributes (A1, A2, …, An)
– Goal is to predict class C
– Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)

Can we estimate P(C | A1, A2, …, An) directly from data?
Bayesian Classifiers
Approach:
– compute the posterior probability P(C | A1, A2, …, An) for all values of C using the Bayes theorem

P(C | A1 A2 … An) = P(A1 A2 … An | C) P(C) / P(A1 A2 … An)

– Choose the value of C that maximizes P(C | A1, A2, …, An)
– Equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C)

How to estimate P(A1, A2, …, An | C)?

Naïve Bayes Classifier

Assume independence among attributes Ai when the class is given:
– P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)

– Can estimate P(Ai | Cj) for all Ai and Cj.

– A new point is classified to Cj if P(Cj) Π P(Ai | Cj) is maximal.


How to Estimate Probabilities from Data?

Class: P(C) = Nc / N
– e.g., P(No) = 7/10, P(Yes) = 3/10

Tid Refund Marital Status Taxable Income Evade
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes
How to Estimate Probabilities from Data?

For discrete attributes:
– P(Ai | Ck) = |Aik| / Nc
◆ where |Aik| is the number of instances having attribute value Ai and belonging to class Ck
– Examples:
P(Status=Married | No) = 4/7
P(Refund=Yes | Yes) = 0

For continuous attributes:
– Discretize the range into bins
◆ one ordinal attribute per bin
◆ violates independence assumption
– Two-way split: (A < v) or (A > v)

How to Estimate Probabilities from Data?
◆ choose only one of the two splits as the new attribute
– Probability density estimation:
◆ Assume the attribute follows a normal distribution
◆ Use data to estimate the parameters of the distribution (e.g., mean and standard deviation)
◆ Once the probability distribution is known, it can be used to estimate the conditional probability P(Ai | c)
How to Estimate Probabilities from Data?
Normal distribution:
P(Ai | cj) = ( 1 / sqrt(2π σij²) ) exp( −(Ai − μij)² / (2 σij²) )
– One for each (Ai, cj) pair

Tid Refund Marital Status Taxable Income Evade
1 Yes Single 125K No
2 No Married 100K No
3 No Single 70K No
4 Yes Married 120K No
5 No Divorced 95K Yes
6 No Married 60K No
7 Yes Divorced 220K No
8 No Single 85K Yes
9 No Married 75K No
10 No Single 90K Yes

For (Income, Class=No):
– If Class=No

◆ sample mean = 110
◆ sample variance = 2975

P(Income=120 | No) = ( 1 / (sqrt(2π) × 54.54) ) × exp( −(120 − 110)² / (2 × 2975) ) = 0.0072
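A quick Python check of this density value:

from math import exp, pi, sqrt

def normal_pdf(x, mean, variance):
    # Gaussian density used as the estimate of P(Ai | c) for a continuous attribute.
    return (1.0 / sqrt(2 * pi * variance)) * exp(-((x - mean) ** 2) / (2 * variance))

print(normal_pdf(120, 110, 2975))   # ~0.0072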


Example of Naïve Bayes Classifier
Given a Test Record:
X = (Refund=No, Married, Income=120K)

Naïve Bayes Classifier:
P(Refund=Yes|No) = 3/7
P(Refund=No|No) = 4/7
P(Refund=Yes|Yes) = 0
P(Refund=No|Yes) = 1
P(Marital Status=Single|No) = 2/7
P(Marital Status=Divorced|No) = 1/7
P(Marital Status=Married|No) = 4/7
P(Marital Status=Single|Yes) = 2/7
P(Marital Status=Divorced|Yes) = 1/7
P(Marital Status=Married|Yes) = 0

For taxable income:
If class=No: sample mean = 110, sample variance = 2975
If class=Yes: sample mean = 90, sample variance = 25

P(X|Class=No) = P(Refund=No|Class=No) × P(Married|Class=No) × P(Income=120K|Class=No)
= 4/7 × 4/7 × 0.0072 = 0.0024
P(X|Class=Yes) = P(Refund=No|Class=Yes) × P(Married|Class=Yes) × P(Income=120K|Class=Yes)
= 1 × 0 × 1.2 × 10^-9 = 0

Since P(X|No)P(No) > P(X|Yes)P(Yes)
Therefore P(No|X) > P(Yes|X)
=> Class = No
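A Python sketch that reproduces this decision, using the conditional probabilities and income parameters listed above (the dictionary layout is an illustrative assumption):

from math import exp, pi, sqrt

def normal_pdf(x, mean, variance):
    return (1.0 / sqrt(2 * pi * variance)) * exp(-((x - mean) ** 2) / (2 * variance))

priors      = {"No": 7 / 10, "Yes": 3 / 10}
p_refund_no = {"No": 4 / 7,  "Yes": 1.0}
p_married   = {"No": 4 / 7,  "Yes": 0.0}
income_params = {"No": (110, 2975), "Yes": (90, 25)}   # (sample mean, sample variance)

scores = {}
for c in ("No", "Yes"):
    mean, var = income_params[c]
    likelihood = p_refund_no[c] * p_married[c] * normal_pdf(120, mean, var)
    scores[c] = likelihood * priors[c]        # P(X | C) * P(C)

print(scores)                         # "No" has the larger score
print(max(scores, key=scores.get))    # => Class = No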
Naïve Bayes Classifier

If one of the conditional probabilities is zero, then the entire expression becomes zero

Probability estimation:
– Original: P(Ai | C) = Nic / Nc
– Laplace: P(Ai | C) = (Nic + 1) / (Nc + c)
– m-estimate: P(Ai | C) = (Nic + m·p) / (Nc + m)

where
Nic: number of examples that have attribute value Ai and class C
Nc: number of examples that are from class C
c: number of classes
p: prior probability
m: parameter
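A brief Python sketch of the three estimators as written above, applied to P(Refund=Yes | Yes), which has Nic = 0 and Nc = 3 in the earlier table; the values of m and p are chosen only for illustration:

def original(nic, nc):          return nic / nc
def laplace(nic, nc, c):        return (nic + 1) / (nc + c)
def m_estimate(nic, nc, m, p):  return (nic + m * p) / (nc + m)

c, m, p = 2, 2, 1 / 2            # c classes; m and p are illustrative choices
print(original(0, 3))            # 0.0 -> would make the whole product zero
print(laplace(0, 3, c))          # (0 + 1) / (3 + 2) = 0.2
print(m_estimate(0, 3, m, p))    # (0 + 2*0.5) / (3 + 2) = 0.2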


Example of Naïve Bayes Classifier

Name Give Birth Can Fly Live in Water Have Legs Class
human yes no no yes mammals
python no no no no non-mammals
salmon no no yes no non-mammals
whale yes no yes no mammals
frog no no sometimes yes non-mammals
komodo no no no yes non-mammals
bat yes yes no yes mammals
pigeon no yes no yes non-mammals
cat yes no no yes mammals
leopard shark yes no yes no non-mammals
turtle no no sometimes yes non-mammals
penguin no no sometimes yes non-mammals
porcupine yes no no yes mammals
eel no no yes no non-mammals
salamander no no sometimes yes non-mammals
gila monster no no no yes non-mammals
platypus no no no yes mammals
owl no yes no yes non-mammals
dolphin yes no yes no mammals
eagle no yes no yes non-mammals

A: attributes, M: mammals, N: non-mammals

Test record:
Give Birth Can Fly Live in Water Have Legs Class
yes no yes no ?

P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042
P(A | M) P(M) = 0.06 × 7/20 = 0.021
P(A | N) P(N) = 0.0042 × 13/20 = 0.0027

P(A | M) P(M) > P(A | N) P(N) => Mammals
Naïve Bayes (Summary)

Robust to isolated noise points

Handles missing values by ignoring the instance during probability estimate calculations

Robust to irrelevant attributes

Independence assumption may not hold for some attributes
– Use other techniques such as Bayesian Belief Networks (BBN)
