

Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 2: Bayesian Decision Theory
(Sections 1-6)

1. Introduction – Bayesian Decision Theory
   • Pure statistics, probabilities known, optimal decision
2. Bayesian Decision Theory – Continuous Features
3. Minimum-Error-Rate Classification (omit)
4. Classifiers, Discriminant Functions, Decision Surfaces
5. The Normal Density
6. Discriminant Functions for the Normal Density

Bayesian Decision Theory
• Design classifiers to make decisions that minimize an expected "risk".
  – The simplest risk is the classification error.
  – When misclassification errors are not equally important, the risk can include the cost associated with different misclassification errors.
Terminology
• State of nature ω (class label):
– e.g., ω1 for sea bass, ω2 for salmon

• Probabilities P(ω1) and P(ω2) (priors):


– e.g., prior knowledge of how likely it is to get a sea bass or a salmon

• Probability density function p(x) (evidence):


– e.g., how frequently we will measure a pattern with
feature value x (e.g., x corresponds to lightness)
Terminology (cont’d)
• Conditional probability density p(x | ωj) (likelihood):
– e.g., how frequently we will measure a pattern with
feature value x given that the pattern belongs to class ωj

– e.g., the lightness distributions of the salmon and sea-bass populations
Terminology (cont’d)

• Conditional probability P(ωj | x) (posterior):


– e.g., the probability that the fish belongs to class
ωj given feature x.
Decision Rule Using Conditional Probabilities

• Using Bayes' rule:

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$

where $p(x) = \sum_{j=1}^{2} p(x \mid \omega_j)\,P(\omega_j)$ (i.e., a scale factor so that the posteriors sum to 1)

• Decide ω1 if P(ω1 | x) > P(ω2 | x); otherwise decide ω2
  or
• Decide ω1 if p(x | ω1)P(ω1) > p(x | ω2)P(ω2); otherwise decide ω2
  or
• Decide ω1 if p(x | ω1)/p(x | ω2) > P(ω2)/P(ω1); otherwise decide ω2
  (likelihood ratio vs. threshold)
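A minimal sketch of this decision rule in Python (not from the slides), assuming univariate Gaussian class-conditional densities for the lightness feature; the means, standard deviations, and priors are made-up illustrative values.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density used here as a class-conditional model p(x | w)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Hypothetical models: p(x | w1) ~ N(6, 1) for sea bass, p(x | w2) ~ N(4, 1) for salmon.
priors = {"w1": 0.6, "w2": 0.4}
likelihood = {"w1": lambda x: gaussian_pdf(x, 6.0, 1.0),
              "w2": lambda x: gaussian_pdf(x, 4.0, 1.0)}

def posteriors(x):
    """P(wj | x) = p(x | wj) P(wj) / p(x)  (Bayes' rule)."""
    joint = {w: likelihood[w](x) * priors[w] for w in priors}  # likelihood * prior
    evidence = sum(joint.values())                             # p(x), the scale factor
    return {w: joint[w] / evidence for w in joint}

def decide(x):
    """Decide w1 if P(w1 | x) > P(w2 | x); otherwise decide w2."""
    post = posteriors(x)
    return "w1" if post["w1"] > post["w2"] else "w2"

print(posteriors(5.2), decide(5.2))  # posteriors and decision for one lightness value
```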

1. Introduction
• The sea bass/salmon example
• State of nature, prior
• State of nature is a random variable
• The catch of salmon and sea bass is equiprobable:
  • P(ω1) = P(ω2) (uniform priors)
  • P(ω1) + P(ω2) = 1 (exclusivity and exhaustivity)


• Decision rule with only the prior information:

  Decide ω1 if P(ω1) > P(ω2); otherwise decide ω2

• Use of the class-conditional information:

  p(x | ω1) and p(x | ω2) describe the difference in lightness between the sea bass and salmon populations


• Posterior, likelihood, evidence

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)} \qquad \text{(Bayes' rule)}$$

• where, in the case of two categories,

$$p(x) = \sum_{j=1}^{2} p(x \mid \omega_j)\,P(\omega_j)$$

• Posterior = (Likelihood × Prior) / Evidence



• Decision given the posterior probabilities

  x is an observation for which:

  if P(ω1 | x) > P(ω2 | x): true state of nature = ω1
  if P(ω1 | x) < P(ω2 | x): true state of nature = ω2

2. Bayesian Decision Theory –
Continuous Features

• Generalization of the preceding ideas


1. Use of more than one feature (feature vector x)
2. Use of more than two states of nature (c classes)
3. Allowing actions other than deciding the state of nature (skip)
4. Introducing a loss function that is more general than the probability of error (skip)

4. Classifiers, Discriminant Functions,
and Decision Surfaces

• The multi-category case

• Set of discriminant functions gi(x), i = 1, …, c

• The classifier assigns a feature vector x to class ωi if:

  gi(x) > gj(x) for all j ≠ i

• Discriminant function

  gi(x) = P(ωi | x)
  (maximum discriminant corresponds to maximum posterior!)

  gi(x) = p(x | ωi) P(ωi)

  gi(x) = ln p(x | ωi) + ln P(ωi)
  (ln: natural logarithm!)
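A small sketch of the multi-category use of the third form above, gi(x) = ln p(x | ωi) + ln P(ωi), with hypothetical one-dimensional Gaussian class-conditional parameters for c = 3 classes:

```python
import numpy as np

# Hypothetical 1-D Gaussian class-conditional parameters for c = 3 classes.
means  = np.array([2.0, 5.0, 8.0])
sigmas = np.array([1.0, 1.0, 1.5])
priors = np.array([0.3, 0.5, 0.2])

def g(x):
    """g_i(x) = ln p(x | w_i) + ln P(w_i), with p(x | w_i) a univariate Gaussian."""
    log_likelihood = -0.5 * ((x - means) / sigmas) ** 2 - np.log(np.sqrt(2 * np.pi) * sigmas)
    return log_likelihood + np.log(priors)

def classify(x):
    """Assign x to the class w_i whose discriminant g_i(x) is largest."""
    return int(np.argmax(g(x))) + 1  # 1-based class index

print(classify(4.2))  # -> 2: the maximum discriminant (maximum posterior) is for w2
```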


• The feature space is divided into c decision regions:

  if gi(x) > gj(x) for all j ≠ i, then x is in Ri
  (Ri means: assign x to ωi)

• The two-category case

• A classifier is a "dichotomizer" that has two discriminant functions g1 and g2

  Let g(x) ≡ g1(x) – g2(x)

  Decide ω1 if g(x) > 0; otherwise decide ω2


• The computation of g(x):

$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$

or, equivalently,

$$g(x) = \ln\frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)}$$


5. The Normal Density


• Univariate density

• An analytically tractable, continuous density
• Many processes are asymptotically Gaussian
• Handwritten characters and speech sounds can be modeled as ideal or prototype patterns corrupted by a random process (central limit theorem)

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right]$$

where:
  μ = mean (or expected value) of x
  σ² = variance (the expected squared deviation of x from its mean)
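A direct numerical transcription of this density (the function name and test values are illustrative, not from the book):

```python
import numpy as np

def univariate_normal_pdf(x, mu, sigma):
    """p(x) = exp(-0.5 * ((x - mu) / sigma)**2) / (sqrt(2*pi) * sigma)"""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Sanity check: the peak value at x = mu is 1 / (sqrt(2*pi) * sigma) ≈ 0.3989 for sigma = 1.
print(univariate_normal_pdf(0.0, mu=0.0, sigma=1.0))
```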


• Multivariate normal density: p(x) ~ N(μ, Σ)

• The multivariate normal density in d dimensions is:

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\,\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{t}\,\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right]$$

where:
  x = (x1, x2, …, xd)^t (t stands for the transpose)
  μ = (μ1, μ2, …, μd)^t is the mean vector
  Σ is the d×d covariance matrix
  |Σ| and Σ⁻¹ are its determinant and inverse, respectively
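A short sketch of the d-dimensional density above using NumPy; the 2-D mean vector and covariance matrix are made-up illustrative values.

```python
import numpy as np

def multivariate_normal_pdf(x, mu, cov):
    """p(x) = exp(-0.5 (x-mu)^t Sigma^{-1} (x-mu)) / ((2*pi)^{d/2} |Sigma|^{1/2})"""
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / (np.power(2 * np.pi, d / 2) * np.sqrt(np.linalg.det(cov)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

mu = np.array([1.0, 2.0])            # hypothetical mean vector
cov = np.array([[2.0, 0.3],
                [0.3, 1.0]])         # hypothetical (symmetric, positive definite) covariance
print(multivariate_normal_pdf(np.array([1.5, 1.5]), mu, cov))
```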



Properties of the Normal Density
Covariance Matrix Σ

• Always symmetric
• Always positive semi-definite; for our purposes Σ is positive definite:
  – its determinant is positive ⇒ Σ is invertible
• Eigenvalues are real and positive, and the principal axes of the hyperellipsoidal loci of points of constant density are the eigenvectors of Σ
6. Discriminant Functions for the Normal Density

• We saw that minimum error-rate classification can be achieved by the discriminant function

  gi(x) = ln p(x | ωi) + ln P(ωi)

• Case of a multivariate normal, p(x | ωi) ~ N(μi, Σi):

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{t}\,\Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$


1. Case Σi = σ²I (I stands for the identity matrix)

$$g_i(\mathbf{x}) = \mathbf{w}_i^{t}\mathbf{x} + w_{i0} \qquad \text{(linear discriminant function)}$$

where:

$$\mathbf{w}_i = \frac{\boldsymbol{\mu}_i}{\sigma^{2}}; \qquad w_{i0} = -\frac{1}{2\sigma^{2}}\,\boldsymbol{\mu}_i^{t}\boldsymbol{\mu}_i + \ln P(\omega_i)$$

(w_{i0} is called the threshold for the ith category!)
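An illustrative sketch of Case 1: evaluating the linear discriminant above for each class and picking the maximum; the means, shared variance, and priors are hypothetical.

```python
import numpy as np

def case1_discriminant(x, mu_i, sigma_sq, prior_i):
    """g_i(x) = w_i^t x + w_i0, with w_i = mu_i / sigma^2 and
    w_i0 = -mu_i^t mu_i / (2 sigma^2) + ln P(w_i)."""
    w_i = mu_i / sigma_sq
    w_i0 = -(mu_i @ mu_i) / (2 * sigma_sq) + np.log(prior_i)
    return w_i @ x + w_i0

# Hypothetical two-class, 2-D problem with Sigma_i = sigma^2 I.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
priors = [0.5, 0.5]
sigma_sq = 1.0

x = np.array([1.0, 1.2])
scores = [case1_discriminant(x, m, sigma_sq, p) for m, p in zip(means, priors)]
print(np.argmax(scores) + 1)  # with equal priors this is just the nearest-mean rule
```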


• A classifier that uses linear discriminant functions is called a "linear machine"

• The decision surfaces for a linear machine are pieces of hyperplanes defined by:

  gi(x) = gj(x)

• If the priors are equal for all classes, this reduces to the minimum-distance classifier, where an unknown pattern is assigned to the class of the nearest mean


• The hyperplane separating Ri and Rj passes through the point

$$\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\sigma^{2}}{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^{2}}\,\ln\frac{P(\omega_i)}{P(\omega_j)}\,(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$

and is always orthogonal to the line linking the means!

$$\text{if } P(\omega_i) = P(\omega_j), \text{ then } \mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j)$$
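A quick numerical sketch of the point x0 above for two hypothetical means and unequal priors, showing how the boundary shifts away from the midpoint:

```python
import numpy as np

mu_i, mu_j = np.array([0.0, 0.0]), np.array([4.0, 0.0])  # hypothetical class means
sigma_sq = 1.0                                           # shared variance (Sigma_i = sigma^2 I)
P_i, P_j = 0.7, 0.3                                      # unequal priors

diff = mu_i - mu_j
x0 = 0.5 * (mu_i + mu_j) - (sigma_sq / (diff @ diff)) * np.log(P_i / P_j) * diff
print(x0)  # midpoint is [2, 0]; x0 is shifted toward the mean of the less probable class
```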


2. Case Σi = Σ (the covariance matrices of all classes are identical but otherwise arbitrary!)

• The hyperplane separating Ri and Rj passes through the point

$$\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^{t}\,\Sigma^{-1}(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)}\,(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$

  (the hyperplane separating Ri and Rj is generally not orthogonal to the line between the means!)

• If the priors are equal, the rule reduces to assigning x to the class whose mean has the smallest squared Mahalanobis distance to x
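A small sketch of that equal-prior situation, assigning a pattern to the class whose mean is nearest in squared Mahalanobis distance; the shared covariance matrix and means are hypothetical.

```python
import numpy as np

def mahalanobis_sq(x, mu, cov_inv):
    """Squared Mahalanobis distance (x - mu)^t Sigma^{-1} (x - mu)."""
    diff = x - mu
    return diff @ cov_inv @ diff

cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])                 # hypothetical shared covariance Sigma
cov_inv = np.linalg.inv(cov)
means = [np.array([0.0, 0.0]), np.array([3.0, 2.0])]

# With equal priors, decide the class whose mean is closest in Mahalanobis distance.
x = np.array([1.0, 0.5])
dists = [mahalanobis_sq(x, m, cov_inv) for m in means]
print(np.argmin(dists) + 1)
```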

3. Case Σi = arbitrary
• The covariance matrices are different for each category

$$g_i(\mathbf{x}) = \mathbf{x}^{t}W_i\mathbf{x} + \mathbf{w}_i^{t}\mathbf{x} + w_{i0}$$

where:

$$W_i = -\frac{1}{2}\Sigma_i^{-1}; \qquad \mathbf{w}_i = \Sigma_i^{-1}\boldsymbol{\mu}_i; \qquad w_{i0} = -\frac{1}{2}\boldsymbol{\mu}_i^{t}\Sigma_i^{-1}\boldsymbol{\mu}_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

(The decision surfaces are hyperquadrics: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, hyperhyperboloids)
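An illustrative sketch of Case 3: evaluating the quadratic discriminant above for two hypothetical classes with different covariance matrices (the resulting decision boundary is a hyperquadric).

```python
import numpy as np

def case3_discriminant(x, mu_i, cov_i, prior_i):
    """g_i(x) = x^t W_i x + w_i^t x + w_i0 for an arbitrary Sigma_i."""
    inv = np.linalg.inv(cov_i)
    W_i = -0.5 * inv
    w_i = inv @ mu_i
    w_i0 = (-0.5 * mu_i @ inv @ mu_i
            - 0.5 * np.log(np.linalg.det(cov_i))
            + np.log(prior_i))
    return x @ W_i @ x + w_i @ x + w_i0

# Hypothetical two-class problem: (mean, covariance, prior) per class.
params = [
    (np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), 0.5),
    (np.array([2.0, 2.0]), np.array([[3.0, 0.5], [0.5, 2.0]]), 0.5),
]

x = np.array([1.0, 1.0])
scores = [case3_discriminant(x, m, c, p) for m, c, p in params]
print(np.argmax(scores) + 1)
```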