Linear Discriminant Functions

Minimum Squared Error Procedures:


Pseudoinverse and Widrow-Hoff
Ho-Kashyap Procedures

CSE555: Srihari
Non-separable Behavior
• The Perceptron is an error-correcting procedure
– It modifies the weight vector only when an error is encountered
– Its success is due to a relentless search for an error-free solution
• Even if a separating vector is found, it does not follow that it will perform well
on independent test data
– Any set of fewer than 2d samples is likely to be linearly separable

Non-separable Case
• When design sets are large, the points are most likely not linearly separable
• How do error-correction procedures behave when the points are not linearly
separable?
– Corrections never cease in an error-correction procedure
– The result is an infinite sequence of weight vectors
• The weight vectors are bounded
– An empirical rule: stop when the weight vector fluctuates near a terminal value
– Averaging the weight vectors can reduce the risk of obtaining a bad solution

Minimum Squared Error (MSE) Procedures
• The criterion functions considered so far involve only the misclassified samples
• Now consider a criterion that involves all of the samples, not just the
misclassified ones
• Previously we were interested in making all of the inner products a^t y_i positive
• Now try to make a^t y_i = b_i, where the b_i are some arbitrarily specified
positive constants
• Thus we replace the problem of solving a set of linear inequalities with the
more stringent but better understood problem of finding a solution to a set of
linear equations

Minimum Squared Error (MSE)
Formulation

Stack the n augmented samples as the rows of a data matrix Y. Its dimensionality
is n x (d+1); the extra column comes from introducing the y_0 = 1 component
(the 0 subscript). With weight vector a, we seek a solution to Ya = b, where b is
the vector of the target values b_i. The sum-of-squared-error criterion is

  J_s(a) = ||Ya - b||^2 = Σ_i (a^t y_i - b_i)^2

which we wish to minimize. It can be minimized either by a gradient descent
procedure or by a closed-form solution based on the gradient.
MSE Solution based on Pseudoinverse

Closed-form solution: the gradient ∇J_s(a) = 2Y^t(Ya - b) vanishes when
Y^tY a = Y^t b, so

  a = (Y^tY)^{-1} Y^t b = Y† b

where Y† = (Y^tY)^{-1} Y^t is the pseudoinverse of Y (assuming Y^tY is nonsingular).
Example of Linear Classifier by Pseudoinverse
Samples:
  ω1: (1,2)^t and (2,0)^t
  ω2: (3,1)^t and (2,3)^t

Augmented, normalized sample matrix (d + 1 = 2 + 1 = 3, n = 4), with the ω2 rows negated:

  Y = [  1   1   2 ]
      [  1   2   0 ]
      [ -1  -3  -1 ]
      [ -1  -2  -3 ]

The decision boundary in the original feature space is a^t (1, x1, x2)^t = 0.

Pseudoinverse:

  Y† = (Y^tY)^{-1} Y^t = [  5/4  13/12   3/4   7/12 ]
                         [ -1/2   -1/6  -1/2   -1/6 ]
                         [    0   -1/3     0   -1/3 ]

Assuming b = (1, 1, 1, 1)^t, our solution is

  a = Y† b = (11/3, -4/3, -2/3)^t
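As a check on this example, here is a minimal NumPy sketch (not part of the original slides) that recomputes the pseudoinverse solution:

```python
import numpy as np

# Augmented, normalized sample matrix (omega_2 rows negated)
Y = np.array([[ 1.,  1.,  2.],
              [ 1.,  2.,  0.],
              [-1., -3., -1.],
              [-1., -2., -3.]])
b = np.ones(4)                    # arbitrary positive margins b = (1,1,1,1)^t

a = np.linalg.pinv(Y) @ b         # MSE solution a = Y† b
print(a)                          # [ 3.6667 -1.3333 -0.6667 ] = (11/3, -4/3, -2/3)^t
print(Y @ a)                      # here Ya = b exactly, so every a^t y_i > 0
```

For this particular data set the system Ya = b happens to be consistent, so the MSE solution is also a separating vector.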
Properties of MSE Procedure
1. Related to Fisher’s Linear Discriminant
2. Asymptotic approximation to Bayes
discriminant function
3. Can be formulated as a gradient descent
procedure

1. MSE Relationship to Fisher’s
Linear Discriminant
• Show that with a proper choice of the vector b, the MSE discriminant function
a^t y is directly related to Fisher's linear discriminant
• Assume the first n1 samples are labelled ω1 and the remaining n2 samples are
labelled ω2
  Y = [  1_1   X_1 ]
      [ -1_2  -X_2 ]

  where 1_i is a column vector of n_i ones, X_i is a matrix whose rows are the
  samples labelled ω_i, and the ω2 rows are negated (normalized)

Partition a and b correspondingly:

  a = [ w_0 ]   and   b = [ (n/n_1) 1_1 ]
      [  w  ]             [ (n/n_2) 1_2 ]

This special choice of b links the MSE solution to Fisher's Linear Discriminant.
MSE and Fisher’s Linear Discriminant
• Define the sample means m_i and the pooled sample scatter matrix S_w:

  m_i = (1/n_i) Σ_{x ∈ D_i} x,   i = 1, 2

  S_w = Σ_i Σ_{x ∈ D_i} (x - m_i)(x - m_i)^t

• Plugging these into the MSE formulation yields

  w = α n S_w^{-1} (m_1 - m_2)    where α is a scalar

• which is identical to the solution for Fisher's linear discriminant except for
a scale factor
• Decision rule: Decide ω1 if w^t(x - m) > 0; otherwise decide ω2
(m is the mean of all the samples)
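A minimal NumPy sketch (not from the slides; the Gaussian toy data are an illustrative assumption) that checks this proportionality numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal([0., 0.], 1.0, size=(30, 2))   # samples labelled omega_1
X2 = rng.normal([2., 1.], 1.0, size=(40, 2))   # samples labelled omega_2
n1, n2 = len(X1), len(X2)
n = n1 + n2

# MSE solution with the special choice of b
Y = np.vstack([np.hstack([np.ones((n1, 1)),  X1]),
               np.hstack([-np.ones((n2, 1)), -X2])])
b = np.concatenate([np.full(n1, n / n1), np.full(n2, n / n2)])
a = np.linalg.pinv(Y) @ b
w_mse = a[1:]                                  # discard the threshold weight w_0

# Fisher direction S_w^{-1} (m1 - m2)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(Sw, m1 - m2)

# The two directions agree up to a scale factor (cosine of the angle is +/-1)
cos = w_mse @ w_fisher / (np.linalg.norm(w_mse) * np.linalg.norm(w_fisher))
print(cos)
```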
MSE Relationship to Bayes
• If b = 1_n, the MSE solution approaches a minimum squared-error
approximation to the Bayes discriminant function
g_0(x) = P(ω1|x) - P(ω2|x)
in the limit as the number of samples approaches infinity

2. MSE Approximation to Bayes
• If b = 1_n, the MSE solution approaches a minimum mean-squared-error
approximation to the Bayes discriminant function
• However, MSE does not necessarily minimize the probability of error: the MSE
solution is only the best approximation in the region of the data points

[Figure: class-conditional densities, posteriors, the Bayes discriminant function,
and the MSE solution]
3. MSE Solution using Gradient Descent:

• The criterion function J_s(a) = ||Ya - b||^2 can be minimized by gradient descent
• Advantages over the pseudoinverse:
– Avoids the problem that arises when Y^tY is singular
– Avoids the need to work with large matrices
– The computation is a feedback scheme that copes with round-off or
truncation errors

Widrow-Hoff or Least Mean
Squared (LMS) Rule
Since ∇J_s = 2Y^t(Ya - b),
the obvious (batch) update rule is
  a(1) arbitrary
  a(k+1) = a(k) + η(k) Y^t(b - Ya(k))
The storage requirement is reduced by the rule in which the samples are
considered sequentially:
  a(1) arbitrary
  a(k+1) = a(k) + η(k) (b_k - a^t(k) y^k) y^k
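
A minimal NumPy sketch of the sequential LMS update (the decreasing learning rate η(k) = η(1)/k, the number of passes, and the stopping tolerance are illustrative assumptions, not from the slides):

```python
import numpy as np

def lms(Y, b, eta1=0.1, n_epochs=1000, tol=1e-3):
    """Widrow-Hoff / LMS: a(k+1) = a(k) + eta(k) * (b_k - a(k)^t y_k) * y_k."""
    n, d1 = Y.shape
    a = np.zeros(d1)                         # a(1) arbitrary; zeros here
    k = 1
    for _ in range(n_epochs):
        for i in range(n):                   # samples considered sequentially
            a += (eta1 / k) * (b[i] - a @ Y[i]) * Y[i]
            k += 1
        if np.linalg.norm(Y @ a - b) < tol:  # stop once ||Ya - b|| is small
            return a
    return a

# The four-sample example from the pseudoinverse slide
Y = np.array([[1., 1., 2.], [1., 2., 0.], [-1., -3., -1.], [-1., -2., -3.]])
print(lms(Y, np.ones(4)))                    # sequential LMS estimate
print(np.linalg.pinv(Y) @ np.ones(4))        # batch MSE solution, for comparison
```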

Widrow-Hoff or LMS Property

• Perceptron and relaxation procedures do not converge on non-separable problems
• Widrow-Hoff does converge
• However, its solution need not be a separating vector, even if one exists
Ho-Kashyap Procedures
• Procedures considered so far:
– Perceptron and relaxation procedures find separating vectors if the samples
are linearly separable, but do not converge on non-separable problems
– MSE procedures yield a weight vector whether the samples are linearly
separable or not, but there is no guarantee that this vector is a separating vector
• The criterion function is J_s(a, b) = ||Ya - b||^2
– subject to the constraint b > 0
– How can we obtain both a separating vector a and a margin vector b?

Ho-Kashyap Procedures

If the samples are linearly separable there exist an a and a b > 0 such that
Ya = b exactly, but the proper margin vector b is not known in advance.
The Ho-Kashyap procedure therefore minimizes J_s(a, b) = ||Ya - b||^2 with
respect to both a and b, subject to the constraint b > 0.

Ho-Kashyap

For any value of b, the minimizing a is the pseudoinverse solution a = Y† b.
For fixed a, J_s is reduced with respect to b by a modified gradient descent
that never decreases any component of b, so the constraint b > 0 is preserved.
Ho-Kashyap Rule

Define the error vector and its positive part:

  e(k) = Y a(k) - b(k)                (error vector)
  e+(k) = (1/2)(e(k) + |e(k)|)        (positive part of the error vector)

The update rule is then:

  b(1) > 0, otherwise arbitrary
  b(k+1) = b(k) + 2 η(k) e+(k)
  a(k) = Y† b(k)
Ho-Kashyap Algorithm

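The algorithm itself appeared as a figure in the original slides. Below is a minimal NumPy sketch of the standard Ho-Kashyap iteration; the stopping threshold b_min, the fixed learning rate η, and the iteration limit k_max are illustrative assumptions:

```python
import numpy as np

def ho_kashyap(Y, eta=0.5, b_min=1e-6, k_max=1000):
    """Ho-Kashyap: b(k+1) = b(k) + 2*eta*e+(k),  a(k) = pinv(Y) b(k)."""
    n, d1 = Y.shape
    b = np.ones(n)                          # b(1) > 0, otherwise arbitrary
    Y_pinv = np.linalg.pinv(Y)
    a = Y_pinv @ b
    for _ in range(k_max):
        e = Y @ a - b                       # error vector
        e_plus = 0.5 * (e + np.abs(e))      # positive part of the error vector
        if np.all(np.abs(e) <= b_min):      # e (close to) 0: separating vector found
            return a, b, True
        if np.all(e <= 0) and np.any(e < 0):  # e <= 0 but e != 0: nonseparability indicated
            return a, b, False
        b = b + 2 * eta * e_plus
        a = Y_pinv @ b
    return a, b, False                      # no decision within k_max steps

Y = np.array([[1., 1., 2.], [1., 2., 0.], [-1., -3., -1.], [-1., -2., -3.]])
a, b, separable = ho_kashyap(Y)
print(separable, a)
```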
Ho-Kashyap Conclusion

In the nonseparable case there will be an indication of nonseparability.

However, there is no bound on the number of steps needed to disclose
nonseparability.

Linear Programming Algorithms
• Perceptron, relaxation, and Ho-Kashyap are gradient descent procedures for
solving sets of simultaneous linear inequalities
• The task can also be approached with linear programming (LP)
• Minimize the objective function z = α^t u subject to the constraints Au ≥ β
• Simplex algorithm

[Figure: surfaces of constant z are shown in gray and the constraints in red;
the simplex algorithm finds the extremum of z]

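As a sketch of how the separation task can be posed in this LP form (one common formulation, not necessarily the one in the original slides, introduces an artificial variable τ and minimizes it subject to a^t y_i + τ ≥ b_i and τ ≥ 0; the margins b_i = 1 are an illustrative choice):

```python
import numpy as np
from scipy.optimize import linprog

# Augmented, normalized samples (omega_2 rows negated), as in the earlier example
Y = np.array([[1., 1., 2.], [1., 2., 0.], [-1., -3., -1.], [-1., -2., -3.]])
n, d1 = Y.shape
b = np.ones(n)                              # desired margins b_i > 0

# Variables u = (a, tau).  Minimize z = tau  subject to  Y a + tau >= b,  tau >= 0.
c = np.zeros(d1 + 1)
c[-1] = 1.0                                 # objective picks out tau
A_ub = -np.hstack([Y, np.ones((n, 1))])     # -(Y a + tau) <= -b
b_ub = -b
bounds = [(None, None)] * d1 + [(0, None)]  # a unconstrained, tau >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
a = res.x[:d1]
print(res.x[-1], a)    # tau == 0 iff a vector achieving the given margins exists
print(Y @ a)           # all components >= 1 when the samples are separable
```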
