Linear Discriminant Functions: Minimum Squared Error Procedures: Ho-Kashyap Procedures
CSE555: Srihari
Non-separable Behavior
• Perceptron is an error-correcting procedure
  – Modifies the weight vector only when an error is encountered
  – Its success is due to the relentless search for an error-free
    solution
• Even if a separating vector is found, it does not follow that it
  will perform well on independent test data
  – Any set of fewer than 2d samples is likely to be
    linearly separable
Non-separable Case
• When design sets are large, the points are most likely not
  linearly separable
• How do error-correction procedures behave when the points are
  not linearly separable?
  – Corrections never cease in an error-correction procedure
  – The result is an infinite sequence of weight vectors
• The weight vectors are bounded
  – Empirical rule: stop when the weight vector fluctuates near a
    terminal value
  – Averaging the weight vectors can reduce the risk of obtaining
    a bad solution (see the sketch below)
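
One way to use the bounded, fluctuating weight vectors is to average
them over the run. The following numpy sketch is illustrative only
(the function name and its parameters are our own choices): it runs
the fixed-increment perceptron for a fixed number of epochs on
normalized augmented samples and returns both the final and the
averaged weight vector.

    import numpy as np

    def averaged_perceptron(Y, epochs=100, eta=1.0):
        """Fixed-increment perceptron on normalized augmented samples Y
        (a separating vector a satisfies a^t y > 0 for every row y).
        Returns the final weight vector and the average over all steps."""
        n, d = Y.shape
        a = np.zeros(d)
        a_sum = np.zeros(d)
        steps = 0
        for _ in range(epochs):
            for y in Y:
                if a @ y <= 0:          # misclassified: correct the weights
                    a = a + eta * y
                a_sum += a              # accumulate for averaging
                steps += 1
        return a, a_sum / steps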
Minimum Squared Error (MSE) Procedures
• The criterion functions considered so far involve only the
  misclassified samples
• Now consider a criterion that involves all of the samples, not
  just the misclassified ones
• Previously we were interested in making all of the inner products
  a^t y_i positive
• Now we try to make a^t y_i = b_i, where the b_i are arbitrarily
  specified positive constants
• Thus we replace the problem of solving a set of linear
  inequalities with the more stringent, but better understood,
  problem of finding a solution to a set of linear equations
Minimum Squared Error (MSE) Formulation

Write the n equations a^t y_i = b_i in matrix form as Ya = b, where
Y is the n-by-(d+1) data matrix whose i-th row is y_i^t (the
dimensionality is d+1 because of the 0 subscript introduced for the
augmented component), a is the weight vector to be determined, and
b is the vector of margins.

Since Y has more rows than columns, an exact solution usually does
not exist. We therefore define the error vector e = Ya − b and the
sum-of-squared-error criterion function

    Js(a) = ||Ya − b||^2 = Σ_i (a^t y_i − b_i)^2

which we wish to minimize. Js can be minimized either by a gradient
descent procedure or by a closed-form solution based on the gradient

    ∇Js = Σ_i 2 (a^t y_i − b_i) y_i = 2 Y^t (Ya − b)

Setting ∇Js = 0 gives the normal equations Y^t Y a = Y^t b.
MSE Solution based on Pseudoinverse

If Y^t Y is nonsingular, the normal equations can be solved uniquely:

    a = (Y^t Y)^{-1} Y^t b = Y† b

where Y† = (Y^t Y)^{-1} Y^t is the pseudoinverse of Y. If Y^t Y is
singular, a pseudoinverse can still be defined, and a = Y† b remains
an MSE solution to Ya = b.
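
A small numpy sketch of this closed form (the function and argument
names are our own, chosen for illustration): it solves the normal
equations when Y^t Y is nonsingular and falls back to the
pseudoinverse otherwise.

    import numpy as np

    def mse_solution(Y, b):
        """Return the MSE weight vector a minimizing ||Ya - b||^2."""
        YtY = Y.T @ Y
        if np.linalg.matrix_rank(YtY) == YtY.shape[0]:
            # a = (Y^t Y)^{-1} Y^t b via the normal equations
            return np.linalg.solve(YtY, Y.T @ b)
        # singular Y^t Y: use the Moore-Penrose pseudoinverse
        return np.linalg.pinv(Y) @ b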
Example of Linear Classifier by Pseudoinverse

Samples: ω1: (1, 2)^t and (2, 0)^t;  ω2: (3, 1)^t and (2, 3)^t.
After augmenting each sample with a leading 1 and negating the ω2
samples, the data matrix is

    Y = [  1   1   2 ]
        [  1   2   0 ]
        [ −1  −3  −1 ]
        [ −1  −2  −3 ]

Its pseudoinverse is

    Y† = (Y^t Y)^{-1} Y^t = [  5/4   13/12   3/4   7/12 ]
                            [ −1/2   −1/6   −1/2   −1/6 ]
                            [   0    −1/3     0    −1/3 ]

Assuming b = (1, 1, 1, 1)^t, our solution is

    a = Y† b = (11/3, −4/3, −2/3)^t

[Figure: the four sample points and the resulting decision boundary
in the (x1, x2) plane]
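
The numbers above can be checked numerically, for instance with the
following snippet (just an illustration):

    import numpy as np

    Y = np.array([[ 1,  1,  2],
                  [ 1,  2,  0],
                  [-1, -3, -1],
                  [-1, -2, -3]], dtype=float)
    b = np.ones(4)

    a = np.linalg.pinv(Y) @ b
    print(a)      # ~ [ 3.667 -1.333 -0.667], i.e. (11/3, -4/3, -2/3)
    print(Y @ a)  # all four margins come out to 1 here, so a separates the classes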
Properties of MSE Procedure
1. Related to Fisher’s Linear Discriminant
2. Asymptotic approximation to Bayes
discriminant function
3. Can be formulated as a gradient descent
procedure
1. MSE Relationship to Fisher’s
Linear Discriminant
• Show that with a proper choice of the vector b, the MSE
  discriminant function a^t y is directly related to Fisher's
  linear discriminant
• Assume the first n1 samples are labelled ω1 and the remaining
  n2 samples are labelled ω2
    Y = [  1_1    X_1 ]
        [ −1_2   −X_2 ]

where 1_i is a column vector of n_i ones and X_i is the n_i-by-d
matrix whose rows are the samples labelled ω_i. The within-class
scatter matrix is

    Sw = Σ_i Σ_{x ∈ D_i} (x − m_i)(x − m_i)^t
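
The particular choice of b that makes this connection exact, and the
form of the resulting MSE solution, is the standard result, quoted
here without its derivation:

    b = \begin{pmatrix} \tfrac{n}{n_1}\,\mathbf{1}_1 \\ \tfrac{n}{n_2}\,\mathbf{1}_2 \end{pmatrix},
    \qquad
    a = \begin{pmatrix} w_0 \\ \mathbf{w} \end{pmatrix}
    \;\Longrightarrow\;
    \mathbf{w} \propto S_w^{-1}(\mathbf{m}_1 - \mathbf{m}_2),
    \qquad
    w_0 = -\mathbf{m}^t \mathbf{w}

where m is the mean of all n samples; the direction of w is exactly
Fisher's linear discriminant direction.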
2. MSE Approximation to Bayes

• If b = 1_n, the MSE solution approaches a minimum mean-squared
  approximation to the Bayes discriminant function
  g0(x) = P(ω1|x) − P(ω2|x) as the number of samples grows
• However, the MSE criterion does not necessarily minimize the
  probability of error: it gives the best approximation in the
  region where the data points lie

[Figure: the class posteriors, the Bayes discriminant function, and
the MSE solution, which best approximates the Bayes discriminant in
the region of the data points]
3. MSE Solution using Gradient Descent:
Widrow-Hoff or Least Mean Squared (LMS) Rule

Since ∇Js = 2 Y^t (Ya − b),
the obvious update rule is

    a(1) arbitrary
    a(k+1) = a(k) + η(k) Y^t (b − Y a(k))

The storage requirement can be reduced by considering the samples
sequentially, giving the Widrow-Hoff or LMS rule:

    a(1) arbitrary
    a(k+1) = a(k) + η(k) (b_k − a^t(k) y^k) y^k
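
An illustrative numpy sketch of the sequential rule (the function
name, the starting point and the decreasing learning-rate schedule
η(k) = η(1)/k are our own choices):

    import numpy as np

    def lms(Y, b, eta1=0.1, epochs=50):
        """Sequential Widrow-Hoff / LMS descent on Js(a) = ||Ya - b||^2."""
        n, d = Y.shape
        a = np.zeros(d)                          # a(1) arbitrary
        k = 0
        for _ in range(epochs):
            for y, bk in zip(Y, b):
                k += 1
                eta = eta1 / k                   # eta(k) = eta(1)/k
                a = a + eta * (bk - a @ y) * y   # move a toward a^t y = bk
        return a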
Widrow-Hoff or LMS Property
• Relaxation procedures do not converge when the samples are not
  linearly separable
• The Widrow-Hoff (LMS) procedure does converge
• However, its solution need not be a separating vector even if
  one exists
Ho-Kashyap Procedures
• Procedures considered so far:
  – Perceptron and relaxation procedures find separating vectors
    if the samples are linearly separable, but do not converge on
    non-separable problems
  – MSE procedures yield a weight vector whether the samples are
    linearly separable or not, but there is no guarantee that this
    vector is a separating vector
• Criterion function is Js(a, b) = ||Ya − b||^2
  – subject to the constraint b > 0
  – How can we obtain both a separating vector a and a margin
    vector b?
Ho-Kashyap Procedures

• Idea: since the margin vector is not fixed in advance, minimize
  Js(a, b) = ||Ya − b||^2 with respect to both a and b, subject to
  b > 0
• The gradients are

    ∇a Js = 2 Y^t (Ya − b)        ∇b Js = −2 (Ya − b)

• For any b, setting ∇a Js = 0 gives a = Y† b
• We cannot merely set ∇b Js = 0, since we must respect the
  constraint b > 0 and avoid the degenerate solution a = 0, b = 0;
  instead b is modified only in ways that keep its components
  positive
Ho-Kashyap Rule

    Error vector:                   e(k) = Y a(k) − b(k)
    Positive part of error vector:  e+(k) = ½ (e(k) + |e(k)|)

    b(1) > 0, otherwise arbitrary
    b(k+1) = b(k) + 2 η(k) e+(k)
    a(k) = Y† b(k)
Ho-Kashyap Algorithm
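
A minimal numpy sketch of the iteration above (the learning rate,
the tolerance b_min and the iteration cap are our own illustrative
choices):

    import numpy as np

    def ho_kashyap(Y, eta=0.5, b_min=1e-3, k_max=1000):
        """Jointly seek a weight vector a and margin vector b > 0
        minimizing ||Ya - b||^2.
        Returns (a, b, status): True if a separating vector was found,
        False if non-separability was proved, None if undecided."""
        n, d = Y.shape
        Y_pinv = np.linalg.pinv(Y)
        b = np.ones(n)                        # b(1) > 0, otherwise arbitrary
        a = Y_pinv @ b
        for _ in range(k_max):
            e = Y @ a - b                     # error vector
            if np.all(np.abs(e) <= b_min):
                return a, b, True             # Ya ~ b > 0: separating vector found
            if np.all(e <= 0) and np.any(e < 0):
                return a, b, False            # e <= 0, e != 0: non-separability proved
            e_plus = 0.5 * (e + np.abs(e))    # positive part of the error vector
            b = b + 2 * eta * e_plus
            a = Y_pinv @ b
        return a, b, None                     # no decision within k_max iterations

Run on the four-sample Y from the pseudoinverse example, which is
linearly separable, this sketch should terminate with status True.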
Ho-Kashyap Conclusion

• If the samples are linearly separable, the Ho-Kashyap procedure
  yields a separating vector in a finite number of steps
  (for 0 < η < 1)
• If the samples are not separable, the error vector converges to a
  nonzero vector e ≤ 0, providing evidence of non-separability, but
  there is no bound on how many steps are needed to obtain it
Linear Programming Algorithms
• Perceptron, relaxation and Ho-Kashyap are gradient descent
  procedures for solving sets of simultaneous linear inequalities
• The task can also be approached with linear programming (LP)
• Minimize an objective function z = α^t u subject to the
  constraints Au ≥ β
• The simplex algorithm finds the extremum of z

[Figure: surfaces of constant z are shown in gray, while the
constraints are shown in red; the simplex algorithm finds the
extremum of z]
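
As an illustration only (the toy problem and the solver choice are
ours, not the lecture's), scipy's linprog solves problems in this
form; since it expects upper-bound constraints, Au ≥ β is passed as
−Au ≤ −β:

    import numpy as np
    from scipy.optimize import linprog

    # Toy problem: minimize z = alpha^t u  subject to  A u >= beta, u >= 0
    alpha = np.array([1.0, 2.0])
    A = np.array([[1.0, 1.0],
                  [2.0, 1.0]])
    beta = np.array([1.0, 1.5])

    res = linprog(c=alpha,
                  A_ub=-A, b_ub=-beta,        # A u >= beta  <=>  -A u <= -beta
                  bounds=[(0, None)] * 2,     # nonnegative variables, standard LP form
                  method="highs")
    print(res.x, res.fun)                     # optimal u and minimum z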