Linear Discriminant Functions: Minimum Squared Error Procedures: Ho-Kashyap Procedures
CSE555: Srihari
Non-separable Behavior
• Perceptron is an error-correcting procedure
  – Modifies the weight vector only when an error is encountered
  – Its success is due to the relentless search for an error-free
    solution
• Even if a separating vector is found, it does not follow that it
  will perform well on independent test data
  – Any set of fewer than 2d samples is likely to be
    linearly separable
Non-separable Case
• When design sets are large, the points are most likely not
  linearly separable
• How do error-correction procedures behave when the points are
  not linearly separable?
  – Corrections never cease in an error-correction procedure
  – The result is an infinite sequence of weight vectors
• The weight vectors are bounded
  – Empirical rule: stop when the weight vector fluctuates near a
    terminal value
  – Averaging the weight vectors can reduce the risk of obtaining
    a bad solution (see the sketch below)
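
One way to use the bounded, fluctuating weight vectors is to average
them over the run. The following numpy sketch is illustrative only
(the function name and its parameters are our own choices): it runs
the fixed-increment perceptron for a fixed number of epochs on
normalized augmented samples and returns both the final and the
averaged weight vector.

    import numpy as np

    def averaged_perceptron(Y, epochs=100, eta=1.0):
        """Fixed-increment perceptron on normalized augmented samples Y
        (a separating vector a satisfies a^t y > 0 for every row y).
        Returns the final weight vector and the average over all steps."""
        n, d = Y.shape
        a = np.zeros(d)
        a_sum = np.zeros(d)
        steps = 0
        for _ in range(epochs):
            for y in Y:
                if a @ y <= 0:          # misclassified: correct the weights
                    a = a + eta * y
                a_sum += a              # accumulate for averaging
                steps += 1
        return a, a_sum / steps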
Minimum Squared Error (MSE) Procedures
• The criterion functions considered so far involve only the
  misclassified samples
• Now consider a criterion that involves all of the samples, not
  just the misclassified ones
• Previously we were interested in making all of the inner products
  a^t y_i positive
• Now we try to make a^t y_i = b_i, where the b_i are arbitrarily
  specified positive constants
• Thus we replace the problem of solving a set of linear
  inequalities with the more stringent, but better understood,
  problem of finding a solution to a set of linear equations
Minimum Squared Error (MSE) Formulation

Write the n equations a^t y_i = b_i in matrix form as Ya = b, where
Y is the n-by-(d+1) data matrix whose i-th row is y_i^t (the
dimensionality is d+1 because of the 0 subscript introduced for the
augmented component), a is the weight vector to be determined, and
b is the vector of margins.

Since Y has more rows than columns, an exact solution usually does
not exist. We therefore define the error vector e = Ya − b and the
sum-of-squared-error criterion function

    Js(a) = ||Ya − b||^2 = Σ_i (a^t y_i − b_i)^2

which we wish to minimize. Js can be minimized either by a gradient
descent procedure or by a closed-form solution based on the gradient

    ∇Js = Σ_i 2 (a^t y_i − b_i) y_i = 2 Y^t (Ya − b)

Setting ∇Js = 0 gives the normal equations Y^t Y a = Y^t b.
MSE Solution based on Pseudoinverse

If Y^t Y is nonsingular, the normal equations can be solved uniquely:

    a = (Y^t Y)^{-1} Y^t b = Y† b

where Y† = (Y^t Y)^{-1} Y^t is the pseudoinverse of Y. If Y^t Y is
singular, a pseudoinverse can still be defined, and a = Y† b remains
an MSE solution to Ya = b.
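
A small numpy sketch of this closed form (the function and argument
names are our own, chosen for illustration): it solves the normal
equations when Y^t Y is nonsingular and falls back to the
pseudoinverse otherwise.

    import numpy as np

    def mse_solution(Y, b):
        """Return the MSE weight vector a minimizing ||Ya - b||^2."""
        YtY = Y.T @ Y
        if np.linalg.matrix_rank(YtY) == YtY.shape[0]:
            # a = (Y^t Y)^{-1} Y^t b via the normal equations
            return np.linalg.solve(YtY, Y.T @ b)
        # singular Y^t Y: use the Moore-Penrose pseudoinverse
        return np.linalg.pinv(Y) @ b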
Example of Linear Classifier by Pseudoinverse

Samples: ω1: (1, 2)^t and (2, 0)^t;  ω2: (3, 1)^t and (2, 3)^t.
After augmenting each sample with a leading 1 and negating the ω2
samples, the data matrix is

    Y = [  1   1   2 ]
        [  1   2   0 ]
        [ −1  −3  −1 ]
        [ −1  −2  −3 ]

Its pseudoinverse is

    Y† = (Y^t Y)^{-1} Y^t = [  5/4   13/12   3/4   7/12 ]
                            [ −1/2   −1/6   −1/2   −1/6 ]
                            [   0    −1/3     0    −1/3 ]

Assuming b = (1, 1, 1, 1)^t, our solution is

    a = Y† b = (11/3, −4/3, −2/3)^t

[Figure: the four sample points and the resulting decision boundary
in the (x1, x2) plane]
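
The numbers above can be checked numerically, for instance with the
following snippet (just an illustration):

    import numpy as np

    Y = np.array([[ 1,  1,  2],
                  [ 1,  2,  0],
                  [-1, -3, -1],
                  [-1, -2, -3]], dtype=float)
    b = np.ones(4)

    a = np.linalg.pinv(Y) @ b
    print(a)      # ~ [ 3.667 -1.333 -0.667], i.e. (11/3, -4/3, -2/3)
    print(Y @ a)  # all four margins come out to 1 here, so a separates the classes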
Properties of MSE Procedure
1. Related to Fisher’s Linear Discriminant
2. Asymptotic approximation to Bayes
discriminant function
3. Can be formulated as a gradient descent
procedure
1. MSE Relationship to Fisher’s
Linear Discriminant
• Show that with a proper choice of the vector b, the MSE
  discriminant function a^t y is directly related to Fisher's
  linear discriminant
• Assume the first n1 samples are labelled ω1 and the remaining
  n2 samples are labelled ω2
    Y = [  1_1    X_1 ]
        [ −1_2   −X_2 ]

where 1_i is a column vector of n_i ones and X_i is the n_i-by-d
matrix whose rows are the samples labelled ω_i. The within-class
scatter matrix is

    Sw = Σ_i Σ_{x ∈ D_i} (x − m_i)(x − m_i)^t
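
The particular choice of b that makes this connection exact, and the
form of the resulting MSE solution, is the standard result, quoted
here without its derivation:

    b = \begin{pmatrix} \tfrac{n}{n_1}\,\mathbf{1}_1 \\ \tfrac{n}{n_2}\,\mathbf{1}_2 \end{pmatrix},
    \qquad
    a = \begin{pmatrix} w_0 \\ \mathbf{w} \end{pmatrix}
    \;\Longrightarrow\;
    \mathbf{w} \propto S_w^{-1}(\mathbf{m}_1 - \mathbf{m}_2),
    \qquad
    w_0 = -\mathbf{m}^t \mathbf{w}

where m is the mean of all n samples; the direction of w is exactly
Fisher's linear discriminant direction.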
2. MSE Approximation to Bayes

• If b = 1_n, the MSE solution approaches a minimum mean-squared
  approximation to the Bayes discriminant function
  g0(x) = P(ω1|x) − P(ω2|x) as the number of samples grows
• However, the MSE criterion does not necessarily minimize the
  probability of error: it gives the best approximation in the
  region where the data points lie

[Figure: the class posteriors, the Bayes discriminant function, and
the MSE solution, which best approximates the Bayes discriminant in
the region of the data points]
3. MSE Solution using Gradient Descent:
Widrow-Hoff or Least Mean Squared (LMS) Rule

Since ∇Js = 2 Y^t (Ya − b),
the obvious update rule is

    a(1) arbitrary
    a(k+1) = a(k) + η(k) Y^t (b − Y a(k))

The storage requirement can be reduced by considering the samples
sequentially, giving the Widrow-Hoff or LMS rule:

    a(1) arbitrary
    a(k+1) = a(k) + η(k) (b_k − a^t(k) y^k) y^k
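
An illustrative numpy sketch of the sequential rule (the function
name, the starting point and the decreasing learning-rate schedule
η(k) = η(1)/k are our own choices):

    import numpy as np

    def lms(Y, b, eta1=0.1, epochs=50):
        """Sequential Widrow-Hoff / LMS descent on Js(a) = ||Ya - b||^2."""
        n, d = Y.shape
        a = np.zeros(d)                          # a(1) arbitrary
        k = 0
        for _ in range(epochs):
            for y, bk in zip(Y, b):
                k += 1
                eta = eta1 / k                   # eta(k) = eta(1)/k
                a = a + eta * (bk - a @ y) * y   # move a toward a^t y = bk
        return a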
Widrow-Hoff or LMS Property
• Relaxation procedures do not converge when the samples are not
  linearly separable
• The Widrow-Hoff (LMS) procedure does converge
• However, its solution need not be a separating vector even if
  one exists
Ho-Kashyap Procedures
• Procedures considered so far:
  – Perceptron and relaxation procedures find separating vectors
    if the samples are linearly separable, but do not converge on
    non-separable problems
  – MSE procedures yield a weight vector whether the samples are
    linearly separable or not, but there is no guarantee that this
    vector is a separating vector
• Criterion function is Js(a, b) = ||Ya − b||^2
  – subject to the constraint b > 0
  – How can we obtain both a separating vector a and a margin
    vector b?
Ho-Kashyap Procedures

• Idea: since the margin vector is not fixed in advance, minimize
  Js(a, b) = ||Ya − b||^2 with respect to both a and b, subject to
  b > 0
• The gradients are

    ∇a Js = 2 Y^t (Ya − b)        ∇b Js = −2 (Ya − b)

• For any b, setting ∇a Js = 0 gives a = Y† b
• We cannot merely set ∇b Js = 0, since we must respect the
  constraint b > 0 and avoid the degenerate solution a = 0, b = 0;
  instead b is modified only in ways that keep its components
  positive
Ho-Kashyap Rule

    Error vector:                   e(k) = Y a(k) − b(k)
    Positive part of error vector:  e+(k) = ½ (e(k) + |e(k)|)

    b(1) > 0, otherwise arbitrary
    b(k+1) = b(k) + 2 η(k) e+(k)
    a(k) = Y† b(k)
Ho-Kashyap Algorithm
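
A minimal numpy sketch of the iteration above (the learning rate,
the tolerance b_min and the iteration cap are our own illustrative
choices):

    import numpy as np

    def ho_kashyap(Y, eta=0.5, b_min=1e-3, k_max=1000):
        """Jointly seek a weight vector a and margin vector b > 0
        minimizing ||Ya - b||^2.
        Returns (a, b, status): True if a separating vector was found,
        False if non-separability was proved, None if undecided."""
        n, d = Y.shape
        Y_pinv = np.linalg.pinv(Y)
        b = np.ones(n)                        # b(1) > 0, otherwise arbitrary
        a = Y_pinv @ b
        for _ in range(k_max):
            e = Y @ a - b                     # error vector
            if np.all(np.abs(e) <= b_min):
                return a, b, True             # Ya ~ b > 0: separating vector found
            if np.all(e <= 0) and np.any(e < 0):
                return a, b, False            # e <= 0, e != 0: non-separability proved
            e_plus = 0.5 * (e + np.abs(e))    # positive part of the error vector
            b = b + 2 * eta * e_plus
            a = Y_pinv @ b
        return a, b, None                     # no decision within k_max iterations

Run on the four-sample Y from the pseudoinverse example, which is
linearly separable, this sketch should terminate with status True.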
Ho-Kashyap Conclusion

• If the samples are linearly separable, the Ho-Kashyap procedure
  yields a separating vector in a finite number of steps
  (for 0 < η < 1)
• If the samples are not separable, the error vector converges to a
  nonzero vector e ≤ 0, providing evidence of non-separability, but
  there is no bound on how many steps are needed to obtain it
Linear Programming Algorithms
• Perceptron, relaxation and Ho-Kashyap are gradient descent
  procedures for solving sets of simultaneous linear inequalities
• The task can also be approached with linear programming (LP)
• Minimize an objective function z = α^t u subject to the
  constraints Au ≥ β
• The simplex algorithm finds the extremum of z

[Figure: surfaces of constant z are shown in gray, while the
constraints are shown in red; the simplex algorithm finds the
extremum of z]
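
As an illustration only (the toy problem and the solver choice are
ours, not the lecture's), scipy's linprog solves problems in this
form; since it expects upper-bound constraints, Au ≥ β is passed as
−Au ≤ −β:

    import numpy as np
    from scipy.optimize import linprog

    # Toy problem: minimize z = alpha^t u  subject to  A u >= beta, u >= 0
    alpha = np.array([1.0, 2.0])
    A = np.array([[1.0, 1.0],
                  [2.0, 1.0]])
    beta = np.array([1.0, 1.5])

    res = linprog(c=alpha,
                  A_ub=-A, b_ub=-beta,        # A u >= beta  <=>  -A u <= -beta
                  bounds=[(0, None)] * 2,     # nonnegative variables, standard LP form
                  method="highs")
    print(res.x, res.fun)                     # optimal u and minimum z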