Data Science L15 - OptimizationMultiVariable

The document outlines the concepts of optimization in data science, focusing on unconstrained optimization for single and multiple variables. It discusses key topics such as optimality criteria, gradients, and the Hessian matrix, along with their applications in finding minima and maxima. Additionally, it includes practical examples and methods for gradient descent and unidirectional search techniques.


ED5340 - Data Science: Theory and Practise

L15 - Optimization for multiple variables

Ramanathan Muthuganapathy (https://ed.iitm.ac.in/~raman)


Course web page: https://ed.iitm.ac.in/~raman/datascience.html
Moodle page: Available at https://courses.iitm.ac.in/
Unconstrained optimization

• Single variable: e.g. min J(w), with J(w) = w², J(w) = w³, or J(w) = w² + 54/w

• Multivariable: e.g. min J(w1, w2) = (w1 − 2)² + (w2 − 2)²

• n-dimensional multivariable: e.g.
  J(w1, w2, ..., wn) = (w1 − 2)² + (w2 − 2)² + .... + (wn − 2)²

• min J(w1, w2, ..., wn)

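For reference, a minimal Python sketch (added here, not part of the slides) of these example objectives; the names J_single and J_multi are illustrative.

```python
import numpy as np

# Single-variable example from the slide: J(w) = w^2 + 54/w
def J_single(w):
    return w**2 + 54.0 / w

# Two-variable example: J(w1, w2) = (w1 - 2)^2 + (w2 - 2)^2
# (the same line also covers the n-dimensional version)
def J_multi(w):
    w = np.asarray(w, dtype=float)
    return np.sum((w - 2.0)**2)

print(J_single(3.0))        # 9 + 18 = 27
print(J_multi([2.0, 2.0]))  # 0.0 at the minimizer (2, 2)
```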


Surface plot

J(w1, w2) = (w1 − 2)² + (w2 − 2)²

(surface plot of J over the (w1, w2) plane)



Contour plot / level set / height function

J(w1, w2) = (w1 − 2)² + (w2 − 2)²

• Two points on a contour have the same J value

• Imagine cutting the surface with J-planes at different J-values



Demo using SrfPlots.py

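SrfPlots.py itself is not reproduced in this document; the following is a minimal matplotlib sketch of what such a surface/contour demo for J(w1, w2) = (w1 − 2)² + (w2 − 2)² could look like (the code and names are illustrative, not the course script).

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid over the (w1, w2) plane
w1, w2 = np.meshgrid(np.linspace(-4, 8, 200), np.linspace(-4, 8, 200))
J = (w1 - 2)**2 + (w2 - 2)**2

fig = plt.figure(figsize=(10, 4))

# Surface plot
ax1 = fig.add_subplot(1, 2, 1, projection='3d')
ax1.plot_surface(w1, w2, J, cmap='viridis')
ax1.set_xlabel('w1'); ax1.set_ylabel('w2'); ax1.set_zlabel('J')

# Contour plot (level sets): each curve is a constant-J slice
ax2 = fig.add_subplot(1, 2, 2)
cs = ax2.contour(w1, w2, J, levels=15)
ax2.clabel(cs, inline=True, fontsize=8)
ax2.set_xlabel('w1'); ax2.set_ylabel('w2')

plt.tight_layout()
plt.show()
```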


Optimality criteria - multiple variables

• min J(w)

• The value of w for which the function J(w) has the least (minimum) value

• Local minimum



Gradient - multiple variables
min J(w1, w2) = (w1 − 2)² + (w2 − 2)² - Partial derivatives

• J(w1, w2) = (w1 − 2)² + (w2 − 2)²

• ∂J/∂w1 - partial derivative of J(w1, w2) with respect to w1

• ∂J/∂w2 - partial derivative of J(w1, w2) with respect to w2

• ∇J(w1, w2) = (∂J/∂w1, ∂J/∂w2), where ∇J(w1, w2) is called grad. J

• NOTE: grad. J is a vector.


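As a concrete illustration (an addition, not from the slide), the gradient of this example objective in Python; grad_J is an illustrative name.

```python
import numpy as np

# J(w1, w2) = (w1 - 2)^2 + (w2 - 2)^2
def J(w):
    w = np.asarray(w, dtype=float)
    return np.sum((w - 2.0)**2)

# Analytic gradient: (dJ/dw1, dJ/dw2) = (2(w1 - 2), 2(w2 - 2))
def grad_J(w):
    return 2.0 * (np.asarray(w, dtype=float) - 2.0)

print(grad_J([-4.0, -4.0]))   # [-12. -12.], a vector pointing away from the minimizer (2, 2)
```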
Gradient - multiple variables
What is ∇J(w1, w2) or grad. J? Consider f(x, y, z)

• Surface f(x, y, z) = c

• Any curve (x(t), y(t), z(t)) on the surface: f(x(t), y(t), z(t)) = c

• Differentiating with respect to t: (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) + (∂f/∂z)(dz/dt) = 0

• (∂f/∂x, ∂f/∂y, ∂f/∂z) · (dx/dt, dy/dt, dz/dt) = 0

• ∇f · (x′(t), y′(t), z′(t)) = 0



Gradient - multiple variables
What is ∇J(w1, w2) or grad. J? f(x, y, z)

• ∇f is the grad. f, and (x′(t), y′(t), z′(t)) is the tangent vector of the curve.



Gradient - multiple variables
What is ∇J(w1, w2) or grad. J? f(x, y, z)

• Take another curve (blue)

• ∇f · (x′(t), y′(t), z′(t)) = 0

• Dot product

• ∇f is perpendicular to the set of tangents at that point.



Gradient - multiple variables
What is ∇J(w1, w2) or grad. J? f(x, y, z)

• ∇f · (x′(t), y′(t), z′(t)) = 0

• Dot product

• ∇f is perpendicular to the set of tangents at that point.

• ∇f is the normal vector at a point.

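A quick worked example, added here for concreteness (it is not on the original slide): the sphere as a level surface.

```latex
% Sphere as a level surface of f
f(x,y,z) = x^2 + y^2 + z^2 = c
\quad\Rightarrow\quad
\nabla f = (2x,\ 2y,\ 2z)

% Any curve (x(t), y(t), z(t)) on the sphere satisfies f(x(t), y(t), z(t)) = c;
% differentiating with respect to t:
2x\,x'(t) + 2y\,y'(t) + 2z\,z'(t) = 0
\quad\Longleftrightarrow\quad
\nabla f \cdot \bigl(x'(t),\, y'(t),\, z'(t)\bigr) = 0
```

So ∇f points radially outward, perpendicular to every tangent direction of the sphere at that point, exactly as the slide states in general.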


Gradient at a point
What is ∇J(w1, w2) or grad. J? - Back to our notation

• ∇J(w1, w2) is a normal vector (to the contour of J through that point)



Hessian Matrix
min J(w1, w2) = (w1 − 2)² + (w2 − 2)² - Second partial derivatives

• ∂²J/∂w1² = ∂/∂w1 (∂J/∂w1)

• ∂²J/∂w2² = ∂/∂w2 (∂J/∂w2)

• ∂²J/∂w1∂w2 = ∂/∂w1 (∂J/∂w2)

• ∂²J/∂w2∂w1 = ∂/∂w2 (∂J/∂w1)



Hessian Matrix
Matrix of second partial derivatives

    H = [ ∂²J/∂w1²      ∂²J/∂w1∂w2 ]
        [ ∂²J/∂w2∂w1    ∂²J/∂w2²   ]

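A small sympy sketch (an addition, not part of the slides) that computes the gradient and Hessian of the example objective symbolically:

```python
import sympy as sp

w1, w2 = sp.symbols('w1 w2')
J = (w1 - 2)**2 + (w2 - 2)**2

grad = [sp.diff(J, v) for v in (w1, w2)]   # [2*w1 - 4, 2*w2 - 4]
H = sp.hessian(J, (w1, w2))                # Matrix([[2, 0], [0, 2]])

print(grad)
print(H)
```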


Optimality Criteria for Multiple Variables
min J(w1, w2) = (w1 − 2)² + (w2 − 2)²

• ∇J(w1, w2) = 0, get w* = (w1*, w2*)

• Hessian H should be positive definite at w* for min

• Hessian H should be negative definite at w* for max

• At a saddle point, H is indefinite



Optimality Criteria for Multiple Variables
How to find the type of H? (Use LA)

• H is positive definite if all the eigenvalues are > 0 (all λi's > 0)

• H is negative definite if all the eigenvalues are < 0 (all λi's < 0)

• H is indefinite if some eigenvalues are > 0 and some are < 0

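A numpy sketch of this eigenvalue test (added here; classify_hessian is an illustrative helper, not from the lecture):

```python
import numpy as np

def classify_hessian(H, tol=1e-10):
    """Classify a symmetric Hessian by the signs of its eigenvalues."""
    eigvals = np.linalg.eigvalsh(H)          # eigenvalues of a symmetric matrix
    if np.all(eigvals > tol):
        return "positive definite (local minimum)"
    if np.all(eigvals < -tol):
        return "negative definite (local maximum)"
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "indefinite (saddle point)"
    return "semidefinite / test inconclusive"

print(classify_hessian(np.array([[2.0, 0.0], [0.0, 2.0]])))    # positive definite
print(classify_hessian(np.array([[2.0, 0.0], [0.0, -2.0]])))   # indefinite
```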


Example
min J(w1, w2) = (w1 − 2)² + (w2 − 2)²

• ∇J(w1, w2) = 0, get w* = (w1*, w2*)

• ∂J/∂w1 = 2(w1 − 2)

• ∂J/∂w2 = 2(w2 − 2)

• Critical point: w* = (w1*, w2*) = (2, 2)



Example
Compute the Hessian at (2, 2)

• ∂²J/∂w1² = 2

• ∂²J/∂w2² = 2

• ∂²J/∂w1∂w2 = 0

• ∂²J/∂w2∂w1 = 0



Hessian Matrix
Matrix of second partial derivatives

    H = [ 2  0 ]
        [ 0  2 ]

Eigenvalues?

H is then _____________ definite and hence the point
w* = (w1*, w2*) = (2, 2) is a local _________________



CW: Do a similar exercise for
J(w1, w2) = w1² − w2²



Unidirectional search
J(w1, w2) = (w1 − 2)² + (w2 − 2)²

• Starting point w^s = (w1^s, w2^s) = (−4, −4)

• Search direction S (vector)

• w* = w^s + αS

• Bracketing method to find α

• Fine tuning with interval halving (or golden section search etc.) - a sketch follows below

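A minimal Python sketch of a unidirectional search (an illustrative addition, not the course code): bracket α along a fixed direction S by marching and doubling, then refine by interval halving.

```python
import numpy as np

def J(w):
    w = np.asarray(w, dtype=float)
    return np.sum((w - 2.0)**2)

def unidirectional_search(J, w_s, S, step=0.1, n_halvings=30):
    """Minimize phi(alpha) = J(w_s + alpha*S) over alpha >= 0:
    crude bracketing followed by interval halving."""
    phi = lambda a: J(w_s + a * S)
    # Bracketing: march forward, doubling alpha, until phi starts increasing
    a_prev, a_curr = 0.0, step
    while phi(a_curr) < phi(a_prev):
        a_prev, a_curr = a_curr, 2.0 * a_curr
    lo = a_prev / 2.0 if a_prev > step else 0.0
    hi = a_curr
    # Interval halving on [lo, hi] using the slope sign at the midpoint
    for _ in range(n_halvings):
        mid = 0.5 * (lo + hi)
        if phi(mid - 1e-6) < phi(mid + 1e-6):   # phi increasing at mid -> minimum lies to the left
            hi = mid
        else:
            lo = mid
    alpha = 0.5 * (lo + hi)
    return alpha, w_s + alpha * S

w_s = np.array([-4.0, -4.0])
S = np.array([1.0, 1.0])                        # a chosen search direction
alpha, w_best = unidirectional_search(J, w_s, S)
print(alpha, w_best)                             # alpha ≈ 6, w_best ≈ (2, 2)
```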


Unidirectional search - Issues
J(w1, w2) = (w1 − 2)² + (w2 − 2)²

• Choice of the starting point

• Choice of the search direction S (vector)



Gradient at a point
What is ∇J(w1, w2) or grad. J? - Back to our notation

• ∇J(w1, w2) is a normal vector (to the contour of J through that point)



Gradient at a point
Traveling along grad. J

• If you travel along the direction of the grad. J, what happens to J?



Gradient descent
Traveling along -grad. J or −∇J(w1, w2)

• Along +∇J, J increases; to minimize J, we should travel along −∇J



Potential directions and steepest descent
Traveling along -grad. J or −∇J(w1, w2)

• Let d be a descent direction, i.e. ∇J · d is -ve.

• Let d be such that d = −∇J

• Then (for unit vectors) ∇J · d = ∇J · (−∇J) = −1, the most negative value possible

• Hence −∇J is the steepest!

• Steepest (Cauchy's) Gradient Descent

(figure: a general descent direction d, together with ∇J and −∇J)

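The "steepest" claim can be made precise with the cosine form of the dot product (a short derivation added here; unit-length directions are assumed):

```latex
% For a unit direction d,
\nabla J \cdot d = \|\nabla J\|\,\|d\|\cos\theta = \|\nabla J\|\cos\theta .
% This is most negative when \theta = \pi, i.e. when
d = -\,\frac{\nabla J}{\|\nabla J\|},
\qquad \text{giving} \qquad \nabla J \cdot d = -\|\nabla J\| .
```

So among all directions of the same length, −∇J decreases J the fastest, which is why it is called the steepest descent direction.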


Algorithm - Gradient descent
Traveling along -grad. J or −∇J(w1, w2)

• Starting point w* = (w1*, w2*)

• Compute J and −∇J at w*_k = w*

• Update the w's:

• w*_(k+1) = w*_k − αk ∇J

• Check for stopping criteria

• Else continue the iteration



Algorithm - Update step
w*_(k+1) = w*_k − αk ∇J

• Update the w's componentwise:

• w1_(k+1) = w1_k − αk (∂J/∂w1)

• w2_(k+1) = w2_k − αk (∂J/∂w2)

• Compute J and −∇J at w*_(k+1)

• Finding αk:

• Unidirectional search (or)

• Make it a constant (learning rate in ML)



Algorithm - Stopping criteria

1. if ||∇J(w*_k)|| ≤ ϵ1

2. if |∇J(w*_(k+1)) · ∇J(w*_k)| ≤ ϵ2

3. if ||w*_(k+1) − w*_k|| / ||w*_k|| ≤ ϵ1

4. if the number of iterations exceeds a predefined constant (k > 100, say)

• NOTE: Compute 1 or 4 before the update, and 2 or 3 after.

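Putting the update step and the stopping criteria together, a minimal Python sketch of the whole loop for the running example (an illustrative addition, assuming a constant step length):

```python
import numpy as np

def J(w):
    w = np.asarray(w, dtype=float)
    return np.sum((w - 2.0)**2)

def grad_J(w):
    return 2.0 * (np.asarray(w, dtype=float) - 2.0)

def gradient_descent(J, grad_J, w0, alpha=0.1, eps1=1e-6, eps2=1e-12, max_iter=100):
    """Steepest descent with a constant step length and the stopping
    criteria listed on the slide (illustrative sketch)."""
    w = np.asarray(w0, dtype=float)
    for k in range(max_iter):                       # criterion 4: iteration budget
        g = grad_J(w)
        if np.linalg.norm(g) <= eps1:               # criterion 1: small gradient
            break
        w_new = w - alpha * g                       # update step
        g_new = grad_J(w_new)
        if abs(np.dot(g_new, g)) <= eps2:           # criterion 2: successive gradients nearly "orthogonal"
            w = w_new
            break
        if np.linalg.norm(w_new - w) / max(np.linalg.norm(w), 1e-12) <= eps1:
            w = w_new                               # criterion 3: small relative step
            break
        w = w_new
    return w, J(w)

w_star, J_star = gradient_descent(J, grad_J, w0=[-4.0, -4.0])
print(w_star, J_star)    # converges toward (2, 2), with J approaching 0
```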


Himmelblau function

J(w1, w2) = (w1² + w2 − 11)² + (w1 + w2² − 7)²

(surface plot of the Himmelblau function)

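As an added illustration (not from the slides), the Himmelblau function and its analytic gradient in Python, driven by a few hundred plain gradient-descent steps. Himmelblau's function is a standard multi-minima test problem, so which minimum the iteration reaches depends on the starting point.

```python
import numpy as np

def himmelblau(w):
    w1, w2 = w
    return (w1**2 + w2 - 11)**2 + (w1 + w2**2 - 7)**2

def himmelblau_grad(w):
    w1, w2 = w
    dJ_dw1 = 4*w1*(w1**2 + w2 - 11) + 2*(w1 + w2**2 - 7)
    dJ_dw2 = 2*(w1**2 + w2 - 11) + 4*w2*(w1 + w2**2 - 7)
    return np.array([dJ_dw1, dJ_dw2])

# A few hundred small gradient-descent steps from (0, 0)
w = np.array([0.0, 0.0])
for _ in range(500):
    w = w - 0.01 * himmelblau_grad(w)
print(w, himmelblau(w))   # typically lands near one of the local minima, e.g. close to (3, 2) from this start
```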


Recap - Single vs Multiple
w*_(k+1) = w*_k − αk ∇J

(figure: a single-variable descent step on the curve J(w) vs. a two-variable descent step in the (w1, w2) plane along −∇J)



Optimization strategies
Variations in (steepest) gradient descent

• Constant step length α

• Adaptive step length (αk) using line search

• Stochastic gradient descent

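For the third variation, a minimal stochastic-gradient-descent sketch on a toy least-squares problem (the data, model, and all names here are illustrative additions, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares data: y ≈ 3*x + 1 plus noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=200)

w = np.zeros(2)                        # parameters [slope, intercept]
alpha = 0.1                            # constant learning rate
for epoch in range(50):
    for i in rng.permutation(len(X)):  # visit samples in random order
        x_i, y_i = X[i, 0], y[i]
        err = (w[0] * x_i + w[1]) - y_i
        grad = np.array([err * x_i, err])   # gradient of 0.5*err^2 for one sample
        w -= alpha * grad
print(w)   # approaches [3, 1]
```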


Line Search

[w1* w2*] = [w1^s w2^s] + αS

[w1* w2*] = [w1^s w2^s] + α(−∇J)   (steepest gradient descent)

(figure: candidate points [w1^s w2^s] + α(−∇J) along the −∇J direction for α = 0, 0.01, ..., αi, ..., 10)