Numerical Methods for Unconstrained Optimization

Cheng-Liang Chen
PSE Laboratory, Department of Chemical Engineering, National Taiwan University

Analytical vs. Numerical Methods

Analytical methods: write the necessary conditions and solve them (analytically or numerically) for candidate local minimum designs.

Numerical methods: estimate an initial design and improve it until the optimality conditions are satisfied.

Some difficulties with the analytical approach:
- The number of design variables and constraints can be large
- The functions for the design problem can be highly nonlinear
- In many applications, cost and/or constraint functions can be implicit in terms of the design variables
General Concepts Related to Numerical Algorithms

A General Algorithm
Current estimate: x^(k), k = 0, 1, 2, ...
Subproblem 1: compute a (feasible) search direction d^(k)
Subproblem 2: compute a (positive scalar) step size α_k
New estimate: x^(k+1) = x^(k) + α_k d^(k) = x^(k) + Δx^(k)
Descent Step Idea

Moving from the current estimate x^(k) to the new estimate x^(k+1) should reduce the cost:
f(x^(k)) > f(x^(k+1)) = f(x^(k) + α_k d^(k))

Taylor expansion about x^(k):
f(x^(k) + α_k d^(k)) ≈ f(x^(k)) + α_k ∇f(x^(k))·d^(k) = f(x^(k)) + α_k c^(k)·d^(k)
so we need α_k c^(k)·d^(k) < 0.

∇f(x^(k))·d^(k) = c^(k)·d^(k) < 0 : descent condition

The angle between c^(k) and d^(k) must be between 90° and 270°.

Example: check the descent condition for
f(x) = x1^2 − x1x2 + 2x2^2 − 2x1 + e^(x1+x2)
Verify whether d1 = (1, 2) and d2 = (1, 0) are descent directions at the point (0, 0).

c = ∇f = ( 2x1 − x2 − 2 + e^(x1+x2),  −x1 + 4x2 + e^(x1+x2) ) |_(0,0) = (−1, 1)

c·d1 = (−1)(1) + (1)(2) = −1 + 2 = 1 > 0   (not a descent direction)
c·d2 = (−1)(1) + (1)(0) = −1 + 0 = −1 < 0  (a descent direction)
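The check above is easy to automate. The following is a minimal sketch (my own illustration, not part of the original slides) that evaluates the descent condition c·d < 0 numerically for the same example function and directions:

```python
# Sketch: check the descent condition c·d < 0 for
# f(x) = x1^2 - x1*x2 + 2*x2^2 - 2*x1 + exp(x1 + x2)
import numpy as np

def grad_f(x):
    """Analytic gradient c = grad f(x) of the example function."""
    e = np.exp(x[0] + x[1])
    return np.array([2*x[0] - x[1] - 2 + e, -x[0] + 4*x[1] + e])

def is_descent(x, d):
    """True if d satisfies the descent condition c·d < 0 at x."""
    return grad_f(x) @ d < 0

x0 = np.array([0.0, 0.0])
print(is_descent(x0, np.array([1.0, 2.0])))   # False: c·d =  1 > 0
print(is_descent(x0, np.array([1.0, 0.0])))   # True:  c·d = -1 < 0
```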
One-Dimensional Minimization:
Reduction to a Function of a Single Variable

Assume a descent direction d^(k) has been found. Along that direction the cost becomes a function of the single variable α:
f(x^(k) + α d^(k)) ≡ f(α)

Taylor expansion:
f(α) ≈ f(x^(k)) + α ∇f^T(x^(k)) d^(k) = f(x^(k)) + α c^(k)·d^(k)

Since c^(k)·d^(k) < 0, f(α) < f(0) = f(x^(k)) for a small move α > 0: a small move along d reduces f, so d must be a descent direction.
Analytical Method to Compute Step Size

If d^(k) is a descent direction, the step size α_k > 0 satisfies
df(α_k)/dα = 0,   d²f(α_k)/dα² > 0

0 = df(x^(k+1))/dα = [df(x^(k+1))/dx]·[dx^(k+1)/dα] = ∇f^T(x^(k+1)) d^(k) = c^(k+1)·d^(k)

c^(k+1)·d^(k) = 0: the gradient of the cost function at the NEW point, c^(k+1), is orthogonal to the current search direction d^(k).

Example: analytical step size determination
f(x) = 3x1^2 + 2x1x2 + 2x2^2 + 7,   d^(k) = (−1, −1) at x^(k) = (1, 2)

c^(k) = ∇f(x^(k)) = (6x1 + 2x2, 2x1 + 4x2)|_x^(k) = (10, 10)
c^(k)·d^(k) = −10 − 10 = −20 < 0, so d^(k) is a descent direction.

x^(k+1) = x^(k) + α d^(k) = (1 − α, 2 − α)
f(x^(k+1)) = 3(1 − α)^2 + 2(1 − α)(2 − α) + 2(2 − α)^2 + 7 = 7α^2 − 20α + 22 ≡ f(α)

df/dα = 14α_k − 20 = 0  ⇒  α_k = 10/7
d²f/dα² = 14 > 0 (a minimum)

x^(k+1) = (1, 2) + (10/7)(−1, −1) = (−3/7, 4/7)
f(x^(k+1)) = 54/7 < 22 = f(x^(k))

Check: ∇f(x^(k+1)) = (−10/7, 10/7), so ∇f^T(x^(k+1)) d^(k) = 10/7 − 10/7 = 0 ✓
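For a quadratic cost the exact step size can also be written in closed form, α = −(c·d)/(d^T H d). The short sketch below (my own check, not from the slides) reproduces α = 10/7 for the example above:

```python
# Sketch: exact step size along a descent direction for a quadratic function
# f(x) = 3*x1^2 + 2*x1*x2 + 2*x2^2 + 7  (Hessian H, gradient c = H x here)
import numpy as np

H = np.array([[6.0, 2.0], [2.0, 4.0]])   # Hessian of the example
x = np.array([1.0, 2.0])
c = H @ x                                 # gradient at x (no linear term in f)
d = np.array([-1.0, -1.0])

alpha = -(c @ d) / (d @ H @ d)            # exact line-search step
x_new = x + alpha * d
print(alpha, x_new)                       # 1.428571... (= 10/7), [-3/7, 4/7]
```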
Numerical Methods to Compute Step Size

Most one-dimensional search methods work only for unimodal functions.
They operate on the interval of uncertainty [αℓ, αu] (usually with αℓ = 0), which is known to contain the minimizer α*.
Unimodal Function

f(x) is a unimodal function on [0, 1] with minimizer x̄ if
x1 < x2 < x̄ implies f(x1) > f(x2), and
x̄ < x3 < x4 implies f(x3) < f(x4)

Outcome of two experiments at x1, x2 with 0 < x1 < x2 < 1:
f1 < f2  ⇒  x̄ ∈ [0, x2]
f1 > f2  ⇒  x̄ ∈ [x1, 1]
f1 = f2  ⇒  x̄ ∈ [x1, x2]
Equal Interval Search

Goal: successively reduce the interval of uncertainty I = αu − αℓ (with αℓ = 0 initially) to a small acceptable value ε.

Evaluate the function at α = 0, δ, 2δ, 3δ, ..., moving toward αu:
If f((q + 1)δ) < f(qδ), continue with the next point.
If f((q + 1)δ) > f(qδ), then set αℓ = (q − 1)δ and αu = (q + 1)δ; the minimizer lies in [αℓ, αu].
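A minimal sketch of this search is given below (my own implementation of the idea above, assuming f is unimodal for α ≥ 0 and that the minimum is bracketed before alpha_max; the interior refinement step of 1/10 is an illustrative choice, not prescribed by the slides):

```python
# Sketch: equal interval search with successive refinement of delta
import math

def equal_interval_search(f, delta=0.5, eps=1e-3, alpha_max=1e6):
    # Phase 1: march forward with step delta until f starts to increase.
    a_prev, a_curr = 0.0, delta
    f_prev, f_curr = f(a_prev), f(a_curr)
    while f_curr < f_prev and a_curr < alpha_max:
        a_prev, f_prev = a_curr, f_curr
        a_curr += delta
        f_curr = f(a_curr)
    lo, hi = max(a_prev - delta, 0.0), a_curr       # interval of uncertainty
    # Phase 2: repeat with a finer step inside [lo, hi] until I < eps.
    while hi - lo > eps:
        delta = (hi - lo) / 10.0
        a_prev, a_curr = lo, lo + delta
        f_prev, f_curr = f(a_prev), f(a_curr)
        while f_curr < f_prev and a_curr < hi:
            a_prev, f_prev = a_curr, f_curr
            a_curr += delta
            f_curr = f(a_curr)
        lo, hi = max(a_prev - delta, lo), min(a_curr, hi)
    return 0.5 * (lo + hi)

print(equal_interval_search(lambda a: 2 - 4*a + math.exp(a)))  # ~1.3863 (= ln 4)
```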
Equal Interval Search: Example

f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001   (iteration history in the table below)

Note: for f(x) = x(x − 1.5) on x ∈ [0, 1], evaluated at nine equally spaced points:

 x_i     0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
 f(x_i) −0.14  −0.26  −0.36  −0.44  −0.50  −0.54  −0.56  −0.56  −0.54

so x̄ ∈ [x7, x8] = [0.7, 0.8]. Using 99 equally spaced points eliminates 98% of the interval, i.e., roughly 1% per function evaluation.
Iteration history for f(α) = 2 − 4α + e^α (δ is refined and the search restarted each time the minimum is bracketed):

No.  Trial step α   Function value
 1    0.000000       3.000000      δ = 0.5, start from α = 0
 2    0.500000       1.648721
 3    1.000000       0.718282
 4    1.500000       0.481689
 5    2.000000       1.389056      → αu
 6    1.050000       0.657651      δ = 0.05, start from α = 1.0
 7    1.100000       0.604166
 8    1.150000       0.558193
 9    1.200000       0.520117
10    1.250000       0.490343
11    1.300000       0.469297
12    1.350000       0.457426
13    1.400000       0.455200
14    1.450000       0.463115      → αu
15    1.355000       0.456761      δ = 0.005, start from α = 1.35
16    1.360000       0.456193
17    1.365000       0.455723
18    1.370000       0.455351
19    1.375000       0.455077
20    1.380000       0.454902
21    1.385000       0.454826
22    1.390000       0.454850      → αu
23    1.380500       0.454890      δ = 0.0005, start from α = 1.38
24    1.381000       0.454879
25    1.381500       0.454868
26    1.382000       0.454859
27    1.382500       0.454851
28    1.383000       0.454844
29    1.383500       0.454838
30    1.384000       0.454833
31    1.384500       0.454829
32    1.385000       0.454826
33    1.385500       0.454824
34    1.386000       0.454823
35    1.386500       0.454823
36    1.387000       0.454824      → αu
37    1.386500       0.454823      final estimate

Equal Interval Search: 3 Interior Points
For x̄ ∈ [a, b], three tests x1, x0, x2 lead to three possible reduced intervals (keep the subinterval whose interior point has the lowest function value).
Equal Interval Search: 2 Interior Points

αa = αℓ + (1/3)(αu − αℓ) = (1/3)(αu + 2αℓ)
αb = αℓ + (2/3)(αu − αℓ) = (1/3)(2αu + αℓ)

Case 1: f(αa) < f(αb)  ⇒  αℓ < α* < αb
Case 2: f(αa) > f(αb)  ⇒  αa < α* < αu

Either way the reduced interval of uncertainty is I' = (2/3)I.
With 2 new points this eliminates only 16.7% per function evaluation ?!
Trying 3 interior points (2 of them new) eliminates 25% per function evaluation !!
Why? Because in the 2-point scheme the old interior point is NOT reused.
Golden Section Search
Reduction of Interval of Uncertainty

Problem with equal interval search (n = 2): the known interior point is NOT reused in the next iteration.
Solution: Golden Section Search.

Fibonacci sequence:
F0 = 1;  F1 = 1;  Fn = Fn−1 + Fn−2,  n = 2, 3, ...
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
Fn/Fn−1 → 1.618  and  Fn−1/Fn → 0.618 = τ  as n → ∞

Given αℓ, αu with I = αu − αℓ, select interior points αa, αb so that each divides the interval in the golden ratio:
(αu − αa)/I = (αb − αℓ)/I = τ
Suppose f(αb) > f(αa); then the subinterval [αb, αu] is deleted.
Golden Section Search: Reduction of Interval of Uncertainty

Place the interior points so that
αa − αℓ = αu − αb = (1 − τ)I,   αb − αℓ = αu − αa = τI

Suppose f(αb) > f(αa): delete [αb, αu], so αu' = αb and I' = αu' − αℓ = τI.
The surviving interior point αa should play the role of αb in the new interval:
αa − αℓ = (1 − τ)I  must equal  τI' = τ(τI)

(1 − τ)I = τ²I  ⇒  τ² + τ − 1 = 0  ⇒  τ = (−1 + √5)/2 = 0.618
1 − τ = 0.382,   1/τ = 1.618
Golden Section Search: Initial Bracketing of Minimum

Question: how to find the initial three points?
Starting at α = 0 with an initial step δ, increase the trial step by the ratio 1.618 each time (q = 0, 1, 2, ...):

q = 0:  α0 = δ
q = 1:  α1 = α0 + 1.618(α0 − 0) = 2.618δ
q = 2:  α2 = α1 + 1.618(α1 − α0) = 5.236δ
q = 3:  α3 = α2 + 1.618(α2 − α1) = 9.472δ
...
αq = Σ_{j=0}^{q} δ(1.618)^j = αq−1 + δ(1.618)^q,   q = 0, 1, 2, ...

With this choice, once the minimum is bracketed (αℓ = αq−2, αu = αq, I = αu − αℓ = δ(1.618)^q + δ(1.618)^(q−1) = 2.618(1.618)^(q−1) δ), the middle point already divides the interval in the golden ratio:
αq−1 − αq−2 = 0.382 I = αa − αℓ
αq − αq−1 = 0.618 I = (1.618)(0.382 I) = αu − αa
so the ratio of successive increases in the trial step size is 1.618.
Golden Section Search: Algorithm

Step 1: For a chosen δ, locate the initial interval of uncertainty by the bracketing procedure:
if f(αq−2) > f(αq−1) and f(αq−1) < f(αq), then αq−2 < α* < αq;
set αℓ = αq−2, αu = αq, I = αu − αℓ.
Step 2: Compute αa = αℓ + 0.382 I, αb = αℓ + 0.618 I, and f(αa), f(αb).
Step 3: Compare f(αa) and f(αb) and go to Step 4, 5, or 6.
Step 4: If f(αa) < f(αb), then αℓ < α* < αb.
Set αℓ = αℓ, αu = αb, αb = αa, αa = αℓ + 0.382(αu − αℓ); go to Step 7.
Step 5: If f(αa) > f(αb), then αa < α* < αu.
Set αℓ = αa, αu = αu, αa = αb, αb = αℓ + 0.618(αu − αℓ); go to Step 7.
Step 6: If f(αa) = f(αb), then αa < α* < αb.
Set αℓ = αa, αu = αb and return to Step 2.
Step 7: If I = αu − αℓ < ε, take α* = (αu + αℓ)/2 and stop; otherwise return to Step 3.
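The following compact sketch (my own implementation of the bracketing and reduction steps above; it assumes f(δ) < f(0) so that the bracketing loop can start) can be used to reproduce the examples that follow:

```python
# Sketch: golden section search (bracketing with steps growing by 1.618,
# then interval reduction with tau = 0.618)
import math

def golden_section(f, delta=0.5, eps=1e-3):
    tau = (math.sqrt(5.0) - 1.0) / 2.0             # 0.618...
    # Initial bracketing: alpha_q = sum_{j<=q} delta*1.618^j
    alphas, fs, step = [0.0, delta], [f(0.0), f(delta)], delta
    while fs[-1] < fs[-2]:
        step /= tau                                # multiply step by 1.618
        alphas.append(alphas[-1] + step)
        fs.append(f(alphas[-1]))
    lo, hi = alphas[-3], alphas[-1]
    # Interval reduction
    a, b = lo + (1.0 - tau)*(hi - lo), lo + tau*(hi - lo)
    fa, fb = f(a), f(b)
    while hi - lo > eps:
        if fa < fb:                                # minimum in [lo, b]
            hi, b, fb = b, a, fa
            a = lo + (1.0 - tau)*(hi - lo); fa = f(a)
        else:                                      # minimum in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + tau*(hi - lo); fb = f(b)
    return 0.5 * (lo + hi)

print(golden_section(lambda a: 2 - 4*a + math.exp(a)))   # ~1.386
```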
Golden Section Search: Example

f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001

Initial Bracketing of Minimum
No.  Trial step α   Function value
 1    0.000000       3.000000
 2    0.500000       1.648721
 3    1.309017       0.466464
 4    2.618034       5.236610

Table 5.2  Golden Section Search for f(α) = 2 − 4α + e^α
Reducing Interval of Uncertainty
No.  αℓ [f(αℓ)]            αa [f(αa)]            αb [f(αb)]            αu [f(αu)]            I
 1   0.500000 [1.648721]   1.309017 [0.466464]   1.809017 [0.868376]   2.618034 [5.236610]   2.118034
 2   0.500000 [1.648721]   1.000000 [0.718282]   1.309017 [0.466464]   1.809017 [0.868376]   1.309017
 3   1.000000 [0.718282]   1.309017 [0.466464]   1.500000 [0.481689]   1.809017 [0.868376]   0.809017
 4   1.000000 [0.718282]   1.190983 [0.526382]   1.309017 [0.466464]   1.500000 [0.481689]   0.500000
 5   1.190983 [0.526382]   1.309017 [0.466464]   1.381966 [0.454860]   1.500000 [0.481689]   0.309017
 6   1.309017 [0.466464]   1.381966 [0.454860]   1.427051 [0.458190]   1.500000 [0.481689]   0.190983
 7   1.309017 [0.466464]   1.354102 [0.456873]   1.381966 [0.454860]   1.427051 [0.458190]   0.118304
 8   1.354102 [0.456873]   1.381966 [0.454860]   1.399187 [0.455156]   1.427051 [0.458190]   0.072949
Golden Section Search: Example (cont.)
f(α) = 2 − 4α + e^α,   δ = 0.5,   ε = 0.001
Reducing Interval of Uncertainty (cont.)
No.  αℓ [f(αℓ)]            αa [f(αa)]            αb [f(αb)]            αu [f(αu)]            I
 9   1.354102 [0.456873]   1.371323 [0.455269]   1.381966 [0.454860]   1.399187 [0.455156]   0.045085
10   1.371323 [0.455269]   1.381966 [0.454860]   1.388544 [0.454833]   1.399187 [0.455156]   0.027864
11   1.381966 [0.454860]   1.388544 [0.454833]   1.392609 [0.454902]   1.399187 [0.455156]   0.017221
12   1.381966 [0.454860]   1.386031 [0.454823]   1.388544 [0.454833]   1.392609 [0.454902]   0.010643
13   1.381966 [0.454860]   1.384479 [0.454829]   1.386031 [0.454823]   1.388544 [0.454833]   0.006570
14   1.384479 [0.454829]   1.386031 [0.454823]   1.386991 [0.454824]   1.388544 [0.454833]   0.004065
15   1.384471 [0.454829]   1.385438 [0.454824]   1.386031 [0.454823]   1.386991 [0.454823]   0.002512
Golden Section Search: Example

f(x) = x(x − 1.5);   δ = 0.25,   ε = 0.001

Initial Bracketing of Minimum
No.  Trial x      Function value
 1   0.000000      0.000000
 2   0.250000     −0.312500
 3   0.654500     −0.553380
 4   1.308981     −0.250040
Reducing Interval of Uncertainty
No.  xℓ [f(xℓ)]                   xa [f(xa)]                   xb [f(xb)]                   xu [f(xu)]                   I
 1   0.250000000 [−0.312500000]   0.654530742 [−0.553385621]   0.904450258 [−0.538645118]   1.308981000 [−0.250040242]   1.058981
 2   0.250000000 [−0.312500000]   0.499999999 [−0.499999999]   0.654450259 [−0.553370247]   0.904450258 [−0.538645118]   0.654450258
 3   0.499999999 [−0.499999999]   0.654499998 [−0.553379750]   0.749950259 [−0.562499998]   0.904450258 [−0.538645118]   0.404450259
 4   0.654499998 [−0.553379750]   0.749980997 [−0.562500000]   0.808969259 [−0.559022627]   0.904450258 [−0.538645118]   0.24995026
 5   0.654499998 [−0.553379750]   0.713507255 [−0.561168280]   0.749962001 [−0.562499999]   0.808969259 [−0.559022627]   0.154469261
 6   0.713507255 [−0.561168280]   0.749973741 [−0.562499999]   0.772502773 [−0.561993625]   0.808969259 [−0.559022627]   0.095462003
 7   0.713507255 [−0.561168280]   0.736043543 [−0.562305217]   0.749966485 [−0.562499999]   0.772502773 [−0.561993625]   0.058995518
 8   0.736043543 [−0.562305217]   0.749970969 [−0.562499999]   0.758575347 [−0.562426463]   0.772502773 [−0.561993625]   0.03645923
 9   0.736043543 [−0.562305217]   0.744650692 [−0.562471385]   0.749968198 [−0.562499999]   0.758575347 [−0.562426463]   0.022531804
10   0.744650692 [−0.562471385]   0.749969911 [−0.562499999]   0.753256129 [−0.562489398]   0.758575347 [−0.562426463]   0.013924655
11   0.744650692 [−0.562471385]   0.747937969 [−0.562495748]   0.749968852 [−0.562499999]   0.753256129 [−0.562489398]   0.008605437
12   0.747937969 [−0.562495748]   0.749969506 [−0.562499999]   0.751224592 [−0.562498500]   0.753256129 [−0.562489398]   0.00531816
13   0.747937969 [−0.562495748]   0.749193459 [−0.562499349]   0.749969102 [−0.562499999]   0.751224592 [−0.562498500]   0.003286623
14   0.749193459 [−0.562499349]   0.749969352 [−0.562499999]   0.750448699 [−0.562499799]   0.751224592 [−0.562498500]   0.002031133
15   0.749193459 [−0.562499349]   0.749672961 [−0.562499893]   0.749969198 [−0.562499999]   0.750448699 [−0.562499799]   0.00125524
16   0.749672961 [−0.562499893]   0.749969293 [−0.562499999]   0.750152367 [−0.562499977]   0.750448699 [−0.562499799]   0.000775738
Quadratic Curve Fitting
Computational Algorithm: Polynomial Interpolation

Approximate f(α) by a quadratic function through three points αℓ < αi < αu:
f(α) ≈ q(α) = a0 + a1 α + a2 α^2
f(αℓ) = q(αℓ) = a0 + a1 αℓ + a2 αℓ^2
f(αi) = q(αi) = a0 + a1 αi + a2 αi^2
f(αu) = q(αu) = a0 + a1 αu + a2 αu^2

a2 = 1/(αu − αi) [ (f(αu) − f(αℓ))/(αu − αℓ) − (f(αi) − f(αℓ))/(αi − αℓ) ]
a1 = (f(αi) − f(αℓ))/(αi − αℓ) − a2(αi + αℓ)
a0 = f(αℓ) − a1 αℓ − a2 αℓ^2

dq(ᾱ)/dα = a1 + 2a2 ᾱ = 0  ⇒  ᾱ = −a1/(2a2),   a minimum if d²q/dα² = 2a2 > 0

Step 1: locate the initial interval of uncertainty (αℓ, αu)
Step 2: select αℓ < αi < αu and compute f(αi)
Step 3: compute a0, a1, a2, ᾱ and f(ᾱ)
Step 4: refine the interval using αi, ᾱ and their function values:
  if ᾱ < αi and f(ᾱ) < f(αi):  minimum in [αℓ, αi];  new points αℓ, ᾱ, αi
  if ᾱ < αi and f(ᾱ) > f(αi):  minimum in [ᾱ, αu];   new points ᾱ, αi, αu
  if ᾱ > αi and f(ᾱ) < f(αi):  minimum in [αi, αu];  new points αi, ᾱ, αu
  if ᾱ > αi and f(ᾱ) > f(αi):  minimum in [αℓ, ᾱ];   new points αℓ, αi, ᾱ
Step 5: stop if two successive estimates of the minimum point of f(α) are sufficiently close; otherwise relabel the new points as αℓ, αi, αu and return to Step 2
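One interpolation step is easy to code. The sketch below (my own illustration of Steps 2–3 above) fits the quadratic and returns its minimizer, reproducing ᾱ ≈ 1.2077 for the first iteration of the example that follows:

```python
# Sketch: one quadratic-interpolation step of the line search above
import math

def quadratic_fit_min(f, al, ai, au):
    """Fit q(a) = a0 + a1*a + a2*a^2 through (al, ai, au); return its minimizer."""
    fl, fi, fu = f(al), f(ai), f(au)
    a2 = ((fu - fl)/(au - al) - (fi - fl)/(ai - al)) / (au - ai)
    a1 = (fi - fl)/(ai - al) - a2*(ai + al)
    if a2 <= 0.0:
        raise ValueError("q(a) has no interior minimum (a2 <= 0)")
    return -a1 / (2.0 * a2)

f = lambda a: 2 - 4*a + math.exp(a)
abar = quadratic_fit_min(f, 0.5, 1.309017, 2.618034)
print(abar, f(abar))        # ~1.2077, ~0.5149  (first iteration of the example)
```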
Example: f(α) = 2 − 4α + e^α,   δ = 0.5

Iteration 1: αℓ = 0.5, αi = 1.309017, αu = 2.618034 (from the bracketing phase)
f(αℓ) = 1.648721,  f(αi) = 0.466464,  f(αu) = 5.236610

a2 = 1/1.30902 [ 3.5879/2.1180 − (−1.1823/0.80902) ] = 2.410
a1 = −1.1823/0.80902 − (2.41)(1.80902) = −5.821
a0 = 1.648721 − (−5.821)(0.50) − 2.41(0.25) = 3.957

ᾱ = −a1/(2a2) = 1.2077 < αi,   f(ᾱ) = 0.5149 > f(αi)
⇒  αℓ' = ᾱ = 1.2077,  αi' = αi = 1.309017,  αu' = αu = 2.618034

Iteration 2: αℓ = 1.2077, αi = 1.309017, αu = 2.618034
f(αℓ) = 0.5149,  f(αi) = 0.466464,  f(αu) = 5.236610
a0 = 5.3807,  a1 = −7.30547,  a2 = 2.713
ᾱ = −a1/(2a2) = 1.3464,   f(ᾱ) = 0.4579


Multi-Dimensional Minimization:
Powell's Conjugate Directions Method
Conjugate Directions
Let A be an n × n symmetric matrix. A set of n vectors (directions) {S_i} is said to be A-conjugate if
S_i^T A S_j = 0  for i, j = 1, ..., n;  i ≠ j
Note: orthogonal directions are a special case of conjugate directions (A = I).
Quadratically Convergent Method
If a minimization method, using exact arithmetic, can find the minimum point in n steps while minimizing a quadratic function in n variables, the method is called a quadratically convergent method.

Theorem: Given a quadratic function of n variables and two parallel hyperplanes 1 and 2 of dimension k < n, let the constrained stationary points of the quadratic function in the hyperplanes be X1 and X2, respectively. Then the line joining X1 and X2 is conjugate to any line parallel to the hyperplanes.
Meaning: If X1 and X2 are the minima of Q obtained by searching along the direction S from two different starting points Xa and Xb, respectively, then the line (X1 − X2) is conjugate to S.

Proof:
Q(X) = (1/2) X^T A X + B^T X + C
∇Q(X) = A X + B   (n × 1)

Search from Xa along S → X1 (stationary point); search from Xb along S → X2.
S is orthogonal to ∇Q(X1) and ∇Q(X2):
S^T ∇Q(X1) = S^T A X1 + S^T B = 0
S^T ∇Q(X2) = S^T A X2 + S^T B = 0
S^T [∇Q(X1) − ∇Q(X2)] = S^T A (X1 − X2) = 0
Theorem: If a quadratic function
Q(X) = (1/2) X^T A X + B^T X + C
is minimized sequentially, once along each direction of a set of n mutually conjugate directions, the minimum of Q will be found at or before the nth step irrespective of the starting point.

Proof:
Let X* = X1 + Σ_{j=1}^{n} λ_j* S_j, where the S_j are conjugate directions with respect to A.
∇Q(X*) = B + A X* = 0:
0 = B + A X1 + A Σ_{j=1}^{n} λ_j* S_j
Premultiplying by S_i^T and using conjugacy,
0 = S_i^T (B + A X1) + Σ_{j=1}^{n} λ_j* S_i^T A S_j = (B + A X1)^T S_i + λ_i* S_i^T A S_i
⇒  λ_i* = −(B + A X1)^T S_i / (S_i^T A S_i)
Note:
X_{i+1} = X_i + λ_i* S_i,  i = 1, ..., n,  where λ_i* is found by minimizing Q(X_i + λ S_i) so that S_i^T ∇Q(X_{i+1}) = 0:

∇Q(X_{i+1}) = B + A X_{i+1} = B + A(X_i + λ_i* S_i)
0 = S_i^T ∇Q(X_{i+1}) = S_i^T {B + A(X_i + λ_i* S_i)} = (B + A X_i)^T S_i + λ_i* S_i^T A S_i
⇒  λ_i* = −(B + A X_i)^T S_i / (S_i^T A S_i)

Since X_i = X1 + Σ_{j=1}^{i−1} λ_j* S_j and the S_j are A-conjugate,
X_i^T A S_i = X1^T A S_i + Σ_{j=1}^{i−1} λ_j* S_j^T A S_i = X1^T A S_i
⇒  λ_i* = −(B + A X_i)^T S_i / (S_i^T A S_i) = −(B + A X1)^T S_i / (S_i^T A S_i),
the same multipliers as in the theorem.
Powell's Conjugate Directions: Example

f(x1, x2) = 6x1^2 + 2x2^2 − 6x1x2 − x1 − 2x2
          = (1/2) [x1 x2] [[12, −6], [−6, 4]] [x1; x2] + [−1, −2]·[x1; x2]

Take S1 = (1, 2) and X1 = (0, 0).

Find S2 = (s1, s2) such that S1^T A S2 = 0:
[1 2] [[12, −6], [−6, 4]] [s1; s2] = [0 2] [s1; s2] = 2s2 = 0  ⇒  s2 = 0;  choose S2 = (1, 0)

λ1 = −(B + A X1)^T S1 / (S1^T A S1) = −[−1 −2]·[1; 2] / ([1 2] A [1; 2]) = 5/4

X2 = X1 + λ1 S1 = (5/4)(1, 2) = (5/4, 5/2)
λ2 = −(B + A X2)^T S2 / (S2^T A S2)
B + A X2 = (−1, −2) + [[12, −6], [−6, 4]] (5/4, 5/2) = (−1, −2) + (0, 5/2) = (−1, 1/2)
λ2 = −(−1) / ([1 0] A [1; 0]) = 1/12

X3 = X2 + λ2 S2 = (5/4, 5/2) + (1/12)(1, 0) = (4/3, 5/2) = X*   (verify: ∇f(X3) = 0)
Powell's Algorithm

Progress of Powell's Method
Search directions used in successive cycles:
un;  u1, u2, ..., un−1, un, S^(1);  u2, ..., un−1, un, S^(1), S^(2);  u3, ..., S^(2), S^(3);  ...;  S^(n−1)
The pattern directions generated this way, i.e., the pairs (un, S^(1)), (S^(1), S^(2)), ..., so that un, S^(1), S^(2), ..., are A-conjugate.

Powell's Conjugate Directions: Example
Min: f(x1, x2) = x1 − x2 + 2x1^2 + 2x1x2 + x2^2,   X1 = [0, 0]^T
Cycle 1: Univariate Search

Along u2: f(X1 + λ u2) = f(0, λ) = λ^2 − λ
df/dλ = 0  ⇒  λ* = 1/2,   X2 = X1 + λ* u2 = (0, 0.5)

Along −u1: f(X2 − λ u1) = f(−λ, 0.5) = 2λ^2 − 2λ − 0.25
df/dλ = 0  ⇒  λ* = 1/2,   X3 = X2 − λ* u1 = (−0.5, 0.5)

Along u2: f(X3 + λ u2) = f(−0.5, 0.5 + λ) = λ^2 − λ − 0.75
df/dλ = 0  ⇒  λ* = 1/2,   X4 = X3 + λ* u2 = (−0.5, 1.0)

Cycle 2: Pattern Search

S^(1) = X4 − X2 = (−0.5, 1.0) − (0, 0.5) = (−0.5, 0.5)

f(X4 + λ S^(1)) = f(−0.5 − 0.5λ, 1 + 0.5λ) = 0.25λ^2 − 0.5λ − 1
df/dλ = 0  ⇒  λ* = 1.0,   X5 = X4 + λ* S^(1) = (−1.0, 1.5)
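The cycle above can be reproduced with a few line searches. The following sketch is my own illustration (SciPy's scalar minimizer stands in for the exact line searches; the helper names are mine, not from the slides):

```python
# Sketch: one Powell cycle (univariate searches + one pattern direction)
import numpy as np
from scipy.optimize import minimize_scalar

def f(x):
    return x[0] - x[1] + 2*x[0]**2 + 2*x[0]*x[1] + x[1]**2

def line_min(f, x, d):
    """Exact line search along direction d from point x."""
    lam = minimize_scalar(lambda a: f(x + a*d)).x
    return x + lam*d

x1 = np.array([0.0, 0.0])
u1, u2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

x2 = line_min(f, x1, u2)        # (0, 0.5)
x3 = line_min(f, x2, u1)        # (-0.5, 0.5)
x4 = line_min(f, x3, u2)        # (-0.5, 1.0)
s1 = x4 - x2                    # pattern direction (-0.5, 0.5)
x5 = line_min(f, x4, s1)        # (-1.0, 1.5): the exact minimum of this quadratic
print(x5)
```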
Simplex Method
Properties of Gradient Vector

f(x) = f(x1, x2, ..., xn)
∇f(x) = [ ∂f/∂x1, ..., ∂f/∂xn ]^T = c
c^(k) = c(x^(k)) = ∇f(x^(k)),   c_i^(k) = ∂f(x^(k))/∂x_i

Property 1: The gradient vector c of a function f(x1, ..., xn) at the point x* = (x1*, ..., xn*) is orthogonal (normal) to the tangent plane of the surface f(x1, ..., xn) = constant.
C is any curve on the surface through x*; T is a vector tangent to C at x*  ⇒  c·T = 0
Proof:
Let s be any parameter along C, and
T = ( ∂x1/∂s, ..., ∂xn/∂s ) |_{x = x*}   (a unit tangent vector along C at x*)
Since f(x) = constant along C, df/ds = 0:
0 = df/ds = (∂f/∂x1)(∂x1/∂s) + ... + (∂f/∂xn)(∂xn/∂s) = c·T = c^T T
Property 2: The gradient represents a direction of maximum rate of increase for f(x) at x*.

Proof:
Let u be a unit vector in any direction not tangent to C, and let t be a parameter along u.
df/dt = lim_{ε→0} [ f(x* + εu) − f(x*) ] / ε
f(x* + εu) = f(x*) + ε( u1 ∂f/∂x1 + ... + un ∂f/∂xn ) + O(ε^2)
f(x* + εu) − f(x*) = ε Σ_{i=1}^{n} u_i ∂f/∂x_i + O(ε^2)
df/dt = lim_{ε→0} [ f(x* + εu) − f(x*) ] / ε = Σ_{i=1}^{n} u_i ∂f/∂x_i = c·u = c^T u = ||c|| ||u|| cos θ
(maximum rate of increase when θ = 0)
Property 3: The maximum rate of change of f(x) at any point x* is the magnitude of the gradient vector: max df/dt = ||c||, attained when u is in the direction of the gradient vector (θ = 0).

Verify Properties of Gradient Vector
f(x) = 25x1^2 + x2^2,   x^(0) = (0.6, 4),   f(x^(0)) = 25
c = ∇f(0.6, 4) = (50x1, 2x2)|_(0.6,4) = (30, 8)

Property 1: c·T = 0
A tangent vector to the curve 25x1^2 + x2^2 = 25 at x^(0) is t = (−4, 15), so
T = t/||t|| = (−4, 15)/√((−4)^2 + 15^2) = (−0.257663, 0.966235)
c·T = 30(−0.257663) + 8(0.966235) = 0
Slope of tangent: m1 = dx2/dx1 = −50x1/(2x2) = −3.75
Slope of gradient: m2 = c2/c1 = 8/30 = 1/3.75
m1 m2 = −1, so the gradient is normal to the tangent.

Property 2: choose an arbitrary unit direction D = (0.501034, 0.865430), ε = 0.1
C = c/||c|| = (30, 8)/√(30^2 + 8^2) = (0.966235, 0.257663)
x^(1)_C = x^(0) + εC = (0.6, 4.0) + 0.1(0.966235, 0.257663) = (0.6966235, 4.0257663),   f(x^(1)_C) = 28.3389
x^(1)_D = x^(0) + εD = (0.6, 4.0) + 0.1(0.501034, 0.865430) = (0.6501034, 4.0865430),   f(x^(1)_D) = 27.2566
f(x^(1)_D) < f(x^(1)_C): the move along the gradient direction increases f the most.

Property 3: C·C = 1.00 > C·D = 0.7071059, so the rate of change of f along C (the gradient direction) exceeds that along D.
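The three checks above are quick to reproduce numerically. This is a small sketch of my own (not part of the slides) for the same function and point:

```python
# Sketch: numeric check of gradient Properties 1-3 for f(x) = 25*x1^2 + x2^2
import numpy as np

f = lambda x: 25*x[0]**2 + x[1]**2
x0 = np.array([0.6, 4.0])
c = np.array([50*x0[0], 2*x0[1]])                    # gradient (30, 8)
T = np.array([-4.0, 15.0]); T /= np.linalg.norm(T)   # unit tangent to the level curve
C = c / np.linalg.norm(c)                            # unit vector along the gradient
D = np.array([0.501034, 0.865430])                   # arbitrary unit direction
eps = 0.1

print(c @ T)                             # ~0            (Property 1)
print(f(x0 + eps*C), f(x0 + eps*D))      # 28.34 > 27.27 (Property 2)
print(C @ C, C @ D)                      # 1.0 > 0.707   (Property 3)
```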
Steepest Descent Algorithm

Steepest Descent Direction
Let f(x) be a differentiable function with respect to x. The direction of steepest descent for f(x) at any point is d = −c.

Steepest Descent Algorithm:
Step 1: choose a starting design x^(0), set k = 0, select ε
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: d^(k) = −c^(k)
Step 4: calculate α_k to minimize f(x^(k) + α d^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k), k = k + 1, go to Step 2
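A minimal sketch of these steps for the quadratic example used below is given here (my own illustration; for a quadratic the exact step size α = (c·c)/(c^T H c) replaces the golden section line search of the slides):

```python
# Sketch: steepest descent for f(x) = x1^2 + 2*x2^2 + 2*x3^2 + 2*x1*x2 + 2*x2*x3
import numpy as np

H = np.array([[2.0, 2.0, 0.0],            # Hessian of the example function
              [2.0, 4.0, 2.0],
              [0.0, 2.0, 4.0]])

def steepest_descent(x0, eps=0.005, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        c = H @ x                          # gradient of the quadratic
        if np.linalg.norm(c) < eps:
            return x, k
        alpha = (c @ c) / (c @ H @ c)      # exact line search for a quadratic
        x = x - alpha * c
    return x, max_iter

x_opt, iters = steepest_descent([2.0, 4.0, 10.0])
print(x_opt, iters)                        # slowly approaches (0, 0, 0)
```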
Steepest Descent Algorithm: Notes

d = −c  ⇒  c·d = −||c||^2 < 0
The successive directions of steepest descent are normal to each other:
d^(k)·d^(k+1) = c^(k)·c^(k+1) = 0
Proof: 0 = df(x^(k+1))/dα = ∇f^T(x^(k+1)) ∂x^(k+1)/∂α = ∇f^T(x^(k+1)) ∂(x^(k) + α d^(k))/∂α = c^(k+1)·d^(k) = −c^(k+1)·c^(k) = −d^(k+1)·d^(k)

Steepest Descent: Example
f(x1, x2) = x1^2 + x2^2 − 2x1x2

Step 1: x^(0) = (1, 0), k = 0, choose ε
Step 2: c^(0) = ∇f(x^(0)) = (2x1 − 2x2, 2x2 − 2x1) = (2, −2);   ||c^(0)|| = 2√2 ≠ 0
Step 3: d^(0) = −c^(0) = (−2, 2)
Step 4: minimize f(x^(0) + α d^(0)) = f(1 − 2α, 2α):
f(1 − 2α, 2α) = (1 − 2α)^2 + (2α)^2 − 2(1 − 2α)(2α) = 16α^2 − 8α + 1 = f(α)
df(α)/dα = 32α − 8 = 0  ⇒  α0 = 0.25,   d²f(α)/dα² = 32 > 0
Step 5: x^(1) = x^(0) + α0 d^(0) = (1 − 0.25(2), 0 + 0.25(2)) = (0.5, 0.5)
c^(1) = (0, 0)  ⇒  stop

Steepest Descent: Example
f(x1, x2, x3) = x1^2 + 2x2^2 + 2x3^2 + 2x1x2 + 2x2x3,   x^(0) = (2, 4, 10),   x* = (0, 0, 0)

Step 1: k = 0, ε = 0.005 (δ = 0.05, ε = 0.0001 for the golden section line search)
Step 2: c^(0) = ∇f(x^(0)) = (2x1 + 2x2, 4x2 + 2x1 + 2x3, 4x3 + 2x2) = (12, 40, 48);   ||c^(0)|| = √4048 = 63.6 > ε
Step 3: d^(0) = −c^(0) = (−12, −40, −48)
Step 4: minimize f(x^(0) + α d^(0)) by golden section search:  α0 = 0.158718
Step 5: x^(1) = x^(0) + α0 d^(0) = (0.0954, −2.348, 2.381)
c^(1) = (−4.5, −4.438, 4.828);   ||c^(1)|| = 7.952 > ε
Note: c^(1)·d^(0) = 0 (perfect line search)
Steepest Descent: Example
Optimum solution for Example 5.10 with the steepest descent program

No.      x1            x2            x3            f(x)          α             ||c||
 1   2.00000E+00   4.00000E+00   1.00000E+01   3.32000E+02   1.58718E−01   6.36239E+01
 2   9.53870E−02  −2.34871E+00   2.38155E+00   1.07503E+01   3.05872E−01   7.95922E+00
 3   1.47384E+00  −9.90342E−01   9.04562E−01   1.05936E+00   1.81571E−01   2.06142E+00
 4   1.29826E+00  −1.13477E+00   6.07228E−01   6.73757E−01   6.49989E−01   8.13910E−01
 5   1.08573E+00  −6.61514E−01   5.03640E−01   4.58534E−01   1.90588E−01   1.21729E+00
 6   9.24028E−01  −7.63036E−01   3.71842E−01   3.17218E−01   5.88053E−01   5.63154E−01
 7   7.34684E−01  −4.92294E−01   3.94601E−01   2.24007E−01   1.93877E−01   8.19377E−01
 8   6.40697E−01  −5.48401E−01   2.79474E−01   1.58946E−01   5.71554E−01   3.99141E−01
 9   5.35008E−01  −3.46139E−01   2.67396E−01   1.13373E−01   1.94660E−01   5.77545E−01
10   4.61478E−01  −3.89014E−01   1.93950E−01   8.09174E−02   5.69767E−01   2.84837E−01
11   3.78902E−01  −2.49307E−01   1.95219E−01   5.78310E−02   1.94601E−01   4.11895E−01
12   3.28464E−01  −2.78695E−01   1.40291E−01   4.13141E−02   5.72088E−01   2.03339E−01
13   2.71519E−01  −1.77281E−01   1.38132E−01   2.94940E−02   1.94439E−01   2.94710E−01
14   2.34872E−01  −1.98704E−01   9.96396E−02   2.10499E−02   5.72650E−01   1.45112E−01
15   1.93449E−01  −1.26669E−01   9.89806E−02   1.50233E−02   1.94457E−01   2.10430E−01
16   1.67477E−01  −1.41872E−01   7.12541E−02   1.07197E−02   5.71777E−01   1.03580E−01
17   1.38196E−01  −9.03973E−02   7.05267E−02   7.65362E−03   1.94534E−01   1.50077E−01
18   1.19599E−01  −1.01263E−01   5.08180E−02   5.46341E−03   5.70832E−01   7.39559E−02
No.      x1            x2            x3            f(x)          α             ||c||
19   9.86659E−02  −6.46051E−02   5.03927E−02   3.90156E−03   1.94571E−01   1.07016E−01
20   8.54114E−02  −7.23289E−02   3.63134E−02   2.78693E−03   5.72147E−01   5.28078E−02
21   7.04412E−02  −4.60867E−02   3.59726E−02   1.98946E−03   1.94320E−01   7.65397E−02
22   6.09761E−02  −5.16211E−02   2.59230E−02   1.41991E−03   5.74372E−01   3.76650E−02
23   5.02296E−02  −3.28470E−02   2.56647E−02   1.01241E−03   1.94224E−01   5.46915E−02
24   4.34774E−02  −3.68093E−02   1.84853E−02   7.21938E−04   5.74409E−01   2.68589E−02
25   3.58170E−02  −2.34188E−02   1.83000E−02   5.14807E−04   1.94379E−01   3.90103E−02
26   3.09971E−02  −2.62487E−02   1.31757E−02   3.67049E−04   5.71430E−01   1.91685E−02
27   2.55704E−02  −1.67348E−02   1.30584E−02   2.62104E−04   1.94475E−01   2.77637E−02
28   2.21338E−02  −1.87414E−02   9.40925E−03   1.87131E−04   5.72554E−01   1.36815E−02
29   1.82492E−02  −1.19396E−02   9.32100E−03   1.33549E−04   1.94475E−01   1.98348E−02
30   1.57951E−02  −1.33752E−02   6.71412E−03   9.53054E−05   5.71873E−01   9.76661E−03
31   1.30274E−02  −8.52432E−03   6.65348E−03   6.80463E−05   1.94439E−01   1.41534E−02
32   1.12762E−02  −9.54793E−03   4.79362E−03   4.85693E−05   5.72520E−01   6.97001E−03
33   9.29691E−03  −6.08244E−03   4.74860E−03   3.46611E−05   1.94283E−01   1.01055E−02

Optimum design variables: 8.04787E−03, −6.81319E−03, 3.42174E−03
Optimum cost function value: 2.47347E−05
Norm of gradient at optimum: 4.97071E−03
Total no. of function evaluations: 753

Steepest Descent: Disadvantages
- Slow to converge, especially when approaching the optimum: a large number of iterations is required
- Information calculated at previous iterations is NOT used; each iteration is started independent of the others
Scaling of Design Variables

The steepest descent method converges in only one iteration for a positive definite quadratic function with a unit condition number of the Hessian matrix. To accelerate the rate of convergence, scale the design variables so that the condition number of the new Hessian matrix is unity.

Example:
Min: f(x1, x2) = 25x1^2 + x2^2,   H = [[50, 0], [0, 2]],   x^(0) = (1, 1)
Let x = Dy with D = [[1/√50, 0], [0, 1/√2]]:
Min: f(y1, y2) = (1/2)(y1^2 + y2^2),   y^(0) = (√50, √2)
Example:
Min: f(x1, x2) = 6x1^2 − 6x1x2 + 2x2^2 − 5x1 + 4x2 + 2
H = [[12, −6], [−6, 4]],   eigenvalues λ1,2 = 0.7889, 15.211,   eigenvectors v1,2 = (0.4718, 0.8817), (−0.8817, 0.4718)

Let x = Qy with Q = [v1 v2] = [[0.4718, −0.8817], [0.8817, 0.4718]]:
Min: f(y1, y2) = 0.5(0.7889y1^2 + 15.211y2^2) + 1.1678y1 + 6.2957y2 + 2

Let y = Dz with D = [[1/√0.7889, 0], [0, 1/√15.211]]:
Min: f(z1, z2) = 0.5(z1^2 + z2^2) + 1.3148z1 + 1.6142z2 + 2

Starting from x^(0) = (1, 2), a single steepest descent step in z lands on the minimizer z* = (−1.3148, −1.6142), i.e.
x* = QDz* = (−1/3, −3/2)
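A short sketch of this scaling idea (my own illustration, using NumPy's eigendecomposition) for the second example is given below; after the combined transformation x = Q D z the Hessian becomes the identity and one exact steepest descent step reaches the minimum:

```python
# Sketch: scale design variables via the Hessian eigendecomposition
import numpy as np

H = np.array([[12.0, -6.0], [-6.0, 4.0]])     # Hessian of the second example
b = np.array([-5.0, 4.0])                      # linear term of f

lam, Q = np.linalg.eigh(H)                     # eigenvalues ~ (0.789, 15.211)
D = np.diag(1.0 / np.sqrt(lam))
T = Q @ D                                      # combined transformation x = T z

# In z-coordinates f becomes 0.5*||z||^2 + (T^T b)·z + const, so one exact
# steepest descent step lands on the minimizer z* = -T^T b:
z_star = -T.T @ b
x_star = T @ z_star
print(x_star)                                  # ~(-1/3, -3/2)
```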
Conjugate Gradient Method
Fletcher and Reeves (1964)

Steepest descent directions are orthogonal at consecutive steps: the method converges, but slowly.
Conjugate gradient method: modify the current steepest descent direction by adding a scaled previous direction; this cuts diagonally through the orthogonal steepest descent directions.
Conjugate gradient directions d^(i), d^(j) are orthogonal with respect to a symmetric and positive definite matrix A:
d^(i)T A d^(j) = 0
Conjugate Gradient Method: Algorithm

Step 1: k = 0, choose x^(0); d^(0) = −c^(0) = −∇f(x^(0)).
Stop if ||c^(0)|| < ε, otherwise go to Step 4.
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: d^(k) = −c^(k) + β_k d^(k−1),   β_k = [ ||c^(k)|| / ||c^(k−1)|| ]^2
Step 4: compute α_k = α* to minimize f(x^(k) + α d^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k), k = k + 1, go to Step 2

Note:
The method finds the minimum in n iterations for positive definite quadratic forms having n design variables.
With inexact line search or non-quadratic forms, it is restarted every n + 1 iterations for computational stability (x^(0) = x^(n+1)).
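A compact sketch of the Fletcher–Reeves iteration for a quadratic with Hessian H is given below (my own illustration; the exact line search α = −(c·d)/(d^T H d) replaces a numerical line search):

```python
# Sketch: Fletcher-Reeves conjugate gradient for a quadratic f with Hessian H
import numpy as np

def fletcher_reeves(H, x0, eps=1e-6, max_iter=100):
    x = np.asarray(x0, dtype=float)
    c = H @ x
    d = -c
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            return x, k
        alpha = -(c @ d) / (d @ H @ d)         # exact line search for a quadratic
        x = x + alpha * d
        c_new = H @ x
        beta = (c_new @ c_new) / (c @ c)       # beta_k = (||c_k|| / ||c_{k-1}||)^2
        d = -c_new + beta * d
        c = c_new
    return x, max_iter

H = np.array([[2.0, 2.0, 0.0], [2.0, 4.0, 2.0], [0.0, 2.0, 4.0]])
print(fletcher_reeves(H, [2.0, 4.0, 10.0]))    # converges in n = 3 iterations
```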
Example:
Min: f(x) = x1^2 + 2x2^2 + 2x3^2 + 2x1x2 + 2x2x3,   x^(0) = (2, 4, 10)

c^(0) = (12, 40, 48),   ||c^(0)|| = 63.6,   f(x^(0)) = 332.0
First iteration (steepest descent):
x^(1) = (0.0956, −2.348, 2.381)
c^(1) = (−4.5, −4.438, 4.828),   ||c^(1)|| = 7.952,   f(x^(1)) = 10.75

β1 = [ ||c^(1)|| / ||c^(0)|| ]^2 = [7.952/63.6]^2 = 0.015633

d^(1) = −c^(1) + β1 d^(0) = (4.500, 4.438, −4.828) + 0.015633(−12, −40, −48) = (4.31241, 3.81268, −5.57838)

Minimize f(x^(1) + α d^(1)):  α = 0.3156
x^(2) = x^(1) + α d^(1) = (0.0956, −2.348, 2.381) + 0.3156(4.31241, 3.81268, −5.57838) = (1.4566, −1.1447, 0.6205)
c^(2) = (0.6238, −0.4246, 0.1926),   ||c^(2)|| = 0.7788
Note: c^(2)·d^(1) = 0
Newton Method
A Second-Order Method

x: current estimate of x*;   x* ≈ x + Δx (desired)
f(x + Δx) = f(x) + c^T Δx + (1/2) Δx^T H Δx
Necessary condition: ∂f/∂(Δx) = c + H Δx = 0
Δx = −H^(−1) c
Δx = −α H^(−1) c   (modified Newton method, with a step size α)
Modified Newton Method: Steps

Step 1: k = 0; choose x^(0) and ε
Step 2: c_i^(k) = ∂f(x^(k))/∂x_i, i = 1, ..., n; stop if ||c^(k)|| < ε
Step 3: H(x^(k)) = [ ∂²f/∂x_i∂x_j ]
Step 4: d^(k) = −H^(−1) c^(k), or solve H d^(k) = −c^(k)
(Note: for computational efficiency, a system of linear simultaneous equations is solved instead of evaluating the inverse of the Hessian.)
Step 5: compute α_k = α* to minimize f(x^(k) + α d^(k))
Step 6: x^(k+1) = x^(k) + α_k d^(k), k = k + 1, go to Step 2

Note: unless H is positive definite, d^(k) will not be a descent direction for f:
if H > 0, then c^(k)·d^(k) = −c^(k)T H^(−1) c^(k) < 0, since H^(−1) is positive definite whenever H is.
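A minimal sketch of these steps is shown below (my own illustration; the Newton system H d = −c is solved with NumPy, and a crude backtracking stands in for the exact step size determination of Step 5):

```python
# Sketch: modified Newton method with a simple step size safeguard
import numpy as np

def modified_newton(f, grad, hess, x0, eps=1e-4, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:
            return x, k
        d = np.linalg.solve(hess(x), -c)        # Step 4: solve H d = -c
        alpha, f0 = 1.0, f(x)
        while f(x + alpha * d) > f0 and alpha > 1e-8:
            alpha *= 0.5                         # crude step size determination
        x = x + alpha * d
    return x, max_iter

f    = lambda x: 3*x[0]**2 + 2*x[0]*x[1] + 2*x[1]**2 + 7
grad = lambda x: np.array([6*x[0] + 2*x[1], 2*x[0] + 4*x[1]])
hess = lambda x: np.array([[6.0, 2.0], [2.0, 4.0]])
print(modified_newton(f, grad, hess, [5.0, 10.0]))   # (~0, ~0) in one iteration
```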
Example:
f(x) = 3x1^2 + 2x1x2 + 2x2^2 + 7,   x^(0) = (5, 10),   ε = 0.0001

c^(0) = (6x1 + 2x2, 2x1 + 4x2) = (50, 50),   ||c^(0)|| = 50√2

H^(0) = [[6, 2], [2, 4]],   (H^(0))^(−1) = (1/20)[[4, −2], [−2, 6]]

d^(0) = −(H^(0))^(−1) c^(0) = −(1/20)[[4, −2], [−2, 6]](50, 50) = (−5, −10)

x^(1) = x^(0) + α d^(0) = (5 − 5α, 10 − 10α)

Step size: df/dα = 0, or equivalently ∇f(x^(1))·d^(0) = 0:
∇f(x^(1)) = (6(5 − 5α) + 2(10 − 10α), 2(5 − 5α) + 4(10 − 10α)) = (50 − 50α, 50 − 50α)
∇f(x^(1))·d^(0) = −5(50 − 50α) − 10(50 − 50α) = 0  ⇒  α = 1

x^(1) = (5, 10) + 1·(−5, −10) = (0, 0),   c^(1) = (0, 0)  ⇒  stop, x* = (0, 0)
Example:
f(x) = 10x1^4 − 20x1^2 x2 + 10x2^2 + x1^2 − 2x1 + 5,   x^(0) = (1, 3)
c = ∇f(x) = (40x1^3 − 40x1x2 + 2x1 − 2,  −20x1^2 + 20x2)
H = ∇²f(x) = [[120x1^2 − 40x2 + 2, −40x1], [−40x1, 20]]

Comparison of Steepest Descent, Conjugate Gradient, Newton Methods
f(x) = 50(x2 − x1^2)^2 + (2 − x1)^2,   x^(0) = (5, 5),   x* = (2, 4)
Newton Method

Advantage: quadratic convergence rate
Disadvantages:
- Calculation of second-order derivatives at each iteration
- A system of simultaneous linear equations needs to be solved at each iteration
- The Hessian of the function may be singular at some iterations
- Memoryless method: each iteration is started afresh
- Not convergent unless the Hessian remains positive definite and a step size determination scheme is used
Marquardt Modification (1963)

Far away from the solution point, behave like steepest descent; near the solution point, behave like the Newton method:
d^(k) = −(H + λ_k I)^(−1) c^(k)

Step 1: k = 0; choose x^(0), ε; λ0 (= 10000, a large value)
Step 2: c_i^(k) = ∂f(x^(k))/∂x_i, i = 1, ..., n; stop if ||c^(k)|| < ε
Step 3: H(x^(k)) = [ ∂²f/∂x_i∂x_j ]
Step 4: d^(k) = −(H + λ_k I)^(−1) c^(k)
Step 5: if f(x^(k) + d^(k)) < f(x^(k)), go to Step 6; otherwise let λ_k = 2λ_k and go to Step 4
Step 6: set λ_{k+1} = 0.5λ_k, k = k + 1 and go to Step 2

Quasi-Newton Methods

Steepest descent:
- Uses only first-order information → poor rate of convergence
- Each iteration is started with new design variables without using any information from previous iterations

Newton method:
- Uses second-order derivatives → quadratic convergence rate
- Requires calculation of n(n+1)/2 second-order derivatives!
- Difficulties if the Hessian is singular
- Not a learning process
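A sketch of the Marquardt steps above is given here (my own illustration, applied to the quartic example from the Newton section; starting point and tolerances follow that example, the rest are my assumptions):

```python
# Sketch: Marquardt modification (large lambda -> steepest-descent-like steps,
# lambda is halved after every successful step)
import numpy as np

def marquardt(f, grad, hess, x0, eps=1e-4, lam=10000.0, max_iter=200):
    x = np.asarray(x0, dtype=float)
    n = x.size
    for k in range(max_iter):
        c = grad(x)
        if np.linalg.norm(c) < eps:
            return x, k
        while True:
            d = np.linalg.solve(hess(x) + lam * np.eye(n), -c)
            if f(x + d) < f(x):                # Step 5: accept the step
                break
            lam *= 2.0                          # otherwise increase lambda
        x = x + d
        lam *= 0.5                              # Step 6: decrease lambda
    return x, max_iter

f    = lambda x: 10*x[0]**4 - 20*x[0]**2*x[1] + 10*x[1]**2 + x[0]**2 - 2*x[0] + 5
grad = lambda x: np.array([40*x[0]**3 - 40*x[0]*x[1] + 2*x[0] - 2,
                           -20*x[0]**2 + 20*x[1]])
hess = lambda x: np.array([[120*x[0]**2 - 40*x[1] + 2, -40*x[0]],
                           [-40*x[0], 20.0]])
print(marquardt(f, grad, hess, [1.0, 3.0]))     # converges to (1, 1)
```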
Quasi-Newton Methods

Quasi-Newton (update) methods:
- Use first-order derivatives to generate approximations of the Hessian
- Combine the desirable features of both the steepest descent and Newton methods
- Use information from previous iterations to speed up convergence (learning processes)
- Several ways to approximate (update) the Hessian or its inverse
- Preserve the properties of symmetry and positive definiteness
Davidon-Fletcher-Powell (DFP) Method
Davidon (1959), Fletcher and Powell (1963)

Approximate the Hessian inverse using only first derivatives:
Δx = −H^(−1) c   →   Δx = −A c,   A ≈ H^(−1)
Find A by using only first-order information.
88
Step
Step
Step
Step
Step
Step
1:
2:
3:
4:
5:
6:
Matrix A(k) is always positive denite
always converge to a local minimum if > 0
T
d
(k)
(k)
= c(k) A(k)c(k) < 0
f (x + d )
d
=0
k = 0; c(0), ; A(0)(= I, H 1)
c(k) = f (x(k)), Stop if ||c(k)|| <
d(k) = A(k)c(k)
compute k = to minimize f (x(k) + d(k))
x(k+1) = x(k) + k d(k)
update A(k)
When applied to a positive denite quadratic form,
A(k) converges to inverse of Hessian of the quadratic form
A(k+1) = A(k) + B (k) + C (k)
(k) (k) T
(k) (k) T
B (k) = ss(k)sy (k)
C (k) = yz(k)zz (k)
s(k) = k d(k)
(change in design)
y (k) = c(k+1) c(k)
c
(k+1)
= f (x
(k+1)
89
DFP Properties:
A H 1
DFP Procedures:
Chen CL
(change in gradient)
z (k) = A(k)y (k)
Step 7: set k = k + 1 and go to Step 2
Chen CL
90
Chen CL
91
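A compact sketch of the DFP iteration above is given here (my own illustration; SciPy's scalar minimizer is used for the line search and the example function matches the worked example that follows):

```python
# Sketch: DFP update of the inverse-Hessian approximation A
import numpy as np
from scipy.optimize import minimize_scalar

def dfp(f, grad, x0, eps=1e-3, max_iter=100):
    x = np.asarray(x0, dtype=float)
    A = np.eye(x.size)                           # A(0) = I ~ H^-1
    c = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            return x, k
        d = -A @ c
        alpha = minimize_scalar(lambda a: f(x + a*d)).x
        s = alpha * d                            # change in design
        x = x + s
        c_new = grad(x)
        y = c_new - c                            # change in gradient
        z = A @ y
        A = A + np.outer(s, s)/(s @ y) - np.outer(z, z)/(y @ z)
        c = c_new
    return x, max_iter

f    = lambda x: 5*x[0]**2 + 2*x[0]*x[1] + x[1]**2 + 7
grad = lambda x: np.array([10*x[0] + 2*x[1], 2*x[0] + 2*x[1]])
print(dfp(f, grad, [1.0, 2.0]))                  # -> (0, 0)
```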
DFP Example:
f(x) = 5x1^2 + 2x1x2 + x2^2 + 7,   x^(0) = (1, 2),   A^(0) = I,   k = 0,   ε = 0.001

1-1. x^(0) = (1, 2)
1-2. c^(0) = (10x1 + 2x2, 2x1 + 2x2) = (14, 6),   ||c^(0)|| = √(14^2 + 6^2) = 15.232 > ε
1-3. d^(0) = −c^(0) = (−14, −6)
1-4. x^(1) = x^(0) + α d^(0) = (1 − 14α, 2 − 6α)
f(x^(1)) = f(α) = 5(1 − 14α)^2 + 2(1 − 14α)(2 − 6α) + (2 − 6α)^2 + 7
df/dα = 5(2)(−14)(1 − 14α) + 2[(−14)(2 − 6α) + (−6)(1 − 14α)] + 2(−6)(2 − 6α) = 0
⇒  α0 = 0.0988,   d²f/dα² = 2348 > 0
1-5. x^(1) = x^(0) + α0 d^(0) = (1 − 14α0, 2 − 6α0) = (−0.386, 1.407)
1-6. s^(0) = α0 d^(0) = (−1.386, −0.593),   c^(1) = (−1.046, 2.042)
y^(0) = c^(1) − c^(0) = (−15.046, −3.958),   z^(0) = A^(0) y^(0) = y^(0)
s^(0)·y^(0) = 23.20,   y^(0)·z^(0) = 242.05
s^(0) s^(0)T = [[1.921, 0.822], [0.822, 0.352]],   z^(0) z^(0)T = [[226.40, 59.55], [59.55, 15.67]]
B^(0) = [[0.0828, 0.0354], [0.0354, 0.0152]],   C^(0) = [[−0.935, −0.246], [−0.246, −0.065]]
A^(1) = A^(0) + B^(0) + C^(0) = [[0.148, −0.211], [−0.211, 0.950]]
2-2. ||c^(1)|| = √(1.046^2 + 2.042^2) = 2.29 > ε
2-3. d^(1) = −A^(1) c^(1) = (0.586, −1.719)
2-4. α1 = 0.776   (minimize f(x^(1) + α d^(1)))
2-5. x^(2) = x^(1) + α1 d^(1) = (−0.386, 1.407) + (0.455, −1.334) = (0.069, 0.073)
2-6. s^(1) = α1 d^(1) = (0.455, −1.334),   c^(2) = (0.836, 0.284)
y^(1) = c^(2) − c^(1) = (1.882, −1.758)
z^(1) = A^(1) y^(1) = (0.649, −2.067)
s^(1)·y^(1) = 3.201,   y^(1)·z^(1) = 4.855
s^(1) s^(1)T = [[0.207, −0.607], [−0.607, 1.780]],   z^(1) z^(1)T = [[0.421, −1.341], [−1.341, 4.272]]
B^(1) = [[0.0647, −0.19], [−0.19, 0.556]],   C^(1) = [[−0.0867, 0.276], [0.276, −0.880]]
A^(2) = A^(1) + B^(1) + C^(1) = [[0.126, −0.125], [−0.125, 0.626]]
Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

Directly update the Hessian using only first derivatives:
Δx = −H^(−1) c   ⇔   H Δx = −c   →   A Δx = −c,   A ≈ H
Find A by using only first-order information.
BFGS Procedure:
Step 1: k = 0; choose x^(0), ε; H^(0) (= I, an estimate of H)
Step 2: c^(k) = ∇f(x^(k)); stop if ||c^(k)|| < ε
Step 3: solve H^(k) d^(k) = −c^(k) to obtain d^(k)
Step 4: compute α_k = α* to minimize f(x^(k) + α d^(k))
Step 5: x^(k+1) = x^(k) + α_k d^(k)
Step 6: update H^(k):
H^(k+1) = H^(k) + D^(k) + E^(k)
D^(k) = y^(k) y^(k)T / (y^(k)·s^(k)),   E^(k) = c^(k) c^(k)T / (c^(k)·d^(k))
s^(k) = α_k d^(k)   (change in design)
y^(k) = c^(k+1) − c^(k)   (change in gradient)
c^(k+1) = ∇f(x^(k+1))
Step 7: set k = k + 1 and go to Step 2
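The sketch below is my own illustration of the BFGS procedure in exactly the form used on these slides (update an approximation H of the Hessian itself and solve H d = −c; SciPy's scalar minimizer is used for the line search):

```python
# Sketch: BFGS with a direct Hessian approximation H
import numpy as np
from scipy.optimize import minimize_scalar

def bfgs(f, grad, x0, eps=1e-3, max_iter=100):
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                            # H(0) = I
    c = grad(x)
    for k in range(max_iter):
        if np.linalg.norm(c) < eps:
            return x, k
        d = np.linalg.solve(H, -c)                # Step 3: H d = -c
        alpha = minimize_scalar(lambda a: f(x + a*d)).x
        s = alpha * d
        x = x + s
        c_new = grad(x)
        y = c_new - c
        H = H + np.outer(y, y)/(y @ s) + np.outer(c, c)/(c @ d)   # Step 6
        c = c_new
    return x, max_iter

f    = lambda x: 5*x[0]**2 + 2*x[0]*x[1] + x[1]**2 + 7
grad = lambda x: np.array([10*x[0] + 2*x[1], 2*x[0] + 2*x[1]])
print(bfgs(f, grad, [1.0, 2.0]))                  # -> (0, 0)
```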
BFGS Example:
f(x) = 5x1^2 + 2x1x2 + x2^2 + 7,   x^(0) = (1, 2),   H^(0) = I,   k = 0,   ε = 0.001

1-1. x^(0) = (1, 2)
1-2. c^(0) = (10x1 + 2x2, 2x1 + 2x2) = (14, 6),   ||c^(0)|| = √(14^2 + 6^2) = 15.232 > ε
1-3. d^(0) = −c^(0) = (−14, −6)
1-4. x^(1) = x^(0) + α d^(0) = (1 − 14α, 2 − 6α)
f(x^(1)) = f(α) = 5(1 − 14α)^2 + 2(1 − 14α)(2 − 6α) + (2 − 6α)^2 + 7
df/dα = 5(2)(−14)(1 − 14α) + 2[(−14)(2 − 6α) + (−6)(1 − 14α)] + 2(−6)(2 − 6α) = 0
⇒  α0 = 0.0988,   d²f/dα² = 2348 > 0
1-5. x^(1) = x^(0) + α0 d^(0) = (−0.386, 1.407)
1-6. s^(0) = α0 d^(0) = (−1.386, −0.593),   c^(1) = (−1.046, 2.042)
y^(0) = c^(1) − c^(0) = (−15.046, −3.958)
y^(0)·s^(0) = 23.20,   c^(0)·d^(0) = −232.0
y^(0) y^(0)T = [[226.40, 59.55], [59.55, 15.67]],   c^(0) c^(0)T = [[196, 84], [84, 36]]
D^(0) = [[9.760, 2.567], [2.567, 0.675]],   E^(0) = [[−0.845, −0.362], [−0.362, −0.155]]
H^(1) = H^(0) + D^(0) + E^(0) = [[9.915, 2.205], [2.205, 0.520]]
2-2. ||c^(1)|| = √(1.046^2 + 2.042^2) = 2.29 > ε
2-3. solve H^(1) d^(1) = −c^(1):   d^(1) = (17.20, −76.77)
2-4. x^(2) = x^(1) + α d^(1),   α1 = 0.018455   (minimize f(x^(1) + α d^(1)))
2-5. x^(2) = x^(1) + α1 d^(1) = (−0.0686, −0.0098)
2-6. s^(1) = α1 d^(1) = (0.317, −1.417),   c^(2) = (−0.706, −0.157)
y^(1) = c^(2) − c^(1) = (0.340, −2.199)
y^(1)·s^(1) = 3.224,   c^(1)·d^(1) = −174.76
y^(1) y^(1)T = [[0.1156, −0.748], [−0.748, 4.836]],   c^(1) c^(1)T = [[1.094, −2.136], [−2.136, 4.170]]
D^(1) = [[0.036, −0.232], [−0.232, 1.500]],   E^(1) = [[−0.0063, 0.0122], [0.0122, −0.0239]]
H^(2) = H^(1) + D^(1) + E^(1) = [[9.945, 1.985], [1.985, 1.996]]