FUNDAMENTALS OF OPTIMIZATION
Week 2: Convex Optimization
Outline
1. Definitions
• Convex set
• Convex combination
• Convex function
• Exercises
2. Unconstrained Optimization
3. Constrained Optimization
Convex set
• A set of points $C \subseteq \mathbb{R}^n$ is convex if for every pair of points $x, y \in C$, any point on the line segment between $x$ and $y$ is also in $C$
• That is, if $x, y \in C$, then $tx + (1-t)y \in C$ for all $t \in [0,1]$
[Figure: a convex set vs. a set that is not convex]
Examples of convex set
• Line segments: $C = \{x + t(y - x) \mid t \in [0,1]\}$ for some $x, y \in \mathbb{R}^n$
• Lines, planes, hyperplanes, etc. also define convex sets
Examples of convex set
• Half spaces: $C = \{x \in \mathbb{R}^n \mid w^T x + b \le 0\}$ for some $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$
• Balls of radius $\epsilon$: $C = \{x \in \mathbb{R}^n \mid \|x\|_2 \le \epsilon\}$ for some $\epsilon \ge 0$, where
$$\|x\|_2 = \sqrt{x^T x} = \sqrt{\sum_{i=1}^{n} x_i^2}$$
• This is called the Euclidean norm or Euclidean distance, because $\|x - y\|_2$ is equal to the length of the line segment between the points $x$ and $y$
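As a quick illustration (a minimal sketch; the vector values are arbitrary), the norm can be computed directly from this definition and checked against NumPy's built-in:

import numpy as np

# Euclidean norm from the definition sqrt(x^T x) vs. NumPy's built-in
x = np.array([3.0, 4.0])
print(np.sqrt(x @ x))     # 5.0
print(np.linalg.norm(x))  # 5.0, the same value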
Convex combination
• We say that $y \in \mathbb{R}^n$ is a convex combination of the points $x^{(1)}, \dots, x^{(k)} \in \mathbb{R}^n$ if $y = \lambda_1 x^{(1)} + \dots + \lambda_k x^{(k)}$ for some choice of $\lambda_1, \dots, \lambda_k \in [0,1]$ such that $\lambda_1 + \dots + \lambda_k = 1$
• Let $C$ be the set of all points $y$ that can be obtained as a convex combination of $x^{(1)}, \dots, x^{(k)}$
• In the special case $k = 2$, $C$ is just a line segment
• $C$ is a convex set called the convex hull of $x^{(1)}, \dots, x^{(k)}$
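A minimal NumPy sketch of this construction (the point set and seed are arbitrary choices): drawing nonnegative weights and normalizing them to sum to 1 yields a valid convex combination, i.e., a point of the convex hull:

import numpy as np

# Sample a random convex combination of k = 5 points in R^2.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (5, 2))  # rows are the points x^(1), ..., x^(5)
lam = rng.random(5)
lam /= lam.sum()                    # now each lam_i is in [0, 1] and they sum to 1
y = lam @ X                         # convex combination of the rows of X
print(lam, y)                       # y lies in the convex hull of the five points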
Convex function
• A function $f: C \to \mathbb{R}$ is convex if $C$ is a convex set and
$$f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$$
for all $x, y \in C$ and $t \in [0,1]$
• $f$ is called concave if $-f$ is convex
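The definition can be spot-checked numerically. Below is a minimal sketch (the helper name seems_convex and the sampling ranges are our choices): a single violation disproves convexity, while passing all trials is only evidence, not a proof:

import numpy as np

def seems_convex(f, dim=1, trials=10_000, seed=0):
    # Sample pairs (x, y) and t in [0, 1]; return False on any
    # violation of f(t x + (1-t) y) <= t f(x) + (1-t) f(y).
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.uniform(-10, 10, (2, dim))
        t = rng.uniform()
        if f(t * x + (1 - t) * y) > t * f(x) + (1 - t) * f(y) + 1e-9:
            return False
    return True

print(seems_convex(lambda x: float(x @ x)))         # x^2: True
print(seems_convex(lambda x: float(np.sin(x[0]))))  # sin(x): False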
Exercises
1. Is $S = \{(x, y) \mid 0 \le x \le a,\ 0 \le y \le b\}$ a convex set?
2. Is the union of two convex sets a convex set?
3. Is $S = \{x \in \mathbb{R}^n \mid \|x - a\| \le r\}$ a convex set?
4. Is $S = \{(x, y, z) \mid z \ge x^2 + y^2\}$ a convex set?
Exercises
1. $S = \{(x, y) \mid 0 \le x \le a,\ 0 \le y \le b\}$ is convex: it is the intersection of the four half-planes $x \ge 0$, $x \le a$, $y \ge 0$, $y \le b$, and any intersection of convex sets is convex.
Exercises
2. The union of two convex sets need not be convex: in $\mathbb{R}$, the intervals $[0, 1]$ and $[2, 3]$ are both convex, but their union contains $1$ and $2$ without containing the point $1.5$ between them.
Convex properties
• Nonnegative weighted sums of convex functions are convex, i.e., if $f_1: \mathbb{R}^n \to \mathbb{R}$ and $f_2: \mathbb{R}^n \to \mathbb{R}$ are convex functions and $c_1, c_2 \ge 0$, then
$$g(x) = c_1 f_1(x) + c_2 f_2(x)$$
is a convex function.
• The pointwise maximum of convex functions is convex, i.e., if $f_1: \mathbb{R}^n \to \mathbb{R}$ and $f_2: \mathbb{R}^n \to \mathbb{R}$ are convex functions, then
$$g(x) = \max(f_1(x), f_2(x))$$
is a convex function.
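Both closure rules can be spot-checked with the same segment-sampling idea used after the definition of a convex function; a minimal sketch for the pointwise maximum (the functions and segment endpoints are arbitrary choices):

import numpy as np

# g is the pointwise max of two convex functions, x^2 and (x - 1)^2.
# Check the convexity inequality along the segment from a to b.
g = lambda x: max(x**2, (x - 1)**2)
a, b = -3.0, 4.0
ok = all(g(t*a + (1 - t)*b) <= t*g(a) + (1 - t)*g(b) + 1e-9
         for t in np.linspace(0.0, 1.0, 1001))
print(ok)  # True: no violation found on this segment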
Convex properties
• A differentiable function $f: \mathbb{R}^n \to \mathbb{R}$ is convex on a convex set $C$ if and only if
$$f(x) \ge f(y) + \nabla f(y)^T (x - y)$$
for all $x, y \in C$
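A quick numeric illustration of this inequality for $f(x) = x^2$, where $\nabla f(y) = 2y$ (a sketch; the sampling range and count are arbitrary):

import numpy as np

# First-order condition for f(x) = x^2: f(x) >= f(y) + f'(y) (x - y)
f = lambda x: x**2
df = lambda x: 2 * x
rng = np.random.default_rng(1)
x, y = rng.uniform(-5, 5, (2, 100_000))
print(np.all(f(x) >= f(y) + df(y) * (x - y) - 1e-9))  # True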
Exercises
• Which of the following functions are convex?
• $\exp(x)$
• $\exp(-x)$
• $\log(x)$
• $\sin(x)$
• $x^2$
• $x^8$
• $\max(x, 0)$
• $x$
• $|x|$
Exercises
• Which of the following functions are convex?
• $\exp(x)$: convex
• $\exp(-x)$: convex
• $\log(x)$: not convex (it is concave on its domain $x > 0$)
• $\sin(x)$: neither convex nor concave
• $x^2$: convex
• $x^8$: convex
• $\max(x, 0)$: convex
• $x$: convex (linear functions are both convex and concave)
• $|x|$: convex
Outline
1. Definitions
2. Unconstrained Optimization
• Introduction to unconstrained optimization
• Descent method
• Gradient descent method
• Newton method
3. Constrained Optimization
Unconstrained convex optimization
• Unconstrained, smooth convex optimization problem:
$$\min_x f(x)$$
• $f: \mathbb{R}^n \to \mathbb{R}$ is convex and twice differentiable
• $\mathbf{dom}\, f = \mathbb{R}^n$: there are no constraints
• Assumption: the problem is solvable, with optimal value $f^* = \min_x f(x)$ attained at $x^* = \arg\min_x f(x)$
• To find $x^*$, solve the equation $\nabla f(x^*) = 0$: usually not easy to solve analytically
• An iterative scheme is preferred: compute a minimizing sequence $x^{(0)}, x^{(1)}, \dots$ s.t. $f(x^{(k)}) \to f(x^*)$ as $k \to \infty$
• The algorithm stops at some point $x^{(k)}$ when the error is within an acceptable tolerance: $f(x^{(k)}) - f^* \le \epsilon$
Local minimizer
• $x^*$ is a local minimizer for $f: \mathbb{R}^n \to \mathbb{R}$ if $f(x^*) \le f(x)$ whenever $\|x^* - x\| \le \epsilon$ (where $\epsilon > 0$ is a constant)
• $x^*$ is a global minimizer for $f: \mathbb{R}^n \to \mathbb{R}$ if $f(x^*) \le f(x)$ for all $x \in \mathbb{R}^n$
[Figure: $f(x) = e^x$ has no minimizer; $f(x) = -x + e^{-x}$ has no minimizer]
Local minimizer
[Figure: $f(x) = e^x + e^{-x} - 3x^2 + x$ has two local minimizers and one global minimizer; $f(x) = e^x + e^{-x} - 3x^2$ has two global minimizers]
Local minimizer
• Theorem (Necessary condition for a local minimum): If $x^*$ is a local minimizer for $f: \mathbb{R}^n \to \mathbb{R}$, then $\nabla f(x^*) = 0$ ($x^*$ is also called a stationary point of $f$)
Local minimizer
Example
• $f(x, y) = x^2 + y^2 - 2xy + x$
• $\nabla f(x, y) = \begin{pmatrix} 2x - 2y + 1 \\ 2y - 2x \end{pmatrix} = 0$ has no solution
$\Rightarrow$ there is no minimizer of $f(x, y)$
Local minimizer
• Theorem (Necessary condition for a local minimum): If $x^*$ is a local minimizer for $f: \mathbb{R}^n \to \mathbb{R}$, then $\nabla f(x^*) = 0$ ($x^*$ is also called a stationary point of $f$)
• Theorem (Sufficient condition for a local minimum): Assume $x^*$ is a stationary point and that $\nabla^2 f(x^*)$ is positive definite; then $x^*$ is a local minimizer
$$\nabla^2 f(x) = \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n \partial x_n} \end{pmatrix}$$
Local minimizer
• A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is called positive definite if all of its leading principal minors are positive (Sylvester's criterion):
$$\det(A_i) > 0, \quad i = 1, \dots, n, \qquad A_i = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,i} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,i} \\ \vdots & \vdots & \ddots & \vdots \\ a_{i,1} & a_{i,2} & \cdots & a_{i,i} \end{pmatrix}$$
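This criterion is easy to test numerically; a minimal sketch (the helper name is ours), applied to the two Hessians from the examples that follow:

import numpy as np

def is_positive_definite(A):
    # Sylvester's criterion: every leading principal minor is positive.
    A = np.asarray(A, dtype=float)
    return all(np.linalg.det(A[:i, :i]) > 0 for i in range(1, len(A) + 1))

print(is_positive_definite([[2, 0], [0, 2]]))     # True
print(is_positive_definite([[2, -2], [-2, -2]]))  # False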
Examples
Example: $f(x, y) = e^{x^2 + y^2}$
$$\nabla f(x, y) = \begin{pmatrix} 2x\, e^{x^2 + y^2} \\ 2y\, e^{x^2 + y^2} \end{pmatrix} = 0 \ \text{ has the unique solution } x^* = (0, 0)$$
$$\nabla^2 f(x^*) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \succ 0 \ \Rightarrow\ (0, 0) \text{ is a minimizer of } f$$
Examples
• Example: $f(x, y) = x^2 - y^2 - 2xy + x$
$$\nabla f(x, y) = \begin{pmatrix} 2x - 2y + 1 \\ -2x - 2y \end{pmatrix} = 0 \ \text{ has the unique solution } x^* = (-1/4,\ 1/4)$$
$$\nabla^2 f(x^*) = \begin{pmatrix} 2 & -2 \\ -2 & -2 \end{pmatrix} \ \text{is not positive definite}$$
$\Rightarrow$ the sufficient condition does not apply, so we cannot conclude whether $x^*$ is a local minimizer
Descent method
Determine a starting point $x^{(0)} \in \mathbb{R}^n$;
$k \leftarrow 0$;
while (stop condition not reached) {
    Determine a search direction $p_k \in \mathbb{R}^n$;
    Determine a step size $\alpha_k > 0$ s.t. $f(x^{(k)} + \alpha_k p_k) < f(x^{(k)})$;
    $x^{(k+1)} \leftarrow x^{(k)} + \alpha_k p_k$;
    $k \leftarrow k + 1$;
}
The stop condition may be
• $\|\nabla f(x^{(k)})\| \le \epsilon$
• $\|x^{(k+1)} - x^{(k)}\| \le \epsilon$
• $k > K$ (maximum number of iterations)
Gradient descent method
• Gradient descent scheme:
$$x^{(k)} = x^{(k-1)} - \alpha_k \nabla f(x^{(k-1)})$$
init $x^{(0)}$;
$k = 1$;
while (stop condition not reached) {
    choose the step size $\alpha_k$;
    $x^{(k)} = x^{(k-1)} - \alpha_k \nabla f(x^{(k-1)})$;
    $k = k + 1$;
}
• $\alpha_k$ may be chosen in such a way that $f(x^{(k-1)} - \alpha_k \nabla f(x^{(k-1)}))$ is minimized (exact line search): set $\frac{\partial f}{\partial \alpha_k} = 0$
Minimize $f(x) = x^4 - 2x^3 - 64x^2 + 2x + 63$
Minimize $f(x) = x^4 + 3x^2 - 10x + 4$
Minimize $f(x) = x^2 + 5\sin(x)$
Minimize $f(x, y) = x^2 + y^2 + xy - x - y$
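As an illustration, here is a minimal gradient descent sketch for this function, rewritten as the quadratic $f(x) = \frac{1}{2} x^T A x - b^T x$ with $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $b = (1, 1)^T$, using the exact line-search step $\alpha_k = \frac{g^T g}{g^T A g}$ that minimizes $f$ along the gradient direction of a quadratic (the starting point and tolerance are our choices):

import numpy as np

# Gradient descent on f(x) = 0.5 x^T A x - b^T x = x1^2 + x2^2 + x1 x2 - x1 - x2
A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
df = lambda x: A @ x - b             # gradient of the quadratic

x = np.zeros(2)                      # starting point x^(0) = (0, 0)
for k in range(100):
    g = df(x)
    if np.linalg.norm(g) <= 1e-10:   # stop condition: ||grad f(x)|| <= eps
        break
    alpha = (g @ g) / (g @ A @ g)    # exact line search for a quadratic
    x = x - alpha * g
print(x, f(x))                       # converges to (1/3, 1/3) with f* = -1/3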
Minimize $f(x) = x_1^2 + x_2^2 + x_3^2 - x_1 x_2 - x_2 x_3 + x_1 + x_3$
• $\alpha_k$ may be chosen in such a way that $f(x^{(k-1)} - \alpha_k \nabla f(x^{(k-1)}))$ is minimized (exact line search): set $\frac{\partial f}{\partial \alpha_k} = 0$
Newton method
• The second-order Taylor approximation $g$ of $f$ at $x$ is
$$f(x + h) \approx g(x + h) = f(x) + \nabla f(x)^T h + \frac{1}{2}\, h^T \nabla^2 f(x)\, h$$
• which is a convex quadratic function of $h$
• $g(x + h)$ is minimized when $\frac{\partial g}{\partial h} = 0$, which gives the Newton step $h = -\nabla^2 f(x)^{-1} \nabla f(x)$
Generate $x^{(0)}$; // starting point
$k = 0$;
while (stop condition not reached) {
    $x^{(k+1)} \leftarrow x^{(k)} - \nabla^2 f(x^{(k)})^{-1} \nabla f(x^{(k)})$;
    $k = k + 1$;
}
Minimize $f(x) = x_1^2 + x_2^2 + x_3^2 - x_1 x_2 - x_2 x_3 + x_1 + x_3$
import numpy as np

def newton(f, df, Hf, x0):
    x = x0
    for i in range(10):
        iH = np.linalg.inv(Hf(x))         # inverse of the Hessian at x
        D = np.array(df(x), dtype=float)  # gradient as a NumPy vector
        print('df = ', D)
        y = iH.dot(D)                     # Newton step: Hf(x)^-1 @ df(x)
        if np.linalg.norm(y) < 1e-12:     # stop once the step is (near) zero
            break
        x = x - y
        print('Step ', i, ': ', x, ' f = ', f(x))
    return x
Minimize $f(x) = x_1^2 + x_2^2 + x_3^2 - x_1 x_2 - x_2 x_3 + x_1 + x_3$
def main():
    print('main start....')
    # function f to be minimized
    f = lambda x: x[0]**2 + x[1]**2 + x[2]**2 - x[0]*x[1] - x[1]*x[2] + x[0] + x[2]
    # gradient of f
    df = lambda x: [2*x[0] + 1 - x[1], -x[0] + 2*x[1] - x[2], -x[1] + 2*x[2] + 1]
    # Hessian of f (constant, since f is quadratic)
    Hf = lambda x: [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
    x0 = np.array([0.0, 0.0, 0.0])
    newton(f, df, Hf, x0)

if __name__ == '__main__':
    main()
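Because $f$ is quadratic, its Hessian is constant and the second-order model used by the method is exact, so the very first Newton step from $x^{(0)} = (0, 0, 0)$ lands on the stationary point $x^* = (-1, -1, -1)$, where $f(x^*) = -1$; the next iteration computes a zero step and the loop exits.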
Outline
1. Definitions
2. Unconstrained Optimization
3. Constrained Optimization
• General constrained optimization problem
• Lagrange multiplier method
General constrained optimization problem
• Optimization problem in the standard form:
$$(P)\quad \begin{aligned} \text{minimize} \quad & f(x) \\ \text{s.t.} \quad & g_i(x) = 0, \quad \forall i \in \{1, 2, \dots, m\} \\ & x \in X \subseteq \mathbb{R}^n \end{aligned}$$
• with $x \in \mathbb{R}^n$, assuming the domain $D = \bigcap_{i=1}^{m} \mathbf{dom}\, g_i$ is not empty
• Denote by $f^*$ the optimal value of $f(x)$
• If $f$ and $g_i$ ($i = 1, 2, \dots, m$) are convex functions, (P) is a convex optimization problem
Lagrangian function
• Optimization problem in the standard form:
$$(P)\quad \begin{aligned} \text{minimize} \quad & f(x) \\ \text{s.t.} \quad & g_i(x) = 0, \quad \forall i \in \{1, 2, \dots, m\} \\ & x \in X \subseteq \mathbb{R}^n \end{aligned}$$
• The Lagrangian function of the above problem is defined as $L: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$,
$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i\, g_i(x)$$
Lagrangian function - properties
• For the standard-form problem (P) with Lagrangian $L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i\, g_i(x)$:
• Theorem: An optimal solution of (P) satisfies the stationarity condition
$$\nabla_{x, \lambda}\, L(x, \lambda) = 0$$
Lagrangian multiplier method
• The method of Lagrange multipliers is a technique in mathematics for finding the local maxima or minima of a function $f(x_1, x_2, \dots, x_n)$ subject to constraints $g_i(x_1, \dots, x_n) = 0$, $\forall i \in \{1, \dots, m\}$.
• Method:
• Step 1: Solve the following system:
$$\begin{aligned} \nabla f(x_1, \dots, x_n) &= \sum_{i=1}^{m} \lambda_i\, \nabla g_i(x_1, \dots, x_n) \\ g_1(x_1, \dots, x_n) &= 0 \\ &\ \ \vdots \\ g_m(x_1, \dots, x_n) &= 0 \end{aligned}$$
• Step 2: This yields particular values of $x_1, x_2, \dots, x_n$, which can be plugged into $f(x_1, x_2, \dots, x_n)$ to get the extremum value if it exists.
Example 1:
• Problem: Find the maximum and minimum of $f(x, y) = 5x - 3y$ subject to the constraint $x^2 + y^2 = 136$.
• Solution:
• The feasible points lie on the circle of radius $\sqrt{136}$, which is a closed and bounded region with $-\sqrt{136} \le x, y \le \sqrt{136}$, so a maximum and a minimum exist.
• The Lagrangian function is $L(x, y, \lambda) = 5x - 3y - \lambda(x^2 + y^2 - 136)$
• As a result of the theorem, setting the partial derivatives to zero gives
$$\frac{\partial L}{\partial x} = 5 - 2\lambda x = 0, \qquad \frac{\partial L}{\partial y} = -3 - 2\lambda y = 0, \qquad x^2 + y^2 = 136$$
• Notice that we can't have $\lambda = 0$, since that would not satisfy the first two equations. So, since we know that $\lambda \ne 0$, we can solve the first two equations for $x$ and $y$ respectively. This gives
$$x = \frac{5}{2\lambda}, \qquad y = -\frac{3}{2\lambda}$$
Example 1 (continued)
• Plugging these into the constraint gives
$$\frac{25}{4\lambda^2} + \frac{9}{4\lambda^2} = 136 \quad \Leftrightarrow \quad \frac{34}{4\lambda^2} = 136$$
• We can solve this for $\lambda$:
$$\lambda^2 = \frac{1}{16} \quad \Rightarrow \quad \lambda = \pm\frac{1}{4}$$
• Now that we know $\lambda$, we can find the points that are potential maximums and/or minimums:
• If $\lambda = -\frac{1}{4}$ we get $x = -10$, $y = 6$
• If $\lambda = \frac{1}{4}$ we get $x = 10$, $y = -6$
• So, the maximum of $f$ is $f(10, -6) = 68$ and the minimum is $f(-10, 6) = -68$.
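The same stationarity system can be solved symbolically; a minimal SymPy sketch reproducing Example 1 (the variable names are our choices):

import sympy as sp

# Stationarity system of L(x, y, lam) = 5x - 3y - lam (x^2 + y^2 - 136)
x, y, lam = sp.symbols('x y lam', real=True)
L = 5*x - 3*y - lam*(x**2 + y**2 - 136)
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in sols:
    print(s, ' f =', (5*x - 3*y).subs(s))
# Prints the two stationary points: (-10, 6) with f = -68 and (10, -6) with f = 68.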
Example 2:
• Problem:
• Objective function: maximize $f(x, y) = xy$
• Constraint: $g(x, y) = 10x + 20y - 400 = 0$
• Solution:
• Form the Lagrangian function:
$$L(x, y, \mu) = f(x, y) - \mu\, g(x, y) = xy - \mu(10x + 20y - 400)$$
• Set each first-order partial derivative equal to zero:
$$\frac{\partial L}{\partial x} = y - 10\mu = 0, \qquad \frac{\partial L}{\partial y} = x - 20\mu = 0, \qquad \frac{\partial L}{\partial \mu} = -(10x + 20y - 400) = 0$$
Example 2 (continued)
• From the first two equations, $y = 10\mu$ and $x = 20\mu$; substituting into the constraint gives
$$10(20\mu) + 20(10\mu) = 400 \quad \Rightarrow \quad 400\mu = 400 \quad \Rightarrow \quad \mu = 1$$
• So, $(x, y) = (20, 10)$, with maximum value $f(20, 10) = 200$.
Example 3:
• Problem:
• Objective function: maximize $f(x, y) = x + y$
• Constraint: $g(x, y) = x^2 + y^2 - 2 = 0$
• Solution:
• Form the Lagrangian function: $L(x, y, \lambda) = x + y + \lambda(x^2 + y^2 - 2)$
• Set each first-order partial derivative equal to zero:
$$\frac{\partial L}{\partial x} = 1 + 2\lambda x = 0, \qquad \frac{\partial L}{\partial y} = 1 + 2\lambda y = 0, \qquad \frac{\partial L}{\partial \lambda} = x^2 + y^2 - 2 = 0$$
Example 3 (continued)
• The first two equations give $x = y = -\frac{1}{2\lambda}$; substituting into the constraint gives $2x^2 = 2$, so $x = \pm 1$
• We have $(x, y) \in \{(1, 1), (-1, -1)\}$
• So, the maximum is attained at $(x, y) = (1, 1)$, where $f(1, 1) = 2$.
THANK YOU !