Introduction to constrained optimization
Gilles Gasso
INSA Rouen - ASI Department
Laboratory LITIS
September 19, 2019
Plan
1 Introduction
2 Formulation
3 Concept of Lagrangian and duality, optimality conditions
Concept of Lagrangian
Concept of duality
4 QP Problem
Introduction
Constrained optimization problems
Where do these problems come from?
What is the connection with Machine Learning?
How do we formulate them?
Example 1: sparse regression
Output to be predicted: y ∈ R
Input variables: x ∈ R^d
Linear model: f(x) = x^⊤θ
θ ∈ R^d : parameters of the model
Determination of a sparse θ:
Minimization of the squared error
Only a few parameters are non-zero
min_{θ ∈ R^d} (1/2) Σ_{i=1}^N (y_i − x_i^⊤θ)²
s.t. ‖θ‖_p ≤ k
with ‖θ‖_p^p = Σ_{j=1}^d |θ_j|^p

http://www.ds100.org/sp17/assets/notebooks/linear_regression/Regularization.html
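As a quick illustration (a sketch, not from the slides), the problem above can be written with the CVXPY modeling library for p = 1, the sparsity-inducing case; the data, dimensions and budget k below are synthetic.

```python
# Sketch of the sparse regression problem above for p = 1, assuming CVXPY
# is available; data and the norm budget k are illustrative.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 50, 10, 2.0
X = rng.standard_normal((N, d))
y = X @ np.array([2.0, -1.5] + [0.0] * (d - 2))   # sparse ground-truth theta

theta = cp.Variable(d)
objective = cp.Minimize(0.5 * cp.sum_squares(y - X @ theta))
cp.Problem(objective, [cp.norm1(theta) <= k]).solve()   # ||theta||_1 <= k
print(np.round(theta.value, 3))   # most coordinates come out (near) zero
```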
Example 2: where to locate the firehouse?
House M_i : defined by its coordinates z_i = [x_i y_i]^⊤
Let θ be the coordinates of the firehouse
Minimize the distance from the firehouse to the farthest house

Problem formulation:
min_θ max_{i=1,…,N} ‖θ − z_i‖²

Equivalent problem:
min_{t ∈ R, θ ∈ R²} t
s.t. ‖θ − z_i‖² ≤ t ∀ i = 1,…,N
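A minimal sketch of the equivalent (epigraph) formulation, again assuming CVXPY; the house coordinates are made up for illustration.

```python
# Epigraph form of the min-max problem above, assuming CVXPY is available;
# the house coordinates Z are illustrative.
import cvxpy as cp
import numpy as np

Z = np.array([[0.0, 0.0], [4.0, 1.0], [1.0, 5.0], [6.0, 4.0]])   # houses z_i
theta = cp.Variable(2)                                           # firehouse location
t = cp.Variable()
constraints = [cp.sum_squares(theta - z) <= t for z in Z]        # ||theta - z_i||^2 <= t
cp.Problem(cp.Minimize(t), constraints).solve()
print(theta.value, np.sqrt(t.value))   # location and distance to the farthest house
```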
Third example: Support vector machine (SVM)
D = {(x_i, y_i) ∈ R^d × {−1, 1}}_{i=1}^N : linearly separable points
Goal: find a function f(x) = θ^⊤x + b that correctly predicts the class of each x_i while maximizing the margin between positives and negatives

min_{θ ∈ R^d, b ∈ R} (1/2)‖θ‖²   maximization of the margin
s.t. y_i(θ^⊤x_i + b) ≥ 1 ∀i = 1,…,N   correct classification
[Figure: two linearly separable classes (class 1, class 2) with the separating hyperplane f(x) = 0, the boundary hyperplanes f(x) = −1 and f(x) = 1, and margin 2/‖θ‖]
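A sketch of this primal SVM as a small optimization problem, under the same CVXPY assumption; the four training points are toy data, not from the slides.

```python
# Primal SVM above on a tiny separable toy set, assuming CVXPY is available.
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

theta = cp.Variable(2)
b = cp.Variable()
margin = [y[i] * (X[i] @ theta + b) >= 1 for i in range(len(y))]  # correct classification
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(theta)), margin).solve()
print(theta.value, b.value)   # separating hyperplane; the margin is 2 / ||theta||
```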
SVM example: where are the constraints?
min_{θ,b} (1/2)‖θ‖²   function J(θ) to be minimized
s.t. y_i(θ^⊤x_i + b) ≥ 1 ∀i = 1,…,N   constraints to be satisfied

Constraints
How many constraints?
N constraints y_i(θ^⊤x_i + b) ≥ 1, with i = 1,…,N
What type of constraints?
Inequality constraints

Goal
Find the minimum θ* of J(θ) such that every constraint is satisfied
Formulation
Constrained optimization
Elements of the problem
θ ∈ R^d : vector of unknown real parameters
J : R^d → R : the function to be minimized on its domain dom J
f_i and g_j are differentiable functions from R^d to R

Formulation of the primal problem P
min_{θ ∈ R^d} J(θ)   function to be minimized
s.t. f_i(θ) = 0 ∀i = 1,…,n   n equality constraints
     g_j(θ) ≤ 0 ∀j = 1,…,m   m inequality constraints

Feasibility
Let p* = min_θ {J(θ) such that f_i(θ) = 0 ∀i and g_j(θ) ≤ 0 ∀j}
If p* = +∞, then the problem does not admit a feasible solution
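For a concrete feel of P, here is a hedged sketch solving one instance (the instance itself is ours) with SciPy's SLSQP solver; note SciPy encodes inequality constraints as fun(θ) ≥ 0, so the slide's g_j(θ) ≤ 0 is passed as −g_j.

```python
# One instance of the primal problem P solved numerically with SciPy (SLSQP).
# SciPy's 'ineq' convention is fun(theta) >= 0, hence the sign flip on g.
import numpy as np
from scipy.optimize import minimize

J = lambda th: 0.5 * np.sum(th**2)       # function to be minimized
f1 = lambda th: th[0] + th[1] - 1.0      # equality constraint f_1(theta) = 0
g1 = lambda th: th[0] - th[1]            # inequality constraint g_1(theta) <= 0

res = minimize(J, x0=np.zeros(2), method="SLSQP",
               constraints=[{"type": "eq", "fun": f1},
                            {"type": "ineq", "fun": lambda th: -g1(th)}])
print(res.x)   # feasible minimizer, here approximately (0.5, 0.5)
```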
Characterization of the solutions
Feasibility domain
The feasible domain is defined by the set of constraints
Ω = {θ ∈ R^d ; f_i(θ) = 0 ∀i and g_j(θ) ≤ 0 ∀j}

Feasible points
θ_0 is feasible if θ_0 ∈ dom J and θ_0 ∈ Ω, i.e. θ_0 fulfills all the constraints and J(θ_0) has a finite value
θ* is a global solution of the problem if θ* is a feasible solution such that J(θ*) ≤ J(θ) for every feasible θ
θ̂ is a local optimal solution if θ̂ is feasible and J(θ̂) ≤ J(θ) for every feasible θ with ‖θ − θ̂‖ ≤ ε, for some ε > 0
Example 1

min_θ J(θ) = 0.9θ_1² − 0.74θ_1θ_2 + 0.75θ_2² − 5.4θ_1 − 1.2θ_2
s.t. −4 ≤ θ_1 ≤ −1
     −3 ≤ θ_2 ≤ 4

[Figure: contour lines J(θ) = c of the objective, with the feasible box marked as Domain Ω]

Parameters: θ = (θ_1, θ_2)^⊤
Objective function: J(θ) = 0.9θ_1² − 0.74θ_1θ_2 + 0.75θ_2² − 5.4θ_1 − 1.2θ_2
Feasibility domain (four inequality constraints): Ω = {θ ∈ R² ; −4 ≤ θ_1 ≤ −1 and −3 ≤ θ_2 ≤ 4}
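A numeric check of this example (a sketch, using the objective as reconstructed above): the four inequality constraints form a box, which maps directly to SciPy bounds.

```python
# Example 1 solved with SciPy: the box constraints are passed as bounds,
# which L-BFGS-B handles directly.
import numpy as np
from scipy.optimize import minimize

def J(th):
    t1, t2 = th
    return 0.9 * t1**2 - 0.74 * t1 * t2 + 0.75 * t2**2 - 5.4 * t1 - 1.2 * t2

res = minimize(J, x0=np.array([-2.0, 0.0]), method="L-BFGS-B",
               bounds=[(-4.0, -1.0), (-3.0, 4.0)])  # -4 <= t1 <= -1, -3 <= t2 <= 4
print(res.x, res.fun)   # the unconstrained minimizer lies outside the box,
                        # so the solution lands on its boundary (theta_1 = -1)
```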
Example 2

min_{θ ∈ R²} θ_1 + θ_2
s.t. θ_1² + θ_2² − 2 = 0

[Figure: contour lines of J with the constraint circle; at θ = (1, 1)^⊤ and at θ* = (−1, −1)^⊤ the objective gradient ∇J(θ) = (1, 1)^⊤ is collinear with the constraint gradient ∇h(θ)]

An equality constraint
Domain of feasibility: the circle centered at 0 with radius √2
The optimal solution is θ* = (−1, −1)^⊤, and we have J(θ*) = −2
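The two candidate points visible in the figure can be recovered symbolically; a sketch with SymPy (anticipating the Lagrangian machinery introduced in the next section):

```python
# Example 2 solved symbolically: stationarity of the Lagrangian plus the
# equality constraint, using SymPy.
import sympy as sp

t1, t2, lam = sp.symbols("theta1 theta2 lambda", real=True)
L = t1 + t2 + lam * (t1**2 + t2**2 - 2)              # Lagrangian of the problem
candidates = sp.solve([sp.diff(L, t1), sp.diff(L, t2),
                       t1**2 + t2**2 - 2], [t1, t2, lam], dict=True)
print(candidates)   # (1, 1) and (-1, -1); the latter attains J = -2
```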
Optimality
How to assess a solution of the primal problem?
Do we have optimality conditions similar to those of unconstrained
optimization?
Concept of Lagrangian and duality, optimality conditions
Concept of Lagrangian
Notion of Lagrangian
Primal problem P
min_{θ ∈ R^d} J(θ)
s.t. f_i(θ) = 0 ∀i = 1,…,n   n equality constraints
     g_j(θ) ≤ 0 ∀j = 1,…,m   m inequality constraints
θ is called the primal variable

Principle of the Lagrangian
Each constraint is associated with a scalar parameter called a Lagrange multiplier:
Equality constraint f_i(θ) = 0 : we associate λ_i ∈ R
Inequality constraint g_j(θ) ≤ 0 : we associate µ_j ≥ 0 (beware of the direction of the inequality g_j(θ) ≤ 0)
The Lagrangian turns the constrained problem into an unconstrained one with additional variables: the λ_i and µ_j.
Lagrangian formulation
Constraints and associated Lagrange parameters
min_{θ ∈ R^d} J(θ) : none
f_i(θ) = 0 ∀i = 1,…,n : λ_i, any real number, ∀i = 1,…,n
g_j(θ) ≤ 0 ∀j = 1,…,m : µ_j ≥ 0 ∀j = 1,…,m

Lagrangian
The Lagrangian is defined by:
L(θ, λ, µ) = J(θ) + Σ_{i=1}^n λ_i f_i(θ) + Σ_{j=1}^m µ_j g_j(θ)   with µ_j ≥ 0, ∀j = 1,…,m

The parameters λ_i, i = 1,…,n and µ_j, j = 1,…,m are called dual variables
Dual variables are unknown parameters to be determined
Examples
Example 1
min_{θ ∈ R²} θ_1 + θ_2
s.t. θ_1² + 2θ_2² − 2 ≤ 0   inequality constraint
     θ_2 ≥ 0   inequality constraint (mind the direction of the inequality)

Lagrangian: L(θ, µ) = (θ_1 + θ_2) + µ_1(θ_1² + 2θ_2² − 2) + µ_2(−θ_2), with µ_1 ≥ 0, µ_2 ≥ 0

Example 2
min_{θ ∈ R³} (1/2)(θ_1² + θ_2² + θ_3²)
s.t. θ_1 + θ_2 + 2θ_3 = 1   equality constraint
     θ_1 + 4θ_2 + 2θ_3 = 3   equality constraint

Lagrangian: L(θ, λ) = (1/2)(θ_1² + θ_2² + θ_3²) + λ_1(θ_1 + θ_2 + 2θ_3 − 1) + λ_2(θ_1 + 4θ_2 + 2θ_3 − 3), with λ_1, λ_2 any real numbers
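As a small sketch (the helper function and the test values are ours, not the slides'), the general definition translates directly into code; here it is evaluated on Example 1 above:

```python
# Assembling L(theta, lambda, mu) = J + sum_i lam_i f_i + sum_j mu_j g_j from
# plain Python callables, then evaluating it on Example 1 above.
import numpy as np

def lagrangian(J, f_list, g_list, theta, lam, mu):
    assert np.all(np.asarray(mu, dtype=float) >= 0), "mu_j must be nonnegative"
    return (J(theta)
            + sum(l * f(theta) for l, f in zip(lam, f_list))
            + sum(m * g(theta) for m, g in zip(mu, g_list)))

J = lambda th: th[0] + th[1]
g = [lambda th: th[0]**2 + 2 * th[1]**2 - 2,   # g_1(theta) <= 0
     lambda th: -th[1]]                        # theta_2 >= 0 rewritten as g_2 <= 0
print(lagrangian(J, [], g, np.array([0.5, 0.5]), [], mu=[1.0, 2.0]))
```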
Necessary optimality conditions
Assume that J, f_i, g_j are differentiable functions. Let θ* be an optimal solution of the problem P. Then there exist dual variables λ_i*, i = 1,…,n and µ_j*, j = 1,…,m such that the following conditions are satisfied.

Karush-Kuhn-Tucker (KKT) conditions
Stationarity: ∇_θ L(θ, λ, µ) = 0, i.e. ∇J(θ) + Σ_{i=1}^n λ_i ∇f_i(θ) + Σ_{j=1}^m µ_j ∇g_j(θ) = 0
Primal feasibility: f_i(θ) = 0 ∀i = 1,…,n and g_j(θ) ≤ 0 ∀j = 1,…,m
Dual feasibility: µ_j ≥ 0 ∀j = 1,…,m
Complementary slackness: µ_j g_j(θ) = 0 ∀j = 1,…,m
Example
min_{θ ∈ R²} (1/2)(θ_1² + θ_2²)
s.t. θ_1 − 2θ_2 + 2 ≤ 0

Lagrangian: L(θ, µ) = (1/2)(θ_1² + θ_2²) + µ(θ_1 − 2θ_2 + 2), with µ ≥ 0

KKT conditions
Stationarity: ∇_θ L(θ, µ) = 0 ⇒ θ_1 = −µ and θ_2 = 2µ
Primal feasibility: θ_1 − 2θ_2 + 2 ≤ 0
Dual feasibility: µ ≥ 0
Complementary slackness: µ(θ_1 − 2θ_2 + 2) = 0

Remarks on the complementary slackness
If θ_1 − 2θ_2 + 2 < 0 (inactive constraint) ⇒ µ = 0 (no penalty required as the constraint is satisfied)
If µ > 0 ⇒ θ_1 − 2θ_2 + 2 = 0 (active constraint)
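These remarks suggest solving the KKT system by cases on complementary slackness; a sketch with SymPy (the encoding of the system is ours):

```python
# KKT system of the example above, split on complementary slackness, with SymPy.
import sympy as sp

t1, t2, mu = sp.symbols("theta1 theta2 mu", real=True)
g = t1 - 2 * t2 + 2
stationarity = [t1 + mu, t2 - 2 * mu]   # grad_theta L = 0

# Case mu = 0: stationarity gives theta = (0, 0), but g(0, 0) = 2 > 0,
# so primal feasibility fails and this case is rejected.
# Case g(theta) = 0 (active constraint):
sol = sp.solve(stationarity + [g], [t1, t2, mu], dict=True)[0]
print(sol)   # theta1 = -2/5, theta2 = 4/5, mu = 2/5 >= 0: all KKT conditions hold
```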
Concept of duality
Duality
Dual function
Let L(θ, λ, µ) be the Lagrangian of the primal problem P, with µ_j ≥ 0. The corresponding dual function is D(λ, µ) = min_θ L(θ, λ, µ)
Theorem [Weak duality]
Let p* = min_θ {J(θ) such that f_i(θ) = 0 ∀i and g_j(θ) ≤ 0 ∀j} be the optimal value (supposed finite) of the problem P. Then, for any values of λ_i, ∀i and µ_j ≥ 0, ∀j, we have
D(λ, µ) ≤ p*
Dual problem
Weak duality indicates that the dual function D(λ, µ) = min_θ L(θ, λ, µ) is a lower bound on p*
Bridging the gap: maximize the dual w.r.t. the dual variables λ and µ to make this lower bound as close as possible to p*

Dual problem
max_{λ,µ} D(λ, µ)
s.t. µ_j ≥ 0 ∀ j = 1,…,m
http://www.onmyphd.com/?p=duality.theory
Interest of the dual problem
Remarks
Transforms the primal problem into an equivalent dual problem, possibly much simpler to solve
Solving the dual problem can lead to the solution of the primal problem
Solving the dual problem gives the optimal values of the Lagrange multipliers
Example: inequality constraints

min_{θ ∈ R²} (1/2)(θ_1² + θ_2²)
s.t. θ_1 − 2θ_2 + 2 ≤ 0

Lagrangian: L(θ, µ) = (1/2)(θ_1² + θ_2²) + µ(θ_1 − 2θ_2 + 2), with µ ≥ 0

Stationarity (KKT condition): ∇_θ L(θ, µ) = 0 ⇒ θ_1 = −µ and θ_2 = 2µ   (1)

Dual function D(µ) = min_θ L(θ, µ): by substituting (1) into L we obtain
D(µ) = −(5/2)µ² + 2µ

Dual problem: max_µ D(µ) s.t. µ ≥ 0

Dual solution:
∇D(µ) = 0 ⇒ µ = 2/5 (which satisfies µ ≥ 0)   (2)

Primal solution: (2) and (1) lead to θ = (−2/5, 4/5)^⊤
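A quick numeric cross-check (a sketch with SciPy; the upper bound 10 on µ is arbitrary): maximize D(µ) over µ ≥ 0 as a bounded scalar minimization of −D, then substitute back through (1).

```python
# Dual route above, done numerically: maximize D(mu) over mu >= 0, then
# recover theta from the stationarity relations (1).
import numpy as np
from scipy.optimize import minimize_scalar

D = lambda mu: -2.5 * mu**2 + 2 * mu                  # dual function
res = minimize_scalar(lambda mu: -D(mu), bounds=(0.0, 10.0), method="bounded")
mu_star = res.x                                       # ~ 2/5
theta = np.array([-mu_star, 2 * mu_star])             # theta = (-mu, 2 mu)
print(mu_star, theta, D(mu_star))                     # D(mu*) = 0.4 = J(theta)
```

For any other µ ≥ 0, D(µ) stays below 0.4, which is exactly the weak-duality bound of the previous slides; here the bound is attained, so there is no duality gap.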
Constrained optimization with equality constraints
min_{θ ∈ R³} (1/2)(θ_1² + θ_2² + θ_3²)
s.t. θ_1 + θ_2 + 2θ_3 − 1 = 0
     θ_1 + 4θ_2 + 2θ_3 − 3 = 0

Matrix form:
min_{θ ∈ R³} (1/2) θ^⊤θ   s.t. Aθ − b = 0
with A = [1 1 2 ; 1 4 2] and b = (1, 3)^⊤

Lagrangian: L(θ, λ) = (1/2) θ^⊤θ + λ^⊤(Aθ − b), λ ∈ R²

KKT conditions
Stationarity: ∇_θ L(θ, λ) = 0 ⇒ θ = −A^⊤λ   (1)
Dual function: by substituting (1) into L we get
D(λ) = −(1/2) λ^⊤AA^⊤λ − λ^⊤b
Dual problem: max_λ D(λ)
Dual solution: ∇D(λ) = 0 ⇒ λ = −(AA^⊤)^{−1} b   (2)
Primal solution: (2) in (1) gives θ = A^⊤(AA^⊤)^{−1} b
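The closed form derived above is a few lines of NumPy (a sketch; a linear solve is used instead of an explicit inverse):

```python
# theta = A^T (A A^T)^{-1} b, computed for the A and b of this slide.
import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 4.0, 2.0]])
b = np.array([1.0, 3.0])

lam = -np.linalg.solve(A @ A.T, b)   # dual solution (2)
theta = -A.T @ lam                   # primal solution via stationarity (1)
print(theta, A @ theta - b)          # constraint residual is numerically zero
```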
Convex constrained optimization
Convexity conditions
min_{θ ∈ R^d} J(θ) : J is a convex function
f_i(θ) = 0 ∀i = 1,…,n : the f_i are affine ∀i = 1,…,n
g_j(θ) ≤ 0 ∀j = 1,…,m : the g_j are convex functions ∀j = 1,…,m

Problems of interest
Linear Programming (LP)
Quadratic Programming (QP)
Off-the-shelf toolboxes exist for these problems (Gurobi, Mosek, CVX, …)
QP Problem
Convex QP problem
Standard form
min_{θ ∈ R^d} (1/2) θ^⊤Gθ + q^⊤θ + r
s.t. a_i^⊤θ = b_i ∀i = 1,…,n   affine equality constraints
     c_j^⊤θ ≥ d_j ∀j = 1,…,m   linear inequality constraints
with q, a_i, c_j ∈ R^d, b_i and d_j real scalar values, and G ∈ R^{d×d} a positive definite matrix

Examples
The earlier toy problem:
min_{θ ∈ R²} (1/2)(θ_1² + θ_2²)
s.t. θ_1 − 2θ_2 + 2 ≤ 0

The SVM problem:
min_{θ ∈ R^d, b ∈ R} (1/2)‖θ‖²
s.t. y_i(θ^⊤x_i + b) ≥ 1 ∀i = 1,…,N
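A sketch of the first example cast in this standard form (G = I, q = 0, r = 0, one inequality c^⊤θ ≥ d), again assuming CVXPY is available:

```python
# The toy problem as a standard-form QP: theta_1 - 2*theta_2 + 2 <= 0 is
# rewritten as c^T theta >= d with c = (-1, 2) and d = 2.
import cvxpy as cp
import numpy as np

G = np.eye(2)                            # positive definite quadratic term
q = np.zeros(2)
c, d_val = np.array([-1.0, 2.0]), 2.0

theta = cp.Variable(2)
objective = cp.Minimize(0.5 * cp.quad_form(theta, G) + q @ theta)
cp.Problem(objective, [c @ theta >= d_val]).solve()
print(theta.value)   # ~ (-0.4, 0.8), matching the dual derivation earlier
```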
Summary
Optimization under constraints: an involved problem in the general case
Lagrangian: reduces the problem to an unconstrained one via Lagrange multipliers
Lagrange multipliers: each constraint has its own multiplier; the multipliers act as a penalty when the corresponding constraints are violated
Optimality: stationarity condition + feasibility conditions + complementary slackness conditions
Duality: provides a lower bound on the primal problem; the dual problem is sometimes easier to solve than the primal