Chapter 1
Mathematical optimization
A mathematical optimization problem has the form

minimize f0 (x)
subject to fi (x) ≤ bi , i = 1, . . . , m.
Here:
• x = (x1 , . . . , xn ) is the optimization variable.
• The function f0 : Rn → R is the objective function.
• The functions fi : Rn → R, i = 1, . . . , m are the inequality constraint functions.
• The constants b1 , . . . , bm are the bounds for the constraints.
A vector x⋆ is called optimal, or a solution of the problem, if it satisfies the constraints and
minimizes the objective function. That is, for any vector z satisfying:
f1 (z) ≤ b1 , . . . , fm (z) ≤ bm ,
we have:
f0 (z) ≥ f0 (x⋆ ).
We often categorize optimization problems by the nature of the objective and constraint functions. A key example is the linear program, in which the functions f0 , . . . , fm are linear, meaning they satisfy

$$f_i(\alpha x + \beta y) = \alpha f_i(x) + \beta f_i(y)$$

for all x, y ∈ Rn and all α, β ∈ R.

Another important class is the least-squares problem, an unconstrained problem whose objective is a sum of squares of terms of the form aTi x − bi:

$$\text{minimize} \quad f_0(x) = \|Ax - b\|_2^2 = \sum_{i=1}^{k} (a_i^T x - b_i)^2,$$

where A ∈ Rk×n (with k ≥ n), aTi are the rows of A, and the vector x ∈ Rn is the optimization variable. Setting the gradient of f0 to zero yields the linear system

$$A^T A x = A^T b.$$

This system of equations is known as the normal equations, and it provides the analytical solution to the least-squares problem. The solution is given by:

$$x = (A^T A)^{-1} A^T b.$$
Numerical Example
Consider a system with the following matrix A and vector b:
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$$
We want to find the least-squares solution x that minimizes ∥Ax − b∥22 .
Step 1: Compute AT A and AT b:

$$A^T A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}$$

$$A^T b = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}$$
Step 2: Solve the normal equations AT Ax = AT b:

$$\begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}$$
By solving this system of linear equations, we get:

$$x_1 = \tfrac{2}{3}, \qquad x_2 = \tfrac{1}{2}$$

Thus, the least-squares solution is:

$$x = \begin{bmatrix} 2/3 \\ 1/2 \end{bmatrix}$$
Interpretation of Results
The least-squares solution x = (2/3, 1/2) means that the linear model which best fits the observed data b = (1, 2, 2) is given by the equation:

$$y = x_1 + x_2 t = \tfrac{2}{3} + \tfrac{1}{2}\, t$$

This implies that the best-fitting model is y = 2/3 + t/2. In this case, the slope of the best-fit line is 1/2, and the intercept is 2/3. The model predicts that for every unit increase in the independent variable t, the dependent variable y increases by 1/2.
The least-squares method has minimized the sum of squared differences between the observed data points (1, 1), (2, 2), (3, 2) and the predicted values on the line y = 2/3 + t/2. This method provides a straightforward way to find the best linear approximation to the data.
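The worked example can be checked numerically. The following is a minimal sketch using NumPy, solving the normal equations directly and comparing against the library's least-squares routine; both should return the intercept 2/3 and slope 1/2 computed above.

```python
import numpy as np

# Data from the worked example: fit y = x1 + x2 * t to the points (1,1), (2,2), (3,2).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Solve the normal equations A^T A x = A^T b directly ...
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# ... and compare with the library least-squares routine.
x_lstsq, residual, rank, svals = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)   # [0.6667 0.5]  -> intercept 2/3, slope 1/2
print(x_lstsq)    # same solution
```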
Applications
The least-squares problem is the basis for regression analysis, optimal control, and many parameter
estimation and data fitting methods. It has several statistical interpretations, such as maximum
likelihood estimation of a vector x, given linear measurements corrupted by Gaussian measurement
errors.
Recognizing an optimization problem as a least-squares problem is straightforward; we only
need to verify that the objective is a quadratic function (and then test whether the associated
quadratic form is positive semidefinite). While the basic least-squares problem has a simple fixed
form, several standard techniques are used to increase its flexibility in applications.
Weighted Least-Squares:
In weighted least-squares, the cost function is

$$\text{minimize} \quad f_0(x) = \sum_{i=1}^{k} w_i (a_i^T x - b_i)^2, \tag{1.3}$$
where w1 , . . . , wk are positive weights. These weights are chosen to reflect differing levels of
concern about the sizes of the terms aTi x − bi , or to influence the solution. In a statistical setting,
weighted least-squares arises in estimation of x, given linear measurements corrupted by errors
with unequal variances.
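A minimal sketch of weighted least-squares, reusing the example data above with weights chosen purely for illustration. It solves the weighted normal equations AᵀWAx = AᵀWb, where W is the diagonal matrix of weights.

```python
import numpy as np

# Weighted least-squares on the earlier example data; the weights w_i below are
# illustrative choices expressing how much we care about each residual a_i^T x - b_i.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
w = np.array([1.0, 1.0, 10.0])   # weight the third measurement heavily

# Weighted least-squares reduces to the weighted normal equations A^T W A x = A^T W b,
# or equivalently to ordinary least-squares on rows scaled by sqrt(w_i).
W = np.diag(w)
x_weighted = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
print(x_weighted)   # the fit shifts toward the heavily weighted data point
```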
Regularization:
Another technique in least-squares is regularization, in which extra terms are added to the cost
function. In the simplest case, a positive multiple of the sum of squares of the variables is added:
$$\text{minimize} \quad f_0(x) = \sum_{i=1}^{k} (a_i^T x - b_i)^2 + \rho \sum_{i=1}^{n} x_i^2, \tag{1.4}$$
where ρ > 0. The extra terms penalize large values of x, resulting in a sensible solution when
minimizing the first sum alone does not. The parameter ρ is chosen by the user to balance
minimizing the original objective function while keeping the sum of squares of x not too large.
Regularization is used in statistical estimation when x is given a prior distribution.
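Regularized least-squares also has a closed-form solution, obtained from the modified normal equations (AᵀA + ρI)x = Aᵀb. A minimal sketch, again reusing the example data with an illustrative value of ρ:

```python
import numpy as np

# Ridge-regularized least-squares: minimize ||Ax - b||^2 + rho * ||x||^2.
# Closed form: x = (A^T A + rho * I)^{-1} A^T b.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
rho = 0.1                         # illustrative value; chosen by the user in practice

n = A.shape[1]
x_reg = np.linalg.solve(A.T @ A + rho * np.eye(n), A.T @ b)
print(x_reg)   # slightly shrunk toward zero compared with the unregularized solution
```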
Another important class of optimization problems is linear programming. The general linear program has the form

minimize cT x    (1.5)
subject to aTi x ≤ bi , i = 1, . . . , m.    (1.6)

For problems with only two decision variables, a linear program can be solved graphically: plot the constraints, identify the feasible region, and evaluate the objective at its corner points.

Example: Maximize Z = 3x + 2y subject to:
x+y ≤ 4
x≤3
y≤2
x, y ≥ 0
Solution:
1. Decision variables: x and y
2. Plot constraints (see graph)
3. Feasible region: Shaded area bounded by the constraints
4. Corner points: (0,0), (3,0), (3,1), (2,2), (0,2)
5. Evaluate objective function:
• (0,0): Z = 0
• (3,0): Z = 9
• (3,1): Z = 11
• (2,2): Z = 10
• (0,2): Z = 4
6. Optimal solution: (3,1) with Z = 11
Figure: Feasible region bounded by the lines x = 3, y = 2, and x + y = 4 (with x, y ≥ 0).
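As a cross-check, the same problem can be handed to an LP solver. A small sketch using scipy.optimize.linprog (which minimizes, so the objective is negated) should recover the corner point (3, 1) with Z = 11:

```python
from scipy.optimize import linprog

# Maximize Z = 3x + 2y  subject to  x + y <= 4, x <= 3, y <= 2, x, y >= 0.
# linprog minimizes, so pass the negated objective and flip the sign of the result.
c = [-3, -2]
A_ub = [[1, 1],
        [1, 0],
        [0, 1]]
b_ub = [4, 3, 2]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # expected: [3. 1.] and Z = 11, matching the corner-point analysis
```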
Example: Maximize a linear objective Z with positive coefficients on x and y, subject to:
x−y ≥ 1
x+y ≥ 3
x, y ≥ 0
Solution:
1. Decision variables: x and y
2. Plot constraints (see graph)
3. Feasible region: Unbounded area where both constraints are satisfied
4. As we move further in the positive x and y directions, Z increases indefinitely
5. Conclusion: Unbounded solution, no finite optimal solution exists
Figure: Unbounded feasible region beyond the line x + y = 3; the objective Z increases without bound.
Example: Maximize a linear objective Z subject to:
x+y ≤ 5
x+y ≥ 7
x, y ≥ 0
Solution:
1. Decision variables: x and y
2. Plot constraints (see graph)
3. Feasible region: No area satisfies all constraints simultaneously
4. Conclusion: No feasible solution exists
Figure: The constraint regions x + y ≤ 5 and x + y ≥ 7 do not overlap, so there is no feasible region.
Example: Maximize Z = 2x + 2y subject to:
x+y ≤ 4
x≤3
y≤3
x, y ≥ 0
Solution:
1. Decision variables: x and y
2. Plot constraints (see graph)
3. Feasible region: Shaded area bounded by the constraints
4. Corner points: (0,0), (3,0), (3,1), (1,3), (0,3)
5. Evaluate objective function:
• (0,0): Z = 0
• (3,0): Z = 6
• (3,1): Z = 8
• (1,3): Z = 8
• (0,3): Z = 6
6. Optimal solutions: Both (3,1) and (1,3) with Z = 8
7. Conclusion: Multiple optimal solutions exist along the line segment connecting (3,1) and
(1,3)
Figure: Feasible region bounded by x = 3, y = 3, and x + y = 4; the optimal value Z = 8 is attained along the entire edge of the feasible region from (3, 1) to (1, 3).
The graphical method provides a visual and intuitive way to solve LPPs with two variables. It
allows us to easily identify special cases such as unbounded solutions, infeasible problems, and
multiple optimal solutions. For problems with more than two variables, other methods like the
Simplex method are required.
Key Concepts
• Objective Function: The function we are trying to maximize or minimize.
• Constraints: The limitations or requirements of the problem, expressed as linear inequalities
or equations.
• Feasible Region: The set of all points that satisfy all constraints.
• Basic Feasible Solution: A solution that satisfies all constraints and corresponds to a vertex
of the feasible region.
• Slack Variables: Variables added to inequality constraints to transform them into equality
constraints.
• Pivot: The process of changing the basis (set of basic variables) to move to an adjacent
vertex.
Steps of the Simplex Method
1. Convert the problem to standard form. Transform all inequalities into equalities by adding
slack variables.
2. Create the initial simplex tableau. Write the system of equations and the objective function
in a tableau format.
3. Determine the entering variable (pivot column). Select the column with the most negative
coefficient in the objective function row, as it represents the variable that will increase the
objective value.
4. Determine the leaving variable (pivot row). Calculate the ratio of the right-hand side to the
pivot column for each row, and choose the smallest positive ratio.
5. Perform row operations to get the new basic feasible solution. Make the pivot element 1
and all other elements in the pivot column 0 by performing row operations.
6. Repeat steps 3-5 until the optimal solution is found. Continue this process until there are
no more negative coefficients in the objective function row.
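These steps can be captured in a compact tableau implementation. The following is a minimal teaching sketch, assuming a problem of the form maximize cᵀx subject to Ax ≤ b, x ≥ 0 with b ≥ 0 (so the slack variables form an obvious starting basis); it omits anti-cycling safeguards and degenerate-case handling, and is applied here to the earlier graphical example.

```python
import numpy as np

def simplex_max(c, A, b):
    """Tableau simplex sketch for:  maximize c^T x  subject to  A x <= b,  x >= 0,  b >= 0."""
    c, A, b = np.asarray(c, float), np.asarray(A, float), np.asarray(b, float)
    m, n = A.shape
    # Steps 1-2: add slack variables and build the initial tableau.
    # Columns: [x (n) | slacks (m) | rhs]; the last row is the objective row (Z - c^T x = 0).
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)
    T[:m, -1] = b
    T[-1, :n] = -c
    basis = list(range(n, n + m))            # the slack variables are the initial basis
    while True:
        col = int(np.argmin(T[-1, :-1]))     # Step 3: most negative objective-row coefficient
        if T[-1, col] >= -1e-12:
            break                            # Step 6: no negative coefficients -> optimal
        pos = T[:m, col] > 1e-12             # Step 4: minimum-ratio test over positive entries
        ratios = np.full(m, np.inf)
        ratios[pos] = T[:m, -1][pos] / T[:m, col][pos]
        row = int(np.argmin(ratios))
        if not np.isfinite(ratios[row]):
            raise ValueError("problem is unbounded")
        T[row] /= T[row, col]                # Step 5: scale the pivot row so the pivot element is 1
        for i in range(m + 1):
            if i != row:
                T[i] -= T[i, col] * T[row]   # zero out the rest of the pivot column
        basis[row] = col
    x = np.zeros(n + m)
    x[basis] = T[:m, -1]
    return x[:n], T[-1, -1]

# The earlier graphical example: maximize 3x + 2y with x + y <= 4, x <= 3, y <= 2, x, y >= 0.
x_opt, z_opt = simplex_max([3, 2], [[1, 1], [1, 0], [0, 1]], [4, 3, 2])
print(x_opt, z_opt)   # expected: [3. 1.] and 11.0
```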
Problem Statement
Maximize Z = 12x1 + 16x2 + 0S1 + 0S2
Subject to:
Final Solution
All values in the Zj − Cj row are non-negative, so we've reached the optimal solution:
• x1 = 8
• x2 = 2
• Maximum value of Z = 128
1.1.6 Complexity
The exact number of calculations needed to solve a linear program can vary. However, interior-point methods typically require on the order of n²m operations, where n is the number of variables and m is the number of constraints. Despite this, modern computers can handle very large problems efficiently.
As an example, consider the Chebyshev approximation problem,

$$\text{minimize} \quad \max_{i=1,\dots,k} \; |a_i^T x - b_i|, \tag{1.7}$$

which can be solved as the equivalent linear program

minimize t    (1.8)
subject to aTi x − t ≤ bi , i = 1, . . . , k    (1.9)
−aTi x − t ≤ −bi , i = 1, . . . , k    (1.10)

where x ∈ Rn and t ∈ R are the variables. The objective function in the Chebyshev approximation problem (1.7) is not differentiable, unlike the quadratic objective in least-squares problems.
Recognizing problems that can be reduced to linear programs requires some knowledge of
linear programming techniques, but these skills can be acquired with practice. Some software
systems can automatically recognize and reformulate problems as linear programs.
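A small sketch of this reformulation in code, reusing the earlier example data: the variables are stacked as (x, t), the objective picks out t, and each residual contributes the two inequalities from (1.9)-(1.10).

```python
import numpy as np
from scipy.optimize import linprog

# Chebyshev approximation: minimize max_i |a_i^T x - b_i| via the LP (1.8)-(1.10).
# Stack the variables as z = (x_1, ..., x_n, t) and minimize t.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])     # earlier example data, reused for illustration
b = np.array([1.0, 2.0, 2.0])
k, n = A.shape

c = np.zeros(n + 1)
c[-1] = 1.0                                   # objective: minimize t

ones = np.ones((k, 1))
A_ub = np.vstack([np.hstack([ A, -ones]),     #  a_i^T x - t <= b_i
                  np.hstack([-A, -ones])])    # -a_i^T x - t <= -b_i
b_ub = np.concatenate([b, -b])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
x_cheb, t_opt = res.x[:n], res.x[-1]
print(x_cheb, t_opt)   # x minimizing the worst-case residual, and that residual t
```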
Convex Functions
A convex function f : Rn → R is a function where the line segment between any two points on its
graph lies above or on the graph itself. More formally, a function f is convex if for all x, y ∈ Rn and
for all α, β ≥ 0 such that α + β = 1, the following inequality holds:

$$f(\alpha x + \beta y) \le \alpha f(x) + \beta f(y).$$

A convex optimization problem is one of the form

minimize f0 (x)
subject to fi (x) ≤ bi , i = 1, . . . , m,
where f0 , f1 , . . . , fm are convex functions and bi are constants.
Examples of Convex Optimization Problems
• Least-Squares Problem:
minimize ∥Ax − b∥22
This is a convex optimization problem because the objective function is a quadratic function
of x, which is convex.
• Linear Programming Problem:
minimize cT x subject to Ax ≤ b
Linear programming is a special case of convex optimization where both the objective
function and the constraints are linear and hence convex.
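Because both examples are convex programs, a single generic convex solver can handle them. A minimal sketch, assuming the cvxpy modeling package is available and using small synthetic data purely for illustration (the LP constraints below are a simple always-feasible choice, not taken from the text):

```python
import numpy as np
import cvxpy as cp

np.random.seed(0)
A = np.random.randn(20, 5)
b = np.random.randn(20)
c = np.random.randn(5)

# 1) Least-squares: minimize ||Ax - b||_2^2 (unconstrained convex quadratic).
x = cp.Variable(5)
ls = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)))
ls.solve()
print("least-squares optimal value:", ls.value)

# 2) Linear program: minimize c^T y over the probability simplex, a simple
#    always-feasible set of linear constraints chosen for illustration.
y = cp.Variable(5)
lp = cp.Problem(cp.Minimize(c @ y), [y >= 0, cp.sum(y) == 1])
lp.solve()
print("LP optimal value:", lp.value)   # equals min(c), attained at a vertex of the simplex
```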
Interior-Point Methods
• Computational Cost: Each iteration of an interior-point method requires a number of operations on the order of

max{n³, n²m, F},

where n is the number of variables, m is the number of constraints, and F is the cost of evaluating the derivatives of the objective and constraint functions.
• Practical Performance: Interior-point methods can handle problems with hundreds of
variables and thousands of constraints efficiently on modern computers, often within a few
tens of seconds. By exploiting problem structure (such as sparsity), these methods can solve
much larger problems.
State of Technology
• Current Status: Solving general convex optimization problems is not yet as mature as
solving least-squares or linear programming problems. Research is ongoing, and significant
progress is being made.
• Active Research: Research on interior-point methods for general nonlinear convex optimiza-
tion is still very active, and no consensus has emerged regarding the best methods.
• Specialized Problems: For specific subclasses of convex optimization problems, such as
second-order cone programming and geometric programming, interior-point methods are
approaching a mature technology.
Consider, for example, a nonlinear optimization problem whose constraints include

x1² − x2 ≥ 0.    (1.15)
Starting from an initial guess of (x1 , x2 ) = (1, 1), a local optimization method might converge
to a solution such as (x1 , x2 ) = (0, 0), which is a local minimum but not necessarily the global
minimum. For this example, (0, 0) is actually the global minimum, but this is not guaranteed for
more complex problems.
Advantages and Disadvantages
• Advantages: Can be applied to large-scale problems, fast computation.
• Disadvantages: Dependence on initial guesses, sensitive to algorithm parameters.
Now consider a global optimization problem over the unit disk, with the constraint

x1² + x2² ≤ 1.
In this problem, finding the global minimum within the feasible region (a disk of radius 1
centered at the origin) involves exploring the entire feasible region to guarantee that no better
solution exists outside the region. This approach can be computationally intensive, especially if the
feasible region or the function becomes more complex.
Advantages and Disadvantages
• Advantages: Guarantees finding the global optimum.
• Disadvantages: Computationally expensive, time-consuming.
Both local and global optimization methods have their place depending on the problem context.
Local optimization is generally faster and more practical for large-scale problems but does not
guarantee global optimality. Global optimization, while providing guarantees of finding the true
global optimum, can be computationally expensive and may only be feasible for smaller problems.
Given two distinct points x1 , x2 ∈ Rn , the points of the form

y = θ x1 + (1 − θ )x2

make up the line passing through x1 and x2 ,
where θ determines the location of the point y along the line. Here’s how the value of θ affects
the position of y:
• When θ = 0, the point y is exactly at x2 .
• When θ = 1, the point y is exactly at x1 .
• When 0 ≤ θ ≤ 1, the point y is on the line segment between x1 and x2 .
• If θ > 1, the point y is beyond x1 , extending the line beyond this point.
• If θ < 0, the point y is beyond x2 , extending the line in the opposite direction.
Figure: Points y = θ x1 + (1 − θ )x2 on the line through x1 and x2 for various values of θ.
Alternative Expression
We can also express y in another form:
y = x2 + θ (x1 − x2 )
Summary
• The parametric representation allows us to easily describe points on a line or line segment.
• For 0 ≤ θ ≤ 1, we obtain a line segment between x1 and x2 .
• If θ is outside this range, the point y lies on the infinite extension of the line beyond x1 or x2 .
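A quick numerical illustration of the parametrization, with two arbitrarily chosen points x1 and x2:

```python
import numpy as np

# Points y = theta * x1 + (1 - theta) * x2 for a few values of theta:
# theta in [0, 1] gives the segment between x2 and x1; values outside extend the line.
x1 = np.array([2.0, 1.0])
x2 = np.array([0.0, 0.0])

for theta in [-0.5, 0.0, 0.5, 1.0, 1.5]:
    y = theta * x1 + (1 - theta) * x2
    print(f"theta = {theta:+.1f} -> y = {y}")
```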
A set C ⊆ Rn is an affine set if, for any two points x1 , x2 ∈ C and any θ ∈ R, the point

y = θ x1 + (1 − θ )x2
also belongs to C. This means C contains every point on the line connecting x1 and x2 , for all values
of θ .
Intuitive Explanation:
• If θ = 0, then y = x2 .
• If θ = 1, then y = x1 .
• If 0 < θ < 1, y is somewhere between x1 and x2 .
• If θ > 1, the point y is beyond x1 , extending the line past it.
• If θ < 0, the point y is beyond x2 , extending the line in the opposite direction.
If C is an affine set and x0 ∈ C, then the set

V = C − x0 = {x − x0 | x ∈ C}

is a subspace, meaning it is closed under addition and scalar multiplication. In other words, V has the same properties as a vector space. Conversely, C can be written as

C = V + x0 = {v + x0 | v ∈ V }.

This means that the affine set C can be thought of as a subspace V shifted by the point x0 .
4. Affine Hull
The affine hull of a set C ⊆ Rn is the smallest affine set that contains C. The affine hull of C,
denoted aff C, is the set of all affine combinations of points in C:
aff C = {θ1 x1 + · · · + θk xk | x1 , . . . , xk ∈ C, θ1 + · · · + θk = 1} .
The affine hull is the smallest affine set that contains C in the sense that if S is any affine set with
C ⊆ S, then aff C ⊆ S.
Consider, as an example, the solution set of a system of linear equations,

C = {x | Ax = b},

where A ∈ Rm×n and b ∈ Rm .
Claim:
The solution set C is an affine set.
Proof:
Let x1 , x2 ∈ C, which means:
Ax1 = b and Ax2 = b.
Now consider an affine combination of x1 and x2 :
y = θ x1 + (1 − θ )x2 ,   with θ ∈ R.

Then

Ay = A(θ x1 + (1 − θ )x2 ) = θ Ax1 + (1 − θ )Ax2 = θ b + (1 − θ )b = b.
This shows that the affine combination y also satisfies Ay = b, so y ∈ C. Therefore, C is an affine
set.
Associated Subspace:
The subspace associated with the affine set C is the nullspace of A, which is the set of solutions to
Ax = 0.
Converse:
Every affine set can be expressed as the solution set of a system of linear equations.
Summary
• An affine set is a set where the line between any two points stays inside the set.
• An affine set contains all affine combinations of its points.
• Every affine set is like a subspace that has been shifted by a point.
• The affine hull of a set is the smallest affine set that contains all points of the set.
• The solution set of a system of linear equations is an affine set, with the nullspace being the
associated subspace.
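A quick numerical illustration of the last point: any solution of Ax = b can be written as a particular solution plus an element of the nullspace of A, and affine combinations of solutions remain solutions. A minimal sketch with a small, arbitrarily chosen A and b:

```python
import numpy as np
from scipy.linalg import null_space

# The solution set of A x = b is the affine set  x0 + nullspace(A),
# where x0 is any particular solution.
A = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.array([3.0, 1.0])

x0 = np.linalg.lstsq(A, b, rcond=None)[0]   # one particular solution
V = null_space(A)                           # basis of the associated subspace (nullspace)

x_a = x0 + V @ np.array([0.7])              # two arbitrary solutions
x_b = x0 + V @ np.array([-2.0])
print(np.allclose(A @ x_a, b), np.allclose(A @ x_b, b))          # True True

theta = 0.3                                 # affine combinations of solutions are solutions
print(np.allclose(A @ (theta * x_a + (1 - theta) * x_b), b))     # True
```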
Affine dimension
The affine dimension of a set C ⊆ Rn is defined as the dimension of its affine hull, denoted as
aff(C). The affine dimension is especially useful in the context of convex analysis and optimization.
However, it may not always align with other definitions of dimension.
Consider, for example, the unit circle in R2 , C = {x ∈ R2 | x1² + x2² = 1}. The affine hull of C is all of R2 , so its affine dimension is 2. However, according to other definitions
of dimension, such as the topological dimension, the unit circle has dimension 1 because it is a
curve.
Relative Interior
If the affine dimension of a set C ⊆ Rn is less than n, the set lies within the affine set aff(C) ⊂ Rn .
In this case, we define the relative interior of C, denoted as relint(C), as its interior relative to its
affine hull:
relint(C) = {x ∈ C | B(x, r) ∩ aff(C) ⊆ C for some r > 0},

where B(x, r) = {y | ∥y − x∥ ≤ r} is the ball of radius r centered at x.
Relative Boundary
We define the relative boundary of a set C as the set of points in the closure of C, but not in its
relative interior:
relbd(C) = cl(C) \ relint(C),

where cl(C) denotes the closure of C.
Example: Square in R3
Consider a square in the (x1 , x2 )-plane in R3 , defined as:
C = {x ∈ R3 | −1 ≤ x1 ≤ 1, −1 ≤ x2 ≤ 1, x3 = 0}.
Its affine hull is the plane

aff(C) = {x ∈ R3 | x3 = 0}.

The interior of C is empty, because no point of C has a full three-dimensional neighborhood contained in C. However, the relative interior of C is

relint(C) = {x ∈ R3 | −1 < x1 < 1, −1 < x2 < 1, x3 = 0}.

The boundary of C in R3 is the set C itself, but the relative boundary of C is the wireframe outline of the square:

relbd(C) = {x ∈ R3 | max{|x1 |, |x2 |} = 1, x3 = 0}.
Summary
• The affine dimension of a set is the dimension of its affine hull.
• The relative interior of a set is the interior with respect to its affine hull.
• The relative boundary is the closure of the set minus its relative interior.
The concepts of affine dimension and relative interior are important in optimization and convex analysis, particularly when analyzing sets that do not have an interior in the usual sense but do have a relative interior in their affine hull.
A set C is convex if, for any x1 , x2 ∈ C and any θ with 0 ≤ θ ≤ 1, we have

θ x1 + (1 − θ )x2 ∈ C.
This means that the line segment between any two points in C lies within the set C.
Figure 1.1: Convex and non-convex sets. Left: Hexagon (convex). Middle: Kidney-shaped set
(non-convex). Right: A partially bounded square (non-convex).
Convex Combinations
A point of the form
θ1 x1 + · · · + θk xk
where θ1 + · · · + θk = 1 and θi ≥ 0 for i = 1, . . . , k, is called a convex combination of the points
x1 , . . . , xk .
Convex combinations represent a "weighted average" or a "mixture" of the points, where θi
indicates the proportion of each point in the combination.
Convex Hull
The convex hull of a set C, denoted conv(C), is the set of all convex combinations of points in C:
$$\operatorname{conv}(C) = \left\{ \theta_1 x_1 + \cdots + \theta_k x_k \;\middle|\; x_i \in C,\ \theta_i \ge 0,\ \sum_{i=1}^{k} \theta_i = 1 \right\}.$$
Figure 1.2: Convex hulls of two sets in R2 . Left: Convex hull of 15 points forms a pentagon. Right:
Convex hull of a kidney-shaped set.
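For a finite point set, the convex hull can be computed directly. A small sketch using scipy.spatial.ConvexHull on random points (which stand in for the point set in Figure 1.2):

```python
import numpy as np
from scipy.spatial import ConvexHull

# Convex hull of 15 random points in R^2: the smallest convex set containing them.
rng = np.random.default_rng(0)
points = rng.random((15, 2))

hull = ConvexHull(points)
print("indices of the hull vertices:", hull.vertices)

# Each facet of the hull satisfies normal . x + offset <= 0 for points inside,
# so the centroid of the points must lie inside the hull.
centroid = points.mean(axis=0)
inside = all(eq[:-1] @ centroid + eq[-1] <= 1e-12 for eq in hull.equations)
print("centroid inside hull:", inside)
```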
Cones
A set C is called a cone (or nonnegative homogeneous) if, for every point x ∈ C and for any θ ≥ 0,
we have:
θ x ∈ C.
In simpler terms, if you take any point in the set and scale it by a non-negative factor, the resulting
point will still lie in the set.
A convex cone is a set that is both convex and a cone. This means that for any x1 , x2 ∈ C and
for any θ1 , θ2 ≥ 0, the point
θ1 x1 + θ2 x2 ∈ C.
Conic Combinations
A point of the form
θ1 x1 + · · · + θk xk where θ1 , . . . , θk ≥ 0
is called a conic combination (or nonnegative linear combination) of the points x1 , . . . , xk . A set is
a convex cone if and only if it contains all conic combinations of its elements.
Conic Hull
The conic hull of a set C is the set of all conic combinations of points in C, i.e.,
{θ1 x1 + · · · + θk xk | xi ∈ C, θi ≥ 0} .
The conic hull is the smallest convex cone that contains C.
Hyperplanes
A hyperplane is a set of the form

{x ∈ Rn | aT x = b},
where a ∈ Rn , a ̸= 0, and b ∈ R.
Geometric Interpretation
Geometrically, a hyperplane is the set of points that are orthogonal (perpendicular) to the normal
vector a. The constant b shifts the hyperplane from the origin. We can express the hyperplane as:
{x | aT (x − x0 ) = 0},
where x0 is any point on the hyperplane. This shows that the hyperplane consists of all vectors
orthogonal to the normal vector a, starting from x0 .
Halfspaces
A halfspace is a region of space divided by a hyperplane. A closed halfspace is defined as:
{x ∈ Rn | aT x ≤ b},
where a ̸= 0.
Geometric Interpretation
A halfspace is one of the two regions determined by a hyperplane. The boundary of the halfspace is
the hyperplane {x | aT x = b}. We have:
• {x | aT x ≥ b}: The halfspace extending in the direction of the normal vector a.
• {x | aT x ≤ b}: The halfspace extending in the opposite direction.
The boundary of the halfspace is the hyperplane, while the interior, where aT x < b, is called an
open halfspace.
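A tiny sketch of the algebra: with a and b chosen arbitrarily for illustration, the sign of aᵀx − b tells us on which side of the hyperplane a point lies.

```python
import numpy as np

# The hyperplane {x | a^T x = b} splits R^n into the halfspaces a^T x <= b and a^T x >= b.
a = np.array([1.0, 2.0])   # illustrative normal vector
b = 4.0                    # illustrative offset

points = np.array([[0.0, 0.0],   # a^T x = 0 -> open halfspace a^T x < b
                   [4.0, 0.0],   # a^T x = 4 -> on the hyperplane
                   [3.0, 2.0]])  # a^T x = 7 -> halfspace a^T x > b

for x in points:
    s = a @ x
    if np.isclose(s, b):
        side = "on the hyperplane"
    else:
        side = "in the open halfspace a^T x < b" if s < b else "in the halfspace a^T x > b"
    print(x, "->", side)
```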
This hyperplane is called a separating hyperplane. The set {x | aT x = b} represents the hyperplane,
and it divides the space between C and D.
Strict Separation
In some cases, the separating hyperplane can strictly separate the sets C and D. This means that

aT x < b for all x ∈ C,    and    aT x > b for all x ∈ D.
Supporting Hyperplanes
A supporting hyperplane is a hyperplane that touches a convex set at a boundary point but does not cut through it. If C ⊆ Rn is convex and x0 is a point in its boundary, then the hyperplane {x | aT x = aT x0 } supports C at x0 if there is a vector a ̸= 0 such that

aT x ≤ aT x0 for all x ∈ C.
Figure 1.7: Construction of a separating hyperplane between two convex sets C and D. The
hyperplane is perpendicular to the line connecting the closest points c ∈ C and d ∈ D.