Aksum University
Department of Mathematics
Copyright © 2024 Teklebirhan Abraha Gebrehiwot
1 Mathematical Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.1 Linear Algebra Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2 Convex Sets and Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3 Polyhedral Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.6 Introduction to Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.6.1 Dot Product and Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.7 Basics of Convex Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.7.1 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.7.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.7.3 Convex Sets, Functions and Cones and Polyhedral Theory . . . . . . . . . . . . . . . . . 49
1.7.4 Extreme Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
2.97 Farkas’ Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
2.98 Complementary slackness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
2.99 Farkas’ Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
2.100 Solving linear programming problems . . . . . . . . . . . . . . . . . . . . . . . . . . 263
2.101 Graphical example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
2.102 The Geometry of LP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
2.103 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
2.104 The Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
2.105 When is a Linear Program Feasible? . . . . . . . . . . . . . . . . . . . . . . . . . . 293
2.106 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
2.106.1 Size of the Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
2.107 Complexity of linear programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
2.108 Solving a Linear Program in Polynomial Time . . . . . . . . . . . . . . . . . . . 304
2.108.1 Ye’s Interior Point Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
2.109 Description of Ye’s Interior Point Algorithm . . . . . . . . . . . . . . . . . . . . 310
2.110 Analysis of the Potential Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
2.111 Bit Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
2.112 Transformation for the Interior Point Algorithm . . . . . . . . . . . . . . . . . 317
2.113 Modeling: Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
2.114 Modeling and Assumptions in Linear Programming . . . . . . . . . . . . . . 321
2.114.1 General models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
2.114.2 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
2.114.3 Knapsack Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
2.114.4 Capital Investment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
2.114.5 Work Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
2.114.6 Assignment Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
2.114.7 Multi period Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
2.114.8 Mixing Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
2.114.9 Financial Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
2.114.10 Network Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
2.114.11 Multi-Commodity Network Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
2.115 Modeling Tricks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
2.115.1 Maximizing a minimum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
9 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
9.1 The Dual Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
9.2 Weak Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
1.1 A convex combination of the points x and y is given by z = λx + (1 − λ)y with any
λ ∈ [0, 1]. Here we demonstrate this using λ = 2/3. . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.3 The green intersection of the convex sets that are the ball and the polytope is also convex.
This can be seen by considering any points x, y ∈ Ball ∩ Polytope. Since Ball is convex, the line
segment between x and y is completely contained in Ball. And similarly, the line segment is
completely contained in Polytope. Hence, the line segment is also contained in the intersection.
This is how we can reason that the intersection is also convex. . . . . . . . . . . . . . . . . . . . . 46
1.4 Comparison of Convex and Non-Convex Functions. . . . . . . . . . . . . . . . . . . . . . . . . 47
1.5 Convex Functions f (x, y) = x2 + y 2 + x , f (x, y) = ex+y + ex−y + e−x−y , and f (x, y) =
x2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.6 Examples of Convex Sets: The set on the left (an ellipse and its interior) is a convex set;
every pair of points inside the ellipse can be connected by a line contained entirely in the ellipse.
The set on the right is clearly not convex as we’ve illustrated two points whose connecting line
is not contained inside the set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
1.7 A convex function: A convex function satisfies the expression f (λx1 + (1 − λ)x2 ) ≤
λf (x1 ) + (1 − λ)f (x2 ) for all x1 and x2 and λ ∈ [0, 1]. . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.8 A hyperplane in 3 dimensional space: A hyperplane is the set of points satisfying an
equation aT x = b, where b is a constant in R and a is a constant vector in Rn and x is a
variable vector in Rn . The equation is written as a matrix multiplication using our assumption
that all vectors are column vectors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.9 Two half-spaces defined by a hyper-plane: A half-space is so named because any hyper-
plane divides Rn (the space in which it resides) into two halves, the side “on top” and the side
“on the bottom.” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.10 A Ray: The points in the graph shown in this figure are in the set produced using the
expression x0 + dλ where x0 = [2, 1]T and d = [2, 2]T and λ ≥ 0. . . . . . . . . . . . . . . . . . . 55
1.11 Convex Direction: Clearly every point in the convex set (shown in blue) can be the
vertex for a ray with direction [1, 0]T contained entirely in the convex set. Thus [1, 0]T is a
direction of this convex set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
1.12 An Unbounded Polyhedral Set: This unbounded polyhedral set has many directions.
One direction is [0, 1]T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
1.13 Boundary Point: A boundary point of a (convex) set C is a point in the set so that for
every ball of any radius centered at the point contains some points inside C and some points
outside C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
1.14 A Polyhedral Set: This polyhedral set is defined by five half-spaces and has a single
degenerate extreme point located at the intersection of the binding constraints 3x1 + x2 ≤ 120,
x1 + 2x2 ≤ 160, and (28/16) x1 + x2 ≤ 100. All faces are shown in bold. . . . . . . . . . . . . . . 62
1.15 Visualization of the set D: This set really consists of the set of points on the red line.
This is the line where d1 + d2 = 1 and all other constraints hold. This line has two extreme
points (0, 1) and (1/2, 1/2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
1.16 The Carathéodory Characterization Theorem: Extreme points and extreme directions are
used to express points in a bounded and unbounded set. . . . . . . . . . . . . . . . . . . . . . . . . 68
2.1 Goat pen with unknown side lengths. The objective is to identify the values of x and y
that maximize the area of the pen (and thus the number of goats that can be kept). . . . 70
2.2 Plot with Level Sets Projected on the Graph of z. The level sets exist in R2 while the
graph of z exists in R3 . The level sets have been projected onto their appropriate heights on
the graph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.3 Contour Plot of z = x2 + y 2 . The circles in R2 are the level sets of the function. The
lighter the circle hue, the higher the value of c that defines the level set. . . . . . . . . . . . . 74
2.4 A Line Function: The points in the graph shown in this figure are in the set produced
using the expression x0 + vt where x0 = (2, 1) and let v = (2, 2). . . . . . . . . . . . . . . . . . . 75
2.5 A Level Curve Plot with Gradient Vector: We’ve scaled the gradient vector in this case to
make the picture understandable. Note that the gradient is perpendicular to the level set curve
at the point (1, 1), where the gradient was evaluated. You can also note that the gradient is
pointing in the direction of steepest ascent of z(x, y). . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.6 Level Curves and Feasible Region: At optimality the level curve of the objective function
is tangent to the binding constraints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.7 Gradients of the Binding Constraint and Objective: At optimality the gradient of the
binding constraints and the objective function are scaled versions of each other. . . . . . . . 81
2.8 Goat pen with unknown side lengths. The objective is to identify the values of x and y
that maximize the area of the pen (and thus the number of goats that can be kept). . . 222
2.9 Graph representing primal in example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
2.10 A polyhedron with no vertex. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
2.11 Traversing the vertices of a convex body (here a polyhedron in R3 ). . . . . . . . . . . 293
2.13 The Projection Theorem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
2.12 Examples of convex and non-convex sets in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . 295
2.14 Exploring the interior of a convex body. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
2.15 A centering mapping. If x is close to the boundary, we map the polyhedron P onto
another one P ′ , s.t. the image x′ of x is closer to the center of P ′ . . . . . . . . . . . . . . . . 306
2.16 Null space of A and gradient direction g. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
4.1 Feasible Region and Level Curves of the Objective Function: The shaded region in the
plot is the feasible region and represents the intersection of the five inequalities constraining
the values of x1 and x2 . On the right, we see the optimal solution is the “last” point in the
feasible region that intersects a level set as we move in the direction of increasing profit. 352
5.1 The feasible region for the diet problem is unbounded and there are alternative optimal
solutions. Since we are seeking a minimum, we travel in the opposite direction of the gradient,
so toward the origin, to reduce the objective function value. Notice that the level curves hit one
side of the boundary of the feasible region. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
5.2 Matlab input for solving the diet problem. Note that we are solving a minimization
problem. Matlab assumes all problems are minimization problems, so we don’t need to multiply
the objective by −1 like we would if we started with a maximization problem. . . . . . . . 397
6.1 The Simplex Algorithm: The path around the feasible region is shown in the figure. Each
exchange of a basic and non-basic variable moves us along an edge of the polygon in a direction
that increases the value of the objective function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
6.2 Unbounded Linear Program: The existence of a negative column aj in the simplex tableau
for entering variable xj indicates an unbounded problem and feasible region. The recession
direction is shown in the figure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
6.3 Infinite alternative optimal solutions: In the simplex algorithm, having zj − cj ≥ 0 in a
maximization problem with at least one j for which zj − cj = 0 indicates an infinite set of
alternative optimal solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
6.4 An optimization problem with a degenerate extreme point: The optimal solution to this
problem is still (16, 72), but this extreme point is degenerate, which will impact the behavior of
the simplex algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
6.5 Finding an initial feasible point: Artificial variables are introduced into the problem. These
variables allow us to move through non-feasible space. Once we reach a feasible extreme point,
the process of optimizing Problem P1 stops. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
6.6 Multiperiod inventory models operate on a principle of conservation of flow. Manufactured
goods and previous period inventories flow into the box representing each period. Demand and
next period inventories flow out of the box representing each period. This inflow and outflow
must be equal to account for all production. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
6.7 Input model to GLPK describing McLearey’s Problem . . . . . . . . . . . . . . . . . . . . . 443
8.1 System 2 has a solution if (and only if) the vector c is contained inside the positive cone
constructed from the rows of A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
8.2 System 1 has a solution if (and only if) the vector c is not contained inside the positive
cone constructed from the rows of A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
8.3 An example of Farkas’ Lemma: The vector c is inside the positive cone formed by the
rows of A, but c′ is not. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
8.4 The Gradient Cone: At optimality, the cost vector c is obtuse with respect to the directions
formed by the binding constraints. It is also contained inside the cone of the gradients of the
binding constraints, which we will discuss at length later. . . . . . . . . . . . . . . . . . . . . . . . 473
8.5 This figure illustrates the optimal point of the problem given in Example 8.4. Note that
at optimality, the objective function gradient is in the dual cone of the binding constraint. That
is, it is a positive combination of the gradients of the left-hand-sides of the binding constraints
at optimality. The gradient of the objective function is shown in green. . . . . . . . . . . . . 479
9.1 The dual feasible region in this problem is a mirror image (almost) of the primal feasible
region. This occurs when the right-hand-side vector b is equal to the objective function
coefficient vector c and the matrix A is symmetric. . . . . . . . . . . . . . . . . . . . . 490
9.2 The simplex algorithm begins at a feasible point in the feasible region of the primal
problem. In this case, this is also the same starting point in the dual problem, which is infeasible.
The simplex algorithm moves through the feasible region of the primal problem towards a point
in the dual feasible region. At the conclusion of the algorithm, the algorithm reaches the unique
point that is both primal and dual feasible. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
9.1 Table of Dual Conversions: To create a dual problem, assign a dual variable to each
constraint of the form Ax ◦ b, where ◦ represents a binary relation. Then use the table to
determine the appropriate sign of the inequality in the dual problem as well as the nature of
the dual variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
1. Mathematical Foundations
A linear function of the decision variables x1 , . . . , xn has the form
z = c1 x1 + c2 x2 + · · · + cn xn
Definition 1.1 A vector is an ordered list of numbers, which can be written as:
v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}
u · v = u1 v1 + u2 v2 + · · · + un vn .
u · v = 1 · 4 + 2 · 5 + 3 · 6 = 4 + 10 + 18 = 32.
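This computation is easy to verify numerically. The sketch below uses NumPy, which is our choice of tool here rather than anything the text prescribes:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# u . v = 1*4 + 2*5 + 3*6 = 32
print(np.dot(u, v))  # 32
print(u @ v)         # the @ operator computes the same dot product
```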
Unit vector: A vector is said to be a unit vector if all its components are zero
except one, which equals 1. The vector ei is the unit vector in n-dimensional space whose
i-th component is 1. Thus e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), e3 = (0, 0, 1, . . . , 0), . . . ,
en = (0, 0, . . . , 1) are all unit vectors in n-dimensional space.
Null vector: A vector is said to be a null vector if all the components of the
vector are equal to zero. It is usually denoted by 0. An n-component null vector is
written as 0 = (0, 0, . . . , 0) with n zeros.
Vector spaces
A vector space V is a set (the elements of which are called vectors) on which two
operations are defined: vectors can be added together, and vectors can be multiplied
by real numbers called scalars. V must satisfy
(i) There exists an additive identity (written ⃗0) in V such that x + ⃗0 = x for all
x∈V
(ii) For each x ∈ V , there exists an additive inverse (written −x) such that
x + (−x) = ⃗0
(iii) There exists a multiplicative identity (written 1) in R such that 1x = x for all
x∈V
(iv) Commutativity: x + y = y + x for all x, y ∈ V
(v) Associativity: (x + y) + z = x + (y + z) and α(βx) = (αβ)x for all x, y, z ∈ V
and α, β ∈ R
(vi) Distributivity: α(⃗x + ⃗y ) = α⃗x + α⃗y and (α + β)⃗x = α⃗x + β⃗x for all ⃗x,⃗y ∈ V and
α, β ∈ R
Metric spaces
Metrics generalize the notion of distance from Euclidean space (although metric
spaces need not be vector spaces).
A metric on a set S is a function d : S × S → R that satisfies
(i) d(x, y) ≥ 0, with equality if and only if x = y
(ii) d(x, y) = d(y, x)
(iii) d(x, z) ≤ d(x, y) + d(y, z) (the so-called triangle inequality)
for all x, y, z ∈ S.
Normed spaces
Norms generalize the notion of length from Euclidean space.
A norm on a real vector space V is a function ∥ · ∥ : V → R that satisfies
(i) ∥x∥ ≥ 0, with equality if and only if x = ⃗0
(ii) ∥αx∥ = |α|∥x∥
(iii) ∥x + y∥ ≤ ∥x∥ + ∥y∥ (the triangle inequality again)
for all x, y ∈ V and all α ∈ R. A vector space endowed with a norm is called a
normed vector space, or simply a normed space.
Note that any norm on V induces a distance metric on V :
d(x, y) = ∥x − y∥
One can verify that the axioms for metrics are satisfied under this definition and
follow directly from the axioms for norms. Therefore any normed space is also a
metric space. We will typically only be concerned with a few specific norms on Rn :
\|x\|_1 = \sum_{i=1}^{n} |x_i|
\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}
\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \quad (p ≥ 1)
\|x\|_∞ = \max_{1 ≤ i ≤ n} |x_i|
Note that the 1- and 2-norms are special cases of the p-norm, and the ∞-norm is the
limit of the p-norm as p tends to infinity. We require p ≥ 1 for the general definition
of the p-norm because the triangle inequality fails to hold if p < 1.
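These norms are easy to compare numerically; a minimal sketch using NumPy (our tooling assumption, not the text's):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

print(np.linalg.norm(x, 1))       # 1-norm: |3| + |-4| + |1| = 8
print(np.linalg.norm(x, 2))       # 2-norm: sqrt(9 + 16 + 1) ~ 5.099
print(np.linalg.norm(x, 3))       # a general p-norm, here p = 3
print(np.linalg.norm(x, np.inf))  # infinity-norm: max |x_i| = 4
```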
Inner product spaces
An inner product on a real vector space V is a function ⟨·, ·⟩ : V × V → R satisfying
(i) ⟨x, x⟩ ≥ 0, with equality if and only if x = 0
(ii) ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩
(iii) ⟨x, y⟩ = ⟨y, x⟩
for all x, y, z ∈ V and all α, β ∈ R. A vector space endowed with an inner product is
called an inner product space.
Note that any inner product on V induces a norm on V :
\|x\| = \sqrt{⟨x, x⟩}
1.1.3 Matrices
The individual quantities ajk are called the elements of the matrix.
Definition 1.7 The size of a matrix is denoted by two numbers, namely its rows
and columns.
A general m × n matrix looks as follows (the first index of an element gives its row
number, the second its column number):
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
A matrix for which m = n (i.e. it has the same number of rows and columns) is called
a square matrix. In a square matrix, the elements where the two indices are equal (i.e.
i = j) are said to be found on the main diagonal of the matrix (also called the major
diagonal, the principal diagonal, and the primary diagonal).
■ Example 1.4 In the following matrices, the main diagonal elements are, respectively,
(4, 5, 1, 2) and (1, 0, 0):
A = \begin{bmatrix} 4 & -7 & 1 & 0 \\ 1 & 5 & -1 & 3 \\ 3 & -7 & 1 & 6 \\ -5 & 9 & 12 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & -1 \\ 1 & 1 & 0 \end{bmatrix}
■
Definition 1.8 In a diagonal matrix, if all non-zero elements are unity, the matrix
is known as Identity matrix or unit matrix. It is usually denoted by In or
simply by I .
R Notice that the main diagonal elements in a diagonal matrix could be equal
to zero. The only criterion is that the elements that are not on the diagonal
must be zero.
A matrix with all elements below the main diagonal equal to zero is called an upper
triangular matrix. Similarly, a matrix with all elements above the main diagonal
equal to zero is called a lower triangular matrix.
Definition 1.9 The zero matrix is a matrix where all elements are equal to
zero.
0 0 0
The transpose of an m × n matrix A, written AT , is the n × m matrix obtained by
interchanging the rows and columns of A:
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \xrightarrow{\text{transpose}} \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix} \quad (1.3)
■ Example 1.9
\begin{bmatrix} 2 & 1 & 3 \\ 0 & -7 & 1 \\ 4 & 5 & 0 \end{bmatrix}^T = \begin{bmatrix} 2 & 0 & 4 \\ 1 & -7 & 5 \\ 3 & 1 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 5 \\ 4 & 3 & 2 \end{bmatrix}^T = \begin{bmatrix} 1 & 4 \\ 0 & 3 \\ 5 & 2 \end{bmatrix}, \quad
\begin{bmatrix} 2 & -1 & 3 & 7 \end{bmatrix}^T = \begin{bmatrix} 2 \\ -1 \\ 3 \\ 7 \end{bmatrix}
Definition 1.10 A symmetric matrix is a matrix that is invariant under the
transpose operation; that is, it satisfies A = AT .
3 8 9
is symmetric. ■
■ Example 1.11 Given the matrix A = \begin{bmatrix} 0 & -7 \\ 7 & 0 \end{bmatrix}, then
-A^T = - \begin{bmatrix} 0 & -7 \\ 7 & 0 \end{bmatrix}^T = - \begin{bmatrix} 0 & 7 \\ -7 & 0 \end{bmatrix} = \begin{bmatrix} 0 & -7 \\ 7 & 0 \end{bmatrix} = A
so A satisfies −AT = A; such a matrix is called skew-symmetric.
A + B = [aij + bij ]
Then
A + B = \begin{bmatrix} -2 & 5 & -1 \\ 1 & 6 & 7 \end{bmatrix}
■ Example 1.13
\begin{bmatrix} 1 & -2 & 3 \\ 0 & 1 & -4 \\ -5 & 6 & 7 \end{bmatrix} + \begin{bmatrix} 6 & -2 & 8 \\ 1 & -5 & 4 \\ 7 & 3 & 9 \end{bmatrix} = \begin{bmatrix} 7 & -4 & 11 \\ 1 & -4 & 0 \\ 2 & 9 & 16 \end{bmatrix}
■
The difference of two matrices A = [aij ] and B = [bij ] of the same order m × n
is also a matrix C = [dij ] of the same order, where dij = aij − bij .
Multiplication of matrix with scalar quantity: The scalar multiple cA is
the matrix obtained by multiplying each entry of A by c.
cA = c[aij ] = [caij ]
■ Example 1.14 For A = \begin{bmatrix} 1 & 4 & 0 \\ -2 & 6 & 5 \end{bmatrix}:
2A = \begin{bmatrix} 2 & 8 & 0 \\ -4 & 12 & 10 \end{bmatrix}, \quad \frac{1}{2}A = \begin{bmatrix} 1/2 & 2 & 0 \\ -1 & 3 & 5/2 \end{bmatrix}, \quad (-1)A = \begin{bmatrix} -1 & -4 & 0 \\ 2 & -6 & -5 \end{bmatrix}
■
Matrix Multiplication
If A is an m × n matrix and B is an n × r matrix, then the product C = AB is
an m × r matrix. The (i, j) entry of the product is computed as follows:
c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj} , \quad \text{or} \quad c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}
For matrix product to be feasible, the number of columns of the first matrix must
be equal to the number of rows of the second matrix.
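The entrywise formula translates directly into a triple loop. The following sketch (using NumPy only for array storage and to check the result; our tooling assumption) makes that explicit:

```python
import numpy as np

def matmul(A, B):
    """Compute C = AB entrywise: c_ij = sum_k a_ik b_kj."""
    m, n = A.shape
    n2, r = B.shape
    assert n == n2, "columns of A must equal rows of B"
    C = np.zeros((m, r))
    for i in range(m):
        for j in range(r):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])              # 2 x 2
B = np.array([[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]])   # 2 x 3
print(matmul(A, B))  # the 2 x 3 product
print(A @ B)         # NumPy's built-in product agrees
```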
With the knowledge of the matrix product, the set of simultaneous linear equations
given in (1.1) can be written with the help of matrix notation as given below in a
more compact form:
Ax = b
where A is a matrix of order m × n given by:
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{bmatrix}
Note: |A| here denotes the determinant of A; it should not be confused with the
modulus of a quantity A, for which the same notation is used.
Minor: The minor of an element aij of a determinant A is a determinant formed
by omitting the ith row and the jth column of the determinant A. It is usually
denoted by Mij .
Cofactor: The cofactor of an element aij of a determinant |A| is denoted by Cij
and given by
Cij = (−1)i+j Mij
Singular and non-singular matrix: If the value of the determinant of a square
matrix is non-zero, then the matrix is said to be a non-singular matrix,
and if the value of the determinant of the matrix is zero then the matrix is said to
be a singular matrix. We know that det A = det AT .
Thus, if a square matrix is non-singular, then its transpose is also non-singular.
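A quick numerical illustration of these facts, sketched with NumPy (the matrices below are our own examples, not taken from the text):

```python
import numpy as np

A = np.array([[5.0, 3.0], [2.0, 1.0]])
B = np.array([[1.0, 2.0], [2.0, 4.0]])

print(np.linalg.det(A))    # -1.0: nonzero, so A is non-singular
print(np.linalg.det(A.T))  # also -1.0, since det A = det A^T
print(np.linalg.det(B))    # 0.0: B is singular (row 2 = 2 * row 1)
```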
Comparing entries and using the definition of the matrix equality, we have four linear
equations
c2 + c3 = 1
c1 + c3 = 4
−c1 + c3 = 2
c2 + c3 = 1
Gauss-Jordan elimination easily gives
\begin{bmatrix} 0 & 1 & 1 & | & 1 \\ 1 & 0 & 1 & | & 4 \\ -1 & 0 & 1 & | & 2 \\ 0 & 1 & 1 & | & 1 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 & 0 & | & 1 \\ 0 & 1 & 0 & | & -2 \\ 0 & 0 & 1 & | & 3 \\ 0 & 0 & 0 & | & 0 \end{bmatrix}
so c1 = 1, c2 = −2, and c3 = 3.
■ Example 1.18 Describe the span of the matrices A1 , A2 , and A3 from the previous
example. ■
for some choice of scalars c1 , c2 , c3 . This gives a system of linear equations whose
left-hand side is exactly the same as in the previous example but whose right-hand
side is general. The augmented matrix of this system is
\begin{bmatrix} 0 & 1 & 1 & | & w \\ 1 & 0 & 1 & | & x \\ -1 & 0 & 1 & | & y \\ 0 & 1 & 1 & | & z \end{bmatrix}
The only restriction comes from the last row, where we must have w − z = 0 to have
a solution. Thus, the span of A1 , A2 , and A3 consists of all matrices \begin{bmatrix} w & x \\ y & z \end{bmatrix} for
which w = z. That is, span(A1 , A2 , A3 ) = \left\{ \begin{bmatrix} w & x \\ y & w \end{bmatrix} : w, x, y ∈ R \right\}.
Matrices A1 , A2 , . . . , Ak are linearly independent if the only solution of
c1 A1 + c2 A2 + · · · + ck Ak = 0
is the trivial one, c1 = c2 = · · · = ck = 0.
In addition to the example here, there are more examples in the section on matrix
inversion.
With R = 1 (step 1) and C = 1 (step 2), we apply step 3 and interchange rows 1
and 2, obtaining
\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \end{bmatrix} \quad (1.10)
With R = 1 (step 1) and C = 2 (step 2), P = 2 (step 3), we multiply the elements
of row 1 by 1/P = 1/2 to get a row-echelon form
\begin{bmatrix} 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \quad (1.12)
Inverting a Matrix:
A = \begin{bmatrix} 3 & 9 & 2 \\ 1 & 1 & 1 \\ 5 & 4 & 7 \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} -3/11 & 5 & -7/11 \\ 2/11 & -1 & 1/11 \\ 1/11 & -3 & 6/11 \end{bmatrix}
Inverses may be found through the use of elementary row operations. This
procedure not only yields the inverse when it exists, but also indicates
when the inverse does not exist. An algorithm for finding the inverse of
a matrix A is as follows:
1. Form the partitioned matrix [A |I ], where I is the identity matrix
having the same order as A.
2. Using elementary row operations, transform A into row-echelon form,
applying each row operation to the entire partitioned matrix formed
in Step 1. Denote the result as [ C |D ], where C is in row-echelon
form.
3. If C has a zero row, stop; the original matrix A is singular and
does not have an inverse. Otherwise continue; the original matrix is
invertible.
4. Beginning with the last column of C and progressing backward
iteratively through the second column, use elementary row operation
(3) to transform all elements above the diagonal of C to zero. Apply
each operation, however, to the entire matrix [ C |D ]. Denote the
result as [ I |B ]. The matrix B is the inverse of the original matrix
A.
If exact arithmetic is not used in Step 2, then a pivoting strategy should
be employed. No pivoting strategy is used in Step 4; the pivot is always
one of the unity elements on the diagonal of C. Interchanging any rows
after Step 2 has been completed will undo the work of that step and,
therefore, is not allowed.
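The algorithm above can be sketched in a few lines of Python. This is a compact variant that clears each column as it is pivoted on, rather than following the two-phase procedure literally, and it applies the partial pivoting recommended above (NumPy is our tooling assumption):

```python
import numpy as np

def invert(A):
    """Invert A by row-reducing the partitioned matrix [A | I].

    Returns None when a zero pivot column appears, i.e. when A is singular.
    """
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])  # Step 1: form [A | I]
    for c in range(n):
        # Partial pivoting: bring the largest remaining entry into row c.
        p = c + np.argmax(np.abs(M[c:, c]))
        if np.isclose(M[p, c], 0.0):
            return None                          # Step 3: A is singular
        M[[c, p]] = M[[p, c]]
        M[c] /= M[c, c]
        for r in range(n):                       # clear the rest of column c
            if r != c:
                M[r] -= M[r, c] * M[c]
    return M[:, n:]                              # the right half is A^{-1}

A = np.array([[5.0, 3.0], [2.0, 1.0]])           # the matrix of Example 1.23
print(invert(A))                                 # [[-1.  3.] [ 2. -5.]]
```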
In the examples below, the row reduction is done “by hand” first, as further examples
of using the elementary row operations.
■ Example 1.23 We use the elementary row operation algorithm to find the inverse
of
A = \begin{bmatrix} 5 & 3 \\ 2 & 1 \end{bmatrix}. \quad (1.15)
C does not have a zero row, so A is invertible, and we continue to find the inverse.
We replace row 1 by the sum of row 1 plus (−3/5) times row 2, to get
[ I |B ] = \begin{bmatrix} 1 & 0 & | & -1 & 3 \\ 0 & 1 & | & 2 & -5 \end{bmatrix}. \quad (1.20)
■ Example 1.24 We use the elementary row operation algorithm to find the inverse
of
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}. \quad (1.22)
The left side of this partitioned matrix is in row-echelon form. Since its third row
is a zero row, the original matrix A does not have an inverse. ■
■ Example 1.25 We use the elementary row operation algorithm to find the inverse
of
A = \begin{bmatrix} 0 & 1 & 1 \\ 5 & 1 & -1 \\ 2 & -3 & -3 \end{bmatrix}. \quad (1.28)
The left side of this partitioned matrix is now in row-echelon form and there are
no zero rows, so the inverse of the original matrix A exists. We start on column 3
and replace R2 by the sum of R2 and (−1) R3 to get
\begin{bmatrix} 1 & 1/5 & -1/5 & | & 0 & 1/5 & 0 \\ 0 & 1 & 0 & | & -13/4 & 2/4 & -5/4 \\ 0 & 0 & 1 & | & 17/4 & -2/4 & 5/4 \end{bmatrix}. \quad (1.35)
3. Back substitution: Solve the equations starting from the last row upwards
to find the values of the variables.
x + 2y + 3z = 9
2x + 3y + z = 8
3x + y + 2z = 7
z = 5/3, y = 5/3, x = 2/3
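The same answer drops out of a library solver; a minimal check with NumPy (our tooling assumption):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 1.0],
              [3.0, 1.0, 2.0]])
b = np.array([9.0, 8.0, 7.0])

x = np.linalg.solve(A, b)      # LAPACK-backed Gaussian elimination
print(x)                       # [0.6667 1.6667 1.6667] = (2/3, 5/3, 5/3)
print(np.allclose(A @ x, b))   # True
```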
LU Decomposition
LU decomposition factors a matrix A into the product of a lower triangular
matrix L and an upper triangular matrix U , such that A = LU .
Procedure
1. Decompose A into L and U .
2. Solve Ly = b for y using forward substitution.
3. Solve U x = y for x using back substitution.
For example, take
A = \begin{bmatrix} 2 & -1 & -2 \\ -4 & 6 & 3 \\ -4 & -2 & 8 \end{bmatrix}, \quad b = \begin{bmatrix} -1 \\ 5 \\ 2 \end{bmatrix}
1. Decompose A:
L = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -2 & -1 & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 2 & -1 & -2 \\ 0 & 4 & -1 \\ 0 & 0 & 3 \end{bmatrix}
2. Solve Ly = b by forward substitution:
y = \begin{bmatrix} -1 \\ 3 \\ 3 \end{bmatrix}
3. Solve U x = y by back substitution:
x = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
■
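SciPy exposes this factor-once, substitute-twice workflow directly. A sketch under the reconstruction above (the extracted example lost its statement of A and b, so both were rebuilt from the factors and should be treated as assumptions):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[ 2.0, -1.0, -2.0],
              [-4.0,  6.0,  3.0],
              [-4.0, -2.0,  8.0]])   # equals LU for the factors above
b = np.array([-1.0, 5.0, 2.0])

lu, piv = lu_factor(A)        # step 1: one factorization
x = lu_solve((lu, piv), b)    # steps 2-3: forward then back substitution
print(x)                      # [1. 1. 1.]
```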
Matrix Inversion
Matrix inversion involves finding the inverse A−1 of the matrix A such that
AA−1 = I.
Procedure
1. Find the inverse A−1 if it exists.
2. Solve x = A−1 b.
2. Solve x = A−1 b:
x = \begin{bmatrix} 3 & -3.5 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 5 \\ 3 \end{bmatrix} = \begin{bmatrix} 4.5 \\ 1 \end{bmatrix}
■
Eigenvalues and Eigenvectors
A scalar λ is an eigenvalue of a square matrix A, with associated eigenvector v ̸= 0, if
Av = λv
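Numerically, eigenpairs can be found with NumPy's eig routine; a minimal sketch with a matrix of our own choosing:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
vals, vecs = np.linalg.eig(A)

print(vals)                # eigenvalues 3 and 1 (order not guaranteed)
v = vecs[:, 0]             # eigenvector paired with vals[0]
print(np.allclose(A @ v, vals[0] * v))  # A v = lambda v -> True
```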
Definition 1.12 — Convex Sets. A set S is convex if for any two points in S,
the entire line segment between them is also contained in S. That is, for any
x, y ∈ S
λx + (1 − λ)y ∈ S for all λ ∈ [0, 1].
A set S is said to be convex if it contains the line segment joining any two points in
the set. In other words, a set S is convex if for any two points x, y ∈ S, the point
z = λx + (1 − λ)y is also in S for all λ ∈ [0, 1].
Properties of Convex Sets
Some important properties of convex sets include:
Convex and Polyhedral Sets
Convex Set: A set S in Rn is a convex set if the line segment joining any pair of points
a1 and a2 in S is completely contained in S, that is, λa1 + (1 − λ)a2 ∈ S, ∀λ ∈ [0, 1].
Polyhedral Set: A polyhedral set (or polyhedron) is the set of points in the
intersection of a finite set of half-spaces. Set S = {x : Ax ≤ b, x ≥ 0}, where A is an
m × n matrix, x is an n-vector, and b is an m-vector, is a polyhedral set defined by
m + n hyperplanes (i.e., the intersection of m + n half-spaces).
• Polyhedral sets are convex.
• A polytope is a bounded polyhedral set.
• A polyhedral cone is a polyhedral set where the hyperplanes (that define the
half-spaces) pass through the origin, thus C = {x : Ax ≤ 0} is a polyhedral
cone.
Edges and Faces: An edge of a polyhedral set S is defined by n − 1 hyperplanes,
and a face of S by one or more defining hyperplanes of S; thus an extreme point
and an edge are faces (an extreme point is a zero-dimensional face and an edge a
one-dimensional face). In R2 the only faces are edges and extreme points, but in R3
there is a third type of face, and so on.
Unbounded Sets:
Convex Cone: A Convex Cone is a convex set that consists of rays emanating
from the origin. A convex cone is completely specified by its extreme directions. If C
is a convex cone, then for any x ∈ C we have λx ∈ C for all λ ≥ 0.
Let’s define a procedure for finding the extreme directions, using the following
LP’s feasible region. Graphically, we can see that the extreme directions should
follow the s1 = 0 (red) line and the s3 = 0 (orange) line.
max z = −5x1 − x2
s.t. x1 − 4x2 + s1 = 0
−x1 + x2 + s2 = 1
−x1 + 2x2 + s3 = 4
x1 , x2 , s1 , s2 , s3 ≥ 0.
[Figure: the feasible region in the (x1 , x2 ) plane, bounded by the lines s1 = 0, s2 = 0, and s3 = 0.]
We look at some of the geometric properties of sets of points in this section. Consider
any two points v1 and v2 . Then the vector v1 + k(v2 − v1 ) lies on the line segment
joining v1 and v2 for k ∈ [0, 1]. Rearranging, we can write this as (1 − k)v1 + kv2 , or as
λ1 v1 + λ2 v2 where λ1 + λ2 = 1 and 0 ≤ λ1 , λ2 ≤ 1. What is interesting, however, is that
this generalizes to larger sets as well. If we consider a set of n points S = {v1 , . . . , vn },
then any point lying in the polygon with v1 , . . . , vn as its vertices can be written as
\sum_{i=1}^{n} λi vi , where \sum_{i=1}^{n} λi = 1 and 0 ≤ λi ≤ 1.
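A quick numerical illustration of such combinations, sketched with NumPy (the triangle is our own example): sampling nonnegative weights that sum to 1 and mixing the vertices always lands inside the polygon.

```python
import numpy as np

rng = np.random.default_rng(0)
V = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])  # triangle vertices v1..v3

# Dirichlet samples give lambda_i >= 0 with sum_i lambda_i = 1.
lam = rng.dirichlet(np.ones(3), size=5)
points = lam @ V    # each row is sum_i lambda_i * v_i
print(points)       # five points, all inside the triangle
```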
We now define the term convex combination.
Definition 1.14 A set of points S is called convex if for any subset S ′ of S and
for any point p which we get by convex combination of points in S ′ , p ∈ S.
Convex Hull: The convex hull of a set S is the smallest convex set that contains
S. It is denoted as co(S).
Separating Hyperplane: A separating hyperplane is a hyperplane that separates
two disjoint convex sets. It is a crucial concept in optimization theory.
Support Function: The support function of a convex set S at a point x is the
maximum value of the linear function f (y) = xT y over all y ∈ S.
Definition 1.15 — Convex Set. Let X ⊆ Rn . Then the set X is convex if and
only if for all pairs x1 , x2 ∈ X we have λx1 + (1 − λ)x2 ∈ X for all λ ∈ [0, 1].
then
x = \sum_{i=1}^{m} λi xi \quad (1.40)
R If you recall the definition of linear combination, we can see that we move
from the very general to the very specific as we go from linear combinations to
positive combinations to convex combinations. A linear combination of points
or vectors allowed us to choose any real values for the coefficients. A positive
combination restricts us to positive values, while a convex combination asserts
that those values must be non-negative and sum to 1.
Figure 1.1: Examples of Convex Sets: The set on the left (an ellipse and its interior)
is a convex set; every pair of points inside the ellipse can be connected by a line
contained entirely in the ellipse. The set on the right is clearly not convex as we’ve
illustrated two points whose connecting line is not contained inside the set.
be the set formed from the intersection of these sets. Choose x1 , x2 ∈ C and λ ∈ [0, 1].
Consider x = λx1 + (1 − λ)x2 . We know that x1 , x2 ∈ Ci for each i = 1, . . . , n by
definition of C. By convexity of each Ci , it follows that x ∈ Ci for each i. Therefore, x ∈ C.
Thus C is a convex set. ■
[Figure: (a) a convex function f (x, y) = x2 + y 2 ; (b) a non-convex function
f (x, y) = x2 + y 2 − 2(x − 0.3)3 − 2(y − 0.4)3 .]
Informally, a function is convex if whenever you draw a line segment between two
points on its graph, that segment lies above the graph.
[Figure: the chord from (x1 , f (x1 )) to (x2 , f (x2 )) lies above the graph of a convex
function; the point λx1 + (1 − λ)x2 lies between x1 and x2 .]
Formally, we can make this definition using the idea of convex combinations.
Convexity
A key property that will enable efficient algorithms is convexity. This comes in the
form of convex sets and convex functions. When the constraints of an optimization
problem form a convex set and the objective function is a convex function, we say
that it is a convex optimization problem.
Definition 1.19 Convex Combination Geometrically, a convex combination of two
points x and y is any point z that lies on the line segment between x and y.
Algebraically, a convex combination is any point z that can be represented as
z = λx + (1 − λ)y for some multiplier λ ∈ [0, 1].
[Figure: the point z = λx + (1 − λ)y lies on the segment joining x and y.]
Definition 1.20 Convex Set A set C is convex if it contains all convex combina-
tions of points in C. That is, for any x, y ∈ C, it holds that λx + (1 − λ)y ∈ C
for all λ ∈ [0, 1].
Definition 1.21 Epigraph The epigraph of f is the set {(x, y) : y ≥ f (x)}. This
is the set of all points "above" the function.
Theorem 1.4 f (x) is a convex function if and only if the epigraph of f is a convex
set.
General 2: Sum of Convex Functions is Convex If f and g are both convex, then
f + g is also convex.
General 3: Maximum of Convex Functions is Convex If f and g are both convex,
then max(f, g) is also convex.
Example: f1 (x) = e−x and f2 (x) = ex are convex. Therefore, f (x) = max(ex , e−x )
is also convex.
General 4: Composition If g is concave, and h is convex and non-increasing, then
f (x) = h(g(x)) is convex.
Example: g(x) = √x is concave (on [0, ∞)), and h(x) = e−x is convex and
non-increasing. Therefore, f (x) = e−√x is convex on x ∈ [0, ∞).
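These rules can be spot-checked by sampling the defining inequality; a hedged sketch in Python (a failed sample disproves convexity, while passing samples are only evidence, not a proof):

```python
import numpy as np

def seems_convex(f, lo=-3.0, hi=3.0, trials=1000, seed=0):
    """Randomly test f(l*x1 + (1-l)*x2) <= l*f(x1) + (1-l)*f(x2)."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(lo, hi, size=2)
        lam = rng.uniform()
        lhs = f(lam * x1 + (1 - lam) * x2)
        rhs = lam * f(x1) + (1 - lam) * f(x2)
        if lhs > rhs + 1e-9:
            return False
    return True

print(seems_convex(lambda x: np.maximum(np.exp(x), np.exp(-x))))  # True
print(seems_convex(np.sin))                                       # False
```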
Figure 1.6: A convex function: A convex function satisfies the expression f (λx1 +
(1 − λ)x2 ) ≤ λf (x1 ) + (1 − λ)f (x2 ) for all x1 and x2 and λ ∈ [0, 1].
To visualize this definition, simply flip Figure 1.7 upside down. The following theorem
is a powerful tool that can be used to show sets are convex. Its proof is outside the
scope of the class, but relatively easy.
Theorem 1.7 Let f : Rn → R be a convex function. Then the set C = {x ∈ Rn :
f (x) ≤ c}, where c ∈ R, is a convex set.
Exercise 1.1 Prove the Theorem 1.7. [Hint: Skip ahead and read the proof of
Lemma 1.3. Follow the steps in that proof, but apply them to f .] ■
[Figure: the two half-spaces (a) Hl and (b) Hu defined by a hyperplane.]
Using these definitions, we are now in a position to define polyhedral sets, which
will be the subject of our study for most of the remainder of this chapter.
Any set P = H1 ∩ H2 ∩ · · · ∩ Hm , where each Hi = {x | aiT x ≤ bi } is a half-space,
is a polyhedral set.
It should be clear that we can represent any polyhedral set using a matrix
inequality. The set P is defined by the set of vectors x satisfying:
Ax ≤ b, (1.53)
Exercise 1.2 Prove Theorem 1.8. [Hint: You can prove this by brute force, verifying
convexity. You can also be clever and use two results that we’ve proved in the
notes.] ■
■ Example 1.36 We will use the same point and direction as we did for a line in
Chapter 1. Let x0 = [2, 1]T and let d = [2, 2]T . Then the ray defined by x0 and
d is shown in Figure 2.4. The set of points is R = {(x, y) ∈ R2 : x = 2 + 2λ, y =
1 + 2λ, λ ≥ 0}.
Figure 1.9: A Ray: The points in the graph shown in this figure are in the set
produced using the expression x0 + dλ where x0 = [2, 1]T and d = [2, 2]T and λ ≥ 0.
Rays are critical for understanding unbounded convex sets. Specifically, a set is
unbounded, in a sense, only if you can show that it contains a ray. An interesting
class of unbounded convex sets are convex cones:
The fact that every convex cone contains the origin by Lemma 1.4 along with
the fact that for every point x ∈ C we have λx ∈ C (λ ≥ 0) implies that the ray
0 + λx ⊆ C. Thus, since every point x ∈ C must be on a ray, it follows that a convex
cone is just made up of rays beginning at the origin.
Another key element to understanding unbounded convex sets is the notion of
direction. A direction can be thought of as a “direction of travel” from a starting
point inside an unbounded convex set so that you (the traveler) can continue moving
forever and never leave the set.
{x : x = x0 + λd, λ ≥ 0} ⊆ C (1.54)
■ Example 1.37 Consider the unbounded convex set shown in Figure 1.11. This set
has direction [1, 0]T .
Figure 1.10: Convex Direction: Clearly every point in the convex set (shown in
blue) can be the vertex for a ray with direction [1, 0]T contained entirely in the
convex set. Thus [1, 0]T is a direction of this convex set.
To see this note that for any positive scaling parameter λ and for any vertex
point x0 , we can draw an arrow pointing to the right (in the direction of [1, 0]T )
with vertex at x0 scaled by λ that is entirely contained in the convex set. ■
Exercise 1.4 Prove the following: Let C ⊆ Rn be a convex cone and let x1 , x2 ∈ C.
If α, β ∈ R and α, β ≥ 0, then αx1 + βx2 ∈ C. [Hint: Use the definition of convex
cone and the definition of convexity with λ = 1/2, then multiply by 2.] ■
Exercise 1.5 Use Exercise 1.4 to prove that if C ⊆ Rn is a convex cone, then every
element x ∈ C (except the origin) is also a direction of C. ■
Theorem 1.9 Let
P = {x ∈ Rn : Ax ≤ b, x ≥ 0} (1.55)
be a polyhedral set in the positive orthant of Rn . Then d is a direction of P if and
only if
Ad ≤ 0, d ≥ 0, d ̸= 0. (1.56)
Proof. The fact that d ̸= 0 is clear from the definition of direction of a convex set.
Furthermore, d is a direction if and only if
A (x + λd) ≤ b (1.57)
x + λd ≥ 0 (1.58)
for all λ > 0 and for all x ∈ P (which is to say x ∈ Rn such that Ax ≤ b and x ≥ 0).
But then
Ax + λAd ≤ b
for all λ > 0. This can only be true if Ad ≤ 0. Likewise, x + λd ≥ 0 holds for all
λ > 0 if and only if d ≥ 0. This completes the proof. ■
Corollary 1.1 If
P = {x ∈ Rn : Ax = b, x ≥ 0} (1.59)
then d is a direction of P if and only if
Ad = 0, d ≥ 0, d ̸= 0. (1.60)
x1 − x2 ≤ 1
2x1 + x2 ≥ 6
x1 ≥ 0
x2 ≥ 0
This set is clearly unbounded as we showed in class and it has at least one direction.
The direction d = [0, 1]T pointing directly up is a direction of this set. This is
illustrated in Figure 1.12.
Figure 1.11: An Unbounded Polyhedral Set: This unbounded polyhedral set has
many directions. One direction is [0, 1]T .
Clearly d ≥ 0 and d ̸= 0. ■
Bϵ (x0 ) ∩ C ̸= ∅ and
Bϵ (x0 ) ∩ Rn \ C ̸= ∅
■ Example 1.39 A convex set, its boundary and a boundary point are illustrated
in Figure 1.13.
Proof. Suppose not; then x0 is not on the boundary and thus there is some ϵ > 0
so that Bϵ (x0 ) ⊂ C. Since Bϵ (x0 ) is a hypersphere, we can choose two points x1
and x2 on the boundary of Bϵ (x0 ) so that the line segment between these points
passes through the center of Bϵ (x0 ). But this center point is x0 . Therefore x0 is the
mid-point of x1 and x2 and since x1 , x2 ∈ C and λx1 + (1 − λ)x2 = x0 with λ = 1/2
it follows that x0 cannot be an extreme point, since it is a strict convex combination
of x1 and x2 . This completes the proof. ■
Most important in our discussion of linear programming will be the extreme
points of polyhedral sets that appear in linear programming problems. The following
theorem establishes the relationship between extreme points in a polyhedral set and
the intersection of hyperplanes in such a set.
Theorem 1.10 Let P ⊆ Rn be a polyhedral set and suppose P is defined as:
P = {x ∈ Rn : Ax ≤ b} (1.63)
or as:
P = {x ∈ Rn : Ax ≤ b, x ≥ 0} (1.64)
Then a point x0 ∈ P is an extreme point of P if and only if x0 lies at the
intersection of n linearly independent hyperplanes defining P .
Proof. (⇐) Suppose x0 lies at the intersection of n linearly independent hyperplanes
defining P , and suppose by way of contradiction that x0 is not an extreme point; that
is, there are points x, x̂ ∈ P and λ ∈ (0, 1) so that
x0 = λx + (1 − λ)x̂
Since n hyperplanes are binding at x0 , there is some G ∈ Rn×n whose rows are drawn
from A and a vector g whose entries are drawn from the vector b, so that Gx0 = g.
But then we have:
Gx0 = λGx + (1 − λ)Gx̂ = g (1.65)
and Gx ≤ g and Gx̂ ≤ g (since x, x̂ ∈ P ). But the only way for Equation 1.65 to
hold is if
1. Gx = g and
2. Gx̂ = g
The fact that the hyper-planes defining x0 are linearly independent implies that the
solution to Gx0 = g is unique. (That is, we have chosen n equations in n unknowns
and x0 is the solution to these n equations.) Therefore, it follows that x0 = x = x̂
and thus x0 is an extreme point since it cannot be expressed as a convex combination
of other points in P .
(⇒) By Lemma 1.5, we know that any extreme point x0 lies on the boundary of
P and therefore there is at least one row Ai· such that Ai· x0 = bi (otherwise, clearly
x0 does not lie on the boundary of P ). By way of contradiction, suppose that x0
is the intersection of r < n linearly independent hyperplanes (that is, only these r
constraints are binding). Then there is a matrix G ∈ Rr×n whose rows are drawn
from A and a vector g whose entries are drawn from the vector b, so that Gx0 = g.
Linear independence of the hyperplanes implies that the rows of G are linearly
independent and therefore there is a non-zero solution to the equation Gd = 0. To
see this, apply Expression 5.51 and choose solution in which d is non-zero. Then we
can find an ϵ > 0 such that:
1. If x = x0 + ϵd, then Gx = g and all non-binding constraints at x0 remain
non-binding at x.
2. If x̂ = x0 − ϵd, then Gx̂ = g and all non-binding constraints at x0 remain
non-binding at x̂.
These two facts hold since Gd = 0 and if Ai· is a row of A with Ai· x0 < bi (or
x0 > 0), then there is at least one non-zero ϵ so that Ai· (x0 ± ϵd) < bi (or x0 ± ϵd > 0)
still holds and therefore (x0 ± ϵd) ∈ P . Since we have a finite number of constraints
that are non-binding, we may choose ϵ to be the smallest value so that the previous
statements hold for all of them. Finally we can choose λ = 1/2 and see that
x0 = λx + (1 − λ)x̂ and x, x̂ ∈ P . Thus x0 cannot have been an extreme point,
contradicting our assumption. This completes the proof. ■
Let P = {x ∈ Rn : Ax ≤ b} be a polyhedral set. If G and g are formed from rows of
A and the corresponding entries of b, then the set
X = {x ∈ Rn : Gx = g and Ax ≤ b} (1.66)
is a face of P .
R Based on this definition, we can easily see that an extreme point, which is the
intersection of n linearly independent hyperplanes, is a face of dimension zero.
■ Example 1.40 Consider the polyhedral set defined by the system of inequalities:
3x1 + x2 ≤ 120
x1 + 2x2 ≤ 160
(28/16) x1 + x2 ≤ 100
x1 ≤ 35
x1 ≥ 0
x2 ≥ 0
Figure 1.13: A Polyhedral Set: This polyhedral set is defined by five half-spaces
and has a single degenerate extreme point located at the intersection of the binding
constraints 3x1 + x2 ≤ 120, x1 + 2x2 ≤ 160 and (28/16) x1 + x2 ≤ 100. All faces are
shown in bold.
The extreme points of the polyhedral set are shown as large diamonds and
correspond to intersections of binding constraints. Note the extreme point (16, 72) is
degenerate since it occurs at the intersection of three binding constraints: 3x1 + x2 ≤
120, x1 + 2x2 ≤ 160 and (28/16) x1 + x2 ≤ 100. All the faces of the polyhedral set are
shown in bold. They are locations where one constraint (or half-space) is binding.
An example of a pair of adjacent extreme points is (16, 72) and (35, 15), as they
are connected by the edge defined by the binding constraint 3x1 + x2 ≤ 120. ■
Exercise 1.7 Consider the polyhedral set defined by the system of inequalities:
4x1 + x2 ≤ 120
x1 + 8x2 ≤ 160
x1 + x2 ≤ 30
x1 ≥ 0
x2 ≥ 0
Identify all extreme points and edges in this polyhedral set and their binding
constraints. Are any extreme points degenerate? List all pairs of adjacent extreme
points. ■
Extreme Directions
We have already seen by Theorem 1.9 that if P is a polyhedral set in the positive
orthant of Rn with form:
P = {x ∈ Rn : Ax ≤ b, x ≥ 0}
then a direction d of P is characterized by the set of inequalities and equations
Ad ≤ 0, d ≥ 0, d ̸= 0.
Clearly two directions d1 and d2 with d1 = λd2 for some λ ≥ 0 may both satisfy this
system. To isolate a unique set of directions, we can normalize and construct the set:
D = {d ∈ Rn : Ad ≤ 0, d ≥ 0, eT d = 1} (1.67)
here we are interested only in directions satisfying eT d = 1. This is a normalizing
constraint that will choose only vectors whose components sum to 1.
Theorem 1.11 A direction d ∈ D is an extreme direction of P if and only if d is
an extreme point of D when D is taken as a polyhedral set.
Proof. (⇐) Suppose that d is an extreme point of D (as a polyhedral set) and not
an extreme direction of P . Then there exist two distinct directions d1 and d2 of P and
two constants λ1 and λ2 with λ1 , λ2 > 0 so that d = λ1 d1 + λ2 d2 . Without loss of
generality, we may assume that d1 and d2 are vectors satisfying eT di = 1 (i = 1, 2).
If not, then we can scale them so their components sum to 1 and adjust λ1 and λ2
accordingly. But this implies that:
1 = eT d = λ1 eT d1 + λ2 eT d2 = λ1 + λ2
Further, the fact that d1 and d2 are directions of P implies they must be in D. Thus
we have found a convex combination of elements of D that equals d, contradicting
our assumption that d was an extreme point.
■ Example 1.41 Let’s consider Example 1.36 again. The polyhedral set in this
example was defined by the A matrix:
A = \begin{bmatrix} 1 & -1 \\ -2 & -1 \end{bmatrix}
D = {d ∈ Rn : Ad ≤ 0, d ≥ 0, eT d = 1}
d1 − d2 ≤ 0
−2d1 − d2 ≤ 0
d1 + d2 = 1
d1 ≥ 0
d2 ≥ 0
The feasible region (which is really only the line d1 + d2 = 1) is shown in red in
Figure 1.15.
Figure 1.14: Visualization of the set D: This set really consists of the set of points
on the red line. This is the line where d1 + d2 = 1 and all other constraints hold.
This line has two extreme points (0, 1) and (1/2, 1/2).
The critical part of this figure is the red line. It is the true set D. As a line, it
has two extreme points: (0, 1) and (1/2, 1/2). Note that the extreme point (0, 1) is
exactly the direction [0, 1]T we illustrated in Example 1.36. ■
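The two extreme points of D can also be recovered computationally: optimizing any linear objective over the polyhedral set D lands on one of its extreme points. A sketch using scipy.optimize.linprog (our tooling assumption; the text solves this graphically):

```python
import numpy as np
from scipy.optimize import linprog

# The set D from Example 1.41: Ad <= 0, d >= 0, e^T d = 1.
A_ub = np.array([[1.0, -1.0],    # d1 - d2 <= 0
                 [-2.0, -1.0]])  # -2 d1 - d2 <= 0
b_ub = np.zeros(2)
A_eq = np.array([[1.0, 1.0]])    # d1 + d2 = 1
b_eq = np.array([1.0])

for c in ([1.0, 0.0], [-1.0, 0.0]):   # minimize d1, then maximize d1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None), (0, None)])
    print(res.x)   # [0. 1.] and then [0.5 0.5]: the two extreme points
```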
Exercise 1.8 Show that d = [1/2, 1/2]T is a direction of the polyhedral set P from
Example 1.36. Now find a non-extreme direction (whose components sum to
1) using the feasible region illustrated in the previous example. Show that the
direction you found is a direction of the polyhedral set. Create a figure like Figure
1.12 to illustrate both these directions. ■
Lemma 1.6. Let P be a non-empty polyhedral set. Then the set of directions of P
is empty if and only if P is bounded.
Proof. Clearly if P is bounded then it cannot have a direction: if P were contained
in a ball Br (x0 ) then we know that for every x ∈ P we have |x − x0 | < r. If d were a
direction of P , then we would have x + λd ∈ P for all λ > 0, and we could simply
choose λ large enough so that |x + λd − x0 | > r, a contradiction.
If P has no directions, then there is some absolute upper bound on the value of
|x| for all x ∈ P . Let r be this value. Then trivially, Br+1 (0) contains P and so P is
bounded. ■
Lemma 1.7. Let P be a non-empty unbounded polyhedral set. Then the number of
extreme directions of P is finite and non-zero.
Proof. The result follows immediately from Theorem 1.11 and Lemma 1.6. ■
P = {x ∈ Rn : Ax ≤ b, x ≥ 0}
(where we assume A is not an empty matrix). Suppose that P has extreme points
x1 , . . . , xk and extreme directions d1 , . . . , dl . If x ∈ P , then there exists constants
λ1 , . . . , λk and µ1 , . . . , µl such that:
x = \sum_{i=1}^{k} λi xi + \sum_{j=1}^{l} µj dj
\sum_{i=1}^{k} λi = 1 \quad (1.69)
λi ≥ 0, i = 1, . . . , k
µj ≥ 0, j = 1, . . . , l
P = P ∩ {x ∈ Rn : eT x ≤ M } (1.70)
θv = eT (xv − xi(v) )
and let
d = (xv − xi(v) )/θv
G(xv − xv(i) ) = Gd = 0
and therefore, there are n − 1 linearly independent binding hyperplanes in the system
Ad ≤ 0, d ≥ 0. At last we see that with eT d = 1 that d must be an extreme point of
D and therefore an extreme direction of P . Let dj(v) = d be this extreme direction.
Thus we have:
xv = xi(v) + θv dj(v)
At last we can see that by substituting this into Expression 1.73 for each such v
and arbitrarily letting i(v) = j(v) = 1 if δv = 0 (in which case it doesn’t matter), we
obtain:
x = \sum_{j=1}^{k} δj xj + \sum_{v=k+1}^{k+u} δv xi(v) + \sum_{v=k+1}^{k+u} δv θv dj(v) \quad (1.74)
[Figure: a polyhedral set with extreme points x1 , . . . , x5 and extreme direction d1 .
In the bounded case, a point is expressed as x = µx5 + (1 − µ)(λx2 + (1 − λ)x3 );
in the unbounded case, as x = λx2 + (1 − λ)x3 + θd1 .]
This example illustrates simply how one could construct an expression for an
arbitrary point x inside a polyhedral set in terms of extreme points and extreme
directions. ■
P = {x ∈ Rn : Ax ≤ b}
To prove the convexity of P , we’ll use the definition of convexity: For any
x, y ∈ P and any λ ∈ [0, 1], the point z = λx + (1 − λ)y must also belong to P .
Let x, y ∈ P . Then, by the definition of P , we have:
Ax ≤ b and Ay ≤ b
Now consider z = λx + (1 − λ)y for λ ∈ [0, 1]. Then:
Az = A(λx + (1 − λ)y) = λAx + (1 − λ)Ay ≤ λb + (1 − λ)b = b
The inequality above uses the fact that Ax ≤ b and Ay ≤ b and that λ and 1 − λ are
non-negative. Hence z ∈ P , and P is convex.
Lemma 1.8. Intersection of Convex Sets is Convex Let C1 and C2 be convex sets.
Then the intersection C1 ∩ C2 is convex. In particular,
C1 ∩ C2 := {x : x ∈ C1 and x ∈ C2 }.
Figure 1.16: The green intersection of the convex sets that are the ball and the
polytope is also convex. This can be seen by considering any points x, y ∈ Ball ∩
Polytope. Since Ball is convex, the line segment between x and y is completely
contained in Ball. And similarly, the line segment is completely contained in Polytope.
Hence, the line segment is also contained in the intersection. This is how we can
reason that the intersection is also convex.
■ Example 2.1 Let’s recall a simple optimization problem from differential calculus
(Math 140): Goats are an environmentally friendly and inexpensive way to control
a lawn when there are lots of rocks or lots of hills. (Seriously, both Google and
some U.S. Navy bases use goats on rocky hills instead of paying lawn mowers!)
Suppose I wish to build a pen to keep some goats. I have 100 meters of fencing
and I wish to build the pen in a rectangle with the largest possible area. How long
should the sides of the rectangle be? In this case, making the pen better means
making it have the largest possible area.
Figure 2.1: Goat pen with unknown side lengths. The objective is to identify the
values of x and y that maximize the area of the pen (and thus the number of goats
that can be kept).
2x + 2y = 100 (2.1)
because 2x + 2y is the perimeter of the pen and I have 100 meters of fencing to
build my pen. The area of the pen is A(x, y) = xy. We can use Equation 2.1 to
solve for x in terms of y. Thus we have:
y = 50 − x (2.2)
and A(x) = x(50 − x). To maximize A(x), recall we take the first derivative of A(x)
with respect to x, set this derivative to zero and solve for x:
\frac{dA}{dx} = 50 − 2x = 0 \quad (2.3)
Thus, x = 25 and y = 50 − x = 25. We further recall from basic calculus how to
confirm that this is a maximum; note:
\frac{d^2 A}{dx^2}\bigg|_{x=25} = −2 < 0 \quad (2.4)
Which implies that x = 25 is a local maximum for this function. Another way of
seeing this is to note that A(x) = 50x − x2 is an “upside-down” parabola. As we
could have guessed, a square will maximize the area available for holding goats. ■
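The calculus answer is easy to corroborate numerically; a sketch with SciPy (our tooling assumption, not part of the example):

```python
from scipy.optimize import minimize_scalar

# Area after eliminating y via the constraint 2x + 2y = 100.
def area(x):
    return x * (50 - x)

res = minimize_scalar(lambda x: -area(x), bounds=(0, 50), method="bounded")
print(res.x)        # ~25.0: the optimal pen is a 25 m x 25 m square
print(area(res.x))  # ~625.0 square meters
```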
Exercise 2.1 A canning company is producing canned corn for the holidays. They
have determined that each family prefers to purchase their corn in units of 12
fluid ounces. Assuming that metal costs 1 cent per square inch and 1 fluid ounce
is about 1.8 cubic inches, compute the ideal height and radius for a can of corn
assuming that cost is to be minimized. [Hint: Suppose that our can has radius
r and height h. The formula for the surface area of a can is 2πrh + 2πr2 . Since
metal is priced by the square inch, the cost is a function of the surface area. The
volume of the can is πr2 h and is constrained. Use the same trick we did in the
example to find the values of r and h that minimize cost.] ■
R Clearly Definition 2.1 is valid only for domains and functions where the
concept of a neighborhood is defined and understood. In general, S must be
a topologically connected set (as it is in a neighborhood in Rn ) in order for
this definition to be used or at least we must be able to define the concept of
neighborhood on the set.
Exercise 2.2 Using analogous reasoning write a definition for a global and local
minimum. [Hint: Think about what a minimum means and find the correct
direction for the ≥ sign in the definition above.] ■
In Example 2.1, we are constrained in our choice of x and y by the fact that
2x + 2y = 100. This is called a constraint of the optimization problem. More
specifically, it’s called an equality constraint. If we did not need to use all the fencing,
then we could write the constraint as 2x + 2y ≤ 100, which is called an inequality
constraint. In complex optimization problems, we can have many constraints. The
set of all points in Rn for which the constraints are true is called the feasible set (or
feasible region). Our problem is to decide the best values of x and y to maximize
the area A(x, y). The variables x and y are called decision variables.
We have formulated the general maximization problem in Problem 2.5. Suppose that
we are interested in finding a value that minimizes an objective function z(x1 , . . . , xn )
subject to certain constraints. Then we can write Problem 2.5 replacing max with
min.
Exercise 2.3 Write the problem from Exercise 2.1 as a general minimization
problem. Add any appropriate non-negativity constraints. [Hint: You must
change max to min.] ■
An alternative and useful definition for the dot product is given by the following
formula. Let θ be the angle between the vectors x and y. Then the dot product of x
and y may be alternatively written as:
x · y = ||x|| ||y|| cos θ
This fact can be proved using the law of cosines from trigonometry. As a result, we
have the following small lemma:
Lemma 2.1. Let x, y ∈ Rn . Then the following hold:
1. The angle between x and y is less than π/2 (i.e., acute) iff x · y > 0.
2. The angle between x and y is exactly π/2 (i.e., the vectors are orthogonal) iff
x · y = 0.
3. The angle between x and y is greater than π/2 (i.e., obtuse) iff x · y < 0.
Exercise 2.5 Use the value of the cosine function and the fact that x · y =
||x|| ||y|| cos θ to prove the lemma. [Hint: For what values of θ is cos θ > 0?] ■
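To see the lemma in action, here is a short Python sketch (numpy assumed; the helper name angle_type is ours, not the text's) that classifies the angle between two vectors purely from the sign of their dot product:

    import numpy as np

    def angle_type(x, y):
        """Classify the angle between x and y using only the sign of x . y."""
        d = np.dot(x, y)
        if d > 0:
            return "acute"
        if d < 0:
            return "obtuse"
        return "orthogonal"

    print(angle_type(np.array([1, 0]), np.array([1, 1])))   # acute
    print(angle_type(np.array([1, 0]), np.array([0, 1])))   # orthogonal
    print(angle_type(np.array([1, 0]), np.array([-1, 1])))  # obtuse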
When z : D ⊆ R → R, the graph is precisely what you’d expect. It’s the set of pairs
(x, y) ∈ R2 so that y = z(x). This is the graph that you learned about back in Algebra
1.
■ Example 2.3 Consider the function z = x^2 + y^2. The level set of z at 4 is the set
of points (x, y) ∈ R^2 such that:
x^2 + y^2 = 4 (2.11)
You will recognize this as the equation of a circle with radius 2. We illustrate this
in the following two figures. Figure 2.2 shows the level sets of z as they sit on the
3D plot of the function, while Figure 2.3 shows the level sets of z in R2 . The plot
in Figure 2.3 is called a contour plot.
Figure 2.2: Plot with Level Sets Projected on the Graph of z. The level sets
exist in R^2 while the graph of z exists in R^3. The level sets have been projected
onto their appropriate heights on the graph.
Figure 2.3: Contour Plot of z = x^2 + y^2. The circles in R^2 are the level sets of the
function. The lighter the circle hue, the higher the value of c that defines the level
set.
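A contour plot in the spirit of Figure 2.3 takes only a few lines of Python (matplotlib assumed available); each level c below is the circle x^2 + y^2 = c, of radius √c:

    import numpy as np
    import matplotlib.pyplot as plt

    # Contour plot of z = x^2 + y^2.
    xs, ys = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
    zs = xs**2 + ys**2

    cs = plt.contour(xs, ys, zs, levels=[1, 2, 4, 8])  # the level set at 4 is the circle of radius 2
    plt.clabel(cs)                 # label each level curve with its value of c
    plt.gca().set_aspect("equal")  # circles should look like circles
    plt.show()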
Definition 2.5 (Line) Let x0 , v ∈ Rn . Then the line defined by vectors x0 and
v is the function l(t) = x0 + tv. Clearly l : R → Rn . The vector v is called the
direction of the line.
■ Example 2.4 Let x0 = (2, 1) and let v = (2, 2). Then the line defined by x0 and v
is shown in Figure 2.4. The set of points on this line is the set L = {(x, y) ∈ R2 :
x = 2 + 2t, y = 1 + 2t, t ∈ R}.
Figure 2.4: A Line Function: The points in the graph shown in this figure are in
the set produced using the expression x0 + vt where x0 = (2, 1) and v = (2, 2).
(d/dt) z(x0 + tv) |_{t=0} (2.12)
Exercise 2.6 Prove Proposition 2.1. [Hint: Use the definition of derivative for a
univariate function and apply it to the definition of directional derivative and
evaluate t = 0.] ■
∇z(x0) · v (2.15)
Proof. Let l(t) = x0 + vt. Then l(t) = (l1 (t), . . . , ln (t)); that is, l(t) is a vector function
whose ith component is given by li (t) = x0i + vi t.
Apply the chain rule:
(d/dt) z(x0 + tv) |_{t=0} = ∇z(x0) · v (2.18)
We now come to the two most important results about gradients: (i) the fact that
they always point in the direction of steepest ascent of a function, and (ii) the
fact that they are perpendicular (normal) to the level curves of a function. We
can exploit these facts as we seek to maximize (or minimize) functions.
(because we assumed v was a unit vector) where θ is the angle between the vectors
∇z(x0 ) and v. The function cos θ is largest when θ = 0, that is when v and ∇z(x0 )
are parallel vectors. (If ∇z(x0 ) = 0, then the directional derivative is zero in all
directions.) ■
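A quick numerical check of these facts (a Python sketch with numpy; z below is the running example z(x, y) = x^4 + y^2 + 2xy) compares a finite-difference estimate of the directional derivative with ∇z(x0) · v:

    import numpy as np

    def z(p):
        x, y = p
        return x**4 + y**2 + 2*x*y

    def grad_z(p):
        x, y = p
        return np.array([4*x**3 + 2*y, 2*y + 2*x])

    x0 = np.array([1.0, 1.0])
    v = np.array([3.0, 4.0]) / 5.0   # a unit vector
    h = 1e-6
    # Central difference in t for z(x0 + t v) at t = 0.
    numeric = (z(x0 + h*v) - z(x0 - h*v)) / (2*h)
    print(numeric, grad_z(x0) @ v)   # both ~6.8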
Theorem 2.3 Let z : Rn → R be differentiable and let x0 lie in the level set S
defined by z(x) = k for fixed k ∈ R. Then ∇z(x0 ) is normal to the set S in the
sense that if v is a tangent vector at t = 0 of a path c(t) contained entirely in S
with c(0) = x0 , then ∇z(x0 ) · v = 0.
R Before giving the proof, we illustrate this theorem in Figure 2.5. The function
is z(x, y) = x^4 + y^2 + 2xy and x0 = (1, 1). At this point ∇z(x0) = (6, 4). We
include the tangent line to the level set at the point (1, 1) to illustrate the
normality of the gradient to the level curve at the point.
Figure 2.5: A Level Curve Plot with Gradient Vector: We’ve scaled the gradient
vector in this case to make the picture understandable. Note that the gradient
is perpendicular to the level set curve at the point (1, 1), where the gradient was
evaluated. You can also note that the gradient is pointing in the direction of steepest
ascent of z(x, y).
Proof. As stated, let c(t) be a curve in S. Then c : R → Rn and z(c(t)) = k for all
t ∈ R. Let v be the tangent vector to c at t = 0; that is:
dc(t)/dt |_{t=0} = v (2.20)
Differentiating z(c(t)) with respect to t using the chain rule and evaluating at t = 0
yields:
(d/dt) z(c(t)) |_{t=0} = ∇z(c(0)) · v = ∇z(x0) · v = 0 (2.21)
z(x, y) = k
where k ∈ R. We can compute the slope of any tangent line to this curve at
some point (x0, y0) with implicit differentiation. We have:
(d/dx) z(x, y) = (d/dx) k
yields:
∂z/∂x + (∂z/∂y)(dy/dx) = 0
Then the slope of the tangent line is given by:
dy/dx = −(∂z/∂x)/(∂z/∂y)
At the point (x0, y0) this slope is:
m = −z_x(x0, y0)/z_y(x0, y0)
and the tangent line itself has equation:
y − y0 = m(x − x0) (2.22)
We can compute a vector that is parallel to this line by taking two points on
the line, (x0 , y0 ) and (x1 , y1 ) and computing the vector (x1 − x0 , y1 − y0 ). We
know that:
y1 − y0 = m(x1 − x0 )
because any pair (x1 , y1 ) on the tangent line must satisfy Equation 2.22. Thus
we have the vector v = (x1 − x0 , m(x1 − x0 )) parallel to the tangent line. Now
we compute the dot product of this vector with the gradient of the function.
We obtain:
∇z(x0, y0) · v = z_x(x0, y0)(x1 − x0) + z_y(x0, y0) m (x1 − x0)
= z_x(x0, y0)(x1 − x0) − z_x(x0, y0)(x1 − x0) = 0
■ Example 2.5 Let’s demonstrate the previous remark and Theorem 2.3. Consider
the function z(x, y) = x4 + y 2 + 2xy with a point (x0 , y0 ). Any level curve of the
function is given by: x^4 + y^2 + 2xy = k. Taking the implicit derivative we obtain:
(d/dx)(x^4 + y^2 + 2xy) = (d/dx) k =⇒ 4x^3 + 2y (dy/dx) + 2y + 2x (dy/dx) = 0
Note that to properly differentiate 2xy implicitly, we needed to use the product
rule from calculus. Now, we can solve for the slope of the tangent line to the curve
at point (x0 , y0 ) as:
m = dy/dx = (−4x0^3 − 2y0)/(2y0 + 2x0)
y − y0 = m(x − x0 )
Using the same reasoning we did in the remark, a vector parallel to this line is
given by (x1 − x0 , y1 − y0 ) where (x1 , y1 ) is another point on the tangent line. Then
we know that:
y1 − y0 = m(x1 − x0 )
and thus our vector is v = (x1 − x0, m(x1 − x0)). Now, the gradient of
z(x, y) at (x0, y0) is:
∇z(x0, y0) = (4x0^3 + 2y0, 2y0 + 2x0)
Lastly we compute:
∇z(x0, y0) · v = (4x0^3 + 2y0)(x1 − x0) + (2y0 + 2x0)(m(x1 − x0))
= (4x0^3 + 2y0)(x1 − x0) + (2y0 + 2x0) · ((−4x0^3 − 2y0)/(2y0 + 2x0)) (x1 − x0)
= (4x0^3 + 2y0)(x1 − x0) + (−4x0^3 − 2y0)(x1 − x0) = 0
Thus, for any point (x0, y0) on a level curve of z(x, y) = x^4 + y^2 + 2xy we know that
the gradient at that point is perpendicular to a tangent line (vector) to the curve
at the point (x0, y0).
It is interesting to note that one can compute the slope of the tangent line (and
its equation) in Figure 2.5. Here (x0 , y0 ) = (1, 1), thus the slope of the tangent line
is:
m = (−4x0^3 − 2y0)/(2y0 + 2x0) = −6/4 = −3/2
Exercise 2.7 In this exercise you will use elementary calculus (and a little bit of
vector algebra) to show that the gradient of a simple function is perpendicular to
its level sets:
(a) Plot the level sets of z(x, y) = x^2 + y^2. Draw the gradient at the point
(x, y) = (2, 0). Convince yourself that it is normal to the level set x^2 + y^2 = 4.
(b) Now, choose any level set x^2 + y^2 = k. Use implicit differentiation to find
dy/dx. This is the slope of a tangent line to the circle x^2 + y^2 = k. Let
(x0 , y0 ) be a point on this circle.
(c) Find an expression for a vector parallel to the tangent line at (x0 , y0 ) [Hint:
you can use the slope you just found.]
(d) Compute the gradient of z at (x0 , y0 ) and use it and the vector expression
you just computed to show that two vectors are perpendicular. [Hint: use
the dot product.]
■
■ Example 2.6 — Continuation of Example 2.1. Let’s look at the level curves of
the objective function and their relationship to the constraints at the point of
optimality (x, y) = (25, 25). In Figure 2.6 we see the level curves of the objective
function (the hyperbolas) and the feasible region shown as shaded. The elements in
the feasible regions are all values for x and y for which 2x + 2y ≤ 100 and x, y ≥ 0.
You’ll note that at the point of optimality the level curve xy = 625 is tangent to
the line 2x + 2y = 100; i.e., the level curve of the objective function is tangent
to the binding constraint.
Figure 2.6: Level Curves and Feasible Region: At optimality the level curve of the
objective function is tangent to the binding constraints.
If you look at the gradient of A(x, y) at this point it has value (25, 25). We see
that it is pointing in the direction of increase for the function A(x, y) (as should be
expected), but more importantly let’s look at the gradient of the function 2x + 2y.
Its gradient is (2, 2), which is just a scaled version of the gradient of the objective
function. Thus the gradient of the objective function is just a dilation of the
gradient of the binding constraint. This is illustrated in Figure 2.7.
Figure 2.7: Gradients of the Binding Constraint and Objective: At optimality the
gradient of the binding constraints and the objective function are scaled versions
of each other.
The elements illustrated in the previous example are true in general. You may
have discussed a simple example of these when you talked about Lagrange Multipliers
in Vector Calculus (Math 230/231). We’ll revisit these concepts later when we talk
about duality theory for linear programs. We’ll also discuss the gradients of the
binding constraints with respect to optimality when we discuss linear programming.
Exercise 2.8 Plot the level sets of the objective function and the feasible region in
Exercise 2.1. At the point of optimality you identified, show that the gradient of
the objective function is a scaled version of the gradient (linear combination) of
the binding constraints. ■
(A subtle issue: we’ve written x, y ∈ R but actually x and y should be integers: you
can’t hire 1/2 of a yodelist. We’ll ignore this problem today and return to it near the
end of the semester.)
(But don’t worry about "equations" for now: we’ll limit ourselves to inequalities for
the moment.)
In linear algebra, we write down a system of equations as a single matrix equation.
In linear programming, we will also write down a system of inequalities as a single
matrix inequality. Here’s how we do it.
First, we should make sure all our variables are on one side, if we haven’t done
that already. In this example, that’s already been done. Second, let’s multiply the
second inequality by −1, so that all the inequalities are ≤ inequalities:
x + y ≤ 50
−y ≤ 20
2x − y ≤ 40
The column of values on the left-hand side can be written as a matrix multiplication:
[ x + y  ]   [ 1   1 ]
[  −y    ] = [ 0  −1 ] [ x ]
[ 2x − y ]   [ 2  −1 ] [ y ]
Putting a single "≤" between two vectors is something you might not be used to.
What it means is that every component of the vector on the left is less than or equal
to the corresponding component of the vector on the right.
What happens in general? Suppose our linear program has n variables that, for
lack of creativity, we will call x1 , x2 , . . . , xn . We can put all these variables together
into a column vector x ∈ Rn . Then any collection of m linear inequalities in x1 , . . . , xn
can be combined into a matrix inequality Ax ≤ b where A is an m × n matrix and b
is an m × 1 vector.
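In code, “every component on the left is at most the matching component on the right” is a one-liner; here is a small Python sketch (numpy assumed) of the example’s matrix inequality:

    import numpy as np

    # The three inequalities of the example as one matrix inequality A x <= b.
    A = np.array([[1, 1], [0, -1], [2, -1]])
    b = np.array([50, 20, 40])

    def is_feasible(point):
        """True when every component of A @ point is <= the matching component of b."""
        return bool(np.all(A @ point <= b))

    print(is_feasible(np.array([30, 20])))  # True: inside the feasible region
    print(is_feasible(np.array([25, 30])))  # False: violates x + y <= 50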
The set {x ∈ Rn : Ax ≤ b} of all x which satisfy the constraints is called the
feasible region. In our example, the feasible region is shown below (it extends
infinitely far to the left):
[Figure: the feasible region, bounded by the lines x + y = 50, 2x − y = 40, and y = −20.]
A point x in the feasible region is called a feasible solution. You should think
of it as follows: a point (x, y) in this region is a feasible decision you could make
(even if it loses your company a lot of money), whereas a point (x, y) outside this
region is just not an option you could choose.
(Technically, one of these is a scalar and one of these is a 1 × 1 matrix, but we will
often ignore the difference.)
More generally, when we have a vector of variables x ∈ Rn , we can write the
objective function as cT x for some constant vector c ∈ Rn .
Putting together these ideas, any linear program can be written as
maximize  c^T x
x ∈ R^n
subject to Ax ≤ b.
What about minimizing? Well, minimizing cT x would be the same as maximizing its
negative (−c)T x. We will encounter both kinds of linear programs in class, but we
don’t lose any generality by focusing on one kind whenever it’s convenient.
Whether we’re minimizing or maximizing, a point x ∈ Rn with the best value of
the objective function is called an optimal solution. In our example, the point
(x, y) = (30, 20) is the unique optimal solution, as we’ll see in a moment.
Pick a small value of 1000x − 300y (such as 16000) and the feasible points with
that value of 1000x − 300y are a line segment. Pick a large value of 1000x − 300y (such as
28000) and there are no feasible points with that value of 1000x − 300y. But when
1000x − 300y = 24000, just before the value becomes impossible, the segment shrinks
to a single point: a corner of the feasible region.
Without drawing this picture and lots of carefully measured parallel lines, all
we know is that this happens at some corner. Where are the corners? Well, at each
corner, two of our boundary lines intersect. So we can try taking our boundaries two
at a time, and seeing where they intersect:
• x + y = 50 and 2x − y = 40 when (x, y) = (30, 20). This is one of our corners:
the top one.
• 2x − y = 40 and y = −20 when (x, y) = (10, −20). This is the lower of the two
corners.
• x + y = 50 and y = −20 when (x, y) = (70, −20). This is not actually a corner:
two boundaries intersect here, but the inequality 2x − y ≤ 40 does not hold.
Now we can compare the values of 1000x − 300y at (30, 20) and (10, −20). The
first corner turns out to be better than the second, so that’s our optimal solution.
(There’s actually one more important thing to check, which we’ll get to in a bit, but
in this case it doesn’t affect the answer.)
This is the “naive" approach to solving linear programs. It’s quick to explain,
and for small examples, especially ones you can draw in the plane, it may be the
easiest thing to do.
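For the record, the naive approach can be mechanized in a short Python sketch (numpy assumed): intersect the boundary lines two at a time, discard infeasible intersections, and compare objective values.

    import numpy as np
    from itertools import combinations

    A = np.array([[1.0, 1.0], [2.0, -1.0], [0.0, -1.0]])  # x + y <= 50, 2x - y <= 40, -y <= 20
    b = np.array([50.0, 40.0, 20.0])
    c = np.array([1000.0, -300.0])                        # objective 1000x - 300y

    corners = []
    for i, j in combinations(range(len(b)), 2):
        try:
            p = np.linalg.solve(A[[i, j]], b[[i, j]])     # where boundaries i and j meet
        except np.linalg.LinAlgError:
            continue                                      # parallel boundaries never meet
        if np.all(A @ p <= b + 1e-9):                     # keep only feasible intersections
            corners.append(p)

    best = max(corners, key=lambda p: c @ p)
    print(best, c @ best)   # [30. 20.] 24000.0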
Imagine, however, that you have a linear program with 50 variables and 100
inequalities. (This is a "tiny" linear program: my computer solves one of these
in approximately 0.03 seconds.) With the naive approach, there are (100 choose 50) =
100 891 344 545 564 193 334 812 497 256 combinations of 50 of the equations bounding
the region. Each of these combinations (in general) intersects at a single point, so
we need to compare that many points to find the best one.
Our goal in this class is going to be to try to do less work than this. Ahead of us
is the simplex method, which starts at one vertex of a linear program, and moves
from vertex to vertex until it finds the best one: hopefully long before it visits all
the vertices. We won’t solve problems with 100 inequalities, but computers solve
such problems in a very similar way.
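For comparison, a library solver dispatches the same small example instantly; a minimal scipy sketch (linprog minimizes, so we negate the objective):

    from scipy.optimize import linprog

    res = linprog(c=[-1000, 300],
                  A_ub=[[1, 1], [2, -1], [0, -1]],
                  b_ub=[50, 40, 20],
                  bounds=[(None, None), (None, None)])  # x and y are free variables here
    print(res.x, -res.fun)   # [30. 20.] 24000.0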
Either way, even the naive approach needs to worry about this: we should
check that in the direction that our region extends forever, the solutions keep
getting worse and not better. We won’t go into detail about how to check this,
because we won’t be using the naive approach.
3. The optimal solution might not exist, because there are no feasible
solutions. Imagine that the constraints we have contradict each other: there
is no way to satisfy all of them. This is a time to rethink our model and see if
we can relax some constraints. (Maybe union negotiations have actually forced
us to acquire more office space before they can be satisfied, for example.)
(x + 2)2 + y 2 ≤ 1 (x − 2)2 + y 2 ≤ 1
No matter how you try, you can never draw a linear inequality that includes both
of these disks, but excludes the origin, (0, 0). Here’s a formal proof. Suppose you
have any system of inequalities Ax ≤ b that includes both disks. Then in particular
it includes the points (−2, 0) and (2, 0) at their centers. So
A(−2, 0)^T ≤ b and A(2, 0)^T ≤ b  =⇒  A(−2, 0)^T + A(2, 0)^T ≤ 2b
=⇒  A(0, 0)^T ≤ 2b
and since A(0, 0)^T is the zero vector, this says 0 ≤ 2b, hence 0 ≤ b: the origin
satisfies Ax ≤ b after all, a contradiction.
maximize  c1x1 + c2x2 + · · · + cnxn
x1,...,xn ∈ R
This is convenient to deal with, because linear algebra gives us a lot of tools for
understanding the system of equations Ax = b. We just need to figure out what
happens when we also require x ≥ 0.
(In particular, this is the form of linear program that the simplex method will
use: this method is built on top of Gaussian elimination for solving the system of
equations Ax = b.)
In some cases, we want to go the other way: we want to turn equations into
inequalities. This is also possible. To write down an equation
a1x1 + a2x2 + · · · + anxn = b,
we can instead write down the pair of inequalities
a1x1 + a2x2 + · · · + anxn ≤ b and a1x1 + a2x2 + · · · + anxn ≥ b.
This means we can express any linear program using only inequalities, and no
equations at all.
In this lecture, I want to go over using Gaussian elimination to do this, and some
finer points of the algorithm that we’ll need to know for this class.
We begin by deciding that x will be the basic variable for the first equation.
Having made this decision:
1. We scale the first equation so that the coefficient of x is 1. We get
x + (1/3)y = 2
2x − y = −1
2. We subtract twice the first equation from the second, so that x is eliminated.
(In general, we do this to eliminate the basic variable from every other equation.)
We get
x + (1/3)y = 2
−(5/3)y = −5
Next, we move on to the second equation. We pick a basic variable there as well; it
can only be y, because that’s the only variable contained in the equation. Again:
1. We scale the second equation so that the coefficient of y is 1. We get
x + (1/3)y = 2
y = 3
2. To clear y from the first equation, we subtract 1/3 of the second equation. We
get
x = 1
y = 3
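The same elimination can be replayed with numpy on the augmented matrix; the sketch below starts from 3x + y = 6 and 2x − y = −1 (the original system, reconstructed here from the scaled form x + (1/3)y = 2 above, so treat it as an assumption):

    import numpy as np

    M = np.array([[3.0, 1.0, 6.0],
                  [2.0, -1.0, -1.0]])   # augmented matrix [A | b]

    M[0] /= M[0, 0]           # scale row 0 so the coefficient of x is 1
    M[1] -= M[1, 0] * M[0]    # eliminate x from row 1
    M[1] /= M[1, 1]           # scale row 1 so the coefficient of y is 1
    M[0] -= M[0, 1] * M[1]    # eliminate y from row 0
    print(M)                  # [[1. 0. 1.], [0. 1. 3.]]  ->  x = 1, y = 3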
     x1   x2   x3   x4
      2    1   −2    1  |  0
      3   −1    0    1  |  5
When an equation has a basic variable, it helps to annotate that row with its basic
variable. This is especially important when expressing the basic variables in terms
of the non-basic variables, since that information does not exist anywhere else! For
example,
x1 = 1 + (2/5)x3 − (2/5)x4                       x3     x4
x2 = −2 + (6/5)x3 − (1/5)x4    becomes   x1 |  1 |  2/5 | −2/5
                                         x2 | −2 |  6/5 | −1/5
Next, choose x3 as the basic variable in the second equation. We should divide by
−2, and then add twice what we get from the first equation. (Equivalently, subtract
the second equation from the first, then divide the second equation by −2.) We get:
−3x1 + x2 − x4 = −5
−(5/2)x1 + x3 − x4 = −5/2
To read off x2 , x3 in terms of x1 , x4 , we can move those terms to the other side,
getting
x2 = −5 + 3x1 + x4
x3 = −5/2 + (5/2)x1 + x4
To minimize effort, we can take this solution as a starting point. Let’s begin by
eliminating x3 from the second equation. To do this, just subtract 3 times the first
equation:
x1 = 1 + (2/5)x3 − (2/5)x4
x2 − 3x1 = −5 + x4
We want x3 on the left-hand side and x1 on the right-hand side, so just move those
terms (in both equations)
−(2/5)x3 = 1 − x1 − (2/5)x4
x2 = −5 + 3x1 + x4
In this example, with only two equations, this does not seem like more effort than
solving from scratch. This approach (which we’ll call pivoting in the future) shines
if we have many equations, and we are only making a minor change to the set of
basic variables.
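A pivot step is the same scale-and-eliminate move every time, so it is worth writing once. Below is a minimal Python sketch (numpy assumed; the helper name pivot is ours) applied to the running system:

    import numpy as np

    def pivot(M, row, col):
        """Make the variable in column `col` basic in equation `row`."""
        M = M.astype(float).copy()
        M[row] /= M[row, col]                 # scale so the pivot entry is 1
        for r in range(M.shape[0]):
            if r != row:
                M[r] -= M[r, col] * M[row]    # clear the column in every other row
        return M

    # The running example: 2x1 + x2 - 2x3 + x4 = 0 and 3x1 - x2 + x4 = 5.
    M = np.array([[2, 1, -2, 1, 0],
                  [3, -1, 0, 1, 5]])
    M = pivot(M, 0, 0)   # x1 becomes basic in the first equation
    M = pivot(M, 1, 1)   # x2 becomes basic in the second equation
    print(M)             # rows read: x1 = 1 + (2/5)x3 - (2/5)x4, x2 = -2 + (6/5)x3 - (1/5)x4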
We want to solve for x2 and x3 , so just take the second and third columns of the
coefficient matrix on the left:
[  1  −2 ]
[ −1   0 ]
To get a system of equations in which x2 and x3 are the basic variables, find the
inverse of this matrix:
[  1  −2 ]^(−1)   =   (1 / (1·0 − (−2)·(−1))) [ 0  2 ]   =   [  0     −1  ]
[ −1   0 ]                                    [ 1  1 ]       [ −1/2  −1/2 ]
This simplifies to
[ −3    1  0  −1 ] [x1]   [ −5   ]
[ −5/2  0  1  −1 ] [x2] = [ −5/2 ]
                   [x3]
                   [x4]
and now we directly have the row-reduced form of the system of equations. Moving
from matrices back to equations, this result tells us that:
−3x1 + x2 − x4 = −5
−(5/2)x1 + x3 − x4 = −5/2
All that’s left is to isolate x2 in the first equation and x3 in the second, and we’ll
have the same solution we’ve found twice already:
x2 = −5 + 3x1 + x4
x3 = −5/2 + (5/2)x1 + x4
2.15 Problems
1. The system of equations below has infinitely many solutions. Solve for y and z
in terms of x.
3x + 2y − 3z = −1
3x − y + 2z = 2
2. The following system of equations has already been solved for x1 , x2 , x3 in
terms of x4 , x5 :
x1 + 3x2 − 2x3 + x4 − x5 = 1          x1 = 2 − x4
−2x1 + x2 + 2x4 − x5 = 1        ⇝     x2 = 5 − 4x4 + x5
x1 + x2 − x3 − x4 = −1                x3 = 8 − 6x4 + x5
a. Find two different particular solutions (x1 , x2 , x3 , x4 , x5 ) to this system of
equations.
b. Solve for x1 , x2 , x5 in terms of x3 , x4 instead. Try to do as little additional
work as possible.
3. Consider the following system of equations:
3x1 + 5x2 + x3 − 2x4 = 4
x1 + 2x2 + x3 − x4 = −1
a. Write this system of equations in matrix form: as Ax = b, where A is a
2 × 4 matrix, x is the column vector of our variables x1 , . . . , x4 , and b is a
2 × 1 column vector.
b. Take the first two columns of A only. Find the inverse of this 2 × 2 matrix.
c. Left-multiply both sides of the matrix equation Ax = b by the inverse
matrix you’ve found.
d. Your result should now be row-reduced. Use it to solve for x1 , x2 in terms
of x3 , x4 .
4. Consider the following system of equations, already written in matrix form:
[  2  1  −5 ] [x1]   [ 1 ]
[  0  1  −1 ] [x2] = [ 3 ]
[ −2  1   3 ] [x3]   [ 2 ]
a. Left-multiply both sides of this matrix equation by the row vector [ 1  −2  1 ].
b. What does the result tell you about the system of equations?
maximize x + y
x,y∈R
subject to 2x + 3y ≤ 15
x + 2y ≤ 9
2x + y ≤ 12
x, y ≥ 0
a. Without trying to solve the linear program, can you give a convincing
argument for why there is no feasible solution (x, y) where x + y is 10 or
higher?
b. A shadowy figure cryptically tells you “take the sum of the first two
inequalities, then divide by three".
How can this help you get a better upper bound on x + y than what you
got in part (a)?
c. Can you find an even better upper bound on x + y in the same way as in
part (b)?
That is, we have a perfectly ordinary system of linear equations, together with the
added constraint that all variables must be nonnegative.
There are infinitely many feasible solutions, but on the first day, we saw a rule
that cuts their number down to a manageable amount:
Rule #1: At least one optimal solution is a corner point of the feasible
region.1
Rule #2: All corner points of the feasible region are basic solutions of
the system of linear equations.
With x servings of fries and y servings of ketchup, the constraints are shown
below on the left:
x + y ≥ 10                       x + y − w1 = 10
210x + 20y ≤ 2000          ⇝     210x + 20y + w2 = 2000
0.1x + 0.2y ≤ 3                  0.1x + 0.2y + w3 = 3
x, y ≥ 0                         x, y, w1, w2, w3 ≥ 0
We can begin by practicing turning these into equations. Add a slack variable to
each inequality, and we get the equations above on the right.
where the nonnegativity conditions x, y, w1 , w2 , w3 ≥ 0 still hold, but I’ll stop writing
them every time. To find a basic solution, set the nonbasic variables x, y to 0, and
read off the values of the basic variables w1 , w2 , w3 .
Is this one of the corner points? No! When x = y = 0, we get w1 = −10, w2 = 2000,
and w3 = 3. These are not all nonnegative. We should have expected this: setting
x = y = 0 means you’re not eating anything, so you’re violating the constraint "eat
at least 10 servings".
A corner point must be a basic solution, but a corner point must also be feasible:
all the variables must be nonnegative. We are looking for a basic feasible solution:
you will hear these words a lot this semester. This term (sometimes cryptically
abbreviated bfs) is just the sum of its parts: a feasible solution which is also a basic
solution.
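Reading off a basic solution is exactly a linear solve against the basic columns; here is a Python sketch (numpy assumed) for the fries-and-ketchup system with w1, w2, w3 basic, confirming that this basic solution is not feasible:

    import numpy as np

    # Columns: x, y, w1, w2, w3 (w1 is subtracted because its constraint was >=).
    A = np.array([[1.0, 1.0, -1.0, 0.0, 0.0],
                  [210.0, 20.0, 0.0, 1.0, 0.0],
                  [0.1, 0.2, 0.0, 0.0, 1.0]])
    b = np.array([10.0, 2000.0, 3.0])

    B = [2, 3, 4]                     # zero-based indices of w1, w2, w3
    xB = np.linalg.solve(A[:, B], b)
    print(xB)                         # [-10. 2000. 3.]: basic, but NOT feasible since w1 < 0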
We won’t get anywhere with an infeasible solution, so let’s start from scratch.
y = 10.75 + (1/400)w2 − 5.25w3
x = 8.5 − (1/200)w2 + 0.5w3
w1 = 9.25 − (1/400)w2 − 4.75w3
Our new basic feasible solution is (x, y, w1 , w2 , w3 ) = (8.5, 10.75, 9.25, 0, 0).
This was just aimless wandering around; in the next lecture, we’ll reintroduce
the objective function, and think about pivoting with purpose. Think of what we’ve
done today as driving around the parking lot; next, we’ll get on the highway.
2.18.5 Troubleshooting
The only goal of the pivoting algorithm we learned today is to go from a basic feasible
solution to another basic feasible solution. You know that you’ve picked the correct
leaving variable if your new basic solution is still feasible—if it’s not, then go back
and rethink your choice of leaving variable.
Aside from that, remember the cardinal rule: always do the same thing to both
sides of an equation. Finally, watch out for mistakes with lost negative signs, as
those are very easy to make here.
The first thing to realize is that when the equations hold, the objective function has
many equivalent forms. Since x1 + x2 + x3 = 9, for example, maximizing 2x1 + 3x2 +
4x3 is equivalent to maximizing 2x1 + 3x2 + 4x3 + (x1 + x2 + x3 − 9) or 3x1 + 4x2 +
5x3 − 9: if we maximize that objective function instead, we get the same solution.
We will give the expression 2x1 + 3x2 + 4x3 a name: we’ll call it ζ (zeta).4 Writing
down the equation ζ = 2x1 + 3x2 + 4x3 makes our lives somewhat easier: we now
have 4 equations in 6 variables ζ, x1 , x2 , x3 , w1 , w2 , and we are simply maximizing
one of the variables.
This particular problem conveniently starts out row-reduced with x1 , w1 , w2 as
the basic variables; we can easily solve for them in terms of the non-basic variables
x2 , x3 . Out of our many representations for ζ, it is convenient to pick one that’s also
in terms of x2 , x3 . Just subtract twice (x1 + x2 + x3 − 9) to get ζ = x2 + 2x3 + 18.
max ζ = 18 + x2 + 2x3
x1 = 9 − x2 − x3
w1 = 1 − x2 + 3x3
w2 = 1 + 3x2 − x3
It is helpful to include “max” or “min" in the top left corner of the dictionary, to
remind ourselves that we’re maximizing or minimizing. The simplex method treats
the two cases differently.
Each dictionary corresponds to a basic solution whose parameters we can read off
from the column immediately after =. We have x1 = 9, w1 = 1, and w2 = 1, while the
nonbasic variables x2 , x3 are set to 0; the objective value of this solution is ζ = 18.
With the possible exception of ζ, all the numbers in this column should be
nonnegative if we are looking at a basic feasible solution. If the dictionary has this
property, we call it a feasible dictionary and, for the time being, we will not
consider any other kind of dictionary.
There are as many feasible dictionaries as there are basic feasible solutions.5 The
simplex method operates by moving from dictionary to dictionary until we arrive at
one that gives us the optimal solution. The method of moving from dictionary to
dictionary is the same as in the previous lecture; today, we will see how the objective
value fits in.
5 Note for the future: sometimes, unfortunately, there are slightly more—multiple feasible
dictionaries for the same basic feasible solution! Don’t worry about this for now.
We’ve obtained our new dictionary! The new values of our variables are (x1 , x2 , x3 , w1 , w2 ) =
(8, 0, 1, 4, 0), and the objective value ζ has increased to 20.
(This makes sense: since w2 just left the basis, pivoting around w2 will return us to
where we were previously, undoing our progress!)
In x2 ’s column, only x1 has a negative coefficient, so it is our only valid choice of
leaving variable: we don’t even have to compare ratios.
Solving the equation x1 = 8 − 4x2 + w2 for x2, we get x2 = 2 − (1/4)x1 + (1/4)w2. Now
we are ready to substitute this in for x2 in all the other rows of our dictionary:
This is closer to the way we write things when we do Gaussian elimination. It has
more columns, but the advantage is that it is easier to put in a table, without having
to write the variables every time. Here, the simplex tableau would be:
          x1   x2   x3   w1   w2
−ζ  −20    0    7    0    0   −2
x1    8    1    4    0    0   −1
w1    4    0   −8    0    1    3
x3    1    0   −3    1    0    1
We annotate the columns with the variables whose coefficients are in those columns;
we annotate the rows with the basic variable in that row. We write −ζ in the
objective row to remind ourselves that with this method, −20 is the negative of the
objective value. Iterations of the simplex method are just ordinary row reduction
with this grid of numbers.
Having the negative of the objective value appear in the tableau is a bit weird,
so you might also see tableaux written with the top equation ordered differently: as
ζ − 7x2 + 2w2 = 20. Then, the tableau could look like the following:
         ζ   x1   x2   x3   w1   w2
ζ   20   1    0   −7    0    0    2
x1   8   0    1    4    0    0   −1
w1   4   0    0   −8    0    1    3
x3   1   0    0   −3    1    0    1
This way, you can read off the current solution and the objective value from the
left of the tableau. The downside of this approach is that the reduced costs in
this version are the negatives of the reduced costs we’re used to seeing! There’s
nothing wrong with that—provided we reverse our rules of dealing with the reduced
costs—but it means that building the initial tableau is a little bit weird. The
numbers we’ll have to put in the top row of the tableau will be the negatives of the
coefficients in the objective function, because we rewrite ζ = c1 x1 + c2 x2 + · · · + cn xn
as ζ − c1 x1 − c2 x2 − · · · − cn xn = 0.
There are other variants of the tableau, with the rows and columns rearranged
in minor ways. This makes it extra important to keep the rows and columns labeled
with variables, so that we can interpret them more easily.
maximize 2x + 3y maximize 2x + 3y
x,y∈R x,y,w1 ,w2 ,w3 ∈R
subject to −x + y ≤ 3 subject to −x + y + w1 =3
x − 2y ≤ 2 ⇝ x − 2y + w2 =2
x+ y ≤ 7 x+ y + w3 = 7
x, y ≥ 0 x, y, w1 , w2 , w3 ≥ 0
Adding slack variables has a convenient bonus effect. The slack variables
(w1 , w2 , w3 ) form a convenient set of basic variables to start with, for two reasons:
• The dictionary will already be row-reduced for the slack variables, since each
one shows up in only one equation. This will be true any time we add slack
variables.
• The basic solution is (x, y, w1 , w2 , w3 ) = (0, 0, 3, 2, 7), which is feasible. This
happens whenever our starting inequalities are all upper bounds with a positive
constant on the right-hand side. So it’s not always useful, but sometimes makes
our lives easier.
Here is our starting dictionary, and a graph of the feasible region (of the original
linear program in x and y) with the corresponding basic feasible solution marked:
max ζ = 0 + 2x + 3y
w1 = 3 + x − y
w2 = 2 − x + 2y
w3 = 7 − x − y
[Graph: the feasible region with the corner (0, 0) marked.]
Let’s bring y into the basis. (This is an arbitrary choice: we could also have
chosen x.) Since w2’s coefficient of y is 2, it’s not a valid leaving variable; w1 and
w3 have ratios of 3/1 and 7/1, of which the smallest is 3. So y replaces w1 in the basis,
giving us the new dictionary below:
max ζ = 9 + 5x − 3w1
y = 3 + x − w1
w2 = 8 + x − 2w1
w3 = 4 − 2x + w1
[Graph: the feasible region with the corner (0, 3) marked.]
The basic feasible solution is (x, y, w1 , w2 , w3 ) = (0, 3, 0, 8, 4). The nonbasic variables
of the new dictionary are x and w1 . As before, x “owns” the x ≥ 0 constraint.
Meanwhile, w1 “owns” the w1 ≥ 0 constraint, but in the original linear program,
this was the −x + y ≤ 3 constraint. We should be at the corner where x = 0 and
−x + y = 3 meet, and indeed, these lines meet at (0, 3).
The choice of entering variable corresponds to picking the direction in which we
went around the polygon: which edge out of (0, 0) we used. The edge from (0, 0) to
(0, 3) moves away from the y ≥ 0 constraint, so y is the variable that becomes basic.
We could also have brought x into the basis, moving away from the x ≥ 0 constraint.
But now, at (0, 3), there is only one good choice of entering variable. We don’t
want to go back to (0, 0), so the only choice is to continue going clockwise. In the
dictionary, this corresponds to how we don’t want to bring w1 back into the basis
(its reduced cost is negative, so this would decrease ζ). Instead, the only helpful
entering variable is x, whose reduced cost is positive.
In x’s column, the coefficients of y and w2 are both positive, so those can’t be
leaving variables. Therefore x replaces w3 in the basis, giving us the new dictionary
below:
max ζ = 19 − (1/2)w1 − (5/2)w3
y = 5 − (1/2)w1 − (1/2)w3
w2 = 10 − (3/2)w1 − (1/2)w3
x = 2 + (1/2)w1 − (1/2)w3
[Graph: the feasible region with the corner (2, 5) marked.]
In this dictionary, all reduced costs are negative. Therefore ζ is maximized and
(x, y) = (2, 5) is the optimal solution.
(Here, w1 and w3 are nonbasic. The constraints they “own” are the −x + y ≤ 3
constraint and the x + y ≤ 7 constraint. So we end up at the corner point where the
lines −x + y = 3 and x + y = 7 intersect.)
If we had decided to pivot around x first, rather than y, we would have arrived at
the same final answer, but going counterclockwise around the feasible region instead.
There would have been three steps, not two, because there are three edges to take
when going around that way.
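As a cross-check on the whole walkthrough, a library solver agrees with the final dictionary; a minimal scipy sketch of Problem 2.7 (linprog minimizes, so the objective is negated):

    from scipy.optimize import linprog

    res = linprog(c=[-2, -3],
                  A_ub=[[-1, 1], [1, -2], [1, 1]],
                  b_ub=[3, 2, 7],
                  bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)   # [2. 5.] 19.0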
maximize 2x + 3y maximize 2x + 3y
x,y∈R x,y,w1 ,w2 ∈R
subject to −x + y ≤ 3 ⇝ subject to −x + y + w1 =3
x − 2y ≤ 2 x − 2y + w2 = 2
x, y ≥ 0 x, y, w1 , w2 ≥ 0
Our first iteration of the simplex method will be nearly the same with Problem 2.8
as it was with Problem 2.7, and will also bring us to the point (0, 3). We can quickly
get the dictionary for that point by dropping the equation for w3 :
Looking at the diagram, we see what’s about to happen: the feasible region is
unbounded in the direction we want to go.
It’s still a good idea to bring x into the basis: it still has a positive reduced cost.
But now, both basic variables are ruled out at the first stage: both of them have a
positive coefficient in x’s column, so neither of them decreases as x increases. There
is no leaving variable to choose.
This is what it looks like when the linear program is unbounded, and we can
improve the objective value as much as we want. There is no optimal solution.
From this dictionary, we can also learn a bit about how the linear program
is unbounded. The variables doing useful work are the basic variables y, w2 and
the entering variable x. All other variables (just w1 in this case) should be set to
0. Setting w1 to 0 and getting rid of it in every other row gives us the following
pseudo-dictionary:
max ζ = 9 + 5x
y = 3 + x
w2 = 8 + x
w1 = 0
We get better and better solutions as we travel along the line y = 3 + x, increasing x
as much as we want: the objective value increases as ζ = 9 + 5x. All our variables
remain nonnegative (including the slack variables w1 and w2), so the solution remains
feasible the whole way.
All this is happening behind the scenes when we do any pivot step. But here,
because the coefficients in x’s column were both positive, the slopes of y = x + 3 and
w2 = x + 8 are both positive, which means that we can increase x without a limit.
And since x had a positive reduced cost of 5, we know that this gives us arbitrarily
large objective values.
Whenever we learn from the dictionary that the linear program is unbounded,
we can perform such an analysis to find an infinite ray of feasible solutions along
which the objective value improves without bound.
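A solver reports the same diagnosis for Problem 2.8; in this minimal scipy sketch, status code 3 is linprog’s way of saying “unbounded”:

    from scipy.optimize import linprog

    # Problem 2.8 drops the x + y <= 7 constraint, leaving the region unbounded.
    res = linprog(c=[-2, -3],
                  A_ub=[[-1, 1], [1, -2]],
                  b_ub=[3, 2],
                  bounds=[(0, None), (0, None)])
    print(res.status, res.message)   # status 3: the problem is unbounded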
maximize 2x + 3y maximize 2x + 3y
x,y∈R x,y,w1 ,w2 ,w3 ∈R
subject to −x + y ≤ 3 subject to −x + y + w1 =3
x − 2y ≤ 2 ⇝ x − 2y + w2 =2
x + 2y ≤ 6 x + 2y + w3 = 6
x, y ≥ 0 x, y, w1 , w2 , w3 ≥ 0
To see what makes Problem 2.9 different from Problem 2.7, let’s take a look at
the initial dictionary and especially at the feasible region:
max ζ = 0 + 2x + 3y
w1 = 3 + x − y
w2 = 2 − x + 2y
w3 = 6 − x − 2y
[Graph: the feasible region with the corner (0, 0) marked.]
Here is where things start to go wrong. Our next entering variable must be x,
because it’s the only variable with positive reduced cost. As x increases, y and w2
also increase, so they will not leave the basis: the leaving variable must be w3 . But
when we make this happen, the dictionary changes, but the values of all the variables
stay the same!
max ζ = 9 + (1/3)w1 − (5/3)w3
y = 3 − (1/3)w1 − (1/3)w3
w2 = 8 − (4/3)w1 − (1/3)w3
x = 0 + (2/3)w1 − (1/3)w3
[Graph: the feasible region, still at the corner (0, 3).]
You’ll need at least one cup an hour to keep you focused. To stay awake until
the assignment is due, you’ll need at least 7 “units" of caffeine (if we say a unit of
caffeine is the amount in a cup of tea; there are 3 units in a cup of coffee). Finally, to
have the energy to work on the assignment, you need at least 6 units of sugar (the
amount in a cup of coffee; a cup of sweet tea has 2 units).
If every cup of coffee costs $4.50 and every cup of tea costs $3, what is the
cheapest way to make this work?
We can write this problem as follows:
The difficulty in adapting our methods to this problem is this: how do we find an
initial basic feasible solution?
minimize x0
x0 ,x1 ,x2 ,w1 ,w2 ,w3 ∈R
subject to −x1 − x2 + w1 − x0 = −5
−3x1 − x2 + w2 − x0 = −7
−x1 − 2x2 + w3 − x0 = −6
x0, x1, x2, w1, w2, w3 ≥ 0
As usual, we’ll set x1 = x2 = 0 in our initial feasible solution. We’ll need to set x0 = 7,
because that’s the largest number on the right-hand side. Then w2 = 0 satisfies our
second equation, and we can set w1 = 7 − 5 = 2 and w3 = 7 − 6 = 1 to satisfy the
first and third equations.
That’s an unsystematic description of how we get our initial dictionary, though.
We begin by taking our basic variables to be w1 , w2 , w3 :
min ξ = 0 + x0
w1 = −5 + x1 + x2 + x0
w2 = −7 + 3x1 + x2 + x0
w3 = −6 + x1 + 2x2 + x0
This is not feasible: all three of w1 , w2 , w3 are negative in the basic solution. Our
first step in the phase one problem is always, ignoring any pivoting rules, to bring x0
into the basis, and take w2 (the variable with the most negative value) out of the
basis. This is guaranteed to lead us to a feasible dictionary:
min ξ = 7 − 3x1 − x2 + w2
w1 = 2 − 2x1 + w2
x0 = 7 − 3x1 − x2 + w2
w3 = 1 − 2x1 + x2 + w2
(It will always be the case that the equation for ξ at the top matches the equation
for x0 , for as long as x0 is a basic variable. I will keep writing the same equation in
both places, just to match the usual way we write a dictionary.)
Now we can proceed to solve this linear program in the usual way. Since we’re
minimizing ξ, we should pivot on entries that have a negative reduced cost. In this
example, pivoting on x2 turns out to be the best choice. The only possible leaving
variable is x0 (since it is the only one with a negative coefficient on x2). Solving x0’s
equation for x2 gives x2 = 7 − 3x1 + w2 − x0, and then we will just substitute that in
for x2 in the other equations:
min ξ = 7 − 3x1 + w2 − (7 − 3x1 + w2 − x0)             min ξ = 0 + x0
w1 = 2 − 2x1 + w2                                      w1 = 2 − 2x1 + w2
x2 = 7 − 3x1 + w2 − x0                           ⇝     x2 = 7 − 3x1 + w2 − x0
w3 = 1 − 2x1 + w2 + (7 − 3x1 + w2 − x0)                w3 = 8 − 5x1 + 2w2 − x0
The phase one problem is solved once the objective value reaches 0, which typically
happens exactly when x0 leaves the basis. Once this happens, we can solve the
phase two problem: the one we started with! To get there, we:
1. Remove x0 from the dictionary; we no longer need it.
2. Replace the artificial objective function ξ by the original objective function ζ,
expressed in terms of the current basic variables.
In this case our original objective is to minimize ζ = 4.5x1 + 3x2. Substituting
x2 = 7 − 3x1 + w2 gives us ζ = 21 − 4.5x1 + 3w2, so our new dictionary is:
min ζ = 21 − 4.5x1 + 3w2
w1 = 2 − 2x1 + w2
x2 = 7 − 3x1 + w2
w3 = 8 − 5x1 + 2w2
Since we are minimizing ζ, the only good choice of entering variable is x1. Comparing
the ratios 2/2, 7/3, and 8/5, we see that w1 must leave the basis. Solving w1’s equation
for x1, we get x1 = 1 − (1/2)w1 + (1/2)w2. Substituting that into the other equations
gives ζ = 16.5 + (9/4)w1 + (3/4)w2, with x2 = 4 + (3/2)w1 − (1/2)w2 and
w3 = 3 + (5/2)w1 − (1/2)w2. Both reduced costs are now positive, so the optimal
solution is (x1, x2) = (1, 4) with ζ = 16.5.
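A quick cross-check of that optimum with scipy (the ≥ constraints, read off the phase-one problem above, are multiplied by −1 to fit linprog’s ≤ form):

    from scipy.optimize import linprog

    # Constraints: x1 + x2 >= 5, 3x1 + x2 >= 7, x1 + 2x2 >= 6, negated into <= form.
    res = linprog(c=[4.5, 3],
                  A_ub=[[-1, -1], [-3, -1], [-1, -2]],
                  b_ub=[-5, -7, -6],
                  bounds=[(0, None), (0, None)])
    print(res.x, res.fun)   # [1. 4.] 16.5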
One silver lining is that we can always make the right-hand side nonnegative.
An equation constraint can always be multiplied by −1 and remain valid (unlike an
inequality constraint, which reverses when it is multiplied by −1). So let’s assume
that b ≥ 0.
The solution here is to introduce artificial slack variables to the problem. We
turn the problem Ax = b into the problem Ax ≤ b, and then add slack variables
w1 , w2 , . . . , wm ≥ 0 to turn it back into equational form. (In matrix form, this looks
like Ax + Iw = b.)
What’s the point? Well, because we’ve assumed b ≥ 0, the new problem is one for
which the two-phase simplex method is not necessary: if we make the slack variables
w1, w2, . . . , wm our basic variables, we get an initial basic feasible solution.
As before, we introduce an artificial objective function to optimize in the phase
one problem. In this case, our slack variables w1 , w2 , . . . , wm are artificial: they do
not belong in the problem, since we want to have Ax = b and not just Ax ≤ b. So
we decide to minimize ξ = w1 + w2 + · · · + wm : the sum of the slack variables. If we
can get it down to 0, then we get a solution where Ax = b, and then we can proceed
to the phase two problem.
For example, suppose that we have the following constraints:
x1 + x2 + x3 = 1
6x1 − 2x3 = 1
2x1 + x2 − 3x3 = −1
x1, x2, x3 ≥ 0
Our first step is to rewrite the third constraint as −2x1 − x2 + 3x3 = 1, so that all
the numbers on the right-hand side are positive. Now we are ready to insert artificial
slack variables w1 , w2 , w3 :
x1 + x2 + x3 + w1 = 1
6x1 − 2x3 + w2 = 1
−2x1 − x2 + 3x3 + w3 = 1
x1, x2, x3, w1, w2, w3 ≥ 0
Our objective function for the phase one problem is ξ = w1 + w2 + w3 , but that’s
phrased entirely in terms of the basic variables. We must substitute w1 = 1 − x1 −
x2 − x3 , w2 = 1 − 6x1 + 2x3 , and w3 = 1 + 2x1 + x2 − 3x3 to get the objective function
in the form we want. If we do, then ξ simplifies to 3 − 5x1 − 2x3, and we get the
initial dictionary
min ξ = 3 − 5x1 − 2x3
w1 = 1 − x1 − x2 − x3
w2 = 1 − 6x1 + 2x3
w3 = 1 + 2x1 + x2 − 3x3
2.29 Troubleshooting
There are several unexpected things that can go wrong in the two-phase simplex
method.
It is possible that we can never get the artificial objective function ξ down to
0. This is an indicator that our original problem did not have a feasible solution!
Although this is disappointing for the problem we were trying to solve, it’s convenient
for the solver: now we can skip phase two.
In most cases, we expect that ξ will hit 0 at the same time that our artificial
variable(s) leave the basis. After all, if ξ = 0, then x0 (from our first two-phase
method) or the artificial slack variables w1 , . . . , wm (from our second method) must
all be 0, which is the sort of thing nonbasic variables generally do. However, it is
possible for these variables to be basic and still be equal to 0.
In such a degenerate case, we can make some quick final adjustments. If an
artificial variable is equal to 0 but still basic, pick any nonbasic, non-artificial variable
in its equation, and do a pivot step to replace the artificial variable by that nonbasic
variable—ignoring our usual pivoting rules. Because both variables will remain equal
to 0, this will not change the value of any other variables, so this pivot step preserves
feasibility.
In our second two-phase method, an even weirder thing can happen. It’s possible
that:
• The artificial objective function ξ has reached 0;
• Some artificial slack variable wi is still basic;
• There are no non-artificial variables in wi ’s equation to replace it with!
If this happens, just forget that equation entirely. What this means is that one of the
equations in the system Ax = b was redundant; it could be deduced from the others.
Once we eliminate the artificial slack variables, the redundant equation becomes
0 = 0; we don’t need it.
maximize x + y + z
x,y,z∈R
subject to −2x + y + z ≤ 0
−y + 2z ≤ 0
x + y − 3z ≤ 0
x, y, z ≥ 0.
The astute observer will notice that (as usual with baking) if we find a feasible
solution (x, y, z) then we can scale it up to (2x, 2y, 2z) or even (100x, 100y, 100z)
without violating the inequalities. So it seems like there can’t be any limit in the
number of biscuits baked.
This is almost correct. The challenge here is to figure out if there’s any division of
biscuits that will make all three bakers happy. If not, then the only feasible solution
is (x, y, z) = (0, 0, 0) and no amount of scaling that up will get you biscuits.
If we add slack variables w1 , w2 , w3 to the inequalities, then they get us an initial
basic feasible solution; no two-phase simplex method needed here! Unfortunately,
the initial dictionary we write down looks somewhat concerning. . .
Here it is:
max ζ = 0 + x + y + z
w1 = 0 + 2x − y − z
w2 = 0 + y − 2z
w3 = 0 − x − y + 3z
Any of the three entering variables seem like equally good candidates. Let’s just try
making y the entering variable.
If we follow our usual procedure, then only w1 and w3 make it onto our “shortlist"
of leaving variables. (Even this step doesn’t seem entirely justified! Usually, if a
variable is not on our shortlist, it’s because pivoting on it is guaranteed to produce
an infeasible dictionary. However, in this case, all three leaving variables will produce
feasible dictionaries when we pivot, because we won’t be able to leave point (0, 0, 0)
after this step.) The ratios for w1 and w3 are both 0/1, meaning that we can’t increase
y past 0 before either of them becomes negative. This is a tie, so we can’t tell which
variable to pick; let’s arbitrarily take w1.
After solving w1 = 2x − y − z to get y = 2x − w1 − z and substituting this for y in
the other equations, we get the dictionary below:
max ζ = 0 + 3x − w1
y = 0 + 2x − w1 − z
w2 = 0 + 2x − w1 − 3z
w3 = 0 − 3x + w1 + 4z
It sure doesn’t seem like we’re making any progress. However, there is still a positive
reduced cost, so we can still
keep going by pivoting on x.
Altogether, there are (6 choose 3) = 20 ways to choose three basic variables in this problem.
One of them turns out not to work: if you try to solve for x, w1 , and w3 , you end up
having to take the inverse of a singular matrix. That leaves 19 feasible dictionaries,
all of which describe the point (x, y, z) = (0, 0, 0) in various ways.
What’s even the point of pivoting, then?6 Actually, there are two possible
outcomes that would solve the problem for us:
• Suppose that one of these 19 dictionaries has all negative reduced costs. Then
that formula for ζ proves that whenever x, y, z, w1, w2, w3 ≥ 0, we have ζ ≤ 0.
In that case, we’d be able to conclude that (0, 0, 0) is the only feasible solution.
• Suppose that one of these 19 dictionaries has an entering variable, with positive
reduced cost, such that all the coefficients in that column are positive. Then
we’d have a way to escape to infinity: by increasing that variable and keeping
the other nonbasic variables at 0, we increase all the basic variables (and ζ)
and discover that the linear program is unbounded.
The problem is that because of all the degenerate pivots we’re doing, we can
never tell if we’re making progress toward either of these goals. In fact, we don’t
even have a clear proof that either of these outcomes is guaranteed to happen!
Fact 1: Bland’s rule prevents cycling: it can never return to a dictionary it’s
previously considered.
I am calling this a “fact" and not a “theorem" because we will not prove it.
The drawback of Bland’s rule is that it’s slow: even though it never returns to
the same feasible dictionary twice in degenerate cases, it tends to perform badly
in cases with no degeneracy. That is, it often picks longer paths from the initial
corner point to the optimal one. Intuitively, the reason this happens is that variables
earlier in our list are both more likely candidates to enter the basis and more likely
candidates to leave: so they end up flipping back and forth often. (Unfortunately,
this property also plays a key role in the proof that Bland’s rule prevents cycling.)
We’d like to come up with a rule that avoids cycling just by addressing situation
2 (how to choose leaving variables). That way, we can pair it with the highest-cost
pivoting rule, which only addresses situation 1 (how to choose entering variables).
The highest-cost pivoting rule is not the smartest rule there is, but it’s good enough
in many cases.
Geometrically, we’ve taken each equation and pushed it by a random tiny amount.
It is very unlikely that the result has even a single degenerate dictionary. So with
this adjustment, none of our pivot steps will be degenerate, and so we’ll never cycle.
Of course, we’re solving a slightly different problem now, but as long as our random
adjustments were sufficiently small, our final answer will be very very very close to
the answer to our original problem.
(Once we’re done, we may even be able to recover the exact answer to the original
problem, by assuming that our random adjustment doesn’t change the optimal choice
of basic variables.)
max ζ = 0 + x + y + z              max ζ = 0 + x + y + z
w1 = 0 + 2x − y − z                w1 = ϵ1 + 2x − y − z
w2 = 0 + y − 2z              ⇝     w2 = ϵ2 + y − 2z
w3 = 0 − x − y + 3z                w3 = ϵ3 − x − y + 3z
1 ≫ ϵ1 ≫ ϵ2 ≫ · · · ≫ ϵm > 0.
so w3 is the only possible leaving variable. After solving its equation for y, we get
y = ϵ3 − x + 3z − w3, which we then substitute for y in our other equations. The
result is shown on the right:
max ζ = 0 + x + y + z              max ζ = ϵ3 + 4z − w3
w1 = ϵ1 + 2x − y − z               w1 = (ϵ1 − ϵ3) + 3x − 4z + w3
w2 = ϵ2 + y − 2z             ⇝     w2 = (ϵ2 + ϵ3) − x + z − w3
w3 = ϵ3 − x − y + 3z               y = ϵ3 − x + 3z − w3
We’ve made an infinitesimal amount of progress: the objective value has improved
from 0 to ϵ3 . (Granted, that’s pretty much the least amount of progress possible,
but so what.) Note that all three basic variables are still positive: in particular,
ϵ1 − ϵ3 > 0.
There is only one positive reduced cost: it is on z. No need to compare ratios: the
only possible leaving variable when z enters the basis is w1 . The resulting dictionary
is
max ζ = ϵ1 + 3x − w1
z = ((1/4)ϵ1 − (1/4)ϵ3) + (3/4)x − (1/4)w1 + (1/4)w3
w2 = ((1/4)ϵ1 + ϵ2 + (3/4)ϵ3) − (1/4)x − (1/4)w1 − (3/4)w3
y = ((3/4)ϵ1 + (1/4)ϵ3) + (5/4)x − (3/4)w1 − (1/4)w3
Now the only positive reduced cost is on x. Once again, there is only one possible
leaving variable, which is w2 . After a third pivot step, we get:
Since the reduced costs of w1 , w2 , w3 are all negative, this tells us that we’ve “maxi-
mized" ζ: 4ϵ1 + 12ϵ2 + 9ϵ3 is the highest possible value it could have. Of course, just
like every other value we saw for ζ, it rounds to 0. To get our final answer, we set
ϵ1 = ϵ2 = ϵ3 = 0 and get that (0, 0, 0) really is our optimal solution.
This is just a different way of using the definition of matrix multiplication. We can
check that in the equation above, for example, both sides give 1 · x + 2 · y + 3 · z for
the first component of the result.
To add a bit of a twist to this idea: once we’ve split the product Ax up into
columns like this, we can recombine some of the columns into smaller matrix-vector
products. For example:
[ 1 1 1 1 1 ] [x1]        [1]      [1]      [1]      [1]      [1]
[ 1 2 3 4 5 ] [x2]  =  x1 [1] + x2 [2] + x3 [3] + x4 [4] + x5 [5]
              [x3]
              [x4]
              [x5]
          [1]      [1]      [1]      [1]      [1]
    =  x2 [2] + x5 [5] + x1 [1] + x3 [3] + x4 [4]
       [ 1 1 ] [x2]     [ 1 1 1 ] [x1]
    =  [ 2 5 ] [x5]  +  [ 1 3 4 ] [x3] .
                                  [x4]
Just as with vectors, if I is a sequence of
several indices, then we’ll write A_I for the matrix we get by picking out the columns
numbered by I from A.
With this notation, the equation we wrote down a bit ago can be written more
compactly as
Ax = A(2,5) x(2,5) + A(1,3,4) x(1,3,4) .
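The identity is easy to test numerically; a small Python sketch (numpy assumed, with zero-based column indices standing in for the one-based index sequences):

    import numpy as np

    A = np.array([[1, 1, 1, 1, 1],
                  [1, 2, 3, 4, 5]])
    x = np.array([10, 20, 30, 40, 50])

    I, J = [1, 4], [0, 2, 3]               # columns (2, 5) and (1, 3, 4), zero-based
    lhs = A @ x
    rhs = A[:, I] @ x[I] + A[:, J] @ x[J]
    print(lhs, rhs)                        # identical vectors: [150 550] both times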
What are the extreme points? Nothing in the middle will do: if we can go a little bit
right and a little bit left from a point and stay in S, for example, then you’re in the
middle between “a little bit right" and “a little bit left", so you’re not an extreme
point. Also, a point that lies on a straight-line boundary is also not an extreme
point: it is between two points obtained by going a little bit in one direction along
the boundary, and a little bit in the other direction.
The three corner points of the triangle attached on the left of S are all extreme
points. Also, every single point on the curved boundary on the right of S is an
extreme point: from a point on the boundary of a circle, if you pick two opposite
directions to go in, one of them will leave the circle.
The definition of an extreme point describes a geometric intuition. We can also
define a corner point in terms of what we want corner points to do. This gives us the
definition of a vertex. When S ⊆ Rn , a point x ∈ S is a vertex of S if there is some
nonzero vector a ∈ Rn such that the dot product aT x = a1 x1 + · · · + an xn is strictly
bigger (no ties allowed!) than aT y for any y ∈ S with y ̸= x.
In other words, the vertices are the points in S that are the unique optimal
solutions to a linear maximization problem over S.
Looking at the region drawn above, its vertices are almost the same as its extreme
points. For the corner points between two straight boundaries, there are many
vectors a we could choose to justify that the corner point is a vertex. For a point on
the boundary of the circular arc, pick a to be the direction away from the center of
that circle.
There’s only one exception, which is very subtle: the points where the circular
arc meets the straight boundary are extreme points, but they’re not vertices. That’s
because if we optimize along the vector a which points from the center of the circle
toward one of these points, then a points vertically, and all the points along that
straight boundary will be tied with x.
The last definition of a corner point only applies to the regions we care about:
regions of the form S = {x ∈ Rn : Ax = b and x ≥ 0}. We will assume that the system
of equations Ax = b has no redundant or inconsistent equations: this assumption
holds whenever we’re using the simplex method, though sometimes we need a two-
phase method to check it. Let m be the number of equations (the number of rows in
A).
In this setting, a basic feasible solution x is any x ∈ S such that we can
split (1, 2, . . . , n) into m basic variables B and n − m nonbasic variables N to have
xB = (AB )−1 b and xN = 0. (Note that from x ∈ S, it follows that x ≥ 0.) The basic
feasible solutions are exactly the solutions that the simplex method explores. What
relationship do they have to the extreme points and the vertices?
Proof. Suppose that x is a basic feasible solution: choose B and N such that
xB = (AB )−1 b and xN = 0. Then, define a by setting aB = 0 and aN = −1.
Then the dot product a^T y = a1y1 + a2y2 + · · · + anyn simplifies to the sum
a^T y = Σ_{i∈N} (−1) · yi.
Proof. Let x ∈ S be a vertex of S, and let a be the vector such that a^T x > a^T y for
all y ∈ S with y ̸= x.
Suppose that x is not an extreme point: then there are y, z ∈ S not equal to x
and some 0 ≤ t ≤ 1 such that x = ty + (1 − t)z. Multiplying by a^T on both sides and
distributing, we get
a^T x = t · a^T y + (1 − t) · a^T z < t · a^T x + (1 − t) · a^T x.
But the right-hand side of this inequality just simplifies to aT x, and we get the
ridiculous inequality aT x < aT x. Therefore assuming x is not an extreme point has
led us to a contradiction, and x must be an extreme point. ■
Proof. Let x be any extreme point of the feasible region. Split up (1, 2, . . . , n) into
P and Z such that xZ = 0Z and xP > 0P : the positive entries of x and the zero
entries of x.
What we’d like to be the case is that xP is m-dimensional (remember, m is the
number of rows in A) and that AP is invertible. Then we can take B = P and N = Z,
and x will be forced to be the basic feasible solution with basic variables B.
This can go wrong in a few ways. First of all, P might be too small. This is still
fine; sometimes we have basic variables equal to 0. If the columns of AP are at least
linearly independent, then we can pick some more columns to add to P to make
B in such a way that the columns of AB are still linearly independent, making AB
invertible. Remove those same columns from Z to get N . Now, once again, x will
be the basic feasible solution with basic variables B.
There are two more things that can go wrong:
• Maybe P is too small, but the columns of AP are already linearly dependent.
In that case, we can’t add any columns to get an invertible matrix, and the
procedure above won’t work.
• Maybe P is not too small, but too big: it has more than m entries. In that
case, the columns of AP are also linearly dependent: they are more than m
vectors in Rm .
So if anything goes wrong, then it’s because the columns of AP are linearly dependent.
In this case, we’ll try to arrive at a contradiction by showing that x is not actually
an extreme point.
If the columns of AP are linearly dependent, then we can take a nontrivial linear
combination of them to get 0. This linear combination can be written as
Σ_{i∈P} yi Ai = 0
where not all the yi are 0. Let’s turn these numbers yi into an n-dimensional vector
y, by setting yj = 0 for every j ∈ Z. Then the linear combination above is just AP yP .
Now pick a very very very very small value r > 0, and consider the points x + ry
and x − ry. We’ll show that these are two points in S and x is between them,
concluding that x is not an extreme point.
• We know AP yP = 0. We also know AZ yZ = 0, because yZ = 0. Therefore
Ay = 0.
So the row vector of our reduced costs is given by the formula (cN )T −(cB )T (AB )−1 AN .
We’re writing the product (cB)^T (AB)^{−1} a lot, so let’s give it a name: let’s call it
u^T. (It has a transpose because it’s a row vector.) We’ll learn much more about this
vector later; for now, it’s just a vector that’s handy in our calculations!
All this can be summarized by putting our dictionaries in matrix form:
ζ = u^T b + ((cN)^T − u^T AN) xN
xB = (AB)^{−1} b − (AB)^{−1} AN xN
When doing the ordinary simplex method, it would be bad to recompute the dictionary
at every step using these formulas, because computing (AB )−1 at every step is
expensive. On the other hand, this can be useful to compute a dictionary if, for
some reason, all you know is which variables are basic.
We will also use these formulas in the revised simplex method: an improvement on
the simplex method which is more computationally efficient by avoiding unnecessary
calculations.
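These formulas are easy to try out numerically. Below is a minimal numpy sketch (the helper name dictionary_from_basis is mine, not standard) that recovers all the dictionary data from a choice of basic columns:

```python
import numpy as np

def dictionary_from_basis(A, b, c, B):
    """Recover the dictionary data for 0-based basic column indices B.

    Returns u^T, the reduced costs (c_N)^T - u^T A_N, the basic values
    (A_B)^{-1} b, and the objective value u^T b at the basic solution.
    """
    N = [j for j in range(A.shape[1]) if j not in B]
    AB_inv = np.linalg.inv(A[:, B])
    u = c[B] @ AB_inv              # u^T = (c_B)^T (A_B)^{-1}
    reduced = c[N] - u @ A[:, N]   # (c_N)^T - u^T A_N
    return u, reduced, AB_inv @ b, u @ b
```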
            Gold   Silver   Rubies   Diamonds   Magic rings   Spell scrolls   Stale cookies
Price/kg      2       1        3         5            2              5               0
Volume/kg     3       3        1         2            4              5               5
How do you figure out the most efficient combination of precious items?
We just have two constraints here, aside from nonnegativity constraints:
• if x1, . . . , x7 measure the total amount of the objects in kilograms, then we want
x1 + x2 + x3 + x4 + x5 + x6 + x7 ≤ 30.
• The volume/kg row of the table gives us the constraint on volume: 3x1 + 3x2 +
x3 + 2x4 + 4x5 + 5x6 + 5x7 ≤ 100.
The price/kg row gives us the objective function: we want to maximize 2x1 + x2 +
3x3 + 5x4 + 2x5 + 5x6 .
The challenging part is the number of variables (most of which will not be used
in the optimal solution). If 7 variables (9 when we add slack variables) is not bad
enough for you, you can imagine a more varied hoard for which the problem would
be much worse.
We will do something unusual with the notation today. To make it easier to
connect our dictionary to the matrix formulas, we will name our slack variables x8
and x9 , putting them at the end of our vector x. The variables that describe our
linear program are:
c^T = [ 2  1  3  5  2  5  0  0  0 ]

A = [ 1  1  1  1  1  1  1  1  0 ]
    [ 3  3  1  2  4  5  5  0  1 ]

b = [ 30 ]
    [ 100 ]
Normally, our first choice of basic variables would be B = (8, 9): the slack variables.
To try out our new formulas, we’ll take B = (1, 7): we’ll consider filling up our
backpack with gold and stale cookies.
• To compute all the reduced costs at once, we'd begin by finding u^T = (c_B)^T (A_B)^{−1}, then calculating (c_N)^T − u^T A_N. This
avoids having to deal with the product (A_B)^{−1} A_N.
• If we use Bland's rule for pivoting, then we get to save some work. After
computing u^T, we can find the reduced cost of variable xi by calculating
ci − u^T Ai: xi's component of (c_N)^T − u^T A_N. Bland's rule says that we can
stop once we find the first positive reduced cost.
This helps counteract the disadvantage of Bland’s rule: its slowness. We don’t
mind doing more pivot steps if each pivot step becomes faster!
Either way, we begin by computing
u^T = (c_B)^T (A_B)^{−1} = [ 2  0 ] [ 5/2  −1/2 ; −3/2  1/2 ] = [ 5  −1 ].
Let's try computing the reduced costs one at a time. Silver (x2) gives us
c2 − u^T A2 = 1 − [ 5  −1 ] [ 1 ; 3 ] = 1 − (5 · 1 − 1 · 3) = −1.
Doing the same calculation for rubies (x3 ) gives us c3 − uTA3 = 3 − (5 · 1 − 1 · 1) = −1,
but for diamonds (x4 ) we finally get c4 − uTA4 = 5 − (5 · 1 − 1 · 2) = 2, which is positive.
Now that we know x4 is our entering variable, we want to find our leaving variable.
The trick is that we don’t need all of (AB )−1 AN to do this! We only care about x4 ’s
column of that matrix, which is given by
(A_B)^{−1} A4 = [ 5/2  −1/2 ; −3/2  1/2 ] [ 1 ; 2 ] = [ 3/2 ; −1/2 ].
Remember that our dictionary has x_B = (A_B)^{−1}(b − A_N x_N) in it, so we are subtracting
(A_B)^{−1} A4 x4. Our shortlist of leaving variables comes from negative coefficients,
which means we're looking for positive values in (A_B)^{−1} A4. The 3/2 is positive, which
puts our first variable in B = (1, 7) on our shortlist: x1.
If we had more than one variable on our shortlist, we’d continue by computing
the ratios between the column (AB )−1 A4 we just found, and the column (AB )−1 b
that we computed earlier. But in this case, we can skip that step: x1 is the only
candidate.
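To double-check the computations above, we can feed the treasure data into numpy directly (with all indices shifted down by one, since numpy counts columns from 0):

```python
import numpy as np

A = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 0],
              [3, 3, 1, 2, 4, 5, 5, 0, 1]], dtype=float)
b = np.array([30.0, 100.0])
c = np.array([2, 1, 3, 5, 2, 5, 0, 0, 0], dtype=float)

B = [0, 6]                          # the text's B = (1, 7), 0-based
AB_inv = np.linalg.inv(A[:, B])
u = c[B] @ AB_inv
print(u)                            # [ 5. -1.]
print(c[3] - u @ A[:, 3])           # 2.0, diamonds' reduced cost
print(AB_inv @ A[:, 3])             # [ 1.5 -0.5], diamonds' column
```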
So now we know x4 is our entering variable and x1 is our leaving variable. We’re
done, right? We can just go to the next step with B = (4, 7).
Not so fast! We really don’t want to compute (AB )−1 again at each step. (In this
example, it’s only a 2 × 2 matrix inverse, but for larger systems, the inverse is much
harder to compute.) Let’s try to compute the inverse of A(4,7) (the new inverse we
want) from the inverse of A(1,7) (the old inverse we have).
Here's the idea. Using our old B, we already know all the entries of
(A_(1,7))^{−1} A_(1,4,7) = [ 5/2  −1/2 ; −3/2  1/2 ] [ 1  1  1 ; 3  2  5 ] = [ 1  3/2  0 ; 0  −1/2  1 ],
and we know most of the entries of
(A_(4,7))^{−1} A_(1,4,7) = [ ?  1  0 ; ?  0  1 ],
because whatever (A_(4,7))^{−1} is, multiplying by it must turn the x4 and x7 columns
of A into the identity matrix.
We can figure out what row operations turn (A_(1,7))^{−1} A_(1,4,7) (the first 2 × 3
matrix above) into (A_(4,7))^{−1} A_(1,4,7) (the second 2 × 3 matrix above). To do this, we
multiply the first row by 2/3 (to turn the 3/2 into 1) and then add half the result to the
second row (to turn the −1/2 into 0).
But row operations are just matrix multiplication from the left. So those same
row operations will turn (A(1,7) )−1 into (A(4,7) )−1 , which is what we want! We take
(A_(1,7))^{−1}, multiply the first row by 2/3, and then add half the result to the second
row:
(A_(1,7))^{−1} = [ 5/2  −1/2 ; −3/2  1/2 ]  ⇝  (A_(4,7))^{−1} = [ 5/3  −1/3 ; −2/3  1/3 ].
Suppose that in the next step, x5 is the entering variable and x7 is the leaving
variable, so the new basis is (4, 5). The same trick works again: we want to turn
(A_(4,7))^{−1} A_(4,5,7) = [ 1  1/3  0 ; 0  2/3  1 ]  into  (A_(4,5))^{−1} A_(4,5,7) = [ 1  0  ? ; 0  1  ? ].
To get there, we must multiply the second row by 3/2 (to turn the 2/3 into 1) and
then subtract 1/3 of that from the first row (to turn the 1/3 into 0). So let's do the
same things to (A_(4,7))^{−1}:
(A_(4,7))^{−1} = [ 5/3  −1/3 ; −2/3  1/3 ]  ⇝  (A_(4,5))^{−1} = [ 2  −1/2 ; −1  1/2 ].
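The row operations above are mechanical enough to automate. Here is a small sketch (the helper name is mine) that updates the old basis inverse instead of recomputing it, given the column d = (A_B)^{−1} A_entering and the position r of the leaving variable within B:

```python
import numpy as np

def update_inverse(AB_inv, d, r):
    """Apply the row operations that turn column d into the r-th unit vector."""
    E = AB_inv.copy()
    E[r] = E[r] / d[r]                    # scale the leaving variable's row
    for i in range(E.shape[0]):
        if i != r:
            E[i] = E[i] - d[i] * E[r]     # clear the other entries of the column
    return E

old = np.array([[2.5, -0.5], [-1.5, 0.5]])   # (A_(1,7))^{-1}
d = old @ np.array([1.0, 2.0])               # x_4's column: (3/2, -1/2)
print(update_inverse(old, d, 0))             # [[ 5/3 -1/3], [-2/3  1/3]]
```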
First, consider the following linear program (whose feasible region forms a d-
dimensional hypercube):
maximize xd
x∈Rd
subject to 0 ≤ x1 ≤ 1
0 ≤ x2 ≤ 1
..
.
0 ≤ xd ≤ 1
This is not actually the worst case for the simplex method, under any reasonable
pivoting rule. The initial basic feasible solution is x = 0, the only variable with
positive reduced cost is xd , and pivoting on xd gets us to an optimal solution within
one step. But tiny modifications will make it much much worse!
First of all, there are some really inefficient trajectories possible in theory: paths
we can take going from (0, 0, . . . , 0, 0) to (0, 0, . . . , 0, 1) that visit every other vertex of
the hypercube in between. Here is an illustration of such a path in the 3-dimensional
case (when the feasible region is a cube):
[Figure: the terrible trajectory on the 3-dimensional cube, visiting the vertices in the order (0,0,0), (1,0,0), (1,1,0), (0,1,0), (0,1,1), (1,1,1), (1,0,1), (0,0,1).]
We’ll call this path the “terrible trajectory". (Despite the alliteration, this is not es-
tablished terminology.) The terrible trajectory has a fairly simple recursive definition:
to follow it in d-dimensions, first follow the (d − 1)-dimensional terrible trajectory
(keeping xd = 0), then change xd to 1, then follow the (d − 1)-dimensional terrible
trajectory again, but in reverse.
Second, note that this trajectory is actually kind of close to being reasonable for
the linear program we want to solve. Every single step of the terrible trajectory is
neutral with respect to the objective value (it does not change xd ), except for one
step, which increases it. So it’s possible that if we push the corners around a bit,
then every single step of the terrible trajectory will increase the objective value. And
at that point, we're close to tricking the simplex method into following it. One way to perturb the problem is the following:
maximize xd
x∈Rd
subject to 0.1 ≤ x1 ≤ 1 − 0.1
0.1x1 ≤ x2 ≤ 1 − 0.1x1
0.1x2 ≤ x3 ≤ 1 − 0.1x2
..
.
0.1xd−1 ≤ xd ≤ 1 − 0.1xd−1
The value 0.1 could be replaced by any reasonably small constant. The smaller we
make it, the closer we get to the original cube, and if we set it to 0, we just get back
that cube.
Here’s what the terrible trajectory looks like for this linear program, in 3 dimen-
sions. (It’s a bit of a lie, because with the modification, the feasible region is no
longer a perfect cube.)
[Figure: the terrible trajectory on the perturbed cube, visiting the corner points (0.1, 0.01, 0.001), (0.9, 0.09, 0.009), (0.9, 0.91, 0.091), (0.1, 0.99, 0.099), (0.1, 0.99, 0.901), (0.9, 0.91, 0.909), (0.9, 0.09, 0.991), (0.1, 0.01, 0.999) in order.]
You can see that in this trajectory, the objective values steadily increase:
0.001 < 0.009 < 0.091 < 0.099 < 0.901 < 0.909 < 0.991 < 0.999.
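These coordinates are easy to regenerate. In the quick check below (with 0.1 playing the role of the small constant), bits[i] records whether x_{i+1} sits at its lower bound (0) or its upper bound (1):

```python
def corner(bits, eps=0.1):
    """Corner of the perturbed cube selected by which bound is tight at each level."""
    x, prev = [], 1.0          # treat "x_0" as the constant 1
    for b in bits:
        prev = eps * prev if b == 0 else 1 - eps * prev
        x.append(prev)
    return x

path = [(0,0,0), (1,0,0), (1,1,0), (0,1,0),
        (0,1,1), (1,1,1), (1,0,1), (0,0,1)]
for bits in path:
    print(corner(bits))        # last coordinates: 0.001, 0.009, ..., 0.999
```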
It turns out that, with a natural choice of variable ordering, Bland’s rule will end
up picking this trajectory. Let’s first add slack variables to the problem, rewriting
0.1x_{i−1} ≤ xi ≤ 1 − 0.1x_{i−1} as
0.1x_{i−1} − xi + wi^0 = 0
0.1x_{i−1} + xi + wi^1 = 1
Here, the superscript in wi^0 and wi^1 is not an exponent: it's an extra index, since we
have 2d slack variables. To explain the naming convention: wi^0 is the slack variable
for the lower bound on xi, and when wi^0 = 0, xi is close to 0. Meanwhile, wi^1 is the
slack variable for the upper bound on xi, and when wi^1 = 0, xi is close to 1.
We have to use the two-phase simplex method for this problem, since x = 0 is not
feasible, but let's skip ahead and suppose we arrive at the basic feasible solution
we wanted: the corner point (0.1, 0.01, . . . , 0.1^d). Here, the variables x1, x2, . . . , xd are
all basic—and they'll stay basic forever, because none of them can be 0. In order
to start out at this corner point, our nonbasic variables (corresponding to the tight
constraints) must be w1^0, w2^0, . . . , wd^0; the slack variables w1^1, w2^1, . . . , wd^1 are basic.
In each of the basic feasible solutions we can encounter, exactly one of wi^0 and
wi^1 is basic for each i. When xi ≈ 0, wi^0's constraint is tight, so wi^0 is nonbasic. When
xi ≈ 1, wi^1's constraint is tight, so wi^1 is nonbasic. Moving from one corner point to
an adjacent one means pivoting so that wi^0 enters the basis and wi^1 leaves for some i,
or vice versa.
If we put the slack variables in the order w1^0, w1^1, w2^0, w2^1, . . . , wd^0, wd^1,
then Bland's rule will pivot on w1^0 or w1^1 whenever this improves the objective value,
which is every other step. In between those, it will pivot on w2^0 or w2^1 as often as
possible, and so on. In 3 dimensions, the subscripts of the entering variables will
follow the sequence 1, 2, 1, 3, 1, 2, 1.
The pattern continues in higher dimensions: the subscripts will follow the sequence
1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1, 5, 1, 2, 1, 3, 1, 2, 1, . . . .
The Klee–Minty linear program in d dimensions is given below (I’ve written the
inequalities backwards to make the pattern easier to see):
maximize 2^{d−1} x1 + 2^{d−2} x2 + · · · + xd
x∈Rd
subject to 5 ≥ x1
25 ≥ 4x1 + x2
125 ≥ 8x1 + 4x2 + x3
625 ≥ 16x1 + 8x2 + 4x3 + x4
..
.
5^d ≥ 2^d x1 + 2^{d−1} x2 + 2^{d−2} x3 + · · · + 8x_{d−2} + 4x_{d−1} + xd
x1 , x2 , . . . , xd ≥ 0.
This is also shaped kind of like a hypercube in d dimensions (and a cube in 3
dimensions), but it is very distorted. Here is a picture of the “terrible trajectory"
for the Klee–Minty cube, with coordinates given in the figure on the left, and their
objective values on the right. (For this linear program, the shape of the cube is an
incredible lie, but the adjacencies between the corners are the same.)
[Figure: the terrible trajectory on the Klee–Minty cube. On the left, the corner coordinates along the path: (0,0,0), (5,0,0), (5,5,0), (0,25,0), (0,25,25), (5,5,65), (5,0,85), (0,0,125). On the right, the corresponding objective values: 0, 20, 30, 50, 75, 95, 105, 125.]
The best way to understand why this cube tricks the highest-cost rule is to try doing
it, and see how the reduced costs change. But essentially, this construction exploits a
weakness of the pivoting rule that we've already talked about: it's sensitive to changes
in units. To get the highest-cost rule to pick earlier variables over later ones, it’s
enough to set up the problem so that a very small change in x1 or x2 has the same
effect as a very large change in xd−1 or xd . However, the constraints are set up so
that the distance that it’s possible to go in the xd−1 or xd direction is always much
larger.
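If you want to experiment with this program yourself, the constraint data follows a simple pattern. Here is one way to build A, b, c for the d-dimensional Klee–Minty program in the form maximize c^T x subject to Ax ≤ b, x ≥ 0:

```python
import numpy as np

def klee_minty(d):
    """A, b, c for the d-dimensional Klee-Minty linear program."""
    A = np.zeros((d, d))
    for i in range(d):                      # row i is constraint i+1 (1-based)
        A[i, i] = 1.0                       # ... + x_{i+1} on the left-hand side
        for j in range(i):
            A[i, j] = 2.0 ** (i - j + 1)    # earlier variables get doubling weights
    b = 5.0 ** np.arange(1, d + 1)          # right-hand sides 5, 25, 125, ...
    c = 2.0 ** np.arange(d - 1, -1, -1)     # objective 2^{d-1} x_1 + ... + x_d
    return A, b, c
```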
If you have $7, what is the largest amount of chocolate you can buy and take
home with you?
Let x1 be the amount of plain chocolate chips and x2 the amount of deluxe
chocolate chips, in pints (so that we want to maximize x1 + x2 ). Let x3 be the
number of 3-pint bags you buy (if it is a fraction, we assume that you bought some
other size of bag.) Then the amount of money you brought limits these variables
to x1 + 2x2 + 4x3 ≤ 7. Also, you can carry at most 1 + 3x3 pints of chocolate, so
x1 + x2 ≤ 1 + 3x3 , or x1 + x2 − 3x3 ≤ 1.
In summary, we get the linear program below:
maximize x1 + x2
x1 ,x2 ,x3 ∈R
subject to x1 + 2x2 + 4x3 ≤ 7
x1 + x2 − 3x3 ≤ 1
x1 , x 2 , x 3 ≥ 0
Today, we’re going to be too lazy to try to solve this linear program. Instead, we
want to prove some lower and upper bounds on the objective value of the solution.
Lower bounds for a maximization problem are easy to find.
• Setting x1 = x2 = x3 = 0 satisfies both constraints, so clearly we can’t do worse
than an objective value of 0.
• We could try tweaking that: say x1 = 1 and x2 = x3 = 0, then we get an
objective value of 1.
• In general, any feasible solution gives us a lower bound on the objective value.
If we wanted to get good lower bounds this way, we’d start trying to solve the
linear program, which we said we didn’t want to do.
What about upper bounds? Well, here are some ideas:
• x1 + x2 is always less than or equal to x1 + 2x2 + 4x3. So if x1 + 2x2 + 4x3 ≤ 7,
we can immediately conclude x1 + x2 ≤ 7.
• Note that we can’t conclude from x1 + x2 − 3x3 ≤ 1 that x1 + x2 ≤ 1, because
the −3x3 term could potentially make x1 + x2 − 3x3 a lot smaller than x1 + x2 .
• However, if we average the two constraints, we get an improvement:
(1/2)(x1 + 2x2 + 4x3) + (1/2)(x1 + x2 − 3x3) ≤ (1/2)(7 + 1)  ⟹  x1 + (3/2)x2 + (1/2)x3 ≤ 4
and we always have x1 + x2 ≤ x1 + (3/2)x2 + (1/2)x3, so we conclude that x1 + x2 ≤ 4.
More generally, we could try to combine the two constraints with any coefficients.
As long as u1 ≥ 0 and u2 ≥ 0, we can combine the inequalities with weights u1 and u2 to get
u1(x1 + 2x2 + 4x3) + u2(x1 + x2 − 3x3) ≤ 7u1 + u2.
This is a valid inequality, but not necessarily a useful one. We want the left-hand
side to be an upper bound on x1 + x2 if we want to apply the same logic that we did
earlier. For this to happen, the coefficients of x1 and x2 must be at least 1, and the
coefficient of x3 must be nonnegative. This gives us three constraints on u1 and u2
in order for 7u1 + u2 to be an upper bound.
What is the best upper bound we can find by combining the inequalities in this
way? The answer can be found by solving a different linear program in terms of u1
and u2 :
minimize 7u1 + u2
u1 ,u2 ∈R
subject to u1 + u2 ≥ 1
2u1 + u2 ≥ 1
4u1 − 3u2 ≥ 0
u1 , u 2 ≥ 0
(D)  minimize u^T b over u ∈ R^m        ⟺    minimize b^T u over u ∈ R^m
     subject to u^T A ≥ c^T                    subject to A^T u ≥ c
                u ≥ 0                                     u ≥ 0
(When (D) is the dual linear program of (P), we call (P) the primal linear
program.)
The dual linear program above is written in two forms. On the right, we took
the transpose of both sides, putting it into a form more usual for linear programs.
But when we think about the dual relationship between (P) and (D), it’s more
convenient to use the formulation on the left, because then the dual program is
distinguished by being in terms of a row vector uT instead of a column vector u.
The reasoning by which (D) gives upper bounds for (P) holds in general. Formally,
this relationship is called weak duality, and is summarized in the theorem below:
Theorem 2.7 — Weak duality of linear programs. For any x ∈ Rn which is feasible
for the primal linear program (P) (or primal feasible) and for any u ∈ Rm which
is feasible for the dual linear program (D) (or dual feasible), we have cT x ≤ uT b.
In particular, the objective value of the dual optimal solution is an upper bound
for the objective value of the primal optimal solution (assuming both optimal
solutions exist).
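We can watch weak duality at work on the chocolate example. In the sketch below, x = (1, 0, 0) is the primal guess from earlier and u = (1/2, 1/2) is the averaging combination; any such pair must satisfy c^T x ≤ u^T b:

```python
import numpy as np

A = np.array([[1.0, 2.0, 4.0],
              [1.0, 1.0, -3.0]])
b = np.array([7.0, 1.0])
c = np.array([1.0, 1.0, 0.0])    # objective x_1 + x_2

x = np.array([1.0, 0.0, 0.0])    # a primal feasible point
u = np.array([0.5, 0.5])         # dual feasible: the averaging trick

assert np.all(A @ x <= b) and np.all(x >= 0)      # x is primal feasible
assert np.all(u @ A >= c) and np.all(u >= 0)      # u is dual feasible
print(c @ x, "<=", u @ b)                         # 1.0 <= 4.0
```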
Memorizing the rules in the table is possible, but it probably isn’t very satisfying.
It is healthier to practice figuring out the correspondence for yourself, by asking the
questions in the examples above: how can we combine the constraints of the primal
problem to get bounds on its optimal value, of whichever kind makes sense?
In other words, the dual program is good at finding bounds on the primal program:
the best bound it finds is exactly correct.
We have not yet proved strong duality. (We will see a proof later.)
However, keep in mind the word “if" at the beginning of this theorem. We are not
guaranteed that a linear program has an optimal solution: it could be unbounded,
or infeasible!
In fact, just from weak duality, we can already deduce a relationship between
unbounded and infeasible linear programs.
• Suppose that (P) has a feasible solution x. Then we know that for every dual
feasible u, we have c^T x ≤ u^T b. Therefore u^T b cannot be arbitrarily low: it is
bounded below by c^T x, so the dual program (D) cannot be unbounded.
For example, consider the pair in which (P) maximizes x subject to x ≤ −1 and x ≥ 0,
and (D) minimizes −u subject to u ≥ 1 and u ≥ 0. Here,
the primal program is infeasible (we can't have x ≤ −1 and x ≥ 0 at the same time)
and the dual program is unbounded (by setting u to be very large, we make −u very
small).
We can get an example where the primal program is unbounded and the dual
program is infeasible simply by reversing the roles of the two programs. Or, if we
want to keep (P) a maximization problem and (D) a minimization problem, we
can construct a pair in which both programs fail: the primal is infeasible because we can't choose a value of x, and the dual is
infeasible because we can't choose a value of v.
Additionally, each factory can ship at most 6 pounds of chocolate chips per day
(total).
What is the most cost-efficient way to supply both stores with chocolate chips?
To model this linear program, our first step is to understand the variables. What
quantities do we need to know to specify how we’re supplying both stores? We need
a variable telling us how many pounds of chocolate chips are shipped from each
factory to each store.
Let’s write a1 , a2 , a3 for the amount shipped from factories 1, 2, 3 respectively to
Atlanta, and s1 , s2 , s3 for the amount shipped from factories 1, 2, 3 respectively to
Seattle. These are all nonnegative variables.
We have two “demand constraints": each store needs a certain amount of chocolate.
We can write these as a1 + a2 + a3 = 10 and s1 + s2 + s3 = 5. We also have three
“supply constraints": each factory can ship at most 6 pounds of chocolate per day.
We can write these as a1 + s1 ≤ 6, a2 + s2 ≤ 6, and a3 + s3 ≤ 6. We must minimize
the total cost of shipping, which we can get by multiplying the cost per pound in
each entry of the table by the amount shipped from that factory to that store.
This gives us the primal linear program (P) below:
     minimize 7a1 + 12a2 + 10a3 + 10s1 + 12s2 + 20s3 over a, s ∈ R^3
     subject to a1 + a2 + a3 = 10    (u1)
                s1 + s2 + s3 = 5     (u2)
(P)             a1 + s1 ≤ 6          (v1)
                a2 + s2 ≤ 6          (v2)
                a3 + s3 ≤ 6          (v3)
                a1, a2, a3, s1, s2, s3 ≥ 0
Let’s also try to understand how these lower bounds work, so that we can better
understand those rules.
A working lower bound for (P) would be an inequality Pa1 + Qa2 + Ra3 +
Ss1 + Ts2 + Us3 ≥ X, where P, Q, R, S, T, U are at most the costs 7, 12, 10, 10, 12, 20
respectively. This would make Pa1 + Qa2 + Ra3 + Ss1 + Ts2 + Us3 a lower bound on
the primal objective function 7a1 + 12a2 + 10a3 + 10s1 + 12s2 + 20s3, which means X
would also be a lower bound on that primal objective function. (Since all variables in
(P) are nonnegative, it's okay if some coefficients P, Q, R, S, T, U are less than their
corresponding costs.) This gives us all six constraints in (D).
The objective function in (D) comes from seeing what the lower bound X will
be if we multiply the constraints in (P) by coefficients u1 , u2 , v1 , v2 , v3 and add them
up. Since we want the most informative (and therefore greatest) lower bound, we
want to maximize.
The trickiest part is understanding the types of variables (D) has. I’ve written
v1 , v2 , v3 ≤ 0, and this is not a typo: v1 , v2 , v3 really are nonpositive variables. Why?
It's because the corresponding constraints a1 + s1 ≤ 6, a2 + s2 ≤ 6, and a3 + s3 ≤ 6
are all "≤" inequalities: they give upper bounds. To turn them into the lower bounds
we want, we need to multiply them by a negative number to flip them.
Similarly, u1 , u2 are unconstrained, because the equations can be multiplied by
any coefficient: positive or negative.
Here are a few feasible solutions to (D) to look at. First, suppose we take u1 = 7,
u2 = 10, and v1 = v2 = v3 = 0. This corresponds to adding together 7a1 + 7a2 + 7a3 = 70
and 10s1 + 10s2 + 10s3 = 50 to get
7a1 + 7a2 + 7a3 + 10s1 + 10s2 + 10s3 = 120.
Since the objective function 7a1 + 12a2 + 10a3 + 10s1 + 12s2 + 20s3 is at least as big
as the left-hand-side of the equation above, 120 is a lower bound on the objective
value.
If we increased u1 to 10, then that wouldn't be true: the coefficient of a1 would be
too big. But suppose we fix that by setting v1 = −3: subtracting 3a1 + 3s1 ≤ 18.
We'd get:
7a1 + 10a2 + 10a3 + 7s1 + 10s2 + 10s3 ≥ 132.
Deduction 1. The coefficient of a1 in our hypothetical lower bound must be exactly
7. If the coefficient P were a smaller number like 6.5, it would mean that our
dual solution would still prove a lower bound of 142 when the price of shipping from
factory #1 to Atlanta dropped to $6.50 per pound. But that's impossible, since we
know that our solution to (P) gets cheaper by $3 in that case!
Similarly, the coefficients of a3 and s2 must be exact. This tells us that three of
the inequalities in (D) must actually be equations if we are to match the bound of
142: we must get u1 + v1 = 7, u1 + v3 = 10, and u2 + v2 = 12.
Deduction 2. In our solution to (P), the constraint a2 + s2 ≤ 6 is slack: actually,
a2 + s2 = 0 + 5 < 6. This means that if we use this constraint (with a nonzero coefficient)
to prove a lower bound, then for our solution to (P), it will actually prove a strict inequality with <. This is
impossible: it would prove that our primal solution has objective value strictly less
than 142, which is false.
Therefore we shouldn’t use that constraint in our hypothetical lower bound of
142: we should have v2 = 0 in the dual solution we want. Similarly, since a3 + s3 ≤ 6
is slack in our solution to (P), we should have v3 = 0 in the solution to (D) we’re
looking for.
Combining the two deductions: since u2 + v2 = 12 and v2 = 0, we want u2 = 12.
Since u1 + v3 = 10 and v3 = 0, we want u1 = 10. Finally, since u1 + v1 = 7 and u1 = 10,
we want v1 = −3.
This is all resting on the hypothetical assumption that our solution to (P)
is optimal and has a matching lower bound based on a solution to (D). So
it is extremely important to check our work: is the resulting dual solution
(u1 , u2 , v1 , v2 , v3 ) = (10, 12, −3, 0, 0) actually a feasible solution for (D)?
It turns out that yes: this solution satisfies all six constraints in (D), and has
an objective value of 10 · 10 + 12 · 5 − 3 · 6 = 142. Therefore our primal solution is
optimal: the shipping plan does not need to be changed!
(If we had started with a suboptimal solution to (P), we would have gotten a
dual solution that fails this final check; that’s why checking is so important.)
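Here is that final check as a few lines of numpy, assuming the shipping plan a = (6, 0, 4), s = (0, 5, 0) that the deductions above describe:

```python
import numpy as np

cost = np.array([7.0, 12.0, 10.0, 10.0, 12.0, 20.0])
x = np.array([6.0, 0.0, 4.0, 0.0, 5.0, 0.0])     # (a_1,a_2,a_3,s_1,s_2,s_3)

u_v = np.array([10.0, 12.0, -3.0, 0.0, 0.0])     # (u_1,u_2,v_1,v_2,v_3)
rhs = np.array([10.0, 5.0, 6.0, 6.0, 6.0])

# each dual constraint u_i + v_j <= cost, written as a matrix
M = np.array([[1,0,1,0,0], [1,0,0,1,0], [1,0,0,0,1],
              [0,1,1,0,0], [0,1,0,1,0], [0,1,0,0,1]], dtype=float)
assert np.all(M @ u_v <= cost) and np.all(u_v[2:] <= 0)   # (D) is satisfied

print(cost @ x)   # 142.0: primal objective
print(u_v @ rhs)  # 142.0: dual objective -- equal, so both are optimal
```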
The proof is just the sort of reasoning we used in our deductions above, but
generalized. Let’s consider one specific case: when the primal and dual have the
form
(P)  maximize c^T x over x ∈ R^n        (D)  minimize u^T b over u ∈ R^m
     subject to Ax ≤ b                       subject to u^T A ≥ c^T
                x ≥ 0                                   u ≥ 0
The proof is not significantly different in the other cases; there are just a lot of cases
to check.
Theorem 2.9 — Complementary slackness. Suppose that we have a feasible solution
x for (P) and a feasible solution uT for (D) with cT x = uT b. Then the following
relationship holds:
• For all i, either (Ax)_i = b_i or u_i = 0.
• For all j, either x_j = 0 or (u^T A)_j = c_j.
Proof. Recall our proof of weak duality: we showed that c^T x ≤ u^T b by showing that
c^T x ≤ u^T Ax ≤ u^T b. So if c^T x = u^T b, then we must have equality in both steps:
c^T x = u^T Ax = u^T b.
We can rewrite uTAx = uT b as uT (b − Ax) = 0. This is a dot product which we
can expand as a sum: we must have
∑_{i=1}^{m} u_i (b_i − (Ax)_i) = 0.
In every term, we must have ui ≥ 0 (since u is feasible for (D)) and bi − (Ax)i ≥ 0
(since x is feasible for (P)). So every term of the sum is nonnegative, and the
only way for the sum to be 0 is to have every term equal to 0. Therefore for all i,
ui (bi − (Ax)i ) = 0, which means that either (Ax)i = bi or ui = 0.
This proves the first bullet point. For the second bullet point, we use the same
reasoning, but applied to the equation cT x = uTAx, rewritten as (uTA−cT )x = 0. ■
As we saw in today’s example, complementary slackness can be useful when we
have a candidate solution, and we want to know whether it is optimal. (Note that
if we find a feasible solution x to (P) and a feasible solution u to (D) such that
cT x = uT b, then weak duality automatically tells us that both solutions are optimal!)
Additionally, each factory can ship at most 6 pounds of chocolate chips per day
(total).
What is the most cost-efficient way to supply both stores with chocolate chips?
     minimize 7a1 + 12a2 + 10a3 + 10s1 + 12s2 + 20s3 over a, s ∈ R^3
     subject to a1 + a2 + a3 = 10    (u1)
                s1 + s2 + s3 = 5     (u2)
(P)             a1 + s1 ≤ 6          (v1)
                a2 + s2 ≤ 6          (v2)
                a3 + s3 ≤ 6          (v3)
                a1, a2, a3, s1, s2, s3 ≥ 0
     maximize 10u1 + 5u2 + 6v1 + 6v2 + 6v3 over u ∈ R^2, v ∈ R^3
     subject to u1 + v1 ≤ 7     (a1)
                u1 + v2 ≤ 12    (a2)
                u1 + v3 ≤ 10    (a3)
(D)             u2 + v1 ≤ 10    (s1)
                u2 + v2 ≤ 12    (s2)
                u2 + v3 ≤ 20    (s3)
                v1, v2, v3 ≤ 0
(P)  maximize c^T x over x ∈ R^n        (D)  minimize u^T b over u ∈ R^m
     subject to Ax = b                       subject to u^T A ≥ c^T
                x ≥ 0
Recall that we have a formula for the dictionaries we get through the simplex method.
If we choose basic variables B and nonbasic variables N , then the corresponding
dictionary is
ζ = u^T b + ((c_N)^T − u^T A_N) x_N
x_B = (A_B)^{−1} b − (A_B)^{−1} A_N x_N
where u^T = (c_B)^T (A_B)^{−1}. It is not a coincidence that the vector u used in this formula
was given the same letter as the vector u we are using for the dual solution: they
are the same!
More precisely, suppose that we have achieved an optimal dictionary for maxi-
mizing cT x. This means that our reduced costs are all less than or equal to 0: we
have no variables left worth pivoting on. In other words, (cN )T − uTAN ≤ 0, or
uTAN ≥ (cN )T . This looks a lot like the constraints in (D): more precisely, it is the
constraints, but only the ones indexed by N .
What about the constraints indexed by B? These constraints correspond to the
basic variables, which are probably positive in our optimal solution, so we expect
them to be satisfied with equality: we expect that uTAB = (cB )T . This is also true,
since uT = (cB )T (AB )−1 .
The general case of strong duality can be deduced from this one, since all linear
programs can be put into equational form. (The proof is not automatic, since when
we have a linear program in two forms, its dual also has two forms, so optimal dual
solutions also look different. We would need to check that the dual solution we got
from the simplex method can be used to “recover" a dual solution for the dual of the
original linear program.)
When we add slack variables, (D) still has nonnegativity constraints on u, but
instead of being treated separately as nonnegativity constraints, they are simply the
constraints corresponding to the primal variables w.
Looking at our dictionary formula, we can notice that if xi is a nonbasic variable,
then its reduced cost is given by ci − u^T Ai: the right-hand side of the dual
constraint corresponding to xi, minus the left-hand side of that constraint. This is
also true if xi is a basic variable, assuming that we consider the reduced cost of a
basic variable to be 0.
What if we do this for a slack variable? The dual constraint corresponding to
slack variable wi is just the constraint ui ≥ 0. The right-hand side minus the left-hand
side is just equal to −ui . So we deduce a simplified rule for finding uT :
Theorem 2.11 If (P) started out in the inequality form Ax ≤ b, then an optimal
solution u for (D) can be read off from the optimal dictionary for (P) by taking
the negatives of the reduced costs of the slack variables.
2.52.4 Examples
Let’s start with an example in equational form. Take the following primal-dual pair:
(P)  maximize x1 + 2x3 − x4 over x ∈ R^4
     subject to x1 + x2 + x3 + x4 = 4        (u1)
                x1 + 2x2 + 3x3 + 4x4 = 10    (u2)
                x1, x2, x3, x4 ≥ 0

(D)  minimize 4u1 + 10u2 over u1, u2 ∈ R
     subject to u1 + u2 ≥ 1      (x1)
                u1 + 2u2 ≥ 0     (x2)
                u1 + 3u2 ≥ 2     (x3)
                u1 + 4u2 ≥ −1    (x4)
To get started finding the optimal dual solution, it’s enough for me to tell you that
in the optimal primal solution, x1 and x3 are basic; you don’t even need to know
what their values are! Then we use the formula u^T = (c_B)^T (A_B)^{−1} to compute
[ u1  u2 ] = [ 1  2 ] [ 1  1 ; 1  3 ]^{−1} = [ 1  2 ] [ 3/2  −1/2 ; −1/2  1/2 ] = [ 1/2  1/2 ].
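As a sanity check, the same computation in numpy (basic columns 0 and 2 correspond to x1 and x3):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 2.0, 3.0, 4.0]])
c = np.array([1.0, 0.0, 2.0, -1.0])

B = [0, 2]                              # x_1 and x_3 are basic
u = c[B] @ np.linalg.inv(A[:, B])
print(u)                                # [0.5 0.5]
```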
We add slack variables to (P), solve it, and end up at the following optimal dictionary:
max ζ = 19 − (1/2)w1 − (5/2)w3
    x  = 2 + (1/2)w1 − (1/2)w3
    y  = 5 − (1/2)w1 − (1/2)w3
    w2 = 10 − (3/2)w1 − (1/2)w3
The reduced costs of w1 and w3 are −1/2 and −5/2, telling us that in the dual optimal
solution, u1 = 1/2 and u3 = 5/2. What about w2? It's a basic variable, so its reduced
cost is automatically 0. Therefore (u1, u2, u3) = (1/2, 0, 5/2) is an optimal solution to
(D).
On the right is a very bad initial dictionary for it. It is not feasible: every single
basic variable has a negative value!
But let’s look at the bright side: all the reduced costs are positive! This is just
what we want to see in a minimization problem. It would indicate that we’ve found
an optimal solution. . . if it weren’t for that pesky “not actually feasible" problem. . .
Now that we know about finding dual solutions from the dictionary, we know
that these reduced costs are exactly the information we need to know that the dual
solution we can extract from it is feasible. More precisely, the dual solution here has
(u1 , u2 , u3 ) = (0, 0, 0); the dual constraints are u1 +3u2 +u3 ≤ 4.5 and u1 +u2 +2u3 ≤ 3,
and they are satisfied with a slack of 4.5 and 3, which are precisely the reduced costs
in our dictionary. Our “optimal-but-not-feasible" solution to the primal corresponds
to a feasible (but not optimal) dual solution!
The dual simplex method takes this idea and runs with it. Call a dictionary
dual feasible if all the reduced costs are the correct sign for optimality. We will start
with a dual feasible dictionary, and do pivot steps that preserve dual feasibility, while
getting the dictionary closer to ordinary (primal) feasibility. To do this, the overall
strategy is: choose a basic variable whose value is negative to leave the
basis, then choose an entering variable so that dual feasibility is preserved.
In this example, we’re spoiled for choice in leaving variables: all three of w1 , w2 , w3
are negative. Let’s pick w1 for no good reason. Meanwhile, we don’t know how to
choose an entering variable yet, so let’s try both. Here are the two dictionaries we
can get if either x1 (left) or x2 (right) enters the basis:
Choosing x1 is bad: we end up losing dual feasibility. On the other hand, dual
feasibility is preserved if we pivot on x2 . What are the rules we have to follow to
make this decision in general?
1. First, we have to pick an entering variable with a positive coefficient in
the leaving variable’s equation. In this example, both x1 and x2 had this
property, so we didn’t notice.
The reason for this rule is to make sure that our leaving variable ends up with
the correct sign of reduced cost. When we did the substitution of either x1 or
x2 in ζ’s equation, the old reduced cost was multiplied by the coefficient of w1 ,
so that coefficient had to be positive.
This is our “shortlist for entering variables", analogous to the “shortlist for
leaving variables" in the ordinary simplex method.
2. When multiple entering variables satisfy this property, we should compare
ratios. Specifically, for each candidate we compute the ratio of its reduced cost
to its coefficient in the leaving variable's equation,
and choose the entering variable with the smallest ratio. This is the calculation
we get if we track what happens to the reduced costs of other variables, and
make sure that they stay positive.
In a maximization problem, dual feasibility means negative reduced costs, and
we want to keep them negative. In that case, all these ratios will be negative,
and we want to pick the least negative ratio (the one closest to 0). In other
words, if we take absolute values first, the rule stays the same.
Let’s do another pivot from the dictionary where x2 is a basic variable. The
only negative basic variable in that dictionary is w2 , so let’s make w2 the leaving
variable to fix that. In the equation w2 = −2 + 2x1 + w1 , both x1 and w1 have positive
coefficients. We compute the ratios: x1's ratio is 1.5/2 = 0.75 and w1's ratio is 3/1 = 3.
This means x1 should be our entering variable, since its ratio is smaller.
The resulting dictionary is both feasible and dual feasible. So we've found the optimal solution! It is
(x1, x2) = (1, 4) with objective value 16.5.
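The entering rule is short enough to state as code. A sketch (the helper name is mine), given the candidates' reduced costs and their coefficients in the leaving variable's equation of a minimization dictionary:

```python
def dual_entering(reduced_costs, coeffs):
    """Dual simplex ratio test.

    Candidates need a positive coefficient in the leaving variable's equation;
    among those, pick the smallest |reduced cost| / coefficient.
    """
    best, best_ratio = None, float("inf")
    for j, (rc, coef) in enumerate(zip(reduced_costs, coeffs)):
        if coef > 0 and abs(rc) / coef < best_ratio:
            best, best_ratio = j, abs(rc) / coef
    return best

# w_2 = -2 + 2x_1 + w_1, with reduced costs 1.5 (x_1) and 3 (w_1):
print(dual_entering([1.5, 3.0], [2.0, 1.0]))   # 0, so x_1 enters
```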
We will see many uses of the dual simplex method in the future, but there is
one practical use we can see directly from this example. If we used the ordinary
simplex method, we would have had to do two phases, because we don’t have an
initial basic feasible solution! Meanwhile, we do have a initial basic solution which is
dual feasible, so the dual simplex method is easier to start.
Under the hood, the dual simplex method is actually applying the simplex method
to the dual linear program. However, we don’t have to know that to know what is
going on: we don’t have to know what the dual constraints are, or the values of the
dual variables.
maximize x3 over x1, x2, x3 ∈ R
subject to 0.1 ≤ x1 ≤ 1 − 0.1
           0.1x1 ≤ x2 ≤ 1 − 0.1x1
           0.1x2 ≤ x3 ≤ 1 − 0.1x2
           x1, x2, x3 ≥ 0
Its initial dictionary is:
max ζ = 0 + x3
    w1  = −0.1 + x1
    w1′ = 0.9 − x1
    w2  = 0 − 0.1x1 + x2
    w2′ = 1 − 0.1x1 − x2
    w3  = 0 − 0.1x2 + x3
    w3′ = 1 − 0.1x2 − x3
This dictionary is neither feasible (since w1 < 0) nor dual feasible (since the reduced cost
of x3 is positive).
We could handle this using a two-phase method where we add an artificial variable.
But let’s see a different method of accomplishing the same thing.
Just like our earlier two-phase methods, the two-phase dual simplex method involves
solving a phase one problem before we get to the problem we actually want to
solve. But this method is much more economical: though it will require an auxiliary
objective function, we will not need to add any new variables or constraints.
The logic is this: dual feasibility only depends on the objective function, and not
on the constraints. So if we replace our original objective function by an auxiliary
objective function, we can choose that auxiliary objective function to make our
dictionary dual feasible! Then, we can apply the dual simplex method and solve that
phase one problem.
What should our auxiliary objective function be? Anything we like, as long as
it gives us a dual feasible dictionary. In general, this means minimizing any linear
expression with nonnegative coefficients on all the nonbasic (non-slack) variables.
You might be tempted to go with the simplest such linear expression: minimize
0. This is a bad choice, because the value of 0 doesn’t change as we pivot from basis
to basis. This means that the dual simplex method will constantly be doing “dual
degenerate pivots", and once again we have to worry about cycling.
A simple choice that will work as well as any other in general is to minimize the sum
of all the nonbasic variables. If you’re worried about degeneracy, you could borrow
from the lexicographic pivoting rule and decide to minimize ϵ1 x1 + ϵ2 x2 + · · · + ϵn xn ,
where ϵ1 ≫ ϵ2 ≫ · · · ≫ ϵn > 0. We will not bother doing this in our examples.
min ξ = 0 + x1 + x2 + x3
w1 = −0.1 + x1
w1′ = 0.9 − x1
w2 = 0 − 0.1x1 + x2
w2′ = 1 − 0.1x1 − x2
w3 = 0 − 0.1x2 + x3
w3′ = 1 − 0.1x2 − x3
Only one basic variable currently has a negative value: w1 = −0.1. The only variable
with a positive coefficient in w1 ’s equation is x1 , so we have no choice in our pivot:
x1 enters and w1 leaves. We get x1 = 0.1 + w1 when we solve for x1 , and then we
substitute that in the other rows, getting
min ξ = 0.1 + w1 + x2 + x3
x1 = 0.1 + w1
w1′ = 0.8 − w1
w2 = −0.01 − 0.1w1 + x2
w2′ = 0.99 − 0.1w1 − x2
w3 = 0 − 0.1x2 + x3
w3′ = 1 − 0.1x2 − x3
Again, only one basic variable has a negative value: w2 = −0.01. The only variable
with a positive coefficient in w2 ’s equation is x2 , so we still have no choice in our
pivot: x2 enters and w2 leaves. We get x2 = 0.01 + 0.1w1 + w2 when we solve for x2 ,
and then we substitute that in the other rows, getting
Again, only one basic variable has a (barely) negative value: w3 = −0.001. And yet
again, there is only one choice of entering variable to replace w3 as a leaving variable:
x3. When we pivot, we solve for x3 and get x3 = 0.001 + 0.01w1 + 0.1w2 + w3, leading
to a dictionary in which every basic variable is nonnegative.
We are done with phase one! We don’t really care that we’ve optimized ξ, but the
good news for us is that the dictionary is feasible. When we replace the objective
function with ζ = x3 = 0.001 + 0.01w1 + 0.1w2 + w3 , it remains feasible:
Now we are ready to maximize ζ, and if we like, we can use Bland’s rule and take
the most ridiculous number of steps possible to do it.
By the way, if you notice, the actual artificial objective function ξ never played
a role in our pivoting. This is not guaranteed to happen, but it’s not particularly
surprising: if we want the point (x1 , x2 , x3 ) = (0, 0, 0) to be dual feasible for ξ, then
ξ will probably be minimized at some point close to (0, 0, 0). This means that we
shouldn’t stress out too much about our choice of ξ in problems like this.
Your boss takes one look at the printout and says, “No, no, that won’t work. What
kind of fool doesn’t know that the doohickey will overheat if you run it for more than
10 hours a day? The table is right there on the machine: you need the doohickey
for 15 minutes per doodad and 5 minutes per gizmo. Go fix this, I need the factory
schedule yesterday!"
Do you have to start from scratch? No. Let’s take the new constraint and
insert it into our final dictionary. Okay, this takes a bit of work: the constraint is
15xd + 5xg ≤ 600, which we write as w5 = 600 − 15xd − 5xg for a new slack variable
w5. But xd and xg are also basic, so we substitute their equations in; the result
simplifies to w5 = −1700 + 25w2 + 45w3 + 1.5w4. Now we can add that to our
dictionary.
The new dictionary is
Today, we look at the following question: what happens to the optimal solution when
we change the linear program slightly?
(Footnote 7: the term "row generation" is often used specifically for a technique called Benders decomposition, which is one particular example of the general idea.)
We will start by looking at one specific change in the objective function. Rather
than maximizing 2x + 3y, what happens when we maximize (2 + δ)x + 3y for some
real number δ? (In an economic application, this could happen when the profitability
of a product goes up or down.)
Because this is a small linear program with only five corner points, we can answer
this question in the silliest way. At each corner point, we can compute the value of
(2 + δ)x + 3y, as a function of δ. We get the diagram on the left:
[Figure: on the left, the objective value (2 + δ)x + 3y at each corner point, as a function of δ: ζ = 0, ζ = 2δ, ζ = 9, ζ = 19 + 2δ, and ζ = 47/3 + (16/3)δ. On the right, the maximum of these five linear functions, plotted for δ between −8 and 4.]
We know that the optimal solution is going to be the best of the corner points.
Therefore, as a function of δ, the maximum value of (2 + δ)x + 3y is
max{0, 9, 19 + 2δ, 47/3 + (16/3)δ, 2δ}. The diagram on the right plots the resulting function.
What we see is a piecewise linear function whose slope increases from left to
right. It has a segment where it is equal to ζ = 9, a segment where it is equal to
ζ = 19 + 2δ, and a segment where it acts as ζ = 47/3 + (16/3)δ. Those segments correspond
to the ranges of δ where each of those corner points is optimal.
(There are no segments where the function is equal to ζ = 0 or ζ = 2δ. That’s
because (0, 0) and (2, 0) will never be optimal as long as y has a positive coefficient
in the objective function: (0, 0) is always worse than (0, 3) and (2, 0) is always worse
than (2, 5).)
Theorem 2.12 Suppose that our linear program has optimal solution x∗ with
objective value cT x∗ = ζ ∗ . If we change the coefficient of xi in the objective
function from ci to ci + δ, then the new objective value will be at least as good
as ζ ∗ + δx∗i . (This is a lower bound when maximizing, and an upper bound when
minimizing.)
For small δ, we can hope that the new objective value will be exactly ζ ∗ + δx∗i .
2.57.3 Ranging
Even a prediction for small values of δ can be more precise than this. From looking
at the plot we got by comparing all corner points, we see that the prediction of
19 + 2δ is exactly correct when −5 ≤ δ ≤ 1. We can try to determine the interval
where our prediction is correct: this is called ranging.
To do this, we’ll need to look at the optimal dictionary, not just the optimal
solution. The key idea is that to understand the effect of adding δx to the objective
function, we can add δx to the equation for ζ in our optimal dictionary.
Of course, this is no longer a properly formed dictionary, because x is a basic
variable and should never appear on the right-hand side. So we substitute the
equation x = 2 + (1/2)w1 − (1/2)w3 into δx and simplify. This results in the dictionary on
the right:
for every nonbasic variable xj . Negative values are lower bounds on δ and positive
values are upper bounds on δ; the prediction is guaranteed to be correct if all the
bounds hold.
A nonbasic variable's value in the optimal solution is 0, so our prediction says that the objective value will not change
when the cost of the variable changes.
For how long will this be true? Well, adding δ to the cost of a nonbasic variable is
the same as adding δ to its reduced cost. So we have just one restriction: the addition
of δ can't change the sign of the reduced cost.
To see this in our example, we’ll have to do something a bit unnatural and see what
happens when we add a δw1 term to the objective function. (Usually, slack variables
don’t show up at all in the objective function, but we’ll make an exception here.)
This changes the reduced cost: ζ = 19 − (1/2)w1 − (5/2)w3 becomes ζ = 19 + (δ − 1/2)w1 − (5/2)w3.
Therefore our solution remains optimal as long as δ − 1/2 ≤ 0, or δ ≤ 1/2. (In particular,
making δ an arbitrarily big negative number will never change anything.)
This theorem has one detail we did not mention: it is an “optimistic" prediction.
Intuitively, for large changes in δ, one of two things will happen:
• If there is a very large positive change, say to −x + y ≤ 10000, what can happen
is that the constraint might stop being relevant. In this problem, even if the
−x + y ≤ 3 constraint is removed entirely, the optimal solution will just be
(0, 7) with an objective value of 21. Therefore at some point, the prediction
overshoots: the true objective value stops at 21.
• If there is a very large negative change, say to −x + y ≤ −10000, what can
happen is that the problem might become infeasible. In this case, we consider
the maximum objective value to be −∞, which is infinitely worse than the
prediction 19 + 2δ.
The underlying reason that this prediction is “optimistic" while our previous pre-
dictions were “pessimistic" is that the dual program is the reverse of the primal: it
minimizes when the primal maximizes, and vice versa.
The dual variables are also called shadow costs due to an economic application
of this analysis. Suppose that our objective value 2x + 3y measures the profit we make
from a particular solution. Then the prediction “when the constraint −x + y ≤ 3
changes to −x + y ≤ 3 + δ, we predict that the objective value will change to 19 + (1/2)δ"
means that an increase by δ in this constraint is worth (1/2)δ dollars to us. In other
words, we should be willing to pay up to 50 cents for each unit increase in this upper
bound.
The term “shadow cost" refers to the idea that we’re putting an inferred value on
something that might not have a clear inherent value to us. For example, if the ith
constraint is given by the number of hours our employees can work, then the dual
variable ui tells us the price of labor: a limit on how much we should be willing to
pay one of them to work an additional hour.
This gives us the same conclusion: that the objective value changes to 19 + (1/2)δ.
However, we can also see what happens to the basic variables. What does this tell
us about the limits on δ? Well, this prediction stops being valid if our basic solution
stops being feasible: if any of the basic variables become negative. So we get the
following constraints on δ:
5 + (1/2)δ ≥ 0,   10 + (3/2)δ ≥ 0,   2 − (1/2)δ ≥ 0.
This gives us a lower bound δ ≥ −10, another lower bound δ ≥ −20/3, and an upper
bound δ ≤ 4. Therefore our prediction is valid for δ in the range [−20/3, 4].
For a basic variable like w2 , the same method works, but we can work things
out more intuitively. Seeing w2 = 10 in the dictionary tells us that the constraint
x − 2y ≤ 2 is not even tight at the moment: x − 2y is 10 lower than its upper bound.
So changing the right-hand side by a small amount will not affect the objective value,
and this stays true, provided that we don’t reduce it by more than 10.
It is best for both players to stay silent, rather than both testify, so they should
agree not to testify. However, each individual player is better off testifying, no matter
what the other does, so they should betray that agreement. However, that leaves
both players worse off. This weird behavior gives the prisoner’s dilemma a rich and
complicated dynamic. . .
A zero-sum game is one in which any hope of cooperation between the players
is eliminated, because Alice’s payoff is the negative of Bob’s payoff. (They sum to
0.) Whatever outcome helps one player, hurts the other player equally.
Though a lot of the concepts we will introduce make sense for general matrix
games, we will mostly rely on the assumption that Alice should expect Bob to make
the choice that’s worst for Alice, and vice versa. This is a bad assumption in general:
Bob wants to make the choice that’s best for Bob, whether or not that hurts Alice!
But in the case of zero-sum games, the two are equivalent.
In any case, our goal in analyzing these games will be to determine what Alice
and Bob’s optimal strategies are, and what the resulting payoff is for both players.
2.60 Strategies
We will look at some examples of zero-sum games to illustrate a few cases where we
can find the optimal strategies easily, and the general case which is more complicated.
It is immediate to see that both players want to hold up more fingers rather
than fewer. The point of this example is just to introduce two terms that help in
analyzing matrix games.
• We say that one strategy of a player dominates another strategy if, no matter
what the other player does, the first strategy is better than (or at least as good
as) the second.
Here, “two" dominates “one" and “three" dominates both “one" and “two", for
both players: no matter what the other player does, you can never go wrong
by holding up more fingers.
• If a strategy dominates every other strategy, we call it a dominant strategy.
Here, “three" is the dominant strategy for both players.
If we can identify a dominant strategy for a game, there is nothing left to analyze—
at least in the case of zero-sum games. A player with a dominant strategy should
always play it. With that assumption, the only thing left for the other player to do
is to pick the best response to that strategy.
Even if there is no dominant strategy, it makes sense to eliminate from considera-
tion any strategy that’s dominated by another strategy. After all, that other strategy
is never worse. This can help us simplify the problem and make the payoff matrix
smaller.
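Pruning dominated strategies is mechanical. Here is one possible sketch for Alice's side (Bob's side works the same way on the columns of his own payoffs):

```python
import numpy as np

def undominated_rows(A):
    """Keep only the rows (Alice's strategies) not dominated by another row."""
    m = A.shape[0]
    keep = [i for i in range(m)
            if not any(j != i and np.all(A[j] >= A[i]) and np.any(A[j] > A[i])
                       for j in range(m))]
    return A[keep]
```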
There is no dominant strategy in this game: for each player, both of the “all-in"
strategies that focus on one gate have the highest reward, but also the highest risk.
To analyze this game, we can make the following observations:
• If Alice splits her forces, she is certain not to lose points. She might even win
a few points if Bob attacks in force.
• If Bob sends raiding groups, he is also certain not to lose points (and so Alice
is certain not to win any points). If one of the gates is unprotected, Bob might
even win some points.
This means that for Alice, splitting her forces is an optimal strategy. On the one
hand, it guarantees her at least 0 points. On the other hand, Bob has a counter-
strategy that prevents Alice from gaining any positive amount of points, so she can’t
possibly do better. For Bob, sending raiding groups is an optimal strategy for the
same reason. (Either one of the generals can even announce the strategy they take
in advance: it will not make any difference.)
In general, we call an outcome like “Alice splits her forces and Bob sends raiding
groups" a saddle point. A saddle point in a zero-sum game is an outcome that’s
the worst outcome for Alice in its row, but the best outcome for Alice in its column.
Whenever there is a saddle point, either player can guarantee an outcome at least as
good as the saddle point by choosing its row or its column as a strategy.
It is a coincidence that the saddle point gives 0 points to both players in this
example. We could modify the problem by having Bob give Alice one extra point in
each outcome (to model the idea that Bob’s army is running low on supplies, while
Alice is in a well-stocked city with months of food). The same outcome would stay a
saddle point, because relative comparisons between two outcomes come out the same
way. However, the saddle point would now give payoffs of (1, −1) to Alice and Bob.
Alice would like to choose the best probability vector y for her mixed strategy.
But she can’t use the formula yTAx to evaluate how good a strategy is directly,
because she doesn’t know which vector x ∈ Rn represents Bob’s strategy. Instead,
one thing Alice can do is evaluate her strategies by what happens if Bob knows
her strategy and chooses the option that’s worst for Alice. This is called Alice’s
maximin strategy.
For all we know, this could be a terrible idea! In fact, we can cook up lots of
examples of games that aren’t zero-sum, in which the maximin strategy does terribly.
Consider the "Win, Lose, and Copy" game, defined as follows. Bob has two
options: "Win $100" and "Lose $100". Alice also has two options: "Don't play" and
"Copy Bob". This game has payoff matrix (Alice's payoff first, Bob's second)
                     Bob: Win $100     Bob: Lose $100
Alice: Don't play      (0, 100)          (0, −100)
Alice: Copy Bob       (100, 100)       (−100, −100)
In this game, Alice’s maximin strategy is not to play: copying Bob has the risk
that Bob will pick “Lose $100", and not playing can’t lose any money. But Bob isn’t
stupid and will never pick “Lose $100", so “Copy Bob" is guaranteed to earn Alice
$100 as well.
We will see that in the case of zero-sum games, the maximin strategy is reasonable,
but it will take us some time to get there.
Proof. If Alice is playing the mixed strategy given by some probability vector y ∈ Rm ,
then yTA is the vector of her possible payoffs, depending on Bob’s choices. If Bob
plays the pure strategy “always pick option i" for some i ∈ {1, 2, . . . , n}, then Alice’s
payoff is going to be (yTA)i : the ith component of this vector.
If Bob plays a mixed strategy x ∈ Rn , then Alice’s expected payoff yTAx is
a weighted average of the payoffs above, where payoff (yTA)i is multiplied by
weight xi . The weighted average can’t be lower than the smallest of the payoffs
(yTA)1 , (yTA)2 , . . . , (yTA)n . So the smallest of those payoffs is Alice’s worst case.
Or in other words: if option j ∈ {1, 2, . . . , n} is the best response for Bob, then
playing a mixed strategy x instead could be described as “with probability xj , do the
best thing; the rest of the time, do something worse." That’s obviously suboptimal.
(Technically, multiple options could be tied for Bob’s best response, in which
case choosing randomly between them is just as good as choosing one of them; but
choosing randomly will never be strictly better.) ■
Based on this claim, the worst-case payoff when Alice plays a mixed strategy
given by y ∈ Rm is
min{(y^T A)1, (y^T A)2, . . . , (y^T A)n}.
Therefore Alice can find a maximin strategy by solving the following optimization
problem:
maximize min{(y^T A)1, (y^T A)2, . . . , (y^T A)n} over y ∈ R^m
subject to y1 + y2 + · · · + ym = 1
           y ≥ 0
This is not a linear program. But there is a trick that turns it into one!
To maximize the minimum of multiple options, we can maximize an auxiliary
variable u, subject to the constraint that u is smaller than each option. In other words,
we maximize u, adding the constraints u ≤ (yTA)1 , u ≤ (yTA)2 , . . . , u ≤ (yTA)n .
Let 1 denote the vector (1, 1, . . . , 1) in which every component is 1. (In this case,
we’ll want to have 1 ∈ Rn , but in general, we’ll abuse notation and write 1 for the
all-ones vector of whichever dimension we need.) Then a quick way to write down
these constraints on u is u1T ≤ yTA. Similarly, the constraint y1 + y2 + · · · + ym = 1
can be written as yT 1 = 1, where 1 ∈ Rm .
So we get the following linear program:
maximize u over y ∈ R^m, u ∈ R
subject to u1^T ≤ y^T A
           y^T 1 = 1
           y ≥ 0
             Bob: 1      Bob: 2
Alice: 1     (−1, 1)     (1, −1)
Alice: 2     (2, −2)     (−2, 2)
If Alice plays a mixed strategy given by the probability vector (y1 , y2 ), what
happens?
• When Bob counters with playing “1" (holding up 1 finger), Alice’s expected
payoff is y1 (−1) + y2 (2) = −y1 + 2y2 .
• When Bob counters with playing “2” (holding up 2 fingers), Alice’s expected
payoff is y1 (1) + y2 (−2) = y1 − 2y2 .
Alice wants to choose (y1 , y2 ) to maximize her expected payoff in the worst case:
she wants to maximize min{−y1 + 2y2 , y1 − 2y2 }. For this, we write down the linear
program
maximize u
u,y1 ,y2 ∈R
subject to −y1 + 2y2 ≥ u
y1 − 2y2 ≥ u
y1 + y2 = 1
y1 , y2 ≥ 0
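This linear program is small enough to hand to an off-the-shelf solver. Here is a sketch using scipy's linprog (which minimizes by convention, so we negate u; the variable layout (y1, . . . , ym, u) is my own choice):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])       # Alice's payoff matrix
m, n = A.shape

obj = np.zeros(m + 1); obj[-1] = -1.0          # maximize u = minimize -u
A_ub = np.hstack([-A.T, np.ones((n, 1))])      # u - (y^T A)_j <= 0 for each j
b_ub = np.zeros(n)
A_eq = np.array([[1.0] * m + [0.0]])           # y_1 + ... + y_m = 1
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]      # y >= 0, u free

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x)   # y = (2/3, 1/3) and u = 0 for this game
```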
The amazing thing that happens is that Alice and Bob’s linear programs are duals
of each other! We can see this by rewriting them side-by-side in a more standardized
form, and pairing variables and constraints appropriately:
(P)  maximize u over u, y1, y2 ∈ R           (D)  minimize v over v, x1, x2 ∈ R
     subject to y1 + y2 = 1         (v)           subject to x1 + x2 = 1        (u)
                u + y1 − 2y2 ≤ 0    (x1)                     v + x1 − x2 ≥ 0    (y1)
                u − y1 + 2y2 ≤ 0    (x2)                     v − 2x1 + 2x2 ≥ 0  (y2)
                y1, y2 ≥ 0                                   x1, x2 ≥ 0
In particular, pay attention to u and v: these are unconstrained variables, and each
one is paired with an equation constraint in the other linear program.
It is also true that in general, the linear program for Alice’s maximin strategy
is dual to the linear program for Bob’s minimax strategy. We can write the linear
programs in the following form to expose the duality:
(P)  maximize u over u ∈ R, y ∈ R^m          (D)  minimize v over v ∈ R, x ∈ R^n
     subject to y^T 1 = 1           (v)           subject to 1^T x = 1          (u)
                u1^T − y^T A ≤ 0^T  (x)                      1v − Ax ≥ 0        (y)
                y ≥ 0                                        x ≥ 0
The matrix of coefficients in the constraints is, in both cases, the matrix with block
structure
[ 0   1^T ]
[ 1   −A  ]
             Bob: 1      Bob: 2
Alice: 1     (1, −1)     (3, −3)
Alice: 2     (4, −4)     (0, 0)
In this payoff matrix, Alice’s payoffs are always nonnegative, so u will also be
nonnegative. However, Alice’s optimal strategy is unaffected by Bob giving her $2
unconditionally. Therefore solving the linear program with this new payoff matrix
will produce the same optimal (y1 , y2 ), and now we can assume that u ≥ 0.
Also, though in general an equational constraint is hard to deal with, in this
problem we don’t have to go to the trouble of using a two-phase method. Any pure
strategy, such as for example (y1 , y2 , . . . , ym ) = (1, 0, . . . , 0), is a feasible solution to
Alice’s linear program. This means that after adding slack variables w1 , w2 , . . . , wn
to Alice’s linear program, we can solve for the basic variables (y1 , w1 , w2 , . . . , wn ) to
get an initial feasible basis.
let’s limit those to Chicago (ORD), New York (JFK), and Dallas (DFW). On our
simplified map of the airports, we will draw arrows representing the possible flights.
Every airplane can carry the same amount of oranges, but the number of flights
between the airports is limited. To keep track of that number, we label each arrow
from airport to airport with the number of flights we can use to carry oranges going
in that direction. This labeling is shown below on the left. On the right, we see one
possible (though maybe not very efficient) way that the flights could be used: there
are 5 airplanes carrying oranges from LAX to DFW, 5 more carrying those same
oranges from DFW to ORD, and then the oranges are split up; 4 airplanes’ worth of
oranges are taken directly to ATL, and the remaining oranges are shipped through
JFK.
[Figure: two copies of the airport network on LAX, ORD, DFW, JFK, and ATL. The left copy labels each arc with the number of flights available; the right copy labels each arc with "oranges shipped / flights available", showing, for example, 5 of the 6 flights from LAX to DFW in use.]
We would like to get as many oranges as possible from LAX to ATL per day.
How can we formulate this question as a linear program?
Our variables in this problem will have to be the variables we need to specify
feasible solutions such as the one we see in the diagram on the right: for every
pair of airports connected by flights, we need to know the number of airplanes
used to ship oranges from one to the other. For example, we might have a variable
xLAX,DFW to represent how much is being shipped from LAX to DFW; in the diagram,
xLAX,DFW = 5. Sometimes, there are flights going both ways, which need different
variables; for example, we will have separate variables xDFW,JFK and xJFK,DFW .
What are the constraints on these variables? The most straightforward ones
are the ones coming from the numbers in the diagram: for example, xLAX,DFW ≤ 6
because we are told that there are only 6 flights from LAX to DFW. (Also, of course,
all these variables should be nonnegative.) But if those were all the constraints, then
we’d “solve” the problem by setting each variable to its maximum value—after all,
why not?
There are some logical problems with doing that. For example, we’d have 12
flights leaving LAX carrying oranges each day, but 13 flights entering ATL with
oranges. Where do the extra oranges come from? The problem is that we forgot to
add any constraints saying that oranges can’t appear out of nowhere or vanish into
nowhere.
The “conservation of oranges” constraints will look at every intermediate airport
and say: the number of oranges going in should equal the number of oranges going
out. For example, at JFK, this constraint would be

    xDFW,JFK + xORD,JFK = xJFK,DFW + xJFK,ATL.
We do not have such a constraint at LAX or ATL. There are no oranges that can be
shipped into LAX, but that does not mean that xLAX,ORD +xLAX,DFW (the number of
oranges shipped out of LAX) should be 0: in fact, we want this quantity to be as large
as possible! To solve our problem, we can either maximize xLAX,ORD + xLAX,DFW
or, equivalently, maximize xORD,ATL + xJFK,ATL + xDFW,ATL : the number of oranges
arriving in ATL. With the “conservation of oranges” constraint in play, these should
be one and the same.
Now we have all the ingredients we need for this linear program: our first example
of a maximum flow problem.
For every intermediate node k (every node other than s and t), we have the flow
conservation constraint

    ∑_{i:(i,k)∈A} x_ik = ∑_{j:(k,j)∈A} x_kj.

Here, "i : (i, k) ∈ A" means "this sum ranges over all nodes i such that (i, k) is
an arc". Similarly, the second sum ranges over all nodes j such that (k, j) is an
arc. The meaning of this constraint is that the total flow going into node k is equal
to the total flow going out of node k.
Subject to all three sets of constraints, we maximize the total flow out of the source:
    ∑_{j:(s,j)∈A} x_sj.
Here, the value of the flow is 7; we can guarantee that this is best possible, because
the total capacity of edges leaving s is 7, so at most 7 flow can leave s.
[Diagram: two flows of value 7 on nodes s, a, b, t. Left: 3/3 on (s, a), 4/4 on (s, b), 2/2 on (a, t), 1/5 on (a, b), 5/7 on (b, t). Right: 3/3 on (s, a), 4/7 on (s, b), 3/5 on (a, t), 0/3 on (a, b), 4/4 on (b, t).]
Now consider the second diagram. Here, things are a bit more complicated. However,
we can still see a “bottleneck" in the flow if we think of splitting up the nodes into
{s, b} and {a, t}. The total flow going from {s, b} to {a, t} is 7 (which is still the
value of the flow). This cannot be increased, because all edges from s or b to a or t
(namely, (s, a) and (b, t)) have the maximum flow possible, and all edges going the
other way (namely, (a, b)) have zero flow.
The generalization of this notion is a cut. A cut in a network is a partition of
the node set N into two sets, S and T , such that s ∈ S and t ∈ T . (Being a partition
requires that S ∩ T = ∅ and S ∪ T = N : each node is in exactly one of the two sets
S, T .)
The capacity of a cut (S, T ) is, informally, the maximum amount of flow that
can move from S to T . Formally, it is the sum
    c(S, T) = ∑_{i∈S} ∑_{j∈T} c_ij

where we take c_ij to be 0 if (i, j) ∉ A. For example, in the second diagram above,
the cut ({s, b}, {a, t}) has capacity csa + cbt = 7, because (s, a) and (b, t) are the only
two arcs going from {s, b} to {a, t}.
Low-capacity cuts are bottlenecks in the network: if we have a cut (S, T ) with
capacity c(S, T ), then no more than c(S, T ) flow can be sent from s to t. This might
make intuitive sense, but just in case, let’s prove why this happens.
Theorem 2.16 If a cut (S, T ) has capacity c(S, T ), then no more than c(S, T ) flow
can be sent from s to t in the network.
Proof. Let x be a feasible flow in the network, and consider the sum
    v(x) := ∑_{k∈S} ( ∑_{j:(k,j)∈A} x_kj − ∑_{i:(i,k)∈A} x_ik ).
On one hand, flow conservation tells us that only the k = s term of this sum is
allowed to be nonzero. In fact, the k = s term of this sum is equal to the value of x,
and therefore the whole sum v(x) simplifies to the value of x.
Now rearrange this sum slightly differently. First, split it up:
    v(x) = ∑_{k∈S} ∑_{j:(k,j)∈A} x_kj − ∑_{k∈S} ∑_{i:(i,k)∈A} x_ik.
Now split up each of these sums further: in each sum ranging over all i or all
j, consider i, j ∈ S separately from i, j ∈ T . (To simplify notation, we’ll drop the
requirement that (k, j) ∈ A or (i, k) ∈ A: with the convention that the capacity of
arcs not in A is 0, this doesn’t make a difference.) We get:
    v(x) = ∑_{k∈S} ∑_{j∈S} x_kj + ∑_{k∈S} ∑_{j∈T} x_kj − ∑_{k∈S} ∑_{i∈S} x_ik − ∑_{k∈S} ∑_{i∈T} x_ik.
The double sum over k ∈ S, j ∈ S cancels with the double sum over k ∈ S, i ∈ S,
because those two sums include the exact same terms, so we have
    v(x) = ∑_{k∈S} ∑_{j∈T} x_kj − ∑_{k∈S} ∑_{i∈T} x_ik.
Now we’re going to put an upper bound on v(x). For the first sum, we have xkj ≤ ckj
in each term, and so replacing xkj by ckj can only increase the result. For the second
sum, we have xik ≥ 0 in each term, and we’re subtracting all of these terms, so
replacing xik by 0 can also only increase the result. Therefore
    v(x) ≤ ∑_{k∈S} ∑_{j∈T} c_kj − 0.
But this is precisely the definition (with slightly different summation variables) of the
capacity c(S, T ). Therefore v(x) ≤ c(S, T ), which is the inequality we wanted. ■
Here, (P) is essentially the same as our maximum-flow linear program. To make the
dual look nicer, we’ve added a dummy variable v that stands in for the objective
function, and a dummy variable v ′ that does nothing (but should be equal to v in
any feasible solution). This means that we have constraints for s and for t that look
a bit like the flow conservation constraint.
Meanwhile, if we stare at (D) long enough, we will recognize a minimum-cut-like
idea in it. There is only one constraint involving each dual variable yij , which can
be rewritten as yij ≥ ui − uj ; however, yij also must be nonnegative. So we could
make the y-variables disappear if we rewrote (D) as a minimax program, where we
minimize

    ∑_{(i,j)∈A} c_ij max{0, u_i − u_j}.
[Diagram: a network with arcs (s, a) and (s, c) of capacity 10, (a, b) of capacity 10, (a, d), (c, b), and (c, d) of capacity 4, (b, t) of capacity 12, and (d, t) of capacity 8; all flows are currently 0.]
Our first step is to notice the promising path that goes s → a → b → t. The arcs
along this path all have capacity at least 10, so we can send 10 flow along this path:
[Diagram: the same network after sending 10 flow along s → a → b → t.]
Similarly, there is the path s → c → d → t. This one doesn't let us make as much
progress, but the bottleneck capacity along that path is 4, so we can send 4 flow
along this path:
[Diagram: the network after also sending 4 flow along s → c → d → t.]
And there is room to send 2 more flow along the path s → c → b → t (but no
more, because the arc (b, t) reaches its capacity of 12 when we do so):
[Diagram: the network after also sending 2 flow along s → c → b → t; arc (b, t) is now saturated at 12/12.]
At this point, we seem to be done. There are no paths that go from s to t along
which we can increase the flow.
However, this is still not the maximum flow: there are flows with larger value. It's
just that we've gotten stuck at a feasible flow where we can't increase the flow on
some arcs without decreasing it on others. We need a better strategy.
(Disclaimer: if we had chosen different paths from s to t to use at each step, we
could have gotten to the maximum flow. But the point is that we don’t know which
paths are the right ones, so we need to be smarter than that.)
[Diagram: left, the current flow; right, the corresponding residual graph.]

Looking at the residual graph, we can find the path

    s → c → b → a → d → t    (residual capacities 4, 2, 10, 4, 4)

in the residual graph. That corresponds to the same augmenting path we found earlier,
but now we don’t have to follow arcs in unnatural directions to see it. What’s more,
the value of δ is now easy to find: it’s just the smallest residual capacity along this
path.
This residual graph is an impossible "maze", and it doesn't take us long to discover
this. From node s, we can only get to node c; from node c, we can only return to
node s. As a result, there is no augmenting path to find for this flow.
This seems disappointing, but actually it’s expected, and it’s what we want to
see. Here’s why.
Theorem 2.17 Suppose that there is no path from s to t in the residual graph (for
some feasible flow x in some network). Then:
• The flow x is a maximum flow.
• Let S be the set of all nodes reachable from s in the residual graph (including
s itself). Let T be the set of all other nodes. Then (S, T ) is a minimum cut,
and has capacity equal to the value of x.
Before we prove the theorem, we can verify that this is true in our network. The
value of our flow is 18: that’s the total flow leaving s (10 + 8), and also the total flow
entering t (12 + 6). Meanwhile, if S is the set of all nodes reachable from s in the
residual graph, then S = {s, c}, which means T = {a, b, d, t}. The capacity of the cut
(S, T ) is csa + ccb + ccd = 10 + 4 + 4 = 18.
On the left, we have (by definition) the value of the flow x. By some algebraic
manipulation, we showed that this is equal to the middle expression: the total flow
crossing from S to T , minus the total flow crossing from T back to S. This is upper
bounded by the expression on the right (by using xij ≤ cij on every term of the first
sum, and xij ≥ 0 on every term of the middle sum), and the expression on the right
is just the capacity of the cut (S, T ).
Now let’s think about what happens in the special case where S is the set of all
vertices reachable from the source in the residual graph. This means that there can
be no residual arc from any node i ∈ S to any node j ∈ T .
There are two kinds of residual arcs.
• Forward residual arcs i → j corresponding to arcs (i, j) with xij < cij .
If there are no such arcs from S to T , then for every i ∈ S and j ∈ T , we have
xij = cij .
• Backward residual arcs i ← j corresponding to arcs (i, j) with xij > 0.
If there are no such arcs from S to T , then for every i ∈ T and j ∈ S, we have
xij = 0.
As a result, for the cut we get from the residual graph, the ≤ inequality is actually
an = equation. We replaced xij by cij only in cases where we already had xij = cij ,
and we replaced xij by 0 only in cases where we already had xij = 0.
Therefore the value of the flow x is equal to the capacity of the cut (S, T ).
The optimality of both the flow and the cut follows. We know the capacity
of the cut (S, T ) is an upper bound on the value of any flow; since x achieves that
upper bound, it is a maximum flow (no other flow can be better). Similarly, the
value of x is a lower bound on the capacity of any cut: since (S, T ) achieves that
lower bound, it is a minimum cut.
[Diagram: a network with arcs (s, a), (s, b), (a, t), (b, t) of capacity 1000 each, and an arc (a, b) of capacity 1; all flows are 0.]
The maximum flow here has value 2000, which can be reached in just 2 steps. But if
we have poor judgement, and alternate between the augmenting paths s → a → b → t
and s → b → a → t (the latter uses the arc (a, b) backwards in the residual graph),
we increase the flow by 1 at each step, and only finish in 2000 steps.
Things are even worse if some capacities are irrational numbers. Then there is
an example in which a poor choice of augmenting paths means we never even get
close to the maximum flow.
Fortunately, there is a simple rule to avoid this situation. If we always pick the
shortest augmenting path available at every step, then it can be shown that we’ll
always reach the maximum flow after at most n · m steps (in a network with n nodes
and m arcs). This refinement is called the Edmonds–Karp algorithm.
Proof. When the Ford–Fulkerson method finishes computing the maximum flow in a
network, it ends by finding a cut whose capacity is equal to the value of the resulting
flow: that’s how we know that we’ve found the optimal solution. ■
Proof. This result comes from thinking about how the Ford–Fulkerson method finds
a maximum flow. We repeatedly find an augmenting path, then identify the minimum
residual capacity of any of its arcs, and then increase or decrease the flow of those
arcs by that residual capacity.
As long as we have an integer flow x, every residual capacity will also be an
integer: the residual capacity of any arc in the residual graph is given by either xij
or cij − xij , and both of those will have integer values. So in the next step, several
of the xij ’s will change by an integer value, which means that the next flow will also
be an integer flow. That means that when the Ford–Fulkerson method finishes, the
values of x will still all be integers. ■
It appears that the linear program for the maximum flow problem is quite special.
First, we found out that the dual program has integer optimal solutions (which
describe cuts, and not some weird fractional analog of cuts). Now we are seeing that
the primal program also has integer optimal solutions!
Although we have already shown that this is true, the first thing we’ll do today
will be to give another proof of this fact—one that relies on just thinking about the
properties of the maximum flow linear program. This is useful to know, because
it can guarantee integer solutions to some other problems as well. After that, we
will see some applications of maximum flow problems, including ones where the
integrality plays a key role.
The important thing to notice about this formula is that the only number we divide
by is ad − bc: the determinant of the matrix. This continues for larger matrices; for
example, for 3 × 3 matrices, we have

    ( a b c )^(−1)              1                    ( ei−fh   ch−bi   bf−ce )
    ( d e f )       = ─────────────────────────────  ( fg−di   ai−cg   cd−af )
    ( g h i )         aei + bfg + cdh − afh − bdi − ceg  ( dh−eg   bg−ah   ae−bd )
and the denominator of the big fraction in front is exactly the determinant of the
matrix. In general, the inverse of an n × n matrix A is equal to 1/det(A) multiplied
by another matrix called the adjugate matrix, whose entries are polynomial functions
of the entries of A.
We do not need to know the details. Just knowing this much gives us a useful
corollary:
Theorem 2.20 If A is an n × n matrix with integer entries, then A−1 has integer
entries if and only if det(A) = ±1.
Proof. If det(A) = ±1, then we conclude that A−1 has integer entries from formulas
like the ones above: we compute the adjugate matrix (which will have integer entries,
because we multiply, add, and subtract the entries of A) and divide by det(A) (which
is 1 or −1, so it will not give us any fractional results).
On the other hand, if A and A−1 both have integer entries, then det(A) and
det(A−1 ) must both be integers. But it's always true that det(A−1 ) = 1/det(A), because
det(A) · det(A−1 ) = det(AA−1 ) = det(I) = 1. The only way that both det(A) and
1/det(A) can be integers is if det(A) = ±1. ■
Proof. The key to this is that each variable xij in a flow appears in at most two
conservation constraints: once in flow conservation at i, and once in flow conservation
at j. (When i or j is s or t, there might be just one constraint.) Moreover, these
have opposite coefficients: 1 and −1.
If we take a k × k submatrix of the flow conservation matrix, one of the following
happens:
1. We picked the column for a variable xij , but didn't pick any of the rows where
xij has a nonzero coefficient. Then our submatrix has a column of all zeroes,
and the determinant is 0.
2. We picked the column for a variable xij , but only picked one of the rows
where xij has a nonzero coefficient. Then we can do an expansion by minors
along xij ’s column, and get a determinant equal to ±1 times a (k − 1) × (k − 1)
determinant; repeat this argument for that determinant instead.
(If k is already 1, we get a determinant of ±1 equal to our single nonzero entry.)
3. If case 1 and 2 don’t occur for any column, then every column has both a 1
and a −1 inside it. Then the rows of our submatrix add up to 0, because the 1
and −1 in every column cancel. This is a linear dependency between the rows,
so the determinant is 0.
Since all cases result in a determinant of −1, 0, or 1, the matrix is totally unimodular.
■
To help with visualization, here is an example network (the one from before) and the
matrix for its flow conservation constraints:

    (  1   0  −1  −1   0   0   0   0 )
    (  0   0   1   0  −1   1   0   0 )
    (  0   1   0   0   0  −1  −1   0 )
    (  0   0   0   1   0   0   1  −1 )

The rows correspond to nodes a, b, c, d; the columns, to variables
xsa , xsc , xab , xad , xbt , xcb , xcd , xdt .
Theorem 2.21 implies the integral flow theorem (we have to consider the capacity
constraints as well, but it turns out these do not change much). It also explains why
the dual program always has an integer optimal solution (representing a cut).
[Diagram: a network with source s, a layer of nodes ai , a layer of nodes b1 , b2 , . . . , b11 , and sink t; the arcs out of s and into t all have capacity 5.]
Suppose we can find an integer flow in this network with value 5 · 11 = 55. Then
every node ai receives 5 flow from s, which it must send to 5 different nodes among
b1 , b2 , . . . , b11 . Meanwhile, each of those nodes must receive a total of 5 flow, which it
sends on to t. Now, if we interpret a flow of 1 from ai to bj as “Employee i is trained
in process j” then we satisfy the conditions in the problem exactly.
How do we know that an integer flow like this exists? The key is that a fractional
flow like this can be found without any work. As before, send 5 flow from s to every
Can we round all percentages to integer values so that the row and column totals
still make sense?
There are more than just aesthetic reasons why we might want to round data like
this. In a scientific report, we want to avoid revealing specific numbers, but as it is,
a determined investigator might notice that all of these percentages are approximate
multiples of 1/14. This strongly suggests that there were 14 people total, that exactly
one student preferred coffee, and so forth.
It turns out that we achieve the rounding we want if we allow ourselves a tiny bit
of flexibility: we can round each percentage in either direction, not just to the closest
integer. Moreover, this can be described as a variant of the network flow problem!
The network that represents the problem is given below. Arcs (s′ , a) and (s′ , b)
correspond to the row sums; arcs (p, t), (q, t), and (r, t) correspond to the column sums;
the six intermediate arcs correspond to individual entries of the table. With this
setup, flow conservation constraints are exactly the constraints that tell us that row
and column sums do what they’re supposed to do.
[Diagram: the rounding network. Arc (s, s′ ) has bounds [100,100]; arcs (s′ , a) and (s′ , b) have bounds [42,43] and [57,58]; the six intermediate arcs from a and b to p, q, r carry the individual table entries, with bounds such as [7,8], [14,15], [21,22], and [28,29]; arcs (p, t), (q, t), (r, t) have bounds [35,36], [28,29], [35,36].]
We have not talked about how to solve a network flow problem with lower and
upper bounds. It turns out that there are ways to convert this problem to a problem
with only capacities, and if we had more time in the semester, we’d absolutely talk
about how. For now, let’s just prove that this network must have an integer flow
satisfying all the constraints (and with value exactly 100).
First, there is a feasible flow x using the exact values that generated our percentages:
for example, xs′a in this feasible flow is exactly (6/14) · 100%, the value that
created our approximate value of 42.86%. This flow is optimal, so an optimal integer
flow exists as well.
2.77 Transversals
Problem 2.20 At Network Flow University, there are many clubs and student
organizations. There are so many that the students ran into a problem: how can
they elect presidents for them all?
Of course, every club must have a president (who must be a member of the club).
Moreover, being president is a lot of work, so every student should be a member
of only one club. If we know the rosters of all the clubs, can we always choose a
president for each one?
Here is a particular instance of this problem—this one is probably much smaller
than Network Flow University would really have to deal with, but it lets us look at
something concrete. Say that there are six students at NFU; their names are Declan
(D), Evelyn (E), Finn (F ), Genevieve (G), Harper (H), and Jasper (J). There are
also six clubs, whose rosters we’ll abbreviate to:
A1 : {F, J} A2 : {D, E, F, H} A3 : {D, F, G, J} A4 : {E, J} A5 : {E, F } A6 : {E, F, J}
Thinking more generally, we can pose this problem whenever we have a universe U
(in this case, the universe is the university: the set of students {D, E, F, G, H, J})
and a family F of subsets of U (in this case, F = {A1 , A2 , A3 , A4 , A5 , A6 }). Our goal
is to find a transversal of F: from each set in F, we want to choose one element,
without choosing the same element of U twice. Formally, we want to choose elements
a1 , a2 , a3 , a4 , a5 , a6 ∈ U such that ai ∈ Ai for all i and whenever i ̸= j, we have ai ̸= aj .
You may already see some issues with our particular instance of this problem,
but we will ignore those issues and keep going. Let’s try to model this problem using
integer flows.
The idea is that we want to set up a network so that “elect student u to be
president of club Ai ” will be represented by “send flow 1 from node i to node u”. (We
could also build a network to send flow the other way: the choice to send flow from
clubs to students is arbitrary.) Of course, we only include an arc (i, u) if student u is
a member of club Ai .
[Diagram: the network: source s with arcs to club nodes 1, . . . , 6; arcs from each club to its member students among D, E, F, G, H, J; arcs from each student to the sink t.]
We did not put any capacities on arcs from clubs to students. In principle, we
could put a capacity of 1 on each such arc, since a club cannot elect a student to be
“double president”. But that constraint is already implied by the fact that a student
cannot be elected president twice (not even by the same club).
Instead, we will give these arcs infinite capacity. In the linear program, we can
represent this by just not having a capacity constraint. For the Ford–Fulkerson
method, using ∞ as a capacity is fine if we’re careful, but we could also use any big
number M . The reason to use a large capacity on these arcs is that later on, it will
limit the cuts we can find to ones with a nice structure.
[Diagram: the same network with a maximum flow of value 5, matching clubs 1, 2, 3, 4, 5 to students F, H, D, J, E.]
So it turns out we cannot elect a president for every club. Why not? For an
explanation, we must turn to the minimum cut.
We did not draw the residual graph, but let's try to reconstruct it in our heads:
going forward along arcs where the flow can be increased, and backward along arcs
where it can be decreased. From s, we can only go to 6 in the residual graph.
From 6, we can go to E, F , and J. From these, we cannot take any forward arcs
(those are at their maximum capacity), but we can go along the backward arcs (5, E),
(1, F ), and (4, J) with positive flow to get to nodes 1, 4, 5. Finally, from these nodes,
we can only go forward, but the only forward arcs lead to nodes E, F , and J, which
we've seen already. So the minimum cut (S, T ) we find has S = {s, 1, 4, 5, 6, E, F, J}
and T = {2, 3, D, G, H, t}.
The capacity of this cut is 5: the arcs going from S to T are the arcs (s, 2), (s, 3),
(E, t), (F, t), and (J, t), which have capacity 1 each.
We can interpret this cut to identify an issue with the club rosters we started
with. The four clubs A1 , A4 , A5 , A6 have only three students between them altogether:
Evelyn, Finn, and Jasper. So how could they possibly elect four different presidents?
This tells us how to identify a general obstacle to the existence of a transversal of a
family F: a transversal cannot exist if F has a subfamily G such that ⋃G (the union
of all sets in G) has fewer than |G| elements. In our case, G = {A1 , A4 , A5 , A6 } and

    ⋃G = {F, J} ∪ {E, J} ∪ {E, F } ∪ {E, F, J} = {E, F, J},

so |G| = 4 and |⋃G| = 3.
In fact, we can prove that this is the only kind of obstacle that can rule out
a transversal! To do this, we have to look at the structure of cuts in the network
carefully.
Suppose that we are looking at a general example with F = {A1 , A2 , . . . , An }. Our
network has four “layers": one layer with just s, one layer with nodes {1, 2, . . . , n},
one layer with nodes U (the universe), and one layer with just t. Let’s begin choosing
our cut by saying that the set S will contain s and a subset I ⊆ {1, 2, . . . , n}.
For every i ∈ I and every x ∈ Ai , there is an arc (i, x) in our network with infinite
capacity! We are trying to find a cut with small capacity, and it seems reasonable to
aim for “not infinite” as a starting point toward “small”. So we should make sure to
include every such x in S as well, so that arc (i, x) does not cross from S to T . In
other words, S must include the entire union ⋃_{i∈I} Ai .
At this point, the following arcs cross from S to T :
• The arcs (s, j) for every j ∉ I, since these are the nodes from {1, 2, . . . , n} we
decided not to include in S. The total capacity of these is n − |I|.
• The arcs (x, t) for every x ∈ S, since we definitely have t ∈ T . The total capacity
of these is |⋃_{i∈I} Ai |.
• Actually, we could add more elements of U to S, but this would only increase
our capacity, so it wouldn’t help.
This cut proves that we don't have a transversal exactly when its total capacity
n − |I| + |⋃_{i∈I} Ai | is less than n: in other words, when |⋃_{i∈I} Ai | < |I|. In other
words, the subfamily G = {Ai : i ∈ I} has |⋃G| < |G|: this is the same type of
obstacle that we identified earlier.
What we’ve proved is a result known as Hall’s theorem:
Theorem 2.22 — Hall's theorem. A family F has a transversal if and only if every
subset G ⊆ F satisfies Hall's condition: |⋃G| ≥ |G|.
Many results about when transversals are guaranteed to exist can be shown using
Hall’s theorem. For example:
Theorem 2.23 If every set in F has the same size k, every element of U occurs in
ℓ sets of F, and k ≥ ℓ, then F has a transversal.
Proof. Consider any subfamily G ⊆ F. If the sets in G were disjoint, their union would
have k|G| elements. This is false when the sets in G are not disjoint, since the same
element can be counted multiple times. However, we will never count an element
more than ℓ times, since every element of U appears in ℓ sets of F. Therefore we can
at least say that |⋃G| ≥ (k/ℓ)|G|. Since k ≥ ℓ, this implies that |⋃G| ≥ |G|, so Hall's
condition is satisfied, and F has a transversal. ■

[Diagram: the club-student network redrawn as a bipartite graph on vertices 1, . . . , 6 and D, E, F, G, H, J.]
Rather than “nodes” and “arcs”, it’s more common to say vertices and edges in
the case of a graph, but these mean nearly the same thing, except that edges don't
have a direction. The graph is bipartite because its vertices can be partitioned into
two sets X and Y , such that every edge has one endpoint in X and one endpoint in
Y . Let’s say that X = {1, 2, 3, 4, 5, 6} and Y = {D, E, F, G, H, J} in our case.
In any flow, the arcs we use (except for the ones out of s and into t, which
are gone now) cannot repeat a node. The equivalent notion in a graph is called a
matching: a set of edges that does not repeat any endpoints. The solution we found
in the previous section corresponds to the matching {1F, 2H, 3D, 4J, 5E}.
It is common to see problems where X and Y are the same size. In this case, a
transversal of a set family corresponds to a perfect matching in a graph: this is a
matching which uses up all the vertices in X and in Y .
What about the equivalent notion to a cut in our network? For this, it’s convenient
to turn our pair (S, T ) into a different sort of object. We’ll take a set C with the
following elements:
• All vertices in X (the side that used to be closer to s) which are in T , and
• All vertices in Y (the side that used to be closer to t) which are in S.
In our example with S = {s, 1, 4, 5, 6, E, F, J} and T = {2, 3, D, G, H, t}, we end up
with C = {2, 3, E, F, J}.
Why do we do this? Well, the defining feature of a cut with finite capacity is that
we cannot have an infinite capacity arc going from S to T ; in our graph, this means
that there cannot be an edge such that neither endpoint is in C. The resulting set C
is always a vertex cover: a set of vertices that includes one endpoint of every edge.
The max-flow min-cut theorem has the following result when we translate it into
the language of bipartite graphs:
Theorem 2.24 — König’s theorem. In any bipartite graph, the number of edges
in a maximum matching is equal to the number of vertices in a minimum vertex
cover.
Hall's theorem could also be stated in the language of graphs, as a condition
for the existence of a perfect matching. Often, that is the way you first see it.
However, no matter what language we use, Hall’s theorem and König’s theorem have
slightly different applications:
• Hall’s theorem asks: when can we guarantee that we can find a perfect matching
between X and Y ? This sees more use in theoretical applications, where we
are looking for a bijection of a certain type between two sets.
• König’s theorem asks: what is the size of the largest matching we can find?
This sees more use in practical applications, where we might want a large
matching even if a perfect one does not exist.
In both cases, the best way to answer the question is to solve the network flow
problem.
We can replace 8 and 10 by large numbers like 998 and 1000 to get a vertex
arbitrarily far away from an integer point. This is just one of the reasons that
integer programming is hard, and weird things can happen when we add the integer
constraint.
    5 3 . | . 7 . | . . .
    6 . . | 1 9 5 | . . .
    . 9 8 | . . . | . 6 .
    ------+-------+------
    8 . . | . 6 . | . . 3
    4 . . | 8 . 3 | . . 1
    7 . . | . 2 . | . . 6
    ------+-------+------
    . 6 . | . . . | 2 8 .
    . . . | 4 1 9 | . . 5
    . . . | . 8 . | . 7 9
Suppose that instead of solving this Sudoku ourselves, we want to write an integer
program so that an automatic solver can do it for us. How can we do that?
If you ask yourself, “What should my variables be?” there is a tempting but
incorrect answer: for each row i and column j, have an integer variable xij with the
constraint 1 ≤ xij ≤ 9 that tells you the number in cell (i, j).
The reason that this is not the right choice of variables is that the constraint
“these nine cells contain each value exactly once” cannot be written as a linear
inequality in terms of these variables. For example, it is possible for a row of a
Sudoku grid to contain the numbers (1, 2, 3, 4, 5, 6, 7, 8, 9), in that order; it is also
possible for that row to contain the numbers (9, 8, 7, 6, 5, 4, 3, 2, 1), in that order.
However, any set of linear constraints that allows these two points as solutions also
allows their midpoint, which is (5, 5, 5, 5, 5, 5, 5, 5, 5): not a valid way to fill in a row
of a Sudoku grid!
Instead, we will use binary variables. These are a very common choice in integer
programs; they are possibly the most common integer variable. What we’ll do here
is have an integer variable xijk where i, j, and k are all between 1 and 9. These 729
variables will be bound by constraints 0 ≤ xijk ≤ 1; since they're all integers, they
can only be 0 or 1. The meaning we'll attach to these variables is that xijk will
be 1 if cell (i, j) contains the number k, and 0 otherwise.

8 The example is taken from Wikipedia, and its author is listed as Tim Stellmach.
There are many constraints to be added to enforce the rules of Sudoku, but they
all come in four types:
• In every row, the numbers must all be different; in other words, no value can
be repeated. This is a set of 81 constraints: for every row i, and for every value
k, we add the constraint

    xi1k + xi2k + · · · + xi9k = 1

to ensure that exactly one of the cells (i, 1) through (i, 9) contains the value k.
• In every column, the numbers must all be different. This is a very similar-
looking set of 81 constraints: for every column j, and for every value k, we add
the constraint

    x1jk + x2jk + · · · + x9jk = 1

to ensure that exactly one of the cells (1, j) through (9, j) contains the value k.
• There are also 81 constraints for the boxes. These also look very similar, though
they’re slightly harder to describe. For example, for the top left 3 × 3 box, we
will have a constraint

    ∑_{i=1}^{3} ∑_{j=1}^{3} xijk = 1

for each k between 1 and 9, which ensures that the value k appears exactly
once in that box. We add similar sets of constraints for the other 3 × 3 boxes.
• The last set of constraints is easier to forget about. It does not enforce the
rules of Sudoku as described above; rather, it enforces a rule of common sense
that comes from the meaning we attach to those variables. We want to ensure
that every cell of the 9 × 9 grid contains exactly one number. Thus, for every
row i, and for every column j, we add the constraint

    xij1 + xij2 + · · · + xij9 = 1.
The resulting integer program has 729 variables and 324 constraints (not counting
the 729 constraints of the form xijk ≤ 1, and the 729 nonnegativity constraints), so it
is well out of reach of what we’ll be able to solve by hand. However, computers are
very good at problems like this: a general-purpose algorithm for solving problems
with {0, 1}-valued variables can solve the Sudoku above basically instantly. Though
in general, all known integer programming algorithms take exponential time in the
worst case, this takes a while to kick in, especially in special cases.
we’ve solved this semester: we have no objective function, we just want to know if a
feasible solution exists.
Boolean satisfiability problems are typically expressed using logical operations:
“and”, “or”, “not”, and others. However, all of these can be expressed using linear
expressions and linear constraints:
• If x and y are {0, 1}-valued variables and 1 is interpreted as “true”, then
x + y ≥ 1 encodes the constraint “x or y is true”.
• If x and y are {0, 1}-valued variables and 1 is interpreted as “true”, then
x + y = 2 encodes the constraint “x and y are both true”.
• If x is a {0, 1}-valued variable and 1 is interpreted as “true”, then 1 − x
represents the same truth value as “not x”.
Typically, Boolean satisfiability problems are written in a standard form called
“conjunctive normal form”. This standard form consists of:
• literals, which are either “xi” or “not xi” for some variable xi;
• combined into clauses: sets of literals among which at least one must be true;
• finally, the overall problem is a collection of clauses among which all must be
true.
We will not get into the weeds of encoding problems in this form, but it is a form
that lends itself particularly well to an integer programming representation. Each
clause can be represented by a single linear inequality: for example, “x or y or not z”
could be written as
x + y + (1 − z) ≥ 1
or x + y − z ≥ 0.
x ≤ 100 + 1900y.
Here, we have an upper bound on x in either case; one bound is just larger than
the other. If, on the other hand, the warehouse can store an effectively unlimited
number of calculators, then we might have to “make up” an upper bound. That is,
we’d write an inequality
x ≤ 100 + M y
where M is a number chosen large enough that this constraint won’t limit the number
of calculators we can produce. For example, based on looking at the rest of the linear
program, we might conclude that we’ll never have the materials to produce more
than 10000 calculators over the time period we’re considering, and then we could set
M = 10000.
Note that in practice, we’ll see better behavior when we solve our integer programs
if the value of M is not chosen to be gratuitously large. (But we must be careful: if
we pick too small an M , we might cut off legitimate solutions.)
Multiple binary variables. There are many ways to make this setup more
complicated.
Maybe we can rent multiple storage locations with different costs and different
capacities: we could end up with a constraint like
It’s possible that the ideas from the previous section might get involved. Maybe
options y1 and y2 are mutually exclusive, but option y3 requires option y2 to be
chosen first. We could represent such logical constraints by the inequalities y1 + y2 ≤ 1
and y3 ≤ y2 , respectively.
    5 3 4 | 6 7 8 | 9 1 2
    6 7 2 | 1 9 5 | 3 4 8
    1 9 8 | 3 4 2 | 5 6 7
    ------+-------+------
    8 5 9 | 7 6 1 | 4 2 3
    4 2 6 | 8 5 3 | 7 9 1
    7 1 3 | 9 2 4 | 8 5 6
    ------+-------+------
    9 6 1 | 5 3 7 | 2 8 4
    2 8 7 | 4 1 9 | 6 3 5
    3 4 5 | 2 8 6 | 1 7 9
To make sure that for each j, we don’t exceed our weight limit on the j th trip, we
could add constraints
for each j = 1, . . . , 15. (Wait on this, though; we’ll modify this constraint in a bit.)
The tricky part is figuring out how to minimize the number of trips necessary.
One way to do this is with a boolean variable yj for each trip j (that is, an integer
variable with 0 ≤ yj ≤ 1) that answers the question: did we take trip j at all? To
enforce this interpretation of yj , we can modify the above constraint so that its
right-hand side is multiplied by yj : when yj = 0, no crates at all can be carried on
trip j. Then, to count the trips we actually take, we

    minimize y1 + y2 + y3 + · · · + y15
Here (4, 4, 4), for example, means that you carry three 4-kilogram crates; (4, 6)
means that you carry a 4-kilogram crate and a 6-kilogram crate. This list leaves out
suboptimal configurations like “carry just two 4-kilogram crates”; this simplifies our
setup, though we’ll have to address this issue later.
Now we can define nonnegative integer variables x444 , x55 , x66 , x45 , x46 , x56 to be
the number of trips taken with each configuration of goods. The total number of
trips is just the sum of these variables: x444 + x55 + x66 + x45 + x46 + x56 ; this is the
quantity we will want to minimize.
For each size of crate, we ask for the total number of crates taken of that size to
be at least 5; for example, we add the constraint
3x444 + x45 + x46 ≥ 5
for the 4-kilogram crates. Why at least 5 and not exactly 5? Because in practice,
we might want to carry suboptimal configurations we didn’t include: if our first two
trips just deal with the 4-kilogram crates, the configurations will be (4, 4, 4) and then
(4, 4). We will model that as “taking an extra 4-kilogram crate”, with two trips that
are both (4, 4, 4).
In this case, our entire linear program is:
minimize x444 + x55 + x66 + x45 + x46 + x56
subject to 3x444 + x45 + x46 ≥ 5
2x55 + x45 + x56 ≥ 5
2x66 + x46 + x56 ≥ 5
x444 , x55 , x66 , x45 , x46 , x56 ≥ 0
x444 , x55 , x66 , x45 , x46 , x56 ∈ Z
Figuring out the configurations takes a bit of work (and in general, there may be
many of them, which is a weakness of this method). However, once we get there,
solving this integer program will turn out to be easier.
How do we get there? Well, we can begin by ignoring the integer variables, and
pretending they can be fractions. This is called “solving the LP relaxation”.
In this case, doing that will tell us that we can solve the problem in 6 2/3 trips,
by setting x55 = 5/2, x66 = 5/2, and x444 = 5/3.
Of course, this is not an actual workable solution. But it does give us some
important information! We now know for certain that at least 7 trips will be necessary.
[Diagram: the branch-and-bound tree. The root LP value is 6 2/3; branching on x55 ≤ 2 / x55 ≥ 3 gives nodes with values 6 2/3 and 7 1/6; further branching on x444 and x66 produces subproblems with values such as 7, 7 1/3, 7 1/2, 7 2/3, 8, and 9, ending at nodes with integer optimal solutions.]
In this diagram, each rectangle is labeled with an optimal objective value, and every
arrow is labeled with the constraint we added when branching. The rectangles with
no branches correspond to places where the linear program obtained an integer
optimal solution.
2.84.2 Pruning
As you can see, it can take a while to end up at integer solutions by branching.
In fact, it is typical to have to deal with many cases in the branch-and-bound
method; often, there are exponentially many cases. We saw in the previous lecture
that integer programming is very expressive and can handle many different problems;
the downside is that it is not usually easy to solve.
However, things are not as bad as I’ve made them seem, because we have only seen
half of the branch-and-bound method. In addition to branching, there is pruning:
discarding branches that are not promising without exploring them.
Here’s an example. Suppose that we start solving the potato-crate problem, and
get the following results:
1. We solve the original LP and get the solution
with objective value 6 2/3, which seems more promising. From here, we branch
on x444 .
4. We try the x444 ≤ 1 sub-branch of the x55 ≤ 2 branch, and get the solution
[Diagram: the partial branch-and-bound tree so far: the root (6 2/3) branches on x55 ≤ 2 (value 6 2/3) and x55 ≥ 3 (value 7 1/6); below them are branches on x444 and x66, with only the x444 ≤ 1 node solved so far (value 7) and the rest still unexplored.]
There are certainly branches we haven’t explored yet. But are they worth
exploring? We have an integer solution with objective value 7 already: we know we
can solve the problem in 7 trips. So the only reason to explore other branches is if
they could give us a 6-trip solution.
However, whenever we add more constraints, we can only make the objective
value worse. Therefore exploring down from the node labeled 6 2/3 will give us more
solutions with objective value at least 6 2/3: if they're integer solutions, the objective
value will be at least 7. Exploring down from the node labeled 7 1/6 will be even worse:
the smallest integer objective value we can get is at least 8.
So at this point we know we’ve found an optimal integer solution! We can stop.
The general philosophy is:
1. If ζ ∗ is the objective value of the best integer solution we’ve found, we can
prune nodes that branched from an objective value of ζ ∗ or worse: we don’t
explore those sub-branches. They will never give us anything better than what
we’ve already found.
2. In problems where integer solutions have integer objective values, we can do a
bit better. For each fractional objective value, treat it as the next-worst integer
value when pruning, since that is the best we can do from that branch. (In
our example, we prune the other branch leading out of 6 2/3, even though 6 2/3 < 7,
because there's no integer between 6 2/3 and 7.)
(Sometimes, adding constraints will create an infeasible linear program in one of our
sub-branches. These should also be pruned, because adding more constraints to an
already-infeasible linear program will never yield solutions.)
    maximize   c^T x   over x ∈ Z^n
    subject to Ax ≤ b
               x ≥ 0
    a1 x1 + a2 x2 + · · · + an xn = b.

Then every nonnegative integer solution also satisfies the inequality

    ⌊a1 ⌋x1 + ⌊a2 ⌋x2 + · · · + ⌊an ⌋xn ≤ ⌊b⌋,

where ⌊r⌋ denotes the floor of r: the greatest integer less than or equal to r.
Moreover, the difference between the two sides of this inequality is an integer.
Proof. The quantity ⌊a1 ⌋x1 + ⌊a2 ⌋x2 + · · · + ⌊an ⌋xn has smaller or equal coefficients
on every variable, compared to a1 x1 + a2 x2 + · · · + an xn . Since all the variables are
nonnegative, it must be smaller. We conclude that

    ⌊a1 ⌋x1 + ⌊a2 ⌋x2 + · · · + ⌊an ⌋xn ≤ a1 x1 + a2 x2 + · · · + an xn = b.

However, in the inequality we just wrote down, the left-hand side is an integer. So if
it is less than or equal to b, it is also less than or equal to ⌊b⌋, giving us the inequality.
The difference between the two sides of the inequality is an integer simply because
both sides are integers. ■
(a1 − ⌊a1 ⌋)x1 + (a2 − ⌊a2 ⌋)x2 + · · · + (an − ⌊an ⌋)xn ≥ b − ⌊b⌋.
Here, ai − ⌊ai ⌋ and b − ⌊b⌋ are called the fractional parts of ai and of b.
Be careful when taking fractional parts of negative numbers! If r is a positive real
number, r − ⌊r⌋ is just the part of r after the decimal, but this is no longer true if r
is negative. For example, if r = −1.23, then its floor is ⌊r⌋ = −2, and its fractional
part is r − ⌊r⌋ = 0.77.
There is a further detail we need to know: not just that this inequality is valid,
but that it cuts off fractional optimal solutions. This will force the dual simplex
method to find us a new solution, with a new opportunity for it to be an integer
solution.
If the Gomory fractional cut is obtained from a row of our dictionary, then the
fractional-part form of the inequality will contain only nonbasic variables: the single
basic variable xi will have ai = 1 (since it appears on the left side with coefficient 1)
and therefore ai − ⌊ai ⌋ = 0. So the right-hand side of the inequality will be 0 at the
current basic solution. Provided we pick a row where the constant term is a fraction,
we will have b − ⌊b⌋ > 0, so the inequality will not hold!
2.86.2 An example
As an example, consider the following integer program:
    maximize   3x + 2y   over x, y ∈ Z
    subject to 3x + y ≤ 6
               y ≤ 2
               x, y ≥ 0
Before we can find a cutting plane, we should solve the linear programming relaxation.
If we did not find an integer solution, then by definition, one of the basic variables
will have a fractional value in the optimal dictionary. In our case, x has a fractional
value. Finding the Gomory fractional cut requires us to pick one such variable; if
there are several, it doesn’t matter which one we pick, but in this case, we can only
pick x.
To apply Theorem 2.25 here, we need to write x’s equation in the appropriate
form. We move all the nonbasic variables to the left-hand side, getting
    x + (1/3)w1 − (1/3)w2 = 4/3.
The inequality in Theorem 2.25 is obtained by rounding every coefficient down to
the nearest integer. This turns x into x, (1/3)w1 into 0w1 = 0, −(1/3)w2 into −w2 ,
and 4/3 into 1, so we get the inequality

    x − w2 ≤ 1.
The reason we like to subtract this inequality from the previous inequality (or
equivalently, take the fractional parts of the coefficients) is that this eliminates the
basic variable x, giving us an inequality in the nonbasic variables:
    (1/3)w1 + (2/3)w2 ≥ 1/3.
Our next step is to add this to our dictionary. We add a new variable w3 representing
the amount by which (1/3)w1 + (2/3)w2 exceeds 1/3. Solving for w3 , this gives us the
equation

    w3 = −1/3 + (1/3)w1 + (2/3)w2 .
The clause at the end of Theorem 2.25 is important here: it tells us that the new
variable w3 will be an integer at any integer solution to our program, so we will
continue to have an integer program in which all variables are integers.
(A brief synopsis of the dual simplex pivot we did: we know that w3 is our leaving
variable, and because both w1 and w2 have positive coefficients, they're both on our
shortlist for entering variables. We compare ratios, and w1 's ratio 1/(1/3) = 3 is larger
than w2 's ratio 1/(2/3) = 3/2, so w2 is our entering variable. After pivoting, we end up
with an optimal dictionary.)
We are still not done, because (x, y) = (3/2, 3/2) is not an integer solution. But let's
take a break from that to look at what is going on graphically.
To visualize the procedure of adding a cutting plane, we can rewrite our inequality
x − w2 ≤ 1 in yet a third form: in terms of x and y. To do this, substitute w2 = 2 − y,
getting x − (2 − y) ≤ 1 or x + y ≤ 3. In the diagram of the feasible region, here is
what this looks like:
The newly added inequality separates all the integer solutions (in black) from the
fractional solution to the LP relaxation (the large point in red). Unfortunately, the
optimal solution to the new feasible region happens to be the one corner of that
region that doesn’t have integer coordinates. We have the worst luck!
To continue, we go back to our dictionary. All three basic variables have a
fractional value, so we could pick any of them to deal with, but let’s pick x again.
(In this example, it turns out that we’ll get the same cut no matter which equation
we get it from.)
Moving all the variables to the left, we get x + (1/2)w1 − (1/2)w3 = 3/2. Taking the
fractional parts according to the alternate form of Theorem 2.25, we get the inequality
(1/2)w1 + (1/2)w3 ≥ 1/2. We can add this to our dictionary, with a new slack variable w4 ,
if we define w4 = −1/2 + (1/2)w1 + (1/2)w3 .
Applying the dual simplex method again takes us only one pivot step:

    max ζ = 15/2 − (1/2)w1 − (3/2)w3            max ζ = 7 − w3 − w4
        x = 3/2 − (1/2)w1 + (1/2)w3                 x = 1 + w3 − w4
        y = 3/2 + (1/2)w1 − (3/2)w3        ⇝       y = 2 − 2w3 + w4
       w2 = 1/2 − (1/2)w1 + (3/2)w3                w2 = 0 + 2w3 − w4
       w4 = −1/2 + (1/2)w1 + (1/2)w3               w1 = 1 − w3 + 2w4
The optimal solution is (x, y) = (1, 2), which is an integer solution! After the second
cutting plane, we are done.
The second cut we added, which we wrote as (1/2)w1 + (1/2)w3 ≥ 1/2, can be written in
terms of x and y as 2x + y ≤ 4. Here is another diagram showing the evolution of
our feasible region as we add the cutting planes:
[Diagram: the feasible region before the cuts, after adding x + y ≤ 3, and after adding 2x + y ≤ 4.]
2.87 Extensions
The cutting plane method is often combined with the branch-and-bound method into
a hybrid algorithm called “branch-and-cut". Here, when solving a linear program
and getting a fractional solution, we make a choice between two options:
• Pick a variable xi with a fractional value, and use it to branch out to two new
linear programs, as usual in the branch-and-bound method.
• Add a cutting plane inequality to replace the linear program by a new one with
a different solution.
It’s a matter of heuristics (in other words, guesswork) to decide between these two
options. These heuristics are only partially developed by mathematical reasoning;
partially, we just check them on practical examples to see how well they behave.
There are also many industrial applications in which “cities” and “travel” are
more metaphorical. For example, if we are constructing an object layer by layer in a
3D printer, then optimizing the order in which we deposit material is a variant of
the traveling salesman problem. We might also be drilling holes in a circuit board,
cutting a sheet of wood with a laser cutter, or even manipulating a robot arm to
take photos of an object from multiple angles.11
Even if some of these problems add additional twists to the problem, the starting
point is usually one of the two TSP formulations we will look at today.
It will be convenient for us to assume that we never visit a city more than once
in a tour. In some formulations, this may require distinguishing between “official”
and “unofficial” visits to a city. For example, we can imagine that if we’re trying
to tour the US by taking airplane flights, we might go from Atlanta to Orlando to
Charlotte, and the flight from Orlando to Charlotte might have a layover in Atlanta.
In this case, the cost of the Orlando–Charlotte route in our problem would simply
be the total cost of the two-leg trip, and we don’t even notice the layover in Atlanta
when we’re finding the optimal tour.
On the other hand, we could also consider a problem where “unofficial” visits to
a city are not allowed—for example, we are still trying to visit n different cities in
the US by airplane and return, but we want our trip to consist of n direct flights.
In this case, the cost of going from Orlando to Charlotte might increase if we have
to avoid stopping in Atlanta. We will return to this distinction at the end of the
lecture.
In a tour, we visit each city only once: we enter the city once, and then we leave
the city once. (For city 1, we do those in a different order: first we leave city 1, and
then we return to it. But this doesn’t affect things; in fact, in a tour, it doesn’t
matter which city is the starting city.) We can represent this requirement by a pair
of constraints for each city:
    ∑_{1≤i≤n, i≠j} xij = 1    for each j = 1, 2, . . . , n    (2.23)

    ∑_{1≤k≤n, k≠j} xjk = 1    for each j = 1, 2, . . . , n    (2.24)
11 For more details of this unusual application—with pictures!—see the paper where I originally
found it: https://doi.org/10.3390/robotics11010016.
Equation (2.23) says that we arrive at city j from exactly one other city. Equation
(2.24) says that we leave city j to go to exactly one other city.
If these constraints were all we needed, we’d be in great shape. (In fact, the
constraint matrix so far is totally unimodular, so we wouldn’t even need to worry
about integer programming techniques. We will not prove this, because it doesn’t
immediately help us with anything, but it’s true.) Unfortunately, there’s a problem.
Take a random set of 9 points (as in the first diagram) and let cij be the distance
between the ith point and the j th point. Then the minimum-cost tour between
the 9 points, as found by a brute-force search, is shown in the second diagram.
Unfortunately, the solution to the integer program with constraints (2.23) and (2.24),
shown in the third diagram, is not a tour at all!
The optimal solution to the integer program we have so far satisfies the constraint
that we must enter each node once and leave it once, and so it looks like a tour
“locally". Unfortunately, it is missing the “global" condition that the tour must be
connected.
For every set S of cities, other than the empty set ∅ and the set {1, 2, . . . , n} of all
cities, we add the subtour elimination constraint

    ∑_{i∈S} ∑_{j∉S} xij ≥ 1.    (2.25)

The sum on the left-hand side ranges over all pairs (i, j) such that going from
city i to city j leaves S. By requiring the sum to be at least 1, we require that the
tour will leave the set S at least once.
This is guaranteed to happen for any legitimate tour. Since the tour visits every
single city, it must visit a city in S at some point. However, the tour cannot stay in
S forever, since there are also cities not in S, so eventually it must take a step that
leaves S.
However, the optimal solution to the constraints in (2.23) and (2.24) on the
previous page violates this condition. We could, for example, take S to be the set of
the three points on the bottom. The solution there consists of a “subtour" that just
cycles between the three cities in S, and some other thing that happens between the
six cities outside S, with no step that leaves S.
With the subtour elimination constraints in play, every integer solution to (2.23),
(2.24), and (2.25) is actually a valid tour, and so we can solve the TSP problem
using an integer program. A slightly concerning feature of the subtour elimination
constraints is that there are 2^n − 2 of them. That number grows almost as quickly
as the number (n − 1)! of possible tours, so solving even the linear programming
relaxation might not be quicker than solving the TSP problem by brute force.
A solution to this is to add the constraints in (2.25) on the fly, one at a time, just
as we added the fractional cuts in the previous lecture. Given any integer solution
to (2.23) and (2.24) that is not a tour, we can quickly find a set S for which the
corresponding constraint in (2.25) is violated. For example, we can start at city
1 and follow the path defined by the integer solution (by going from city i to the
unique city j such that xij = 1) until we return to city 1. Let S be the set of all
cities we visit: then either S = {1, 2, . . . , n} and we have a tour, or else the subtour
elimination constraint for S is violated because
    ∑_{i∈S} ∑_{j∉S} xij = 0.
For example, if a subtour visits cities a, b, c in that order, the corresponding MTZ
constraints require ta + 1 ≤ tb , tb + 1 ≤ tc , and tc + 1 ≤ ta . There is no solution to
these three constraints: when we add all three of them together, the variables ta , tb , tc
all cancel and we get the false inequality 3 ≤ 0. We get a similar contradiction for a
longer subtour.
With the equations (2.23), (2.24), (2.26), we only have around n² constraints
in our (n² + n)-variable integer program, which is much better than the around 2^n
constraints we had earlier.
variables, although there is an optimal solution where they all have integer values.
It is not necessarily true that the MTZ constraints are better than the DFJ
constraints, just because there are fewer of them. In practice, it seems that the DFJ
constraints have better performance—and adding the constraints on the fly with a
branch-and-cut approach solves the main obstacle to using them. Still, the MTZ
constraints have the advantage that they’re easier to work with without specialized
code: both approaches are useful in the right circumstances.
In this case, though we did not find the optimal solution, we got much closer
than the factor-2 guarantee. The optimal solution (shown on a previous page) has
total length about 4.88733; the solution found by our approximation algorithm has
total length about 5.01606.
A fancier version of this approximation algorithm, called the Christofides algo-
rithm, does even better: its cost is at most 1.5 times the cost of an optimal tour.
This is not always what we want, but it can be better than nothing if our integer
programs are too big to solve directly!
A vector x is a basic solution for the basis B if:
1. Ax = b, and
2. xj = 0 whenever j ∉ B.
For example, suppose we have

        ( 1  2  −1  1  −1 )          ( 2 )
    A = ( 0  1   0  1  −1 ) ,    b = ( 1 ) .
        ( 0  0   1  1  −1 )          ( 1 )

Then x = (1, 1, 1, 0, 0)T is a basic solution for basis B = {1, 2, 3} since Ax = b and
x4 = x5 = 0.
Notice that when we are given a basis, there is only one basic solution. This is a
theorem.
Theorem 2.27 Consider Ax = b and a basis B of A. Then there exists a unique
basic solution x for B.
Proof. We have

    b = Ax = ∑_j Aj xj = ∑_{j∈B} Aj xj + ∑_{j∉B} Aj xj = ∑_{j∈B} Aj xj = AB xB ,

where the second-to-last step uses xj = 0 for all j ∉ B. Now, since B is a basis, AB
is invertible, so AB⁻¹ exists. Hence, xB = AB⁻¹ b. ■
Our LP is in canonical form for basis B = {1, 4} and (2, 0, 0, 5)T is a basic solution.
We will pick k ∉ B such that ck > 0. So we pick k = 2, and set x2 = t ≥ 0, while
keeping the other non-basic variables at 0. So x3 = 0.
Then we choose the basic variables such that Ax = b holds. So we have

    ( 2 )   ( 1 1 2 0 )        ( 1 )      ( 1 )      ( 2 )      ( 0 )
    ( 5 ) = ( 0 1 1 1 ) x = x1 ( 0 ) + x2 ( 1 ) + x3 ( 1 ) + x4 ( 1 )

              ( 1 )   ( x1 )
          = t ( 1 ) + ( x4 )

Rearranging gives

    ( x1 )   ( 2 )     ( 1 )
    ( x4 ) = ( 5 ) − t ( 1 ) ,    that is,    xB = b − tAk .

For the solution to stay feasible, we need

    x1 = 2 − t ≥ 0 ⇒ t ≤ 2
    x4 = 5 − t ≥ 0 ⇒ t ≤ 5
Notice that our new solution is a basic solution for basis B = {2, 4}; we can say
that 1 left the basis and 2 entered the basis. Rewriting our LP in canonical
form with basis B = {2, 4}, we have
    max   (−1, 0, 1, 0) x
    s.t.  (  1  1  2  0 )     ( 2 )
          ( −1  0 −1  1 ) x = ( 3 )
          x1 , x2 , x3 , x4 ≥ 0
Now we pick k = 3 as the entering variable and set x3 = t. Using xB = b − tAk again,

    ( x2 )   ( 2 )     (  2 )
    ( x4 ) = ( 3 ) − t ( −1 )
From here, we pick t = 1, so x2 = 0, thus 2 leaves the basis. The new basis is
now B = {3, 4} with canonical form

    max   (−1.5, −0.5, 0, 0) x
    s.t.  (  0.5  0.5  1  0 )     ( 1 )
          ( −0.5  0.5  0  1 ) x = ( 4 )
          x1 , x2 , x3 , x4 ≥ 0
aT x ≤ α ≤ aT y
for all x ∈ C, y ∈ D.
In other words, this theorem guarantees that if you have two sets that do not
intersect and are convex, there is a hyperplane (which is a generalization of a flat
surface to n-dimensions; for instance, a line in 2D or a plane in 3D) that separates
the two sets. This hyperplane can be thought of as a decision boundary, which
one can use to distinguish points belonging to one set from those belonging to the
other.
Proof. See studocu course notes ■
C + D = {x + y : x ∈ C, y ∈ D}
C − D = {x − y : x ∈ C, y ∈ D}
Lemma 2.2. Let C ⊂ Rn be a closed convex set and let d ∉ C. Let x∗ be the nearest
point in C to d, and let v = d − x∗ . Then the hyperplane

    H = {x : ⟨v, x⟩ = ⟨v, d⟩ − ∥v∥2 }
strictly separates C from d. That is, we have
⟨v, x⟩ ≤ ⟨v, d⟩ − ∥v∥2 < ⟨v, d⟩
for all x ∈ C.
xT v ≤ b, ∀x ∈ Q
K = conv(ext(K))
Definition 2.19 Optimization is the process of finding the best solution among a
set of possible solutions. In other words, it is the art of finding the maximum or
minimum value of a function subject to certain constraints.
    min_x  f (x)    s.t.  g(x) ≤ 0

where f (x) is the objective function, x is the decision variable, and g(x) is
the constraint function.
max{cx : Ax = b, x ∈ R},
Constraints are the conditions that the decision variables must satisfy. They can
be equalities or inequalities.
■ Example 2.7
To use optimization, first you must formulate your model, based on the system of
interest and any simplifications required (i.e., assumptions). Formulating the model
is not enough; we are also interested in solving the problem, and in a reasonable
amount of time (however that is determined). To solve these problems, algorithms
are developed. An algorithm is a step-by-step process for finding a solution.
Optimization is the process of finding the best solution that satisfies a set of
constraints and criteria. The goal is to find the optimal solution that maximizes or
minimizes an objective function. The objective function is a mathematical function
that describes the problem to be solved.
Linear Programming is a sub-field of optimization theory, which is itself a subfield
of Applied Mathematics. Applied Mathematics is a very general area of study that
could arguably encompass half of the engineering disciplines. Put simply, applied
mathematics is all about applying mathematical techniques to understand or do
something practical.
Optimization is an exciting sub-discipline within applied mathematics! Opti-
mization is all about making things better; this could mean helping a company
make better decisions to maximize profit; helping a factory make products with less
environmental impact; or helping a zoologist improve the diet of an animal. When
we talk about optimization, we often use terms like better or improvement. It’s
important to remember that words like better can mean more of something (as in
the case of profit) or less of something (as in the case of waste). As we study linear
programming, we’ll quantify these terms in a mathematically precise way. For the
time being, let’s agree that when we optimize something, we are trying to make some
decisions that will make it better.
■ Example 2.8 Let’s recall a simple optimization problem from differential calculus:
Goats are an environmentally friendly and inexpensive way to control a lawn when
there are lots of rocks or lots of hills. (Seriously, both Google and some U.S. Navy
bases use goats on rocky hills instead of paying lawn mowers!) Suppose I wish to
build a pen to keep some goats. I have 100 meters of fencing and I wish to build
the pen in a rectangle with the largest possible area. How long should the sides of
the rectangle be? In this case, making the pen better means making it have the
largest possible area. The problem is illustrated in Figure 2.8. Clearly, we know
that:
2x + 2y = 100 (2.27)
because 2x + 2y is the perimeter of the pen and I have 100 meters of fencing to
build my pen. The area of the pen is A(x, y) = xy. We can use Equation (2.27) to
solve for x in terms of y. Thus we have:
y = 50 − x (2.28)
and A(x) = x(50 − x). To maximize A(x), recall we take the first derivative of A(x)
with respect to x, set this derivative to zero and solve for x:
$$\frac{dA}{dx} = 50 - 2x = 0 \qquad (2.29)$$
Figure 2.8: Goat pen with unknown side lengths. The objective is to identify the
values of x and y that maximize the area of the pen (and thus the number of goats
that can be kept).
$$\left.\frac{d^2 A}{dx^2}\right|_{x=25} = -2 < 0 \qquad (2.30)$$
which implies that x = 25 is a local maximum for this function. Another way of
seeing this is to note that A(x) = 50x − x2 is an "upside-down" parabola. As we
could have guessed, a square will maximize the area available for holding goats.
■
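For readers who want to double-check the calculus symbolically, here is a small sketch assuming the SymPy library is available (any computer algebra system would do):

import sympy as sp

x = sp.symbols("x", positive=True)
A = x * (50 - x)                   # area after substituting y = 50 - x

print(sp.solve(sp.diff(A, x), x))  # [25]: the critical point
print(sp.diff(A, x, 2))            # -2, negative, so x = 25 is a maximum
print(A.subs(x, 25))               # 625, the maximal area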
Let’s take a more general look at the goat pen example. The area function is a
mapping from R2 to R, written A : R2 → R. The domain of A is the two-dimensional
space R2 and its range is R.
Our objective in Example 2.8 is to maximize the function A by choosing values
for x and y. In optimization theory, the function we are trying to maximize (or
minimize) is called the objective function. In general, an objective function is a
mapping f : D ⊆ Rn → R. Here D is the domain of the function f .
R Clearly Definition 2.21 is valid only for domains and functions where the
concept of a neighborhood is defined and understood. In general, S must be a
topologically connected set (as it is in a neighborhood in Rn ) in order for this
definition to be used, or at least we must be able to define the concept of a
neighborhood on the set.
In Example 2.8, we are constrained in our choice of x and y by the fact that
2x + 2y = 100. This is called a constraint of the optimization problem. More
specifically, it’s called an equality constraint. If we did not need to use all the fencing,
then we could write the constraint as 2x + 2y ≤ 100, which is called an inequality
constraint. In complex optimization problems, we can have many constraints. The
set of all points in Rn for which the constraints are true is called the feasible set (or
feasible region). Our problem is to decide the best values of x and y to maximize
the area A(x, y). The variables x and y are called decision variables.
Let f : D ⊆ Rn → R; for i = 1, . . . , m, gi : D ⊆ Rn → R; and for j = 1, . . . , l,
hj : D ⊆ Rn → R be functions. Then the general maximization problem with objective
function f (x1 , . . . , xn ) and inequality constraints gi (x1 , . . . , xn ) ≤ bi (i = 1, . . . , m) and
equality constraints hj (x1 , . . . , xn ) = rj is written as:
$$\begin{aligned}
\max\quad & f(x_1, \ldots, x_n) \\
\text{subject to:}\quad & g_1(x_1, \ldots, x_n) \le b_1 \\
& \;\;\vdots \\
& g_m(x_1, \ldots, x_n) \le b_m \qquad (2.31) \\
& h_1(x_1, \ldots, x_n) = r_1 \\
& \;\;\vdots \\
& h_l(x_1, \ldots, x_n) = r_l
\end{aligned}$$
function z(x1 , . . . , xn ) in terms of the feasible region instead of the entire domain of
z, since we are only concerned with values of x1 , . . . , xn that satisfy our constraints.
Constrained optimization
In optimization, the objective is to maximize or minimize some function. For
example, if we are a factory, we want to minimize our cost of production. Often, our optimization is not unconstrained: otherwise, the way to minimize costs would be to produce nothing at all. Instead, there are some constraints we have to obey. This is known as constrained optimization.
Note that everything above is a vector, but we do not bold our vectors. This is because almost everything we work with is going to be a vector, and there isn't much point in bolding them.
This is indeed the most general form of the problem. If we want to maximize f instead of minimize, we can minimize −f. If we want a constraint to be an inequality of the form h(x) ≥ b, we can introduce a slack variable z, turn the functional constraint into h(x) − z = b, and add the regional constraint z ≥ 0. So all is good, and this is in fact the most general form.
Linear programming is, surprisingly, the case where everything is linear. We can
write our problem as:
minimize cT x subject to
where we’ve explicitly written out the different forms the constraints can take.
This is too clumsy. Instead, we can perform some tricks and turn them into a
nicer form:
Definition 2.23 — General and standard form. The general form of a linear program is
minimize cT x subject to Ax ≥ b, x ≥ 0.
The standard form of a linear program is
minimize cT x subject to Ax = b, x ≥ 0.
It takes some work to show that these are indeed the most general forms. The
equivalence between the two forms can be done via slack variables, as described
above. We still have to check some more cases. For example, this form says that x ≥ 0, i.e. all decision variables have to be non-negative. What if we want x to be unconstrained, i.e. able to take any value we like? We can split x into two parts, x = x+ − x−, where each part has to be non-negative. Then x can take any positive or negative value.
Note that when I said “nicer”, I don’t mean that turning a problem into this
form necessarily makes it easier to solve in practice. However, it will be much easier
to work with when developing general theory about linear programs.
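As an illustration of the slack-variable trick described above, here is a minimal Python sketch (the function name and the example data are hypothetical) that mechanically rewrites min cT x s.t. Ax ≥ b, x ≥ 0 in standard equality form:

import numpy as np

def general_to_standard(c, A, b):
    """Rewrite min c^T x s.t. Ax >= b, x >= 0 as
    min c'^T x' s.t. A'x' = b, x' >= 0, where x' = (x, z)
    and z is a vector of surplus variables with Ax - z = b."""
    m, n = A.shape
    A_std = np.hstack([A, -np.eye(m)])
    c_std = np.concatenate([c, np.zeros(m)])
    return c_std, A_std

c = np.array([1.0, 2.0])                  # hypothetical data
A = np.array([[1.0, 1.0], [2.0, 0.5]])
b = np.array([1.0, 2.0])
print(general_to_standard(c, A, b))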
x1 + 2x2 ≤ 6
x1 − x2 ≤ 3
x1 , x 2 ≥ 0
[Figure: the feasible region bounded by x1 − x2 = 3 and x1 + 2x2 = 6, with the cost vector c and the level lines −(x1 + x2) = 0, −2, −5.]
The shaded region is the feasible region, and c is our cost vector. The dotted lines, which are orthogonal to c, are lines on which the objective function is constant. To minimize our objective function, we want the level line to be as far to the right as possible, which is clearly achieved at the intersection of the two boundary lines. ■
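If SciPy is available, this two-variable example can be checked numerically. The sketch below assumes, as the level lines suggest, that the objective is min −(x1 + x2):

from scipy.optimize import linprog

# min -(x1 + x2)  s.t.  x1 + 2x2 <= 6,  x1 - x2 <= 3,  x >= 0
res = linprog(c=[-1, -1],
              A_ub=[[1, 2], [1, -1]],
              b_ub=[6, 3],
              bounds=[(0, None), (0, None)])
print(res.x, res.fun)   # expect x = (4, 1) with objective -5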
Now we have a problem. In the general case, we have absolutely no idea how to solve it. What we do know is how to do unconstrained optimization.
Unconstrained optimization
Let f : Rn → R, x∗ ∈ Rn. A necessary condition for x∗ to minimize f over Rn is ∇f(x∗) = 0, where
$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \cdots, \frac{\partial f}{\partial x_n} \right)^T$$
is the gradient of f.
However, this is obviously not a sufficient condition. Any such point can be a maximum, a minimum or a saddle point. Here we need a notion of convexity:
Definition 2.24 — Convex region. A region S ⊆ Rn is convex iff for all δ ∈ [0, 1] and all x, y ∈ S, we have δx + (1 − δ)y ∈ S. Alternatively, if you take any two points of S, the line segment joining them lies completely within the region.
[Figure: a non-convex and a convex region, with two points x, y and the point δx + (1 − δ)y on the segment joining them.]
[Figure: the value function ϕ with a supporting hyperplane α touching it at ϕ(b).]
Theorem 2.33 (P) satisfies strong duality iff ϕ(c) = inf_{x∈X(c)} f(x) has a supporting hyperplane at b.
Note that here we fix a b, and let ϕ be a function of c.
Proof. (⇐) Suppose there is a supporting hyperplane. Then since the plane passes
through ϕ(b), it must be of the form
α(c) = ϕ(b) + λT (c − b).
Since this is supporting, for all c ∈ Rm ,
ϕ(b) + λT (c − b) ≤ ϕ(c),
or
ϕ(b) ≤ ϕ(c) − λT (c − b).
This implies that
$$\phi(b) \le \inf_{c \in \mathbb{R}^m} \big(\phi(c) - \lambda^T(c - b)\big) = \inf_{c \in \mathbb{R}^m} \inf_{x \in X(c)} \big(f(x) - \lambda^T(h(x) - b)\big) = g(\lambda).$$
By weak duality, g(λ) ≤ ϕ(b). So ϕ(b) = g(λ). So strong duality holds.
(⇒) Assume now that we have strong duality. Then there exists λ such that for all c ∈ Rm,
$$\phi(b) = g(\lambda) = \inf_{x \in X} L(x, \lambda) \le \inf_{x \in X(c)} L(x, \lambda) = \phi(c) - \lambda^T(c - b).$$
So ϕ(b) + λT (c − b) ≤ ϕ(c). So this defines a supporting hyperplane. ■
We are having some progress now. To show that Lagrange multipliers work, we
need to show that (P ) satisfies strong duality. To show that (P ) satisfies strong
duality, we need to show that it has a supporting hyperplane at b. How can we show
that there is a supporting hyperplane? A sufficient condition is convexity.
Theorem 2.34 — Supporting hyperplane theorem. Suppose that ϕ : Rm → R is
convex and b ∈ Rm lies in the interior of the set of points where ϕ is finite. Then
there exists a supporting hyperplane to ϕ at b.
Proof follows rather straightforwardly from the definition of convexity, and is omitted.
This is some even better progress. However, the definition of ϕ is rather convoluted.
How can we show that it is convex? We have the following helpful theorem:
Theorem 2.35 Let ϕ(b) = inf_{x∈X(b)} f(x), where X(b) = {x ∈ X : h(x) ≤ b}. If X is convex and f and h are convex, then ϕ is convex.
Proof. Consider b1 , b2 ∈ Rm such that ϕ(b1 ) and ϕ(b2 ) are defined. Let δ ∈ [0, 1] and
define b = δb1 + (1 − δ)b2 . We want to show that ϕ(b) ≤ δϕ(b1 ) + (1 − δ)ϕ(b2 ).
Consider x1 ∈ X(b1 ), x2 ∈ X(b2 ), and let x = δx1 + (1 − δ)x2 . By convexity of X,
x ∈ X.
By convexity of h,
h(x) ≤ δh(x1) + (1 − δ)h(x2) ≤ δb1 + (1 − δ)b2 = b,
so x ∈ X(b). Hence
ϕ(b) ≤ f(x) = f(δx1 + (1 − δ)x2) ≤ δf(x1) + (1 − δ)f(x2).
This holds for any x1 ∈ X(b1) and x2 ∈ X(b2). So taking the infimum of the right-hand side gives
ϕ(b) ≤ δϕ(b1) + (1 − δ)ϕ(b2).
So ϕ is convex. ■
h(x) = b is equivalent to h(x) ≤ b and −h(x) ≤ −b. So the result holds for
problems with equality constraints if both h and −h are convex, i.e. if h(x) is linear.
So
Theorem 2.36 If a linear program is feasible and bounded, then it satisfies strong
duality.
maximize x1 + x2
subject to
x1 + 2x2 ≤ 6
x1 − x2 ≤ 3
x1 , x 2 ≥ 0
We can plot the solution space:
[Figure: the feasible region bounded by x1 + 2x2 = 6 and x1 − x2 = 3 in the (x1, x2)-plane, with the cost vector c, the level line x1 + x2 = 3.5, and the corner A marked.]
Basic solutions
Here we will assume that the rows of A are linearly independent, and any set of m
columns are linearly independent. Otherwise, we can just throw away the redundant
rows or columns.
In general, if both the constraints and the objective functions are linear, then the
optimal point always lies on a “corner”, or an extreme point.
Recall that a point x of a convex set S is an extreme point if whenever x = δy + (1 − δ)z for some y, z ∈ S and δ ∈ (0, 1), then x = y = z.
We will later see (via an example) that basic solutions correspond to solutions at the
“corners” of the solution space.
x1 + 2x2 + z1 = 6
x1 − x2 + z2 = 3
x1 , x 2 , z 1 , z 2 ≥ 0
      x1    x2    z1    z2    f(x)
A      0     0     6     3     0
B      0     3     0     6     3
C      4     1     0     0     5
D      3     0     3     0     3
E      6     0     0    −3     6
F      0    −3    12     0    −3
Among all six, E and F are not feasible solutions since they have negative entries. So the basic feasible solutions are A, B, C, D.
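A table like the one above can be produced mechanically by trying every pair of columns as a basis. A minimal Python sketch, assuming NumPy (and not claiming to be the text's method):

import itertools
import numpy as np

# x1 + 2x2 + z1 = 6,  x1 - x2 + z2 = 3, variables (x1, x2, z1, z2)
A = np.array([[1.0, 2.0, 1.0, 0.0],
              [1.0, -1.0, 0.0, 1.0]])
b = np.array([6.0, 3.0])
c = np.array([1.0, 1.0, 0.0, 0.0])        # objective x1 + x2

for B in itertools.combinations(range(4), 2):
    AB = A[:, B]
    if abs(np.linalg.det(AB)) < 1e-12:
        continue                          # these columns are not a basis
    x = np.zeros(4)
    x[list(B)] = np.linalg.solve(AB, b)
    tag = "feasible" if np.all(x >= -1e-9) else "infeasible"
    print(B, x, c @ x, tag)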
[Figure: the feasible region bounded by x1 + 2x2 = 6 and x1 − x2 = 3, with the basic feasible solutions A, B, C, D at its corners and the infeasible basic solution E beyond them.]
In the previous example, we saw that the extreme points are exactly the basic feasible solutions. This is true in general.
Theorem 2.37 A vector x is a basic feasible solution of Ax = b if and only if it is
an extreme point of the set X(b) = {x′ : Ax′ = b, x′ ≥ 0}.
x = δy + (1 − δ)z.
We will show that there exists an optimal solution with strictly fewer than r non-zero entries. Then the result follows by induction.
By optimality of x, we have cT x ≥ cT y and cT x ≥ cT z.
Since cT x = δcT y + (1 − δ)cT z, we must have that cT x = cT y = cT z, i.e. y and z
are also optimal.
Since y ≥ 0 and z ≥ 0, x = δy + (1 − δ)z implies that yi = zi = 0 whenever xi = 0. So the non-zero entries of y and z are a subset of the non-zero entries of x. So y and z have at most r non-zero entries, which must occur in rows where x is also non-zero.
If y or z has strictly fewer than r non-zero entries, then we are done. Otherwise,
for any δ̂ (not necessarily in (0, 1)), let
So any point in the interior cannot be better than the extreme points.
Theorem 2.39 The dual of the dual of a linear program is the primal.
Proof. It suffices to show this for the linear program in general form. We have shown
above that the dual problem is
minimize −bT λ subject to −AT λ ≥ −c, λ ≥ 0.
This problem has the same form as the primal, with −b taking the role of c, −c
taking the role of b, −AT taking the role of A. So doing it again, we get back to the
original problem. ■
2x1 + x2 + z1 = 4
2x1 + 3x2 + z2 = 6
x1 , x2 , z1 , z2 ≥ 0.
2λ1 + 2λ2 − µ1 = 3
λ1 + 3λ2 − µ2 = 2
λ1 , λ2 , µ1 , µ2 ≥ 0.
We can compute all basic solutions of the primal and the dual by setting n − m = 2 variables to zero in turn.
λ1 z1 = λ2 z2 = 0, µ1 x1 = µ2 x2 = 0.
      x1    x2    z1    z2    f(x)   λ1    λ2    µ1     µ2     g(λ)
A      0     0     4     6     0      0     0    −3     −2      0
B      2     0     0     2     6     3/2    0     0    −1/2     6
C      3     0    −2     0     9      0    3/2    0     5/2     9
D     3/2    1     0     0    13/2   5/4   1/4    0      0     13/2
E      0     2     2     0     4      0    2/3  −5/3     0      4
F      0     4     0    −6     8      2     0     1      0      8
[Figure: the primal feasible region in the (x1, x2)-plane, bounded by 2x1 + x2 = 4 and 2x1 + 3x2 = 6, and the dual feasible region in the (λ1, λ2)-plane, bounded by 2λ1 + 2λ2 = 3 and λ1 + 3λ2 = 2, with the basic solutions A–F marked in each.]
We see that D is the only solution such that both the primal and dual solutions
are feasible. So we know it is optimal without even having to calculate f (x). It
turns out this is always the case. ■
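One can confirm mechanically that D is the only pair feasible in both problems. A small Python check (the layout and tolerances are mine, not the text's):

# primal: max 3x1 + 2x2, 2x1 + x2 <= 4, 2x1 + 3x2 <= 6, x >= 0
# dual:   min 4l1 + 6l2, 2l1 + 2l2 >= 3, l1 + 3l2 >= 2, l >= 0
pairs = {"A": ((0, 0), (0, 0)),
         "B": ((2, 0), (1.5, 0)),
         "C": ((3, 0), (0, 1.5)),
         "D": ((1.5, 1), (1.25, 0.25)),
         "E": ((0, 2), (0, 2/3)),
         "F": ((0, 4), (2, 0))}
eps = 1e-9
for name, ((x1, x2), (l1, l2)) in pairs.items():
    primal = (2*x1 + x2 <= 4 + eps and 2*x1 + 3*x2 <= 6 + eps
              and x1 >= -eps and x2 >= -eps)
    dual = (2*l1 + 2*l2 >= 3 - eps and l1 + 3*l2 >= 2 - eps
            and l1 >= -eps and l2 >= -eps)
    print(name, primal and dual)   # only D prints True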
Theorem 2.40 Let x and λ be feasible for the primal and the dual of the linear program in general form. Then x and λ are optimal if and only if they satisfy complementary slackness, i.e. if
$$(c^T - \lambda^T A)x = 0 \quad\text{and}\quad \lambda^T(Ax - b) = 0.$$
Proof. If x and λ are optimal, then
$$c^T x = \lambda^T b = \inf_{x' \in X}\big(c^T x' - \lambda^T(Ax' - b)\big) \le c^T x - \lambda^T(Ax - b) \le c^T x.$$
So both inequalities hold with equality. Equality in the last step gives λT (Ax − b) = 0, and then cT x = λT b gives (cT − λT A)x = cT x − λT Ax = cT x − λT b = 0.
Conversely, if (cT − λT A)x = 0 and λT (Ax − b) = 0, then
$$c^T x = c^T x - \lambda^T(Ax - b) = (c^T - \lambda^T A)x + \lambda^T b = \lambda^T b,$$
so by weak duality x and λ are optimal. ■
maximize x1 + x2 subject to
x1 + 2x2 + z1 = 6
x1 − x2 + z2 = 3
x1 , x2 , z1 , z2 ≥ 0.
x1 x2 z1 z2
Constraint 1 1 2 1 0 6
Constraint 2 1 -1 0 1 3
Objective 1 1 0 0 0
              x1    x2    z1     z2
Constraint 1  1/2    1    1/2     0     3
Constraint 2  3/2    0    1/2     1     6
Objective     1/2    0   −1/2     0    −3
Here we adopt the following notation: let A ∈ Rm×n and b ∈ Rm. Assume that A has full rank. Let B ⊆ {1, 2, · · · , n} with |B| = m be a basis, corresponding to at most m non-zero entries.
We rearrange the columns so that all basis columns are on the left. Then we can write our matrices as
$$A_{m \times n} = \begin{pmatrix} (A_B)_{m \times m} & (A_N)_{m \times (n-m)} \end{pmatrix}, \quad x_{n \times 1} = \begin{pmatrix} (x_B)_{m \times 1} \\ (x_N)_{(n-m) \times 1} \end{pmatrix}, \quad c_{1 \times n} = \begin{pmatrix} (c_B)^T & (c_N)^T \end{pmatrix}.$$
Then Ax = b can be decomposed as
A_B x_B + A_N x_N = b,
which gives
x_B = A_B^{−1}(b − A_N x_N).
In particular, when the non-basic variables are set to zero (x_N = 0), we get
x_B = A_B^{−1} b.
The full simplex tableau for the basis B then has the block form
| A_B^{−1} A_B = I | A_B^{−1} A_N | A_B^{−1} b |
This might look really scary, and it is! Without caring too much about where the formulas for the cells come from, we see the identity matrix on the left, which is where we find our basic feasible solution. Below that is the row for the objective function. The values of this row must be 0 for the basis columns.
On the right-most column, we have A_B^{−1} b, which is our x_B. Below that is −c_B^T A_B^{−1} b, which is the negative of our objective value c_B^T x_B.
The simplex tableau
We have
$$f(x) = c^T x = c_B^T x_B + c_N^T x_N = c_B^T A_B^{-1}(b - A_N x_N) + c_N^T x_N = c_B^T A_B^{-1} b + (c_N^T - c_B^T A_B^{-1} A_N)x_N.$$
We will maximize cT x by choosing a basis such that c_N^T − c_B^T A_B^{−1} A_N ≤ 0, i.e. non-positive everywhere, and A_B^{−1} b ≥ 0.
If this is true, then for any feasible solution x ∈ Rn, we must have x_N ≥ 0. So (c_N^T − c_B^T A_B^{−1} A_N)x_N ≤ 0, and hence f(x) ≤ c_B^T A_B^{−1} b; the basic solution with x_N = 0 attains this bound and is therefore optimal. We label the entries of the tableau as
$$\begin{pmatrix} a_{ij} & a_{i0} \\ a_{0j} & a_{00} \end{pmatrix}$$
where a_{i0} corresponds to b, a_{0j} corresponds to the objective function, and a_{00} is initially 0.
The simplex method proceeds as follows:
1. Find an initial basic feasible solution.
2. Check whether a0j ≤ 0 for every j. If so, the current solution is optimal. Stop.
3. If not, choose a pivot column j such that a0j > 0. Choose a pivot row i ∈ {i : aij > 0} that minimizes ai0 /aij. If multiple rows minimize ai0 /aij, then the problem is degenerate, and things might go wrong. If aij ≤ 0 for all i, i.e. we cannot choose a pivot row, the problem is unbounded, and we stop.
4. We update the tableau by multiplying row i by 1/aij (so that the new aij = 1), and adding a (−akj /aij ) multiple of row i to each row k ≠ i, including k = 0 (so that akj = 0 for all k ≠ i). We still have a basic feasible solution, since our choice of pivot row keeps the right-hand column non-negative after the subtraction (apart from a00 ).
5. GOTO step 2.
Now visit the example at the beginning of the section to see how this is done in
practice. Then read the next section for a more complicated example.
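As a concrete illustration of steps 1–5, here is a minimal Python implementation, assuming NumPy. It is a sketch only: the tableau layout follows the text (constraint rows first, objective row last, right-hand side in the final column), and no anti-cycling rule is included:

import numpy as np

def simplex(tableau):
    T = np.asarray(tableau, dtype=float)
    m = T.shape[0] - 1                    # number of constraint rows
    while True:
        obj = T[-1, :-1]
        if np.all(obj <= 1e-9):
            return T                      # all a_0j <= 0: optimal, stop
        j = int(np.argmax(obj))           # pivot column with a_0j > 0
        col = T[:m, j]
        if np.all(col <= 1e-9):
            raise ValueError("unbounded") # no pivot row can be chosen
        ratios = np.where(col > 1e-9, T[:m, -1] / col, np.inf)
        i = int(np.argmin(ratios))        # row minimizing a_i0 / a_ij
        T[i] /= T[i, j]                   # scale so the new a_ij = 1
        for k in range(m + 1):
            if k != i:
                T[k] -= T[k, j] * T[i]    # a_kj = 0 for all k != i

# max x1 + x2 with x1 + 2x2 + z1 = 6, x1 - x2 + z2 = 3 (the earlier example)
print(simplex([[1, 2, 1, 0, 6],
               [1, -1, 0, 1, 3],
               [1, 1, 0, 0, 0]]))  # a_00 becomes -5: optimum 5 at (4, 1)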
x1 + x2 ≥ 1
2x1 − x2 ≥ 1
3x2 ≤ 2
x1 , x 2 ≥ 0
x1 + x2 − z1 = 1
2x1 − x2 − z2 = 1
3x2 + z3 = 2
x1 , x2 , z1 , z2 , z3 ≥ 0
Now we don’t have a basic feasible solution, since we would need z1 = z2 = −1, z3 = 2,
which is not feasible. So we add more variables, called the artificial variables.
x1 + x2 − z1 + y1 = 1
2x1 − x2 − z2 + y2 = 1
3x2 + z3 = 2
x1 , x 2 , z 1 , z 2 , z 3 , y 1 , y 2 ≥ 0
Note that adding y1 and y2 might create new solutions, which is bad. We solve this problem by first trying to make y1 and y2 both 0 and finding a basic feasible solution there. Then we can throw away y1 and y2 and obtain a basic feasible solution for our original problem. So momentarily, we want to solve
minimize y1 + y2 subject to
x1 + x2 − z1 + y1 = 1
2x1 − x2 − z2 + y2 = 1
3x2 + z3 = 2
x1 , x 2 , z 1 , z 2 , z 3 , y 1 , y 2 ≥ 0
x1 x2 z1 z2 z3 y1 y2
1 1 -1 0 0 1 0 1
2 -1 0 -1 0 0 1 1
0 3 0 0 1 0 0 2
-6 -3 0 0 0 0 0 0
0 0 0 0 0 -1 -1 0
Note that we keep both our original and “kill-yi ” objectives, but now we only care
about the second one. We will keep track of the original objective so that we can
use it in the second phase.
We see an initial feasible solution y1 = y2 = 1, z3 = 2. However, this is not a
proper simplex tableau, as the basis columns should not have non-zero entries
(apart from the identity matrix itself). But we have the two −1s at the bottom!
So we add the first two rows to the last to obtain
x1 x2 z1 z2 z3 y1 y2
1 1 -1 0 0 1 0 1
2 -1 0 -1 0 0 1 1
0 3 0 0 1 0 0 2
-6 -3 0 0 0 0 0 0
3 0 -1 -1 0 0 0 2
Our pivot column is x1, and our pivot row is the second row. We divide the pivot row by 2 and add/subtract multiples of it to the other rows.
x1     x2     z1     z2     z3    y1     y2
 0    3/2    −1     1/2     0     1    −1/2  | 1/2
 1   −1/2     0    −1/2     0     0     1/2  | 1/2
 0     3      0      0      1     0      0   | 2
 0    −6      0     −3      0     0      3   | 3
 0    3/2    −1     1/2     0     0    −3/2  | 1/2
There are two possible pivot columns. We pick z2 and use the first row as the pivot
row.
x1 x2 z1 z2 z3 y1 y2
0 3 -2 1 0 2 -1 1
1 1 -1 0 0 1 0 1
0 3 0 0 1 0 0 2
0 3 -6 0 0 6 0 6
0 0 0 0 0 -1 -1 0
We see that y1 and y2 are no longer in the basis, and hence take value 0. So we
drop all the phase I stuff, and are left with
x1 x2 z1 z2 z3
0 3 -2 1 0 1
1 1 -1 0 0 1
0 3 0 0 1 2
0 3 -6 0 0 6
x1     x2     z1      z2     z3
 0      1    −2/3    1/3     0   | 1/3
 1      0    −1/3   −1/3     0   | 2/3
 0      0     2      −1      1   | 1
 0      0    −4      −1      0   | 5
Since the last row is all non-positive, the current basic feasible solution is optimal. So x1 = 2/3, x2 = 1/3, z3 = 1 is an optimal solution, and our optimal value is 5.
Note that we previously said that the bottom right entry is the negative of the
optimal value, not the optimal value itself! This is correct, since in the tableau, we
are maximizing −6x1 − 3x2 , whose maximum value is −5. So the minimum value
of 6x1 + 3x2 is 5. ■
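If SciPy is available, the whole two-phase example can be cross-checked in one call. The sketch below uses the objective implied by the tableau, min 6x1 + 3x2, with the original constraints:

from scipy.optimize import linprog

# min 6x1 + 3x2  s.t.  x1 + x2 >= 1,  2x1 - x2 >= 1,  3x2 <= 2,  x >= 0
res = linprog(c=[6, 3],
              A_ub=[[-1, -1],   # x1 + x2 >= 1   ->  -x1 - x2 <= -1
                    [-2, 1],    # 2x1 - x2 >= 1  ->  -2x1 + x2 <= -1
                    [0, 3]],    # 3x2 <= 2
              b_ub=[-1, -1, 2])
print(res.x, res.fun)   # expect x = (2/3, 1/3), optimal value 5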
min xT c s.t. Ax ≤ b, x ≥ 0
where:
• x is the decision vector
• c is the cost vector
• A is the coefficient matrix
• b is the right-hand side vector
• xT denotes the transpose of x
The objective is to find the optimal solution that minimizes or maximizes the objective function xT c, subject to the constraints Ax ≤ b and x ≥ 0.
2.94.1 Definitions
The following is an example of a problem in linear programming:
Maximize x + y − 2z
Subject to 2x + y + z ≤ 4
3x − y + z = 0
x, y, z ≥ 0
Solving this problem means finding real values for the variables x, y, z satisfying the constraints 2x + y + z ≤ 4, 3x − y + z = 0, and x, y, z ≥ 0 that give the maximum possible value (if it exists) for the objective function x + y − 2z.
For example, (x, y, z) = (0, 1, 1) is a feasible solution. Its objective function value, obtained by evaluating the objective function at (x, y, z) = (0, 1, 1), is 0 + 1 − 2(1) = −1. The set of feasible solutions to a linear programming problem is called the feasible region.
More formally, a linear programming problem is an optimization problem of the
following form:
$$\text{Maximize (or Minimize)} \quad \sum_{j=1}^{n} c_j x_j$$
$$\text{Subject to} \quad P_i(x_1, \ldots, x_n), \quad i = 1, \ldots, m$$
max x + y
s.t. 2x + 2y ≤ 1
The constraint says that x + y cannot exceed 1/2. Now, both (x, y) = (1/2, 0) and (x, y) = (0, 1/2) are feasible solutions having objective function value 1/2. Hence, they are both optimal solutions. (In fact, this problem has infinitely many optimal solutions. Can you specify all of them?)
Not all linear programming problems have optimal solutions. For example, a
problem can have no feasible solution. Such a problem is said to be infeasible. Here
is an example of an infeasible problem:
min x
s.t. x ≤ 1
x≥2
There is no value for x that is at the same time at most 1 and at least 2.
Even if a problem is not infeasible, it might not have an optimal solution as the
following example shows:
min x
s.t. x ≤ 0
Note that no matter what real number M we are given, we can always find a feasible solution whose objective function value is less than M. Such a problem is said to be unbounded. (For a maximization problem, it is unbounded if one can find feasible solutions whose objective function value is larger than any given real number.)
So far, we have seen that a linear programming problem can have an optimal
solution, be infeasible, or be unbounded. Is it possible for a linear programming
problem to be not infeasible, not unbounded, and with no optimal solution?
The following optimization problem, though not a linear programming problem, is
not infeasible, not unbounded, and has no optimal solution:
min 2^x
s.t. x ≤ 0
The objective function value is always positive and can get arbitrarily close to 0 but can never attain 0.
A main result in linear programming states that if a linear programming problem
is not infeasible and is not unbounded, then it must have an optimal solution.
This result is known as the Fundamental Theorem of Linear Programming
(Theorem 2.42) and we will see a proof of this important result. In the meantime, we will consider the seemingly easier problem of determining if a system of linear constraints has a solution.
Exercises
1. Determine all values of a such that the problem
min x + y
s.t. −3x + y ≥ a
2x − y ≥ 0
x + 2y ≥ 2
is infeasible.
2. Show that the problem
min 2^x · 4^y
s.t. e^{−3x+y} ≥ 1
|2x − y| ≤ 4
can be reduced to a linear programming problem.
Solutions
1. Adding the first two inequalities gives −x ≥ a. Adding 2 times the second inequality and the third inequality gives 5x ≥ 2, implying that x ≥ 2/5. Hence, if a > −2/5, there is no solution.
Note that if a ≤ −2/5, then (x, y) = (2/5, 4/5) satisfies all the inequalities. Hence, the problem is infeasible if and only if a > −2/5.
2. Note that the constraint |2x − y| ≤ 4 is equivalent to the constraints 2x − y ≤ 4 and 2x − y ≥ −4 taken together, and the constraint e^{−3x+y} ≥ 1 is equivalent to −3x + y ≥ 0. Hence, we can rewrite the problem with linear constraints. Finally, minimizing 2^x · 4^y is the same as minimizing 2^{x+2y}, which is equivalent to minimizing x + 2y.
Definition 2.32 Linear Function A function f (x1 , x2 , · · · , xn ) is linear if, and only
if, we have f (x1 , x2 , · · · , xn ) = c1 x1 + c2 x2 + · · · + cn xn , where the c1 , c2 , · · · , cn
coefficients are constants.
Min z = c1 x1 + c2 x2 + c3 x3 (2.33)
s.t. a11 x1 + a12 x2 + a13 x3 ≥ b1 (2.34)
a21 x1 + a22 x2 + a23 x3 ≤ b2 (2.35)
a31 x1 + a32 x2 + a33 x3 = b3 (2.36)
a41 x1 + a42 x2 + a43 x3 ≥ b4 (2.37)
x1 ≥ 0, x2 ≤ 0, x3 urs. (2.38)
Eq. (2.33) is the objective function, (2.34)–(2.37) are the functional constraints, while (2.38) gives the sign restrictions (urs signifies that the variable is unrestricted). If
we were to add any one of these following constraints x2 ∈ {0, 1} (x2 is binary-valued)
or x3 ∈ Z (x3 is integer-valued) we would have an Integer Program. For the purposes
of this class, an Integer Program (IP) is just an LP with added integer restrictions
on (some) variables.
While, in general, solvers will take any form of the LP, there are some special
forms we use in analysis:
LP Standard Form: The standard form has all constraints as equalities, and
all variables as non-negative. The generic LP is not in standard form, but any LP
can be converted to standard form.
$$\begin{aligned}
\text{Min } z = \; & c_1 x_1 - c_2 \hat{x}_2 + c_3 (x_3^+ - x_3^-) \\
\text{s.t. } \; & a_{11} x_1 - a_{12} \hat{x}_2 + a_{13}(x_3^+ - x_3^-) - s_1 = b_1 \\
& a_{21} x_1 - a_{22} \hat{x}_2 + a_{23}(x_3^+ - x_3^-) + s_2 = b_2 \\
& a_{31} x_1 - a_{32} \hat{x}_2 + a_{33}(x_3^+ - x_3^-) = b_3 \\
& a_{41} x_1 - a_{42} \hat{x}_2 + a_{43}(x_3^+ - x_3^-) - s_4 = b_4 \\
& x_1, \hat{x}_2, x_3^+, x_3^-, s_1, s_2, s_4 \ge 0.
\end{aligned}$$
(Here x̂2 = −x2 replaces the non-positive variable x2, and the unrestricted x3 is split as x3 = x3+ − x3−.)
x1 + x2 − x3 ≥ 2
3x1 − x2 + x3 = 2
2x1 − x2 ≤ 1
If ⃗x′ = (1, 3, 2)^T, then J(S, ⃗x′) = {1, 2} since ⃗x′ satisfies the first two constraints with equality but not the third. Hence,
$$A_{S,\vec{x}'} = \begin{pmatrix} 1 & 1 & -1 \\ 3 & -1 & 1 \end{pmatrix}. \; \blacksquare$$
A basic feasible solution to the system in Example 2.15 is (1, 1, 0)^T.
It is not difficult to see that in two dimensions, basic feasible solutions correspond
to “corner points” of the set of all solutions. Therefore, the notion of a basic feasible
solution generalizes the idea of a corner point to higher dimensions.
The following result is the basis for what is commonly known as the corner
method for solving linear programming problems in two variables.
Theorem 2.41 — Basic Feasible Optimal Solution. Let (P) be a linear programming problem. Suppose that (P) has an optimal solution and there exists a basic feasible solution to its constraints. Then there exists an optimal solution that is a basic feasible solution.
We first state the following simple fact from linear algebra:
Lemma 2.4. Let A ∈ Rm×n and d⃗ ∈ Rn be such that Ad⃗ = ⃗0. If ⃗q ∈ Rn satisfies
⃗qT d⃗ ̸= 0 then ⃗qT is not in the row space of A.
ϵ⃗cT d⃗ ≥ 0
−ϵ⃗cT d⃗ ≥ 0,
min λ
s.t. Q(⃗x∗ + λ⃗d) ≥ ⃗p
Exercises
1. Find all basic feasible solutions to
x1 + 2x2 − x3 ≥ 1
x2 + 2x3 ≥ 3
−x1 + 2x2 + x3 ≥ 3
−x1 + x2 + x3 ≥ 0.
2. Let ⃗x′ be a basic feasible solution to
A⃗x = ⃗b
⃗x ≥ ⃗0.
Let J = {i : x′i > 0}. Prove that the columns of A indexed by J are linearly independent.
Solutions
1. To obtain all the basic feasible solutions, it suffices to enumerate all subsystems A′⃗x ≥ ⃗b′ of the given system such that the rank of A′ is three, solve A′⃗x = ⃗b′ for ⃗x, and see if it is a solution to the system, in which case it is a basic feasible solution. Observe that every basic feasible solution can be discovered in this manner.
We have at most four subsystems to consider.
Setting the first three inequalities to equality gives the unique solution (0, 1, 1)^T, which satisfies the given system. Hence, (0, 1, 1)^T is a basic feasible solution.
Setting the first, second, and fourth inequalities to equality gives the unique solution (5/3, 1/3, 4/3)^T, which violates the third inequality of the given system.
Setting the first, third, and fourth inequalities to equality leads to no solution. (In fact, the coefficient matrix of the system does not have rank 3 and therefore this case can be ignored.)
Setting the last three inequalities to equality gives the unique solution (3, 3, 0)^T, which satisfies the given system. Hence, (3, 3, 0)^T is a basic feasible solution.
A⃗x = ⃗b
xj = 0, j ∉ J.   (2.40)
Note that the coefficient matrix of this system has rank n if and only if it has a unique solution. Now, (2.40) simplifies to
$$\sum_{j \in J} x_j A_j = \vec{b},$$
which has a unique solution if and only if the columns of A indexed by J are linearly independent.
Proof. Without loss of generality, we may assume that the linear programming
problem is of the form
min ⃗cT⃗x
(2.41)
s.t. A⃗x ≥ ⃗b
z −⃗cT⃗x ≥ 0
−z +⃗cT⃗x ≥ 0 (2.42)
A⃗x ≥ ⃗b.
Solving (2.41) is equivalent to finding among all the solutions to (2.42) one that
minimizes z, if it exists. Eliminating the variables x1 , . . . , xn (in any order) using
Fourier-Motzkin elimination gives a system of linear inequalities (S) containing at
most the variable z. By scaling, we may assume that each coefficient of z in (S)
is 1, −1, or 0. Note that any z satisfying (S) can be extended to a solution to (2.42)
and the z value from any solution to (2.42) must satisfy (S).
That (2.41) is not unbounded implies that (S) must contain an inequality of the form z ≥ β for some β ∈ R. (Why?) Let all the inequalities in which the coefficient of z is positive be
z ≥ βi.
Each such inequality is obtained from the system
$$\begin{pmatrix} 1 & -\vec{c}^T \\ -1 & \vec{c}^T \\ 0 & A \end{pmatrix} \begin{pmatrix} z \\ \vec{x} \end{pmatrix} \ge \begin{pmatrix} 0 \\ 0 \\ \vec{b} \end{pmatrix}$$
by a nonnegative linear combination with multipliers
$$\begin{pmatrix} \alpha & \beta & y_1^* & \cdots & y_m^* \end{pmatrix},$$
where ⃗y∗ = [y1∗, . . . , ym∗]^T. Hence, y1∗, . . . , ym∗ are the desired multipliers.
The significance of the fact that we can infer ⃗cT⃗x ≥ γ will be discussed in more detail when we look at duality theory for linear programming.
Exercises
1. Determine the optimal value of the following linear programming problem:
min x
s.t. x + y ≥ 2
x − 2y + z ≥ 0
y − 2z ≥ −1.
min x1 + 2x2
s.t. x1 + 3x2 ≥ 4
−x1 + x2 ≥ 0.
Solutions
1. The problem is equivalent to determining the minimum value for x among all x, y, z satisfying
x + y ≥ 2 (1)
x − 2y + z ≥ 0 (2)
y − 2z ≥ −1. (3)
Multiplying (3) by 1/2 gives
x + y ≥ 2 (1)
x − 2y + z ≥ 0 (2)
(1/2)y − z ≥ −1/2. (4)
Eliminating z (by adding (2) and (4)), we obtain
x + y ≥ 2 (1)
x − (3/2)y ≥ −1/2. (5)
Multiplying (5) by 2/3 gives
x + y ≥ 2 (1)
(2/3)x − y ≥ −1/3. (6)
Eliminating y (by adding (1) and (6)), we get
(5/3)x ≥ 5/3, (7)
i.e. x ≥ 1. Hence, the optimal value is 1, attained at (x, y, z) = (1, 1, 1).
z − x1 − 2x2 ≥ 0 (1)
−z + x1 + 2x2 ≥ 0 (2)
x1 + 3x2 ≥ 4 (3)
−x1 + x2 ≥ 0 (4)
Part 1 of the theorem is known as weak duality. Part 2 of the theorem is often
called the Complementary Slackness Theorem.
Proof of Theorem 2.44. Note that if x∗j is constrained to be nonnegative, its corresponding dual constraint is (⃗y∗)T Aj ≤ cj. Hence, (cj − (⃗y∗)T Aj)x∗j ≥ 0 with equality if and only if x∗j = 0 or (⃗y∗)T Aj = cj (or both).
If x∗j is constrained to be nonpositive, its corresponding dual constraint is (⃗y∗)T Aj ≥ cj. Hence, (cj − (⃗y∗)T Aj)x∗j ≥ 0 with equality if and only if x∗j = 0 or (⃗y∗)T Aj = cj (or both).
If x∗j is free, its corresponding dual constraint is (⃗y∗)T Aj = cj. Hence, (cj − (⃗y∗)T Aj)x∗j = 0.
We can combine these three cases and obtain that
$$(\vec{c}^T - (\vec{y}^*)^T A)\vec{x}^* = \sum_{j=1}^{n} (c_j - (\vec{y}^*)^T A_j)x_j^* \ge 0,$$
with equality if and only if for each j = 1, . . . , n,
x∗j = 0 or (⃗y∗)T Aj = cj.
Similarly, for each i = 1, . . . , m,
y∗i = 0 or ⃗aT(i)⃗x∗ = bi.
Solution: One could answer this question by solving (P) and then seeing if the objective function value of ⃗x∗, assuming that its feasibility has already been verified, is equal to the optimal value. However, there is a way to make use of the given information to save some work.
Let (D) denote the dual problem of (P):
max y1 + y2
s.t. y1 − y2 ≤ 2
y1 + 2y2 + 3y3 = 4
3y1 + y2 − 6y3 ≤ 2
y1 ≤ 0
y2 ≥ 0
y3 free.
One can check that ⃗x∗ is a feasible solution to (P). If ⃗x∗ is optimal, then there
must exist a feasible solution ⃗y ∗ to (D) satisfying together with ⃗x∗ the complementary
slackness conditions:
y1∗ = 0 or x∗1 + x∗2 + 3x∗3 = 1
y2∗ = 0 or −x∗1 + 2x∗2 + x∗3 = 1
y3∗ = 0 or 3x∗2 − 6x∗3 = 0
x∗1 = 0 or y1∗ − y2∗ = 2
x∗2 = 0 or y1∗ + 2y2∗ + 3y3∗ = 4
x∗3 = 0 or 3y1∗ + y2∗ − 6y3∗ = 2.
As x∗2, x∗3 > 0, satisfying the above conditions requires that
y1∗ + 2y2∗ + 3y3∗ = 4
3y1∗ + y2∗ − 6y3∗ = 2.
Solving for y2∗ and y3∗ in terms of y1∗ gives y2∗ = 2 − y1∗, y3∗ = (1/3)y1∗. To make ⃗y∗ feasible to (D), we can set y1∗ = 0 to obtain the feasible solution y1∗ = 0, y2∗ = 2, y3∗ = 0. We can check that this ⃗y∗ satisfies the complementary slackness conditions with ⃗x∗. Hence, ⃗x∗ is an optimal solution to (P) by Theorem 2.44, part 2.
Exercises
1. Let (P) and (D) denote a primal-dual pair of linear programming problems.
Prove that if (P) is not infeasible and (D) is infeasible, then (P) is unbounded.
2. Let (P) denote the following linear programming problem:
Determine if ⃗x∗ = (x1, x2, x3)^T = (3/5, −1/5, 0)^T is an optimal solution to (P).
3. Let (P) denote the following linear programming problem:
Determine if ⃗x∗ = (x1, x2, x3)^T = (0, 1, 0)^T is an optimal solution to (P).
4. Let m and n be positive integers. Let A ∈ Rm×n . Let ⃗b ∈ Rm . Let ⃗c ∈ Rn . Let
(P) denote the linear programming problem
min ⃗cT⃗x
s.t. A⃗x = ⃗b
⃗x ≥ ⃗0.
max ⃗y T⃗b
s.t. ⃗y T A ≤ ⃗cT .
Suppose that A has rank m and that (P) has at least one optimal solution.
Prove that if x∗j = 0 for every optimal solution x∗ to (P), then there exists an optimal solution ⃗y∗ to (D) such that (⃗y∗)T Aj < cj, where Aj denotes the jth column of A.
Solutions
max y1 + y2
s.t. y1 + y2 + y3 ≤ 0
y1 − 2y2 + 3y3 = 4
3y1 + y2 − 6y3 ≤ 2
y1 ≤ 0
y2 ≥ 0
y3 free.
2. If ⃗x∗ = (3/5, −1/5, 0)^T were an optimal solution, there would exist ⃗y∗ feasible to (D) satisfying the complementary slackness conditions with ⃗x∗:
Since x∗1 + x∗2 + 3x∗3 < 1, we must have y1∗ = 0. Also, x∗1 , x∗2 are both nonzero.
Hence,
y2∗ + y3∗ = 0
−2y2∗ + 3y3∗ = 4.
Solving gives y2∗ = −4/5 and y3∗ = 4/5. But this implies that ⃗y∗ is not a feasible solution to the dual problem since we need y2∗ ≥ 0. Hence, ⃗x∗ is not an optimal solution to (P).
3. We show that it is not an optimal solution to (P). First, note that the dual
problem of (P) is
max 2y1 + y2
s.t. y1 − y2 − y3 ≤ 1
2y1 + y2 + y3 ≤ 2
2y1 + y2 − y3 ≤ −3
y1 , y2 free.
y3 ≥ 0
Note that ⃗x∗ = (0, 1, 0)^T is a feasible solution to (P). If it were an optimal solution to (P), there would exist ⃗y∗ feasible to the dual problem (D) satisfying the complementary slackness conditions with ⃗x∗:
Since −x∗1 + x∗2 − x∗3 > 0, we must have y3∗ = 0. Also, x∗2 > 0 implies that
2y1∗ + y2∗ + y3∗ = 2. Simplifying gives y2∗ = 2 − 2y1∗ .
Hence, for y ∗ to be feasible to the dual problem, it needs to satisfy the third
constraint, 2y1∗ + (2 − 2y1∗ ) ≤ −3, which simplifies to the absurdity 2 ≤ −3.
Hence, ⃗x∗ is not an optimal solution to (P).
4. Let v denote the optimal value of (P). Let (P’) denote the problem
min −xi
s.t. A⃗x = ⃗b
⃗cT⃗x ≤ v
⃗x ≥ ⃗0
Note that x∗ is a feasible solution to (P’) if and only if it is an optimal solution
to (P). Since x∗i = 0 for every optimal solution to (P), we see that the optimal
value of (P’) is 0.
Let (D’) denote the dual problem of (P’):
max ⃗y T⃗b + vu
s.t. ⃗y T Ap + cp u ≤ 0 for all p ̸= i
⃗y T Ai + ci u ≤ −1
u ≤ 0.
Suppose that an optimal solution to (D’) is given by ⃗y ′ , u′ . Let ⃗y¯ be an optimal
solution to (D). We consider two cases.
Case 1: u′ = 0.
Then (⃗y ′ )T⃗b = 0. Hence, ⃗y ∗ = ⃗y¯ +⃗y ′ is an optimal solution to (D) with (⃗y ∗ )T Ai <
ci .
Case 2: u′ < 0.
Then (⃗y′)T⃗b + vu′ = 0, implying that (1/|u′|)(⃗y′)T⃗b = v. Let ⃗y∗ = (1/|u′|)⃗y′. Then ⃗y∗ is an optimal solution to (D) with (⃗y∗)T Ai < ci.
It is easily seen that if such a ⃗y exists, then the system A⃗x = ⃗b cannot have a
solution. (Simply multiply both sides of A⃗x = ⃗b on the left by ⃗y T .) However, proving
the converse requires a bit of work. A standard elementary proof involves using
Gauss-Jordan elimination to reduce the original system to an equivalent system
Q⃗x = d⃗ such that Q has a row of zero, say in row i, with d⃗i ̸= 0. The process can be
captured by a square matrix M satisfying MA = Q. We can then take ⃗y T to be the
ith row of M.
An analogous result holds for systems of linear inequalities. The following result
is one of the many variants of a result known as the Farkas’ Lemma:
In other words, the system A⃗x ≥ ⃗b has no solution if and only if one can infer
the inequality 0 ≥ γ for some γ > 0 by taking a nonnegative linear combination of
the inequalities.
This result essentially says that there is always a certificate (the m-tuple ⃗y with
the prescribed properties) for the infeasibility of the system A⃗x ≥ ⃗b. This allows
third parties to verify the claim of infeasibility without having to solve the system
from scratch.
For example, the system
2x − y + z ≥ 2
−x + y − z ≥ 0
−y + z ≥ 0
has no solution: adding two times the second inequality and the third inequality to the first gives the absurd inequality 0 ≥ 2.
We now give a proof of Theorem 2.45. It is easy to see that if such a ⃗y exists,
then the system A⃗x ≥ ⃗b has no solution.
Conversely, suppose that the system A⃗x ≥ ⃗b has no solution. It suffices to show that we can infer the inequality 0 ≥ α for some positive α by taking a nonnegative linear combination of the inequalities in the system A⃗x ≥ ⃗b. If the system already contains an inequality 0 ≥ α for some positive α, then we are done. Otherwise, we show by induction on n that we can infer such an inequality.
Base case: The system A⃗x ≥ ⃗b has only one variable. For the system to have no solution, there must exist two inequalities ax1 ≥ t and −a′x1 ≥ t′ such that a, a′ > 0 and t/a > −t′/a′. Adding 1/a times the inequality ax1 ≥ t and 1/a′ times the inequality −a′x1 ≥ t′ gives the inequality 0 ≥ t/a + t′/a′ with a positive right-hand side. This establishes the base case.
Induction hypothesis: Let n ≥ 2 be an integer. Assume that given any system of linear inequalities A′⃗x ≥ ⃗b′ in n − 1 variables having no solution, one can infer the inequality 0 ≥ α′ for some positive α′ by taking a nonnegative linear combination of the inequalities in the system A′⃗x ≥ ⃗b′.
Apply Fourier-Motzkin elimination to eliminate xn from A⃗x ≥ ⃗b to obtain the
system P⃗x ≥ ⃗q. As A⃗x ≥ ⃗b has no solution, P⃗x ≥ ⃗q also has no solution.
By the induction hypothesis, one can infer the inequality 0 ≥ α for some positive
α by taking a nonnegative linear combination of the inequalities in P⃗x ≥ ⃗q. However, each inequality in P⃗x ≥ ⃗q can be obtained from a nonnegative linear combination of the inequalities in A⃗x ≥ ⃗b. Hence, one can infer the inequality 0 ≥ α by taking a nonnegative linear combination of nonnegative linear combinations of the inequalities in A⃗x ≥ ⃗b. Since a nonnegative linear combination of nonnegative linear combinations of the inequalities in A⃗x ≥ ⃗b is simply a nonnegative linear combination of the inequalities in A⃗x ≥ ⃗b, the result follows.
■
Remark. Notice that in the proof above, if A and ⃗b have only rational entries,
then we can take ⃗y to have only rational entries as well.
A⃗x = ⃗b
⃗x ≥ ⃗0
can be rewritten as
$$\begin{pmatrix} A \\ -A \\ I \end{pmatrix} \vec{x} \ge \begin{pmatrix} \vec{b} \\ -\vec{b} \\ \vec{0} \end{pmatrix}$$
where I is the n × n identity matrix. Then by Theorem 2.45, if this system has no solution, there exist ⃗u, ⃗v ∈ Rm, ⃗w ∈ Rn satisfying
⃗u, ⃗v, ⃗w ≥ ⃗0,  ⃗uT A − ⃗vT A + ⃗wT = ⃗0T,  ⃗uT⃗b − ⃗vT⃗b > 0.
Exercises
1. You are given that the following system has no solution.
x1 + x2 + 2x3 ≥ 1
−x1 + x2 + x3 ≥ 2
x1 − x2 + x3 ≥ 1
−x2 − 3x3 ≥ 0.
Obtain a certificate of infeasibility for the system.
Solutions
1. The system can be written as A⃗x ≥ ⃗b with
$$A = \begin{pmatrix} 1 & 1 & 2 \\ -1 & 1 & 1 \\ 1 & -1 & 1 \\ 0 & -1 & -3 \end{pmatrix} \quad\text{and}\quad \vec{b} = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 0 \end{pmatrix}.$$
So we need to find ⃗y ≥ 0 such that ⃗y T A = ⃗0 and ⃗y T⃗b > 0. As the system of
equations ⃗y T A = ⃗0 is homogeneous, we could without loss of generality fix
⃗y T⃗b = 1, thus leading to the system
⃗y T A = ⃗0
⃗y T⃗b = 1
⃗y ≥ ⃗0
that we could attempt to solve directly. However, it is possible to obtain a ⃗y
using the Fourier-Motzkin Elimination Method.
Let us first label the inequalities:
x1 + x2 + 2x3 ≥ 1 (1)
−x1 + x2 + x3 ≥ 2 (2)
x1 − x2 + x3 ≥ 1 (3)
−x2 − 3x3 ≥ 0. (4)
Eliminating x1 gives:
2x2 + 3x3 ≥ 3 (5)
2x3 ≥ 3 (6)
−x2 − 3x3 ≥ 0 (4)
Note that (5) is obtained from (1) + (2) and (6) is obtained from (2) + (3). Multiplying (5) by 1/2 gives
x2 + (3/2)x3 ≥ 3/2 (7)
Eliminating x2 gives:
2x3 ≥ 3 (6)
−(3/2)x3 ≥ 3/2 (8)
where (8) is obtained from (4) + (7).
Now (3/4) × (6) + (8) gives 0 ≥ 15/4, a contradiction.
To obtain a certificate of infeasibility, we trace back the computations. Note that (3/4)(6) + (8) is given by (3/4)((2) + (3)) + (4) + (7), which in turn is given by (3/4)((2) + (3)) + (4) + (1/2)(5), which in turn is given by (3/4)((2) + (3)) + (4) + (1/2)((1) + (2)). Thus, we can obtain 0 ≥ 15/4 from the nonnegative linear combination of the original inequalities as follows: (1/2)(1) + (5/4)(2) + (3/4)(3) + (4).
Therefore, ⃗y = (1/2, 5/4, 3/4, 1)^T is a certificate of infeasibility. (Check that ⃗yT A = ⃗0T and ⃗yT⃗b > 0.)
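Verifying a certificate is exactly the kind of third-party check described earlier, and it takes only a couple of lines; a minimal sketch assuming NumPy:

import numpy as np

A = np.array([[1, 1, 2],
              [-1, 1, 1],
              [1, -1, 1],
              [0, -1, -3]], dtype=float)
b = np.array([1, 2, 1, 0], dtype=float)
y = np.array([1/2, 5/4, 3/4, 1])   # the certificate found above

print(y @ A)   # [0. 0. 0.]  ->  y^T A = 0
print(y @ b)   # 3.75 = 15/4 > 0, so Ax >= b is infeasible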
min x + y
s.t. x + 2y ≥ 2 (2.43)
3x + 2y ≥ 6.
min z
s.t. z − x − y = 0
(2.44)
x + 2y ≥ 2
3x + 2y ≥ 6.
Note that the objective function is replaced with z, and z is set equal to the original objective function in the first constraint of (2.44), since z = x + y if and only if z − x − y = 0. Then, solving (2.44) is equivalent to finding among all the solutions to the following system a solution that minimizes z, if it exists.
z −x−y ≥ 0 (1)
−z + x + y ≥ 0 (2)
x + 2y ≥ 2 (3)
3x + 2y ≥ 6 (4)
Since we are interested in the minimum possible value for z, we use Fourier-Motzkin elimination to eliminate the variables x and y.
To eliminate x, we first multiply (4) by 1/3 to obtain:
z − x − y ≥ 0 (1)
−z + x + y ≥ 0 (2)
x + 2y ≥ 2 (3)
x + (2/3)y ≥ 2 (5)
Eliminating x gives
z + y ≥ 2 (6)
z − (1/3)y ≥ 2 (7)
where (6) is obtained from (1) + (3) and (7) from (1) + (5); the combination (1) + (2) is the trivial inequality 0 ≥ 0, so there is no need to keep it. To eliminate y, we first multiply (7) by 3 to obtain:
z + y ≥ 2 (6)
3z − y ≥ 6 (8)
Adding (6) and (8) gives
4z ≥ 8. (9)
Multiplying (9) by 1/4 gives z ≥ 2. Hence, the minimum possible value for z among all the solutions to the system is 2. So the optimal value of (2.44) is 2. To obtain an optimal solution, set z = 2. Then we have no choice but to set y = 0 and x = 2. One can check that (x, y) = (2, 0) is a feasible solution with objective function value 2.
We can obtain an independent proof that the optimal value is indeed 2 if we trace back the computations. Note that the inequality z ≥ 2 is given by
(1/4)(9) ⇐ (1/4)(6) + (1/4)(8)
⇐ (1/4)(1) + (1/4)(3) + (3/4)(7)
⇐ (1/4)(1) + (1/4)(3) + (3/4)(1) + (3/4)(5)
⇐ (1) + (1/4)(3) + (1/4)(4)
This shows that (1/4)(3) + (1/4)(4) gives the inequality x + y ≥ 2. Hence, no feasible solution to (2.43) can have objective function value less than 2. But we have found one feasible solution with objective function value 2. Hence, 2 is the optimal value.
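The eliminations above can be automated. The following Python sketch implements one step of Fourier-Motzkin elimination on inequalities stored as (coefficients, right-hand side) pairs and reproduces the bound z ≥ 2 for this example (the representation and function name are mine):

from itertools import product

def fm_eliminate(rows, j):
    """One Fourier-Motzkin step: rows are (a, beta) meaning a.x >= beta;
    variable j is eliminated by pairing rows with opposite signs on x_j."""
    pos = [r for r in rows if r[0][j] > 0]
    neg = [r for r in rows if r[0][j] < 0]
    out = [r for r in rows if r[0][j] == 0]
    for (a, s), (b, t) in product(pos, neg):
        lam, mu = -b[j], a[j]            # lam*a[j] + mu*b[j] = 0
        out.append(([lam * ai + mu * bi for ai, bi in zip(a, b)],
                    lam * s + mu * t))
    return out

rows = [([1, -1, -1], 0),   # (1)  z - x - y >= 0
        ([-1, 1, 1], 0),    # (2) -z + x + y >= 0
        ([0, 1, 2], 2),     # (3)  x + 2y >= 2
        ([0, 3, 2], 6)]     # (4)  3x + 2y >= 6
for var in (1, 2):          # eliminate x (index 1), then y (index 2)
    rows = fm_eliminate(rows, var)
print(rows)                 # includes ([4, 0, 0], 8), i.e. 4z >= 8, so z >= 2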
maximize 3x + 2y
subject to x + 3y ≤ 6
2x + y ≤ 4
x ≥ 0
y ≥ 0.
■
We can solve this maximization problem graphically as follows. We first sketch the set of (x, y) satisfying the constraints, called the feasible region, on the (x, y)-plane.
We then take the objective function 3x + 2y and turn it into an equation of a line 3x + 2y = z, where z is a parameter. Note that as the value of z increases, the line defined by the equation 3x + 2y = z moves in the direction of the normal vector (3, 2)^T. We call this direction the direction of improvement. Determining the maximum value of the objective function, called the optimal value, subject to the constraints amounts to finding the maximum value of z so that the line defined by the equation 3x + 2y = z still intersects the feasible region.
[Figure: the feasible region bounded by x ≥ 0, y ≥ 0, 2x + y ≤ 4 and x + 3y ≤ 6, with the direction of improvement and the level lines 3x + 2y = 0, 3x + 2y = 4 and 3x + 2y = 6.8; the last touches the region only at (1.2, 1.6).]
In the figure above, the lines with z at 0, 4 and 6.8 have been drawn. From the
picture, we can see that if z is greater than 6.8, the line defined by 3x + 2y = z will
not intersect the feasible region. Hence, the profit cannot exceed 6.8 dollars.
As the line 3x + 2y = 6.8 does intersect the feasible region, 6.8 is the maximum
value for the objective function. Note that there is only one point in the feasible
x 1.2
region that intersects the line 3x + 2y = 6.8, namely = . In other words, to
y 1.6
maximize profit, we want to make 1.2 units of lemonade and 1.6 units of lemon juice.
The above solution method can hardly be regarded as rigorous because we relied on a picture to conclude that 3x + 2y ≤ 6.8 for all (x, y) satisfying the constraints. But we can actually show this algebraically.
Note that multiplying both sides of the constraint x + 3y ≤ 6 by 0.2 gives 0.2x + 0.6y ≤ 1.2, and multiplying both sides of the constraint 2x + y ≤ 4 by 1.4 gives 2.8x + 1.4y ≤ 5.6. Adding these two inequalities gives 3x + 2y ≤ 6.8 for every (x, y) satisfying the constraints.
Exercises
1. Sketch all (x, y) satisfying
x − 2y ≤ 2
on the (x, y)-plane.
2. Determine the optimal value of
Minimize x + y
Subject to 2x + y ≥ 4
x + 3y ≥ 1.
3. Show that the problem
Minimize −x + y
Subject to 2x − y ≥ 0
x + 3y ≥ 3
is unbounded.
4. Suppose that you are shopping for dietary supplements to satisfy your required
daily intake of 0.40mg of nutrient M and 0.30mg of nutrient N . There are
three popular products on the market. The costs and the amounts of the two
nutrients are given in the following table:
You want to determine how much of each product you should buy so that the
daily intake requirements of the two nutrients are satisfied at minimum cost.
Formulate your problem as a linear programming problem, assuming that you
can buy a fractional number of each product.
Solutions
1. The points (x, y) satisfying x − 2y ≤ 2 are precisely those on or above the line passing through (2, 0) and (0, −1).
2. We want to determine the minimum value z so that x + y = z defines a line that has a nonempty intersection with the feasible region. However, we can avoid referring to a sketch by setting x = z − y and substituting for x in the inequalities to obtain:
2(z − y) + y ≥ 4
(z − y) + 3y ≥ 1,
or equivalently,
z ≥ 2 + (1/2)y
z ≥ 1 − 2y.
Thus, the minimum value for z is min_y max{2 + (1/2)y, 1 − 2y}, which occurs where the two bounds are equal, at y = −2/5. Hence, the optimal value is 9/5.
We can verify our work as follows. If our calculations above are correct, then an optimal solution is given by x = 11/5, y = −2/5, since x = z − y. It is easy to check that this satisfies both inequalities and therefore is a feasible solution.
Now, taking 2/5 times the first inequality and 1/5 times the second inequality, we can infer the inequality x + y ≥ 9/5. The left-hand side of this inequality is precisely the objective function. Hence, no feasible solution can have objective function value less than 9/5. But x = 11/5, y = −2/5 is a feasible solution with objective function value equal to 9/5. As a result, it must be an optimal solution.
Remark. We have not yet discussed how to obtain the multipliers 2/5 and 1/5 for inferring the inequality x + y ≥ 9/5. This is an issue that will be taken up later. In the meantime, think about how one could have obtained these multipliers for this particular exercise.
3. We could glean some insight by first making a sketch on the (x, y)-plane. The line defined by −x + y = z has x-intercept −z. Note that for z ≤ −3, (x, y) = (−z, 0) satisfies both inequalities, and the value of the objective function at (x, y) = (−z, 0) is z. Hence, there is no lower bound on the value of the objective function.
4. Let xi denote the amount of Product i to buy for i = 1, 2, 3. Then, the problem
can be formulated as
minimize 27x1 + 31x2 + 24x3
subject to 0.16x1 + 0.21x2 + 0.11x3 ≥ 0.30
0.19x1 + 0.13x2 + 0.15x3 ≥ 0.40
x1 , x2 , x3 ≥ 0.
Remark. If one cannot buy fractional amounts of the products, the problem
can be formulated as
minimize 27x1 + 31x2 + 24x3
subject to 0.16x1 + 0.21x2 + 0.11x3 ≥ 0.30
0.19x1 + 0.13x2 + 0.15x3 ≥ 0.40
x1 , x2 , x3 ≥ 0.
x1 , x2 , x3 ∈ Z.
[Figure: a sketch of the region of the (x1, x2)-plane determined by a system of linear inequalities, among them 2x1 + x2 ≥ 3; drawn following https://tex.stackexchange.com/questions/75933/how-to-draw-the-region-of-inequality.]
Work Scheduling Problem: You are the manager of LP Burger. The following table shows the minimum number of employees required to staff the restaurant on each day of the week. Each employee must work five consecutive days. Formulate an LP to find the minimum number of employees required to staff the restaurant.
Decision variables:
xi : the number of workers that start 5 consecutive days of work on day i, i = 1, · · · , 7
Min z = x1 + x2 + x3 + x4 + x5 + x6 + x7
s.t. x1 + x4 + x5 + x6 + x7 ≥ 6
x2 + x5 + x6 + x7 + x1 ≥ 4
x3 + x6 + x7 + x1 + x2 ≥ 5
x4 + x7 + x1 + x2 + x3 ≥ 4
x5 + x1 + x2 + x3 + x4 ≥ 3
x6 + x2 + x3 + x4 + x5 ≥ 7
x7 + x3 + x4 + x5 + x6 ≥ 7
x1 , x2 , x3 , x4 , x5 , x6 , x7 ≥ 0.
The solution is as follows:
LP Solution IP Solution
zLP = 7.333 zI = 8.0
x1 = 0 x1 = 0
x2 = 0.333 x2 = 0
x3 = 1 x3 = 0
x4 = 2.333 x4 = 3
x5 = 0 x5 = 0
x6 = 3.333 x6 = 4
x7 = 0.333 x7 = 1
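For reference, the LP relaxation can be reproduced with SciPy. The sketch below rebuilds the seven covering constraints from the day-requirements implied by the right-hand sides above (6, 4, 5, 4, 3, 7, 7):

import numpy as np
from scipy.optimize import linprog

demand = [6, 4, 5, 4, 3, 7, 7]          # staff required on days 1..7
n = 7
A = np.zeros((n, n))
for day in range(n):
    for start in range(n):
        if (day - start) % n < 5:       # a start on day `start` covers `day`
            A[day, start] = 1
res = linprog(c=np.ones(n), A_ub=-A, b_ub=-np.array(demand))
print(res.fun, res.x)                    # LP value 7.333..., fractional x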
LP Burger has changed its policy and now allows at most two part-time workers, who work for two consecutive days in a week. Formulate this problem.
Decision variables:
xi : the number of workers that start 5 consecutive days of work on day i, i = 1, · · · , 7
yi : the number of workers that start 2 consecutive days of work on day i, i = 1, · · · , 7.
Min z = 5(x1 + x2 + x3 + x4 + x5 + x6 + x7 )
+ 2(y1 + y2 + y3 + y4 + y5 + y6 + y7 )
s.t. x1 + x4 + x5 + x6 + x7 + y1 + y7 ≥ 6
x2 + x5 + x6 + x7 + x1 + y2 + y1 ≥ 4
x3 + x6 + x7 + x1 + x2 + y3 + y2 ≥ 5
x4 + x7 + x1 + x2 + x3 + y4 + y3 ≥ 4
x5 + x1 + x2 + x3 + x4 + y5 + y4 ≥ 3
x6 + x2 + x3 + x4 + x5 + y6 + y5 ≥ 7
x7 + x3 + x4 + x5 + x6 + y7 + y6 ≥ 7
y1 + y2 + y3 + y4 + y5 + y6 + y7 ≤ 2
xi ≥ 0, yi ≥ 0, ∀i = 1, · · · , 7.
The Diet Problem: In the future (as envisioned in a bad 70's science fiction film) all food is in tablet form, and there are four types: green, blue, yellow, and red. A balanced, futuristic diet requires at least 20 units of Iron, 25 units of Vitamin B, 30 units of Vitamin C, and 15 units of Vitamin D. Formulate an LP that ensures a balanced diet at the minimum possible cost.
The Next Diet Problem: Progress is important, and our last problem had too many tablets, so we are going to produce a single, purple, 10 gram tablet for our futuristic diet requirements, which are at least 20 units of Iron, 25 units of Vitamin B, 30 units of Vitamin C, 15 units of Vitamin D, and 2000 calories. The tablet is made from blending 4 nutritious chemicals; the following table shows the units of each nutrient per gram, and the cost per gram, of each chemical. Formulate an LP that ensures a balanced diet at the minimum possible cost.
Decision variables:
xi : grams of chemical i to include in the purple tablet, ∀i = 1, 2, 3, 4.
The assignment problem has an integrality property, such that if we remove the
binary restriction on the x variables (now just non-negative, i.e., xij ≥ 0) then we
still get binary assignments, despite the fact that it is now an LP. This property is
very interesting and useful. Of course, the objective function might not be quite what we want; we might be interested in ensuring that the team with the worst assignment is as good as possible (a fairness criterion). One way of doing this is to modify the assignment problem using a max-min objective:
$$\begin{aligned}
\text{Max } & z \\
\text{s.t. } & \sum_{i=1}^{n} x_{ij} = 1, \quad \forall j = 1, \ldots, n \\
& \sum_{j=1}^{n} x_{ij} = 1, \quad \forall i = 1, \ldots, n \\
& x_{ij} \ge 0, \quad \forall i = 1, \ldots, n, \; j = 1, \ldots, n \\
& z \le \sum_{i=1}^{n} R_{ij} x_{ij}, \quad \forall j = 1, \ldots, n.
\end{aligned}$$
Does this formulation have the integrality property (it is not an assignment problem)?
Consider a very simple example where two teams are to be assigned to two projects, and the teams give the projects the following rankings (higher values indicate stronger preference):

          Project 1   Project 2
Team 1        2           1
Team 2        2           1

Both teams prefer Project 1.
2. For both problems, if we remove the binary restriction on the x-variable, they
can take values between (and including) zero and one. For the assignment problem
the optimal solution will have z = 3, and fractional x-values will not improve z. For
the max-min assignment problem this is not the case, the optimal solution will have
z = 1.5, which occurs when each team is assigned half of each project (i.e., for Team
1 we have x11 = 0.5 and x21 = 0.5).
Linear Data Models: Consider a data set that consists of n data points (xi , yi ).
We want to fit the best line to this data, such that given an x-value, we can predict
the associated y-value. Thus, the form is yi = αxi + β and we want to choose the α
and β values such that we minimize the error for our n data points.
Variables:
ei : error for data point i, i = 1, · · · , n.
α : slope of fitted line.
β : intercept of fitted line.
$$\text{Min } \sum_{i=1}^{n} |e_i|$$
$$\text{s.t. } \alpha x_i + \beta - y_i = e_i, \quad i = 1, \ldots, n$$
$$e_i, \alpha, \beta \text{ urs.}$$
Of course, absolute values are not linear functions, so we linearize as follows:
Decision variables:
e_i^+ : positive error for data point i, i = 1, · · · , n.
e_i^− : negative error for data point i, i = 1, · · · , n.
α : slope of fitted line.
β : intercept of fitted line.
$$\text{Min } \sum_{i=1}^{n} (e_i^+ + e_i^-)$$
$$\text{s.t. } \alpha x_i + \beta - y_i = e_i^+ - e_i^-, \quad i = 1, \ldots, n$$
$$e_i^+, e_i^- \ge 0, \quad \alpha, \beta \text{ urs.}$$
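This linearization drops straight into an LP solver. A minimal SciPy sketch with hypothetical data points (the variable ordering is mine):

import numpy as np
from scipy.optimize import linprog

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # hypothetical data
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
n = len(x)

# variables: [alpha, beta, e+_1..e+_n, e-_1..e-_n]
c = np.concatenate([[0.0, 0.0], np.ones(2 * n)])
# alpha*x_i + beta - e+_i + e-_i = y_i encodes alpha*x_i + beta - y_i = e+_i - e-_i
A_eq = np.hstack([x[:, None], np.ones((n, 1)), -np.eye(n), np.eye(n)])
bounds = [(None, None), (None, None)] + [(0, None)] * (2 * n)
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds)
print(res.x[0], res.x[1])                  # fitted slope alpha, intercept beta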
We can have a similar game, but with a different payoff matrix, as follows:
A
R P S
R 4 -1 -1
B P -2 4 -2
S -3 -3 4
What is the optimal strategy for A (for either game)? We define xj as the probability that A takes action j (related to the columns). Then the payoff for A, if B takes action i, is $\sum_{j=1}^{m} c_{ij} x_j$. Of course, A does not know what action B will take, so let's find a strategy that maximizes the minimum expected winnings of A given any random strategy of B, which we can formulate as follows:
$$\text{Max } \min_{i=1,\ldots,n} \sum_{j=1}^{m} c_{ij} x_j \qquad \text{s.t. } \sum_{j=1}^{m} x_j = 1, \quad x_j \ge 0, \; j = 1, \ldots, m,$$
which can be linearized with an auxiliary variable z:
$$\begin{aligned}
\text{Max } & z \\
\text{s.t. } & z \le \sum_{j=1}^{m} c_{ij} x_j, \quad i = 1, \ldots, n \\
& \sum_{j=1}^{m} x_j = 1 \\
& x_j \ge 0, \quad j = 1, \ldots, m.
\end{aligned}$$
The last two constraints ensure that the xj-variables are valid probabilities. If you solve this LP for the first game (i.e., payoff matrix), you find the best strategy is x1 = 1/3, x2 = 1/3, and x3 = 1/3, and there is no expected gain for player A. For the second game, the best strategy is x1 = 23/107, x2 = 37/107, and x3 = 47/107, with A gaining, on average, 8/107 per round.
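Both games can be solved with the same few lines. Here is a SciPy sketch for the second payoff matrix (maximizing z by minimizing −z):

import numpy as np
from scipy.optimize import linprog

# payoff matrix of the second game (rows: B's action, columns: A's action)
C = np.array([[4, -1, -1],
              [-2, 4, -2],
              [-3, -3, 4]], dtype=float)
n, m = C.shape

# variables: [x_1, x_2, x_3, z]; maximize z -> minimize -z
c = np.array([0.0, 0.0, 0.0, -1.0])
A_ub = np.hstack([-C, np.ones((n, 1))])   # z - sum_j c_ij x_j <= 0
b_ub = np.zeros(n)
A_eq = np.array([[1.0, 1.0, 1.0, 0.0]])   # probabilities sum to 1
b_eq = np.array([1.0])
bounds = [(0, None)] * m + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x)        # expect x = (23, 37, 47)/107 with z = 8/107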
Decision Variables: The decision variables are the variables being optimized in the problem. They are typically denoted as x1, x2, . . . , xn. Constraints: The constraints are the restrictions imposed on the problem. They are typically denoted as Ax ≤ b, where A is the coefficient matrix and b is the right-hand side vector. Non-Negativity Constraints: The non-negativity constraints restrict the values of the decision variables to be non-negative.
$$\text{Min } \sum_{j=1}^{n} c_j x_j \quad \text{(objective function)}$$
subject to:
$$\sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1 \ldots m \quad \text{(constraints)}$$
$$x_j \ge 0, \quad j = 1 \ldots n \quad \text{(non-negativity constraints)}$$
min cT x subject to
Ax = b
x ≥ 0
where
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^{n \times 1}, \quad b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \in \mathbb{R}^{m \times 1}, \quad c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix} \in \mathbb{R}^{n \times 1}, \quad A = \begin{pmatrix} a_{11} & & \\ & \ddots & \\ & & a_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n}.$$
Basic Terminology
Definition 2.34 If x satisfies Ax = b, x ≥ 0, then x is feasible.
Definition 2.35 A linear program (LP) is feasible if there exists a feasible solution,
otherwise it is said to be infeasible.
Equivalent Forms
A linear program can take on several forms. We might be maximizing instead of
minimizing. We might have a combination of equality and inequality constraints.
Some variables may be restricted to be non-positive instead of non-negative, or be
unrestricted in sign. Two forms are said to be equivalent if they have the same set of
optimal solutions or are both infeasible or both unbounded.
1. A maximization problem can be expressed as a minimization problem: max cT x ⇔ min(−cT x).
2. An equality constraint can be split into two inequalities: aTi x = bi ⇔ aTi x ≤ bi and aTi x ≥ bi.
3. A "≥" inequality can be flipped: aTi x ≥ bi ⇔ −aTi x ≤ −bi.
4. An inequality can be turned into an equality with a slack variable: aTi x ≤ bi ⇔ aTi x + si = bi, si ≥ 0.
The optimal solution is (4, 2) of cost 2 (see Figure 2.9). If we were maximizing x2
instead of minimizing under the same feasible region, the resulting linear program
would be unbounded since x2 can increase arbitrarily. From this picture, the reader
should be convinced that, for any objective function for which the linear program is
bounded, there exists an optimal solution which is a “corner” of the feasible region.
We shall formalize this notion in the next section.
An example of an infeasible linear program can be obtained by reversing some of
the inequalities of the above LP:
x1 ≤ 2
3x1 − x2 ≥ 0
x1 + x2 ≥ 6
−x1 + 2x2 ≤ 0.
Theorem 2.46 Assume min{cT x : x ∈ P} is finite. Then for all x ∈ P there exists a vertex x′ such that cT x′ ≤ cT x.
Proof. If x is a vertex, then take x′ = x.
If x is not a vertex, then, by definition, there exists y ≠ 0 such that x + y, x − y ∈ P. Since A(x + y) = b and A(x − y) = b, Ay = 0.
Proof. Suppose not. Take an optimal solution. By Theorem 2.46 there exists a
vertex costing no more and this vertex must be optimal as well. ■
■ Example 2.23 Niki holds two part-time jobs, Job I and Job II. She never wants
to work more than a total of 12 hours a week. She has determined that for every
hour she works at Job I, she needs 2 hours of preparation time, and for every hour
she works at Job II, she needs one hour of preparation time, and she cannot spend
more than 16 hours on preparation.
If Niki makes $40 an hour at Job I and $30 an hour at Job II, how many hours
should she work per week at each job to maximize her income? ■
Let x be the number of hours per week Niki works at Job I and y the number of
hours per week she works at Job II. Her weekly income is then
I = 40x + 30y
Our next task is to find the constraints. The second sentence in the problem
states, “She never wants to work more than a total of 12 hours a week.” This
translates into the following constraint:
x + y ≤ 12
The third sentence states, “For every hour she works at Job I, she needs 2 hours
of preparation time, and for every hour she works at Job II, she needs one hour of
preparation time, and she cannot spend more than 16 hours for preparation.” The
translation follows:
2x + y ≤ 16
The fact that x and y can never be negative is represented by the following two
constraints:
x ≥ 0 and y ≥ 0
The line for a constraint will divide the plane into two regions, one of which
satisfies the inequality part of the constraint. A test point is used to determine which
portion of the plane to shade to satisfy the inequality. Any point on the plane that
is not on the line can be used as a test point.
• If the test point satisfies the inequality, then the region of the plane that
satisfies the inequality is the region that contains the test point.
• If the test point does not satisfy the inequality, then the region that satisfies
the inequality lies on the opposite side of the line from the test point.
In the graph below, after the lines representing the constraints were graphed
using an appropriate method from Chapter 1, the point (0, 0) was used as a test
point to determine that
• (0, 0) satisfies the constraint x + y ≤ 12 because 0 + 0 ≤ 12.
• (0, 0) satisfies the constraint 2x + y ≤ 16 because 2(0) + 0 ≤ 16.
Therefore, in this example, we shade the region that is below and to the left of
both constraint lines, but also above the x-axis and to the right of the y-axis, in
order to further satisfy the constraints x ≥ 0 and y ≥ 0.
The shaded region where all conditions are satisfied is called the feasibility region
or the feasibility polygon.
The Fundamental Theorem of Linear Programming states that the maximum (or
minimum) value of the objective function always takes place at the vertices of the
feasibility region.
Therefore, we will identify all the vertices (corner points) of the feasibility region.
We call these points critical points. They are listed as (0, 0), (0, 12), (4, 8), (8, 0). To
maximize Niki's income, we substitute these points into the objective function to
see which point gives us the highest income per week. We list the results below.
Critical Point Income
(0, 0) 40(0) + 30(0) = $0
(0, 12) 40(0) + 30(12) = $360
(4, 8) 40(4) + 30(8) = $400
(8, 0) 40(8) + 30(0) = $320
The point (4, 8) gives the highest income: Niki should work 4 hours at Job I and
8 hours at Job II, for an income of $400 per week.
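As a sanity check, the short script below (a sketch; the constraint data is taken from the example) enumerates the intersection points of the constraint lines, keeps the feasible ones, and evaluates the objective at each, mirroring the graphical method.

import itertools
import numpy as np

# Constraints of Example 2.23 written as rows of "a.x <= b":
# x + y <= 12, 2x + y <= 16, -x <= 0, -y <= 0.
A = np.array([[1, 1], [2, 1], [-1, 0], [0, -1]], dtype=float)
b = np.array([12, 16, 0, 0], dtype=float)

best = None
for i, j in itertools.combinations(range(len(A)), 2):
    M = A[[i, j]]
    if abs(np.linalg.det(M)) < 1e-9:
        continue                       # parallel lines: no intersection
    v = np.linalg.solve(M, b[[i, j]])  # candidate corner point
    if np.all(A @ v <= b + 1e-9):      # keep only feasible corners
        income = 40 * v[0] + 30 * v[1]
        print(v, income)
        if best is None or income > best[1]:
            best = (v, income)

print("optimum:", best)                # (4, 8) with income 400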
■ Example 2.24 A factory manufactures two types of gadgets, regular and premium.
Each gadget requires the use of two operations, assembly and finishing, and there
are at most 12 hours available for each operation. A regular gadget requires 1 hour
of assembly and 2 hours of finishing, while a premium gadget needs 2 hours of
assembly and 1 hour of finishing. Due to other restrictions, the company can make
at most 7 gadgets a day. If a profit of $20 is realized for each regular gadget and $30 for
a premium gadget, how many of each should be manufactured to maximize profit?
■
Let x be the number of regular gadgets and y the number of premium gadgets
manufactured each day. The profit is
P = 20x + 30y
We now write the constraints. The fourth sentence states that the company can
make at most 7 gadgets a day. This translates as
x+y ≤ 7
Since the regular gadget requires one hour of assembly and the premium gadget
requires two hours of assembly, and there are at most 12 hours available for this
operation, we get
x + 2y ≤ 12
Similarly, the regular gadget requires two hours of finishing and the premium
gadget one hour. Again, there are at most 12 hours available for finishing. This gives
us the following constraint:
2x + y ≤ 12
The fact that x and y can never be negative is represented by the following two
constraints:
x ≥ 0 and y ≥ 0
In order to solve the problem, we next graph the constraints and feasibility region.
Again, we have shaded the feasibility region, where all constraints are satisfied.
Since the extreme value of the objective function always takes place at the vertices
of the feasibility region, we identify all the critical points. They are listed as (0, 0),
(0, 6), (2, 5), (5, 2), and (6, 0). To maximize profit, we will substitute these points
in the objective function to see which point gives us the maximum profit each day.
The results are listed below.
Critical Point Profit
(0, 0) 20(0) + 30(0) = $0
(0, 6) 20(0) + 30(6) = $180
(2, 5) 20(2) + 30(5) = $190
(5, 2) 20(5) + 30(2) = $160
(6, 0) 20(6) + 30(0) = $120
The point (2, 5) gives the most profit, and that profit is $190.
maximize 3x + 2y
subject to x + 3y ≤ 6
2x + y ≤ 4
x ≥ 0
y ≥ 0.
Graphically, maximizing 3x + 2y subject to these constraints amounts to finding
the maximum value of z so that the line defined by the equation 3x + 2y = z still
intersects the feasible region.
[Figure: the feasible region determined by x ≥ 0, y ≥ 0, x + 3y ≤ 6, and 2x + y ≤ 4,
together with level lines 3x + 2y = z for several values of z (including the optimal
z = 6.8), the direction of improvement, and the optimal point (1.2, 1.6).]
In the figure above, the lines with z at 0, 4 and 6.8 have been drawn. From the
picture, we can see that if z is greater than 6.8, the line defined by 3x + 2y = z will
not intersect the feasible region. Hence, the profit cannot exceed 6.8 dollars.
As the line 3x + 2y = 6.8 does intersect the feasible region, 6.8 is the maximum
value for the objective function. Note that there is only one point in the feasible
region that intersects the line 3x + 2y = 6.8, namely (x, y) = (1.2, 1.6). In other words,
to maximize profit, we want to make 1.2 units of lemonade and 1.6 units of lemon
juice.
The above solution method can hardly be regarded as rigorous because we relied
on a picture to conclude that 3x + 2y ≤ 6.8 for all (x, y) satisfying the constraints.
But we can actually show this algebraically.
Note that multiplying both sides of the constraint x + 3y ≤ 6 by 0.2 gives
0.2x + 0.6y ≤ 1.2, and multiplying both sides of the constraint 2x + y ≤ 4 by 1.4
gives 2.8x + 1.4y ≤ 5.6. Hence, any (x, y) that satisfies both x + 3y ≤ 6 and
2x + y ≤ 4 must also satisfy
(0.2x + 0.6y) + (2.8x + 1.4y) ≤ 1.2 + 5.6, which simplifies to 3x + 2y ≤ 6.8 as desired!
(Here, we used the fact that if a ≤ b and c ≤ d, then a + c ≤ b + d.)
Now, one might ask if it is always possible to find an algebraic proof like the
one above for similar problems. If the answer is yes, how does one find such a
proof? We will see answers to these questions later on.
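In fact, the multipliers 0.2 and 1.4 are an optimal solution of the dual linear program, so they can be found mechanically. The sketch below (scipy.optimize.linprog, with the data of the lemonade example) solves the dual min 6u + 4v subject to u + 2v ≥ 3, 3u + v ≥ 2, u, v ≥ 0; its optimal value matches the primal optimum 6.8.

from scipy.optimize import linprog

# Dual of  max{3x + 2y : x + 3y <= 6, 2x + y <= 4, x, y >= 0}:
# one multiplier per primal constraint, one constraint per primal variable.
res = linprog(c=[6, 4],                       # minimize 6u + 4v
              A_ub=[[-1, -2], [-3, -1]],      # u + 2v >= 3, 3u + v >= 2
              b_ub=[-3, -2],
              bounds=[(0, None), (0, None)])  # u, v >= 0
print(res.x, res.fun)                         # [0.2, 1.4] and 6.8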
Before we end this segment, let us consider the following problem:
minimize −2x + y
subject to −x + y ≤ 3
x − 2y ≤ 2
x ≥ 0
y ≥ 0.
Note that for any t ≥ 0, (x, y) = (t, t) satisfies all the constraints. The value of
the objective function at (x, y) = (t, t) is −t. As t → ∞, the value of the objective
function tends to −∞. Therefore, there is no minimum value for the objective
function. The problem is said to be unbounded. Later on, we will see how to
detect unboundedness algorithmically. ■
■ Example 2.26
1. Sketch all (x, y) satisfying
x − 2y ≤ 2.
2. Determine the optimal value of
Minimize x + y
Subject to 2x + y ≥ 4
           x + 3y ≥ 1.
3. Show that the linear program
Minimize −x + y
Subject to 2x − y ≥ 0
           x + 3y ≥ 3
is unbounded.
4. Suppose that you are shopping for dietary supplements to satisfy your required
daily intake of 0.40 mg of nutrient M and 0.30 mg of nutrient N. There are
three popular products on the market. The costs and the amounts of the
two nutrients are given in the following table:

                 Product 1  Product 2  Product 3
Cost                 27         31         24
Nutrient M (mg)      0.19       0.13       0.15
Nutrient N (mg)      0.16       0.21       0.11

You want to determine how much of each product you should buy so that
the daily intake requirements of the two nutrients are satisfied at minimum
cost. Formulate your problem as a linear programming problem, assuming
that you can buy a fractional number of each product.
■
Solution:
1. The points (x, y) satisfying x − 2y ≤ 2 are precisely those above the line passing
through (2, 0) and (0, −1).
2. We want to determine the minimum value z so that x + y = z defines a line
that has a nonempty intersection with the feasible region. However, we can
avoid referring to a sketch by setting x = z − y and substituting for x in the
inequalities to obtain:
2(z − y) + y ≥ 4
(z − y) + 3y ≥ 1,
or equivalently,
z ≥ 2 + (1/2)y
z ≥ 1 − 2y.
For each y, z must be at least max{2 + (1/2)y, 1 − 2y}; minimizing this maximum
over all y, the optimum occurs where the two bounds coincide, at y = −2/5.
Hence, the optimal value is 9/5.
We can verify our work by doing the following. If our calculations above are
correct, then an optimal solution is given by x = 11/5, y = −2/5, since x = z − y. It
is easy to check that this satisfies both inequalities and therefore is a feasible
solution.
Now, taking 2/5 times the first inequality and 1/5 times the second inequality,
we can infer the inequality x + y ≥ 9/5. The left-hand side of this inequality is
precisely the objective function. Hence, no feasible solution can have objective
function value less than 9/5. But x = 11/5, y = −2/5 is a feasible solution with
objective function value equal to 9/5. As a result, it must be an optimal solution.
Remark. We have not yet discussed how to obtain the multipliers 2/5 and 1/5 for
inferring the inequality x + y ≥ 9/5. This is an issue that will be taken up later.
In the meantime, think about how one could have obtained these multipliers
for this particular exercise.
3. We could glean some insight by first making a sketch on the (x, y)-plane.
The line defined by −x + y = z has x-intercept −z. Note that for z ≤ −3, the point
(x, y) = (−z, 0) satisfies both inequalities, and the value of the objective function
at this point is z. As z → −∞, the objective function value tends to −∞ along
such points, so the problem is unbounded.
4. Let x_i denote the amount of Product i purchased, for i = 1, 2, 3. The problem
can be formulated as
minimize 27x_1 + 31x_2 + 24x_3
subject to 0.16x_1 + 0.21x_2 + 0.11x_3 ≥ 0.30
           0.19x_1 + 0.13x_2 + 0.15x_3 ≥ 0.40
           x_1, x_2, x_3 ≥ 0.
Remark. If one cannot buy fractional amounts of the products, the problem
can be formulated as the same program with the added requirement
x_1, x_2, x_3 ∈ Z.
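A sketch of how the fractional version of this diet LP can be solved numerically with scipy.optimize.linprog (linprog expects ≤ constraints, so the ≥ rows are negated):

from scipy.optimize import linprog

# minimize 27 x1 + 31 x2 + 24 x3 subject to the two nutrient requirements.
res = linprog(c=[27, 31, 24],
              A_ub=[[-0.16, -0.21, -0.11],   # nutrient N: >= 0.30
                    [-0.19, -0.13, -0.15]],  # nutrient M: >= 0.40
              b_ub=[-0.30, -0.40],
              bounds=[(0, None)] * 3)        # x >= 0
print(res.x, res.fun)                        # optimal purchases and cost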
2.103 Bases
Let x be a vertex of P = {x : Ax = b, x ≥ 0}. Suppose first that |{j : x_j > 0}| = m
(where A is m × n). In this case we denote B = {j : x_j > 0}. Also let A_B denote the
submatrix of A consisting of the columns indexed by B; we use this subscript
notation not only for A and B, but also for x and for other sets of indices.
Then A_B is a square matrix whose columns are linearly independent (by Theorem
2.47), so it is non-singular. Therefore we can express x as x_j = 0 if j ∉ B, and since
A_B x_B = b, it follows that x_B = A_B^{-1} b. The variables corresponding to B will be called
basic. The others will be referred to as nonbasic. The set of indices corresponding to
nonbasic variables is denoted by N = {1, . . . , n} − B. Thus, we can write the above
as x_B = A_B^{-1} b and x_N = 0.
Without loss of generality we will assume that A has full row rank, rank(A) = m.
Otherwise either there is a redundant constraint in the system Ax = b (and we can
remove it), or the system has no solution at all.
If |{j : x_j > 0}| < m, we can augment A_B with additional linearly independent
columns, until it is an m × m sub-matrix of A of full rank, which we will denote A_B.
In other words, although there may be fewer than m positive components in x, it is
convenient to always have a basis B such that |B| = m and A_B is non-singular. This
enables us to always express x as we did before, x_N = 0, x_B = A_B^{-1} b.
Summary: x is a vertex of P iff there is B ⊆ {1, . . . , n} such that |B| = m and
1. x_N = 0 for N = {1, . . . , n} − B
2. A_B is non-singular
3. x_B = A_B^{-1} b ≥ 0
In this case we say that x is a basic feasible solution. Note that a vertex can
have several basic feasible solutions corresponding to it (obtained by augmenting
{j : x_j > 0} in different ways). Conversely, a basis might not lead to any basic
feasible solution, since A_B^{-1} b is not necessarily non-negative.
■ Example 2.27
x_1 + x_2 + x_3 = 5
2x_1 − x_2 + 2x_3 = 1
x_1, x_2, x_3 ≥ 0
We can select as a basis B = {1, 2}. Thus, N = {3} and
A_B = [ 1  1 ; 2  −1 ]
A_B^{-1} = (1/3) [ 1  1 ; 2  −1 ]
A_B^{-1} b = (2, 3)^T
x = (2, 3, 0)
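A minimal numerical check of this computation (numpy, data from Example 2.27):

import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [2.0, -1.0, 2.0]])
b = np.array([5.0, 1.0])

B = [0, 1]                           # the basis {1, 2}, 0-indexed
x = np.zeros(3)
x[B] = np.linalg.solve(A[:, B], b)   # x_B = A_B^{-1} b, x_N = 0
print(x)                             # [2. 3. 0.] -- a basic feasible solution
assert np.allclose(A @ x, b) and np.all(x >= 0)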
R A crude upper bound on the number of vertices of P is the binomial coefficient
C(n, m). This number is exponential (it is upper bounded by n^m). We can come
up with a tighter bound of C(n − m/2, m/2), though this is still exponential. The
reason why the number of vertices is much smaller is that most basic solutions
to the system Ax = b (which we counted) are not feasible, that is, they do not
satisfy x ≥ 0.
2.104 The Simplex Method
Consider a linear program in standard form, written in terms of a basis B:
min c_B x_B + c_N x_N
s.t. A_B x_B + A_N x_N = b
     x_B, x_N ≥ 0
Here B is the basis corresponding to the basic feasible solution (bfs) we are starting
from. Note that, for any solution x, x_B = A_B^{-1} b − A_B^{-1} A_N x_N, and that its total
cost c^T x can be expressed as follows:
c^T x = c_B x_B + c_N x_N
      = c_B (A_B^{-1} b − A_B^{-1} A_N x_N) + c_N x_N
      = c_B A_B^{-1} b + (c_N − c_B A_B^{-1} A_N) x_N
We denote the reduced cost of the non-basic variables by c̃_N, where c̃_N = c_N − c_B A_B^{-1} A_N,
i.e. the quantity which is the coefficient of x_N above. If there is a j ∈ N such that
c̃_j < 0, then by increasing x_j (up from zero) we will decrease the cost (the value of
the objective function). Of course x_B depends on x_N, and we can increase x_j only
as long as all the components of x_B remain non-negative.
So in a step of the Simplex method, we find a j ∈ N such that c̃_j < 0, and increase
it as much as possible while keeping x_B ≥ 0. We can no longer increase x_j once one
of the components of x_B reaches zero. At that point a previously non-basic variable
is positive, so we include it in the basis, and one variable which was basic is now
zero, so we remove it from the basis.
If, on the other hand, there is no j ∈ N such that c̃j < 0, then we stop, and the
current basic feasible solution is an optimal solution. This follows from the new
expression for cT x since xN is nonnegative.
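The following is a minimal dense-matrix sketch of the iteration loop just described; it is not a production implementation, it assumes a starting feasible basis is given, and it uses smallest-index entering and smallest-ratio leaving rules.

import numpy as np

def simplex(A, b, c, basis, tol=1e-9):
    """Primal simplex for min c.x s.t. Ax = b, x >= 0, from a feasible basis."""
    m, n = A.shape
    basis = list(basis)
    while True:
        AB = A[:, basis]
        xB = np.linalg.solve(AB, b)
        y = np.linalg.solve(AB.T, c[basis])       # simplex multipliers
        cbar = c - A.T @ y                        # reduced costs (zero on basis)
        entering = next((j for j in range(n)
                         if j not in basis and cbar[j] < -tol), None)
        if entering is None:                      # no negative reduced cost
            x = np.zeros(n)
            x[basis] = xB
            return x                              # optimal bfs
        d = np.linalg.solve(AB, A[:, entering])   # change in x_B per unit of x_j
        ratios = [(xB[i] / d[i], i) for i in range(m) if d[i] > tol]
        if not ratios:
            raise ValueError("LP is unbounded")
        _, leaving = min(ratios)                  # ratio test
        basis[leaving] = entering

# Data of Example 2.27 with a sample cost vector:
A = np.array([[1.0, 1.0, 1.0], [2.0, -1.0, 2.0]])
b = np.array([5.0, 1.0])
c = np.array([1.0, 1.0, 1.0])
print(simplex(A, b, c, basis=[0, 1]))             # stays at (2, 3, 0)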
Remarks:
1. Note that some of the basic variables may be zero to begin with, and in this
case it is possible that we cannot increase xj at all. In this case we can replace
say j by k in the basis, but without moving from the vertex corresponding to
the basis. In the next step we might replace k by j, and be stuck in a loop.
Thus, we need to specify a “pivoting rule” to determine which index should
enter the basis, and which index should be removed from the basis.
2. While many pivoting rules (including those that are used in practice) can lead
to infinite loops, there is a pivoting rule which will not: the minimal index rule,
discovered by Bland in 1977, which always chooses the minimal j and k possible
[Bland, 1977]. There are other methods of “breaking ties” which eliminate
infinite loops.
3. There is no known pivoting rule for which the number of pivots in the worst
case is better than exponential.
4. The question of the complexity of the Simplex algorithm and the last remark
lead to the question of what is the length of the shortest path between two
vertices of a convex polyhedron, where the path is along edges, and the length
of the path is measured in terms of the number of vertices visited.
Hirsch Conjecture: For m hyperplanes in d dimensions the length of the
shortest path between any two vertices of the arrangement is at most m − d.
This is a very open question — there is not even a polynomial bound proven
on this length.
On the other hand, one should note that even if the Hirsch Conjecture is true,
it doesn’t say much about the Simplex Algorithm, because Simplex generates
paths which are monotone with respect to the objective function, whereas the
shortest path need not be monotone.
Recently, Kalai (and others) has considered a randomized pivoting rule. The
idea is to randomly permute the index columns of A and to apply the Simplex
method, always choosing the smallest j possible. In this way, it is possible to
show a subexponential bound on the expected number of pivots. This leads to
a subexponential bound for the diameter of any convex polytope defined by m
hyperplanes in a d dimension space.
The question of the existence of a polynomial pivoting scheme is still open
though. We will see later a completely different algorithm which is polynomial.
2.105 When is a Linear Program Feasible?
Consider a system of equations Ax = b. If there is a vector y with y^T A = 0 but
y^T b ≠ 0, then any solution x would give 0 = (y^T A)x = y^T (Ax) = y^T b ≠ 0,
which means of course that the system of equations has no feasible solution.
In fact, an elementary theorem of linear algebra says that if a system has no
solution, there is always such a vector y which proves that the system has no
solution.
Theorem 2.48 Exactly one of the following is true for the system Ax = b:
1. There is x such that Ax = b.
2. There is y such that AT y = 0 but y T b = 1.
This is not quite enough for our purposes, because a system can be feasible,
but still have no non-negative solutions x ≥ 0. Fortunately, the following lemma
establishes the equivalent results for our system Ax = b, x ≥ 0.
Theorem 2.49 — Farkas’ Lemma. Exactly one of the following is true for the system
Ax = b, x ≥ 0:
1. There is x such that Ax = b, x ≥ 0.
2. There is y such that AT y ≥ 0 but bT y < 0.
Proof. We will first show that the two conditions cannot hold together, and then
that at least one of them must hold.
Suppose both hold: Ax = b with x ≥ 0, and A^T y ≥ 0 with b^T y < 0. Then Ax = b
implies y^T Ax = y^T b, i.e. x^T (A^T y) = b^T y. But the left-hand side is non-negative
(x ≥ 0 and A^T y ≥ 0), while the right-hand side is negative, a contradiction.
The other direction is less trivial, and usually shown using properties of the
Simplex algorithm, mainly duality. We will use another tool, and later use Farkas’
Lemma to prove properties about duality in linear programming. The tool we shall
use is the Projection theorem, which we state without proof:
Theorem 2.50 — Projection Theorem. Let K be a closed, convex, non-empty set in
R^n (see Figure 2.12a), and let b be any point in R^n. The projection of b onto K
is a point p ∈ K that minimizes the Euclidean distance ∥b − p∥. Then p has the
property that for all z ∈ K, (z − p)^T (b − p) ≤ 0 (see Figure 2.13).
We are now ready to prove the other direction of Farkas’ Lemma. Assume that
there is no x such that Ax = b, x ≥ 0; we will show that there is y such that AT y ≥ 0
but y T b < 0.
Let K = {Ax : x ≥ 0} ⊆ Rm (A is an m × n matrix). K is a cone in Rm and it is
convex, non-empty and closed. According to our assumption, Ax = b, x ≥ 0 has no
solution, so b does not belong to K. Let p be the projection of b onto K.
Since p ∈ K, there is a w ≥ 0 such that Aw = p. According to the Projection
Theorem, for all z ∈ K, (z − p)^T (b − p) ≤ 0. That is, for all x ≥ 0, (Ax − p)^T (b − p) ≤ 0.
We define y = p − b, which implies (Ax − p)^T y ≥ 0. Since Aw = p, (Ax − Aw)^T y ≥ 0,
i.e. (x − w)^T (A^T y) ≥ 0 for all x ≥ 0 (remember that w is fixed). Choosing
x = w + e_i (where e_i is the i-th unit vector) shows that (A^T y)_i ≥ 0 for every i,
hence A^T y ≥ 0. Moreover, since 0 ∈ K, the projection property with z = 0 gives
p^T y ≤ 0, so y^T b = y^T (p − y) = p^T y − ∥y∥² ≤ −∥y∥² < 0, because y = p − b ≠ 0
(b ∉ K while p ∈ K). Thus A^T y ≥ 0 and y^T b < 0, as required. ■
Using a very similar proof one can show the same for the canonical form:
Theorem 2.51 Exactly one of the following is true for the system Ax ≤ b:
1. There is x such that Ax ≤ b.
2. There is y ≥ 0 such that AT y = 0 but y T b < 0.
The intuition behind the precise form of 2. in the previous theorem lies in the proof
that both cannot hold: if A^T y = 0, y ≥ 0, and y^T b < 0, then for any x with Ax ≤ b
we would get the contradiction 0 = (A^T y)^T x = y^T (Ax) ≤ y^T b < 0.
2.106 Duality
Duality is the most important concept in linear programming. Duality allows one to
provide a proof of optimality. This is not only important algorithmically but it also
leads to beautiful combinatorial statements. Consider the linear program
min cT x
s.t. Ax = b
x≥0
Suppose we wanted to obtain the best possible lower bound on the optimal cost.
By multiplying each equation a_i^T x = b_i by some number y_i and summing up the
resulting equations, we obtain y^T Ax = b^T y. If we impose that the coefficient of each
x_j in the resulting expression is less than or equal to c_j, then, since x ≥ 0, b^T y must
be a lower bound on c^T x. The best such lower bound is found by solving:
max b^T y
s.t. A^T y ≤ c
This is another linear program. We call this one the dual of the original one,
called the primal. As we just argued, solving this dual LP will give us a lower bound
on the optimum value of the primal problem. Weak duality says precisely this: if we
denote the optimum value of the primal by z, z = min cT x, and the optimum value
of the dual by w, then w ≤ z. We will use Farkas’ lemma to prove strong duality
which says that these quantities are in fact equal. We will also see that, in general,
the dual of the dual is the primal.
■ Example 2.29
z = min x_1 + 2x_2 + 4x_3
s.t. x_1 + x_2 + 2x_3 = 5
     2x_1 + x_2 + 3x_3 = 8
     x_1, x_2, x_3 ≥ 0
The first equality gives a lower bound of 5 on the optimum value z, since x_1 +
2x_2 + 4x_3 ≥ x_1 + x_2 + 2x_3 = 5 because of the nonnegativity of the x_i. We can get an
even better lower bound by taking 3 times the first equality minus the second one.
This gives x_1 + 2x_2 + 3x_3 = 7 ≤ x_1 + 2x_2 + 4x_3, implying a lower bound of 7 on
z. For x = (3, 2, 0)^T, the objective function is precisely 7, implying optimality. The
mechanism of generating lower bounds is formalized by the dual linear program:
letting y_1 be the multiplier for the first constraint and y_2 the multiplier for the
second constraint, we require the combined coefficient of each x_j not to exceed its
objective coefficient:
max 5y_1 + 8y_2
s.t. y_1 + 2y_2 ≤ 1
     y_1 + y_2 ≤ 2
     2y_1 + 3y_2 ≤ 4
This LP's objective function also achieves a maximum value of 7, at y = (3, −1)^T. ■
We now formalize the notion of duality. Let P and D be the following pair of
dual linear programs:
(P ) z = min{cT x : Ax = b, x ≥ 0}
(D) w = max{bT y : AT y ≤ c}.
(P ) is called the primal linear program and (D) the dual linear program.
In the proof below, we show that the dual of the dual is the primal. In other
words, if one formulates (D) as a linear program in standard form (i.e. in the same
form as (P )), its dual D(D) can be seen to be equivalent to the original primal (P ).
In any statement, we may thus replace the roles of primal and dual without affecting
the statement.
Proof. The dual problem D is equivalent to min{−b^T y : A^T y + Is = c, s ≥ 0}. Changing
forms (writing y = y^+ − y^− with y^+, y^− ≥ 0) we get min{−b^T y^+ + b^T y^− :
A^T y^+ − A^T y^− + Is = c, and y^+, y^−, s ≥ 0}.
Taking the dual of this we obtain: max{−c^T x : A(−x) ≤ −b, −A(−x) ≤ b, I(−x) ≤ 0}.
But this is the same as min{c^T x : Ax = b, x ≥ 0} and we are done. ■
We have the following results relating w and z.
Lemma 2.5. (Weak Duality) z ≥ w.
Proof. Suppose x is primal feasible and y is dual feasible. Then, since x ≥ 0 and
A^T y ≤ c, we have c^T x ≥ y^T Ax = y^T b; thus z = min{c^T x : Ax = b, x ≥ 0} ≥
max{b^T y : A^T y ≤ c} = w. ■
From the preceding lemma we conclude that the following cases are not possible
(these are dual statements):
1. P is feasible and unbounded and D is feasible.
2. P is feasible and D is feasible and unbounded.
We should point out however that both the primal and the dual might be infeasible.
To prove a stronger version of the weak duality lemma, let’s recall the following
corollary of Farkas’ Lemma (Theorem 2.51):
Corollary 2.4 Exactly one of the following is true:
1. ∃x′ : A′ x′ ≤ b′ .
2. ∃y ′ ≥ 0 : (A′ )T y ′ = 0 and (b′ )T y ′ < 0.
Theorem (Strong Duality). If P or D is feasible, then z = w.
Proof. We only need to show that z ≤ w. Assume without loss of generality (by
the symmetry of duality) that P is feasible. If P is unbounded, then by Weak Duality
we have z = w = −∞. Suppose P is bounded, and let x∗ be an optimal solution, i.e.
Ax∗ = b, x∗ ≥ 0 and c^T x∗ = z. We claim that ∃y s.t. A^T y ≤ c and b^T y ≥ z. If so,
we are done.
Suppose no such y exists. Then, by the preceding corollary, applied with
A′ = [ A^T ; −b^T ],  b′ = ( c ; −z ),  x′ = y,  y′ = ( x ; λ ),
there exist x ≥ 0, λ ≥ 0 such that
Ax = λb and c^T x < λz.
We have two cases.
• Case 1: λ ≠ 0. Since we can normalize by λ, we can assume that λ = 1. This
means that ∃x ≥ 0 such that Ax = b and c^T x < z. But this is a contradiction
with the optimality of x∗.
• Case 2: λ = 0. This means that ∃x ≥ 0 such that Ax = 0 and c^T x < 0. If this
is the case, then ∀µ ≥ 0, x∗ + µx is feasible for P and its cost is c^T (x∗ + µx) =
c^T x∗ + µ(c^T x) < z, which is a contradiction.
■
Complementary Slackness
Let P and D be
(P ) z = min{cT x : Ax = b, x ≥ 0}
(D) w = max{bT y : AT y ≤ c},
For x primal feasible and y dual feasible (with slack s = c − A^T y ≥ 0), the duality
gap is
c^T x − b^T y = c^T x − x^T A^T y = x^T (c − A^T y) = x^T s,
since Ax = b and A^T y + s = c.
The following theorem allows one to check the optimality of a primal and/or a dual
solution.
Theorem (Complementary Slackness). Let x be primal feasible and y dual feasible,
with s = c − A^T y. Then x and y are both optimal if and only if x_j s_j = 0 for all j.
For instance, consider again Example 2.29 and its feasible solution x = (3, 2, 0).
Since x_1 > 0 and x_2 > 0, complementary slackness forces the first two dual
constraints to be tight:
y_1 + 2y_2 = 1
y_1 + y_2 = 2
Note that this implies that y_1 = 3 and y_2 = −1. Since this solution satisfies the
other constraint of the dual, y is dual feasible, proving that x is an optimum solution
to the primal (and therefore y is an optimum solution to the dual).
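A quick numerical verification of this optimality certificate for Example 2.29 (plain numpy, no solver needed):

import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
b = np.array([5.0, 8.0])
c = np.array([1.0, 2.0, 4.0])

x = np.array([3.0, 2.0, 0.0])   # claimed primal optimum
y = np.array([3.0, -1.0])       # claimed dual optimum

s = c - A.T @ y                  # dual slacks
assert np.allclose(A @ x, b) and np.all(x >= 0)   # primal feasible
assert np.all(s >= 0)                             # dual feasible
print(c @ x, b @ y)              # both equal 7: no duality gap
print(x * s)                     # all zeros: complementary slackness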
2.106.1 Size of the Output
Consider the linear program
min c^T x
s.t. Ax = b
     x ≥ 0.
If the coefficients are rational, we can transform the program into an
equivalent one with integer coefficients (just multiply everything by the l.c.d.). In the
rest of these notes, we assume that A, b, c have integer coefficients.
For any integer n, we define its size as follows:
size(n) ≜ 1 + ⌈log₂(|n| + 1)⌉,
where the first 1 stands for the fact that we need one bit to store the sign of n;
size(n) represents the number of bits needed to encode n in binary. Analogously, we
define the size of a p × 1 vector v and of a p × l matrix M as follows:
size(v) ≜ ∑_{i=1}^p size(v_i)
size(M) ≜ ∑_{i=1}^p ∑_{j=1}^l size(m_ij)
size(LP) ≜ size(A) + size(b) + size(c).
Definition 2.40
L ≜ size(detmax) + size(bmax) + size(cmax) + m + n,
where
detmax ≜ max_{A′} |det(A′)| (the maximum over all square submatrices A′ of A),
bmax ≜ max_i |b_i|,
cmax ≜ max_j |c_j|.
Hence, by 2,
1 + |det(A)| ≤ 1 + ∏_{i=1}^n ∥a_i∥ ≤ ∏_{i=1}^n (1 + ∥a_i∥) ≤ ∏_{i=1}^n 2^{size(a_i)−n} = 2^{size(A)−n²}.
Let v ∈ Z^p. Then size(v) ≥ size(max_j |v_j|) + p − 1 = ⌈log(1 + max_j |v_j|)⌉ + p. Hence,
size(b) + size(c) ≥ ⌈log(1 + max_j |c_j|)⌉ + ⌈log(1 + max_i |b_i|)⌉ + m + n.  (2.46)
R detmax · bmax · cmax · 2^{m+n} < 2^L, since for any integer n, 2^{size(n)} > |n|.
In what follows we will work with L as the size of the input to our algorithm.
Lemma. Let x be a basic feasible solution of {x : Ax = b, x ≥ 0}. Then x can be
written as x_j = p_j/q, j = 1, . . . , n, where p_1, . . . , p_n and q are integers with
0 ≤ p_i < 2^L and
1 ≤ q < 2^L.
Proof. Since x is a basic feasible solution, ∃ a basis B such that x_B = A_B^{-1} b and
x_N = 0. Thus, we can set p_j = 0, ∀j ∈ N, and focus our attention on the x_j's such
that j ∈ B. We know from linear algebra that
x_B = A_B^{-1} b = (1/det(A_B)) cof(A_B) b,
where cof(A_B) is the cofactor matrix of A_B. Every entry of cof(A_B) is the
determinant of some submatrix of A. Let q = |det(A_B)|; then q is an integer since A_B
has integer components, q ≥ 1 since A_B is invertible, and q ≤ detmax < 2^L. Finally,
note that p_B = q x_B = |cof(A_B) b|, thus p_i ≤ ∑_{j=1}^m |cof(A_B)_{ij}| |b_j| ≤ m · detmax · bmax <
2^L. ■
The first polynomial-time algorithm for linear programming is the so-called ellipsoid
algorithm, which was proposed by Khachian in 1979. The ellipsoid algorithm was in
fact first developed for convex programming (of which linear programming is a special
case) in a series of papers by the Russian mathematicians A.Ju. Levin, D.B. Judin
and A.S. Nemirovskii, and is related to work of N.Z. Shor. Though of polynomial
running time, the algorithm is impractical for linear programming. Nevertheless it
has extensive theoretical applications in combinatorial optimization. For example,
the stable set problem on so-called perfect graphs can be solved in polynomial
time using the ellipsoid algorithm. This is, however, a non-trivial non-combinatorial
algorithm.
Furthermore, since q_1 c^T x_1 and q_2 c^T x_2 are integers, q_1 q_2 (c^T x_1 − c^T x_2) is a
nonzero integer whenever c^T x_1 ≠ c^T x_2, so
|c^T x_1 − c^T x_2| = |q_1 q_2 (c^T x_1 − c^T x_2)| / (q_1 q_2)
≥ 1/(q_1 q_2)   (since c^T x_1 − c^T x_2 ≠ 0 and q_1, q_2 ≥ 1)
> 1/(2^L · 2^L) = 2^{−2L}   (since q_1, q_2 < 2^L).
■
Corollary. Let x be a feasible solution with c^T x ≤ z + 2^{−2L}, and let x′ be a vertex
with c^T x′ ≤ c^T x. Then x′ is an optimal vertex.
Proof. Suppose x′ is not optimal. Then ∃x∗, an optimal vertex, such that c^T x∗ = z.
Since x′ is not optimal, c^T x′ ≠ c^T x∗, and by Theorem 2.56,
c^T x′ − c^T x∗ > 2^{−2L}
⇒ c^T x′ > c^T x∗ + 2^{−2L}
          = z + 2^{−2L}
          ≥ c^T x   (by the definition of x)
          ≥ c^T x′  (by the definition of x′)
⇒ c^T x′ > c^T x′,
a contradiction. ■
What this corollary tells us is that we do not need to be very precise when
choosing an optimal vertex. More precisely we only need to compute the objective
function with error less than 2−2L . If we find a vertex that is within that margin of
error, then it will be optimal.
Consider the primal-dual pair
minimize Z = c^T x
(P) subject to Ax = b,
    x ≥ 0
maximize W = b^T y
(D) subject to A^T y + s = c,
    s ≥ 0.
For strictly feasible solutions the duality gap is c^T x − b^T y = x^T s > 0. The idea of
interior-point methods is to make this gap very small but stay away from the
boundaries. Two tools will be used to achieve this goal in polynomial time.
The first tool is affine scaling. Given the current iterate x̄ > 0, the affine scaling
maps each point
x = (x_1, x_2, . . . , x_n)^T  −→  x′ = (x_1/x̄_1, x_2/x̄_2, . . . , x_n/x̄_n)^T,
i.e. x′ = X̄^{-1} x where X̄ = diag(x̄_1, . . . , x̄_n); in particular, x̄ itself is mapped to
e = (1, . . . , 1)^T. The dual slacks are scaled the opposite way, s′ = X̄ s,
and, therefore, the duality gap x^T s = ∑_j x_j s_j remains unchanged under affine scaling.
As a consequence, we will see later that one can always work equivalently in the
transformed space.
Proof. Given any n positive numbers t_1, . . . , t_n, we know that their geometric mean
does not exceed their arithmetic mean, i.e.
(∏_{j=1}^n t_j)^{1/n} ≤ (1/n) ∑_{j=1}^n t_j.
(In fact the last inequality can be derived directly from the concavity of the logarithmic
function.) The lemma follows if we set t_j = x_j s_j. ■
Since our objective is that G → −∞ as x^T s → 0 (since our primary goal is to get
close to optimality), according to Lemma 2.7 we should choose some q > n (notice
that ln x^T s → −∞ as x^T s → 0). In particular, if we choose q = n + 1, the algorithm
will terminate after O(nL) iterations. In fact we are going to set q = n + √n, which
gives us the smallest number — O(√n L) — of iterations by this method.
Question 2: When can we stop?
Iteration:
while G(x^i, s^i) > −2√n L do
    either a primal step (changing x^i only)
    or a dual step (changing s^i only)
    to get (x^{i+1}, s^{i+1});
    i := i + 1
The iterative step is as follows. Affine scaling maps (xi , si ) to (e, s′ ). In this
transformed space, the point is far away from the boundaries. Either a dual or
primal step occurs, giving (x̃, s̃) and reducing the potential function. The point is
then mapped back to the original space, resulting in (xi+1 , si+1 ).
Next, we are going to describe precisely how the primal or dual step is made such
that
G(x^{i+1}, s^{i+1}) − G(x^i, s^i) ≤ −7/120 < 0
holds for either a primal or dual step, yielding an O(√n L) total number of iterations.
In order to find the new point (x̃, s̃) given the current iterate (e, s′) (remember
we are working in the transformed space), we compute the gradient of the potential
function, the direction along which the value of the potential function changes
at the highest rate. Letting g denote the gradient at (e, s′), we obtain
g = ∇_x G(x, s)|_{(e,s′)} = ((q/(x^T s)) s − (1/x_1, . . . , 1/x_n)^T)|_{(e,s′)}
  = (q/(e^T s′)) s′ − e  (2.48)
We would like to maximize the change in G, so we would like to move in the
direction of −g. However, we must ensure the new point is still feasible (i.e. Ax̃ = b).
Let d be the projection of g onto the null space {x : Ax = 0} of A. Thus, we will
move in the direction of −d. Then
Ad = 0 and ∃w s.t. A^T w = g − d,
which implies
(A A^T) w = A g  (the normal equations),
from which w, and hence d = g − A^T w, can be computed.
Claim 3: x̃ > 0.
Proof. x̃_j = 1 − (1/4) d_j/∥d∥ ≥ 1 − 1/4 = 3/4 > 0. ■
This claim ensures that the new iterate is still an interior point. For a similar
reason, we will see that s̃ > 0 when we make a dual step.
Proposition 2.3 When a primal step is made, G(x̃, s̃) − G(e, s′) ≤ −7/120.
If ∥d∥ < 0.4, we make a dual step instead. Again, we calculate the gradient
h = ∇_s G(x, s)|_{(e,s′)} = (q/(e^T s′)) e − (1/s′_1, . . . , 1/s′_n)^T  (2.49)
Notice that h_j = g_j/s′_j; thus h and g can be seen to be approximately in the same
direction.
Thus, in the dual space, we move perpendicular to the null space, in the direction
of −(g − d):
s̃ = s′ − µ(g − d)  for some µ > 0.
For any µ there exists a y such that A^T y + s̃ = c, so the new point remains dual
feasible. Choosing µ = e^T s′/q gives A^T (y′ + µw) + s̃ = c and
s̃ = s′ − (e^T s′/q)(g − d)
  = s′ − (e^T s′/q)((q/(e^T s′)) s′ − e − d)
  = (e^T s′/q)(d + e),
x̃ = x′ = e.
One can show that s̃ > 0 as we did in Claim 3, so such a move is legal.
Proposition 2.4 When a dual step is made, G(x̃, s̃) − G(e, s′) ≤ −1/6.
According to these two propositions, the potential function decreases by a constant
amount at each step. So if we start from an initial interior point (x^0, s^0) with
G(x^0, s^0) = O(√n L), then after O(√n L) iterations we will obtain another interior
point (x^j, s^j) with G(x^j, s^j) ≤ −k√n L. From Lemma 2.8, we know that the duality
gap (x^j)^T s^j satisfies
(x^j)^T s^j ≤ 2^{−kL},
and the algorithm terminates by that time. Moreover, each iteration requires O(n³)
operations. Indeed, in each iteration, the only non-trivial task is the computation of
the projected gradient d. This can be done by solving the linear system (Ā Ā^T) w = Ā g
in O(n³) time using Gaussian elimination. Therefore, the overall time complexity of
this algorithm is O(n^{3.5} L). By using approximate solutions to the linear systems, we
can obtain O(n^{2.5}) time per iteration, and total time O(n³ L).
Proof of Proposition 2.3. In a primal step, x̃ = e − (1/(4∥d∥)) d and s̃ = s′. Hence,
G(x̃, s̃) − G(e, s′) = G(e − (1/(4∥d∥)) d, s′) − G(e, s′)
= q ln((e − (1/(4∥d∥)) d)^T s′) − ∑_{j=1}^n ln(1 − d_j/(4∥d∥)) − q ln(e^T s′)
= q ln(1 − d^T s′/(4∥d∥ e^T s′)) − ∑_{j=1}^n ln(1 − d_j/(4∥d∥)).
Using the inequality
−x − x²/(2(1 − a)) ≤ ln(1 − x) ≤ −x  for |x| ≤ a,  (2.50)
with a = 1/4, we obtain
G(x̃, s̃) − G(e, s′) ≤ −q d^T s′/(4∥d∥ e^T s′) + ∑_{j=1}^n d_j/(4∥d∥) + ∑_{j=1}^n d_j²/(16∥d∥² · 2(3/4))
= −q d^T s′/(4∥d∥ e^T s′) + e^T d/(4∥d∥) + 1/24
= (1/(4∥d∥)) (e − (q/(e^T s′)) s′)^T d + 1/24
= (1/(4∥d∥)) (−g)^T d + 1/24
= −∥d∥²/(4∥d∥) + 1/24
= −∥d∥/4 + 1/24
≤ −1/10 + 1/24   (since ∥d∥ ≥ 0.4 when a primal step is made)
= −7/120.
Note that g^T d = ∥d∥², since d is the projection of g. (This is where we use the
fact that d is the projected gradient!) ■
Before proving Proposition 2.4, we need the following lemma.
Lemma 2.9.
∑_{j=1}^n ln(s̃_j) − n ln(e^T s̃/n) ≥ −2/15.
Proof. Recalling that s̃ = (∆/q)(d + e) with ∆ = e^T s′, we see that
∑_{j=1}^n ln(s̃_j) − n ln(e^T s̃/n) = ∑_{j=1}^n ln((∆/q)(1 + d_j)) − n ln((∆/q)(1 + e^T d/n))
≥ ∑_{j=1}^n (d_j − d_j²/(2(3/5))) − n (e^T d/n)
≥ −∥d∥²/(6/5)
≥ −2/15   (since ∥d∥ < 0.4 when a dual step is made).
■
Proof of Proposition 2.4. Using Lemma 2.9 and the inequality
∑_{j=1}^n ln(s_j) ≤ n ln(e^T s/n),
we get
G(e, s̃) − G(e, s′) = q ln(e^T s̃) − ∑_j ln(s̃_j) − q ln(e^T s′) + ∑_j ln(s′_j)
≤ q ln(e^T s̃/e^T s′) + 2/15 − n ln(e^T s̃/n) + n ln(e^T s′/n)
= 2/15 + √n ln(e^T s̃/e^T s′).
On the other hand,
e^T s̃ = (∆/q)(n + e^T d),
and recall that ∆ = e^T s′, so
e^T s̃/e^T s′ = (1/q)(n + e^T d) ≤ (1/(n + √n))(n + 0.4√n),
since, by the Cauchy-Schwarz inequality, |e^T d| ≤ ∥e∥ ∥d∥ = √n ∥d∥ < 0.4√n.
Combining the above inequalities yields
G(e, s̃) − G(e, s′) ≤ 2/15 + √n ln(1 − 0.6√n/(n + √n))
≤ 2/15 − 0.6n/(n + √n)
≤ 2/15 − 3/10 = −1/6,
since n + √n ≤ 2n.
This completes the analysis of Ye's algorithm.
Assume that a_11 ≠ 0 (otherwise, we can permute rows or columns). In the first
iteration, we subtract a_i1^{(1)}/a_11^{(1)} times the first row from row i, for i = 2, . . . , m,
resulting in the following matrix:
A^{(2)} =
[ a_11^{(2)}  a_12^{(2)}  . . .  a_1n^{(2)} ]
[ 0           a_22^{(2)}  . . .  a_2n^{(2)} ]
[ ...         ...         ...    ...        ]
[ 0           a_m2^{(2)}  . . .  a_mn^{(2)} ]
In general, A^{(i+1)} is obtained by subtracting a_ji^{(i)}/a_ii^{(i)} times row i from row j of A^{(i)},
for j = i + 1, . . . , m.
Theorem 2.57 For all i ≤ j, k, the entry a_jk^{(i)} can be written in the form det(B)/det(C),
where B and C are some submatrices of A.
Proof. Let B_i denote the i × i submatrix of A^{(i)} consisting of the first i entries of the
first i rows. Let B_jk^{(i)} denote the i × i submatrix of A^{(i)} consisting of the first i − 1
rows and row j, and the first i − 1 columns and column k. Since B_i and B_jk^{(i)} are
upper triangular matrices, their determinants are the products of the entries along
the main diagonal and, as a result, we have:
a_ii^{(i)} = det(B_i)/det(B_{i−1})
and
a_jk^{(i)} = det(B_jk^{(i)})/det(B_{i−1}).
Moreover, remember that row operations do not affect the determinants and, hence,
the determinants of B_jk^{(i)} and B_{i−1} are also determinants of submatrices of the original
matrix A. ■
Using the fact that the size of the determinant of any submatrix of A is at most
the size of the matrix A, we obtain that all numbers occurring during Gaussian
elimination require only O(L) bits.
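A small illustration of exact Gaussian elimination over the rationals: Python's fractions.Fraction keeps every intermediate entry an exact ratio, consistent with the theorem's view of the entries as ratios of determinants.

from fractions import Fraction

def forward_eliminate(M):
    """Bring M (a list of rows of Fractions) to upper-triangular form in place."""
    m = len(M)
    for i in range(min(m, len(M[0]))):
        if M[i][i] == 0:
            continue                      # a real implementation would pivot here
        for j in range(i + 1, m):
            factor = M[j][i] / M[i][i]    # a_ji / a_ii, an exact rational
            M[j] = [x - factor * y for x, y in zip(M[j], M[i])]
    return M

A = [[Fraction(v) for v in row]
     for row in [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]]
for row in forward_eliminate(A):
    print(row)   # every entry stays an exact rational of bounded size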
Finally, we need to round the current iterates x, y and s to O(L) bits. Otherwise,
these vectors would require a constantly increasing number of bits as we iterate. By
rounding up x and s, we ensure that these vectors are still strictly positive. It is
fairly easy to check that this rounding does not change the potential function by a
significant amount and so the analysis of the algorithm is still valid. Notice that now
the primal and dual constraints might be slightly violated but this can be taken care
of in the rounding step.
Recall the primal-dual pair
(P) Min c^T x s.t. Ax = b, x ≥ 0        (D) Max b^T y s.t. A^T y + s = c, s ≥ 0.
We construct
Min c^T x + k_c x_{n+1}
(P′) s.t. Ax + (b − 2^{2L} Ae) x_{n+1} = b
     (2^{4L} e − c)^T x + 2^{4L} x_{n+2} = k_b
     x ≥ 0, x_{n+1} ≥ 0, x_{n+2} ≥ 0
and
Max b^T y + k_b y_{m+1}
(D′) s.t. A^T y + (2^{4L} e − c) y_{m+1} + s = c
     (b − 2^{2L} Ae)^T y + s_{n+1} = k_c
     2^{4L} y_{m+1} + s_{n+2} = 0
     s, s_{n+1}, s_{n+2} ≥ 0,
where k_b = 2^{6L}(n + 1) − 2^{2L} c^T e is chosen in such a way that x′ = (x, x_{n+1}, x_{n+2}) =
(2^{2L} e, 1, 2^{2L}) is a (strictly) feasible solution to (P′), and k_c = 2^{6L}. Notice that (y′, s′) =
(y, y_{m+1}, s, s_{n+1}, s_{n+2}) = (0, −1, 2^{4L} e, k_c, 2^{4L}) is a feasible solution to (D′) with s′ > 0.
x′ and (y′, s′) serve as our initial feasible solutions.
We have to show:
1. G(x′; s′) = O(√n′ L), where n′ = n + 2,
2. the pair (P′)–(D′) is equivalent to (P)–(D),
3. the input size L′ for (P′) as defined in the lecture notes does not increase too
much.
The proofs of these statements are simple but heavily use the definition of L and
the fact that vertices have all components bounded by 2^L.
We first show 1. Notice first that x′_j s′_j = 2^{6L} for all j, implying that
G(x′; s′) = (n′ + √n′) ln(x′^T s′) − ∑_{j=1}^{n′} ln(x′_j s′_j)
= (n′ + √n′) ln(2^{6L} n′) − n′ ln(2^{6L})
= √n′ ln(2^{6L}) + (n′ + √n′) ln(n′)
= O(√n′ L).
Proof. To show that the point x′ built from an optimal solution x∗ of (P) is a feasible
solution to (P′) with x′_{n+2} > 0, we only need to show that k_b − (2^{4L} e − c)^T x∗ > 0
(the reader can easily verify that x′ satisfies all the equalities defining the feasible
region of (P′)). This follows from the fact that
(2^{4L} e − c)^T x∗ ≤ n 2^L (2^{4L} + cmax) < n 2^{6L}
and
k_b = 2^{6L}(n + 1) − 2^{2L} c^T e ≥ 2^{6L}(n + 1) − 2^{2L} n max_j |c_j| ≥ 2^{6L} n + 2^{6L} − 2^{3L} > n 2^{6L},
where we have used the definition of L and the fact that vertices have all their entries
bounded by 2^L.
To show that (y′, s′) is a feasible solution to (D′) with s′_{n+1} > 0, we only need to
show that k_c − (b − 2^{2L} Ae)^T y∗ > 0. This is true since
(b − 2^{2L} Ae)^T y∗ = b^T y∗ − 2^{2L} e^T A^T y∗ ≤ 2^{3L} + 2^{2L} · 2^{3L} < 2^{6L} = k_c,
again by the definition of L and the bound 2^L on the entries of the vertex y∗.
■
This proposition shows that, from an optimal solution to (P ) − (D), we can easily
construct an optimal solution to (P ′ ) − (D′ ) of the same cost. Since this solution
has s′n+1 > 0, any optimal solution x̂ to (P ′ ) must have x̂n+1 = 0. Moreover, since
x′n+2 > 0, any optimal solution (ŷ, ŝ) to (D′ ) must satisfy ŝn+2 = 0 and, as a result,
ŷm+1 = 0. Hence, from any optimal solution to (P ′ ) − (D′ ), we can easily deduce an
optimal solution to (P ) − (D). This shows the equivalence between (P ) − (D) and
(P ′ ) − (D′ ).
By some tedious but straightforward calculations, it is possible to show that
L′ (corresponding to (P′)–(D′)) is at most 24L. In other words, (P)–(D) and
(P′)–(D′) have polynomially related input sizes.
Let x_1 be the number of planes and x_2 the number of boats made per week. The
company can spend no more than 120 hours per week making toys, and since a
plane takes 3 hours to make and a boat takes 1 hour to make, we have
3x_1 + x_2 ≤ 120.
Likewise, the company can spend no more than 160 hours per week finishing toys,
and since it takes 1 hour to finish a plane and 2 hours to finish a boat, we have
x_1 + 2x_2 ≤ 160.
Finally, we know that x_1 ≤ 35, since the company will make no more than 35
planes per week. Thus the complete linear programming problem is given as:
max z(x_1, x_2) = 7x_1 + 6x_2
s.t. 3x_1 + x_2 ≤ 120
     x_1 + 2x_2 ≤ 160
     x_1 ≤ 35          (2.54)
     x_1 ≥ 0
     x_2 ≥ 0
Min z = c1 x1 + c2 x2 + c3 x3 (2.56)
s.t. a11 x1 + a12 x2 + a13 x3 ≥ b1 (2.57)
a21 x1 + a22 x2 + a23 x3 ≤ b2 (2.58)
a31 x1 + a32 x2 + a33 x3 = b3 (2.59)
a41 x1 + a42 x2 + a43 x3 ≥ b4 (2.60)
x1 ≥ 0, x2 ≤ 0, x3 urs. (2.61)
z(x1 , . . . , xn ) = c1 x1 + · · · + cn xn (2.62)
For the time being, we will eschew the general form and focus exclusively on
linear programming problems with two variables. Using this limited case, we will
develop a graphical method for identifying optimal solutions, which we will generalize
later to problems with arbitrary numbers of variables.
2.114.2 Assumptions
Inspecting Example 2.30 (or the more general Problem 2.55) we can see there are
several assumptions that must be satisfied when using a linear programming model.
We enumerate these below:
Proportionality Assumption A problem can be phrased as a linear program only
if the contribution to the objective function and the left-hand-side of each
constraint by each decision variable (x1 , . . . , xn ) is proportional to the value of
the decision variable.
Additivity Assumption A problem can be phrased as a linear programming prob-
lem only if the contribution to the objective function and the left-hand-side of
each constraint by any decision variable xi (i = 1, . . . , n) is completely indepen-
dent of any other decision variable xj (j ̸= i) and additive.
Divisibility Assumption A problem can be phrased as a linear programming
problem only if the quantities represented by each decision variable are infinitely
divisible (i.e., fractional answers make sense).
Certainty Assumption A problem can be phrased as a linear programming prob-
lem only if the coefficients in the objective function and constraints are known
with certainty.
The first two assumptions simply assert (in English) that both the objective
function and functions on the left-hand-side of the (in)equalities in the constraints
are linear functions of the variables x1 , . . . , xn .
The third assumption asserts that a valid optimal answer could contain fractional
values for decision variables. It’s important to understand how this assumption
comes into play–even in the toy making example. Many quantities can be divided
into non-integer values (ounces, pounds etc.) but many other quantities cannot be
divided. For instance, can we really expect that it's reasonable to make 1/2 of a plane in
the toy making example? When values must be constrained to true integer values,
the linear programming problem is called an integer programming problem. There is
a vast literature dealing with these problems [PS98, WN99]. For many problems,
particularly when the values of the decision variables may become large, a fractional
optimal answer could be obtained and then rounded to the nearest integer to obtain
a reasonable answer. For example, if our toy problem were re-written so that the
optimal answer was to make 1045.3 planes, then we could round down to 1045.
The final assumption asserts that the coefficients (e.g., profit per plane or boat)
are known with certainty.
We will begin with a few examples, and then discuss specific problem types that
occur often.
■ Example 2.31 — Production with welding robot. You have 21 units of transpar-
ent aluminum alloy (TAA), LazWeld1, a joining robot leased for 23 hours, and
CrumCut1, a cutting robot leased for 17 hours of aluminum cutting. You also
have production code for a bookcase, desk, and cabinet, along with commitments
to buy any of these you can produce for $18, $16, and $10 apiece, respectively. A
bookcase requires 2 units of TAA, 3 hours of joining, and 1 hour of cutting; a desk
requires 2 units of TAA, 2 hours of joining, and 2 hours of cutting; and a cabinet
requires 1 unit of TAA, 2 hours of joining, and 1 hour of cutting. Formulate an
LP to maximize your revenue given your current resources. ■
Solution: Sets:
• The types of objects = { bookcase, desk, cabinet}.
Parameters:
• Purchase cost of each object
• Units of TAA needed for each object
• Hours of joining needed for each object
• Hours of cutting needed for each object
• Units of TAA and hours of joining and cutting available on the robots
Decision variables:
xi : number of units of product i to produce,
for all i =bookcase, desk, cabinet.
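A sketch of the resulting LP solved with scipy.optimize.linprog; the data comes straight from the example, and revenue is negated because linprog minimizes.

from scipy.optimize import linprog

# Revenue per bookcase, desk, cabinet: 18, 16, 10.
# Rows: TAA units (<= 21), joining hours (<= 23), cutting hours (<= 17).
res = linprog(c=[-18, -16, -10],
              A_ub=[[2, 2, 1],
                    [3, 2, 2],
                    [1, 2, 1]],
              b_ub=[21, 23, 17],
              bounds=[(0, None)] * 3)
print(res.x, -res.fun)   # optimal production plan and revenue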
■ Example 2.32 — The Diet Problem. In the future (as envisioned in a bad 70's
science fiction film) all food is in tablet form, and there are four types: green, blue,
yellow, and red. A balanced, futuristic diet requires at least 20 units of Iron, 25
units of Vitamin B, 30 units of Vitamin C, and 15 units of Vitamin D. Formulate
an LP that ensures a balanced diet at the minimum possible cost. ■
Solution: Sets:
• Set of tablets {1, 2, 3, 4}
Parameters:
• Iron in each tablet
• Vitamin B in each tablet
• Vitamin C in each tablet
• Vitamin D in each tablet
• Cost of each tablet
Decision variables:
xi : number of tablet of type i to include in the diet, ∀i ∈ {1, 2, 3, 4}.
■ Example 2.33 — The Next Diet Problem. Progress is important, and our last
problem had too many tablets, so we are going to produce a single, purple, 10
gram tablet for our futuristic diet, which requires at least 20 units of Iron, 25
units of Vitamin B, 30 units of Vitamin C, 15 units of Vitamin D, and 2000
calories. The tablet is made by blending 4 nutritious chemicals; the following
table shows the units of each nutrient per gram, and the cost per gram, of each
chemical. Formulate an LP that ensures a balanced diet at the minimum possible
cost. ■
Solution: Sets:
• Set of chemicals {1, 2, 3, 4}
Parameters:
• Iron in each chemical
• Vitamin B in each chemical
• Vitamin C in each chemical
• Vitamin D in each chemical
• Cost of each chemical
Decision variables:
xi : grams of chemical i to include in the purple tablet, ∀i = 1, 2, 3, 4.
Objective and Constraints:
■ Example 2.34 — Work Scheduling Problem. You are the manager of LP Burger.
The following table shows the minimum number of employees required to staff the
restaurant on each day of the week. Each employee must work five consecutive
days. Formulate an LP to find the minimum number of employees required to staff
the restaurant.
■
Solution: This problem has multiple optimal solutions (differing in the days on
which workers begin working), all of which result in 8 total workers hired.
Decision variables:
xi : the number of workers that start 5 consecutive days of work on day i, i = 1, · · · , 7
Objective and Constraints:
Min z = x1 + x2 + x3 + x4 + x5 + x6 + x7
s.t. x1 + x4 + x5 + x6 + x7 ≥ 6
x2 + x5 + x6 + x7 + x1 ≥ 4
x3 + x6 + x7 + x1 + x2 ≥ 5
x4 + x7 + x1 + x2 + x3 ≥ 4
x5 + x1 + x2 + x3 + x4 ≥ 3
x6 + x2 + x3 + x4 + x5 ≥ 7
x7 + x3 + x4 + x5 + x6 ≥ 7
x1 , x2 , x3 , x4 , x5 , x6 , x7 ≥ 0.
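A sketch of this model in scipy.optimize.linprog. The daily requirements (6, 4, 5, 4, 3, 7, 7) are read off the right-hand sides above, and the integrality flag (available in SciPy 1.9+ with the HiGHS backend) makes the x_i integer.

import numpy as np
from scipy.optimize import linprog

req = [6, 4, 5, 4, 3, 7, 7]          # required staff for days 1..7
n = 7

# cover[d][i] = 1 if a worker starting on day i works on day d
# (a worker starting day i works days i, i+1, ..., i+4, modulo 7).
cover = np.zeros((n, n))
for i in range(n):
    for k in range(5):
        cover[(i + k) % n][i] = 1

res = linprog(c=np.ones(n),
              A_ub=-cover, b_ub=-np.array(req),   # cover @ x >= req
              bounds=[(0, None)] * n,
              integrality=np.ones(n))             # require integer x
print(res.x, res.fun)                             # a schedule using 8 workers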
■ Example 2.35 — LP Burger - extended. LP Burger has changed its policy, and
allows, at most, two part time workers, who work for two consecutive days in a
week. Formulate this problem. ■
Min z = 5(x1 + x2 + x3 + x4 + x5 + x6 + x7 )
+ 2(y1 + y2 + y3 + y4 + y5 + y6 + y7 )
s.t. x1 + x4 + x5 + x6 + x7 + y1 + y7 ≥ 6
x2 + x5 + x6 + x7 + x1 + y2 + y1 ≥ 4
x3 + x6 + x7 + x1 + x2 + y3 + y2 ≥ 5
x4 + x7 + x1 + x2 + x3 + y4 + y3 ≥ 4
x5 + x1 + x2 + x3 + x4 + y5 + y4 ≥ 3
x6 + x2 + x3 + x4 + x5 + y6 + y5 ≥ 7
x7 + x3 + x4 + x5 + x6 + y7 + y6 ≥ 7
y1 + y2 + y3 + y4 + y5 + y6 + y7 ≤ 2
xi ≥ 0, yi ≥ 0, ∀i = 1, · · · , 7.
■ Example 2.36 — Camping Trip. Imagine you are preparing for a week-long camping
trip to the mountains. You have a backpack with a weight capacity of 20 kilograms.
Your goal is to pack the most valuable items to ensure a comfortable and safe trip
without exceeding the backpack’s weight limit. Each item has a weight and a value
associated with it, representing its importance and utility for the trip. The table
below lists the potential items you can take, their weights in kilograms, and their
values on a scale from 1 to 10 (with 10 being the most valuable).
Formulate an Integer Program to find the optimal items you should bring on your
trip with you.
■
■ Example 2.37 — Capital Allocation Problem. You are a financial planner for an
investment firm. The firm has $100,000 to invest in a portfolio of projects. Each
project has a projected return and requires an initial investment. The goal is to
maximize the total return of the portfolio while not exceeding the available capital.
The following table lists potential projects, their required investments, and their
projected returns.
Formulate an LP to maximize the total return of the portfolio while not
exceeding the available investment capital.
■
■ Example 2.38 — Hiring for tasks. In this assignment problem, we need to hire three
people (Person 1, Person 2, Person 3) for three tasks (Task 1, Task 2, Task 3). In
the table below, we list the cost of hiring each person for each task, in dollars.
Since each person has a different cost for each task, we must make an assignment
to minimize our total cost.

Cost      Task 1  Task 2  Task 3
Person 1      40      47      80
Person 2      72      36      58
Person 3      24      61      71
Given the specific costs of assigning three people to three tasks, we can write
Z = 40x_11 + 47x_12 + 80x_13 + 72x_21 + 36x_22 + 58x_23 + 24x_31 + 61x_32 + 71x_33  (2.63)
Subject to Constraints:
Each person is assigned to exactly one task:
x_11 + x_12 + x_13 = 1,  x_21 + x_22 + x_23 = 1,  x_31 + x_32 + x_33 = 1.
Each task is assigned to exactly one person:
x_11 + x_21 + x_31 = 1,  x_12 + x_22 + x_32 = 1,  x_13 + x_23 + x_33 = 1,
with each x_ij ∈ {0, 1}.
This explicit model incorporates the specific costs associated with each person-
task assignment and ensures that each person is assigned to exactly one task, each
task is assigned to exactly one person, and the overall cost is minimized.
We could write out this model using more generic notation in the following
way: We define the following sets, parameters, and variables to construct the
mathematical model.
Sets:
• I = {1, 2, 3}, the set of people.
• J = {1, 2, 3}, the set of tasks.
Parameters:
• C_ij, the cost of assigning person i ∈ I to task j ∈ J. The costs are given in
the following table:
C = [ 40 47 80 ; 72 36 58 ; 24 61 71 ]
Variables:
• x_ij = 1 if person i is assigned to task j, and 0 otherwise, for all i ∈ I, j ∈ J.
Model:
The objective is to minimize the total cost of assignments:
Minimize Z = ∑_{i∈I} ∑_{j∈J} C_ij x_ij  (2.71)
subject to
∑_{j∈J} x_ij = 1  ∀i ∈ I  (2.72)
∑_{i∈I} x_ij = 1  ∀j ∈ J  (2.73)
This model ensures that each person is assigned to exactly one task, each task is
assigned to exactly one person, and the total cost of the assignments is minimized.
■
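For square cost matrices like this one, scipy.optimize.linear_sum_assignment solves the assignment problem directly (the integrality property discussed next is what makes the LP relaxation exact, so a dedicated combinatorial solver suffices):

import numpy as np
from scipy.optimize import linear_sum_assignment

C = np.array([[40, 47, 80],
              [72, 36, 58],
              [24, 61, 71]])

rows, cols = linear_sum_assignment(C)   # optimal person -> task matching
print(list(zip(rows + 1, cols + 1)))    # [(1, 2), (2, 3), (3, 1)]
print(C[rows, cols].sum())              # minimum total cost: 129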
The assignment problem has an integrality property: if we remove the binary
restriction on the x variables (now just non-negative, i.e., x_ij ≥ 0), then we
still get binary assignments, despite the fact that it is now an LP. This property is
very interesting and useful. Of course, the objective function might not be quite what
we want; we might be interested in ensuring that the team with the worst assignment
is as well off as possible (a fairness criterion). One way of doing this is to modify the
assignment problem using a max-min objective:
Max-min Assignment-like Formulation
Max z
s.t. ∑_{i=1}^n x_ij = 1,  ∀j = 1, . . . , n
     ∑_{j=1}^n x_ij = 1,  ∀i = 1, . . . , n
     x_ij ≥ 0,  ∀i = 1, . . . , n, j = 1, . . . , n
     z ≤ ∑_{i=1}^n R_ij x_ij,  ∀j = 1, . . . , n.
Does this formulation have the integrality property (it is not an assignment problem)?
Consider a very simple example where two teams are to be assigned to two projects
and the teams give the projects the following rankings:

         Project 1  Project 2
Team 1       2          1
Team 2       2          1

Both teams prefer Project 2. For both problems, if we remove the binary restriction
on the x-variables, they can take values between (and including) zero and one. For
the assignment problem the optimal solution will have z = 3, and fractional x-values
will not improve z. For the max-min assignment problem this is not the case: the
optimal solution will have z = 1.5, which occurs when each team is assigned half of
each project (i.e., for Team 1 we have x_11 = 0.5 and x_21 = 0.5).
Sets A finite network G is described by a finite set of vertices V and a finite set A of
arcs. Each arc (i, j) has two key attributes, namely its tail i ∈ V and its head j ∈ V.
We think of a (single) commodity as being allowed to "flow" along each arc, from
its tail to its head.
Variables Indeed, we have "flow" variables x_ij, one for each arc (i, j) ∈ A. For a
source s and sink t, the maximum-flow problem is:
max ∑_{(s,i)∈A} x_si   (max total flow from source)  (2.75)
s.t. ∑_{i:(i,v)∈A} x_iv − ∑_{j:(v,j)∈A} x_vj = 0,  v ∈ V \ {s, t}  (2.76)
     0 ≤ x_ij ≤ u_ij,  ∀(i, j) ∈ A  (2.77)
minimize ∑_{u→v} ℓ_{u→v} · x_{u→v}
subject to ∑_w x_{s→w} − ∑_u x_{u→s} = 1
           ∑_w x_{t→w} − ∑_u x_{u→t} = −1
           ∑_w x_{v→w} − ∑_u x_{u→v} = 0  for every vertex v ≠ s, t
           x_{u→v} ≥ 0  for every edge u → v
Shortest Path Problem
(Each constraint fixes the net flow out of a vertex: +1 at s, −1 at t, and 0
elsewhere, so one unit of flow travels from s to t.)
Or maybe write it like this:
min ∑_{(i,j)∈A} c_ij x_ij  (2.78)
s.t. ∑_{(j,i)∈δ⁻(i)} x_ji − ∑_{(i,j)∈δ⁺(i)} x_ij = 0  ∀i ∈ V \ {s, t}  (2.79)
     ∑_{(j,s)∈δ⁻(s)} x_js − ∑_{(s,j)∈δ⁺(s)} x_sj = −1  (2.80)
     x_ij ≥ 0  ∀(i, j) ∈ A,
where δ⁺(i) and δ⁻(i) denote the sets of arcs leaving and entering i, respectively
(the corresponding constraint at t is implied by the others).
[Figure: two drawings of a directed network on vertices v1, . . . , v6, the first labeled
with arc costs, the second with (cost, capacity) pairs and node supplies.]
Each node v ∈ V has a net supply b_v.
A flow is conservative if the net flow out of node v, minus the net flow into node
v, is equal to the net supply at node v, for all nodes v ∈ V.
The (single-commodity min-cost) network-flow problem is to find a minimum-cost
conservative flow that is non-negative and respects the flow upper bounds on the
arcs.
Objective and Constraints We can formulate this as follows:
min ∑_{(i,j)∈A} c_ij x_ij   (minimize cost)
s.t. ∑_{(v,i)∈A} x_vi − ∑_{(i,v)∈A} x_iv = b_v,  for all v ∈ V   (flow conservation)
     0 ≤ x_ij ≤ u_ij,  for all (i, j) ∈ A.
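A sketch of this LP on a tiny hypothetical network (the node and arc data below is made up for illustration); the conservation rows follow the out-minus-in convention above.

import numpy as np
from scipy.optimize import linprog

# Hypothetical network: 4 nodes, arcs given as (tail, head, cost, capacity).
arcs = [(0, 1, 1.0, 2.0), (0, 2, 3.0, 2.0),
        (1, 2, 1.0, 1.0), (1, 3, 4.0, 1.0), (2, 3, 1.0, 2.0)]
b = np.array([2.0, 0.0, 0.0, -2.0])    # node supplies: ship 2 units 0 -> 3

n_nodes, n_arcs = 4, len(arcs)
M = np.zeros((n_nodes, n_arcs))        # incidence matrix: out minus in
for k, (i, j, _, _) in enumerate(arcs):
    M[i, k] += 1.0                     # arc leaves its tail...
    M[j, k] -= 1.0                     # ...and enters its head

res = linprog(c=[c for _, _, c, _ in arcs],
              A_eq=M, b_eq=b,
              bounds=[(0.0, u) for _, _, _, u in arcs])
print(res.x, res.fun)                  # min-cost conservative flow and its cost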
In the multicommodity version there are K commodities, each with its own flow
variables x_e^k and its own supplies; the commodities share the arc capacities,
∑_{k=1}^K x_e^k ≤ u_e for e ∈ A, and
x_e^k ≥ 0, for e ∈ A, k = 1, 2, . . . , K.
Notes:
K = 1 is ordinary single-commodity network flow, and integer solutions come for
free when node supplies and arc capacities are integer. A K = 2 example with
integer data can give a fractional basic optimum; such an example can even have
no feasible integer flow at all.
R Unfortunately, the same integrality theorem does not hold in the multi-
commodity network flow problem. Nonetheless, if the quantities in each flow are
very large, then the LP solution will likely be very close to an integer-valued
solution.
max min{x1 , . . . , xn }
such that x ∈ X
Having the minimum on the inside is inconvenient. To remove this, we just define
a new variable y and enforce that y ≤ xi and then we maximize y. Since we are
maximizing y, it will take the value of the smallest xi . Thus, we can recast the
problem as
max y
such that y ≤ xi for i = 1, . . . , n
x∈X
The same trick handles absolute values: to minimize |t|, introduce a variable z with
min z  (2.83)
s.t. t ≤ z  (2.84)
     −t ≤ z,  (2.85)
so that the minimized z equals max{t, −t} = |t|.
min x + y
s.t. x + 2y ≥ 2 (3.1)
3x + 2y ≥ 6.
min z
s.t. z − x − y = 0
(3.2)
x + 2y ≥ 2
3x + 2y ≥ 6.
Note that the objective function is replaced with z and z is set to the original
objective function in the first constraint of (3.2) since z = x + y if and only if
z − x − y = 0. Then, solving (3.2) is equivalent to finding among all the solutions to
the following system a solution that minimizes z, if it exists.
z −x−y ≥ 0 (1)
−z + x + y ≥ 0 (2)
x + 2y ≥ 2 (3)
3x + 2y ≥ 6 (4)
Since we are interested in the minimum possible value for z, we use Fourier-Motzkin
elimination to eliminate the variables x and y.
To eliminate x, we first multiply (4) by 1/3 to obtain:
z − x − y ≥ 0  (1)
−z + x + y ≥ 0  (2)
x + 2y ≥ 2  (3)
x + (2/3)y ≥ 2  (5)
Pairing the upper bound (1) on x with each of the lower bounds (2), (3), and (5)
eliminates x. Note that the pair (1), (2) yields only the trivial inequality 0 ≥ 0, so
there is no need to keep it. The other pairs give:
z + y ≥ 2  (6)
z − (1/3)y ≥ 2  (7)
To eliminate y, we first multiply (7) by 3 to obtain:
z + y ≥ 2  (6)
3z − y ≥ 6  (8)
Adding (6) and (8) gives:
4z ≥ 8  (9)
Multiplying (9) by 1/4 gives z ≥ 2. Hence, the minimum possible value for z among
all the solutions to the system is 2. So the optimal value of (3.2) is 2. To obtain an
optimal solution, set z = 2. Then we have no choice but to set y = 0 and x = 2. One
can check that (x, y) = (2, 0) is a feasible solution with objective function value 2.
We can obtain an independent proof that the optimal value is indeed 2 if we
trace back the computations. Note that the inequality z ≥ 2 is given by
(1/4)(9) ⇐ (1/4)(6) + (1/4)(8)
         ⇐ (1/4)(1) + (1/4)(3) + (3/4)(7)
         ⇐ (1/4)(1) + (1/4)(3) + (3/4)(1) + (3/4)(5)
         ⇐ (1)(1) + (1/4)(3) + (1/4)(4)
This shows that (1/4)(3) + (1/4)(4) gives the inequality x + y ≥ 2. Hence, no feasible
solution to (3.1) can have objective function value less than 2. But we have found
one feasible solution with objective function value 2. Hence, 2 is the optimal value.
x+y ≥ 0
2x + y ≥ 2
−x + y ≥ 1
−x + 2y ≥ −1.
The key is to take one of the variables and see how it is constrained by the
remaining variables. We “isolate” x by rewriting the system to the equivalent system
x ≥ −y
x ≥ 1 − (1/2)y
x ≤ −1 + y
x ≤ 1 + 2y.
Hence, x is constrained by the lower bounds −y and 1 − (1/2)y and the upper bounds
−1 + y and 1 + 2y. Therefore, we can find a value for x satisfying these bounds if
and only if each of the upper bounds is at least each of the lower bounds; that is,
−1 + y ≥ −y
−1 + y ≥ 1 − (1/2)y
1 + 2y ≥ −y
1 + 2y ≥ 1 − (1/2)y.
Simplifying this system gives
2y ≥ 1
(3/2)y ≥ 2
3y ≥ −1
(5/2)y ≥ 0,
or more simply,
y ≥ 1/2
y ≥ 4/3
y ≥ −1/3
y ≥ 0.
Note that this system does not contain the variable x, and it has a solution if and
only if y ≥ 4/3. Hence, the original system has a solution if and only if y ≥ 4/3. If we
set y = 2, for example, then x must satisfy
x ≥ −2
x ≥ 0
x ≤ 1
x ≤ 5.
Thus, we can pick x to be any value in the closed interval [0, 1]. In particular,
(x, y) = (0, 2) is one solution to the given system of linear inequalities. There could be
other solutions.
The above example illustrates the process of solving a system of linear inequalities
by constructing a system that has a reduced number of variables. As the number
of variables is finite, the process can be repeated until we obtain a system whose
solvability is apparent (as in the one-variable case).
Observe that the pairing of an upper-bound constraint of the form x ≤ q and a
lower-bound constraint of the form x ≥ p to obtain q ≥ p is equivalent to adding the
inequalities −x ≥ −q and x ≥ p. This observation leads to the following elimination
step: keep every inequality not involving x, and add every sum of one lower-bound
inequality and one upper-bound inequality, scaled so that the coefficients of x cancel.
Every solution of the original system satisfies the new system, and any solution
of the new system can be extended to a solution of the
original system. (Why?) Hence, the original system has a solution if and only
if the new system does.
Now, if we apply Fourier-Motzkin elimination repeatedly, we obtain a system
with at most one variable such that it has a solution if and only if the original system
does. Since solving systems of linear inequalities with at most one variable is easy,
we can conclude whether or not the original system has a solution.
Note that if the coefficients are all rational, the system obtained after eliminating
one variable using Fourier-Motzkin elimination will also have only rational coefficients.
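The elimination step is mechanical enough to code directly. The sketch below works over exact rationals (fractions.Fraction), with each inequality stored as a pair (a, β) meaning a·x ≥ β; applied with k = 0 (the variable x) to the example system x + y ≥ 0, 2x + y ≥ 2, −x + y ≥ 1, −x + 2y ≥ −1, it reproduces the four inequalities 2y ≥ 1, (3/2)y ≥ 2, 3y ≥ −1, (5/2)y ≥ 0 derived above (possibly in a different order).

from fractions import Fraction

def eliminate(ineqs, k):
    """One Fourier-Motzkin step: eliminate variable k from a system of
    inequalities, each stored as (a, beta) and meaning a . x >= beta."""
    pos, neg, zero = [], [], []
    for a, beta in ineqs:
        (pos if a[k] > 0 else neg if a[k] < 0 else zero).append((a, beta))
    out = list(zero)
    for ap, bp in pos:          # each lower bound on x_k ...
        for an, bn in neg:      # ... paired with each upper bound on x_k
            lam = -an[k] / ap[k]             # positive multiplier cancelling x_k
            a = [lam * p + q for p, q in zip(ap, an)]
            out.append((a, lam * bp + bn))   # valid nonnegative combination
    return out

F = Fraction
system = [([F(1), F(1)], F(0)),    # x + y >= 0
          ([F(2), F(1)], F(2)),    # 2x + y >= 2
          ([F(-1), F(1)], F(1)),   # -x + y >= 1
          ([F(-1), F(2)], F(-1))]  # -x + 2y >= -1

for a, beta in eliminate(system, 0):
    print(a, ">=", beta)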
x1 + x2 − 2x3 ≥ 2 (1)
−x1 − 3x2 + x3 ≥ 0 (2)
x2 + x3 ≥ 1 (3)
Remark. Note that setting x3 to another value larger than 4 will lead to different
solutions to the system. Since there are infinitely many different values that we can
set x3 to, there are infinitely many solutions.
Exercises
1. Use Fourier-Motzkin elimination to determine if there exist x, y, z ∈ R satisfying
x + y + 2z ≥ 1
−x + y + z ≥ 2
x−y+z ≥ 1
−y − 3z ≥ 0.
2. Let a^1, . . . , a^m ∈ R^n. Let β_1, . . . , β_m ∈ R. Let λ_1, . . . , λ_m ≥ 0. Then the inequality
(∑_{i=1}^m λ_i a^i)^T x ≥ ∑_{i=1}^m λ_i β_i
is called a nonnegative linear combination of the
inequalities (a^i)^T x ≥ β_i, i = 1, . . . , m. Show that any new inequality created by
Fourier-Motzkin Elimination is a nonnegative linear combination of the original
inequalities.
Solutions
1. Using Fourier-Motzkin elimination to eliminate x gives:
−y − 3z ≥ 0
2y + 3z ≥ 3
2z ≥ 3.
Note that this system has a solution if and only if the original system does.
We now use Fourier-Motzkin elimination to eliminate y. First we multiply the
second inequality by 1/2 to obtain
−y − 3z ≥ 0
y + (3/2)z ≥ 3/2
2z ≥ 3.
Eliminating y gives
2z ≥ 3
−(3/2)z ≥ 3/2,
or equivalently,
z ≥ 3/2
z ≤ −1,
which clearly has no solution. Hence, there is no x, y, z satisfying the original
system.
2. First of all, observe that a nonnegative linear combination of ≥-inequalities that
are themselves nonnegative linear combinations of the inequalities in Ax ≥ b is
again a nonnegative linear combination of inequalities in Ax ≥ b.
It is easy to see that in Step 1 of Fourier-Motzkin Elimination all inequalities
are nonnegative linear combinations of the original inequalities. For instance,
multiplying ai′T x ≥ βi′ by α > 0 is the same as taking the nonnegative linear
combination (λ1 a1 + · · · + λm am )T x ≥ λ1 β1 + · · · + λm βm with λi = 0 for all
i ̸= i′ and λi′ = α.
In Step 2, new inequalities are formed by adding two inequalities from Step 1.
Hence, they are nonnegative linear combinations of the inequalities from Step 1.
By the observation at the beginning, they are nonnegative linear combinations
of the original system.
Remark. By the observation at the beginning and this result, we see that after
repeated applications of Fourier-Motzkin Elimination, all resulting inequalities
are nonnegative linear combinations of the original inequalities. This is an
important fact that will be exploited later.
min x1 − 2x2 + x3
s.t. x1 − 2x2 + x3 − x4 ≥ 1
x1 + x2 + 2x3 − 2x4 = 3
− x2 + 3x3 − 5x4 ≤ 7
x3 , x4 ≥ 0.
Any variable that does not appear in the bounds section is automatically assumed
to be nonnegative.
3.7 SoPlex
SoPlex is an open-source linear programming solver. It is free for noncommercial
use. Binaries for Mac OS X and Windows are readily available for download.
One great feature of SoPlex is that it can return exact rational solutions whereas
most other solvers only return solutions as floating-point numbers.
Suppose that the problem for the example at the beginning is saved in LP format
in an ASCII file named eg.lp. The following is the output of running SoPlex in a
macOS command-line terminal:
int:solvemode = 2
real:feastol = 0
real:opttol = 0
Floating-point optimal.
Max. bound violation = 0
Max. row violation = 1/4503599627370496
Max. reduced cost violation = 0
Max. dual violation = 0
Performing rational reconstruction . . .
Tolerances reached.
Solved to optimality.
The option -X asks the solver to display the primal rational solution. The option
--solvemode=2 invokes iterative refinement for solving for a rational solution. The
options -f=0 -o=0 set the primal feasibility and dual feasibility tolerances to 0.
Without these options, one might get only approximate solutions to the problem. If
we remove the last three options and replace -X with -x, we obtain the following
instead:
There are many solver options that one can specify. To view the list of all the
options, simply run the solver without options and arguments.
Exercises
1. Use SoPlex to obtain the exact optimal value of
min 3x + 2y + 9z
s.t. 53x + 20y + 96z ≥ 2/3
13x − 7y + 6z ≥ 17
−x + 71y − 3z ≥ 73
x, y, z ≥ 0.
Solutions
1. The optimal value is 1882/11679.
max 2X + 5Y
s.t. X + 2Y ≤ 16
5X + 3Y ≤ 45
X, Y ≥ 0
We want to start by plotting the feasible region, that is, the set of points (X, Y )
that satisfy all the constraints.
We can plot this by first plotting the four lines
• X + 2Y = 16
• 5X + 3Y = 45
• X =0
• Y =0
and then shading in the side of the space cut out by the corresponding inequality.
The resulting feasible region can then be shaded in as the region that satisfies
all the inequalities.
Notice that the feasible region is nonempty (it has points that satisfy all the
inequalities) and also that it is bounded (the feasible points don't continue infinitely
in any direction).
We want to identify the extreme points (i.e., the corners) of the feasible region.
Understanding these points will be critical to understanding the optimal solutions of
the model. Notice that all extreme points can be computed by finding the intersection
of two of the lines. Note, however, that not every intersection of two lines is feasible.
We will later use the terminology basic feasible solution for an extreme point of
the feasible region, and basic solution for a point that is the intersection of two
lines but is possibly infeasible (does not satisfy all the constraints).
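The enumeration just described is easy to sketch in code. The following Python fragment is our illustration only (it is not the text's algorithm): it intersects every pair of boundary lines of the example LP, discards intersections that are basic but infeasible, and evaluates the objective at the remaining basic feasible solutions:

```python
import itertools
import numpy as np

# Boundary lines written as rows [a1, a2, b] meaning a1*X + a2*Y = b.
lines = np.array([[1, 2, 16],   # X + 2Y = 16
                  [5, 3, 45],   # 5X + 3Y = 45
                  [1, 0, 0],    # X = 0
                  [0, 1, 0]])   # Y = 0

def feasible(p, tol=1e-9):
    x, y = p
    return (x + 2*y <= 16 + tol and 5*x + 3*y <= 45 + tol
            and x >= -tol and y >= -tol)

best = None
for r1, r2 in itertools.combinations(lines, 2):
    A = np.array([r1[:2], r2[:2]], dtype=float)
    b = np.array([r1[2], r2[2]], dtype=float)
    if abs(np.linalg.det(A)) < 1e-12:
        continue                  # parallel lines: no intersection point
    p = np.linalg.solve(A, b)     # a basic solution (may be infeasible)
    if feasible(p):               # keep only basic *feasible* solutions
        z = 2*p[0] + 5*p[1]
        if best is None or z > best[0]:
            best = (z, p)
print(best)   # the best extreme point; here (0, 8) with z = 40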
We will explore why this theorem is true, and also what happens when the feasible
region does not satisfy the assumptions of either nonempty or bounded. We illustrate
the idea first using the problem from Example 2.30.
[Figure 4.1 plots the feasible region defined by the constraints
3x1 + x2 ≤ 120
x1 + 2x2 ≤ 160
x1 ≤ 35
x1 ≥ 0
x2 ≥ 0
together with the boundary lines 3x1 + x2 = 120, x1 + 2x2 = 160 and x1 = 35, and
the gradient ∇(7x1 + 6x2 ).]
Figure 4.1: Feasible Region and Level Curves of the Objective Function: The shaded
region in the plot is the feasible region and represents the intersection of the five
inequalities constraining the values of x1 and x2 . On the right, we see the optimal
solution is the “last” point in the feasible region that intersects a level set as we
move in the direction of increasing profit.
■ Example 4.1 — Continuation of Example 2.30. Let's continue the example of the
Toy Maker begun in Example 2.30. Solve this problem graphically. ■
7x1 + 6x2 = c =⇒ x2 = (−7/6)x1 + c/6
This is a set of parallel lines with slope −7/6 and intercept c/6 where c can be varied
as needed. The level curves for various values of c are parallel lines. In Figure 4.1
they are shown in colors ranging from red to yellow depending upon the value of c.
Larger values of c are more yellow.
To solve the linear programming problem, follow the level sets along the gradient
(shown as the black arrow) until the last level set (line) intersects the feasible region.
If you are doing this by hand, you can draw a single line of the form 7x1 + 6x2 = c
and then simply draw parallel lines in the direction of the gradient (7, 6). At some
point, these lines will fail to intersect the feasible region. The last line to intersect
the feasible region will do so at a point that maximizes the profit. In this case,
the point that maximizes z(x1 , x2 ) = 7x1 + 6x2 , subject to the constraints given, is
(x∗1 , x∗2 ) = (16, 72).
Note the point of optimality (x∗1 , x∗2 ) = (16, 72) is at a corner of the feasible
region. This corner is formed by the intersection of the two lines: 3x1 + x2 = 120 and
x1 + 2x2 = 160. In this case, the constraints
3x1 + x2 ≤ 120
x1 + 2x2 ≤ 160
are both binding, while the other constraints are non-binding. In general, we will
see that when an optimal solution to a linear programming problem exists, it will
always be at the intersection of several binding constraints; that is, it will occur at a
corner of a higher-dimensional polyhedron.
We can now define an algorithm for identifying the solution to a linear programming
problem in two variables with a bounded feasible region (see Algorithm 1):
The example linear programming problem presented in the previous section has
a single optimal solution. In general, the following outcomes can occur in solving a
linear programming problem:
1. The linear programming problem has a unique solution. (We’ve already seen
this.)
2. There are infinitely many alternative optimal solutions.
3. There is no solution and the problem’s objective function can grow to posi-
tive infinity for maximization problems (or negative infinity for minimization
problems).
4. There is no solution to the problem at all.
Case 3 above can only occur when the feasible region is unbounded; that is, it
cannot be surrounded by a ball with finite radius. We will illustrate each of these
possible outcomes in the next four sections. We will prove that this is true in a later
chapter.
Solution: Applying our graphical method for finding optimal solutions to linear
programming problems yields the plot shown in Figure 4.2. The level curves for the
function z(x1 , x2 ) = 18x1 + 6x2 are parallel to one face of the polygon boundary of
the feasible region. Hence, as we move further up and to the right in the direction
of the gradient (corresponding to larger and larger values of z(x1 , x2 )) we see that
there is not one point on the boundary of the feasible region that intersects that
level set with greatest value, but instead a side of the polygon boundary described
by the line 3x1 + x2 = 120 where x1 ∈ [16, 35]. Let S = {(x1 , x2 ) : 3x1 + x2 ≤ 120,
x1 + 2x2 ≤ 160, x1 ≤ 35, x1 , x2 ≥ 0};
that is, S is the feasible region of the problem. Then for any value of x∗1 ∈ [16, 35]
and any value x∗2 so that 3x∗1 + x∗2 = 120, we will have z(x∗1 , x∗2 ) ≥ z(x1 , x2 ) for all
(x1 , x2 ) ∈ S. Since there are infinitely many values that x1 and x2 may take on, we
see this problem has an infinite number of alternative optimal solutions.
Exercise 4.1 Use the graphical method for solving linear programming problems
to solve the linear programming problem you defined in Exercise 2.9. ■
Based on the example in this section, we can modify our algorithm for finding
the solution to a linear programming problem graphically to deal with situations
with an infinite set of alternative optimal solutions (see Algorithm 2):
Exercise 4.2 Modify the linear programming problem from Exercise 2.9 to obtain
a linear programming problem with an infinite number of alternative optimal
solutions. Solve the new problem and obtain a description for the set of alternative
optimal solutions. [Hint: Just as in the example, x1 will be bounded between two
values corresponding to a side of the polygon. Find those values and the constraint
that is binding. This will provide you with a description of the form for any
x∗1 ∈ [a, b] and x∗2 is chosen so that cx∗1 + dx∗2 = v, the point (x∗1 , x∗2 ) is an alternative
optimal solution to the problem. Now you fill in values for a, b, c, d and v.] ■
Solution: The level sets of the objective and the constraints are shown in Figure
4.3.
Figure 4.3: A Linear Programming Problem with no solution. The feasible region of
the linear programming problem is empty; that is, there are no values for x1 and x2
that can simultaneously satisfy all the constraints. Thus, no solution exists.
The fact that the feasible region is empty is shown by the fact that in Figure 4.3
there is no blue region; i.e., all the regions are gray, indicating that the constraints
cannot be simultaneously satisfied.
Based on this example, we can modify our previous algorithm for finding the solution
to linear programming problems graphically (see Algorithm 3):
min Z = 5X + 7Y
s.t. X + 3Y ≥ 6
5X + 2Y ≥ 10
Y ≤4
X, Y ≥ 0
As you can see, the feasible region is unbounded. In particular, from any point in
the feasible region, one can always find another feasible point by increasing the X
coordinate (i.e., move to the right in the picture). However, this does not necessarily
mean that the optimization problem is unbounded.
Indeed, the optimal solution is at B, the extreme point in the lower left-hand
corner.
max Z = 5X + 7Y
s.t. X + 3Y ≥ 6
5X + 2Y ≥ 10
Y ≤4
X, Y ≥ 0
Solution: The feasible region and level curves of the objective function are shown
in Figure 4.4.
Figure 4.4: A Linear Programming Problem with Unbounded Feasible Region: Note
that we can continue to make level curves of z(x1 , x2 ) corresponding to larger and
larger values as we move down and to the right. These curves will continue to
intersect the feasible region for any value of v = z(x1 , x2 ) we choose. Thus, we can
make z(x1 , x2 ) as large as we want and still find a point in the feasible region that will
provide this value. Hence, the optimal value of z(x1 , x2 ) subject to the constraints
is +∞. That is, the problem is unbounded.
The feasible region in Figure 4.4 is clearly unbounded since it stretches upward
along the x2 axis infinitely far and also stretches rightward along the x1 axis infinitely
far, bounded below by the line x1 − x2 = 1. There is no way to enclose this region by
a disk of finite radius, hence the feasible region is not bounded.
We can draw more level curves of z(x1 , x2 ) in the direction of increase (down
and to the right) as long as we wish. There will always be an intersection point
with the feasible region because it is infinite. That is, these curves will continue to
intersect the feasible region for any value of v = z(x1 , x2 ) we choose. Thus, we can
make z(x1 , x2 ) as large as we want and still find a point in the feasible region that
will provide this value. Hence, the largest value z(x1 , x2 ) can take when (x1 , x2 ) are
in the feasible region is +∞. That is, the problem is unbounded.
Just because a linear programming problem has an unbounded feasible region
does not imply that there is not a finite solution. We illustrate this case by modifying
example .
Solution: The feasible region, level sets of z(x1 , x2 ) and gradients are shown
in Figure 4.5. In this case, note that the direction of increase of the objective
function is away from the direction in which the feasible region is unbounded (i.e.,
downward). As a result, the point in the feasible region with the largest z(x1 , x2 )
value is (7/3, 4/3). Again this is a vertex: the binding constraints are x1 − x2 = 1
and 2x1 + x2 = 6 and the solution occurs at the point these two lines intersect.
[Figure 4.5 labels the lines x1 − x2 = 1 and 2x1 + x2 = 6, the gradient
∇z(x1 , x2 ) = (1/2, −1), and the optimal point (7/3, 4/3).]
Figure 4.5: A Linear Programming Problem with Unbounded Feasible Region and
Finite Solution: In this problem, the level curves of z(x1 , x2 ) increase in a more
"southerly" direction than in the previous example; that is, away from the direction
in which the feasible region increases without bound. The point in the feasible region
with largest z(x1 , x2 ) value is (7/3, 4/3). Note again, this is a vertex.
Based on these two examples, we can modify our algorithm for graphically solving
two-variable linear programming problems to deal with the case when the feasible
region is unbounded.
Exercise 4.3 Does the following problem have a bounded solution? Why?
min z(x1 , x2 ) = 2x1 − x2
s.t. x1 − x2 ≤ 1
2x1 + x2 ≥ 6        (4.5)
x1 , x2 ≥ 0
Exercise 4.5 Modify the objective function in Exercise 4.3 to produce a mini-
mization problem that has a finite solution. Draw the feasible region and level
curves of the objective to “prove” your example works.
[Hint: Think about what direction of increase is required for the level sets of
z(x1 , x2 ) (or find a trick using Exercise 2.4).] ■
Vectors: A vector in Rn has n elements and represents a point (or an arrow from
the origin to the point, denoting a direction) in Rn (Euclidean or real space).
Vectors can be expressed as either row or column vectors.
Vector Addition: Two vectors of the same size can be added, componentwise, e.g.,
for vectors a = (2, 3) and b = (3, 2), a + b = (2 + 3, 3 + 2) = (5, 5).
Scalar Multiplication: A vector can be multiplied by a scalar k (constant) component-
wise. If k > 0 then this does not change the direction represented by the vector,
it just scales the vector.
Inner or Dot Product: Two vectors of the same size can be multiplied to produce
a real number. For example, for a = (2, 3) and b = (3, 2), a · b = 2 · 3 + 3 · 2 = 12.
linear combination of a1 , a2 , · · · , ak .
A convex combination of two points will lie on the line segment between the points.
In R2 , two vectors are linearly dependent only if they lie on the same line. Can you
have three linearly independent vectors in R2 ?
vector in Rm .
Convex Set: Set S in Rn is a convex set if a line segment joining any pair of points
a1 and a2 in S is completely contained in S, that is, λa1 + (1 − λ)a2 ∈ S, ∀λ ∈ [0, 1].
Polyhedral Set: A polyhedral set (or polyhedron) is the set of points in the
intersection of a finite set of half-spaces. Set S = {x : Ax ≤ b, x ≥ 0}, where A is an
m × n matrix, x is an n-vector, and b is an m-vector, is a polyhedral set defined by
m + n hyperplanes (i.e., the intersection of m + n half-spaces).
• Polyhedral sets are convex.
• A polytope is a bounded polyhedral set.
• A polyhedral cone is a polyhedral set where the hyperplanes (that define the
half-spaces) pass through the origin, thus C = {x : Ax ≤ 0} is a polyhedral
cone.
Edges and Faces: An edge of a polyhedral set S is defined by n − 1 hyperplanes,
and a face of S by one or more defining hyperplanes of S, thus an extreme point
and an edge are faces (an extreme point is a zero-dimensional face and an edge a
one-dimensional face). In R2 faces are only edges and extreme points, but in R3
there is a third type of face, and so on...
Unbounded Sets:
Convex Cone: A Convex Cone is a convex set that consists of rays emanating
from the origin. A convex cone is completely specified by its extreme directions. If C
is a convex cone, then for any x ∈ C we have λx ∈ C for all λ ≥ 0.
Let's define a procedure for finding the extreme directions, using the following
LP's feasible region. Graphically, we can see that the extreme directions should
follow the s1 = 0 (red) line and the s3 = 0 (orange) line.
max z = −5x1 − x2
s.t. x1 − 4x2 + s1 = 0
−x1 + x2 + s2 = 1
−x1 + 2x2 + s3 = 4
x1 , x2 , s1 , s2 , s3 ≥ 0.
[Plot: the feasible region in the (x1 , x2 )-plane, bounded by the lines s1 = 0 (red),
s2 = 0, and s3 = 0 (orange).]
E.g., consider the s3 = 0 (orange) line. To find the extreme direction, start at the
extreme point (2, 3), find another feasible point on the orange line, say (4, 4), and
subtract (2, 3) from (4, 4), which yields (2, 1).
max z = −5x1 − x2
s.t. x1 − 4x2 + s1 = 0
−x1 + x2 + s2 = 0
−x1 + 2x2 + s3 = 0
x1 + x2 = 1
x1 , x2 , s1 , s2 , s3 ≥ 0.
[Plot: the feasible region of this normalized problem, a segment of the line
x1 + x2 = 1 between the lines s1 = 0 and s3 = 0.]
The extreme directions are thus (4/5, 1/5) and (2/3, 1/3).
For reference, here is the original LP again:
max z = −5x1 − x2
s.t. x1 − 4x2 + s1 = 0
−x1 + x2 + s2 = 1
−x1 + 2x2 + s3 = 4
x1 , x2 , s1 , s2 , s3 ≥ 0.
[Plot: its feasible region, as before.]
Represent the point (1/2, 1) as a convex combination of the extreme points of the
above LP. Find λs to solve the following system of equations (together with
λ1 + λ2 + λ3 = 1 and λi ≥ 0):
λ1 (0, 0) + λ2 (0, 1) + λ3 (2, 3) = (1/2, 1)
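A quick way to find the λs is to append the convexity condition λ1 + λ2 + λ3 = 1 to the two coordinate equations and solve the resulting 3 × 3 linear system. A small numpy sketch (illustrative only):

```python
import numpy as np

M = np.array([[0.0, 0.0, 2.0],    # x-coordinates of the extreme points
              [0.0, 1.0, 3.0],    # y-coordinates of the extreme points
              [1.0, 1.0, 1.0]])   # convexity: the lambdas sum to one
rhs = np.array([0.5, 1.0, 1.0])
lam = np.linalg.solve(M, rhs)
print(lam)                        # [0.5, 0.25, 0.25]
assert np.all(lam >= 0)           # nonnegative, so this is a convex combination
```

Since all three λs are nonnegative, (1/2, 1) is indeed a convex combination of the extreme points.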
5.5 Matrices
Recall an m × n matrix is a rectangular array of numbers, usually drawn from a field
such as R. We write an m × n matrix with values in R as A ∈ Rm×n . The matrix
consists of m rows and n columns. The element in the ith row and j th column of
A is written as Aij . The j th column of A can be written as A·j , where the · is
interpreted as ranging over every value of i (from 1 to m). Similarly, the ith row of
A can be written as Ai· . When m = n, then the matrix A is called square.
■ Example 5.1
[ 1 2 ]   [ 5 6 ]   [ 1+5 2+6 ]   [  6  8 ]
[ 3 4 ] + [ 7 8 ] = [ 3+7 4+8 ] = [ 10 12 ]        (5.2)
■
It should be clear that any row of matrix A could be considered a row vector in
Rn and any column of A could be considered a column vector in Rm .
Note, Ai· ∈ R1×n (an n-dimensional vector) and B·j ∈ Rn×1 (another n-dimensional
vector), thus making the dot product meaningful.
■ Example 5.2
[ 1 2 ] [ 5 6 ]   [ 1(5) + 2(7)  1(6) + 2(8) ]   [ 19 22 ]
[ 3 4 ] [ 7 8 ] = [ 3(5) + 4(7)  3(6) + 4(8) ] = [ 43 50 ]        (5.4)
■
■ Example 5.3
[ 1 2 ]T   [ 1 3 ]
[ 3 4 ]  = [ 2 4 ]        (5.6)
■
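The three examples above can be checked mechanically. The following numpy snippet is an illustration only, not part of the text:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print(A + B)   # [[ 6  8] [10 12]] -- componentwise addition (Example 5.1)
print(A @ B)   # [[19 22] [43 50]] -- row-by-column products (Example 5.2)
print(A.T)     # [[1 3] [2 4]]     -- transpose (Example 5.3)
```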
Exercise 5.1 Let A, B ∈ Rm×n . Use the definitions of matrix addition and transpose
to prove that:
(A + B)T = AT + BT (5.8)
[Hint: If C = A + B, then Cij = Aij + Bij , the element in the (i, j) position of
matrix C. This element moves to the (j, i) position in the transpose. The (j, i)
position of AT + BT is ATji + BTji , but ATji = Aij . Reason from this point.] ■
Exercise 5.2 Let A, B ∈ Rm×n . Prove by example that AB ̸= BA; that is, matrix
multiplication is not commutative. [Hint: Almost any pair of matrices you pick
(that can be multiplied) will not commute.] ■
Exercise 5.3 Let A ∈ Rm×n and let, B ∈ Rn×p . Use the definitions of matrix
multiplication and transpose to prove that:
(AB)T = BT AT (5.9)
[Hint: Use similar reasoning to the hint in Exercise 5.1. But this time, note that
Cij = Ai· · B·j , which moves to the (j, i) position. Now figure out what is in the
(j, i) position of BT AT .] ■
Let A and B be two matrices with the same number of rows (so A ∈ Rm×n and
B ∈ Rm×p ). Then the augmented matrix [A|B] is:
[ a11  a12  . . .  a1n | b11  b12  . . .  b1p ]
[ a21  a22  . . .  a2n | b21  b22  . . .  b2p ]
[  ..   ..          .. |  ..   ..          .. ]        (5.10)
[ am1  am2  . . .  amn | bm1  bm2  . . .  bmp ]
Exercise 5.4 By analogy define the augmented matrix [A/B], with A stacked on
top of B. Note, this is not a fraction. In your definition, identify the appropriate
requirements on the relationship between the number of rows and columns that
the matrices must have. [Hint: Unlike [A|B], the numbers of rows don't have to
be the same, since you're concatenating on the rows, not columns. There should
be a relation between the numbers of columns though.] ■
When it is clear from context, we may simply write I and omit the subscript
n.
Exercise 5.5 Let A ∈ Rn×n . Show that AIn = In A = A. Hence, I is an identity
for the matrix multiplication operation on square matrices. [Hint: Do the
multiplication out long hand.] ■
Definition 5.6 — Standard Basis Vector. The standard basis vector ei ∈ Rn is:
ei = (0, 0, . . . , 0, 1, 0, . . . , 0)
with the 1 in the ith position, preceded by i − 1 zeros and followed by n − i zeros.
Note, this definition is only valid for n ≥ i. Further, the standard basis vector ei
is also the ith row or column of In .
Definition 5.7 — Unit and Zero Vectors. The vector e ∈ Rn is the one vector
e = (1, 1, . . . , 1). Similarly, the zero vector 0 = (0, 0, . . . , 0) ∈ Rn . We assume that
the length of e and 0 will be determined from context.
Define:
y = x / (eT x)
Show that eT y = yT e = 1. [Hint: First remember that eT x is a scalar value (it's
e · x). Second, remember that a scalar times a vector is just a new vector with
each term multiplied by the scalar. Last, use these two pieces of information to
write the product eT y as a sum of fractions.] ■
Ax = b (5.13)
Ax ≤ b (5.14)
Using this representation, we can write our general linear programming problem
using matrix and vector notation. Expression 2.55 can be written as:
max z(x) = cT x
s.t. Ax ≤ b        (5.15)
Hx = r
For historical reasons, linear programs are not written in the general form of
Expression 5.15.
max z(x) = cT x
s.t. Ax ≤ b        (5.16)
x ≥ 0
written as:
min z(x) = cT x
s.t. Ax ≥ b        (5.17)
x ≥ 0
max z(x) = cT x
s.t. Ax = b        (5.18)
x ≥ 0
Theorem 5.1 Every linear programming problem in canonical form can be put into
standard form.
Proof. Consider the constraint corresponding to the first row of the matrix A:
A1· x ≤ b1 . We can add a slack variable s1 ≥ 0 so that A1· x + s1 = b1 .
This act can be repeated for each row of A (constraint) yielding m new variables
s1 , . . . , sm , which we can express as a row s. Then the new linear programming
problem can be expressed as:
z(x) =cT x
max
s.t. Ax + Im s = b
x, s ≥ 0
Using augmented matrices, we can express this as:
max z(x) = [c; 0]T [x; s]
s.t. [A|Im ] [x; s] = b
[x; s] ≥ 0
where [x; s] denotes the vector x stacked on top of the vector s.
Clearly, this new linear programming problem is in standard form and any solution
maximizing the original problem will necessarily maximize this one. ■
■ Example 5.5 Consider the Toy Maker problem from Example 2.30. The problem
in canonical form is:
max z(x1 , x2 ) = 7x1 + 6x2
s.t. 3x1 + x2 ≤ 120
x1 + 2x2 ≤ 160
x1 ≤ 35
x1 ≥ 0
x2 ≥ 0
We can introduce slack variables s1 , s2 and s3 into the constraints (one for each
constraint) and re-write the problem as:
max z(x1 , x2 ) = 7x1 + 6x2
s.t. 3x1 + x2 + s1 = 120
x1 + 2x2 + s2 = 160
x1 + s3 = 35
x1 , x2 , s1 , s2 , s3 ≥ 0
Theorem 5.2 Every linear programming problem in standard form can be put into
canonical form.
Proof. Recall that Ax = b if and only if Ax ≤ b and Ax ≥ b. The second inequality
can be written as −Ax ≤ −b. This yields the linear programming problem:
max z(x) = cT x
s.t. Ax ≤ b
− Ax ≤ −b        (5.22)
x ≥ 0
Defining the appropriate augmented matrices allows us to convert this linear pro-
gramming problem into canonical form. ■
Exercise 5.7 Complete the “pedantic” proof of the preceding theorem by defining
the correct augmented matrices to show that the linear program in Expression
5.22 is in canonical form. ■
The standard solution method for linear programming models (the Simplex
Algorithm) assumes that all variables are non-negative. Though this assumption
can be easily relaxed, the first implementation we will study imposes this restriction.
The general linear programming problem we posed in Expression 5.15 does not
(necessarily) impose any sign restriction on the variables. We will show that we can
transform a problem in which xi is unrestricted into a new problem in which all
variables are positive. Clearly, if xi ≤ 0, then we simply replace xi by −yi in every
expression and then yi ≥ 0. On the other hand, if we have the constraint xi ≥ li ,
then clearly we can write yi = xi − li and yi ≥ 0. We can then replace xi by yi + li
in every equation or inequality where xi appears. Finally, if xi ≤ ui , but xi may be
negative, then we may write yi = ui − xi . Clearly, yi ≥ 0 and we can replace xi by
ui − yi in every equation or inequality where xi appears.
If xi is unrestricted in sign and has no upper or lower bounds, then let xi = yi − zi
where yi , zi ≥ 0 and replace xi by (yi − zi ) in the objective, equations and inequalities
of a general linear programming problem. Since yi , zi ≥ 0 and may be given any values
as a part of the solution, clearly xi may take any value in R.
Exercise 5.8 Convince yourself that the general linear programming problem
shown in Expression 5.15 can be converted into canonical (or standard) form using
the following steps:
1. Every constraint of the form xi ≤ ui can be dealt with by substituting
yi = ui − xi , yi ≥ 0.
2. Every constraint of the form li ≤ xi can be dealt with by substituting
yi = xi − li , yi ≥ 0.
3. If xi is unrestricted in any way, then we can introduce variables yi and zi so that
xi = yi − zi where yi , zi ≥ 0.
4. Any equality constraints Hx = r can be transformed into inequality con-
straints.
Thus, Expression 5.15 can be transformed to standard form. [Hint: No hint, the
hint is in the problem.] ■
We can then use scalar multiplication and multiply the first row by (3/2) to obtain:
D = [ 0  1  ]
    [ 1 4/3 ]
We can then use scalar multiplication and addition to multiply the first row by
(−4/3) and add it to the second row to obtain:
E = [ 0 1 ]
    [ 1 0 ]
A final row swap yields I2 . Thus using elementary row operations, we have
transformed the matrix A into the matrix I2 . ■
Proof. We’ll show that scalar multiplication and row addition can be accomplished
by a matrix multiplication. In Exercise 5.9, you’ll be asked to complete the proof for
the other two elementary row operations.
This is simply the identity Im with an α in the (2, 1) position instead of 0. Now
consider EA. Let A·j = [a1j , a2j , . . . , amj ]T be the j th column of A. Then :
[ 1 0 0 . . . 0 ] [ a1j ]   [ a1j           ]
[ α 1 0 . . . 0 ] [ a2j ]   [ α(a1j ) + a2j ]
[ ..   ..     0 ] [  ..  ] = [  ..           ]        (5.24)
[ 0 0 0 . . . 1 ] [ amj ]   [ amj           ]
That is, we have taken the first element of A·j and multiplied it by α and added it
to the second element of A·j to obtain the new second element of the product. All
other elements of A·j are unchanged. Since we chose an arbitrary column of A, it’s
clear this will occur in each case. Thus EA will be the new matrix with rows the
same as A except for the second row, which will be replaced by the first row of A
multiplied by the constant α and added to the second row of A. To multiply the ith
row of A by α and add it to the j th row, we would simply make a matrix E by
starting with Im and replacing the element in the (j, i) position with α. ■
Exercise 5.9 Complete the proof by showing that scalar multiplication and row
swapping can be accomplished by a matrix multiplication. [Hint: Scalar multi-
plication should be easy, given the proof above. For row swap, try multiplying
matrix A from Example 5.6 by:
0 1
1 0
and see what comes out. Can you generalize this idea for arbitrary row swaps?] ■
Matrices of the kind we’ve just discussed are called elementary matrices. Theorem
5.3 will be important when we study efficient methods for solving linear programming
problems. It tells us that any set of elementary row operations can be performed
by finding the right matrix. That is, suppose I list 4 elementary row operations to
perform on matrix A. These elementary row operations correspond to four matrices
E1 , . . . , E4 . Thus the transformation of A under these row operations can be written
using only matrix multiplication as B = E4 · · · E1 A. This representation is much
simpler for a computer to keep track of in algorithms that require the transformation
of matrices by elementary row operations.
Definition 5.11 — Row Equivalence. Let A ∈ Rm×n and let B ∈ Rm×n . If there
is a sequence of elementary matrices E1 , . . . , Ek so that:
B = Ek · · · E1 A
then matrix A is said to be row equivalent to matrix B.
If A ∈ Rn×n and there is a matrix A−1 so that A A−1 = A−1 A = In ,
then matrix A is said to be invertible (or nonsingular) and A−1 is called its
inverse. If A is not invertible, it is called a singular matrix.
Exercise 5.10 Find the equivalent elementary row operation matrices for Example
5.6. There should be five matrices E1 , . . . , E5 corresponding to the five steps
shown. Show that the product of these matrices (in the correct order) yields the
identity matrix. Now compute the product B = E5 · · · E1 . Show that B = A−1 .
[Hint: You've done most of the work.] ■
The proof of the following theorem is beyond the scope of this class.
■ Example 5.7 Again consider the matrix A from Example 5.6. We can follow the
steps in Algorithm 5 to compute A−1 .
Gauss-Jordan Elimination
Computing an Inverse
1. Let A ∈ Rn×n . Let X = [A|In ].
2. Let i := 1
3. If Xii = 0, then use row-swapping on X to replace row i with a row j (j > i)
so that Xii ̸= 0. If this is not possible, then A is not invertible.
4. Replace Xi· by (1/Xii )Xi· . Element (i, i) of X should now be 1.
5. For each j ̸= i, replace Xj· by (−Xji /Xii ) Xi· + Xj· .
6. Set i := i + 1.
7. If i > n, then A has been replaced by In and In has been replaced by A−1
in X. If i ≤ n, then goto Line 3.
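Before stepping through the example, here is a direct transcription of the steps above into Python (a sketch, illustrative only; it uses exact fractions so the arithmetic matches the hand computation):

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    n = len(A)
    # Step 1: form X = [A | I_n].
    X = [[Fraction(A[i][j]) for j in range(n)] +
         [Fraction(1 if i == j else 0) for j in range(n)] for i in range(n)]
    for i in range(n):                      # Steps 2, 6 and 7: loop over i
        if X[i][i] == 0:                    # Step 3: swap in a usable row
            for j in range(i + 1, n):
                if X[j][i] != 0:
                    X[i], X[j] = X[j], X[i]
                    break
            else:
                raise ValueError("A is not invertible")
        pivot = X[i][i]                     # Step 4: scale row i so X_ii = 1
        X[i] = [v / pivot for v in X[i]]
        for j in range(n):                  # Step 5: clear column i elsewhere
            if j != i and X[j][i] != 0:
                factor = X[j][i]
                X[j] = [a - factor * b for a, b in zip(X[j], X[i])]
    return [row[n:] for row in X]           # the right block is now A^{-1}

print(gauss_jordan_inverse([[1, 2], [3, 4]]))  # [[-2, 1], [3/2, -1/2]]
```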
Step 1
1 2 1 0
X :=
3 4 0 1
Step 2 i := 1
Step 3 and 4 (i = 1) A11 = 1, so no swapping is required. Furthermore, replacing
X1· by (1/1)X1· will not change X.
Step 5 (i = 1) We multiply row 1 of X by −3 and add the result to row 2 of X
to obtain:
1 2 1 0
X :=
0 −2 −3 1
Consider the system of equations
Ax = b        (5.26)
i.e.,
x1 + 2x2 + 3x3 = 7
4x1 + 5x2 + 6x3 = 8
7x1 + 8x2 + 9x3 = 9
with coefficient matrix
A = [ 1 2 3 ]
    [ 4 5 6 ]
    [ 7 8 9 ]
and right hand side vector b = [7, 8, 9]T . Applying Gauss-Jordan elimination in
this case yields:
X := [ 1 0 −1 | −19/3 ]
     [ 0 1  2 |  20/3 ]        (5.28)
     [ 0 0  0 |    0  ]
Since the third row is all zeros, there are an infinite number of solutions. An easy
way to solve for this set of equations is to let x3 = t, where t may take on any value
in R. Then, row 2 of Expression 5.28 tells us that:
x2 + 2x3 = 20/3 =⇒ x2 + 2t = 20/3 =⇒ x2 = 20/3 − 2t        (5.29)
We then solve for x1 in terms of t. From row 1 of Expression 5.28 we have:
x1 − x3 = −19/3 =⇒ x1 − t = −19/3 =⇒ x1 = t − 19/3        (5.30)
x1 + 2x2 + 3x3 = 7
4x1 + 5x2 + 6x3 = 8
7x1 + 8x2 + 9x3 = 10
The new right hand side vector is b = [7, 8, 10]T . Applying Gauss-Jordan elimination
in this case yields:
X := [ 1 0 −1 | 0 ]
     [ 0 1  2 | 0 ]        (5.32)
     [ 0 0  0 | 1 ]
Since row 3 of X has a non-zero element in the b′ column, we know this problem
has no solution, since there is no way that we can find values for x1 , x2 and x3
satisfying:
x1 + 2x2 = 7
3x1 + 4x2 = 8
α 1 x1 + · · · + α m xm (5.34)
α 1 x1 + · · · + α m xm = 0 (5.36)
If the set of vectors x1 , . . . , xm is not linearly dependent, then they are linearly
independent and Equation 5.36 holds just in case αi = 0 for all i = 1, . . . , m.
Exercise 5.14 Consider the vectors x1 = [0, 0]T and x2 = [1, 0]T . Are these vectors
linearly independent? Explain why or why not. ■
Consider the vectors x1 = [1, 1, 0]T , x2 = [1, 0, 1]T and x3 = [0, 1, 1]T .
We can show these vectors are linearly independent: Suppose there are values
α1 , α2 , α3 ∈ R such that
α1 x1 + α2 x2 + α3 x3 = 0
Then:
[ α1 ]   [ α2 ]   [ 0  ]   [ α1 + α2 ]   [ 0 ]
[ α1 ] + [ 0  ] + [ α3 ] = [ α1 + α3 ] = [ 0 ]
[ 0  ]   [ α2 ]   [ α3 ]   [ α2 + α3 ]   [ 0 ]
Thus:
α1 + α2 = 0
α1 + α3 = 0
α2 + α3 = 0
This is just a simple matrix equation, but note that the three vectors we are
focused on: x1 , x2 , and x3 , have become the columns of the matrix on the left-
hand-side. We can use Gauss-Jordan elimination to solve this matrix equation
yielding: α1 = α2 = α3 = 0. Thus these vectors are linearly independent. ■
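The same check can be automated: put x1 , x2 , x3 as the columns of a matrix and verify that it has full rank, so that Aα = 0 forces α = 0. A numpy sketch (illustrative only):

```python
import numpy as np

# Columns are x1, x2, x3 from the example above.
A = np.column_stack(([1, 1, 0], [1, 0, 1], [0, 1, 1]))
print(np.linalg.matrix_rank(A))         # 3: the columns are linearly independent
print(np.linalg.solve(A, np.zeros(3)))  # the unique solution alpha = [0, 0, 0]
```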
R It is worthwhile to note that the zero vector 0 makes any set of vectors a
linearly dependent set.
Show that the vectors x1 = [1, 2, 3]T , x2 = [4, 5, 6]T and x3 = [7, 8, 9]T
are not linearly independent. [Hint: Following the example, create a matrix
whose columns are the vectors in question and solve a matrix equation with
right-hand-side equal to zero. Using Gauss-Jordan elimination, show that a zero
row results and thus find the infinite set of values solving the system.] ■
R So far we have only given examples and exercises in which the number of
vectors was equal to the dimension of the space they occupied. Clearly, we
could have, for example, 3 linearly independent vectors in 4 dimensional space.
We illustrate this case in the following example.
α1 + 4α2 = 0
α2 = 0
0α1 + 0α2 = 0
The last equation is tautological (true regardless of the values of α1 and α2 ). The
second equation implies α2 = 0. Using this value in the first equation implies that
α1 = 0. This is the unique solution to the problem and thus the vectors are linearly
independent. ■
The following theorem is related to the example above. Its proof is outside the
scope of the course. It should be taught in a Linear Algebra course (Math 436).
Proofs can be found in most Linear Algebra textbooks. Again, see [Str87] (Theorem
3.1) for a proof using vector spaces.
Theorem 5.5 Let x1 , . . . , xm ∈ Rn . If m > n, then the vectors are linearly dependent.
5.12 Basis
Definition 5.17 — Basis. Let X = {x1 , . . . , xm } be a set of vectors in Rn . The
set X is called a basis of Rn if X is a linearly independent set of vectors and
every vector in Rn is in the span of X . That is, for any vector w ∈ Rn we can
find scalar values α1 , . . . , αm such that
w = α1 x1 + · · · + αm xm        (5.37)
form a basis of R3 . We already know that the vectors are linearly independent. To
show that R3 is in their span, choose an arbitrary vector in R3 : [a, b, c]T . Then we
hope to find coefficients α1 , α2 and α3 so that:
α1 x1 + α2 x2 + α3 x3 = [a, b, c]T
which clearly has a solution for all a, b, and c. Another way of seeing this is to
note that the matrix:
A = [ 1 1 0 ]
    [ 1 0 1 ]        (5.40)
    [ 0 1 1 ]
is invertible. ■
The following theorem on the size of a basis in Rn is outside the scope of this
course.
We will use the following lemma, which is related to the notion of the basis of
Rn when we come to our formal method for solving linear programming problems.
Lemma 5.1. Let {x1 , . . . , xm+1 } be a linearly dependent set of vectors in Rn and
let X = {x1 , . . . , xm } be a linearly independent set. Further assume that xm+1 ̸= 0.
Assume α1 , . . . , αm+1 are a set of scalars, not all zero, so that
α1 x1 + · · · + αm+1 xm+1 = 0        (5.41)
As long as αj ̸= 0, then we can replace xj with xm+1 and still have a basis of
Rm .
Exercise 5.18 Prove the following theorem: In Rn every set of n linearly indepen-
dent vectors is a basis. [Hint: Let X = {x1 , . . . , xn } be the set. Use the fact that
α1 x1 + · · · + αn xn = 0 has exactly one solution.] ■
5.13 Rank
Definition 5.18 — Row Rank. Let A ∈ Rm×n . The row rank of A is the size of
the largest set of row (vectors) from A that are linearly independent.
Exercise 5.19 By analogy define the column rank of a matrix. [Hint: You don’t
need a hint.] ■
α1 a1 + · · · + αm am = 0
α1 a1 + · · · + αm am = 0
which is a contradiction.
Case 2: Suppose that the size of the largest set of linearly independent rows is
k. Denote such a set by S = {as1 , . . . , ask }. There are several possibilities: (i) Both
ai and aj are in this set. In this case, we simply replace our argument above with
constants α1 through αk and the result is the same.
(ii) ai and aj are not in this set, in which case we know that there are αs1 , . . . , αsk
and βs1 , . . . , βsk so that:
αs1 as1 + · · · + αsk ask = ai
βs1 as1 + · · · + βsk ask = aj
But this implies that αai + aj can also be written as a linear combination of the
elements of S and thus the rank of A′ is no larger than the rank of A.
(iii) Now suppose that ai is in the set S. Then there are constants αs1 , . . . , αsk so
that:
αs1 as1 + · · · + αsk ask = aj
Again, this implies that αai + aj is still a linear combination of the elements of S
and so we cannot have increased the size of the largest linearly independent set of
vectors, nor could we have decreased it.
(iv) Finally, suppose that aj ∈ S. Again let aj = as1 . Then there are constants
αs1 , . . . , αsk
Apply Lemma 5.1 to replace aj in S with some other row vector al . If l = i, then we
reduce to sub-case (iii). If l ̸= i, then we reduce to sub-case (ii).
Finally, suppose we multiply a row by α. This reduces to the case of multiplying
row i by α − 1 and adding it to row i, which is covered in the above analysis. This
completes the proof. ■
We have the following theorem, whose proof is again outside the scope of this
course. There are very nice proofs available in [Str87].
Theorem 5.8 If A ∈ Rm×n is a matrix, then the row rank of A is equal to the
column rank of A. Further, rank(A) ≤ min{m, n}.
Lastly, we will not prove the following theorem, but it should be clear from all
the work we have done up to this point.
Definition 5.19 Suppose that A ∈ Rm×n and let m ≤ n. Then A has full row
rank if rank(A) = m.
Consider again the matrix
A = [ 1 2 3 ]
    [ 4 5 6 ]
    [ 7 8 9 ]
By now you should suspect that it does not have full row rank. Recall that the
application of Gauss-Jordan elimination transforms A into the matrix
A′ = [ 1 0 −1 ]
     [ 0 1  2 ]
     [ 0 0  0 ]
No further transformation is possible. It's easy to see that the first two rows of A′
are linearly independent. (Note that the first row vector has a non-zero element in
its first position and zero in its second position, while the second row vector has a
non-zero element in the second position and a zero element in the first position.
Because of this, it’s impossible to find any non-zero linear combination of those
vectors that leads to zero.) Thus we conclude the matrix A has the same rank as
matrix A′ which is 2. ■
Exercise 5.20 Change one number in matrix A in the preceding example to create
a new matrix B that has full row rank. Show that your matrix has rank 3 using
Gauss-Jordan elimination. ■
Ax = b (5.46)
has more variables than equations and is underdetermined; if A has full row
rank, then the system will have an infinite number of solutions. We can formulate an
expression to describe this infinite set of solutions.
Since A has full row rank, we may choose any m linearly independent columns of
A corresponding to a subset of the variables, say xi1 , . . . , xim . We can use these to
form the matrix
independent column vectors. We can then use elementary column operations to write
the matrix A as:
A = [B|N] (5.48)
where the vector xB are the variables corresponding to the columns in B and
the vector xN are the variables corresponding to the columns of the matrix N.
Definition 5.20 — Basic Variables. For historical reasons, the variables in the
vector xB are called the basic variables and the variables in the vector xN are
called the non-basic variables.
We can use matrix multiplication to expand the left hand side of this expression
as:
The fact that B is composed of all linearly independent columns implies that
applying Gauss-Jordan elimination to it will yield an m × m identity and thus that B
is invertible. We can solve for basic variables xB in terms of the non-basic variables:
We can find an arbitrary solution to the system of linear equations by choosing values
for the non-basic variables and solving for the basic variable values
using Equation 5.51.
xB = B−1 b (5.52)
We then solve:
[ x1 ]        [ 7 ]   [ −19/3 ]
[ x2 ] = B−1 [ 8 ] = [  20/3 ]        (5.55)
(Thanks to Doug Mercer, who found a typo below that was fixed.)
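The computation in Equation 5.55 is easy to verify numerically. The sketch below assumes, as the surrounding example and Exercise 5.21 suggest, that B consists of the first two columns [1, 4]T and [2, 5]T and that b = [7, 8]T :

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [4.0, 5.0]])    # the basic columns (assumed: columns 1 and 2 of A)
b = np.array([7.0, 8.0])
xB = np.linalg.solve(B, b)    # basic variables; the non-basic x3 is set to 0
print(xB)                     # [-19/3, 20/3]: a basic solution, but infeasible (x1 < 0)
```

Note that this basic solution has a negative component, so it satisfies Ax = b but not x ≥ 0.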
Exercise 5.21 Find the two other basic solutions in Example 5.13 corresponding to
B = [ 2 3 ]
    [ 5 6 ]
and
B = [ 1 3 ]
    [ 4 6 ]
In each case, determine what the matrix N is. [Hint: Find the solutions any way
you like. Make sure you record exactly which xi (i ∈ {1, 2, 3}) is equal to zero in
each case.] ■
min z(x) = cT x
s.t. Ax ≤ b
Hx = r        (5.56)
x ≥ l
x ≤ u
To solve this problem in Matlab we provide:
1. c,
2. A,
3. b,
4. H,
5. r,
6. l,
7. u
If there are no inequality constraints, then we set A = [] and b = [] in Matlab; i.e., A
and b are set as the empty matrices. A similar requirement holds on H and r if there
are no equality constraints. If some decision variables have lower bounds and others
don’t, the term -inf can be used to set a lower bound at −∞ (in l). Similarly, the
term inf can be used if the upper bound on a variable (in u) is infinity. The easiest
way to understand how to use Matlab is to use it on an example.
■ Example 5.14 Suppose I wish to design a diet consisting of Raman noodles and
ice cream. I’m interested in spending as little money as possible but I want to
ensure that I eat at least 1200 calories per day and that I get at least 20 grams of
protein per day. Assume that each serving of Raman costs $1 and contains 100
calories and 2 grams of protein. Assume that each serving of ice cream costs $1.50
and contains 200 calories and 3 grams of protein.
We can construct a linear programming problem out of this scenario. Let x1
be the amount of Raman we consume and let x2 be the amount of ice cream we
consume. Our objective function is our cost:
x1 + 1.5x2        (5.57)
Our constraints require at least 20 grams of protein and at least 1200 calories:
2x1 + 3x2 ≥ 20        (5.58)
x1 + 2x2 ≥ 12        (5.59)
We can look at the various results from using Expression 5.51 and Definition 5.21.
Let:
A = [ 2 3 −1  0 ]        b = [ 20 ]        (5.62)
    [ 1 2  0 −1 ]            [ 12 ]
Suppose we wish to have xB = [s1 s2 ]T . Then we would transform the matrix as:
[ −2 −3 1 0 | −20 ]
[ −1 −2 0 1 | −12 ]
This is not an accident, it’s because we started with a negative identity matrix
inside the augmented matrix. The point x1 = 4, x2 = 4 is a point of intersection,
shown in Figure 5.1. It also happens to be one of the alternative optimal solutions
of this problem. Notice in Figure 5.1 that the level curves of the objective function
are parallel to one of the sides of the boundary of the feasible region.
Matlab Solution
Figure 5.1: The feasible region for the diet problem is unbounded and there are
alternative optimal solutions. Since we are seeking a minimum, we travel in the
opposite direction of the gradient, i.e., toward the origin, to reduce the objective
function value. Notice that the level curves hit one side of the boundary of the
feasible region.
Notice there are now columns of the identity matrix in the columns corresponding
to s1 and x1 . That's how we know we're solving for s1 and x1 . We have x1 = 12
and s1 = 4. By definition, x2 = s2 = 0. This corresponds to the point x1 = 12, x2 = 0
shown in Figure 5.1.
Let’s use Matlab to solve this problem. Our original problem is:
min x1 + 1.5x2
s.t. 2x1 + 3x2 ≥ 20
x1 + 2x2 ≥ 12
x1 , x2 ≥ 0
Converting the ≥ constraints to ≤ constraints (multiplying each by −1) gives:
− 2x1 − 3x2 ≤ −20
− x1 − 2x2 ≤ −12
x1 , x2 ≥ 0
Then we have:
c = [ 1   ]        A = [ −2 −3 ]        b = [ −20 ]
    [ 1.5 ]            [ −1 −2 ]            [ −12 ]
H = r = []
l = [ 0 ]        u = []
    [ 0 ]
Figure 5.2: Matlab input for solving the diet problem. Note that we are solving a
minimization problem. Matlab assumes all problems are minimization problems, so
we don't need to multiply the objective by −1 like we would if we started with a
maximization problem.
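For readers without Matlab, the same call can be sketched with scipy's linprog, which takes analogous arguments (c, A_ub, b_ub, bounds) and also minimizes by default. This is an illustration, not the text's Figure 5.2:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 1.5])
A = np.array([[-2.0, -3.0],      # the >= constraints negated into <= form
              [-1.0, -2.0]])
b = np.array([-20.0, -12.0])
res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, res.fun)   # one optimal point with cost 10; the optimum is
                        # attained along a whole face (alternative optima)
```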
Exercise 5.22 In the previous example, you could also have just used the problem in
standard form with the surplus variables and had A = b = [] and defined H and r
instead. Use Matlab to solve the diet problem in standard form. Compare your
results to Example 5.14. ■
5.16 Introduction
General LP Objective: min f (x), x ∈ F ⊆ Rn , where F is the set of feasible solutions.
There are many types of problems that are included in linear programming. When a
problem can be put in terms of an objective function and competing constraints, it
might be solved with LP. Although we can’t solve sorting with LP, if you go to a
desert island and can only take one algorithm with you, take an LP solver1 . Here is
an example LP problem:
Example 1: Min-conductance of G
ϕ(G) = min{ |∂S| / vol(S) : S ̸= ∅, vol(S) ≤ (1/2) vol(V ) },  where vol(S) = Σi∈S di
3. Σi xi · di ≤ (1/2) Σi di
Remarks:
1. max f (x) = − min[−f (x)]
2. A1 x ≥ b1 ⇔ (−A1 )x ≤ −b1
3. A1 x = b1 ⇔ A1 x ≤ b1 and A1 x ≥ b1
So we can already see how the same problem can be written in different ways. The
general goal of the chapter is to be able to solve equations of this form via an
algorithm that works in polynomial time.
LP Formulation:
max Σ(s,u)∈E fs,u − Σ(u,s)∈E fu,s = |f |    s.t.
Although we can put a max flow problem into an LP solver, that doesn’t mean
that we’ll get as good of a runtime as some of the other algorithms we discussed.
At first glance this may appear easier than what we saw earlier. We’re just
looking at solutions to a system of linear equations where x is positive. However, it
turns out that the two problems are equivalent.
Reduction from general form to standard form:
We have two things to address. First we are now allowing only positive variables,
and second, now Ax = b is an equality. The reduction proceeds in two steps.
1. To accommodate the desire for all non-negative variables, for all variables xi ∈ R
in the general form, we create two new variables xi+ and xi− with the definition
xi ≜ xi+ − xi− and xi+ , xi− ≥ 0. Since any real number can be written as the
difference of two other positive real numbers, this is a safe step to take.
2. Plugging this into the general form would imply Ai x ≥ bi ⇒ Ai (x+ − x− ) ≥ bi ,
which we want now to be an equality. If we need Ai (x+ − x− ) ≥ bi , this is
equivalent to saying we need Ai (x+ − x− ) − ξi = bi for some non-negative ξi .
This motivates the introduction of slack variables ξi ≜ Ai (x+ − x− ) − bi where
we constrain ξi ≥ 0.
The slack variables tell us how far off we are from the previous inequalities in
general form. If ξi = 0, we call the ith constraint tight. We have introduced a bunch
of equations which will be captured in the new Ax = b.2 Note that our x, A and b in
Ax = b will not be the same as they were for the general form; otherwise we could easily
see that the mathematical equivalence breaks down. Rather, these objects will grow
in dimension. In particular, x will now contain xi+ , xi− , and ξi . The c in the objective
function may also grow, and in general the objective function will not look the same.
However, the key point is that we will guarantee that an optimal solution to the
new linear program will yield an optimal solution to the original. More specifically,
for each feasible solution x in the general form with objective value v, there is a
corresponding feasible solution x′ in standard form with the same objective value v
and vice versa.
2 Except for the non-negativity constraint on the variables which may be implicitly taken care of.
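The two-step reduction can be summarized in a short sketch: under the split x = x+ − x− and the slacks ξ, the standard-form constraint matrix is just [A | −A | −I]. The code below is illustrative only; the example A and b are made up for demonstration:

```python
import numpy as np

def to_standard_form(A, b):
    """Given A x >= b with x free, return (A', b') with A' x' = b', x' >= 0,
    where x' stacks (x+, x-, xi) and A(x+ - x-) - xi = b."""
    m, _ = A.shape
    A_std = np.hstack([A, -A, -np.eye(m)])   # [A | -A | -I]
    return A_std, b.copy()

A = np.array([[1.0, 1.0],
              [2.0, -1.0]])
b = np.array([1.0, 0.0])
A_std, b_std = to_standard_form(A, b)
print(A_std)   # 2 x 6: columns for x1+, x2+, x1-, x2-, xi1, xi2
```

Any feasible x of the general form maps to a feasible x′ of the standard form with the same objective value, and vice versa, as stated above.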
We proceed to get some intuition for what F looks like, meanwhile ignoring the
objective for now. First, recall we need to satisfy Ai x ≥ bi , ∀i . To break this down
we can think about A1 · x ≥ b1 where A1 is a vector of the first row of A and b1 is just
a real number. For 2 dimensional x the solutions lie on one side of a line which A1 is
perpendicular to, for 3 dimensions the solutions are confined to one side of a plane,
and in general the solutions will be a half-space on one side of a hyperplane with
A1 normal to the plane. To see why A1 is normal, consider the simplified A1 x ≥ 0.
We’ve defined a hyperplane through the origin and if x sits on the hyperplane, we
should get 0, thus A1 would need to be perpendicular to x and the hyperplane.
Because we must satisfy not just A1 x ≥ b1 but all Ai x ≥ bi , F will have to sit
inside the intersection of all of these half-spaces, i.e., a polytope.
[Plot: the feasible region F in the (x1 , x2 )-plane defined by x1 ≤ 2, x1 + x2 ≥ 1,
x1 − x2 ≥ −1, and x2 ≥ 0.]
5.19.2 *Algorithm
This is really more of a mathematical procedure than an algorithm. We can take
a v ∈ R which is a guess that hopefully doesn't overestimate the optimal value v ∗ .
Now we have cT x = v, which by virtue of fixing v constrains x to lie somewhere in
a hyperplane we'll call Hv . Now, if v < v ∗ , Hv ∩ F ̸= ∅, otherwise v ∗ would not be the
optimum. So then we may progress by guessing v + ϵ, effectively moving Hv up a
little bit. After some iterations, we will see Hv ∩ F = ∅, which would put v within ϵ of
v ∗ , and we are done.
5.19.3 Remarks
To think a bit more about this geometrically, c is normal to Hv and governs its
orientation. In the two dimensional figure above Hv would be a line with slope
perpendicular to c; as the line moves up in the x1 x2 plane it will eventually intersect
the yellow region F . c drives x in some particular direction for which F only permits
us to go so far.
In general the intersection with F will occur at a vertex of the polytope, which
will become useful because this vertex is the solution to some system of equations.
This motivates our interest in solving systems of equations, which we now turn to.
5.20.1 Example
Here we could solve the first equation for x1 , and then plug into the second
equation to eliminate x1 from it. Then we could solve the second equation for x2
and so on3 until we get to the nth equation, which we could then solve for xn and
plug back in to get the rest.
5.20.2 Remarks
First we wish to understand the complexity of the solution. Suppose we even start
with all integers defining the system, how many bits do we need to represent x? In
particular, we’d like to prove that the number of bits is not exponential in n because
if it is any algorithm to solve the system would also be exponential in runtime
simply because of the description complexity of the output. We’d also like to better
understand the cases when Gaussian Elimination fails.
X = {x ∈ Rn : Ax ≤ b, x ≥ 0} (6.1)
be a polyhedral set over which we will maximize the objective function z(x1 , . . . , xn ) =
cT x, where c, x ∈ Rn . That is, we will focus on the linear programming problem:
P:  max cT x
    s.t. Ax ≤ b        (6.2)
         x ≥ 0
Theorem 6.1 If Problem P has an optimal solution, then Problem P has an optimal
extreme point solution.
Proof. Applying the Carathéodory Characterization theorem, we know that any point
x ∈ X can be written as:
x = λ1 x1 + · · · + λk xk + µ1 d1 + · · · + µl dl        (6.3)
where x1 , . . . xk are the extreme points of X and d1 , . . . , dl are the extreme directions
of X and we know that
λ1 + · · · + λk = 1,    λi , µi ≥ 0 ∀i        (6.4)
We can rewrite problem P using this characterization as:
max λ1 cT x1 + · · · + λk cT xk + µ1 cT d1 + · · · + µl cT dl
s.t. λ1 + · · · + λk = 1        (6.5)
     λi , µi ≥ 0 ∀i
If there is some i such that cT di > 0, then we can simply choose µi as large as we
like, making the objective as large as we like; hence the problem will have no finite
solution. Therefore, assume that cT di ≤ 0 for all i = 1, . . . , l (in which case, we may simply
choose µi = 0, for all i). Since the set of extreme points x1 , . . . xk is finite, we can
simply set λp = 1 if cT xp has the largest value among all possible values of cT xi ,
i = 1, . . . , k. This is clearly the solution to the linear programming problem. Since xp
is an extreme point, we have shown that if P has a solution, it must have an extreme
point solution. ■
Corollary 6.1 Problem P has a finite solution if and only if cT di ≤ 0 for all
i = 1, . . . , l when d1 , . . . , dl are the extreme directions of X.
Corollary 6.2 Problem P has alternative optimal solutions if there are at least
two extreme points xp and xq so that cT xp = cT xq and so that xp is the extreme
point solution to the linear programming problem.
Proof. Suppose that xp is the extreme point solution to P identified in the proof of
the theorem. Suppose xq is another extreme point solution with cT xp = cT xq . Then
every convex combination of xp and xq is contained in X (since X is convex). Thus
every x with form λxp + (1 − λ)xq and λ ∈ [0, 1] has objective function value:
λcT xp + (1 − λ)cT xq = λcT xp + (1 − λ)cT xp = cT xp
which is the optimal objective function value, by assumption. ■
min cT x
s.t. Ax ≤ b (6.6)
x≥0
has a finite optimal solution if (and only if) cT dj ≥ 0 for j = 1, . . . , l. [Hint: Modify
the proof above using the Carathéodory characterization theorem.] ■
X = {x ∈ Rn : Ax = b, x ≥ 0} (6.7)
Our work in the previous sections shows that this is possible. Recall we can separate
A into an m × m matrix B and an m × (n − m) matrix N and we have the result:
We know that B is invertible since we assumed that A had full row rank. If we
assume that xN = 0, then the solution
xB = B−1 b (6.9)
was called a basic solution (See Definition 5.21.) Clearly any basic solution satisfies
the constraints Ax = b but it may not satisfy the constraints x ≥ 0.
Let J be the set of indices of non-basic variables. Then we can write Equation
6.11 as:
z(x1 , . . . , xn ) = cTB B−1 b + Σj∈J (cj − cTB B−1 A·j ) xj        (6.12)
Consider now the fact xj = 0 for all j ∈ J . Further, we can see that:
∂z/∂xj = cj − cTB B−1 A·j        (6.13)
This means that if cj − cTB B−1 A·j > 0 and we increase xj from zero to some new
value, then we will increase the value of the objective function. For historic reasons,
we actually consider the value cTB B−1 A·j − cj , called the reduced cost and denote it
as:
−∂z/∂xj = zj − cj = cTB B−1 A·j − cj        (6.14)
By Equation 6.8 we know that only the current basic variables will be affected by
increasing xj . Let us focus explicitly on Equation 6.8 where we include only variable
xj (since all other non-basic variables are kept zero). Then we have:
Let b̄ = B−1 b be an m × 1 column vector and let āj = B−1 A·j be another m × 1
column. Then we can write:
xB = b̄ − āj xj        (6.16)
Considering a single row of this expression:
xBi = b̄i − āji xj        (6.18)
0 = b̄i − āji xj =⇒ xj = b̄i / āji        (6.19)
Thus, the largest possible value we can assign xj and ensure that all variables remain
positive is:
min { b̄i / āji : i = 1, . . . , m and āji > 0 }        (6.20)
Expression 6.20 is called the minimum ratio test. We are interested in which index i
achieves the minimum ratio.
Suppose that in executing the minimum ratio test, we find that xj = b̄k /ājk .
The variable xj (which was non-basic) becomes basic and the variable xBk becomes
non-basic. All other basic variables remain basic (and positive). In executing this
procedure (of exchanging one basic variable and one non-basic variable) we have
moved from one extreme point of X to another.
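Expression 6.20 translates directly into code. The following sketch is ours, not the text's: it returns the largest step for the entering variable and the index of the leaving row, and reproduces the first pivot of the Toy Maker example that follows:

```python
import numpy as np

def min_ratio_test(bbar, abar_j):
    """bbar = B^{-1} b, abar_j = B^{-1} A_{.j} for the entering variable x_j."""
    ratios = [(bbar[i] / abar_j[i], i)
              for i in range(len(bbar)) if abar_j[i] > 0]
    if not ratios:
        return None, None          # no positive entry: x_j can increase forever
    t, k = min(ratios)
    return t, k                    # t = new value of x_j, row k leaves the basis

# Toy Maker, first pivot: bbar and abar_1 from the example below.
t, k = min_ratio_test(np.array([120.0, 160.0, 35.0]),
                      np.array([3.0, 1.0, 1.0]))
print(t, k)   # 35.0, row index 2: s3 leaves and x1 enters at value 35
```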
Theorem 6.3 If zj − cj ≥ 0 for all j ∈ J , then the current basic feasible solution is
optimal.
Proof. We have already shown in Theorem 6.1 that if a linear programming problem
has an optimal solution, then it occurs at an extreme point and we’ve shown in
Theorem 6.2 that there is a one-to-one correspondence between extreme points and
basic feasible solutions. If zj − cj ≥ 0 for all j ∈ J , then ∂z/∂xj ≤ 0 for all non-basic
variables xj . That is, we cannot increase the value of the objective function by
increasing the value of any non-basic variable. Thus, since moving to another basic
feasible solution (extreme point) will not improve the objective function, it follows
we must be at the optimal solution. ■
Proof. The fact that zj − cj < 0 implies that increasing xj will improve the value of
the objective function. Since āji < 0 for all i = 1, . . . , m, we can increase xj indefinitely
without violating feasibility (no basic variable will ever go to zero). Thus the objective
function can be made as large as we like. ■
R We should note that in executing the exchange of one basic variable and
one non-basic variable, we must be very careful to ensure that the resulting
basis consist of m linearly independent columns of the original matrix A. The
conditions for this are provided in Lemma 5.1. Specifically, we must be able
to write the column corresponding to xj , the entering variable, as a linear
combination of the columns of B so that:
α1 b1 + · · · + αm bm = A·j        (6.21)
This is equivalent to B āj = A·j ,
which shows how to write the column A·j as a linear combination of the
columns of B.
Exercise 6.2 Consider the linear programming problem given in Exercise 6.1.
Under what conditions should a non-basic variable enter the basis? State and
prove an analogous theorem to Theorem 6.3 using your observation. [Hint: Use
the definition of reduced cost. Remember that it is −∂z/∂xj .] ■
■ Example 6.1 Consider the Toy Maker Problem (from Example 2.30). We can
convert this problem to standard form by introducing the slack variables s1 , s2
and s3 :
max z(x1 , x2 ) = 7x1 + 6x2
s.t. 3x1 + x2 + s1 = 120
x1 + 2x2 + s2 = 160
x1 + s3 = 35
x1 , x 2 , s 1 , s 2 , s 3 ≥ 0
and
B−1 b = [ 120 ]        B−1 N = [ 3 1 ]
        [ 160 ]                [ 1 2 ]
        [  35 ]                [ 1 0 ]
Therefore:
cTB B−1 b = 0,    cTB B−1 N = [0 0],    cTB B−1 N − cTN = [−7 −6]
and therefore:
∂z/∂x1 = 7  and  ∂z/∂x2 = 6
Based on this information, we could choose either x1 or x2 to enter the basis and
the value of the objective function would increase. If we chose x1 to enter the basis,
then we must determine which variable will leave the basis. To do this, we must
investigate the elements of B−1 A·1 and the current basic feasible solution B−1 b.
Since each element of B−1 A·1 is positive, we must perform the minimum ratio
test on each element of B−1 A·1 . We know that B−1 A·1 is just the first column of
B−1 N which is:
B−1 A·1 = [ 3 ]
          [ 1 ]
          [ 1 ]
The minimum ratio test gives min{120/3, 160/1, 35/1} = 35, so s3 leaves the basis
and x1 enters at value 35.
Note we have simply swapped the column corresponding to x1 with the column
corresponding to s3 in the basis matrix B and the non-basic matrix N. We will do
this repeatedly in the example and we recommend the reader keep track of which
variables are being exchanged and why certain columns in B are being swapped
with those in N.
Using the new B and N matrices, the derived matrices are then:
B−1 b = [  15 ]        B−1 N = [ −3 1 ]
        [ 125 ]                [ −1 2 ]
        [  35 ]                [  1 0 ]
Based on this information, we can only choose x2 to enter the basis to ensure
that the value of the objective function increases. We can perform the minimum
ratio test to figure out which basic variable will leave the basis. We know that
B−1 A·2 is just the second column of B−1 N, which is:
B−1 A·2 = [1, 2, 0]T

The minimum ratio test gives min{15/1, 125/2} = 15, so s1 leaves the basis and x2
enters with value 15. One further exchange (s3 enters and s2 leaves) drives all the
reduced costs non-negative.
Since the reduced costs are now positive, we can conclude that we've obtained an
optimal solution because no improvement is possible. The final solution then is:

x∗B = [x2 , s3 , x1 ]T = B−1 b = [72, 19, 16]T
Figure 6.1: The Simplex Algorithm: The path around the feasible region is shown
in the figure. Each exchange of a basic and non-basic variable moves us along an
edge of the polygon in a direction that increases the value of the objective function.
Exercise 6.3 Assume that a leather company manufactures two types of belts:
regular and deluxe. Each belt requires 1 square yard of leather. A regular belt
requires 1 hour of skilled labor to produce, while a deluxe belt requires 2 hours of
labor. The leather company receives 40 square yards of leather each week and a
total of 60 hours of skilled labor is available. Each regular belt nets $3 in profit,
while each deluxe belt nets $5 in profit. The company wishes to maximize profit.
1. Ignoring the divisibility issues, construct a linear programming problem
whose solution will determine the number of each type of belt the company
should produce.
2. Use the simplex algorithm to solve the problem you stated above, remembering
to convert the problem to standard form before you begin.
3. Draw the feasible region and the level curves of the objective function. Verify
that the optimal solution you obtained through the simplex method is the
point at which the level curves no longer intersect the feasible region in the
direction following the gradient of the objective function.
Consider the linear programming problem in standard form:

P: max cT x
   s.t. Ax = b
       x ≥ 0

This is equivalent to the problem:

max z
s.t. z − cTB xB − cTN xN = 0
     BxB + NxN = b    (6.22)
     xB , xN ≥ 0
Here 0 is the vector of zeros of appropriate size. This equation can be written as:

z + 0T xB + (cTB B−1 N − cTN ) xN = cTB B−1 b    (6.26)
We can now represent this set of equations as a large matrix (or tableau):
      z    xB    xN                  RHS
z     1    0     cTB B−1 N − cTN    cTB B−1 b    Row 0
xB    0    Im    B−1 N              B−1 b        Rows 1 through m

Row 0 consists of the first row of the (m + 1) × (m + 1) identity matrix, the reduced
costs of the non-basic variables and the current objective function value. The
remaining rows consist of the rest of the (m + 1) × (m + 1) identity matrix, the
matrix B−1 N and B−1 b, the current non-zero part of the basic feasible solution.
This matrix representation (or tableau representation) contains all of the infor-
mation we need to execute the simplex algorithm. An entering variable is chosen
from among the columns containing the reduced costs and matrix B−1 N. Naturally,
a column with a negative reduced cost is chosen. We then choose a leaving variable
by performing the minimum ratio test on the chosen column and the right-hand-side
(RHS) column. We pivot on the element at the entering column and leaving row and
this transforms the tableau into a new tableau that represents the new basic feasible
solution.
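As a concrete sketch of this procedure (my own addition, assuming Python with numpy; not taken from the text), the minimum ratio test and the pivot can be written as follows:

import numpy as np

def min_ratio_row(T, col):
    """Minimum ratio test: compare the RHS (last column) against the
    positive entries of the chosen column over the constraint rows 1..m."""
    ratios = {r: T[r, -1] / T[r, col]
              for r in range(1, T.shape[0]) if T[r, col] > 0}
    return min(ratios, key=ratios.get) if ratios else None  # None: unbounded

def pivot(T, row, col):
    """Row-reduce so column col becomes an identity column with 1 in row."""
    T = T.astype(float).copy()
    T[row] /= T[row, col]
    for r in range(T.shape[0]):
        if r != row:
            T[r] -= T[r, col] * T[row]
    return T

# Initial tableau (6.28); columns [z, x1, x2, s1, s2, s3, RHS].
T = np.array([[1., -7., -6., 0., 0., 0., 0.],
              [0., 3., 1., 1., 0., 0., 120.],
              [0., 1., 2., 0., 1., 0., 160.],
              [0., 1., 0., 0., 0., 1., 35.]])
r = min_ratio_row(T, 2)   # enter x2: the s2 row (row 2) leaves
print(pivot(T, r, 2))     # reproduces tableau (6.31)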
■ Example 6.2 Again, consider the toy maker problem. We will execute the simplex
algorithm using the tableau method. Our problem in standard form is given as:
max z(x1 , x2 ) = 7x1 + 6x2
s.t. 3x1 + x2 + s1 = 120
     x1 + 2x2 + s2 = 160
     x1 + s3 = 35
     x1 , x2 , s1 , s2 , s3 ≥ 0
We can assume our initial basic feasible solution has s1 , s2 and s3 as basic variables
and x1 and x2 as non-basic variables. Thus our initial tableau is simply:
      z   x1   x2   s1   s2   s3   RHS
z     1   −7   −6    0    0    0     0
s1    0    3    1    1    0    0   120      (6.28)
s2    0    1    2    0    1    0   160
s3    0    1    0    0    0    1    35
Note that the columns of the identity matrix are divided among the basic variables
s1 , s2 and s3 , while B−1 N is located in columns 2 and 3 (the x1 and x2 columns).
This is because of our choice of basic variables. The reduced cost vector is in Row 0.
Using this information, we can see that either x1 or x2 can enter. We can
compute the minimum ratio test (MRT) next to the RHS column. If we choose x2
as the entering variable, then the MRT tells us s2 will leave. We put a box around
the element on which we will pivot:
      z   x1   x2   s1   s2   s3   RHS   MRT (x2)
z     1   −7   −6    0    0    0     0
s1    0    3    1    1    0    0   120    120      (6.29)
s2    0    1    2    0    1    0   160     80
s3    0    1    0    0    0    1    35      −
This process will correctly compute the new reduced costs and B−1 matrix as well
as the new cost information. The new tableau becomes:
      z   x1    x2   s1    s2   s3   RHS
z     1   −4     0    0     3    0   480
s1    0   2.5    0    1   −0.5   0    40      (6.31)
x2    0   0.5    1    0    0.5   0    80
s3    0    1     0    0     0    1    35
We can see that x1 is a valid entering variable, as it has a negative reduced cost
(−4). We can again place the minimum ratio test values on the right-hand-side of
the matrix to obtain:
      z   x1    x2   s1    s2   s3   RHS   MRT (x1)
z     1   −4     0    0     3    0   480
s1    0   2.5    0    1   −0.5   0    40     16      (6.32)
x2    0   0.5    1    0    0.5   0    80    160
s3    0    1     0    0     0    1    35     35
We now pivot on the element we have boxed to obtain the new tableau:
      z   x1   x2   s1    s2   s3   RHS
z     1    0    0   1.6   2.2   0   544
x1    0    1    0   0.4  −0.2   0    16      (6.33)
x2    0    0    1  −0.2   0.6   0    72
s3    0    0    0  −0.4   0.2   1    19
All the reduced costs of the non-basic variables (s1 and s2 ) are positive and so this
is the optimal solution to the linear programming problem. We can also see that
this solution agrees with our previous computations on the Toy Maker Problem. ■
Thanks to Ethan Wright for catching a typo here.
This condition occurs when a variable xj should enter the basis because ∂z/∂xj > 0
and there is no blocking basic variable. That is, we can arbitrarily increase the
value of xj without causing any basic variable to become negative. We give an
example:

■ Example 6.3 Consider the linear programming problem:

max z(x1 , x2 ) = 2x1 − x2
s.t. x1 − x2 ≤ 1
     2x1 + x2 ≥ 6
     x1 , x2 ≥ 0
We can convert this problem into standard form by adding a slack variable s1 and
a surplus variable s2 :
max z(x1 , x2 ) = 2x1 − x2
s.t. x1 − x2 + s1 = 1
     2x1 + x2 − s2 = 6
     x1 , x2 , s1 , s2 ≥ 0
We have both slack and surplus variables, so the case when x1 = x2 = 0 is not a
valid initial solution. We can choose a valid solution based on our knowledge of the
problem. Assume that s1 = s2 = 0 and so we have:

B = [1 −1; 2 1]    N = [1 0; 0 −1]
This yields:

B−1 b = [7/3, 4/3]T    B−1 N = [1/3 −1/3; −2/3 −1/3]
We see that s2 should enter the basis because cTB B−1 A·4 − c4 < 0. But the
column corresponding to s2 in the tableau is all negative. Therefore there is no
minimum ratio test. We can let s2 become as large as we like and we will keep
increasing the objective function without violating feasibility.
What we have shown is that the ray with vertex

x0 = [7/3, 4/3, 0, 0]T

and direction:

d = [1/3, 1/3, 0, 1]T
is entirely contained inside the polyhedral set defined by Ax = b. This can be seen
from the fact that:

−B−1 A·4 = [1/3, 1/3]T
We will be increasing s2 (which acts like λ in the definition of ray) and leaving s1
equal to 0. It’s now easy to see that the ray we described is contained entirely in
the feasible region. This is illustrated in the original constraints in Figure 6.2.
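A quick numerical check of this claim (my own addition, assuming Python with numpy): the direction d satisfies Ad = 0 and d ≥ 0, so every point x0 + λd with λ ≥ 0 remains feasible:

import numpy as np

# Columns ordered [x1, x2, s1, s2].
A = np.array([[1., -1., 1., 0.],
              [2., 1., 0., -1.]])
b = np.array([1., 6.])
x0 = np.array([7/3, 4/3, 0., 0.])
d = np.array([1/3, 1/3, 0., 1.])

print(np.allclose(A @ x0, b))    # True: the vertex is feasible
print(np.allclose(A @ d, 0.))    # True: d is a direction of the region
for lam in (0., 1., 10., 100.):  # the ray stays feasible as s2 grows
    x = x0 + lam * d
    print(np.allclose(A @ x, b) and np.all(x >= 0))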
Figure 6.2: The constraints x1 − x2 = 1 and 2x1 + x2 = 6 bound the unbounded
feasible region; the extreme direction of unboundedness is indicated.
Based on our previous example, we have the following theorem that extends Theorem
6.4:
Theorem 6.5 In a maximization problem, if aji ≤ 0 for all i = 1, . . . , m, and
zj − cj < 0, then the linear programming problem is unbounded. Furthermore, let
aj = B−1 A·j and let ek be a standard basis column vector in Rn−m where k
corresponds to the position of j in the matrix N. Then the direction:

d = [−aj ; ek ]    (6.35)

is an extreme direction of the feasible region X.
Proof. The fact that d is a direction is easily verified by the fact that there is an
extreme point x = [xB xN ]T and for all λ ≥ 0 we have:
x + λd ∈ X (6.36)
Thus it follows from the proof of Theorem 1.9 that Ad ≤ 0. The fact that d ≥ 0 and
d ̸= 0 follows from our assumptions. Now, we know that we can write A = [B|N].
Further, we know that aj = B−1 A·j . Let us consider Ad:
Ad = [B|N] [−aj ; ek ] = −BB−1 A·j + Nek    (6.37)

Remember, ek is the standard basis vector that has a 1 precisely in the position
corresponding to column A·j in matrix N, so A·j = Nek . Thus we have:
−BB−1 A·j + Nek = −A·j + A·j = 0 (6.38)
Exercise 6.4 Consider the constraints of Example 6.3 in standard form:

x1 − x2 + s1 = 1
2x1 + x2 − s2 = 6
x1 , x2 , s1 , s2 ≥ 0

with the objective changed to a minimization: min z(x1 , x2 ) = 2x1 − x2 .
Using the rule you developed in Exercise 6.2, show that this minimization problem
is unbounded. Find an extreme direction for this set. [Hint:
The minimum ratio test is the same for a minimization problem. Execute the
simplex algorithm as we did in Example 6.3 and use Theorem 6.5 to find the
extreme direction of the feasible region.] ■
xB = B−1 b − aj xj
xj ∈ [0, min{ bi / aji : i = 1, . . . , m, aji > 0 }]    (6.39)
xr = 0 for all r ∈ J , r ≠ j
Proof. It follows from the proof of Theorem 6.3 that the solution must be optimal,
as ∂z/∂xj ≤ 0 for all j ∈ J , and therefore increasing any xj will not improve the value
of the objective function. If there is some j ∈ J so that zj − cj = 0, then ∂z/∂xj = 0
and we may increase the value of xj up to some point specified by the minimum ratio
test, while keeping other non-basic variables at zero. In this case, we will neither
increase nor decrease the objective function value. Since that objective function
value is optimal, it follows that the set of all such solutions (described in Equation
6.39) is a set of alternative optimal solutions. ■
■ Example 6.4 Let us consider the toy maker problem again from Example 2.30,
with the objective function changed to z(x1 , x2 ) = 18x1 + 6x2 . Now consider the
penultimate basis from Example 6.1 in which we had as basic variables x1 , s2 and x2 :

xB = [x1 , x2 , s2 ]T    cB = [18, 6, 0]T    xN = [s1 , s3 ]T    cN = [0, 0]T
Unlike Example 6.1, the reduced cost for s3 is 0. This means that if we allow
s3 to enter the basis, the objective function value will not change. Performing the
minimum ratio test, however, we see that s2 will still leave the basis:
      z   x1   x2   s1   s2   s3   RHS   MRT (s3)
z     1    0    0    6    0    0   720
x1    0    1    0    0    0    1    35     35      (6.42)
x2    0    0    1    1    0   −3    15      −
s2    0    0    0   −2    1    5    95     19
Thus s2 leaves the basis, and for s3 ∈ [0, 19] the basic variables are given by:

[x1 , x2 , s2 ]T = [35, 15, 95]T − [1, −3, 5]T s3    (6.43)
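As a check on Equation 6.43 (my own addition, assuming Python with numpy and the modified objective 18x1 + 6x2 used in this example), sweeping s3 over [0, 19] shows the objective value never moves from 720:

import numpy as np

base = np.array([35., 15., 95.])   # x1, x2, s2 at s3 = 0
step = np.array([1., -3., 5.])     # coefficients of s3 in Equation 6.43

for s3 in np.linspace(0., 19., 5):
    x1, x2, s2 = base - step * s3
    z = 18. * x1 + 6. * x2         # objective of Example 6.4
    print(f"s3 = {s3:5.2f}  x1 = {x1:5.2f}  x2 = {x2:6.2f}  z = {z:.1f}")
# Every line prints z = 720.0: all points on the segment are optimal.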
Figure 6.3: Infinite alternative optimal solutions: in the simplex algorithm, when
zj − cj ≥ 0 for all non-basic j in a maximization problem, and zj − cj = 0 for at
least one such j, there is an infinite set of alternative optimal solutions.
Exercise 6.5 Consider the diet problem we covered in Example 5.14. I wish
to design a diet consisting of Ramen noodles and ice cream. I'm interested in
spending as little money as possible, but I want to ensure that I eat at least 1200
calories per day and that I get at least 20 grams of protein per day. Assume that
each serving of Ramen costs $1 and contains 100 calories and 2 grams of protein.
Assume that each serving of ice cream costs $1.50 and contains 200 calories and 3
grams of protein.
1. Develop a linear programming problem that will help me minimize the cost
of my food intake.
2. Remembering to transform the linear programming problem you found above
into standard form, use the simplex algorithm to show that this problem
has an infinite set of alternative optimal solutions.
3. At an optimal extreme point, find an expression for the set of infinite
alternative optimal extreme points like the one shown in Equation 6.43.
4. Plot the feasible region and the level curves of the objective function. High-
light the face of the polyhedral set on which the alternative optimal solutions
can be found.
■
■ Example 6.5 Consider the modified form of the toy maker problem originally
stated in Example 1.38:

max 7x1 + 6x2
s.t. 3x1 + x2 ≤ 120
     x1 + 2x2 ≤ 160
     x1 ≤ 35    (6.44)
     (7/4)x1 + x2 ≤ 100
     x1 , x2 ≥ 0

The polyhedral set and level curves of the objective function are shown in Figure 6.4.
Figure 6.4: An optimization problem with a degenerate extreme point: The optimal
solution to this problem is still (16, 72), but this extreme point is degenerate, which
will impact the behavior of the simplex algorithm.
From this, we see that the variable s3 should enter (because its reduced cost
is negative). In this case, there is a tie for the leaving variable: we see that
95/5 = 19 = (95/4)/(5/4); therefore, either s2 or s4 could be chosen as the leaving
variable. This is because we will move to a degenerate extreme point when s3
enters the basis.
Suppose we choose s4 as the leaving variable. Then our tableau will become:
      z   x1   x2    s1    s2   s3    s4   RHS   MRT (s1)
z     1    0    0  −14/5    0    0   44/5  544
x1    0    1    0    4/5    0    0  −4/5    16     20
x2    0    0    1   −7/5    0    0   12/5   72      −      (6.47)
s2    0    0    0     2     1    0    −4     0      0
s3    0    0    0   −4/5    0    1    4/5   19      −
Notice the objective function value cTB B−1 b has not changed, because we have
not really moved to a new extreme point. We have simply changed from one
representation of the degenerate extreme point to another. This was to be expected:
the fact that the minimum ratio was zero showed that we could not increase s1
and maintain feasibility. As such, s1 = 0 in the new basic feasible solution. The
reduced cost vector cTB B−1 N − cN has changed, and we could now terminate the
simplex method. ■
Note: the minimum ratio test still applies when bj = 0. In this case, we will remain
at the same extreme point.
Proof. At any basic feasible solution we have chosen m variables as basic. This
basic feasible solution satisfies BxB = b and thus provides m binding constraints.
The remaining variables are chosen as non-basic and set to zero, thus xN = 0, which
provides n − m binding constraints on the non-negativity constraints (i.e., x ≥ 0).
If there is a basic variable that is zero, then an extra non-negativity constraint is
binding at that extreme point. Thus n + 1 constraints are binding and, by definition,
the extreme point must be degenerate. ■
Exercise 6.6 State the simplex algorithm in Tableau Form. [Hint: Most of the
simplex algorithm is the same, simply add in the row-operations executed to
compute the new reduced costs and B−1 N.] ■
Theorem 6.8 If the feasible region of Problem P has no degenerate extreme points,
then the simplex algorithm will terminate in a finite number of steps with an
optimal solution to the linear programming problem.
Sketch of Proof. In the absence of degeneracy, the value of the objective function
improves (increases in the case of a maximization problem) each time we exchange a
basic variable and non-basic variable. This is ensured by the fact that the entering
variable always has a negative reduced cost. There are a finite number of extreme
points for each polyhedral set, as shown in Lemma 1.6. Thus, the process of moving
from extreme point to extreme point of X, the polyhedral set in Problem P, must
terminate with the largest possible objective function value. ■
We now discuss the issue of finding an initial basic feasible solution to start
execution of the simplex algorithm.
Recall that a linear programming problem of the form:

max cT x
s.t. Ax ≤ b
     x ≥ 0

can be converted to standard form as:

max cT x
s.t. Ax + Im xs = b
     x, xs ≥ 0
where xs are slack variables, one for each constraint. If b ≥ 0, then our initial basic
feasible solution can be x = 0 and xs = b (that is, our initial basis matrix is B = Im ).
We have also explored small problems where a graphical technique could be used to
identify an initial extreme point of a polyhedral set and thus an initial basic feasible
solution for the problem.
Suppose now we wish to investigate problems in which we do not have a problem
structure that lends itself to easily identifying an initial basic feasible solution. The
simplex algorithm requires an initial BFS to begin execution and so we must develop
a method for finding such a BFS.
For the remainder of this chapter we will assume, unless told otherwise, that we
are interested in solving a linear programming problem provided in Standard Form.
That is:
P: max cT x
   s.t. Ax = b    (6.49)
       x ≥ 0
and that b ≥ 0. Clearly our work in Chapter 3 shows that any linear programming
problem can be put in this form.
Suppose to each constraint Ai· x = bi we associate an artificial variable xai . We
can replace constraint i with:

Ai· x + xai = bi    (6.50)

Since bi ≥ 0, we will require xai ≥ 0. If xai = 0, then this is simply the original
constraint. Thus if we can find values for the ordinary decision variables x so that
xai = 0, then constraint i is satisfied. If we can identify values for x so that all the
artificial variables are zero and m variables of x are non-zero, then the modified
constraints described by Equation 6.50 are satisfied and we have identified an initial
basic feasible solution.
This leads to the Phase I problem:

P1: min eT xa
    s.t. Ax + Im xa = b    (6.51)
        x, xa ≥ 0

where e is a vector of ones of appropriate dimension.
R We can see that the artificial variables are similar to slack variables, but they
should have zero value because they have no true meaning in the original
problem P . They are introduced artificially to help identify an initial basic
feasible solution to Problem P .
Lemma 6.1. The optimal objective function value in Problem P1 is bounded below
by 0. Furthermore, if the optimal solution to problem P1 has xa = 0, then the values
of x form a feasible solution to Problem P .
Proof. Clearly, setting xa = 0 will produce an objective function value of zero. Since
e > 0, we cannot obtain a smaller objective function value. If at optimality we have
xa = 0, then we know that m of the variables in x are in the basis and the remaining
variables (in both x and xa ) are not in the basis and hence at zero. Thus we have
found a basic feasible solution to Problem P . ■
■ Example 6.6 Consider the following linear programming problem:

min x1 + 2x2
s.t. x1 + 2x2 ≥ 12    (6.52)
     2x1 + 3x2 ≥ 20
     x1 , x2 ≥ 0

We can convert the problem to standard form by adding two surplus variables:

min x1 + 2x2
s.t. x1 + 2x2 − s1 = 12    (6.53)
     2x1 + 3x2 − s2 = 20
     x1 , x2 , s1 , s2 ≥ 0
It’s not clear what a good basic feasible solution would be for this. Clearly, we
cannot set x1 = x2 = 0 because we would have s1 = −12 and s2 = −20, which is
not feasible. We can introduce two artificial variables (xa1 and xa2 ) and create a
new problem P1 .
A basic feasible solution for our artificial problem would let xa1 = 12 and xa2 = 20.
Carrying out the first iteration of the simplex algorithm on Problem P1 , xa2 leaves
the basis and x1 enters. The new tableau becomes:

      z   x1   x2    s1   s2   xa1   xa2   RHS
z     1    0   1/2   −1   1/2    0   −3/2    2
xa1   0    0   1/2   −1   1/2    1   −1/2    2      (6.57)
x1    0    1   3/2    0  −1/2    0    1/2   10
In this case, we see that x2 should enter the basis. Performing the minimum ratio
test, we obtain:
      z   x1   x2    s1   s2   xa1   xa2   RHS   MRT (x2)
z     1    0   1/2   −1   1/2    0   −3/2    2
xa1   0    0   1/2   −1   1/2    1   −1/2    2       4      (6.58)
x1    0    1   3/2    0  −1/2    0    1/2   10      20/3
Pivoting, xa1 leaves the basis and x2 enters. At this point, we have eliminated
both artificial variables from the basis and we have identified an initial basic
feasible solution to the original problem: x1 = 4, x2 = 4, s1 = 0 and s2 = 0.
The process of moving to a feasible solution in the
original problem is shown in Figure 6.5.
Figure 6.5: Finding an initial feasible point: Artificial variables are introduced into
the problem. These variables allow us to move through non-feasible space. Once
we reach a feasible extreme point, the process of optimizing Problem P1 stops.
We could now continue on to solve the initial problem we were given. At this
point, our basic feasible solution makes x2 and x1 basic variables and s1 and s2
non-basic variables. Our problem data are:
xB = [x2 , x1 ]T    xN = [s1 , s2 ]T
Note that we keep the basic variables in the order in which we find them at the
end of the solution to our first problem.
A = [1 2 −1 0; 2 3 0 −1]    b = [12, 20]T

B = [2 1; 3 2]    N = [−1 0; 0 −1]    B−1 b = [4, 4]T

cB = [2, 1]T    cN = [0, 0]T

cTB B−1 b = 12    cTB B−1 N − cTN = [−1 0]
Notice that we don’t have to do a lot of work to get this information out of
the last tableau in Expression 6.59. The matrix B−1 is actually positioned in the
columns below the artificial variables. This is because we started with an identity
matrix in this position. As always, the remainder of the matrix holds B−1 N. Thus,
we can read this final tableau as:
         z   xB    s        xa     RHS
z        1    0    0        −eT     0       (6.60)
x2, x1   0   I2    B−1 N    B−1    B−1 b
We can use this information (and the reduced costs and objective function we
computed) to start our tableau to solve the problem with which we began. Our
next initial tableau will be:
      z   x1   x2   s1   s2   RHS
z     1    0    0   −1    0    12
x2    0    0    1   −2    1     4      (6.62)
x1    0    1    0    3   −2     4
Notice all we’ve done is removed the artificial variables from the problem and
substituted the newly computed reduced costs for s1 and s2 (−1 and 0) into Row
0 of the tableau. We’ve also put the correct objective function value (12) into
Row 0 of the right hand side. We’re now ready to solve the original problem.
However, since this is a minimization problem we can see we’re already at a point
of optimality. Notice that all reduced costs are either negative or zero, suggesting
that entering any non-basic variable will at best keep the objective function value
the same and at worst make the objective function worse. Thus we conclude that
an optimal solution for our original problem is x∗1 = x∗2 = 4 and s∗1 = s∗2 = 0. ■
1. Begin with a problem in standard form:

P: max cT x
   s.t. Ax = b
       x ≥ 0

with b ≥ 0.
2. Introduce auxiliary variables xa and solve the Phase I problem:

P1: min eT xa
    s.t. Ax + Im xa = b
        x, xa ≥ 0

3. If xa∗ = 0, then an initial feasible solution has been identified. This solution
can be converted into a basic feasible solution as we discuss below. Otherwise,
there is no solution to Problem P .
4. Use the basic feasible solution identified in Step 3 to start the Simplex
Algorithm (compute the reduced costs given the c vector).
5. Solve the Phase II problem:

P: max cT x
   s.t. Ax = b
       x ≥ 0

A sketch of this two-phase procedure appears below.
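The following sketch is my own illustration, not part of the text; it assumes scipy is installed and carries out both phases for the problem of Example 6.6 using scipy.optimize.linprog rather than a hand-rolled simplex:

import numpy as np
from scipy.optimize import linprog

# Example 6.6 in standard form; variables [x1, x2, s1, s2, xa1, xa2].
A_eq = np.array([[1., 2., -1., 0., 1., 0.],
                 [2., 3., 0., -1., 0., 1.]])
b_eq = np.array([12., 20.])

# Phase I: minimize e^T xa; an optimum of 0 proves P is feasible.
c1 = np.array([0., 0., 0., 0., 1., 1.])
p1 = linprog(c1, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 6)
assert p1.status == 0 and abs(p1.fun) < 1e-9
print("feasible point of P:", p1.x[:4])

# Phase II: minimize the true objective over the original variables.
c2 = np.array([1., 2., 0., 0.])
p2 = linprog(c2, A_eq=A_eq[:, :4], b_eq=b_eq, bounds=[(0, None)] * 4)
print(p2.x, p2.fun)   # optimal value 12; e.g. x1 = 4, x2 = 4 is optimal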
This shows that the last m − k rows of A are not linearly independent of the first k
rows, and thus the matrix A did not have full row rank. When this occurs, we can
discard the last m − k rows of A and simply proceed with the solution given in
xB = b, xN = 0. This is a basic feasible solution for the new matrix A in which we
have removed the redundant rows.
■ Example 6.7 Once execution of the Phase I simplex algorithm is complete, the
reduced costs of the current basic feasible solution must be computed. These can
be computed during Phase I by adding an additional “z” row to the tableau. In
this case, the initial tableau has the form:
      z   x1   x2   s1   s2   xa1   xa2   RHS
zII   1   −1   −2    0    0     0     0     0
z     1    3    5   −1   −1     0     0    32      (6.65)
xa1   0    1    2   −1    0     1     0    12
xa2   0    2    3    0   −1     0     1    20
The zII row is precisely Row 0 of the Phase II problem, except that we never allow
the artificial variables xa1 and xa2 to carry into the Phase II problem. If we carry
out the same steps we did in Example 6.6, then we obtain the following sequence
of tableaux:
TABLEAU I

      z   x1   x2   s1   s2   xa1   xa2   RHS   MRT (x1)
zII   1   −1   −2    0    0     0     0     0
z     1    3    5   −1   −1     0     0    32
xa1   0    1    2   −1    0     1     0    12      12
xa2   0    2    3    0   −1     0     1    20      10
TABLEAU II

      z   x1   x2    s1   s2   xa1   xa2   RHS   MRT (x2)
zII   1    0  −1/2    0  −1/2    0    1/2   10
z     1    0   1/2   −1   1/2    0   −3/2    2
xa1   0    0   1/2   −1   1/2    1   −1/2    2       4
x1    0    1   3/2    0  −1/2    0    1/2   10      20/3
TABLEAU III

      z   x1   x2   s1   s2   xa1   xa2   RHS
zII   1    0    0   −1    0     1     0    12
z     1    0    0    0    0    −1    −1     0
x2    0    0    1   −2    1     2    −1     4
x1    0    1    0    3   −2    −3     2     4
We again arrive at the end of Phase I, but we are now prepared to immediately
execute Phase II with the tableau:
      z   x1   x2   s1   s2   RHS
zII   1    0    0   −1    0    12
x2    0    0    1   −2    1     4
x1    0    1    0    3   −2     4
In this case, we see that we are already at an optimal solution for a minimization
problem because the reduced costs are all less than or equal to zero. We also note
that since the reduced cost of the non-basic variable s2 is zero, there are alternative
optimal solutions. ■
In the Big-M method, we introduce artificial variables just as in the two-phase
simplex method, but we also modify the objective function:

PM: max cT x − M eT xa
    s.t. Ax + Im xa = b    (6.67)
        x, xa ≥ 0
Here, M is a large positive constant, much larger than the largest coefficient in the
vector c. The value of M is usually chosen to be at least 100 times larger than the
largest coefficient in the original objective function.
R In the case of a minimization problem, the objective function in the Big-M
method becomes:
min cT x + M eT xa (6.68)
Exercise 6.7 In Exercise 2.4 we showed that every maximization problem can be
written as a minimization problem (and vice-versa). Show that Equation 6.68
follows by changing Problem PM into a minimization problem. ■
cT d − M eT da > 0    (6.69)

by Corollary 6.1. Since we are free to choose M as large as we like, for a sufficiently
large value of M the left-hand-side of Inequality 6.69 would be negative unless
da = 0. Thus da = 0.
The fact that dM is a direction implies that Ad + Im da = 0 and therefore Ad = 0.
We know further that d ≥ 0 and d ̸= 0. Thus it follows that we have identified
a direction d of the feasible region for Problem P . Furthermore, we know that
following this direction must result in an unbounded objective function for P since
the coefficients of the artificial variables are all negative. ■
R Lemma 6.2 tells us that if Problem PM is unbounded, then there is no useful
solution to Problem P . If Problem P has a non-empty feasible region, then
Problem P is itself unbounded and thus there is no useful solution. On the
other hand, if Problem P has no feasible solution, there is again no useful
solution to Problem P . In either case, we may need to re-model the problem
to obtain a useful result.
Theorem 6.10 If Problem P is feasible and has a finite optimal solution, then there
is an M > 0 so that the optimal solution to PM has all artificial variables non-basic,
and thus the solution to Problem P can be extracted from the solution to Problem
PM .
Proof. The feasible region of Problem PM has a finite number of extreme points
y1 , . . . , yk . Let zM1 , . . . , zMk be the objective function value of Problem PM at each
of these extreme points. Since P is feasible, at least one of these extreme points
has xa = 0. Let us sub-divide the extreme points into Ya = {y1 , . . . , yl } and
Y = {yl+1 , . . . , yk }
where the points in Ya are the extreme points such that there is at least one non-zero
artificial variable and the points in Y are the extreme points where all artificial
variables are zero. At any extreme point in Y we know that at most m elements of
the vector x are non-zero and therefore, every extreme point in Y corresponds to
an extreme point of the original problem P . Since Problem P has a finite solution,
it follows that the optimal solution to problem P occurs at some point in Y, by
Theorem 6.1. Furthermore the value of the objective function for Problem P is
precisely the same as the value of the objective function of Problem PM for each
point in Y because xa = 0. Define M large enough that the objective function value
of PM at every extreme point of Ya is strictly smaller than its value at an optimal
extreme point of Y. Such a value exists for M since there are only a finite number
of extreme points in Y.
Y. Our choice of M ensures that the optimal solution to PM occurs at an extreme
point where xa = 0 and the x component of y is the solution to Problem P . ■
R The Big-M method is not particularly effective for solving real-world problems.
The introduction of a set of variables with large coefficients (M ) can lead
to round-off errors in the execution of the simplex algorithm. (Remember,
computers can only manipulate numbers in binary, which means that all
floating point numbers are restricted to a finite precision determined by the
underlying hardware. This is generally given in terms of the largest amount
of memory that can be addressed in bits, which has led, in recent times, to
operating system manufacturers selling their OS's as “32 bit” or “64 bit.”)
When solving real-world problems, these issues can become a real factor with
which to contend.
Another issue is that we have no way of telling how large M should be without
knowing that Problem P is feasible, which is precisely what we want the Big-M
method to tell us! The general rule of thumb provided earlier will suffice.
■ Example 6.8 Suppose we solve the problem from Example 6.6 using the Big-M
method. Our problem is:
min x1 + 2x2
s.t. x1 + 2x2 ≥ 12    (6.72)
     2x1 + 3x2 ≥ 20
     x1 , x2 ≥ 0

In standard form this is:

min x1 + 2x2
s.t. x1 + 2x2 − s1 = 12    (6.73)
     2x1 + 3x2 − s2 = 20
     x1 , x2 , s1 , s2 ≥ 0
To execute the Big-M method, we'll choose M = 300, which is larger than 100
times the largest coefficient of the objective function of the original problem. Our
new problem becomes:

min x1 + 2x2 + 300xa1 + 300xa2
s.t. x1 + 2x2 − s1 + xa1 = 12
     2x1 + 3x2 − s2 + xa2 = 20
     x1 , x2 , s1 , s2 , xa1 , xa2 ≥ 0

The first pivot parallels the one in Example 6.7; the resulting tableaux are:
TABLEAU II
      z   x1    x2     s1     s2    xa1    xa2    RHS   MRT (x2)
z     1    0   299/2  −300   299/2    0   −899/2   610
xa1   0    0    1/2    −1     1/2     1    −1/2      2      4
x1    0    1    3/2     0    −1/2     0     1/2     10     20/3
TABLEAU III
      z   x1   x2   s1   s2   xa1    xa2    RHS
z     1    0    0   −1    0   −299   −300    12
x2    0    0    1   −2    1     2     −1      4
x1    0    1    0    3   −2    −3      2      4
It is worth noting that this is essentially the same series of tableaux we had when
executing the two-phase method, but we have to deal with the large M coefficients
in our arithmetic. ■
Im xB + B−1 N xN + ya xa = B−1 b    (6.75)
Lemma 6.3. Suppose we enter xa into the basis by pivoting on the row of the
simplex tableau with most negative right hand side. That is, xa is exchanged with
variable xBj having most negative value. Then the resulting solution is a basic
feasible solution to the constraints:
Im xB + B−1 N xN + ya xa = B−1 b
x, xa ≥ 0    (6.77)
The resulting basic feasible solution can either be used as a starting solution for the
two-phase simplex algorithm with the single artificial variable or the Big-M method.
For the two-phase method, we would solve the Phase I problem:
min xa
s.t. Ax + B0 ya xa = b    (6.78)
     x, xa ≥ 0

where B0 is the initial crash basis we used to identify the coefficients of the single
artificial variable. Equation 6.78 is generated by multiplying both sides of the
constraints by B0 .
x1 + 2x2 − s1 = 12
2x1 + 3x2 − s2 = 20    (6.79)
x1 , x2 , s1 , s2 ≥ 0

Taking s1 and s2 as a crash basis and multiplying both constraints by −1 (so that
the basis columns form an identity matrix) we obtain:

−x1 − 2x2 + s1 = −12
−2x1 − 3x2 + s2 = −20    (6.81)
That is, s1 = −12 and s2 = −20 is our basic solution, which is not feasible. We
append the artificial variable with coefficient vector ya = [−1, −1]T (since both
elements of the right-hand-side are negative) to obtain:

−x1 − 2x2 + s1 − xa = −12
−2x1 − 3x2 + s2 − xa = −20    (6.82)
We enter the variable xa and pivot out the variable s2 , which has the most negative
right-hand-side value. We can now complete the Phase I process and execute the
simplex algorithm until we drive xa from the basis, reducing the Phase I objective
to 0. At this point we
will have identified an initial basic feasible solution to the initial problem and we
can execute Phase II. ■
Additionally, there are some constraints linking production, storage and demand.
These constraints are depicted graphically in Figure 6.6. Multiperiod inventory
models operate on a principle of conservation of flow. Manufactured goods and
previous period inventories flow into the box representing each period. Demand
and next period inventories flow out of the box representing each period. This
inflow and outflow must be equal to account for all shamrocks produced. This is
depicted below:
Figure 6.6: Conservation of flow: initial inventory y0 and production x1 , x2 , x3
enter the boxes for Day 1, Day 2 and Day 3; demands d1 , d2 , d3 cause shamrocks
to leave the boxes, and inventories y1 , y2 , y3 carry over, with y3 the remaining
final inventory.
yt−1 + xt = yt + dt ∀t (6.84)
This equation says that at period t the amount of inventory carried over from
period t − 1 plus the number of shamrocks produced in period t must be equal to
the total demand in period t plus any left-over shamrocks at the end of period
t. Clearly we also know that xt ≥ 0 for all t, since you cannot make a negative
number of shamrocks. However, by also requiring yt ≥ 0 for all t, then we assert
that our inventory can never be negative. A negative inventory is a backorder.
Thus by saying that yt ≥ 0 we are also satisfying the requirement that McLearey
satisfy his demand in each period. Note, when t = 1, then yt−1 = y0 , which is the
parameter we defined above.
The complete problem describing McLearey’s situation is:
min 200x1 + 200x2 + 200x3 + y1 + y2 + 5y3
s.t. yt−1 + xt = yt + dt    ∀t ∈ {1, 2, 3}
     xt ≤ 150    ∀t ∈ {1, 2, 3}    (6.85)
     xt , yt ≥ 0    ∀t ∈ {1, 2, 3}
Constraints of the form xt ≤ 150 for all t come from the fact that McLearey can
produce at most 150 shamrocks per day.
This simple problem now has 6 variables and 6 constraints plus 6 non-negativity
constraints and it is non-trivial to determine a good initial basic feasible solution,
especially since the problem contains both equality and inequality constraints.
A problem like this can be solved in Matlab (see Chapter ??.5.15), or with a
commercial or open-source solver like the GNU Linear Programming Kit (GLPK,
http://www.gnu.org/software/glpk/). In Figure 6.7 we show an example model
file that describes the problem. In Figure 6.8, we show the data section of the
GLPK model file describing McLearey's problem. Finally, Figure 6.9 shows a
portion of the output generated by the GLPK solver glpsol using this model.
Note that there is no inventory left after Day 3 (because holding it is too expensive),
even though it might be beneficial to McLearey to hold inventory for the next
planning horizon. This is because the problem has no information about any later
time periods and so, in a sense, the end of the world occurs immediately after
period 3. This type of end-of-the-world phenomenon is common in multi-period
problems.
#
# This finds the optimal solution for McLearey
#
/* sets */
set DAY;
set DAY2;
/* parameters */
param makeCost {t in DAY};
param holdCost {t in DAY};
param demand {t in DAY};
param start;
param S;
/* decision variables: */
var x {t in DAY} >= 0;
var y {t in DAY} >= 0;
/* objective function */
minimize z: sum{t in DAY} (makeCost[t]*x[t]+holdCost[t]*y[t]);
/* Flow Constraints */
s.t. FLOWA : x[1] - y[1] = demand[1] - start;
s.t. FLOWB{t in DAY2} : x[t] + y[t-1] - y[t] = demand[t];
/* Manufacturing constraints */
s.t. MAKE{t in DAY} : x[t] <= S;
end;
param makeCost:=
1 200
2 200
3 200;
param holdCost:=
1 1
2 1
3 5;
param demand:=
1 100
2 200
3 50;
param start:=10;
param S:=150;
end;
Problem: Shamrock
Rows: 7
Columns: 6
Non-zeros: 17
Status: OPTIMAL
Objective: z = 68050 (MINimum)
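As a cross-check (my own addition, not part of the original text; it assumes scipy), the same model can be solved with scipy.optimize.linprog, and it reproduces the optimal objective value 68050:

import numpy as np
from scipy.optimize import linprog

# Variables [x1, x2, x3, y1, y2, y3]; y0 = 10 is the starting inventory.
c = np.array([200., 200., 200., 1., 1., 5.])
d, y0 = [100., 200., 50.], 10.

# Flow balance y_{t-1} + x_t = y_t + d_t, with constants moved to the right.
A_eq = np.array([[1., 0., 0., -1., 0., 0.],
                 [0., 1., 0., 1., -1., 0.],
                 [0., 0., 1., 0., 1., -1.]])
b_eq = np.array([d[0] - y0, d[1], d[2]])

bounds = [(0., 150.)] * 3 + [(0., None)] * 3   # x_t <= 150 and y_t >= 0
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.fun)   # 68050.0
print(res.x)     # e.g. x = (140, 150, 50), y = (50, 0, 0)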
■ Example 6.11 A craftsman makes two kinds of jewelry boxes for craft shows.
The oval box requires 30 minutes of machine work and 20 minutes of finishing.
The square box requires 20 minutes of machine work and 40 minutes of finishing.
Machine work is limited to 600 minutes per day and finishing to 800 minutes. If
there is $3 profit on the oval box and $4 on the square box, how many of each
should be produced to maximize profit? ■
Solution. Let x1 denote the number of oval boxes and x2 the number of square
boxes, so that the profit to maximize is z = 3x1 + 4x2 subject to the machine-work
constraint 30x1 + 20x2 ≤ 600 and the finishing constraint 20x1 + 40x2 ≤ 800.
Clearly we can divide our constraints by a positive number without altering them. If
we divide the first by 10 and the second by 20 we get

3x1 + 2x2 ≤ 60,
x1 + 2x2 ≤ 40.
Introducing slack variables s1 and s2 , the initial tableau is

x1   x2   s1   s2   z
 3    2    1    0   0 |  60
 1    2    0    1   0 |  40
−3   −4    0    0   1 |   0
Now you need to observe one important thing: The position of each column is only
important in the sense that the location gives you the corresponding variable. That
is, the first column corresponds to x1 , the second to x2 , the third to s1 , the fourth
to s2 , and the fifth to z. If we keep track of this information we can of course
interchange columns.
Hence we can rearrange our tableau according to
s1   s2   z   x1   x2
 1    0   0    3    2 |  60
 0    1   0    1    2 |  40
 0    0   1   −3   −4 |   0
The most negative entry in the last row is −4, in the x2 column, so we swap the
columns corresponding to s2 and x2 to bring the pivot column forward:

s1   x2   z   x1   s2
 1    2   0    3    0 |  60
 0    2   0    1    1 |  40
 0   −4   1   −3    0 |   0
and note that the columns now correspond to s1 , x2 , z, x1 , s2 . Next, compute the
reduced echelon form by typing “rref Ans”
s1   x2   z   x1    s2
 1    0   0    2    −1  |  20
 0    1   0   1/2   1/2 |  20
 0    0   1   −1     2  |  80
Since there is still a negative entry in the last row we are not done yet. The new
pivot element is the 1,4 entry (again printed bold) and hence we are going to swap
column 1 and 4 on the calculator by typing “rSwap(AnsT ,1,4)T ”
x1   x2   z   s1    s2
 2    0   0    1    −1  |  20
1/2   1   0    0    1/2 |  20
−1    0   1    0     2  |  80
Computing the reduced echelon form again (“rref Ans”) yields

x1   x2   z   s1    s2
 1    0   0   1/2  −1/2 |  10
 0    1   0  −1/4   3/4 |  15
 0    0   1   1/2   3/2 |  90
Since there are no negative entries in the last row we are done. Now we can rearrange
the tableau in such a way that the columns correspond to x1 , x2 , s1 , s2 , z as at the
outset
x1   x2   s1    s2    z
 1    0   1/2  −1/2   0 |  10
 0    1  −1/4   3/4   0 |  15
 0    0   1/2   3/2   1 |  90
This is the final simplex tableau. Setting s1 = 0, s2 = 0 and solving for the remaining
variables we get x1 = 10, x2 = 15 and z = 90.
We have seen in Example 6.5 that degeneracy can cause us to take extra steps
on our way from an initial basic feasible solution to an optimal solution. When
the simplex algorithm takes extra steps while remaining at the same degenerate
extreme point, this is called stalling. The problem can become much worse; for
certain entering variable rules, the simplex algorithm can become locked in a cycle of
pivots each one moving from one characterization of a degenerate extreme point to
the next. The following example, due to Beale and illustrated in Chapter 4 of
[BJS04], demonstrates the point.
min −(3/4)x4 + 20x5 − (1/2)x6 + 6x7
s.t. x1 + (1/4)x4 − 8x5 − x6 + 9x7 = 0
     x2 + (1/2)x4 − 12x5 − (1/2)x6 + 3x7 = 0    (7.1)
     x3 + x6 = 1
     xi ≥ 0    i = 1, . . . , 7
The fact that the A matrix contains an identity matrix embedded within it suggests
that an initial basic feasible solution with basic variables x1 , x2 and x3 would be a
good choice. This leads to a vector of reduced costs given by:
cTB B−1 N − cTN = [3/4 −20 1/2 −6]    (7.3)
Tableau II:

      z   x1    x2   x3   x4    x5    x6     x7    RHS
z     1   −3     0    0    0     4    7/2   −33     0
x4    0    4     0    0    1   −32    −4     36     0
x2    0   −2     1    0    0     4    3/2   −15     0
x3    0    0     0    1    0     0     1      0     1

Tableau III:

      z    x1    x2   x3   x4   x5    x6     x7     RHS
z     1    −1    −1    0    0    0     2    −18      0
x4    0   −12     8    0    1    0     8    −84      0
x5    0   −1/2   1/4   0    0    1    3/8  −15/4     0
x3    0     0     0    1    0    0     1      0      1

Tableau IV:

      z    x1     x2   x3    x4    x5   x6    x7    RHS
z     1     2     −3    0   −1/4    0    0     3     0
x6    0   −3/2     1    0    1/8    0    1  −21/2    0
x5    0   1/16   −1/8   0  −3/64    1    0   3/16    0
x3    0   3/2     −1    1   −1/8    0    0   21/2    1

Tableau V:

      z   x1    x2   x3    x4     x5    x6   x7   RHS
z     1    1    −1    0    1/2   −16     0    0    0
x6    0    2    −6    0   −5/2    56     1    0    0
x7    0   1/3  −2/3   0   −1/4   16/3    0    1    0
x3    0   −2     6    1    5/2   −56     0    0    1

Tableau VI:

      z   x1    x2   x3    x4    x5    x6    x7   RHS
z     1    0     2    0    7/4  −44   −1/2    0    0
x1    0    1    −3    0   −5/4   28    1/2    0    0
x7    0    0    1/3   0    1/6   −4   −1/6    1    0
x3    0    0     0    1     0     0     1     0    1

Tableau VII:

      z   x1   x2   x3   x4    x5    x6    x7   RHS
z     1    0    0    0   3/4  −20    1/2   −6    0
x1    0    1    0    0   1/4   −8    −1     9    0
x2    0    0    1    0   1/2  −12   −1/2    3    0
x3    0    0    0    1    0     0     1     0    1
We see that the last tableau (VII) is the same as the first tableau and thus we
have constructed an instance where (using the given entering and leaving variable
rules), the Simplex Algorithm will cycle forever at this degenerate extreme point. ■
Lexicographic ordering is simply the standard order operation > applied to the
individual elements of a vector in Rn with a precedence on the index of the vector.
Recall that the minimum ratio test asserts that we will choose as the leaving variable
the basic variable with the minimum ratio in the minimum ratio test. Consider the
following set:

I0 = { r : br /ajr = min{ bi /aji : i = 1, . . . , m and aji > 0 } }    (7.4)
In the absence of degeneracy, I0 contains a single element: the row index that has
the smallest ratio of bi to aji , where naturally: b = B−1 b and aj = B−1 A·j . In this
case, xj is swapped into the basis in exchange for xBr (the rth basic variable).
When we have a degenerate basic feasible solution, then I0 is not a singleton set
and contains all the rows that have tied in the minimum ratio test. In this case, we
can form a new set:
I1 = { r : a1r /ajr = min{ a1i /aji : i ∈ I0 } }    (7.5)

Here, a1 = B−1 A·1 , and we divide the elements of this (column) vector by the
elements of the (column) vector aj on an index-by-index basis over the tied rows. If
this set is a singleton, then basic variable xBr leaves the basis. If this set is not a
singleton, we may form a new set I2 with column a2 . In general, we will have the set:

Ik = { r : akr /ajr = min{ aki /aji : i ∈ Ik−1 } }    (7.6)
Lemma 7.2. For any degenerate basis matrix B for any linear programming problem,
we will ultimately find a k so that Ik is a singleton.
Exercise 7.2 Prove Lemma 7.2. [Hint: Assume that the tableau is arranged so that
the identity columns are columns 1 through m. (That is, aj = ej for j = 1, . . . , m.)
Show that this configuration will easily lead to a singleton Ik for k < m.] ■
In executing the lexicographic minimum ratio test, we can see that we are essentially
comparing the tied rows in a lexicographic manner. If a set of rows ties in the
minimum ratio test, then we execute a minimum ratio test on the first column of
the tied rows. If there is a tie, then we move on executing a minimum ratio test on
the second column of the rows that tied in both previous tests. This continues until
the tie is broken and a single row emerges as the leaving row.
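A compact sketch of this procedure (my own addition, assuming Python with numpy; T holds the constraint rows of B−1 A and rhs holds B−1 b):

import numpy as np

def lex_min_ratio_test(T, rhs, j):
    """Return the index of the leaving row for entering column j,
    breaking ties lexicographically across columns 0, 1, 2, ..."""
    rows = [i for i in range(T.shape[0]) if T[i, j] > 0]
    if not rows:
        return None                      # unbounded direction
    # Standard minimum ratio test against the right-hand side.
    ratios = rhs[rows] / T[rows, j]
    tied = [r for r, q in zip(rows, ratios) if np.isclose(q, ratios.min())]
    # Break ties column by column, exactly as the sets I1, I2, ... do.
    k = 0
    while len(tied) > 1 and k < T.shape[1]:
        ratios = T[tied, k] / T[tied, j]
        tied = [r for r, q in zip(tied, ratios) if np.isclose(q, ratios.min())]
        k += 1
    return tied[0]

# Tableau I of Example 7.2 (rows x1, x2, x3; columns x1..x7).
T = np.array([[1., 0., 0., 1/4, -8., -1., 9.],
              [0., 1., 0., 1/2, -12., -1/2, 3.],
              [0., 0., 1., 0., 0., 1., 0.]])
rhs = np.array([0., 0., 1.])
print(lex_min_ratio_test(T, rhs, 3))     # 1: the x2 row leaves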
■Example 7.2 Let us consider the example from Beale again using the lexicographic
minimum ratio test. Consider the tableau shown below.
Tableau I:

      z   x1   x2   x3   x4    x5    x6    x7   RHS
z     1    0    0    0   3/4  −20    1/2   −6    0
x1    0    1    0    0   1/4   −8    −1     9    0
x2    0    0    1    0   1/2  −12   −1/2    3    0
x3    0    0    0    1    0     0     1     0    1
Again, we chose to enter variable x4 as it has the most positive reduced cost.
Variables x1 and x2 tie in the minimum ratio test. So we consider a new minimum
ratio test on the first column of the tableau:
min{ 1/(1/4), 0/(1/2) } = min{4, 0}    (7.7)
From this test, we see that x2 is the leaving variable and we pivot on element 1/2
as indicated in the tableau. Note, we only need to execute the minimum ratio test
on variables x1 and x2 since those were the tied variables in the standard minimum
ratio test. That is, I0 = {1, 2} and we construct I1 from these indexes alone. In
this case I1 = {2}. Pivoting yields the new tableau:
Tableau II:
      z   x1    x2    x3   x4    x5    x6     x7    RHS
z     1    0   −3/2    0    0    −2    5/4   −21/2    0
x1    0    1   −1/2    0    0    −2   −3/4    15/2    0
x4    0    0     2     0    1   −24    −1      6      0
x3    0    0     0     1    0     0     1      0      1
There is no question this time of the entering or leaving variable; clearly x6 must
enter and x3 must leave, and we obtain:
Tableau III:
      z   x1    x2     x3    x4    x5    x6     x7     RHS
z     1    0   −3/2   −5/4    0    −2     0   −21/2   −5/4
x1    0    1   −1/2    3/4    0    −2     0    15/2    3/4
x4    0    0     2      1     1   −24     0     6       1
x6    0    0     0      1     0     0     1     0       1
Since this is a minimization problem and the reduced costs of the non-basic variables
are now all negative, we have arrived at an optimal solution. The lexicographic
minimum ratio test successfully prevented cycling. ■
Thanks to Ethan Wright for finding a small typo in this example, that is now fixed.
P: max cT x
   s.t. Ax = b
       x ≥ 0
      z   x1        ...   xj        ...   xm        xm+1          ...   xk        ...   xn        RHS
z     1   z1 − c1   ...   zj − cj   ...   zm − cm   zm+1 − cm+1   ...   zk − ck   ...   zn − cn    z
xB1   0   a11       ...   a1j       ...   a1m       a1m+1         ...   a1k       ...   a1n       b1
 ⋮
xBi   0   ai1       ...   aij       ...   aim       aim+1         ...   aik       ...   ain       bi
 ⋮
xBr   0   ar1       ...   arj       ...   arm       arm+1         ...   ark       ...   arn       br
 ⋮
xBm   0   am1       ...   amj       ...   amm       amm+1         ...   amk       ...   amn       bm
Assume using the entering variable rule of our choice that xk will enter. Let us
consider what happens as we choose a leaving variable and execute a pivot. Suppose
that after executing the lexicographic minimum ratio test, we pivot on element ark .
Consider the pivoting operation on row i: there are two cases:
Case I: i ∉ I0 . In this case we replace bi with:

b′i = bi − (aik /ark ) br

If aik < 0, then clearly b′i > 0. Otherwise, since i ∉ I0 , we have:

br /ark < bi /aik  =⇒  (aik /ark ) br < bi  =⇒  0 < bi − (aik /ark ) br

Case II: i ∈ I0 . In this case:

br /ark = bi /aik
There are now two possibilities: either i ∉ I1 or i ∈ I1 . In the first case, we can
argue that

a′i1 = ai1 − (aik /ark ) ar1 > 0

for the same reason that b′i > 0 in Case I, namely that the lexicographic minimum
ratio test ensures that:

ar1 /ark < ai1 /aik

when i ∉ I1 . This confirms (since b′i = 0) that row i of the augmented matrix
[b|B−1 ] is lexicographically positive.
In the second case, that i ∈ I1 , we may proceed to determine whether i ∈ I2 .
This process continues until we identify the j for which Ij is the singleton index
r. Such a j must exist by Lemma 7.2. In each case, we may reason that row i
of the augmented matrix [b|B−1 ] is lexicographically positive.
The preceding argument shows that at step n + 1 of the simplex algorithm we
will arrive at an augmented matrix [b|B−1 ] for which every row is lexicographically
positive. This completes the proof. ■
R The assumption that we force Im into the basis can be justified in one of two
ways:
1. We may assume that we first execute a Phase I simplex algorithm with
artificial variables. Then the foregoing argument applies.
2. Assume we are provided with a crash basis B and we form the equivalent
problem:
Lemma 7.4. Under the assumptions of Lemma 7.3, let zi and zi+1 be row vectors
in Rn+1 corresponding to Row 0 from the simplex tableau at iterations i and i + 1
respectively. Assume, however, that we exchange the z column (column 1) and the
RHS column (column n + 2). Then zi+1 − zi is lexicographically positive.
Proof. Consider the tableau in Table 7.1. If we are solving a maximization problem,
then clearly for xk to be an entering variable (as we assumed in the proof of Lemma
7.3) we must have zk − ck < 0. Then the new Row Zero is obtained by adding:
y = −((zk − ck )/ark ) [0  ar1  . . .  arj  . . .  arm  arm+1  . . .  ark  . . .  arn  br ]
Theorem 7.1 Under the assumptions of Lemma 7.3, the simplex algorithm converges
in a finite number of steps.
R Again, the proof of correctness, i.e., that the simplex algorithm with the
lexicographic minimum ratio test finds a point of optimality, is left until the
next chapter when we’ll argue that the simplex algorithm finds a so-called
KKT point.
1. Largest absolute reduced cost: In this case, the variable with most negative
(in maximization) or most positive (in minimization) reduced cost is chosen to
enter.
2. Largest impact on the objective: In this case, the variable whose entry will
cause the greatest increase (or decrease) to the objective function is chosen.
This of course requires pre-computation of the value of the objective function
for each possible choice of entering variable and can be time consuming.
Leaving variable rules (like Bland’s rule or the lexicographic minimum ratio test)
can also be expensive to implement. Practically, many systems ignore these rules
and use floating point error to break ties. This does not ensure that cycling does
not occur, but often is useful in a practical sense. However, care must be taken. In
certain simplex instances, floating point error cannot be counted on to ensure tie
breaking and consequently cycling prevention rules must be implemented. This is
particularly true in network flow problems that are coded as linear programs. It is
also important to note that none of these rules prevent stalling. Stalling prevention is
a complicated thing and there are still open questions on whether certain algorithms
admit or prevent stalling. See Chapter 4 of [BJS04] for a treatment of this subject.
The tableau method is a substantially data intensive process as we carry the entire
simplex tableau with us as we execute the simplex algorithm. However, consider the
data we need at each iteration of the algorithm:
1. Reduced costs: cTB B−1 A·j − cj for each variable xj where j ∈ J and J is the
set of indices of non-basic variables.
2. Right-hand-side values: b = B−1 b for use in the minimum ratio test.
3. aj = B−1 A·j for use in the minimum ratio test.
4. z = cTB B−1 b, the current objective function value.
The one value that is clearly critical to the computation is B−1 as it appears in
each and every computation. It would be far more effective to keep only the values:
B−1 , cTB B−1 , b and z and compute the reduced cost values and vectors aj as we
need them.
Let w = cTB B−1 . Then the pertinent information may be stored in a new revised
simplex tableau with form:

z    [ w     z ]
xB   [ B−1   b ]    (8.2)
The revised simplex algorithm is detailed in Algorithm 8. In essence, the revised
simplex algorithm allows us to avoid computing aj until we absolutely need to do so.
In fact, if we do not apply Dantzig’s entering variable rule and simply select the first
acceptable entering variable, then we may be able to avoid computing a substantial
number of columns in the tableau.
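The pricing step makes the savings concrete: with only w = cTB B−1 in hand, any reduced cost can be computed on demand. A small sketch (my own addition, assuming Python with numpy), using the data that appears in Example 8.1 below:

import numpy as np

def price_out(w, A, c, j):
    """Reduced cost z_j - c_j = w A_j - c_j, computed only when needed."""
    return w @ A[:, j] - c[j]

# Data from Example 8.1; columns [x1, x2, y1, y2, z1, z2].
A = np.array([[1., 0., 1., 0., 0., 0.],
              [0., 1., 0., 1., 0., 0.],
              [3/8, 3/2, 0., 0., 1., -1.]])
c = np.array([0., 0., 1., 10., 0., 5.])
w = np.array([1., 10., 0.])       # c_B^T B^{-1} for the basis {y1, y2, z1}

for j, name in [(0, "x1"), (1, "x2"), (5, "z2")]:
    print(name, price_out(w, A, c, j))   # 1.0, 10.0, -5.0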
■ Example 8.1 Consider a software company who is developing a new program. The
company has identified two types of bugs that remain in this software: non-critical
and critical. The company’s actuarial firm predicts that the risk associated with
these bugs are uniform random variables with mean $100 per non-critical bug and
mean $1000 per critical bug. The software currently has 50 non-critical bugs and 5
critical bugs.
Assume that it requires 3 hours to fix a non-critical bug and 12 hours to fix a
critical bug. For each day (8 hour period) beyond two business weeks (80 hours)
that the company fails to ship its product, the actuarial firm estimates it will lose
$500 per day.
We can find the optimal number of bugs of each type the software company
should fix assuming it wishes to minimize its exposure to risk using a linear
programming formulation.
Let x1 be the number of non-critical bugs corrected and x2 be the number of
critical software bugs corrected. Define:
y1 = 50 − x1 (8.3)
y2 = 5 − x2 (8.4)
Here y1 is the number of non-critical bugs that are not fixed while y2 is the number
of critical bugs that are not fixed.
The time (in hours) it takes to fix these bugs is 3x1 + 12x2 .
5. Perform the minimum ratio test and determine a leaving variable (using any
leaving variable rule you prefer).
a. If aj ≤ 0, STOP, the problem is unbounded.
b. Otherwise, assume that the leaving variable is xBr which appears in
row r of the revised simplex tableau.
6. Use row operations and pivot on the leaving variable row of the column:

[ zj − cj ]
[   aj    ]

appended to the revised simplex tableau. After pivoting, this column becomes
[0; er ], where er is an identity column with a 1 in row r (the row that left). The
variable xj is now the rth element of xB .
7. Goto Step 2.
Let:

y3 = (1/8)(80 − 3x1 − 12x2 )    (8.6)
Then y3 is a variable that is unrestricted in sign and determines the amount of
time (in days) either over or under the two-week period that is required to ship
the software. As an unrestricted variable, we can break it into two components:
y3 = z1 − z2 (8.7)
We will assume that z1 , z2 ≥ 0. If y3 > 0, then z1 > 0 and z2 = 0. In this case, the
software is completed ahead of the two-week deadline. If y3 < 0, then z1 = 0 and
z2 > 0. In this case the software is finished after the two-week deadline. Finally, if
y3 = 0, then z1 = z2 = 0 and the software is finished precisely on time.
We can form the objective function as:

min z = y1 + 10y2 + 5z2

Notice we have modified the objective function by dividing the risk-and-penalty
expression 100y1 + 1000y2 + 500z2 by 100. This will make the arithmetic of the
simplex algorithm easier. The matrix of coefficients for this problem is:
x1   x2   y1   y2   z1   z2
 1    0    1    0    0    0
 0    1    0    1    0    0      (8.10)
3/8  3/2   0    0    1   −1
Notice there is an identity matrix embedded inside the matrix of coefficients. Thus
a good initial basic feasible solution is {y1 , y2 , z1 }. The initial basis matrix is I3
and naturally, B−1 = I3 as a result. We can see that cB = [1 10 0]T . It follows that
cTB B−1 = w = [1 10 0].
Our initial revised simplex tableau is thus:
z 1 10 0 100
y1 1 0 0 50
(8.11)
y2
0 1 0 5
z1 0 0 1 10
There are three variables that might enter at this point: x1 , x2 and z2 . We can
compute the reduced costs for each of these variables using the columns of the A
matrix, the coefficients of these variables in the objective function and the current
w vector (in row 0 of the revised simplex tableau). We obtain:
z1 − c1 = wA·1 − c1 = [1 10 0] [1, 0, 3/8]T − 0 = 1
z2 − c2 = wA·2 − c2 = [1 10 0] [0, 1, 3/2]T − 0 = 10
z6 − c6 = wA·6 − c6 = [1 10 0] [0, 0, −1]T − 5 = −5
By Dantzig’s rule, we enter variable x2 . We append B−1 A·2 and the reduced cost
to the revised simplex tableau to obtain:
z    1   10   0   100   10    MRT
y1   1    0   0    50    0      −
y2   0    1   0     5    1      5       (8.12)
z1   0    0   1    10   3/2    20/3
We can compute reduced costs for the non-basic variables (except for y2 , which we
know will not re-enter the basis on this iteration) to obtain:
z1 − c1 = wA·1 − c1 = 1
z6 − c6 = wA·6 − c6 = −5
In this case, x1 will enter the basis and we augment our revised simplex tableau to
obtain:
z    1    0    0    50     1    MRT
y1   1    0    0    50     1     50
x2   0    1    0     5     0     −       (8.14)
z1   0  −3/2   1    5/2   3/8   20/3
Note that:

B−1 A·1 = [1 0 0; 0 1 0; 0 −3/2 1] [1, 0, 3/8]T = [1, 0, 3/8]T
This is the ā1 column that is appended to the right hand side of the tableau along
with z1 − c1 = 1. After pivoting, the tableau becomes:
z    1    4   −8/3   130/3
y1   1    4   −8/3   130/3
x2   0    1     0       5       (8.15)
x1   0   −4    8/3    20/3
We can now check our reduced costs. Clearly, z1 will not re-enter the basis.
Therefore, we need only examine the reduced costs for the variables y2 and z2 .
z4 − c4 = wA·4 − c4 = −6
z6 − c6 = wA·6 − c6 = −7/3
Since all reduced costs are now negative, no further minimization is possible and
we conclude we have arrived at an optimal solution.
Two things are interesting to note: first, the solution for the number of non-
critical software bugs to fix is non-integer. Thus, in reality the company must fix
either 6 or 7 of the non-critical software bugs. The second thing to note is that
this economic model helps to explain why some companies are content to release
software that contains known bugs. In making a choice between releasing a flawless
product or making a quicker (larger) profit, a selfish, profit maximizer will always
choose to fix only those bugs it must fix and release sooner rather than later. ■
Exercise 8.1 Solve the following problem using the revised simplex algorithm.
max x1 + x2
s.t. 2x1 + x2 ≤ 4
     x1 + 2x2 ≤ 6
     x1 , x2 ≥ 0
■
Proof. We can prove Farkas’ Lemma using the fact that a bounded linear program-
ming problem has an extreme point solution. Suppose that System 1 has a solution
x. If System 2 also has a solution w, then:

wAx = cx

The fact that System 1 has a solution ensures that cx < 0 and therefore wAx < 0.
However, it also ensures that Ax ≥ 0. The fact that System 2 has a solution implies
that w ≥ 0. Therefore we must conclude that:

wAx ≥ 0

This contradiction implies that if System 1 has a solution, then System 2 cannot
have a solution.
Now, suppose that System 1 has no solution. We will construct a solution for
System 2. If System 1 has no solution, then there is no vector x so that cx < 0 and
Ax ≥ 0. Consider the linear programming problem:
PF: min cx
    s.t. Ax ≥ 0    (8.18)
Converting Problem PF to standard form (write x = y − z with y, z ≥ 0 and
subtract surplus variables s from the constraints) yields Problem PF′ . Applying
Theorems 6.1 and 6.2, we see we can obtain an optimal basic feasible solution
for Problem PF′ in which the reduced costs for the variables are all non-positive
(that is, zj − cj ≤ 0 for j = 1, . . . , 2n + m). Here we have n variables in vector y, n
variables in vector z and m variables in vector s. Let B ∈ Rm×m be the basis matrix
at this optimal feasible solution with basic cost vector cB . Let w = cB B−1 (as it
was defined for the revised simplex algorithm).
For a column corresponding to a variable yk we have wA·k − ck ≤ 0, while for the
corresponding variable zk we have −wA·k + ck ≤ 0. Together these force wA·k = ck
for every k, and thus:

wA = c    (8.20)
Since this holds for all surplus variable columns, we see that −w ≤ 0 which implies
w ≥ 0. Thus, the optimal basic feasible solution to Problem PF′ must yield a vector
w that solves System 2.
Lastly, the fact that if System 2 does not have a solution, then System 1 does
follows from contrapositive on the previous fact we just proved. ■
Use the contrapositive to prove explicitly that if System 2 has no solution, then
System 1 must have a solution. [Hint: NOT NOT X ≡ X.] ■
wA = c and w ≥ 0
1 Thanks to Akinwale Akinbiyi for pointing out a typo in this discussion.
Geometrically, this states that c is inside the positive cone generated by the rows of
A. That is, let w = (w1 , . . . , wm ). Then we have:

w1 A1· + w2 A2· + · · · + wm Am· = c
Figure 8.1: System 2 has a solution if (and only if) the vector c is contained inside
the positive cone constructed from the rows of A.
On the other hand, suppose System 1 has a solution x. Then let y = −x. System
1 states that Ay ≤ 0 and cy > 0. That means that each row of A (as a vector)
must be at a right angle or obtuse to y (since Ai· y ≤ 0). Further, we know that
the vector y must be acute with respect to the vector c. This means that System
1 has a solution only if the vector c is not in the positive cone of the rows of A, or
equivalently when the intersection of the open half-space {y : cy > 0} and the set of
vectors {y : Ai· y ≤ 0, i = 1, . . . , m} is non-empty. The latter set is the cone of
vectors making right or obtuse angles with the rows of A. This is illustrated in
Figure 8.2.
Figure 8.2: System 1 has a solution if (and only if) the vector c is not contained
inside the positive cone constructed from the rows of A.
In the figure, the rows [1 0] and [0 1] of A generate the positive cone; the vector
c = [1 2] lies inside it (System 2 has a solution), while c′ = [1 −1] lies outside
(System 1 has a solution).
Figure 8.3: An example of Farkas’ Lemma: The vector c is inside the positive cone
formed by the rows of A, but c′ is not.
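Numerically, testing which system has a solution amounts to solving wA = c and checking signs. A sketch (my own addition, assuming Python with numpy) using the vectors shown in Figure 8.3:

import numpy as np

A = np.array([[1., 0.],
              [0., 1.]])      # rows generate the positive quadrant

for c in (np.array([1., 2.]), np.array([1., -1.])):
    w = np.linalg.solve(A.T, c)      # solve w A = c, i.e. A^T w^T = c^T
    if np.all(w >= 0):
        print(c, "System 2 has a solution: w =", w)
    else:
        # By Farkas' Lemma, System 1 must then have a solution x.
        print(c, "System 1 has a solution (c is outside the cone)")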
Let:

M = [A; E]

and

u = [w v]
Proof. Let x = −d. Then Md ≤ 0 implies Mx ≥ 0 and cd > 0 implies cx < 0. This
converts System 1 to the System 1 of Farkas’ Lemma. System 2 is already in the
form found in Farkas’ Lemma. This completes the proof. ■
Thanks to Rich Benjamin for pointing out the fact I was missing “. . . is an optimal solution. . . ”
R The vectors w∗ and v∗ are sometimes called dual variables for reasons that
will be clear in the next chapter. They are also sometimes called Lagrange
Multipliers. You may have encountered Lagrange Multipliers in your Math
230 or Math 231 class. These are the same kind of variables except applied
to linear optimization problems. There is one element in the dual variable
vector w∗ for each constraint of the form Ax ≤ b and one element in the dual
variable vector v∗ for each constraint of the form x ≥ 0.
The fact that x∗ is optimal implies that there is no improving direction d at the
point x∗ . That is, there is no d so that Md ≤ 0 and cT d > 0. Otherwise, by moving
in this direction we could find a new point x̂ = x∗ + λd (with λ sufficiently small) so
that:

A x̂ ≤ b,  x̂ ≥ 0  and  cT x̂ > cT x∗
That is, this point is both feasible and has a larger objective function value than x∗ .
We can now apply Corollary 8.1 to show that there are vectors w and v so that:
[w v] [AE ; E] = c    (8.29)
w ≥ 0    (8.30)
v ≥ 0    (8.31)
Let I be the indices of the rows of A making up AE and let J be the indices of
the variables that are zero (i.e., binding in x ≥ 0). Then we can re-write Equation
8.29 as:
∑i∈I wi Ai· − ∑j∈J vj ej = c    (8.32)
The vector w has dimension equal to the number of binding constraints of the form
Ai· x = b while the vector v has dimension equal to the number of binding constraints
of the form x ≥ 0. We can extend w to w∗ by adding 0 elements for the constraints
where Ai· x < bi . Similarly we can extend v to v∗ by adding 0 elements for the
constraints where xj > 0. The result is that:
w∗ (Ax∗ − b) = 0    (8.33)
v∗ x∗ = 0    (8.34)
We know that Ai· x∗ = bi for i ∈ I and that x∗j = 0 for j ∈ J. We can use this to simplify Equation 8.35:

∑_{i∈I} wi (bi − Ai· x) + ∑_{j∈J} vj ej x = cx∗ − cx    (8.36)
The left-hand side must be non-negative, since w ≥ 0, v ≥ 0, bi − Ai· x ≥ 0 for all i, and x ≥ 0. Thus cx∗ − cx ≥ 0 for every feasible x, and it follows that x∗ must be an optimal point.
This completes the proof. ■
R The expressions:

Primal Feasibility:    Ax∗ ≤ b,  x∗ ≥ 0    (8.37)

Dual Feasibility:    w∗ A − v∗ = c,  w∗ ≥ 0,  v∗ ≥ 0    (8.38)

Complementary Slackness:    w∗ (Ax∗ − b) = 0,  v∗ x∗ = 0    (8.39)

are the KKT conditions written out for this problem. There is one element in the vector w for each constraint of the form Ax ≤ b and one element in the vector v for each constraint of the form x ≥ 0. The vectors w and v are sometimes called dual variables and sometimes called Lagrange Multipliers.
We can think of dual feasibility as expressing the following interesting
fact: at optimality, the gradient of the objective function c can be expressed
as a positive combination of the gradients of the binding constraints written
as less-than-or-equal-to inequalities. That is, the gradient of the constraint
Ai· x ≤ bi is Ai· and the gradient of the constraint −xj ≤ 0 is −ej . More
specifically, the vector c is in the cone generated by the binding constraints at
optimality.
■ Example 8.3 Consider the Toy Maker Problem (Equation 2.54) with dual feasibility conditions:

[ w1 w2 w3 ] [ 3 1 ]
             [ 1 2 ]  −  [ v1 v2 ]  =  [ 7 6 ]
             [ 1 0 ]

[ w1 w2 w3 ] ≥ [ 0 0 0 ]

[ v1 v2 ] ≥ [ 0 0 ]
The constraints:

3x1 + x2 ≤ 120
x1 + 2x2 ≤ 160

are binding at the optimal solution (x1 , x2 ) = (16, 72): note that 3(16) + 72 = 120 and 16 + 2(72) = 160. Then we should be able to express c = [7 6] (the vector of coefficients of the objective function) as a positive combination of the gradients of the binding constraints:
∇(7x1 + 6x2 ) = [ 7 6 ]
∇(3x1 + x2 ) = [ 3 1 ]
∇(x1 + 2x2 ) = [ 1 2 ]
Note, this is how Equation 8.32 looks when we apply it to this problem. The result
is a system of equations:
3w1 + w2 = 7
w1 + 2w2 = 6
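This 2×2 system can be checked numerically; the sketch below uses Python with numpy, an assumption of this illustration rather than a tool used in the notes:

import numpy as np

# Rows of G are the gradients of the binding constraints [3 1] and [1 2].
G = np.array([[3.0, 1.0],
              [1.0, 2.0]])
c = np.array([7.0, 6.0])

w = np.linalg.solve(G, c)   # 3*w1 + w2 = 7 and w1 + 2*w2 = 6
print(w)                    # [1.6 2.2]

Since w1 = 8/5 and w2 = 11/5 are both positive, c is indeed a positive combination of the gradients of the binding constraints.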
Figure 8.4: The Gradient Cone for the feasible region given by 3x1 + x2 ≤ 120, x1 + 2x2 ≤ 160, x1 ≤ 35, x1 ≥ 0, x2 ≥ 0: At optimality, the cost vector c is obtuse with respect to the directions formed by the binding constraints. It is also contained inside the cone of the gradients of the binding constraints, which we will discuss at length later.
Exercise 8.1 Consider the problem:

max x1 + x2
s.t. 2x1 + x2 ≤ 4
     x1 + 2x2 ≤ 6
     x1 , x2 ≥ 0

Write the KKT conditions for an optimal point for this problem. (You will have a vector w = [w1 w2 ] and a vector v = [v1 v2 ].) ■

Exercise 8.2 Draw the feasible region of the problem. At the optimal point you identified in Exercise 8.1, identify the binding constraints and draw their gradients. Show that the objective function is in the positive cone of the gradients of the binding constraints at this point. (Specifically find w and v.) ■
Primal Feasibility:    Ax∗ = b,  x∗ ≥ 0    (8.42)

Dual Feasibility:    w∗ A − v∗ = c,  w∗ unrestricted,  v∗ ≥ 0    (8.43)

Complementary Slackness:    v∗ x∗ = 0    (8.44)
wA − v = w1∗ A − w2∗ A − v∗ = c    (8.50)
min cx
s.t. Ax ≥ b    (8.52)
     x ≥ 0
[Hint: Remember, Ax ≥ b is the same as writing −Ax ≤ −b. Now use the KKT
conditions for the maximization problem to find the KKT conditions for this
problem.] ■
wA − v = c
vx = 0
at an optimal point x for some vector w unrestricted in sign and v ≥ 0. (Note, for
the sake of notational ease, we have dropped the ∗ notation.)
Suppose at optimality we have a basis matrix B corresponding to a set of basic
variables xB and we simultaneously have non-basic variables xN . We may likewise
divide v into vB and vN .
Then we have:

wA − v = c  =⇒  w [ B  N ] − [ vB  vN ] = [ cB  cN ]    (8.54)

vx = 0  =⇒  [ vB  vN ] [ xB ] = 0    (8.55)
                       [ xN ]
Expanding Equation 8.54 yields:

wB − vB = cB
wN − vN = cN

Recalling that w = cB B−1, the first of these equations gives:

wB − vB = cB  =⇒  cB B−1 B − vB = cB  =⇒  cB − vB = cB  =⇒  vB = 0    (8.57)
Thus, the vN are just the reduced costs of the non-basic variables. (vB are the
reduced costs of the basic variables.) Furthermore, dual feasibility requires that
v ≥ 0. Thus we see that at optimality we require:
cB B−1 N − cN ≥ 0 (8.59)
Note we have added two new dual variables v3 and v4 for the non-negativity constraints on slack variables s1 and s2 . Our dual variable vectors are: w = [w1 w2 ] and v = [v1 v2 v3 v4 ]. We can construct an initial simplex tableau as:

       z   x1   x2   s1   s2   RHS
  z    1   −3   −5    0    0     0
  s1   0    1    2    1    0    60
  s2   0    1    1    0    1    40
At optimality, the dual variables are given by w = cB B−1 , and the constraints:
x1 + 2x2 ≤ 60
x1 + x2 ≤ 40
are both binding at optimality (since s1 and s2 are both zero). This means we
should be able to express c = [3 5]T as a positive combination of the gradients of the
left-hand-sides of these constraints using w. To see this, note that w1 corresponds
to x1 + 2x2 ≤ 60 and w2 to x1 + x2 ≤ 40. We have:
∇(x1 + 2x2 ) = [ 1 ]
               [ 2 ]

∇(x1 + x2 ) = [ 1 ]
              [ 1 ]

Then:

w1 [ 1 ] + w2 [ 1 ] = (2) [ 1 ] + (1) [ 1 ] = [ 3 ]
   [ 2 ]      [ 1 ]       [ 2 ]       [ 1 ]   [ 5 ]
Thus, the objective function gradient is in the dual cone of the binding constraints.
That is, it is a positive combination of the gradients of the left-hand-sides of the
binding constraints at optimality. This is illustrated in Figure 8.5.
Figure 8.5: This figure illustrates the optimal point of the problem given in Example
8.4. Note that at optimality, the objective function gradient is in the dual cone
of the binding constraint. That is, it is a positive combination of the gradients of
the left-hand-sides of the binding constraints at optimality. The gradient of the
objective function is shown in green.
We can also verify that the KKT conditions hold for the problem in standard form. Naturally, complementary slackness and primal feasibility hold. To see that dual feasibility holds, note that v = [0 0 2 1] ≥ 0. Further:

[ 2 1 ] [ 1 2 1 0 ] − [ 0 0 2 1 ] = [ 3 5 0 0 ]
        [ 1 1 0 1 ]

Here [3 5 0 0] is the objective function coefficient vector for the problem in Standard Form. ■
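The entire verification can be scripted. The following is a sketch in Python with numpy (an assumption of this illustration, not a tool used in the notes) that recomputes w, v and the complementary slackness products for this example:

import numpy as np

A = np.array([[1.0, 2.0, 1.0, 0.0],    # x1 + 2 x2 + s1 = 60
              [1.0, 1.0, 0.0, 1.0]])   # x1 +   x2 + s2 = 40
b = np.array([60.0, 40.0])
c = np.array([3.0, 5.0, 0.0, 0.0])

basis = [0, 1]                         # x1 and x2 are basic at optimality
Binv = np.linalg.inv(A[:, basis])
w = c[basis] @ Binv                    # w = cB B^-1 = [2, 1]
v = w @ A - c                          # v = wA - c = [0, 0, 2, 1] >= 0

x = np.zeros(4)
x[basis] = Binv @ b                    # x = (20, 20, 0, 0)
print(w, v, v @ x, w @ (A @ x - b))    # both slackness products are 0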
Exercise 8.7 Use a full simplex tableau to find the values of the Lagrange multipliers
(dual variables) at optimality for the problem from Exercise 8.5. Confirm that
complementary slackness holds at optimality. Lastly show that dual feasibility
holds by showing that the gradient of the objective function (c) is a positive
combination of the gradients of the binding constraints at optimality. [Hint: Use
the vector w you should have identified.] ■
Lemma 9.1. The dual of the dual problem is the primal problem.
Proof. Rewrite Problem D as:
max − bT wT
s.t. − AT wT ≤ −cT (9.4)
wT ≥ 0
Let β = −bT , G = −AT , u = wT and κ = −cT . Then this new problem becomes:
max βu
s.t. Gu ≤ κ (9.5)
u≥0
Let xT be the vector of dual variables (transposed) for this problem. We can
formulate the dual problem as:
min xT κ
s.t. xT G ≥ β    (9.6)
     xT ≥ 0
Substituting β = −bT , G = −AT and κ = −cT and transposing, this is precisely Problem P:

max cT x
s.t. Ax ≤ b    (9.8)
     x ≥ 0

This completes the proof. ■
MAX PROBLEM                        MIN PROBLEM
constraint ≤          ←→          variable ≥ 0
constraint ≥          ←→          variable ≤ 0
constraint =          ←→          variable UNRESTRICTED
variable ≥ 0          ←→          constraint ≥
variable ≤ 0          ←→          constraint ≤
variable UNRESTRICTED ←→          constraint =
Table 9.1: Table of Dual Conversions: To create a dual problem, assign a dual
variable to each constraint of the form Ax ◦ b, where ◦ represents a binary relation.
Then use the table to determine the appropriate sign of the inequality in the dual
problem as well as the nature of the dual variables.
■ Example 9.1 Consider the problem of finding the dual problem for the Toy Maker Problem (Example 2.30) in standard form. The primal problem is:

max 7x1 + 6x2
s.t. 3x1 + x2 + s1 = 120    (w1 )
     x1 + 2x2 + s2 = 160    (w2 )
     x1 + s3 = 35    (w3 )
     x1 , x2 , s1 , s2 , s3 ≥ 0

Here we have placed dual variable names (w1 , w2 and w3 ) next to the constraints to which they correspond.
The primal problem variables in this case are all positive, so using Table 9.1
we know that the constraints of the dual problem will be greater-than-or-equal-to
constraints. Likewise, we know that the dual variables will be unrestricted in sign
since the primal problem constraints are all equality constraints.
The coefficient matrix is:

A = [ 3 1 1 0 0 ]
    [ 1 2 0 1 0 ]
    [ 1 0 0 0 1 ]

Clearly we have:

c = [ 7 6 0 0 0 ]

b = [ 120 ]
    [ 160 ]
    [  35 ]
The dual vector w = [w1 w2 w3 ] will be related to c in the constraints of the dual problem. Remember, in this case, all dual constraints are greater-than-or-equal-to. Thus we see that the constraints of the dual problem are:
3w1 + w2 + w3 ≥ 7
w1 + 2w2 ≥ 6
w1 ≥ 0
w2 ≥ 0
w3 ≥ 0
Again, note that in reality, the constraints we derived from the wA ≥ c part of the dual problem make the condition “w unrestricted” redundant: in fact w ≥ 0, just as we would expect had we found the dual of the Toy Maker problem given in canonical form. ■
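As a numerical sanity check, both problems can be handed to an off-the-shelf solver. The sketch below assumes scipy (not used elsewhere in these notes); since linprog minimizes, the primal objective is negated and the dual ≥ constraints are negated into ≤ form:

from scipy.optimize import linprog

# Primal: max 7x1 + 6x2 s.t. 3x1 + x2 <= 120, x1 + 2x2 <= 160, x1 <= 35
primal = linprog(c=[-7, -6], A_ub=[[3, 1], [1, 2], [1, 0]], b_ub=[120, 160, 35])

# Dual: min 120w1 + 160w2 + 35w3 s.t. 3w1 + w2 + w3 >= 7, w1 + 2w2 >= 6
dual = linprog(c=[120, 160, 35], A_ub=[[-3, -1, -1], [-1, -2, 0]], b_ub=[-7, -6])

print(-primal.fun, dual.fun)   # both print 544.0, as duality predicts
print(dual.x)                  # an optimal w, e.g. [1.6, 2.2, 0.0]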
Exercise 9.1 Find the dual of the problem:

max x1 + x2
s.t. 2x1 + x2 ≥ 4
     x1 + 2x2 ≤ 6
     x1 , x2 ≥ 0
■
Exercise 9.2 Use the table or the definition of duality to determine the dual for
the problem:
min cx
s.t. Ax ≤ b    (9.10)
     x ≥ 0
wb ≥ cx    (9.11)

Proof. Primal feasibility of x ensures that Ax ≤ b. Multiplying on the left by the non-negative vector w preserves the inequality. Therefore, we have:

wAx ≤ wb    (9.12)

Dual feasibility of w ensures that wA ≥ c. Multiplying on the right by the non-negative vector x preserves the inequality. Therefore we have:

wAx ≥ cx    (9.13)

Combining Equations 9.12 and 9.13 yields:

wb ≥ wAx ≥ cx

This completes the proof. ■
R Lemma 9.2 ensures that the optimal solution w∗ for Problem D must provide
an upper bound to Problem P , since for any feasible x, we know that:
w∗ b ≥ cx (9.14)
Proof. For any x feasible to Problem P, we know that wb ≥ cx for any feasible w. The fact that Problem P is unbounded implies that for any V ∈ R we can find an x feasible to Problem P such that cx > V. If w were feasible to Problem D, then we would
have wb > V for any arbitrarily chosen V . There can be no finite vector w with this
property and we conclude that Problem D must be infeasible.
The alternative case that when Problem D is unbounded, then Problem P is
infeasible follows by reversing the roles of the problem. This completes the proof. ■
Complementary Slackness:    (w∗ A − c) x∗ = 0,  w∗ s∗ = 0    (9.17)
Furthermore, these KKT conditions are equivalent to the KKT conditions for the
primal problem.
Proof. Following the proof of Lemma 9.1, let β = −bT , G = −AT , u = wT and
κ = −cT . Then the dual problem can be rewritten as:
max βu
s.t. Gu ≤ κ
     u ≥ 0
Let xT ∈ R1×n and sT ∈ R1×m be the dual variables for this problem. Then applying
Theorem 8.1, we obtain KKT conditions for this problem:
Primal Feasibility:    Gu∗ ≤ κ,  u∗ ≥ 0    (9.18)

Dual Feasibility:    x∗T G − s∗T = β,  x∗T ≥ 0,  s∗T ≥ 0    (9.19)

Complementary Slackness:    x∗T (Gu∗ − κ) = 0,  s∗T u∗ = 0    (9.20)

We can rewrite:

s∗T u∗ = 0  ≡  s∗T w∗T = 0  ≡  w∗ s∗ = 0

Thus, we have shown that the KKT conditions for the dual problem are:

Primal Feasibility:    w∗ A ≥ c,  w∗ ≥ 0

Dual Feasibility:    Ax∗ + s∗ = b,  x∗ ≥ 0,  s∗ ≥ 0

Complementary Slackness:    (w∗ A − c) x∗ = 0,  w∗ s∗ = 0
To prove the equivalence to the KKT conditions for the primal problem, define:

s∗ = b − Ax∗    (9.21)
v∗ = w∗ A − c    (9.22)

That is, s∗ is a vector of slack variables for the primal problem P at optimality and v∗ is a vector of surplus variables for the dual problem D at optimality. Recall the KKT conditions for the primal problem are:
Primal Feasibility:    Ax∗ ≤ b,  x∗ ≥ 0

Dual Feasibility:    w∗ A − v∗ = c,  w∗ ≥ 0,  v∗ ≥ 0

Complementary Slackness:    w∗ (Ax∗ − b) = 0,  v∗ x∗ = 0

Substituting s∗ = b − Ax∗ , these can be rewritten as:

Primal Feasibility:    Ax∗ + s∗ = b,  x∗ ≥ 0    (9.23)

Dual Feasibility:    w∗ A − v∗ = c,  w∗ ≥ 0,  v∗ ≥ 0    (9.24)

Complementary Slackness:    w∗ s∗ = 0,  v∗ x∗ = 0    (9.25)
Likewise, beginning with the KKT conditions for the dual problem (Expressions 9.15 - 9.17) and substituting v∗ = w∗ A − c, we can write:

Primal Feasibility:    w∗ A − v∗ = c,  w∗ ≥ 0,  v∗ ≥ 0

Dual Feasibility:    Ax∗ + s∗ = b,  x∗ ≥ 0,  s∗ ≥ 0

Complementary Slackness:    v∗ x∗ = 0,  w∗ s∗ = 0
R Notice that the KKT conditions for the primal and dual problems are equivalent: the dual feasibility conditions for the primal problem are identical to the primal feasibility conditions for the dual problem and vice-versa. Thus, two linear programming problems are dual to each other if they share KKT conditions with the primal and dual feasibility conditions swapped.
Exercise 9.3 Compute the dual problem for a canonical form minimization problem:
P:  min cx
    s.t. Ax ≥ b
         x ≥ 0
Find the KKT conditions for the dual problem you just identified. Use the result
from Exercise 8.6 to show that KKT conditions for Problem P are identical to
the KKT conditions for the dual problem you just found. ■
Lemma 9.4 (Strong Duality). There is a bounded optimal solution x∗ for Problem
P if and only if there is a bounded optimal solution w∗ for Problem D. Furthermore,
cx∗ = w∗ b.
Proof. Suppose that there is an optimal solution x∗ for Problem P. Let s∗ = b − Ax∗ . Clearly s∗ ≥ 0.
By Theorem 8.1 there exist dual variables w∗ and v∗ satisfying dual feasibility and complementary slackness. Dual feasibility in the KKT conditions implies that:

v∗ = w∗ A − c    (9.26)

so that w∗ is feasible to Problem D, and complementary slackness implies that:

v∗ x∗ = 0  =⇒  (w∗ A − c) x∗ = 0    (9.27)

so w∗ Ax∗ = cx∗ . Complementary slackness also ensures that w∗ s∗ = 0, i.e., w∗ (b − Ax∗ ) = 0, and thus w∗ b = w∗ Ax∗ = cx∗ . By weak duality (Lemma 9.2), every dual feasible w satisfies wb ≥ cx∗ = w∗ b, so w∗ is a bounded optimal solution to Problem D with cx∗ = w∗ b. The converse follows by applying the same argument to the dual problem, using Lemma 9.1. ■
Note that it is possible for both Problem P and Problem D to be infeasible; for example, the following problem and its dual are both infeasible:

max x1 + x2
s.t. x1 − x2 ≥ 1
     −x1 + x2 ≥ 1
     x1 , x2 ≥ 0
The following theorem summarizes all of the results we have obtained in the last
two sections.
Theorem 9.1 — Strong Duality Theorem. Consider Problem P and Problem D. Then exactly one of the following statements is true:
1. Both Problem P and Problem D possess optimal solutions x∗ and w∗ respectively and cx∗ = w∗ b.
2. Problem P is unbounded and Problem D is infeasible.
3. Problem D is unbounded and Problem P is infeasible.
4. Both Problem P and Problem D are infeasible.
This results in a geometry in which the dual feasible region is a reflection of the
primal feasible region (ignoring non-negativity constraints). This is illustrated in
Figure 9.1.
Figure 9.1: The dual feasible region in this problem is a mirror image (almost) of
the primal feasible region. This occurs when the right-hand-side vector b is equal to
the objective function coefficient column vector cT and the matrix A is symmetric.
¹Thanks to Michael Cline for suggesting this section.
We can also illustrate the process of solving this problem using the (revised)
simplex algorithm. In doing so, we first convert Problem 9.31 to standard form:
max 6x1 + 6x2
s.t. 3x1 + 2x2 + s1 = 6
     2x1 + 3x2 + s2 = 6    (9.33)
     x1 , x2 , s1 , s2 ≥ 0
        s1   s2   RHS
  z      0    0     0
  s1     1    0     6    (9.34)
  s2     0    1     6
This is because our initial cB = [0 0] and thus w = [0 0]. This means at the start of the simplex algorithm w1 = 0 and w2 = 0, and x1 = 0 and x2 = 0, so we begin at the origin, which is in the primal feasible region and not in the dual feasible region. If we iterate and choose x1 as an entering variable, our updated tableau will be:
        s1    s2   RHS
  z      2     0    12
  x1    1/3    0     2    (9.35)
  s2   −2/3    1     2
Iterating once more, x2 enters the basis and s2 leaves, yielding the final tableau:

         s1    s2    RHS
  z     6/5   6/5   72/5
  x1    3/5  −2/5    6/5    (9.36)
  x2   −2/5   3/5    6/5
Figure 9.2: The simplex algorithm begins at a feasible point in the feasible region
of the primal problem. In this case, this is also the same starting point in the dual
problem, which is infeasible. The simplex algorithm moves through the feasible
region of the primal problem towards a point in the dual feasible region. At the
conclusion of the algorithm, the algorithm reaches the unique point that is both
primal and dual feasible.
We should note that this problem is atypical in that the primal and dual feasible
regions share one common point. In more standard problems, the two feasible regions
cannot be drawn in this convenient way, but the simplex process is the same. The
simplex algorithm begins at a point in the primal feasible region with a corresponding
dual vector that is not in the feasible region of the dual problem. As the simplex
algorithm progresses, this dual vector approaches and finally enters the dual feasible
region.
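The path of the dual vector can be reproduced directly from the bases the simplex algorithm visits. A sketch (Python with numpy assumed, as an illustration only) computes w = cB B−1 for each basis of Problem 9.33:

import numpy as np

A = np.array([[3.0, 2.0, 1.0, 0.0],      # columns: x1, x2, s1, s2
              [2.0, 3.0, 0.0, 1.0]])
c = np.array([6.0, 6.0, 0.0, 0.0])

for basis in ([2, 3], [0, 3], [0, 1]):   # {s1,s2} -> {x1,s2} -> {x1,x2}
    w = c[basis] @ np.linalg.inv(A[:, basis])
    print(basis, w)
# [2, 3] [0. 0.]      dual infeasible
# [0, 3] [2. 0.]      dual infeasible (2 w1 + 3 w2 = 4 < 6)
# [0, 1] [1.2 1.2]    dual feasible: 3 w1 + 2 w2 = 6 and 2 w1 + 3 w2 = 6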
Exercise 9.5 Draw the dual feasible region for the following problem.
Solve the problem using the revised simplex algorithm and trace the path of dual
variables (the w vector) in your plot of the dual feasible region. Also trace the
path of the primal vector x through the primal feasible region. [Hint: Be sure to
draw the area around the dual feasible region. Your dual vector w will not enter
the feasible region until the last simplex pivot.] ■
■ Example 9.2 Consider a leather company that requires 1 square yard of leather to make a regular belt and 1 square yard of leather to make a deluxe belt. If the
leather company can use up to 40 square yards per week to construct belts, then
one constraint it may have is:
x1 + x2 ≤ 40
In the absence of degeneracy, the dual variable (say w1 ) will tell the fair price
we would pay for 1 extra yard of leather. Naturally, if this were not a binding
constraint, then w1 = 0 indicating that extra leather is worth nothing to us since
we already have a surplus of leather. ■
Simultaneously, suppose that m resources (leather, wood, time etc.) are used to
make these n products and that aij units of resource i are used to manufacture
product j. Then clearly our constraints will be:
ai1 x1 + · · · + ain xn ≤ bi (9.43)
where bi is the amount of resource i available to the company. Suppose now that the company decides to sell off some of its resources (instead of manufacturing products). Suppose we sell each resource at a price wi (i = 1, . . . , m); we'd like to know what a fair price for these resources would be. Each unit of product j not manufactured would result in a loss of profit of cj . At the same time, we would obtain a gain (from selling the excess resources) of:
∑_{i=1}^{m} aij wi    (9.44)
because we would save aij units of resource i by not manufacturing 1 unit of xj (i = 1, . . . , m). Selling these resources should require us to make more money in the sale of the resources than we could in manufacturing the product, or:
∑_{i=1}^{m} aij wi ≥ cj    (9.45)
If a selfish profit maximizer wishes to buy these resources, then he will seek prices that minimize the total he could pay for all the items, that is:

min ∑_{i=1}^{m} wi bi    (9.46)
The Strong Duality Theorem asserts that the optimal solution to this problem will produce fair shadow prices that force the total amount an individual would pay for the company's resources to equal the amount the company could make by manufacturing products itself.
■ Example 9.3 Assume that a leather company manufactures two types of belts:
regular and deluxe. Each belt requires 1 square yard of leather. A regular belt
requires 1 hour of skilled labor to produce, while a deluxe belt requires 2 hours of
labor. The leather company receives 40 square yards of leather each week and a
total of 60 hours of skilled labor is available. Each regular belt nets $3 in profit,
while each deluxe belt nets $5 in profit. The company wishes to maximize profit.
We can compute the fair price the company could sell its time or labor (or the
amount the company should be willing to pay to obtain more leather or more
hours).
The problem for the leather manufacturer is to solve the linear programming problem:

max 3x1 + 5x2
s.t. x1 + 2x2 ≤ 60    (labor)
     x1 + x2 ≤ 40    (leather)
     x1 , x2 ≥ 0

If we solve the primal problem, we obtain the final revised simplex tableau as:

         s1   s2   RHS
  z       2    1   160
  x1     −1    2    20
  x2      1   −1    20

Note that both x1 and x2 are in the basis. In this case, we have w1 = 2 and w2 = 1 from Row 0 of the tableau.
We can likewise solve the dual problem by converting it to standard form and then using the simplex algorithm. The dual problem is:

min 60w1 + 40w2
s.t. w1 + w2 ≥ 3
     2w1 + w2 ≥ 5
     w1 , w2 ≥ 0

In this case, it is more difficult to solve the dual problem because there is no conveniently obvious initial basic feasible solution (that is, the identity matrix is not embedded inside the coefficient matrix).
The final full simplex tableau for the dual problem would look like:

        z   w1   w2   v1    v2    RHS
  z     1    0    0   −20   −20   160
  w1    0    1    0    1    −1      2
  w2    0    0    1   −2     1      1
We notice two things. First, the reduced costs of v1 and v2 are precisely the negatives of the values of x1 and x2 . This was to be expected, since these variables are duals of each other; in a minimization problem, however, the reduced costs have opposite sign. The second thing to notice is that w1 = 2 and w2 = 1. These are the same values we determined in the primal simplex tableau.
Lastly, let’s see what happens if we increase the amount of leather available by 1
square yard. If w2 (the dual variable that corresponds to the leather constraint) is
truly a shadow price, then we should predict our profit will increase by 1 unit. Our
new problem will become:
max 3x1 + 5x2
s.t. x1 + 2x2 ≤ 60
x1 + x2 ≤ 41
x1 , x 2 ≥ 0
Solving this new problem yields an optimal profit of $161, an increase of exactly 1 unit. Thus, if our leather manufacturer could obtain leather for a price under $1 per yard, he would be a fool not to buy it, because he could make an immediate profit. This is what economists call thinking at the margin.
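This marginal argument is easy to check numerically. A sketch with scipy (an assumption of this illustration) solves the problem for 40 and then 41 square yards of leather:

from scipy.optimize import linprog

for leather in (40, 41):
    # max 3x1 + 5x2 s.t. x1 + 2x2 <= 60 (labor), x1 + x2 <= leather
    res = linprog(c=[-3, -5], A_ub=[[1, 2], [1, 1]], b_ub=[60, leather])
    print(leather, -res.fun)
# 40 160.0
# 41 161.0   -- profit rises by exactly w2 = 1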
■ Example 9.4
We can compute the dual variables w for this by using cB and B−1 at optimality.
You’ll notice that B−1 can always be found in the columns of the slack variables
for this problem because we would have begun the simplex algorithm with an
identity matrix in that position. We also know that cB = [7 6 0 0] at optimality.
Therefore, we can compute cB B−1 as:
cB B−1 = [ 7 6 0 0 ] [ 0  −2/5   0    4/5 ]  =  [ 0  7/5  0  16/5 ]
                     [ 0   7/10  0   −2/5 ]
                     [ 1   1/2   0   −2   ]
                     [ 0   2/5   1   −4/5 ]
In this case, it would seem that modifying the right-hand-side of constraint 1 would have no effect. This is true if we were to increase the value by an increment of 1. Suppose however we decreased the value of the right-hand-side by 1. Since we claim that:

∂z/∂b1 = w1    (9.47)
there should be no change to the optimal objective function value. However, our new optimal point would occur at x1 = 15.6 and x2 = 72.2 with an objective function value of 542.4, so clearly the value of the dual variable for constraint 1 is not a true representation of the shadow price of resource 1. Graphically, modifying the right-hand-side of Constraint 1 transforms the feasible region in a way that substantially changes the optimal solution. This is simply not detected because degeneracy in the primal problem leads to alternative optimal solutions in the dual problem.
It should be noted that a true marginal price can be computed; however, this is outside the scope of these notes. The reader is referred to [BJS04] (Chapter 6) for details. ■
■ Example 9.5 Consider the problem:

min x1 + 2x2
s.t. x1 + 2x2 ≥ 12
2x1 + 3x2 ≥ 20
x1 , x 2 ≥ 0
In this case there is no immediately obvious initial basic feasible solution and
we would have to solve a Phase I problem. Consider the dual of the original
maximization problem:
In this case, a reasonable initial basic feasible solution for the dual problem is
to set v1 = v2 = 1 and w1 = w2 = 0 (i.e., w1 and w2 are non-basic variables) and
proceed with the simplex algorithm from this point. ■
In cases like the one illustrated in Example 9.5, we can solve the dual problem
directly in the simplex tableau of the primal problem instead of forming the dual
problem and solving it as a primal problem in its own tableau. The resulting
algorithm is called the dual simplex algorithm.
For the sake of space, we will provide the dual simplex algorithm for a maximiza-
tion problem:
max cx
P s.t. Ax = b
x≥0
We will then show how to adjust the dual simplex algorithm for minimization problems.
6. If no entering variable can be selected (aki ≥ 0 for all k ∈ K) then the dual
problem is unbounded and the primal problem is infeasible. STOP.
7. Using a standard simplex pivot, pivot on element aji , thus causing xBi to
become 0 (and thus feasible) and causing xj to enter the basis. GOTO STEP
3.
The pivoting step works because we choose the entering variable specifically so that the reduced costs will remain non-negative. Just as we chose the leaving variable
in the standard simplex algorithm using a minimum ratio test to ensure that B−1 b
remains positive, here we use it to ensure that zj − cj remains non-negative for all
j ∈ J and thus we assure dual feasibility is maintained.
The convergence of the dual simplex algorithm is outside of the scope of this
course. However, it suffices to understand that we are essentially solving the dual
problem in the primal simplex tableau using the simplex algorithm applied to the
dual problem. Therefore under appropriate cycle prevention rules, the dual simplex
does in fact converge to the optimal (primal) solution.
Theorem 9.2 In the absence of degeneracy, or when using an appropriate cycling
prevention rule, the dual simplex algorithm converges and is correct.
max − x1 − x2
s.t. 2x1 + x2 ≥ 4
x1 + 2x2 ≥ 2
x1 , x 2 ≥ 0
max − x1 − x2
s.t. 2x1 + x2 − s1 = 4
x1 + 2x2 − s2 = 2
x1 , x 2 ≥ 0
In standard form, there is no clearly good choice for a starting basic feasible solution. However, since this is a maximization problem and we know that x1 , x2 ≥ 0, we know that the objective function −x1 − x2 must be bounded above by 0. A basic solution that yields this objective function value occurs when x1 and x2 are both non-basic and s1 and s2 are both basic.
If we let

B = [ −1   0 ]
    [  0  −1 ]

then x̄B = B−1 b = (−4, −2), which is not primal feasible.
Likewise we have:
w = cB B−1 = [ 0 0 ]
since both s1 and s2 do not appear in the objective function. We can compute the
reduced costs in this case to obtain:
z1 − c1 = wA·1 − c1 = 1 ≥ 0
z2 − c2 = wA·2 − c2 = 1 ≥ 0
z3 − c3 = wA·3 − c3 = 0 ≥ 0
z4 − c4 = wA·4 − c4 = 0 ≥ 0
Thus, the fact that w ≥ 0 and the fact that zj − cj ≥ 0 for all j, shows us that we
have a dual feasible solution and based on our use of a basic solution, we know
that complementary slackness is ensured.
We can now set up our initial simplex tableau for the dual simplex algorithm. This is given by:

       z   x1   x2   s1   s2   RHS
  z    1    1    1    0    0     0
  s1   0   −2   −1    1    0    −4
  s2   0   −1   −2    0    1    −2

We first choose a leaving variable corresponding to a negative element in the RHS; suppose we choose s2 . The dual simplex minimum ratio test then selects x2 as the entering variable, and pivoting on the −2 in the s2 row yields:

       z    x1   x2   s1    s2   RHS
  z    1   1/2    0    0   1/2    −1
  s1   0  −3/2    0    1  −1/2    −3
  x2   0   1/2    1    0  −1/2     1

At this point, we see we have maintained dual feasibility, but we still do not have primal feasibility. We can therefore choose a new leaving variable (s1 ) corresponding to the negative element in the RHS. The minimum ratio test shows that this time x1 will enter and the final simplex tableau will be:
       z   x1   x2   s1    s2   RHS
  z    1    0    0   1/3   1/3   −2
  x1   0    1    0  −2/3   1/3    2
  x2   0    0    1   1/3  −2/3    0
It’s clear this is the optimal solution to the problem since we’ve achieved primal
and dual feasibility and complementary slackness. It’s also worth noting that this
optimal solution is degenerate, since there is a zero in the right hand side. ■
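The final basis can be verified independently of the tableau arithmetic. A sketch (numpy assumed) recomputes the basic solution and the dual vector for the basis {x1 , x2 }:

import numpy as np

A = np.array([[2.0, 1.0, -1.0, 0.0],    # 2x1 +  x2 - s1 = 4
              [1.0, 2.0, 0.0, -1.0]])   #  x1 + 2x2 - s2 = 2
b = np.array([4.0, 2.0])
c = np.array([-1.0, -1.0, 0.0, 0.0])

Binv = np.linalg.inv(A[:, [0, 1]])
x_B = Binv @ b             # [2. 0.] -- primal feasible and degenerate
w = c[[0, 1]] @ Binv       # [-1/3, -1/3]
print(x_B, w, w @ b)       # objective value w b = -2, matching the tableau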
Exercise 9.6 Prove that the minimum ratio test given in the dual simplex algorithm
will maintain dual feasibility from one iteration of the simplex tableau to the next.
[Hint: Prove that the reduced costs remain greater than or equal to zero, just as
we proved that b remains positive for the standard simplex algorithm.] ■
min cT x
s.t. Ax = b    (P)
     ℓ ≤ x ≤ u
     x ∈ Rn ,
where we assume A ∈ Rm×n to be full row rank (we will see in the section “Starting
basis” how to make sure that this assumption holds). We first introduce the concept
of a basis:
• There are n variables xj for j ∈ {0, . . . , n − 1}.
• A basis of (P) is a partition of {0, . . . , n − 1} into three disjoint index subsets B, L and U, such that if B is the matrix formed by taking the columns of A indexed by B, then B is square and invertible.
Thus, we always have |B| = m, and there are at most (n choose m) different bases, possibly fewer than that since some of the combinations may yield a singular B matrix. Given a specific basis, we establish some notation:
• For all j ∈ B the variable xj is called a basic variable, and the corresponding
jth column of A is called a basic column.
• For all j ∈ L ∪ U the variable xj is called a nonbasic variable, and the corre-
sponding jth column of A is called a nonbasic column.
• By convention, the vector formed by taking together all the basic variables
is denoted xB . Similarly, cB , ℓB and uB are formed by taking together the
same indices of c, ℓ and u, respectively. The same notation is also used for the
indices in L and U, giving cL , cU , ℓL , ℓU , uL , and uU . We already defined B
as taking together the basic columns of A. The remaining (nonbasic) columns
form the submatrices L and U . Thus, there is a permutation of the columns
of A that is given by [B | L | U ]. For conciseness, we will write A = [B | L | U ],
although it is an abuse of notation.
• B, L and U are sets, so the order of the indices does not matter. However, it
must be consistent in the vectors defined above. For example, (i) cB1 must be
the objective function coefficient associated with the variable xB1 , (ii) ℓB1 and
uB1 must be the bounds on that same variable, and (iii) the first column of B
is the column of A that corresponds to that same variable.
The concept of basis is useful because of the following construction:
• We construct a solution x̄ of (P) as follows. Let us fix the components of x̄
in L or U at their lower or upper bound, respectively: x̄L = ℓL and x̄U = uU .
Given that these components are fixed, we can now compute the unique value
of x̄B such that the equality constraints Ax = b of (P) are satisfied. Indeed,
using the abuse of notation described earlier, we have
[B | L | U ] x̄ = b
B x̄B + Lx̄L + U x̄U = b
B x̄B + LℓL + U uU = b
B x̄B = b − LℓL − U uU
x̄B = B −1 (b − LℓL − U uU ) .
• c̄T := cT − cTB B −1 A are called the reduced costs corresponding to the basis
B, L, U. They have the property that c̄B = 0, so c̄ expresses the direction of the
objective function only in terms of the nonbasic variables.
• b̄ := B −1 b.
• Ā := B −1 A. If we use the partition Ā = [B̄ | L̄ | Ū ], we have that B̄ = B −1 B = I.
As a consequence, the tableau can be written with an identity matrix in the basic columns: Ā = [I | L̄ | Ū ], with right-hand side b̄ and reduced costs c̄.
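The basic-solution construction above translates directly into code. A minimal sketch (Python with numpy; the function name and argument conventions are choices of this illustration):

import numpy as np

def basic_solution(A, b, l, u, Bset, Lset, Uset):
    # Fix the nonbasic variables at their bounds, then solve for the
    # basic ones: x_B = B^-1 (b - L l_L - U u_U).
    x = np.empty(A.shape[1])
    x[Lset] = l[Lset]
    x[Uset] = u[Uset]
    rhs = b - A[:, Lset] @ x[Lset] - A[:, Uset] @ x[Uset]
    x[Bset] = np.linalg.solve(A[:, Bset], rhs)
    return x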
• A basis of (D) is a partition of the indices of its m + 2n variables (π, λ and µ). However, (D) has a special structure with two identity matrices in the constraints and no bounds on π. This yields the following characterization of the bases of (D).
• A basis of (D) needs n basic variables, as many as there are equality constraints
in (D). The π variables do not have bounds, so they are always basic. We now
need to select n − m basic variables among the λ and µ.
• For any given j, the variables λj and µj cannot be both basic, because if they
were, the basis matrix would contain twice the same identity column, and thus
would not be invertible (see Figure 10.1). So we have the following: Consider
the n possible indices j ∈ {0, . . . , n − 1}. For n − m of them, either λj or µj is
basic (but not both). For m of them, neither λj nor µj is basic.
Figure 10.1: The constraint matrix [AT | I | I] of (D): for any j, the columns of λj and µj are the same identity column ej , so the two cannot both be basic.
• Let us consider one of the latter values of j, i.e., one of the m values of j such that neither λj nor µj is basic. Then, the only nonzeros in the jth row of the basis matrix come from the jth row of AT (Figure 10.2). Now, considering all m such rows together: for the basis matrix to be invertible, the corresponding m rows of AT (equivalently, m columns of A) must be linearly independent, and they form the transpose of a basis matrix B of (P).

Figure 10.2: The basis matrix of (D): the rows indexed by the m values of j for which neither λj nor µj is basic contain only entries of AT .
Setting the nonbasic λ and µ variables to zero, the constraints of (D) become:

BT π = cB
LT π + λL = cL
UT π + µU = cU
π free,  λ ≥ 0,  µ ≤ 0.
The π variables are all basic and their values can be computed directly as π̄ T = cTB B−1 . Then, the basic λ variables have values λ̄TL = cTL − π T L = cTL − cTB B−1 L and the basic µ variables have values µ̄TU = cTU − π T U = cTU − cTB B−1 U. For the basic solution (π̄ T , λ̄T , µ̄T ) to be feasible in (D), we need λ̄ ≥ 0 and µ̄ ≤ 0. The basis is then called dual feasible. Let c̄ be the reduced costs in the corresponding primal tableau, i.e., c̄T := cT − cTB B−1 A. It is easy to verify that (π̄ T , λ̄T , µ̄T ) is feasible if and only if c̄j ≥ 0 for all j ∈ L and c̄j ≤ 0 for all j ∈ U. Observe that these are the optimality conditions of the simplex method on the primal (P).
• To derive the reduced costs of (D) for a given basis B, L, U, we need to express the objective function in terms of λB , λU , µB , µL only (the nonbasic variables). Let us write a partitioned version of (D) again, this time without discarding the nonbasic variables. This gives us
π = B −T (cB − λB − µB ) = (B −T cB ) − B −T (λB + µB )
λL = cL − LT π − µL = (cL − LT B −T cB ) + LT B −T (λB + µB ) − µL
µU = cU − U T π − λU = (cU − U T B −T cB ) + U T B −T (λB + µB ) − λU
where the first term of each right-hand side is constant and can be ignored in
an objective function. After rewriting the objective function and simplifying
the result, we get
min (x̄B − ℓB )λB + (uU − ℓU )λU − (uB − x̄B )µB − (uL − ℓL )µL
Consider the ith row of the current tableau (the row of the leaving variable), with entries āik , where āie , āig > 0 and āif , āih < 0. The four indices e, f, g, h represent the four possible configurations: variable at upper or lower bound, and āik positive or negative. We only use the notation e, f, g, h for simplicity: there can be zero or more than one variable in each configuration. All variables in one given configuration are treated similarly.
Any āik = 0 can be ignored. They do not interfere with the computations below,
and it can be shown that the B ′ matrix of the next iteration will be invertible if and
only if we do not consider the corresponding columns as candidate entering columns.
Since e, f ∈ L and g, h ∈ U, we currently have c̄e , c̄f ≥ 0 and c̄g , c̄h ≤ 0.
• If xj leaves to its lower bound, we will need c̄′j ≥ 0 at the next iteration,
while maintaining zero reduced costs for all other indices in B. Any such new
objective function can be achieved by adding a nonnegative multiple t of the
ith row of the tableau to the current objective function. The multiplier t will
be called the dual step length.
- We know that c̄e will become c̄′e = c̄e + t āie , which is guaranteed to always
meet c̄′e ≥ 0 because āie > 0.
- Instead, since āif < 0, we will have c̄′f = c̄f + t āif ≥ 0 if and only if t ≤
c̄f /(−āif ).
- For c̄′g = c̄g + t āig ≤ 0, we need t ≤ −c̄g /āig .
- Finally, c̄′h = c̄h + t āih ≤ 0 is guaranteed to always be met.
• If xj leaves to its upper bound, we will need c̄′j ≤ 0 at the next iteration,
while maintaining zero reduced costs for all other indices in B. Any such new
objective function can be achieved by subtracting a nonnegative multiple t of
the ith row of the tableau to the current objective function.
- The condition c̄′e = c̄e − t āie ≥ 0 requires t ≤ c̄e /āie .
- The condition c̄′f = c̄f − t āif ≥ 0 is always satisfied.
- The condition c̄′g = c̄g − t āig ≤ 0 is always satisfied.
- The condition c̄′h = c̄h − t āih ≤ 0 requires t ≤ (−c̄h )/(−āih ).
If the signs of the c̄k and āik coefficients are such that no conditions are imposed
on t, it can be shown that (D) is unbounded, which corresponds to (P) being infeasible
(note that, because of the finite bounds ℓ and u, (P) is never unbounded).
Each of the above conditions defines an upper bound tk on t, i.e., t ≤ tk for all k ∈ L ∪ U. The most restrictive condition can be selected by computing t = mink tk . If k∗ is a value of k that yields the minimum, we will have c̄′k∗ = 0 and k∗ can be our entering variable, i.e., we can set B′ := B \ {j} ∪ {k∗ }. Finding k∗ is called the ratio test.
Figure 10.3 summarizes how to compute tk depending on the signs of āik and c̄k .
  Leaving to lower bound, j ∈ L′ (x̄j < ℓj ):
    k ∈ L, āik > 0:  no bound on t
    k ∈ L, āik < 0:  tk = c̄k /(−āik )
    k ∈ U, āik > 0:  tk = (−c̄k )/āik
    k ∈ U, āik < 0:  no bound on t
  Leaving to upper bound, j ∈ U′ (x̄j > uj ):
    k ∈ L, āik > 0:  tk = c̄k /āik
    k ∈ L, āik < 0:  no bound on t
    k ∈ U, āik > 0:  no bound on t
    k ∈ U, āik < 0:  tk = (−c̄k )/(−āik )

Figure 10.3: Computing the upper bounds tk on the dual step length t in the ratio test.
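A direct transcription of Figure 10.3 into code may help; the following sketch uses names of this illustration's own choosing (cbar for c̄, abar_i for the ith row of B−1 A, inL[k] for membership in L, to_lower for the case x̄j < ℓj ):

import numpy as np

def ratio_test(cbar, abar_i, inL, candidates, to_lower, tol=1e-9):
    t_best, k_best = np.inf, None
    for k in candidates:                   # k ranges over L and U
        a = abar_i[k] if to_lower else -abar_i[k]
        if inL[k] and a < -tol:            # keep c'_k = cbar_k + t*a >= 0
            t_k = cbar[k] / (-a)
        elif (not inL[k]) and a > tol:     # keep c'_k = cbar_k + t*a <= 0
            t_k = (-cbar[k]) / a
        else:
            continue                       # this k imposes no bound on t
        if t_k < t_best:
            t_best, k_best = t_k, k
    return k_best, t_best    # k_best is None: (D) unbounded, (P) infeasible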
Starting basis. Before we can apply the dual simplex method, we need to have a
dual feasible basis. First, this means that we need a set of column indices B such
that B is invertible. A simple way to obtain that is to add m artificial variables z
fixed to zero, as demonstrated in (P+):
min cT x + 0T z
s.t. Ax + Iz = b
ℓ≤x≤u (P+)
0≤z≤0
x ∈ Rn , z ∈ Rm
We can do that as a very first step before starting the dual simplex method. Then, it
is easier to let n := n + m, cT := [cT 0T ], ℓT := [ℓT 0T ], uT := [uT 0T ] and A := [A I],
so that you can forget about the z variables and have a problem of the form (P),
but with the guarantee that the last m columns of A form an identity matrix (which
is invertible: I −1 = I). Note that having an m × m identity in A also ensures that A
is full row rank.
Once we have B, it is straightforward to construct L and U such that B, L, U is
dual feasible. Having B is enough to compute the reduced costs c̄T = cT − cTB B −1 A.
For all j ∉ B, we can assign j to L if c̄j ≥ 0 or to U if c̄j ≤ 0. This way, c̄j will always
have the appropriate sign to ensure dual feasibility.
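The starting-basis recipe can be written out as follows (again a sketch with numpy; the helper name and the convention of appending the artificial columns last are choices of this illustration):

import numpy as np

def starting_basis(A, c):
    m, n = A.shape
    A_plus = np.hstack([A, np.eye(m)])          # A := [A | I]
    c_plus = np.concatenate([c, np.zeros(m)])   # c := [c 0]
    Bset = list(range(n, n + m))                # artificial columns are basic
    # c_B = 0 here, so cbar = c_plus, but we keep the general formula:
    cbar = c_plus - c_plus[Bset] @ np.linalg.inv(A_plus[:, Bset]) @ A_plus
    Lset = [j for j in range(n) if cbar[j] >= 0]
    Uset = [j for j in range(n) if cbar[j] < 0]
    return A_plus, c_plus, Bset, Lset, Uset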
Summary. We can now give, in Figure 10.4, a precise description of the operations
in the dual simplex method with bounds. We can also make a few observations that will prove useful in implementing the dual simplex method.
At Step 1, in most cases, there will be multiple candidate values of i such that
x̄Bi violates its bounds. Choosing one to become the leaving variable is called a
pricing rule. In theory, any candidate would work, but in practice it is a good idea
to choose a candidate with a large bound violation, for example one with the largest
violation.
There are a few useful invariants in the dual simplex method that we can use to
verify that our implementation is working as intended. First, we have the matrix
B formed with the columns of A with indices in B. This matrix must always
stay invertible. If B becomes singular, then the ratio test is not working properly.
Specifically, we are choosing an entering variable k such that the tableau element āik
is zero. Second, there is dual feasibility. We must always have c̄j ≥ 0 for all j ∈ L
and c̄j ≤ 0 for all j ∈ U. If we lose dual feasibility, it also means that the ratio test
is not working. In this case, we chose a wrong value for tk , not actually mink {tk },
something larger.
Finally, recall that at any given iteration of the simplex method, we can compute
the corresponding basic solution by letting x̄B = B −1 (b − LℓL − U uU ), x̄L = ℓL and
x̄U = uU . In the dual simplex method, x̄ will not be feasible (until the last iteration,
at which point we stop). However, we can still compute the corresponding dual objective function value: z̄ = cT x̄. As the dual simplex method makes progress, this objective should be nondecreasing: from one iteration to the next, it either stays the same (when t = 0) or increases. If z̄ decreases, it means that we made a mistake in the choice of the leaving variable.
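These invariants are cheap to assert in an implementation. A small sketch (function names are this illustration's own; cbar, c and x_bar are assumed to be numpy arrays):

def check_invariants(cbar, Lset, Uset, tol=1e-7):
    # Dual feasibility must hold at every iteration.
    assert all(cbar[j] >= -tol for j in Lset), 'lost dual feasibility on L'
    assert all(cbar[j] <= tol for j in Uset), 'lost dual feasibility on U'

def objective_value(c, x_bar):
    # z_bar = c . x_bar should be nondecreasing from iteration to iteration.
    return float(c @ x_bar)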
Exercise 1
Consider the problem
max 2x1 + 3x2
s.t. 2x1 + x2 ≤ 10
x1 + 2x2 ≤ 10
x1 + x2 ≤ 6
x1 , x 2 ≥ 0
a)
Write down a matrix A and vectors b and c so that the problem can be written on
the form
max cT x
s.t. Ax ≤ b
x≥0
b)
Sketch the feasible region of the problem.
c)
Solve the problem using the Simplex method.
d)
We consider the same constraints as in a), but change the objective function to
2x1 + 2x2 . Apply the simplex method again to find all optimal solutions to this
modified problem.
Exercise 2
Find any optimal solution to the problem
max x1 + x2 + x3
s.t. 2x1 − 2x2 + x3 ≤ 4
3x1 − x2 + 2x3 ≤ 2
x1 , x 2 , x 3 ≥ 0
Exercise 3
In the field of compressive sensing one attempts to recover an unknown vector from
an underdetermined set of (linear) measurements, i.e., find an unknown x ∈ RN that
satisfies Ax = p, where
• p ∈ Rm is the vector of measurements, and
• A is the m × N matrix which collects those measurements.
In practical applications m is much smaller than N , and we can’t expect to recover
x in general. But if we have some additional information about x, it turns out that
the knowledge of the measurements in p may still be enough to recover x. The
additional information we will consider is sparsity (a vector is called sparse if it has
mostly components that are zero): For many “magic” matrices A, if it is known that
x is sparse, x can be recovered as the optimal solution to the problem
min ∥x∥1
(10.1)
s.t. Ax = p,
where ∥x∥1 = |x1 | + |x2 | + . . . + |xN |. Note that the variable x here is unconstrained:
It is not required to be non-negative. In the following we will test if this procedure
works for a very small vector and matrix.
a)
Show that x is an optimal solution to (10.1) if and only if it is an optimal solution
to
max ∑i (−xi⁺ − xi⁻)

s.t. [  A  −A ] [ x⁺ ]  ≤  [  p ]    (10.2)
     [ −A   A ] [ x⁻ ]     [ −p ]

x⁺ , x⁻ ≥ 0
science to recover a vector with 3 components from two measurements. But the
magic is that solving the problem (10.1) also can help recover sparse vectors x when
N is very large and m is very small compared to N !
b)
Solve (10.2), with A and p as given above, using the simplex method. Is the correct
x recovered? Is the optimum unique?
To get started, you can use that the primal dictionary is
Figure 10.4: The dual simplex method with bounds.

Initialization
Add m variables to the problem, fixed to zero by their bounds.
From now on, only consider the enlarged problem:
n := n + m, cT := [cT 0T ], ℓT := [ℓT 0T ], uT := [uT 0T ] and A := [A I],
where 0T is a row vector of size m with all components set to zero.
Build the starting basis:
Set B := {n − m, . . . , n − 1} (the indices of the artificial columns after the renaming above).
Form the corresponding basis matrix B.
Compute c̄T = cT − cTB B −1 A.
For all j ∈ {0, . . . , n − 1} with j ∉ B,
if c̄j > 0, set j ∈ L,
if c̄j < 0, set j ∈ U,
if c̄j = 0, we can arbitrarily select either j ∈ L or j ∈ U.
Step 1 (leaving variable)
Form the basis matrix B (from the columns of A indexed by B).
Compute c̄T = cT − cTB B −1 A.
Compute x̄B = B −1 (b − LℓL − U uU ).
Find a component i of xB such that either x̄Bi < ℓBi or x̄Bi > uBi .
If no such i exists, we reached optimality. Stop.
Let j be such that xj corresponds to xBi .
Step 2 (entering variable)
Compute the ith row of B −1 A.
Perform the ratio test: compute k = arg mink∈L∪U {tk }, where tk is defined as in Figure 10.3.
If there is no bound tk , the problem is infeasible. Stop.
Step 3 (pivoting)
Leaving variable:
B := B \ {j}
If x̄Bi < ℓBi , then L := L ∪ {j}.
If x̄Bi > uBi , then U := U ∪ {j}.
Entering variable:
If k ∈ L, then L := L \ {k}.
If k ∈ U, then U := U \ {k}.
B := B ∪ {k}
Go to Step 1.