
Linear Optimization

Principles, Techniques, and Applications in Science and


Engineering

Teklebirhan Abraha Gebrehiwot

Aksum University

Department of Mathematics
Copyright © 2024 Teklebirhan Abraha Gebrehiwot

Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (the “License”). You may not use this file except in compliance with the License. You may obtain a copy of the License at https://creativecommons.org/licenses/by-nc-sa/4.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “as is” basis, without warranties or conditions of any kind, either express or implied. See the License for the specific language governing permissions and limitations under the License.

First edition, July 2024


Contents

1 Mathematical Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.1 Linear Algebra Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2 Convex Sets and Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.3 Polyhedral Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.6 Introduction to Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.6.1 Dot Product and Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.7 Basics of Convex Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.7.1 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.7.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.7.3 Convex Sets, Functions, Cones, and Polyhedral Theory . . . . . . . . . . . . . . . . . . . 49
1.7.4 Extreme Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

2 Introduction to Linear Optimization . . . . . . . . . . . . . . . . . . . . . . 69


2.1 Overview of Linear Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.2 Historical Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.3 Importance in Various Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.4 A General Maximization Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.5 Some Geometry for Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.6 Gradients, Constraints and Optimization . . . . . . . . . . . . . . . . . . . . . . . . 80
2.7 Linear programs and optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.7.1 The feasible region . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.7.2 The objective function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
2.8 The naive approach to solving linear programs . . . . . . . . . . . . . . . . . . . 84
2.8.1 Misbehaving linear programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

2.9 What can a linear program model? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86


2.10 Different formulations of linear programs . . . . . . . . . . . . . . . . . . . . . . . . 87
2.10.1 Nonnegativity constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
2.10.2 Equations and inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.11 Systems of linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
2.12 Infinitely many solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.13 Terminology and notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.14 Choosing a different basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.14.1 Solving the problem from scratch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.14.2 Modify an existing solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.14.3 Multiply by an inverse matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
2.15 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
2.16 Challenge problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
2.17 From linear algebra back to linear programming . . . . . . . . . . . . . . . . . . 96
2.18 An example of pivoting in the simplex method . . . . . . . . . . . . . . . . . . . 97
2.18.1 Step 1: a basic solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.18.2 Step 1, again: a basic feasible solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.18.3 Step 2: pivoting (intuitively) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
2.18.4 Step 3: pivoting (algebraically) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
2.18.5 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
2.19 Introducing objective functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.20 The dictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
2.21 Using the simplex method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.21.1 The first pivoting step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.21.2 How do we make progress? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
2.21.3 One more pivot step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
2.21.4 The end of the simplex method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.22 Optional: dictionaries and tableaux . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
2.23 An example with nothing weird going on . . . . . . . . . . . . . . . . . . . . . . . 106
2.24 An unbounded linear program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
2.25 An example of degenerate pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
2.26 The problem with initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.26.1 A tricky example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.26.2 The problem in general . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.27 The two-phase method for inequalities . . . . . . . . . . . . . . . . . . . . . . . . . 112
2.28 The general two-phase simplex method . . . . . . . . . . . . . . . . . . . . . . . . 113
2.29 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
2.30 A very degenerate problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
2.31 Pivoting rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
2.32 Lexicographic pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
2.32.1 Intuition: random perturbations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
2.32.2 Actual lexicographic pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

2.32.3 Working through an example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119


2.32.4 Shortcuts (optional) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
2.33 Matrix calculations for the dictionary . . . . . . . . . . . . . . . . . . . . . . . . . . 121
2.34 Definitions of corner points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
2.35 Relationships between the definitions . . . . . . . . . . . . . . . . . . . . . . . . . . 124
2.35.1 From basic feasible solutions to vertices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
2.35.2 From vertices to extreme points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
2.35.3 From extreme points to basic feasible solutions . . . . . . . . . . . . . . . . . . . . . . . . . 125
2.36 Calculating the reduced costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.37 The revised simplex method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
2.37.1 Finding a basic solution using matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
2.37.2 Being careful about what we compute . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
2.37.3 A summary of the revised simplex method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2.37.4 One more pivot step . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
2.38 Lessons learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
2.39 The terrible trajectory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
2.40 Tricking Bland’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
2.41 The Klee–Minty cube . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
2.42 Closing remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.42.1 Other pivoting rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.42.2 The average case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.43 An example of duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.44 Weak duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
2.45 Duals of other kinds of programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
2.46 Strong duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
2.46.1 A stronger theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
2.46.2 Examples with infeasible primal and dual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
2.47 A solution we suspect to be optimal . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.47.1 A shipping problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.47.2 Taking the dual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
2.47.3 An example of complementary slackness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
2.48 Complementary slackness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
2.49 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
2.50 The primal LP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
2.51 The dual LP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
2.52 Finding the dual solution from the dictionary . . . . . . . . . . . . . . . . . . . 147
2.52.1 A general formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
2.52.2 Strong duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
2.52.3 A special case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
2.52.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

2.53 The dual simplex method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150


2.54 Example: another look at a terrible cube . . . . . . . . . . . . . . . . . . . . . . . 152
2.55 The two-phase dual simplex method . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
2.55.1 The plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
2.55.2 Solving the example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
2.55.3 Problems in equational form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
2.56 Warm starts and row generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
2.57 Sensitivity analysis of the costs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
2.57.1 Intuition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
2.57.2 What we can compute . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
2.57.3 Ranging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
2.57.4 Nonbasic variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
2.58 Sensitivity analysis of the constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 160
2.58.1 The dual variables as “shadow costs” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
2.58.2 Ranging with slack variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
2.59 Introduction to games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
2.59.1 Matrix games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
2.59.2 Zero-sum games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
2.60 Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.60.1 Dominated strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.60.2 Saddle points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
2.60.3 Mixed strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
2.61 The maximin strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
2.61.1 Alice’s plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
2.61.2 Writing down a linear program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
2.61.3 Example: the odd-even game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
2.62 Zero-sum games and duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
2.63 Solving the linear program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
2.63.1 Complementary slackness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
2.63.2 Simplifying the linear program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
2.64 Example: a fruit-shipping problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
2.65 Maximum flow problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
2.66 Cuts and their capacities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
2.67 Cuts and linear programming duality . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
2.68 Greedily increasing flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
2.69 An augmenting path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
2.70 The residual graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
2.71 Residual graphs and minimum cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
2.72 Proof of Theorem 2.17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
2.73 The Ford–Fulkerson algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
2.74 Consequences of the Ford–Fulkerson method . . . . . . . . . . . . . . . . . . . 182
2.75 Totally unimodular matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

2.76 Some applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185


2.76.1 Graph factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
2.76.2 Consistent matrix rounding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
2.77 Transversals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
2.78 Hall’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
2.79 Vertex covers and matchings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
2.80 Integer linear programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
2.81 Logical constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
2.81.1 Sudoku . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
2.81.2 Boolean satisfiability problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
2.81.3 Tying together logical and linear constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
2.82 Sudoku solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
2.83 The bin packing problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
2.83.1 Bin packing by planning each trip . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
2.83.2 The configuration linear program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
2.84 The branch-and-bound method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
2.84.1 Branching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
2.84.2 Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
2.84.3 Why symmetry is bad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
2.85 Cutting planes in general . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
2.86 The Gomory fractional cut . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
2.86.1 The general rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
2.86.2 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
2.87 Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
2.88 The traveling salesman problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
2.89 An incomplete formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
2.90 Subtour elimination constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
2.90.1 Solution #1: the DFJ constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
2.90.2 Solution #2: MTZ constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
2.91 Approximation algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
2.92 Review of Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
2.93 Definition of Terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
2.93.1 Solutions of linear programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
2.93.2 Simplex method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
2.94 Introduction to Linear optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
2.94.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
2.94.2 Linear Programming Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
2.95 Basic feasible solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
2.96 Fundamental Theorem of Linear Programming . . . . . . . . . . . . . . . . . . 250
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252

Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
2.97 Farkas’ Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
2.98 Complementary slackness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
2.99 Farkas’ Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
2.100 Solving linear programming problems . . . . . . . . . . . . . . . . . . . . . . . . . . 263
2.101 Graphical example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
2.102 The Geometry of LP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
2.103 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
2.104 The Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
2.105 When is a Linear Program Feasible? . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
2.106 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
2.106.1 Size of the Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
2.107 Complexity of linear programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
2.108 Solving a Linear Program in Polynomial Time . . . . . . . . . . . . . . . . . . . 304
2.108.1 Ye’s Interior Point Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
2.109 Description of Ye’s Interior Point Algorithm . . . . . . . . . . . . . . . . . . . . 310
2.110 Analysis of the Potential Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
2.111 Bit Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
2.112 Transformation for the Interior Point Algorithm . . . . . . . . . . . . . . . . . 317
2.113 Modeling: Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
2.114 Modeling and Assumptions in Linear Programming . . . . . . . . . . . . . . 321
2.114.1 General models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
2.114.2 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
2.114.3 Knapsack Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
2.114.4 Capital Investment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
2.114.5 Work Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
2.114.6 Assignment Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
2.114.7 Multi-period Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
2.114.8 Mixing Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
2.114.9 Financial Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
2.114.10 Network Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
2.114.11 Multi-Commodity Network Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
2.115 Modeling Tricks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
2.115.1 Maximizing a minimum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337

3 Geometry of Linear Optimization . . . . . . . . . . . . . . . . . . . . . . . . 339


3.1 Geometric Interpretation of Linear Programs . . . . . . . . . . . . . . . . . . . 339
3.2 Feasible Regions and Vertices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
3.3 Optimality and Boundedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
3.4 Solving systems of linear inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
3.4.1 Fourier–Motzkin Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 344
3.5 Using an LP solver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
3.6 CPLEX LP file format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
3.7 SoPlex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
3.8 NEOS server for optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347

4 Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349


4.1 Introduction to the Simplex Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 349
4.2 Tableau Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
4.3 Pivot Operations and Computational Aspects . . . . . . . . . . . . . . . . . . . 349
4.4 Variants of the Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
4.5 Nonempty and Bounded Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
4.6 Infinitely Many Optimal Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
4.7 Problems with No Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
4.8 Problems with Unbounded Feasible Regions . . . . . . . . . . . . . . . . . . . . 359
4.9 Formal Mathematical Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364

5 Duality Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369


5.1 Duality Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
5.2 Primal-Dual Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
5.3 Economic Interpretation of Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
5.4 Dual Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
5.5 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
5.6 Special Matrices and Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
5.7 Matrices and Linear Programming Expression . . . . . . . . . . . . . . . . . . . 373
5.8 Gauss-Jordan Elimination and Solution to Linear Equations . . . . . . 376
5.9 Matrix Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
5.10 Solution of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
5.11 Linear Combinations, Span, Linear Independence . . . . . . . . . . . . . . . . 383
5.12 Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386

5.13 Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389


5.14 Solving Systems with More Variables than Equations . . . . . . . . . . . . 391
5.15 Solving Linear Programs with Matlab . . . . . . . . . . . . . . . . . . . . . . . . . . 393
5.16 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
5.16.1 Linear Programming General Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
5.17 Standard Form of LP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
5.18 Structure of Feasible Solutions for LP . . . . . . . . . . . . . . . . . . . . . . . . . . 400
5.18.1 Example of F in two dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
5.19 How to Find an Optimum Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
5.19.1 Possibilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
5.19.2 *Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
5.19.3 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
5.20 Solving Systems of Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
5.20.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
5.20.2 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
5.21 Looking Ahead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402

6 Interior-Point Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403


6.1 Basics of Interior-Point Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
6.2 Barrier Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
6.3 Path-Following Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
6.4 Comparison with the Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . 403
6.5 Linear Programming and Extreme Points . . . . . . . . . . . . . . . . . . . . . . . 403
6.6 Algorithmic Characterization of Extreme Points . . . . . . . . . . . . . . . . . 405
6.7 The Simplex Algorithm–Algebraic Form . . . . . . . . . . . . . . . . . . . . . . . . 406
6.8 Simplex Method–Tableau Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
6.9 Identifying Unboundedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
6.10 Identifying Alternative Optimal Solutions . . . . . . . . . . . . . . . . . . . . . . . 420
6.11 Degeneracy and Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
6.11.1 The Simplex Algorithm and Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
6.12 Simplex Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
6.13 Artificial Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
6.14 The Two-Phase Simplex Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
6.14.1 Case I: xa = 0 and is out of the basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
6.14.2 Case II: xa = 0 and is not out of the basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
6.15 The Big-M Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
6.16 The Single Artificial Variable Technique . . . . . . . . . . . . . . . . . . . . . . . . 439
6.17 Problems that Can’t be Initialized by Hand . . . . . . . . . . . . . . . . . . . . . 441

7 Advanced Topics In Linear Optimization . . . . . . . . . . . . . . . . 447


7.1 Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
7.2 Large-Scale Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
7.3 Decomposition Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
7.4 Multi-Objective Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
7.5 Degeneracy Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
7.6 The Lexicographic Minimum Ratio Leaving Variable Rule . . . . . . . . 450
7.6.1 Lexicographic Minimum Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
7.6.2 Convergence of the Simplex Algorithm Under Lexicographic Minimum Ratio Test 453
7.7 Bland’s Rule, Entering Variable Rules and Other Considerations . . 455

8 Applications in Science and Engineering . . . . . . . . . . . . . . . . . 457


8.1 Engineering Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.1.1 Network Flow Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.1.2 Optimal Design and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.1.3 Resource Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.2 Operations Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.2.1 Transportation and Assignment Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.2.2 Production Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.2.3 Scheduling Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.3 Economics and Finance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.3.1 Portfolio Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.3.2 Market Equilibrium Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.3.3 Game Theory and Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.4 Data Science and Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.4.1 Linear Regression and Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.4.2 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.4.3 Data Fitting and Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.5 Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.5.1 Real-World Applications and Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.5.2 Success Stories and Lessons Learned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.6 The Revised Simplex Method and Optimality Conditions . . . . . . . . . 458
8.7 The Revised Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
8.8 Farkas’ Lemma and Theorems of the Alternative . . . . . . . . . . . . . . . . 463
8.8.1 Geometry of Farkas’ Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
8.8.2 Theorems of the Alternative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
8.9 The Karush-Kuhn-Tucker Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
8.10 Relating the KKT Conditions to the Tableau . . . . . . . . . . . . . . . . . . . 475

9 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
9.1 The Dual Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
9.2 Weak Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485

9.3 Strong Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486


9.4 Geometry of the Dual Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490
9.5 Economic Interpretation of the Dual Problem . . . . . . . . . . . . . . . . . . . 492
9.6 The Dual Simplex Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497

10 More LP Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503


List of Figures

1.1 A convex combination of the points x and y is given by z = λx + (1 − λ)y with any
λ ∈ [0, 1]. Here we demonstrate this using λ = 2/3. . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.3 The green intersection of the convex sets that are the ball and the polytope is also convex.
This can be seen by considering any points x, y ∈ Ball ∩ Polytope. Since Ball is convex, the line
segment between x and y is completely contained in Ball. And similarly, the line segment is
completely contained in Polytope. Hence, the line segment is also contained in the intersection.
This is how we can reason that the intersection is also convex. . . . . . . . . . . . . . . . . . . . . 46
1.4 Comparison of Convex and Non-Convex Functions. . . . . . . . . . . . . . . . . . . . . . . . . 47
1.5 Convex Functions f (x, y) = x^2 + y^2 + x, f (x, y) = e^(x+y) + e^(x−y) + e^(−x−y) , and
f (x, y) = x^2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.6 Examples of Convex Sets: The set on the left (an ellipse and its interior) is a convex set;
every pair of points inside the ellipse can be connected by a line contained entirely in the ellipse.
The set on the right is clearly not convex as we’ve illustrated two points whose connecting line
is not contained inside the set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
1.7 A convex function: A convex function satisfies the expression f (λx1 + (1 − λ)x2 ) ≤
λf (x1 ) + (1 − λ)f (x2 ) for all x1 and x2 and λ ∈ [0, 1]. . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.8 A hyperplane in 3 dimensional space: A hyperplane is the set of points satisfying an
equation aT x = b, where b is a constant in R and a is a constant vector in Rn and x is a
variable vector in Rn . The equation is written as a matrix multiplication using our assumption
that all vectors are column vectors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.9 Two half-spaces defined by a hyper-plane: A half-space is so named because any hyper-
plane divides Rn (the space in which it resides) into two halves, the side “on top” and the side
“on the bottom.” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.10 A Ray: The points in the graph shown in this figure are in the set produced using the
expression x0 + dλ where x0 = [2, 1]T and d = [2, 2]T and λ ≥ 0. . . . . . . . . . . . . . . . . . . 55
1.11 Convex Direction: Clearly every point in the convex set (shown in blue) can be the
vertex for a ray with direction [1, 0]T contained entirely in the convex set. Thus [1, 0]T is a
direction of this convex set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
1.12 An Unbounded Polyhedral Set: This unbounded polyhedral set has many directions.
One direction is [0, 1]T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

1.13 Boundary Point: A boundary point of a (convex) set C is a point in the set so that for
every ball of any radius centered at the point contains some points inside C and some points
outside C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
1.14 A Polyhedral Set: This polyhedral set is defined by five half-spaces and has a single
degenerate extreme point located at the intersection of the binding constraints 3x1 + x2 ≤ 120,
x1 + 2x2 ≤ 160 and (28/16) x1 + x2 ≤ 100. All faces are shown in bold. . . . . . . . . . . . 62
1.15 Visualization of the set D: This set really consists of the set of points on the red line.
This is the line where d1 + d2 = 1 and all other constraints hold. This line has two extreme
points (0, 1) and (1/2, 1/2). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
1.16 The Carathéodory Characterization Theorem: Extreme points and extreme directions are
used to express points in a bounded and unbounded set. . . . . . . . . . . . . . . . . . . . . . . . . 68

2.1 Goat pen with unknown side lengths. The objective is to identify the values of x and y
that maximize the area of the pen (and thus the number of goats that can be kept). . . . 70
2.2 Plot with Level Sets Projected on the Graph of z. The level sets existing in R2 while the
graph of z existing R3 . The level sets have been projected onto their appropriate heights on
the graph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.3 Contour Plot of z = x2 + y 2 . The circles in R2 are the level sets of the function. The
lighter the circle hue, the higher the value of c that defines the level set. . . . . . . . . . . . . 74
2.4 A Line Function: The points in the graph shown in this figure are in the set produced
using the expression x0 + vt where x0 = (2, 1) and v = (2, 2). . . . . . . . . . . . . . . . . . 75
2.5 A Level Curve Plot with Gradient Vector: We’ve scaled the gradient vector in this case to
make the picture understandable. Note that the gradient is perpendicular to the level set curve
at the point (1, 1), where the gradient was evaluated. You can also note that the gradient is
pointing in the direction of steepest ascent of z(x, y). . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.6 Level Curves and Feasible Region: At optimality the level curve of the objective function
is tangent to the binding constraints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.7 Gradients of the Binding Constraint and Objective: At optimality the gradient of the
binding constraints and the objective function are scaled versions of each other. . . . . . . . 81
2.8 Goat pen with unknown side lengths. The objective is to identify the values of x and y
that maximize the area of the pen (and thus the number of goats that can be kept). . . 222
2.9 Graph representing primal in example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
2.10 A polyhedron with no vertex. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
2.11 Traversing the vertices of a convex body (here a polyhedron in R3 ). . . . . . . . . . . 293
2.13 The Projection Theorem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
2.12 Examples of convex and non-convex sets in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . 295
2.14 Exploring the interior of a convex body. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
2.15 A centering mapping. If x is close to the boundary, we map the polyhedron P onto
another one P ′ , s.t. the image x′ of x is closer to the center of P ′ . . . . . . . . . . . . . . . . 306
2.16 Null space of A and gradient direction g. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

4.1 Feasible Region and Level Curves of the Objective Function: The shaded region in the
plot is the feasible region and represents the intersection of the five inequalities constraining
the values of x1 and x2 . On the right, we see the optimal solution is the “last” point in the
feasible region that intersects a level set as we move in the direction of increasing profit. 352

4.2 An example of infinitely many alternative optimal solutions in a linear programming


problem. The level curves for z(x1 , x2 ) = 18x1 + 6x2 are parallel to one face of the polygon
boundary of the feasible region. Moreover, this side contains the points of greatest value for
z(x1 , x2 ) inside the feasible region. Any combination of (x1 , x2 ) on the line 3x1 + x2 = 120
for x1 ∈ [16, 35] will provide the largest possible value z(x1 , x2 ) can take in the feasible region
S. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
4.3 A Linear Programming Problem with no solution. The feasible region of the linear
programming problem is empty; that is, there are no values for x1 and x2 that can simultaneously
satisfy all the constraints. Thus, no solution exists. . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
4.4 A Linear Programming Problem with Unbounded Feasible Region: Note that we can
continue to make level curves of z(x1 , x2 ) corresponding to larger and larger values as we move
down and to the right. These curves will continue to intersect the feasible region for any value
of v = z(x1 , x2 ) we choose. Thus, we can make z(x1 , x2 ) as large as we want and still find a
point in the feasible region that will provide this value. Hence, the optimal value of z(x1 , x2 )
subject to the constraints is +∞. That is, the problem is unbounded. . . . . . . . . . . . . . 361
4.5 A Linear Programming Problem with Unbounded Feasible Region and Finite Solution:
In this problem, the level curves of z(x1 , x2 ) increase in a more “southerly” direction than in
Example –that is, away from the direction in which the feasible region increases without bound.
The point in the feasible region with largest z(x1 , x2 ) value is (7/3, 4/3). Note again, this is a
vertex. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 362

5.1 The feasible region for the diet problem is unbounded and there are alternative optimal
solutions, since we are seeking a minimum, we travel in the opposite direction of the gradient,
so toward the origin to reduce the objective function value. Notice that the level curves hit one
side of the boundary of the feasible region. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
5.2 Matlab input for solving the diet problem. Note that we are solving a minimization
problem. Matlab assumes all problems are minimization problems, so we don’t need to multiply
the objective by −1 like we would if we started with a maximization problem. . . . . . . . 397

6.1 The Simplex Algorithm: The path around the feasible region is shown in the figure. Each
exchange of a basic and non-basic variable moves us along an edge of the polygon in a direction
that increases the value of the objective function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
6.2 Unbounded Linear Program: The existence of a negative column aj in the simplex tableau
for entering variable xj indicates an unbounded problem and feasible region. The recession
direction is shown in the figure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
6.3 Infinite alternative optimal solutions: In the simplex algorithm, when zj − cj ≥ 0 in a
maximization problem with at least one j for which zj − cj = 0, indicates an infinite set of
alternative optimal solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 422
6.4 An optimization problem with a degenerate extreme point: The optimal solution to this
problem is still (16, 72), but this extreme point is degenerate, which will impact the behavior of
the simplex algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
6.5 Finding an initial feasible point: Artificial variables are introduced into the problem. These
variables allow us to move through non-feasible space. Once we reach a feasible extreme point,
the process of optimizing Problem P1 stops. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
6.6 Multiperiod inventory models operate on a principle of conservation of flow. Manufactured
goods and previous period inventories flow into the box representing each period. Demand and
next period inventories flow out of the box representing each period. This inflow and outflow
must be equal to account for all production. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
6.7 Input model to GLPK describing McLearey’s Problem . . . . . . . . . . . . . . . . . . . . . 443

6.8 Input data to GLPK describing McLearey’s Problem . . . . . . . . . . . . . . . . . . . . . . . 444


6.9 Output from glpsol on the McLearey Problem. . . . . . . . . . . . . . . . . . . . . . . . . . 444

8.1 System 2 has a solution if (and only if) the vector c is contained inside the positive cone
constructed from the rows of A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
8.2 System 1 has a solution if (and only if) the vector c is not contained inside the positive
cone constructed from the rows of A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
8.3 An example of Farkas’ Lemma: The vector c is inside the positive cone formed by the
rows of A, but c′ is not. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467
8.4 The Gradient Cone: At optimality, the cost vector c is obtuse with respect to the directions
formed by the binding constraints. It is also contained inside the cone of the gradients of the
binding constraints, which we will discuss at length later. . . . . . . . . . . . . . . . . . . . . . . . 473
8.5 This figure illustrates the optimal point of the problem given in Example 8.4. Note that
at optimality, the objective function gradient is in the dual cone of the binding constraint. That
is, it is a positive combination of the gradients of the left-hand-sides of the binding constraints
at optimality. The gradient of the objective function is shown in green. . . . . . . . . . . . . 479

9.1 The dual feasible region in this problem is a mirror image (almost) of the primal feasible
region. This occurs when the right-hand-side vector b is equal to the objective function
coefficient column vector cT and the matrix A is symmetric. . . . . . . . . . . . . . . . . . . . . 490
9.2 The simplex algorithm begins at a feasible point in the feasible region of the primal
problem. In this case, this is also the same starting point in the dual problem, which is infeasible.
The simplex algorithm moves through the feasible region of the primal problem towards a point
in the dual feasible region. At the conclusion of the algorithm, the algorithm reaches the unique
point that is both primal and dual feasible. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492

10.1 The constraint matrix of (D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505


10.2 A basis of (D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
10.3 Computing the upper bounds tk on the dual step length t in the ratio test. . . . . 509
10.4 Summary of the dual simplex method with bounds. . . . . . . . . . . . . . . . . . . . . . . 513
List of Tables

7.1 Tableau used for Proof of Lemma 7.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453

9.1 Table of Dual Conversions: To create a dual problem, assign a dual variable to each
constraint of the form Ax ◦ b, where ◦ represents a binary relation. Then use the table to
determine the appropriate sign of the inequality in the dual problem as well as the nature of
the dual variables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
1. Mathematical Foundations

1.1 Linear Algebra Review


1.1.1 Introduction
Linear programming (LP) is a powerful mathematical technique used for optimization.
To understand LP, a solid grasp of linear algebra is essential. This document provides
a review of key linear algebra concepts relevant to linear programming.
The method of solving a Linear Programming Problem (L.P.P. for short) mainly
lies in finding a solution set of m simultaneous linear equations in n (n > m)
unknowns xj (j = 1, 2, . . . , n):

    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2                          (1.1)
    ......................................
    am1 x1 + am2 x2 + · · · + amn xn = bm

subject to the restrictions xj ≥ 0, j = 1, 2, . . . , n, which makes a linear function

z = c1 x1 + c2 x2 + · · · + cn xn

of all the variables (unknowns), either a maximum or a minimum. The function z
is known as the objective function. The quantities aij [i = 1, . . . , m; j = 1, 2, . . . , n];
bi [i = 1, 2, . . . , m] and cj [j = 1, 2, . . . , n] are known constants. In order to solve the
problem, some fundamental knowledge about matrix and vector algebra is essential.
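As a concrete illustration of the general form above, the following sketch checks whether a candidate point satisfies a small system Ax = b with x ≥ 0 and evaluates the objective z. The system and cost vector here are invented for illustration; they do not come from the text.

```python
# Illustrative sketch (invented numbers): check a candidate solution of the
# small system  x1 + x2 = 4,  2 x1 + x2 = 6,  with x1, x2 >= 0,
# and evaluate the objective z = 3 x1 + 5 x2.
A = [[1.0, 1.0],
     [2.0, 1.0]]
b = [4.0, 6.0]
c = [3.0, 5.0]

def is_feasible(x, A, b, tol=1e-9):
    """True if A x = b (within tol) and every component of x is nonnegative."""
    if any(xj < -tol for xj in x):
        return False
    return all(abs(sum(aij * xj for aij, xj in zip(row, x)) - bi) <= tol
               for row, bi in zip(A, b))

def objective(x, c):
    """z = c1 x1 + c2 x2 + ... + cn xn."""
    return sum(cj * xj for cj, xj in zip(c, x))

x = [2.0, 2.0]                 # satisfies both equations: 2 + 2 = 4, 4 + 2 = 6
print(is_feasible(x, A, b))    # True
print(objective(x, c))         # 16.0
```

An L.P.P. asks for the feasible x that makes z largest (or smallest); the chapters that follow develop methods for finding it rather than testing candidates one at a time.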

1.1.2 Vectors and Vector Spaces

T.Abraha(PhD) @AKU, 2024 Linear Optimization



Definition 1.1 A vector is an ordered list of numbers, which can be written as:

        ⎡ v1 ⎤
    v = ⎢ v2 ⎥
        ⎢ ⋮  ⎥
        ⎣ vn ⎦

Vectors are usually denoted by boldface letters, e.g., v, or by an arrow over a


letter, e.g., ⃗v .

A vector v in Rn (n-dimensional real space) is written as

    v = (v1 , v2 , . . . , vn )T ,

where vi are the components of the vector.
Vector Operations
Addition: If u, v ∈ Rn , their sum is

    u + v = (u1 + v1 , u2 + v2 , . . . , un + vn ).

Scalar Multiplication: If c ∈ R and v ∈ Rn , the scalar multiplication is

    cv = (cv1 , cv2 , . . . , cvn ).
Dot Product: The dot product of u, v ∈ Rn is

    u · v = u1 v1 + u2 v2 + · · · + un vn .

Norm: The norm (or length) of a vector v ∈ Rn is given by

    ∥v∥ = √(v · v) = √(v1² + v2² + · · · + vn²).
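These componentwise operations translate directly into code; a minimal list-based sketch in Python (no libraries assumed):

```python
# Componentwise vector operations in R^n, as defined above.
def vec_add(u, v):
    """u + v = (u1 + v1, ..., un + vn)."""
    assert len(u) == len(v), "vectors must have the same dimension"
    return [ui + vi for ui, vi in zip(u, v)]

def scal_mult(c, v):
    """cv = (c v1, ..., c vn)."""
    return [c * vi for vi in v]

def dot(u, v):
    """u . v = u1 v1 + ... + un vn."""
    return sum(ui * vi for ui, vi in zip(u, v))

print(vec_add([1, 2, 3], [4, 5, 6]))   # [5, 7, 9]
print(scal_mult(2, [1, -3]))           # [2, -6]
print(dot([1, 2, 3], [4, 5, 6]))       # 32
```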




Definition 1.2 The dot product of two vectors u = (u1 , u2 , . . . , un ) and v =


(v1 , v2 , . . . , vn ) in Rn is given by

u · v = u1 v1 + u2 v2 + · · · + un vn .

■ Example 1.1 Let u = (1, 2, 3) and v = (4, 5, 6). Then,

u · v = 1 · 4 + 2 · 5 + 3 · 6 = 4 + 10 + 18 = 32.

Definition 1.3 — Norm. The norm (or magnitude) of a vector v = (v1 , v2 , . . . , vn )


in Rn is given by
∥v∥ = √(v1² + v2² + · · · + vn²).

■ Example 1.2 Let v = (3, 4). Then,


∥v∥ = √(3² + 4²) = √(9 + 16) = √25 = 5.

Unit vector: A vector is said to be a unit vector if all its components are zero
except one, which equals unity. The unit vector ej in n-dimensional vector space is
the vector whose j-th component is 1. Thus e1 = (1, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0),
e3 = (0, 0, 1, . . . , 0), . . . , en = (0, 0, . . . , 1) are all unit vectors in n-dimensional space.
Null vector: A vector is said to be a null vector if all the components of the
vector be equal to zero. It is usually denoted by 0. An n-component null vector is
written as 0 = (0, 0, . . . , 0) with n zeros.

Definition 1.4 — Euclidean Distance. The Euclidean distance between two


points u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) in Rn is given by
d(u, v) = ∥u − v∥ = √((u1 − v1)² + (u2 − v2)² + · · · + (un − vn)²).

■ Example 1.3 Let u = (1, 2) and v = (4, 6). Then,


d(u, v) = √((1 − 4)² + (2 − 6)²) = √((−3)² + (−4)²) = √(9 + 16) = √25 = 5.
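The three worked examples above can be reproduced with a few lines of Python (standard library only):

```python
import math

def dot(u, v):
    """Dot product u . v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(v):
    """Euclidean norm ||v|| = sqrt(v . v)."""
    return math.sqrt(dot(v, v))

def dist(u, v):
    """Euclidean distance d(u, v) = ||u - v||."""
    return norm([ui - vi for ui, vi in zip(u, v)])

print(dot([1, 2, 3], [4, 5, 6]))   # 32  (Example 1.1)
print(norm([3, 4]))                # 5.0 (Example 1.2)
print(dist([1, 2], [4, 6]))        # 5.0 (Example 1.3)
```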

Vector spaces




A vector space V is a set (the elements of which are called vectors) on which two
operations are defined: vectors can be added together, and vectors can be multiplied
by real numbers called scalars. V must satisfy
(i) There exists an additive identity (written ⃗0) in V such that x + ⃗0 = x for all
x∈V
(ii) For each x ∈ V , there exists an additive inverse (written −x) such that
x + (−x) = ⃗0
(iii) There exists a multiplicative identity (written 1) in R such that 1x = x for all
x∈V
(iv) Commutativity: x + y = y + x for all x, y ∈ V
(v) Associativity: (x + y) + ⃗z = x + (y + ⃗z) and α(βx) = (αβ)x for all x,⃗y ,⃗z ∈ V
and α, β ∈ R
(vi) Distributivity: α(⃗x + ⃗y ) = α⃗x + α⃗y and (α + β)⃗x = α⃗x + β⃗x for all ⃗x,⃗y ∈ V and
α, β ∈ R
Metric spaces
Metrics generalize the notion of distance from Euclidean space (although metric
spaces need not be vector spaces).
A metric on a set S is a function d : S × S → R that satisfies
(i) d(x, y) ≥ 0, with equality if and only if x = y
(ii) d(x, y) = d(y, x)
(iii) d(x, z) ≤ d(x, y) + d(y, z) (the so-called triangle inequality)
for all x, y, z ∈ S.
Normed spaces
Norms generalize the notion of length from Euclidean space.
A norm on a real vector space V is a function ∥ · ∥ : V → R that satisfies
(i) ∥x∥ ≥ 0, with equality if and only if x = ⃗0
(ii) ∥αx∥ = |α|∥x∥
(iii) ∥x + y∥ ≤ ∥x∥ + ∥y∥ (the triangle inequality again)
for all x, y ∈ V and all α ∈ R. A vector space endowed with a norm is called a
normed vector space, or simply a normed space.
Note that any norm on V induces a distance metric on V :

d(x, y) = ∥x − y∥

One can verify that the axioms for metrics are satisfied under this definition and
follow directly from the axioms for norms. Therefore any normed space is also a
metric space. We will typically only be concerned with a few specific norms on Rn :
    ∥x∥1 = |x1 | + |x2 | + · · · + |xn |
    ∥x∥2 = √(x1² + x2² + · · · + xn²)
    ∥x∥p = (|x1 |^p + |x2 |^p + · · · + |xn |^p )^(1/p)      (p ≥ 1)
    ∥x∥∞ = max1≤i≤n |xi |
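A quick sketch of these norms in Python; for the sample vector below the values also illustrate the general ordering ∥x∥∞ ≤ ∥x∥2 ≤ ∥x∥1:

```python
def p_norm(x, p):
    """The p-norm (|x1|^p + ... + |xn|^p)^(1/p), for p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def inf_norm(x):
    """The infinity-norm max_i |x_i|, the limit of the p-norm as p grows."""
    return max(abs(xi) for xi in x)

x = [3.0, -4.0]
print(p_norm(x, 1))   # 7.0
print(p_norm(x, 2))   # 5.0
print(inf_norm(x))    # 4.0
# For every x:  ||x||_inf <= ||x||_2 <= ||x||_1
print(inf_norm(x) <= p_norm(x, 2) <= p_norm(x, 1))   # True
```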




Note that the 1- and 2-norms are special cases of the p-norm, and the ∞-norm is the
limit of the p-norm as p tends to infinity. We require p ≥ 1 for the general definition
of the p-norm because the triangle inequality fails to hold if p < 1.
Inner product spaces
An inner product on a real vector space V is a function ⟨·, ·⟩ : V × V → R satisfying
(i) ⟨x, x⟩ ≥ 0, with equality if and only if x = 0
(ii) ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩
(iii) ⟨x, y⟩ = ⟨y, x⟩
for all x, y,⃗z ∈ V and all α, β ∈ R. A vector space endowed with an inner product is
called an inner product space.
Note that any inner product on V induces a norm on V :
∥x∥ = √⟨x, x⟩

Theorem 1.1 — Pythagorean Theorem. If x ⊥ y, then

∥x + y∥² = ∥x∥² + ∥y∥²

Proof. Suppose x ⊥ y, i.e. ⟨x, y⟩ = 0. Then


∥x + y∥² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨y, x⟩ + ⟨x, y⟩ + ⟨y, y⟩ = ∥x∥² + ∥y∥²
as claimed.
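A numerical check of the theorem for a pair of orthogonal vectors in R², using the standard dot product as the inner product (the same computation also verifies the Cauchy–Schwarz inequality stated just below):

```python
def dot(u, v):
    """The standard inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))

x, y = [3.0, 0.0], [0.0, 4.0]
assert dot(x, y) == 0.0                   # x and y are orthogonal

s = [xi + yi for xi, yi in zip(x, y)]     # s = x + y
# Pythagorean theorem: ||x + y||^2 = ||x||^2 + ||y||^2
print(dot(s, s), dot(x, x) + dot(y, y))   # 25.0 25.0

# Cauchy-Schwarz: |x . y| <= ||x|| ||y||
print(abs(dot(x, y)) <= dot(x, x) ** 0.5 * dot(y, y) ** 0.5)   # True
```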

Cauchy-Schwarz inequality
This inequality is sometimes useful in proving bounds:
|⟨x, y⟩| ≤ ∥x∥ · ∥y∥
for all x, y ∈ V . Equality holds exactly when x and y are scalar multiples of each
other (or equivalently, when they are linearly dependent).
Vectors in Euclidean Space

Definition 1.5 A vector in Rn is an ordered tuple of n real numbers. We denote


a vector by v = (v1 , v2 , . . . , vn ), where each vi is a real number.

Linear Combinations and Span


A linear combination of vectors v1 , v2 , . . . , vk is an expression of the form:
a1 v 1 + a2 v 2 + · · · + ak v k
where a1 , a2 , . . . , ak are scalars. The span of vectors v1 , v2 , . . . , vk is the set of all
linear combinations of these vectors.
Linear Independence
A set of vectors {v1 , v2 , . . . , vk } is linearly independent if the equation:
a1 v 1 + a2 v 2 + · · · + ak v k = 0
implies that a1 = a2 = · · · = ak = 0.
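Linear independence can be tested mechanically: place the vectors in the rows of a matrix and row-reduce; the vectors are independent exactly when no row reduces to zero. A minimal Gaussian-elimination sketch (an illustrative routine, not an algorithm taken from the text):

```python
def rank(rows, tol=1e-9):
    """Rank of a small matrix via Gaussian elimination; rows is a list of row lists."""
    A = [list(map(float, r)) for r in rows]
    m, n = len(A), len(A[0])
    r = 0
    for col in range(n):
        piv = next((i for i in range(r, m) if abs(A[i][col]) > tol), None)
        if piv is None:
            continue                       # no pivot in this column
        A[r], A[piv] = A[piv], A[r]        # move the pivot row up
        for i in range(m):
            if i != r and abs(A[i][col]) > tol:
                f = A[i][col] / A[r][col]
                A[i] = [aij - f * arj for aij, arj in zip(A[i], A[r])]
        r += 1
    return r

def independent(vectors):
    """True iff the given vectors are linearly independent."""
    return rank(vectors) == len(vectors)

print(independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))   # True
print(independent([[1, 2], [2, 4]]))                    # False: (2, 4) = 2 (1, 2)
```

The same `rank` routine gives the dimension of the span of a set of vectors, which connects directly to the notions of basis and dimension.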
Basis and Dimension
A basis of a vector space V is a set of linearly independent vectors that span V .
The dimension of V is the number of vectors in a basis for V .




1.1.3 Matrices

Definition 1.6 A matrix is a set of m × n quantities arranged in a rectangular
array of m rows and n columns. Matrices are commonly denoted by a single
capital letter and are of the form

        ⎡ a11   a12   a13   · · ·   a1n ⎤
        ⎢ a21   a22   a23   · · ·   a2n ⎥
    A = ⎢  ..    ..    ..            ..  ⎥                        (1.2)
        ⎣ am1   am2   am3   · · ·   amn ⎦

The individual quantities ajk are called the elements of the matrix.

Definition 1.7 The size of a matrix is denoted by two numbers, namely its rows
and columns.

A general m × n matrix looks as follows (the first index of an element gives its
row number, while the second gives its column number):

        ⎡ a11   a12   · · ·   a1n ⎤
        ⎢ a21   a22   · · ·   a2n ⎥
        ⎢  ..    ..            ..  ⎥
        ⎣ am1   am2   · · ·   amn ⎦

A matrix for which m = n (i.e. it has the same number of rows and columns) is called
a square matrix. In a square matrix, the elements where the two indices are equal (i.e.
i = j) are said to be found on the main diagonal of the matrix (also called the major
diagonal, the principal diagonal, and the primary diagonal).

■ Example 1.4 In the following matrices, the main diagonal elements are highlighted
in bold:

$$A = \begin{pmatrix} \mathbf{4} & -7 & 1 & 0 \\ 1 & \mathbf{5} & -1 & 3 \\ 3 & -7 & \mathbf{1} & 6 \\ -5 & 9 & 12 & \mathbf{2} \end{pmatrix}, \qquad B = \begin{pmatrix} \mathbf{1} & 0 & 1 \\ 0 & \mathbf{0} & -1 \\ 1 & 1 & \mathbf{0} \end{pmatrix}$$

■

Definition 1.8 A diagonal matrix in which all diagonal elements are unity is
known as an identity matrix or unit matrix. It is usually denoted by In or
simply by I.


■ Example 1.5 The matrix

$$I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

is an identity matrix of order 3. ■

Diagonal and Triangular Matrices


A matrix with all elements outside the main diagonal equaling zero is called a
diagonal matrix.

■ Example 1.6 The following matrices are diagonal matrices:

$$A = \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -6 \end{pmatrix}$$

■

R Notice that the main diagonal elements in a diagonal matrix could be equal
to zero. The only criterion is that the elements that are not on the diagonal
must be zero.
A matrix with all elements below the main diagonal equaling zero is called an upper
triangular matrix. Similarly, a matrix with all elements above the main diagonal
equaling zero is called a lower triangular matrix.

■ Example 1.7 A lower triangular matrix:

$$A = \begin{pmatrix} 5 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ -3 & 4 & 7 & 0 \\ 2 & 2 & 0 & -1 \end{pmatrix}$$

An upper triangular matrix:

$$B = \begin{pmatrix} 5 & 7 & 9 & 1 \\ 0 & 3 & 4 & -3 \\ 0 & 0 & -5 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}$$

■


Definition 1.9 The zero matrix is a matrix where all elements are equal to
zero.

■ Example 1.8 The matrix

$$O_3 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

is a zero matrix of order 3. ■

The transpose of an m × n matrix A is denoted by A′ or AT , and it is an
n × m matrix. The transpose of a square matrix is also a square matrix of the same
order, and the transpose of the transposed matrix is the original matrix, i.e., (A′ )′ = A.
Transposing is a common operation: when transposing a matrix A, we take each
element and exchange its row index with its column index, which "rotates" all rows
of the matrix into columns and vice versa:

$$a_{ij} \xrightarrow{\text{transpose}} a_{ji}$$

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \xrightarrow{\text{transpose}} \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix} \tag{1.3}$$

■ Example 1.9

$$\begin{pmatrix} 2 & 1 & 3 \\ 0 & -7 & 1 \\ 4 & 5 & 0 \end{pmatrix}^{T} = \begin{pmatrix} 2 & 0 & 4 \\ 1 & -7 & 5 \\ 3 & 1 & 0 \end{pmatrix}$$

$$\begin{pmatrix} 1 & 0 & 5 \\ 4 & 3 & 2 \end{pmatrix}^{T} = \begin{pmatrix} 1 & 4 \\ 0 & 3 \\ 5 & 2 \end{pmatrix}$$

$$\begin{pmatrix} 2 & -1 & 3 & 7 \end{pmatrix}^{T} = \begin{pmatrix} 2 \\ -1 \\ 3 \\ 7 \end{pmatrix}$$

■
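In code, the transpose is a one-line operation. A minimal Python sketch (the function name is ours) that reproduces Example 1.9:

```python
def transpose(A):
    """Exchange each element's row and column index: rows become columns."""
    return [list(row) for row in zip(*A)]

# Transposing twice returns the original matrix: (A')' = A.
```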


Row matrix: A matrix having a single row is known as a row matrix. A 1 × n
matrix is a row matrix.
Column matrix: A matrix having a single column is known as a column matrix.
An m × 1 matrix is a column matrix.
From the definitions, it is clear that the transpose of a row matrix is a column
matrix and vice versa.
Equality of two matrices: Two matrices A = [aij ], B = [bij ] of the same order
m × n are said to be equal if aij = bij for all i and j.

Definition 1.10 A symmetric matrix is a matrix that is invariant under the
transpose operation, i.e. it satisfies A = AT .

■ Example 1.10 The matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 8 \\ 3 & 8 & 9 \end{pmatrix}$$

is symmetric. ■

Definition 1.11 A matrix is called skew-symmetric if and only if A = −AT .

 
■ Example 1.11 Given the matrix

$$A = \begin{pmatrix} 0 & -7 \\ 7 & 0 \end{pmatrix},$$

then

$$-A^{T} = -\begin{pmatrix} 0 & 7 \\ -7 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -7 \\ 7 & 0 \end{pmatrix} = A,$$

so A satisfies the condition of a skew-symmetric matrix. ■
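Both definitions reduce to simple entrywise checks. A Python sketch (the helper names are ours):

```python
def is_symmetric(A):
    """A = A^T: entry (i, j) equals entry (j, i)."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

def is_skew_symmetric(A):
    """A = -A^T: entry (i, j) equals minus entry (j, i); forces a zero diagonal."""
    n = len(A)
    return all(A[i][j] == -A[j][i] for i in range(n) for j in range(n))
```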

Addition and subtraction of matrices: The sum of two matrices A = [aij ]


and B = [bij ] of the same order m × n is also a matrix C = [cij ] of the same order,
where cij = aij + bij for all i and j, etc.


The sum of matrix A and B is obtained by adding the corresponding entries.

A + B = [aij + bij ]

■ Example 1.12 Let

$$A = \begin{pmatrix} 1 & 4 & 0 \\ -2 & 6 & 5 \end{pmatrix}, \qquad B = \begin{pmatrix} -3 & 1 & -1 \\ 3 & 0 & 2 \end{pmatrix}, \qquad C = \begin{pmatrix} 4 & 3 \\ 2 & 1 \end{pmatrix}$$

Then

$$A + B = \begin{pmatrix} -2 & 5 & -1 \\ 1 & 6 & 7 \end{pmatrix}$$

but neither A + C nor B + C is defined. ■

■ Example 1.13

$$A + B = \begin{pmatrix} 1 & -2 & 3 \\ 0 & 1 & -4 \\ -5 & 6 & 7 \end{pmatrix} + \begin{pmatrix} 6 & -2 & 8 \\ 1 & -5 & 4 \\ 7 & 3 & 9 \end{pmatrix} = \begin{pmatrix} 7 & -4 & 11 \\ 1 & -3 & 0 \\ 2 & 9 & 16 \end{pmatrix}$$

■

The difference of two matrices A = [aij ] and B = [bij ] of the same order m × n
is also a matrix C = [dij ] of the same order, where dij = aij − bij .
Multiplication of matrix with scalar quantity: The scalar multiple cA is
the matrix obtained by multiplying each entry of A by c.

cA = c[aij ] = [caij ]

■ Example 1.14 With

$$A = \begin{pmatrix} 1 & 4 & 0 \\ -2 & 6 & 5 \end{pmatrix},$$

we have

$$2A = \begin{pmatrix} 2 & 8 & 0 \\ -4 & 12 & 10 \end{pmatrix}, \qquad \tfrac{1}{2}A = \begin{pmatrix} 1/2 & 2 & 0 \\ -1 & 3 & 5/2 \end{pmatrix}, \qquad (-1)A = \begin{pmatrix} -1 & -4 & 0 \\ 2 & -6 & -5 \end{pmatrix}$$

■

If A = [aij ] is a matrix of order m × n and k is any scalar quantity, then kA = Ak =


[kaij ] is also a matrix of the same order.
Multiplying a Matrix and a Vector
Multiplying a matrix and a vector results
in applying some linear transformation to the vector. This is of course possible only
when the number of columns in the matrix is the same as the number of elements
of the vector.


■ Example 1.15 A 4 × 3 matrix can multiply a vector with three elements, and a
4 × 4 matrix can multiply a vector with four elements:

$$\begin{pmatrix} 3 & 1 & 5 \\ 5 & 3 & 1 \\ 4 & -1 & 0 \\ 4 & 4 & 7 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 4 \\ 3 \\ 7 \\ 1 \end{pmatrix}$$

A product in which the number of columns of the matrix differs from the number of
elements of the vector is invalid. ■

Matrix Multiplication
If A is an m × n matrix and B is an n × r matrix, then the product C = AB is
an m × r matrix. The (i, j) entry of the product is computed as follows:

$$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}$$

or

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

■ Example 1.16 Compute AB if

$$A = \begin{pmatrix} 1 & 3 & -1 \\ -2 & -1 & 1 \end{pmatrix} \qquad\text{and}\qquad B = \begin{pmatrix} -4 & 0 & 3 & -1 \\ 5 & -2 & -1 & 1 \\ -1 & 2 & 0 & 6 \end{pmatrix}$$

Solution: Since A is 2 × 3 and B is 3 × 4, AB will be a 2 × 4 matrix.

c11 = 1(−4) + 3(5) + (−1)(−1) = 12


c12 = 1(0) + 3(−2) + (−1)(2) = −8
c13 = 1(3) + 3(−1) + (−1)(0) = 0
c14 = 1(−1) + 3(1) + (−1)(6) = −4
c21 = (−2)(−4) + (−1)(5) + (1)(−1) = 2
c22 = (−2)(0) + (−1)(−2) + (1)(2) = 4
c23 = (−2)(3) + (−1)(−1) + (1)(0) = −5
c24 = (−2)(−1) + (−1)(1) + (1)(6) = 7

Thus, the product matrix is given by

$$AB = \begin{pmatrix} 12 & -8 & 0 & -4 \\ 2 & 4 & -5 & 7 \end{pmatrix}$$

For matrix product to be feasible, the number of columns of the first matrix must
be equal to the number of rows of the second matrix.
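The entry formula cij = Σk aik bkj translates directly into code. A Python sketch (the function name is ours) that reproduces Example 1.16:

```python
def mat_mul(A, B):
    """(i, j) entry of AB is sum over k of a_ik * b_kj; needs cols(A) == rows(B)."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must agree")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]
```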


With the knowledge of the matrix product, the set of simultaneous linear equations
given in (1.1) can be written with the help of matrix notation in a more compact
form:

$$Ax = b$$

where A is a matrix of order m × n given by:

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix}$$

x is a column matrix of order n × 1, having elements x1 , x2 , . . . , xn . b is a column


matrix of order m × 1, having elements b1 , b2 , . . . , bm . A is known as the coefficient
matrix.
Sub-matrix: A matrix formed by omitting some rows and columns of a matrix,
is known as a sub-matrix of the original matrix.
Determinant of a matrix: The determinant formed with the elements of a
square matrix is known as determinant of the matrix.
If A is a square matrix of order n, then the determinant of the matrix, which is
denoted by |A| or det A, is given by

$$|A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} \tag{1.4}$$

A determinant is a single number computed from a square matrix (m = n). The
simplest case is the determinant of a 2 × 2 matrix:

$$\det \begin{pmatrix} \alpha & \delta \\ \gamma & \beta \end{pmatrix} = \begin{vmatrix} \alpha & \delta \\ \gamma & \beta \end{vmatrix} = \alpha\beta - \gamma\delta \tag{1.5}$$

Note: |A| is here different from the modulus of a quantity A, where the same
notation has been used.
Minor: The minor of an element aij of a determinant A is a determinant formed
by omitting the ith row and the jth column of the determinant A. It is usually
denoted by Mij .
Cofactor: The cofactor of an element aij of a determinant |A| is denoted by Cij
and given by
Cij = (−1)i+j Mij
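Minors and cofactors give a recursive way to evaluate any determinant (Laplace expansion along the first row). A Python sketch (the helper names are ours; fine for small matrices, though the cost grows factorially with the order):

```python
def minor(A, i, j):
    """Determinant of the sub-matrix with row i and column j deleted (the minor M_ij)."""
    sub = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
    return det(sub)

def det(A):
    """Laplace (cofactor) expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    # cofactor C_0j = (-1)^(0+j) * M_0j
    return sum((-1) ** j * A[0][j] * minor(A, 0, j) for j in range(n))
```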
Singular and non-singular matrix: If the value of the determinant of a square
matrix be a non-zero quantity, then the matrix is said to be a non-singular matrix,
and if the value of the determinant of the matrix be zero then the matrix is said to
be a singular matrix. We know that det A = det AT .
Thus, if a square matrix be non-singular, then its transpose is also non-singular.


Rank of a matrix: The rank of an m × n matrix A will be r if all square
sub-matrices of order (r + 1) are singular and at least one square sub-matrix of order
r is non-singular. From the definition, it is clear that r ≤ min(m, n). The rank of a
matrix A is usually denoted by r(A).

(i) $A = \begin{pmatrix} 2 & -3 \\ 3 & 4 \end{pmatrix}$, $|A| = 2 \times 4 - (-3) \times 3 = 17 \neq 0$. Thus r(A) = 2.

(ii) $A = \begin{pmatrix} 5 & 10 \\ 3 & 6 \end{pmatrix}$, $|A| = 5 \times 6 - 10 \times 3 = 0$. Thus r(A) ̸= 2.
Now one square sub-matrix of order 1 is $B_1 = \begin{pmatrix} 5 \end{pmatrix}$ and $|B_1| = 5 \neq 0$.
Thus r(A) = 1.
Note: In has rank n and a null matrix has rank 0.
Adjoint of a matrix: The adjoint of a square matrix A is a matrix of the same
order, which is the transpose of the matrix B, where the elements of B are the
co-factors of the corresponding elements of the determinant of the matrix A. It is
denoted by Adj(A).
If A = [aij ]n×n , then

$$\operatorname{Adj}(A) = \begin{pmatrix} c_{11} & c_{21} & \cdots & c_{n1} \\ c_{12} & c_{22} & \cdots & c_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ c_{1n} & c_{2n} & \cdots & c_{nn} \end{pmatrix} \tag{1.6}$$

where cij are the co-factors of the corresponding elements aij of |A| given in
(1.4).
Inverse of a matrix: If there exist two square matrices A and B of the same order
such that AB = BA = I (identity matrix of the same order) then B is called as the
inverse matrix of A and vice versa. The inverse matrix of A which is denoted by A−1
is given by
$$A^{-1} = \frac{1}{|A|} \cdot \operatorname{Adj}(A) \tag{1.7}$$
From the definition, it is clear that AA−1 = A−1 A = I. It is also clear that inverse
exists only in the non-singular matrices. The inverse matrix of an identity matrix is
the same matrix.
Matrix B is the inverse of a square matrix A if
AB = BA = I. (1.8)
For both products to be defined simultaneously, A and B must be square
matrices of the same order. A square matrix is said to be singular
if it does not have an inverse; a matrix that has an inverse is called
nonsingular or invertible. The inverse of A, when it exists, is denoted
as A−1 .
Properties of the Inverse


1. The inverse of a nonsingular matrix is unique.
2. If A is nonsingular, then (A−1 )−1 = A.
3. If A and B are nonsingular, then (AB)−1 = B −1 A−1 .
4. If A is nonsingular, then so too is AT . Further, (AT )−1 = (A−1 )T .

Let A be an n × n matrix. If any of the following equivalent conditions hold, then A


does not have an inverse:
1. A x = 0 has a nontrivial solution.
2. The columns of A are linearly dependent.
3. The rank of A is less than n.
4. A is not row equivalent to the identity matrix.
5. det(A) = 0
A linear combination of matrices looks like

$$c_1 A_1 + c_2 A_2 + \cdots + c_k A_k$$

where c1 , c2 , . . . , ck are the coefficients of the linear combination.

■ Example 1.17 Let $A_1 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, $A_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, and $A_3 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$.
Is $B = \begin{pmatrix} 1 & 4 \\ 2 & 1 \end{pmatrix}$ a linear combination of A1 , A2 , and A3 ? ■

Solution: We want to find scalars c1 , c2 , and c3 such that c1 A1 + c2 A2 + c3 A3 = B.
Thus,

$$c_1 \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} + c_2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + c_3 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 4 \\ 2 & 1 \end{pmatrix}$$

The left-hand side of this equation can be rewritten as

$$\begin{pmatrix} c_2 + c_3 & c_1 + c_3 \\ -c_1 + c_3 & c_2 + c_3 \end{pmatrix}$$

Comparing entries and using the definition of matrix equality, we have four linear
equations:

c2 + c3 = 1
c1 + c3 = 4
−c1 + c3 = 2
c2 + c3 = 1

Gauss-Jordan elimination easily gives

$$\left(\begin{array}{ccc|c} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 4 \\ -1 & 0 & 1 & 2 \\ 0 & 1 & 1 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -2 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{array}\right)$$


so c1 = 1, c2 = −2, and c3 = 3. Thus, A1 − 2A2 + 3A3 = B.


The span of a set of matrices is the set of all linear combinations of the matrices.

■ Example 1.18 Describe the span of the matrices A1 , A2 , and A3 from the previous
example. ■

Solution: Write out a general linear combination of A1 , A2 , and A3 :

$$c_1 A_1 + c_2 A_2 + c_3 A_3 = c_1 \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} + c_2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + c_3 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} c_2 + c_3 & c_1 + c_3 \\ -c_1 + c_3 & c_2 + c_3 \end{pmatrix}$$

Suppose we want to know when the matrix $\begin{pmatrix} w & x \\ y & z \end{pmatrix}$ is in span(A1 , A2 , A3 ). We know
that it is when

$$\begin{pmatrix} c_2 + c_3 & c_1 + c_3 \\ -c_1 + c_3 & c_2 + c_3 \end{pmatrix} = \begin{pmatrix} w & x \\ y & z \end{pmatrix}$$

for some choice of scalars c1 , c2 , c3 . This gives a system of linear equations whose
left-hand side is exactly the same as in the previous example but whose right-hand
side is general. The augmented matrix of this system is

$$\left(\begin{array}{ccc|c} 0 & 1 & 1 & w \\ 1 & 0 & 1 & x \\ -1 & 0 & 1 & y \\ 0 & 1 & 1 & z \end{array}\right)$$

and row reduction produces

$$\left(\begin{array}{ccc|c} 0 & 1 & 1 & w \\ 1 & 0 & 1 & x \\ -1 & 0 & 1 & y \\ 0 & 1 & 1 & z \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 0 & 0 & \frac{1}{2}x - \frac{1}{2}y \\ 0 & 1 & 0 & w - \frac{1}{2}x - \frac{1}{2}y \\ 0 & 0 & 1 & \frac{1}{2}x + \frac{1}{2}y \\ 0 & 0 & 0 & w - z \end{array}\right)$$

The only restriction comes from the last row, where we must have w − z = 0 to have
a solution. Thus, the span of A1 , A2 , and A3 consists of all matrices $\begin{pmatrix} w & x \\ y & z \end{pmatrix}$ for
which w = z. That is, $\operatorname{span}(A_1, A_2, A_3) = \left\{ \begin{pmatrix} w & x \\ y & w \end{pmatrix} \right\}$.
Matrices A1 , A2 , . . . , Ak are linearly independent if the only solution of

$$c_1 A_1 + c_2 A_2 + \cdots + c_k A_k = 0$$

is the trivial one: c1 = c2 = · · · = ck = 0. If not, the matrices are linearly dependent.


Theorem 1.2 If A is an invertible n × n matrix, then the system of linear equations


given by Ax = b has the unique solution x = A−1 b for any b in Rn .
 
If $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then A is invertible if ad − bc ̸= 0, in which case

$$A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

If ad − bc = 0, then A is not invertible.

The expression ad − bc is called the determinant of A.

■ Example 1.19 Find the inverses of $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 12 & -15 \\ 4 & -5 \end{pmatrix}$, if they exist.

Solution: We have det A = 1(4) − 2(3) = −2 ̸= 0, so A is invertible, with

$$A^{-1} = \frac{1}{-2} \begin{pmatrix} 4 & -2 \\ -3 & 1 \end{pmatrix} = \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix}$$

On the other hand, det B = 12(−5) − (−15)(4) = 0, so B is not invertible.
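The ad − bc formula can be coded directly; returning None when the determinant vanishes mirrors the singular case. A Python sketch (the function name is ours; exact arithmetic via the standard fractions module):

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse via the ad - bc formula, or None when the matrix is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]
```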


Row-Echelon Transformation Algorithm

An algorithm for using elementary row operations to transform a matrix


into row-echelon form is as follows:
1. Let R denote the work row, and initialize R = 1 (so the top row is
the first work row).
2. Find the first column containing a nonzero element in either row
R or any succeeding row. If no such column exists, stop; the
transformation is complete. Otherwise let C denote this column.
3. Beginning with row R and continuing through successive rows, locate
the first row having a nonzero element in column C. If this row
is not row R, interchange it with row R. Row R will now have a
nonzero element in column C. This element is called the pivot; let
P denote its value.
4. If P ̸= 1, multiply the elements of row R by 1/P ; otherwise continue.
5. Search all rows following R for one having a nonzero element in
column C. If no such row exists, go to step 8; otherwise designate
that row as row N , and the value of the nonzero element in row N
and column C as V .
6. Add to the elements of row N the scalar −V times the corresponding
elements of row R.
7. Return to step 5.
8. Increase R by 1. If this new value of R is larger than the number of
rows in the matrix, stop; the transformation is complete. Otherwise,
return to step 2.
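The eight steps above can be sketched directly in Python (0-indexed rows, exact arithmetic via the standard fractions module; the function name is ours):

```python
from fractions import Fraction

def row_echelon(M):
    """Transform M into row-echelon form following the algorithm above."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    R = 0                                   # step 1: work row
    while R < m:
        # step 2: first column with a nonzero element in row R or below
        C = next((c for c in range(n)
                  if any(A[r][c] != 0 for r in range(R, m))), None)
        if C is None:
            break                           # transformation complete
        # step 3: bring a row with a nonzero element in column C up to row R
        for r in range(R, m):
            if A[r][C] != 0:
                A[R], A[r] = A[r], A[R]
                break
        P = A[R][C]                         # the pivot
        if P != 1:                          # step 4: scale the pivot to 1
            A[R] = [x / P for x in A[R]]
        for r in range(R + 1, m):           # steps 5-7: clear below the pivot
            V = A[r][C]
            if V != 0:
                A[r] = [a - V * b for a, b in zip(A[r], A[R])]
        R += 1                              # step 8
    return A
```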


In addition to the example here, there are more examples in the section on matrix
inversion.

■ Example 1.20 Follow the above steps to transform

$$B = \begin{pmatrix} 0 & 1 & 4 \\ 1 & 2 & 3 \end{pmatrix} \tag{1.9}$$

With R = 1 (step 1) and C = 1 (step 2), we apply step 3 and interchange rows 1
and 2, obtaining

$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \end{pmatrix} \tag{1.10}$$

which is in row-echelon form. ■

■ Example 1.21 Follow the above steps to transform

$$B = \begin{pmatrix} 0 & 2 & 4 \\ 0 & 0 & 1 \end{pmatrix} \tag{1.11}$$

With R = 1 (step 1), C = 2 (step 2), and P = 2 (step 3), we multiply the elements
of row 1 by 1/P = 1/2 to get

$$\begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix} \tag{1.12}$$

which is in row-echelon form. ■

■ Example 1.22 Follow the above steps to transform

$$B = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 9 & 7 \end{pmatrix} \tag{1.13}$$

With R = 1 (step 1), C = 1 (step 2), N = 2 and V = 4 (step 5), we add to
each element of row 2 the value (−4) times the corresponding element in row 1, to
get the row-echelon form

$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & -5 \end{pmatrix} \tag{1.14}$$

■

Inversion Using Elementary Row Operations


Elementary Matrix Operations: These operations are used to solve systems
of linear equations or inverting a matrix. The three operations are as follows (for
any matrix A):


• Interchange two rows of A.


• Multiply a row by a nonzero scalar.
• Replace row i with row i plus row j multiplied by a nonzero scalar.

Inverting a Matrix:

$$A = \begin{pmatrix} 3 & 9 & 2 \\ 1 & 1 & 1 \\ 5 & 4 & 7 \end{pmatrix} \qquad\text{and}\qquad A^{-1} = \begin{pmatrix} -\frac{3}{11} & 5 & -\frac{7}{11} \\ \frac{2}{11} & -1 & \frac{1}{11} \\ \frac{1}{11} & -3 & \frac{6}{11} \end{pmatrix}$$

To find A−1 , use elementary row operations to transform A into an identity
matrix, while performing these same operations on the attached identity matrix:

$$\left(\begin{array}{ccc|ccc} 3 & 9 & 2 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 & 1 & 0 \\ 5 & 4 & 7 & 0 & 0 & 1 \end{array}\right) \Rightarrow \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{3}{11} & 5 & -\frac{7}{11} \\ 0 & 1 & 0 & \frac{2}{11} & -1 & \frac{1}{11} \\ 0 & 0 & 1 & \frac{1}{11} & -3 & \frac{6}{11} \end{array}\right)$$

Calculating Inverse using Elementary Row Operations

Inverses may be found through the use of elementary row operations. This
procedure not only yields the inverse when it exists, but also indicates
when the inverse does not exist. An algorithm for finding the inverse of
a matrix A is as follows:
1. Form the partitioned matrix [A |I ], where I is the identity matrix
having the same order as A.
2. Using elementary row operations, transform A into row-echelon form,
applying each row operation to the entire partioned matrix formed
in Step 1. Denote the result as [ C |D ], where C is in row-echelon
form.
3. If C has a zero row, stop; the original matrix A is singular and
does not have an inverse. Otherwise continue; the original matrix is
invertible.
4. Beginning with the last column of C and progressing backward
iteratively through the second column, use elementary row operation
(3) to transform all elements above the diagonal of C to zero. Apply
each operation, however, to the entire matrix [ C |D ]. Denote the
result as [ I |B ]. The matrix B is the inverse of the original matrix
A.
If exact arithmetic is not used in Step 2, then a pivoting strategy should
be employed. No pivoting strategy is used in Step 4; the pivot is always
one of the unity elements on the diagonal of C. Interchanging any rows
after Step 2 has been completed will undo the work of that step and,
therefore, is not allowed.

In the examples below, the row reduction is done “by hand” first, as further examples
of using the elementary row operations.
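The four-step procedure above can be sketched as a Gauss-Jordan routine on the partitioned matrix [A | I]; it returns None when a zero column shows that A is singular. (Python, exact arithmetic via the standard fractions module; the function name is ours.)

```python
from fractions import Fraction

def inverse(M):
    """Invert a square matrix via row operations on [A | I], or return None."""
    n = len(M)
    # step 1: form the partitioned matrix [A | I]
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        # find a pivot row at or below the current column
        piv = next((r for r in range(col, n) if A[r][col] != 0), None)
        if piv is None:
            return None                     # zero column: A is singular
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]    # scale the pivot row
        for r in range(n):                  # clear the column everywhere else
            if r != col and A[r][col] != 0:
                v = A[r][col]
                A[r] = [a - v * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]           # the right half is A^-1
```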


■ Example 1.23 We use the elementary row operation algorithm to find the inverse
of

$$A = \begin{pmatrix} 5 & 3 \\ 2 & 1 \end{pmatrix}. \tag{1.15}$$

We first form the partitioned matrix [A |I ]:

$$[A \,|\, I] = \left(\begin{array}{cc|cc} 5 & 3 & 1 & 0 \\ 2 & 1 & 0 & 1 \end{array}\right). \tag{1.16}$$

With R = 1, C = 1, and P = 5, we first multiply row 1 by (1/5) to get

$$\left(\begin{array}{cc|cc} 1 & 3/5 & 1/5 & 0 \\ 2 & 1 & 0 & 1 \end{array}\right). \tag{1.17}$$

With R = 1, C = 1, N = 2, and V = 2, we replace row 2 by the sum of row 2 and
(−2) times row 1 to get

$$\left(\begin{array}{cc|cc} 1 & 3/5 & 1/5 & 0 \\ 0 & -1/5 & -2/5 & 1 \end{array}\right). \tag{1.18}$$

With R = 2, C = 2, and P = −1/5, we multiply row 2 by (−5) to get

$$[C \,|\, D] = \left(\begin{array}{cc|cc} 1 & 3/5 & 1/5 & 0 \\ 0 & 1 & 2 & -5 \end{array}\right). \tag{1.19}$$

C does not have a zero row, so A is invertible, and we continue to find the inverse.
We replace row 1 by the sum of row 1 plus (−3/5) times row 2, to get

$$[I \,|\, B] = \left(\begin{array}{cc|cc} 1 & 0 & -1 & 3 \\ 0 & 1 & 2 & -5 \end{array}\right). \tag{1.20}$$

This determines the inverse of A as

$$A^{-1} = B = \begin{pmatrix} -1 & 3 \\ 2 & -5 \end{pmatrix}. \tag{1.21}$$

■

■ Example 1.24 We use the elementary row operation algorithm to find the inverse


of

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}. \tag{1.22}$$

We first form the partitioned matrix [A |I ]:

$$[A \,|\, I] = \left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 4 & 5 & 6 & 0 & 1 & 0 \\ 7 & 8 & 9 & 0 & 0 & 1 \end{array}\right). \tag{1.23}$$

With R = 1, C = 1, N = 2, and V = 4, we replace row 2 by the sum of row 2 and
(−4) times row 1 to get

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -3 & -6 & -4 & 1 & 0 \\ 7 & 8 & 9 & 0 & 0 & 1 \end{array}\right). \tag{1.24}$$

With R = 1, C = 1, N = 3, and V = 7, we replace row 3 by the sum of row 3 and
(−7) times row 1 to get

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -3 & -6 & -4 & 1 & 0 \\ 0 & -6 & -12 & -7 & 0 & 1 \end{array}\right). \tag{1.25}$$

We increase R by 1, and with R = 2, C = 2, and P = −3, we replace row 2 by
(−1/3) times row 2 to get

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & 4/3 & -1/3 & 0 \\ 0 & -6 & -12 & -7 & 0 & 1 \end{array}\right). \tag{1.26}$$

With R = 2, C = 2, N = 3, and V = −6, we replace row 3 by the sum of row 3 and
(6) times row 2 to get

$$[C \,|\, D] = \left(\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 2 & 4/3 & -1/3 & 0 \\ 0 & 0 & 0 & 1 & -2 & 1 \end{array}\right). \tag{1.27}$$

The left side of this partitioned matrix is in row-echelon form. Since its third row
is a zero row, the original matrix A does not have an inverse. ■

■ Example 1.25 We use the elementary row operation algorithm to find the inverse


of

$$A = \begin{pmatrix} 0 & 1 & 1 \\ 5 & 1 & -1 \\ 2 & -3 & -3 \end{pmatrix}. \tag{1.28}$$

We first form the partitioned matrix [A |I ]:

$$[A \,|\, I] = \left(\begin{array}{ccc|ccc} 0 & 1 & 1 & 1 & 0 & 0 \\ 5 & 1 & -1 & 0 & 1 & 0 \\ 2 & -3 & -3 & 0 & 0 & 1 \end{array}\right). \tag{1.29}$$

We interchange R1 and R2 to get

$$\left(\begin{array}{ccc|ccc} 5 & 1 & -1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 2 & -3 & -3 & 0 & 0 & 1 \end{array}\right). \tag{1.30}$$

With R = 1, C = 1, and P = 5, replace R1 by (1/5) R1 to get

$$\left(\begin{array}{ccc|ccc} 1 & 1/5 & -1/5 & 0 & 1/5 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 2 & -3 & -3 & 0 & 0 & 1 \end{array}\right). \tag{1.31}$$

With R = 1, C = 1, N = 3, and V = 2, we replace R3 with the sum of R3 and
(−2) R1 to get

$$\left(\begin{array}{ccc|ccc} 1 & 1/5 & -1/5 & 0 & 1/5 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & -17/5 & -13/5 & 0 & -2/5 & 1 \end{array}\right). \tag{1.32}$$

With R = 2, C = 2, N = 3, and V = −17/5, we replace R3 with the sum of R3 and
(17/5) R2 to get

$$\left(\begin{array}{ccc|ccc} 1 & 1/5 & -1/5 & 0 & 1/5 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 4/5 & 17/5 & -2/5 & 1 \end{array}\right). \tag{1.33}$$

With R = 3, C = 3, and P = 4/5, we replace R3 by (5/4) R3 to get

$$[C \,|\, D] = \left(\begin{array}{ccc|ccc} 1 & 1/5 & -1/5 & 0 & 1/5 & 0 \\ 0 & 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 17/4 & -2/4 & 5/4 \end{array}\right). \tag{1.34}$$


The left side of this partitioned matrix is now in row-echelon form and there are
no zero rows, so the inverse of the original matrix A exists. We start on column 3
and replace R2 by the sum of R2 and (−1) R3 to get

$$\left(\begin{array}{ccc|ccc} 1 & 1/5 & -1/5 & 0 & 1/5 & 0 \\ 0 & 1 & 0 & -13/4 & 2/4 & -5/4 \\ 0 & 0 & 1 & 17/4 & -2/4 & 5/4 \end{array}\right). \tag{1.35}$$

We next replace R1 by the sum of R1 and (1/5) R3 to get

$$\left(\begin{array}{ccc|ccc} 1 & 1/5 & 0 & 17/20 & 2/20 & 5/20 \\ 0 & 1 & 0 & -13/4 & 2/4 & -5/4 \\ 0 & 0 & 1 & 17/4 & -2/4 & 5/4 \end{array}\right). \tag{1.36}$$

We next replace R1 by the sum of R1 and (−1/5) R2 to get

$$[I \,|\, B] = \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 3/2 & 0 & 1/2 \\ 0 & 1 & 0 & -13/4 & 2/4 & -5/4 \\ 0 & 0 & 1 & 17/4 & -2/4 & 5/4 \end{array}\right). \tag{1.37}$$

This determines the inverse of A as

$$A^{-1} = B = \begin{pmatrix} 3/2 & 0 & 1/2 \\ -13/4 & 2/4 & -5/4 \\ 17/4 & -2/4 & 5/4 \end{pmatrix} = \frac{1}{4} \begin{pmatrix} 6 & 0 & 2 \\ -13 & 2 & -5 \\ 17 & -2 & 5 \end{pmatrix} \tag{1.38}$$

■

1.1.4 Linear Systems of Equations


A linear system of equations can be written in matrix form as Ax = b, where A
is a matrix, x is a vector of variables, and b is a vector of constants.
1.1.4.1 Solving Linear Systems
Solving systems of linear equations is a fundamental problem in linear algebra with
applications in various fields such as engineering, physics, and economics. This
section provides an overview of three important methods for solving linear systems:
Gaussian Elimination, LU Decomposition, and Matrix Inversion.
Gaussian Elimination
Gaussian elimination is a method for solving linear systems by transforming the
system’s augmented matrix into row-echelon form (REF) or reduced row-echelon
form (RREF).
Procedure
Consider the linear system Ax = b, where A is an m × n matrix, x is a vector of
n variables, and b is a vector of m constants. The steps are:
1. Form the augmented matrix [A|b].
2. Forward elimination: Use row operations to convert the matrix into an
upper triangular form.


3. Back substitution: Solve the equations starting from the last row upwards
to find the values of the variables.

■ Example 1.26 Solve the system:

x + 2y + 3z = 9
2x + 3y + z = 8
3x + y + 2z = 7

1. Form the augmented matrix:

$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 9 \\ 2 & 3 & 1 & 8 \\ 3 & 1 & 2 & 7 \end{array}\right)$$

2. Perform row operations (R2 → R2 − 2R1 , R3 → R3 − 3R1 , then R3 → R3 − 5R2 )
to obtain an upper triangular form:

$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 9 \\ 0 & -1 & -5 & -10 \\ 0 & 0 & 18 & 30 \end{array}\right)$$

3. Back substitution to find z, y, and x:

$$z = \frac{5}{3}, \qquad y = \frac{5}{3}, \qquad x = \frac{2}{3}$$
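The forward-elimination/back-substitution procedure can be sketched as follows (Python, exact arithmetic via the standard fractions module, assuming a square nonsingular system; the function name is ours):

```python
from fractions import Fraction

def solve(A, b):
    """Gaussian elimination with back substitution for a square nonsingular A."""
    n = len(A)
    # form the augmented matrix [A | b]
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for c in range(n):                      # forward elimination
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):          # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x
```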

LU Decomposition
LU decomposition factors a matrix A into the product of a lower triangular
matrix L and an upper triangular matrix U , such that A = LU .
Procedure
1. Decompose A into L and U .
2. Solve Ly = b for y using forward substitution.
3. Solve U x = y for x using back substitution.

■ Example 1.27 Consider the matrix A and vector b:

$$A = \begin{pmatrix} 2 & -1 & -2 \\ -4 & 6 & 3 \\ -4 & -2 & 8 \end{pmatrix}, \qquad b = \begin{pmatrix} -2 \\ 7 \\ 6 \end{pmatrix}$$

1. Decompose A:

$$L = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -2 & -1 & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} 2 & -1 & -2 \\ 0 & 4 & -1 \\ 0 & 0 & 3 \end{pmatrix}$$

2. Solve Ly = b by forward substitution:

$$y = \begin{pmatrix} -2 \\ 3 \\ 5 \end{pmatrix}$$

3. Solve U x = y by back substitution:

$$x = \begin{pmatrix} 5/4 \\ 7/6 \\ 5/3 \end{pmatrix}$$
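The two triangular solves are straightforward loops. A Python sketch (the helper names are ours; exact arithmetic via the standard fractions module) applied to the L, U, and b of Example 1.27:

```python
from fractions import Fraction

def forward_sub(L, b):
    """Solve Ly = b for a lower triangular L."""
    y = []
    for i, row in enumerate(L):
        s = sum(Fraction(row[j]) * y[j] for j in range(i))
        y.append((Fraction(b[i]) - s) / Fraction(row[i]))
    return y

def back_sub(U, y):
    """Solve Ux = y for an upper triangular U."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(Fraction(U[i][j]) * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / Fraction(U[i][i])
    return x
```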

Matrix Inversion
Matrix inversion involves finding the inverse A−1 of the matrix A such that
AA−1 = I.
Procedure
1. Find the inverse A−1 if it exists.
2. Solve x = A−1 b.

■ Example 1.28 Consider the matrix A and vector b:

$$A = \begin{pmatrix} 4 & 7 \\ 2 & 6 \end{pmatrix}, \qquad b = \begin{pmatrix} 5 \\ 3 \end{pmatrix}$$

1. Find the inverse:

$$A^{-1} = \frac{1}{4 \cdot 6 - 7 \cdot 2} \begin{pmatrix} 6 & -7 \\ -2 & 4 \end{pmatrix} = \frac{1}{10} \begin{pmatrix} 6 & -7 \\ -2 & 4 \end{pmatrix} = \begin{pmatrix} 0.6 & -0.7 \\ -0.2 & 0.4 \end{pmatrix}$$

2. Solve x = A−1 b:

$$x = \begin{pmatrix} 0.6 & -0.7 \\ -0.2 & 0.4 \end{pmatrix} \begin{pmatrix} 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 0.9 \\ 0.2 \end{pmatrix}$$

Eigenvalues and Eigenvectors


For a square matrix A, a scalar λ is an eigenvalue and a non-zero vector v is an
eigenvector if:

Av = λv
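Whether a given pair (λ, v) satisfies the definition is a direct check of Av = λv. A Python sketch (the function name is ours):

```python
def is_eigenpair(A, v, lam):
    """Check Av = lam * v for a nonzero vector v."""
    if all(x == 0 for x in v):
        return False                 # eigenvectors must be nonzero
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    return Av == [lam * x for x in v]
```

For example, the matrix [[2, 1], [1, 2]] has eigenvalues 3 and 1 with eigenvectors (1, 1) and (1, −1).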

1.2 Convex Sets and Convex Functions


Convex sets and convex functions are fundamental concepts in optimization theory
that have numerous applications in various fields. In this lecture, we will introduce
the basics of convex sets and convex functions, including their definitions, properties,
and examples.


1.2.1 Convex Sets

Definition 1.12 — Convex Sets. A set S is convex if for any two points in S,
the entire line segment between them is also contained in S. That is, for any
x, y ∈ S,
λx + (1 − λ)y ∈ S for all λ ∈ [0, 1].

A set S is said to be convex if it contains the line segment joining any two points in
the set. In other words, a set S is convex if for any two points x, y ∈ S, the point
z = λx + (1 − λ)y is also in S for all λ ∈ [0, 1].
Convex and Polyhedral Sets

Convex Set: A set S in Rn is a convex set if the line segment joining any pair of
points a1 and a2 in S is completely contained in S, that is, λa1 + (1 − λ)a2 ∈ S for
all λ ∈ [0, 1].

Hyperplanes and Half-Spaces: A hyperplane in Rn divides Rn into 2 half-


spaces (like a line does in R2 ). A hyperplane is the set {x : px = k}, where p is
the gradient to the hyperplane (i.e., the coefficients of our linear expression). The
corresponding half-spaces is the set of points {x : px ≥ k} and {x : px ≤ k}.

Polyhedral Set: A polyhedral set (or polyhedron) is the set of points in the
intersection of a finite set of half-spaces. Set S = {x : Ax ≤ b, x ≥ 0}, where A is an
m × n matrix, x is an n-vector, and b is an m-vector, is a polyhedral set defined by
m + n hyperplanes (i.e., the intersection of m + n half-spaces).
• Polyhedral sets are convex.
• A polytope is a bounded polyhedral set.
• A polyhedral cone is a polyhedral set where the hyperplanes (that define the
half-spaces) pass through the origin, thus C = {x : Ax ≤ 0} is a polyhedral
cone.
Edges and Faces: An edge of a polyhedral set S is defined by n − 1 hyperplanes,
and a face of S by one of more defining hyperplanes of S, thus an extreme point
and an edge are faces (an extreme point is a zero-dimensional face and an edge a
one-dimensional face). In R2 faces are only edges and extreme points, but in R3
there is a third type of face, and so on...

Extreme Points: x ∈ S is an extreme point of S if:


Definition 1: x is not a convex combination of two other points in S, that is, all line
segments that are completely in S that contain x must have x as an endpoint.
Definition 2: x lies on n linearly independent defining hyperplanes of S.
If more than n hyperplanes pass through an extreme points then it is a degenerate
extreme point, and the polyhedral set is considered degenerate. This just adds a bit
of complexity to the algorithms we will study, but it is quite common.

Unbounded Sets:

Rays: A ray in Rn is the set of points {x : x0 + λd, λ ≥ 0}, where x0 is the


vertex and d is the direction of the ray.

Convex Cone: A Convex Cone is a convex set that consists of rays emanating
from the origin. A convex cone is completely specified by its extreme directions. If C
is convex cone, then for any x ∈ C we have λx ∈ C, λ ≥ 0.

Unbounded Polyhedral Sets: If S is unbounded, it will have directions. d is


a direction of S only if A(x + λd) ≤ b and x + λd ≥ 0 for all λ ≥ 0 and all x ∈ S. In other
words, consider the ray {x : x0 + λd, λ ≥ 0} in Rn , where x0 is the vertex and d is
the direction of the ray. d ̸= 0 is a direction of set S if for each x0 in S the ray
{x0 + λd, λ ≥ 0} also belongs to S.

Extreme Directions: An extreme direction of S is a direction that cannot be


represented as positive linear combination of other directions of S. A non-negative
linear combination of extreme directions can be used to represent all other directions
of S. A polyhedral cone is completely specified by its extreme directions.

Let's define a procedure for finding the extreme directions, using the following
LP's feasible region. Graphically, we can see that the extreme directions should
follow the s1 = 0 line and the s3 = 0 line.

max z = −5x1 − x2
s.t.  x1 − 4x2 + s1 = 0
     −x1 + x2 + s2 = 1
     −x1 + 2x2 + s3 = 4
      x1 , x2 , s1 , s2 , s3 ≥ 0.

[Figure: the feasible region in the (x1 , x2 ) plane, bounded by the lines s1 = 0,
s2 = 0, and s3 = 0.]


We look at some of the geometric properties of sets of points in this section. Consider
any two points v1 and v2 . Then the vector v1 + k(v2 − v1 ) lies on the line segment
joining v1 and v2 for k ∈ [0, 1]. Rearranging, we can write this as (1 − k)v1 + kv2 , or as
λ1 v1 + λ2 v2 where λ1 + λ2 = 1 and 0 ≤ λ1 , λ2 ≤ 1. What is interesting, however, is that
this generalizes to larger sets as well. If we consider a set of n points S = {v1 , . . . , vn },
then any point lying in the polygon with v1 , . . . , vn as its vertices can be written as
$\sum_{i=1}^{n} \lambda_i v_i$, where $\sum_{i=1}^{n} \lambda_i = 1$ and 0 ≤ λi ≤ 1.
We now define the term convex combination.

Definition 1.13 Given n vectors v1 , . . . , vn , a vector v of the form

$$v = \sum_{i=1}^{n} \lambda_i v_i, \qquad 0 \le \lambda_i \le 1, \qquad \sum_{i=1}^{n} \lambda_i = 1$$

is called a convex combination of v1 , . . . , vn .

A convex set is defined as:

Definition 1.14 A set of points S is called convex if for any subset S ′ of S and
for any point p which we get by convex combination of points in S ′ , p ∈ S.

As an example, the set {x : Ax ≤ b} is convex. This is because for any x1 , . . . , xn
satisfying Axi ≤ b,

$$A\left(\sum_i \lambda_i x_i\right) = \sum_i \lambda_i A x_i \le \sum_i \lambda_i b = b \quad\text{as}\quad \sum_i \lambda_i = 1.$$
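This argument can be tested numerically: every convex combination of feasible points of {x : Ax ≤ b} remains feasible. A Python sketch (the helper names and the small polyhedron used below are ours):

```python
from fractions import Fraction

def feasible(A, b, x):
    """Check Ax <= b componentwise."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= bi
               for row, bi in zip(A, b))

def convex_combination(x1, x2, lam):
    """lam * x1 + (1 - lam) * x2 for lam in [0, 1]."""
    return [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
```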

Convex Hull: The convex hull of a set S is the smallest convex set that contains
S. It is denoted as co(S).

Separating Hyperplane: A separating hyperplane is a hyperplane that separates
two disjoint convex sets. It is a crucial concept in optimization theory.

Support Function: The support function of a convex set S at a point x is the
maximum value of the linear function f (y) = xT y for all y ∈ S.

Definition 1.15 — Convex Set. Let X ⊆ Rn . Then the set X is convex if and
only if for all pairs x1 , x2 ∈ X we have λx1 + (1 − λ)x2 ∈ X for all λ ∈ [0, 1].

The definition of convexity seems complex, but it is easy to understand. First


recall that if λ ∈ [0, 1], then the point λx1 +(1−λ)x2 is on the line segment connecting
x1 and x2 in Rn . For example, when λ = 1/2, then the point λx1 + (1 − λ)x2 is the
midpoint between x1 and x2 . In fact, for every point x on the line connecting x1
and x2 we can find a value λ ∈ [0, 1] so that x = λx1 + (1 − λ)x2 . Thus convexity
asserts that if x1 , x2 ∈ X, then every point on the line segment connecting x1
and x2 is also in the set X.

Definition 1.16 — Positive Combination. Let x1 , . . . , xm ∈ Rn . If λ1 , . . . , λm > 0,
then

x = ∑_{i=1}^{m} λi xi   (1.39)

is called a positive combination of x1 , . . . , xm .

Definition 1.17 — Convex Combination. Let x1 , . . . , xm ∈ Rn . If λ1 , . . . , λm ∈ [0, 1]
and

∑_{i=1}^{m} λi = 1,

then

x = ∑_{i=1}^{m} λi xi   (1.40)

is called a convex combination of x1 , . . . , xm . If λi < 1 for all i = 1, . . . , m, then
Equation 1.40 is called a strict convex combination.

R If you recall the definition of linear combination, we can see that we move
from the very general to the very specific as we go from linear combinations to
positive combinations to convex combinations. A linear combination of points
or vectors allowed us to choose any real values for the coefficients. A positive
combination restricts us to positive values, while a convex combination asserts
that those values must be non-negative and sum to 1.

■ Example 1.29 Figure 1.1 illustrates a convex and a non-convex set.


[Figure: an ellipse and its interior with an interior chord x1 –x2 (convex), beside a crescent-shaped set with a chord that leaves the set (non-convex).]

Figure 1.1: Examples of Convex Sets: The set on the left (an ellipse and its interior)
is a convex set; every pair of points inside the ellipse can be connected by a line
contained entirely in the ellipse. The set on the right is clearly not convex as we’ve
illustrated two points whose connecting line is not contained inside the set.

Non-convex sets have some resemblance to crescent shapes or have components


that look like crescents. ■

Theorem 1.3 The intersection of a finite number of convex sets in Rn is convex.

Proof. Let C1 , . . . , Cn ⊆ Rn be a finite collection of convex sets. Let


C = C1 ∩ C2 ∩ · · · ∩ Cn   (1.41)

be the set formed from the intersection of these sets. Choose x1 , x2 ∈ C and λ ∈ [0, 1].
Consider x = λx1 + (1 − λ)x2 . By definition of C we know that x1 , x2 ∈ Ci for each
i = 1, . . . , n. By convexity of each Ci we then have x ∈ Ci for each i. Therefore, x ∈ C
and C is a convex set. ■

1.2.2 Convex Functions


Convex functions are "nice" functions that "open up". They represent an extremely
important class of functions in optimization and can typically be optimized over
efficiently.

[Figure: two surface plots. (a) Convex Function f (x, y) = x2 + y 2 . (b) Non-Convex Function f (x, y) = x2 + y 2 − 2(x − 0.3)3 − 2(y − 0.4)3 .]

Figure 1.2: Comparison of Convex and Non-Convex Functions.

Informally, a function is convex if whenever you draw a line between two points
on the function, that line must be above the function.

[Figure: the graph of a convex function f , with the chord value λf (x1 ) + (1 − λ)f (x2 ) lying above the function value f (λx1 + (1 − λ)x2 ) for points x1 , x2 on the x-axis.]

Formally, we can make this definition using the idea of convex combinations.

Definition 1.18 Convex Functions A function f : Rn → R is convex if for all


x, y ∈ Rn and λ ∈ [0, 1] we have

λf (x) + (1 − λ)f (y) ≥ f (λx + (1 − λ)y). (1.42)

Equivalently, a function f : Rn → R is convex if its epigraph (the set of points on
or above its graph) is a convex set.
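Definition 1.18 can be spot-checked numerically; the grid of sample points below tests the inequality for f (x) = x² (a check on a finite grid, not a proof).

```python
# Spot-check of the convexity inequality for f(x) = x**2:
# lam*f(x) + (1-lam)*f(y) >= f(lam*x + (1-lam)*y) on sampled points.

f = lambda x: x * x

ok = True
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    for y in [-1.0, 0.5, 2.0]:
        for lam in [0.0, 0.3, 0.7, 1.0]:
            lhs = lam * f(x) + (1 - lam) * f(y)
            rhs = f(lam * x + (1 - lam) * y)
            ok = ok and lhs >= rhs - 1e-12
print(ok)  # True
```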
Properties of Convex Functions
Some important properties of convex functions include:
• Jensen’s Inequality: Jensen’s inequality states that for any random variable
X and any convex function f , the expected value of f (X) is greater than or
equal to f (E[X]).
• Convexity Preservation: The composition of convex functions with linear
maps preserves convexity.
• Concavity Preservation: The composition of concave functions with linear
maps preserves concavity.
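For a finite discrete random variable, Jensen's inequality can be verified directly; the values and probabilities below are invented for the demonstration.

```python
# Jensen's inequality for a finite random variable: E[f(X)] >= f(E[X])
# when f is convex.

xs = [-1.0, 0.0, 2.0, 5.0]
ps = [0.1, 0.4, 0.3, 0.2]   # probabilities summing to 1
f = lambda x: x * x         # a convex function

E_X = sum(p * x for p, x in zip(ps, xs))
E_fX = sum(p * f(x) for p, x in zip(ps, xs))
print(E_fX >= f(E_X))  # True
```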

■ Example 1.30 The set {(x, y) : x2 + y 2 ≤ 1} is a convex set. ■

■ Example 1.31 The function f (x) = x2 is a convex function. ■

■ Example 1.32 The function f (x) = −x2 is a concave function. ■

Convexity
A key property that will enable efficient algorithms is convexity. This comes in the
form of convex sets and convex functions. When the constraints of an optimization
problem form a convex set and the objective function is a convex function, we say
that it is a convex optimization problem.

Definition 1.19 — Convex Combination. Given two points x, y, a convex combination
is any point z that lies on the line segment between x and y. Algebraically, a
convex combination is any point z that can be represented as z = λx + (1 − λ)y
for some multiplier λ ∈ [0, 1].

[Figure: points x and y with z = λx + (1 − λ)y marked on the segment between them.]

Figure 1.3: A convex combination of the points x and y is given by z = λx + (1 − λ)y
with any λ ∈ [0, 1]. Here we demonstrate this using λ = 2/3.

Definition 1.20 Convex Set A set C is convex if it contains all convex combina-
tions of points in C. That is, for any x, y ∈ C, it holds that λx + (1 − λ)y ∈ C
for all λ ∈ [0, 1].

[Figure: (a) a convex set and (b) a non-convex set.]

An equivalent definition of convex functions is through the epigraph.

Definition 1.21 Epigraph The epigraph of f is the set {(x, y) : y ≥ f (x)}. This
is the set of all points "above" the function.

Theorem 1.4 f (x) is a convex function if and only if the epigraph of f is a convex
set.

■ Example 1.33 Examples of Convex functions


Some examples are
• f (x) = ax2 + b (with a ≥ 0)
• f (x) = x4
• f (x) = x
• f (x) = |x|
• f (x) = ex
• f (x) = −3√x on the domain [0, ∞).
• f (x) = x on the domain [0, ∞).

• f (x, y) = √(x2 + y 2 )
• f (x, y) = x2 + y 2 + x
• f (x, y) = ex+y
• f (x, y) = ex + ey + x2 + (3x + 4y)6

[Figure: three surface plots of the functions in the caption.]

Figure 1.5: Convex Functions f (x, y) = x2 + y 2 + x, f (x, y) = ex+y + ex−y + e−x−y ,
and f (x, y) = x2 .

Proving Convexity - Characterizations


Theorem 1.5 Convexity: First order characterization - linear underestimates
Suppose that f : Rn → R is differentiable. Then f is convex if and only if, for all
x̄ ∈ Rn , the linear tangent is an underestimator of the function, that is,

f (x̄) + (x − x̄)⊤ ∇f (x̄) ≤ f (x).

Theorem 1.6 Convexity: Second order characterization - positive curvature We
give statements for univariate and multivariate functions.
• Suppose f : R → R is twice differentiable. Then f is convex if and only if
f ′′ (x) ≥ 0 for all x ∈ R.
• Suppose f : Rn → R is twice differentiable. Then f is convex if and only if
∇2 f (x) ≽ 0 for all x ∈ Rn .
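The second-order test can be approximated with finite differences; the function f (x) = eˣ + x² and the step size below are chosen for illustration (here f ″(x) = eˣ + 2 > 0 everywhere, so the estimates should all be positive).

```python
# Rough finite-difference check of the second-order test f''(x) >= 0
# for the convex function f(x) = exp(x) + x**2.
import math

f = lambda x: math.exp(x) + x * x
h = 1e-4  # assumption: a small but not tiny step, to limit rounding error

def second_diff(x):
    # central second difference approximates f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

print(all(second_diff(x) > 0 for x in [-3.0, -1.0, 0.0, 1.0, 2.5]))  # True
```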

Proving Convexity - Composition Tricks


General 1: Positive Scaling of Convex Function is Convex If f is convex and


α > 0, then αf is convex.

Example: f (x) = ex is convex. Therefore, 25ex is also convex.

General 2: Sum of Convex Functions is Convex If f and g are both convex, then
f + g is also convex.

Example: f (x) = ex , g(x) = x4 are convex. Therefore, ex + x4 is also convex.

General 3: Composition with affine function If f (x) is convex, then f (a⊤ x + b)


is also convex.

Example: f (x) = x4 is convex. Therefore, (3x + 5y + 10z)4 is also convex.

General 4: Pointwise maximum If fi are convex for i = 1, . . . , t, then f (x) =


maxi=1,...,t fi (x) is convex.

Example: f1 (x) = e−x , f2 (x) = ex are convex. Therefore, f (x) = max(ex , e−x )
is also convex.

General 5: Other compositions Suppose

f (x) = h(g(x)).

1. If g is convex, h is convex and non-decreasing, then f is convex.


2. If g is concave, h is convex and non-increasing, then f is convex.

Example 1: g(x) = x4 is convex, h(x) = ex is convex and non-decreasing.
Therefore, f (x) = e^(x^4) is also convex.


Example 2: g(x) = √x is concave (on [0, ∞)), h(x) = e−x is convex and
non-increasing. Therefore, f (x) = e^(−√x) is convex on x ∈ [0, ∞).
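These composition rules can also be sanity-checked on sample points; below, max(e⁻ˣ, eˣ) (General 4) and e^(x⁴) (General 5) are sampled against the convexity inequality. This is a spot-check on a grid, not a proof.

```python
# Numeric sanity check of the composition rules: both functions below are
# convex by the rules above, so the convexity inequality must hold at
# every sampled triple (x, y, lam).
import math

funcs = [lambda x: max(math.exp(-x), math.exp(x)),  # pointwise maximum
         lambda x: math.exp(x ** 4)]                # h(g(x)) composition

ok = True
for fn in funcs:
    for x in [-1.5, -0.5, 0.0, 1.0]:
        for y in [-1.0, 0.5, 1.5]:
            for lam in [0.2, 0.5, 0.8]:
                lhs = lam * fn(x) + (1 - lam) * fn(y)
                ok = ok and lhs >= fn(lam * x + (1 - lam) * y) - 1e-9
print(ok)  # True
```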

Convex and Concave Functions

Definition 1.22 — Convex Function. A function f : Rn → R is a convex function


if it satisfies:

f (λx1 + (1 − λ)x2 ) ≤ λf (x1 ) + (1 − λ)f (x2 ) (1.43)

for all x1 , x2 ∈ Rn and for all λ ∈ [0, 1].

This definition is illustrated in Figure 1.6.


[Figure: the chord value λf (x1 ) + (1 − λ)f (x2 ) lying above the function value f (λx1 + (1 − λ)x2 ).]

Figure 1.6: A convex function: A convex function satisfies the expression f (λx1 +
(1 − λ)x2 ) ≤ λf (x1 ) + (1 − λ)f (x2 ) for all x1 and x2 and λ ∈ [0, 1].

When f is a univariate function, this definition can be shown to be equivalent to
the definition you learned in Calculus I (Math 140) using first and second
derivatives.

Definition 1.23 — Concave Function. A function f : Rn → R is a concave function


if it satisfies:

f (λx1 + (1 − λ)x2 ) ≥ λf (x1 ) + (1 − λ)f (x2 ) (1.44)

for all x1 , x2 ∈ Rn and for all λ ∈ [0, 1].

To visualize this definition, simply flip Figure 1.6 upside down. The following theorem
is a powerful tool that can be used to show sets are convex. Its proof is outside the
scope of the class, but relatively easy.
Theorem 1.7 Let f : Rn → R be a convex function. Then the set C = {x ∈ Rn :
f (x) ≤ c}, where c ∈ R, is a convex set.

Exercise 1.1 Prove Theorem 1.7. [Hint: Skip ahead and read the proof of
Lemma 1.2. Follow the steps in that proof, but apply them to f .] ■

1.3 Polyhedral Theory


Important examples of convex sets are polyhedral sets, the multi-dimensional analogs
of polygons in the plane. In order to understand these structures, we must first
understand hyperplanes and half-spaces.

Definition 1.24 — Hyperplane. Let a ∈ Rn be a constant vector in n-dimensional


space and let b ∈ R be a constant scalar. The set of points
n o
H = x ∈ Rn |aT x = b (1.45)

is a hyperplane in n-dimensional space. Note the use of column vectors for a
and x in this definition.

■ Example 1.34 Consider the hyperplane 2x1 + 3x2 + x3 = 5. This is shown in
Figure 1.7.

Figure 1.7: A hyperplane in 3 dimensional space: A hyperplane is the set of points
satisfying an equation aT x = b, where b is a constant in R, a is a constant
vector in Rn , and x is a variable vector in Rn . The equation is written as a matrix
multiplication using our assumption that all vectors are column vectors.

This hyperplane is composed of the set of points (x1 , x2 , x3 ) ∈ R3 satisfying


2x1 + 3x2 + x3 = 5. This can be plotted implicitly or explicitly by solving for one of
the variables, say x3 . We can write x3 as a function of the other two variables as:

x3 = 5 − 2x1 − 3x2 (1.46)

Definition 1.25 — Half-Space. Let a ∈ Rn be a constant vector in n-dimensional


space and let b ∈ R be a constant scalar. The sets of points
n o
Hl = x ∈ Rn |aT x ≤ b (1.47)
n o
Hu = x ∈ Rn |aT x ≥ b (1.48)

are the half-spaces defined by the hyperplane aT x = b.
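Classifying a point against a hyperplane and its two half-spaces is a one-line computation; the sketch below (the helper name `side` is invented here) uses the hyperplane 2x1 + 3x2 + x3 = 5 from Example 1.34.

```python
# Classify a point x relative to the hyperplane a^T x = b: on the
# hyperplane, in the lower half-space Hl, or in the upper half-space Hu.

def side(a, b, x, tol=1e-12):
    val = sum(ai * xi for ai, xi in zip(a, x))  # a^T x
    if abs(val - b) <= tol:
        return "on hyperplane"
    return "lower half-space Hl" if val < b else "upper half-space Hu"

a, b = [2.0, 3.0, 1.0], 5.0  # the hyperplane 2x1 + 3x2 + x3 = 5
print(side(a, b, [1.0, 1.0, 0.0]))  # on hyperplane
print(side(a, b, [0.0, 0.0, 0.0]))  # lower half-space Hl
```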

■ Example 1.35 Consider the two dimensional hyperplane (line) x1 + x2 = 1. Then
the two half-spaces associated with this hyperplane are shown in Figure 1.8.


(a) Hl (b) Hu

Figure 1.8: Two half-spaces defined by a hyper-plane: A half-space is so named


because any hyper-plane divides Rn (the space in which it resides) into two halves,
the side “on top” and the side “on the bottom.”

A half-space is so named because the hyperplane aT x = b literally separates Rn


into two halves: the half above the hyperplane and the half below the hyperplane.

Lemma 1.1. Every hyper-plane is convex.


Proof. Let a ∈ Rn and b ∈ R and let H be the hyperplane defined by a and b. Choose
x1 , x2 ∈ H and λ ∈ [0, 1]. Let x = λx1 + (1 − λ)x2 . By definition we know that:
aT x1 = b
aT x2 = b
Then we have:
aT x = aT [λx1 + (1 − λ)x2 ] = λaT x1 + (1 − λ)aT x2 = λb + (1 − λ)b = b (1.49)
Thus, x ∈ H and we see that H is convex. This completes the proof. ■
Lemma 1.2. Every half-space is convex.
Proof. Let a ∈ Rn and b ∈ R. Without loss of generality, consider the half-space Hl
defined by a and b. For arbitrary x1 and x2 in Hl we have:
aT x1 ≤ b
aT x2 ≤ b
Suppose that aT x1 = b1 ≤ b and aT x2 = b2 ≤ b. Again let x = λx1 + (1 − λ)x2 . Then:

aT x = aT [λx1 + (1 − λ)x2 ] = λaT x1 + (1 − λ)aT x2 = λb1 + (1 − λ)b2 (1.50)


Since λ ≤ 1 and 1 − λ ≤ 1 and λ ≥ 0 we know that λb1 ≤ λb, since b1 ≤ b. Similarly
we know that (1 − λ)b2 ≤ (1 − λ)b, since b2 ≤ b. Thus:
λb1 + (1 − λ)b2 ≤ λb + (1 − λ)b = b (1.51)
Thus we have shown that aT x ≤ b. The case for Hu is identical with the signs of the
inequalities reversed. This completes the proof. ■


Using these definitions, we are now in a position to define polyhedral sets, which
will be the subject of our study for most of the remainder of this chapter.

Definition 1.26 — Polyhedral Set. If P ⊆ Rn is the intersection of a finite number


of half-spaces, then P is a polyhedral set. Formally, let a1 , . . . , am ∈ Rn be a
finite set of constant vectors and let b1 , . . . , bm ∈ R be constants. Consider the
set of half-spaces:

Hi = {x|aiT x ≤ bi }

Then the set:


P = H1 ∩ H2 ∩ · · · ∩ Hm   (1.52)

is a polyhedral set.

It should be clear that we can represent any polyhedral set using a matrix
inequality. The set P is defined by the set of vectors x satisfying:

Ax ≤ b, (1.53)

where the rows of A ∈ Rm×n are made up of the vectors a1 , . . . , am and b ∈ Rm is a


column vector composed of elements b1 , . . . , bm .
Theorem 1.8 Every polyhedral set is convex.

Exercise 1.2 Prove Theorem 1.8. [Hint: You can prove this by brute force, verifying
convexity. You can also be clever and use two results that we’ve proved in the
notes.] ■

Rays and Directions


Recall the definition of a line (Definition 2.5 from Chapter 1). A ray is a one-sided
line.

Definition 1.27 — Ray. Let x0 ∈ Rn be a point and let d ∈ Rn be a vector
called the direction. Then the ray with vertex x0 and direction d is the collection
of points {x | x = x0 + λd, λ ≥ 0}.

■ Example 1.36 We will use the same point and direction as we did for a line in
Chapter 1. Let x0 = [2, 1]T and let d = [2, 2]T . Then the ray defined by x0 and
d is shown in Figure 1.9. The set of points is R = {(x, y) ∈ R2 : x = 2 + 2λ, y =
1 + 2λ, λ ≥ 0}.


Figure 1.9: A Ray: The points in the graph shown in this figure are in the set
produced using the expression x0 + dλ where x0 = [2, 1]T and d = [2, 2]T and λ ≥ 0.

Rays are critical for understanding unbounded convex sets. Specifically, a set is
unbounded, in a sense, only if you can show that it contains a ray. An interesting
class of unbounded convex sets are convex cones:

Definition 1.28 — Convex Cone. Let C ⊆ Rn be a convex set. Then C is a


convex cone if for all x ∈ C and for all λ ∈ R with λ ≥ 0 we have λx ∈ C.

Lemma 1.3. Every convex cone contains the origin.

Exercise 1.3 Prove the previous lemma. ■

The fact that every convex cone contains the origin (Lemma 1.3), along with
the fact that for every point x ∈ C we have λx ∈ C (λ ≥ 0), implies that the ray
{λx : λ ≥ 0} ⊆ C. Thus, since every point x ∈ C lies on such a ray, a convex
cone is just made up of rays beginning at the origin.
Another key element to understanding unbounded convex sets is the notion of
direction. A direction can be thought of as a “direction of travel” from a starting
point inside an unbounded convex set so that you (the traveler) can continue moving
forever and never leave the set.

Definition 1.29 — Direction of a Convex Set. Let C be a convex set. Then d ̸= 0


is a (recession) direction of the convex set if for all x0 ∈ C the ray with vertex
x0 and direction d is contained entirely in C. Formally, for all x0 ∈ C we have:

{x : x = x0 + λd, λ ≥ 0} ⊆ C (1.54)

■ Example 1.37 Consider the unbounded convex set shown in Figure 1.10. This set
has direction [1, 0]T .


Figure 1.10: Convex Direction: Clearly every point in the convex set (shown in
blue) can be the vertex for a ray with direction [1, 0]T contained entirely in the
convex set. Thus [1, 0]T is a direction of this convex set.

To see this note that for any positive scaling parameter λ and for any vertex
point x0 , we can draw an arrow pointing to the right (in the direction of [1, 0]T )
with vertex at x0 scaled by λ that is entirely contained in the convex set. ■

Exercise 1.4 Prove the following: Let C ⊆ Rn be a convex cone and let x1 , x2 ∈ C.
If α, β ∈ R and α, β ≥ 0, then αx1 + βx2 ∈ C. [Hint: Use the definition of convex
cone and the definition of convexity with λ = 1/2, then multiply by 2.] ■

Exercise 1.5 Use Exercise 1.4 to prove that if C ⊆ Rn is a convex cone, then every
element x ∈ C (except the origin) is also a direction of C. ■

Directions of Polyhedral Sets


There is a unique relationship between the defining matrix A of a polyhedral
set P and a direction of this set that is particularly useful when we assume that
P is located in the positive orthant of Rn (i.e., x ≥ 0 are defining constraints of
P ).
Theorem 1.9 Suppose that P ⊆ Rn is a polyhedral set defined by:

P = {x ∈ Rn : Ax ≤ b, x ≥ 0} (1.55)

If d is a direction of P , then the following hold:

Ad ≤ 0, d ≥ 0, d ̸= 0. (1.56)

Proof. The fact that d ̸= 0 is clear from the definition of direction of a convex set.
Furthermore, d is a direction if and only if
A (x + λd) ≤ b (1.57)
x + λd ≥ 0 (1.58)
for all λ > 0 and for all x ∈ P (which is to say x ∈ Rn such that Ax ≤ b and x ≥ 0).
But then
Ax + λAd ≤ b


for all λ > 0. This can only be true if Ad ≤ 0. Likewise, x + λd ≥ 0 holds for all
λ > 0 if and only if d ≥ 0. This completes the proof. ■

Corollary 1.1 If

P = {x ∈ Rn : Ax = b, x ≥ 0} (1.59)

and d is a direction of P , then d must satisfy:

Ad = 0, d ≥ 0, d ̸= 0. (1.60)

Exercise 1.6 Prove the corollary above. ■

■ Example 1.38 Consider the polyhedral set defined by the equations:

x1 − x2 ≤ 1
2x1 + x2 ≥ 6
x1 ≥ 0
x2 ≥ 0

This set is clearly unbounded as we showed in class and it has at least one direction.
The direction d = [0, 1]T pointing directly up is a direction of this set. This is
illustrated in Figure 1.11.

Figure 1.11: An Unbounded Polyhedral Set: This unbounded polyhedral set has
many directions. One direction is [0, 1]T .

In this example, we have:

A = [  1  −1 ]
    [ −2  −1 ]   (1.61)


Note, the second inequality constraint was a greater-than constraint. We reversed


it to a less-than inequality constraint −2x1 − x2 ≤ −6 by multiplying by −1. For
our chosen direction d = [0, 1]T , we can see that:

Ad = [  1  −1 ] [ 0 ]  =  [ −1 ]  ≤ 0   (1.62)
     [ −2  −1 ] [ 1 ]     [ −1 ]

Clearly d ≥ 0 and d ̸= 0. ■
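The check in Equation 1.62 is easy to reproduce in code; the snippet below recomputes Ad for this example's A and d.

```python
# Verify the direction conditions Ad <= 0, d >= 0, d != 0 for the
# direction d = [0, 1]^T of Example 1.38.

A = [[1.0, -1.0], [-2.0, -1.0]]
d = [0.0, 1.0]

Ad = [sum(a * di for a, di in zip(row, d)) for row in A]
print(Ad)  # [-1.0, -1.0]
print(all(v <= 0 for v in Ad)
      and all(di >= 0 for di in d)
      and any(di != 0 for di in d))  # True
```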

1.3.1 Extreme Points

Definition 1.30 — Extreme Point of a Convex Set. Let C be a convex set. A
point x0 ∈ C is an extreme point of C if there are no points x1 , x2 ∈ C (x1 ̸= x0
or x2 ̸= x0 ) so that x0 = λx1 + (1 − λ)x2 for some λ ∈ (0, 1).

An extreme point is simply a point in a convex set C that cannot be expressed


as a strict convex combination of any other pair of points in C. We will see that
extreme points must be located in specific locations in convex sets.

Definition 1.31 — Boundary of a set. Let C ⊆ Rn be a (convex) set. A point
x0 ∈ C is on the boundary of C if for all ϵ > 0,

Bϵ (x0 ) ∩ C ̸= ∅ and
Bϵ (x0 ) ∩ (Rn \ C) ̸= ∅

■ Example 1.39 A convex set, its boundary and a boundary point are illustrated
in Figure 1.12.

[Figure: a convex set with its INTERIOR, BOUNDARY, and a BOUNDARY POINT labeled.]

Figure 1.12: Boundary Point: A boundary point of a (convex) set C is a point in


the set so that for every ball of any radius centered at the point contains some
points inside C and some points outside C.

Lemma 1.4. Suppose C is a convex set. If x is an extreme point of C, then x is on


the boundary of C.


Proof. Suppose not; then x is not on the boundary, and thus there is some ϵ > 0
so that Bϵ (x) ⊂ C. Since Bϵ (x) is a hypersphere, we can choose two points x1
and x2 on the boundary of Bϵ (x) so that the line segment between these points
passes through the center of Bϵ (x). But this center point is x. Therefore x is the
mid-point of x1 and x2 , and since x1 , x2 ∈ C and λx1 + (1 − λ)x2 = x with λ = 1/2,
it follows that x cannot be an extreme point, since it is a strict convex combination
of x1 and x2 . This completes the proof. ■
Most important in our discussion of linear programming will be the extreme
points of polyhedral sets that appear in linear programming problems. The following
theorem establishes the relationship between extreme points in a polyhedral set and
the intersection of hyperplanes in such a set.
Theorem 1.10 Let P ⊆ Rn be a polyhedral set and suppose P is defined as:

P = {x ∈ Rn : Ax ≤ b} (1.63)

where A ∈ Rm×n and b ∈ Rm . A point x0 ∈ P is an extreme point of P if and


only if x0 is the intersection of n linearly independent hyperplanes from the set
defining P .

R The easiest way to see this as relevant to linear programming is to assume


that

P = {x ∈ Rn : Ax ≤ b, x ≥ 0} (1.64)

In this case, we could have m < n. In that case, P is composed of the


intersection of n + m half-spaces. The first m are for the rows of A and the
second n are for the non-negativity constraints. An extreme point comes from
the intersection of n of the hyperplanes defining these half-spaces. We might
have m come from the constraints Ax ≤ b and the other n − m from x ≥ 0.

Proof. (⇐) Suppose that x0 is the intersection of n linearly independent hyperplanes
from the set defining P . By way of contradiction, suppose that x0 is not an extreme
point. Then there are two points x, x̂ ∈ P and a scalar λ ∈ (0, 1) so that

x0 = λx + (1 − λ)x̂

Since x0 lies on n of the defining hyperplanes, there is a matrix G ∈ Rn×n whose
rows are drawn from A and a vector g whose entries are drawn from the vector b,
so that Gx0 = g. But then we have:

g = Gx0 = λGx + (1 − λ)Gx̂   (1.65)

and Gx ≤ g and Gx̂ ≤ g (since x, x̂ ∈ P ). But the only way for Equation 1.65 to
hold is if
1. Gx = g and
2. Gx̂ = g
The fact that the hyper-planes defining x0 are linearly independent implies that the
solution to Gx0 = g is unique. (That is, we have chosen n equations in n unknowns
and x0 is the solution to these n equations.) Therefore, it follows that x0 = x = x̂
and thus x0 is an extreme point since it cannot be expressed as a convex combination
of other points in P .


(⇒) By Lemma 1.4, we know that any extreme point x0 lies on the boundary of
P and therefore there is at least one row Ai· such that Ai· x0 = bi (otherwise, clearly
x0 does not lie on the boundary of P ). By way of contradiction, suppose that x0
is the intersection of r < n linearly independent hyperplanes (that is, only these r
constraints are binding). Then there is a matrix G ∈ Rr×n whose rows are drawn
from A and a vector g whose entries are drawn from the vector b, so that Gx0 = g.
Linear independence of the hyperplanes implies that the rows of G are linearly
independent and therefore there is a non-zero solution to the equation Gd = 0. To
see this, apply Expression 5.51 and choose solution in which d is non-zero. Then we
can find an ϵ > 0 such that:
1. If x = x0 + ϵd, then Gx = g and all non-binding constraints at x0 remain
non-binding at x.
2. If x̂ = x0 − ϵd, then Gx̂ = g and all non-binding constraints at x0 remain
non-binding at x̂.
These two facts hold since Gd = 0 and if Ai· is a row of A with Ai· x0 < bi (or
x0 > 0), then there is at least one non-zero ϵ so that Ai· (x0 ± ϵd) < bi (or x0 ± ϵd > 0)
still holds and therefore (x0 ± ϵd) ∈ P . Since we have a finite number of constraints
that are non-binding, we may choose ϵ to be the smallest value so that the previous
statements hold for all of them. Finally we can choose λ = 1/2 and see that
x0 = λx + (1 − λ)x̂ and x, x̂ ∈ P . Thus x0 cannot have been an extreme point,
contradicting our assumption. This completes the proof. ■

Definition 1.32 Let P be the polyhedral set from Theorem 1.10. If x0 is an


extreme point of P and more than n hyperplanes are binding at x0 , then x0 is
called a degenerate extreme point.

Definition 1.33 — Face. Let P be a polyhedral set defined by

P = {x ∈ Rn : Ax ≤ b}

where A ∈ Rm×n and b ∈ Rm . If X ⊆ P is defined by a non-empty set of binding


linearly independent hyperplanes, then X is a face of P .
That is, there is some set of linearly independent rows Ai1 · , . . . Ail · with
il < m so that when G is the matrix made of these rows and g is the vector of
bi1 , . . . , bil then:

X = {x ∈ Rn : Gx = g and Ax ≤ b} (1.66)

In this case we say that X has dimension n − l.

R Based on this definition, we can easily see that an extreme point, which is the
intersection of n linearly independent hyperplanes, is a face of dimension zero.


Definition 1.34 — Edge and Adjacent Extreme Point. An edge of a polyhedral


set P is any face of dimension 1. Two extreme points are called adjacent if they
share n − 1 binding constraints. That is, they are connected by an edge of P .

■ Example 1.40 Consider the polyhedral set defined by the system of inequalities:

3x1 + x2 ≤ 120
x1 + 2x2 ≤ 160
(28/16)x1 + x2 ≤ 100
x1 ≤ 35
x1 ≥ 0
x2 ≥ 0

The polyhedral set is shown in Figure 1.13.

Figure 1.13: A Polyhedral Set: This polyhedral set is defined by five half-spaces
and has a single degenerate extreme point located at the intersection of the binding
constraints 3x1 + x2 ≤ 120, x1 + 2x2 ≤ 160 and (28/16)x1 + x2 ≤ 100. All faces are
shown in bold.

The extreme points of the polyhedral set are shown as large diamonds and
correspond to intersections of binding constraints. Note the extreme point (16, 72) is
degenerate since it occurs at the intersection of three binding constraints 3x1 + x2 ≤
120, x1 + 2x2 ≤ 160 and (28/16)x1 + x2 ≤ 100. All the faces of the polyhedral set are
shown in bold. They are locations where one constraint (or half-space) is binding.
An example of a pair of adjacent extreme points is (16, 72) and (35, 15), as they
are connected by the edge defined by the binding constraint 3x1 + x2 ≤ 120. ■
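Theorem 1.10 suggests a brute-force way to enumerate extreme points in R²: intersect every pair of constraint lines and keep the feasible intersections. The sketch below applies this to the system of Example 1.40 (a naive enumeration for illustration, not a production method); the degenerate point (16, 72) appears once even though three constraints bind there.

```python
# Enumerate extreme points of a 2D polyhedron by intersecting pairs of
# constraint lines and keeping the feasible intersection points.
from itertools import combinations

# Rows a1*x1 + a2*x2 <= b, with x1 >= 0, x2 >= 0 written as -x <= 0.
A = [[3.0, 1.0], [1.0, 2.0], [28.0 / 16.0, 1.0], [1.0, 0.0],
     [-1.0, 0.0], [0.0, -1.0]]
b = [120.0, 160.0, 100.0, 35.0, 0.0, 0.0]

def feasible(x, tol=1e-7):
    return all(r[0] * x[0] + r[1] * x[1] <= bi + tol for r, bi in zip(A, b))

pts = set()
for (r1, b1), (r2, b2) in combinations(zip(A, b), 2):
    det = r1[0] * r2[1] - r1[1] * r2[0]
    if abs(det) < 1e-12:
        continue  # parallel lines: no unique intersection
    x1 = (b1 * r2[1] - b2 * r1[1]) / det   # Cramer's rule
    x2 = (r1[0] * b2 - r2[0] * b1) / det
    if feasible((x1, x2)):
        pts.add((round(x1, 6), round(x2, 6)))
print(sorted(pts))
```

Running this recovers the five extreme points discussed in the example, including the degenerate one at (16, 72).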


Exercise 1.7 Consider the polyhedral set defined by the system of inequalities:

4x1 + x2 ≤ 120
x1 + 8x2 ≤ 160
x1 + x2 ≤ 30
x1 ≥ 0
x2 ≥ 0

Identify all extreme points and edges in this polyhedral set and their binding
constraints. Are any extreme points degenerate? List all pairs of adjacent extreme
points. ■

Extreme Directions

Definition 1.35 — Extreme Direction. Let C ⊆ Rn be a convex set. Then a
direction d of C is an extreme direction if there are no two other directions d1
and d2 of C (d1 ̸= d and d2 ̸= d) and scalars λ1 , λ2 > 0 so that d = λ1 d1 + λ2 d2 .

We have already seen by Theorem 1.9 that if P is a polyhedral set in the positive
orthant of Rn with form:

P = {x ∈ Rn : Ax ≤ b, x ≥ 0}

then a direction d of P is characterized by the set of inequalities and equations

Ad ≤ 0, d ≥ 0, d ̸= 0.

Clearly two directions d1 and d2 with d1 = λd2 for some λ > 0 may both satisfy this
system. To isolate a unique set of directions, we can normalize and construct the set:

D = {d ∈ Rn : Ad ≤ 0, d ≥ 0, eT d = 1}   (1.67)

Here we are interested only in directions satisfying eT d = 1. This is a normalizing
constraint that will choose only vectors whose components sum to 1.
Theorem 1.11 A direction d ∈ D is an extreme direction of P if and only if d is
an extreme point of D when D is taken as a polyhedral set.

Proof. (⇒) Suppose that d is an extreme point of D (as a polyhedral set) and not
an extreme direction of P . Then there exist two directions d1 and d2 of P and
two constants λ1 and λ2 with λ1 , λ2 > 0 so that d = λ1 d1 + λ2 d2 . Without loss of
generality, we may assume that d1 and d2 are vectors satisfying eT di = 1 (i = 1, 2).
If not, then we can scale them so their components sum to 1 and adjust λ1 and λ2
accordingly. But this implies that:

1 = eT d = λ1 eT d1 + λ2 eT d2 = λ1 + λ2

Further, the fact that d1 and d2 are directions of P implies they must be in D. Thus
we have found a convex combination of elements of D that equals d, contradicting
our assumption that d was an extreme point.


(⇐) Conversely, suppose that d is an extreme direction of P whose components
sum to 1 and, by way of contradiction, that d is not an extreme point of D. (Again,
we could scale d if needed.) Since d is an extreme direction, it cannot be recovered
from a positive combination of two other directions d1 and d2 . But if d is not an
extreme point of D, then λ1 d1 + λ2 d2 = d for some d1 , d2 ∈ D with λ1 + λ2 = 1 and
λ1 , λ2 ∈ (0, 1). This is clearly contradictory: every strict convex combination is a
positive combination, and therefore our assumption that d was not an extreme point
of D was false. ■

■ Example 1.41 Let’s consider Example 1.38 again. The polyhedral set in this
example was defined by the A matrix:

A = [  1  −1 ]
    [ −2  −1 ]

and the b vector:

b = [  1 ]
    [ −6 ]

If we assume that P = {x ∈ Rn : Ax ≤ b, x ≥ 0}, then the set of extreme directions
of P is the same as the set of extreme points of the set

D = {d ∈ Rn : Ad ≤ 0, d ≥ 0, eT d = 1}

Then we have the set of directions d = [d1 , d2 ]T so that:

d1 − d2 ≤ 0
−2d1 − d2 ≤ 0
d1 + d2 = 1
d1 ≥ 0
d2 ≥ 0

The feasible region (which is really only part of the line d1 + d2 = 1) is shown in
red in Figure 1.14.


Figure 1.14: Visualization of the set D: This set really consists of the set of points
on the red line. This is the line where d1 + d2 = 1 and all other constraints hold.
This line has two extreme points (0, 1) and (1/2, 1/2).

The critical part of this figure is the red line. It is the true set D. As a line
segment, it has two extreme points: (0, 1) and (1/2, 1/2). Note that the extreme
point (0, 1) is the direction [0, 1]T we illustrated in Example 1.38. ■
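The extreme points of D in this example can be confirmed with a crude scan along the line d1 + d2 = 1 (a brute-force check on an invented grid, not a general algorithm): the feasible segment should run from (0, 1) to (1/2, 1/2).

```python
# Scan points on the line d1 + d2 = 1 and keep those in
# D = {d : d1 - d2 <= 0, -2*d1 - d2 <= 0, d >= 0}.

def in_D(d1, d2, tol=1e-9):
    return (d1 - d2 <= tol and -2 * d1 - d2 <= tol
            and d1 >= -tol and d2 >= -tol)

samples = [(t / 1000.0, 1 - t / 1000.0) for t in range(0, 1001)]
feas = [p for p in samples if in_D(*p)]
print(feas[0], feas[-1])  # (0.0, 1.0) (0.5, 0.5)
```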

Exercise 1.8 Show that d = [1/2, 1/2]T is a direction of the polyhedral set P from
Example 1.38. Now find a non-extreme direction (whose components sum to
1) using the feasible region illustrated in the previous example. Show that the
direction you found is a direction of the polyhedral set. Create a figure like Figure
1.11 to illustrate both these directions. ■

Caratheodory Characterization Theorem


Lemma 1.5. The polyhedral set defined by:
P = {x ∈ Rn : Ax ≤ b, x ≥ 0}
has a finite, non-zero number of extreme points (assuming that A is not an empty
matrix).
Proof. Let x ∈ P . If x is an extreme point, then the theorem is proved. Suppose
that x is not an extreme point. Then by Theorem 1.10, x lies at the intersection of
r < n binding constraints (where r could be zero). The fact that x is not an extreme
point of P implies the existence of y1 , y2 ∈ P and a λ > 0 so that x = λy1 + (1 − λ)y2 .
For this to hold, we know that the r constraints that bind at x must also be binding
at y1 and y2 .
Let d = y2 − y1 be the direction from y1 to y2 . We can see that:

    y1 = x − (1 − λ)d
    y2 = x + λd          (1.68)
The values x + γd and x − γd for γ > 0 correspond to motion from x along the
direction of d. From Expression 1.68, we can move in either the positive or negative
direction of d and remain in P . Let γ be the largest value so that both x + γd and
x − γd are in P . Clearly we cannot move in both directions infinitely far since
x ≥ 0 and hence γ < ∞. Without loss of generality, suppose that γ is determined by
x − γd. (That is, x − (γ + ϵ)d ̸∈ P for ϵ > 0). Let x1 = x − γd. Since r hyperplanes
are binding at x (and y1 and y2 ) it is clear that these same hyperplanes are binding
at x1 and at least one more (because of how we selected γ). Thus there are at least
r + 1 binding hyperplanes at x1 . If r + 1 = n, then we have identified an extreme
point. Otherwise, we may repeat the process until we find an extreme point.
To show that the number of extreme points is finite, we note that every extreme
point is the intersection of n linearly independent hyperplanes defining P . There are
n + m hyperplanes defining P and therefore the number of possible extreme points is
limited by the binomial coefficient (n + m choose n). This completes the proof. ■
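The counting bound at the end of this proof is easy to evaluate. A small sketch (Python; taking n = m = 2 to match the running two-variable, two-constraint example):

```python
from math import comb

# With n variables and m inequality rows, P is carved out by n + m
# hyperplanes, so at most C(n + m, n) intersections can be extreme points.
n, m = 2, 2
bound = comb(n + m, n)
print(bound)   # 6 candidate intersections for the running example
```

The actual number of extreme points is usually far smaller, since many of these intersections are infeasible or involve linearly dependent hyperplanes.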

Lemma 1.6. Let P be a non-empty polyhedral set. Then the set of directions of P
is empty if and only if P is bounded.
Proof. Clearly if P is bounded then it cannot have a direction. If P were contained
in a ball Br (x0 ) then we know that for every x ∈ P we have |x − x0 | < r. If d is a
direction of P , then x + λd ∈ P for all λ > 0. We simply need to choose λ large
enough so that |x + λd − x0 | > r, which is a contradiction.
If P has no directions, then there is some absolute upper bound on the value of
|x| for all x ∈ P . Let r be this value. Then trivially, Br+1 (0) contains P and so P is
bounded. ■

Lemma 1.7. Let P be a non-empty unbounded polyhedral set. Then the number of
extreme directions of P is finite and non-zero.
Proof. The result follows immediately from Theorem 1.11 and Lemma 1.6. ■

Theorem 1.12 Let P be a non-empty, unbounded polyhedral set defined by:

P = {x ∈ Rn : Ax ≤ b, x ≥ 0}

(where we assume A is not an empty matrix). Suppose that P has extreme points
x1 , . . . , xk and extreme directions d1 , . . . , dl . If x ∈ P , then there exists constants
λ1 , . . . , λk and µ1 , . . . , µl such that:

    x = Σ_{i=1}^{k} λi xi + Σ_{j=1}^{l} µj dj
    Σ_{i=1}^{k} λi = 1          (1.69)
    λi ≥ 0,  i = 1, . . . , k
    µj ≥ 0,  j = 1, . . . , l

Proof. Let x ∈ P . We will show that we can identify λ1 , . . . , λk and µ1 , . . . , µl making
Expression 1.69 true. Define the set:

    P̄ = P ∩ {x ∈ Rn : eT x ≤ M }          (1.70)

where M is a large constant so that eT xi < M for i = 1, . . . , k and eT x < M . That
is, M is large enough so that the sum of the components of any extreme point is less
than M and the sum of the components of x is less than M .
It is clear that P̄ is bounded. In fact, if P is bounded, then P̄ = P . Furthermore,
P̄ is a polyhedral set contained in P and therefore the extreme points of P are also
extreme points of P̄ . Define

    Ē = {x1 , . . . , xk , . . . , xk+u }

as the extreme points of P̄ . By Lemma 1.5 we know that 0 ≤ u < ∞. If x ∈ Ē,
then x can be written as a convex combination of the elements of Ē. Therefore,
assume that x ̸∈ Ē. Now, suppose that the system of equations Gy = g represents
the binding hyperplanes (constraints) of P̄ that are active at x. Clearly rank(G) < n
(otherwise, x would be an element of Ē).
Let d ̸= 0 be a solution to Gd = 0 and compute γ1 = max{γ : x + γd ∈ P̄ }. Since
P̄ is bounded and x is not in Ē, we have 0 < γ1 < ∞. Let y = x + γ1 d. Just as in
the proof of Lemma 1.5, at y we now have (at least one) additional linearly
independent binding hyperplane of P̄ . If there are now n binding hyperplanes, then
y is an extreme point of P̄ . Otherwise, we may repeat this process until we identify
such an extreme point. Let y1 be this extreme point. Clearly Gy1 = g. Now define:

    γ2 = max{γ : x + γ(x − y1 ) ∈ P̄ }

The vector x − y1 is the direction from y1 to x, and γ2 may be thought of as the
size of a step that one can take from x away from y1 along the line from y1 to x. Let

    y2 = x + γ2 (x − y1 )          (1.71)

Again γ2 < ∞ since P̄ is bounded, and further γ2 > 0 since:

    G(x + γ(x − y1 )) = g          (1.72)

for all γ ≥ 0 (as Gx = Gy1 = g). As we would expect, Gy2 = g and there is at least one
additional hyperplane binding at y2 (as we saw in the proof of Lemma 1.5). Trivially,
x is a convex combination of y1 and y2 . Specifically,

    x = δy1 + (1 − δ)y2

with δ = γ2 /(1 + γ2 ) ∈ (0, 1). This follows from Equation 1.71 by solving for x.
Now if y2 ∈ Ē, then we have written x as a convex combination of extreme points
of P̄ . Otherwise, we can repeat the process we used to find y2 , writing y2 in terms of
y3 and y4 , at which an additional hyperplane (constraint) is binding. We may repeat
this process until we identify elements of Ē. Reversing this process, we can ultimately
write x as a convex combination of these extreme points. Thus we have shown that
we can express x as a convex combination of the elements of Ē.
Based on our deduction, we may write:

    x = Σ_{i=1}^{k+u} δi xi
    1 = Σ_{i=1}^{k+u} δi          (1.73)
    δi ≥ 0,  i = 1, . . . , k + u


If δi = 0 for all i > k, then we have succeeded in expressing x as a convex combination of
the extreme points of P . Suppose not. Consider xv with v > k, so that xv is an extreme
point of the bounded set P̄ from Expression 1.70 but not of P . Then it follows that
eT xv = M must hold (otherwise, xv would be an extreme point of P ). Since there are
n binding hyperplanes at xv , there must be n − 1 hyperplanes defining P binding at xv .
Thus, there is an edge connecting xv and some extreme point of P (i.e., there is an
extreme point of P that shares n − 1 binding hyperplanes with xv ). Denote this extreme
point as xi(v) ; it is adjacent to xv . Consider the direction xv − xi(v) . This must be a
recession direction of P since no other hyperplane binds in this direction before
eT x = M . Define:

    θv = eT (xv − xi(v) )

and let

    d = (xv − xi(v) )/θv

so that eT d = 1, as we have normalized the direction elements, and therefore d ∈ D.
Again, let Gy = g be the system of n − 1 binding linear hyperplanes shared by xv
and xi(v) . Then trivially:

    G(xv − xi(v) ) = Gd = 0

and therefore there are n − 1 linearly independent binding hyperplanes in the system
Ad ≤ 0, d ≥ 0. At last we see that, with eT d = 1, d must be an extreme point of
D and therefore an extreme direction of P . Let dj(v) = d be this extreme direction.
Thus we have:

xv = xi(v) + θv dj(v)

At last we can see that by substituting this into Expression 1.73 for each such v
and arbitrarily letting i(v) = j(v) = 1 if δv = 0 (in which case it doesn’t matter), we
obtain:
    x = Σ_{j=1}^{k} δj xj + Σ_{v=k+1}^{k+u} δv xi(v) + Σ_{v=k+1}^{k+u} δv θv dj(v)          (1.74)

We have now expressed x as desired. This completes the proof. ■

■ Example 1.42 The Carathéodory Characterization Theorem is illustrated for a
bounded and an unbounded polyhedral set in Figure 1.15.

[Figure: in the bounded set, x = µx5 + (1 − µ)(λx2 + (1 − λ)x3 ); in the unbounded set, x = λx2 + (1 − λ)x3 + θd1 .]

Figure 1.15: The Carathéodory Characterization Theorem: Extreme points and
extreme directions are used to express points in a bounded and an unbounded set.

This example illustrates simply how one could construct an expression for an
arbitrary point x inside a polyhedral set in terms of extreme points and extreme
directions. ■
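The theorem can also be illustrated numerically on the running example. In the sketch below (Python with NumPy), the extreme points and normalized extreme directions were computed by hand for this A and b; they are assumptions of this sketch rather than values quoted from the text:

```python
import numpy as np

# Polyhedron from the running example: Ax <= b, x >= 0.
A = np.array([[1.0, -1.0], [-2.0, -1.0]])
b = np.array([1.0, -6.0])

# Extreme points and normalized extreme directions, worked out by hand
# for this example (assumptions of this sketch).
x1, x2 = np.array([7/3, 4/3]), np.array([0.0, 6.0])
d1, d2 = np.array([0.0, 1.0]), np.array([0.5, 0.5])

# Express x = (4, 4) as one extreme point plus a non-negative
# combination of the extreme directions, as in Theorem 1.12.
x = 1.0*x1 + 0.0*x2 + 1.0*d1 + (10/3)*d2

print(np.allclose(x, [4.0, 4.0]))                          # True
print(bool(np.all(A @ x <= b + 1e-9) and np.all(x >= 0)))  # True: x is in P
```

Here the convex weights are λ = (1, 0) and the direction weights are µ = (1, 10/3), matching the form of Expression 1.69.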

Examples of Convex Sets

1. Hyperplane H = {x ∈ Rn : a⊤ x = b}
2. Halfspace H = {x ∈ Rn : a⊤ x ≤ b}
3. Polyhedron P = {x ∈ Rn : Ax ≤ b}
4. Ball S = {x ∈ Rn : Σ_{i=1}^{n} xi^2 ≤ 1}
5. Second-Order Cone S = {(x, t) ∈ Rn × R : Σ_{i=1}^{n} xi^2 ≤ t2 }

■ Example 1.43 — Proof that a Polyhedron is Convex. Let P be a
polyhedron defined by:

P = {x ∈ Rn : Ax ≤ b}

where A is an m × n matrix and b is an m × 1 vector.

To prove the convexity of P , we’ll use the definition of convexity: For any
x, y ∈ P and any λ ∈ [0, 1], the point z = λx + (1 − λ)y must also belong to P .
Let x, y ∈ P . Then, by the definition of P , we have:

Ax ≤ b and Ay ≤ b

We want to demonstrate the inequality Az ≤ b. We do this in just a few lines:

    Az = A(λx + (1 − λ)y)      (substituting the expression for z)
       = λAx + (1 − λ)Ay      (expanding the matrix-vector product)
       ≤ λb + (1 − λ)b        (explanation below)
       = b

The inequality above uses the fact that Ax ≤ b and Ay ≤ b, and since λ is in the
interval [0,1], it follows that:

λAx ≤ λb and (1 − λ)Ay ≤ (1 − λ)b

The computation shows that z = λx + (1 − λ)y satisfies Az ≤ b and therefore
z ∈ P . This shows that the polyhedron P is convex. ■
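The convexity argument above can be spot-checked numerically. A minimal sketch (Python with NumPy; the particular A, b, x, and y are arbitrary illustrative choices, not from the text):

```python
import numpy as np

# An illustrative polyhedron P = {x : Ax <= b} (values chosen arbitrarily).
A = np.array([[1.0, 2.0], [-1.0, 1.0], [0.0, -1.0]])
b = np.array([4.0, 2.0, 0.0])

def feasible(p, tol=1e-9):
    return bool(np.all(A @ p <= b + tol))

x, y = np.array([0.0, 0.0]), np.array([2.0, 1.0])
assert feasible(x) and feasible(y)

# Sample lambda over [0, 1]: every convex combination stays in P.
ok = all(feasible(lam*x + (1 - lam)*y) for lam in np.linspace(0.0, 1.0, 11))
print(ok)   # True
```

Sampling does not prove convexity, of course; the algebraic argument above does, and the sample merely illustrates it.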

Lemma 1.8 — Intersection of Convex Sets is Convex. Let C1 and C2 be convex sets.
Then the intersection C1 ∩ C2 is convex. Here,

    C1 ∩ C2 := {x : x ∈ C1 and x ∈ C2 }.


Figure 1.16: The green intersection of the convex sets that are the ball and the
polytope is also convex. This can be seen by considering any points x, y ∈ Ball ∩
Polytope. Since Ball is convex, the line segment between x and y is completely
contained in Ball. Similarly, the line segment is completely contained in Polytope.
Hence, the line segment is also contained in the intersection. This is how we can
reason that the intersection is also convex.
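The segment argument in the caption can be checked numerically. A minimal sketch (Python with NumPy; the ball radius, box bounds, and sample points are arbitrary choices):

```python
import numpy as np

# Two convex sets: a ball of radius 2 and a box (a simple polytope).
in_ball = lambda p: float(p @ p) <= 4.0 + 1e-9
in_box  = lambda p: bool(np.all(np.abs(p) <= 1.5 + 1e-9))
in_both = lambda p: in_ball(p) and in_box(p)

x, y = np.array([1.0, 1.0]), np.array([-1.0, 0.5])
assert in_both(x) and in_both(y)

# The segment between x and y never leaves the intersection.
segment_ok = all(in_both(t*x + (1 - t)*y) for t in np.linspace(0.0, 1.0, 21))
print(segment_ok)   # True
```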



2. Introduction to Linear Optimization

2.1 Overview of Linear Optimization


2.2 Historical Development
2.3 Importance in Various Fields
Optimization is an exciting sub-discipline within applied mathematics! Optimization
is all about making things better; this could mean helping a company make better
decisions to maximize profit; helping a factory make products with less environmental
impact; or helping a zoologist improve the diet of an animal. When we talk about
optimization, we often use terms like better or improvement. It’s important to
remember that words like better can mean more of something (as in the case of
profit) or less of something as in the case of waste. As we study linear programming,
we’ll quantify these terms in a mathematically precise way. For the time being, let’s
agree that when we optimize something we are trying to make some decisions that
will make it better.
Linear Programming is a sub-field of optimization theory, which is itself a sub-field
of Applied Mathematics. Applied Mathematics is a very general area of study that
could arguably encompass half of the engineering disciplines–if you feel like getting
into an argument with an engineer. Put simply, applied mathematics is all about
applying mathematical techniques to understand or do something practical.

■ Example 2.1 Let’s recall a simple optimization problem from differential calculus
(Math 140): Goats are an environmentally friendly and inexpensive way to control
a lawn when there are lots of rocks or lots of hills. (Seriously, both Google and
some U.S. Navy bases use goats on rocky hills instead of paying lawn mowers!)
Suppose I wish to build a pen to keep some goats. I have 100 meters of fencing
and I wish to build the pen in a rectangle with the largest possible area. How long
should the sides of the rectangle be? In this case, making the pen better means
making it have the largest possible area.


The problem is illustrated in Figure 2.1.



Figure 2.1: Goat pen with unknown side lengths. The objective is to identify the
values of x and y that maximize the area of the pen (and thus the number of goats
that can be kept).

Clearly, we know that:

2x + 2y = 100 (2.1)

because 2x + 2y is the perimeter of the pen and I have 100 meters of fencing to
build my pen. The area of the pen is A(x, y) = xy. We can use Equation 2.1 to
solve for x in terms of y. Thus we have:

y = 50 − x (2.2)

and A(x) = x(50 − x). To maximize A(x), recall we take the first derivative of A(x)
with respect to x, set this derivative to zero and solve for x:
    dA/dx = 50 − 2x = 0          (2.3)
Thus, x = 25 and y = 50 − x = 25. We further recall from basic calculus how to
confirm that this is a maximum; note:

    d2 A/dx2 |x=25 = −2 < 0          (2.4)

Which implies that x = 25 is a local maximum for this function. Another way of
seeing this is to note that A(x) = 50x − x2 is an “upside-down” parabola. As we
could have guessed, a square will maximize the area available for holding goats. ■
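The calculus result can be double-checked by brute force. A small sketch (Python with NumPy) scans the area function over a fine grid:

```python
import numpy as np

# Area of the pen as a function of one side, with the perimeter fixed at 100:
# A(x) = x * (50 - x), after substituting y = 50 - x.
x = np.linspace(0.0, 50.0, 100001)
area = x * (50.0 - x)

best_x = x[np.argmax(area)]
print(best_x)        # 25.0: the square pen
print(area.max())    # 625.0 square meters
```

Grid search is a crude tool compared to setting the derivative to zero, but it is a handy sanity check for one-variable problems like this one.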

Exercise 2.1 A canning company is producing canned corn for the holidays. They
have determined that each family prefers to purchase their corn in units of 12
fluid ounces. Assuming that metal costs 1 cent per square inch and 1 fluid ounce
is about 1.8 cubic inches, compute the ideal height and radius for a can of corn
assuming that cost is to be minimized. [Hint: Suppose that our can has radius
r and height h. The formula for the surface area of a can is 2πrh + 2πr2 . Since
metal is priced by the square inch, the cost is a function of the surface area. The
volume of the can is πr2 h and is constrained. Use the same trick we did in the
example to find the values of r and h that minimize cost.] ■

2.4 A General Maximization Formulation


Let’s take a more general look at the goat pen example. The area function is a
mapping from R2 to R, written A : R2 → R. The domain of A is the two dimensional
space R2 and its range is R.

Our objective in Example 2.1 is to maximize the function A by choosing values


for x and y. In optimization theory, the function we are trying to maximize (or
minimize) is called the objective function. In general, an objective function is a
mapping z : D ⊆ Rn → R. Here D is the domain of the function z.

Definition 2.1 Let z : D ⊆ Rn → R. The point x∗ is a global maximum for z if


for all x ∈ D, z(x∗ ) ≥ z(x). A point x∗ ∈ D is a local maximum for z if there is
a neighborhood S ⊆ D of x∗ (i.e., x∗ ∈ S) so that for all x ∈ S, z(x∗ ) ≥ z(x).

R Clearly Definition 2.1 is valid only for domains and functions where the
concept of a neighborhood is defined and understood. In general, S must be
a topologically connected set (as it is in a neighborhood in Rn ) in order for
this definition to be used or at least we must be able to define the concept of
neighborhood on the set.

Exercise 2.2 Using analogous reasoning write a definition for a global and local
minimum. [Hint: Think about what a minimum means and find the correct
direction for the ≥ sign in the definition above.] ■

In Example 2.1, we are constrained in our choice of x and y by the fact that
2x + 2y = 100. This is called a constraint of the optimization problem. More
specifically, it’s called an equality constraint. If we did not need to use all the fencing,
then we could write the constraint as 2x + 2y ≤ 100, which is called an inequality
constraint. In complex optimization problems, we can have many constraints. The
set of all points in Rn for which the constraints are true is called the feasible set (or
feasible region). Our problem is to decide the best values of x and y to maximize
the area A(x, y). The variables x and y are called decision variables.

Let z : D ⊆ Rn → R; for i = 1, . . . , m, gi : D ⊆ Rn → R; and for j = 1, . . . , l


hj : D ⊆ Rn → R be functions. Then the general maximization problem with objective
function z(x1 , . . . , xn ) and inequality constraints gi (x1 , . . . , xn ) ≤ bi (i = 1, . . . , m) and


equality constraints hj (x1 , . . . , xn ) = rj (j = 1, . . . , l) is written as:

    max   z(x1 , . . . , xn )
    s.t.  g1 (x1 , . . . , xn ) ≤ b1
          ...
          gm (x1 , . . . , xn ) ≤ bm          (2.5)
          h1 (x1 , . . . , xn ) = r1
          ...
          hl (x1 , . . . , xn ) = rl

Expression 2.5 is also called a mathematical programming problem. Naturally when
constraints are involved we define the global and local maxima for the objective
function z(x1 , . . . , xn ) in terms of the feasible region instead of the entire domain of
z, since we are only concerned with values of x1 , . . . , xn that satisfy our constraints.

■ Example 2.2 — Continuation of Example 2.1. We can re-write the problem in
Example 2.1:

    max   A(x, y) = xy
    s.t.  2x + 2y = 100          (2.6)
          x ≥ 0
          y ≥ 0

Note we’ve added two inequality constraints x ≥ 0 and y ≥ 0 because it doesn’t
really make any sense to have negative lengths. We can re-write these constraints
as −x ≤ 0 and −y ≤ 0 where g1 (x, y) = −x and g2 (x, y) = −y to make Expression
2.6 look like Expression 2.5. ■

We have formulated the general maximization problem in Problem 2.5. Suppose that
we are interested in finding a value that minimizes an objective function z(x1 , . . . , xn )
subject to certain constraints. Then we can write Problem 2.5 replacing max with
min.
Exercise 2.3 Write the problem from Exercise 2.1 as a general minimization
problem. Add any appropriate non-negativity constraints. [Hint: You must
change max to min.] ■

An alternative way of dealing with minimization is to transform a minimization
problem into a maximization problem. If we want to minimize z(x1 , . . . , xn ), we can
maximize −z(x1 , . . . , xn ). In maximizing the negation of the objective function, we
are actually finding a value that minimizes z(x1 , . . . , xn ).
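This transformation is exactly how numeric optimizers handle maximization. A hedged sketch using SciPy (assuming SciPy is available; with an equality constraint present, `minimize` defaults to the SLSQP solver) recovers the goat-pen optimum by minimizing −A:

```python
from scipy.optimize import minimize

# Maximize A(x, y) = x*y subject to 2x + 2y = 100 and x, y >= 0
# by minimizing the negated objective -A(x, y).
res = minimize(
    fun=lambda v: -(v[0] * v[1]),
    x0=[10.0, 10.0],
    bounds=[(0.0, None), (0.0, None)],
    constraints=[{"type": "eq", "fun": lambda v: 2*v[0] + 2*v[1] - 100.0}],
)
print(res.x)   # approximately [25, 25], matching the calculus answer
```

The minimizer of −A is the maximizer of A, which is precisely the statement of Exercise 2.4 below.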
Exercise 2.4 Prove the following statement: Consider Problem 2.5 with the objective
function z(x1 , . . . , xn ) replaced by −z(x1 , . . . , xn ). Then the solution to this new
problem minimizes z(x1 , . . . , xn ) subject to the constraints of Problem 2.5.[Hint:
Use the definition of global maximum and a multiplication by −1. Be careful with
the direction of the inequality when you multiply by −1.] ■


2.5 Some Geometry for Optimization


A critical part of optimization theory is understanding the geometry of Euclidean
space. To that end, we’re going to review some critical concepts from Vector Calculus
(Math 230/231). I’ll assume that you remember some basic definitions like partial
derivative and Euclidean space.
We’ll denote vectors in Rn in boldface. So x ∈ Rn is an n-dimensional vector and
we have x = (x1 , . . . , xn ).

Definition 2.2 — Dot Product. Recall that if x, y ∈ Rn are two n-dimensional
vectors, then the dot product (scalar product) is:

    x · y = Σ_{i=1}^{n} xi yi          (2.7)

where xi is the ith component of the vector x.

An alternative and useful definition for the dot product is given by the following
formula. Let θ be the angle between the vectors x and y. Then the dot product of x
and y may be alternatively written as:

x · y = ||x||||y|| cos θ (2.8)

This fact can be proved using the law of cosines from trigonometry. As a result, we
have the following small lemma:
Lemma 2.1. Let x, y ∈ Rn . Then the following hold:
1. The angle between x and y is less than π/2 (i.e., acute) iff x · y > 0.
2. The angle between x and y is exactly π/2 (i.e., the vectors are orthogonal) iff
x · y = 0.
3. The angle between x and y is greater than π/2 (i.e., obtuse) iff x · y < 0.
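Lemma 2.1 reduces angle classification to a sign test on the dot product, which is easy to express in code. A minimal sketch (Python with NumPy; `angle_type` is our own helper, not from the text):

```python
import numpy as np

def angle_type(x, y):
    """Classify the angle between x and y by the sign of x . y (Lemma 2.1)."""
    d = float(np.dot(x, y))
    return "acute" if d > 0 else ("obtuse" if d < 0 else "right")

print(angle_type([1, 0], [1, 1]))    # acute: dot product is 1 > 0
print(angle_type([1, 0], [0, 1]))    # right: dot product is 0
print(angle_type([1, 0], [-1, 1]))   # obtuse: dot product is -1 < 0
```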

Exercise 2.5 Use the value of the cosine function and the fact that x · y =
||x||||y|| cos θ to prove the lemma. [Hint: For what values of θ is cos θ > 0?] ■

Definition 2.3 — Graph. Let z : D ⊆ Rn → R be function, then the graph of z is


the set of n + 1 tuples:

{(x, z(x)) ∈ Rn+1 |x ∈ D} (2.9)

When z : D ⊆ R → R, the graph is precisely what you’d expect. It’s the set of pairs
(x, y) ∈ R2 so that y = z(x). This is the graph that you learned about back in Algebra
1.


Definition 2.4 — Level Set. Let z : Rn → R be a function and let c ∈ R. Then
the level set of value c for function z is the set:

{x = (x1 , . . . , xn ) ∈ Rn |z(x) = c} ⊆ Rn (2.10)

■ Example 2.3 Consider the function z = x2 + y 2 . The level set of z at 4 is the set
of points (x, y) ∈ R2 such that:

x2 + y 2 = 4 (2.11)

You will recognize this as the equation for a circle of radius 2. We illustrate this
in the following two figures. Figure 2.2 shows the level sets of z as they sit on the
3D plot of the function, while Figure 2.3 shows the level sets of z in R2 . The plot
in Figure 2.3 is called a contour plot.


Figure 2.2: Plot with Level Sets Projected on the Graph of z. The level sets
exist in R2 while the graph of z exists in R3 . The level sets have been projected
onto their appropriate heights on the graph.


Figure 2.3: Contour Plot of z = x2 + y 2 . The circles in R2 are the level sets of the
function. The lighter the circle hue, the higher the value of c that defines the level
set. ■
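The level set in this example can be checked by parametrizing the circle. A minimal sketch (Python with NumPy):

```python
import numpy as np

# Parametrize the level set x**2 + y**2 = 4: a circle of radius 2,
# via (x, y) = (2 cos t, 2 sin t).
t = np.linspace(0.0, 2*np.pi, 9)
pts = np.column_stack([2*np.cos(t), 2*np.sin(t)])

z = pts[:, 0]**2 + pts[:, 1]**2
print(np.allclose(z, 4.0))   # True: z is constant (= c = 4) on the level set
```

Constancy of z along the curve is exactly what makes it a level set.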


Definition 2.5 (Line) Let x0 , v ∈ Rn . Then the line defined by vectors x0 and
v is the function l(t) = x0 + tv. Clearly l : R → Rn . The vector v is called the
direction of the line.

■ Example 2.4 Let x0 = (2, 1) and let v = (2, 2). Then the line defined by x0 and v
is shown in Figure 2.4. The set of points on this line is the set L = {(x, y) ∈ R2 :
x = 2 + 2t, y = 1 + 2t, t ∈ R}.

Figure 2.4: A Line Function: The points in the graph shown in this figure are in
the set produced using the expression x0 + vt where x0 = (2, 1) and v = (2, 2). ■
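The line in Example 2.4 is easy to evaluate directly. A minimal sketch (Python with NumPy):

```python
import numpy as np

# The line l(t) = x0 + t*v from Example 2.4.
x0, v = np.array([2.0, 1.0]), np.array([2.0, 2.0])
l = lambda t: x0 + t*v

print(l(0.0))    # [2. 1.]: the base point
print(l(1.5))    # [5. 4.]: another point on the line
```

Every output point satisfies x = 2 + 2t and y = 1 + 2t, matching the set L in the example.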

Definition 2.6 — Directional Derivative. Let z : Rn → R and let v ∈ Rn be a


vector (direction) in n-dimensional space. Then the directional derivative of z
at point x0 ∈ Rn in the direction of v is

    (d/dt) z(x0 + tv) |t=0          (2.12)

when this derivative exists.

Proposition 2.1 The directional derivative of z at x0 in the direction v is equal to:

    lim_{h→0} [z(x0 + hv) − z(x0 )] / h          (2.13)

Exercise 2.6 Prove Proposition 2.1. [Hint: Use the definition of derivative for a
univariate function and apply it to the definition of directional derivative and
evaluate t = 0.] ■


Definition 2.7 — Gradient. Let z : Rn → R be a function and let x0 ∈ Rn . Then
the gradient of z at x0 is the vector in Rn given by:

    ∇z(x0 ) = ( ∂z/∂x1 (x0 ), . . . , ∂z/∂xn (x0 ) )          (2.14)

Gradients are extremely important concepts in optimization (and vector calculus
in general). Gradients have many useful properties that can be exploited.
The relationship between the directional derivative and the gradient is of critical
importance.
Theorem 2.1 If z : Rn → R is differentiable, then all directional derivatives exist.
Furthermore, the directional derivative of z at x0 in the direction of v is given by:

∇z(x0 ) · v (2.15)

where · denotes the dot product of two vectors.

Proof. Let l(t) = x0 + vt. Then l(t) = (l1 (t), . . . , ln (t)); that is, l(t) is a vector function
whose ith component is given by li (t) = x0i + vi t.
Apply the chain rule:

    dz(l(t))/dt = (∂z/∂l1 )(dl1 /dt) + · · · + (∂z/∂ln )(dln /dt)          (2.16)

Thus:

    (d/dt) z(l(t)) = ∇z · (dl/dt)          (2.17)

Clearly dl/dt = v. We have l(0) = x0 . Thus:

    (d/dt) z(x0 + tv) |t=0 = ∇z(x0 ) · v          (2.18)

This completes the proof. ■
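Theorem 2.1 can be verified numerically by comparing a finite-difference quotient against the gradient formula. A minimal sketch (Python with NumPy; the function z below is an illustrative choice, not one from the text):

```python
import numpy as np

# Check Theorem 2.1 numerically for z(x, y) = x**2 + 3*x*y
# (an illustrative function chosen for this sketch).
z = lambda p: p[0]**2 + 3.0*p[0]*p[1]
grad_z = lambda p: np.array([2.0*p[0] + 3.0*p[1], 3.0*p[0]])

x0, v = np.array([1.0, 2.0]), np.array([1.0, 1.0])
h = 1e-6
numeric = (z(x0 + h*v) - z(x0)) / h   # finite-difference directional derivative
analytic = float(grad_z(x0) @ v)      # gradient dotted with the direction

print(round(numeric, 3), analytic)    # both approximately 11.0
```

The two values agree up to the discretization error of the forward difference, which shrinks as h does.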

We now come to the two most important results about gradients, (i) the fact that
they always point in the direction of steepest ascent with respect to the level curves
of a function and (ii) that they are perpendicular (normal) to the level curves of a
function. We can exploit this fact as we seek to maximize (or minimize) functions.

Theorem 2.2 Let z : Rn → R be differentiable, x0 ∈ Rn . If ∇z(x0 ) ̸= 0, then ∇z(x0 )
points in the direction in which z is increasing fastest.

Proof. Recall that ∇z(x0 ) · v is the directional derivative of z in direction v at x0 .
Assume that v is a unit vector. We know that:

    ∇z(x0 ) · v = ||∇z(x0 )|| cos θ          (2.19)


(because we assumed v was a unit vector) where θ is the angle between the vectors
∇z(x0 ) and v. The function cos θ is largest when θ = 0, that is when v and ∇z(x0 )
are parallel vectors. (If ∇z(x0 ) = 0, then the directional derivative is zero in all
directions.) ■

Theorem 2.3 Let z : Rn → R be differentiable and let x0 lie in the level set S
defined by z(x) = k for fixed k ∈ R. Then ∇z(x0 ) is normal to the set S in the
sense that if v is a tangent vector at t = 0 of a path c(t) contained entirely in S
with c(0) = x0 , then ∇z(x0 ) · v = 0.

R Before giving the proof, we illustrate this theorem in Figure 2.5. The function
is z(x, y) = x4 + y 2 + 2xy and x0 = (1, 1). At this point ∇z(x0 ) = (6, 4). We
include the tangent line to the level set at the point (1,1) to illustrate the
normality of the gradient to the level curve at the point.

Figure 2.5: A Level Curve Plot with Gradient Vector: We’ve scaled the gradient
vector in this case to make the picture understandable. Note that the gradient
is perpendicular to the level set curve at the point (1, 1), where the gradient was
evaluated. You can also note that the gradient is pointing in the direction of steepest
ascent of z(x, y).

Proof. As stated, let c(t) be a curve in S. Then c : R → Rn and z(c(t)) = k for all
t ∈ R. Let v be the tangent vector to c at t = 0; that is:

    dc(t)/dt |t=0 = v          (2.20)

Differentiating z(c(t)) with respect to t using the chain rule and evaluating at t = 0
yields:

    (d/dt) z(c(t)) |t=0 = ∇z(c(0)) · v = ∇z(x0 ) · v = 0          (2.21)

Thus ∇z(x0 ) is perpendicular to v and thus normal to the set S as required. ■


R There’s a simpler proof of this theorem in the case of a mapping z : R2 → R.
For any such function z(x, y), we know that a level set is an implicitly defined
curve given by the expression

z(x, y) = k

where k ∈ R. We can compute the slope of any tangent line to this curve at
some point (x0 , y0 ) with implicit differentiation. We have:

    (d/dx) z(x, y) = (d/dx) k

yields:

    ∂z/∂x + (∂z/∂y)(dy/dx) = 0
Then the slope of the tangent line is given by:

    dy/dx = −(∂z/∂x) / (∂z/∂y)

By zx (x0 , y0 ) we mean ∂z/∂x evaluated at (x0 , y0 ) and by zy (x0 , y0 ) we mean
∂z/∂y evaluated at (x0 , y0 ). Then the slope of the tangent line to the curve
z(x, y) = k at (x0 , y0 ) is:

    m = −zx (x0 , y0 ) / zy (x0 , y0 )

An equation for the tangent line at this point is:

y − y0 = m(x − x0 ) (2.22)

We can compute a vector that is parallel to this line by taking two points on
the line, (x0 , y0 ) and (x1 , y1 ) and computing the vector (x1 − x0 , y1 − y0 ). We
know that:

y1 − y0 = m(x1 − x0 )

because any pair (x1 , y1 ) on the tangent line must satisfy Equation 2.22. Thus
we have the vector v = (x1 − x0 , m(x1 − x0 )) parallel to the tangent line. Now
we compute the dot product of this vector with the gradient of the function:

∇z(x0 , y0 ) = (zx (x0 , y0 ), zy (x0 , y0 ))

We obtain:

    ∇z(x0 , y0 ) · v = zx (x0 , y0 )(x1 − x0 ) + zy (x0 , y0 )(m(x1 − x0 ))
                    = zx (x0 , y0 )(x1 − x0 ) + zy (x0 , y0 ) (−zx (x0 , y0 )/zy (x0 , y0 )) (x1 − x0 )
                    = zx (x0 , y0 )(x1 − x0 ) − zx (x0 , y0 )(x1 − x0 ) = 0

Thus, ∇z(x0 , y0 ) is perpendicular to v, as we expected from Theorem 2.3.


■ Example 2.5 Let’s demonstrate the previous remark and Theorem 2.3. Consider
the function z(x, y) = x4 + y 2 + 2xy with a point (x0 , y0 ). Any level curve of the
function is given by: x4 + y 2 + 2xy = k. Taking the implicit derivative we obtain:
    (d/dx)(x4 + y 2 + 2xy) = (d/dx) k  =⇒  4x3 + 2y (dy/dx) + 2y + 2x (dy/dx) = 0

Note that to properly differentiate 2xy implicitly, we needed to use the product
rule from calculus. Now, we can solve for the slope of the tangent line to the curve
at point (x0 , y0 ) as:

    m = dy/dx = (−4x0^3 − 2y0 ) / (2y0 + 2x0 )

Our tangent line is then described the equation:

y − y0 = m(x − x0 )

Using the same reasoning we did in the remark, a vector parallel to this line is
given by (x1 − x0 , y1 − y0 ) where (x1 , y1 ) is another point on the tangent line. Then
we know that:

y1 − y0 = m(x1 − x0 )

and thus our vector is v = (x1 − x0 , m(x1 − x0 )). Now, the gradient of
z(x, y) at (x0 , y0 ) is:

    ∇z(x0 , y0 ) = (4x0^3 + 2y0 , 2y0 + 2x0 )

Lastly we compute:

    ∇z(x0 , y0 ) · v = (4x0^3 + 2y0 )(x1 − x0 ) + (2y0 + 2x0 )(m(x1 − x0 ))
                    = (4x0^3 + 2y0 )(x1 − x0 ) + (2y0 + 2x0 ) ((−4x0^3 − 2y0 )/(2y0 + 2x0 )) (x1 − x0 )
                    = (4x0^3 + 2y0 )(x1 − x0 ) + (−4x0^3 − 2y0 )(x1 − x0 ) = 0

Thus, for any point (x0 , y0 ) on a level curve of z(x, y) = x4 + y 2 + 2xy we know that
the gradient at that point is perpendicular to a tangent line (vector) to the curve
at the point (x0 , y0 ).
It is interesting to note that one can compute the slope of the tangent line (and
its equation) in Figure 2.5. Here (x0 , y0 ) = (1, 1), thus the slope of the tangent line
is:
    m = (−4x0^3 − 2y0 ) / (2y0 + 2x0 ) = −6/4 = −3/2


The equation for the line displayed in Figure 2.5 is:

    y − 1 = (−3/2)(x − 1)          ■
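The computation in Example 2.5 can be repeated numerically at (x0, y0) = (1, 1). A minimal sketch (Python with NumPy):

```python
import numpy as np

# Gradient of z(x, y) = x**4 + y**2 + 2*x*y from Example 2.5.
grad = lambda x, y: np.array([4*x**3 + 2*y, 2*y + 2*x])

x0, y0 = 1.0, 1.0
m = (-4*x0**3 - 2*y0) / (2*y0 + 2*x0)   # slope of the tangent line: -3/2
v = np.array([1.0, m])                   # a vector along the tangent line

print(grad(x0, y0))                # [6. 4.]
print(float(grad(x0, y0) @ v))     # 0.0: gradient is normal to the level curve
```

A zero dot product confirms the perpendicularity claimed by Theorem 2.3 at this point.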

Exercise 2.7 In this exercise you will use elementary calculus (and a little bit of
vector algebra) to show that the gradient of a simple function is perpendicular to
its level sets:
(a) Plot the level sets of z(x, y) = x2 + y 2 . Draw the gradient at the point
(x, y) = (2, 0). Convince yourself that it is normal to the level set x2 + y 2 = 4.
(b) Now, choose any level set x2 + y 2 = k. Use implicit differentiation to find
dy/dx. This is the slope of a tangent line to the circle x2 + y 2 = k. Let
(x0 , y0 ) be a point on this circle.
(c) Find an expression for a vector parallel to the tangent line at (x0 , y0 ) [Hint:
you can use the slope you just found.]
(d) Compute the gradient of z at (x0 , y0 ) and use it and the vector expression
you just computed to show that the two vectors are perpendicular. [Hint: use
the dot product.] ■

2.6 Gradients, Constraints and Optimization


Since we’re talking about optimization (i.e., minimizing or maximizing a certain
function subject to some constraints), it follows that we should be interested in
the gradient, which indicates the direction of greatest increase in a function. This
information will be used in maximizing a function. Logically, the negation of
the gradient will point in the direction of greatest decrease and can be used in
minimization. We’ll formalize these notions in the study of linear programming. We
make one more definition:

Definition 2.8 — Binding Constraint. Let g(x) ≤ b be a constraint in an opti-


mization problem. If at point x0 ∈ Rn we have g(x0 ) = b, then the constraint is
said to be binding. Clearly equality constraints h(x) = r are always binding.

■ Example 2.6 — Continuation of Example 2.1. Let’s look at the level curves of
the objective function and their relationship to the constraints at the point of
optimality (x, y) = (25, 25). In Figure 2.6 we see the level curves of the objective
function (the hyperbolas) and the feasible region shown as shaded. The elements in
the feasible regions are all values for x and y for which 2x + 2y ≤ 100 and x, y ≥ 0.
You’ll note that at the point of optimality the level curve xy = 625 is tangent to
the equation 2x + 2y = 100; i.e., the level curve of the objective function is tangent
to the binding constraint.


Figure 2.6: Level Curves and Feasible Region: At optimality the level curve of the
objective function is tangent to the binding constraints.

If you look at the gradient of A(x, y) at this point, it has value (25, 25). It points in the direction of increase for the function A(x, y), as should be expected. More importantly, look at the gradient of the constraint function 2x + 2y: its gradient is (2, 2), which is just a scaled version of the gradient of the objective function. Thus the gradient of the objective function is a dilation of the gradient of the binding constraint. This is illustrated in Figure 2.7.

Figure 2.7: Gradients of the Binding Constraint and Objective: At optimality the
gradient of the binding constraints and the objective function are scaled versions
of each other.

The elements illustrated in the previous example are true in general. You may
have discussed a simple example of these when you talked about Lagrange Multipliers
in Vector Calculus (Math 230/231). We’ll revisit these concepts later when we talk about duality theory for linear programs. We’ll also discuss the gradients of the
binding constraints with respect to optimality when we discuss linear programming.
Exercise 2.8 Plot the level sets of the objective function and the feasible region in
Exercise 2.1. At the point of optimality you identified, show that the gradient of
the objective function is a scaled version of the gradient (linear combination) of
the binding constraints. ■

2.7 Linear programs and optimization


"Linear programming" suggests telling a computer to do linear stuff. But that’s not
what the word "programming" means in the title of this class. In this context:
• "programming" means "optimization": finding a point that maximizes (or
minimizes) the value of a function.
• "linear" means that the functions we optimize will be linear, and the constraints
on our optimization will all be linear equations or inequalities.
Today, we will take an example linear program from formulation all the way to
finding a solution, and see some basic ideas of linear programming along the way.
Problem 2.1 At a major music company, you are in charge of hiring for the xylophone
department and the yodeling department. You want to change the number of
employees in the xylophone department by x and in the yodeling department by
y. (Both x and y can be positive or negative: you can hire people or you can fire
people.)
Right now, the xylophone department is doing well: each employee is bringing
you $1000 in profit each day. However, the yodeling department is actually a loss for
the company: each employee is losing $300 for the company each day.
What should you do to maximize profit, given the following constraints?
• x + y ≤ 50: you don’t have office space to increase the total size of the depart-
ments by more than 50.
• y ≥ −20: the yodeling department will just stop functioning if it loses more
than 20 people.
• The yodelists’ union has forced you to agree to the constraint 2x − y ≤ 40 in a
recent negotiation.
Here’s the problem written down without words:

$$\begin{array}{ll} \underset{x,y \in \mathbb{R}}{\text{maximize}} & 1000x - 300y \\ \text{subject to} & x + y \le 50 \\ & y \ge -20 \\ & 2x - y \le 40 \end{array}$$

(A subtle issue: we’ve written x, y ∈ R but actually x and y should be integers: you can’t hire 1/2 of a yodelist. We’ll ignore this problem today and return to it near the end of the semester.)

2.7.1 The feasible region


The linear program has a set of constraints: x + y ≤ 50, y ≥ −20, and 2x − y ≤ 40.
In general, we will allow our constraints to be linear inequalities or linear equations.


(But don’t worry about "equations" for now: we’ll limit ourselves to inequalities for
the moment.)
In linear algebra, we write down a system of equations as a single matrix equation.
In linear programming, we will also write down a system of inequalities as a single
matrix inequality. Here’s how we do it.
First, we should make sure all our variables are on one side, if we haven’t done
that already. In this example, that’s already been done. Second, let’s multiply the
second inequality by −1, so that all the inequalities are ≤ inequalities:

x + y ≤ 50
−y ≤ 20
2x − y ≤ 40

The column of values on the left-hand side can be written as a matrix multiplication:
$$\begin{pmatrix} x + y \\ -y \\ 2x - y \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & -1 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$

So we will summarize this system of inequalities as the matrix inequality
$$\begin{pmatrix} 1 & 1 \\ 0 & -1 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \le \begin{pmatrix} 50 \\ 20 \\ 40 \end{pmatrix}.$$

Putting a single "≤" between two vectors is something you might not be used to.
What it means is that every component of the vector on the left is less than or equal
to the corresponding component of the vector on the right.
What happens in general? Suppose our linear program has n variables that, for
lack of creativity, we will call x1 , x2 , . . . , xn . We can put all these variables together
into a column vector x ∈ Rn . Then any collection of m linear inequalities in x1 , . . . , xn
can be combined into a matrix inequality Ax ≤ b where A is an m × n matrix and b
is an m × 1 vector.
The set {x ∈ Rn : Ax ≤ b} of all x which satisfy the constraints is called the
feasible region. In our example, the feasible region is shown below (it extends
infinitely far to the left):

[Figure: the feasible region, bounded by the lines 2x − y = 40, x + y = 50, and y = −20.]
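The membership test is easy to mechanize. Below is a small Python sketch (the function name is our own) that checks whether a point satisfies every row of Ax ≤ b for the three constraints of this example:

```python
# Feasible region {(x, y) : A(x, y) <= b} for the hiring example.
A = [[1, 1],    # x + y <= 50
     [0, -1],   # -y <= 20, i.e. y >= -20
     [2, -1]]   # 2x - y <= 40
b = [50, 20, 40]

def is_feasible(point, A, b):
    """True iff every row of A dotted with `point` is <= the
    matching entry of b."""
    return all(sum(a * v for a, v in zip(row, point)) <= rhs
               for row, rhs in zip(A, b))

print(is_feasible((30, 20), A, b))   # True: a corner of the region
print(is_feasible((70, -20), A, b))  # False: violates 2x - y <= 40
```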


A point x in the feasible region is called a feasible solution. You should think
of it as follows: a point (x, y) in this region is a feasible decision you could make
(even if it loses your company a lot of money), whereas a point (x, y) outside this
region is just not an option you could choose.

2.7.2 The objective function


The linear program also has an objective function: we want to maximize 1000x −
300y. In general, we might be maximizing or minimizing an arbitrary linear function.
What does an "arbitrary linear function" look like? Well, we can write 1000x −
300y as a product of a row vector and a column vector:
 
$$1000x - 300y = \begin{pmatrix} 1000 & -300 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$

(Technically, one of these is a scalar and one of these is a 1 × 1 matrix, but we will
often ignore the difference.)
More generally, when we have a vector of variables x ∈ Rn , we can write the
objective function as cT x for some constant vector c ∈ Rn .
Putting together these ideas, any linear program can be written as

$$\begin{array}{ll} \underset{x \in \mathbb{R}^n}{\text{maximize}} & c^T x \\ \text{subject to} & Ax \le b. \end{array}$$

What about minimizing? Well, minimizing cT x would be the same as maximizing its
negative (−c)T x. We will encounter both kinds of linear programs in class, but we
don’t lose any generality by focusing on one kind whenever it’s convenient.
Whether we’re minimizing or maximizing, a point x ∈ Rn with the best value of
the objective function is called an optimal solution. In our example, the point
(x, y) = (30, 20) is the unique optimal solution, as we’ll see in a moment.

2.8 The naive approach to solving linear programs


Let’s go back to the original linear program. Here are some possible values of the
objective function 1000x − 300y, and the places where they occur:
[Figure: the level lines 1000x − 300y = 16000, 20000, 24000, and 28000 drawn over the feasible region.]

Pick a small value of 1000x − 300y (such as 16000) and the feasible points with that value of 1000x − 300y form a line segment. Pick a large value of 1000x − 300y (such as 28000) and there are no feasible points with that value of 1000x − 300y. But when 1000x − 300y = 24000, just before the value becomes impossible, the segment shrinks to a single point: a corner of the feasible region.
Without drawing this picture and lots of carefully measured parallel lines, all we know is that this happens at some corner. Where are the corners? Well, at each corner, two of our boundary lines intersect. So we can try taking our boundaries two at a time, and seeing where they intersect:
• x + y = 50 and 2x − y = 40 when (x, y) = (30, 20). This is one of our corners:
the top one.
• 2x − y = 40 and y = −20 when (x, y) = (10, −20). This is the lower of the two
corners.
• x + y = 50 and y = −20 when (x, y) = (70, −20). This is not actually a corner:
two boundaries intersect here, but the inequality 2x − y ≤ 40 does not hold.
Now we can compare the values of 1000x − 300y at (30, 20) and (10, −20). The
first corner turns out to be better than the second, so that’s our optimal solution.
(There’s actually one more important thing to check, which we’ll get to in a bit, but
in this case it doesn’t affect the answer.)
This is the “naive" approach to solving linear programs. It’s quick to explain,
and for small examples, especially ones you can draw in the plane, it may be the
easiest thing to do.
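The corner search just described can be written out directly for this example. The sketch below (helper names are ours) uses Cramer's rule on each pair of boundary lines, keeps only the feasible intersection points, and picks the one with the largest objective value:

```python
from itertools import combinations
from fractions import Fraction

A = [[1, 1], [0, -1], [2, -1]]   # rows of the constraint matrix
b = [50, 20, 40]                 # right-hand sides

def intersect(r1, c1, r2, c2):
    """Solve r1.(x,y)=c1, r2.(x,y)=c2 by Cramer's rule;
    return None if the lines are parallel."""
    det = r1[0] * r2[1] - r1[1] * r2[0]
    if det == 0:
        return None
    return (Fraction(c1 * r2[1] - c2 * r1[1], det),
            Fraction(r1[0] * c2 - r2[0] * c1, det))

corners = []
for i, j in combinations(range(len(A)), 2):
    p = intersect(A[i], b[i], A[j], b[j])
    # Keep only intersections satisfying every constraint.
    if p is not None and all(
            sum(a * v for a, v in zip(row, p)) <= rhs
            for row, rhs in zip(A, b)):
        corners.append(p)

best = max(corners, key=lambda p: 1000 * p[0] - 300 * p[1])
print(best[0], best[1])   # 30 20
```

Note that the intersection of x + y = 50 with y = −20 is computed too, but the feasibility filter throws it out, exactly as in the third bullet above.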
Imagine, however, that you have a linear program with 50 variables and 100 inequalities. (This is a "tiny" linear program: my computer solves one of these in approximately 0.03 seconds.) With the naive approach, there are $\binom{100}{50} = 100\,891\,344\,545\,564\,193\,334\,812\,497\,256$ combinations of 50 of the equations bounding the region. Each of these combinations (in general) intersects at a single point, so we need to compare that many points to find the best one.
Our goal in this class is going to be to try to do less work than this. Ahead of us
is the simplex method, which starts at one vertex of a linear program, and moves
from vertex to vertex until it finds the best one: hopefully long before it visits all
the vertices. We won’t solve problems with 100 inequalities, but computers solve
such problems in a very similar way.

2.8.1 Misbehaving linear programs


In the previous example, our linear program had a unique optimal solution, but this
is not always guaranteed to be true. What can go wrong?
1. The optimal solution might not be unique. Imagine that the lines we
are drawing are parallel to a boundary of the region. Then several corners
are equally good! This is a minor problem, as these things go, but it will
complicate our life a little.
2. The optimal solution might not exist, because the objective function
can be arbitrarily large. Imagine that we are minimizing 1000x − 300y, not
maximizing. Then we can keep moving the dashed line farther and farther left,
and getting lower and lower values.
This means one of two things: either we found a hack to get infinite profits, or
(more likely) there is another constraint we didn’t model because it’s "obvious":
to us, but not to the linear program. Maybe we can’t actually hire 1000 yodelists
and fire 1000 xylophonists, because we don’t have that many xylophonists to fire!
Either way, even the naive approach needs to worry about this: we should
check that in the direction that our region extends forever, the solutions keep
getting worse and not better. We won’t go into detail about how to check this,
because we won’t be using the naive approach.
3. The optimal solution might not exist, because there are no feasible
solutions. Imagine that the constraints we have contradict each other: there
is no way to satisfy all of them. This is a time to rethink our model and see if
we can relax some constraints. (Maybe union negotiations have actually forced
us to acquire more office space before they can be satisfied, for example.)

2.9 What can a linear program model?


Linear programs are almost always a simplification: real life is nonlinear a lot of the
time. Sometimes we’re lucky and our constraints do end up being linear. Sometimes
we’re slightly less lucky, and can still approximate real life by a linear program.
Sometimes we’re not lucky at all.
For example, suppose we’re optimizing over the disk $x^2 + y^2 \le 1$. That’s not a linear constraint. But we can replace the circle by a polygon with many sides.
Each side is a straight line, so we can describe the polygon by a bunch of linear
inequalities. Probably, optimizing over the polygon will not be too different from
optimizing over the circle—and if not, we can give the polygon more sides to improve
the approximation.
Similarly, strict inequalities like x + y < 1 are not okay in our linear programs,
but also not a huge problem. We can always replace such an inequality by either
x + y ≤ 1 (including slightly more points) or x + y ≤ 0.999 (including slightly fewer
points). The second approximation can be made arbitrarily good.
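One concrete way to build such a polygon (a sketch; the function name is ours) is to take k tangent lines to the circle: at angle t, the inequality cos(t)·x + sin(t)·y ≤ 1 keeps exactly the half-plane on the circle's side of that tangent line:

```python
import math

def disk_inequalities(k):
    """A k-sided polygon containing the unit disk: one tangent
    inequality cos(t) x + sin(t) y <= 1 per angle t."""
    A, b = [], []
    for i in range(k):
        t = 2 * math.pi * i / k
        A.append((math.cos(t), math.sin(t)))
        b.append(1.0)
    return A, b

A, b = disk_inequalities(16)
# A point on the circle satisfies all 16 inequalities
# (up to floating-point error) ...
print(all(ax * 0.6 + ay * 0.8 <= 1 + 1e-9 for ax, ay in A))   # True
# ... while a point well outside the disk violates at least one.
print(any(ax * 2.0 + ay * 0.0 > 1 for ax, ay in A))           # True
```

Increasing k tightens the polygon around the circle, which is exactly the "give the polygon more sides" improvement mentioned above.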
On the other hand, suppose our region is the union of two disks:

$$(x+2)^2 + y^2 \le 1 \qquad\qquad (x-2)^2 + y^2 \le 1$$

No matter how you try, you can never draw a linear inequality that includes both
of these disks, but excludes the origin, (0, 0). Here’s a formal proof. Suppose you
have any system of inequalities Ax ≤ b that includes both disks. Then in particular
it includes the points (−2, 0) and (2, 0) at their centers. So
$$A\begin{pmatrix} -2 \\ 0 \end{pmatrix} \le b \text{ and } A\begin{pmatrix} 2 \\ 0 \end{pmatrix} \le b \implies A\begin{pmatrix} -2 \\ 0 \end{pmatrix} + A\begin{pmatrix} 2 \\ 0 \end{pmatrix} \le 2b \implies A\begin{pmatrix} 0 \\ 0 \end{pmatrix} \le 2b \implies A\begin{pmatrix} 0 \\ 0 \end{pmatrix} \le b.$$

Therefore (0, 0) satisfies the system of inequalities as well.


In other words, this region can’t be approximated by a linear program. No matter
what, you will never exclude the point (0, 0), which is pretty far from any point
that’s actually in your region.
There’s a generalization of this idea. We call a subset S of Rn a convex set
if, whenever x, y ∈ S, the entire line segment joining x and y is also contained in S.
Algebraically, the line segment joining x and y can be described as
[x, y] = {tx + (1 − t)y : 0 ≤ t ≤ 1}
and so we can also state this definition as “whenever x, y ∈ S and 0 ≤ t ≤ 1, tx + (1 −
t)y ∈ S."
The feasible region of a linear program is always convex. We can check this by
an algebraic proof: if Ax ≤ b and Ay ≤ b, then
A(tx + (1 − t)y) = t(Ax) + (1 − t)(Ay) ≤ tb + (1 − t)b = b.
There is also an argument from geometric intuition. If x and y satisfy a linear inequality, this means that they both fall on one side of a straight line. Then the
entire line segment [x, y] must be on the same side of that line, so it also satisfies
that linear inequality. The same is true for a system of inequalities: we just consider
the inequalities one at a time.
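The algebraic argument can be spot-checked numerically: take two feasible points of the hiring example and verify that every convex combination of them stays feasible (a sketch; names are ours):

```python
# Feasibility check for the hiring example's region Ax <= b.
A = [[1, 1], [0, -1], [2, -1]]
b = [50, 20, 40]

def feasible(p):
    return all(sum(a * v for a, v in zip(row, p)) <= rhs
               for row, rhs in zip(A, b))

x, y = (30, 20), (10, -20)   # two corners of the feasible region
blends = [tuple(t * u + (1 - t) * v for u, v in zip(x, y))
          for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
print(all(feasible(p) for p in blends))   # True
```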
It turns out (though it’s harder to prove) that any convex set can be approximated
as well as you like by enough linear inequalities. If the set is bounded by straight
lines (or higher-dimensional surfaces), you can even describe it exactly. On the other
hand, if a set is not convex, there’s no hope to even get close.

2.10 Different formulations of linear programs


We’ve talked already about expressing the constraints of a linear program as a
system of inequalities Ax ≤ b. There are several variations, and we can convert linear
programs from one form to the other.

2.10.1 Nonnegativity constraints


It’s common to automatically include the nonnegativity constraints x1 , x2 , . . . , xn ≥ 0.
There are several reasons for this:
• Lots of real-world problems already include them. (Many actual quantities
can’t be negative.)
• Mathematically they are fairly nice. (We’ll see some ways this comes up later.)
• In the previous lecture, we saw that if a linear program has any optimal
solutions, we can always find one at a vertex. There’s one exception to this:
some linear programs don’t have any vertices (for example, if there’s only one
inequality, or if the feasible region looks like an infinite prism in three or more
dimensions).
When nonnegativity constraints are present, this case is guaranteed not to
happen.


A linear program in the form


$$\begin{array}{ll} \underset{x \in \mathbb{R}^n}{\text{maximize}} & c^T x \\ \text{subject to} & Ax \le b \\ & x \ge 0 \end{array} \qquad \text{or} \qquad \begin{array}{ll} \underset{x_1,\dots,x_n \in \mathbb{R}}{\text{maximize}} & c_1x_1 + c_2x_2 + \cdots + c_nx_n \\ \text{subject to} & a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \le b_1 \\ & a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \le b_2 \\ & \quad\vdots \\ & a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \le b_m \\ & x_1, x_2, \dots, x_n \ge 0 \end{array}$$
is in standard form. I won’t ask you to learn this terminology, because different
sources have different ideas of standard form, and it’s not a very self-explanatory
term, but sometimes it will be useful to put linear programs in this form.
What if there are no nonnegativity constraints? We can introduce them by
a standard trick: whenever a variable x can be positive or negative, replace it
(everywhere it occurs) by the difference x+ − x− , where x+ and x− are two variables
with x+ , x− ≥ 0. Any real number can be written as the difference of two nonnegative
numbers.
For instance, the example yesterday can be rewritten as a linear program in four
nonnegative variables instead of two unconstrained variables:
$$\begin{array}{ll} \underset{x,y \in \mathbb{R}}{\text{maximize}} & 1000x - 300y \\ \text{subject to} & x + y \le 50 \\ & -y \le 20 \\ & 2x - y \le 40 \end{array} \quad \leadsto \quad \begin{array}{ll} \underset{x^+,x^-,y^+,y^- \in \mathbb{R}}{\text{maximize}} & 1000x^+ - 1000x^- - 300y^+ + 300y^- \\ \text{subject to} & x^+ - x^- + y^+ - y^- \le 50 \\ & -y^+ + y^- \le 20 \\ & 2x^+ - 2x^- - y^+ + y^- \le 40 \\ & x^+, x^-, y^+, y^- \ge 0 \end{array}$$
This tends to create infinitely many solutions, because each point (x, y) has infinitely many representations (x+, x−, y+, y−). For example, the point (x, y) = (20, −10) can be represented by (x+, x−, y+, y−) = (20, 0, 0, 10), but equally valid is (x+, x−, y+, y−) = (20, 0, 100, 110). That’s okay: as long as there’s at least one solution, we’re happy.
There’s a more compact but less intuitive way to do the same thing: re-
place n unconstrained variables (x1 , x2 , . . . , xn ) by just n + 1 nonnegative variables
(x′0 , x′1 , . . . , x′n ), and replace every occurrence of xi by the difference x′i − x′0 . See if
you can convince yourself that this works!
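Either way, the x = x⁺ − x⁻ substitution is purely mechanical. In the doubling version, every column of A and every entry of c is replaced by a (value, −value) pair; a sketch (the function names are ours):

```python
def split_free_variables(A, c):
    """Replace each free variable x_i by the pair (x_i^+, x_i^-):
    every column of A and entry of c becomes (value, -value)."""
    A2 = [[v for a in row for v in (a, -a)] for row in A]
    c2 = [v for a in c for v in (a, -a)]
    return A2, c2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 1], [0, -1], [2, -1]]
c = [1000, -300]
A2, c2 = split_free_variables(A, c)
print(c2)   # [1000, -1000, -300, 300]

# Different representations of (x, y) = (20, -10) agree on the objective:
print(dot(c2, [20, 0, 0, 10]) == dot(c, [20, -10]))      # True
print(dot(c2, [20, 0, 100, 110]) == dot(c, [20, -10]))   # True
```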

2.10.2 Equations and inequalities


Non-negativity constraints are the simplest kind of inequality, and so you might wish:
what if those were the only kinds of inequalities we had to deal with? This is possible,
and the resulting form of the linear program is sometimes called equational form.
(I will use this term more freely, because it’s at least self-explanatory.)
The idea is this: if we have an inequality
a1 x1 + a2 x2 + · · · + an xn ≤ b,
we can rewrite it as an equation with one new nonnegative variable: for some w ≥ 0,
a1 x1 + a2 x2 + · · · + an xn + w = b.


This new non-negative variable w is called a slack variable, because it measures


how much “slack" or flexibility there was in satisfying the constraint. When w = 0,
the constraint is tight: we are right on the edge of the linear inequality. When
w > 0, we have some wiggle room to change x1 , . . . , xn without running up against
this constraint.
Doing this for every single constraint in a linear program turns every inequality
into an equation, except for some nonnegativity constraints. If we pick up where we
left off and do this to our linear program, we get:

$$\begin{array}{ll} \underset{x^+,x^-,y^+,y^-,w_1,w_2,w_3 \in \mathbb{R}}{\text{maximize}} & 1000x^+ - 1000x^- - 300y^+ + 300y^- \\ \text{subject to} & x^+ - x^- + y^+ - y^- + w_1 = 50 \\ & -y^+ + y^- + w_2 = 20 \\ & 2x^+ - 2x^- - y^+ + y^- + w_3 = 40 \\ & x^+, x^-, y^+, y^-, w_1, w_2, w_3 \ge 0 \end{array}$$
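The slack-variable conversion is equally mechanical: append one identity column per constraint, giving [A | I](x, w) = b. A sketch (the function name is ours):

```python
def add_slacks(A, b):
    """Turn the m constraints Ax <= b into [A | I](x, w) = b,
    one slack column per constraint."""
    m = len(A)
    A_eq = [row + [1 if i == j else 0 for j in range(m)]
            for i, row in enumerate(A)]
    return A_eq, b

A_eq, b_eq = add_slacks([[1, 1], [0, -1], [2, -1]], [50, 20, 40])
for row in A_eq:
    print(row)
# At the corner (x, y) = (30, 20) the slacks come out to
# (w1, w2, w3) = (0, 40, 0): constraints 1 and 3 are tight.
```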

A general linear program in equational form looks like

$$\begin{array}{ll} \underset{x \in \mathbb{R}^n}{\text{maximize}} & c^T x \\ \text{subject to} & Ax = b \\ & x \ge 0 \end{array} \qquad \text{or} \qquad \begin{array}{ll} \underset{x_1,\dots,x_n \in \mathbb{R}}{\text{maximize}} & c_1x_1 + c_2x_2 + \cdots + c_nx_n \\ \text{subject to} & a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\ & a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\ & \quad\vdots \\ & a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \\ & x_1, x_2, \dots, x_n \ge 0 \end{array}$$

This is convenient to deal with, because linear algebra gives us a lot of tools for
understanding the system of equations Ax = b. We just need to figure out what
happens when we also require x ≥ 0.
(In particular, this is the form of linear program that the simplex method will
use: this method is built on top of Gaussian elimination for solving the system of
equations Ax = b.)
In some cases, we want to go the other way: we want to turn equations into
inequalities. This is also possible. To write down an equation

a1 x 1 + a2 x 2 + · · · + an x n = b

simply write down two inequalities, one in each direction:

a1 x 1 + a2 x 2 + · · · + an x n ≤ b and a1 x 1 + a2 x 2 + · · · + an x n ≥ b

This means we can express any linear program using only inequalities, and no
equations at all.
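In code, the equation-to-inequalities direction is one line each way (a sketch; the function name is ours):

```python
def equation_to_inequalities(a, rhs):
    """a . x = rhs becomes a . x <= rhs together with
    (-a) . x <= -rhs."""
    return [(list(a), rhs), ([-v for v in a], -rhs)]

pair = equation_to_inequalities([1, 2, 3], 6)
print(pair[0])   # ([1, 2, 3], 6)
print(pair[1])   # ([-1, -2, -3], -6)
```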

2.11 Systems of linear equations


In linear algebra, you learned how to solve a system of equations like this one:


Problem 2.2 Solve for x and y:



3x + y = 6
2x − y = −1

In this lecture, I want to go over using Gaussian elimination to do this, and some
finer points of the algorithm that we’ll need to know for this class.
We begin by deciding that x will be the basic variable for the first equation.
Having made this decision:
1. We scale the first equation so that the coefficient of x is 1. We get
$$x + \tfrac{1}{3}y = 2, \qquad 2x - y = -1$$
2. We subtract twice the first equation from the second, so that x is eliminated. (In general, we do this to eliminate the basic variable from every other equation.) We get
$$x + \tfrac{1}{3}y = 2, \qquad -\tfrac{5}{3}y = -5$$

Next, we move on to the second equation. We pick a basic variable there as well; it
can only be y, because that’s the only variable contained in the equation. Again:
1. We scale the second equation so that the coefficient of y is 1. We get
$$x + \tfrac{1}{3}y = 2, \qquad y = 3$$
2. To clear y from the first equation, we subtract $\tfrac{1}{3}$ of the second equation. We get
$$x = 1, \qquad y = 3$$

Now the solution can be read off directly: x = 1 and y = 3.
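The two steps above (scale the pivot row, then clear its variable from the other rows) are the whole algorithm for a square system. A minimal sketch in exact arithmetic (our own helper, with no pivot-selection safeguards, so it assumes no zero ever lands on the diagonal):

```python
from fractions import Fraction

def gaussian_solve(A, b):
    """Gaussian elimination on a square system, in exact
    arithmetic, choosing the diagonal entries as pivots."""
    A = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in b]
    n = len(A)
    for i in range(n):
        # Scale row i so its basic variable has coefficient 1.
        piv = A[i][i]
        A[i] = [v / piv for v in A[i]]
        b[i] /= piv
        # Eliminate that variable from every other row.
        for k in range(n):
            if k != i and A[k][i] != 0:
                f = A[k][i]
                A[k] = [v - f * w for v, w in zip(A[k], A[i])]
                b[k] -= f * b[i]
    return b

sol = gaussian_solve([[3, 1], [2, -1]], [6, -1])
print(sol[0], sol[1])   # 1 3
```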

2.12 Infinitely many solutions


When the number of variables and the number of equations are equal, as in Prob-
lem 2.2, it is typical to get one solution at the end. Occasionally, other things might
happen: an equation might end up simplifying to 0 = 0 (in which case we get fewer
constraints than expected) or to 0 = 1 (in which case there is no solution at all). But
these are rare.
In this class, we will almost exclusively deal with systems of equations which
have more variables than constraints. In this case, it is guaranteed that there will
either be no solutions or infinitely many solutions. We can still find the solutions
by Gaussian elimination.
Let’s look at an example.


Problem 2.3 Describe all solutions to the system of equations



$$2x_1 + x_2 - 2x_3 + x_4 = 0, \qquad 3x_1 - x_2 + x_4 = 5$$
Again, we begin by choosing x1 as the basic variable in the first equation. Divide
the first equation by 2, and then subtract 3 times the result from the second equation.
We get:

$$x_1 + \tfrac{1}{2}x_2 - x_3 + \tfrac{1}{2}x_4 = 0, \qquad -\tfrac{5}{2}x_2 + 3x_3 - \tfrac{1}{2}x_4 = 5$$
Next, we choose x2 as the basic variable in the second equation. Multiply the second equation by $-\tfrac{2}{5}$ so that we get x2 with coefficient 1. Then subtract $\tfrac{1}{2}$ of the result from the first equation. We get:
$$x_1 - \tfrac{2}{5}x_3 + \tfrac{2}{5}x_4 = 1, \qquad x_2 - \tfrac{6}{5}x_3 + \tfrac{1}{5}x_4 = -2$$
Now Gaussian elimination is done: every equation has a basic variable which appears
in that equation with coefficient 1, and nowhere else. But how do we get the solutions
from this result?
We’ve gotten two basic variables x1 and x2 as well as two non-basic variables x3
and x4 . If we just want one solution to the system of equations, we can set the non-
basic variables to 0. Then the x3 and x4 terms disappear entirely, and our equations
become x1 = 1 and x2 = −2. The resulting solution (x1 , x2 , x3 , x4 ) = (1, −2, 0, 0) is
called a basic solution.
But Problem 2.3 asks for all solutions. To find these, we can set the non-basic
variables to any value we want, and read off values for the basic variables. If we’re
going to be doing this a lot, it will help to move the non-basic variables to the other
side:

$$x_1 = 1 + \tfrac{2}{5}x_3 - \tfrac{2}{5}x_4, \qquad x_2 = -2 + \tfrac{6}{5}x_3 - \tfrac{1}{5}x_4$$

For example, if we plug in x3 = 5 and x4 = 10, we get $x_1 = 1 + \tfrac{2}{5}(5) - \tfrac{2}{5}(10) = -1$ and $x_2 = -2 + \tfrac{6}{5}(5) - \tfrac{1}{5}(10) = 2$, so $(x_1, x_2, x_3, x_4) = (-1, 2, 5, 10)$ is also a solution.
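Reading off solutions from the row-reduced system can be packaged as a function of the non-basic variables (a sketch; the function name is ours):

```python
from fractions import Fraction as F

def solution(x3, x4):
    """x1 and x2 read off the row-reduced system:
    x1 = 1 + (2/5) x3 - (2/5) x4,  x2 = -2 + (6/5) x3 - (1/5) x4."""
    x1 = 1 + F(2, 5) * x3 - F(2, 5) * x4
    x2 = -2 + F(6, 5) * x3 - F(1, 5) * x4
    return (x1, x2, x3, x4)

print(solution(0, 0) == (1, -2, 0, 0))     # True: the basic solution
print(solution(5, 10) == (-1, 2, 5, 10))   # True: the example above

# Every choice of (x3, x4) satisfies the original two equations:
x1, x2, x3, x4 = solution(7, -3)
print(2*x1 + x2 - 2*x3 + x4 == 0, 3*x1 - x2 + x4 == 5)   # True True
```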

2.13 Terminology and notation


In linear algebra, it is more common to say “pivot variables" and “free variables"
instead of “basic variables" and “non-basic variables". In linear programming, the
“basic" and “non-basic" are used almost exclusively, so we’ll stick to that terminology.
When solving many of these equations by hand, it helps to find ways to write
less. For example, we can write Problem 2.3 in matrix form as
$$\begin{pmatrix} 2 & 1 & -2 & 1 \\ 3 & -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 5 \end{pmatrix}$$


or in augmented matrix form as
$$\left(\begin{array}{cccc|c} 2 & 1 & -2 & 1 & 0 \\ 3 & -1 & 0 & 1 & 5 \end{array}\right)$$
I will not do this in these lecture notes, to stay consistent with the textbook’s
notation, but you are free to do so in assignments if you choose. I do recommend
that you annotate the columns with the variables they correspond to, since this
information will be important to track:

x1 x2 x3 x4
2 1 −2 1 0
3 −1 0 1 5
When an equation has a basic variable, it helps to annotate that row with its basic
variable. This is especially important when expressing the basic variables in terms
of the non-basic variables, since that information does not exist anywhere else! For
example,

2 2 x3 x4
x
1 = 1 + 5 x3 − 5 x4 2
6 1 becomes x1 1 5 − 25
 x2 = −2 + 5 x3 − 5 x4 6
x2 −2 5 − 15

2.14 Choosing a different basis


In linear algebra, when all we wanted to do was solve the system of linear equations,
it did not really matter which variables were chosen to be the basic variables: any
choice that worked was equally good. The simplex method, as we’ll see in the next
lecture, relies on moving between different choices of basic variables.
Let’s take another look at the system of equations from Problem 2.3, but with a
different goal.
Problem 2.4 Parameterize all solutions to the system of equations

$$2x_1 + x_2 - 2x_3 + x_4 = 0, \qquad 3x_1 - x_2 + x_4 = 5$$
by expressing x2 and x3 in terms of x1 and x4 .
There are three ways to solve this problem, and we will look at them all.

2.14.1 Solving the problem from scratch


We can continue using our previous method, but choose our basic and nonbasic
variables differently. Since we want x2 and x3 in terms of x1 and x4 , we want x2 , x3
to be our basic variables and x1 , x4 to be our nonbasic variables.
Begin by choosing x2 as the basic variable in the first equation. (Choosing x2
first rather than x3 is an arbitrary decision.) We don’t need to do any division, and
we should just add the first equation to the second to eliminate x2 :

$$2x_1 + x_2 - 2x_3 + x_4 = 0, \qquad 5x_1 - 2x_3 + 2x_4 = 5$$


Next, choose x3 as the basic variable in the second equation. We should divide the second equation by −2, and then add twice the result to the first equation. (Equivalently, subtract the second equation from the first, then divide the second equation by −2.) We get:
$$-3x_1 + x_2 - x_4 = -5, \qquad -\tfrac{5}{2}x_1 + x_3 - x_4 = -\tfrac{5}{2}$$

To read off x2 , x3 in terms of x1 , x4 , we can move those terms to the other side,
getting

$$x_2 = -5 + 3x_1 + x_4, \qquad x_3 = -\tfrac{5}{2} + \tfrac{5}{2}x_1 + x_4$$

2.14.2 Modify an existing solution


The approach above is fine if we saw Problem 2.4 first, but since we solved Problem 2.3
already, it seems a shame to ignore all that effort. Here is that solution again:

$$x_1 = 1 + \tfrac{2}{5}x_3 - \tfrac{2}{5}x_4, \qquad x_2 = -2 + \tfrac{6}{5}x_3 - \tfrac{1}{5}x_4$$

To minimize effort, we can take this solution as a starting point. Let’s begin by
eliminating x3 from the second equation. To do this, just subtract 3 times the first
equation:

$$x_1 = 1 + \tfrac{2}{5}x_3 - \tfrac{2}{5}x_4, \qquad x_2 - 3x_1 = -5 + x_4$$

We want x3 on the left-hand side and x1 on the right-hand side, so just move those
terms (in both equations)

$$-\tfrac{2}{5}x_3 = 1 - x_1 - \tfrac{2}{5}x_4, \qquad x_2 = -5 + 3x_1 + x_4$$

Finally, multiply the first equation by $-\tfrac{5}{2}$ so that x3 appears with a coefficient of 1 on the left-hand side:
$$x_3 = -\tfrac{5}{2} + \tfrac{5}{2}x_1 + x_4, \qquad x_2 = -5 + 3x_1 + x_4$$

In this example, with only two equations, this does not seem like more effort than
solving from scratch. This approach (which we’ll call pivoting in the future) shines
if we have many equations, and we are only making a minor change to the set of
basic variables.


2.14.3 Multiply by an inverse matrix


The final method we’ll look at will not be relevant for a while, but it’s an interesting
trick. Start with the matrix form of the system of equations:
$$\begin{pmatrix} 2 & 1 & -2 & 1 \\ 3 & -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 0 \\ 5 \end{pmatrix}$$

We want to solve for x2 and x3 , so just take the second and third columns of the
coefficient matrix on the left:
 

$$\begin{pmatrix} 1 & -2 \\ -1 & 0 \end{pmatrix}$$

To get a system of equations in which x2 and x3 are the basic variables, find the
inverse of this matrix:
$$\begin{pmatrix} 1 & -2 \\ -1 & 0 \end{pmatrix}^{-1} = \frac{1}{1 \cdot 0 - (-2) \cdot (-1)} \begin{pmatrix} 0 & 2 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ -\tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix}$$

Then, left-multiply both sides of the matrix equation by that inverse:
$$\begin{pmatrix} 0 & -1 \\ -\tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix} \begin{pmatrix} 2 & 1 & -2 & 1 \\ 3 & -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ -\tfrac{1}{2} & -\tfrac{1}{2} \end{pmatrix} \begin{pmatrix} 0 \\ 5 \end{pmatrix}$$

This simplifies to
$$\begin{pmatrix} -3 & 1 & 0 & -1 \\ -\tfrac{5}{2} & 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} -5 \\ -\tfrac{5}{2} \end{pmatrix}$$

and now we directly have the row-reduced form of the system of equations. Moving
from matrices back to equations, this result tells us that:

 −3x + x2 − x4 = −5
1
 − 5 x1 + x3 − x4 = − 52
2

All that’s left is to isolate x2 in the first equation and x3 in the second, and we’ll
have the same solution we’ve found twice already:

$$x_2 = -5 + 3x_1 + x_4, \qquad x_3 = -\tfrac{5}{2} + \tfrac{5}{2}x_1 + x_4$$
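The whole inverse-matrix computation fits in a few lines of exact arithmetic. A sketch (variable names are ours; `basic` holds the 0-indexed columns of x2 and x3):

```python
from fractions import Fraction as F

A = [[2, 1, -2, 1], [3, -1, 0, 1]]
b = [0, 5]
basic = [1, 2]   # columns of x2 and x3

# Invert the 2x2 block B formed by the basic columns.
B = [[F(A[i][j]) for j in basic] for i in range(2)]
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
Binv = [[B[1][1] / det, -B[0][1] / det],
        [-B[1][0] / det, B[0][0] / det]]

# Left-multiply A and b by B^(-1).
R = [[sum(Binv[i][t] * A[t][j] for t in range(2)) for j in range(4)]
     for i in range(2)]
rhs = [sum(Binv[i][t] * b[t] for t in range(2)) for i in range(2)]

print(R)    # rows (-3, 1, 0, -1) and (-5/2, 0, 1, -1)
print(rhs)  # -5 and -5/2
```

The columns of x2 and x3 in the result form an identity matrix, which is exactly what makes them the basic variables.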


2.15 Problems
1. The system of equations below has infinitely many solutions. Solve for y and z
in terms of x.

3x + 2y − 3z = −1
3x − y + 2z = 2
2. The following system of equations has already been solved for x1 , x2 , x3 in terms of x4 , x5 :
$$\begin{cases} x_1 + 3x_2 - 2x_3 + x_4 - x_5 = 1 \\ -2x_1 + x_2 + 2x_4 - x_5 = 1 \\ x_1 + x_2 - x_3 - x_4 = -1 \end{cases} \quad \leadsto \quad \begin{cases} x_1 = 2 - x_4 \\ x_2 = 5 - 4x_4 + x_5 \\ x_3 = 8 - 6x_4 + x_5 \end{cases}$$
a. Find two different particular solutions (x1 , x2 , x3 , x4 , x5 ) to this system of
equations.
b. Solve for x1 , x2 , x5 in terms of x3 , x4 instead. Try to do as little additional
work as possible.
3. Consider the following system of equations:

$$3x_1 + 5x_2 + x_3 - 2x_4 = 4, \qquad x_1 + 2x_2 + x_3 - x_4 = -1$$
a. Write this system of equations in matrix form: as Ax = b, where A is a
2 × 4 matrix, x is the column vector of our variables x1 , . . . , x4 , and b is a
2 × 1 column vector.
b. Take the first two columns of A only. Find the inverse of this 2 × 2 matrix.
c. Left-multiply both sides of the matrix equation Ax = b by the inverse
matrix you’ve found.
d. Your result should now be row-reduced. Use it to solve for x1 , x2 in terms
of x3 , x4 .
4. Consider the following system of equations, already written in matrix form:
$$\begin{pmatrix} 2 & 1 & -5 \\ 0 & 1 & -1 \\ -2 & 1 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}$$
a. Left-multiply both sides of this matrix equation by the row vector $\begin{pmatrix} 1 & -2 & 1 \end{pmatrix}$.
b. What does the result tell you about the system of equations?

2.16 Challenge problems


5. Describe all solutions (x1 , x2 , . . . , xn ) to the system of equations below.





$$\begin{cases} x_1 - 2x_2 + x_3 = 0 \\ x_2 - 2x_3 + x_4 = 0 \\ x_3 - 2x_4 + x_5 = 0 \\ \quad\vdots \\ x_{n-2} - 2x_{n-1} + x_n = 0 \end{cases}$$


6. Take a look at the system of equations in problem 2 again. (It is especially


useful to look at the given solution for x1 , x2 , x3 in terms of x4 , x5 .)
Suppose we are only considering nonnegative solutions to the system: solutions
with
x1 , x2 , x3 , x4 , x5 ≥ 0.
In that case, answer the following questions; try to give a reason why in each
case.
a. Is there a nonnegative solution where x4 = x5 = 0?
b. Is there a nonnegative solution where x1 > 2?
c. Is there a nonnegative solution where x2 > 5?
d. Among all nonnegative solutions, what is the highest possible value of x4 ?
e. Among all nonnegative solutions where x5 = 0, what is the highest possible
value of x4 ?
7. Consider the following linear program:

maximize x + y
x,y∈R
subject to 2x + 3y ≤ 15
x + 2y ≤ 9
2x + y ≤ 12
x, y ≥ 0
a. Without trying to solve the linear program, can you give a convincing
argument for why there is no feasible solution (x, y) where x + y is 10 or
higher?
b. A shadowy figure cryptically tells you “take the sum of the first two
inequalities, then divide by three.”
How can this help you get a better upper bound on x + y than what you
got in part (a)?
c. Can you find an even better upper bound on x + y in the same way as in
part (b)?

2.17 From linear algebra back to linear programming
The simplex method works on linear programs in equational form: the constraints
are Ax = b with x ≥ 0. Written out in full:

      a11 x1 + a12 x2 + · · · + a1n xn = b1
      a21 x1 + a22 x2 + · · · + a2n xn = b2
         ⋮         ⋮                ⋮       ⋮
      am1 x1 + am2 x2 + · · · + amn xn = bm

      x1 , x2 , . . . , xn ≥ 0

That is, we have a perfectly ordinary system of linear equations, together with the
added constraint that all variables must be nonnegative.
There are infinitely many feasible solutions, but on the first day, we saw a rule
that cuts their number down to a manageable amount:




Rule #1: At least one optimal solution is a corner point of the feasible
region.1

We understand what a corner point is geometrically, in two dimensions: it’s a
point where two of the boundary lines meet. Visualizing the same thing in higher
dimensions is tricky, but let’s try it anyway.
Suppose we have n variables x1 , . . . , xn and the system Ax = b consists of m
linear equations, none of which are redundant. The solutions to this system live in
Rn . However, each linear equation reduces the dimension of the solution set by 1, so
the solution set is an affine subspace of dimension n − m. (An “affine subspace" is
a subspace that has been shifted so it doesn’t necessarily pass through the origin.
“Dimension n − m" means it looks like Rn−m . For example, when n = 3 and m = 2,
the points live in R3 , but the solutions to Ax = b look like R1 : they are a line in
3-dimensional space.)
In two dimensions, a corner point is where two boundaries meet. In three
dimensions, a corner is where three boundaries meet (imagine the corner of a cube).
In n − m dimensions, a corner is where n − m boundaries meet. What are the
boundaries of our feasible region? They come from the inequalities x1 , x2 , . . . , xn ≥ 0.
When n − m boundaries meet, it is because n − m of our variables have been set to 0.
If that was intimidating—well, we have another way to think about the same
thing. When solving a system of m linear equations in n variables, we pick m basic
variables: one for each equation. Then, we solve for them in terms of the n − m
nonbasic variables. A basic solution is what we get if we set all n − m nonbasic
variables to 0: exactly the number that we wanted for a corner point!
In other words, we can deduce the following rule:

Rule #2: All corner points of the feasible region are basic solutions of
the system of linear equations.

This gives a motivation to find as many basic solutions as possible.
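As a concrete sketch of this idea (ours, not part of the text), the script below tries every choice of basic columns in the 2 × 4 system from exercise 3 and reports which basic solutions are feasible:

```python
import itertools
import numpy as np

# Enumerate all basic solutions of Ax = b by picking m of the n columns
# (a sketch; the example system is the 2x4 one from exercise 3).
A = np.array([[3.0, 5, 1, -2],
              [1.0, 2, 1, -1]])
b = np.array([4.0, -1])
m, n = A.shape

basic_solutions = []
for basis in itertools.combinations(range(n), m):
    B = A[:, basis]
    if abs(np.linalg.det(B)) < 1e-9:
        continue                     # columns not independent: no basis here
    x = np.zeros(n)
    x[list(basis)] = np.linalg.solve(B, b)   # nonbasic variables stay 0
    basic_solutions.append((basis, x, bool((x >= -1e-9).all())))

for basis, x, feasible in basic_solutions:
    print(basis, np.round(x, 3), "feasible" if feasible else "infeasible")
```

Only the feasible basic solutions are candidates for corner points, which is exactly what Rule #2 says.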

2.18 An example of pivoting in the simplex method


In keeping with our intention to think about constraints only, let’s pose half a
problem: a set of constraints without an objective.
Problem 2.5 You are trying to plan out a diet consisting entirely of french fries and
ketchup. Your research says that the following conditions are required for a healthy
diet:2
1. You need to eat at least 10 servings of food to avoid being hungry.
2. With 210 calories per serving of fries and 20 calories per serving of ketchup,
you want to limit your intake to 2000 calories.
3. With 0.1 grams of sodium per serving of fries and 0.2 grams per serving of
ketchup, you want to consume at most 3 grams of sodium.
1Terms and conditions apply. Void if the linear program doesn’t have an optimal solution. Also
void if the feasible region doesn’t have any vertices.
2 Not medical advice.




With x servings of fries and y servings of ketchup, the constraints are shown
below on the left:

         x +    y ≥   10               x +    y − w1 =   10
      210x +  20y ≤ 2000            210x +  20y + w2 = 2000
      0.1x + 0.2y ≤    3            0.1x + 0.2y + w3 =    3
         x, y ≥ 0                      x, y, w1 , w2 , w3 ≥ 0

We can begin by practicing turning these into equations. Add a slack variable to
each inequality, and we get the equations above on the right.

2.18.1 Step 1: a basic solution


If all we want is a basic solution, that’s easy to find—and there’s a generally useful
strategy for how to do it. Just solve each equation for its slack variable: the first
equation for w1 , the second equation for w2 , and the third equation for w3 . Then
our system can be rewritten as

      w1 =  −10 +    x +    y
      w2 = 2000 − 210x −  20y
      w3 =    3 − 0.1x − 0.2y

where the nonnegativity conditions x, y, w1 , w2 , w3 ≥ 0 still hold, but I’ll stop writing
them every time. To find a basic solution, set the nonbasic variables x, y to 0, and
read off the values of the basic variables w1 , w2 , w3 .
Is this one of the corner points? No! When x = y = 0, we get w1 = −10, w2 = 2000,
and w3 = 3. These are not all nonnegative. We should have expected this: setting
x = y = 0 means you’re not eating anything, so you’re violating the constraint "eat
at least 10 servings".
A corner point must be a basic solution, but a corner point must also be feasible:
all the variables must be nonnegative. We are looking for a basic feasible solution:
you will hear these words a lot this semester. This term (sometimes cryptically
abbreviated bfs) is just the sum of its parts: a feasible solution which is also a basic
solution.
We won’t get anywhere with an infeasible solution, so let’s start from scratch.

2.18.2 Step 1, again: a basic feasible solution


In general, finding any starting basic feasible solution can be tricky, and we’ll return
to the hard cases of the problem later. Today, I will just give a set of basic variables
that works: y, w2 , w3 . This basic feasible solution will correspond to the strategy
“Eat enough ketchup to satisfy your hunger".
Starting from our first set of equations, we can do the row reduction to solve
for y, w2 , w3 in terms of x, w1 . If you want more practice with this, you can try this
yourself and check your work; you should get



       y =   10 −    x +     w1
      w2 = 1800 − 190x −  20w1
      w3 =    1 + 0.1x − 0.2w1




Setting x = w1 = 0 gives us y = 10, w2 = 1800, and w3 = 1: no arithmetic is
required, you can just read those off from a column in the system of equations
above. These are all positive, so (x, y, w1 , w2 , w3 ) = (0, 10, 0, 1800, 1) is our
first basic feasible solution!
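As a quick sketch (ours), substituting this solution back into the equational-form constraints confirms that it is feasible:

```python
# Substitution check for (x, y, w1, w2, w3) = (0, 10, 0, 1800, 1).
x, y, w1, w2, w3 = 0, 10, 0, 1800, 1

assert x + y - w1 == 10                     # hunger constraint
assert 210*x + 20*y + w2 == 2000            # calorie constraint
assert abs(0.1*x + 0.2*y + w3 - 3) < 1e-9   # sodium constraint
assert min(x, y, w1, w2, w3) >= 0           # every variable is nonnegative
print("basic feasible solution confirmed")
```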

2.18.3 Step 2: pivoting (intuitively)


Right now, we don’t have an objective, so we don’t have a reason to get more basic
feasible solutions. But let’s see how we do it anyway.
The simplex method, which we’ll finish learning in the next lecture, works by a
strategy called pivoting. The idea is that:
1. We start with a basic feasible solution.
2. We modify it slightly to make one nonbasic variable become basic (enter the
basis). One of the basic variables will have to make room and become nonbasic
(leave the basis).
If xi is the entering variable, we call this pivoting around xi .
3. We choose the leaving variable to avoid negative signs, so that we arrive at a
new basic feasible solution.
Any nonbasic variable can be chosen to enter the basis; as an example, we’ll make
w1 our entering variable, starting from our previous basic feasible solution. The
intuition is this: keeping our other nonbasic variables at 0, we try to increase w1 as
much as we can without breaking anything!
What can break? Well, let’s look at the equations in the previous step, one at a
time.
• We have y = 10 − x + w1 , so when x = w1 = 0, we get y = 10. Increasing w1
  from this point will increase y at the same rate. When w1 = 1, we get y = 11;
  when w1 = 2, we get y = 12; when w1 = 100, we get y = 110. We can keep
  going forever, and this equation will be just fine.
• We have w2 = 1800 − 190x − 20w1 , so when x = w1 = 0, we get w2 = 1800.
  Increasing w1 from here will decrease w2 by 20 units per unit increase in w1 . This
  could cause a problem: we don’t want to make w2 negative. Since w2 drops to
  0 when w1 = 1800/20 = 90, we want to keep w1 ≤ 90.
• We have w3 = 1 + 0.1x − 0.2w1 , so when x = w1 = 0, we get w3 = 1. Increasing
  w1 from here will decrease w3 by 0.2 units per unit increase in w1 . Again, we want
  to keep w3 nonnegative. How far can we go? w3 drops to 0 when w1 = 1/0.2 = 5,
  so we want to keep w1 ≤ 5.
So to increase w1 as much as possible, we set it to 5, driving w3 down to 0. This
tells us which variable should leave the basis: w3 will become a nonbasic variable,
since the nonbasic variables are the ones that are set to 0 in a basic solution.
This means we want to solve for y, w2 , w1 in terms of x, w3 . We’ve already seen
that this can be done from our previous set of equations, saving some effort.
First, divide the last equation by 0.2, so that the coefficient of w1 is −1. Row-
reduce: add the third equation to the first, and subtract 20 times the third equation
from the second. Finally, move w3 to the right and w1 to the left.

        y =   10 −    x +  w1               y +   5w3 =   15 − 0.5x
       w2 = 1800 − 190x − 20w1    ⇝       w2 − 100w3 = 1700 − 200x
      5w3 =    5 + 0.5x −  w1                    5w3 =    5 + 0.5x − w1






           y = 15 − 0.5x −   5w3
      ⇝   w2 = 1700 − 200x + 100w3
          w1 = 5 + 0.5x −   5w3

We can read off our new basic feasible solution from here: (x, y, w1 , w2 , w3 ) =
(0, 15, 5, 1700, 0). (This is the “eat as much ketchup as you can without having too
much sodium” strategy.)

2.18.4 Step 3: pivoting (algebraically)


Let’s try to add some french fries to our diet and pivot around x, making it a basic
variable. Which variable should leave the basis? This time, let’s try to take our
experience for the previous pivot, and come up with rules to follow to make this
decision.
1. The leaving variable is the first one that will be driven to 0 as x increases. For
   this to happen at all, it should decrease as x increases. Therefore:
        In the leaving variable’s equation, the coefficient of x should be
        negative.
   In this example, we are choosing between y and w2 .
2. The leaving variable is the first one that will be driven to 0 as x increases.
   At which value of x will it get to 0? Solving 15 − 0.5x = 0, we divide 15 (the
   current value of y) by 0.5 (the negative coefficient of x: the rate at which y
   decreases as x increases). So the rule is:
        From these options, pick the variable with the least value of
        (current value)/(rate of decrease) to be the leaving variable.
   Here, y’s ratio is 15/0.5 = 30 and w2 ’s ratio is 1700/200 = 8.5, so we pick w2 .
These are the rules the simplex method always follows! (With x replaced by whatever
the entering variable is, of course.)
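These two rules can be sketched as a tiny helper function (ours, not the book's); each row of the dictionary is passed as (variable name, current value, coefficient of the entering variable):

```python
# The leaving-variable rules as a function (a sketch).
def choose_leaving(rows):
    best_name, best_ratio = None, float("inf")
    for name, value, coeff in rows:
        if coeff >= 0:
            continue                # rule 1: the coefficient must be negative
        ratio = value / (-coeff)    # rule 2: current value / rate of decrease
        if ratio < best_ratio:
            best_name, best_ratio = name, ratio
    return best_name, best_ratio

# The dictionary above, with x entering:
rows = [("y", 15, -0.5), ("w2", 1700, -200), ("w1", 5, 0.5)]
print(choose_leaving(rows))         # -> ('w2', 8.5)
```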
Pivoting as before, we get our new set of equations:

            y = 15 − 0.5x − 5w3               y − (1/400)w2 = 10.75 − 5.25w3
   (1/200)w2 = 8.5 − x + 0.5w3      ⇝           (1/200)w2 = 8.5 − x + 0.5w3
           w1 = 5 + 0.5x − 5w3               w1 + (1/400)w2 = 9.25 − 4.75w3

            y = 10.75 + (1/400)w2 − 5.25w3
      ⇝    x = 8.5 − (1/200)w2 + 0.5w3
           w1 = 9.25 − (1/400)w2 − 4.75w3
Our new basic feasible solution is (x, y, w1 , w2 , w3 ) = (8.5, 10.75, 9.25, 0, 0).
This was just aimless wandering around; in the next lecture, we’ll reintroduce
the objective function, and think about pivoting with purpose. Think of what we’ve
done today as driving around the parking lot; next, we’ll get on the highway.

2.18.5 Troubleshooting
The only goal of the pivoting algorithm we learned today is to go from a basic feasible
solution to another basic feasible solution. You know that you’ve picked the correct
leaving variable if your new basic solution is still feasible—if it’s not, then go back
and rethink your choice of leaving variable.




Aside from that, remember the cardinal rule: always do the same thing to both
sides of an equation. Finally, watch out for mistakes with lost negative signs, as
those are very easy to make here.

2.19 Introducing objective functions


This time, we’ll begin by posing a complete problem with an objective function to
maximize:
Problem 2.6 A farmer splits 9 acres of land3 between growing cotton (x1 acres),
corn (x2 acres), and soy (x3 acres). Cotton is not regulated, but federal regulations
require a balance in food crops sold: at most 75% of the total amount can be a single
crop. However, an additional acre’s worth of food can be sold in-state, where this
regulation does not apply.
In units of hundreds of dollars, the farmer’s profit is 2 per acre of cotton, 3 per
acre of corn, and 4 per acre of soy. How can the farmer maximize profit?
To express the fictional federal regulations in this problem as linear constraints,
we require that each of x2 , x3 is at most three times the other, plus 1: x2 ≤ 3x3 + 1
and x3 ≤ 3x2 + 1. All the quantities in this problem must be nonnegative. This
gives us the linear program we see below on the left; on the right, we’ve added slack
variables to put it in equational form.

maximize 2x1 + 3x2 + 4x3 maximize 2x1 + 3x2 + 4x3


x1 ,x2 ,x3 ∈R x1 ,x2 ,x3 ,w1 ,w2 ∈R
subject to x1 + x2 + x 3 = 9 subject to x1 + x2 + x3 =9
x2 − 3x3 ≤ 1 ⇝ x2 − 3x3 + w1 =1
−3x2 + x3 ≤ 1 −3x2 + x3 + w2 = 1
x1 , x 2 , x3 ≥ 0 x1 , x 2 , x 3 , w 1 , w 2 ≥ 0

The first thing to realize is that when the equations hold, the objective function has
many equivalent forms. Since x1 + x2 + x3 = 9, for example, maximizing 2x1 + 3x2 +
4x3 is equivalent to maximizing 2x1 + 3x2 + 4x3 + (x1 + x2 + x3 − 9) or 3x1 + 4x2 +
5x3 − 9: if we maximize that objective function instead, we get the same solution.
We will give the expression 2x1 + 3x2 + 4x3 a name: we’ll call it ζ (zeta).4 Writing
down the equation ζ = 2x1 + 3x2 + 4x3 makes our lives somewhat easier: we now
have 4 equations in 6 variables ζ, x1 , x2 , x3 , w1 , w2 , and we are simply maximizing
one of the variables.
This particular problem conveniently starts out row-reduced with x1 , w1 , w2 as
the basic variables; we can easily solve for them in terms of the non-basic variables
x2 , x3 . Out of our many representations for ζ, it is convenient to pick one that’s also
in terms of x2 , x3 . Just subtract twice (x1 + x2 + x3 − 9) to get ζ = x2 + 2x3 + 18.
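A quick numeric sanity check (ours) that these two forms of the objective really agree on the constraint x1 + x2 + x3 = 9:

```python
# Check that 2x1 + 3x2 + 4x3 equals x2 + 2x3 + 18 whenever
# x1 + x2 + x3 = 9 (a sketch over a few arbitrary points).
for x2, x3 in [(0, 0), (1, 4), (2, 7), (5, 3)]:
    x1 = 9 - x2 - x3
    assert 2*x1 + 3*x2 + 4*x3 == x2 + 2*x3 + 18
print("both forms of the objective agree")
```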

2.20 The dictionary


The dictionary is a representation of a linear program closely related to the systems
of equations we were writing in the previous lecture. It has an additional equation
3An unrealistically small amount to keep our numbers low.
4 Many sources use z instead, and you may feel free to write z instead of ζ.




for the objective value, which we traditionally separate as follows:

max ζ = 18 + x2 + 2x3
x1 = 9 − x2 − x3
w1 = 1 − x2 + 3x3
w2 = 1 + 3x2 − x3

It is helpful to include “max” or “min" in the top left corner of the dictionary, to
remind ourselves that we’re maximizing or minimizing. The simplex method treats
the two cases differently.
Each dictionary corresponds to a basic solution whose parameters we can read off
from the column immediately after =. We have x1 = 9, w1 = 1, and w2 = 1, while the
nonbasic variables x2 , x3 are set to 0; the objective value of this solution is ζ = 18.
With the possible exception of ζ, all the numbers in this column should be
nonnegative if we are looking at a basic feasible solution. If the dictionary has this
property, we call it a feasible dictionary and, for the time being, we will not
consider any other kind of dictionary.
There are as many feasible dictionaries as there are basic feasible solutions.5 The
simplex method operates by moving from dictionary to dictionary until we arrive at
one that gives us the optimal solution. The method of moving from dictionary to
dictionary is the same as in the previous lecture; today, we will see how the objective
value fits in.

2.21 Using the simplex method


2.21.1 The first pivoting step
Let’s begin by bringing x3 into the basis. This is an arbitrary choice for now, but
we’ll see what happens when we do this, and think about how we can make this
choice more intelligently.
If x3 is our entering variable, then we need to choose a leaving variable. This
is not new; however, to be clear, we must never choose ζ to be our leaving
variable. We will always keep ζ in the top left corner of our dictionary, so that we
always know what the objective value is!
We choose the leaving variable in two steps:
• Out of x1, w1, w2, we reject w1 immediately: it has a positive coefficient of x3
in our dictionary, and we want a negative coefficient.
• x1 ’s current value is 9 and it decreases at a rate of 1 as x3 increases. Meanwhile,
  w2 ’s current value is 1 and it decreases at a rate of 1 as x3 increases. We choose
  the variable with the lowest ratio (current value)/(rate of decrease); 9/1 > 1/1,
  so we choose w2 .
To bring x3 into the basis and w2 out of the basis, we begin by solving w2 ’s
equation for x3 , getting x3 = 1 + 3x2 − w2 . Now, we rewrite ζ, x1 , and w1 in terms

5 Note for the future: sometimes, unfortunately, there are slightly more—multiple feasible
dictionaries for the same basic feasible solution! Don’t worry about this for now.




of w2 rather than x3 , by substituting 1 + 3x2 − w2 in place of x3 wherever it occurs:

max ζ = 18 + x2 + 2(1 + 3x2 − w2 )           max ζ = 20 + 7x2 − 2w2
    x1 = 9 − x2 − (1 + 3x2 − w2 )                x1 = 8 − 4x2 + w2
    w1 = 1 − x2 + 3(1 + 3x2 − w2 )               w1 = 4 + 8x2 − 3w2
    x3 = 1 + 3x2 − w2                            x3 = 1 + 3x2 − w2

We’ve obtained our new dictionary! The new values of our variables are (x1 , x2 , x3 , w1 , w2 ) =
(8, 0, 1, 4, 0), and the objective value ζ has increased to 20.

2.21.2 How do we make progress?


Since ζ has gone from 18 to 20, apparently we’ve done something right. But it was a
complete accident! Let’s figure out what we did right so we can keep doing it.
How much did ζ go up? The change from 18 to 20 is the product of two things:
the 2 which was the coefficient of x3 (our entering variable) in the old equation for ζ,
and the 1 which is the new value of x3 .
Because we will always choose our leaving variable to get a feasible dictionary, the
new value of our entering variable will always be positive. However, the coefficient of
x3 in the old equation for ζ could have been anything.
We conclude that:
• If we want ζ to increase, we should choose an entering variable with a positive
coefficient in ζ’s equation.
That way, we multiply two positive numbers to compute the change in ζ.
• If we want ζ to decrease, we should choose an entering variable with a negative
coefficient in ζ’s equation.
That way, we multiply a negative number by a positive number to compute
the change in ζ.
There is an official term for the coefficient of x3 in ζ’s equation; it is called the
reduced cost of x3 . Let me explain this term so it’s easier to remember:
• The word “cost" actually comes from the fact that in the original equation
ζ = 2x1 + 3x2 + 4x3 , the 4 represented the cost at which the farmer can sell
soy. Economic problems like this one are an important application of linear
programs! The word cost has stuck around to be used in problems where the
objective value has nothing to do with money.
• The word “reduced" has nothing to do with the fact that the number has gone
down from 4 to 2; it could be larger or smaller. The term should properly be
“row-reduced cost". After we’ve row-reduced our system of equations, the cost
has changed: the row-reduced cost is its new value.
To summarize our rule: when we bring xi into the basis, the change in ζ
has the same sign as the reduced cost of xi . Look for a positive reduced cost
when maximizing ζ, and a negative reduced cost when minimizing.

2.21.3 One more pivot step


In our first pivot step, both reduced costs were positive, so any choice of entering
variable would have been fine. In the next step, we are choosing between x2 and w2 .
Only x2 has a positive reduced cost, so it is the only valid choice of entering variable.




(This makes sense: since w2 just left the basis, pivoting around w2 will return us to
where we were previously, undoing our progress!)
In x2 ’s column, only x1 has a negative coefficient, so it is our only valid choice of
leaving variable: we don’t even have to compare ratios.
Solving the equation x1 = 8 − 4x2 + w2 for x2 , we get x2 = 2 − (1/4)x1 + (1/4)w2 .
Now we are ready to substitute this in for x2 in all the other rows of our dictionary:

max ζ = 20 + 7(2 − (1/4)x1 + (1/4)w2 ) − 2w2      max ζ = 34 − (7/4)x1 − (1/4)w2
    x2 = 2 − (1/4)x1 + (1/4)w2                        x2 = 2 − (1/4)x1 + (1/4)w2
    w1 = 4 + 8(2 − (1/4)x1 + (1/4)w2 ) − 3w2          w1 = 20 − 2x1 − w2
    x3 = 1 + 3(2 − (1/4)x1 + (1/4)w2 ) − w2           x3 = 7 − (3/4)x1 − (1/4)w2
We can confirm that our objective value has increased from 20 to 34: the change is
exactly equal to 7 (the reduced cost of x2 in our previous dictionary) multiplied by 2
(the value of x2 in our new feasible solution).
In full, our basic feasible solution is now (x1 , x2 , x3 , w1 , w2 ) = (0, 2, 7, 20, 0): we
grow 2 acres of corn and 7 acres of soy, for a profit of $100 · ζ = $3,400. (This would
seem more reasonable if the farm were a more realistic size!)

2.21.4 The end of the simplex method


If we look at the latest dictionary we’ve gotten, and try to pick an entering variable,
it looks at first like we have a problem. Both x1 and w2 have a negative reduced
cost: both of them would decrease ζ if we brought them into the basis.
This means we can’t improve our objective value by one step of the simplex
method. Should we worry that we’re trapped at a local optimum that isn’t as good
as some far-away corner? No! In fact, we can prove that our current basic feasible
solution is optimal.
The top equation of the dictionary says ζ = 34 − (7/4)x1 − (1/4)w2 . Remember: this
is a universal equation that holds for every feasible solution of our linear program,
because we deduced it from combining ζ = 2x1 + 3x2 + 4x3 with our constraints.
All our variables are nonnegative; in particular, x1 ≥ 0 and w2 ≥ 0. So we are
taking 34 and subtracting two nonnegative values from it. It follows that at all
feasible solutions, ζ ≤ 34.
But we’ve just seen that our current basic feasible solution achieves an objective
value of ζ = 34 exactly. We conclude that our basic feasible solution is optimal: it’s
impossible for the farmer to make a larger profit. This is the universal rule for when
the simplex method halts:
• When maximizing ζ, stop when all reduced costs are at most 0.
• When minimizing ζ, stop when all reduced costs are at least 0.
In both cases, it is fine to see a reduced cost of 0. What happens when we pivot on
a variable that has a reduced cost of 0? The objective value always changes by the
product of the old reduced cost and the new value of the entering variable. In this
case, that product will always be 0, because the reduced cost is 0. So pivoting on
such a variable will never change the objective function.
However, a reduced cost of 0 indicates that there may be multiple optimal
solutions: we can get other solutions with the same objective value by pivoting on
such a variable.




By contrast, in our case, being given ζ = 34 − (7/4)x1 − (1/4)w2 tells us that we must
have x1 = 0 and w2 = 0 in any optimal solution. (If one of these variables were
positive, we’d subtract a positive number from 34, and get a smaller objective value.)
The feasible solution we have is actually the only solution possible when x1 = 0 and
w2 = 0, so it is the unique optimal solution to our problem.
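To tie the section together, here is a compact sketch (ours) of the same computation in NumPy. It uses matrix (revised simplex) formulas rather than dictionary bookkeeping, but it performs the same pivots as above and stops at ζ = 34:

```python
import numpy as np

# A bare-bones simplex loop for the farmer's problem (a sketch, not
# production code). Variables are ordered (x1, x2, x3, w1, w2).
A = np.array([[1.0, 1, 1, 0, 0],
              [0.0, 1, -3, 1, 0],
              [0.0, -3, 1, 0, 1]])
b = np.array([9.0, 1, 1])
c = np.array([2.0, 3, 4, 0, 0])

basis = [0, 3, 4]                         # start with x1, w1, w2 basic
for _ in range(20):
    B_inv = np.linalg.inv(A[:, basis])
    values = B_inv @ b                    # current values of basic variables
    reduced = c - (c[basis] @ B_inv) @ A  # reduced costs under this basis
    if reduced.max() <= 1e-9:
        break                             # maximizing: stop when all <= 0
    enter = int(np.argmax(reduced))       # entering variable
    d = B_inv @ A[:, enter]               # rates of decrease of basic vars
    ratios = [v / di if di > 1e-9 else np.inf for v, di in zip(values, d)]
    basis[int(np.argmin(ratios))] = enter # the ratio test picks the leaver

x = np.zeros(5)
x[basis] = values
print(np.round(x, 4), round(float(c @ x), 4))   # the optimal bfs and ζ = 34
```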

2.22 Optional: dictionaries and tableaux


There are two main ways that people have come up with to represent intermediate
steps in the simplex method. Following the textbook, we are using dictionaries,
which were introduced by Chvátal in his 1983 textbook on linear programming.
A tableau is another way of representing the same information. Fundamentally,
they are based on writing the same equations with all variables on the right, and
with constants and the objective value on the left. For example, here is how one of
our dictionaries would appear in this form:

 ζ = 20 + 7x2 − 2w2            ζ − 20 =        7x2       − 2w2
x1 = 8 − 4x2 + w2                   8 = x1 +  4x2        −  w2
w1 = 4 + 8x2 − 3w2                  4 =      −8x2 + w1   + 3w2
x3 = 1 + 3x2 − w2                   1 =      −3x2 + x3   +  w2

This is closer to the way we write things when we do Gaussian elimination. It has
more columns, but the advantage is that it is easier to put in a table, without having
to write the variables every time. Here, the simplex tableau would be:

            x1    x2    x3    w1    w2
 −ζ   −20    0     7     0     0    −2
 x1     8    1     4     0     0    −1
 w1     4    0    −8     0     1     3
 x3     1    0    −3     1     0     1

We annotate the columns with the variables whose coefficients are in those columns;
we annotate the rows with the basic variable in that row. We write −ζ in the
objective row to remind ourselves that with this method, −20 is the negative of the
objective value. Iterations of the simplex method are just ordinary row reduction
with this grid of numbers.
Having the negative of the objective value appear in the tableau is a bit weird,
so you might also see tableaux written with the top equation ordered differently: as
ζ − 7x2 + 2w2 = 20. Then, the tableau could look like the following:

           ζ    x1    x2    x3    w1    w2
 ζ    20   1     0    −7     0     0     2
 x1    8   0     1     4     0     0    −1
 w1    4   0     0    −8     0     1     3
 x3    1   0     0    −3     1     0     1




This way, you can read off the current solution and the objective value from the
left of the tableau. The downside of this approach is that the reduced costs in
this version are the negatives of the reduced costs we’re used to seeing! There’s
nothing wrong with that—provided we reverse our rules of dealing with the reduced
costs—but it means that building the initial tableau is a little bit weird. The
numbers we’ll have to put in the top row of the tableau will be the negatives of the
coefficients in the objective function, because we rewrite ζ = c1 x1 + c2 x2 + · · · + cn xn
as ζ − c1 x1 − c2 x2 − · · · − cn xn = 0.
There are other variants of the tableau, with the rows and columns rearranged
in minor ways. This makes it extra important to keep the rows and columns labeled
with variables, so that we can interpret them more easily.
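As a sketch (ours), a pivot on the first tableau above really is ordinary row reduction; pivoting on x2's column in x1's row reproduces the numbers of the final dictionary:

```python
import numpy as np

# One simplex pivot as row reduction on the tableau (a sketch). Row 0 is
# the -ζ row; columns are (constant, x1, x2, x3, w1, w2).
T = np.array([[-20.0, 0, 7, 0, 0, -2],
              [  8.0, 1, 4, 0, 0, -1],
              [  4.0, 0, -8, 0, 1, 3],
              [  1.0, 0, -3, 1, 0, 1]])

row, col = 1, 2                  # pivot row (x1's) and pivot column (x2's)
T[row] /= T[row, col]            # scale so the pivot entry becomes 1
for r in range(len(T)):
    if r != row:
        T[r] -= T[r, col] * T[row]   # eliminate x2 from every other row

print(T[0])                      # -> [-34, -1.75, 0, 0, 0, -0.25]
```

The top row now shows −ζ = −34 and the reduced costs −7/4 and −1/4, matching the final dictionary of the previous section.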

2.23 An example with nothing weird going on


Today, we will skip the word problem and go straight to the linear program.
Problem 2.7 Solve the following linear program (put in equational form on the right).

maximize 2x + 3y maximize 2x + 3y
x,y∈R x,y,w1 ,w2 ,w3 ∈R
subject to −x + y ≤ 3 subject to −x + y + w1 =3
x − 2y ≤ 2 ⇝ x − 2y + w2 =2
x+ y ≤ 7 x+ y + w3 = 7
x, y ≥ 0 x, y, w1 , w2 , w3 ≥ 0
Adding slack variables has a convenient bonus effect. The slack variables
(w1 , w2 , w3 ) form a convenient set of basic variables to start with, for two reasons:
• The dictionary will already be row-reduced for the slack variables, since each
one shows up in only one equation. This will be true any time we add slack
variables.
• The basic solution is (x, y, w1 , w2 , w3 ) = (0, 0, 3, 2, 7), which is feasible. This
happens whenever our starting inequalities are all upper bounds with a positive
constant on the right-hand side. So it’s not always useful, but sometimes makes
our lives easier.
Here is our starting dictionary, and a graph of the feasible region (of the original
linear program in x and y) with the corresponding basic feasible solution marked:

max ζ = 0 + 2x + 3y
    w1 = 3 + x − y
    w2 = 2 − x + 2y
    w3 = 7 − x − y

[Figure: the feasible region in the (x, y)-plane, with the corner (0, 0) marked.]

In a well-behaved 2-dimensional linear program, exactly 2 constraints should be tight
at each corner. Specifically, it’s the ones corresponding to the nonbasic variables.
In this dictionary, the nonbasic variables are x and y, and the constraints they
”own” are the x ≥ 0 and y ≥ 0 constraints. So we should be at the corner where these
two constraints are tight: the intersection of x = 0 and y = 0.




Let’s bring y into the basis. (This is an arbitrary choice: we could also have
chosen x.) Since w2 ’s coefficient of y is 2, it’s not a valid leaving variable; w1 and
w3 have ratios of 3/1 and 7/1, of which the smallest is 3. So y replaces w1 in the basis,
giving us the new dictionary below:

max ζ = 9 + 5x − 3w1
     y = 3 + x − w1
    w2 = 8 + x − 2w1
    w3 = 4 − 2x + w1

[Figure: the feasible region, now with the corner (0, 3) marked.]

The basic feasible solution is (x, y, w1 , w2 , w3 ) = (0, 3, 0, 8, 4). The nonbasic variables
of the new dictionary are x and w1 . As before, x “owns” the x ≥ 0 constraint.
Meanwhile, w1 “owns” the w1 ≥ 0 constraint, but in the original linear program,
this was the −x + y ≤ 3 constraint. We should be at the corner where x = 0 and
−x + y = 3 meet, and indeed, these lines meet at (0, 3).
The choice of entering variable corresponds to picking the direction in which we
went around the polygon: which edge out of (0, 0) we used. The edge from (0, 0) to
(0, 3) moves away from the y ≥ 0 constraint, so y is the variable that becomes basic.
We could also have brought x into the basis, moving away from the x ≥ 0 constraint.
But now, at (0, 3), there is only one good choice of entering variable. We don’t
want to go back to (0, 0), so the only choice is to continue going clockwise. In the
dictionary, this corresponds to how we don’t want to bring w1 back into the basis
(its reduced cost is negative, so this would decrease ζ). Instead, the only helpful
entering variable is x, whose reduced cost is positive.
In x’s column, the coefficients of y and w2 are both positive, so those can’t be
leaving variables. Therefore x replaces w3 in the basis, giving us the new dictionary
below:

max ζ = 19 − (1/2)w1 − (5/2)w3
     y = 5 − (1/2)w1 − (1/2)w3
    w2 = 10 − (3/2)w1 − (1/2)w3
     x = 2 + (1/2)w1 − (1/2)w3

[Figure: the feasible region, with the optimal corner (2, 5) marked.]

In this dictionary, all reduced costs are negative. Therefore ζ is maximized and
(x, y) = (2, 5) is the optimal solution.
(Here, w1 and w3 are nonbasic. The constraints they “own” are the −x + y ≤ 3
constraint and the x + y ≤ 7 constraint. So we end up at the corner point where the
lines −x + y = 3 and x + y = 7 intersect.)
If we had decided to pivot around x first, rather than y, we would have arrived at
the same final answer, but going counterclockwise around the feasible region instead.
There would have been three steps, not two, because there are three edges to take
when going around that way.
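We can also double-check this answer by brute force (a sketch, ours): intersect every pair of boundary lines, discard the infeasible points, and keep the corner with the best objective value:

```python
import itertools
import numpy as np

# Brute-force corner enumeration for Problem 2.7 (a sketch).
# Each boundary line is stored as (a, b, c) meaning a*x + b*y = c.
lines = [(-1, 1, 3), (1, -2, 2), (1, 1, 7), (1, 0, 0), (0, 1, 0)]

def feasible(x, y, tol=1e-9):
    return (-x + y <= 3 + tol and x - 2*y <= 2 + tol
            and x + y <= 7 + tol and x >= -tol and y >= -tol)

best = None
for (a1, b1, c1), (a2, b2, c2) in itertools.combinations(lines, 2):
    M = np.array([[a1, b1], [a2, b2]], float)
    if abs(np.linalg.det(M)) < 1e-9:
        continue                     # parallel boundaries: no corner here
    x, y = np.linalg.solve(M, [c1, c2])
    if feasible(x, y) and (best is None or 2*x + 3*y > best[0]):
        best = (float(2*x + 3*y), float(x), float(y))

print(best)                          # the best corner and its objective value
```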




2.24 An unbounded linear program


Let’s modify Problem 2.7, removing the constraint x + y ≤ 7 (corresponding to the
slack variable w3 ):
Problem 2.8 Solve the following linear program (put in equational form on the right).

maximize 2x + 3y maximize 2x + 3y
x,y∈R x,y,w1 ,w2 ∈R
subject to −x + y ≤ 3 ⇝ subject to −x + y + w1 =3
x − 2y ≤ 2 x − 2y + w2 = 2
x, y ≥ 0 x, y, w1 , w2 ≥ 0

Our first iteration of the simplex method will be nearly the same with Problem 2.8
as it was with Problem 2.7, and will also bring us to the point (0, 3). We can quickly
get the dictionary for that point by dropping the equation for w3 :

max ζ = 9 + 5x − 3w1
     y = 3 + x − w1
    w2 = 8 + x − 2w1

[Figure: the now-unbounded feasible region, with the corner (0, 3) marked.]
Looking at the diagram, we see what’s about to happen: the feasible region is
unbounded in the direction we want to go.
It’s still a good idea to bring x into the basis: it still has a positive reduced cost.
But now, both basic variables are ruled out at the first stage: both of them have a
positive coefficient in x’s column, so neither of them decreases as x increases. There
is no leaving variable to choose.
This is what it looks like when the linear program is unbounded, and we can
improve the objective value as much as we want. There is no optimal solution.
From this dictionary, we can also learn a bit about how the linear program
is unbounded. The variables doing useful work are the basic variables y, w2 and
the entering variable x. All other variables (just w1 in this case) should be set to
0. Setting w1 to 0 and getting rid of it in every other row gives us the following
pseudo-dictionary:

max ζ = 9 + 5x
    y = 3 + x
   w2 = 8 + x
   w1 = 0

(Diagram: the unbounded ray leaving the corner point (0, 3).)

We get better and better solutions as we travel along the line y = 3 + x, increasing x
as much as we want: the objective value increases as ζ = 9 + 5x. All our variables




remain positive (including the slack variables w1 and w2 ), so the solution remains
feasible the whole way.
All this is happening behind the scenes when we do any pivot step. But here,
because the coefficients in x’s column were both positive, the slopes of y = x + 3 and
w2 = x + 8 are both positive, which means that we can increase x without a limit.
And since x had a positive reduced cost of 5, we know that this gives us arbitrarily
large objective values.
Whenever we learn from the dictionary that the linear program is unbounded,
we can perform such an analysis to find an infinite ray of feasible solutions along
which the objective value improves without bound.
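For this example, the ray is easy to verify numerically (a small sketch: the analysis above says (x, y) = (t, 3 + t) is feasible for every t ≥ 0, with objective 9 + 5t):

```python
# Walk along the ray (x, y) = (t, 3 + t) found for Problem 2.8; every point is
# feasible, and the objective 2x + 3y = 9 + 5t grows without bound.
def feasible(x, y):
    return -x + y <= 3 and x - 2 * y <= 2 and x >= 0 and y >= 0

values = []
for t in [0, 1, 10, 100, 1000]:
    x, y = t, 3 + t
    assert feasible(x, y)        # the ray never leaves the feasible region
    values.append(2 * x + 3 * y)
print(values)  # → [9, 14, 59, 509, 5009], i.e. 9 + 5t
```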

2.25 An example of degenerate pivoting


For our final problem, let’s replace the x+y ≤ 7 constraint by the constraint x+2y ≤ 6:
Problem 2.9 Solve the following linear program (put in equational form on the right).

maximize 2x + 3y maximize 2x + 3y
x,y∈R x,y,w1 ,w2 ,w3 ∈R
subject to −x + y ≤ 3 subject to −x + y + w1 =3
x − 2y ≤ 2 ⇝ x − 2y + w2 =2
x + 2y ≤ 6 x + 2y + w3 = 6
x, y ≥ 0 x, y, w1 , w2 , w3 ≥ 0

To see what makes Problem 2.9 different from Problem 2.7, let’s take a look at
the initial dictionary and especially at the feasible region:

max ζ = 0 + 2x + 3y
   w1 = 3 + x − y
   w2 = 2 − x + 2y
   w3 = 6 − x − 2y

(Diagram: the feasible region with the corner point (0, 0) marked.)

The constraint −x + y ≤ 3 is just barely irrelevant. The line −x + y = 3 touches
the feasible region at the corner point (0, 3), and doesn’t change the feasibility of
anything. How will things change? Let’s find out!
We can proceed as before and bring y into the basis. As before, w2 is out of
consideration because it has a positive coefficient in y’s column. Meanwhile, w1 and
w3 have ratios 3/1 and 6/2, so they are tied for having the smallest ratio. As we increase
y, w1 and w3 will decrease and hit 0 at the same time.
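The ratio test here can be sketched in a few lines of Python (a hypothetical helper, just to make the tie explicit):

```python
# Ratio test for entering variable y in the initial dictionary of Problem 2.9.
# Each row is (basic variable, constant, coefficient of y in that row).
rows = [("w1", 3, -1), ("w2", 2, 2), ("w3", 6, -2)]

# Only rows where y's coefficient is negative limit how far y can increase;
# such a row allows y to grow to constant / (-coefficient) before going negative.
ratios = {name: const / -coef for name, const, coef in rows if coef < 0}
print(ratios)  # → {'w1': 3.0, 'w3': 3.0}: a tie, so the next pivot is degenerate-prone
```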
Which leaving variable do we choose in such a case? Right now, we have no way
to tell which choice is better, but either choice will give us a feasible dictionary. Let’s




choose w1 , because that’s what we did last time. We get:


max ζ = 9 + 5x − 3w1
    y = 3 + x − w1
   w2 = 8 + x − 2w1
   w3 = 0 − 3x + 2w1

(Diagram: the feasible region with the corner point (0, 3) marked.)

Here is where things start to go wrong. Our next entering variable must be x,
because it’s the only variable with positive reduced cost. As x increases, y and w2
also increase, so they will not leave the basis: the leaving variable must be w3 . But
when we make this happen, the dictionary changes, but the values of all the variables
stay the same!
max ζ = 9 + (1/3)w1 − (5/3)w3
    y = 3 − (1/3)w1 − (1/3)w3
   w2 = 8 − (4/3)w1 − (1/3)w3
    x = 0 + (2/3)w1 − (1/3)w3

(Diagram: still at the corner point (0, 3).)

The problem is that the three lines x = 0, −x + y = 3, and x + 2y = 6 all meet at


the point (0, 3). Previously, when x and w1 were nonbasic, we thought of (0, 3) as
the intersection of x = 0 and −x + y = 3. Now, w1 and w3 are nonbasic, and we’ve
“moved” to the intersection of −x + y = 3 and x + 2y = 6. This is also (0, 3).
This is called degenerate pivoting. There’s one big problem with degenerate
pivoting:
• Usually, we can say, “The simplex method is always improving the value of ζ,
so it can never revisit the same corner point. Since there’s only finitely many
corner points, it has to reach the right one eventually."
• With degenerate pivoting, the value of ζ does not always improve. So we have
no guarantee that the simplex method won’t keep going forever, stuck at the
same corner point being represented in different ways.
In this example, we’ll leave the point (0, 3) after one more step. But in more
complicated examples, when many constraints meet at one point, staying at that point
forever is a real danger. To avoid this, we’ll need to develop pivoting rules that
avoid infinite loops, by telling us the right variable to remove from the basis in cases
when there’s a tie.

2.26 The problem with initialization


2.26.1 A tricky example
Consider the following problem:
Problem 2.10 You have an important assignment due in 5 hours. You’re working
on it in a coffee shop, and so you’re trying to bribe yourself to work on it by a
combination of fancy coffee and sweet tea.




You’ll need at least one cup an hour to keep you focused. To stay awake until
the assignment is due, you’ll need at least 7 “units" of caffeine (if we say a unit of
caffeine is the amount in a cup of tea; there are 3 units in a cup of coffee). Finally, to
have the energy to work on the assignment, you need at least 6 units of sugar (if a
unit of sugar is the amount in a cup of coffee; a cup of sweet tea has 2 units).
If every cup of coffee costs $4.50 and every cup of tea costs $3, what is the
cheapest way to make this work?
We can write this problem as follows:

minimize   4.5x1 + 3x2                  minimize   4.5x1 + 3x2
x1,x2 ∈ R                               x1,x2,w1,w2,w3 ∈ R
subject to  x1 +  x2 ≥ 5           ⇝    subject to  −x1 −  x2 + w1 = −5
           3x1 +  x2 ≥ 7                            −3x1 −  x2 + w2 = −7
            x1 + 2x2 ≥ 6                             −x1 − 2x2 + w3 = −6
            x1, x2 ≥ 0                                x1, x2, w1, w2, w3 ≥ 0

The difficulty in adapting our methods to this problem is this: how do we find an
initial basic feasible solution?

2.26.2 The problem in general


In the previous lecture, we relied on a shortcut for starting the simplex method:
we assumed that the point x = 0 is feasible for our linear program. This naturally
happens, for example, in production problems where all our constraints are resource
constraints: they put upper bounds on how big x1 , x2 , . . . , xn get, but no lower
bounds.
However, not all linear programs look like this! In particular, in the linear program
above, none of our constraints are satisfied when x = 0, except for the nonnegativity
constraints.
We’ll distinguish between two cases where starting with x = 0 does not make
sense:
• (Easier case) All our constraints are inequality constraints, which we’ll have to
add slack variables to before getting a problem in equational form.
• (Harder case) We are dealing with a problem that’s already in equational form
(Ax = b where x ≥ 0), possibly because we started out with some equational
constraints.
In both cases, the solution is the two-phase simplex method. In this method, we:
1. Solve an auxiliary problem, which has a built-in starting point, to determine if
the original linear program is feasible. If we succeed, we find a basic feasible
solution to the original linear program.
2. From that basic feasible solution, solve the linear program the way we’ve done
it before.
The auxiliary problem will be easier to state, and have fewer additional variables, in
the case we’re calling the “easier cases"—such as the example problem. So we will
begin there.




2.27 The two-phase method for inequalities


We adjust our problem by adding a new artificial variable x0 that lets us violate
each constraint by some amount. When we replace an a ≤ b constraint by a − x0 ≤ b,
this lets us violate the original constraint by a margin of up to x0 . No matter what
our other variables are set to, if we make x0 large enough, we can satisfy every
equation.
Of course, we don’t want to make x0 large, because we’re not interested in
solutions that violate all our constraints. So our first step will be to minimize an
auxiliary objective function: we will minimize ξ = x0 . If we can get x0 down to
0, then we’re not violating any constraints, and so we have a feasible solution for our
original problem!
(Note: this is a slight departure from how the problem is described in the textbook;
the textbook has not yet introduced minimization problems, and so it describes every-
thing in terms of maximizing −ξ. Also, feel free to use a different variable instead of
ξ (xi) if you have trouble writing ξ.)
This is called the phase one problem. Here is the full description, in our
original example:

minimize x0
x0 ,x1 ,x2 ,w1 ,w2 ,w3 ∈R
subject to −x1 − x2 + w1 − x0 = −5
−3x1 − x2 + w2 − x0 = −7
−x1 − 2x2 + w3 − x0 = −6
x0, x1, x2, w1, w2, w3 ≥ 0

As usual, we’ll set x1 = x2 = 0 in our initial feasible solution. We’ll need to set x0 = 7,
because that’s the largest violation on the right-hand side (the most negative value, −7).
Then w2 = 0 satisfies our second equation, and we can set w1 = 7 − 5 = 2 and
w3 = 7 − 6 = 1 to satisfy the first and third equations.
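This starting point is easy to check directly (a quick sketch verifying the claimed values):

```python
# Check the claimed starting point for the phase one problem:
# x1 = x2 = 0, x0 = 7, w1 = 2, w2 = 0, w3 = 1.
x1, x2, x0, w1, w2, w3 = 0, 0, 7, 2, 0, 1

assert -x1 - x2 + w1 - x0 == -5       # first constraint
assert -3 * x1 - x2 + w2 - x0 == -7   # second constraint
assert -x1 - 2 * x2 + w3 - x0 == -6   # third constraint
assert min(x0, x1, x2, w1, w2, w3) >= 0
print("feasible for the phase one problem")
```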
That’s an unsystematic description of how we get our initial dictionary, though.
We begin by taking our basic variables to be w1 , w2 , w3 :

min ξ = 0 + x0
w1 = −5 + x1 + x2 + x0
w2 = −7 + 3x1 + x2 + x0
w3 = −6 + x1 + 2x2 + x0

This is not feasible: all three of w1 , w2 , w3 are negative in the basic solution. Our
first step in the phase one problem is always, ignoring any pivoting rules, to bring x0
into the basis, and take w2 (the variable with the most negative value) out of the
basis. This is guaranteed to lead us to a feasible dictionary:

min ξ = 7 − 3x1 − x2 + w2
w1 = 2 − 2x1 + w2
x0 = 7 − 3x1 − x2 + w2
w3 = 1 − 2x1 + x2 + w2




(It will always be the case that the equation for ξ at the top matches the equation
for x0 , for as long as x0 is a basic variable. I will keep writing the same equation in
both places, just to match the usual way we write a dictionary.)
Now we can proceed to solve this linear program in the usual way. Since we’re
minimizing ξ, we should pivot on entries that have a negative reduced cost. In this
example, pivoting on x2 turns out to be the best choice. The only possible leaving
variable is x0 (since it is the only one with a negative coefficient on x2). Solving x0’s
equation for x2 gives x2 = 7 − 3x1 + w2 − x0 , and then we will just substitute that in
for x2 in the other equations:
min ξ = 7 − 3x1 + w2 − (7 − 3x1 + w2 − x0)        min ξ = 0 + x0
   w1 = 2 − 2x1 + w2                                 w1 = 2 − 2x1 + w2
   x2 = 7 − 3x1 + w2 − x0                            x2 = 7 − 3x1 + w2 − x0
   w3 = 1 − 2x1 + w2 + (7 − 3x1 + w2 − x0)           w3 = 8 − 5x1 + 2w2 − x0
The phase one problem is solved once the objective value reaches 0, which typically
happens exactly when x0 leaves the basis. Once this happens, we can solve the
phase two problem: the one we started with! To get there, we:
1. Remove x0 from the dictionary; we no longer need it.
2. Replace the artificial objective function ξ by the original objective function ζ,
expressed in terms of the current basic variables.
In this case our original objective is to minimize ζ = 4.5x1 + 3x2 . Substituting
x2 = 7 − 3x1 + w2 gives us ζ = 21 − 4.5x1 + 3w2, so our new dictionary is:
min ζ = 21 − 4.5x1 + 3w2
w1 = 2 − 2x1 + w2
x2 = 7 − 3x1 + w2
w3 = 8 − 5x1 + 2w2
Since we are minimizing ζ, the only good choice of entering variable is x1 . Comparing
the ratios 2/2, 7/3, and 8/5, we see that w1 must leave the basis. Solving w1’s equation
for x1, we get x1 = 1 − (1/2)w1 + (1/2)w2. Now we substitute that into the other equations:

min ζ = 21 − 4.5(1 − (1/2)w1 + (1/2)w2) + 3w2        min ζ = 16.5 + 2.25w1 + 0.75w2
   x1 = 1 − (1/2)w1 + (1/2)w2                           x1 = 1 − (1/2)w1 + (1/2)w2
   x2 = 7 − 3(1 − (1/2)w1 + (1/2)w2) + w2               x2 = 4 + (3/2)w1 − (1/2)w2
   w3 = 8 − 5(1 − (1/2)w1 + (1/2)w2) + 2w2              w3 = 3 + (5/2)w1 − (1/2)w2
Since we are minimizing ζ and all our reduced costs are nonnegative, we have found
the optimal solution. With 1 cup of fancy coffee and 4 cups of sweet tea (exceeding
our sugar minimum by 3 units), we have found the cheapest combination of drinks,
costing $16.50.
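It is worth confirming that this solution really satisfies the original constraints of Problem 2.10 (a quick check):

```python
# Verify the optimal plan: x1 = 1 cup of fancy coffee, x2 = 4 cups of sweet tea.
x1, x2 = 1, 4

assert x1 + x2 >= 5        # one cup per hour for 5 hours
assert 3 * x1 + x2 >= 7    # caffeine: 3 units per coffee, 1 per tea
assert x1 + 2 * x2 >= 6    # sugar: 1 unit per coffee, 2 per tea (9, exceeding 6 by 3)
print(4.5 * x1 + 3 * x2)   # → 16.5 dollars
```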

2.28 The general two-phase simplex method


Suppose we have a general linear program in equational form: our constraints are
written as Ax = b, with x ≥ 0. Our approach in the previous section relied on having
inequality constraints, so it no longer applies.




One silver lining is that we can always make the right-hand side nonnegative.
An equation constraint can always be multiplied by −1 and remain valid (unlike an
inequality constraint, which reverses when it is multiplied by −1). So let’s assume
that b ≥ 0.
The solution here is to introduce artificial slack variables to the problem. We
turn the problem Ax = b into the problem Ax ≤ b, and then add slack variables
w1 , w2 , . . . , wm ≥ 0 to turn it back into equational form. (In matrix form, this looks
like Ax + Iw = b.)
What’s the point? Well, because we’ve assumed b ≥ 0, the new problem is one for
which the two-phase simplex method is not necessary: if we make the slack variables
w1 , w2 , . . . , wm our basic variables, we get an initial basic feasible solution.
As before, we introduce an artificial objective function to optimize in the phase
one problem. In this case, our slack variables w1 , w2 , . . . , wm are artificial: they do
not belong in the problem, since we want to have Ax = b and not just Ax ≤ b. So
we decide to minimize ξ = w1 + w2 + · · · + wm : the sum of the slack variables. If we
can get it down to 0, then we get a solution where Ax = b, and then we can proceed
to the phase two problem.
For example, suppose that we have the following constraints:



 x1 + x2 +  x3 =  1
6x1      − 2x3 =  1
2x1 + x2 − 3x3 = −1
 x1, x2, x3 ≥ 0

Our first step is to rewrite the third constraint as −2x1 − x2 + 3x3 = 1, so that all
the numbers on the right-hand side are positive. Now we are ready to insert artificial
slack variables w1 , w2 , w3 :



  x1 + x2 +  x3 + w1 = 1
 6x1      − 2x3 + w2 = 1
−2x1 − x2 + 3x3 + w3 = 1
  x1, x2, x3, w1, w2, w3 ≥ 0

Our objective function for the phase one problem is ξ = w1 + w2 + w3 , but that’s
phrased entirely in terms of the basic variables. We must substitute w1 = 1 − x1 −
x2 − x3 , w2 = 1 − 6x1 + 2x3 , and w3 = 1 + 2x1 + x2 − 3x3 to get the objective function
in the form we want. If we do, then ξ simplifies to 3 − 5x1 − 2x3 , and we get the
initial dictionary

min ξ = 3 − 5x1 − 2x3
   w1 = 1 − x1 − x2 − x3
   w2 = 1 − 6x1 + 2x3
   w3 = 1 + 2x1 + x2 − 3x3

As before, we will proceed to minimize ξ, hoping to get to 0.
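The substitution that produced ξ = 3 − 5x1 − 2x3 is an identity in x1, x2, x3, so we can spot-check it at arbitrary points (a sketch):

```python
# Spot-check the identity w1 + w2 + w3 = 3 − 5*x1 − 2*x3, where
# w1 = 1 − x1 − x2 − x3, w2 = 1 − 6*x1 + 2*x3, w3 = 1 + 2*x1 + x2 − 3*x3.
checks = []
for x1, x2, x3 in [(0, 0, 0), (1, 2, 3), (0.5, -1.0, 2.0)]:
    w1 = 1 - x1 - x2 - x3
    w2 = 1 - 6 * x1 + 2 * x3
    w3 = 1 + 2 * x1 + x2 - 3 * x3
    checks.append(abs((w1 + w2 + w3) - (3 - 5 * x1 - 2 * x3)) < 1e-12)
print(checks)  # → [True, True, True]
```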




2.29 Troubleshooting
There are several unexpected things that can go wrong in the two-phase simplex
method.
It is possible that we can never get the artificial objective function ξ down to
0. This is an indicator that our original problem did not have a feasible solution!
Although this is disappointing for the problem we were trying to solve, it’s convenient
for the solver: now we can skip phase two.
In most cases, we expect that ξ will hit 0 at the same time that our artificial
variable(s) leave the basis. After all, if ξ = 0, then x0 (from our first two-phase
method) or the artificial slack variables w1 , . . . , wm (from our second method) must
all be 0, which is the sort of thing nonbasic variables generally do. However, it is
possible for these variables to be basic and still be equal to 0.
In such a degenerate case, we can make some quick final adjustments. If an
artificial variable is equal to 0 but still basic, pick any nonbasic, non-artificial variable
in its equation, and do a pivot step to replace the artificial variable by that nonbasic
variable—ignoring our usual pivoting rules. Because both variables will remain equal
to 0, this will not change the value of any other variables, so this pivot step preserves
feasibility.
In our second two-phase method, an even weirder thing can happen. It’s possible
that:
• The artificial objective function ξ has reached 0;
• Some artificial slack variable wi is still basic;
• There are no non-artificial variables in wi ’s equation to replace it with!
If this happens, just forget that equation entirely. What this means is that one of the
equations in the system Ax = b was redundant; it could be deduced from the others.
Once we eliminate the artificial slack variables, the redundant equation becomes
0 = 0; we don’t need it.

2.30 A very degenerate problem


Here is the example we will consider today:
Problem 2.11 Xerxes, Yvonne, and Zsuzsa decide to bake biscuits. However, before
they begin baking, they start arguing about how they’ll divide the biscuits:
• Xerxes says, “I’m not greedy; I just want at least one biscuit for every two
biscuits that the two of you take."
• Yvonne says, “Well, last time we baked, Zsuzsa didn’t do her fair share of the
work! I want at least twice as many biscuits as she will get."
• Zsuzsa says, “Look, not all of us are master bakers. I’ll do my best, and I feel
like I deserve at least a quarter of the biscuits we make."
What is the maximum number of biscuits that they can end up baking?
If Xerxes gets x biscuits, Yvonne gets y biscuits, and Zsuzsa gets z biscuits, then
our inequalities are: x ≥ (1/2)(y + z), y ≥ 2z, and z ≥ (1/4)(x + y + z). The bakers’ joint
goal is to maximize the total number x + y + z. Rearranging the inequalities, we can




write the problem as:

maximize x + y + z
x,y,z∈R
subject to −2x + y + z ≤ 0
−y + 2z ≤ 0
x + y − 3z ≤ 0
x, y, z ≥ 0.

The astute observer will notice that (as usual with baking) if we find a feasible
solution (x, y, z) then we can scale it up to (2x, 2y, 2z) or even (100x, 100y, 100z)
without violating the inequalities. So it seems like there can’t be any limit on the
number of biscuits baked.
This is almost correct. The challenge here is to figure out if there’s any division of
biscuits that will make all three bakers happy. If not, then the only feasible solution
is (x, y, z) = (0, 0, 0) and no amount of scaling that up will get you biscuits.
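A quick numerical experiment supports this suspicion (a sketch; it is evidence, not a proof):

```python
import random

random.seed(0)
# Search for a nonzero feasible division of biscuits. Every random candidate
# violates at least one of the three constraints, consistent with (0, 0, 0)
# being the only feasible solution.
found = None
for _ in range(100_000):
    x, y, z = (random.uniform(0, 10) for _ in range(3))
    if -2 * x + y + z <= 0 and -y + 2 * z <= 0 and x + y - 3 * z <= 0:
        found = (x, y, z)
        break
print(found)  # → None
```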
If we add slack variables w1 , w2 , w3 to the inequalities, then they get us an initial
basic feasible solution; no two-phase simplex method needed here! Unfortunately,
the initial dictionary we write down looks somewhat concerning. . .
Here it is:
max ζ = 0 + x + y + z
w1 = 0 + 2x − y − z
w2 = 0 + y − 2z
w3 = 0 − x − y + 3z

Any of the three entering variables seem like equally good candidates. Let’s just try
making y the entering variable.
If we follow our usual procedure, then only w1 and w3 make it onto our “shortlist"
of leaving variables. (Even this step doesn’t seem entirely justified! Usually, if a
variable is not on our shortlist, it’s because pivoting on it is guaranteed to produce
an infeasible dictionary. However, in this case, all three leaving variables will produce
feasible dictionaries when we pivot, because we won’t be able to leave point (0, 0, 0)
after this step.) The ratios for w1 and w3 are both 0/1, meaning that we can’t increase
y past 0 before either of them becomes negative. This is a tie, so we can’t tell which
variable to pick; let’s arbitrarily take w1 .
After solving w1 = 2x − y − z to get y = 2x − w1 − z and substituting this for y in
the other equations, we get the dictionary below:

max ζ = 0 + 3x − w1
y = 0 + 2x − w1 − z
w2 = 0 + 2x − w1 − 3z
w3 = 0 − 3x + w1 + 4z

It sure doesn’t seem like we’re making any progress. However, there is still a positive
reduced cost, so we can still keep going by pivoting on x.
Altogether, there are (6 choose 3) = 20 ways to choose three basic variables in this problem.
One of them turns out not to work: if you try to solve for x, w1 , and w3 , you end up




having to take the inverse of a singular matrix. That leaves 19 feasible dictionaries,
all of which describe the point (x, y, z) = (0, 0, 0) in various ways.
What’s even the point of pivoting, then?[6] Actually, there are two possible
outcomes that would solve the problem for us:
• Suppose that one of these 19 dictionaries has all negative reduced costs. Then
that formula for ζ proves that whenever x, y, z, w1, w2, w3 ≥ 0, we have ζ ≤ 0.
In that case, we’d be able to conclude that (0, 0, 0) is the only feasible solution.
• Suppose that one of these 19 dictionaries has an entering variable, with positive
reduced cost, such that all the coefficients in that column are positive. Then
we’d have a way to escape to infinity: by increasing that variable and keeping
the other nonbasic variables at 0, we increase all the basic variables (and ζ)
and discover that the linear program is unbounded.
The problem is that because of all the degenerate pivots we’re doing, we can
never tell if we’re making progress toward either of these goals. In fact, we don’t
even have a clear proof that either of these outcomes is guaranteed to happen!

2.31 Pivoting rules


A pivoting rule is a rule for making decisions in the simplex method in cases where
our usual rules don’t fully determine what to do. There are two situations in which
we currently need the help of a pivoting rule:
1. If two or more nonbasic variables have a reduced cost with the correct sign
(positive when maximizing, and negative when minimizing), then we don’t
know how to choose between them.
2. If two or more potential leaving variables are tied with an equal ratio, then
we can bring either one of them out of the basis and get a feasible dictionary.
Again, we don’t know which one to choose.
We’d like to settle these two scenarios in a way which avoids cycling: going between
the same set of dictionaries forever. Our secondary goal is to make decisions that
speed up the simplex method.
For example, it seems like a generally useful heuristic to use the highest-cost
pivoting rule to settle situation 1: when maximizing, pick the entering variable
with the largest positive reduced cost, and when minimizing, pick the entering variable with
the most negative reduced cost. The intuition is that this picks the direction in which the
objective value improves as rapidly as possible.
On the other hand, in situation 2, it’s not clear what to do. But if we’re
describing an algorithm completely, we need to specify how to break ties! One
possibility is to go in order: write down a fixed ordered list of all our variables (such
as (x, y, z, w1 , w2 , w3 )) and always pick the first variable on that list if there’s a tie
for the leaving variable.
Unfortunately, the combination of these two rules is not a winner. It is possible to
come up with examples in which it will cycle forever between different representations of
the same corner point. (One example is given in section 3.2 of Vanderbei’s textbook.)
It turns out that one good answer is to use Bland’s rule. Here, once again,
we pick a fixed ordered list of our variables. This time, we use that list to make
decisions in both situation 1 and situation 2.
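The rule is easy to state in code (a sketch with hypothetical reduced costs and a hypothetical ratio-test tie, not values from any dictionary in the text):

```python
# Bland's rule sketch: fix an ordered list of variables once and for all.
# When maximizing, the entering variable is the FIRST variable in the order
# with a positive reduced cost; a tie among leaving candidates is broken by
# taking the FIRST tied variable in the same order.
order = ["x", "y", "z", "w1", "w2", "w3"]
reduced_cost = {"x": -2, "y": 1, "z": 3}   # hypothetical reduced costs
tied_leaving = ["w3", "w1"]                # hypothetical ratio-test tie

entering = next(v for v in order if reduced_cost.get(v, 0) > 0)
leaving = min(tied_leaving, key=order.index)
print(entering, leaving)  # → y w1
```

Note that the highest-cost rule would instead pick z here; Bland’s rule trades speed for its anti-cycling guarantee.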
[6] Well, the point is (0, 0, 0), of course. Haha.




Fact 1: Bland’s rule prevents cycling: it can never return to a dictionary it’s
previously considered.
I am calling this a “fact" and not a “theorem" because we will not prove it.
The drawback of Bland’s rule is that it’s slow: even though it never returns to
the same feasible dictionary twice in degenerate cases, it tends to perform badly
in cases with no degeneracy. That is, it often picks longer paths from the initial
corner point to the optimal one. Intuitively, the reason this happens is that variables
earlier in our list are both more likely candidates to enter the basis and more likely
candidates to leave: so they end up flipping back and forth often. (Unfortunately,
this property also plays a key role in the proof that Bland’s rule prevents cycling.)
We’d like to come up with a rule that avoids cycling just by addressing situation
2 (how to choose leaving variables). That way, we can pair it with the highest-cost
pivoting rule, which only addresses situation 1 (how to choose entering variables).
The highest-cost pivoting rule is not the smartest rule there is, but it’s good enough
in many cases.

2.32 Lexicographic pivoting


The solution we’re looking for is called lexicographic pivoting. To explain this
rule, we’ll begin with a different rule that’s bad in many ways, but will provide useful
intuition.

2.32.1 Intuition: random perturbations


Cycling can only happen when we have a degenerate pivoting step: otherwise, we’re
improving the objective value with every pivot, and can never return to a previous
(worse) dictionary. Degenerate pivoting steps only happen when we have too many
variables simultaneously equal to 0 at the same corner point.
In a randomly-chosen problem, this would never happen; once a corner point in
Rⁿ is determined as the intersection of n hyperplanes, another random hyperplane
is very unlikely to pass exactly through that point. (In fact, in a formal way of
defining that probability, the probability is 0.) Unfortunately, we don’t usually solve
randomly-chosen problems: our example today ends up with lots of degenerate pivots,
even though we didn’t do anything that weird.
But imagine if we took our system Ax = b and randomly adjusted the constants
b by a small amount. For example, maybe we randomly adjust our initial dictionary
as follows:

max ζ = 0 + x + y + z        max ζ = 0 + x + y + z
   w1 = 0 + 2x − y − z          w1 = 0.000878996 + 2x − y − z
   w2 = 0 + y − 2z              w2 = 0.000534988 + y − 2z
   w3 = 0 − x − y + 3z          w3 = 0.000657869 − x − y + 3z

Geometrically, we’ve taken each equation and pushed it by a random tiny amount.
It is very unlikely that the result has even a single degenerate dictionary. So with
this adjustment, none of our pivot steps will be degenerate, and so we’ll never cycle.
Of course, we’re solving a slightly different problem now, but as long as our random




adjustments were sufficiently small, our final answer will be very very very close to
the answer to our original problem.
(Once we’re done, we may even be able to recover the exact answer to the original
problem, by assuming that our random adjustment doesn’t change the optimal choice
of basic variables.)

2.32.2 Actual lexicographic pivoting


The method of random perturbations works, but it’s not very elegant: the solution
we get is a tiny amount off from correct, the calculations become much messier, and
it’s hard to be certain what the threshold for “tiny adjustment" is before we end up
solving a completely different problem.
The lexicographic pivoting rule is inspired by random perturbations in a “let’s not,
and say we did" kind of way. Instead of adding actual tiny random numbers to our
constraints, we add variables ϵ1 , ϵ2 , ϵ3 , . . . , ϵm that represent those tiny adjustments
in symbolic form:

max ζ = 0 + x + y + z        max ζ = 0 + x + y + z
   w1 = 0 + 2x − y − z          w1 = ϵ1 + 2x − y − z
   w2 = 0 + y − 2z              w2 = ϵ2 + y − 2z
   w3 = 0 − x − y + 3z          w3 = ϵ3 − x − y + 3z

The rule for dealing with these ϵi ’s is summarized by the inequality

1 ≫ ϵ1 ≫ ϵ2 ≫ · · · ≫ ϵm > 0.

What does this mean? Let’s break it down step-by-step:


• Because 1 ≫ ϵ1, we say that in any comparison between an actual constant
and ϵ1 (or any other ϵi ), the constant wins. For example, we would treat even
1.001 as bigger than 1 + 1 000 000ϵ1 . The ϵi ’s only help us break ties between
the actual constants we’ve got.
This makes sure that every pivoting step we do is also a valid pivoting step
for the original problem. At the end, we’ll be able to take the solution we got,
drop all the ϵi ’s from it, and get a solution to the problem we wanted to solve.
• Similarly, ϵ1 ≫ ϵ2 ≫ ϵ3 ≫ · · · ≫ ϵm say that in any comparison between two
different constants ϵi and ϵj , the constant with the smaller subscript wins.
This makes sure that after we add the ϵi ’s, we can never obtain a tie. Two
expressions with ϵi ’s in them can only be equal if each of ϵ1 , ϵ2 , . . . , ϵm has the
same coefficient. However (more on this later) the coefficients of ϵ1 , ϵ2 , . . . , ϵm
track how we obtained each equation from our starting equations: so if two
equations had the same coefficient on every ϵi , they’d actually be the same
equation.
The lexicographic pivoting rule gets its name from this ordering.
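One concrete way to realize this ordering (an encoding chosen for this sketch, not the only one) is to write each symbolic constant c0 + c1ϵ1 + ··· + cmϵm as the tuple (c0, c1, ..., cm); Python’s built-in lexicographic tuple comparison is then exactly the ordering described above:

```python
# Encode c0 + c1*eps1 + ... + cm*epsm as the tuple (c0, c1, ..., cm).
a = (1.001, 0)        # the constant 1.001
b = (1, 1_000_000)    # 1 + 1000000*eps1
print(a > b)          # → True: any real difference beats any eps1 contribution

c = (0, 1, 0)         # eps1
d = (0, 0, 5)         # 5*eps2
print(c > d)          # → True: eps1 dominates any multiple of eps2
```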

2.32.3 Working through an example


Starting from the dictionary we had just now (which is repeated below on the left),
let’s pivot with y as our entering variable, as before. Now w1 and w3 are on the
shortlist, but there’s no longer a tie between them: the ratio ϵ3/1 is smaller than ϵ1/1,




so w3 is the only possible leaving variable. After solving its equation for y, we get
y = ϵ3 − x + 3z − w3, which we then substitute for y in our other equations. The
result is shown on the right:

max ζ = 0 + x + y + z          max ζ = ϵ3 + 4z − w3
   w1 = ϵ1 + 2x − y − z           w1 = (ϵ1 − ϵ3) + 3x − 4z + w3
   w2 = ϵ2 + y − 2z               w2 = (ϵ2 + ϵ3) − x + z − w3
   w3 = ϵ3 − x − y + 3z            y = ϵ3 − x + 3z − w3

We’ve made an infinitesimal amount of progress: the objective value has improved
from 0 to ϵ3 . (Granted, that’s pretty much the least amount of progress possible,
but so what.) Note that all three basic variables are still positive: in particular,
ϵ1 − ϵ3 > 0.
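If we encode each symbolic constant c0 + c1ϵ1 + c2ϵ2 + c3ϵ3 as the tuple (c0, c1, c2, c3) (an encoding assumed for this sketch), the tie-break in that first ratio test becomes an ordinary tuple comparison:

```python
# Ratios at the first pivot: w1's ratio is eps1/1 and w3's is eps3/1, encoded
# as (constant, coef of eps1, coef of eps2, coef of eps3).
ratio_w1 = (0, 1, 0, 0)  # eps1
ratio_w3 = (0, 0, 0, 1)  # eps3

# The smallest ratio leaves the basis; tuple comparison realizes the ordering
# 1 >> eps1 >> eps2 >> eps3 > 0.
leaving = "w1" if ratio_w1 < ratio_w3 else "w3"
print(leaving)  # → w3
```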
There is only one positive reduced cost: it is on z. No need to compare ratios: the
only possible leaving variable when z enters the basis is w1 . The resulting dictionary
is
max ζ = ϵ1 + 3x − w1
    z = ((1/4)ϵ1 − (1/4)ϵ3) + (3/4)x − (1/4)w1 + (1/4)w3
   w2 = ((1/4)ϵ1 + ϵ2 + (3/4)ϵ3) − (1/4)x − (1/4)w1 − (3/4)w3
    y = ((3/4)ϵ1 + (1/4)ϵ3) + (5/4)x − (3/4)w1 − (1/4)w3

Now the only positive reduced cost is on x. Once again, there is only one possible
leaving variable, which is w2 . After a third pivot step, we get:

max ζ = (4ϵ1 + 12ϵ2 + 9ϵ3) − 4w1 − 12w2 − 9w3
    z = (ϵ1 + 3ϵ2 + 2ϵ3) − w1 − 3w2 − 2w3
    x = (ϵ1 + 4ϵ2 + 3ϵ3) − w1 − 4w2 − 3w3
    y = (2ϵ1 + 5ϵ2 + 4ϵ3) − 2w1 − 5w2 − 4w3

Since the reduced costs of w1 , w2 , w3 are all negative, this tells us that we’ve “maxi-
mized" ζ: 4ϵ1 + 12ϵ2 + 9ϵ3 is the highest possible value it could have. Of course, just
like every other value we saw for ζ, it rounds to 0. To get our final answer, we set
ϵ1 = ϵ2 = ϵ3 = 0 and get that (0, 0, 0) really is our optimal solution.

2.32.4 Shortcuts (optional)


If you look back at our work, and especially at the final dictionary, you may see a
pattern: the coefficients on ϵ1 , ϵ2 , ϵ3 end up matching the coefficients on w1 , w2 , w3 ,
up to a sign.
This is not a coincidence. When we start out writing our first dictionary, we have
equations w1 = ϵ1 + · · · , w2 = ϵ2 + · · · , and w3 = ϵ3 + · · · . After that, the cardinal rule
of working with equations is that we always do the same thing to both sides. So it
will always be true that:
• When wi and ϵi are on opposite sides of the equation, they have the same
coefficient.
• When wi and ϵi are on the same side of the equation, they have opposite
coefficients (same magnitude, but different sign).




• When wi doesn’t appear in an equation, neither does ϵi.


This means that in principle, we can use the lexicographic pivoting rule without
actually writing down the ϵi ’s. As long as there’s no ties between potential leaving
variables, it’s business as usual. Once there’s a tie, use these rules to figure out the
coefficients of ϵ1 , ϵ2 , . . . and break the tie.
The variables we use for this are w1 , w2 , w3 in this problem because those were
the basic variables in our initial dictionary, which we adjusted by adding ϵ1 , ϵ2 , ϵ3 . In
general, whatever our initial basic variables are, those will be the variables we can
use to deduce the coefficients of the ϵi ’s.

2.33 Matrix calculations for the dictionary


Let’s review a fact from linear algebra: a matrix-vector product Ax can be viewed
as taking a linear combination of the columns of A with weights from the vector x.
For example:
        
[1 2 3] [x]     [1]     [2]     [3]
[4 5 6] [y] = x [4] + y [5] + z [6]
[7 8 9] [z]     [7]     [8]     [9]

This is just a different way of using the definition of matrix multiplication. We can
check that in the equation above, for example, both sides give 1 · x + 2 · y + 3 · z for
the first component of the result.
To add a bit of a twist to this idea: once we’ve split the product Ax up into
columns like this, we can recombine some of the columns into smaller matrix-vector
products. For example:
 
            [x1]
[1 1 1 1 1] [x2]      [1]      [1]      [1]      [1]      [1]
[1 2 3 4 5] [x3] = x1 [1] + x2 [2] + x3 [3] + x4 [4] + x5 [5]
            [x4]
            [x5]

                 [1]      [1]      [1]      [1]      [1]
            = x2 [2] + x5 [5] + x1 [1] + x3 [3] + x4 [4]

              [1 1] [x2]   [1 1 1] [x1]
            = [2 5] [x5] + [1 3 4] [x3] .
                                   [x4]

To generalize the notation xi for the ith component of a vector, if I is a sequence


of several indices, like I = (1, 3, 4), then we will write xI for the smaller vector with
just the components numbered by I picked out. For example, if x = (x1 , x2 , x3 , x4 , x5 ),
then x(1,3,4) = (x1 , x3 , x4 ).
In the case of a matrix A, we will write Ai for the ith column of A. (Picking out
a column is more useful to us than picking out a row, which is why we’ve made this


decision for what the notation means.) Just as with vectors, if I is a sequence of
several indices, then we’ll write AI for the matrix we get by picking out the columns
numbered by I from A.
With this notation, the equation we wrote down a bit ago can be written more
compactly as
Ax = A(2,5) x(2,5) + A(1,3,4) x(1,3,4) .

This is true for any 5-column matrix A multiplied by any x ∈ R5 .
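This column-splitting identity is easy to check numerically. Here is a quick NumPy sketch (the matrix and weights are randomly generated; the index lists are the zero-based versions of (2, 5) and (1, 3, 4)):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(2, 5)).astype(float)
x = rng.integers(-5, 6, size=5).astype(float)

B = [1, 4]      # zero-based positions of columns 2 and 5
N = [0, 2, 3]   # zero-based positions of columns 1, 3 and 4

lhs = A @ x                              # the full product Ax
rhs = A[:, B] @ x[B] + A[:, N] @ x[N]    # A_B x_B + A_N x_N
```

The two sides agree for any split of the column indices into two groups.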


We are interested in doing this for one specific purpose: separating the basic
variables from the nonbasic variables. For example, suppose that we have a system
of equations
 
                                                   [x1]
 x1 +  x2 +  x3 +  x4 +  x5 = 4        [1 1 1 1 1] [x2]   [ 4]
                                  ⇐⇒   [1 2 3 4 5] [x3] = [10]
x1 + 2x2 + 3x3 + 4x4 + 5x5 = 10                    [x4]
                                                   [x5]
which we write compactly as Ax = b. We decide that we want our basic variables to
be x2 , x5 and we want to solve for them in terms of the nonbasic variables x1 , x3 , x4 .
To do this, set B = (2, 5) and N = (1, 3, 4), so that xB = (x2 , x5 ) is the vector of basic
variables and xN = (x1 , x3 , x4 ) is the vector of nonbasic variables. Then we can turn
the matrix equation Ax = b into the matrix equation
AB xB + AN xN = b.
Now we can solve for xB in the same way that we’d solve a two-variable equation
2x + 3y = 4 for x. We move AN xN to the other side, and then multiply by the inverse
of AB :
AB xB + AN xN = b =⇒ AB xB = b − AN xN
=⇒ xB = (AB )−1 b − (AB )−1 AN xN .
This is just like the usual dictionary form of our answer, except in matrix form. In
particular, we know that setting xN = 0 (that is, setting all the nonbasic variables
to 0) gives us the basic solution. In this case, doing that in the equation above turns
it into xB = (AB )−1 b. In other words:
Fact 2: When B lists the basic variables and N lists the nonbasic variables, the
basic solution to Ax = b is to set xB = (AB )−1 b and xN = 0.
We will make use of this fact again in the next lecture, when we talk about the
revised simplex method—a way to do the simplex method with fewer unnecessary
calculations. Today, we will use it to prove some claims we’ve previously explained
by intuition.
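Fact 2 can be checked directly on the example system above; a small NumPy sketch (indices shifted to zero-based):

```python
import numpy as np

A = np.array([[1., 1., 1., 1., 1.],
              [1., 2., 3., 4., 5.]])
b = np.array([4., 10.])

B = [1, 4]                            # basic variables x2 and x5, zero-based

x = np.zeros(5)                       # nonbasic variables stay at 0
x[B] = np.linalg.solve(A[:, B], b)    # x_B = (A_B)^{-1} b
```

The resulting x satisfies Ax = b, with x2 = 10/3 and x5 = 2/3.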

2.34 Definitions of corner points


What is a "corner point" of a subset of Rn ? There are three relevant definitions we’ll
discuss today.


The first definition is the definition of an extreme point. An extreme position


in politics is the opposite of a moderate position: it’s a position that does not
compromise about anything. Similarly, an extreme point in a set S ⊆ Rn is a point
that is not in the middle between any other points of S.
What does being between two points mean? Algebraically, given two points
p and q, the set of points that lie between them is the line segment of all points
tp + (1 − t)q where 0 ≤ t ≤ 1: the set of weighted averages of p and q. We make our
definition accordingly: a point x ∈ S is an extreme point of S if it cannot be written
as ty + (1 − t)z for y, z ∈ S and 0 ≤ t ≤ 1, except by taking y = x and/or z = x.
Let’s look at an example. Suppose set S ⊆ R2 is the region below. (Note that
this is not the feasible region of any linear program: it has a curved boundary!)

What are the extreme points? Nothing in the middle will do: if we can go a little bit
right and a little bit left from a point and stay in S, for example, then you’re in the
middle between “a little bit right" and “a little bit left", so you’re not an extreme
point. Also, a point that lies on a straight-line boundary is also not an extreme
point: it is between two points obtained by going a little bit in one direction along
the boundary, and a little bit in the other direction.
The three corner points of the triangle attached on the left of S are all extreme
points. Also, every single point on the curved boundary on the right of S is an
extreme point: from a point on the boundary of a circle, if you pick two opposite
directions to go in, one of them will leave the circle.
The definition of an extreme point describes a geometric intuition. We can also
define a corner point in terms of what we want corner points to do. This gives us the
definition of a vertex. When S ⊆ Rn , a point x ∈ S is a vertex of S if there is some
nonzero vector a ∈ Rn such that the dot product aT x = a1 x1 + · · · + an xn is strictly
bigger (no ties allowed!) than aT y for any y ∈ S with y ̸= x.
In other words, the vertices are the points in S that are the unique optimal
solutions to a linear maximization problem over S.
Looking at the region drawn above, its vertices are almost the same as its extreme
points. For the corner points between two straight boundaries, there are many
vectors a we could choose to justify that the corner point is a vertex. For a point on
the boundary of the circular arc, pick a to be the direction away from the center of
that circle.
There’s only one exception, which is very subtle: the points where the circular
arc meets the straight boundary are extreme points, but they’re not vertices. That’s
because if we optimize along the vector a which points from the center of the circle
toward one of these points, then a points vertically, and all the points along that
straight boundary will be tied with x.
The last definition of a corner point only applies to the regions we care about:
regions of the form S = {x ∈ Rn : Ax = b and x ≥ 0}. We will assume that the system
of equations Ax = b has no redundant or inconsistent equations: this assumption


holds whenever we’re using the simplex method, though sometimes we need a two-
phase method to check it. Let m be the number of equations (the number of rows in
A).
In this setting, a basic feasible solution x is any x ∈ S such that we can
split (1, 2, . . . , n) into m basic variables B and n − m nonbasic variables N to have
xB = (AB )−1 b and xN = 0. (Note that from x ∈ S, it follows that x ≥ 0.) The basic
feasible solutions are exactly the solutions that the simplex method explores. What
relationship do they have to the extreme points and the vertices?
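Since a basic feasible solution is determined by a choice of basis, all of them can be enumerated by brute force for a small system. A NumPy sketch, using the earlier example system Ax = b (the final count is specific to that example):

```python
import numpy as np
from itertools import combinations

A = np.array([[1., 1., 1., 1., 1.],
              [1., 2., 3., 4., 5.]])
b = np.array([4., 10.])
n, m = 5, 2

bfs = []
for B in combinations(range(n), m):
    AB = A[:, list(B)]
    if abs(np.linalg.det(AB)) < 1e-9:
        continue                      # A_B is singular: no basic solution here
    x = np.zeros(n)
    x[list(B)] = np.linalg.solve(AB, b)
    if np.all(x >= -1e-9):            # keep only the feasible ones (x >= 0)
        bfs.append(x)
```

For this system, six of the ten bases give a feasible basic solution; these are the candidate corner points of S.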

2.35 Relationships between the definitions


We’ve seen by example that for general sets S, extreme points and vertices don’t
have to be the same. However, in the case S = {x ∈ Rn : Ax = b and x ≥ 0}, they
will be the same, and they will always be the same as the basic feasible solutions!
We will prove this in three steps.

2.35.1 From basic feasible solutions to vertices


Theorem 2.4 Any basic feasible solution is a vertex of the feasible region S = {x ∈
Rn : Ax = b and x ≥ 0}.

Proof. Suppose that x is a basic feasible solution: choose B and N such that
xB = (AB )−1 b and xN = 0. Then, define a by setting aB = 0 and aN = −1.
Then the dot product aT y = a1 y1 + a2 y2 + · · · + an yn simplifies to the sum

aT y = Σ_{i∈N} (−1) · yi .

When y = x, this is equal to 0: each term of the sum is 0.


In fact, the only way for aT y = 0 to hold is to have yN = 0: otherwise, there will
be a negative term in the sum! And when yN = 0, the equation Ay = b turns into
AB yB = b, or yB = (AB )−1 b. Therefore aT y = 0 only if y = x.
We’ve checked exactly the condition for x to be a vertex of the feasible region. ■

2.35.2 From vertices to extreme points


We’ll actually be able to show that for any set S, all vertices are also extreme points,
even if S is not the feasible region of a linear program (though if S has curved
boundaries, the reverse might not hold).
Theorem 2.5 Any vertex of any set S ⊆ Rn is also an extreme point of S.

Proof. Let x ∈ S be a vertex of S, and let a be the vector such that aT x > aT y for
all y ∈ S with y ̸= x.
Suppose that x is not an extreme point: then there are y, z ∈ S not equal to x
and some 0 ≤ t ≤ 1 such that x = ty + (1 − t)z. Multiplying by aT on both sides and
distributing, we get

aT x = t(aT y) + (1 − t)(aT z).


Because y ̸= x and z ̸= x, we know that aT y < aT x and aT z < aT x. Therefore

t(aT y) + (1 − t)(aT z) < t(aT x) + (1 − t)(aT x).

But the right-hand side of this inequality just simplifies to aT x, and we get the
ridiculous inequality aT x < aT x. Therefore assuming x is not an extreme point has
led us to a contradiction, and x must be an extreme point. ■

2.35.3 From extreme points to basic feasible solutions


The last step, going from extreme points to basic feasible solutions, is trickier.
Theorem 2.6 Any extreme point of the feasible region S = {x ∈ Rn : Ax = b and x ≥
0} is a basic feasible solution.

Proof. Let x be any extreme point of the feasible region. Split up (1, 2, . . . , n) into
P and Z such that xP > 0 and xZ = 0: the positive entries of x and the zero
entries of x.
What we’d like to be the case is that xP is m-dimensional (remember, m is the
number of rows in A) and that AP is invertible. Then we can take B = P and N = Z,
and x will be forced to be the basic feasible solution with basic variables B.
This can go wrong in a few ways. First of all, P might be too small. This is still
fine; sometimes we have basic variables equal to 0. If the columns of AP are at least
linearly independent, then we can pick some more columns to add to P to make
B in such a way that the columns of AB are still linearly independent, making AB
invertible. Remove those same columns from Z to get N . Now, once again, x will
be the basic feasible solution with basic variables B.
There are two more things that can go wrong:
• Maybe P is too small, but the columns of AP are already linearly dependent.
In that case, we can’t add any columns to get an invertible matrix, and the
procedure above won’t work.
• Maybe P is not too small, but too big: it has more than m entries. In that
case, the columns of AP are also linearly dependent: they are more than m
vectors in Rm .
So if anything goes wrong, then it’s because the columns of AP are linearly dependent.
In this case, we’ll try to arrive at a contradiction by showing that x is not actually
an extreme point.
If the columns of AP are linearly dependent, then we can take a nontrivial linear
combination of them to get 0. This linear combination can be written as
Σ_{i∈P} yi Ai = 0

where not all the yi are 0. Let’s turn these numbers yi into an n-dimensional vector
y, by setting yj = 0 for every j ∈ Z. Then the linear combination above is just AP yP .
Now pick a very very very very small value r > 0, and consider the points x + ry
and x − ry. We’ll show that these are two points in S and x is between them,
concluding that x is not an extreme point.
• We know AP yP = 0. We also know AZ yZ = 0, because yZ = 0. Therefore
Ay = 0.


It follows that A(x ± ry) = Ax ± rAy = b ± r0 = b. So the two points x + ry


and x − ry satisfy our system of equations.
• For every position i in Z, both xi and yi are 0, so xi ± ryi = 0 as well.
Meanwhile, for every position i ∈ P, we know xi > 0, so xi ± ryi ≥ 0 provided
that r is sufficiently small.
Therefore x + ry ≥ 0 and x − ry ≥ 0, provided that r is sufficiently small.
These are the two checks we need to know that x + ry ∈ S and x − ry ∈ S. However,
x can be written as (1/2)(x + ry) + (1/2)(x − ry), so x is not an extreme point in this case,
contradicting the assumption we started with. ■

2.36 Calculating the reduced costs


Last time, we used the notation AI and xI to pick out columns of a matrix, or
entries from a vector, with indices given by a sequence I.
We used this to write down a formula for a basic solution to the system of equations
Ax = b: if the basic variables are numbered by B, and the nonbasic variables are
numbered by N , then the corresponding basic solution has xB = (AB )−1 b and xN = 0.
A general solution has xB = (AB )−1 (b − AN xN ).
Let’s continue by doing the same thing for the objective function. In general,
this is an expression of the form cT x = c1 x1 + c2 x2 + · · · + cn xn . This, too, can be
split up by basic and nonbasic variables: cT x = (cB )T xB + (cN )T xN . If we want to
know the objective value at a basic solution, we set xB = (AB )−1 b and xN = 0 to
get (cB )T (AB )−1 b.
What about the reduced costs? Well, let’s write (cB )T xB + (cN )T xN just in
terms of xN . To do this, we use the formula xB = (AB )−1 (b − AN xN ) and get

cT x = (cB)T (AB)−1 (b − AN xN) + (cN)T xN
     = (cB)T (AB)−1 b + ((cN)T − (cB)T (AB)−1 AN) xN .
So the row vector of our reduced costs is given by the formula (cN )T −(cB )T (AB )−1 AN .
We’re writing the product (cB)T (AB)−1 a lot, so let’s give it a name: let’s call it
uT. (It has a transpose because it’s a row vector.) We’ll learn much more about this
vector later; for now, it’s just a vector that’s handy in our calculations!
All this can be summarized by putting our dictionaries in matrix form:

 ζ = uT b + ((cN)T − uT AN) xN
xB = (AB)−1 b − (AB)−1 AN xN

When doing the ordinary simplex method, it would be bad to recompute the dictionary
at every step using these formulas, because computing (AB )−1 at every step is
expensive. On the other hand, this can be useful to compute a dictionary if, for
some reason, all you know is which variables are basic.
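Computing a dictionary from nothing but a basis can be sketched directly from these formulas. The snippet below uses the earlier 2 × 5 example system, together with a made-up objective vector c (an assumption for illustration; it does not come from the text):

```python
import numpy as np

A = np.array([[1., 1., 1., 1., 1.],
              [1., 2., 3., 4., 5.]])
b = np.array([4., 10.])
c = np.array([3., 1., 4., 1., 5.])    # hypothetical objective, illustration only

B, N = [1, 4], [0, 2, 3]              # basic x2, x5; nonbasic x1, x3, x4 (zero-based)

AB_inv = np.linalg.inv(A[:, B])
u = c[B] @ AB_inv                     # u^T = (c_B)^T (A_B)^{-1}
reduced_costs = c[N] - u @ A[:, N]    # (c_N)^T - u^T A_N
basic_values = AB_inv @ b             # (A_B)^{-1} b, the basic solution values
objective = u @ b                     # (c_B)^T (A_B)^{-1} b
```

Together, these four quantities are the entire top and left of the dictionary for that basis.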
We will also use these formulas in the revised simplex method: an improvement on
the simplex method which is more computationally efficient by avoiding unnecessary
calculations.


2.37 The revised simplex method


2.37.1 Finding a basic solution using matrices
Consider the following problem:
Problem 2.12 You are an adventurer who has just slain a dragon. You’re standing
in the dragon’s lair, admiring the hoard of gold, jewels, magic artifacts, and so forth.
Unfortunately, you can’t take it all. You’re limited by volume (whatever will fit in
your backpack: say, 100 units) and weight (whatever you can carry: say, 30 kg),
and you want to find the most valuable combination of objects possible under those
constraints.
We assume that all kinds of objects are continuous enough that we can say “take
xi kilograms of the ith object" for any plausible xi , and plentiful enough that there’s
no constraints other than the total weight and volume. However, there’s lots of them:
maybe you have a table along the lines of

            Gold  Silver  Rubies  Diamonds  Magic rings  Spell scrolls  Stale cookies
Price/kg      2      1       3        5          2             5              0
Volume/kg     3      3       1        2          4             5              5

How do you figure out the most efficient combination of precious items?
We just have two constraints here, aside from nonnegativity constraints:
• if x1, . . . , x7 measure the total amount of the objects in kilograms, then we want
x1 + x2 + x3 + x4 + x5 + x6 + x7 ≤ 30.
• The volume/kg row of the table gives us the constraint on volume: 3x1 + 3x2 +
x3 + 2x4 + 4x5 + 5x6 + 5x7 ≤ 100.
The price/kg row gives us the objective function: we want to maximize 2x1 + x2 +
3x3 + 5x4 + 2x5 + 5x6 .
The challenging part is the number of variables (most of which will not be used
in the optimal solution). If 7 variables (9 when we add slack variables) is not bad
enough for you, you can imagine a more varied hoard for which the problem would
be much worse.
We will do something unusual with the notation today. To make it easier to
connect our dictionary to the matrix formulas, we will name our slack variables x8
and x9 , putting them at the end of our vector x. The variables that describe our
linear program are:
cT = [2 1 3 5 2 5 0 0 0]

A = [1 1 1 1 1 1 1 1 0]
    [3 3 1 2 4 5 5 0 1]

b = [ 30]
    [100] .

Normally, our first choice of basic variables would be B = (8, 9): the slack variables.
To try out our new formulas, we’ll take B = (1, 7): we’ll consider filling up our
backpack with gold and stale cookies.


The first thing to compute is (AB )−1 . We have


   
AB = A(1,7) = [1 1]   =⇒   (AB)−1 = [ 5/2  −1/2]
              [3 5]                 [−3/2   1/2] .
No matter what we do next, we probably want to know the basic feasible solution
(though you have only my word that it’s going to turn out feasible) and the associated
objective value:
      
(AB)−1 b = [ 5/2  −1/2] [ 30]   [25]
           [−3/2   1/2] [100] = [ 5]

(cB)T (AB)−1 b = [2 0] [25]
                       [ 5] = 50.
So we are currently taking 25 kg of gold and 5 kg of stale cookies, which will bring
us a profit of 50 in whatever currency.

2.37.2 Being careful about what we compute


The fundamental idea of the revised simplex method is that now we are going to be
very careful not to do too much work. In particular, to fill in the entire dictionary at
this point, we’d need to compute (AB )−1 AN , and that’s a really annoying matrix
multiplication. Can we avoid it?
Well, our first step in the simplex method is to choose an entering variable. This
only involves looking at the reduced costs. We have two choices:
• We could go ahead and compute the entire row of reduced costs. This has the
formula (cN)T − (cB)T (AB)−1 AN. To compute this as efficiently as possible,
we’d begin by finding uT = (cB)T (AB)−1, then calculating (cN)T − uT AN. This
avoids having to deal with the product (AB)−1 AN.
• If we use Bland’s rule for pivoting, then we get to save some work. After
computing uT, we can find the reduced cost of variable xi by calculating
ci − uT Ai : xi ’s component of (cN)T − uT AN. Bland’s rule says that we can
stop once we find the first positive reduced cost.
This helps counteract the disadvantage of Bland’s rule: its slowness. We don’t
mind doing more pivot steps if each pivot step becomes faster!
Either way, we begin by computing
 
uT = (cB)T (AB)−1 = [2 0] [ 5/2  −1/2]
                          [−3/2   1/2] = [5 −1].

Let’s try computing the reduced costs one at a time. Silver (x2 ) gives us
 
c2 − uT A2 = 1 − [5 −1] [1]
                        [3] = 1 − (5 · 1 − 1 · 3) = −1.

Doing the same calculation for rubies (x3 ) gives us c3 − uTA3 = 3 − (5 · 1 − 1 · 1) = −1,
but for diamonds (x4 ) we finally get c4 − uTA4 = 5 − (5 · 1 − 1 · 2) = 2, which is positive.
Now that we know x4 is our entering variable, we want to find our leaving variable.
The trick is that we don’t need all of (AB )−1 AN to do this! We only care about x4 ’s
column of that matrix, which is given by
    
(AB)−1 A4 = [ 5/2  −1/2] [1]   [ 3/2]
            [−3/2   1/2] [2] = [−1/2] .


Remember that our dictionary has xB = (AB )−1 (b − AN xN ) in it, so we are subtract-
ing (AB )−1 A4 x4 . Our shortlist of leaving variables comes from negative coefficients,
which means we’re looking for positive values in (AB)−1 A4. The 3/2 is positive, which
puts our first variable in B = (1, 7) on our shortlist: x1 .
If we had more than one variable on our shortlist, we’d continue by computing
the ratios between the column (AB )−1 A4 we just found, and the column (AB )−1 b
that we computed earlier. But in this case, we can skip that step: x1 is the only
candidate.
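The entering-variable scan and the column computation can be sketched as follows (zero-based indices; the scan order x2, x3, x4 matches Bland's rule as used in the text):

```python
import numpy as np

c = np.array([2., 1., 3., 5., 2., 5., 0., 0., 0.])
A = np.array([[1., 1., 1., 1., 1., 1., 1., 1., 0.],
              [3., 3., 1., 2., 4., 5., 5., 0., 1.]])

B = [0, 6]                              # basis (1, 7), zero-based
AB_inv = np.linalg.inv(A[:, B])
u = c[B] @ AB_inv                       # u^T = (c_B)^T (A_B)^{-1}

entering = None
for j in [1, 2, 3]:                     # scan x2, x3, x4 in Bland order
    if c[j] - u @ A[:, j] > 0:
        entering = j                    # first positive reduced cost wins
        break

column = AB_inv @ A[:, entering]        # the entering variable's column
```

The scan stops at x4 (index 3), with column [3/2, −1/2], matching the hand computation.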
So now we know x4 is our entering variable and x1 is our leaving variable. We’re
done, right? We can just go to the next step with B = (4, 7).
Not so fast! We really don’t want to compute (AB )−1 again at each step. (In this
example, it’s only a 2 × 2 matrix inverse, but for larger systems, the inverse is much
harder to compute.) Let’s try to compute the inverse of A(4,7) (the new inverse we
want) from the inverse of A(1,7) (the old inverse we have).
Here’s the idea. Using our old B, we already know all the entries of
    
(A(1,7))−1 A(1,4,7) = [ 5/2  −1/2] [1 1 1]   [1  3/2  0]
                      [−3/2   1/2] [3 2 5] = [0 −1/2  1] .

The column corresponding to x4 , we just computed. The columns corresponding to


x1 and x7 must form an identity matrix by the definition of a matrix inverse.
What we want to see is a result of the form
    
(A(4,7))−1 A(1,4,7) = [? ?] [1 1 1]   [? 1 0]
                      [? ?] [3 2 5] = [? 0 1]

because whatever (A(4,7) )−1 is, multiplying by it must turn the x4 and x7 columns
of A into the identity matrix.
We can figure out what row operations turn (A(1,7) )−1 A(1,4,7) (the first 2 × 3
matrix above) into (A(4,7) )−1 A(1,4,7) (the second 2 × 3 matrix above). To do this, we
multiply the first row by 2/3 (to turn 3/2 into 1) and then add half the result to the
second row (to turn −1/2 into 0).
But row operations are just matrix multiplication from the left. So those same
row operations will turn (A(1,7) )−1 into (A(4,7) )−1 , which is what we want! We take
(A(1,7))−1, multiply the first row by 2/3, and then add half the result to the second
row:

(A(1,7))−1 = [ 5/2  −1/2]   ⇝   (A(4,7))−1 = [ 5/3  −1/3]
             [−3/2   1/2]                    [−2/3   1/3] .
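This row-operation update can be checked against a from-scratch inverse (a small NumPy sketch):

```python
import numpy as np

A = np.array([[1., 1., 1., 1., 1., 1., 1., 1., 0.],
              [3., 3., 1., 2., 4., 5., 5., 0., 1.]])

old_inv = np.linalg.inv(A[:, [0, 6]])    # (A_(1,7))^{-1}

# Row operations that turn the x4 column [3/2, -1/2] into the unit vector [1, 0]:
E = old_inv.copy()
E[0] *= 2 / 3                            # multiply the first row by 2/3
E[1] += E[0] / 2                         # add half of the (new) first row to the second

new_inv = np.linalg.inv(A[:, [3, 6]])    # (A_(4,7))^{-1}, computed from scratch
```

The updated matrix agrees with the directly computed inverse, as the argument predicts.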

2.37.3 A summary of the revised simplex method


Let’s summarize what we did in a set of instructions, so that we can do it again.
0. At the beginning of each pivot step, we should already know B (the sequence
of basic variables) and (AB )−1 (the inverse of the corresponding matrix).
1. We calculate (AB )−1 b (which tells us the current basic feasible solution) and
uT = (cB )T (AB )−1 (which will be useful for calculations).


2. To determine the entering variable, we compute reduced costs: as before, we


want a positive reduced cost for maximizing and a negative one for minimizing.
The reduced cost of xi is given by ci − uTAi . We can compute this for all the
variables, but if we’re using Bland’s rule, we can find them one at a time until
we get one with the correct sign.
3. Let xj be the entering variable. We compute (AB )−1 Aj to find the coefficients
of xj in our dictionary. The rules are slightly different due to a negative sign
in our formulas:
• The leaving variables on our shortlist correspond to the positive compo-
nents of (AB )−1 Aj .
• If multiple variables are on our shortlist, choose the one with the smallest
ratio, dividing a component of (AB )−1 b by the corresponding component
of (AB )−1 Aj .
4. Let xk be the leaving variable, and suppose that it’s the ith variable in the list
B. Our new sequence of basic variables will be B ′ where xk is replaced by xj .
Before we begin the next pivot step, we must compute (AB′ )−1 . Here, let I be
the combination of B and B ′ : all the previously basic variables, together with
j.
To do this, find the row reduction steps that take (AB )−1 AI (which should
have pivots in B’s columns) to (AB′ )−1 AI (which should have pivots in B ′ ’s
columns).
Then, apply those steps to (AB )−1 to get (AB′ )−1 .
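The steps above can be collected into one pivot-step routine. The following is only a sketch for small dense problems like the dragon-hoard example (maximization, Bland's rule), not production code:

```python
import numpy as np

def revised_pivot_step(A, b, c, B, AB_inv):
    """One pivot step of the revised simplex method for maximization,
    using Bland's rule. A sketch, not an efficient implementation."""
    u = c[B] @ AB_inv                           # u^T = (c_B)^T (A_B)^{-1}
    entering = None
    for j in range(A.shape[1]):                 # Bland: lowest index first
        if j not in B and c[j] - u @ A[:, j] > 1e-9:
            entering = j
            break
    if entering is None:
        return B, AB_inv, True                  # no positive reduced cost: optimal

    col = AB_inv @ A[:, entering]               # entering variable's column
    vals = AB_inv @ b                           # current basic values
    ratios = [vals[i] / col[i] if col[i] > 1e-9 else np.inf
              for i in range(len(B))]
    leave = int(np.argmin(ratios))              # position of the leaving variable

    # Update the inverse by row-reducing the entering column to a unit vector.
    new_inv = AB_inv.copy()
    new_inv[leave] /= col[leave]
    for i in range(len(B)):
        if i != leave:
            new_inv[i] -= col[i] * new_inv[leave]

    new_B = list(B)
    new_B[leave] = entering
    return new_B, new_inv, False

c = np.array([2., 1., 3., 5., 2., 5., 0., 0., 0.])
A = np.array([[1., 1., 1., 1., 1., 1., 1., 1., 0.],
              [3., 3., 1., 2., 4., 5., 5., 0., 1.]])
b = np.array([30., 100.])

B = [0, 6]                                      # basis (1, 7), zero-based
B, AB_inv, done = revised_pivot_step(A, b, c, B, np.linalg.inv(A[:, B]))
```

One call reproduces the pivot worked out above: x4 enters, x1 leaves, and the basis becomes (4, 7) with the inverse updated by row operations.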

2.37.4 One more pivot step


Let’s do another pivot step for this problem. Everything will now be in terms of the
basis (4, 7).

1. We calculate

   (A(4,7))−1 b = [ 5/3  −1/3] [ 30]   [50/3]
                  [−2/3   1/3] [100] = [40/3]

   and

   uT = (c(4,7))T (A(4,7))−1 = [5 0] [ 5/3  −1/3]
                                     [−2/3   1/3] = [25/3  −5/3].
3

2. To determine the entering variable, we compute the reduced costs, one at a


time. We can skip x1 , since it was just the leaving variable, so we start with
x2 :
c2 − uT A2 = 1 − (25/3 · 1 − 5/3 · 3) = −7/3
c3 − uT A3 = 3 − (25/3 · 1 − 5/3 · 1) = −11/3
c5 − uT A5 = 2 − (25/3 · 1 − 5/3 · 4) = 1/3.
Since c5 has a positive reduced cost, it will be our entering variable.
3. To find the leaving variable, we compute

   (A(4,7))−1 A5 = [ 5/3  −1/3] [1]   [1/3]
                   [−2/3   1/3] [4] = [2/3] .
Both rows are positive, so both x4 and x7 are on our shortlist of leaving
variables. Their current values in the basic solution are given by (A(4,7) )−1 b:

they’re 50/3 and 40/3. The ratio is (50/3)/(1/3) = 50 for x4 and (40/3)/(2/3) = 20
for x7 , so x7 leaves the basis.
leaves the basis.
4. We are turning the basis (4, 7) into (4, 5). To compute the new inverse matrix
(A(4,5) )−1 , we want to find the row reduction that takes

   
(A(4,7))−1 A(4,5,7) = [1  1/3  0]   ⇝   (A(4,5))−1 A(4,5,7) = [1  0  ?]
                      [0  2/3  1]                             [0  1  ?] .

To get there, we must multiply the second row by 3/2 (to turn the 2/3 into 1) and
then subtract 1/3 of that from the first row (to turn 1/3 into 0). So let’s do the
same things to (A(4,7) )−1 :

   
(A(4,7))−1 = [ 5/3  −1/3]   ⇝   (A(4,5))−1 = [ 2  −1/2]
             [−2/3   1/3]                    [−1   1/2]

We are ready for our next pivot step.
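Both the update and the resulting basic solution can be verified numerically. (The values x4 = 10, x5 = 20 below are computed from the formulas, not taken from the text.)

```python
import numpy as np

A = np.array([[1., 1., 1., 1., 1., 1., 1., 1., 0.],
              [3., 3., 1., 2., 4., 5., 5., 0., 1.]])
b = np.array([30., 100.])

old_inv = np.linalg.inv(A[:, [3, 6]])    # (A_(4,7))^{-1}

E = old_inv.copy()
E[1] *= 3 / 2                            # multiply the second row by 3/2
E[0] -= E[1] / 3                         # subtract a third of it from the first row

new_inv = np.linalg.inv(A[:, [3, 4]])    # (A_(4,5))^{-1}, computed directly
new_basic_values = new_inv @ b           # basic solution for the basis (4, 5)
```

For the basis (4, 5), the basic solution comes out to 10 kg of diamonds and 20 kg of magic rings.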

2.38 Lessons learned


The revised simplex method might be obnoxious to do by hand (and I don’t encourage
it, except to make sure that it makes sense). But there’s a few reasons to do what
we’ve done today:
• We can carefully think about the number of operations required for the simplex
method, and how it scales with the number of variables and number of equations.
• Our concerns when designing the revised simplex method—we were kind
of worried about multiplying (AB )−1 AN , and we were really worried about
computing (AB )−1 —are common to many algorithms. You are unlikely to have
to write computer code to implement the revised simplex method; however,
it is much more important to understand what operations are cheap, what
operations are expensive, and how we can avoid the expensive ones.
Later in the semester, the dictionary formulas from the beginning of the lecture will
also be put to new, unexpected uses.

2.39 The terrible trajectory


What is the worst case for the simplex method? How many pivoting steps do we
need?
Today, we’ll show that for all the pivoting rules we know, the worst case is pretty
bad: we can cook up a linear program with only d variables and 2d constraints in
which we take around 2^d steps. This is approximately as bad as it could possibly get,
since there are at most (2d choose d) < 4^d possible basic solutions for such a linear program.

First, consider the following linear program (whose feasible region forms a d-


dimensional hypercube):

maximize xd
x∈Rd
subject to 0 ≤ x1 ≤ 1
0 ≤ x2 ≤ 1
...
0 ≤ xd ≤ 1

This is not actually the worst case for the simplex method, under any reasonable
pivoting rule. The initial basic feasible solution is x = 0, the only variable with
positive reduced cost is xd , and pivoting on xd gets us to an optimal solution within
one step. But tiny modifications will make it much much worse!
First of all, there are some really inefficient trajectories possible in theory: paths
we can take going from (0, 0, . . . , 0, 0) to (0, 0, . . . , 0, 1) that visit every other vertex of
the hypercube in between. Here is an illustration of such a path in the 3-dimensional
case (when the feasible region is a cube):

(0,0,0) → (1,0,0) → (1,1,0) → (0,1,0) → (0,1,1) → (1,1,1) → (1,0,1) → (0,0,1)

We’ll call this path the “terrible trajectory". (Despite the alliteration, this is not es-
tablished terminology.) The terrible trajectory has a fairly simple recursive definition:
to follow it in d-dimensions, first follow the (d − 1)-dimensional terrible trajectory
(keeping xd = 0), then change xd to 1, then follow the (d − 1)-dimensional terrible
trajectory again, but in reverse.
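This recursive definition translates directly into code; it is the standard reflected Gray code construction:

```python
def terrible_trajectory(d):
    """The terrible trajectory on the d-dimensional hypercube, built
    recursively: the (d-1)-dimensional path with the last coordinate 0,
    then the same path reversed with the last coordinate 1."""
    if d == 0:
        return [()]
    prev = terrible_trajectory(d - 1)
    return ([v + (0,) for v in prev] +
            [v + (1,) for v in reversed(prev)])

path = terrible_trajectory(3)
```

The 3-dimensional path visits all 8 vertices, changing exactly one coordinate at each step.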
Second, note that this trajectory is actually kind of close to being reasonable for
the linear program we want to solve. Every single step of the terrible trajectory is
neutral with respect to the objective value (it does not change xd ), except for one
step, which increases it. So it’s possible that if we push the corners around a bit,
then every single step of the terrible trajectory will increase the objective value. And
at that point, we’re close to tricking the simplex method into following it.


2.40 Tricking Bland’s rule


“Easy mode" is tricking Bland’s rule into following the terrible trajectory. To make
this happen, we modify the linear program as follows:

maximize xd
x∈Rd
subject to 0.1 ≤ x1 ≤ 1 − 0.1
0.1x1 ≤ x2 ≤ 1 − 0.1x1
0.1x2 ≤ x3 ≤ 1 − 0.1x2
...
0.1xd−1 ≤ xd ≤ 1 − 0.1xd−1

The value 0.1 could be replaced by any reasonably small constant. The smaller we
make it, the closer we get to the original cube, and if we set it to 0, we just get back
that cube.
Here’s what the terrible trajectory looks like for this linear program, in 3 dimen-
sions. (It’s a bit of a lie, because with the modification, the feasible region is no
longer a perfect cube.)

(0.1,0.01,0.001) → (0.9,0.09,0.009) → (0.9,0.91,0.091) → (0.1,0.99,0.099) →
(0.1,0.99,0.901) → (0.9,0.91,0.909) → (0.9,0.09,0.991) → (0.1,0.01,0.999)

You can see that in this trajectory, the objective values steadily increase:

0.001 < 0.009 < 0.091 < 0.099 < 0.901 < 0.909 < 0.991 < 0.999.
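These coordinates can be reproduced by walking the cube vertices in trajectory order and placing each xi at its lower or upper bound (the 3-dimensional path is hardcoded below):

```python
# The eight cube vertices in trajectory order.
cube_path = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
             (0, 1, 1), (1, 1, 1), (1, 0, 1), (0, 0, 1)]

def perturbed(vertex, eps=0.1):
    """Place each x_i at its lower bound eps*x_{i-1} (bit 0) or its
    upper bound 1 - eps*x_{i-1} (bit 1); x_0 is taken to be 1."""
    coords, prev = [], 1.0
    for bit in vertex:
        prev = eps * prev if bit == 0 else 1 - eps * prev
        coords.append(prev)
    return coords

objectives = [perturbed(v)[-1] for v in cube_path]
```

The last coordinate (the objective xd) strictly increases along the whole path, as claimed.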

It turns out that, with a natural choice of variable ordering, Bland’s rule will end
up picking this trajectory. Let’s first add slack variables to the problem, rewriting
0.1xi−1 ≤ xi ≤ 1 − 0.1xi−1 as

0.1xi−1 − xi + wi0 = 0
0.1xi−1 + xi + wi1 = 1

Here, the superscript in wi0 and wi1 is not an exponent: it’s an extra index, since we
have 2d slack variables. To explain the naming convention: wi0 is the slack variable
for the lower bound on xi , and when wi0 = 0, xi is close to 0. Meanwhile, wi1 is the
slack variable for the upper bound on xi , and when wi1 = 0, xi is close to 1.
We have to use the two-phase simplex method for this problem, since 0 is not
feasible, but let’s skip ahead and suppose we arrive at the correct basic feasible solution


we wanted: the corner point (0.1, 0.01, . . . , 0.1^d ). Here, the variables x1 , x2 , . . . , xd are
all basic—and they’ll stay basic forever, because none of them can be 0. In order
to start out at this corner point, our nonbasic variables (corresponding to the tight
constraints) must be w10 , w20 , . . . , wd0 ; the slack variables w11 , w21 , . . . , wd1 are basic.
In each of the basic feasible solutions we can encounter, exactly one of wi0 and
wi1 is basic for each i. When xi ≈ 0, wi0 ’s constraint is tight, so wi0 is nonbasic. When
xi ≈ 1, wi1 ’s constraint is tight, so wi1 is nonbasic. Moving from one corner point to
an adjacent one means pivoting so that wi0 enters the basis and wi1 leaves for some i,
or vice versa.
If we put the slack variables in the order

w10 , w11 , w20 , w21 , . . . , wd0 , wd1

then Bland’s rule will pivot on w10 or w11 whenever this improves the objective value,
which is every other step. In between those, it will pivot on w20 or w21 as often as
possible, and so on. In 3 dimensions, the sequence of entering variables will be

w10 , w20 , w11 , w30 , w10 , w21 , w11

The pattern continues in higher dimensions: the subscripts will follow the sequence

1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1, 5, 1, 2, 1, 3, 1, 2, 1, . . . ,

which traces out the terrible trajectory.
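If you want to see this pattern generated, here is a short sketch (the function name is my own): the subscript of the i-th entering variable is the position of the lowest set bit of i, the classic "ruler sequence".

```python
def pivot_subscripts(d):
    """Subscripts of the entering variables along the terrible trajectory.

    The i-th pivot (for i = 1, ..., 2**d - 1) touches the variable whose
    subscript is the position of the lowest set bit of i."""
    return [(i & -i).bit_length() for i in range(1, 2 ** d)]

# In 3 dimensions there are 2**3 - 1 = 7 pivot steps:
print(pivot_subscripts(3))  # [1, 2, 1, 3, 1, 2, 1]
```

The length of the sequence, 2^d − 1, is exactly the exponential pivot count claimed above.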


To see in a bit more detail why this behavior continues, consider that for as
long as wd0 remains nonbasic, xd will be stuck at 0.1xd−1 , and so maximizing xd will
be equivalent to maximizing xd−1 . This, together with the first d − 1 inequalities in
our linear program, looks exactly like the (d − 1)-dimensional linear program.
As long as pivoting makes sense in the (d − 1)-dimensional linear program, wd0 will
not even be considered as an entering variable, because it has the lowest priority. So
we’ll follow the exact same sequence of pivots as in d − 1 dimensions. After 2^(d−1) − 1
steps, we’ll end at the point where w10, w20, . . . , wd−2^0 and wd−1^1 are nonbasic, and
none of them are worth pivoting on.
That’s the first, and only, step at which we’ll pivot on wd0 . After we do, wd1
becomes nonbasic. Now, instead of having xd = 0.1xd−1 , we have xd = 1 − 0.1xd−1 ,
so maximizing xd is equivalent to minimizing xd−1 . So the simplex method will
continue the first 2d−1 − 1 pivot steps in reverse: for every step that made sense to
maximize xd−1 , undoing it now makes equal sense to minimize xd−1 , and has the
same priority in our variable ordering.
After a total of (2^(d−1) − 1) + 1 + (2^(d−1) − 1) = 2^d − 1 steps, we’ll finally reach the
point where w10, w20, . . . , wd−1^0 and wd1 are nonbasic, which is our optimal solution.

2.41 The Klee–Minty cube


You may think that this example simply exploits a flaw in Bland’s rule (which, after
all, does not try very hard to pick a good pivot). But a famous linear program called
the Klee–Minty cube shows that the highest-cost pivoting rule is also vulnerable to
the “terrible trajectory".


The Klee–Minty linear program in d dimensions is given below (I’ve written the
inequalities backwards to make the pattern easier to see):
maximize    2^(d−1) x1 + 2^(d−2) x2 + · · · + xd      (over x ∈ Rd)
subject to  5 ≥ x1
            25 ≥ 4x1 + x2
            125 ≥ 8x1 + 4x2 + x3
            625 ≥ 16x1 + 8x2 + 4x3 + x4
            ..
            5^d ≥ 2^d x1 + 2^(d−1) x2 + 2^(d−2) x3 + · · · + 8xd−2 + 4xd−1 + xd
            x1 , x2 , . . . , xd ≥ 0.
This is also shaped kind of like a hypercube in d dimensions (and a cube in 3
dimensions), but it is very distorted. Here is a picture of the “terrible trajectory"
for the Klee–Minty cube, with coordinates given in the figure on the left, and their
objective values on the right. (For this linear program, the shape of the cube is an
incredible lie, but the adjacencies between the corners are the same.)
[Figure: the terrible trajectory on the Klee–Minty cube for d = 3, visiting the corners
(0,0,0), (5,0,0), (5,5,0), (0,25,0), (0,25,25), (5,5,65), (5,0,85), (0,0,125),
whose objective values 0, 20, 30, 50, 75, 95, 105, 125 steadily increase.]
The best way to understand why this cube tricks the highest-cost rule is to try doing
it, and see how the reduced costs change. But essentially, this construction exploits a
weakness of the pivoting rule that we’ve already talked about: it’s sensitive to changes
in units. To get the highest-cost rule to pick earlier variables over later ones, it’s
enough to set up the problem so that a very small change in x1 or x2 has the same
effect as a very large change in xd−1 or xd . However, the constraints are set up so
that the distance that it’s possible to go in the xd−1 or xd direction is always much
larger.
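To convince yourself the trajectory above is real, the following sketch (names are mine) checks for d = 3 that all eight corners from the figure are feasible and that the objective 4x1 + 2x2 + x3 increases along the path.

```python
# Klee-Minty cube for d = 3: maximize 4*x1 + 2*x2 + x3
A = [[1, 0, 0], [4, 1, 0], [8, 4, 1]]
b = [5, 25, 125]
c = [4, 2, 1]

# The eight corners of the terrible trajectory, in visiting order.
corners = [(0, 0, 0), (5, 0, 0), (5, 5, 0), (0, 25, 0),
           (0, 25, 25), (5, 5, 65), (5, 0, 85), (0, 0, 125)]

def feasible(x):
    return (all(xi >= 0 for xi in x) and
            all(sum(a * xi for a, xi in zip(row, x)) <= bi
                for row, bi in zip(A, b)))

values = [sum(ci * xi for ci, xi in zip(c, x)) for x in corners]
assert all(feasible(x) for x in corners)
assert values == sorted(values)  # the objective increases at every step
print(values)  # [0, 20, 30, 50, 75, 95, 105, 125]
```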


2.42 Closing remarks


2.42.1 Other pivoting rules
Other pivoting rules which we have not discussed exist. Some of them are slightly
less vulnerable to this strategy: for example, there is the “best neighbor" pivoting
rule, which considers all possible entering variables, and determines which one will
lead to the greatest improvement in the objective function.
The best neighbor pivoting rule cannot be fooled by any of the examples we saw
today. After all, in both examples, the optimal solution is within one pivot step of
the starting solution. The best neighbor pivoting rule will notice this and solve the
problem in one step.
However, this does not mean that the best neighbor pivoting rule is always
guaranteed to be efficient. It’s possible to cook up examples in which it, too, takes
an exponentially long time. If the best neighbor pivoting rule is within one step of
the optimal solution, it will work well; but if it’s within two steps, and the first step
does not look promising, the pivoting rule will remain happily oblivious.
There are complicated pivoting rules out there which have a better worst case:
on a linear program with n inequalities and d variables, the number of pivot steps
they take is bounded by functions such as C^√(d log n) for some constant C, which is
better than exponential. But the upper bound we dream of is “a polynomial function
in n and d", and it’s an open problem whether any pivoting rule can achieve this.

2.42.2 The average case


Shouldn’t we forget all about the worthless simplex method now that we know the
truth about its worst-case behavior?
In fact, there are other algorithms for solving linear programs that are alternatives
to the simplex method, and have better worst-case guarantees on running time. But
linear programming is an unusual case where, in practice, the worst-case-exponential
algorithm often outperforms the worst-case-polynomial algorithms.
It is very unusual to encounter examples like the Klee–Minty cube, where it
makes sense to pivot on the same variable over and over and over and over again.
One argument for this is that we don’t encounter cases like this by generating linear
programs at random, but this is not a particularly strong argument: most linear
programs we want to solve aren’t random!
Other studies have shown that the simplex method is efficient under smoothed
analysis: when we take any linear program and introduce a bit of random noise
in the coefficients, the perturbed program is, on average, solved efficiently. (You can see how this
would completely destroy our examples today, where the behavior is very sensitive
to comparisons between numbers like 0.0001 and 0.00001.)

2.43 An example of duality


Problem 2.13 You visit a chocolate factory and want to buy as much chocolate as
you can. The factory sells plain chocolate chips for $1 per pint and deluxe chocolate
chips for $2 per pint. You can only carry one pint of chocolate in your hands; if you
want more, you’ll have to buy a bag. Empty bags of all sizes are available; an empty
3-pint bag costs $4, and all other sizes cost a proportional amount.


If you have $7, what is the largest amount of chocolate you can buy and take
home with you?
Let x1 be the amount of plain chocolate chips and x2 the amount of deluxe
chocolate chips, in pints (so that we want to maximize x1 + x2 ). Let x3 be the
number of 3-pint bags you buy (if it is a fraction, we assume that you bought some
other size of bag.) Then the amount of money you brought limits these variables
to x1 + 2x2 + 4x3 ≤ 7. Also, you can carry at most 1 + 3x3 pints of chocolate, so
x1 + x2 ≤ 1 + 3x3 , or x1 + x2 − 3x3 ≤ 1.
In summary, we get the linear program below:

maximize x1 + x2
x1 ,x2 ,x3 ∈R
subject to x1 + 2x2 + 4x3 ≤ 7
x1 + x2 − 3x3 ≤ 1
x1 , x 2 , x 3 ≥ 0

Today, we’re going to be too lazy to try to solve this linear program. Instead, we
want to prove some lower and upper bounds on the objective value of the solution.
Lower bounds for a maximization problem are easy to find.
• Setting x1 = x2 = x3 = 0 satisfies both constraints, so clearly we can’t do worse
than an objective value of 0.
• We could try tweaking that: say x1 = 1 and x2 = x3 = 0, then we get an
objective value of 1.
• In general, any feasible solution gives us a lower bound on the objective value.
If we wanted to get good lower bounds this way, we’d start trying to solve the
linear program, which we said we didn’t want to do.
What about upper bounds? Well, here are some ideas:
• x1 + x2 is always less than or equal to x1 + 2x2 + 4x3. So if x1 + 2x2 + 4x3 ≤ 7,
we can immediately conclude x1 + x2 ≤ 7.
• Note that we can’t conclude from x1 + x2 − 3x3 ≤ 1 that x1 + x2 ≤ 1, because
the −3x3 term could potentially make x1 + x2 − 3x3 a lot smaller than x1 + x2 .
• However, if we average the two constraints, we get an improvement:

  (1/2)(x1 + 2x2 + 4x3 ) + (1/2)(x1 + x2 − 3x3 ) ≤ (1/2)(7 + 1) =⇒ x1 + (3/2)x2 + (1/2)x3 ≤ 4

  and we always have x1 + x2 ≤ x1 + (3/2)x2 + (1/2)x3 , so we conclude that x1 + x2 ≤ 4.
More generally, we could try to combine the two constraints with any coefficients.
As long as u1 ≥ 0 and u2 ≥ 0, we can try to combine the inequalities with weights u1
and u2 to get

u1 (x1 + 2x2 + 4x3 ) + u2 (x1 + x2 − 3x3 ) ≤ 7u1 + u2 .

Rearranging the inequality to group the x1 , x2 , and x3 terms together, we get

(u1 + u2 )x1 + (2u1 + u2 )x2 + (4u1 − 3u2 )x3 ≤ 7u1 + u2 .

This is a valid inequality, but not necessarily a useful one. We want the left-hand
side to be an upper bound on x1 + x2 if we want to apply the same logic that we did


earlier. For this to happen, the coefficients of x1 and x2 must be at least 1, and the
coefficient of x3 must be nonnegative. This gives us three constraints on u1 and u2
in order for 7u1 + u2 to be an upper bound.
What is the best upper bound we can find by combining the inequalities in this
way? The answer can be found by solving a different linear program in terms of u1
and u2 :
minimize 7u1 + u2
u1 ,u2 ∈R
subject to u1 + u2 ≥ 1
2u1 + u2 ≥ 1
4u1 − 3u2 ≥ 0
u1 , u 2 ≥ 0
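It turns out the bound of 4 is not the best this program can do. For instance, the point u = (3/7, 4/7), which I exhibit here without deriving it, is feasible for this minimization problem and certifies x1 + x2 ≤ 25/7 ≈ 3.57. A quick exact-arithmetic check:

```python
from fractions import Fraction as F

u1, u2 = F(3, 7), F(4, 7)

# Feasibility for the bounding program above:
assert u1 + u2 >= 1
assert 2 * u1 + u2 >= 1
assert 4 * u1 - 3 * u2 >= 0
assert u1 >= 0 and u2 >= 0

bound = 7 * u1 + u2
print(bound)  # 25/7, a better upper bound than 4
```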

2.44 Weak duality


In matrix form, we can write our constraints in the original problem as Ax ≤ b,
where
   
1 2 4 7
A= b =  .
1 1 −3 1
The combined inequality u1 (x1 + 2x2 + 4x3 ) + u2 (x1 + x2 − 3x3 ) ≤ 7u1 + u2 can be
written in matrix form as
 
 x1   
h i1 2 4   h i 7
 x2  ≤ u1 u2   .
u1 u2 
1 1 −3   1
x3
In matrix notation: starting from Ax ≤ b, we deduced that uTAx ≤ uT b. (As a
reminder, combining these inequalities in this way is only valid provided that u ≥ 0.)
We don’t just want to deduce valid inequalities: we want to deduce useful
inequalities. We want the coefficients of x1 , x2 , x3 on the left-hand side to be at least
as big as the coefficients of x1 , x2 , x3 in the objective function x1 + x2 , so that we
get an upper bound on x1 + x2 . The three constraints u1 + u2 ≥ 1, 2u1 + u2 ≥ 1, and
4u1 − 3u2 ≥ 0 can be written in matrix form as
 
h i1 2 4 h i
u1 u2  ≥ 1 1 0 .
1 1 −3
If cT = [1 1 0] is the cost vector in our objective function, then these constraints
can be written as uTA ≥ cT . (We can also take the transpose of both sides, and
write AT u ≥ c.) You can see that in the two linear programs we wrote down in the
previous section, the matrix of coefficients of u is the transpose of the matrix of
coefficients of x.
We can do this for any linear program. Write down a general linear program in
the form
(P)   maximize    cT x        (over x ∈ Rn )
      subject to  Ax ≤ b
                  x ≥ 0


where A is an m × n matrix, b ∈ Rm , and c ∈ Rn . The linear program of “what is


the best upper bound we can deduce on (P) by taking a linear combination of its
inequalities?" is called the dual linear program, and has the form below:

(D)   minimize    uT b        (over u ∈ Rm )            minimize    bT u
      subject to  uTA ≥ cT                  ⇐⇒         subject to  AT u ≥ c
                  u ≥ 0                                             u ≥ 0

(When (D) is the dual linear program of (P), we call (P) the primal linear
program.)
The dual linear program above is written in two forms. On the right, we took
the transpose of both sides, putting it into a form more usual for linear programs.
But when we think about the dual relationship between (P) and (D), it’s more
convenient to use the formulation on the left, because then the dual program is
distinguished by being in terms of a row vector uT instead of a column vector u.
The reasoning by which (D) gives upper bounds for (P) holds in general. Formally,
this relationship is called weak duality, and is summarized in the theorem below:
Theorem 2.7 — Weak duality of linear programs. For any x ∈ Rn which is feasible
for the primal linear program (P) (or primal feasible) and for any u ∈ Rm which
is feasible for the dual linear program (D) (or dual feasible), we have cT x ≤ uT b.
In particular, the objective value of the dual optimal solution is an upper bound
for the objective value of the primal optimal solution (assuming both optimal
solutions exist).

Proof. Since x is primal feasible, we have Ax ≤ b. Since u is dual feasible, we


have u ≥ 0. Therefore the inequality uTAx ≤ uT b is valid: we have multiplied
the inequalities in Ax ≤ b by nonnegative coefficients u1 , . . . , um and added them
together.
Since u is dual feasible, we have uTA ≥ cT . Since x is primal feasible, we have
x ≥ 0. By the same logic as above, we can deduce that uTAx ≥ cT x: again, we have
multiplied the inequalities in uTA ≥ cT by nonnegative coefficients x1 , . . . , xn , then
added them together.
Putting these together, we get cT x ≤ uTAx ≤ uT b, so cT x ≤ uT b. ■
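The chain cT x ≤ uTAx ≤ uT b is easy to check numerically on this section's example. In the sketch below, the specific points x = (25/7, 0, 6/7) and u = (3/7, 4/7) are candidates I supply, not derived in the text; since the chain closes with equality, weak duality certifies that both points are optimal.

```python
from fractions import Fraction as F

A = [[1, 2, 4], [1, 1, -3]]
b = [7, 1]
c = [1, 1, 0]
x = [F(25, 7), F(0), F(6, 7)]  # a primal feasible point
u = [F(3, 7), F(4, 7)]         # a dual feasible point

Ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]
uTA = [sum(u[i] * A[i][j] for i in range(2)) for j in range(3)]
assert all(lhs <= rhs for lhs, rhs in zip(Ax, b)) and min(x) >= 0
assert all(uTA[j] >= c[j] for j in range(3)) and min(u) >= 0

cTx = sum(cj * xj for cj, xj in zip(c, x))
uTAx = sum(ui * axi for ui, axi in zip(u, Ax))
uTb = sum(ui * bi for ui, bi in zip(u, b))
assert cTx <= uTAx <= uTb
print(cTx, uTAx, uTb)  # all equal 25/7: both points are optimal
```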

2.45 Duals of other kinds of programs


So far we’ve discussed starting with a primal program that’s a maximization problem
with nonnegative variables and an Ax ≤ b constraint. Duality is more general than
this: it can handle any kind of linear program. The only thing that never changes is
that

Variables in one program correspond to constraints in the other.

Just to give a few examples of how things change:


• Suppose that we drop the requirement that x1 ≥ 0 in our original linear program.
(We can buy negative chocolate chips, which have negative weight and cost
negative money.)


As before, an expression such as (1/2)x1 + 2x2 is not an upper bound on x1 + x2 ,
because (1/2)x1 might be less than x1 : if x1 is large and x2 is small, then (1/2)x1 + 2x2 <
x1 + x2 . However, this time, an expression such as 2x1 + 2x2 is also not an
upper bound on x1 + x2 : if x1 is a large negative number, then 2x1 < x1 .
So we see that in any inequality which gives an upper bound on x1 + x2 , the
coefficient of x1 has to be exactly 1. Our inequality u1 + u2 ≥ 1 would become
u1 + u2 = 1.
In general, an unconstrained variable gives us = constraints in the dual linear
program.
• Suppose that we reverse the first constraint to say x1 + 2x2 + 4x3 ≥ 7. (We
have unlimited money and must spend at least $7 of it.)
In this case, if we still want upper bounds on some expression in terms of x1
and x2 , we have to multiply this constraint by a negative coefficient to reverse
the inequality. Instead of wanting u1 ≥ 0, we’d want u1 ≤ 0.
In general, a ≥ constraint gives us a nonpositive variable in the dual linear
program.
• Suppose that the primal program asks to minimize x1 +x2 instead of maximizing
it.
This changes everything, because now we are trying to get lower bounds
instead of upper bounds. In particular, the relationship between (P) and (D)
is reversed: a feasible solution for (P) will always have a greater or equal
objective value compared to a feasible solution for (D).
Now ≤ constraints in (P) correspond to nonnegative variables in (D) (they
are the “natural" kind of constraint when we’re minimizing) and ≥ constraints
in (P) correspond to nonpositive variables in (D).
Actually, the relationship between (P) and (D) is symmetric: if (D) is the dual
of (P), then (P) is the dual of (D). It’s easiest to describe the duality relationship
as a relationship between a maximization problem and a minimization problem,
never mind which one of them was the primal and which was the dual.
With that in mind, here is the complete list of possible correspondences between
a constraint in one problem and a variable in the other:

Maximization problem Minimization problem


≤ constraint variable ≥ 0
= constraint unconstrained variable
≥ constraint variable ≤ 0
variable ≥ 0 ≥ constraint
unconstrained variable = constraint
variable ≤ 0 ≤ constraint

Memorizing the rules in the table is possible, but it probably isn’t very satisfying.
It is healthier to practice figuring out the correspondence for yourself, by asking the
questions in the examples above: how can we combine the constraints of the primal
problem to get bounds on its optimal value, of whichever kind makes sense?
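As one way to practice, the table can also be applied mechanically. The sketch below (the function name and data layout are my own) forms the dual of a maximization problem; running it on the chocolate-chip program from earlier reproduces the dual we derived.

```python
def dual_of_max(c, A, senses, b, var_signs):
    """Dual of: maximize c.x subject to (Ax)_i <sense_i> b_i, x_j <sign_j>.

    senses[i] is one of "<=", "=", ">=";
    var_signs[j] is one of ">=0", "free", "<=0"."""
    u_sign = {"<=": ">=0", "=": "free", ">=": "<=0"}      # row -> dual variable
    dual_sense = {">=0": ">=", "free": "=", "<=0": "<="}  # column -> dual row
    m, n = len(A), len(c)
    # Dual: minimize b.u subject to (A^T u)_j <dual_sense_j> c_j
    constraints = [([A[i][j] for i in range(m)],
                    dual_sense[var_signs[j]], c[j]) for j in range(n)]
    return b, [u_sign[s] for s in senses], constraints

obj, signs, cons = dual_of_max(c=[1, 1, 0],
                               A=[[1, 2, 4], [1, 1, -3]],
                               senses=["<=", "<="],
                               b=[7, 1],
                               var_signs=[">=0", ">=0", ">=0"])
print(obj, signs)  # minimize 7u1 + u2, with both dual variables nonnegative
print(cons)        # three dual constraints, one per primal variable
```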


2.46 Strong duality


2.46.1 A stronger theorem
In fact, a stronger relationship between (P) and (D) holds, which is appropriately
enough called strong duality. It says that:
Theorem 2.8 — Strong duality of linear programs. If either one of (P) or (D)
has an optimal solution, then so does the other one. The objective values of the
optimal solutions are equal.

In other words, the dual program is good at finding bounds on the primal program:
the best bound it finds is exactly correct.
We have not yet proved strong duality. (We will see a proof later.)
However, keep in mind the word “if" at the beginning of this theorem. We are not
guaranteed that a linear program has an optimal solution: it could be unbounded,
or infeasible!
In fact, just from weak duality, we can already deduce a relationship between
unbounded and infeasible linear programs.
• Suppose that (P) has a feasible solution x. Then we know that for every dual
feasible u, we have cT x ≤ uT b. Therefore uT b cannot be arbitrarily low: it is
bounded below by whatever cT x is. So (D) cannot be unbounded.
Conversely, if (D) is unbounded, it tells us that (P) is infeasible.
• By similar reasoning, any dual feasible u proves that (P) cannot be unbounded.
Therefore if (P) is unbounded, (D) must be infeasible.
(It is also possible for both (P) and (D) to be infeasible in exceptionally unfortunate
cases.)

2.46.2 Examples with infeasible primal and dual


Here are some very simple examples that illustrate all three possibilities where
(P) or (D) is infeasible. (I have labeled each constraint in one program with the
corresponding variable in the other program, which we’re going to keep doing in the
future for all our primal-dual pairs.)
In the pair
 
(P)   maximize    x       (over x ∈ R)          (D)   minimize    −u      (over u ∈ R)
      subject to  x ≤ −1    (u)                       subject to  u ≥ 1    (x)
                  x ≥ 0                                           u ≥ 0
the primal program is infeasible (we can’t have x ≤ −1 and x ≥ 0 at the same time)
and the dual program is unbounded (by setting u to be very large, we make −u very
small).
We can get an example where the primal program is unbounded and the dual
program is infeasible simply by reversing the roles of the two programs. Or, if we
want to keep (P) a maximization problem and (D) a minimization problem, we


could do a slight variant of the example above:


 
 maximize y minimize v
y∈R  v∈R

 


(P) subject to −y ≤ 1 (v) (D) subject to −v ≥ 1 (y)

 

v≥0
 
y≥0
 

Here, any nonnegative y is primal feasible, but no v is dual feasible.


To get an example where both linear programs are infeasible, just combine these
two examples:
 
(P)   maximize    x + y     (over x, y ∈ R)     (D)   minimize    −u + v    (over u, v ∈ R)
      subject to  x ≤ −1     (u)                      subject to  u ≥ 1      (x)
                  −y ≤ 1     (v)                                  −v ≥ 1     (y)
                  x, y ≥ 0                                        u, v ≥ 0

Here, the primal is infeasible because we can’t choose a value of x, and the dual is
infeasible because we can’t choose a value of v.

2.47 A solution we suspect to be optimal


2.47.1 A shipping problem
Problem 2.14 You own two chocolate stores: one in Atlanta, and one in Seattle.
They buy chocolate chips from three different factories. The store in Atlanta is
bigger, and buys 10 pounds of chocolate chips a day; the store in Seattle only buys 5
pounds of chocolate chips a day.
Since the two stores are very far apart, shipping costs are very different. They
are given by the table below:

from Factory #1 from Factory #2 from Factory #3


to Atlanta $7/lb $12/lb $10/lb
to Seattle $10/lb $12/lb $20/lb

Additionally, each factory can ship at most 6 pounds of chocolate chips per day
(total).
What is the most cost-efficient way to supply both stores with chocolate chips?
To model this linear program, our first step is to understand the variables. What
quantities do we need to know to specify how we’re supplying both stores? We need
a variable telling us how many pounds of chocolate chips are shipped from each
factory to each store.
Let’s write a1 , a2 , a3 for the amount shipped from factories 1, 2, 3 respectively to
Atlanta, and s1 , s2 , s3 for the amount shipped from factories 1, 2, 3 respectively to
Seattle. These are all nonnegative variables.
We have two “demand constraints": each store needs a certain amount of chocolate.
We can write these as a1 + a2 + a3 = 10 and s1 + s2 + s3 = 5. We also have three
“supply constraints": each factory can ship at most 6 pounds of chocolate per day.
We can write these as a1 + s1 ≤ 6, a2 + s2 ≤ 6, and a3 + s3 ≤ 6. We must minimize


the total cost of shipping, which we can get by multiplying the cost per pound in
each entry of the table by the amount shipped from that factory to that store.
This gives us the primal linear program (P) below:


(P)   minimize    7a1 + 12a2 + 10a3 + 10s1 + 12s2 + 20s3      (over a, s ∈ R3 )
      subject to  a1 + a2 + a3 = 10    (u1 )
                  s1 + s2 + s3 = 5     (u2 )
                  a1 + s1 ≤ 6          (v1 )
                  a2 + s2 ≤ 6          (v2 )
                  a3 + s3 ≤ 6          (v3 )
                  a1 , a2 , a3 , s1 , s2 , s3 ≥ 0

2.47.2 Taking the dual


To practice what we learned in the previous lecture, and to prepare for the next
question, let’s find the dual of this linear program. I’ve chosen variables u1 , u2 for
the two demand constraints and v1 , v2 , v3 for the three supply constraints; these are
written above in parentheses next to the constraint they “own".
Because the primal is a minimization problem, the dual will be a maximization
problem: we will maximize the lower bound on the cost in (P) that we can prove by
combining its constraints. Mechanically applying the rules in the previous lecture,
we can derive the following dual:


(D)   maximize    10u1 + 5u2 + 6v1 + 6v2 + 6v3      (over u ∈ R2 , v ∈ R3 )
      subject to  u1 + v1 ≤ 7     (a1 )
                  u1 + v2 ≤ 12    (a2 )
                  u1 + v3 ≤ 10    (a3 )
                  u2 + v1 ≤ 10    (s1 )
                  u2 + v2 ≤ 12    (s2 )
                  u2 + v3 ≤ 20    (s3 )
                  v1 , v2 , v3 ≤ 0    (u1 , u2 unconstrained)

Let’s also try to understand how these lower bounds work, so that we can better
understand those rules.
A working lower bound for (P) would be an inequality P a1 + Qa2 + Ra3 +
Ss1 + T s2 + U s3 ≥ X, where P, Q, R, S, T, U are at most the costs 7, 12, 10, 10, 12, 20
respectively. This would make P a1 + Qa2 + Ra3 + Ss1 + T s2 + U s3 a lower bound on
the primal objective function 7a1 + 12a2 + 10a3 + 10s1 + 12s2 + 20s3 , which means X
would also be a lower bound on that primal objective function. (Since all variables in
(P) are nonnegative, it’s okay if some coefficients P, Q, R, S, T, U are less than their
corresponding costs.) This gives us all six constraints in (D).
The objective function in (D) comes from seeing what the lower bound X will
be if we multiply the constraints in (P) by coefficients u1 , u2 , v1 , v2 , v3 and add them
up. Since we want the most informative (and therefore greatest) lower bound, we
want to maximize.


The trickiest part is understanding the types of variables (D) has. I’ve written
v1 , v2 , v3 ≤ 0, and this is not a typo: v1 , v2 , v3 really are nonpositive variables. Why?
It’s because the corresponding constraints a1 + s1 ≤ 6, a2 + s2 ≤ 6, and a3 + s3 ≤ 6
are all “≤" inequalities: they give upper bounds. To turn them into lower bounds
we want, we need to multiply them by a negative number to flip them.
Similarly, u1 , u2 are unconstrained, because the equations can be multiplied by
any coefficient: positive or negative.
Here’s a few feasible solutions to (D) to look at. First, suppose we take u1 = 7,
u2 = 10, and v1 = v2 = v3 = 0. This corresponds to adding together 7a1 + 7a2 + 7a3 = 70
and 10s1 + 10s2 + 10s3 = 50 to get

7a1 + 7a2 + 7a3 + 10s1 + 10s2 + 10s3 = 120.

Since the objective function 7a1 + 12a2 + 10a3 + 10s1 + 12s2 + 20s3 is at least as big
as the left-hand-side of the equation above, 120 is a lower bound on the objective
value.
If we increased u1 to 10, then that wouldn’t be true: the coefficient of a1 would be
too big. But suppose we fixed that by setting v1 = −3: subtracting the inequality
3a1 + 3s1 ≤ 18. We’d get:

10(a1 + a2 + a3 ) + 10(s1 + s2 + s3 ) − 3(a1 + s1 ) ≥ 10 · 10 + 10 · 5 − 3 · 6,

that is,

7a1 + 10a2 + 10a3 + 7s1 + 10s2 + 10s3 ≥ 132.

This lets us deduce a better lower bound!
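To double-check that combination: with multipliers (u1 , u2 , v1 ) = (10, 10, −3), every coefficient on the left-hand side stays at or below the true shipping cost, so 132 really is a valid lower bound. A sketch (variable names are mine):

```python
costs = {"a1": 7, "a2": 12, "a3": 10, "s1": 10, "s2": 12, "s3": 20}
u1, u2, v1 = 10, 10, -3  # multipliers: two demand rows and factory 1's supply row

coeffs = {"a1": u1 + v1, "a2": u1, "a3": u1,
          "s1": u2 + v1, "s2": u2, "s3": u2}
bound = 10 * u1 + 5 * u2 + 6 * v1

# Each coefficient is at most the true cost, and v1 <= 0 is what lets us
# use the "<=" supply constraint in a lower bound.
assert all(coeffs[k] <= costs[k] for k in costs) and v1 <= 0
print(bound)  # 132
```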

2.47.3 An example of complementary slackness


Problem 2.15 Adding on to the previous problem, suppose that historically, the store
in Atlanta opened first. You did the clear optimal thing and bought 6 pounds of
chocolate chips from factory #1 (with the cheapest price) and 4 pounds from factory
#3 (with the second-cheapest price).
Then, the store in Seattle opened. Since factory #1 has no more chocolate chips,
you decided to ship 5 pounds from factory #2. This seems reasonable, but now
you’re not sure. Is this the most cost-effective way to supply both stores?
In other words, is the solution (a1 , a2 , a3 , s1 , s2 , s3 ) = (6, 0, 4, 0, 5, 0), with objective
value 142, optimal?
We already have some ways of checking that. We could try to find a dictionary
which has this as its basic feasible solution, for example, and find the reduced costs.
But let’s explore another option. If there is a feasible solution to (D) with objective
value 142, then that would prove that we’ve found the optimal solution to (P).
In fact, by looking at our solution to (P), we can make some deductions about
what (D) has to do. They come in two types.
Deduction 1. Suppose that our dual solution manages to prove the inequality

P a1 + Qa2 + Ra3 + Ss1 + T s2 + U s3 ≥ 142

for some P, Q, R, S, T, U . In general, this inequality only needs to have P ≤ 7, Q ≤ 12,


and so on, to be a lower bound on (P)’s objective function. However, we can argue
that actually, since a1 = 6 in the primal solution, its coefficient P must be exactly


7. If P were a smaller number like 6.5, it would mean that our
dual solution would still prove a lower bound of 142 when the price of shipping from
factory #1 to Atlanta dropped to $6.50 per pound. But that’s impossible, since we
know that our solution to (P) gets cheaper by $3 in that case!
Similarly, the coefficients of a3 and s2 must be exact. This tells us that three of
the inequalities in (D) must actually be equations if we are to match the bound of
142: we must get u1 + v1 = 7, u1 + v3 = 10, and u2 + v2 = 12.
Deduction 2. In our solution to (P), the constraint a2 + s2 ≤ 6 is slack: actually,
a2 + s2 = 0 + 5 < 6. This means that if we use this constraint to prove an inequality

P a1 + Qa2 + Ra3 + Ss1 + T s2 + U s3 ≥ 142,

then for our solution to (P), it will actually prove a strict inequality with <. This is
impossible: it would prove that our primal solution has objective value strictly less
than 142, which is false.
Therefore we shouldn’t use that constraint in our hypothetical lower bound of
142: we should have v2 = 0 in the dual solution we want. Similarly, since a3 + s3 ≤ 6
is slack in our solution to (P), we should have v3 = 0 in the solution to (D) we’re
looking for.
Combining the two deductions: since u2 + v2 = 12 and v2 = 0, we want u2 = 12.
Since u1 + v3 = 10 and v3 = 0, we want u1 = 10. Finally, since u1 + v1 = 7 and u1 = 10,
we want v1 = −3.
This is all resting on the hypothetical assumption that our solution to (P)
is optimal and has a matching lower bound based on a solution to (D). So
it is extremely important to check our work: is the resulting dual solution
(u1 , u2 , v1 , v2 , v3 ) = (10, 12, −3, 0, 0) actually a feasible solution for (D)?
It turns out that yes: this solution satisfies all six constraints in (D), and has
an objective value of 10 · 10 + 12 · 5 − 3 · 6 = 142. Therefore our primal solution is
optimal: the shipping plan does not need to be changed!
(If we had started with a suboptimal solution to (P), we would have gotten a
dual solution that fails this final check; that’s why checking is so important.)

2.48 Complementary slackness


The technique we used in the problem above is called complementary slackness.
Complementary slackness is a limitation on what can happen if we have a feasible
solution to (P) and a feasible solution to (D) with the same objective value (in
which case they’re both optimal). In words, it says the following:
• Whenever our feasible solution to (P) has a slack constraint (the two sides of
the inequality are not equal), the corresponding dual variable must be 0 in our
feasible solution to (D).
In other words, whenever a dual variable is not zero, the corresponding primal
constraint must be tight: the two sides must be equal.
• Whenever our feasible solution to (D) has a slack constraint (the two sides of
the inequality are not equal), the corresponding primal variable must be 0 in
our feasible solution to (P).
In other words, whenever a primal variable is not zero, the corresponding dual
constraint must be tight: the two sides must be equal.


The proof is just the sort of reasoning we used in our deductions above, but
generalized. Let’s consider one specific case: when the primal and dual have the
form
cT x uT b
 
maximize
 minimize

 x∈Rn
  u∈Rm

(P) subject to Ax ≤ b (D) subject to uTA ≥ cT

 


x≥0 
u≥0
The proof is not significantly different in all other cases, there are just a lot of cases
to check.
Theorem 2.9 — Complementary slackness. Suppose that we have a feasible solution
x for (P) and a feasible solution uT for (D) with cT x = uT b. Then the following
relationship holds:
• For all i, either (Ax)i = bi or ui = 0.
• For all j, either xj = 0 or (uTA)j = cj .
Proof. Recall our proof of weak duality: we showed that cT x ≤ uT b by showing that
cT x ≤ uTAx ≤ uT b. So if cT x = uT b, then we must have equality throughout:
cT x = uTAx = uT b.
We can rewrite uTAx = uT b as uT (b − Ax) = 0. This is a dot product which we
can expand as a sum: we must have
Σ_{i=1}^{m} ui (bi − (Ax)i ) = 0.

In every term, we must have ui ≥ 0 (since u is feasible for (D)) and bi − (Ax)i ≥ 0
(since x is feasible for (P)). So every term of the sum is nonnegative, and the
only way for the sum to be 0 is to have every term equal to 0. Therefore for all i,
ui (bi − (Ax)i ) = 0, which means that either (Ax)i = bi or ui = 0.
This proves the first bullet point. For the second bullet point, we use the same
reasoning, but applied to the equation cT x = uTAx, rewritten as (uTA−cT )x = 0. ■
As we saw in today’s example, complementary slackness can be useful when we
have a candidate solution, and we want to know whether it is optimal. (Note that
if we find a feasible solution x to (P) and a feasible solution u to (D) such that
cT x = uT b, then weak duality automatically tells us that both solutions are optimal!)
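The two bullet points translate directly into a checker. Below is a sketch for the (P)/(D) pair in the form above (the function name is mine), run on the chocolate-chip example with the points x = (25/7, 0, 6/7) and u = (3/7, 4/7) that I supply; exact fractions avoid rounding issues.

```python
from fractions import Fraction as F

def complementary_slackness_holds(A, b, c, x, u):
    """Check both conditions for the pair: max c.x s.t. Ax <= b, x >= 0."""
    Ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]
    uTA = [sum(u[i] * A[i][j] for i in range(len(u))) for j in range(len(x))]
    row_ok = all(Ax[i] == b[i] or u[i] == 0 for i in range(len(b)))
    col_ok = all(x[j] == 0 or uTA[j] == c[j] for j in range(len(x)))
    return row_ok and col_ok

A = [[1, 2, 4], [1, 1, -3]]
b = [7, 1]
c = [1, 1, 0]
x = [F(25, 7), F(0), F(6, 7)]
u = [F(3, 7), F(4, 7)]
assert complementary_slackness_holds(A, b, c, x, u)
```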

2.49 The problem


Problem 2.16 You own two chocolate stores: one in Atlanta, and one in Seattle.
They buy chocolate chips from three different factories. The store in Atlanta is
bigger, and buys 10 pounds of chocolate chips a day; the store in Seattle only buys 5
pounds of chocolate chips a day.
Since the two stores are very far apart, shipping costs are very different. They
are given by the table below:
               from Factory #1   from Factory #2   from Factory #3
  to Atlanta        $7/lb            $12/lb            $10/lb
  to Seattle       $10/lb            $12/lb            $20/lb


Additionally, each factory can ship at most 6 pounds of chocolate chips per day
(total).
What is the most cost-efficient way to supply both stores with chocolate chips?

2.50 The primal LP



(P)    minimize    7a1 + 12a2 + 10a3 + 10s1 + 12s2 + 20s3
       a, s ∈ R3
       subject to  a1 + a2 + a3 = 10          (u1)
                   s1 + s2 + s3 = 5           (u2)
                   a1 + s1 ≤ 6                (v1)
                   a2 + s2 ≤ 6                (v2)
                   a3 + s3 ≤ 6                (v3)
                   a1, a2, a3, s1, s2, s3 ≥ 0

2.51 The dual LP



(D)    maximize    10u1 + 5u2 + 6v1 + 6v2 + 6v3
       u ∈ R2, v ∈ R3
       subject to  u1 + v1 ≤ 7                (a1)
                   u1 + v2 ≤ 12               (a2)
                   u1 + v3 ≤ 10               (a3)
                   u2 + v1 ≤ 10               (s1)
                   u2 + v2 ≤ 12               (s2)
                   u2 + v3 ≤ 20               (s3)
                   v1, v2, v3 ≤ 0
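The text works toward the answer via duality, but we can already preview how the certificates are used. The shipping plan and the dual values below are my own candidates (they do not appear in the text); if the assertions pass, weak duality alone proves that both are optimal:

```python
# Candidate shipping plan (my own guess, not from the text): Atlanta takes
# 6 lb from factory 1 and 4 lb from factory 3; Seattle takes 5 lb from factory 2.
a = [6, 0, 4]
s = [0, 5, 0]
cost = sum(p * q for p, q in zip([7, 12, 10], a)) + \
       sum(p * q for p, q in zip([10, 12, 20], s))

# Candidate dual solution (also a guess): u = (10, 12), v = (-3, 0, 0).
u1, u2 = 10, 12
v = [-3, 0, 0]
dual_obj = 10 * u1 + 5 * u2 + 6 * sum(v)

# primal feasibility
assert sum(a) == 10 and sum(s) == 5
assert all(a[i] + s[i] <= 6 for i in range(3))
# dual feasibility: u_k + v_i bounded by the shipping cost, and v <= 0
assert all(u1 + v[i] <= c for i, c in enumerate([7, 12, 10]))
assert all(u2 + v[i] <= c for i, c in enumerate([10, 12, 20]))
assert all(vi <= 0 for vi in v)
# equal objective values certify optimality on both sides
assert cost == dual_obj == 142
```

So $142 per day is the best possible, assuming my candidate plan is the one the dual certificate confirms.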

2.52 Finding the dual solution from the dictionary


In the previous lecture, we discussed complementary slackness, which lets us find the
optimal dual solution given the optimal primal solution. This still involves a little
bit of work solving the equations.
In practice, we might have an optimal dictionary and not just an optimal primal
solution. If this is the case, then there is a more concrete expression we can give for
the dual solution. Actually, there are two ways to find the dual solution:

• one general way that always works;


• one very quick method that works for linear programs that started in Ax ≤ b
form and added slack variables.


2.52.1 A general formula


The simplex method is applied to problems in equational form. Our dual in this case
looks like the following, with u unconstrained (each ui can be positive or negative):

       maximize   cT x                      minimize   uT b
        x∈Rn                                 u∈Rm
(P)    subject to Ax = b            (D)     subject to uT A ≥ cT
                  x ≥ 0
Recall that we have a formula for the dictionaries we get through the simplex method.
If we choose basic variables B and nonbasic variables N , then the corresponding
dictionary is

    ζ  = uT b + ((cN)T − uT AN) xN
    xB = (AB)−1 b − (AB)−1 AN xN

where uT = (cB)T (AB)−1. It is not a coincidence that the vector u used in this formula
was given the same letter as the vector u we are using for the dual solution: they
are the same!
More precisely, suppose that we have achieved an optimal dictionary for maxi-
mizing cT x. This means that our reduced costs are all less than or equal to 0: we
have no variables left worth pivoting on. In other words, (cN )T − uTAN ≤ 0, or
uTAN ≥ (cN )T . This looks a lot like the constraints in (D): more precisely, it is the
constraints, but only the ones indexed by N .
What about the constraints indexed by B? These constraints correspond to the
basic variables, which are probably positive in our optimal solution, so we expect
them to be satisfied with equality: we expect that uTAB = (cB )T . This is also true,
since uT = (cB )T (AB )−1 .

2.52.2 Strong duality


The formula we’ve just found has a theoretical use and not just a practical one. We
can use it to show that when (P) has an optimal solution x, the dual solution u we
find satisfies uT b = cT x: the primal and dual solutions have the same objective value.
To prove this, we need to remember our formula for the primal optimal solution we
read off from the dictionary: we set xN = 0, and get xB = (AB )−1 b. Therefore

cT x = (cB )T xB + (cN )T xN = (cB )T (AB )−1 b + (cN )T 0 = uT b.

This is a proof of strong duality in a special case:


Theorem 2.10 — Strong duality. Whenever (P) has an optimal solution x, it is
also true that (D) has an optimal solution u with the same objective value.

The general case of strong duality can be deduced from this one, since all linear
programs can be put into equational form. (The proof is not automatic, since when
we have a linear program in two forms, its dual also has two forms, so optimal dual
solutions also look different. We would need to check that the dual solution we got
from the simplex method can be used to “recover" a dual solution for the dual of the
original linear program.)


2.52.3 A special case


It is worthwhile to look at one other case for (P): where we start with inequalities
Ax ≤ b, and add slack variables to put it in equational form. Written as a matrix
equation, the equational form of Ax ≤ b is Ax + Iw = b. Something silly happens
with the dual when we make this change:

       maximize   cT x                        minimize   uT b
       x∈Rn, w∈Rm                              u∈Rm
(P)    subject to Ax + Iw = b        (D)      subject to uT A ≥ cT
                  x, w ≥ 0                               uT ≥ 0T

When we add slack variables, (D) still has nonnegativity constraints on u, but
instead of being treated separately as nonnegativity constraints, they are simply the
constraints corresponding to the primal variables w.
Looking at our dictionary formula, we can notice that if xi is a nonbasic variable,
then its reduced cost is given by ci − uT Ai : the right-hand side of the dual
constraint corresponding to xi , minus the left-hand side of that constraint. This is
also true if xi is a basic variable, assuming that we consider the reduced cost of a
basic variable to be 0.
What if we do this for a slack variable? The dual constraint corresponding to
slack variable wi is just the constraint ui ≥ 0. The right-hand side minus the left-hand
side is just equal to −ui . So we deduce a simplified rule for finding uT :
Theorem 2.11 If (P) started out in the inequality form Ax ≤ b, then an optimal
solution u for (D) can be read off from the optimal dictionary for (P) by taking
the negatives of the reduced costs of the slack variables.

2.52.4 Examples
Let’s start with an example in equational form. Take the following primal-dual pair:

(P)    maximize    x1 + 2x3 − x4
       x ∈ R4
       subject to  x1 +  x2 +  x3 +  x4 =  4     (u1)
                   x1 + 2x2 + 3x3 + 4x4 = 10     (u2)
                   x1, x2, x3, x4 ≥ 0

(D)    minimize    4u1 + 10u2
       u1, u2 ∈ R
       subject to  u1 +  u2 ≥  1     (x1)
                   u1 + 2u2 ≥  0     (x2)
                   u1 + 3u2 ≥  2     (x3)
                   u1 + 4u2 ≥ −1     (x4)

To get started finding the optimal dual solution, it’s enough for me to tell you that
in the optimal primal solution, x1 and x3 are basic; you don’t even need to know
what their values are! Then we use the formula uT = (cB )T(AB )−1 to compute
 −1  
3
h i h i1 1 h i
 2
− 21  h
1 1
i
u1 u2 = 1 2  = 1 2 = .
1 3 − 12 1
2
2 2

Therefore (u1 , u2 ) = ( 21 , 12 ) is the optimal solution.
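The same computation is easy to script. `Fraction` keeps the arithmetic exact, and the 2×2 inverse is written out by hand via the adjugate formula (nothing here comes from a linear-algebra library):

```python
from fractions import Fraction as F

# Basis B = {x1, x3}: the columns of A for x1 and x3, and their costs
AB = [[F(1), F(1)],
      [F(1), F(3)]]
cB = [F(1), F(2)]

# inverse of a 2x2 matrix via the adjugate formula
det = AB[0][0] * AB[1][1] - AB[0][1] * AB[1][0]
inv = [[ AB[1][1] / det, -AB[0][1] / det],
       [-AB[1][0] / det,  AB[0][0] / det]]

# u^T = (c_B)^T (A_B)^{-1}
u = [sum(cB[i] * inv[i][j] for i in range(2)) for j in range(2)]
assert u == [F(1, 2), F(1, 2)]
```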


Now let’s look at an example with slack variables. Here, we’ll actually need
to know the optimal dictionary; on the other hand, we will not have to do matrix


inverse calculations. Take the following example:



(P)    maximize    2x + 3y
       x, y ∈ R
       subject to  −x +  y ≤ 3     (u1)
                    x − 2y ≤ 2     (u2)
                    x +  y ≤ 7     (u3)
                   x, y ≥ 0

(D)    minimize    3u1 + 2u2 + 7u3
       u1, u2, u3 ∈ R
       subject to  −u1 +  u2 + u3 ≥ 2     (x)
                    u1 − 2u2 + u3 ≥ 3     (y)
                   u1, u2, u3 ≥ 0

We add slack variables to (P), solve it, and end up at the following optimal dictionary:
max ζ = 19 − (1/2)w1 − (5/2)w3
    x  =  2 + (1/2)w1 − (1/2)w3
    y  =  5 − (1/2)w1 − (1/2)w3
    w2 = 10 − (3/2)w1 − (1/2)w3

The reduced costs of w1 and w3 are −1/2 and −5/2, telling us that in the dual optimal
solution, u1 = 1/2 and u3 = 5/2. What about w2 ? It’s a basic variable, so its reduced
cost is automatically 0. Therefore (u1 , u2 , u3 ) = (1/2, 0, 5/2) is an optimal solution to
(D).

2.53 The dual simplex method


It would be fair to complain: in all of these examples, once we’ve solved (P), why
do we care about the values of (D)? We will see some meaning to those values in
future lectures. Today, we will look at a surprising use that the dual solution has,
even when we don’t care what it is.
Here (on the left) is a linear program we looked at earlier in the semester:

    minimize    4.5x1 + 3x2             min ζ =  0 + 4.5x1 +  3x2
    x1, x2 ∈ R                              w1 = −5 +    x1 +   x2
    subject to   x1 +  x2 ≥ 5               w2 = −7 +   3x1 +   x2
                3x1 +  x2 ≥ 7               w3 = −6 +    x1 +  2x2
                 x1 + 2x2 ≥ 6
                 x1, x2 ≥ 0

On the right is a very bad initial dictionary for it. It is not feasible: every single
basic variable has a negative value!
But let’s look at the bright side: all the reduced costs are positive! This is just
what we want to see in a minimization problem. It would indicate that we’ve found
an optimal solution. . . if it weren’t for that pesky “not actually feasible" problem. . .
Now that we know about finding dual solutions from the dictionary, we know
that these reduced costs are exactly the information we need to know that the dual
solution we can extract from it is feasible. More precisely, the dual solution here has
(u1 , u2 , u3 ) = (0, 0, 0); the dual constraints are u1 +3u2 +u3 ≤ 4.5 and u1 +u2 +2u3 ≤ 3,
and they are satisfied with a slack of 4.5 and 3, which are precisely the reduced costs
in our dictionary. Our “optimal-but-not-feasible" solution to the primal corresponds
to a feasible (but not optimal) dual solution!


The dual simplex method takes this idea and runs with it. Call a dictionary
dual feasible if all the reduced costs are the correct sign for optimality. We will start
with a dual feasible dictionary, and do pivot steps that preserve dual feasibility, while
getting the dictionary closer to ordinary (primal) feasibility. To do this, the overall
strategy is: choose a basic variable whose value is negative to leave the
basis, then choose an entering variable so that dual feasibility is preserved.
In this example, we’re spoiled for choice in leaving variables: all three of w1 , w2 , w3
are negative. Let’s pick w1 for no good reason. Meanwhile, we don’t know how to
choose an entering variable yet, so let’s try both. Here are the two dictionaries we
can get if either x1 (left) or x2 (right) enters the basis:

min ζ = 22.5 + 4.5w1 − 1.5x2 min ζ = 15 + 1.5x1 + 3w1


x1 = 5 + w1 − x2 x2 = 5 − x1 + w1
w2 = 8 + 3w1 − 2x2 w2 = −2 + 2x1 + w1
w3 = −1 + w1 + x2 w3 = 4 − x1 + 2w1

Choosing x1 is bad: we end up losing dual feasibility. On the other hand, dual
feasibility is preserved if we pivot on x2 . What are the rules we have to follow to
make this decision in general?
1. First, we have to pick an entering variable with a positive coefficient in
the leaving variable’s equation. In this example, both x1 and x2 had this
property, so we didn’t notice.
The reason for this rule is to make sure that our leaving variable ends up with
the correct sign of reduced cost. When we did the substitution of either x1 or
x2 in ζ’s equation, the old reduced cost was multiplied by the coefficient of w1 ,
so that coefficient had to be positive.
This is our “shortlist for entering variables", analogous to the “shortlist for
leaving variables" in the ordinary simplex method.
2. When multiple entering variables satisfy this property, we should compare
ratios. Specifically, we compute the ratio

          reduced cost of variable
    ---------------------------------------
    coefficient in leaving variable’s row

and choose the entering variable with the smallest ratio. This is the calculation
we get if we track what happens to the reduced costs of other variables, and
make sure that they stay positive.
In a maximization problem, dual feasibility means negative reduced costs, and
we want to keep them negative. In that case, all these ratios will be negative,
and we want to pick the least negative ratio (the one closest to 0). In other
words, if we take absolute values first, the rule stays the same.
Let’s do another pivot from the dictionary where x2 is a basic variable. The
only negative basic variable in that dictionary is w2 , so let’s make w2 the leaving
variable to fix that. In the equation w2 = −2 + 2x1 + w1 , both x1 and w1 have positive
coefficients. We compute the ratios: x1 ’s ratio is 1.5/2 = 0.75 and w1 ’s ratio is 3/1 = 3.
This means x1 should be our entering variable, since its ratio is smaller.


Our new dictionary is

min ζ = 16.5 + 0.75w2 + 2.25w1


x2 = 4 − 0.5w2 + 1.5w1
x1 = 1 + 0.5w2 − 0.5w1
w3 = 3 − 0.5w2 + 2.5w1

and it is both feasible and dual feasible. So we’ve found the optimal solution! It is
(x1 , x2 ) = (1, 4) with objective value 16.5.
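The two pivot rules above are enough to implement the whole method. Below is a compact sketch that runs the dual simplex directly on dictionaries; the function name, the dictionary-as-dicts encoding, and the tie-breaking by insertion order are all my own choices, and exact rationals via `fractions.Fraction` avoid rounding. With these choices it happens to retrace the pivots of this example:

```python
from fractions import Fraction as F

def dual_simplex(obj, rows):
    """Dual simplex for a minimization dictionary.

    obj  = (constant, {nonbasic name: reduced cost}), all reduced costs >= 0.
    rows = {basic name: (constant, {nonbasic name: coefficient})}, read as
           basic = constant + sum(coefficient * nonbasic).
    Returns (optimal objective value, {basic name: value}).
    """
    while True:
        # leaving variable: any basic variable with a negative value
        leaving = next((v for v, (c, _) in rows.items() if c < 0), None)
        if leaving is None:
            return obj[0], {v: c for v, (c, _) in rows.items()}
        c_r, row = rows.pop(leaving)
        # shortlist: entering candidates need a positive coefficient in this row
        shortlist = [j for j, a in row.items() if a > 0]
        if not shortlist:
            raise ValueError("no entering variable: primal infeasible")
        # ratio test on reduced costs keeps the dictionary dual feasible
        entering = min(shortlist, key=lambda j: obj[1].get(j, F(0)) / row[j])
        a = row[entering]
        # solve the leaving row for the entering variable
        new_row = {leaving: F(1) / a}
        new_row.update({j: -a_j / a for j, a_j in row.items() if j != entering})
        new_const = -c_r / a

        def substitute(const, coeffs):
            a_e = coeffs.pop(entering, F(0))
            for j, a_j in new_row.items():
                coeffs[j] = coeffs.get(j, F(0)) + a_e * a_j
            return const + a_e * new_const, coeffs

        rows = {v: substitute(c, dict(cs)) for v, (c, cs) in rows.items()}
        rows[entering] = (new_const, new_row)
        obj = substitute(obj[0], dict(obj[1]))

# the example above: min 4.5x1 + 3x2, started from the dual feasible dictionary
obj = (F(0), {"x1": F(9, 2), "x2": F(3)})
rows = {
    "w1": (F(-5), {"x1": F(1), "x2": F(1)}),
    "w2": (F(-7), {"x1": F(3), "x2": F(1)}),
    "w3": (F(-6), {"x1": F(1), "x2": F(2)}),
}
value, solution = dual_simplex(obj, rows)
assert value == F(33, 2)                          # zeta = 16.5
assert solution["x1"] == 1 and solution["x2"] == 4
```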
We will see many uses of the dual simplex method in the future, but there is
one practical use we can see directly from this example. If we used the ordinary
simplex method, we would have had to do two phases, because we don’t have an
initial basic feasible solution! Meanwhile, we do have a initial basic solution which is
dual feasible, so the dual simplex method is easier to start.
Under the hood, the dual simplex method is actually applying the simplex method
to the dual linear program. However, we don’t have to know that to know what is
going on: we don’t have to know what the dual constraints are, or the values of the
dual variables.

2.54 Example: another look at a terrible cube


With the dual simplex method in our toolbox, there are now twice as many linear
programs that we can solve in one phase. Suppose we have a linear program with
constraints Ax ≤ b, for which we decide to write down an initial dictionary where
the slack variables are basic. Then:
• Provided b ≥ 0, this initial dictionary is feasible, and so we can use the ordinary
simplex method.
• Provided we are minimizing cT x where c ≥ 0 (or, equivalently, maximizing
cT x where c ≤ 0), this initial dictionary is dual feasible, and so we can use the
dual simplex method.
Together, these two cases cover many practical examples, but we’ve already seen
problems that don’t fit in either category. For example, take the linear program
below, which is the 3-dimensional version of the problem we used to make Bland’s
rule take exponentially many steps:

    maximize    x3                          max ζ = 0 + x3
    x1,x2,x3 ∈ R                               w1  = −0.1 + x1
    subject to  0.1   ≤ x1 ≤ 1 − 0.1           w1′ =  0.9 − x1
                0.1x1 ≤ x2 ≤ 1 − 0.1x1         w2  =  0 − 0.1x1 + x2
                0.1x2 ≤ x3 ≤ 1 − 0.1x2         w2′ =  1 − 0.1x1 − x2
                x1, x2, x3 ≥ 0                 w3  =  0 − 0.1x2 + x3
                                               w3′ =  1 − 0.1x2 − x3

If we write each pair of inequalities LB ≤ xi ≤ UB as LB + wi = xi and xi + wi′ = UB


(where LB and UB stand in for our various creative lower and upper bounds on xi )
then we can write down an initial dictionary in terms of six slack variables. But this


dictionary is neither feasible (since w1 < 0) nor dual feasible (since the reduced cost
of x3 is positive).

We could handle this using a two-phase method where we add an artificial variable.
But let’s see a different method of accomplishing the same thing.

2.55 The two-phase dual simplex method

2.55.1 The plan

Just like our earlier two-phase methods, the two-phase dual simplex method involves
solving a phase one problem before we get to the problem we actually want to
solve. But this method is much more economical: though it will require an auxiliary
objective function, we will not need to add any new variables or constraints.

The logic is this: dual feasibility only depends on the objective function, and not
on the constraints. So if we replace our original objective function by an auxiliary
objective function, we can choose that auxiliary objective function to make our
dictionary dual feasible! Then, we can apply the dual simplex method and solve that
phase one problem.

Of course, in the phase one problem, we’ll be optimizing something completely


unrelated to what we actually want. This doesn’t matter: once the phase one
problem is solved, we’ll have a dictionary that’s both feasible and dual feasible. Now,
replace the auxiliary objective function by our original objective function. Unless
we’re very lucky, the resulting dictionary won’t be dual feasible—however, it will
still be feasible, because we didn’t touch the constraints! Therefore we can continue
with the ordinary simplex method to solve the problem we actually care about.

What should our auxiliary objective function be? Anything we like, as long as
it gives us a dual feasible dictionary. In general, this means minimizing any linear
expression with nonnegative coefficients on all the nonbasic (non-slack) variables.

You might be tempted to go with the simplest such linear expression: minimize
0. This is a bad choice, because the value of 0 doesn’t change as we pivot from basis
to basis. This means that the dual simplex method will constantly be doing “dual
degenerate pivots", and once again we have to worry about cycling.

A simple choice that will work as well as any other in general is to minimize the sum
of all the nonbasic variables. If you’re worried about degeneracy, you could borrow
from the lexicographic pivoting rule and decide to minimize ϵ1 x1 + ϵ2 x2 + · · · + ϵn xn ,
where ϵ1 ≫ ϵ2 ≫ · · · ≫ ϵn > 0. We will not bother doing this in our examples.


2.55.2 Solving the example


Let’s try this on our example. Our auxiliary objective will be to minimize ξ =
x1 + x2 + x3 :

min ξ = 0 + x1 + x2 + x3
w1 = −0.1 + x1
w1′ = 0.9 − x1
w2 = 0 − 0.1x1 + x2
w2′ = 1 − 0.1x1 − x2
w3 = 0 − 0.1x2 + x3
w3′ = 1 − 0.1x2 − x3

Only one basic variable currently has a negative value: w1 = −0.1. The only variable
with a positive coefficient in w1 ’s equation is x1 , so we have no choice in our pivot:
x1 enters and w1 leaves. We get x1 = 0.1 + w1 when we solve for x1 , and then we
substitute that in the other rows, getting

min ξ = 0.1 + w1 + x2 + x3
x1 = 0.1 + w1
w1′ = 0.8 − w1
w2 = −0.01 − 0.1w1 + x2
w2′ = 0.99 − 0.1w1 − x2
w3 = 0 − 0.1x2 + x3
w3′ = 1 − 0.1x2 − x3

Again, only one basic variable has a negative value: w2 = −0.01. The only variable
with a positive coefficient in w2 ’s equation is x2 , so we still have no choice in our
pivot: x2 enters and w2 leaves. We get x2 = 0.01 + 0.1w1 + w2 when we solve for x2 ,
and then we substitute that in the other rows, getting

min ξ = 0.11 + 1.1w1 + w2 + x3


x1 = 0.1 + w1
w1′ = 0.8 − w1
x2 = 0.01 + 0.1w1 + w2
w2′ = 0.98 − 0.2w1 − w2
w3 = −0.001 − 0.01w1 − 0.1w2 + x3
w3′ = 0.999 − 0.01w1 − 0.1w2 − x3

Again, only one basic variable has a (barely) negative value: w3 = −0.001. And yet
again, there is only one choice of entering variable to replace w3 as a leaving variable:
x3 . When we pivot, we solve for x3 and get x3 = 0.001 + 0.01w1 + 0.1w2 + w3 , leading


us to the following dictionary:

min ξ = 0.111 + 1.11w1 + 1.1w2 + w3


x1 = 0.1 + w1
w1′ = 0.8 − w1
x2 = 0.01 + 0.1w1 + w2
w2′ = 0.98 − 0.2w1 − w2
x3 = 0.001 + 0.01w1 + 0.1w2 + w3
w3′ = 0.998 − 0.02w1 − 0.2w2 − w3

We are done with phase one! We don’t really care that we’ve optimized ξ, but the
good news for us is that the dictionary is feasible. When we replace the objective
function with ζ = x3 = 0.001 + 0.01w1 + 0.1w2 + w3 , it remains feasible:

max ζ = 0.001 + 0.01w1 + 0.1w2 + w3


x1 = 0.1 + w1
w1′ = 0.8 − w1
x2 = 0.01 + 0.1w1 + w2
w2′ = 0.98 − 0.2w1 − w2
x3 = 0.001 + 0.01w1 + 0.1w2 + w3
w3′ = 0.998 − 0.02w1 − 0.2w2 − w3

Now we are ready to maximize ζ, and if we like, we can use Bland’s rule and take
the most ridiculous number of steps possible to do it.
By the way, if you notice, the actual artificial objective function ξ never played
a role in our pivoting. This is not guaranteed to happen, but it’s not particularly
surprising: if we want the point (x1 , x2 , x3 ) = (0, 0, 0) to be dual feasible for ξ, then
ξ will probably be minimized at some point close to (0, 0, 0). This means that we
shouldn’t stress out too much about our choice of ξ in problems like this.
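As a final sanity check on phase one, the basic solution (x1 , x2 , x3 ) = (0.1, 0.01, 0.001) read off the last dictionary should satisfy all the original cube constraints. Exact fractions sidestep floating-point noise:

```python
from fractions import Fraction as F

# basic solution read off the final phase-one dictionary
x1, x2, x3 = F(1, 10), F(1, 100), F(1, 1000)

assert F(1, 10) <= x1 <= 1 - F(1, 10)
assert F(1, 10) * x1 <= x2 <= 1 - F(1, 10) * x1
assert F(1, 10) * x2 <= x3 <= 1 - F(1, 10) * x2
```

All three lower bounds hold with equality, which is exactly what the zero-valued slacks w1, w2, w3 in the dictionary say.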

2.55.3 Problems in equational form


This strategy isn’t quite enough to deal with constraints of the form Ax = b where
x ≥ 0.
For systems like this, we have a three-step procedure:
1. Begin by row-reducing the system: just the usual Gaussian elimination you
learn in linear algebra. You will end up solving the equations for some set of
basic variables, which is not particularly under your control.
2. Now add an auxiliary objective function to build a dual feasible dictionary out
of the basic solution you got. For example, this objective could be to minimize
the sum of whichever variables end up nonbasic after step 1.
Use the dual simplex method to solve this phase one problem.
3. From the optimal dictionary, replace the auxiliary objective function by what-
ever objective function you originally wanted to optimize. As before, use the
ordinary simplex method to solve the phase two problem.


2.56 Warm starts and row generation


Problem 2.17 Once again, you’re in charge of a factory that produces gizmos, widgets,
and doodads. Let’s say you must produce at least 10 of each object a day, but you’re
limited to using at most 2000 pounds of iron: it takes 10 pounds to make a doodad,
20 pounds to make a gizmo, and 30 pounds to make a widget. You must maximize
the profit: $20 per doodad, $30 per gizmo, and $40 per widget.
Let’s suppose you set up the problem (on the left), find the optimal dictionary
(on the right), and bring the optimal solution to your boss:

    maximize    20xd + 30xg + 40xw            max ζ = 3700 − 10w2 − 20w3 − 2w4
    xd,xg,xw ∈ R                                  w1 = 140 − 2w2 − 3w3 − 0.1w4
    subject to  xd ≥ 10                           xd = 150 − 2w2 − 3w3 − 0.1w4
                xg ≥ 10                           xg = 10 + w2
                xw ≥ 10                           xw = 10 + w3
                10xd + 20xg + 30xw ≤ 2000
                xd, xg, xw ≥ 0

Your boss takes one look at the printout and says, “No, no, that won’t work. What
kind of fool doesn’t know that the doohickey will overheat if you run it for more than
10 hours a day? The table is right there on the machine: you need the doohickey
for 15 minutes per doodad and 5 minutes per gizmo. Go fix this, I need the factory
schedule yesterday!"
Do you have to start from scratch? No. Let’s take the new constraint and
insert it into our final dictionary. Okay, this takes a bit of work: the constraint is
15xd + 5xg ≤ 600, which we write as w5 = 600 − 15xd − 5xg for a new slack variable
w5 . But xd and xg are also basic, so we substitute their equations in:

w5 = 600 − 15(150 − 2w2 − 3w3 − 0.1w4 ) − 5(10 + w2 )

which simplifies to w5 = −1700 + 25w2 + 45w3 + 1.5w4 . Now we can add that to our
dictionary.
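The substitution is pure bookkeeping, so it is easy to mechanize. A sketch (the row-as-dict encoding and the `combine` helper are mine):

```python
from fractions import Fraction as F

def combine(terms):
    """Linear combination of rows; each row maps a variable name to a coefficient."""
    out = {}
    for scale, row in terms:
        for var, coef in row.items():
            out[var] = out.get(var, F(0)) + scale * coef
    return out

# optimal rows for the basic variables appearing in the new constraint
xd = {"const": F(150), "w2": F(-2), "w3": F(-3), "w4": F(-1, 10)}
xg = {"const": F(10), "w2": F(1)}

# w5 = 600 - 15 xd - 5 xg, rewritten over the nonbasic variables
w5 = combine([(F(1), {"const": F(600)}), (F(-15), xd), (F(-5), xg)])
assert w5 == {"const": F(-1700), "w2": F(25), "w3": F(45), "w4": F(3, 2)}
```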
The new dictionary is

max ζ = 3700 − 10w2 − 20w3 − 2w4


w1 = 140 − 2w2 − 3w3 − 0.1w4
xd = 150 − 2w2 − 3w3 − 0.1w4
xg = 10 + w2
xw = 10 + w3
w5 = −1700 + 25w2 + 45w3 + 1.5w4

which, of course, is no longer feasible: the doohickey constraint is not satisfied.


(We’re trying to use it for over 38 hours per day. Poor doohickey!) But the dictionary
is still dual feasible, because it started out dual feasible before we added the new
constraint. So we can try to fix it with the dual simplex method.
In this case, only a single step is required. The leaving variable must be w5 . All
three nonbasic variables have positive coefficients, so we compare the ratios: 10/25,
20/45, and 2/1.5. The smallest ratio is 10/25, so w2 is the entering variable. After one pivot step,
we get:

max ζ = 3020 − 2w3 − 1.4w4 − 0.4w5
    w1 =  4 + 0.6w3 + 0.02w4 − 0.08w5
    xd = 14 + 0.6w3 + 0.02w4 − 0.08w5
    xg = 78 − 1.8w3 − 0.06w4 + 0.04w5
    xw = 10 + w3
    w2 = 68 − 1.8w3 − 0.06w4 + 0.04w5
The point (xd , xg , xw ) = (14, 78, 10) is our new optimal solution that takes doohickey
usage into account.
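It is worth confirming that the warm-start answer really satisfies every constraint, including the one we forgot:

```python
xd, xg, xw = 14, 78, 10

# original constraints
assert xd >= 10 and xg >= 10 and xw >= 10
assert 10 * xd + 20 * xg + 30 * xw <= 2000   # iron: binding, equals 2000
# the forgotten doohickey constraint
assert 15 * xd + 5 * xg <= 600               # binding, equals 600

profit = 20 * xd + 30 * xg + 40 * xw
assert profit == 3020                        # matches the new dictionary's zeta
```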
This is called solving the problem from a warm start: we start from a dictionary
that was optimal for a closely-related problem. There is no guarantee that a warm
start will be faster than solving the problem from scratch, and in particular, there is
no guarantee that a single pivot step will be enough, like it was here. However, in
practice, it often seems to work well.
The situation where we forget about a constraint is contrived. Sometimes this
situation occurs when you have to solve a linear program for many similar problems.
Suppose that the factory constraints change a little every day; then always doing a
warm start from the previous day’s optimal solution might be a good idea.
A related concept is row generation,7 which we’ll see several times later this
semester. Here, we know in advance that our set of constraints is incomplete, but we
don’t know which of many possible constraints we’ll need. So we solve our linear
program, then look at an optimal solution to see if it violates any constraints we
left out. (The way this is done depends on the exact application, and often involves
techniques outside linear programming.) Then, we add some missing constraint and
use the dual simplex method to continue.

2.57 Sensitivity analysis of the costs


2.57.1 Intuition
Let’s begin with a linear program we’ve already solved much earlier in the semester.
Below is the linear program, along with a diagram of its feasible region:
    maximize    2x + 3y
    x, y ∈ R
    subject to  −x +  y ≤ 3
                 x − 2y ≤ 2
                 x +  y ≤ 7
                x, y ≥ 0

[Figure: the feasible region, a polygon with corner points (0, 0), (2, 0), (16/3, 5/3), (2, 5), and (0, 3).]

Today, we look at the following question: what happens to the optimal solution when
we change the linear program slightly?
7 Note: the term “row generation" is often used specifically for a technique called Benders
decomposition, which is one particular example of the general idea.


We will start by looking at one specific change in the objective function. Rather
than maximizing 2x + 3y, what happens when we maximize (2 + δ)x + 3y for some
real number δ? (In an economic application, this could happen when the profitability
of a product goes up or down.)
Because this is a small linear program with only five corner points, we can answer
this question in the silliest way. At each corner point, we can compute the value of
(2 + δ)x + 3y, as a function of δ. We get the diagram on the left:

[Figure: left, the feasible region with each corner point labeled by its objective value as a function of δ: ζ = 0 at (0, 0), ζ = 4 + 2δ at (2, 0), ζ = 47/3 + 16δ/3 at (16/3, 5/3), ζ = 19 + 2δ at (2, 5), and ζ = 9 at (0, 3); right, a plot of the maximum of these linear functions for −8 ≤ δ ≤ 4.]

We know that the optimal solution is going to be the best of the corner points.
Therefore, as a function of δ, the maximum value of (2 + δ)x + 3y is
max{0, 4 + 2δ, 9, 19 + 2δ, 47/3 + 16δ/3}. The diagram on the right plots the resulting function.
What we see is a piecewise linear function whose slope increases from left to
right. It has a segment where it is equal to ζ = 9, a segment where it is equal to
ζ = 19 + 2δ, and a segment where it acts as ζ = 47/3 + 16δ/3. Those segments correspond
to the ranges of δ where each of those corner points is optimal.
(There are no segments where the function is equal to ζ = 0 or ζ = 4 + 2δ. That’s
because (0, 0) and (2, 0) will never be optimal as long as y has a positive coefficient
in the objective function: (0, 0) is always worse than (0, 3) and (2, 0) is always worse
than (2, 5).)
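The brute-force comparison over corner points is easy to script. Working with exact fractions, we can confirm that the prediction 19 + 2δ is the true optimum exactly on −5 ≤ δ ≤ 1 (the corner coordinates are those of the feasible region above; `best_value` is my own helper name):

```python
from fractions import Fraction as F

corners = [(F(0), F(0)), (F(2), F(0)), (F(16, 3), F(5, 3)),
           (F(2), F(5)), (F(0), F(3))]

def best_value(delta):
    """Maximum of (2 + delta) x + 3 y over the corner points."""
    return max((2 + delta) * x + 3 * y for x, y in corners)

# (2, 5) stays optimal exactly for -5 <= delta <= 1:
assert all(best_value(F(d, 10)) == 19 + 2 * F(d, 10) for d in range(-50, 11))
assert best_value(F(2)) > 19 + 2 * F(2)     # delta = 2: another corner wins
assert best_value(F(-6)) > 19 + 2 * F(-6)   # delta = -6: (0, 3) wins
```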

2.57.2 What we can compute


In larger examples, there will be too many corner points for this type of analysis
to be reasonable. We do not want to go back to the naive way of solving linear
programs, where we must find all of the corner points.
So let’s ask a different question: what part of the information above can we
compute from just knowing the optimal solution (x, y) = (2, 5)?
We can guess that for small values of δ, the optimal solution probably won’t
change. So if the objective value is 2 · 2 + 3 · 5 = 19 right now, we expect to get
(2 + δ) · 2 + 3 · 5 = 19 + 2δ, provided that δ is not too large.
We don’t know what “not too large" is yet, and we don’t know what happens
for too-large δ. However, we can say a little bit about it. Even if the point (2, 5)
stops being optimal for large δ, it will still remain feasible. Therefore, for any value
of δ, the maximum objective value possible will always be at least 19 + 2δ. (Our
prediction is “pessimistic".)
Here is a rule that generalizes this logic:


Theorem 2.12 Suppose that our linear program has optimal solution x∗ with
objective value cT x∗ = ζ ∗ . If we change the coefficient of xi in the objective
function from ci to ci + δ, then the new objective value will be at least as good
as ζ ∗ + δx∗i . (This is a lower bound when maximizing, and an upper bound when
minimizing.)
For small δ, we can hope that the new objective value will be exactly ζ ∗ + δx∗i .

As a special case, if a variable xi is nonbasic, then we expect the objective value


to stay the same when the cost of xi changes by a small amount.

2.57.3 Ranging
Even a prediction for small values of δ can be more precise than this. From looking
at the plot we got by comparing all corner points, we see that the prediction of
19 + 2δ is exactly correct when −5 ≤ δ ≤ 1. We can try to determine the interval
where our prediction is correct: this is called ranging.
To do this, we’ll need to look at the optimal dictionary, not just the optimal
solution. The key idea is that to understand the effect of adding δx to the objective
function, we can add δx to the equation for ζ in our optimal dictionary.
Of course, this is no longer a properly formed dictionary, because x is a basic
variable and should never appear on the right-hand side. So we substitute the
equation x = 2 + (1/2)w1 − (1/2)w3 into δx and simplify. This results in the dictionary on
the right:

max ζ = 19 − (1/2)w1 − (5/2)w3 + δx      max ζ = 19 + 2δ + (−1/2 + δ/2)w1 + (−5/2 − δ/2)w3
    y  =  5 − (1/2)w1 − (1/2)w3              y  =  5 − (1/2)w1 − (1/2)w3
    w2 = 10 − (3/2)w1 − (1/2)w3              w2 = 10 − (3/2)w1 − (1/2)w3
    x  =  2 + (1/2)w1 − (1/2)w3              x  =  2 + (1/2)w1 − (1/2)w3

We see that the new dictionary is optimal as long as −1/2 + δ/2 ≤ 0 and −5/2 − δ/2 ≤ 0.
These inequalities simplify to δ ≤ 1 and δ ≥ −5, so that’s the range where 19 + 2δ is
guaranteed to predict the correct objective value.
As a shortcut for this calculation, we can compute ratios:
Theorem 2.13 To determine the range where the prediction ζ ∗ + δx∗i is guaranteed
to be correct, when xi is a basic variable, compute the values

    − (reduced cost of xj) / (coefficient of xj in the equation for xi)

for every nonbasic variable xj . Negative values are lower bounds on δ and positive
values are upper bounds on δ; the prediction is guaranteed to be correct if all the
bounds hold.
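The ratio computation can be checked mechanically. The sketch below uses the sign convention that reproduces the worked range −5 ≤ δ ≤ 1 above (variable names are my own):

```python
from fractions import Fraction as F

# x's row in the optimal dictionary: x = 2 + (1/2) w1 - (1/2) w3
coef_in_x_row = {"w1": F(1, 2), "w3": F(-1, 2)}
reduced_cost = {"w1": F(-1, 2), "w3": F(-5, 2)}

lower, upper = -float("inf"), float("inf")
for j, a in coef_in_x_row.items():
    # bound on delta that keeps j's new reduced cost <= 0
    ratio = -reduced_cost[j] / a
    if a > 0:
        upper = min(upper, ratio)
    else:
        lower = max(lower, ratio)

assert (lower, upper) == (F(-5), F(1))   # prediction 19 + 2*delta is exact here
```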

2.57.4 Nonbasic variables


The variable x was a bit special because it is a basic variable in our optimal solution.
The case of nonbasic variables is different, and actually easier to handle.
First of all, if a variable is nonbasic in the optimal solution, its value in the


optimal solution is 0, so our prediction says that the objective value will not change
when the cost of the variable changes.
For how long will this be true? Well, adding δ to the cost of a nonbasic variable is
the same as adding δ to its reduced cost. So we have just one restriction: the addition
of δ can’t change the sign of the reduced cost.
To see this in our example, we’ll have to do something a bit unnatural and see what
happens when we add a δw1 term to the objective function. (Usually, slack variables
don’t show up at all in the objective function, but we’ll make an exception here.)
This changes the reduced cost: ζ = 19 − (1/2)w1 − (5/2)w3 becomes ζ = 19 + (δ − 1/2)w1 − (5/2)w3 .
Therefore our solution remains optimal as long as δ − 1/2 ≤ 0, or δ ≤ 1/2. (In particular,
making δ an arbitrarily big negative number will never change anything.)

2.58 Sensitivity analysis of the constraints


2.58.1 The dual variables as “shadow costs”
What if we make a different kind of change: a change in one of our constraints?
For example, suppose we change the constraint −x + y ≤ 3 in our linear program to
−x + y ≤ 3 + δ. What happens to the objective value?
This is much harder to understand by looking at a plot of the feasible region.
The problem is that as we move the constraint −x + y ≤ 3, not only do some corner
points themselves begin to move, but some of them vanish, and new ones appear!
This is not as prominent in a 2-dimensional problem, because every boundary of the
feasible region just has two corner points on it. In a 3-dimensional feasible region,
even the number of corner points on the boundary of the moving constraint can vary!
The trick is to look at the dual solution. Remember: if the primal program has
constraints Ax ≤ b or Ax = b, the dual program has objective function uT b. This
means that even though changing the vector b does weird unpredictable things to
primal solutions, it leaves the dual solutions entirely unchanged—all that changes is
their objective values.
As a result, the previous rules for changes in the objective function give us rules
about changes in the constraints, provided that we use the dual solution instead.
Theorem 2.14 Suppose that our linear program has an optimal solution with
objective value ζ ∗ , and a dual optimal solution u∗ . If we change the bound in the
i-th constraint from bi to bi + δ, then the new objective value will be no better than
ζ ∗ + δu∗i . (This is an upper bound when maximizing, and a lower bound when
minimizing.)
For small δ, we can hope that the new objective value will be exactly ζ ∗ + δu∗i .

In the example, we are looking at a maximization problem with Ax ≤ b constraints,


so we can use a simple method for finding the dual solution: it is the negative of the
reduced costs of the slack variables. Looking at our optimal dictionary again, we see
that w1 has reduced cost −1/2, w2 has reduced cost 0, and w3 has reduced cost −5/2.
Therefore the optimal dual solution is u = (1/2, 0, 5/2).
In particular, when the constraint −x + y ≤ 3 changes to −x + y ≤ 3 + δ, we
predict that the objective value will change to 19 + (1/2)δ. At least, we suspect that this
is true when δ is not too large.


This theorem has one detail we did not mention: it is an “optimistic" prediction.
Intuitively, for large changes in δ, one of two things will happen:
• If there is a very large positive change, say to −x + y ≤ 10000, what can happen
is that the constraint might stop being relevant. In this problem, even if the
−x + y ≤ 3 constraint is removed entirely, the optimal solution will just be
(0, 7) with an objective value of 21. Therefore at some point the objective value
stops increasing at 21, falling short of the prediction 19 + (1/2)δ.
• If there is a very large negative change, say to −x + y ≤ −10000, what can
happen is that the problem might become infeasible. In this case, we consider
the maximum objective value to be −∞, which is infinitely worse than the
prediction 19 + (1/2)δ.
The underlying reason that this prediction is “optimistic" while our previous pre-
dictions were “pessimistic" is that the dual program is the reverse of the primal: it
minimizes when the primal maximizes, and vice versa.
The dual variables are also called shadow costs due to an economic application
of this analysis. Suppose that our objective value 2x + 3y measures the profit we make
from a particular solution. Then the prediction “when the constraint −x + y ≤ 3
changes to −x + y ≤ 3 + δ, we predict that the objective value will change to 19 + (1/2)δ"
means that an increase by δ in this constraint is worth (1/2)δ dollars to us. In other
words, we should be willing to pay up to 50 cents for each unit increase in this upper
bound.
The term “shadow cost" refers to the idea that we’re putting an inferred value on
something that might not have a clear inherent value to us. For example, if the ith
constraint is given by the number of hours our employees can work, then the dual
variable ui tells us the price of labor: a limit on how much we should be willing to
pay one of them to work an additional hour.

2.58.2 Ranging with slack variables


Just as when analyzing the costs, we can put a range on the values of δ for which
this prediction is exact. To avoid getting bogged down in calculations, we will only
do this for cases like our example: when we have a problem with constraints Ax ≤ b
which we put into equational form as Ax + w = b.
In a situation like this, we have a different way to do the calculation. In equational
form, changing −x + y + w1 = 3 to −x + y + w1 = 3 + δ is equivalent to changing it to
−x + y + (w1 − δ) = 3, and we can track the effects of this change by just replacing
w1 by w1 − δ everywhere in the dictionary:

max ζ = 19 − (1/2)(w1 − δ) − (5/2)w3        max ζ = (19 + (1/2)δ) − (1/2)w1 − (5/2)w3

    y = 5 − (1/2)(w1 − δ) − (1/2)w3             y = (5 + (1/2)δ) − (1/2)w1 − (1/2)w3
   w2 = 10 − (3/2)(w1 − δ) − (1/2)w3           w2 = (10 + (3/2)δ) − (3/2)w1 − (1/2)w3
    x = 2 + (1/2)(w1 − δ) − (1/2)w3             x = (2 − (1/2)δ) + (1/2)w1 − (1/2)w3

This gives us the same conclusion: that the objective value changes to 19 + (1/2)δ.
However, we can also see what happens to the basic variables. What does this tell
us about the limits on δ? Well, this prediction stops being valid if our basic solution
stops being feasible: if any of the basic variables become negative. So we get the


following constraints on δ:

    5 + (1/2)δ ≥ 0,    10 + (3/2)δ ≥ 0,    2 − (1/2)δ ≥ 0.

This gives us a lower bound δ ≥ −10, another lower bound δ ≥ −20/3, and an upper
bound δ ≤ 4. Therefore our prediction is valid for δ in the range [−20/3, 4].
For a basic variable like w2 , the same method works, but we can work things
out more intuitively. Seeing w2 = 10 in the dictionary tells us that the constraint
x − 2y ≤ 2 is not even tight at the moment: x − 2y is 10 lower than its upper bound.
So changing the right-hand side by a small amount will not affect the objective value,
and this stays true, provided that we don’t reduce it by more than 10.
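The feasibility bookkeeping above can also be scripted. This is a sketch of my own, not part of the text: it recomputes the valid range of δ from the basic values in the optimal dictionary and the coefficients of w1 in each row.

```python
from fractions import Fraction as F

def rhs_range(basic_values, w_coeffs):
    """Range of delta keeping the basic solution feasible after replacing
    w1 by w1 - delta: each basic value becomes value - coeff*delta >= 0.

    basic_values[i]: value of basic variable i in the optimal dictionary
    w_coeffs[i]:     coefficient of w1 in the dictionary row for variable i
    """
    lo, hi = None, None                 # None stands for -inf / +inf
    for var, val in basic_values.items():
        a = w_coeffs[var]
        if a == 0:
            continue
        bound = val / a                 # from val - a*delta >= 0
        if a > 0:
            hi = bound if hi is None else min(hi, bound)
        else:
            lo = bound if lo is None else max(lo, bound)
    return lo, hi

# Rows of the optimal dictionary: y = 5 - (1/2)w1 - ..., w2 = 10 - (3/2)w1 - ...,
# x = 2 + (1/2)w1 - ...
lo, hi = rhs_range({"y": F(5), "w2": F(10), "x": F(2)},
                   {"y": F(-1, 2), "w2": F(-3, 2), "x": F(1, 2)})
print(lo, hi)  # -20/3 4
```

This reproduces the range [−20/3, 4] derived in the text.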

2.59 Introduction to games


2.59.1 Matrix games
We are going to consider two-player games where the players simultaneously pick
strategies, and are rewarded with a payoff based on their choices. A classic example
is the prisoner’s dilemma (which we’re only going to briefly mention).
Here, the two players (Alice and Bob) are suspected of a bank robbery (2 years
in prison), but arrested only for some minor thing like tax evasion (1 year in prison).
The prosecutor offers each of them a bargain: testify against the other, and the tax
evasion charges are dropped. As a result:
• If Alice and Bob both stay silent, each spends 1 year in prison.
• If Alice testifies against Bob, Alice goes free and Bob spends 3 years in prison
(and vice versa).
• If Alice and Bob both testify against each other, they both spend 2 years in
prison.
We can represent spending t years in prison by a payoff of −t. (The goal is always
to maximize your payoff, so bad outcomes get negative payoffs.)
We can represent the prisoner’s dilemma (and more generally, any such game) by
a payoff matrix. Assign each of Alice’s strategies a row, and each of Bob’s strategies
a column. At the intersection of the row and the column, record Alice and Bob’s
respective payoffs:

Bob stays silent Bob testifies


Alice stays silent (−1, −1) (−3, 0)
Alice testifies (0, −3) (−2, −2)

It is best for both players to stay silent, rather than both testify, so they should
agree not to testify. However, each individual player is better off testifying, no matter
what the other does, so they should betray that agreement. However, that leaves
both players worse off. This weird behavior gives the prisoner’s dilemma a rich and
complicated dynamic. . .

2.59.2 Zero-sum games


. . . which we’re going to ignore entirely, because in this class we will only fully analyze
zero-sum games.


A zero-sum game is one in which any hope of cooperation between the players
is eliminated, because Alice’s payoff is the negative of Bob’s payoff. (They sum to
0.) Whatever outcome helps one player, hurts the other player equally.
Though a lot of the concepts we will introduce make sense for general matrix
games, we will mostly rely on the assumption that Alice should expect Bob to make
the choice that’s worst for Alice, and vice versa. This is a bad assumption in general:
Bob wants to make the choice that’s best for Bob, whether or not that hurts Alice!
But in the case of zero-sum games, the two are equivalent.
In any case, our goal in analyzing these games will be to determine what Alice
and Bob’s optimal strategies are, and what the resulting payoff is for both players.

2.60 Strategies
We will look at some examples of zero-sum games to illustrate a few cases where we
can find the optimal strategies easily, and the general case which is more complicated.

2.60.1 Dominated strategies


Consider the following game, called “higher number". It’s a pretty stupid game.
Alice and Bob each hold up one, two, or three fingers. The player holding up fewer
fingers gives $1 to the other player. The payoff matrix for the two players is:

Bob: one Bob: two Bob: three


Alice: one (0, 0) (−1, 1) (−1, 1)
Alice: two (1, −1) (0, 0) (−1, 1)
Alice: three (1, −1) (1, −1) (0, 0)

It is immediate to see that both players want to hold up more fingers rather
than fewer. The point of this example is just to introduce two terms that help in
analyzing matrix games.
• We say that one strategy of a player dominates another strategy if, no matter
what the other player does, the first strategy is better than (or at least as good
as) the second.
Here, “two" dominates “one" and “three" dominates both “one" and “two", for
both players: no matter what the other player does, you can never go wrong
by holding up more fingers.
• If a strategy dominates every other strategy, we call it a dominant strategy.
Here, “three" is the dominant strategy for both players.
If we can identify a dominant strategy for a game, there is nothing left to analyze—
at least in the case of zero-sum games. A player with an optimal strategy should
always play it. With that assumption, the only thing left for the other player to do
is to pick the best response to that strategy.
Even if there is no dominant strategy, it makes sense to eliminate from considera-
tion any strategy that’s dominated by another strategy. After all, that other strategy
is never worse. This can help us simplify the problem and make the payoff matrix
smaller.
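A dominance check like the one just described is simple to code. This sketch is an illustration of my own (not from the text); it tests whether one of Alice's row strategies dominates another in the "higher number" payoff matrix.

```python
def dominates(A, i, k):
    """True if row strategy i dominates row strategy k in payoff matrix A,
    i.e. i is at least as good as k against every column strategy."""
    return all(A[i][j] >= A[k][j] for j in range(len(A[0])))

# Alice's payoffs in "higher number" (rows/cols: one, two, three fingers):
A = [[0, -1, -1],
     [1, 0, -1],
     [1, 1, 0]]
print(dominates(A, 2, 0), dominates(A, 2, 1))  # True True: "three" dominates both
```

Iteratively deleting dominated rows (and, symmetrically, columns for the other player) is the standard way to shrink a payoff matrix before further analysis.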


2.60.2 Saddle points


Suppose that Alice and Bob are two generals. Alice is defending a city, which has
two gates: a north gate and a south gate. Alice can choose to defend just one gate,
or split her forces and defend both. Meanwhile, Bob can either send his entire army
to attack one gate in force, or split his forces and send raiding groups against
both gates.
This will be a zero-sum game, so we’ll think of the possible outcomes as one
player winning or losing “points" from the other. The possible outcomes are, in
order:
• If Bob attacks an undefended gate, he captures the city. (Alice loses 5 points
to Bob.)
• If Bob sends raiding groups and finds an undefended gate, that raiding group
pillages supplies from the city. (Alice loses 2 points to Bob.)
• If Bob sends raiding groups against two gates, and both are partially defended,
Bob’s soldiers retreat with minimal losses. (No points.)
• If Bob attacks a partially defended gate, Alice holds him off and Bob suffers
casualties. (Alice wins 1 point from Bob.)
• If Bob sends his entire army against Alice’s, he takes heavy losses and the siege
is broken. (Alice wins 5 points from Bob.)
In the form of a payoff matrix, we have:

Bob: attack north Bob: raid Bob: attack south


Alice: defend north (5, −5) (−2, 2) (−5, 5)
Alice: split forces (1, −1) (0, 0) (1, −1)
Alice: defend south (−5, 5) (−2, 2) (5, −5)

There is no dominant strategy in this game: for each player, both of the “all-in"
strategies that focus on one gate have the highest reward, but also the highest risk.
To analyze this game, we can make the following observations:
• If Alice splits her forces, she is certain not to lose points. She might even win
a few points if Bob attacks in force.
• If Bob sends raiding groups, he is also certain not to lose points (and so Alice
is certain not to win any points). If one of the gates is unprotected, Bob might
even win some points.
This means that for Alice, splitting her forces is an optimal strategy. On the one
hand, it guarantees her at least 0 points. On the other hand, Bob has a counter-
strategy that prevents Alice from gaining any positive amount of points, so she can’t
possibly do better. For Bob, sending raiding groups is an optimal strategy for the
same reason. (Either one of the generals can even announce the strategy they take
in advance: it will not make any difference.)
In general, we call an outcome like “Alice splits her forces and Bob sends raiding
groups" a saddle point. A saddle point in a zero-sum game is an outcome that’s
the worst outcome for Alice in its row, but the best outcome for Alice in its column.
Whenever there is a saddle point, either player can guarantee an outcome at least as
good as the saddle point by choosing its row or its column as a strategy.
It is a coincidence that the saddle point gives 0 points to both players in this
example. We could modify the problem by having Bob give Alice one extra point in


each outcome (to model the idea that Bob’s army is running low on supplies, while
Alice is in a well-stocked city with months of food). The same outcome would stay a
saddle point, because relative comparisons between two outcomes come out the same
way. However, the saddle point would now give payoffs of (1, −1) to Alice and Bob.
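The saddle-point condition can be checked mechanically. This sketch is my own illustration (not from the text); it scans Alice's payoff matrix from the two-generals game for entries that are a row minimum and a column maximum at once.

```python
def saddle_points(A):
    """All (i, j) where A[i][j] is the minimum of row i and the
    maximum of column j (a saddle point of a zero-sum game)."""
    m, n = len(A), len(A[0])
    return [(i, j) for i in range(m) for j in range(n)
            if A[i][j] == min(A[i])
            and A[i][j] == max(A[r][j] for r in range(m))]

# Alice's payoffs: rows = defend north / split forces / defend south,
# columns = attack north / raid / attack south.
A = [[5, -2, -5],
     [1, 0, 1],
     [-5, -2, 5]]
print(saddle_points(A))  # [(1, 1)]: Alice splits forces, Bob raids
```

The single saddle point found is exactly the outcome identified in the discussion above.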

2.60.3 Mixed strategies


Finally, we will consider the “odd-even game". In this game, Alice and Bob each
hold up either 1 or 2 fingers. They add up the number of fingers held up; Alice wins
if it is odd, and Bob wins if it is even. Additionally, Alice’s choice determines the
stakes: if Alice holds up 1 finger, she wins or loses $1, and if Alice holds up 2 fingers,
she wins or loses $2. The payoff matrix is
Bob: 1 Bob: 2
Alice: 1 (−1, 1) (1, −1)
Alice: 2 (2, −2) (−2, 2)
This game has neither a dominant strategy for either player, nor a saddle point.
What can we do?
If Bob knew what Alice would play, he could play the same move in response
and win either $1 or $2. So there’s no best move for Alice. Instead, one possible
strategy for Alice is to flip a coin (invisibly from Bob) before making her move. If
it’s heads, Alice holds up 1 finger; if it’s tails, Alice holds up 2 fingers. If Alice is
doing this, we can compute the expected payoff for each strategy Bob chooses:
• If Bob holds up 1 finger, there is a 1/2 chance that Alice will lose $1 and a 1/2
chance that she will win $2. So the average amount of money Alice wins is
(1/2)(−1) + (1/2)(2) = 1/2: in expectation, she wins 50 cents.
• If Bob holds up 2 fingers, there is a 1/2 chance that Alice will win $1 and a 1/2
chance that Alice will lose $2. So the average amount of money Alice wins is
(1/2)(1) + (1/2)(−2) = −1/2: in expectation, she loses 50 cents.
The coin-flip strategy has better worst-case behavior for Alice. If Alice is planning
to hold up some number of fingers for certain, and Bob knows it, Bob can hold up
the same number of fingers, and win either $1 or $2. But if Alice is planning to flip
a coin, and Bob knows it, the best Bob can do is hold up 2 fingers, which guarantees
him an average of 50 cents.
We can generalize this idea. Suppose that Alice and Bob are playing a matrix
game where Alice has m choices numbered 1, 2, . . . , m and Bob has n choices numbered
1, 2, . . . , n. We can record all of Alice’s payoffs in an m × n matrix A, where aij is
Alice’s payoff when Alice chooses choice i and Bob chooses choice j. We can record
Bob’s payoffs in an m × n matrix B, though in the case of zero-sum games we will
just have B = −A.
A pure strategy for Alice is the strategy of just picking one of the m choices
she has. A mixed strategy involves picking between her options at random.
There are many mixed strategies, each one represented by a probability vector
y ∈ Rm . (The conditions for y to be a probability vector are that y ≥ 0 and that
y1 + y2 + · · · + ym = 1.) We interpret a probability vector as saying that Alice picks
choice i with probability yi .
The calculation we did for Alice’s expected payoffs (depending on Bob’s actions)
is generalized by the vector-matrix product yTA. This is a row vector of length n in


which the j th entry is given by the sum


y1 a1j + y2 a2j + · · · + ym amj
which is exactly the calculation that tells us Alice’s expected payoff when Alice plays
the mixed strategy given by y and Bob picks choice j. It is the sum of the products
(probability of outcome) × (payoff from outcome)
over all outcomes in the j th column.
Bob can also play a mixed strategy. Bob’s mixed strategies can be described
by probability vectors x ∈ Rn , since Bob has n options. The matrix-vector product
Ax gives us a column vector of Alice’s expected payoffs when Bob plays this mixed
strategy.
Finally, suppose that both players are playing mixed strategies. First, let’s
think about this in the odd-even game. Here, Alice’s strategy is described by a
probability vector (y1 , y2 ) ∈ R2 , and Bob’s strategy is described by a probability
vector (x1 , x2 ) ∈ R2 . Alice can get payoffs of:
• −1 with probability y1x1;
• 1 with probability y1x2;
• 2 with probability y2x1;
• −2 with probability y2x2.
Therefore Alice’s expected payoff is −y1 x1 + y1 x2 + 2y2 x1 − 2y2 x2 . This is exactly
the value given by the product

    [ y1  y2 ] [ −1   1 ] [ x1 ]
               [  2  −2 ] [ x2 ]

This generalizes: if the matrix of Alice’s payoffs is the m × n matrix A, Alice’s


strategy is given by a probability vector y ∈ Rm , and Bob’s strategy is given bby a
probability vector x ∈ Rn , then Alice’s expected payoff is given by yTAx.
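These formulas are easy to evaluate numerically. The sketch below is an illustration of my own (not from the text): it computes yᵀA and yᵀAx for the odd-even game, reproducing the coin-flip calculation, with Bob always holding up 1 finger.

```python
import numpy as np

# Payoff matrix for the odd-even game (rows: Alice's choices, cols: Bob's).
A = np.array([[-1, 1],
              [2, -2]])

y = np.array([0.5, 0.5])   # Alice flips a fair coin
x = np.array([1.0, 0.0])   # Bob always holds up 1 finger

# Alice's expected payoff against each of Bob's pure strategies:
print(y @ A)               # 0.5 against "1", -0.5 against "2"
# Expected payoff when both mixed strategies are played:
print(y @ A @ x)
```

Against Bob's pure strategy "1" the expected payoff is the 50 cents computed above.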

2.61 The “best worst-case" (maximin) strategy


2.61.1 Alice’s plan
Let us summarize the previous lecture. In a matrix game, two players named Alice
and Bob simultaneously pick one of several choices; we abstractly say that Alice’s
options are {1, 2, . . . , m} and Bob’s options are {1, 2, . . . , n}. There is an m × n
matrix A listing Alice’s payoffs: when Alice picks option i ∈ {1, 2, . . . , m} and Bob
picks option j ∈ {1, 2, . . . , n}, Alice receives a payoff of aij (which can be positive or
negative). Alice would like to maximize her payoff.
(There is also an m × n matrix B for Bob’s payoffs, and in a zero-sum game, we
have B = −A. For now, we will ignore B, because we are only focusing on Alice’s
choices.)
A mixed strategy for Alice, corresponding to choosing between her strategies
at random with some probabilities, is given by a vector y ∈ Rm with y1 + y2 + · · · +
ym = 1 and y ≥ 0. A mixed strategy for Bob is given by a vector x ∈ Rn with
x1 + x2 + · · · + xn = 1 and x ≥ 0. We saw that if Alice and Bob play these mixed
strategies against each other, the expected payoff for Alice is given by yTAx.


Alice would like to choose the best probability vector y for her mixed strategy.
But she can’t use the formula yTAx to evaluate how good a strategy is directly,
because she doesn’t know which vector x ∈ Rn represents Bob’s strategy. Instead,
one thing Alice can do is evaluate her strategies by what happens if Bob knows
her strategy and chooses the option that’s worst for Alice. This is called Alice’s
maximin strategy.
For all we know, this could be a terrible idea! In fact, we can cook up lots of
examples of games that aren’t zero-sum, in which the maximin strategy does terribly.
Consider the “Win, Lose, and Copy" game, defined as follows. Bob has two
options: “Win $100" and “Lose $100". Alice also has two options: “Don’t play" and
“Copy Bob". This game has payoff matrix

Bob: Win $100 Bob: Lose $100


Alice: Don’t play (0, 100) (0, −100)
Alice: Copy Bob (100, 100) (−100, −100)

In this game, Alice’s maximin strategy is not to play: copying Bob has the risk
that Bob will pick “Lose $100", and not playing can’t lose any money. But Bob isn’t
stupid and will never pick “Lose $100", so “Copy Bob" is guaranteed to earn Alice
$100 as well.
We will see that in the case of zero-sum games, the maximin strategy is reasonable,
but it will take us some time to get there.

2.61.2 Writing down a linear program


First, here is a claim to simplify the analysis.
Theorem 2.15 For any strategy played by Alice, at least one worst-case response
from Bob is a pure strategy.

Proof. If Alice is playing the mixed strategy given by some probability vector y ∈ Rm ,
then yTA is the vector of her possible payoffs, depending on Bob’s choices. If Bob
plays the pure strategy “always pick option i" for some i ∈ {1, 2, . . . , n}, then Alice’s
payoff is going to be (yTA)i : the ith component of this vector.
If Bob plays a mixed strategy x ∈ Rn , then Alice’s expected payoff yTAx is
a weighted average of the payoffs above, where payoff (yTA)i is multiplied by
weight xi . The weighted average can’t be lower than the smallest of the payoffs
(yTA)1 , (yTA)2 , . . . , (yTA)n . So the smallest of those payoffs is Alice’s worst case.
Or in other words: if option j ∈ {1, 2, . . . , n} is the best response for Bob, then
playing a mixed strategy x instead could be described as “with probability xj , do the
best thing; the rest of the time, do something worse." That’s obviously suboptimal.
(Technically, multiple options could be tied for Bob’s best response, in which
case choosing randomly between them is just as good as choosing one of them; but
choosing randomly will never be strictly better.) ■

Based on this claim, the worst-case payoff when Alice plays a mixed strategy
given by y ∈ Rm is
    min{ (yTA)1 , (yTA)2 , . . . , (yTA)n }.


Therefore Alice can find a maximin strategy by solving the following optimization
problem:
    maximize over y ∈ Rm :  min{ (yTA)1 , (yTA)2 , . . . , (yTA)n }
    subject to  y1 + y2 + · · · + ym = 1
                y ≥ 0
This is not a linear program. But there is a trick that turns it into one!
To maximize the minimum of multiple options, we can maximize an auxiliary
variable u, subject to the constraint that u is smaller than each option. In other words,
we maximize u, adding the constraints u ≤ (yTA)1 , u ≤ (yTA)2 , . . . , u ≤ (yTA)n .
Let 1 denote the vector (1, 1, . . . , 1) in which every component is 1. (In this case,
we’ll want to have 1 ∈ Rn , but in general, we’ll abuse notation and write 1 for the
all-ones vector of whichever dimension we need.) Then a quick way to write down
these constraints on u is u1T ≤ yTA. Similarly, the constraint y1 + y2 + · · · + ym = 1
can be written as yT 1 = 1, where 1 ∈ Rm .
So we get the following linear program:
    maximize over y ∈ Rm , u ∈ R:  u
    subject to  u1T ≤ yTA
                yT 1 = 1
                y ≥ 0

2.61.3 Example: the odd-even game


As an example, let’s consider the odd-even game from the previous lecture, with the
payoff matrix below:

Bob: 1 Bob: 2
Alice: 1 (−1, 1) (1, −1)
Alice: 2 (2, −2) (−2, 2)

If Alice plays a mixed strategy given by the probability vector (y1 , y2 ), what
happens?
• When Bob counters with playing “1" (holding up 1 finger), Alice’s expected
payoff is y1 (−1) + y2 (2) = −y1 + 2y2 .
• When Bob counters with playing “2” (holding up 2 fingers), Alice’s expected
payoff is y1 (1) + y2 (−2) = y1 − 2y2 .
Alice wants to choose (y1 , y2 ) to maximize her expected payoff in the worst case:
she wants to maximize min{−y1 + 2y2 , y1 − 2y2 }. For this, we write down the linear
program

    maximize over u, y1 , y2 ∈ R:  u
    subject to  −y1 + 2y2 ≥ u
                y1 − 2y2 ≥ u
                y1 + y2 = 1
                y1 , y2 ≥ 0
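This linear program is small enough to hand to a solver. The sketch below is my own illustration (the text does not prescribe any software); it uses scipy's `linprog`, which minimizes, so we minimize −u instead of maximizing u.

```python
from scipy.optimize import linprog

# Variables z = (y1, y2, u); scipy minimizes, so minimize -u to maximize u.
c = [0, 0, -1]
# u <= -y1 + 2*y2  and  u <= y1 - 2*y2, written as A_ub @ z <= 0:
A_ub = [[1, -2, 1],
        [-1, 2, 1]]
b_ub = [0, 0]
A_eq = [[1, 1, 0]]                               # y1 + y2 = 1
b_eq = [1]
bounds = [(0, None), (0, None), (None, None)]    # y >= 0, u free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x)   # close to [2/3, 1/3, 0]
```

The solver returns Alice's maximin strategy (y1, y2) = (2/3, 1/3) with guaranteed payoff u = 0.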


2.62 Zero-sum games and duality


Of course, Bob can write down a similar linear program to Alice’s. In the case of a
general matrix game, Bob’s payoffs are given by the m × n matrix B unrelated to A.
When Bob plays a mixed strategy given by a probability vector x ∈ Rn , and Alice
counters it by doing whatever is worst for Bob, Bob’s expected payoff is
min {(Bx)1 , (Bx)2 , . . . , (Bx)m } .
There is not much else to say in the general case, but in the case of zero-sum games,
the matrix B is equal to −A. In that case, the formula above can be rewritten as
min {(−Ax)1 , (−Ax)2 , . . . , (−Ax)m } = − max {(Ax)1 , (Ax)2 , . . . , (Ax)m } .
For Bob, trying to maximize − max {(Ax)1 , (Ax)2 , . . . , (Ax)m } is equivalent to trying
to minimize the same expression without the negative sign. In other words, it is
equivalent for Bob to pick the mixed strategy so that the highest payoff Alice can
ensure for herself is as low as possible. This is called a minimax strategy.
(In a general matrix game, playing the minimax strategy is needlessly spiteful:
you’re not trying to do what’s best for yourself, but instead trying to hurt your
opponent as much as possible. In a zero-sum game, these are one and the same.)
We can use the same trick as earlier to rewrite this as a linear program: to
minimize
max {(Ax)1 , (Ax)2 , . . . , (Ax)m } ,
Bob can minimize a quantity v ∈ R with the constraints that v ≥ (Ax)1 , v ≥ (Ax)2 ,
. . . , v ≥ (Ax)m . This results in the following linear program for Bob’s minimax
strategy:
    minimize over x ∈ Rn , v ∈ R:  v
    subject to  Ax ≤ 1v
                1T x = 1
                x ≥ 0
Once again, let’s look at this linear program in the case of the odd-even game. Here,
if Bob plays a mixed strategy given by the probability vector (x1 , x2 ), Alice can do
the following:
• Counter by playing “1” (holding up 1 finger), obtaining an expected payoff of
−x1 + x2 .
• Counter by playing “2” (holding up 2 fingers), obtaining an expected payoff of
2x1 − 2x2 .
Bob wants to choose (x1 , x2 ) to minimize Alice’s expected payoff in her best case:
he wants to minimize max{−x1 + x2 , 2x1 − 2x2 }. For this, we write down the linear
program
minimize v
v,x1 ,x2 ∈R
subject to −x1 + x2 ≤ v
2x1 − 2x2 ≤ v
x1 + x2 = 1
x1 , x 2 ≥ 0
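Bob's program can be solved the same way. As before, this is a sketch of my own using scipy's `linprog` (an assumption, not a tool the text uses), with v kept as a free variable.

```python
from scipy.optimize import linprog

# Variables z = (x1, x2, v); minimize v directly.
c = [0, 0, 1]
# -x1 + x2 <= v  and  2*x1 - 2*x2 <= v, written as A_ub @ z <= 0:
A_ub = [[-1, 1, -1],
        [2, -2, -1]]
b_ub = [0, 0]
A_eq = [[1, 1, 0]]                               # x1 + x2 = 1
b_eq = [1]
bounds = [(0, None), (0, None), (None, None)]    # x >= 0, v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x)   # close to [1/2, 1/2, 0]
```

Bob's minimax strategy is the fair coin flip (1/2, 1/2), holding Alice's best expected payoff down to v = 0.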


The amazing thing that happens is that Alice and Bob’s linear programs are duals
of each other! We can see this by rewriting them side-by-side in a more standardized
form, and pairing variables and constraints appropriately:

(P)  maximize over u, y1 , y2 ∈ R:  u
     subject to  y1 + y2 = 1          (v)
                 u + y1 − 2y2 ≤ 0     (x1)
                 u − y1 + 2y2 ≤ 0     (x2)
                 y1 , y2 ≥ 0

(D)  minimize over v, x1 , x2 ∈ R:  v
     subject to  x1 + x2 = 1          (u)
                 v + x1 − x2 ≥ 0      (y1)
                 v − 2x1 + 2x2 ≥ 0    (y2)
                 x1 , x2 ≥ 0
In particular, pay attention to u and v: these are unconstrained variables, and each
one is paired with an equation constraint in the other linear program.
It is also true that in general, the linear program for Alice’s maximin strategy
is dual to the linear program for Bob’s minimax strategy. We can write the linear
programs in the following form to expose the duality:

(P)  maximize over u ∈ R, y ∈ Rm :  u
     subject to  yT 1 = 1            (v)
                 u1T − yTA ≤ 0T      (x)
                 y ≥ 0

(D)  minimize over v ∈ R, x ∈ Rn :  v
     subject to  1T x = 1            (u)
                 1v − Ax ≥ 0         (y)
                 x ≥ 0
The matrix of coefficients in the constraints is, in both cases, the matrix with block
structure

    [ 0    1T ]
    [ 1    −A ]

where A is Alice’s payoff matrix.

2.63 Solving the linear program


Let’s return to Alice and Bob’s linear programs for the odd-even game:

(P)  maximize over u, y1 , y2 ∈ R:  u
     subject to  y1 + y2 = 1          (v)
                 u + y1 − 2y2 ≤ 0     (x1)
                 u − y1 + 2y2 ≤ 0     (x2)
                 y1 , y2 ≥ 0

(D)  minimize over v, x1 , x2 ∈ R:  v
     subject to  x1 + x2 = 1          (u)
                 v + x1 − x2 ≥ 0      (y1)
                 v − 2x1 + 2x2 ≥ 0    (y2)
                 x1 , x2 ≥ 0
Can we figure out their optimal strategies?


We can solve these linear programs directly, but they have a couple of unsavory
features: each has an equational constraint and an unrestricted variable. These are
both things we can deal with, but in this case, there are shortcuts to take that make
our life easier.


2.63.1 Complementary slackness


In cases where one player has only two options to choose from, we can solve the
other player’s linear program quickly by using complementary slackness. In this case,
this works for both Alice and Bob; let’s use it to solve Alice’s linear program.
The logic is this: looking at the payoff matrix, we can quickly check that Bob
does not have a dominant strategy. Therefore Bob’s optimal strategy should be a
mixed strategy with x1 > 0 and x2 > 0.
By complementary slackness, this means that in Alice’s linear program, u + y1 −
2y2 = 0 and u − y1 + 2y2 = 0. From this, we can deduce that −y1 + 2y2 = y1 − 2y2 ,
because both are equal to u. (Intuitively, this means Alice wants to pick (y1 , y2 ) so
that Bob is indifferent between his options.) This simplifies to y1 = 2y2 , and from
y1 + y2 = 1, we conclude that Alice’s optimal strategy is given by (y1 , y2 ) = (2/3, 1/3):
she should hold up one finger 2/3 of the time and two fingers 1/3 of the time.
This reasoning does not always work in bigger problems. If Bob has many options,
even if there is no option that dominates another by itself, it’s possible that one of
Bob’s options is dominated by a mixed strategy formed by several others. So we
can’t necessarily say that Alice wants to make Bob indifferent between all of his
options: only the ones that make sense for Bob to sometimes play!
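As a sanity check on the indifference argument, this small sketch (my own illustration, not from the text) carries out the arithmetic with exact fractions, starting from y1 = 2y2 and y1 + y2 = 1.

```python
from fractions import Fraction as F

# Complementary slackness gave y1 = 2*y2; combined with y1 + y2 = 1:
y2 = F(1, 3)
y1 = 2 * y2

# Alice's expected payoff against each of Bob's pure strategies:
vs_bob_1 = -y1 + 2 * y2        # Bob holds up 1 finger
vs_bob_2 = y1 - 2 * y2         # Bob holds up 2 fingers

print(y1, y2, vs_bob_1, vs_bob_2)  # 2/3 1/3 0 0
```

Both expected payoffs come out equal (to 0), confirming that this strategy makes Bob indifferent.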

2.63.2 Simplifying the linear program


There are a couple of things we can do to make the linear program easier to solve;
they can be done independently or together.
First, it would be nice if we could make u a nonnegative variable in Alice’s linear
program, because then we can use it in the simplex method. There is a trick to do
this: modify the game so that every time Alice and Bob play, Bob gives Alice an
extra $2 for free. The new payoff matrix becomes:

Bob: 1 Bob: 2
Alice: 1 (1, −1) (3, −3)
Alice: 2 (4, −4) (0, 0)

In this payoff matrix, Alice’s payoffs are always nonnegative, so u will also be
nonnegative. However, Alice’s optimal strategy is unaffected by Bob giving her $2
unconditionally. Therefore solving the linear program with this new payoff matrix
will produce the same optimal (y1 , y2 ), and now we can assume that u ≥ 0.
Also, though in general an equational constraint is hard to deal with, in this
problem we don’t have to go to the trouble of using a two-phase method. Any pure
strategy, such as for example (y1 , y2 , . . . , ym ) = (1, 0, . . . , 0), is a feasible solution to
Alice’s linear program. This means that after adding slack variables w1 , w2 , . . . , wn
to Alice’s linear program, we can solve for the basic variables (y1 , w1 , w2 , . . . , wn ) to
get an initial feasible basis.

2.64 Example: a fruit-shipping problem


Suppose you grow oranges in California and want to ship them by airplane to Atlanta.
Let’s say that there are no direct flights from Los Angeles (LAX) to Atlanta (ATL),
so the oranges will have to pass through an intermediate airport; for simplicity,


let’s limit those to Chicago (ORD), New York (JFK), and Dallas (DFW). On our
simplified map of the airports, we will draw arrows representing the possible flights.
Every airplane can carry the same amount of oranges, but the number of flights
between the airports is limited. To keep track of that number, we label each arrow
from airport to airport with the number of flights we can use to carry oranges going
in that direction. This labeling is shown below on the left. On the right, we see one
possible (though maybe not very efficient) way that the flights could be used: there
are 5 airplanes carrying oranges from LAX to DFW, 5 more carrying those same
oranges from DFW to ORD, and then the oranges are split up; 4 airplanes’ worth of
oranges are taken directly to ATL, and the remaining oranges are shipped through
JFK.

[Figure: left, the airport network with each route labeled by its capacity (the number of flights available); right, the example shipment, with each route labeled x/c for flow x and capacity c: 5 flights LAX→DFW, 5 DFW→ORD, 4 ORD→ATL, 1 ORD→JFK, 1 JFK→ATL, and all other routes unused.]

We would like to get as many oranges as possible from LAX to ATL per day.
How can we formulate this question as a linear program?
Our variables in this problem will have to be the variables we need to specify
feasible solutions such as the one we see in the diagram on the right: for every
pair of airports connected by flights, we need to know the number of airplanes
used to ship oranges from one to the other. For example, we might have a variable
xLAX,DFW to represent how much is being shipped from LAX to DFW; in the diagram,
xLAX,DFW = 5. Sometimes, there are flights going both ways, which need different
variables; for example, we will have separate variables xDFW,JFK and xJFK,DFW .
What are the constraints on these variables? The most straightforward ones
are the ones coming from the numbers in the diagram: for example, xLAX,DFW ≤ 6
because we are told that there are only 6 flights from LAX to DFW. (Also, of course,
all these variables should be nonnegative.) But if those were all the constraints, then
we’d “solve” the problem by setting each variable to its maximum value—after all,
why not?
There are some logical problems with doing that. For example, we’d have 12
flights leaving LAX carrying oranges each day, but 13 flights entering ATL with
oranges. Where do the extra oranges come from? The problem is that we forgot to
add any constraints saying that oranges can’t appear out of nowhere or vanish into
nowhere.
The “conservation of oranges” constraints will look at every intermediate airport
and say: the number of oranges going in should equal the number of oranges going
out. For example, at JFK, this constraint would be

xORD,JFK + xDFW,JFK = xJFK,DFW + xJFK,ATL .

We do not have such a constraint at LAX or ATL. There are no oranges that can be
shipped into LAX, but that does not mean that xLAX,ORD +xLAX,DFW (the number of


oranges shipped out of LAX) should be 0: in fact, we want this quantity to be as large
as possible! To solve our problem, we can either maximize xLAX,ORD + xLAX,DFW
or, equivalently, maximize xORD,ATL + xJFK,ATL + xDFW,ATL : the number of oranges
arriving in ATL. With the “conservation of oranges” constraint in play, these should
be one and the same.
Now we have all the ingredients we need for this linear program: our first example
of a maximum flow problem.

2.65 Maximum flow problems


Let’s generalize the problem we have described in the previous section.
In general, our problem will be described by a network, which includes two
pieces of information:
• A set N of nodes. In a maximum-flow problem, there are always two special
nodes in N : a source s and a sink t. Our goal is to transport as much stuff
as possible from s to t.
• A set A of arcs. Each arc is a pair (i, j), where i and j are nodes,
representing the possibility of going from i to j. Each arc (i, j) has a capacity
cij . We say that the arc (i, j) starts at i and ends at j; we assume that there
are no arcs that start at t or end at s, because they’d be pointless.
To describe what we are doing with the network, we have a variable xij for every arc
(i, j) ∈ A representing the amount of stuff being moved along arc (i, j). We call xij
the flow from i to j, and in general we call the vector x a flow. A feasible flow
must satisfy the following constraints:
• Nonnegativity constraints: xij ≥ 0 for all (i, j) ∈ A.
• Capacity constraints: xij ≤ cij for all (i, j) ∈ A.
• Flow conservation constraints, which take a bit more effort to write down.
For every node k ∈ N other than s and t, we have
Σ_{i:(i,k)∈A} xik = Σ_{j:(k,j)∈A} xkj .

Here, “i : (i, k) ∈ A” means “this sum ranges over all nodes i such that (i, k) is
an arc”. Similarly, the second sum ranges over all nodes j such that (k, j) is an
arc.
The meaning of this constraint is that the total flow going into node k is equal
to the total flow going out of node k.
Subject to all three sets of constraints, we maximize the total flow out of the source:
Σ_{j:(s,j)∈A} xsj .

We call this quantity the value of flow x.
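The general program above can be handed to any LP solver. Below is an illustrative sketch (our own, with scipy assumed) on a small example network, the same four-node network that reappears later in this chapter.

```python
# Maximum flow as a linear program, on an assumed example network:
# s->a:10, s->c:10, a->b:10, a->d:4, c->b:4, c->d:4, b->t:12, d->t:8.
from scipy.optimize import linprog

arcs = [("s","a",10), ("s","c",10), ("a","b",10), ("a","d",4),
        ("c","b",4), ("c","d",4), ("b","t",12), ("d","t",8)]
idx = {(i, j): k for k, (i, j, _) in enumerate(arcs)}

# Objective: maximize flow out of s, i.e. minimize -(x_sa + x_sc).
c = [0.0] * len(arcs)
for (i, j, _) in arcs:
    if i == "s":
        c[idx[(i, j)]] = -1.0

# Flow conservation at every intermediate node: inflow - outflow = 0.
nodes = ["a", "b", "c", "d"]
A_eq = []
for k in nodes:
    row = [0.0] * len(arcs)
    for (i, j, _) in arcs:
        if j == k: row[idx[(i, j)]] += 1.0   # flow into k
        if i == k: row[idx[(i, j)]] -= 1.0   # flow out of k
    A_eq.append(row)
b_eq = [0.0] * len(nodes)

bounds = [(0, cap) for (_, _, cap) in arcs]  # 0 <= x_ij <= c_ij
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
max_flow = -res.fun
print(max_flow)                              # 18.0 for this network
```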

2.66 Cuts and their capacities


We move on to the following question: how can we tell if a feasible flow is optimal?
Consider the first diagram below. Here, just as in the orange shipping problem,
we label an arc (i, j) by xij /cij : the flow along the arc and the capacity of the arc.


Here, the value of the flow is 7; we can guarantee that this is best possible, because
the total capacity of edges leaving s is 7, so at most 7 flow can leave s.

[Figure: two small networks, arcs labeled x/c. Left: s→a 3/3, s→b 4/4, a→t 2/2, a→b 1/5, b→t 5/7. Right: s→a 3/3, s→b 4/7, a→t 3/5, a→b 0/3, b→t 4/4.]

Now consider the second diagram. Here, things are a bit more complicated. However,
we can still see a “bottleneck” in the flow if we think of splitting up the nodes into
{s, b} and {a, t}. The total flow going from {s, b} to {a, t} is 7 (which is still the
value of the flow). This cannot be increased, because all edges from s or b to a or t
(namely, (s, a) and (b, t)) have the maximum flow possible, and all edges going the
other way (namely, (a, b)) have zero flow.
The generalization of this notion is a cut. A cut in a network is a partition of
the node set N into two sets, S and T , such that s ∈ S and t ∈ T . (Being a partition
requires that S ∩ T = ∅ and S ∪ T = N : each node is in exactly one of the two sets
S, T .)
The capacity of a cut (S, T ) is, informally, the maximum amount of flow that
can move from S to T . Formally, it is the sum
c(S, T ) = Σ_{i∈S} Σ_{j∈T} cij

where we take cij to be 0 if (i, j) ∉ A. For example, in the second diagram above,
the cut ({s, b}, {a, t}) has capacity csa + cbt = 7, because (s, a) and (b, t) are the only
two arcs going from {s, b} to {a, t}.
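In code, the capacity of a cut is a one-line sum. The sketch below is ours, with the arc capacities read off the second diagram, and checks the cut ({s, b}, {a, t}).

```python
# Capacity of a cut (S, T): total capacity of arcs from S to T.
# Arc capacities assumed from the second small example network.
caps = {("s","a"): 3, ("a","t"): 5, ("s","b"): 7, ("b","t"): 4, ("a","b"): 3}

def cut_capacity(caps, S, T):
    # Arcs not present in caps are treated as capacity 0: they never appear.
    return sum(c for (i, j), c in caps.items() if i in S and j in T)

print(cut_capacity(caps, {"s", "b"}, {"a", "t"}))   # 7 = c_sa + c_bt
```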
Low-capacity cuts are bottlenecks in the network: if we have a cut (S, T ) with
capacity c(S, T ), then no more than c(S, T ) flow can be sent from s to t. This might
make intuitive sense, but just in case, let’s prove why this happens.
Theorem 2.16 If a cut (S, T ) has capacity c(S, T ), then no more than c(S, T ) flow
can be sent from s to t in the network.
Proof. Let x be a feasible flow in the network, and consider the sum

v(x) := Σ_{k∈S} ( Σ_{j:(k,j)∈A} xkj − Σ_{i:(i,k)∈A} xik ) .

On one hand, flow conservation tells us that only the k = s term of this sum is
allowed to be nonzero. In fact, the k = s term of this sum is equal to the value of x,
and therefore the whole sum v(x) simplifies to the value of x.
Now rearrange this sum slightly differently. First, split it up:
v(x) = Σ_{k∈S} Σ_{j:(k,j)∈A} xkj − Σ_{k∈S} Σ_{i:(i,k)∈A} xik .

Now split up each of these sums further: in each sum ranging over all i or all
j, consider i, j ∈ S separately from i, j ∈ T . (To simplify notation, we’ll drop the


requirement that (k, j) ∈ A or (i, k) ∈ A: with the convention that the capacity of
arcs not in A is 0, this doesn’t make a difference.) We get:

v(x) = ( Σ_{k∈S} Σ_{j∈S} xkj + Σ_{k∈S} Σ_{j∈T} xkj ) − ( Σ_{k∈S} Σ_{i∈S} xik + Σ_{k∈S} Σ_{i∈T} xik ) .

The double sum over k ∈ S, j ∈ S cancels with the double sum over k ∈ S, i ∈ S,
because those two sums include the exact same terms, so we have
v(x) = Σ_{k∈S} Σ_{j∈T} xkj − Σ_{k∈S} Σ_{i∈T} xik .

Now we’re going to put an upper bound on v(x). For the first sum, we have xkj ≤ ckj
in each term, and so replacing xkj by ckj can only increase the result. For the second
sum, we have xik ≥ 0 in each term, and we’re subtracting all of these terms, so
replacing xik by 0 can also only increase the result. Therefore
v(x) ≤ Σ_{k∈S} Σ_{j∈T} ckj − 0.

But this is precisely the definition (with slightly different summation variables) of the
capacity c(S, T ). Therefore v(x) ≤ c(S, T ), which is the inequality we wanted. ■

2.67 Cuts and linear programming duality


Cuts in a network are a way to put an upper bound on the value of a feasible flow.
We already have one way to do that for any linear program, not just this one: the
dual program. Can we learn to think of cuts in terms of a linear programming dual
of the maximum flow program?
The answer is that the two are almost the same; we’ll deal with the differences
over the next few lectures. Here is a pair of dual linear programs we will look at
more closely:

(P)   maximize  v   over variables v, x

      subject to   v − Σ_{j:(s,j)∈A} xsj = 0                                      (us)
                   Σ_{i:(i,k)∈A} xik − Σ_{j:(k,j)∈A} xkj = 0   for k ∈ N, k ≠ s, t   (uk)
                   Σ_{i:(i,t)∈A} xit − v′ = 0                                     (ut)
                   xij ≤ cij   for (i, j) ∈ A                                     (yij)
                   x ≥ 0

(D)   minimize  Σ_{(i,j)∈A} cij yij   over variables u, y

      subject to   −ui + uj + yij ≥ 0   for (i, j) ∈ A                            (xij)
                   us = 1                                                         (v)
                   ut = 0                                                         (v′)
                   y ≥ 0


Here, (P) is essentially the same as our maximum-flow linear program. To make the
dual look nicer, we’ve added a dummy variable v that stands in for the objective
function, and a dummy variable v ′ that does nothing (but should be equal to v in
any feasible solution). This means that we have constraints for s and for t that look
a bit like the flow conservation constraint.
Meanwhile, if we stare at (D) long enough, we will recognize a minimum-cut-like
idea to it. There is only one constraint involving each dual variable yij , which can
be rewritten as yij ≥ ui − uj ; however, yij also must be nonnegative. So we could
make the y-variables disappear if we rewrote (D) as a minimax program, where we
minimize
Σ_{(i,j)∈A} cij max{0, ui − uj }.

Now imagine that every ui variable is either 0 or 1. Then max{0, ui − uj } is equal to
1 whenever ui = 1 and uj = 0, but it is equal to 0 in all other cases. So the objective
function just adds up the capacity cij for every arc (i, j) such that ui = 1 and uj = 0.
We can interpret such a binary vector u as a cut (S, T ), where S = {i ∈ N : ui = 1}
and T = {i ∈ N : ui = 0}. Helpfully, the two remaining constraints in (D) tell us
that by this rule, s ∈ S and t ∈ T . The minimax form of the dual objective function
becomes the capacity c(S, T ): we do get cuts back out of linear programming duality!
Of course, not everything is entirely clear yet. Why should u necessarily be a
binary vector? We can maybe make an argument from optimality that 0 ≤ ui ≤ 1 for
all i ∈ N , since making sure all the u-variables are between us and ut tends to make
max{0, ui − uj } smaller. I am not spelling this argument out in detail, because it
doesn’t resolve the bigger mystery: why should every ui be an integer? More on this
later.
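We can check this bookkeeping on the small network from the cut example (capacities assumed as before; the code is ours): encoding the cut ({s, b}, {a, t}) as a 0/1 vector u and evaluating the minimax objective recovers the cut capacity.

```python
# Evaluate the minimax form of the dual objective,
#   sum over arcs of c_ij * max(0, u_i - u_j),
# for a 0/1 vector u encoding the cut S = {s, b}, T = {a, t}.
caps = {("s","a"): 3, ("a","t"): 5, ("s","b"): 7, ("b","t"): 4, ("a","b"): 3}
u = {"s": 1, "b": 1, "a": 0, "t": 0}   # u_i = 1 exactly when i is in S

dual_obj = sum(c * max(0, u[i] - u[j]) for (i, j), c in caps.items())
print(dual_obj)   # 7: only arcs (s,a) and (b,t) cross from S to T
```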

2.68 Greedily increasing flow


We’ve found a linear program for max-flow problems. But it turns out that there are
better ways to solve this problem than by using a linear program. In the next few
lectures, we’ll explore some of these approaches.
To build intuition, we’ll start by discussing an approach that seems like it ought
to work, but doesn’t quite finish the job. The idea here is to find directed paths
from s to t along which we can increase the flow, and just—keep doing that.
Here’s a reasonably typical network that we can use as an example. (As usual,
I’ll mark an arc with a fraction x/c to represent that x flow is being sent along that
arc, and it has a maximum capacity of c.)

[Figure: the network, arcs labeled x/c: s→a 0/10, s→c 0/10, a→b 0/10, a→d 0/4, c→b 0/4, c→d 0/4, b→t 0/12, d→t 0/8.]

Our first step is to notice the promising path that goes s → a → b → t. The arcs
along this path all have capacity at least 10, so we can send 10 flow along this path:


[Figure: s→a 10/10, s→c 0/10, a→b 10/10, a→d 0/4, c→b 0/4, c→d 0/4, b→t 10/12, d→t 0/8.]

Similarly, there is the path s → c → d → t. This one doesn’t let us make as much
progress, but the bottleneck capacity along that path is 4, so we can send 4 flow
along this path:

[Figure: s→a 10/10, s→c 4/10, a→b 10/10, a→d 0/4, c→b 0/4, c→d 4/4, b→t 10/12, d→t 4/8.]

And there is room to send 2 more flow along the path s → c → b → t (but no
more, because the arc (b, t) reaches its capacity of 12 when we do so):

[Figure: s→a 10/10, s→c 6/10, a→b 10/10, a→d 0/4, c→b 2/4, c→d 4/4, b→t 12/12, d→t 4/8.]

At this point, we seem to be done. There are no paths that go from s to t along
which we can increase the flow.
However, this is still not the maximum flow: there are flows with larger value. It’s
just that we’ve gotten stuck at a feasible flow where we can’t increase the total flow
without decreasing the flow along some arcs. We need a better strategy.
(Disclaimer: if we had chosen different paths from s to t to use at each step, we
could have gotten to the maximum flow. But the point is that we don’t know which
paths are the right ones, so we need to be smarter than that.)

2.69 An augmenting path


If you try to describe how we can improve the flow in the most recent step, you might
say something like “we want to send less flow from a to b, and more of it to d; this
lets us increase the flow from c to b.” That is already a complicated explanation,
and you can imagine that making arguments like this would get pretty tricky when
the network is bigger.
There’s a trick to describing such improvements that’s simpler to think about,
while still being powerful enough that we can always use it to get to the maximum
flow. That trick is to find augmenting paths, which are slightly more general paths
from s to t.
In an augmenting path, we go from s to t, but we’re allowed to ignore the
directions of the arcs: we can travel both forward and backward. For example, the


following is a valid augmenting path:


s −(6/10)→ c −(2/4)→ b ←(10/10)− a −(0/4)→ d −(4/8)→ t

Notice that the arc from a to b is being traversed in the “wrong” direction.
An augmenting path is not allowed to cheat arbitrarily, however. The rules
are:
• You can go forward along an arc if it’s still below its maximum capacity.
• You can go backward along an arc if it currently carries positive flow.
If we find an augmenting path, we can let δ be the smallest margin by which it’s
valid: the smallest amount by which forward arcs are below maximum capacity, or
backward arcs are above zero flow. In this example, δ = 2: we can’t increase the
flow from c to b by more than 2, because xcb = 2 and ccb = 4.
Then we can modify our current feasible flow by adding δ flow along every forward
arc of the path, and subtracting δ flow along every backward arc. In our example,
we increase the flow xsc by 2, increase the flow xcb by 2, decrease the flow xab by
2, increase the flow xad by 2, and increase the flow xdt by 2. Here is the result, in
diagram form:
[Figure: s→a 10/10, s→c 8/10, a→b 8/10, a→d 2/4, c→b 4/4, c→d 4/4, b→t 12/12, d→t 6/8.]

This is called “augmenting a flow along a path”.


It is worth checking that this operation still gives a feasible flow: in other words,
that flow conservation is still satisfied at each node other than s and t. The good
thing is that even in very complicated networks, there are only four cases to consider
for how a node is affected by this augmenting step:
1. Suppose we have a node · · · → p → · · · along the path, visited “in the normal
way": we take a forward arc into p, and a forward arc out.
In this case, if we augment by δ, the total flow into p will increase by δ, and so
will the total flow out of p. Assuming flow was conserved before augmenting,
it is still conserved.
2. Now suppose our augmenting path visits some node q in a · · · → q ← · · · pattern:
the arc before q is a forward arc, but the arc after q is a backward arc.
In this case, if we augment by δ, the forward arc’s flow will increase by δ and
the backward arc’s flow will decrease by δ. Both arcs contribute to the flow
into q, so the total flow into q will not change. The flow out of q was not
affected at all, so flow is still conserved at q.
3. If a node r is visited in a · · · ← r → · · · pattern, something similar to case 2
happens.
Here, both arcs represent flow out of r, and the total change in the flow out of
r is 0. Meanwhile, flow into r is unaffected, so flow is still conserved at r.
4. Finally, a node w visited in a · · · ← w ← · · · pattern, where both arcs are
backward arcs, is similar to case 1.
Here, the flow along both arcs decreases by δ. So the total flow into w and the
total flow out of w decrease by the same amount. Again, flow is still conserved.
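The augmentation rule above can be written out directly. The sketch below is our code, with the network, flow, and path taken from the running example: it computes δ, augments, and re-checks flow conservation.

```python
# Augment a flow along a path that may use arcs forward or backward.
caps = {("s","a"):10, ("s","c"):10, ("a","b"):10, ("a","d"):4,
        ("c","b"):4, ("c","d"):4, ("b","t"):12, ("d","t"):8}
flow = {("s","a"):10, ("s","c"):6, ("a","b"):10, ("a","d"):0,
        ("c","b"):2, ("c","d"):4, ("b","t"):12, ("d","t"):4}

# The augmenting path s -> c -> b <- a -> d -> t, as (arc, direction) pairs.
path = [(("s","c"), "fwd"), (("c","b"), "fwd"), (("a","b"), "bwd"),
        (("a","d"), "fwd"), (("d","t"), "fwd")]

def augment(caps, flow, path):
    # delta: the smallest slack (forward) or current flow (backward) on the path.
    delta = min(caps[a] - flow[a] if d == "fwd" else flow[a] for a, d in path)
    for a, d in path:
        flow[a] += delta if d == "fwd" else -delta
    return delta

def conserved(flow, node):
    inflow  = sum(f for (i, j), f in flow.items() if j == node)
    outflow = sum(f for (i, j), f in flow.items() if i == node)
    return inflow == outflow

delta = augment(caps, flow, path)
print(delta)                                      # 2, limited by arc (c, b)
print(all(conserved(flow, k) for k in "abcd"))    # True: flow is still feasible
```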


2.70 The residual graph


Augmenting paths are convenient to have, but hard to find. To try to do this
systematically, we’ll construct a residual graph.
In a residual graph, the nodes will remain the same, but the arcs we consider are
different. We will add every arc that corresponds to a direction that an augmenting
path could go. Namely:
• We keep an arc of our network in the residual graph if it’s still below maximum
capacity. We label it with its residual capacity, which is the amount of room
still left to increase the flow. (For an arc (i, j), the residual capacity is defined
to be cij − xij .)
• When an arc of our network has positive flow, we add a reverse arc to the
residual graph. We also label it with its residual capacity, which is now the
amount of room left to decrease the flow. If the arc of our network was (i, j)
with flow xij , the residual capacity of the reverse arc (j, i) in the residual graph
is also xij .
To see how this helps us find augmenting paths, let’s go back to the previous
step. On the left below is the feasible flow we had at that point; on the right is the
residual graph. (Forward arcs in the residual graph are drawn in black; the reverse
arcs are drawn as red dashed lines to distinguish them.)

[Figure: left, the flow: s→a 10/10, s→c 6/10, a→b 10/10, a→d 0/4, c→b 2/4, c→d 4/4, b→t 12/12, d→t 4/8. Right, the residual graph: forward arcs s→c (4), c→b (2), a→d (4), d→t (4); reverse arcs a→s (10), b→a (10), c→s (6), b→c (2), d→c (4), t→b (12), t→d (4).]

In the residual graph, finding an augmenting path is just a matter of “getting
through the maze”: we want to see if there is a way to follow the arcs in the residual
graph (in their proper directions) to get from s to t.
For example, we see that there is a path

s −(4)→ c −(2)→ b −(10)→ a −(4)→ d −(4)→ t

in the residual graph. That corresponds to the same augmenting path we found earlier,
but now we don’t have to follow arcs in unnatural directions to see it. What’s more,
the value of δ is now easy to find: it’s just the smallest residual capacity along this
path.
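A sketch of this construction (our code; the flow is the one from before the last augmenting step):

```python
# Build the residual graph of a flow: forward arcs with slack c_ij - x_ij,
# plus reverse arcs with residual capacity x_ij.
from collections import deque

caps = {("s","a"):10, ("s","c"):10, ("a","b"):10, ("a","d"):4,
        ("c","b"):4, ("c","d"):4, ("b","t"):12, ("d","t"):8}
flow = {("s","a"):10, ("s","c"):6, ("a","b"):10, ("a","d"):0,
        ("c","b"):2, ("c","d"):4, ("b","t"):12, ("d","t"):4}

def residual(caps, flow):
    res = {}
    for (i, j), c in caps.items():
        if flow[(i, j)] < c:          # room to increase: forward residual arc
            res[(i, j)] = c - flow[(i, j)]
        if flow[(i, j)] > 0:          # room to decrease: reverse residual arc
            res[(j, i)] = flow[(i, j)]
    return res

def find_path(res, s, t):
    # Breadth-first search from s to t through residual arcs.
    prev, queue = {s: None}, deque([s])
    while queue:
        i = queue.popleft()
        for (a, b) in res:
            if a == i and b not in prev:
                prev[b] = a
                queue.append(b)
    if t not in prev:
        return None
    path, node = [], t
    while node != s:
        path.append((prev[node], node))
        node = prev[node]
    return path[::-1]

res = residual(caps, flow)
path = find_path(res, "s", "t")
print(path)                         # s -> c -> b -> a -> d -> t
print(min(res[a] for a in path))    # delta = 2
```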


2.71 Residual graphs and minimum cuts


But now, suppose we draw the residual graph for the flow we get after our most
recent augmenting path. Here is that flow, and here is its residual graph:
[Figure: left, the flow: s→a 10/10, s→c 8/10, a→b 8/10, a→d 2/4, c→b 4/4, c→d 4/4, b→t 12/12, d→t 6/8. Right, the residual graph: forward arcs s→c (2), a→b (2), a→d (2), d→t (2); reverse arcs a→s (10), b→a (8), c→s (8), b→c (4), d→c (4), d→a (2), t→b (12), t→d (6).]

This residual graph is an impossible “maze”, and it doesn’t take us long to discover
this. From node s, we can only get to node c; from node c, we can only return to
node s. As a result, there is no augmenting path to find for this flow.
This seems disappointing, but actually it’s expected, and it’s what we want to
see. Here’s why.
Theorem 2.17 Suppose that there is no path from s to t in the residual graph (for
some feasible flow x in some network). Then:
• The flow x is a maximum flow.
• Let S be the set of all nodes reachable from s in the residual graph (including
s itself). Let T be the set of all other nodes. Then (S, T ) is a minimum cut,
and has capacity equal to the value of x.
Before we prove the theorem, we can verify that this is true in our network. The
value of our flow is 18: that’s the total flow leaving s (10 + 8), and also the total flow
entering t (12 + 6). Meanwhile, if S is the set of all nodes reachable from s in the
residual graph, then S = {s, c}, which means T = {a, b, d, t}. The capacity of the cut
(S, T ) is csa + ccb + ccd = 10 + 4 + 4 = 18.

2.72 Proof of Theorem 2.17


Things work out in the example above: since the capacity of the cut (S, T ) agrees
with the value of x, both are optimal. Why is this true in general?
In the previous lecture, when we were proving that the capacity of a cut gives an
upper bound on the value of a flow, we showed the following:
v(x) = Σ_{j:(s,j)∈A} xsj − Σ_{i:(i,s)∈A} xis = Σ_{i∈S} Σ_{j∈T} xij − Σ_{i∈T} Σ_{j∈S} xij ≤ Σ_{i∈S} Σ_{j∈T} cij − 0 = c(S, T ).

On the left, we have (by definition) the value of the flow x. By some algebraic
manipulation, we showed that this is equal to the middle expression: the total flow
crossing from S to T , minus the total flow crossing from T back to S. This is upper
bounded by the expression on the right (by using xij ≤ cij on every term of the first
sum, and xij ≥ 0 on every term of the middle sum), and the expression on the right
is just the capacity of the cut (S, T ).
Now let’s think about what happens in the special case where S is the set of all
vertices reachable from the source in the residual graph. This means that there can
be no residual arc from any node i ∈ S to any node j ∈ T .
There are two kinds of residual arcs.


• Forward residual arcs i → j corresponding to arcs (i, j) with xij < cij .
If there are no such arcs from S to T , then for every i ∈ S and j ∈ T , we have
xij = cij .
• Backward residual arcs i ← j corresponding to arcs (i, j) with xij > 0.
If there are no such arcs from S to T , then for every i ∈ T and j ∈ S, we have
xij = 0.
As a result, for the cut we get from the residual graph, the ≤ inequality is actually
an = equation. We replaced xij by cij only in cases where we already had xij = cij ,
and we replaced xij by 0 only in cases where we already had xij = 0.
Therefore the value of the flow x is equal to the capacity of the cut (S, T ).
The optimality of both the flow and the cut follows. We know the capacity
of the cut (S, T ) is an upper bound on the value of any flow; since x achieves that
upper bound, it is a maximum flow (no other flow can be better). Similarly, the
value of x is a lower bound on the capacity of any cut: since (S, T ) achieves that
lower bound, it is a minimum cut.

2.73 The Ford–Fulkerson algorithm


This leads us to an algorithm for trying to find a maximum flow in a network without
using linear programming.
Start at a feasible flow: for example, the flow x = 0. Then repeat the augmenting
step we’ve developed:
1. Construct the residual graph for x.
2. Attempt to find a path from s to t in the residual graph. If such a path exists,
it gives us an augmenting path, which we use to improve x and go back to step
1.
3. If no such path exists, we use Theorem 2.17 to obtain a minimum cut whose
capacity is equal to the value of x. We know x is optimal, and the cut gives us
a certificate of optimality.
As with the simplex algorithm, there is one more thing left to worry about.
Ford–Fulkerson lets us always keep getting a better flow, and only stops when we
reach a maximum flow, but are we guaranteed to actually reach it?
Unfortunately, as stated so far, there is no such guarantee in general. We can say
that we’ll eventually be done in cases where all the capacities cij are integers. In this
case, at every step, the value of the flow increases by at least 1, and so eventually it
will reach Σ_{j:(s,j)∈A} csj , which is an upper bound on the value of the maximum flow.
Similarly, if all the capacities are rational numbers, the flow always increases by at
least 1/d, where d is a common denominator of all the capacities.
But this is a very bad upper bound. Consider the following example:

[Figure: arcs s→a 0/1000, s→b 0/1000, a→b 0/1, a→t 0/1000, b→t 0/1000.]

The maximum flow here has value 2000, which can be reached in just 2 steps. But if
we have poor judgement, and alternate between the augmenting paths s → a → b → t


and s → b ← a → t, we increase the flow by 1 at each step, and only finish in 2000
steps.
Things are even worse if some capacities are irrational numbers. Then there is
an example in which a poor choice of augmenting paths means we never even get
close to the maximum flow.
Fortunately, there is a simple rule to avoid this situation. If we always pick the
shortest augmenting path available at every step, then it can be shown that we’ll
always reach the maximum flow after at most n · m steps (in a network with n nodes
and m arcs). This refinement is called the Edmonds–Karp algorithm.
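Putting the pieces together, here is a minimal Edmonds–Karp sketch (ours, Python assumed): BFS always finds a shortest augmenting path, and when no path remains, the set of nodes still reachable from s gives the minimum cut.

```python
# A compact Edmonds-Karp sketch: repeatedly augment along a shortest (BFS)
# path in the residual graph; when none exists, return the flow value and
# the reachable set S, which determines a minimum cut (S, T).
from collections import deque

def edmonds_karp(nodes, caps, s, t):
    # residual[i][j] = remaining capacity from i to j (reverse arcs included)
    residual = {i: {} for i in nodes}
    for (i, j), c in caps.items():
        residual[i][j] = residual[i].get(j, 0) + c
        residual[j].setdefault(i, 0)
    value = 0
    while True:
        prev, queue = {s: None}, deque([s])
        while queue and t not in prev:
            i = queue.popleft()
            for j, r in residual[i].items():
                if r > 0 and j not in prev:
                    prev[j] = i
                    queue.append(j)
        if t not in prev:              # no augmenting path left: done
            return value, set(prev)    # set(prev) = nodes reachable from s
        path, node = [], t             # walk back along the BFS tree
        while node != s:
            path.append((prev[node], node))
            node = prev[node]
        delta = min(residual[i][j] for i, j in path)
        for i, j in path:              # augment: push delta along the path
            residual[i][j] -= delta
            residual[j][i] += delta
        value += delta

caps = {("s","a"):10, ("s","c"):10, ("a","b"):10, ("a","d"):4,
        ("c","b"):4, ("c","d"):4, ("b","t"):12, ("d","t"):8}
value, S = edmonds_karp("sabcdt", caps, "s", "t")
cut_cap = sum(c for (i, j), c in caps.items() if i in S and j not in S)
print(value, S, cut_cap)   # value 18; the cut's capacity is also 18
```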

2.74 Consequences of the Ford–Fulkerson method


In the previous lecture, we saw a method for solving maximum flow problems. Just
from the way our method works, there are several things we can deduce about these
solutions.
Theorem 2.18 — Max-flow min-cut theorem. The maximum value of a flow in any
network is equal to the minimum capacity of any cut.

Proof. When the Ford–Fulkerson method finishes computing the maximum flow in a
network, it ends by finding a cut whose capacity is equal to the value of the resulting
flow: that’s how we know that we’ve found the optimal solution. ■

There are many combinatorial applications of the max-flow min-cut theorem.


(Menger’s theorem in graph theory is one notable example; it is fairly difficult to
prove directly, but can be deduced quickly from this theorem. There are many
applications in graph theory, most of which we will not see, just because it would
take us too far out of the scope of this class.)
But how do these combinatorial applications work? The flow along an arc is
a continuous value, but combinatorial problems often have discrete solutions: for
example, we will see some applications where objects are being sorted into categories,
and something can’t be split halfway between multiple categories. So there is a
second theorem that’s important to applying maximum-flow problems in such cases:
Theorem 2.19 — Integral flow theorem. If all capacities in a network are integers,
then the network has an integer maximum flow. (That is, there is a maximum flow
x such that for every arc (i, j), the flow xij is an integer.)

Proof. This result comes from thinking about how the Ford–Fulkerson method finds
a maximum flow. We repeatedly find an augmenting path, then identify the minimum
residual capacity of any of its arcs, and then increase or decrease the flow of those
arcs by that residual capacity.
As long as we have an integer flow x, every residual capacity will also be an
integer: the residual capacity of any arc in the residual graph is given by either xij
or cij − xij , and both of those will have integer values. So in the next step, several
of the xij ’s will change by an integer value, which means that the next flow will also
be an integer flow. That means that when the Ford–Fulkerson method finishes, the
values of x will still all be integers. ■


It appears that the linear program for the maximum flow problem is quite special.
First, we found out that the dual program has integer optimal solutions (which
describe cuts, and not some weird fractional analog of cuts). Now we are seeing that
the primal program also has integer optimal solutions!
Although we have already shown that this is true, the first thing we’ll do today
will be to give another proof of this fact—one that relies on just thinking about the
properties of the maximum flow linear program. This is useful to know, because
it can guarantee integer solutions to some other problems as well. After that, we
will see some applications of maximum flow problems, including ones where the
integrality plays a key role.

2.75 Totally unimodular matrices


Let’s back up a bit and consider a general linear program, with constraints Ax = b.
We’ve seen earlier in the semester that a basic solution to this system, where the
basic variables are indexed by B and the nonbasic variables are indexed by N , is
given by xB = (AB )−1 b (with xN = 0). What new things can we learn from this?
You might remember the formula for the inverse of a 2 × 2 matrix:

[ a  b ; c  d ]⁻¹ = 1/(ad − bc) · [ d  −b ; −c  a ]

The important thing to notice about this formula is that the only number we divide
by is ad − bc: the determinant of the matrix. This continues for larger matrices; for
example, for 3 × 3 matrices, we have

[ a b c ; d e f ; g h i ]⁻¹ = 1/(aei + bf g + cdh − af h − bdi − ceg) · [ ei−f h  ch−bi  bf −ce ; f g−di  ai−cg  cd−af ; dh−eg  bg−ah  ae−bd ]

and the denominator of the big fraction in front is exactly the determinant of the
matrix. In general, the inverse of an n × n matrix A is equal to 1/ det(A) multiplied
by another matrix called the adjugate matrix, whose entries are polynomial functions
of the entries of A.
We do not need to know the details. Just knowing this much gives us a useful
corollary:

Theorem 2.20 If A is an n × n matrix with integer entries, then A−1 has integer
entries if and only if det(A) = ±1.

Proof. If det(A) = ±1, then we conclude that A−1 has integer entries from formulas
like the ones above: we compute the adjugate matrix (which will have integer entries,
because we multiply, add, and subtract the entries of A) and divide by det(A) (which
is 1 or −1, so it will not give us any fractional results).
On the other hand, if A and A−1 both have integer entries, then det(A) and
det(A−1 ) must both be integers. But it’s always true that det(A−1 ) = 1/ det(A),
because det(A) · det(A−1 ) = det(AA−1 ) = det(I) = 1. The only way that both det(A)
and 1/ det(A) can be integers is if det(A) = ±1. ■
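A quick numerical illustration of Theorem 2.20 (numpy assumed; the matrices are our own examples):

```python
# An integer matrix has an integer inverse exactly when det = +1 or -1.
import numpy as np

A = np.array([[2, 1],
              [1, 1]])                 # det = 2*1 - 1*1 = 1

det = round(np.linalg.det(A))
inv = np.linalg.inv(A)
print(det)                             # 1
print(inv)                             # [[ 1. -1.] [-1.  2.]], all integers

B = np.array([[2, 0],
              [0, 1]])                 # det = 2: the inverse contains 1/2
print(np.linalg.inv(B))
```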


We say that an m × n integer matrix A is unimodular if every m × m submatrix
AB has det(AB ) ∈ {−1, 0, 1}. This means that every time we have a basic solution
to Ax = b, the inverse matrix (AB )−1 will have integer entries, by Theorem 2.20.
As a result, when A is unimodular and b is an integer vector, Ax = b will
only have integer basic solutions. (We allow a determinant of 0 to make a more
general claim: this corresponds to m × m submatrices which can never give us basic
solutions anyway, since AB will not be invertible.)
An integer matrix A is totally unimodular if every square submatrix, of any
size, has a determinant of −1, 0, or 1. (Note that submatrices don’t have to pick
consecutive rows or consecutive columns. A valid 3 × 3 submatrix of a large matrix
A might pick the 1st , 3rd , and 6th rows with the 2nd , 9th , and 10th columns.)
This definition allows us to generalize to basic solutions of systems like Ax ≤ b
(which are really Ax + Is = b for some slack variables s). When we take an m × m
submatrix of such a system, if some of the slack variables are basic, then the
determinant might be equal to the determinant of a smaller, k × k submatrix of A.
This is the explanation for integer solutions to network flow problems! It turns
out that:
Theorem 2.21 In any network, the matrix of flow conservation constraints is totally
unimodular.

Proof. The key to this is that each variable xij in a flow appears in at most two
conservation constraints: once in flow conservation at i, and once in flow conservation
at j. (When i or j is s or t, there might be just one constraint.) Moreover, these
have opposite coefficients: 1 and −1.
If we take a k × k submatrix of the flow conservation matrix, one of the following
happens:
1. We picked the column for a variable xij , but didn’t pick any of the rows where
xij has a positive coefficient. Then our submatrix has a column of all zeroes,
and the determinant is 0.
2. We picked the column for a variable xij , but only picked one of the rows
where xij has a nonzero coefficient. Then we can do an expansion by minors
along xij ’s column, and get a determinant equal to ±1 times a (k − 1) × (k − 1)
determinant; repeat this argument for that determinant instead.
(If k is already 1, we get a determinant of ±1 equal to our single nonzero entry.)
3. If case 1 and 2 don’t occur for any column, then every column has both a 1
and a −1 inside it. Then the rows of our submatrix add up to 0, because the 1
and −1 in every column cancel. This is a linear dependency between the rows,
so the determinant is 0.
Since all cases result in a determinant of −1, 0, or 1, the matrix is totally unimodular.

To help with visualization, here is an example network and the matrix for its flow
conservation constraints:
[Figure: a network with source s, sink t, and intermediate nodes a, b, c, d; all arcs
start with flow 0.]

       xsa  xsc  xab  xad  xbt  xcb  xcd  xdt
  a  [  1    0   −1   −1    0    0    0    0 ]
  b  [  0    0    1    0   −1    1    0    0 ]
  c  [  0    1    0    0    0   −1   −1    0 ]
  d  [  0    0    0    1    0    0    1   −1 ]

The rows correspond to nodes a, b, c, d; the columns, to variables xsa , xsc , xab , xad , xbt , xcb , xcd , xdt .
Theorem 2.21 implies the integral flow theorem (we have to consider the capacity
constraints as well, but it turns out these do not change much). It also explains why
the dual program always has an integer optimal solution (representing a cut).

2.76 Some applications


2.76.1 Graph factorization
Problem 2.18 A chocolate factory has 11 employees; conveniently, there are 11
industrial processes that they need to be trained in. Not everyone needs to be trained
in every process; is it possible for every employee to learn 5 processes, such that
every process has 5 employees that are trained in it?
Suppose that the first part of the problem is solved. Later, the factory would
like 3 of the employees trained in a process to get an official certification in it. Is it
possible for this to happen so that every employee gets certifications in 3 of the 5
processes they are trained in?
For the first part of this problem, there are some mathematical constructions
with modular arithmetic that show how it can be done; however, we can also set this
up as a network flow problem. Draw the following network (where unlabeled arcs
(ai , bj ) should have capacity 1):

[Figure: a network with source s, capacity-5 arcs from s to each of a1 , a2 , . . . , a11 ,
unit-capacity arcs (ai , bj ), and capacity-5 arcs from each of b1 , b2 , . . . , b11 to t.]

Suppose we can find an integer flow in this network with value 5 · 11 = 55. Then
every node ai receives 5 flow from s, which it must send to 5 different nodes among
b1 , b2 , . . . , b11 . Meanwhile, each of those nodes must receive a total of 5 flow, which it
sends on to t. Now, if we interpret a flow of 1 from ai to bj as “Employee i is trained
in process j” then we satisfy the conditions in the problem exactly.
How do we know that an integer flow like this exists? The key is that a fractional
flow like this can be found without any work. As before, send 5 flow from s to every

node ai , and have each node ai send 5/11 units of flow to each node b1 , b2 , . . . , b11 . As
a result, every node bj receives 5/11 units of flow from 11 different sources, for a total
of 5, which it passes on to t. This is a maximum flow, since we cannot send any
more flow out of s, so by the integral flow theorem, there is an integer maximum
flow with the same value.
For the second part of the problem, assuming we have solved the first any way we
like, redraw the network: keep only the arcs (ai , bj ) which were used in the integer
solution to the first part of the problem. Then, change the capacity on the arcs out
of s and into t to 3.
Again, we can find a fractional flow with value 3 · 11 (the maximum possible)
quickly. Node s sends 3 flow to each of a1 , a2 , . . . , a11 . Each of them sends 3/5 flow
along each of the 5 arcs leaving it; then each of the nodes b1 , b2 , . . . , b11 receives 3/5
flow along 5 incoming arcs, for a total of 3, and sends on 3 flow to t. By the integral
flow theorem, there is also an integer flow with value 3 · 11; picking out only the
arcs (ai , bj ) with flow 1, and interpreting them as “Employee i gets a certification in
process j”, we satisfy the requirements of the problem.
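The modular-arithmetic construction mentioned at the start of this problem can also be checked directly. The sketch below uses one such construction as an illustration (it is an assumption of this example, not necessarily the one the notes had in mind): employee i learns the five processes i, i + 1, . . . , i + 4 modulo 11.

```python
# Hypothetical modular construction: employee i learns processes
# i, i+1, ..., i+4 (mod 11). Verify both regularity conditions.
EMPLOYEES = PROCESSES = 11
trained = {(i, (i + d) % 11) for i in range(EMPLOYEES) for d in range(5)}

# Each employee learns 5 processes; each process has 5 trained employees.
per_employee = [sum(1 for (i, j) in trained if i == e) for e in range(EMPLOYEES)]
per_process = [sum(1 for (i, j) in trained if j == p) for p in range(PROCESSES)]

print(per_employee)  # every entry is 5
print(per_process)   # every entry is 5
```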

2.76.2 Consistent matrix rounding


Problem 2.19 The students, faculty, and staff at a university vote on whether they
prefer coffee or tea, and we get the following percentages among those who voted:

               Students (p)   Faculty (q)   Staff (r)    Total
  Coffee (a)      7.143%        21.43%       14.29%      42.86%
  Tea (b)        28.57%         7.143%       21.43%      57.14%
  Total          35.71%        28.57%        35.71%       100%

Can we round all percentages to integer values so that the row and column totals
still make sense?
There are more than just aesthetic reasons why we might want to round data like
this. In a scientific report, we want to avoid revealing specific numbers, but as it is,
a determined investigator might notice that all of these percentages are approximate
multiples of 1/14. This strongly suggests that there were 14 people total, that exactly
one student preferred coffee, and so forth.
It turns out that we achieve the rounding we want if we allow ourselves a tiny bit
of flexibility: we can round each percentage in either direction, not just to the closest
integer. Moreover, this can be described as a variant of the network flow problem!
The network that represents the problem is given below. Arcs (s′ , a) and (s′ , b)
correspond to the row sums; arcs (p, t), (q, t), and (r, t) correspond to the column sums;
the six intermediate arcs correspond to individual entries of the table. With this
setup, flow conservation constraints are exactly the constraints that tell us that row
and column sums do what they’re supposed to do.


[Figure: a network with nodes s, s′ , a, b, p, q, r, t. Arc (s, s′ ) has bounds [100, 100];
arcs (s′ , a) and (s′ , b) have bounds [42, 43] and [57, 58]; arcs (a, p), (a, q), (a, r) have
bounds [7, 8], [21, 22], [14, 15]; arcs (b, p), (b, q), (b, r) have bounds [28, 29], [7, 8],
[21, 22]; arcs (p, t), (q, t), (r, t) have bounds [35, 36], [28, 29], [35, 36].]

We have not talked about how to solve a network flow problem with lower and
upper bounds. It turns out that there are ways to convert this problem to a problem
with only capacities, and if we had more time in the semester, we’d absolutely talk
about how. For now, let’s just prove that this network must have an integer flow
satisfying all the constraints (and with value exactly 100).
First, there is a feasible flow x using the exact values that generated our percentages:
for example, xsa in this feasible flow is exactly 6/14 · 100%, the value that
created our approximate value of 42.86%. This flow is optimal, so an optimal integer
flow exists as well.
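Because each entry can only round down or up, this small instance can also be settled by exhaustive search, without any flow machinery. The brute-force sketch below (added for illustration) reads the interval endpoints off the table and checks every combination of rounding directions:

```python
from itertools import product

# Each exact value is a multiple of 1/14 of 100%; rounding gives two choices.
row_a = [(7, 8), (21, 22), (14, 15)]    # coffee: students, faculty, staff
row_b = [(28, 29), (7, 8), (21, 22)]    # tea
row_totals = [(42, 43), (57, 58)]
col_totals = [(35, 36), (28, 29), (35, 36)]

solutions = []
for choice in product(*(row_a + row_b + row_totals + col_totals)):
    a, b = choice[0:3], choice[3:6]       # rounded table entries
    ra, rb = choice[6], choice[7]         # rounded row totals
    cols = choice[8:11]                   # rounded column totals
    if (sum(a) == ra and sum(b) == rb
            and all(a[i] + b[i] == cols[i] for i in range(3))
            and ra + rb == 100):
        solutions.append(choice)

print(len(solutions) > 0)  # True: a consistent rounding exists
print(solutions[0])
```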

2.77 Transversals
Problem 2.20 At Network Flow University, there are many clubs and student
organizations. There are so many that the students ran into a problem: how can
they elect presidents for them all?
Of course, every club must have a president (who must be a member of the club).
Moreover, being president is a lot of work, so every student should be a member
of only one club. If we know the rosters of all the clubs, can we always choose a
president for each one?
Here is a particular instance of this problem—this one is probably much smaller
than Network Flow University would really have to deal with, but it lets us look at
something concrete. Say that there are six students at NFU; their names are Declan
(D), Evelyn (E), Finn (F ), Genevieve (G), Harper (H), and Jasper (J). There are
also six clubs, whose rosters we’ll abbreviate to:
A1 : {F, J} A2 : {D, E, F, H} A3 : {D, F, G, J} A4 : {E, J} A5 : {E, F } A6 : {E, F, J}
Thinking more generally, we can pose this problem whenever we have a universe U
(in this case, the universe is the university: the set of students {D, E, F, G, H, J})
and a family F of subsets of U (in this case, F = {A1 , A2 , A3 , A4 , A5 , A6 }). Our goal
is to find a transversal of F: from each set in F, we want to choose one element,
without choosing the same element of U twice. Formally, we want to choose elements
a1 , a2 , a3 , a4 , a5 , a6 ∈ U such that ai ∈ Ai for all i and whenever i ≠ j, we have ai ≠ aj .
You may already see some issues with our particular instance of this problem,
but we will ignore those issues and keep going. Let’s try to model this problem using
integer flows.
The idea is that we want to set up a network so that “elect student u to be
president of club Ai ” will be represented by “send flow 1 from node i to node u”. (We
could also build a network to send flow the other way: the choice to send flow from
clubs to students is arbitrary.) Of course, we only include an arc (i, u) if student u is
a member of club Ai .


We cannot elect the same u to be president of multiple clubs; to enforce this, we


can make sure that the flow out of u can be at most 1, limiting the flow into u to 1
as well.
We want to make sure every club Ai gets a president; this would mean making
sure that the flow into node i is at least 1. This is not something we can really
enforce in a network. Maximum flow problems are optimization problems; to make
this an optimization problem, we can ask “what is the maximum number of clubs
that can have presidents?”
Now we can add arcs from the source s to nodes 1, 2, 3, 4, 5, 6 with capacity 1 each;
sending 1 flow along arc (s, i) means that the flow out of node i will be 1, forcing
club Ai to elect a president.
Here is the resulting network:

[Figure: unit-capacity arcs from s to the club nodes 1, . . . , 6; arcs from each club
node to its members among D, E, F , G, H, J; and unit-capacity arcs from each
student node to t. All flows start at 0.]

We did not put any capacities on arcs from clubs to students. In principle, we
could put a capacity of 1 on each such arc, since a club cannot elect a student to be
“double president”. But that constraint is already implied by the fact that a student
cannot be elected president twice (not even by the same club).
Instead, we will give these arcs infinite capacity. In the linear program, we can
represent this by just not having a capacity constraint. For the Ford–Fulkerson
method, using ∞ as a capacity is fine if we’re careful, but we could also use any big
number M . The reason to use a large capacity on these arcs is that later on, it will
limit the cuts we can find to ones with a nice structure.

2.78 Hall’s theorem


What happens if we try to solve this network flow problem?
It turns out that the maximum value of any flow is 5. There are many solutions.
Here is one of them; we will use a thick edge to make it clear which arcs from clubs
to students have flow 1:


[Figure: the same network, with flow 1 on arcs (s, 1) through (s, 5), on the matching
arcs (1, F ), (2, H), (3, D), (4, J), (5, E), and on the arcs (D, t), (E, t), (F, t), (H, t),
(J, t); the arcs (s, 6) and (G, t) carry no flow.]

So it turns out we cannot elect a president for every club. Why not? For an
explanation, we must turn to the minimum cut.
We did not draw the residual graph, but let’s try to reconstruct it in our heads:
going forward along arcs where the capacity can be increased, and backwards where
the capacity can be decreased. From s, we can only go to 6 in the residual graph.
From 6, we can go to E, F , and J. From these, we cannot take any forward arcs
(those are at their maximum capacity), but we can go along the backward arcs (5, E),
(1, F ), and (4, J) with positive flow to get to nodes 1, 4, 5. Finally, from these nodes,
we can only go forward, but the only forward arcs lead to nodes E, F , and J, which
we’ve seen already. So the minimum cut (S, T ) we find has

S = {s, 1, 4, 5, 6, E, F, J}, T = {2, 3, D, G, H, t}.

The capacity of this cut is 5: the arcs going from S to T are the arcs (s, 2), (s, 3),
(E, t), (F, t), and (J, t), which have capacity 1 each.
We can interpret this cut to identify an issue with the club rosters we started
with. The four clubs A1 , A4 , A5 , A6 have only three students between them altogether:
Evelyn, Finn, and Jasper. So how could they possibly elect four different presidents?
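This instance is small enough to check by machine. The sketch below (an illustration, not part of the notes) runs a standard shortest-augmenting-path maximum flow (Edmonds–Karp) on the club network, with node names as in the text and a large number standing in for the infinite capacities:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest residual path."""
    flow = 0
    while True:
        # BFS for an s-t path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Reconstruct the path, find the bottleneck, and push flow.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap.setdefault(v, {})[u] = cap.get(v, {}).get(u, 0) + bottleneck
        flow += bottleneck

clubs = {1: "FJ", 2: "DEFH", 3: "DFGJ", 4: "EJ", 5: "EF", 6: "EFJ"}
cap = {"s": {i: 1 for i in clubs}}
for i, members in clubs.items():
    cap[i] = {u: 10**9 for u in members}  # "infinite" capacity
for u in "DEFGHJ":
    cap[u] = {"t": 1}

print(max_flow(cap, "s", "t"))  # 5
```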
This tells us how to identify a general obstacle to the existence of a transversal of a
family F: a transversal cannot exist if F has a subfamily G such that ∪G (the union
of all sets in G) has fewer than |G| elements. In our case, G = {A1 , A4 , A5 , A6 } and

∪G = {F, J} ∪ {E, J} ∪ {E, F } ∪ {E, F, J} = {E, F, J},

so |G| = 4 and |∪G| = 3.

In fact, we can prove that this is the only kind of obstacle that can rule out
a transversal! To do this, we have to look at the structure of cuts in the network
carefully.
Suppose that we are looking at a general example with F = {A1 , A2 , . . . , An }. Our
network has four “layers": one layer with just s, one layer with nodes {1, 2, . . . , n},
one layer with nodes U (the universe), and one layer with just t. Let’s begin choosing
our cut by saying that the set S will contain s and a subset I ⊆ {1, 2, . . . , n}.
For every i ∈ I and every x ∈ Ai , there is an arc (i, x) in our network with infinite
capacity! We are trying to find a cut with small capacity, and it seems reasonable to
aim for “not infinite” as a starting point toward “small”. So we should make sure to


include every such x in S as well, so that arc (i, x) does not cross from S to T . In
other words, S must include the entire union ∪i∈I Ai .
At this point, the following arcs cross from S to T :
• The arcs (s, j) for every j ∉ I, since these are the nodes from {1, 2, . . . , n} we
decided not to include in S. The total capacity of these is n − |I|.
• The arcs (x, t) for every x ∈ ∪i∈I Ai , since we definitely have t ∈ T . The total capacity
of these is |∪i∈I Ai |.
• Actually, we could add more elements of U to S, but this would only increase
our capacity, so it wouldn’t help.
This cut proves that we don’t have a transversal exactly when its total capacity
n − |I| + |∪i∈I Ai | is less than n: in other words, when |∪i∈I Ai | < |I|. In other words,
the subfamily G = {Ai : i ∈ I} has |∪G| < |G|: this is the same type of obstacle that
we identified earlier.
What we’ve proved is a result known as Hall’s theorem:
Theorem 2.22 — Hall’s theorem. A family F has a transversal if and only if every
subset G ⊆ F satisfies Hall’s condition: |∪G| ≥ |G|.

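On the NFU instance, Hall’s condition can be checked exhaustively over all 2⁶ subfamilies. The sketch below (added for illustration) finds exactly one violating subfamily, {A1 , A4 , A5 , A6 }:

```python
from itertools import combinations

family = {
    1: {"F", "J"}, 2: {"D", "E", "F", "H"}, 3: {"D", "F", "G", "J"},
    4: {"E", "J"}, 5: {"E", "F"}, 6: {"E", "F", "J"},
}

violations = []
for k in range(1, len(family) + 1):
    for subset in combinations(family, k):
        union = set().union(*(family[i] for i in subset))
        if len(union) < len(subset):  # Hall's condition fails
            violations.append(set(subset))

print(violations)  # [{1, 4, 5, 6}]
```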
Many results about when transversals are guaranteed to exist can be shown using
Hall’s theorem. For example:
Theorem 2.23 If every set in F has the same size k, every element of U occurs in
ℓ sets of F, and k ≥ ℓ, then F has a transversal.

Proof 1 (using Hall’s theorem). For an arbitrary subset G ⊆ F, a naive count of
|∪G| would say: well, there are |G| sets, and each one has k elements, so the union
has k|G| elements. This is false when the sets in G are not disjoint, since the same
element can be counted multiple times. However, we will never count an element
more than ℓ times, since every element of U appears in ℓ sets of F. Therefore we can
at least say that |∪G| ≥ (k/ℓ)|G|. Since k ≥ ℓ, this implies that |∪G| ≥ |G|, so Hall’s
condition holds for all G. ■

Proof 2 (using network flows). To simplify notation, let F = {A1 , A2 , . . . , An }.
The network for this problem has a fractional flow with value n, which is the
maximum possible. Send one flow along every arc (s, i) with 1 ≤ i ≤ n; split it up
into 1/k flow along every arc (i, x) with x ∈ Ai ; finally, at each x ∈ U , we end up with
a total ℓ/k ≤ 1 flow going in, so we can send ℓ/k flow out to t.
By the integral flow theorem, there is also an integer flow with value n, which
corresponds to a transversal. ■

2.79 Vertex covers and matchings


Often, problems such as the club-president problem are represented by bipartite
graphs. You would see this in more detail in a class all about graph theory, but
essentially these are just a condensed drawing of our network, leaving out s and t,
such as:


[Figure: a bipartite graph with vertices 1, . . . , 6 in one part and D, E, F , G, H, J
in the other, with an edge joining each club to each of its members.]

Rather than “nodes” and “arcs”, it’s more common to say vertices and edges in
the case of a graph, but these mean nearly the same thing, except that edges don’t
have a direction. The graph is bipartite because its vertices can be partitioned into
two sets X and Y , such that every edge has one endpoint in X and one endpoint in
Y . Let’s say that X = {1, 2, 3, 4, 5, 6} and Y = {D, E, F, G, H, J} in our case.
In any flow, the arcs we use (except for the ones out of s and into t, which
are gone now) cannot repeat a node. The equivalent notion in a graph is called a
matching: a set of edges that does not repeat any endpoints. The solution we found
in the previous section corresponds to the matching {1F, 2H, 3D, 4J, 5E}.
It is common to see problems where X and Y are the same size. In this case, a
transversal of a set family corresponds to a perfect matching in a graph: this is a
matching which uses up all the vertices in X and in Y .
What about the equivalent notion to a cut in our network? For this, it’s convenient
to turn our pair (S, T ) into a different sort of object. We’ll take a set C with the
following elements:
• All vertices in X (the side that used to be closer to s) which are in T , and
• All vertices in Y (the side that used to be closer to t) which are in S.
In our example with S = {s, 1, 4, 5, 6, E, F, J} and T = {2, 3, D, G, H, t}, we end up
with C = {2, 3, E, F, J}.
Why do we do this? Well, the defining feature of a cut with finite capacity is that
we cannot have an infinite capacity arc going from S to T ; in our graph, this means
that there cannot be an edge such that neither endpoint is in C. The resulting set C
is always a vertex cover: a set of vertices that includes one endpoint of every edge.
The max-flow min-cut theorem has the following result when we translate it into
the language of bipartite graphs:
Theorem 2.24 — König’s theorem. In any bipartite graph, the number of edges
in a maximum matching is equal to the number of vertices in a minimum vertex
cover.
Hall’s theorem could also be stated in the language of graphs, as a condition
for the existence of a perfect matching. Often, that is the way you first see it.
However, no matter what language we use, Hall’s theorem and König’s theorem have
slightly different applications:
• Hall’s theorem asks: when can we guarantee that we can find a perfect matching
between X and Y ? This sees more use in theoretical applications, where we
are looking for a bijection of a certain type between two sets.
• König’s theorem asks: what is the size of the largest matching we can find?
This sees more use in practical applications, where we might want a large
matching even if a perfect one does not exist.
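As a sanity check of König’s theorem on this instance (a check written for these notes): the matching {1F, 2H, 3D, 4J, 5E} found earlier has size 5, and the vertex cover C = {2, 3, E, F, J} also has size 5.

```python
# Edges of the club-membership bipartite graph, from the rosters.
edges = [(1, "F"), (1, "J"), (2, "D"), (2, "E"), (2, "F"), (2, "H"),
         (3, "D"), (3, "F"), (3, "G"), (3, "J"), (4, "E"), (4, "J"),
         (5, "E"), (5, "F"), (6, "E"), (6, "F"), (6, "J")]

matching = [(1, "F"), (2, "H"), (3, "D"), (4, "J"), (5, "E")]
cover = {2, 3, "E", "F", "J"}

# A matching repeats no endpoint; a vertex cover touches every edge.
endpoints = [v for e in matching for v in e]
assert len(endpoints) == len(set(endpoints))
assert all(x in cover or y in cover for x, y in edges)

print(len(matching), len(cover))  # 5 5
```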


In both cases, the best way to answer the question is to solve the network flow
problem.

2.80 Integer linear programming


An integer linear program (often just called an “integer program”) is your usual
linear program, together with a constraint on some (or all) variables that they must
have integer solutions.
We encountered this requirement in some of the applications of network flow
problems, but these all had the miraculous property of total unimodularity to save
us: we didn’t have to think about the integer constraints, because they would hold
automatically. This is rare and unusual: typically, requiring our variables to be
integers does change the optimal solution.
For example, consider the following three optimization problems:

maximize x + y over x, y ∈ R;   maximize x + y over x ∈ Z, y ∈ R;   maximize x + y over x, y ∈ Z;
in all three cases subject to 3x + 8y ≤ 24, 3x − 4y ≤ 6, x, y ≥ 0.
The first example is an ordinary linear program with optimal solution (4, 3/2).
The second example is a (mixed) integer program where (4, 3/2) is still the optimal
solution. In fact, here, all vertices of the feasible region have x ∈ Z; if we know this
ahead of time, we can solve the integer program as a linear program.
The last example is an integer program with the same constraints, but the optimal
solutions are (2, 2) and (3, 1) instead. Note that we can’t even solve the integer
program by rounding (4, 3/2) to the nearest integer; that won’t give us a feasible
solution.
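The gap between the linear and integer optima can be seen by brute force, since the constraints bound the integer feasible points (the sketch below is an illustration added here, not a general-purpose IP solver):

```python
# Enumerate integer feasible points of 3x + 8y <= 24, 3x - 4y <= 6, x, y >= 0.
# From 3x + 8y <= 24 with x, y >= 0 we get x <= 8 and y <= 3.
feasible = [(x, y) for x in range(9) for y in range(4)
            if 3 * x + 8 * y <= 24 and 3 * x - 4 * y <= 6]

best = max(x + y for x, y in feasible)
optima = [(x, y) for x, y in feasible if x + y == best]

print(best)    # 4 (versus 4 + 3/2 = 5.5 for the LP relaxation)
print(optima)  # [(2, 2), (3, 1)]
```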
In general, an optimal integer solution can be arbitrarily far from the optimal
solution. For example, consider the region
{(x, y) ∈ R2 : (x − 1)/8 ≤ y ≤ x/10, x, y ≥ 0}.

This region (shown below) has a vertex at (x, y) = (5, 1/2), but its only integer points
are at (0, 0) and (1, 0).


We can replace 8 and 10 by large numbers like 998 and 1000 to get a vertex
arbitrarily far away from an integer point. This is just one of the reasons that
integer programming is hard, and weird things can happen when we add the integer
constraint.

2.81 Logical constraints


2.81.1 Sudoku
In a Sudoku puzzle, you have a 9 × 9 grid (divided into nine 3 × 3 subgrids called
“boxes” that will become important later); your goal is to fill in the cells with numbers
from 1 to 9. Some of the cells are pre-filled with numbers to begin with; the other
constraint on the solution is that every row, column, and box must contain each of
the numbers 1 through 9 exactly once. Below is an example of a Sudoku puzzle.8 Its
solution is given on the last page of these notes, in case you want to solve it yourself
without being spoiled.

5 3 7
6 1 9 5
9 8 6
8 6 3
4 8 3 1
7 2 6
6 2 8
4 1 9 5
8 7 9
Suppose that instead of solving this Sudoku ourselves, we want to write an integer
program so that an automatic solver can do it for us. How can we do that?
If you ask yourself, “What should my variables be?” there is a tempting but
incorrect answer: for each row i and column j, have an integer variable xij with the
constraint 1 ≤ xij ≤ 9 that tells you the number in cell (i, j).
The reason that this is not the right choice of variables is that the constraint
“these nine cells contain each value exactly once” cannot be written as a linear
inequality in terms of these variables. For example, it is possible for a row of a
Sudoku grid to contain the numbers (1, 2, 3, 4, 5, 6, 7, 8, 9), in that order; it is also
possible for that row to contain the numbers (9, 8, 7, 6, 5, 4, 3, 2, 1), in that order.
However, any set of linear constraints that allows these two points as solutions also
allows their midpoint, which is (5, 5, 5, 5, 5, 5, 5, 5, 5): not a valid way to fill in a row
of a Sudoku grid!
Instead, we will use binary variables. These are a very common choice in integer
programs; they are possibly the most common integer variable. What we’ll do here
is have an integer variable xijk where i, j, and k are all between 1 and 9. These 729
8The example is taken from Wikipedia, and its author is listed as Tim Stellmach.


variables will be bound by constraints 0 ≤ xijk ≤ 1; since they’re all integers, then
they can only be 0 or 1. The meaning we’ll attach to these variables is that xijk will
be 1 if cell (i, j) contains the number k, and 0 otherwise.
There are many constraints to be added to enforce the rules of Sudoku, but they
all come in four types:
• In every row, the numbers must all be different; in other words, no value can
be repeated. This is a set of 81 constraints: for every row i, and for every value
k, we add the constraint

xi1k + xi2k + xi3k + xi4k + xi5k + xi6k + xi7k + xi8k + xi9k = 1

to ensure that exactly one of the cells (i, 1) through (i, 9) contains the value k.
• In every column, the numbers must all be different. This is a very similar-
looking set of 81 constraints: for every column j, and for every value k, we add
the constraint

x1jk + x2jk + x3jk + x4jk + x5jk + x6jk + x7jk + x8jk + x9jk = 1

to ensure that exactly one of the cells (1, j) through (9, j) contains the value k.
• There are also 81 constraints for the boxes. These also look very similar, though
they’re slightly harder to describe. For example, for the top left 3 × 3 box, we
will have a constraint

x11k + x12k + x13k + x21k + x22k + x23k + x31k + x32k + x33k = 1

for each k between 1 and 9, which ensures that the value k appears exactly
once in that box. We add similar sets of constraints for the other 3 × 3 boxes.
• The last set of constraints is easier to forget about. It does not enforce the
rules of Sudoku as described above; rather, it enforces a rule of common sense
that comes from the meaning we attach to those variables. We want to ensure
that every cell of the 9 × 9 grid contains exactly one number. Thus, for every
row i, and for every column j, we add the constraint

xij1 + xij2 + xij3 + xij4 + xij5 + xij6 + xij7 + xij8 + xij9 = 1.

The resulting integer program has 729 variables and 324 constraints (not counting
the 729 constraints of the form xijk ≤ 1, and the 729 nonnegativity constraints), so it
is well out of reach of what we’ll be able to solve by hand. However, computers are
very good at problems like this: a general-purpose algorithm for solving problems
with {0, 1}-valued variables can solve the Sudoku above basically instantly. Though
in general, all known integer programming algorithms take exponential time in the
worst case, this takes a while to kick in, especially in special cases.
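As a cross-check, the same puzzle can also be solved without integer programming: the backtracking sketch below (an illustration added to these notes) encodes the row, column, and box rules directly and recovers the solution printed at the end of this chapter.

```python
# 0 denotes an empty cell; the grid is the puzzle from the notes.
puzzle = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]

def allowed(g, i, j, k):
    """Can value k go in cell (i, j) without repeating in its row/column/box?"""
    if any(g[i][c] == k for c in range(9)):
        return False
    if any(g[r][j] == k for r in range(9)):
        return False
    bi, bj = 3 * (i // 3), 3 * (j // 3)
    return all(g[bi + r][bj + c] != k for r in range(3) for c in range(3))

def solve(g):
    """Fill the first empty cell with each legal value; backtrack on failure."""
    for i in range(9):
        for j in range(9):
            if g[i][j] == 0:
                for k in range(1, 10):
                    if allowed(g, i, j, k):
                        g[i][j] = k
                        if solve(g):
                            return True
                        g[i][j] = 0
                return False
    return True

solve(puzzle)
print(puzzle[0])  # [5, 3, 4, 6, 7, 8, 9, 1, 2]
```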

2.81.2 Boolean satisfiability problems


The Sudoku integer program above falls under the umbrella of Boolean satisfiability
problems. The word “Boolean” refers to variables that have two values (in our
case, 1 and 0) which mean that some statement is true or false, respectively (in our
case, xijk = 1 corresponds to the statement “Cell (i, j) contains value k” being true).
The word “satisfiability” means the same thing as “feasibility” in other problems


we’ve solved this semester: we have no objective function, we just want to know if a
feasible solution exists.
Boolean satisfiability problems are typically expressed using logical operations:
“and”, “or”, “not”, and others. However, all of these can be expressed using linear
expressions and linear constraints:
• If x and y are {0, 1}-valued variables and 1 is interpreted as “true”, then
x + y ≥ 1 encodes the constraint “x or y is true”.
• If x and y are {0, 1}-valued variables and 1 is interpreted as “true”, then
x + y = 2 encodes the constraint “x and y are both true”.
• If x is a {0, 1}-valued variable and 1 is interpreted as “true”, then 1 − x
represents the same truth value as “not x”.
Typically, Boolean satisfiability problems are written in a standard form called
“conjunctive normal form”. This standard form consists of:
• literals, which are either “xi” or “not xi” for some variable xi;
• combined into clauses: sets of literals among which at least one must be true;
• finally, the overall problem is a collection of clauses among which all must be
true.
We will not get into the weeds of encoding problems in this form, but it is a form
that lends itself particularly well to an integer programming representation. Each
clause can be represented by a single linear inequality: for example, “x or y or not z”
could be written as
x + y + (1 − z) ≥ 1
or x + y − z ≥ 0.
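The equivalence between a clause and its inequality can be verified by checking all assignments:

```python
from itertools import product

# "x or y or (not z)" versus the inequality x + y - z >= 0.
for x, y, z in product((0, 1), repeat=3):
    clause = (x == 1) or (y == 1) or (z == 0)
    inequality = x + y - z >= 0
    assert clause == inequality

print("clause and inequality agree on all 8 assignments")
```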

2.81.3 Tying together logical and linear constraints


Integer programming, however, is even more expressive than just Boolean satisfiability
problems. In the next lecture, we will see an important example where some larger
integers (not just 0 and 1) come into play. We can also get a lot of mileage out of
using a few {0, 1}-valued variables inside a larger linear program, which is what we’ll
see today.
Here are just a few ways these can come into play:
Fixed costs. Not all costs (and profits) in an optimization problem scale
continuously per unit. For example, imagine that a factory produces calculators and
stores them before shipping. The factory might earn $50 per calculator9 produced;
however, it might pay $1000 to rent a warehouse, no matter how many calculators
are stored there.
We can use a binary variable y (as before, this will be an integer variable with the
constraint 0 ≤ y ≤ 1) to represent whether we rent the warehouse. Then, our profits
from producing x calculators might be expressed as 50x − 1000y. It is probably
reasonable to treat x as a real-valued variable; though it is impossible to produce 12
of a calculator, the number of calculators will probably be large enough that this
won’t make a big difference.
Choice of constraints. Of course, if the warehouse rental does not affect
anything, we will always set y to 0. We can also switch between different constraints
9 Calculator prices are ridiculous, considering they are often based on technology that was
available in the 1980s.


depending on the value of y. To give a simple example, suppose we can produce up
to 100 calculators if we don’t have a warehouse, but up to 2000 if we do have one.
This can be written as an inequality

x ≤ 100 + 1900y.

Here, we have an upper bound on x in either case, one bound is just larger than
the other. If, on the other hand, the warehouse can store an effectively unlimited
number of calculators, then we might have to “make up” an upper bound. That is,
we’d write an inequality

x ≤ 100 + M y

where M is a number chosen large enough that this constraint won’t limit the number
of calculators we can produce. For example, based on looking at the rest of the linear
program, we might conclude that we’ll never have the materials to produce more
than 10000 calculators over the time period we’re considering, and then we could set
M = 10000.

Note that in practice, we’ll see better behavior when we solve our integer programs
if the value of M is not chosen to be gratuitously large. (But we must be careful: if
we pick too small an M , we might cut off legitimate solutions.)
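Because y is binary, this small model can be solved by enumerating both values of y and taking the best continuous x for each. The sketch below uses the numbers from the text; treating the 2000-calculator cap as the only bound on x is an assumption of the example.

```python
# Profit 50x - 1000y, with x <= 100 + 1900y and y in {0, 1}.
best = None
for y in (0, 1):
    x = 100 + 1900 * y          # profit increases in x, so go to the cap
    profit = 50 * x - 1000 * y
    if best is None or profit > best[0]:
        best = (profit, x, y)

print(best)  # (99000, 2000, 1): renting the warehouse pays off
```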

Multiple binary variables. There are many ways to make this setup more
complicated.

Maybe we can rent multiple storage locations with different costs and different
capacities: we could end up a constraint like

x ≤ 100 + 200y1 + 500y2 + 1600y3

where y1 , y2 , y3 correspond to different storage options. (Our objective value might
become something like 50x − 200y1 − 400y2 − 1000y3 to account for their prices.)

It’s possible that the ideas from the previous section might get involved. Maybe
options y1 and y2 are mutually exclusive, but option y3 requires option y2 to be
chosen first. We could represent such logical constraints by the inequalities y1 + y2 ≤ 1
and y3 ≤ y2 , respectively.


2.82 Sudoku solution


Here is the solution to the Sudoku puzzle that appeared earlier in these notes:

5 3 4 6 7 8 9 1 2
6 7 2 1 9 5 3 4 8
1 9 8 3 4 2 5 6 7
8 5 9 7 6 1 4 2 3
4 2 6 8 5 3 7 9 1
7 1 3 9 2 4 8 5 6
9 6 1 5 3 7 2 8 4
2 8 7 4 1 9 6 3 5
3 4 5 2 8 6 1 7 9

2.83 The bin packing problem


Problem 2.21 For mysterious reasons, you are carrying crates of potatoes from one
end of campus to another. You have fifteen crates to carry: five 4-kilogram crates,
five 5-kilogram crates, and five 6-kilogram crates. You can carry multiple crates at a
time, but you’re not willing to carry more than 12 kilograms in one trip.
What is the minimum number of trips you’ll have to make?
This problem is a special case of the bin packing problem, usually phrased
in terms of packing items (of different sizes) into the smallest possible number of
containers (of some fixed size). Here, each of your trips across campus is a “container”.
There are multiple different formulations of the bin packing problem as an integer
program. Let’s begin with an attempt that might be intuitive but actually is a bad
idea.

2.83.1 Bin packing by planning each trip


The natural choice of variables in this problem is to have variables that answer the
questions: “What will you carry on your first trip? What will you carry on your
second trip?” and so on. The identities of the crates don’t really matter, only their
weight. So suppose you want to answer these questions as: “On my first trip, I will
carry three 4-kilogram crates. On my second trip, I will carry one 5-kilogram crate
and one 6-kilogram crate. On my third trip, . . . ”
This suggests using nonnegative integer variables xi,j equal to the number of
i-kilogram crates carried on the j th trip: x4,1 , x5,1 , x6,1 , x4,2 , x5,2 , x6,2 , x4,3 , . . . . Here,
i is always one of 4, 5, or 6. It’s a bit of a challenge deciding how far j should go:
isn’t that the question we’re trying to answer? A quick way to settle this is to decide
that it’s definitely possible to solve the problem in 15 trips, so we’ll have j go up to
15, and then we’ll try to minimize the number of trips we’ll actually use.
Three constraints will tell us that we end up carrying all the crates we have:

x4,1 + x4,2 + x4,3 + · · · + x4,15 = 5


x5,1 + x5,2 + x5,3 + · · · + x5,15 = 5


x6,1 + x6,2 + x6,3 + · · · + x6,15 = 5

To make sure that for each j, we don’t exceed our weight limit on the j th trip, we
could add constraints

4x4,j + 5x5,j + 6x6,j ≤ 12

for each j = 1, . . . , 15. (Wait on this, though; we’ll modify this constraint in a bit.)
The tricky part is figuring out how to minimize the number of trips necessary.
One way to do this is with a boolean variable yj for each trip j (that is, an integer
variable with 0 ≤ yj ≤ 1) answering the question: did we take trip j at all? To enforce
this interpretation of yj , we can modify the above constraint to

4x4,j + 5x5,j + 6x6,j ≤ 12yj

for each j = 1, . . . , 15. If yj = 1, we get back our previous constraint; if yj = 0, then


this modified constraint forces us to have x4,j = x5,j = x6,j = 0. In other words, if we
want to make use of the j th trip, we must set yj = 1. Now, we can

minimize y1 + y2 + y3 + · · · + y15

to minimize the number of trips taken.
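We can check this forcing behavior directly (a small sketch; the function name is ours, not part of any standard formulation):

```python
def trip_is_feasible(n4, n5, n6, y):
    """Check the constraint 4*x4j + 5*x5j + 6*x6j <= 12*yj for a single trip j."""
    return 4 * n4 + 5 * n5 + 6 * n6 <= 12 * y

# With yj = 0, carrying anything at all violates the constraint:
print(trip_is_feasible(1, 0, 0, 0))   # False
# With yj = 1, we recover the ordinary 12-kilogram limit:
print(trip_is_feasible(3, 0, 0, 1))   # True: three 4-kg crates weigh exactly 12 kg
print(trip_is_feasible(2, 1, 0, 1))   # False: 4 + 4 + 5 = 13 kg is too heavy
```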


This is a valid solution! Why don’t we like it? Because there is too much symmetry.
There are many essentially equivalent solutions that have different representations.
If we take one solution and swap the crates we carry on the first and second trip,
then we get a different, equally good solution. We could even “renumber our trips”
and say, “We will take 10 trips, but they will be trips 3 through 12” by setting
y3 = y4 = · · · = y12 = 1 and y1 = y2 = y13 = y14 = y15 = 0. This is equivalent to going
on 10 trips numbered 1 through 10.
In integer programming, we don’t like having many equivalent solutions. In-
tuitively, this is because we end up having to wade through many equally good
options in the hope of finding a better one, which takes a long time. After we see
the branch-and-bound method today, we will be able to say more precisely why
symmetry is bad.

2.83.2 The configuration linear program


The configuration linear program (or configuration LP) is a different approach
to the bin packing problem; the idea can be applied to many other combinatorial
optimization problems like it.
Here, we begin by listing the possible configurations: the sets (multisets) of
crates that can be carried on a single trip. The ones that are worth considering are

(4, 4, 4) (5, 5) (6, 6) (4, 5) (4, 6) (5, 6).

Here (4, 4, 4), for example, means that you carry three 4-kilogram crates; (4, 6)
means that you carry a 4-kilogram crate and a 6-kilogram crate. This list leaves out
suboptimal configurations like “carry just two 4-kilogram crates”; this simplifies our
setup, though we’ll have to address this issue later.


Now we can define nonnegative integer variables x444 , x55 , x66 , x45 , x46 , x56 to be
the number of trips taken with each configuration of goods. The total number of
trips is just the sum of these variables: x444 + x55 + x66 + x45 + x46 + x56 ; this is the
quantity we will want to minimize.
For each size of crate, we ask for the total number of crates taken of that size to
be at least 5; for example, we add the constraint
3x444 + x45 + x46 ≥ 5
for the 4-kilogram crates. Why at least 5 and not exactly 5? Because in practice,
we might want to carry suboptimal configurations we didn’t include: if our first two
trips just deal with the 4-kilogram crates, the configurations will be (4, 4, 4) and then
(4, 4). We will model that as “taking an extra 4-kilogram crate”, with two trips that
are both (4, 4, 4).
In this case, our entire linear program is:
minimize x444 + x55 + x66 + x45 + x46 + x56
subject to 3x444 + x45 + x46 ≥ 5
2x55 + x45 + x56 ≥ 5
2x66 + x46 + x56 ≥ 5
x444 , x55 , x66 , x45 , x46 , x56 ≥ 0
x444 , x55 , x66 , x45 , x46 , x56 ∈ Z
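Because this program has only six variables and small coefficients, we can confirm its optimal value by brute force (a sketch; searching 0 through 5 trips per configuration is enough here, since the optimum turns out to lie well inside that range):

```python
from itertools import product

# Brute-force search over the configuration variables
# (x444, x55, x66, x45, x46, x56), each between 0 and 5 trips.
best = None
for x444, x55, x66, x45, x46, x56 in product(range(6), repeat=6):
    if (3 * x444 + x45 + x46 >= 5          # all 4-kg crates carried
            and 2 * x55 + x45 + x56 >= 5   # all 5-kg crates carried
            and 2 * x66 + x46 + x56 >= 5): # all 6-kg crates carried
        trips = x444 + x55 + x66 + x45 + x46 + x56
        if best is None or trips < best:
            best = trips
print(best)   # 7
```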
Figuring out the configurations takes a bit of work (and in general, there may be
many of them, which is a weakness of this method). However, once we get there,
solving this integer program will turn out to be easier.
How do we get there? Well, we can begin by ignoring the integer variables, and
pretending they can be fractions. This is called “solving the LP relaxation”.
In this case, doing that will tell us that we can solve the problem in 6 2/3 trips,10
by setting x55 = 5/2, x66 = 5/2, and x444 = 5/3.
Of course, this is not an actual workable solution. But it does give us some
important information! We now know for certain that at least 7 trips will be necessary.
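We can redo this rounding argument with exact rational arithmetic (a quick check in Python):

```python
import math
from fractions import Fraction as F

# Objective value of the fractional solution x55 = 5/2, x66 = 5/2, x444 = 5/3.
lp_value = F(5, 2) + F(5, 2) + F(5, 3)
print(lp_value)             # 20/3, i.e. 6 2/3 trips

# Any integer solution uses an integer number of trips, so the true optimum
# is at least the LP relaxation value rounded up.
print(math.ceil(lp_value))  # 7
```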

2.84 The branch-and-bound method


2.84.1 Branching
The branch-and-bound method is the first complete method we’ll see for solving
integer programs.
The first ingredient of this method is the branching. The idea is that we’ll start
by solving the LP relaxation of a problem, and we’ll keep solving LP relaxations.
However, we can’t just keep solving the same linear program, or we’ll keep getting the
same solution. In fact, we should only solve linear programs that somehow exclude
that previous fractional solution we got.
To accomplish this, we branch on an integer variable which has been given a
fractional value. Whenever our optimal solution ends up giving some integer variable
xi a fractional value f ∉ Z, we know that one of two things must hold:

10. I am writing the objective value as a mixed fraction, which is rarely done in advanced mathematics, because I want it to be easy to see which integer values are the closest.


• either xi is actually at most ⌊f ⌋ (f rounded down),


• or xi is actually at least ⌈f ⌉ (f rounded up).
So we can solve two new linear programs: one where we add the constraint xi ≤ ⌊f ⌋,
and one where we add the constraint xi ≥ ⌈f ⌉. Both of these exclude our previous
solution, because setting xi to f does not satisfy either constraint. However, we do
not exclude any solutions where xi really is an integer, so we have not lost the true
optimal solution to our problem.
We do not have to start over from scratch. The dual simplex method is tailor-
made for situations where we take a formerly-optimal dictionary and add a constraint
that it violates. Even though we get two new linear programs to solve every time we
branch, we can hope that we won’t have to do too much extra work to solve each
one of them.
When we solve the new linear programs, we might still get fractional solutions to
each one of them. So we repeat the process, branching from each of those fractional
solutions. The number of linear programs can grow quickly. Here is what happens if
we start branching from our solution to the potato-crate problem:

[Branch-and-bound tree for the potato-crate problem. The root has LP value 6 2/3 and branches on x55 : the x55 ≤ 2 child has value 6 2/3 and the x55 ≥ 3 child has value 7 1/6. Deeper branches on x444 and x66 produce fractional nodes with values 7 1/3, 7 1/2, and 7 2/3, and integer leaves with values 7, 8, and 9.]

In this diagram, each rectangle is labeled with an optimal objective value, and every
arrow is labeled with the constraint we added when branching. The rectangles with
no branches correspond to places where the linear program obtained an integer
optimal solution.


2.84.2 Pruning
As you can see, it can take a while to end up at integer solutions by branching.
In fact, it is typical to have to deal with many cases in the branch-and-bound
method; often, there are exponentially many cases. We saw in the previous lecture
that integer programming is very expressive and can handle many different problems;
the downside is that it is not usually easy to solve.
However, things are not as bad as I’ve made them seem, because we have only seen
half of the branch-and-bound method. In addition to branching, there is pruning:
discarding branches that are not promising without exploring them.
Here’s an example. Suppose that we start solving the potato-crate problem, and
get the following results:
1. We solve the original LP and get the solution

(x444 , x55 , x66 , x45 , x46 , x56 ) = (5/3, 5/2, 5/2, 0, 0, 0)

with objective value 6 2/3. From here, we branch on x55 .


2. We try the x55 ≥ 3 branch and get the solution

(x444 , x55 , x66 , x45 , x46 , x56 ) = (5/3, 3, 5/2, 0, 0, 0)

with objective value 7 1/6. From here, we branch on x66 .


3. Before doing that, we go back to see what happens with the x55 ≤ 2 branch,
and get the solution

(x444 , x55 , x66 , x45 , x46 , x56 ) = (5/3, 0, 0, 0, 0, 5)

with objective value 6 2/3, which seems more promising. From here, we branch
on x444 .
4. We try the x444 ≤ 1 sub-branch of the x55 ≤ 2 branch, and get the solution

(x444 , x55 , x66 , x45 , x46 , x56 ) = (1, 0, 0, 1, 1, 4)

with objective value 7. This is our first integer solution!


Here is the diagram of where we’ve gotten so far:

6 2/3
├─ x55 ≤ 2 → 6 2/3
│   ├─ x444 ≤ 1 → 7 (integer solution)
│   └─ x444 ≥ 2 → ??
└─ x55 ≥ 3 → 7 1/6
    ├─ x66 ≤ 2 → ??
    └─ x66 ≥ 3 → ??

There are certainly branches we haven’t explored yet. But are they worth
exploring? We have an integer solution with objective value 7 already: we know we


can solve the problem in 7 trips. So the only reason to explore other branches is if
they could give us a 6-trip solution.
However, whenever we add more constraints, we can only make the objective
value worse. Therefore exploring down from the node labeled 6 2/3 will give us more
solutions with objective value at least 6 2/3: if they're integer solutions, the objective
value will be at least 7. Exploring down from the node labeled 7 1/6 will be even worse:
the smallest integer objective value we can get is at least 8.
So at this point we know we’ve found an optimal integer solution! We can stop.
The general philosophy is:
1. If ζ ∗ is the objective value of the best integer solution we’ve found, we can
prune nodes that branched from an objective value of ζ ∗ or worse: we don’t
explore those sub-branches. They will never give us anything better than what
we’ve already found.
2. In problems where integer solutions have integer objective values, we can do a
bit better. For each fractional objective value, treat it as the next-worst integer
value when pruning, since that is the best we can do from that branch. (In
our example, we prune the other branch leading out of 6 2/3, even though 6 2/3 < 7,
because there's no integer between 6 2/3 and 7.)
(Sometimes, adding constraints will create an infeasible linear program in one of our
sub-branches. These should also be pruned, because adding more constraints to an
already-infeasible linear program will never yield solutions.)

2.84.3 Why symmetry is bad


Now that we know how branch-and-bound works, we can say more about why
symmetry—that is, having many equivalent integer solutions—is a bad sign in an
integer program.
Let’s look back at our first formulation of the potato-crate problem, where we
had a variable xi,j for the number of i-kilogram crates on the j th trip. A fractional
solution might have x5,1 = 4/5 : we carry a fraction of a 5-kilogram crate on the first
trip. What will happen when we branch on x5,1 , adding the constraints x5,1 ≤ 0 or
x5,1 ≥ 1?
One thing that’s likely to happen is that we’ll find an equivalent fractional solution
with x5,2 = 4/5 , instead: we carry the same fraction of a 5-kilogram crate, but on the
second trip. We’ll have to branch on x5,2 , then on x5,3 , and so on through x5,15
before we finally start seeing really new solutions. It takes much longer before we
get to an integer solution we can actually use to prune our options.
When we have a choice between several integer programs for the same underlying
problem, we prefer to pick ones without this type of symmetry: ones where every
branch will lead to new solutions.

2.85 Cutting planes in general


The cutting plane method is really a family of strategies for solving integer programs.
The philosophy is that an integer program can have multiple linear programming
formulations: different sets of linear inequalities describing the same set of integer
points. When we solve the linear programming relaxation and get a fractional
solution, that’s a sign that our linear programming formulation wasn’t very good.


So why not try to improve it?


Specifically, suppose that we have an integer program

maximize cT x
subject to Ax ≤ b
x ≥ 0
x ∈ Zn

We solve the LP relaxation, and get a fractional solution x∗ . If we want an improved


formulation of this integer program, we want to generate a new inequality pT x ≤ q
such that:
• It’s valid for the integer program: every point x ∈ Zn that satisfies Ax ≤ b
and x ≥ 0 also satisfies pT x ≤ q.
We don’t want to change the problem, after all!
• It cuts off the fractional solution x∗ we got previously: we have pTx∗ > q.
If this does not hold, then adding the new inequality won’t help; we’ll still get
x∗ as the optimal solution to the LP relaxation.
Such an inequality is called a cutting plane for the integer program. If we can
come up with a cutting plane, then we can add it as an additional constraint, and
solve the LP relaxation of the new integer program.
Of course, there’s no guarantee that the new LP relaxation will have an integer
solution, either. We might get another fractional solution, in which case we’ll have
to do this again. The hope is that after several steps, we’ll get an integer solution.
One big question remains: where do we actually get these cutting planes to begin
with?
There are many strategies, and each one results in a cutting plane method. They
vary in how difficult they are (some require more or less work to come up with a
cut) and how effective.

2.86 The Gomory fractional cut


The Gomory fractional cut is one strategy for coming up with cutting planes. It’s
quick to perform, and the cuts it produces are pretty good.
To make it work, we’ll assume that we have a purely integer program: some
cutting plane methods still work with a mix of integer and real variables, but this
one isn’t one of them. We’ll also assume that all numbers in the constraints are
integers. (If they’re merely rational numbers, we can turn them into integers by
clearing denominators.) This is important because:
Fact 3: Given a purely integer linear program where all numbers in the constraints
are integers, at every basic solution the slack variables also have integer values.
This fact holds because, given the assumptions, every slack variable will be a
difference between two integer quantities: the two sides of an inequality in integer
variables with integer coefficients.
Normally, our slack variables are real numbers, and there is no reason to force
them to be integers. However, the Gomory fractional cut cannot work unless all the
variables are integer variables, and that includes the slack variables.


2.86.1 The general rule


The Gomory fractional cut takes an equation in integer variables, and uses it to come
up with an inequality they must satisfy. The rule defining the fractional cut is:
Theorem 2.25 Suppose that nonnegative integer variables x1 , x2 , . . . , xn satisfy the
equation

a1 x1 + a2 x2 + · · · + an xn = b.

Then they also satisfy the inequality

⌊a1 ⌋x1 + ⌊a2 ⌋x2 + · · · + ⌊an ⌋xn ≤ ⌊b⌋

where ⌊r⌋ denotes the floor of r: the greatest integer less than or equal to r.
Moreover, the difference between the two sides of this inequality is an integer.

Proof. The quantity ⌊a1 ⌋x1 + ⌊a2 ⌋x2 + · · · + ⌊an ⌋xn has smaller or equal coefficients
on every variable, compared to a1 x1 + a2 x2 + · · · + an xn . Since all the variables are
nonnegative, its value can be no larger. We conclude that

⌊a1 ⌋x1 + ⌊a2 ⌋x2 + · · · + ⌊an ⌋xn ≤ b.

However, in the inequality we just wrote down, the left-hand side is an integer. So if
it is less than or equal to b, it is also less than or equal to ⌊b⌋, giving us the inequality.
The difference between the two sides of the inequality is an integer simply because
both sides are integers. ■

It is often convenient to rewrite the inequality in this theorem by subtracting it from


the equation we started with, getting

(a1 − ⌊a1 ⌋)x1 + (a2 − ⌊a2 ⌋)x2 + · · · + (an − ⌊an ⌋)xn ≥ b − ⌊b⌋.

Here, ai − ⌊ai ⌋ and b − ⌊b⌋ are called the fractional parts of ai and of b.
Be careful when taking fractional parts of negative numbers! If r is a positive real
number, r − ⌊r⌋ is just the part of r after the decimal, but this is no longer true if r
is negative. For example, if r = −1.23, then its floor is ⌊r⌋ = −2, and its fractional
part is r − ⌊r⌋ = 0.77.
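This floor convention is the same one Python's math.floor uses, so we can check the example directly (a quick sketch):

```python
import math

def frac_part(r):
    """Fractional part r - floor(r); always lies in [0, 1)."""
    return r - math.floor(r)

print(math.floor(-1.23))            # -2 (not -1)
print(round(frac_part(-1.23), 2))   # 0.77
print(round(frac_part(1.23), 2))    # 0.23
```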
There is a further detail we need to know: not just that this inequality is valid,
but that it cuts off fractional optimal solutions. This will force the dual simplex
method to find us a new solution, with a new opportunity for it to be an integer
solution.
If the Gomory fractional cut is obtained from a row of our dictionary, then the
fractional-part form of the inequality will contain only nonbasic variables: the single
basic variable xi will have ai = 1 (since it appears on the left side with coefficient 1)
and therefore ai − ⌊ai ⌋ = 0. So the right-hand side of the inequality will be 0 at the
current basic solution. Provided we pick a row where the constant term is a fraction,
we will have b − ⌊b⌋ > 0, so the inequality will not hold!


2.86.2 An example
As an example, consider the following integer program:

maximize 3x + 2y
x,y∈Z
subject to 3x + y ≤ 6
y ≤ 2
x, y ≥ 0

Before we can find a cutting plane, we should solve the linear programming relaxation.

max ζ = 0 + 3x + 2y        max ζ = 4 − 2w2 + 3x        max ζ = 8 − w1 − w2
w1 = 6 − 3x − y       ⇝    w1 = 4 + w2 − 3x       ⇝    x = 4/3 − (1/3)w1 + (1/3)w2
w2 = 2 − y                 y = 2 − w2                  y = 2 − w2

If we did not find an integer solution, then by definition, one of the basic variables
will have a fractional value in the optimal dictionary. In our case, x has a fractional
value. Finding the Gomory fractional cut requires us to pick one such variable; if
there are several, it doesn’t matter which one we pick, but in this case, we can only
pick x.
To apply Theorem 2.25 here, we need to write x’s equation in the appropriate
form. We move all the nonbasic variables to the left-hand side, getting
x + (1/3)w1 − (1/3)w2 = 4/3 .
The inequality in Theorem 2.25 is obtained by rounding every coefficient down to
the nearest integer. This turns x into x, (1/3)w1 into 0w1 or 0, −(1/3)w2 into −w2 , and 4/3
into 1, so we get the inequality

x − w2 ≤ 1.

The reason we like to subtract this inequality from the previous inequality (or
equivalently, take the fractional parts of the coefficients) is that this eliminates the
basic variable x, giving us an inequality in the nonbasic variables:
(1/3)w1 + (2/3)w2 ≥ 1/3 .
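We can reproduce this floor-and-subtract computation with exact rational arithmetic (a sketch; the dictionary row x + (1/3)w1 − (1/3)w2 = 4/3 is the one derived above):

```python
import math
from fractions import Fraction as F

# The dictionary row, written as a1*x + a2*w1 + a3*w2 = b.
coeffs = {"x": F(1), "w1": F(1, 3), "w2": F(-1, 3)}
b = F(4, 3)

# Gomory cut: floor every coefficient and the right-hand side.
floors = {v: math.floor(a) for v, a in coeffs.items()}
print(floors, math.floor(b))      # encodes x - w2 <= 1

# Fractional-part form: subtracting the cut from the row leaves only
# the nonbasic variables, since x's fractional part is zero.
fracs = {v: a - math.floor(a) for v, a in coeffs.items()}
print(fracs, b - math.floor(b))   # encodes (1/3)w1 + (2/3)w2 >= 1/3
```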
Our next step is to add this to our dictionary. We add a new variable w3 representing
the amount by which (1/3)w1 + (2/3)w2 exceeds 1/3. Solving for w3 , this gives us the equation

w3 = −1/3 + (1/3)w1 + (2/3)w2 .
The clause at the end of Theorem 2.25 is important here: it tells us that the new
variable w3 will be an integer at any integer solution to our program, so we will
continue to have an integer program in which all variables are integers.


We can add the equation w3 = −1/3 + (1/3)w1 + (2/3)w2 as another row to our dictionary


and solve the new linear program using the dual simplex method:
max ζ = 8 − w1 − w2                        max ζ = 15/2 − (1/2)w1 − (3/2)w3
x = 4/3 − (1/3)w1 + (1/3)w2           ⇝    x = 3/2 − (1/2)w1 + (1/2)w3
y = 2 − w2                                 y = 3/2 + (1/2)w1 − (3/2)w3
w3 = −1/3 + (1/3)w1 + (2/3)w2              w2 = 1/2 − (1/2)w1 + (3/2)w3

(A brief synopsis of the dual simplex pivot we did: we know that w3 is our leaving
variable, and because both w1 and w2 have positive coefficients, they're both on our
shortlist for entering variables. We compare ratios, and w1 's ratio 1/(1/3) = 3 is larger
than w2 's ratio 1/(2/3) = 3/2, so w2 is our entering variable. After pivoting, we end up
with an optimal dictionary.)
optimal dictionary.)
We are still not done, because (x, y) = (3/2, 3/2) is not an integer solution. But let's
take a break from that to look at what is going on graphically.
To visualize the procedure of adding a cutting plane, we can rewrite our inequality
x − w2 ≤ 1 in yet a third form: in terms of x and y. To do this, substitute w2 = 2 − y,
getting x − (2 − y) ≤ 1 or x + y ≤ 3. In the diagram of the feasible region, here is
what this looks like:

The newly added inequality separates all the integer solutions (in black) from the
fractional solution to the LP relaxation (the large point in red). Unfortunately, the
optimal solution to the new feasible region happens to be the one corner of that
region that doesn’t have integer coordinates. We have the worst luck!
To continue, we go back to our dictionary. All three basic variables have a
fractional value, so we could pick any of them to deal with, but let’s pick x again.
(In this example, it turns out that we’ll get the same cut no matter which equation
we get it from.)
Moving all the variables to the left, we get x + (1/2)w1 − (1/2)w3 = 3/2 . Taking the
fractional parts according to the alternate form of Theorem 2.25, we get the inequality
(1/2)w1 + (1/2)w3 ≥ 1/2 . We can add this to our dictionary, with a new slack variable w4 , if
we define w4 = −1/2 + (1/2)w1 + (1/2)w3 .
Applying the dual simplex method again takes us only one pivot step:
max ζ = 15/2 − (1/2)w1 − (3/2)w3           max ζ = 7 − w3 − w4
x = 3/2 − (1/2)w1 + (1/2)w3                x = 1 + w3 − w4
y = 3/2 + (1/2)w1 − (3/2)w3           ⇝    y = 2 − 2w3 + w4
w2 = 1/2 − (1/2)w1 + (3/2)w3               w2 = 0 + 2w3 − w4
w4 = −1/2 + (1/2)w1 + (1/2)w3              w1 = 1 − w3 + 2w4


The optimal solution is (x, y) = (1, 2), which is an integer solution! After the second
cutting plane, we are done.
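Since this example is tiny, we can double-check the answer by brute force over all feasible integer points (a sketch):

```python
# All integer points with 3x + y <= 6, y <= 2, x, y >= 0 (so x <= 2, y <= 2).
candidates = [
    (x, y)
    for x in range(3)
    for y in range(3)
    if 3 * x + y <= 6
]
best = max(candidates, key=lambda p: 3 * p[0] + 2 * p[1])
print(best, 3 * best[0] + 2 * best[1])   # (1, 2) 7
```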
The second cut we added, which we wrote as (1/2)w1 + (1/2)w3 ≥ 1/2 , can be written in
terms of x and y as 2x + y ≤ 4. Here is another diagram showing the evolution of
our feasible region as we add the cutting planes:

[Figure: the feasible region, first with the cut x + y ≤ 3 added, then with 2x + y ≤ 4 added.]

2.87 Extensions
The cutting plane method is often combined with the branch-and-bound method into
a hybrid algorithm called “branch-and-cut". Here, when solving a linear program
and getting a fractional solution, we make a choice between two options:
• Pick a variable xi with a fractional value, and use it to branch out to two new
linear programs, as usual in the branch-and-bound method.
• Add a cutting plane inequality to replace the linear program by a new one with
a different solution.
It’s a matter of heuristics (in other words, guesswork) to decide between these two
options. These heuristics are only partially developed by mathematical reasoning;
partially, we just check them on practical examples to see how well they behave.

2.88 The traveling salesman problem


The “flavor text" of the traveling salesman problem (TSP) is the following. There
are n cities, numbered 1, 2, . . . , n, with some costs of travel between them. Between
two cities i and j, we are given a cost of travel cij to go from i to j. (We assume
that it’s possible to travel from any city to any other—maybe you can hire a private
jet if you need to—but some costs may be extremely large.)
A salesman starting in city 1 wants to visit all n cities in some order and return
to city 1. We call this a tour of the n cities—sometimes we call it a closed tour to
distinguish it from open tours which do not have to return to the starting point.
The salesman’s goal is to find the cheapest possible (closed) tour, adding up the cost
of all n legs of the tour. There are (n − 1)! orders in which the other cities could be
visited, so this is not a problem we can solve by brute force for any reasonable value
of n.
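For very small n, though, the brute-force search is easy to write down (a sketch; the cost matrix below is made up for illustration):

```python
from itertools import permutations

def tsp_brute_force(cost):
    """Try all (n-1)! closed tours starting and ending at city 0."""
    n = len(cost)
    best_cost, best_tour = None, None
    for order in permutations(range(1, n)):
        tour = (0,) + order + (0,)
        total = sum(cost[tour[k]][tour[k + 1]] for k in range(n))
        if best_cost is None or total < best_cost:
            best_cost, best_tour = total, tour
    return best_cost, best_tour

# A hypothetical asymmetric cost matrix for 4 cities.
cost = [
    [0, 2, 9, 10],
    [1, 0, 6, 4],
    [15, 7, 0, 8],
    [6, 3, 12, 0],
]
print(tsp_brute_force(cost))   # (21, (0, 2, 3, 1, 0))
```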
These days, we buy everything online, so we are not interested in solving the
problems of actual traveling salesmen. On the other hand, an Amazon delivery truck
might end up solving the traveling salesman problem if it has to make n deliveries in a
neighborhood in the shortest amount of time. Many other route-planning algorithms
also require solving the traveling salesman problem, even when literal salesmen are
not involved.


There are also many industrial applications in which “cities” and “travel” are
more metaphorical. For example, if we are constructing an object layer by layer in a
3D printer, then optimizing the order in which we deposit material is a variant of
the traveling salesman problem. We might also be drilling holes in a circuit board,
cutting a sheet of wood with a laser cutter, or even manipulating a robot arm to
take photos of an object from multiple angles11 .
Even if some of these problems add additional twists to the problem, the starting
point is usually one of the two TSP formulations we will look at today.
It will be convenient for us to assume that we never visit a city more than once
in a tour. In some formulations, this may require distinguishing between “official”
and “unofficial” visits to a city. For example, we can imagine that if we’re trying
to tour the US by taking airplane flights, we might go from Atlanta to Orlando to
Charlotte, and the flight from Orlando to Charlotte might have a layover in Atlanta.
In this case, the cost of the Orlando–Charlotte route in our problem would simply
be the total cost of the two-leg trip, and we don’t even notice the layover in Atlanta
when we’re finding the optimal tour.
On the other hand, we could also consider a problem where “unofficial” visits to
a city are not allowed—for example, we are still trying to visit n different cities in
the US by airplane and return, but we want our trip to consist of n direct flights.
In this case, the cost of going from Orlando to Charlotte might increase if we have
to avoid stopping in Atlanta. We will return to this distinction at the end of the
lecture.

2.89 An incomplete formulation


Here is a first attempt at representing the problem with an integer program.
Suppose that to every pair of cities (i, j), we assign an integer variable xij ∈ {0, 1}
which will equal 1 if the tour goes from city i to city j, and 0 otherwise. Then the
pair (i, j) contributes cij xij to the total cost of a tour: cij , when xij = 1, and 0,
when xij = 0. Therefore, we have an objective function to
minimize Σ_{i=1}^{n} Σ_{j=1}^{n} cij xij .

In a tour, we visit each city only once: we enter the city once, and then we leave
the city once. (For city 1, we do those in a different order: first we leave city 1, and
then we return to it. But this doesn’t affect things; in fact, in a tour, it doesn’t
matter which city is the starting city.) We can represent this requirement by a pair
of constraints for each city:
Σ_{1≤i≤n, i≠j} xij = 1    for each j = 1, 2, . . . , n    (2.23)

Σ_{1≤k≤n, k≠j} xjk = 1    for each j = 1, 2, . . . , n    (2.24)

11. For more details of this unusual application—with pictures!—see the paper where I originally found it: https://doi.org/10.3390/robotics11010016.


Equation (2.23) says that we arrive at city j from exactly one other city. Equa-
tion (2.24) says that we leave city j to go to exactly one other city.
If these constraints were all we needed, we’d be in great shape. (In fact, the
constraint matrix so far is totally unimodular, so we wouldn’t even need to worry
about integer programming techniques. We will not prove this, because it doesn’t
immediately help us with anything, but it’s true.) Unfortunately, there’s a problem.

Take a random set of 9 points (as in the first diagram) and let cij be the distance
between the ith point and the j th point. Then the minimum-cost tour between
the 9 points, as found by a brute-force search, is shown in the second diagram.
Unfortunately, the solution to the integer program with constraints (2.23) and (2.24),
shown in the third diagram, is not a tour at all!
The optimal solution to the integer program we have so far satisfies the constraint
that we must enter each node once and leave it once, and so it looks like a tour
“locally". Unfortunately, it is missing the “global" condition that the tour must be
connected.

2.90 Subtour elimination constraints


We solve this problem by adding additional constraints called subtour elimination
constraints that rule out the disconnected solutions. There are two famous solutions
to this problem that take very different approaches, each with its advantages and
drawbacks.

2.90.1 Solution #1: the DFJ constraints


The first solution to this problem was proposed by Dantzig, Fulkerson, and Johnson
in 1954. To disambiguate, we will call this set of constraints the DFJ constraints.
The DFJ subtour elimination constraints are the constraints
Σ_{i∈S} Σ_{j∉S} xij ≥ 1    for each S s.t. 1 ≤ |S| ≤ n − 1    (2.25)

For every set S of cities, other than the empty set ∅ and the set {1, 2, . . . , n} of all
cities, the sum on the left-hand side ranges over all pairs (i, j) such that going from
city i to city j leaves S. By requiring the sum to be at least 1, we require that the
tour will leave the set S at least once.
This is guaranteed to happen for any legitimate tour. Since the tour visits every
single city, it must visit a city in S at some point. However, the tour cannot stay in
S forever, since there are also cities not in S, so eventually it must take a step that
leaves S.


However, the optimal solution to the constraints in (2.23) and (2.24) on the
previous page violates this condition. We could, for example, take S to be the set of
the three points on the bottom. The solution there consists of a “subtour" that just
cycles between the three cities in S, and some other thing that happens between the
six cities outside S, with no step that leaves S.
With the subtour elimination constraints in play, every integer solution to (2.23),
(2.24), and (2.25) is actually a valid tour, and so we can solve the TSP problem
using an integer program. A slightly concerning feature of the subtour elimination
constraints is that there are 2^n − 2 of them. That number grows almost as quickly
as the number (n − 1)! of possible tours, so solving even the linear programming
relaxation might not be quicker than solving the TSP problem by brute force.
A solution to this is to add the constraints in (2.25) on the fly, one at a time, just
as we added the fractional cuts in the previous lecture. Given any integer solution
to (2.23) and (2.24) that is not a tour, we can quickly find a set S for which the
corresponding constraint in (2.25) is violated. For example, we can start at city
1 and follow the path defined by the integer solution (by going from city i to the
unique city j such that xij = 1) until we return to city 1. Let S be the set of all
cities we visit: then either S = {1, 2, . . . , n} and we have a tour, or else the subtour
elimination constraint for S is violated because
$$\sum_{i \in S} \sum_{j \notin S} x_{ij} = 0.$$
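The separation step just described can be sketched in a few lines of Python (the function and variable names, such as `succ` standing in for the arcs with $x_{ij} = 1$, are illustrative, not part of the text):

```python
def find_violated_subtour(succ, n):
    """succ maps each city i to the unique j with x_ij = 1. Follow the
    path from city 1; return the set S of cities reached, whose DFJ
    constraint (2.25) is violated, or None if the solution is one tour."""
    S, city = {1}, succ[1]
    while city != 1:                 # follow x_ij = 1 arcs back to city 1
        S.add(city)
        city = succ[city]
    return None if len(S) == n else S

# an integer solution made of two subtours, 1-2-3-1 and 4-5-6-4
succ = {1: 2, 2: 3, 3: 1, 4: 5, 5: 6, 6: 4}
S = find_violated_subtour(succ, 6)   # {1, 2, 3}: no arc leaves S
```

In a branch-and-cut loop, the returned set S is exactly the set whose constraint in (2.25) gets added to the problem.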

As a result, one way to proceed is using a hybrid branch-and-cut method, starting


with just (2.23) and (2.24) as the constraints. Whenever we find a fractional solution
(which can’t happen with just those constraints, but might happen if we have added
some of the constraints in (2.25) already), we can branch on one of the fractional
variables. Whenever we find an integer solution which doesn’t represent a tour, we
can find a set S for which the constraint in (2.25) is violated, and add that constraint
to the problem.

2.90.2 Solution #2: MTZ constraints


Another way to formulate the integer program, discovered by Miller, Tucker, and
Zemlin in 1960, avoids the subtour elimination constraints in favor of a more compact
formulation. In the MTZ constraints, we add n − 1 additional variables t2 , t3 , . . . , tn
that, intuitively, represent the time at which a city is visited.
If the times at which we visit the cities are given, then we can eliminate subtours
with the condition that, when going from city i to city j, the time tj at which city
j is visited must be later than the time ti at which city i is visited. Because strict
inequalities are not something we allow in linear programs, we will phrase this as
the logical implication
if xij = 1, then tj ≥ ti + 1
for every pair (i, j) with i ̸= 1 and j ̸= 1. (We leave t1 undefined and don’t include it
in these constraints because city 1 is visited twice: at the start of the tour, and at
the end.)
We can encode the logical implication with the “big-M " technique:
tj ≥ ti + 1 − M (1 − xij )


where M is some large number: when xij = 0, the constraint tj ≥ ti + 1 − M does


nothing, and when xij = 1, we have tj ≥ ti + 1. We can actually choose M = n,
because the times can be chosen from the range [0, n−1]. This gives us the constraints
ti − tj + 1 ≤ n(1 − xij ) for all i, j ̸= 1 s.t. i ̸= j (2.26)
In any actual tour, we can satisfy (2.26) by setting ti = 1 for the first city we visit
after city 1, ti = 2 for the second, and so on, with ti = n − 1 for the last city (after
which we return to city 1).
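A small sketch, checking both claims above under the convention that $t_i$ is the position of city $i$ in the tour (the helper names are my own, not from the text): a genuine tour satisfies every constraint (2.26) with $M = n$, while a solution with a subtour avoiding city 1 does not.

```python
def mtz_satisfied(n, x, t):
    """x: dict mapping arcs (i, j) to 0/1 values; t: dict of times for
    cities 2..n. Checks t_i - t_j + 1 <= n*(1 - x_ij) for all i, j != 1."""
    return all(t[i] - t[j] + 1 <= n * (1 - x.get((i, j), 0))
               for i in range(2, n + 1) for j in range(2, n + 1) if i != j)

n = 4
tour_x = {(1, 2): 1, (2, 3): 1, (3, 4): 1, (4, 1): 1}   # 1 -> 2 -> 3 -> 4 -> 1
tour_t = {2: 1, 3: 2, 4: 3}                              # visit order as times
sub_x = {(1, 2): 1, (2, 1): 1, (3, 4): 1, (4, 3): 1}     # subtours 1-2 and 3-4
ok = mtz_satisfied(n, tour_x, tour_t)                    # True
bad = mtz_satisfied(n, sub_x, {2: 1, 3: 1, 4: 2})        # False: the 3-4 cycle
```

The second check fails for this (and, by the cancellation argument below, any) choice of times.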
However, if we have an integer solution to (2.23) and (2.24) that is not a tour,
it must have a subtour not including city 1. For that subtour, some constraint in
(2.26) must be violated. For example, if the subtour goes from city a to b to c back
to a, then we must have xab = xbc = xca = 1, and constraint (2.26) for these three
pairs simplifies to
ta − tb + 1 ≤ 0
tb − tc + 1 ≤ 0
tc − ta + 1 ≤ 0

There is no solution to these three constraints: when we add all three of them
together, the variables ta , tb , tc all cancel and we get the false inequality 3 ≤ 0. We
get a similar contradiction for a longer subtour.
With the equations (2.23), (2.24), (2.26), we only have around $n^2$ constraints
in our $(n^2 + n)$-variable integer program, which is much better than the around $2^n$
constraints we had earlier. The variables t2 , t3 , . . . , tn don’t even need to be integer
variables, although there is an optimal solution where they all have integer values.
It is not necessarily true that the MTZ constraints are better than the DFJ
constraints, just because there are fewer of them. In practice, it seems that the DFJ
constraints have better performance—and adding the constraints on the fly with a
branch-and-cut approach solves the main obstacle to using them. Still, the MTZ
constraints have the advantage that they’re easier to work with without specialized
code: both approaches are useful in the right circumstances.

2.91 Approximation algorithms


So far in this class, we’ve talked about ways to solve an integer program exactly.
Sometimes, we are not that greedy: we will be happy if we find an integer solution
that’s pretty good. Even this is not always easy: there is no general-purpose strategy.
In some (but not all) cases, we can obtain a decent solution by rounding. This
happens, for example, with the configuration LP that we used to solve packing
problems in Lecture 26. In that problem, solving the LP relaxation might give us a
solution with 52 of one configuration, 52 of another configuration, and 53 of a third
configuration. Rounding these values up could give us an integer solution with 3
of the first configuration, 3 of the second, and 2 of the third. This is not the best
integer solution, but it is decent: we are guaranteed to exceed the optimal solution
by at most the number of configurations!
The traveling salesman problem is not a case where we can get anywhere by
rounding. A fractional solution might end up “leaving” city 1 by setting $x_{12} = x_{13} = \frac{1}{2}$.


If we round these values up to integers, we obtain a solution that is supposed to go


from city 1 to both city 2 and to city 3 at the same time: this is nonsense!
We will see a decent approximation algorithm in the case of the metric traveling
salesman problem: where costs are symmetric (cij = cji ) and satisfy the triangle
inequality cij + cjk ≥ cik . This is true of distances in the plane, for example.
The trick here is that a related problem turns out to be much easier to solve.
Suppose that we want to find the cheapest set of connections that join all the cities
together, even if we cannot visit them in order. This is always guaranteed to be at
least as cheap as the cheapest closed tour. Moreover, this is a problem that can be
solved greedily: if we repeatedly take the cheapest connection that does not create a
subtour until we reach n − 1 connections (for n cities), then the result will be optimal
for this simplified problem.
Given such a set of connections, we can find a closed tour, with some redundancies
in it, that uses each connection twice: once in each direction. This is where we lose
optimality: we are now at twice the cost of the simplified problem, which could be
as bad as twice the cost of the optimal traveling salesman tour (but no worse).
Finally, assuming that the triangle inequality cij + cjk ≥ cik holds, we can simplify
our solution to a standard closed tour that does not revisit any cities. The way to do
this is simple: every time you’d be coming back to a city where you’ve already been,
just skip ahead to the next new city! This can only reduce the cost of the tour.
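The whole construction can be sketched in Python (a minimal illustration with names of my choosing): the greedy "cheapest connection that creates no subtour" rule is Kruskal's minimum-spanning-tree rule, and a depth-first preorder walk of the resulting tree realizes the skip-ahead shortcuts.

```python
import math

def two_approx_tour(points):
    """2-approximation for metric TSP: grow the cheapest connecting set
    (a spanning tree, via the cheapest edge that creates no cycle), then
    visit cities in DFS preorder, which skips past already-visited cities."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    parent = list(range(n))          # union-find for cycle detection

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    adj = {i: [] for i in range(n)}
    taken = 0
    for d, i, j in sorted((dist(i, j), i, j)
                          for i in range(n) for j in range(i + 1, n)):
        ri, rj = find(i), find(j)
        if ri != rj:                 # cheapest edge joining two components
            parent[ri] = rj
            adj[i].append(j)
            adj[j].append(i)
            taken += 1
            if taken == n - 1:
                break
    tour, seen, stack = [], set(), [0]
    while stack:                     # DFS preorder = doubled walk + shortcuts
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            tour.append(v)
            stack.extend(adj[v])
    return tour                      # return to tour[0] to close the cycle

points = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
tour = two_approx_tour(points)       # visits each of the 6 cities exactly once
```

The triangle inequality is what guarantees that each shortcut taken by the preorder walk can only reduce the cost, so the tour costs at most twice the spanning-tree cost.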
An example of this is shown below: first the cheapest set of connections that join
together all the cities, then the tour that uses each of those connections twice, then
the simplified closed tour that does not revisit cities. (In the second diagram, the
curved arcs are just there to help distinguish the two times we use a connection;
the distances we use for costs are still measured along a straight line. In the last
diagram, the two times we “skip ahead” are drawn in red.)

In this case, though we did not find the optimal solution, we got much closer
than the factor-2 guarantee. The optimal solution (shown on a previous page) has
total length about 4.88733; the solution found by our approximation algorithm has
total length about 5.01606.
A fancier version of this approximation algorithm, called the Christofides algo-
rithm, does even better: its cost is at most 1.5 times the cost of an optimal tour.
This is not always what we want, but it can be better than nothing if our integer
programs are too big to solve directly!


2.92 Review of Linear Programming


Review of basic solutions, basic feasible solutions, simplex etc. Basic feasible solutions
are extreme points.

Definition 2.9 — Column Sub-matrix. Suppose we have a matrix

$$A = \begin{pmatrix} 1 & 2 & -1 & 1 & -1 \\ 0 & 1 & 0 & 1 & -1 \\ 0 & 0 & 1 & 1 & -1 \end{pmatrix}$$

with columns labelled 1, 2, 3, 4, 5. Let $B$ be a subset of column indices, so
$B \subseteq \{1, 2, 3, 4, 5\}$. Then $A_B$ is the column sub-matrix of $A$ indexed by the set $B$.
For example, if $B = \{1, 2, 3\}$, then $A_B$ is

$$A_B = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Using this notation, Aj denotes column j of A.

Definition 2.10 — Basis. Let B be a subset of column indices. B is a basis if


1. AB is a square matrix
2. AB is non-singular (columns are independent)
Let A be a matrix with independent rows, then B is a basis if and only if B is
a maximal set of independent columns of A.

Theorem 2.26 The maximum number of independent columns is equal to the


maximum number of independent rows.

Definition 2.11 — Basic and Non-basic Variables. Let B be a basis for A


• if $j \in B$ then $x_j$ is a basic variable
• if $j \notin B$ then $x_j$ is a non-basic variable
For example, for basis $B = \{1, 2, 4\}$, $x_1, x_2, x_4$ are basic variables and $x_3, x_5$
are non-basic variables.

Definition 2.12 — Basic Solution. x is a basic solution for basis B if


1. Ax = b and


2. xj = 0 whenever j ̸∈ B.
For example, suppose we have

$$A = \begin{pmatrix} 1 & 2 & -1 & 1 & -1 \\ 0 & 1 & 0 & 1 & -1 \\ 0 & 0 & 1 & 1 & -1 \end{pmatrix}, \qquad b = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}$$

Then $x = (1, 1, 1, 0, 0)^T$ is a basic solution for basis $B = \{1, 2, 3\}$ since $Ax = b$ and
$x_4 = x_5 = 0$.

Problem 2.22 Suppose we have the system of linear equations


   
$$\begin{pmatrix} 1 & 0 & 1 & -1 \\ 0 & 1 & 1 & 1 \end{pmatrix} x = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$$

What is the basic solution $x$ for basis $B = \{1, 4\}$? We have

$$\begin{pmatrix} 2 \\ 2 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} + x_3 \begin{pmatrix} 1 \\ 1 \end{pmatrix} + x_4 \begin{pmatrix} -1 \\ 1 \end{pmatrix}$$

Since $x$ is a basic solution, $x_2 = x_3 = 0$, so

$$\begin{pmatrix} 2 \\ 2 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_4 \end{pmatrix}$$

Since $A_B$ is a square non-singular matrix, it has an inverse, therefore

$$\begin{pmatrix} x_1 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ 2 \end{pmatrix}$$

Therefore the basic solution is $x = (4, 0, 0, 2)^T$.


Notice that when we are given a basis, then there is only 1 solution. This is a
theorem.
Theorem 2.27 Consider Ax = b and a basis B of A. Then there exists a unique
basic solution x for B.
Proof. We have

$$b = Ax = \sum_j A_j x_j = \sum_{j \in B} A_j x_j + \sum_{j \notin B} A_j x_j = \sum_{j \in B} A_j x_j = A_B x_B,$$

since $x_j = 0$ for all $j \notin B$. Now, since $B$ is a basis, $A_B$ is invertible, so $A_B^{-1}$
exists. Hence $x_B = A_B^{-1} b$ is uniquely determined. ■
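The computation in the proof can be carried out directly: solve $A_B x_B = b$ by Gaussian elimination (here in exact rational arithmetic with the standard-library `Fraction`; the function name is my own) and pad the non-basic entries with zeros. The data is the system of Problem 2.22 above.

```python
from fractions import Fraction

def basic_solution(A, b, B):
    """Solve A_B x_B = b by Gaussian elimination in exact arithmetic and
    pad the non-basic entries with zeros. B is 1-indexed, as in the text."""
    m = len(A)
    M = [[Fraction(A[i][j - 1]) for j in B] + [Fraction(b[i])]
         for i in range(m)]           # augmented system [A_B | b]
    for col in range(m):
        piv = next(r for r in range(col, m) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]   # scale pivot row to 1
        for r in range(m):
            if r != col and M[r][col] != 0:
                M[r] = [a - M[r][col] * p for a, p in zip(M[r], M[col])]
    x = [Fraction(0)] * len(A[0])
    for i, j in enumerate(B):
        x[j - 1] = M[i][-1]
    return x

# The system of Problem 2.22 with basis B = {1, 4}:
A = [[1, 0, 1, -1],
     [0, 1, 1, 1]]
x = basic_solution(A, [2, 2], [1, 4])    # -> (4, 0, 0, 2)
```

Trying a different basis of the same system, e.g. `basic_solution(A, [2, 2], [2, 3])`, returns the (different, unique) basic solution for that basis.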

Definition 2.13 — Basic Solution. Consider Ax = b with independent rows.


Vector x is a basic solution if it is a basic solution for some basis B.

Problem 2.23 For the system of linear equations


   
3 2 1 4 1 6
 x= 
−1 1 0 2 1 3
 
0
 
0
 
 
is x=
3
 basic?
 
0
 
3
We just have to give a basis B and show that x is basic for the basis B. We let
B = {3, 5} and we have Ax = b, and x1 = x2 = x4 = 0. Therefore, x is basic for basis
B, and so x is a basic solution.
 
0
 
1
 
 
Now is x=
0
 basic?
 
1
 
0
̸ 0
No, we will prove it. Suppose x is basic for basis B. By definition, since x2 =
and x4 ̸= 0, we have 2, 4 ∈ B. Thus
 
2 4
A{2,4} = 
1 2

T.Abraha(PhD) @AKU, 2024 Linear Optimization


218 Chapter 2. Introduction to Linear Optimization

is a column submatrix of AB . But the columns of A{2,4} are dependent, so AB is


singular and B is not a basis, a contradiction.

Definition 2.14 — Feasible Basic Solution. A basic solution x of Ax = b is feasible


if x ≥ 0, i.e., if it is feasible for the problem P .

Definition 2.15 — Simplex Algorithm. Suppose we have a basic solution to the


LP problem in canonical form with basis B. We can find a better feasible
solution by:
1. Pick k ̸∈ B such that ck > 0.
2. Set xk = t ≥ 0 as large as possible
3. Keep all other non-basic variables at 0
4. Choose basic variables such that Ax = b holds
Here is an example:
Suppose we have the following LP problem:
$$\max\ \begin{pmatrix} 0 & 1 & 3 & 0 \end{pmatrix} x \quad \text{s.t.} \quad \begin{pmatrix} 1 & 1 & 2 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix} x = \begin{pmatrix} 2 \\ 5 \end{pmatrix}, \quad x_1, x_2, x_3, x_4 \ge 0$$

Our LP is in canonical form for basis $B = \{1, 4\}$ and $(2, 0, 0, 5)^T$ is a basic solution.
We will pick $k \notin B$ such that $c_k > 0$. So we pick $k = 2$, and set $x_2 = t \ge 0$, while
keeping the other non-basic variables at 0. So $x_3 = 0$.
Then we choose basic variables such that $Ax = b$ holds. So we have

$$\begin{pmatrix} 2 \\ 5 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + t \begin{pmatrix} 1 \\ 1 \end{pmatrix} + 0 \begin{pmatrix} 2 \\ 1 \end{pmatrix} + x_4 \begin{pmatrix} 0 \\ 1 \end{pmatrix} = t \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \begin{pmatrix} x_1 \\ x_4 \end{pmatrix}$$

Rearranging gives

$$\begin{pmatrix} x_1 \\ x_4 \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \end{pmatrix} - t \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

This can be written in general as $x_B = b - t A_k$. Since we want $x_1, x_4 \ge 0$, we have

$$x_1 = 2 - t \ge 0 \Rightarrow t \le 2, \qquad x_4 = 5 - t \ge 0 \Rightarrow t \le 5$$

So $t \le 2$ and we choose $t = 2$. Therefore, our new feasible solution is

$$x = \begin{pmatrix} 2 - t \\ t \\ 0 \\ 5 - t \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ 0 \\ 3 \end{pmatrix}$$
Notice that our new solution is a basic solution for basis B = {2, 4}, we can say
that 1 left the basis and 2 entered the basis. Rewriting our LP to canonical
form with basis B = {2, 4}, we have
$$\max\ \begin{pmatrix} -1 & 0 & 1 & 0 \end{pmatrix} x \quad \text{s.t.} \quad \begin{pmatrix} 1 & 1 & 2 & 0 \\ -1 & 0 & -1 & 1 \end{pmatrix} x = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \quad x_1, x_2, x_3, x_4 \ge 0$$

Again, we choose $k \notin B$ such that $c_k > 0$ and set $x_k = t$. So we choose $k = 3$, so
$x_3 = t \ge 0$. We then compute

$$x_B = b - t A_k: \qquad \begin{pmatrix} x_2 \\ x_4 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix} - t \begin{pmatrix} 2 \\ -1 \end{pmatrix}$$

From here, we pick t = 1, so x2 = 0, thus 2 leaves the basis. The new basis is
now B = {3, 4} with canonical form
$$\max\ \begin{pmatrix} -1.5 & -0.5 & 0 & 0 \end{pmatrix} x \quad \text{s.t.} \quad \begin{pmatrix} 0.5 & 0.5 & 1 & 0 \\ -0.5 & 0.5 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 1 \\ 4 \end{pmatrix}, \quad x_1, x_2, x_3, x_4 \ge 0$$

 
The basic solution is $x = (0, 0, 1, 4)^T$. Since there is no $k \notin B$ such that $c_k > 0$, we
have reached the optimal solution.
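One such pivot (entering variable chosen by $c_k > 0$, step length by the ratio test on $x_B = b - tA_k \ge 0$) can be sketched as follows; the data is the first canonical form above with basis $B = \{1, 4\}$, and the function name is my own.

```python
from fractions import Fraction

def simplex_step(A, b, c, B):
    """One improvement step for an LP in canonical form for basis B
    (1-indexed, A_B = I in the order listed): pick an entering k with
    c_k > 0, run the ratio test on x_B = b - t*A_k, and return the
    entering index, the step length t, and the new feasible solution."""
    n = len(c)
    k = next(j for j in range(1, n + 1) if j not in B and c[j - 1] > 0)
    col = [Fraction(A[i][k - 1]) for i in range(len(A))]
    # basic variable i stays nonnegative as long as t <= b_i / A_ik
    t = min(Fraction(b[i]) / col[i] for i in range(len(A)) if col[i] > 0)
    x = [Fraction(0)] * n
    for i, j in enumerate(B):
        x[j - 1] = Fraction(b[i]) - t * col[i]
    x[k - 1] = t
    return k, t, x

# First pivot of the example above, from basis B = {1, 4}:
A = [[1, 1, 2, 0],
     [0, 1, 1, 1]]
k, t, x = simplex_step(A, [2, 5], [0, 1, 3, 0], [1, 4])   # k = 2, t = 2
```

The result reproduces the first pivot of the worked example: entering variable $x_2$, step $t = 2$, new solution $(0, 2, 0, 3)^T$.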
However, without an anti-cycling pivoting rule (such as Bland’s rule), the Simplex algorithm is not guaranteed to terminate.

Definition 2.16 — Feasible Region. For an optimization problem, the feasible


region is the set of all feasible solutions.

2.92.0.1 Operations that Preserve Convexity



• If $Q_i$ are convex, then $\bigcap_i Q_i$ is convex (the intersection can be infinite).
• For an affine map $f : C \to D$, $f(x) = Ax + b$, where $C, D$ are convex, the
images $f(C)$ and preimages $f^{-1}(D)$ are convex sets.
• Projections $P(C)$ of convex sets are convex.
• Scalings $\alpha C$ of convex sets are convex.
• Rotations $QC$ of convex sets are convex, where $Q^T Q = I$.
• Translations $C + x$ of convex sets are convex.
2.92.0.2 Hyperplane Separation and Support Theorems
Theorem 2.28 — Hyperplane Separation Theorem. Let C, D ⊂ Rn be convex sets
and C ∩ D = ∅. Then there exists a ̸= 0, α ∈ R such that

aT x ≤ α ≤ aT y

for all x ∈ C, y ∈ D.

In other words, this theorem guarantees that if you have two sets that do not
intersect and are convex, there is a hyperplane (which is a generalization of a flat
surface to n-dimensions; for instance, a line in 2D or a plane in 3D) that separates
the two sets. This hyperplane can be thought of as a decision boundary, which
one can use to distinguish points belonging to one set from those belonging to the
other.
Proof. See studocu course notes ■

Definition 2.17 — Minkowski Sum and Difference. Let C, D ⊆ Rn be two sets.


The Minkowski sum of C and D is defined as

C + D = {x + y : x ∈ C, y ∈ D}


The Minkowski difference of C and D is defined as

C − D = {x − y : x ∈ C, y ∈ D}
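For finite point sets the Minkowski sum can be computed straight from the definition; a tiny illustrative sketch (the function name is mine):

```python
def minkowski_sum(C, D):
    """Minkowski sum of two finite point sets, straight from the definition."""
    return {tuple(ci + di for ci, di in zip(c, d)) for c in C for d in D}

C = {(0, 0), (1, 0)}
D = {(0, 0), (0, 1)}
square = minkowski_sum(C, D)     # the four corners of the unit square
```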

Lemma 2.2. Let $C \subset \mathbb{R}^n$ be a closed convex set and let $d \notin C$. Let $x^*$ be the nearest
point in $C$ to $d$, and set $v = d - x^*$. Then the hyperplane

$$H = \{x : \langle v, x \rangle = \langle v, d \rangle - \|v\|^2\}$$

strictly separates $C$ from $d$. That is, we have

$$\langle v, x \rangle \le \langle v, d \rangle - \|v\|^2 < \langle v, d \rangle$$

for all $x \in C$.

Definition 2.18 — Supporting Hyperplane. H = {x : v T x = b} is a supporting


hyperplane to a convex set Q at x̄ ∈ Q, if

$$x^T v \le b, \quad \forall x \in Q$$

and the equation $\bar{x}^T v = b$ holds.

Theorem 2.29 — Existence of Supporting Hyperplane. Suppose that S ⊆ Rn is a


convex set, and x0 ∈ bd(S), the boundary of S. Then there exists a supporting
hyperplane containing x0 .

Theorem 2.30 — Krein-Milman Theorem. Let K be a compact convex set. Then

K = conv(ext(K))

where ext(K) is the set of extreme points of K.

Theorem 2.31 — Caratheodory Theorem. Let S ⊆ Rn and let x ∈ conv(S). Then


there exists xi ∈ S, i = 1, . . . , n + 1 such that x ∈ conv{x1 , . . . , xn+1 }. In other words,
we need at most n + 1 points for the convex combination.

2.93 Definition of Terms


Optimization involves finding a maximum or minimum value of a function sub-
ject to constraints. The function to be optimized is called the objective func-
tion, and the variables that affect the objective function are called decision vari-
ables.

Definition 2.19 Optimization is the process of finding the best solution among a
set of possible solutions. In other words, it is the art of finding the maximum or
minimum value of a function subject to certain constraints.


Mathematically, an optimization problem can be represented as:

$$\min_x\ f(x) \quad \text{s.t.} \quad g(x) \le 0$$
where f (x) is the objective function, x is the decision variable, and g(x) is
the constraint function.

Types of Optimization Problems


Optimization problems can be classified into various types based on the nature
of the objective function and the constraints.
Linear vs Nonlinear Optimization
• Linear Optimization (Linear Programming): Both the objective function
and the constraints are linear.
Linear optimization, i.e., a linear program (LP), has a linear objective function
subject to a set of linear constraints and continuous decision variables.

Definition 2.20 A function f (x1 , x2 , · · · , xn ) is linear if, and only if, we


have f (x1 , x2 , · · · , xn ) = c1 x1 + c2 x2 + · · · + cn xn , where the c1 , c2 , · · · , cn
coefficients are constants.

A Linear Programming problem has the following general form:

$\max\{cx : Ax = b,\ x \in \mathbb{R}^n\}$,

where x is a vector of decision variables, and the vectors c and b, as well as


the matrix A, are constant problem parameters.
• Nonlinear Optimization: The objective function or the constraints (or both)
are nonlinear.
Unconstrained vs Constrained Optimization
• Unconstrained Optimization: In this type of problem, there are no con-
straints on the decision variable. The goal is to find the minimum or maximum
value of the objective function.
• Constrained Optimization: In this type of problem, there are constraints
on the decision variable. The goal is to find the minimum or maximum value
of the objective function subject to these constraints.
Continuous vs Discrete Optimization
• Continuous Optimization: Decision variables can take any value within a
given range.
• Discrete Optimization: Decision variables can only take discrete values.
Formulation of Optimization Problems
The formulation of an optimization problem involves defining the objective
function, decision variables, and constraints.
Objective Function
The objective function f (x) is the function to be maximized or minimized.
Decision Variables
The decision variables x are the variables that affect the objective function.
Constraints


Constraints are the conditions that the decision variables must satisfy. They can
be equalities or inequalities.

■ Example 2.7 The problem

$$\begin{aligned} \text{Minimize} \quad & f(x) = x_1^2 + x_2^2 \\ \text{Subject to} \quad & x_1 + x_2 = 1 \\ & x_1, x_2 \ge 0 \end{aligned}$$

is a (nonlinear) optimization problem. ■
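This small problem can be checked numerically: eliminating the equality constraint with $x_2 = 1 - x_1$ leaves the one-variable convex function $g(x_1) = x_1^2 + (1 - x_1)^2$, minimized at $x_1 = \frac{1}{2}$ with value $\frac{1}{2}$. A quick grid check (the grid resolution of 1000 is an arbitrary choice of mine):

```python
def g(x1):
    """Objective of Example 2.7 after eliminating x2 = 1 - x1."""
    return x1 ** 2 + (1 - x1) ** 2

# coarse grid over the feasible interval 0 <= x1 <= 1
best = min(g(i / 1000) for i in range(1001))
```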

Various methods are used to solve optimization problems, depending on their


type.
• Graphical Method: The graphical method is used for solving linear pro-
gramming problems with two variables. It involves plotting the constraints
and finding the feasible region.
• Analytical Methods: Analytical methods involve using calculus to find the
optimal solution by setting the derivative of the objective function to zero and
solving for the decision variables.
• Numerical Methods: Numerical methods, such as the Simplex method for
linear programming and gradient descent for nonlinear optimization, are used
when analytical methods are not feasible.
Optimization has a wide range of applications in various fields.
• Engineering: In engineering, Optimization is used to design optimal systems,
such as electrical circuits and mechanical systems.
• Economics: In economics, Optimization is used to determine the optimal
price and quantity of goods and services.
• Computer Science: Optimization is used to solve problems in computer
science, such as scheduling and resource allocation.

To use optimization, first you must formulate your model, based on the system of
interest and any simplifications required (i.e., assumptions). Formulating the model
is not enough, we are also interested in solving the problem, and in a reasonable
amount of time (however that is determined). To solve these problems, algorithms
are developed. An algorithm is a step-by-step process for finding a solution.
Optimization is the process of finding the best solution that satisfies a set of
constraints and criteria. The goal is to find the optimal solution that maximizes or
minimizes an objective function. The objective function is a mathematical function
that describes the problem to be solved.
Linear Programming is a sub-field of optimization theory, which is itself a subfield
of Applied Mathematics. Applied Mathematics is a very general area of study that
could arguably encompass half of the engineering disciplines. Put simply, applied
mathematics is all about applying mathematical techniques to understand or do
something practical.
Optimization is an exciting sub-discipline within applied mathematics! Opti-
mization is all about making things better; this could mean helping a company
make better decisions to maximize profit; helping a factory make products with less
environmental impact; or helping a zoologist improve the diet of an animal. When


we talk about optimization, we often use terms like better or improvement. It’s
important to remember that words like better can mean more of something (as in
the case of profit) or less of something (as in the case of waste). As we study linear
programming, we’ll quantify these terms in a mathematically precise way. For the
time being, let’s agree that when we optimize something, we are trying to make some
decisions that will make it better.

■ Example 2.8 Let’s recall a simple optimization problem from differential calculus:
Goats are an environmentally friendly and inexpensive way to control a lawn when
there are lots of rocks or lots of hills. (Seriously, both Google and some U.S. Navy
bases use goats on rocky hills instead of paying lawn mowers!) Suppose I wish to
build a pen to keep some goats. I have 100 meters of fencing and I wish to build
the pen in a rectangle with the largest possible area. How long should the sides of
the rectangle be? In this case, making the pen better means making it have the
largest possible area. The problem is illustrated in Figure 1.1. Clearly, we know
that:

2x + 2y = 100 (2.27)

because 2x + 2y is the perimeter of the pen and I have 100 meters of fencing to
build my pen. The area of the pen is A(x, y) = xy. We can use Equation 2.27 to
solve for y in terms of x. Thus we have:

y = 50 − x (2.28)

and A(x) = x(50 − x). To maximize A(x), recall we take the first derivative of A(x)
with respect to x, set this derivative to zero and solve for x:
$$\frac{dA}{dx} = 50 - 2x = 0 \tag{2.29}$$


Figure 2.8: Goat pen with unknown side lengths. The objective is to identify the
values of x and y that maximize the area of the pen (and thus the number of goats
that can be kept).

Thus, x = 25 and y = 50 − x = 25. We further recall from basic calculus how to


confirm that this is a maximum; note:

$$\left. \frac{d^2 A}{dx^2} \right|_{x=25} = -2 < 0 \tag{2.30}$$


Which implies that x = 25 is a local maximum for this function. Another way of
seeing this is to note that $A(x) = 50x - x^2$ is an "upside-down" parabola. As we
could have guessed, a square will maximize the area available for holding goats.
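A quick numeric sanity check of the calculus above (an illustrative sketch; the integer grid is my own shortcut, sufficient here because the parabola's vertex falls on an integer):

```python
def area(x):
    """Pen area as a function of one side length: A(x) = x*(50 - x)."""
    return x * (50 - x)

best_x = max(range(51), key=area)    # search side lengths 0..50
```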

Let’s take a more general look at the goat pen example. The area function is a
mapping from R2 to R, written A : R2 → R. The domain of A is the two-dimensional
space R2 and its range is R.
Our objective in Example 2.8 is to maximize the function A by choosing values
for x and y. In optimization theory, the function we are trying to maximize (or
minimize) is called the objective function. In general, an objective function is a
mapping f : D ⊆ Rn → R. Here D is the domain of the function f .

Definition 2.21 Let f : D ⊆ Rn → R. The point x∗ is a global maximum for f if


for all x ∈ D, f (x∗ ) ≥ f (x). A point x∗ ∈ D is a local maximum for f if there is
a neighborhood S ⊆ D of x∗ (i.e., x∗ ∈ S) such that for all x ∈ S, f (x∗ ) ≥ f (x).

R Clearly Definition 2.21 is valid only for domains and functions where the
concept of a neighborhood is defined and understood. In general, S must be a
topologically connected set (as it is in a neighborhood in Rn ) in order for this
definition to be used, or at least we must be able to define the concept of a
neighborhood on the set.
In Example 2.8, we are constrained in our choice of x and y by the fact that
2x + 2y = 100. This is called a constraint of the optimization problem. More
specifically, it’s called an equality constraint. If we did not need to use all the fencing,
then we could write the constraint as 2x + 2y ≤ 100, which is called an inequality
constraint. In complex optimization problems, we can have many constraints. The
set of all points in Rn for which the constraints are true is called the feasible set (or
feasible region). Our problem is to decide the best values of x and y to maximize
the area A(x, y). The variables x and y are called decision variables.
Let f : D ⊆ Rn → R; for i = 1, . . . , m, gi : D ⊆ Rn → R; and for j = 1, . . . , l,
hj : D ⊆ Rn → R be functions. Then the general maximization problem with objective
function f (x1 , . . . , xn ) and inequality constraints gi (x1 , . . . , xn ) ≤ bi (i = 1, . . . , m) and
equality constraints hj (x1 , . . . , xn ) = rj is written as:




$$\begin{aligned} \max\quad & f(x_1, \dots, x_n) \\ \text{subject to:}\quad & g_1(x_1, \dots, x_n) \le b_1 \\ & \quad \vdots \\ & g_m(x_1, \dots, x_n) \le b_m \\ & h_1(x_1, \dots, x_n) = r_1 \\ & \quad \vdots \\ & h_l(x_1, \dots, x_n) = r_l \end{aligned} \tag{2.31}$$

Expression 2.31 is also called a mathematical programming problem. Naturally


when constraints are involved, we define the global and local maxima for the objective


function f (x1 , . . . , xn ) in terms of the feasible region instead of the entire domain of
f , since we are only concerned with values of x1 , . . . , xn that satisfy our constraints.

■ Example 2.9 — Continuation of Example 2.8. We can rewrite the problem in


Example 2.8 as:

$$\begin{aligned} \max\quad & A(x, y) = xy \\ \text{s.t.}\quad & 2x + 2y = 100 \\ & x \ge 0 \\ & y \ge 0 \end{aligned} \tag{2.32}$$

Note we’ve added two inequality constraints x ≥ 0 and y ≥ 0 because it doesn’t


really make any sense to have negative lengths. We can rewrite these constraints
as −x ≤ 0 and −y ≤ 0 where g1 (x, y) = −x and g2 (x, y) = −y to make Expression
(2.32) look like Expression (2.31). We have formulated the general maximization
problem in Problem 2.31. Suppose that we are interested in finding a value that
minimizes an objective function z(x1 , . . . , xn ) subject to certain constraints. Then
we can write Problem 2.31 replacing max with min. ■

Constrained optimization
In optimization, the objective is to maximize or minimize some function. For
example, if we are a factory, we want to minimize our cost of production. Often,
our optimization is not unconstrained. Otherwise, the way to minimize costs is to
produce nothing at all. Instead, there are some constraints we have to obey. The is
known as constrained optimization.

Definition 2.22 — Constrained optimization. The general problem is of con-


strained optimization is

minimize f (x) subject to h(x) = b, x ∈ X

where x ∈ Rn is the vector of decision variables, f : Rn → R is the objective


function, h : Rn → Rm and b ∈ Rm are the functional constraints, and X ⊆ Rn
is the regional constraint.

Note that everything above is a vector, but we do not bold our vectors. This is since
almost everything we work with is going to be a vector, and there isn’t much point
in bolding them.
This is indeed the most general form of the problem. If we want to maximize
f instead of minimize, we can minimize −f . If we want our constraints to be
an inequality in the form h(x) ≥ b, we can introduce a slack variable z, make the
functional constraint as h(x) − z = b, and add the regional constraint z ≥ 0. So all is
good, and this is in fact the most general form.
Linear programming is, surprisingly, the case where everything is linear. We can
write our problem as:


$$\begin{aligned} \text{minimize } c^T x \text{ subject to} \quad & a_i^T x \ge b_i \text{ for all } i \in M_1 \\ & a_i^T x \le b_i \text{ for all } i \in M_2 \\ & a_i^T x = b_i \text{ for all } i \in M_3 \\ & x_j \ge 0 \text{ for all } j \in N_1 \\ & x_j \le 0 \text{ for all } j \in N_2 \end{aligned}$$

where we’ve explicitly written out the different forms the constraints can take.
This is too clumsy. Instead, we can perform some tricks and turn them into a
nicer form:

Definition 2.23 — General and standard form. The general form of a linear
program is

minimize cT x subject to Ax ≥ b, x ≥ 0

The standard form is

minimize cT x subject to Ax = b, x ≥ 0.

It takes some work to show that these are indeed the most general forms. The
equivalence between the two forms can be done via slack variables, as described
above. We still have to check some more cases. For example, this form says that x ≥ 0,
i.e. all decision variables have to be positive. What if we want x to be unconstrained,
i.e. can take any value we like? We can split x into two parts, x = x+ − x− , where each
part has to be positive. Then x can take any positive or negative value.
Note that when I said “nicer”, I don’t mean that turning a problem into this
form necessarily makes it easier to solve in practice. However, it will be much easier
to work with when developing general theory about linear programs.
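The slack-variable step of that reduction can be sketched mechanically (a minimal illustration; the function name and sample data are mine, and splitting free variables works the same way by appending paired $\pm$ columns):

```python
def general_to_standard(A, b, c):
    """Turn min c^T x s.t. Ax >= b, x >= 0 into standard form by
    subtracting one slack variable per row: Ax - z = b, z >= 0."""
    m = len(A)
    A_std = [row + [-1 if i == j else 0 for j in range(m)]
             for i, row in enumerate(A)]
    return A_std, b, c + [0] * m     # slacks do not enter the objective

A_std, b_std, c_std = general_to_standard([[1, 2], [3, 1]], [4, 5], [1, 1])
# A_std = [[1, 2, -1, 0], [3, 1, 0, -1]], c_std = [1, 1, 0, 0]
```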

■ Example 2.10 We want to minimize −(x1 + x2 ) subject to

x1 + 2x2 ≤ 6
x1 − x2 ≤ 3
x1 , x 2 ≥ 0

Since we are lucky to have a 2D problem, we can draw this out.

(Figure: the feasible region bounded by the lines x1 + 2x2 = 6, x1 − x2 = 3 and the coordinate axes, with level lines −(x1 + x2) = 0, −2, −5 drawn orthogonal to the cost vector c.)

The shaded region is the feasible region, and c is our cost vector. The dotted lines,
which are orthogonal to c are lines in which the objective function is constant. To
minimize our objective function, we want to push the line as far to the right as possible, which
is clearly achieved at the intersection of the two boundary lines. ■
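The graphical reasoning can be automated in two dimensions: since the optimum of an LP is attained at a vertex of the feasible region, we can enumerate the intersections of pairs of constraint boundary lines, keep the feasible ones, and take the best. A sketch for this example (names and the 1e-9 tolerance are my own choices):

```python
from itertools import combinations

# each constraint a1*x1 + a2*x2 <= r stored as (a1, a2, r)
lines = [(1, 2, 6),     # x1 + 2*x2 <= 6
         (1, -1, 3),    # x1 - x2  <= 3
         (-1, 0, 0),    # x1 >= 0
         (0, -1, 0)]    # x2 >= 0

def intersect(L1, L2):
    """Intersection of the two boundary lines (Cramer), or None if parallel."""
    (a, b, e), (c, d, f) = L1, L2
    det = a * d - b * c
    if det == 0:
        return None
    return ((e * d - b * f) / det, (a * f - e * c) / det)

vertices = [p for p in (intersect(L1, L2) for L1, L2 in combinations(lines, 2))
            if p and all(a * p[0] + b * p[1] <= r + 1e-9 for a, b, r in lines)]
best = min(vertices, key=lambda p: -(p[0] + p[1]))   # vertex (4, 1), value -5
```

This recovers the vertex at the intersection of the two boundary lines, (x1, x2) = (4, 1), with objective value −5, matching the level line in the figure.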

Now we have a problem. In the general case, we have absolutely no idea how to
solve it. What we do know, is how to do unconstrained optimization.

Unconstrained optimization
Let f : Rn → R, x∗ ∈ Rn . A necessary condition for x∗ to minimize f over Rn is
$\nabla f(x^*) = 0$, where

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \cdots, \frac{\partial f}{\partial x_n} \right)^T$$

is the gradient of $f$.
However, this is obviously not a sufficient condition. Any such point can be a max-
imum, minimum or a saddle. Here we need a notion of convexity:

Definition 2.24 — Convex region. A region S ⊆ Rn is convex iff for all δ ∈ [0, 1],
x, y ∈ S, we have δx + (1 − δ)y ∈ S. Alternatively, If you take two points, the
line joining them lies completely within the region.

(Figure: a non-convex region and a convex region.)

Definition 2.25 — Convex function. A function f : S → R is convex if S is convex,


and for all x, y ∈ S, δ ∈ [0, 1], we have δf (x) + (1 − δ)f (y) ≥ f (δx + (1 − δ)y).

2.93 Definition of Terms 229

[Figure: a convex function; the chord value δf (x) + (1 − δ)f (y) lies above the function value at the point δx + (1 − δ)y between x and y.]
A function is concave if −f is convex. Note that a function can be neither


concave nor convex.

We have the following lemma:


Lemma 2.3. Let f be twice differentiable. Then f is convex on a convex set S if
the Hessian matrix

      (Hf )ij = ∂ 2 f /∂xi ∂xj

is positive semidefinite for all x ∈ S, where this fancy term means:

Definition 2.26 — Positive-semi-definite. A matrix H is positive semi-definite if


v T Hv ≥ 0 for all v ∈ Rn .

Which leads to the following theorem:


Theorem 2.32 Let X ⊆ Rn be convex, f : Rn → R be twice differentiable on X. If
x∗ ∈ X satisfies ∇f (x∗ ) = 0 and Hf (x) is positive semi-definite for all x ∈ X, then
x∗ minimizes f on X.
We will not prove these.
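Positive semi-definiteness of a symmetric matrix can be tested numerically through its eigenvalues: vT Hv ≥ 0 for all v iff no eigenvalue is negative. A short NumPy sketch, using as an illustrative example the Hessian of f (x1 , x2 ) = x1^2 + x1 x2 + x2^2 (a function chosen here for illustration, not taken from the text):

```python
import numpy as np

# Hessian of f(x1, x2) = x1^2 + x1*x2 + x2^2 (constant in x, so one check suffices)
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(H)           # eigvalsh handles symmetric matrices
is_psd = bool(np.all(eigenvalues >= -1e-12))  # tolerance for floating-point noise
print(eigenvalues, is_psd)                    # eigenvalues 1 and 3, so H is PSD
```

Since both eigenvalues are positive, this f is convex on all of R2.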
Note that this is helpful, since linear functions are convex (and concave). The
problem is that our problems are constrained, not unconstrained. So we will have to
convert constrained problems to unconstrained problems.

Supporting hyper-planes and convexity


We use the fancy term “hyperplane” to denote planes in higher dimensions (in an
n-dimensional space, a hyperplane has n − 1 dimensions).

Definition 2.27 — Supporting hyperplane. A hyperplane α : Rm → R is supporting


to ϕ at b if α intersects ϕ at b and ϕ(c) ≥ α(c) for all c.


[Figure: a supporting hyperplane α touching ϕ at b, with ϕ lying above α everywhere.]

Theorem 2.33 (P ) satisfies strong duality iff ϕ(c) = inf_{x∈X(c)} f (x) has a supporting
hyperplane at b.
Note that here we fix a b, and let ϕ be a function of c.
Proof. (⇐) Suppose there is a supporting hyperplane. Then since the plane passes
through ϕ(b), it must be of the form
α(c) = ϕ(b) + λT (c − b).
Since this is supporting, for all c ∈ Rm ,
ϕ(b) + λT (c − b) ≤ ϕ(c),
or
ϕ(b) ≤ ϕ(c) − λT (c − b).

This implies that

ϕ(b) ≤ inf_{c∈Rm} (ϕ(c) − λT (c − b))
     = inf_{c∈Rm} inf_{x∈X(c)} (f (x) − λT (h(x) − b))
       (since ϕ(c) = inf_{x∈X(c)} f (x) and h(x) = c for x ∈ X(c))
     = inf_{x∈X} L(x, λ)
       (since ∪_{c∈Rm} X(c) = X, which holds since for any x ∈ X, we have x ∈ X(h(x)))
     = g(λ).

By weak duality, g(λ) ≤ ϕ(b). So ϕ(b) = g(λ), and strong duality holds.
(⇒) Assume now that we have strong duality. Then there exists λ such that for
all c ∈ Rm ,

ϕ(b) = g(λ)
     = inf_{x∈X} L(x, λ)
     ≤ inf_{x∈X(c)} L(x, λ)
     = inf_{x∈X(c)} (f (x) − λT (h(x) − b))
     = ϕ(c) − λT (c − b).

So ϕ(b) + λT (c − b) ≤ ϕ(c). So this defines a supporting hyperplane. ■


We are having some progress now. To show that Lagrange multipliers work, we
need to show that (P ) satisfies strong duality. To show that (P ) satisfies strong
duality, we need to show that it has a supporting hyperplane at b. How can we show
that there is a supporting hyperplane? A sufficient condition is convexity.
Theorem 2.34 — Supporting hyperplane theorem. Suppose that ϕ : Rm → R is
convex and b ∈ Rm lies in the interior of the set of points where ϕ is finite. Then
there exists a supporting hyperplane to ϕ at b.

Proof follows rather straightforwardly from the definition of convexity, and is omitted.
This is some even better progress. However, the definition of ϕ is rather convoluted.
How can we show that it is convex? We have the following helpful theorem:
Theorem 2.35 Let

      ϕ(b) = inf_{x∈X} {f (x) : h(x) ≤ b}.

If X, f, h are convex, then so is ϕ (assuming feasibility and boundedness).

Proof. Consider b1 , b2 ∈ Rm such that ϕ(b1 ) and ϕ(b2 ) are defined. Let δ ∈ [0, 1] and
define b = δb1 + (1 − δ)b2 . We want to show that ϕ(b) ≤ δϕ(b1 ) + (1 − δ)ϕ(b2 ).
Consider x1 ∈ X(b1 ), x2 ∈ X(b2 ), and let x = δx1 + (1 − δ)x2 . By convexity of X,
x ∈ X.
By convexity of h,

h(x) = h(δx1 + (1 − δ)x2 )


≤ δh(x1 ) + (1 − δ)h(x2 )
≤ δb1 + (1 − δ)b2
=b

So x ∈ X(b). Since ϕ(b) is the infimum of f over X(b), by convexity of f ,

ϕ(b) ≤ f (x)
= f (δx1 + (1 − δ)x2 )
≤ δf (x1 ) + (1 − δ)f (x2 )

This holds for any x1 ∈ X(b1 ) and x2 ∈ X(b2 ). So by taking infimum of the right
hand side,

ϕ(b) ≤ δϕ(b1 ) + (1 − δ)ϕ(b2 ).

So ϕ is convex. ■

h(x) = b is equivalent to h(x) ≤ b and −h(x) ≤ −b. So the result holds for
problems with equality constraints if both h and −h are convex, i.e. if h(x) is linear.
So
Theorem 2.36 If a linear program is feasible and bounded, then it satisfies strong
duality.


2.93.1 Solutions of linear programs


Linear programs
We’ll come up with an algorithm to solve linear programs efficiently. We first illustrate
the general idea with the case of a 2D linear program. Consider the problem

maximize x1 + x2
subject to
x1 + 2x2 ≤ 6
x1 − x2 ≤ 3
x1 , x 2 ≥ 0
We can plot the solution space:

[Figure: the feasible region bounded by x1 + 2x2 = 6, x1 − x2 = 3 and the coordinate axes, with the cost vector c.]

To maximize x1 + x2 , we want to go as far in the c direction as possible. It should
be clear that the optimal point will lie on a corner of the feasible polygon, no
matter what its shape might be.
Even if we have cases where c is orthogonal to one of the lines, e.g.

[Figure: a feasible region whose boundary line x1 + x2 = 3.5 is orthogonal to c; the point A lies on that boundary edge.]

An optimal point might be A. However, if we know that A is an optimal point, we


can slide it across the x1 + x2 = 3.5 line until it meets one of the corners. Hence we
know that one of the corners must be an optimal point.
This already allows us to solve linear programs, since we can just try all corners
and see which has the smallest value. However, this can be made more efficient,
especially when we have a large number of dimensions and hence corners.


Basic solutions
Here we will assume that the rows of A are linearly independent, and any set of m
columns are linearly independent. Otherwise, we can just throw away the redundant
rows or columns.
In general, if both the constraints and the objective functions are linear, then the
optimal point always lies on a “corner”, or an extreme point.

Definition 2.28 — Extreme point. An extreme point x ∈ S of a convex set S is a


point that cannot be written as a convex combination of two distinct points in
S, i.e. if y, z ∈ S and δ ∈ (0, 1) satisfy

x = δy + (1 − δ)z,

then x = y = z.

Consider again the linear program in standard form, i.e.


maximize cT x subject to Ax = b, x ≥ 0, where A ∈ Rm×n and b ∈ Rm .
Note that now we are talking about maximization instead of minimization.

Definition 2.29 — Basic solution and basis. A solution x ∈ Rn is basic if it has


at most m non-zero entries (out of n), i.e. if there exists a set B ⊆ {1, · · · , n}
with |B| = m such that xi = 0 if i ̸∈ B. In this case, B is called the basis, and
xi are the basic variables if i ∈ B.

We will later see (via an example) that basic solutions correspond to solutions at the
“corners” of the solution space.

Definition 2.30 — Non-degenerate solutions. A basic solution is non-degenerate


if it has exactly m non-zero entries.

Note that by “solution”, we do not mean a solution to the whole maximization


problem. Instead we are referring to a solution to the constraint Ax = b. Being a
solution does not require that x ≥ 0. Those that satisfy this regional constraint are
known as feasible.

Definition 2.31 — Basic feasible solution. A basic solution x is feasible if it


satisfies x ≥ 0.

■ Example 2.11 Consider the linear program

maximize f (x) = x1 + x2 subject to

x1 + 2x2 + z1 = 6


x1 − x2 + z2 = 3
x1 , x 2 , z 1 , z 2 ≥ 0

where we have included the slack variables.


Since we have 2 constraints, a basic solution would require 2 non-zero entries,
and thus 2 zero entries. The possible basic solutions are

     x1   x2   z1   z2   f (x)
A    0    0    6    3    0
B    0    3    0    6    3
C    4    1    0    0    5
D    3    0    3    0    3
E    6    0    0    −3   6
F    0    −3   12   0    −3

Among all 6, E and F are not feasible solutions since they have negative entries.
So the basic feasible solutions are A, B, C, D.
[Figure: the feasible region in the (x1 , x2 )-plane with the basic solutions marked: A, B, D, E on the axes, and C at the intersection of x1 + 2x2 = 6 and x1 − x2 = 3.]
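The table of basic solutions can be reproduced by brute force: choose m = 2 of the n = 4 columns of A, set the remaining variables to zero, and solve the resulting 2 × 2 system. A NumPy sketch of that enumeration (variable names are illustrative):

```python
import itertools
import numpy as np

# Ax = b with x = (x1, x2, z1, z2)
A = np.array([[1.0, 2.0, 1.0, 0.0],
              [1.0, -1.0, 0.0, 1.0]])
b = np.array([6.0, 3.0])
c = np.array([1.0, 1.0, 0.0, 0.0])  # objective f(x) = x1 + x2

best = None
for pair in itertools.combinations(range(4), 2):
    cols = list(pair)
    AB = A[:, cols]
    if abs(np.linalg.det(AB)) < 1e-12:
        continue                      # the chosen columns must be independent
    x = np.zeros(4)
    x[cols] = np.linalg.solve(AB, b)
    feasible = bool(np.all(x >= -1e-9))
    print(cols, x, feasible)
    if feasible and (best is None or c @ x > c @ best):
        best = x

print(best)  # the basic feasible solution C: x1 = 4, x2 = 1, with value 5
```

The infeasible candidates (negative entries) correspond to E and F in the table; the best feasible one is C.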

In the previous example, we saw that the extreme points are exactly the basic feasible
solutions. This is true in general.
Theorem 2.37 A vector x is a basic feasible solution of Ax = b if and only if it is
an extreme point of the set X(b) = {x′ : Ax′ = b, x′ ≥ 0}.

We will not prove this.


Extreme points and optimal solutions


Recall that we previously showed in our 2D example that the optimal solution lies
on an extreme point, i.e. is a basic feasible solution. This is also true in general.
Theorem 2.38 If (P ) is feasible and bounded, then there exists an optimal solution
that is a basic feasible solution.
Proof. Let x be an optimal solution of (P ). If x has at most m non-zero entries, it is
a basic feasible solution, and we are done.
Now suppose x has r > m non-zero entries. Then x is not a basic solution, hence
not an extreme point, and we have y ̸= z ∈ X(b), δ ∈ (0, 1) such that

x = δy + (1 − δ)z.

We will show that there exists an optimal solution with strictly fewer than r non-zero
entries. Then the result follows by induction.
By optimality of x, we have cT x ≥ cT y and cT x ≥ cT z.
Since cT x = δcT y + (1 − δ)cT z, we must have that cT x = cT y = cT z, i.e. y and z
are also optimal.
Since y ≥ 0 and z ≥ 0, x = δy + (1 − δ)z implies that yi = zi = 0 whenever xi = 0.
So the non-zero entries of y and z are a subset of the non-zero entries of x. So y
and z have at most r non-zero entries, which must occur in rows where x is also
non-zero.
If y or z has strictly fewer than r non-zero entries, then we are done. Otherwise,
for any δ̂ (not necessarily in (0, 1)), let

xδ̂ = δ̂y + (1 − δ̂)z = z + δ̂(y − z).

Observe that xδ̂ is optimal for every δ̂ ∈ R.


Moreover, y − z ̸= 0, and all non-zero entries of y − z occur in rows where x is
non-zero as well. We can thus choose δ̂ ∈ R such that xδ̂ ≥ 0 and xδ̂ has strictly
fewer than r non-zero entries. ■
Intuitively, this is what we do when we “slide along the line” if c is orthogonal to
one of the boundary lines.
This result in fact holds more generally for the maximum of a convex function f
over a compact (i.e. closed and bounded) convex set X.
In that case, we can write any point x ∈ X as a convex combination

      x = Σ_{i=1}^{k} δi xi

of extreme points x1 , . . . , xk ∈ X, where δ ∈ R^k_{≥0} and Σ_{i=1}^{k} δi = 1.
Then, by convexity of f ,

      f (x) ≤ Σ_{i=1}^{k} δi f (xi ) ≤ max_i f (xi ).

So any point in the interior cannot be better than the extreme points.


Linear programming duality


Consider the linear program in general form with slack variables,
minimize cT x subject to Ax − z = b, x, z ≥ 0
We have X = {(x, z) : x, z ≥ 0} ⊆ Rm+n .
The Lagrangian is
L(x, z, λ) = cT x − λT (Ax − z − b) = (cT − λT A)x + λT z + λT b.
Since x, z can be arbitrarily positive, this has a finite minimum if and only if
cT − λT A ≥ 0 and λ ≥ 0.
Call this feasible set Y . Then for fixed λ ∈ Y , the minimum of L(x, z, λ) is attained
when (cT − λT A)x = 0 and λT z = 0, by complementary slackness. So

      g(λ) = inf_{(x,z)∈X} L(x, z, λ) = λT b.

The dual is thus


maximize λT b subject to AT λ ≤ c, λ ≥ 0

Theorem 2.39 The dual of the dual of a linear program is the primal.

Proof. It suffices to show this for the linear program in general form. We have shown
above that the dual problem is
minimize −bT λ subject to −AT λ ≥ −c, λ ≥ 0.
This problem has the same form as the primal, with −b taking the role of c, −c
taking the role of b, −AT taking the role of A. So doing it again, we get back to the
original problem. ■

■ Example 2.12 Let the primal problem be

maximize 3x1 + 2x2 subject to

2x1 + x2 + z1 = 4
2x1 + 3x2 + z2 = 6
x1 , x2 , z1 , z2 ≥ 0.

Then the dual problem is

minimize 4λ1 + 6λ2 such that

2λ1 + 2λ2 − µ1 = 3
λ1 + 3λ2 − µ2 = 2
λ1 , λ2 , µ1 , µ2 ≥ 0.

We can compute all basic solutions of the primal and the dual by setting n − m = 2
variables to be zero in turn.


Given a particular basic solution of the primal, the corresponding solution of
the dual can be found by using the complementary slackness conditions:

λ1 z1 = λ2 z2 = 0, µ1 x1 = µ2 x2 = 0.

     x1    x2   z1   z2   f (x)   λ1    λ2    µ1     µ2     g(λ)
A    0     0    4    6    0       0     0     −3     −2     0
B    2     0    0    2    6       3/2   0     0      −1/2   6
C    3     0    −2   0    9       0     3/2   0      5/2    9
D    3/2   1    0    0    13/2    5/4   1/4   0      0      13/2
E    0     2    2    0    4       0     2/3   −5/3   0      4
F    0     4    0    −6   8       2     0     1      0      8

[Figure: left, the primal feasible region bounded by 2x1 + x2 = 4 and 2x1 + 3x2 = 6 in the (x1 , x2 )-plane with the basic solutions A–F marked; right, the dual feasible region bounded by 2λ1 + 2λ2 = 3 and λ1 + 3λ2 = 2 in the (λ1 , λ2 )-plane with the corresponding dual basic solutions.]

We see that D is the only solution such that both the primal and dual solutions
are feasible. So we know it is optimal without even having to calculate f (x). It
turns out this is always the case. ■

Theorem 2.40 Let x and λ be feasible for the primal and the dual of the linear
program in general form. Then x and λ are optimal if and only if they satisfy
complementary slackness, i.e. if

(cT − λT A)x = 0 and λT (Ax − b) = 0.

Proof. If x and λ are optimal, then

cT x = λT b

since every linear program satisfies strong duality. So

cT x = λT b
     = inf_{x′ ∈X} (cT x′ − λT (Ax′ − b))
     ≤ cT x − λT (Ax − b)
     ≤ cT x.


The last line is since Ax ≥ b and λ ≥ 0.


The first and last term are the same. So the inequalities hold with equality.
Therefore

λT b = cT x − λT (Ax − b) = (cT − λT A)x + λT b.

So

(cT − λT A)x = 0.

Also,

cT x − λT (Ax − b) = cT x

implies

λT (Ax − b) = 0.

On the other hand, suppose we have complementary slackness, i.e.

(cT − λT A)x = 0 and λT (Ax − b) = 0,

then

cT x = cT x − λT (Ax − b) = (cT − λT A)x + λT b = λT b.

Hence by weak duality, x and λ are optimal. ■

2.93.2 Simplex method


The simplex method is an algorithm that makes use of the result we just had. To find
the optimal solution to a linear program, we start with a basic feasible solution of
the primal, and then modify the variables step by step until the dual is also feasible.
We start with an example, showing what we do, then explain the logic behind,
then do a more proper example.

■ Example 2.13 Consider the following problem:

maximize x1 + x2 subject to

x1 + 2x2 + z1 = 6
x1 − x2 + z2 = 3
x1 , x2 , z1 , z2 ≥ 0.

We write everything in the simplex tableau, by noting down the coefficients:

x1 x2 z1 z2
Constraint 1 1 2 1 0 6
Constraint 2 1 -1 0 1 3
Objective 1 1 0 0 0

 
We see an identity matrix in the z1 and z2 columns, and these correspond
to the basic feasible solution z1 = 6, z2 = 3, x1 = x2 = 0. It’s pretty clear that our basic
feasible solution is not optimal, since our objective function is 0. This is since
something in the last row is positive, and we can increase the objective by, say,
increasing x1 .
The simplex method says that we can find the optimal solution if we make the
bottom row all negative while keeping the right column positive, by doing row
operations.
We multiply the first row by 1/2 and subtract/add it to the other rows to obtain

              x1    x2   z1    z2
Constraint 1  1/2   1    1/2   0     3
Constraint 2  3/2   0    1/2   1     6
Objective     1/2   0    −1/2  0     −3

Our new basic feasible solution is x2 = 3, z2 = 6, x1 = z1 = 0. We see that the


number in the bottom-right corner is −f (x). We can continue this process to
finally obtain a solution. ■
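The row operations above are mechanical enough to write as code. A NumPy sketch of the single pivot just performed (on the x2 entry of constraint 1):

```python
import numpy as np

# tableau columns: x1, x2, z1, z2 | rhs; the last row is the objective
T = np.array([[1.0, 2.0, 1.0, 0.0, 6.0],
              [1.0, -1.0, 0.0, 1.0, 3.0],
              [1.0, 1.0, 0.0, 0.0, 0.0]])

i, j = 0, 1                      # pivot row (constraint 1) and column (x2)
T[i] /= T[i, j]                  # scale so the pivot entry becomes 1
for k in range(T.shape[0]):
    if k != i:
        T[k] -= T[k, j] * T[i]   # eliminate column j from every other row

print(T)
# constraint 1: 1/2 1  1/2 0 | 3     constraint 2: 3/2 0 1/2 1 | 6
# objective:    1/2 0 -1/2 0 | -3    (bottom-right entry is -f(x))
```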

Here we adopt the following notation: let A ∈ Rm×n and b ∈ Rm . Assume that A
has full rank. Let B ⊆ {1, 2, · · · , n} with |B| = m be a basis, corresponding to at
most m non-zero entries.
We rearrange the columns so that all basis columns are on the left. Then we can
write our matrices as

      Am×n = ( (AB )m×m  (AN )m×(n−m) )
      xn×1 = ( (xB )m×1  (xN )(n−m)×1 )T
      c1×n = ( (cB )m×1  (cN )(n−m)×1 ).

Then the functional constraints

Ax = b

can be decomposed as

AB xB + AN xN = b.

We can rearrange this to obtain

      xB = AB^{-1} (b − AN xN ).

In particular, when xN = 0, then

      xB = AB^{-1} b.

The general tableau is then


                 Basis components              Other components
Constraint rows  AB^{-1} AB = I                AB^{-1} AN               AB^{-1} b
Objective row    cB^T − cB^T AB^{-1} AB = 0    cN^T − cB^T AB^{-1} AN   −cB^T AB^{-1} b

This might look really scary, and it is! Without caring too much about where the
formulas for the cells come from, we see the identity matrix on the left, which is
where we find our basic feasible solution. Below that is the row for the objective
function. The values of this row must be 0 for the basis columns.
On the right-most column, we have AB^{-1} b, which is our xB . Below that is
−cB^T AB^{-1} b, which is the negative of our objective function cB^T xB .
The simplex tableau
We have

f (x) = cT x
      = cB^T xB + cN^T xN
      = cB^T AB^{-1} (b − AN xN ) + cN^T xN
      = cB^T AB^{-1} b + (cN^T − cB^T AB^{-1} AN )xN .

We will maximize cT x by choosing a basis such that cN^T − cB^T AB^{-1} AN ≤ 0, i.e.
non-positive everywhere, and AB^{-1} b ≥ 0.
If this is true, then for any feasible solution x ∈ Rn , we must have xN ≥ 0. So
(cN^T − cB^T AB^{-1} AN )xN ≤ 0 and

      f (x) ≤ cB^T AB^{-1} b.

So if we choose xB = AB^{-1} b, xN = 0, then we have an optimal solution.
Hence our objective is to pick a basis that makes cN^T − cB^T AB^{-1} AN ≤ 0 while keeping
AB^{-1} b ≥ 0. To do this, suppose this is not attained. Say (cN^T − cB^T AB^{-1} AN )i > 0.
We can increase the value of the objective function by increasing (xN )i . As we
increase (xN )i , we have to satisfy the functional constraints. So the value of other
variables will change as well. We can keep increasing (xN )i until another variable
hits 0, say (xB )j . Then we will have to stop.
(However, if it so happens that we can increase (xN )i indefinitely without other
things hitting 0, our problem is unbounded)
The effect of this is that we have switched basis by removing (xB )j and adding
(xN )i . We can continue from here. If cN^T − cB^T AB^{-1} AN is now non-positive, we are done.
Otherwise, we continue the above procedure.
The simplex method is a systematic way of doing the above procedure.
Using the Tableau
Consider a tableau of the form


      aij   ai0
      a0j   a00

where ai0 is the b column, a0j corresponds to the objective function row, and a00 is initially 0.
The simplex method proceeds as follows:
1. Find an initial basic feasible solution.
2. Check whether a0j ≤ 0 for every j. If so, the current solution is optimal. Stop.
3. If not, choose a pivot column j such that a0j > 0. Choose a pivot row i ∈ {i :
aij > 0} that minimizes ai0 /aij . If multiple rows minimize ai0 /aij , then
the problem is degenerate, and things might go wrong. If aij ≤ 0 for all i, i.e.
we cannot choose a pivot row, the problem is unbounded, and we stop.
4. We update the tableau by multiplying row i by 1/aij (so that the new
aij = 1), and adding a (−akj /aij ) multiple of row i to each row k ̸= i, including
k = 0 (so that akj = 0 for all k ̸= i).
We still have a basic feasible solution, since our choice of pivot row keeps the
right-hand column non-negative (apart from a00 ).
5. GOTO (ii).
Now visit the example at the beginning of the section to see how this is done in
practice. Then read the next section for a more complicated example.
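The procedure above translates into a short loop over the tableau. A minimal sketch, assuming a starting tableau whose basic feasible solution is already in place, and making no attempt to handle degenerate cycling:

```python
import numpy as np

def simplex(T):
    """Repeat steps 2-5 on tableau T (last row = objective row)."""
    while True:
        j = int(np.argmax(T[-1, :-1]))        # candidate pivot column
        if T[-1, j] <= 1e-9:                  # step 2: all a0j <= 0, optimal
            return T
        rows = np.where(T[:-1, j] > 1e-9)[0]
        if rows.size == 0:                    # step 3: no valid pivot row
            raise ValueError("problem is unbounded")
        i = rows[int(np.argmin(T[rows, -1] / T[rows, j]))]  # min ratio a_i0/a_ij
        T[i] /= T[i, j]                       # step 4: row operations
        for k in range(T.shape[0]):
            if k != i:
                T[k] -= T[k, j] * T[i]

# the example from the start of the section: maximize x1 + x2
T = simplex(np.array([[1.0, 2.0, 1.0, 0.0, 6.0],
                      [1.0, -1.0, 0.0, 1.0, 3.0],
                      [1.0, 1.0, 0.0, 0.0, 0.0]]))
print(-T[-1, -1])  # optimal value 5, attained at x1 = 4, x2 = 1
```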

The two-phase simplex method


Sometimes we don’t have a nice identity matrix to start with. In this case, we need
to use the two-phase simplex method to first find our first basic feasible solution,
then to the actual optimization.
This method is illustrated by example.

■ Example 2.14 Consider the problem

minimize 6x1 + 3x2 subject to

x1 + x2 ≥ 1
2x1 − x2 ≥ 1
3x2 ≤ 2
x1 , x 2 ≥ 0

This is a minimization problem. To avoid being confused, we maximize −6x1 − 3x2


instead. We add slack variables to obtain

maximize −6x1 − 3x2 subject to

x1 + x2 − z1 = 1
2x1 − x2 − z2 = 1


3x2 + z3 = 2
x1 , x2 , z1 , z2 , z3 ≥ 0

Now we don’t have a basic feasible solution, since we would need z1 = z2 = −1, z3 = 2,
which is not feasible. So we add more variables, called the artificial variables.

maximize −6x1 − 3x2 subject to

x1 + x2 − z1 + y1 = 1
2x1 − x2 − z2 + y2 = 1
3x2 + z3 = 2
x1 , x 2 , z 1 , z 2 , z 3 , y 1 , y 2 ≥ 0

Note that adding y1 and y2 might create new solutions, which is bad. We solve
this problem by first trying to make y1 and y2 both 0 and find a basic feasible
solution. Then we can throw away y1 and y2 and then get a basic feasible for our
original problem. So momentarily, we want to solve

minimize y1 + y2 subject to

x1 + x2 − z1 + y1 = 1
2x1 − x2 − z2 + y2 = 1
3x2 + z3 = 2
x1 , x 2 , z 1 , z 2 , z 3 , y 1 , y 2 ≥ 0

By minimizing y1 and y2 , we will make them zero.


Our simplex tableau is

x1 x2 z1 z2 z3 y1 y2
1 1 -1 0 0 1 0 1
2 -1 0 -1 0 0 1 1
0 3 0 0 1 0 0 2
-6 -3 0 0 0 0 0 0
0 0 0 0 0 -1 -1 0

Note that we keep both our original and “kill-yi ” objectives, but now we only care
about the second one. We will keep track of the original objective so that we can
use it in the second phase.
We see an initial basic feasible solution y1 = y2 = 1, z3 = 2. However, this is not a
proper simplex tableau, as the basis columns should not have non-zero entries
(apart from the identity matrix itself). But we have the two −1s at the bottom!
So we add the first two rows to the last to obtain


x1 x2 z1 z2 z3 y1 y2
1 1 -1 0 0 1 0 1
2 -1 0 -1 0 0 1 1
0 3 0 0 1 0 0 2
-6 -3 0 0 0 0 0 0
3 0 -1 -1 0 0 0 2

Our pivot column is x1 , and our pivot row is the second row. We divide it by 2
and add/subtract it from other rows.

   x1   x2    z1   z2    z3   y1   y2
   0    3/2   −1   1/2   0    1    −1/2   1/2
   1    −1/2  0    −1/2  0    0    1/2    1/2
   0    3     0    0     1    0    0      2
   0    −6    0    −3    0    0    3      3
   0    3/2   −1   1/2   0    0    −3/2   1/2

There are two possible pivot columns. We pick z2 and use the first row as the pivot
row.

x1 x2 z1 z2 z3 y1 y2
0 3 -2 1 0 2 -1 1
1 1 -1 0 0 1 0 1
0 3 0 0 1 0 0 2
0 3 -6 0 0 6 0 6
0 0 0 0 0 -1 -1 0

We see that y1 and y2 are no longer in the basis, and hence take value 0. So we
drop all the phase I stuff, and are left with

x1 x2 z1 z2 z3
0 3 -2 1 0 1
1 1 -1 0 0 1
0 3 0 0 1 2
0 3 -6 0 0 6

We see a basic feasible solution x1 = 1, z2 = 1, z3 = 2.


We pick x2 as the pivot column, and the first row as the pivot row. Then we
have


   x1   x2   z1    z2    z3
   0    1    −2/3  1/3   0    1/3
   1    0    −1/3  −1/3  0    2/3
   0    0    2     −1    1    1
   0    0    −4    −1    0    5

Since the last row is all non-positive, we have reached optimality. So x1 = 2/3,
x2 = 1/3, z3 = 1 is an optimal solution, and our optimal
value is 5.
Note that we previously said that the bottom right entry is the negative of the
optimal value, not the optimal value itself! This is correct, since in the tableau, we
are maximizing −6x1 − 3x2 , whose maximum value is −5. So the minimum value
of 6x1 + 3x2 is 5. ■
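The two-phase computation can be cross-checked with an off-the-shelf solver. A SciPy sketch (linprog takes ≤ rows, so the ≥ constraints are negated):

```python
from scipy.optimize import linprog

# minimize 6x1 + 3x2  s.t.  x1 + x2 >= 1,  2x1 - x2 >= 1,  3x2 <= 2,  x >= 0
c = [6, 3]
A_ub = [[-1, -1],   # -(x1 + x2) <= -1
        [-2, 1],    # -(2x1 - x2) <= -1
        [0, 3]]     # 3x2 <= 2
b_ub = [-1, -1, 2]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])

print(res.x, res.fun)  # x = (2/3, 1/3) with optimal value 5
```

The solver agrees with the two-phase result: minimum 5 at (2/3, 1/3).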

2.94 Introduction to Linear optimization


Linear optimization is a fundamental topic in optimization theory that deals with
finding the optimal solution to a linear problem. In this lecture, we will introduce
the basics of linear optimization and its applications.

Linear Programming Problem


A linear programming problem is defined as follows:

min xT c   s.t.   Ax ≤ b,   x ≥ 0

where:
• x is the decision vector
• c is the cost vector
• A is the coefficient matrix
• b is the right-hand side vector
• xT denotes the transpose of x
The objective is to find the optimal solution that minimizes or maximizes the
objective function xT c, subject to the constraints Ax ≤ b and x ≥ 0.

2.94.1 Definitions
The following is an example of a problem in linear programming:

Maximize x + y − 2z
Subject to 2x + y + z ≤ 4
3x − y + z = 0
x, y, z ≥ 0
Solving this problem means finding real values for the variables x, y, z satisfying
the constraints 2x + y + z ≤ 4, 3x − y + z = 0, and x, y, z ≥ 0 that gives the maximum
possible value (if it exists) for the objective function x + y − 2z.

   
For example, (x, y, z) = (0, 1, 1) satisfies all the constraints and is called a feasible
solution. Its objective function value, obtained by evaluating the objective
function at (x, y, z) = (0, 1, 1), is 0 + 1 − 2(1) = −1. The set of feasible solutions to a linear
programming problem is called the feasible region.
More formally, a linear programming problem is an optimization problem of the
following form:
Maximize (or Minimize)   Σ_{j=1}^{n} cj xj
Subject to               Pi (x1 , . . . , xn ),   i = 1, . . . , m

where m and n are positive integers, cj ∈ R for j = 1, . . . , n, and for each i = 1, . . . , m,


Pi (x1 , . . . , xn ) is a linear constraint on the (decision) variables x1 , . . . , xn having
one of the following forms:
• a1 x 1 + · · · + an x n ≥ β
• a1 x 1 + · · · + an x n ≤ β
• a1 x1 + · · · + an xn = β
where β, a1 , . . . , an ∈ R. To save writing, the word “Minimize” (“Maximize”) is
replaced with “min” (“max”) and “Subject to” is abbreviated as “s.t.”.
A feasible solution ⃗x = (x1 , . . . , xn )T that gives the maximum possible objective function
value in the case of a maximization problem is called an optimal solution and its
objective function value is the optimal value of the problem.
The following example shows that it is possible to have multiple optimal solutions:

max  x + y
s.t. 2x + 2y ≤ 1

The constraint says that x + y cannot exceed 1/2. Now, both (x, y) = (1/2, 0) and
(x, y) = (0, 1/2) are feasible solutions having objective function value 1/2. Hence,
they are both optimal solutions. (In fact, this problem has infinitely many optimal
solutions. Can you specify all of them?)
Not all linear programming problems have optimal solutions. For example, a
problem can have no feasible solution. Such a problem is said to be infeasible. Here
is an example of an infeasible problem:

min x
s.t. x ≤ 1
x≥2


There is no value for x that is at the same time at most 1 and at least 2.
Even if a problem is not infeasible, it might not have an optimal solution as the
following example shows:

min x
s.t. x ≤ 0

Note that no matter what real number M we are given, we can always find a
feasible solution whose objective function value is less than M . Such a problem is
said to be unbounded. (For a maximization problem, it is unbounded if one can
find feasible solutions whose objective function value is larger than any given real
number.)
So far, we have seen that a linear programming problem can have an optimal
solution, be infeasible, or be unbounded. Is it possible for a linear programming
problem to be not infeasible, not unbounded, and with no optimal solution?
The following optimization problem, though not a linear programming problem, is
not infeasible, not unbounded, and has no optimal solution:

min 2^x
s.t. x ≤ 0

The objective function value is never negative and can get arbitrarily close to 0 but
can never attain 0.
A main result in linear programming states that if a linear programming problem
is not infeasible and is not unbounded, then it must have an optimal solution.
This result is known as the Fundamental Theorem of Linear Programming
(Theorem 2.42) and we will see a proof of this important result. In the meantime,
we will consider the seemingly easier problem of determining if a system of linear
constraints has a solution.
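The three possible outcomes — optimal solution, infeasibility, unboundedness — can be observed through a solver's status codes. A SciPy sketch (in scipy.optimize.linprog, status 2 means infeasible and status 3 means unbounded):

```python
from scipy.optimize import linprog

# infeasible: min x  s.t.  x <= 1 and x >= 2
infeasible = linprog([1], A_ub=[[1], [-1]], b_ub=[1, -2], bounds=[(None, None)])
print(infeasible.status)  # 2

# unbounded: min x  s.t.  x <= 0, with x otherwise free
unbounded = linprog([1], A_ub=[[1]], b_ub=[0], bounds=[(None, None)])
print(unbounded.status)   # 3
```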

Exercises
1. Determine all values of a such that the problem

min x + y
s.t. −3x + y ≥ a
2x − y ≥ 0
x + 2y ≥ 2

is infeasible.
2. Show that the problem

min 2^x · 4^y
s.t. e^(−3x+y) ≥ 1
|2x − y| ≤ 4

can be solved by solving a linear programming problem.


Solutions
1. Adding the first two inequalities gives −x ≥ a. Adding 2 times the second
   inequality and the third inequality gives 5x ≥ 2, implying that x ≥ 2/5. Hence,
   if a > −2/5, there is no solution.
   Note that if a ≤ −2/5, then (x, y) = (2/5, 4/5) satisfies all the inequalities. Hence,
   the problem is infeasible if and only if a > −2/5.
2. Note that the constraint |2x − y| ≤ 4 is equivalent to the constraints 2x − y ≤ 4
   and 2x − y ≥ −4 taken together, and the constraint e^(−3x+y) ≥ 1 is equivalent to
   −3x + y ≥ 0. Hence, we can rewrite the problem with linear constraints.
   Finally, minimizing 2^x · 4^y is the same as minimizing 2^(x+2y), which is equivalent
   to minimizing x + 2y.

2.94.2 Linear Programming Formulation


Remember, for a linear program (LP), we want to maximize or minimize a linear
objective function of the continous decision variables, while considering linear
constraints on the values of the decision variables.

Definition 2.32 Linear Function A function f (x1 , x2 , · · · , xn ) is linear if, and only
if, we have f (x1 , x2 , · · · , xn ) = c1 x1 + c2 x2 + · · · + cn xn , where the c1 , c2 , · · · , cn
coefficients are constants.

A Generic Linear Program (LP)


Decision Variables:
xi : continuous variables (xi ∈ R, i.e., a real number), ∀i = 1, · · · , 3.
Parameters (known input parameters):
ci : cost coefficients ∀i = 1, · · · , 3
aij : constraint coefficients ∀i = 1, · · · , 3, j = 1, · · · , 4
bj : right hand side coefficient for constraint j, j = 1, · · · , 4

Min z = c1 x1 + c2 x2 + c3 x3 (2.33)
s.t. a11 x1 + a12 x2 + a13 x3 ≥ b1 (2.34)
a21 x1 + a22 x2 + a23 x3 ≤ b2 (2.35)
a31 x1 + a32 x2 + a33 x3 = b3 (2.36)
a41 x1 + a42 x2 + a43 x3 ≥ b4 (2.37)
x1 ≥ 0, x2 ≤ 0, x3 urs. (2.38)

Eq. (2.33) is the objective function, (2.34)-(2.37) are the functional constraints,
while (2.38) is the sign restrictions (urs signifies that the variable is unrestricted). If
we were to add any one of these following constraints x2 ∈ {0, 1} (x2 is binary-valued)
or x3 ∈ Z (x3 is integer-valued) we would have an Integer Program. For the purposes
of this class, an Integer Program (IP) is just an LP with added integer restrictions
on (some) variables.


While, in general, solvers will take any form of the LP, there are some special
forms we use in analysis:

LP Standard Form: The standard form has all constraints as equalities, and
all variables as non-negative. The generic LP is not in standard form, but any LP
can be converted to standard form.

Since x2 is non-positive and x3 unrestricted, perform the following substitutions:
x2 = −x̂2 and x3 = x3^+ − x3^− , where x̂2 , x3^+ , x3^− ≥ 0. Eqs. (2.34) and (2.37) are in the
form left-hand side (LHS) ≥ right-hand side (RHS), so to make an equality, subtract
a non-negative slack variable from the LHS (s1 and s4 ). Eq. (2.35) is in the form
LHS ≤ RHS, so add a non-negative slack variable to the LHS.

Min z = c1 x1 − c2 x̂2 + c3 (x3^+ − x3^− )
s.t. a11 x1 − a12 x̂2 + a13 (x3^+ − x3^− ) − s1 = b1
     a21 x1 − a22 x̂2 + a23 (x3^+ − x3^− ) + s2 = b2
     a31 x1 − a32 x̂2 + a33 (x3^+ − x3^− ) = b3
     a41 x1 − a42 x̂2 + a43 (x3^+ − x3^− ) − s4 = b4
     x1 , x̂2 , x3^+ , x3^− , s1 , s2 , s4 ≥ 0.

LP Canonical Form: For a minimization problem the canonical form of the


LP has the LHS of each constraint greater than or equal to the the RHS, and a
maximization the LHS less than or equal to the RHS, and non-negative variables.
Next we consider some formulation examples:

Production Problem: You have 21 units of transparent aluminum alloy (TAA),


LazWeld1, a joining robot leased for 23 hours, and CrumCut1, a cutting robot leased
for 17 hours of aluminum cutting. You also have production code for a bookcase,
desk, and cabinet, along with commitments to buy any of these you can produce for
$18, $16, and $10 apiece, respectively. A bookcase requires 2 units of TAA, 3 hours
of joining, and 1 hour of cutting, a desk requires 2 units of TAA, 2 hours of joining,
and 2 hour of cutting, and a cabinet requires 1 unit of TAA, 2 hours of joining, and
1 hour of cutting. Formulate an LP to maximize your revenue given your current
resources.
Decision variables:
xi : number of units of product i to produce,
∀i ∈ {bookcase, desk, cabinet}.

max z = 18x1 + 16x2 + 10x3 :


2x1 + 2x2 + 1x3 ≤ 21 (T AA)
3x1 + 2x2 + 2x3 ≤ 23 (LazW eld1)
1x1 + 2x2 + 1x3 ≤ 17 (CrumCut1)
x1 , x2 , x3 ≥ 0.
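The optimum of this small LP can be checked by brute force. The following sketch (an added illustration, not part of the original text) enumerates the candidate corner points exactly, using only the Python standard library; the constraint data transcribes the formulation above, with the non-negativity conditions written as −xi ≤ 0.

```python
from fractions import Fraction as F
from itertools import combinations

# Constraints a.x <= b; the last three rows encode x1, x2, x3 >= 0.
A = [[2, 2, 1], [3, 2, 2], [1, 2, 1],
     [-1, 0, 0], [0, -1, 0], [0, 0, -1]]
b = [21, 23, 17, 0, 0, 0]
c = [18, 16, 10]  # objective to maximize

def solve3(rows, rhs):
    """Solve a 3x3 rational linear system; return None if singular."""
    M = [[F(v) for v in row] + [F(r)] for row, r in zip(rows, rhs)]
    for col in range(3):
        piv = next((r for r in range(col, 3) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(3):
            if r != col and M[r][col] != 0:
                M[r] = [v - M[r][col] * w for v, w in zip(M[r], M[col])]
    return [M[r][3] for r in range(3)]

best = None
for idx in combinations(range(len(A)), 3):
    x = solve3([A[i] for i in idx], [b[i] for i in idx])
    if x is None:
        continue
    # Keep the intersection point only if it satisfies every constraint.
    if all(sum(F(a) * v for a, v in zip(A[i], x)) <= b[i] for i in range(len(A))):
        z = sum(F(ci) * v for ci, v in zip(c, x))
        if best is None or z > best[0]:
            best = (z, x)

print(best[0], best[1])  # revenue 166 at (x1, x2, x3) = (3, 7, 0)
```

Enumerating intersections of three constraints at a time is exactly the corner-point idea that is formalized as a basic feasible solution in the next section.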


2.95 Basic feasible solution


For a linear constraint ⃗aT⃗x ⊔ γ where ⊔ is ≥, ≤, or =, we call ⃗aT the coefficient
row-vector of the constraint.
Let S denote a system of linear constraints with n variables and m constraints
given by ⃗a(i)T⃗x ⊔i bi where ⊔i is ≥, ≤, or = for i = 1, . . . , m.
For ⃗x′ ∈ Rn, let J(S,⃗x′) denote the set {i : ⃗a(i)T⃗x′ = bi} and define AS,⃗x′ to be
the matrix whose rows are precisely the coefficient row-vectors of the constraints
indexed by J(S,⃗x′ ).

■ Example 2.15 Suppose that S is the system

x1 + x2 − x3 ≥ 2
3x1 − x2 + x3 = 2
2x1 − x2 ≤ 1

 
If ⃗x′ = (1, 3, 2)T, then J(S,⃗x′) = {1, 2} since ⃗x′ satisfies the first two constraints with
equality but not the third. Hence, AS,⃗x′ is the matrix with rows (1, 1, −1) and (3, −1, 1). ■

Definition 2.33 A solution ⃗x∗ to S is called a basic feasible solution if the
rank of AS,⃗x∗ is n.

 
A basic feasible solution to the system in Example 2.15 is (1, 1, 0)T.
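These definitions are easy to check computationally. The sketch below (an illustration added here, not from the original text) computes J(S,⃗x′) and the rank of AS,⃗x′ for the system of Example 2.15 using exact rational arithmetic.

```python
from fractions import Fraction as F

# The three constraints of Example 2.15: (coefficient row, rhs).
S = [([1, 1, -1], 2),   # x1 + x2 - x3 >= 2
     ([3, -1, 1], 2),   # 3x1 - x2 + x3 = 2
     ([2, -1, 0], 1)]   # 2x1 - x2 <= 1

def tight_rows(S, x):
    """Indices i with a(i).x = b_i, i.e. the set J(S, x)."""
    return [i for i, (a, rhs) in enumerate(S)
            if sum(F(ai) * xi for ai, xi in zip(a, x)) == rhs]

def rank(rows):
    """Rank of a rational matrix via Gaussian elimination."""
    M = [[F(v) for v in row] for row in rows]
    r = 0
    for col in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [v - f * w for v, w in zip(M[i], M[r])]
        r += 1
    return r

J = tight_rows(S, [1, 3, 2])
print(J, rank([S[i][0] for i in J]))     # [0, 1], rank 2 (< n = 3)

J2 = tight_rows(S, [1, 1, 0])
print(J2, rank([S[i][0] for i in J2]))   # all three rows, rank 3: a BFS
```

The point (1, 3, 2) makes only the first two constraints tight, while (1, 1, 0) makes all three tight with an active matrix of rank n = 3, confirming it is a basic feasible solution.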
It is not difficult to see that in two dimensions, basic feasible solutions correspond
to “corner points” of the set of all solutions. Therefore, the notion of a basic feasible
solution generalizes the idea of a corner point to higher dimensions.
The following result is the basis for what is commonly known as the corner
method for solving linear programming problems in two variables.
Theorem 2.41 (Basic Feasible Optimal Solution) Let (P) be a linear program-
ming problem. Suppose that (P) has an optimal solution and there exists a basic
feasible solution to its constraints. Then there exists an optimal solution that is a
basic feasible solution.
We first state the following simple fact from linear algebra:

Lemma 2.4. Let A ∈ Rm×n and d⃗ ∈ Rn be such that Ad⃗ = ⃗0. If ⃗q ∈ Rn satisfies
⃗qT d⃗ ̸= 0 then ⃗qT is not in the row space of A.


Proof of Theorem 2.41.


Suppose that the system of constraints in (P), call it S, has m constraints and n
variables. Let the objective function be ⃗cT⃗x. Let v denote the optimal value.
Let ⃗x∗ be an optimal solution to (P) such that the rank of AS,⃗x∗ is as large as
possible. We claim that ⃗x∗ must be a basic feasible solution.
To ease notation, let J = J(S,⃗x∗ ). Let N = {1, . . . , m}\J.
Suppose to the contrary that the rank of AS,⃗x∗ is less than n. Let P⃗x = ⃗q denote
the system of equations obtained by setting the constraints indexed by J to equalities.
Then P = AS,⃗x∗. Since P has n columns and its rank is less than n, there exists a
nonzero d⃗ such that Pd⃗ = ⃗0.
As ⃗x∗ satisfies each constraint indexed by N strictly, for a sufficiently small ϵ > 0,
⃗x∗ + ϵd⃗ and ⃗x∗ − ϵd⃗ are solutions to S and therefore are feasible to (P). Thus,

⃗cT(⃗x∗ + ϵd⃗) ≥ v
⃗cT(⃗x∗ − ϵd⃗) ≥ v.    (2.39)

Since ⃗x∗ is an optimal solution, we have ⃗cT⃗x∗ = v. Hence, (2.39) simplifies to

ϵ⃗cT d⃗ ≥ 0
−ϵ⃗cT d⃗ ≥ 0,

giving us ⃗cT d⃗ = 0 since ϵ > 0.


Without loss of generality, assume that the constraints indexed by N are Q⃗x ≥ ⃗r.
As (P) does have a basic feasible solution, the rank of the matrix obtained by stacking
P on top of Q is n, so at least one row of Q, which we denote by ⃗tT, must satisfy
⃗tT d⃗ ̸= 0. Without loss of generality, we may assume that ⃗tT d⃗ > 0, replacing d⃗ with
−d⃗ if necessary.
Consider the linear programming problem

min λ
s.t. Q(⃗x∗ + λd⃗) ≥ ⃗r

Since at least one entry of Qd⃗ is positive (namely, ⃗tT d⃗), this problem must have
an optimal solution, say λ′. Setting ⃗x′ = ⃗x∗ + λ′d⃗, we have that ⃗x′ is an optimal
solution since ⃗cT⃗x′ = v.


Now, ⃗x′ must satisfy at least one constraint in Q⃗x ≥ ⃗r with equality. Let ⃗qT be
the coefficient row-vector of one such constraint. Then the rows of AS,⃗x′ must include
all the rows of AS,⃗x∗ together with ⃗qT. Since ⃗qT d⃗ ̸= 0, by Lemma 2.4, the rank of
AS,⃗x′ is larger than the rank of AS,⃗x∗, contradicting our choice of ⃗x∗. Thus, ⃗x∗ must
be a basic feasible solution. ■

Exercises
1. Find all basic feasible solutions to

x1 + 2x2 − x3 ≥ 1


x2 + 2x3 ≥ 3
−x1 + 2x2 + x3 ≥ 3
−x1 + x2 + x3 ≥ 0.

2. A set S ⊂ Rn is said to be bounded if there exists a real number M > 0 such


that for every ⃗x ∈ S, |xi | < M for all i = 1, . . . , n. Let A ∈ Rm×n and ⃗b ∈ Rm .
Prove that if {⃗x : A⃗x ≥ ⃗b} is nonempty and bounded, then there is a basic
feasible solution to A⃗x ≥ ⃗b.
3. Let A ∈ Rm×n and ⃗b ∈ Rm where m and n are positive integers with m ≤ n.
Suppose that the rank of A is m and ⃗x′ is a basic feasible solution to

A⃗x = ⃗b
⃗x ≥ ⃗0.

Let J = {i : x′i > 0}. Prove that the columns of A indexed by J are linearly
independent.

Solutions
1. To obtain all the basic feasible solutions, it suffices to enumerate all subsystems
A′⃗x ≥ ⃗b′ of the given system such that the rank of A′ is three, solve A′⃗x = ⃗b′
for ⃗x, and check whether the result is a solution to the system, in which case it is
a basic feasible solution. Observe that every basic feasible solution can be discovered
in this manner.
We have at most four subsystems to consider.
Setting the first three inequalities to equality gives the unique solution (0, 1, 1)T,
which satisfies the given system. Hence, (0, 1, 1)T is a basic feasible solution.
Setting the first, second, and fourth inequalities to equality gives the unique
solution (5/3, 1/3, 4/3)T, which violates the third inequality of the given system.
Setting the first, third, and fourth inequalities to equality leads to no solution.
(In fact, the coefficient matrix of the system does not have rank 3 and therefore
this case can be ignored.)
Setting the last three inequalities to equality gives the unique solution (3, 3, 0)T,
which satisfies the given system. Hence, (3, 3, 0)T is a basic feasible solution.

   
Thus, (0, 1, 1)T and (3, 3, 0)T are the only basic feasible solutions.
2. Let S denote the system A⃗x ≥ ⃗b. Let ⃗x′ be a solution to S such that the rank
of AS,⃗x′ is as large as possible. If the rank is n, then we are done. Otherwise,
there exists a nonzero d⃗ ∈ Rn such that AS,⃗x′ d⃗ = ⃗0. Since the set of solutions to S is
a bounded set, at least one of the following values is finite:
• max{λ : A(⃗x′ + λd⃗) ≥ ⃗b}
• min{λ : A(⃗x′ + λd⃗) ≥ ⃗b}
Without loss of generality, assume that the maximum is finite and equal to λ∗.
Setting ⃗x∗ = ⃗x′ + λ∗d⃗, we have that the rows of AS,⃗x∗ contain all the rows
of AS,⃗x′ plus at least one additional row, say ⃗qT. Since ⃗qT d⃗ ̸= 0, by Lemma 2.4,
the rank of AS,⃗x∗ is larger than the rank of AS,⃗x′, contradicting our choice of ⃗x′.
3. The system of equations obtained from taking all the constraints satisfied with
equality by ⃗x′ is

A⃗x = ⃗b
xj = 0 for j ∉ J.    (2.40)

Note that the coefficient matrix of this system has rank n if and only if it has
a unique solution. Now, (2.40) simplifies to

Σj∈J xj Aj = ⃗b,

which has a unique solution if and only if the columns of A indexed by J are
linearly independent.
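The subsystem enumeration carried out by hand in solution 1 can be automated. This sketch (added for illustration; not in the original text) solves every 3×3 subsystem of the exercise-1 system exactly and keeps the solutions that are feasible, recovering the two basic feasible solutions found above.

```python
from fractions import Fraction as F
from itertools import combinations

# Exercise 1 system, written as a.x >= b.
A = [[1, 2, -1], [0, 1, 2], [-1, 2, 1], [-1, 1, 1]]
b = [1, 3, 3, 0]

def solve3(rows, rhs):
    """Solve a 3x3 rational linear system; return None if singular."""
    M = [[F(v) for v in row] + [F(r)] for row, r in zip(rows, rhs)]
    for col in range(3):
        piv = next((r for r in range(col, 3) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(3):
            if r != col and M[r][col] != 0:
                M[r] = [v - M[r][col] * w for v, w in zip(M[r], M[col])]
    return [M[r][3] for r in range(3)]

bfs = []
for idx in combinations(range(4), 3):
    x = solve3([A[i] for i in idx], [b[i] for i in idx])
    if x is None:
        continue
    # Feasible with a rank-3 tight set: a basic feasible solution.
    if all(sum(F(a) * v for a, v in zip(A[i], x)) >= b[i] for i in range(4)) \
            and x not in bfs:
        bfs.append(x)

print(bfs)  # the two basic feasible solutions (0, 1, 1) and (3, 3, 0)
```

The singular subsystem (first, third, and fourth rows) is skipped automatically, and the point (5/3, 1/3, 4/3) is rejected by the feasibility check, matching the hand computation.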

2.96 Fundamental Theorem of Linear Programming


Having used Fourier-Motzkin elimination to solve a linear programming problem,
we will now go one step further and use the same technique to prove the following
important result.
Theorem 2.42 (Fundamental Theorem of Linear Programming) For any given linear
programming problem, exactly one of the following holds:
1. the problem is infeasible;
2. the problem is unbounded;
3. the problem has an optimal solution.

Proof. Without loss of generality, we may assume that the linear programming
problem is of the form

min ⃗cT⃗x
(2.41)
s.t. A⃗x ≥ ⃗b

 
where m and n are positive integers, A ∈ Rm×n, ⃗b ∈ Rm, ⃗c ∈ Rn, and ⃗x = (x1, . . . , xn)T
is a tuple of variables. Indeed, any linear programming problem can be converted to
a linear programming problem in the form of (2.41) having the same feasible region
and optimal solution set. To see this, note that a constraint of the form aT⃗x ≤ β
can be written as −aT⃗x ≥ −β; a constraint of the form aT⃗x = β can be written as a pair
of constraints aT⃗x ≥ β and −aT⃗x ≥ −β; and a maximization problem is equivalent
to the problem that minimizes the negative of the objective function subject to the
same constraints.
Suppose that (2.41) is not infeasible. Form the system

z −⃗cT⃗x ≥ 0
−z +⃗cT⃗x ≥ 0 (2.42)
A⃗x ≥ ⃗b.

Solving (2.41) is equivalent to finding among all the solutions to (2.42) one that
minimizes z, if it exists. Eliminating the variables x1 , . . . , xn (in any order) using
Fourier-Motzkin elimination gives a system of linear inequalities (S) containing at
most the variable z. By scaling, we may assume that the each coefficient of z in (S)
is 1, −1, or 0. Note that any z satisfying (S) can be extended to a solution to (2.42)
and the z value from any solution to (2.42) must satisfy (S).
That (2.41) is not unbounded implies that (S) must contain an inequality of the
form z ≥ β for some β ∈ R. (Why?) Let all the inequalities in which the coefficient of
z is positive be

z ≥ βi

where βi ∈ R for i = 1, . . . , p for some positive integer p. Let γ = max{β1 , . . . , βp }.


Then for any solution (⃗x, z) to (2.42), z is at least γ. But we can set z = γ and extend
it to a solution to (2.42). Hence, we obtain an optimal solution for (2.41) and γ is
the optimal value. This completes the proof of the theorem.

Remark. We can construct multipliers to infer the inequality ⃗cT⃗x ≥ γ from the
system A⃗x ≥ ⃗b. Because we obtained the inequality z ≥ γ using Fourier-Motzkin
elimination, there must exist real numbers α, β, y1∗, . . . , ym∗ ≥ 0 such that the
nonnegative combination of the constraints of (2.42) with these multipliers (α times
z − ⃗cT⃗x ≥ 0, β times −z + ⃗cT⃗x ≥ 0, and yi∗ times the ith inequality of A⃗x ≥ ⃗b)
is identically z ≥ γ. Note that we must have α − β = 1 and

⃗y∗ ≥ ⃗0, ⃗y∗TA = ⃗cT, and ⃗y∗T⃗b = γ,

where ⃗y∗ = [y1∗, . . . , ym∗]T. Hence, y1∗, . . . , ym∗ are the desired multipliers.
The significance of the fact that we can infer ⃗cT⃗x ≥ γ will be discussed
in more detail when we look at duality theory for linear programming.


Exercises
1. Determine the optimal value of the following linear programming problem:

min x
s.t. x + y ≥ 2
x − 2y + z ≥ 0
y − 2z ≥ −1.

2. Determine if the following linear programming problem has an optimal solution:

min x1 + 2x2
s.t. x1 + 3x2 ≥ 4
−x1 + x2 ≥ 0.

3. A set S ⊂ Rn is said to be bounded if there exists a real number M > 0 such


that for every ⃗x ∈ S, |xi | < M for all i = 1, . . . , n. Prove that every linear
programming problem with a bounded nonempty feasible region has an optimal
solution.

Solutions
1. The problem is equivalent to determining the minimum value for x among all
x, y, z satisfying

x+y ≥ 2 (1)
x − 2y + z ≥ 0 (2)
y − 2z ≥ −1. (3)

We use the Fourier-Motzkin Elimination Method to eliminate z. Multiplying (3)
by 1/2, we get

x + y ≥ 2    (1)
x − 2y + z ≥ 0    (2)
(1/2)y − z ≥ −1/2.    (4)

Eliminating z, we obtain

x + y ≥ 2    (1)
x − (3/2)y ≥ −1/2    (5)

where (5) is given by (2) + (4).

Multiplying (5) by 2/3, we get

x + y ≥ 2    (1)
(2/3)x − y ≥ −1/3    (6)

Eliminating y, we get

(5/3)x ≥ 5/3    (7)


where (7) is given by (1) + (6). Multiplying (7) by 3/5, we obtain x ≥ 1. Hence,
the minimum possible value for x is 1.
Note that setting x = 1, the system (1) and (6) forces y = 1. And (2) and (3)
together force z = 1. One can check that (x, y, z) = (1, 1, 1) is a feasible solution.
Remark. Note that the inequality x ≥ 1 is given by

(3/5)(7) ⇐ (3/5)(1) + (3/5)(6)
        ⇐ (3/5)(1) + (2/5)(5)
        ⇐ (3/5)(1) + (2/5)(2) + (2/5)(4)
        ⇐ (3/5)(1) + (2/5)(2) + (1/5)(3)
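The final combination in the trace-back can be verified directly. The check below (added here as an illustration, not from the original text) forms (3/5)(1) + (2/5)(2) + (1/5)(3) with exact rationals and confirms it is exactly the inequality x ≥ 1.

```python
from fractions import Fraction as F

# Constraints a.(x, y, z) >= b, in the order (1), (2), (3) above.
rows = [([1, 1, 0], 2), ([1, -2, 1], 0), ([0, 1, -2], -1)]
mult = [F(3, 5), F(2, 5), F(1, 5)]  # multipliers read off the trace-back

coeff = [sum(m * a[j] for m, (a, _) in zip(mult, rows)) for j in range(3)]
rhs = sum(m * b for m, (_, b) in zip(mult, rows))
print(coeff, rhs)  # [1, 0, 0] and 1: the combination reads x >= 1
```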
2. It suffices to determine if there exists a minimum value for z among all the
solutions to the system

z − x1 − 2x2 ≥ 0 (1)
−z + x1 + 2x2 ≥ 0 (2)
x1 + 3x2 ≥ 4 (3)
−x1 + x2 ≥ 0 (4)

Using Fourier-Motzkin elimination to eliminate x1 , we obtain:

(1) + (2) : 0≥0


(1) + (3) : z + x2 ≥ 4 (5)
(2) + (4) : −z + 3x2 ≥ 0 (6)
(3) + (4) : 4x2 ≥ 4 (7)

Note that all the coefficients of x2 are nonnegative. Hence, eliminating x2 will


result in a system with no constraints. Therefore, there is no lower bound on the
value of z. In particular, if z = t for t ≤ 0, then from (5)–(6), we need x2 ≥ 4 − t,
3x2 ≥ t, and x2 ≥ 1. Hence, we can set x2 = 4 − t and x1 = −8 + 3t. This gives
a feasible solution for all t ≤ 0 with objective function value that approaches
−∞ as t → −∞. Hence, the linear programming problem is unbounded.
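The unbounded ray constructed in the argument can be sanity-checked numerically; the snippet below (an added illustration, not from the original text) confirms that the family (x1, x2) = (3t − 8, 4 − t) is feasible with objective value exactly t for several values t ≤ 0.

```python
def point(t):
    """The feasible family constructed in the solution above."""
    return (3 * t - 8, 4 - t)

for t in [0, -1, -10, -1000]:
    x1, x2 = point(t)
    assert x1 + 3 * x2 >= 4      # first constraint
    assert -x1 + x2 >= 0         # second constraint
    assert x1 + 2 * x2 == t      # objective value equals t
print("objective x1 + 2*x2 decreases without bound along the ray")
```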
3. Let (P) denote a linear programming problem with a bounded nonempty feasible
region with objective function ⃗cT⃗x. By assumption, (P) is not infeasible. Note
that (P) is not unbounded because |⃗cT⃗x| ≤ Σi |ci||xi| ≤ M Σi |ci|. Thus, by
Theorem 2.42, (P) has an optimal solution.


2.98 Complementary slackness


Theorem 2.44 (Weak Duality) Let (P) and (D) denote a primal-dual
pair of linear programming problems in generic form as defined previously. Let ⃗x∗
be a feasible solution to (P) and ⃗y∗ a feasible solution to (D). Then the following
hold:
1. ⃗cT⃗x∗ ≥ ⃗y∗T⃗b.
2. ⃗x∗ and ⃗y∗ are optimal solutions to the respective problems if and only if the


following conditions (known as the complementary slackness conditions) hold:

x∗j = 0 or ⃗y∗TAj = cj for j = 1, . . . , n
yi∗ = 0 or ⃗a(i)T⃗x∗ = bi for i = 1, . . . , m

Part 1 of the theorem is known as weak duality. Part 2 of the theorem is often
called the Complementary Slackness Theorem.
Proof of Theorem 2.44. Note that if x∗j is constrained to be nonnegative, its
corresponding dual constraint is ⃗y∗TAj ≤ cj. Hence, (cj − ⃗y∗TAj)x∗j ≥ 0 with
equality if and only if x∗j = 0 or ⃗y∗TAj = cj (or both).
If x∗j is constrained to be nonpositive, its corresponding dual constraint is
⃗y∗TAj ≥ cj. Hence, (cj − ⃗y∗TAj)x∗j ≥ 0 with equality if and only if x∗j = 0 or
⃗y∗TAj = cj (or both).
If x∗j is free, its corresponding dual constraint is ⃗y∗TAj = cj. Hence,
(cj − ⃗y∗TAj)x∗j = 0.
We can combine these three cases and obtain that (⃗cT − ⃗y∗TA)⃗x∗ =
Σnj=1 (cj − ⃗y∗TAj)x∗j ≥ 0 with equality if and only if for each j = 1, . . . , n,

x∗j = 0 or ⃗y∗TAj = cj.

(Here, the usage of “or” is not exclusive.)


Similarly, ⃗y∗T(A⃗x∗ − ⃗b) ≥ 0 with equality if and only if for each i = 1, . . . , m,

yi∗ = 0 or ⃗a(i)T⃗x∗ = bi.

(Again, the usage of “or” is not exclusive.)


Adding the inequalities (⃗cT − ⃗y∗TA)⃗x∗ ≥ 0 and ⃗y∗T(A⃗x∗ − ⃗b) ≥ 0, we obtain
⃗cT⃗x∗ − ⃗y∗T⃗b ≥ 0 with equality if and only if the complementary slackness conditions
hold. By strong duality, ⃗x∗ is optimal for (P) and ⃗y∗ is optimal for (D) if and only if
⃗cT⃗x∗ = ⃗y∗T⃗b. The result now follows.

The complementary slackness conditions give a characterization of optimality
which can be useful in solving certain problems as illustrated by the following
example.

■ Example 2.17 (Checking Optimality) Let (P) denote the following
linear programming problem:

min 2x1 + 4x2 + 2x3


s.t. x1 + x2 + 3x3 ≤ 1
−x1 + 2x2 + x3 ≥ 1
3x2 − 6x3 = 0
x1 , x3 ≥ 0
x2 free.
 
Is ⃗x∗ = (0, 2/5, 1/5)T an optimal solution to (P)? ■

Solution: One could answer this question by solving (P) and then see if the
objective function value of ⃗x∗, assuming that its feasibility has already been verified,
is equal to the optimal value. However, there is a way to make use of the given
information to save some work.
Let (D) denote the dual problem of (P):
max y1 + y2
s.t. y1 − y2 ≤ 2
y1 + 2y2 + 3y3 = 4
3y1 + y2 − 6y3 ≤ 2
y1 ≤ 0
y2 ≥ 0
y3 free.
One can check that ⃗x∗ is a feasible solution to (P). If ⃗x∗ is optimal, then there
must exist a feasible solution ⃗y ∗ to (D) satisfying together with ⃗x∗ the complementary
slackness conditions:
y1∗ = 0 or x∗1 + x∗2 + 3x∗3 = 1
y2∗ = 0 or −x∗1 + 2x∗2 + x∗3 = 1
y3∗ = 0 or 3x∗2 − 6x∗3 = 0
x∗1 = 0 or y1∗ − y2∗ = 2
x∗2 = 0 or y1∗ + 2y2∗ + 3y3∗ = 4
x∗3 = 0 or 3y1∗ + y2∗ − 6y3∗ = 2.
As x∗2 , x∗3 > 0, satisfying the above conditions require that
y1∗ + 2y2∗ + 3y3∗ = 4
3y1∗ + y2∗ − 6y3∗ = 2.
Solving for y2∗ and y3∗ in terms of y1∗ gives y2∗ = 2 − y1∗, y3∗ = (1/3)y1∗. To make ⃗y∗
feasible to (D), we can set y1∗ = 0 to obtain the feasible solution y1∗ = 0, y2∗ = 2, y3∗ = 0.
We can check that this ⃗y ∗ satisfies the complementary slackness conditions with ⃗x∗ .
Hence, ⃗x∗ is an optimal solution to (P) by Theorem 2.44, part 2.
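The certification argument in Example 2.17 amounts to a handful of exact arithmetic checks. The sketch below (not part of the original text) verifies primal feasibility of ⃗x∗, dual feasibility of the ⃗y∗ constructed above, and that the two objective values agree, which certifies optimality of both.

```python
from fractions import Fraction as F

x = [F(0), F(2, 5), F(1, 5)]  # proposed primal solution x*
y = [F(0), F(2), F(0)]        # dual solution found via complementary slackness

# Primal feasibility for (P).
assert x[0] + x[1] + 3 * x[2] <= 1
assert -x[0] + 2 * x[1] + x[2] >= 1
assert 3 * x[1] - 6 * x[2] == 0
assert x[0] >= 0 and x[2] >= 0

# Dual feasibility for (D).
assert y[0] - y[1] <= 2
assert y[0] + 2 * y[1] + 3 * y[2] == 4
assert 3 * y[0] + y[1] - 6 * y[2] <= 2
assert y[0] <= 0 and y[1] >= 0

# Matching objective values certify optimality of both solutions.
primal, dual = 2 * x[0] + 4 * x[1] + 2 * x[2], y[0] + y[1]
print(primal, dual)  # both equal 2
```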


Exercises
1. Let (P) and (D) denote a primal-dual pair of linear programming problems.
Prove that if (P) is not infeasible and (D) is infeasible, then (P) is unbounded.
2. Let (P) denote the following linear programming problem:

min 4x2 + 2x3


s.t. x1 + x2 + 3x3 ≤ 1
x1 − 2x2 + x3 ≥ 1
x1 + 3x2 − 6x3 = 0
x1 , x3 ≥ 0
x2 free.

Determine if (x1, x2, x3)T = (3/5, −1/5, 0)T is an optimal solution to (P).
3. Let (P) denote the following linear programming problem:

min x1 + 2x2 − 3x3


s.t. x1 + 2x2 + 2x3 = 2
−x1 + x2 + x3 = 1
−x1 + x2 − x3 ≥ 0
x1 , x2 , x3 ≥ 0

   
Determine if (x1, x2, x3)T = (0, 1, 0)T is an optimal solution to (P).
4. Let m and n be positive integers. Let A ∈ Rm×n . Let ⃗b ∈ Rm . Let ⃗c ∈ Rn . Let
(P) denote the linear programming problem

min ⃗cT⃗x
s.t. A⃗x = ⃗b
⃗x ≥ ⃗0.

Let (D) denote the dual problem of (P):

max ⃗y T⃗b
s.t. ⃗y T A ≤ ⃗cT .

Suppose that A has rank m and that (P) has at least one optimal solution.
Prove that if x∗j = 0 for every optimal solution ⃗x∗ to (P), then there exists an
optimal solution ⃗y∗ to (D) such that (⃗y∗)TAj < cj, where Aj denotes the jth
column of A.


Solutions

1. By the Fundamental Theorem of Linear Programming, (P) either is unbounded


or has an optimal solution. If it is the latter, then by strong duality, (D) has
an optimal solution, which contradicts that (D) is infeasible. Hence, (P) must
be unbounded.
2. We show that it is not an optimal solution to (P). First, note that the dual
problem of (P) is

max y1 + y2
s.t. y1 + y2 + y3 ≤ 0
y1 − 2y2 + 3y3 = 4
3y1 + y2 − 6y3 ≤ 2
y1 ≤ 0
y2 ≥ 0
y3 free.

If ⃗x∗ = (3/5, −1/5, 0)T were an optimal solution, there would exist ⃗y∗ feasible to (D)
satisfying the complementary slackness conditions with ⃗x∗:

y1∗ = 0 or x∗1 + x∗2 + 3x∗3 = 1


y2∗ = 0 or x∗1 − 2x∗2 + x∗3 = 1
y3∗ = 0 or x∗1 + 3x∗2 − 6x∗3 = 0
x∗1 = 0 or y1∗ + y2∗ + y3∗ = 0
x∗2 = 0 or y1∗ − 2y2∗ + 3y3∗ = 4
x∗3 = 0 or 3y1∗ + y2∗ − 6y3∗ = 2.

Since x∗1 + x∗2 + 3x∗3 < 1, we must have y1∗ = 0. Also, x∗1 , x∗2 are both nonzero.
Hence,

y1∗ + y2∗ + y3∗ = 0


y1∗ − 2y2∗ + 3y3∗ = 4,
implying that

y2∗ + y3∗ = 0
−2y2∗ + 3y3∗ = 4.
Solving gives y2∗ = −4/5 and y3∗ = 4/5. But this implies that ⃗y∗ is not a feasible
solution to the dual problem since we need y2∗ ≥ 0. Hence, ⃗x∗ is not an optimal
solution to (P).
3. We show that it is not an optimal solution to (P). First, note that the dual


problem of (P) is
max 2y1 + y2
s.t. y1 − y2 − y3 ≤ 1
2y1 + y2 + y3 ≤ 2
2y1 + y2 − y3 ≤ −3
y1, y2 free
y3 ≥ 0.
 
Note that ⃗x∗ = (0, 1, 0)T is a feasible solution to (P). If it were an optimal solution
to (P), there would exist ⃗y∗ feasible to the dual problem (D) satisfying the
complementary slackness conditions with ⃗x∗:

y1∗ = 0 or x∗1 + 2x∗2 + 2x∗3 = 2


y2∗ = 0 or −x∗1 + x∗2 + x∗3 = 1
y3∗ = 0 or −x∗1 + x∗2 − x∗3 = 0
x∗1 = 0 or y1∗ − y2∗ − y3∗ = 1
x∗2 = 0 or 2y1∗ + y2∗ + y3∗ = 2
x∗3 = 0 or 2y1∗ + y2∗ − y3∗ = −3.

Since −x∗1 + x∗2 − x∗3 > 0, we must have y3∗ = 0. Also, x∗2 > 0 implies that
2y1∗ + y2∗ + y3∗ = 2. Simplifying gives y2∗ = 2 − 2y1∗ .
Hence, for y ∗ to be feasible to the dual problem, it needs to satisfy the third
constraint, 2y1∗ + (2 − 2y1∗ ) ≤ −3, which simplifies to the absurdity 2 ≤ −3.
Hence, ⃗x∗ is not an optimal solution to (P).
4. Let v denote the optimal value of (P). Let (P’) denote the problem
min −xi
s.t. A⃗x = ⃗b
⃗cT⃗x ≤ v
⃗x ≥ ⃗0
Note that x∗ is a feasible solution to (P’) if and only if it is an optimal solution
to (P). Since x∗i = 0 for every optimal solution to (P), we see that the optimal
value of (P’) is 0.
Let (D’) denote the dual problem of (P’):
max ⃗y T⃗b + vu
s.t. ⃗y T Ap + cp u ≤ 0 for all p ̸= i
⃗y T Ai + ci u ≤ −1
u ≤ 0.
Suppose that an optimal solution to (D’) is given by ⃗y ′ , u′ . Let ⃗y¯ be an optimal
solution to (D). We consider two cases.


Case 1: u′ = 0.
Then (⃗y ′ )T⃗b = 0. Hence, ⃗y ∗ = ⃗y¯ +⃗y ′ is an optimal solution to (D) with (⃗y ∗ )T Ai <
ci .
Case 2: u′ < 0.
Then (⃗y′)T⃗b + vu′ = 0, implying that (1/|u′|)(⃗y′)T⃗b = v. Let ⃗y∗ = (1/|u′|)⃗y′. Then ⃗y∗
is an optimal solution to (D) with (⃗y∗)TAi < ci.

2.99 Farkas’ Lemma


A well-known result in linear algebra states that a system of linear equations A⃗x = ⃗b,
where A ∈ Rm×n, ⃗b ∈ Rm, and ⃗x = (x1, . . . , xn)T is a tuple of variables, has no solution if
and only if there exists ⃗y ∈ Rm such that ⃗y TA = ⃗0 and ⃗y T⃗b ̸= 0.

It is easily seen that if such a ⃗y exists, then the system A⃗x = ⃗b cannot have a
solution. (Simply multiply both sides of A⃗x = ⃗b on the left by ⃗y T .) However, proving
the converse requires a bit of work. A standard elementary proof involves using
Gauss-Jordan elimination to reduce the original system to an equivalent system
Q⃗x = d⃗ such that Q has a row of zeros, say in row i, with d⃗i ̸= 0. The process can be
captured by a square matrix M satisfying MA = Q. We can then take ⃗y T to be the
ith row of M.
An analogous result holds for systems of linear inequalities. The following result
is one of the many variants of a result known as the Farkas’ Lemma:

Theorem 2.45 (Farkas' Lemma) With A, ⃗x, and ⃗b as above, the system A⃗x ≥ ⃗b has
no solution if and only if there exists ⃗y ∈ Rm such that

⃗y ≥ ⃗0, ⃗y TA = ⃗0, ⃗y T⃗b > 0.

In other words, the system A⃗x ≥ ⃗b has no solution if and only if one can infer
the inequality 0 ≥ γ for some γ > 0 by taking a nonnegative linear combination of
the inequalities.
This result essentially says that there is always a certificate (the m-tuple ⃗y with
the prescribed properties) for the infeasibility of the system A⃗x ≥ ⃗b. This allows
third parties to verify the claim of infeasibility without having to solve the system
from scratch.

■ Example 2.18 For the system

2x − y + z ≥ 2
−x + y − z ≥ 0
−y + z ≥ 0,

adding two times the second inequality and the third inequality to the first
inequality gives 0 ≥ 2. Hence, ⃗y = (1, 2, 1)T is a certificate of infeasibility for this
example. ■
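Verifying a Farkas certificate needs only a matrix-vector product. The snippet below (added for illustration; not in the original text) checks the certificate of Example 2.18.

```python
A = [[2, -1, 1], [-1, 1, -1], [0, -1, 1]]  # rows of the infeasible system
b = [2, 0, 0]
y = [1, 2, 1]                              # the claimed certificate

yTA = [sum(y[i] * A[i][j] for i in range(3)) for j in range(3)]
yTb = sum(yi * bi for yi, bi in zip(y, b))
print(yTA, yTb)  # [0, 0, 0] and 2 > 0: the system is infeasible
```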

We now give a proof of Theorem 2.45. It is easy to see that if such a ⃗y exists,
then the system A⃗x ≥ ⃗b has no solution.
Conversely, suppose that the system A⃗x ≥ ⃗b has no solution. It suffices to show
that we can infer the inequality 0 ≥ α for some postive α by taking nonnegative
linear combination of the inequalities in the system A⃗x ≥ ⃗b. If the system already
contains an inequality 0 ≥ α for some positive α, then we are done. Otherwise, we
show by induction on n that we can infer such an inequality.
Base case: The system A⃗x ≥ ⃗b has only one variable.
For the system to have no solution, there must exist two inequalities ax1 ≥ t and
−a′x1 ≥ t′ such that a, a′ > 0 and t/a > −t′/a′. Adding 1/a times the inequality ax1 ≥ t
and 1/a′ times the inequality −a′x1 ≥ t′ gives the inequality 0 ≥ t/a + t′/a′ with a positive
right-hand side. This establishes the base case.
Induction hypothesis: Let n ≥ 2 be an integer. Assume that given any system
of linear inequalities A′⃗x ≥ ⃗b′ in n − 1 variables having no solution, one can infer the
inequality 0 ≥ α′ for some positive α′ by taking a nonnegative linear combination of
the inequalities in the system A′⃗x ≥ ⃗b′.
Apply Fourier-Motzkin elimination to eliminate xn from A⃗x ≥ ⃗b to obtain the
system P⃗x ≥ ⃗q. As A⃗x ≥ ⃗b has no solution, P⃗x ≥ ⃗q also has no solution.
By the induction hypothesis, one can infer the inequality 0 ≥ α for some positive
α by taking a nonnegative linear combination of the inequalities in P⃗x ≥ ⃗q. However,
each inequality in P⃗x ≥ ⃗q can be obtained from a nonnegative linear combination
of the inequalites in A⃗x ≥ ⃗b. Hence, one can infer the inequality 0 ≥ α by taking a
nonnegative linear combination of nonnegative linear combinations of the inequalities
in A⃗x ≥ ⃗b. Since a nonnegative linear combination of nonnegative linear combinations
of the inequalities in A⃗x ≥ ⃗b is simply a nonnegative linear combination of the
inequalities in A⃗x ≥ ⃗b, the result follows.

Remark. Notice that in the proof above, if A and ⃗b have only rational entries,
then we can take ⃗y to have only rational entries as well.

Corollary 2.1 Let A ∈ Rm×n and let ⃗b ∈ Rm . The system

A⃗x = ⃗b
⃗x ≥ ⃗0

has no solution if and only if there exists ⃗y ∈ Rm such that ⃗y TA ≤ ⃗0 and ⃗y T⃗b > 0.
Furthermore, if A and ⃗b are rational, ⃗y can be taken to be rational.
Proof. One can easily check that if such a ⃗y exists, there is no solution.
We now prove the converse. The system
A⃗x = ⃗b
⃗x ≥ ⃗0


can be rewritten as the system whose coefficient matrix stacks A, −A, and the
n × n identity matrix I, and whose right-hand side stacks ⃗b, −⃗b, and ⃗0:

A⃗x ≥ ⃗b, −A⃗x ≥ −⃗b, I⃗x ≥ ⃗0.

Then by Theorem 2.45, if this system has no solution, then there exist ⃗u, ⃗v ∈ Rm,
⃗w ∈ Rn satisfying

⃗u, ⃗v, ⃗w ≥ ⃗0, ⃗uTA − ⃗vTA + ⃗wT = ⃗0T, ⃗uT⃗b − ⃗vT⃗b > 0.

The result now follows from setting ⃗y = ⃗u − ⃗v.


Rationality follows from the remark after the proof of Theorem 2.45.

Exercises
1. You are given that the following system has no solution.

x1 + x2 + 2x3 ≥ 1
−x1 + x2 + x3 ≥ 2
x1 − x2 + x3 ≥ 1
−x2 − 3x3 ≥ 0.
Obtain a certificate of infeasibility for the system.

Solutions
   
1. The system can be written as A⃗x ≥ ⃗b, where A is the matrix with rows
(1, 1, 2), (−1, 1, 1), (1, −1, 1), and (0, −1, −3), and ⃗b = (1, 2, 1, 0)T.
So we need to find ⃗y ≥ 0 such that ⃗y T A = ⃗0 and ⃗y T⃗b > 0. As the system of
equations ⃗y T A = ⃗0 is homogeneous, we could without loss of generality fix
⃗y T⃗b = 1, thus leading to the system

⃗y T A = ⃗0
⃗y T⃗b = 1
⃗y ≥ ⃗0
that we could attempt to solve directly. However, it is possible to obtain a ⃗y
using the Fourier-Motzkin Elimination Method.
Let us first label the inequalities:

x1 + x2 + 2x3 ≥ 1 (1)
−x1 + x2 + x3 ≥ 2 (2)
x1 − x2 + x3 ≥ 1 (3)
−x2 − 3x3 ≥ 0. (4)


Eliminating x1 gives:

−x2 − 3x3 ≥ 0 (4)


2x2 + 3x3 ≥ 3 (5)
2x3 ≥ 3. (6)

Note that (5) is obtained from (1) + (2) and (6) is obtained from (2) + (3).
Multiplying (5) by 1/2 gives

−x2 − 3x3 ≥ 0    (4)
x2 + (3/2)x3 ≥ 3/2    (7)
2x3 ≥ 3.    (6)

Eliminating x2 gives:

2x3 ≥ 3    (6)
−(3/2)x3 ≥ 3/2    (8)

where (8) is obtained from (4) + (7).
Now (3/4)(6) + (8) gives 0 ≥ 15/4, a contradiction.
To obtain a certificate of infeasibility, we trace back the computations. Note
that (3/4)(6) + (8) is given by (3/4)((2) + (3)) + (4) + (7), which in turn is given by
(3/4)((2) + (3)) + (4) + (1/2)(5), which in turn is given by
(3/4)((2) + (3)) + (4) + (1/2)((1) + (2)).
Thus, we can obtain 0 ≥ 15/4 from the nonnegative linear combination of the
original inequalities as follows: (1/2)(1) + (5/4)(2) + (3/4)(3) + (4).
Therefore, ⃗y = (1/2, 5/4, 3/4, 1)T is a certificate of infeasibility.
(Check that ⃗y TA = ⃗0 and ⃗y T⃗b > 0.)
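The final check suggested above is mechanical; the snippet below (added as an illustration, not from the original text) performs it with exact rationals.

```python
from fractions import Fraction as F

A = [[1, 1, 2], [-1, 1, 1], [1, -1, 1], [0, -1, -3]]
b = [1, 2, 1, 0]
y = [F(1, 2), F(5, 4), F(3, 4), F(1)]  # the certificate found above

yTA = [sum(y[i] * A[i][j] for i in range(4)) for j in range(3)]
yTb = sum(yi * bi for yi, bi in zip(y, b))
print(yTA, yTb)  # [0, 0, 0] and 15/4 > 0, as required
```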

2.100 Solving linear programming problems


Fourier-Motzkin elimination can actually be used to solve a linear programming
problem though the method is not efficient and is almost never used in practice. We
illustrate the process with an example.
Consider the following linear programming problem:

min x + y
s.t. x + 2y ≥ 2 (2.43)
3x + 2y ≥ 6.

Observe that (2.43) is equivalent to


min z
s.t. z − x − y = 0
(2.44)
x + 2y ≥ 2
3x + 2y ≥ 6.

Note that the objective function is replaced with z and z is set to the original
objective function in the first constraint of (2.44) since z = x + y if and only if
z − x − y = 0. Then, solving (2.44) is equivalent to finding among all the solutions to
the following system a solution that minimizes z, if it exists.

z −x−y ≥ 0 (1)
−z + x + y ≥ 0 (2)
x + 2y ≥ 2 (3)
3x + 2y ≥ 6 (4)

Since we are interested in the minimum possible value for z, we use Fourier-Motzkin
elimination to eliminate the variables x and y.
To eliminate x, we first multiply (4) by 1/3 to obtain:

z −x−y ≥ 0 (1)
−z + x + y ≥ 0 (2)
x + 2y ≥ 2 (3)
x + (2/3)y ≥ 2    (5)

Then eliminate x to obtain


(1) + (2) : 0≥0
(1) + (3) : z + y ≥ 2 (6)
(1) + (5) : z − (1/3)y ≥ 2    (7)

Note that there is no need to keep the first inequality. To eliminate y, we first
multiply (7) by 3 to obtain:

z +y ≥ 2 (6)
3z − y ≥ 6 (8)

Then eliminate y to obtain

4z ≥ 8 (9)

Multiplying (9) by 1/4 gives z ≥ 2. Hence, the minimum possible value for z among
all the solutions to the system is 2. So the optimal value of (2.44) is 2. To obtain an
optimal solution, set z = 2. Then we have no choice but to set y = 0 and x = 2. One
can check that (x, y) = (2, 0) is a feasible solution with objective function value 2.
We can obtain an independent proof that the optimal value is indeed 2 if we
trace back the computations. Note that the inequality z ≥ 2 is given by


1/4 (9) ⇐ 1/4 (6) + 1/4 (8)
       ⇐ 1/4 (1) + 1/4 (3) + 3/4 (7)
       ⇐ 1/4 (1) + 1/4 (3) + 3/4 (1) + 3/4 (5)
       ⇐ (1) + 1/4 (3) + 1/4 (4)

This shows that 1/4 (3) + 1/4 (4) gives the inequality x + y ≥ 2. Hence, no feasible
solution to (2.43) can have objective function value less than 2. But we have found
one feasible solution with objective function value 2. Hence, 2 is the optimal value.
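The elimination carried out by hand above can be mechanized. The following Python sketch (not part of the original text; the helper name fm_eliminate is illustrative) runs Fourier-Motzkin elimination with exact integer/rational arithmetic on the system (1)-(4), written in "≤" form, and recovers the bound z ≥ 2:

```python
from fractions import Fraction

def fm_eliminate(rows, k):
    """Eliminate variable k from a system of rows (a, b), each meaning a . v <= b."""
    zero = [(a, b) for a, b in rows if a[k] == 0]
    pos = [(a, b) for a, b in rows if a[k] > 0]
    neg = [(a, b) for a, b in rows if a[k] < 0]
    out = list(zero)
    for ap, bp in pos:
        for an, bn in neg:
            # the nonnegative multipliers -an[k] and ap[k] cancel variable k
            a = [-an[k] * p + ap[k] * n for p, n in zip(ap, an)]
            b = -an[k] * bp + ap[k] * bn
            out.append((a, b))
    return out

# Variables ordered (z, x, y); each ">=" constraint is negated into "<=" form.
rows = [
    ([-1, 1, 1], 0),    # z - x - y >= 0
    ([1, -1, -1], 0),   # -z + x + y >= 0
    ([0, -1, -2], -2),  # x + 2y >= 2
    ([0, -3, -2], -6),  # 3x + 2y >= 6
]
rows = fm_eliminate(rows, 1)  # eliminate x
rows = fm_eliminate(rows, 2)  # eliminate y
# Surviving rows constrain z alone: a[0]*z <= b with a[0] < 0 gives z >= b/a[0].
lower = max(Fraction(b, a[0]) for a, b in rows if a[0] < 0)
print(lower)  # 2
```

The intermediate rows produced by the two calls are exactly the inequalities (6)-(9) derived above, up to positive scaling.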

2.101 Graphical example


To motivate the subject of linear programming (LP), we begin with a planning
problem that can be solved graphically.

■ Example 2.19 Lemonade Vendor. Say you are a vendor of lemonade
and lemon juice. Each unit of lemonade requires 1 lemon and 2 litres of water.
Each unit of lemon juice requires 3 lemons and 1 litre of water. Each unit of
lemonade gives a profit of three dollars. Each unit of lemon juice gives a profit of
two dollars. You have 6 lemons and 4 litres of water available. How many units of
lemonade and lemon juice should you make to maximize profit?
If we let x denote the number of units of lemonade to be made and let y denote
the number of units of lemon juice to be made, then the profit is given by 3x + 2y
dollars. We call 3x + 2y the objective function. Note that there are a number of
constraints that x and y must satisfy. First of all, x and y should be nonnegative.
The number of lemons needed to make x units of lemonade and y units of lemon
juice is x + 3y and cannot exceed 6. The number of litres of water needed to make
x units of lemonade and y units of lemon juice is 2x + y and cannot exceed 4.
Hence, to determine the maximum profit, we need to maximize 3x + 2y subject to
x and y satisfying the constraints x + 3y ≤ 6, 2x + y ≤ 4, x ≥ 0, and y ≥ 0.
A more compact way to write the problem is as follows:

maximize 3x + 2y
subject to x + 3y ≤ 6
2x + y ≤ 4
x ≥ 0
y ≥ 0.

We can solve
 this maximization problem graphically as follows. We first sketch the
x
set of   satisfying the constraints, called the feasible region, on the (x, y)-plane.
y


We then take the objective function 3x + 2y and turn it into an equation of a line
3x + 2y = z where z is a parameter. Note that as the value of z increases, the line
defined by the equation 3x + 2y = z moves in the direction of the normal vector (3, 2)T.
We call this direction the direction of improvement. Determining the maximum
value of the objective function, called the optimal value, subject to the constraints
amounts to finding the maximum value of z so that the line defined by the equation
3x + 2y = z still intersects the feasible region.

[Figure: the feasible region defined by x ≥ 0, y ≥ 0, x + 3y ≤ 6, and 2x + y ≤ 4,
with the parallel lines 3x + 2y = 0, 3x + 2y = 4, and 3x + 2y = 6.8, the direction
of improvement (3, 2), and the optimal point (1.2, 1.6).]
In the figure above, the lines with z at 0, 4 and 6.8 have been drawn. From the
picture, we can see that if z is greater than 6.8, the line defined by 3x + 2y = z will
not intersect the feasible region. Hence, the profit cannot exceed 6.8 dollars.
As the line 3x + 2y = 6.8 does intersect the feasible region, 6.8 is the maximum
value for the objective function. Note that there is only one point in the feasible
region that intersects the line 3x + 2y = 6.8, namely (x, y) = (1.2, 1.6). In other words, to
maximize profit, we want to make 1.2 units of lemonade and 1.6 units of lemon juice.
The above solution method can hardly be regarded as rigorous because we relied
on a picture to conclude that 3x + 2y ≤ 6.8 for all (x, y) satisfying the constraints. But
we can actually show this algebraically.
Note that multiplying both sides of the constraint x + 3y ≤ 6 by 0.2 gives 0.2x + 0.6y ≤ 1.2,
and multiplying both sides of the constraint 2x + y ≤ 4 by 1.4 gives 2.8x + 1.4y ≤ 5.6.

 
Hence, any (x, y) that satisfies both x + 3y ≤ 6 and 2x + y ≤ 4 must also satisfy
(0.2x + 0.6y) + (2.8x + 1.4y) ≤ 1.2 + 5.6, which simplifies to 3x + 2y ≤ 6.8 as desired!
(Here, we used the fact that if a ≤ b and c ≤ d, then a + c ≤ b + d.)
Now, one might ask if it is always possible to find an algebraic proof like the one
above for similar problems. If the answer is yes, how does one find such a proof? We
will see answers to this question later on.
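One way to make the graphical argument systematic is to enumerate the candidate corner points: intersect the constraint lines pairwise, keep the intersections that satisfy every constraint, and evaluate the objective there. A small Python sketch of this idea (not part of the original text) for the lemonade problem:

```python
from itertools import combinations

# Constraints written as a . (x, y) <= b, including x >= 0 and y >= 0.
cons = [((1, 3), 6), ((2, 1), 4), ((-1, 0), 0), ((0, -1), 0)]

def intersect(c1, c2):
    """Solve the 2x2 system a1 . p = b1, a2 . p = b2 by Cramer's rule."""
    (a1, b1), (a2, b2) = c1, c2
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if det == 0:
        return None  # parallel lines
    return ((b1 * a2[1] - a1[1] * b2) / det, (a1[0] * b2 - b1 * a2[0]) / det)

def feasible(p):
    return all(a[0] * p[0] + a[1] * p[1] <= b + 1e-9 for a, b in cons)

vertices = []
for c1, c2 in combinations(cons, 2):
    p = intersect(c1, c2)
    if p is not None and feasible(p):
        vertices.append(p)

best = max(vertices, key=lambda p: 3 * p[0] + 2 * p[1])
print(best)  # (1.2, 1.6), with objective value 6.8
```

This brute-force enumeration is only practical in two variables, but it anticipates the result proved later: an optimal solution, when one exists, can be found at a vertex of the feasible region.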
Before we end this segment, let us consider the following problem:
minimize −2x + y
subject to −x + y ≤ 3
x − 2y ≤ 2
x ≥ 0
y ≥ 0.
   
Note that for any t ≥ 0, (x, y) = (t, t) satisfies all the constraints. The value of the
objective function at (x, y) = (t, t) is −t. As t → ∞, the value of the objective function
tends to −∞. Therefore, there is no minimum value for the objective function. The
problem is said to be unbounded. Later on, we will see how to detect unboundedness
algorithmically.
 As an exercise,
 check that unboundedness can also be established by using
x 2t + 2
 = for t ≥ 0.
y t

Exercises
 
1. Sketch all (x, y) satisfying

x − 2y ≤ 2
on the (x, y)-plane.
2. Determine the optimal value of
Minimize x + y
Subject to 2x + y ≥ 4
x + 3y ≥ 1.
3. Show that the problem
Minimize −x + y
Subject to 2x − y ≥ 0
x + 3y ≥ 3
is unbounded.


4. Suppose that you are shopping for dietary supplements to satisfy your required
daily intake of 0.40mg of nutrient M and 0.30mg of nutrient N . There are
three popular products on the market. The costs and the amounts of the two
nutrients are given in the following table:

Product 1 Product 2 Product 3


Cost $27 $31 $24
Daily amount of M 0.16 mg 0.21 mg 0.11 mg
Daily amount of N 0.19 mg 0.13 mg 0.15 mg

You want to determine how much of each product you should buy so that the
daily intake requirements of the two nutrients are satisfied at minimum cost.
Formulate your problem as a linear programming problem, assuming that you
can buy a fractional number of each product.

Solutions
1. The points (x, y) satisfying x − 2y ≤ 2 are precisely those above the line passing
through (2, 0) and (0, −1).
2. We want to determine the minimum value z so that x + y = z defines a line
that has a nonempty intersection with the feasible region. However, we can
avoid referring to a sketch by setting x = z − y and substituting for x in the
inequalities to obtain:

2(z − y) + y ≥ 4
(z − y) + 3y ≥ 1,

or equivalently,

z ≥ 2 + 1/2 y
z ≥ 1 − 2y.

Thus, for each fixed y, the smallest feasible z is max{2 + 1/2 y, 1 − 2y}; minimizing
this over all y, the minimum occurs where the two bounds coincide, namely at
y = −2/5. Hence, the optimal value is 9/5.
We can verify our work by doing the following. If our calculations above are
correct, then an optimal solution is given by x = 11/5, y = −2/5 since x = z − y. It
is easy to check that this satisfies both inequalities and therefore is a feasible
solution.
Now, taking 2/5 times the first inequality and 1/5 times the second inequality,
we can infer the inequality x + y ≥ 9/5. The left-hand side of this inequality is
precisely the objective function. Hence, no feasible solution can have objective
function value less than 9/5. But x = 11/5, y = −2/5 is a feasible solution with
objective function value equal to 9/5. As a result, it must be an optimal solution.
Remark. We have not yet discussed how to obtain the multipliers 2/5 and 1/5 for
inferring the inequality x + y ≥ 9/5. This is an issue that will be taken up later.
In the meantime, think about how one could have obtained these multipliers
for this particular exercise.
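The multiplier check in the remark is easy to carry out exactly. A short Python sketch (not part of the original text) with rational arithmetic:

```python
from fractions import Fraction

# Constraints 2x + y >= 4 and x + 3y >= 1, combined with multipliers 2/5 and 1/5.
lam = [Fraction(2, 5), Fraction(1, 5)]
rows = [((2, 1), 4), ((1, 3), 1)]
coeffs = [sum(l * a[i] for l, (a, b) in zip(lam, rows)) for i in range(2)]
rhs = sum(l * b for l, (a, b) in zip(lam, rows))
print(coeffs, rhs)  # coefficients [1, 1] and right-hand side 9/5: x + y >= 9/5

# The claimed optimal solution attains this bound, so it is optimal.
x, y = Fraction(11, 5), Fraction(-2, 5)
assert 2 * x + y >= 4 and x + 3 * y >= 1 and x + y == rhs
```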
3. We could glean some insight by first making a sketch on the (x, y)-plane.


The line defined by −x + y = z has x-intercept −z. Note that for z ≤ −3,
(x, y) = (−z, 0) satisfies both inequalities and the value of the objective function
at (x, y) = (−z, 0) is z. Hence, there is no lower bound on the value of the objective
function.
4. Let xi denote the amount of Product i to buy for i = 1, 2, 3. Then, the problem
can be formulated as
minimize 27x1 + 31x2 + 24x3
subject to 0.16x1 + 0.21x2 + 0.11x3 ≥ 0.40
0.19x1 + 0.13x2 + 0.15x3 ≥ 0.30
x1 , x2 , x3 ≥ 0.
Remark. If one cannot buy fractional amounts of the products, the problem
can be formulated as
minimize 27x1 + 31x2 + 24x3
subject to 0.16x1 + 0.21x2 + 0.11x3 ≥ 0.40
0.19x1 + 0.13x2 + 0.15x3 ≥ 0.30
x1 , x2 , x3 ≥ 0.
x1 , x2 , x3 ∈ Z.
[Figure: a sketch of the constraint lines on the (x1 , x2 )-plane; drawing adapted from
https://tex.stackexchange.com/questions/75933/how-to-draw-the-region-of-inequality]
Work Scheduling Problem: You are the manager of LP Burger. The following
table shows the minimum number of employees required to staff the restaurant on
each day of the week. Each employee must work for five consecutive days. Formulate
an LP to find the minimum number of employees required to staff the restaurant.
Decision variables:
xi : the number of workers that start 5 consecutive days of work on day i, i = 1, · · · , 7


Day of Week Workers Required


1 = Monday 6
2 = Tuesday 4
3 = Wednesday 5
4 = Thursday 4
5 = Friday 3
6 = Saturday 7
7 = Sunday 7

Min z = x1 + x2 + x3 + x4 + x5 + x6 + x7
s.t. x1 + x4 + x5 + x6 + x7 ≥ 6
x2 + x5 + x6 + x7 + x1 ≥ 4
x3 + x6 + x7 + x1 + x2 ≥ 5
x4 + x7 + x1 + x2 + x3 ≥ 4
x5 + x1 + x2 + x3 + x4 ≥ 3
x6 + x2 + x3 + x4 + x5 ≥ 7
x7 + x3 + x4 + x5 + x6 ≥ 7
x1 , x2 , x3 , x4 , x5 , x6 , x7 ≥ 0.
The solution is as follows:

LP Solution IP Solution
zLP = 7.333 zI = 8.0
x1 = 0 x1 = 0
x2 = 0.333 x2 = 0
x3 = 1 x3 = 0
x4 = 2.333 x4 = 3
x5 = 0 x5 = 0
x6 = 3.333 x6 = 4
x7 = 0.333 x7 = 1
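The integer solution in the table can be verified with a few lines of Python (a feasibility check only, not a solver; this is not from the original text, and the coverage pattern simply mirrors the constraints above):

```python
req = [6, 4, 5, 4, 3, 7, 7]   # Monday .. Sunday staffing requirements
x = [0, 0, 0, 3, 0, 4, 1]     # reported IP solution: x[i] workers start on day i+1

# A worker starting on day s works days s, s+1, ..., s+4 (mod 7).
cover = [sum(x[s] for s in range(7) if (d - s) % 7 < 5) for d in range(7)]
print(cover, sum(x))  # every day meets its requirement, using 8 workers in total
assert all(c >= r for c, r in zip(cover, req))
```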

LP Burger has changed its policy: it now also allows at most two part-time workers,
who each work for two consecutive days in a week. Formulate this problem.
Decision variables:
xi : the number of workers that start 5 consecutive days of work on day i, i = 1, · · · , 7
yi : the number of workers that start 2 consecutive days of work on day i, i = 1, · · · , 7.

Min z = 5(x1 + x2 + x3 + x4 + x5 + x6 + x7 )


+ 2(y1 + y2 + y3 + y4 + y5 + y6 + y7 )
s.t. x1 + x4 + x5 + x6 + x7 + y1 + y7 ≥ 6
x2 + x5 + x6 + x7 + x1 + y2 + y1 ≥ 4
x3 + x6 + x7 + x1 + x2 + y3 + y2 ≥ 5
x4 + x7 + x1 + x2 + x3 + y4 + y3 ≥ 4
x5 + x1 + x2 + x3 + x4 + y5 + y4 ≥ 3
x6 + x2 + x3 + x4 + x5 + y6 + y5 ≥ 7
x7 + x3 + x4 + x5 + x6 + y7 + y6 ≥ 7
y1 + y2 + y3 + y4 + y5 + y6 + y7 ≤ 2
xi ≥ 0, yi ≥ 0, ∀i = 1, · · · , 7.

The Diet Problem: In the future (as envisioned in a bad 70’s science fiction
film) all food is in tablet form, and there are four types, green, blue, yellow, and red.
A balanced, futuristic diet requires, at least 20 units of Iron, 25 units of Vitamin B,
30 units of Vitamin C, and 15 units of Vitamin D. Formulate an LP that ensures a
balanced diet at the minimum possible cost.

Tablet Iron B C D Cost ($)


green (1) 6 6 7 4 1.25
blue (2) 4 5 4 9 1.05
yellow (3) 5 2 5 6 0.85
red (4) 3 6 3 2 0.65

Now we formulate the problem:


Decision variables:
xi : number of tablet of type i to include in the diet, ∀i ∈ {1, 2, 3, 4}.

Min z = 1.25x1 + 1.05x2 + 0.85x3 + 0.65x4


s.t. 6x1 + 4x2 + 5x3 + 3x4 ≥ 20
6x1 + 5x2 + 2x3 + 6x4 ≥ 25
7x1 + 4x2 + 5x3 + 3x4 ≥ 30
4x1 + 9x2 + 6x3 + 2x4 ≥ 15
x1 , x2 , x3 , x4 ≥ 0.
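Any candidate purchase can be checked against the requirements directly. A small Python sketch (not from the original text; the candidate bundle [5, 0, 0, 0] is an arbitrary illustration, not the optimal solution):

```python
need = {"Iron": 20, "B": 25, "C": 30, "D": 15}
units = {"Iron": [6, 4, 5, 3], "B": [6, 5, 2, 6],
         "C": [7, 4, 5, 3], "D": [4, 9, 6, 2]}  # per tablet, from the table above
cost = [1.25, 1.05, 0.85, 0.65]

def check(x):
    """Return (is_feasible, total_cost) for a bundle x of the four tablet types."""
    ok = all(sum(u * xi for u, xi in zip(units[n], x)) >= need[n] for n in need)
    return ok, sum(c * xi for c, xi in zip(cost, x))

print(check([5, 0, 0, 0]))  # feasible at cost 6.25, but not necessarily optimal
```

An LP solver would search over all such bundles for the cheapest feasible one; this check only certifies feasibility of one candidate.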

The Next Diet Problem: Progress is important, and our last problem had
too many tablets, so we are going to produce a single, purple, 10 gram tablet for our
futuristic diet requirements, which are at least 20 units of Iron, 25 units of Vitamin B,
30 units of Vitamin C, 15 units of Vitamin D, and 2000 calories. The tablet is
made from blending 4 nutritious chemicals; the following table shows the units of
each nutrient per gram, and the cost per gram, of each chemical. Formulate an LP
that ensures a balanced diet at the minimum possible cost.


Chemical Iron B C D Calories Cost ($)


Chem 1 6 6 7 4 1000 1.25
Chem 2 4 5 4 9 250 1.05
Chem 3 5 2 5 6 850 0.85
Chem 4 3 6 3 2 750 0.65

Decision variables:
xi : grams of chemical i to include in the purple tablet, ∀i = 1, 2, 3, 4.

Min z = 1.25x1 + 1.05x2 + 0.85x3 + 0.65x4


s.t. 6x1 + 4x2 + 5x3 + 3x4 ≥ 20
6x1 + 5x2 + 2x3 + 6x4 ≥ 25
7x1 + 4x2 + 5x3 + 3x4 ≥ 30
4x1 + 9x2 + 6x3 + 2x4 ≥ 15
1000x1 + 250x2 + 850x3 + 750x4 ≥ 2000
x1 + x2 + x3 + x4 = 10
x1 , x2 , x3 , x4 ≥ 0.

The Assignment Problem: Consider the assignment of n teams to n projects,


where each team ranks the projects, where their favorite project is given a rank
of n, their next favorite n − 1, and their least favorite project is given a rank of
1. The assignment problem is formulated as follows (we denote ranks using the
R-parameter):
Variables:
xij : 1 if project i assigned to team j, else 0.
Max z = Σ_{i=1}^{n} Σ_{j=1}^{n} Rij xij
s.t. Σ_{i=1}^{n} xij = 1, ∀j = 1, · · · , n
     Σ_{j=1}^{n} xij = 1, ∀i = 1, · · · , n
     xij ∈ {0, 1}, ∀i = 1, · · · , n, j = 1, · · · , n.

The assignment problem has an integrality property, such that if we remove the
binary restriction on the x variables (now just non-negative, i.e., xij ≥ 0) then we
still get binary assignments, despite the fact that it is now an LP. This property is
very interesting and useful. Of course, the objective function might not be quite what
we want; we might be interested in ensuring that the team with the worst assignment
is as good as possible (a fairness criterion). One way of doing this is to modify the
assignment problem using a max-min objective:


Max-min Assignment-like Formulation

Max z
s.t. Σ_{i=1}^{n} xij = 1, ∀j = 1, · · · , n
     Σ_{j=1}^{n} xij = 1, ∀i = 1, · · · , n
     xij ≥ 0, ∀i = 1, · · · , n, j = 1, · · · , n
     z ≤ Σ_{i=1}^{n} Rij xij , ∀j = 1, · · · , n.

Does this formulation have the integrality property (it is not an assignment problem)?
Consider a very simple example where two teams are to be assigned to two projects
and the teams give the projects the following rankings:

            Project 1   Project 2
Team 1          2           1
Team 2          2           1

Both teams prefer Project 1. For both problems, if we remove the binary restriction
on the x-variables, they
can take values between (and including) zero and one. For the assignment problem
the optimal solution will have z = 3, and fractional x-values will not improve z. For
the max-min assignment problem this is not the case, the optimal solution will have
z = 1.5, which occurs when each team is assigned half of each project (i.e., for Team
1 we have x11 = 0.5 and x21 = 0.5).
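This 2 × 2 example is small enough to check exhaustively. The sketch below (not part of the original text) compares the integral assignments with the half-and-half fractional solution:

```python
from fractions import Fraction
from itertools import permutations

R = [[2, 2], [1, 1]]  # R[i][j]: the rank team j+1 gives project i+1

# Integral assignments: permutation p sends team j to project p[j].
totals = [sum(R[p[j]][j] for j in range(2)) for p in permutations(range(2))]
minima = [min(R[p[j]][j] for j in range(2)) for p in permutations(range(2))]

# Fractional solution: each team gets half of each project.
half = Fraction(1, 2)
payoff = [sum(half * R[i][j] for i in range(2)) for j in range(2)]

print(max(totals), max(minima), min(payoff))  # 3, 1, and 3/2
```

Every integral assignment leaves the worse-off team with value 1, while the fractional split raises the minimum to 3/2, which is why the max-min formulation loses the integrality property.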


Linear Data Models: Consider a data set that consists of n data points (xi , yi ).
We want to fit the best line to this data, such that given an x-value, we can predict
the associated y-value. Thus, the form is yi = αxi + β and we want to choose the α
and β values such that we minimize the error for our n data points.
Variables:
ei : error for data point i, i = 1, · · · , n.
α : slope of fitted line.
β : intercept of fitted line.
Min Σ_{i=1}^{n} |ei |
s.t. αxi + β − yi = ei , i = 1, · · · , n
     ei , α, β urs.
Of course, the absolute value is not a linear function, so we can linearize as follows:
Decision variables:
ei+ : positive error for data point i, i = 1, · · · , n.
ei− : negative error for data point i, i = 1, · · · , n.
α : slope of fitted line.
β : intercept of fitted line.

Min Σ_{i=1}^{n} (ei+ + ei− )
s.t. αxi + β − yi = ei+ − ei− , i = 1, · · · , n
     ei+ , ei− ≥ 0, α, β urs.

Two-Person Zero-Sum Games: Consider a game with two players, A and B.
In each round of the game, A chooses one out of m possible actions, while B chooses
one out of n actions. If A takes action j while B takes action i, then cij is the payoff
for A: if cij > 0, A “wins” cij (and B loses that amount), and if cij < 0, B “wins”
−cij (and A loses that amount). This is a two-person zero-sum game.
Rock, Paper, Scissors is a two-person zero-sum game, with the following payoff
matrix.
A
R P S
R 0 1 -1
B P -1 0 1
S 1 -1 0

We can have a similar game, but with a different payoff matrix, as follows:
A
R P S
R 4 -1 -1
B P -2 4 -2
S -3 -3 4


What is the optimal strategy for A (for either game)? We define xj as the
probability that A takes action j (related to the columns). Then the payoff for A, if
B takes action i, is Σ_{j=1}^{m} cij xj . Of course, A does not know what action B will take,
so let’s find a strategy that maximizes the minimum expected winnings of A given
any random strategy of B, which we can formulate as follows:

Max  min_{i=1,··· ,n} Σ_{j=1}^{m} cij xj
s.t. Σ_{j=1}^{m} xj = 1
     xj ≥ 0, j = 1, · · · , m,

which can be linearized as follows:

Max  z
s.t. z ≤ Σ_{j=1}^{m} cij xj , i = 1, · · · , n
     Σ_{j=1}^{m} xj = 1
     xj ≥ 0, j = 1, · · · , m.

The last two constraints ensure that the xj -variables are valid probabilities. If
you solve this LP for the first game (i.e., payoff matrix), you find the best strategy
is x1 = 1/3, x2 = 1/3, and x3 = 1/3 and there is no expected gain for player A. For
the second game, the best strategy is x1 = 23/107, x2 = 37/107, and x3 = 47/107,
with A gaining, on average, 8/107 per round.
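Both quoted strategies are easy to verify: against every pure strategy of B, A's expected winnings are constant. A Python check (not part of the original text) with exact rationals:

```python
from fractions import Fraction as F

# Payoff matrices c[i][j]: payoff to A when B plays row i and A plays column j.
game1 = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
game2 = [[4, -1, -1], [-2, 4, -2], [-3, -3, 4]]

def expected_payoffs(c, x):
    """A's expected winnings against each pure strategy (row) of B."""
    return [sum(cij * xj for cij, xj in zip(row, x)) for row in c]

print(expected_payoffs(game1, [F(1, 3)] * 3))  # zero against every row
print(expected_payoffs(game2, [F(23, 107), F(37, 107), F(47, 107)]))  # 8/107 each
```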

Applications of Linear Optimization


Linear optimization has many applications in various fields, including:
• Resource allocation problems
• Production planning problems
• Portfolio optimization problems
• Supply chain management problems
Linear Programming Algorithms
There are several algorithms used to solve linear programming problems, including:
• The Simplex Algorithm
• The Ellipsoid Algorithm
• The Interior-Point Method

Components of a Linear Program


A linear program consists of the following components:
Objective Function: The objective function is a linear function that represents
the goal of the problem. It is typically denoted as z = cT x, where c is the cost vector
and x is the decision vector.
Decision Variables: The decision variables are the variables that are being
optimized in the problem. They are typically denoted as x1 , x2 , . . . , xn .
Constraints: The constraints are the restrictions imposed on the problem. They
are typically denoted as Ax ≤ b, where A is the coefficient matrix and b is the
right-hand side vector.
Non-Negativity Constraints: The non-negativity constraints restrict the values
of the decision variables to be non-negative.

Formulating Linear Programs

Formulating a linear program involves defining the objective function, decision
variables, constraints, and non-negativity constraints. Here are some examples of
how to formulate linear programs:

■ Example 2.20 Minimize the cost of producing a product subject to a constraint
on the total production. ■

■ Example 2.21 Maximize the profit of a company subject to constraints on the
production levels. ■

A linear program is the problem of optimizing a linear objective function in the
decision variables, x1 , . . . , xn , subject to linear equality or inequality constraints on
the xi ’s. In standard form, it is expressed as:

Min Σ_{j=1}^{n} cj xj                          (objective function)
subject to:
    Σ_{j=1}^{n} aij xj = bi , i = 1 . . . m    (constraints)
    xj ≥ 0, j = 1 . . . n                      (non-negativity constraints)

where {aij , bi , cj } are given.

A linear program is expressed more conveniently using matrices:


min cT x subject to Ax = b, x ≥ 0


where x = (x1 , . . . , xn )T ∈ Rn×1 , b = (b1 , . . . , bm )T ∈ Rm×1 , c = (c1 , . . . , cn )T ∈ Rn×1 ,
and A = (aij ) ∈ Rm×n .

Basic Terminology
Definition 2.34 If x satisfies Ax = b, x ≥ 0, then x is feasible.

Definition 2.35 A linear program (LP) is feasible if there exists a feasible solution,
otherwise it is said to be infeasible.

Definition 2.36 An optimal solution x∗ is a feasible solution s.t. cT x∗ =


min{cT x : Ax = b, x ≥ 0}.

Definition 2.37 LP is unbounded (from below) if ∀λ ∈ R, ∃ a feasible x∗ s.t.


cT x∗ ≤ λ.

Equivalent Forms
A linear program can take on several forms. We might be maximizing instead of
minimizing. We might have a combination of equality and inequality contraints.
Some variables may be restricted to be non-positive instead of non-negative, or be
unrestricted in sign. Two forms are said to be equivalent if they have the same set of
optimal solutions or are both infeasible or both unbounded.
1. A maximization problem can be expressed as a minimization problem.

max cT x ⇔ min −cT x


2. An equality can be represented as a pair of inequalities.

   aTi x = bi  ⇔  aTi x ≤ bi and aTi x ≥ bi  ⇔  aTi x ≤ bi and −aTi x ≤ −bi

3. By adding a slack variable, an inequality can be represented as a combination


of equality and non-negativity constraints.

aTi x ≤ bi ⇔ aTi x + si = bi , si ≥ 0.

4. Non-positivity constraints can be expressed as non-negativity constraints.


To express xj ≤ 0, replace xj everywhere with −yj and impose the condition
yj ≥ 0.
5. x may be unrestricted in sign.
If xj is unrestricted in sign, i.e. non-positive or non-negative, everywhere replace
xj by xj+ − xj− , adding the constraints xj+ , xj− ≥ 0.

In general, an inequality can be represented using a combination of equality and


non-negativity constraints, and vice versa.
Using these rules, min{cT x s.t. Ax ≥ b} can be transformed into min{cT x+ − cT x−
s.t. Ax+ − Ax− − Is = b, x+ , x− , s ≥ 0}. The former LP is said to be in canonical
form, the latter in standard form.
Conversely, an LP in standard form may be written in canonical form. min{cT x
s.t. Ax = b, x ≥ 0} is equivalent to min{cT x s.t. Ax ≥ b, −Ax ≥ −b, Ix ≥ 0}. This
may be rewritten as A′ x ≥ b′ , where A′ is the matrix obtained by stacking A, −A,
and I on top of one another, and b′ is the vector obtained by stacking b, −b, and 0.
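The canonical-to-standard transformation is mechanical enough to write down in a few lines. A Python sketch (not part of the original text; plain nested lists stand in for the matrices, and the function name is illustrative):

```python
def canonical_to_standard(A, b, c):
    """Rewrite min c^T x s.t. Ax >= b as min over (x+, x-, s) of
    c^T x+ - c^T x-  s.t.  A x+ - A x- - I s = b, with all variables >= 0."""
    m = len(A)
    A_std = [row + [-v for v in row] + [-1 if i == j else 0 for j in range(m)]
             for i, row in enumerate(A)]
    c_std = c + [-v for v in c] + [0] * m
    return A_std, b, c_std

# The constraints x + 2y >= 2 and 3x + 2y >= 6 from the earlier example:
A_std, b_std, c_std = canonical_to_standard([[1, 2], [3, 2]], [2, 6], [1, 1])
print(A_std)  # [[1, 2, -1, -2, -1, 0], [3, 2, -3, -2, 0, -1]]
```

Each original row gains negated copies of its coefficients (for x−) and a −1 slack entry, exactly as in the block matrix described above.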

■ Example 2.22 Consider the following linear program:





min x2 subject to x1 ≥ 2
                  3x1 − x2 ≥ 0
                  x1 + x2 ≥ 6
                  −x1 + 2x2 ≥ 0


Figure 2.9: Graph representing primal in example.

The optimal solution is (4, 2) of cost 2 (see Figure 2.9). If we were maximizing x2
instead of minimizing under the same feasible region, the resulting linear program
would be unbounded since x2 can increase arbitrarily. From this picture, the reader
should be convinced that, for any objective function for which the linear program is
bounded, there exists an optimal solution which is a “corner” of the feasible region.
We shall formalize this notion in the next section.
An example of an infeasible linear program can be obtained by reversing some of
the inequalities of the above LP:

x1 ≤ 2
3x1 − x2 ≥ 0
x1 + x2 ≥ 6
−x1 + 2x2 ≤ 0.

2.102 The Geometry of LP


Let P = {x : Ax = b, x ≥ 0} ⊆ Rn .

Definition 2.38 x is a vertex of P if ∄ y ̸= 0 s.t. x + y, x − y ∈ P .


Theorem 2.46 Assume min{cT x : x ∈ P } is finite. Then ∀x ∈ P , ∃ a vertex x̄ such
that cT x̄ ≤ cT x.

Proof. If x is a vertex, then take x̄ = x.
If x is not a vertex, then, by definition, ∃y ̸= 0 s.t. x + y, x − y ∈ P . Since
̸ 0 s.t. x + y, x − y ∈ P . Since
A(x + y) = b and A(x − y) = b, Ay = 0.


WLOG, assume cT y ≤ 0 (take either y or −y). If cT y = 0, choose y such that ∃j
s.t. yj < 0. Since y ̸= 0 and cT y = cT (−y) = 0, this must be true for either y or −y.
Consider x + λy, λ > 0. cT (x + λy) = cT x + λcT y ≤ cT x, since cT y is assumed
non-positive.
Case 1 ∃j such that yj < 0
As λ increases, component j decreases until x + λy is no longer feasible.
Choose λ = min{j:yj <0} {xj /−yj } = xk /−yk . This is the largest λ such that
x + λy ≥ 0. Since Ay = 0, A(x + λy) = Ax + λAy = Ax = b. So x + λy ∈ P , and
moreover x + λy has one more zero component, (x + λy)k , than x.
Replace x by x + λy.
Case 2 yj ≥ 0 ∀j
By assumption, cT y < 0 and x + λy is feasible for all λ ≥ 0, since A(x + λy) =
Ax + λAy = Ax = b, and x + λy ≥ x ≥ 0. But cT (x + λy) = cT x + λcT y → −∞
as λ → ∞, implying LP is unbounded, a contradiction.
Case 1 can happen at most n times, since x has n components. By induction on
the number of non-zero components of x, we obtain a vertex x̄. ■

R The theorem was described in terms of the polyhedral set P = {x : Ax = b, x ≥ 0}.
  Strictly speaking, the theorem is not true for P = {x : Ax ≥ b}.
  Indeed, such a set P might not have any vertex. For example, consider
  P = {(x1 , x2 ) : 0 ≤ x2 ≤ 1} (see Figure 2.10). This polyhedron has no vertex,
  since for any x ∈ P , we have x + y, x − y ∈ P , where y = (1, 0). It can be shown
  that P has a vertex iff Rank(A) = n. Note that, if we transform a program in
  canonical form into standard form, the non-negativity constraints imply that
  the resulting matrix A has full column rank, since the stacked matrix obtained
  from A, −A, and I has rank n.

Figure 2.10: A polyhedron with no vertex.


Corollary 2.2 If min{cT x : Ax = b, x ≥ 0} is finite, then there exists an optimal
solution, x∗ , which is a vertex.

Proof. Suppose not. Take an optimal solution. By Theorem 2.46 there exists a
vertex costing no more and this vertex must be optimal as well. ■

Corollary 2.3 If P = {x : Ax = b, x ≥ 0} ̸= ∅, then P has a vertex.

Theorem 2.47 Let P = {x : Ax = b, x ≥ 0}. For x ∈ P , let Ax be the sub-matrix of A
consisting of the columns j s.t. xj > 0. Then x is a vertex iff Ax has linearly
independent columns (i.e. Ax has full column rank).
 
  2  
2 1 3 0   2 3
   0   
Example A = 
 7 3 2 1 
 x = 


 Ax = 
 7 2 , and x is a vertex.

 1 
0 0 0 5  
0 0
0
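The column-rank condition is easy to test numerically. The sketch below (not part of the original text; the rank routine is a minimal Gaussian elimination over the rationals) confirms that the columns of Ax for this example are independent:

```python
from fractions import Fraction

def rank(M):
    """Row-reduce a copy of M over the rationals and count the pivots."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        p = M[r][col]
        M[r] = [v / p for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[2, 1, 3, 0], [7, 3, 2, 1], [0, 0, 0, 5]]
x = [2, 0, 1, 0]
Ax = [[row[j] for j in range(len(x)) if x[j] > 0] for row in A]  # columns with xj > 0
print(Ax, rank(Ax))  # [[2, 3], [7, 2], [0, 0]] has rank 2: full column rank
```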
Proof. Show ¬i → ¬ii.
Assume x is not a vertex. Then, by definition, ∃y ̸= 0 s.t. x + y, x − y ∈ P . Let
Ay be the sub-matrix of A corresponding to the non-zero components of y.
As in the proof of Theorem 2.46, A(x + y) = b and A(x − y) = b, so Ay = 0.
Therefore, Ay has dependent columns since y ̸= 0.
Moreover, x + y ≥ 0 and x − y ≥ 0 imply that yj = 0 whenever xj = 0.
Therefore Ay is a sub-matrix of Ax . Since Ay has linearly dependent columns, so
does Ax .
Show ¬ii → ¬i.
Suppose Ax has linearly dependent columns. Then ∃y s.t. Ax y = 0, y ̸= 0.
Extend y to Rn by adding 0 components. Then ∃y ∈ Rn s.t. Ay = 0, y ̸= 0 and
yj = 0 wherever xj = 0.
Consider y′ = λy for small λ ≥ 0. Claim that x + y′ , x − y′ ∈ P , by argument
analogous to that in Case 1 of the proof of Theorem 2.46, above. Hence, x is
not a vertex. ■

■ Example 2.23 Niki holds two part-time jobs, Job I and Job II. She never wants
to work more than a total of 12 hours a week. She has determined that for every
hour she works at Job I, she needs 2 hours of preparation time, and for every hour
she works at Job II, she needs one hour of preparation time, and she cannot spend
more than 16 hours for preparation.


If Niki makes $40 an hour at Job I, and $30 an hour at Job II, how many hours
should she work per week at each job to maximize her income? ■

Solution: We start by choosing our variables.


Let x be the number of hours per week Niki will work at Job I.
Let y be the number of hours per week Niki will work at Job II.
Now we write the objective function. Since Niki gets paid $40 an hour at Job I,
and $30 an hour at Job II, her total income I is given by the following equation:

I = 40x + 30y

Our next task is to find the constraints. The second sentence in the problem
states, “She never wants to work more than a total of 12 hours a week.” This
translates into the following constraint:

x + y ≤ 12

The third sentence states, “For every hour she works at Job I, she needs 2 hours
of preparation time, and for every hour she works at Job II, she needs one hour of
preparation time, and she cannot spend more than 16 hours for preparation.” The
translation follows:

2x + y ≤ 16

The fact that x and y can never be negative is represented by the following two
constraints:

x ≥ 0 and y ≥ 0

Well, good news! We have formulated the problem. We restate it as:

Maximize I = 40x + 30y
Subject to: x + y ≤ 12
            2x + y ≤ 16
            x ≥ 0
            y ≥ 0
In order to solve the problem, we graph the constraints and shade the region that
satisfies all the inequality constraints.
Any appropriate method can be used to graph the lines for the constraints.
However, often the easiest method is to graph the line by plotting the x-intercept
and y-intercept.


The line for a constraint will divide the plane into two regions, one of which
satisfies the inequality part of the constraint. A test point is used to determine which
portion of the plane to shade to satisfy the inequality. Any point on the plane that
is not on the line can be used as a test point.
• If the test point satisfies the inequality, then the region of the plane that
satisfies the inequality is the region that contains the test point.
• If the test point does not satisfy the inequality, then the region that satisfies
the inequality lies on the opposite side of the line from the test point.
In the graph below, after the lines representing the constraints were graphed
using an appropriate method from Chapter 1, the point (0, 0) was used as a test
point to determine that
• (0, 0) satisfies the constraint x + y ≤ 12 because 0 + 0 ≤ 12.
• (0, 0) satisfies the constraint 2x + y ≤ 16 because 2(0) + 0 ≤ 16.
Therefore, in this example, we shade the region that is below and to the left of
both constraint lines, but also above the x-axis and to the right of the y-axis, in
order to further satisfy the constraints x ≥ 0 and y ≥ 0.

The shaded region where all conditions are satisfied is called the feasibility region
or the feasibility polygon.
The Fundamental Theorem of Linear Programming states that the maximum (or
minimum) value of the objective function always takes place at the vertices of the
feasibility region.
Therefore, we will identify all the vertices (corner points) of the feasibility region.
We call these points critical points. They are listed as (0, 0), (0, 12), (4, 8), (8, 0). To
maximize Niki’s income, we will substitute these points in the objective function to
see which point gives us the highest income per week. We list the results below.


Critical Points Income


(0, 0) 40(0) + 30(0) = $0
(0, 12) 40(0) + 30(12) = $360
(4, 8) 40(4) + 30(8) = $400
(8, 0) 40(8) + 30(0) = $320

Clearly, the point (4, 8) gives the most profit: $400.


Therefore, we conclude that Niki should work 4 hours at Job I, and 8 hours at
Job II.
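The evaluation of the critical points in the table can be reproduced in a few lines of Python (not part of the original text):

```python
critical = [(0, 0), (0, 12), (4, 8), (8, 0)]
income = {p: 40 * p[0] + 30 * p[1] for p in critical}
best = max(critical, key=income.get)
print(best, income[best])  # (4, 8) 400
```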

■ Example 2.24 A factory manufactures two types of gadgets, regular and premium.
Each gadget requires the use of two operations, assembly and finishing, and there
are at most 12 hours available for each operation. A regular gadget requires 1 hour
of assembly and 2 hours of finishing, while a premium gadget needs 2 hours of
assembly and 1 hour of finishing. Due to other restrictions, the company can make
at most 7 gadgets a day. If a profit of $20 is realized for each regular gadget and $30 for
a premium gadget, how many of each should be manufactured to maximize profit?

Solution: We choose our variables.


• Let x be the number of regular gadgets manufactured each day.
• Let y be the number of premium gadgets manufactured each day.
The objective function is

P = 20x + 30y

We now write the constraints. The fourth sentence states that the company can
make at most 7 gadgets a day. This translates as

x+y ≤ 7

Since the regular gadget requires one hour of assembly and the premium gadget
requires two hours of assembly, and there are at most 12 hours available for this
operation, we get

x + 2y ≤ 12

Similarly, the regular gadget requires two hours of finishing and the premium
gadget one hour. Again, there are at most 12 hours available for finishing. This gives
us the following constraint:

2x + y ≤ 12

The fact that x and y can never be negative is represented by the following two
constraints:

x ≥ 0 and y ≥ 0

We have formulated the problem as follows:

2.102 The Geometry of LP 287

Maximize P = 20x + 30y


Subject to: x + y ≤ 7
x + 2y ≤ 12
2x + y ≤ 12
x≥0
y≥0

In order to solve the problem, we next graph the constraints and feasibility region.
Again, we have shaded the feasibility region, where all constraints are satisfied.
Since the extreme value of the objective function always takes place at the vertices
of the feasibility region, we identify all the critical points. They are listed as (0, 0),
(0, 6), (2, 5), (5, 2), and (6, 0). To maximize profit, we will substitute these points
in the objective function to see which point gives us the maximum profit each day.
The results are listed below.
Critical Point Income
(0, 0) 20(0) + 30(0) = $0
(0, 6) 20(0) + 30(6) = $180
(2, 5) 20(2) + 30(5) = $190
(5, 2) 20(5) + 30(2) = $160
(6, 0) 20(6) + 30(0) = $120
The point (2, 5) gives the most profit, and that profit is $190.


Therefore, we conclude that we should manufacture 2 regular gadgets and 5


premium gadgets daily to obtain the maximum profit of $190.
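For the gadget problem, the critical points themselves can be recovered by brute force: intersect every pair of constraint boundary lines and keep only the intersections satisfying all constraints. The sketch below is my own illustration in exact rational arithmetic, not the book's method:

```python
from fractions import Fraction
from itertools import combinations

# Each constraint written as a*x + b*y <= c; the last two encode x >= 0, y >= 0.
constraints = [(1, 1, 7), (1, 2, 12), (2, 1, 12), (-1, 0, 0), (0, -1, 0)]

def intersect(c1, c2):
    """Intersection of the boundary lines a*x + b*y = c of two constraints."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None  # parallel boundary lines never meet
    return (Fraction(r1 * b2 - r2 * b1, det), Fraction(a1 * r2 - a2 * r1, det))

def feasible(p):
    return all(a * p[0] + b * p[1] <= c for a, b, c in constraints)

vertices = {p for c1, c2 in combinations(constraints, 2)
            if (p := intersect(c1, c2)) is not None and feasible(p)}
profit = max(20 * x + 30 * y for x, y in vertices)
print(profit)  # 190
```

The set `vertices` recovers exactly the five critical points (0, 0), (0, 6), (2, 5), (5, 2), (6, 0) listed in the text, and the maximum profit of $190 is attained at (2, 5).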
So far we have focused on “standard maximization problems” in which:
• The objective function is to be maximized.
• All constraints are of the form ax + by ≤ c.
• All variables are constrained to be non-negative (x ≥ 0, y ≥ 0).
We will next consider an example where that is not the case. Our next problem
is said to have “mixed constraints”, since some of the inequality constraints are
of the form ax + by ≤ c and some are of the form ax + by ≥ c. The non-negativity
constraints are still an important requirement in any linear program.

■ Example 2.25 To motivate the subject of linear programming (LP), we begin


with a planning problem that can be solved graphically.
Say you are a vendor of lemonade and lemon juice. Each unit of lemonade
requires 1 lemon and 2 litres of water. Each unit of lemon juice requires 3 lemons
and 1 litre of water. Each unit of lemonade gives a profit of three dollars. Each
unit of lemon juice gives a profit of two dollars. You have 6 lemons and 4 litres of
water available. How many units of lemonade and lemon juice should you make to
maximize profit?
If we let x denote the number of units of lemonade to be made and let y denote
the number of units of lemon juice to be made, then the profit is given by 3x + 2y
dollars. We call 3x + 2y the objective function. Note that there are a number of
constraints that x and y must satisfy. First of all, x and y should be nonnegative.
The number of lemons needed to make x units of lemonade and y units of lemon
juice is x + 3y and cannot exceed 6. The number of litres of water needed to make
x units of lemonade and y units of lemon juice is 2x + y and cannot exceed 4.
Hence, to determine the maximum profit, we need to maximize 3x + 2y subject to
x and y satisfying the constraints x + 3y ≤ 6, 2x + y ≤ 4, x ≥ 0, and y ≥ 0.
A more compact way to write the problem is as follows:

maximize 3x + 2y
subject to x + 3y ≤ 6
2x + y ≤ 4
x ≥ 0
y ≥ 0.

We cansolve this maximizationproblem graphically as follows. We first sketch


x
the set of   satisfying the constraints, called the feasible region, on the (x, y)-
y
plane. We then take the objective function 3x + 2y and turn it into an equation of
a line 3x + 2y = z where z is a parameter. Note that as the value of z increases,
the line defined
 by the equation 3x + 2y = z moves in the direction of the normal
3
vector  . We call this direction the direction of improvement. Determining the
2
maximum value of the objective function, called the optimal value, subject to the

T.Abraha(PhD) @AKU, 2024 Linear Optimization


2.102 The Geometry of LP 289

contraints amounts to finding the maximum value of z so that the line defined by
the equation 3x + 2y = z still intersects the feasible region.

[Figure: the feasible region defined by x ≥ 0, y ≥ 0, 2x + y ≤ 4, and x + 3y ≤ 6, together with the direction of improvement (3, 2), the level lines 3x + 2y = 0, 3x + 2y = 4, and 3x + 2y = 6.8, and the optimal point (1.2, 1.6).]

In the figure above, the lines with z at 0, 4 and 6.8 have been drawn. From the
picture, we can see that if z is greater than 6.8, the line defined by 3x + 2y = z will
not intersect the feasible region. Hence, the profit cannot exceed 6.8 dollars.
As the line 3x + 2y = 6.8 does intersect the feasible region, 6.8 is the maximum
value for the objective function. Note that there is only one point in the feasible
region that intersects the line 3x + 2y = 6.8, namely (x, y) = (1.2, 1.6). In other words,
to maximize profit, we want to make 1.2 units of lemonade and 1.6 units of lemon
juice.
The above solution method can hardly be regarded as rigorous because we relied
on a picture to conclude that 3x + 2y ≤ 6.8 for all (x, y) satisfying the constraints.
But we can actually show this algebraically.
Note that multiplying both sides of the constraint x + 3y ≤ 6 by 0.2 gives 0.2x + 0.6y ≤
1.2, and multiplying both sides of the constraint 2x + y ≤ 4 by 1.4 gives 2.8x + 1.4y ≤ 5.6.
Hence, any (x, y) that satisfies both x + 3y ≤ 6 and 2x + y ≤ 4 must also satisfy
(0.2x + 0.6y) + (2.8x + 1.4y) ≤ 1.2 + 5.6, which simplifies to 3x + 2y ≤ 6.8 as desired!
(Here, we used the fact that if a ≤ b and c ≤ d, then a + c ≤ b + d.)
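The multiplier argument can be checked mechanically in exact arithmetic (0.2 = 1/5 and 1.4 = 7/5). The following snippet is an added verification, not part of the text:

```python
from fractions import Fraction as F

u, v = F(1, 5), F(7, 5)                   # the multipliers 0.2 and 1.4, exactly
lhs = (u * 1 + v * 2, u * 3 + v * 1)      # combined coefficients of x and y
rhs = u * 6 + v * 4                       # combined right-hand side
assert lhs == (3, 2) and rhs == F(34, 5)  # i.e. 3x + 2y <= 34/5 = 6.8
```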


Now, one might ask if it is always possible to find an algebraic proof like the
one above for similar problems. If the answer is yes, how does one find such a
proof? We will see answers to this question later on.
Before we end this segment, let us consider the following problem:

minimize −2x + y
subject to −x + y ≤ 3
x − 2y ≤ 2
x ≥ 0
y ≥ 0.
   
Note that for any t ≥ 0, (x, y) = (t, t) satisfies all the constraints. The value of
the objective function at (x, y) = (t, t) is −t. As t → ∞, the value of the objective
function tends to −∞. Therefore, there is no minimum value for the objective
function. The problem is said to be unbounded. Later on, we will see how to
detect unboundedness algorithmically. ■

■ Example 2.26
1. Sketch all (x, y) satisfying

x − 2y ≤ 2

on the (x, y)-plane.


2. Determine the optimal value of

Minimize x + y
Subject to 2x + y ≥ 4
x + 3y ≥ 1.

3. Show that the problem

Minimize −x + y
Subject to 2x − y ≥ 0
x + 3y ≥ 3

is unbounded.
4. Suppose that you are shopping for dietary supplements to satisfy your required
daily intake of 0.40mg of nutrient M and 0.30mg of nutrient N . There are
three popular products on the market. The costs and the amounts of the
two nutrients are given in the following table:


Product 1 Product 2 Product 3


Cost $27 $31 $24
Daily amount of M 0.16 mg 0.21 mg 0.11 mg
Daily amount of N 0.19 mg 0.13 mg 0.15 mg

You want to determine how much of each product you should buy so that
the daily intake requirements of the two nutrients are satisfied at minimum
cost. Formulate your problem as a linear programming problem, assuming
that you can buy a fractional number of each product.

Solution:
1. The points (x, y) satisfying x − 2y ≤ 2 are precisely those on or above the line
passing through (2, 0) and (0, −1).
2. We want to determine the minimum value z so that x + y = z defines a line
that has a nonempty intersection with the feasible region. However, we can
avoid referring to a sketch by setting x = z − y and substituting for x in the
inequalities to obtain:

2(z − y) + y ≥ 4
(z − y) + 3y ≥ 1,
or equivalently,

z ≥ 2 + (1/2)y
z ≥ 1 − 2y.

Thus, the minimum value for z is min_y max{2 + (1/2)y, 1 − 2y}, which occurs where the
two bounds are equal, namely at y = −2/5. Hence, the optimal value is 9/5.
We can verify our work by doing the following. If our calculations above are
correct, then an optimal solution is given by x = 11/5, y = −2/5 since x = z − y. It
is easy to check that this satisfies both inequalities and therefore is a feasible
solution.
Now, taking 2/5 times the first inequality and 1/5 times the second inequality,
we can infer the inequality x + y ≥ 9/5. The left-hand side of this inequality is
precisely the objective function. Hence, no feasible solution can have objective
function value less than 9/5. But x = 11/5, y = −2/5 is a feasible solution with
objective function value equal to 9/5. As a result, it must be an optimal solution.
Remark. We have not yet discussed how to obtain the multipliers 2/5 and 1/5 for
inferring the inequality x + y ≥ 9/5. This is an issue that will be taken up later.
In the meantime, think about how one could have obtained these multipliers
for this particular exercise.
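These claims are easy to verify mechanically. The snippet below (an added check in exact rational arithmetic) confirms feasibility of the claimed optimum and that the multipliers 2/5 and 1/5 certify the bound 9/5:

```python
from fractions import Fraction as F

x, y = F(11, 5), F(-2, 5)
assert 2 * x + y >= 4 and x + 3 * y >= 1   # feasibility of the claimed optimum
assert x + y == F(9, 5)                    # its objective value is 9/5

u, v = F(2, 5), F(1, 5)                    # the multipliers from the remark
assert (2 * u + v, u + 3 * v) == (1, 1)    # they reproduce the objective x + y
assert 4 * u + v == F(9, 5)                # so x + y >= 4u + v = 9/5 holds everywhere
```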
3. We could glean some insight by first making a sketch on the (x, y)-plane.
The line defined by −x + y = z has x-intercept −z. Note that for z ≤ −3,
(x, y) = (−z, 0) satisfies both inequalities, and the value of the objective
function at (x, y) = (−z, 0) is z. Hence, there is no lower bound on the value of
the objective function.
4. Let xi denote the amount of Product i to buy for i = 1, 2, 3. Then, the problem
can be formulated as
minimize 27x1 + 31x2 + 24x3
subject to 0.16x1 + 0.21x2 + 0.11x3 ≥ 0.30
0.19x1 + 0.13x2 + 0.15x3 ≥ 0.40
x1 , x2 , x3 ≥ 0.

Remark. If one cannot buy fractional amounts of the products, the problem
can be formulated as
minimize 27x1 + 31x2 + 24x3
subject to 0.16x1 + 0.21x2 + 0.11x3 ≥ 0.30
0.19x1 + 0.13x2 + 0.15x3 ≥ 0.40
x1 , x2 , x3 ≥ 0.
x1 , x2 , x3 ∈ Z.

2.103 Bases
Let x be a vertex of P = {x : Ax = b, x ≥ 0}. Suppose first that |{j : xj > 0}| = m
(where A is m × n). In this case we denote B = {j : xj > 0}. Also let A_B denote the
submatrix of A consisting of the columns indexed by B; we use this subscript notation
not only for A, but also for x and for other sets of indices.
Then A_B is a square matrix whose columns are linearly independent (by Theorem
2.47), so it is non-singular. Therefore we can express x as x_j = 0 if j ∉ B, and since
A_B x_B = b, it follows that x_B = A_B^{-1} b. The variables corresponding to B will be called
basic. The others will be referred to as nonbasic. The set of indices corresponding to
nonbasic variables is denoted by N = {1, . . . , n} − B. Thus, we can write the above
as x_B = A_B^{-1} b and x_N = 0.
Without loss of generality we will assume that A has full row rank, rank(A) = m.
Otherwise either there is a redundant constraint in the system Ax = b (and we can
remove it), or the system has no solution at all.
If |{j : xj > 0}| < m, we can augment A_B with additional linearly independent
columns, until it is an m × m sub-matrix of A of full rank, which we will denote A_B.
In other words, although there may be fewer than m positive components in x, it is
convenient to always have a basis B such that |B| = m and A_B is non-singular. This
enables us to always express x as we did before, x_N = 0, x_B = A_B^{-1} b.
Summary x is a vertex of P iff there is B ⊆ {1, . . . , n} such that |B| = m and
1. xN = 0 for N = {1, . . . , n} − B
2. AB is non-singular
3. x_B = A_B^{-1} b ≥ 0
In this case we say that x is a basic feasible solution. Note that a vertex can
have several bases corresponding to it (by augmenting {j : x_j > 0} in different
ways). A basis might not lead to any basic feasible solution since A_B^{-1} b
is not necessarily non-negative.


■ Example 2.27
x1 + x2 + x3 = 5
2x1 − x2 + 2x3 = 1
x1 , x 2 , x 3 ≥ 0
We can select as a basis B = {1, 2}. Thus, N = {3} and

A_B = [ 1  1 ; 2  −1 ],   A_B^{-1} = [ 1/3  1/3 ; 2/3  −1/3 ],   A_B^{-1} b = [ 2 ; 3 ],

so x = (2, 3, 0).
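The basic solution can be recomputed in exact arithmetic. Here is a small added sketch that solves A_B x_B = b for Example 2.27 by Cramer's rule:

```python
from fractions import Fraction as F

A_B = [[F(1), F(1)],
       [F(2), F(-1)]]
b = [F(5), F(1)]

# Cramer's rule for the 2x2 system A_B x_B = b.
det = A_B[0][0] * A_B[1][1] - A_B[0][1] * A_B[1][0]      # det = -3
x1 = (b[0] * A_B[1][1] - A_B[0][1] * b[1]) / det
x2 = (A_B[0][0] * b[1] - b[0] * A_B[1][0]) / det
print(x1, x2)  # 2 3
```

With the nonbasic x3 = 0, this reproduces the basic feasible solution x = (2, 3, 0).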

R  A crude upper bound on the number of vertices of P is (n choose m). This number
is exponential (it is upper bounded by n^m). We can come up with a tighter
approximation of (n − m/2 choose m/2), though this is still exponential. The reason
why the number is much smaller is that most basic solutions to the system Ax = b
(which we counted) are not feasible, that is, they do not satisfy x ≥ 0.

2.104 The Simplex Method


The Simplex algorithm solves linear programming problems by focusing on basic
feasible solutions. The basic idea is to start from some vertex v and look at the
adjacent vertices. If an improvement in cost is possible by moving to one of the
adjacent vertices, then we do so. Thus, we will start with a bfs corresponding to a
basis B and, at each iteration, try to improve the cost of the solution by removing
one variable from the basis and replacing it by another.
We begin the Simplex algorithm by first rewriting our LP in the form:

min c_B x_B + c_N x_N
s.t. A_B x_B + A_N x_N = b
     x_B, x_N ≥ 0

Here B is the basis corresponding to the bfs we are starting from. Note that, for
any solution x, x_B = A_B^{-1} b − A_B^{-1} A_N x_N, and that its total cost c^T x can be specified
as follows:

c^T x = c_B x_B + c_N x_N
      = c_B (A_B^{-1} b − A_B^{-1} A_N x_N) + c_N x_N
      = c_B A_B^{-1} b + (c_N − c_B A_B^{-1} A_N) x_N


We denote the reduced cost of the non-basic variables by c̃_N = c_N − c_B A_B^{-1} A_N,
i.e. the quantity which is the coefficient of xN above. If there is a j ∈ N such that
c̃j < 0, then by increasing xj (up from zero) we will decrease the cost (the value of
the objective function). Of course xB depends on xN , and we can increase xj only
as long as all the components of xB remain positive.
So in a step of the Simplex method, we find a j ∈ N such that c̃j < 0, and increase
it as much as possible while keeping xB ≥ 0. It is not possible any more to increase
xj , when one of the components of xB is zero. What happened is that a non-basic
variable is now positive and we include it in the basis, and one variable which was
basic is now zero, so we remove it from the basis.
If, on the other hand, there is no j ∈ N such that c̃j < 0, then we stop, and the
current basic feasible solution is an optimal solution. This follows from the new
expression for cT x since xN is nonnegative.
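The reduced-cost computation can be illustrated on the small LP min x1 + 2x2 + 4x3 subject to x1 + x2 + 2x3 = 5, 2x1 + x2 + 3x3 = 8 (the data that reappears as Example 2.29 later in the chapter). The following sketch, with variable names of my own choosing, computes x_B and c̃_N for the basis B = {1, 2}:

```python
from fractions import Fraction as F

c_B, c_N = [F(1), F(2)], [F(4)]
A_B = [[F(1), F(1)], [F(2), F(1)]]
A_N = [[F(2)], [F(3)]]          # column of the nonbasic variable x3
b = [F(5), F(8)]

# Invert the 2x2 basis matrix exactly.
det = A_B[0][0] * A_B[1][1] - A_B[0][1] * A_B[1][0]
inv = [[A_B[1][1] / det, -A_B[0][1] / det],
       [-A_B[1][0] / det, A_B[0][0] / det]]

# x_B = A_B^{-1} b; here (3, 2) >= 0, so B gives a basic feasible solution.
x_B = [inv[0][0] * b[0] + inv[0][1] * b[1],
       inv[1][0] * b[0] + inv[1][1] * b[1]]

# y = c_B A_B^{-1}; the reduced cost of x3 is c3 - y A_3 = 4 - 3 = 1 > 0,
# so the bfs x = (3, 2, 0) is already optimal: no pivot improves the cost.
y = [c_B[0] * inv[0][0] + c_B[1] * inv[1][0],
     c_B[0] * inv[0][1] + c_B[1] * inv[1][1]]
reduced = c_N[0] - (y[0] * A_N[0][0] + y[1] * A_N[1][0])
print([int(v) for v in x_B], reduced)  # [3, 2] 1
```

Since the only reduced cost is positive, the Simplex method stops immediately at this basis.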
Remarks:
1. Note that some of the basic variables may be zero to begin with, and in this
case it is possible that we cannot increase xj at all. In this case we can replace
say j by k in the basis, but without moving from the vertex corresponding to
the basis. In the next step we might replace k by j, and be stuck in a loop.
Thus, we need to specify a “pivoting rule” to determine which index should
enter the basis, and which index should be removed from the basis.
2. While many pivoting rules (including those that are used in practice) can lead
to infinite loops, there is a pivoting rule which will not (known as the minimal
index rule - choose the minimal j and k possible [Bland, 1977]). This fact was
discovered by Bland in 1977. There are other methods of “breaking ties” which
eliminate infinite loops.
3. There is no known pivoting rule for which the number of pivots in the worst
case is better than exponential.
4. The question of the complexity of the Simplex algorithm and the last remark
leads to the question of what is the length of the shortest path between two
vertices of a convex polyhedron, where the path is along edges, and the length
of the path in measured in terms of the number of vertices visited.
Hirsch Conjecture: For m hyperplanes in d dimensions the length of the
shortest path between any two vertices of the arrangement is at most m − d.
The original conjecture was in fact disproved by Francisco Santos in 2010; however,
there is still not even a polynomial bound proven on this length.
On the other hand, one should note that even if the Hirsch Conjecture is true,
it doesn’t say much about the Simplex Algorithm, because Simplex generates
paths which are monotone with respect to the objective function, whereas the
shortest path need not be monotone.
Recently, Kalai (and others) has considered a randomized pivoting rule. The
idea is to randomly permute the index columns of A and to apply the Simplex
method, always choosing the smallest j possible. In this way, it is possible to
show a subexponential bound on the expected number of pivots. This leads to
a subexponential bound for the diameter of any convex polytope defined by m
hyperplanes in d-dimensional space.
The question of the existence of a polynomial pivoting scheme is still open
though. We will see later a completely different algorithm which is polynomial,

although not strongly polynomial (the existence of a strongly polynomial


algorithm for linear programming is also open). That algorithm will not move
from one vertex of the feasible domain to another like the Simplex, but will
confine its interest to points in the interior of the feasible domain.
A visualization of the geometry of the Simplex algorithm can be obtained from
considering the algorithm in 3 dimensions (see Figure 2.11). For a problem in the
form min{cT x : Ax ≤ b} the feasible domain is a polyhedron in R3 , and the algorithm
moves from vertex to vertex in each step (or does not move at all).

Figure 2.11: Traversing the vertices of a convex body (here a polyhedron in R3 ).

2.105 When is a Linear Program Feasible ?


We now turn to another question which will lead us to important properties of linear
programming. Let us begin with some examples.
We consider linear programs of the form Ax = b, x ≥ 0. As the objective function
has no effect on the feasibility of the program, we ignore it.
We first restrict our attention to systems of equations (i.e. we neglect the
non-negativity constraints).

■ Example 2.28 Consider the system of equations:


x1 + x2 + x3 = 6
2x1 + 3x2 + x3 = 8
2x1 + x2 + 3x3 = 0
and the linear combination
−4 × x1 + x2 + x3 = 6
1 × 2x1 + 3x2 + x3 = 8
1 × 2x1 + x2 + 3x3 = 0
The linear combination results in the equation

0x1 + 0x2 + 0x3 = −16


which means of course that the system of equations has no feasible solution.
In fact, an elementary theorem of linear algebra says that if a system has no
solution, there is always a vector y such as in our example (y = (−4, 1, 1)) which
proves that the system has no solution. ■
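The certificate in Example 2.28 can be verified directly. This added snippet checks that y = (−4, 1, 1) annihilates the rows of A while y^T b ≠ 0:

```python
A = [[1, 1, 1],
     [2, 3, 1],
     [2, 1, 3]]
b = [6, 8, 0]
y = [-4, 1, 1]

# y^T A = 0 together with y^T b != 0 proves Ax = b has no solution.
yTA = [sum(y[i] * A[i][j] for i in range(3)) for j in range(3)]
yTb = sum(y[i] * b[i] for i in range(3))
print(yTA, yTb)  # [0, 0, 0] -16
```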

Theorem 2.48 Exactly one of the following is true for the system Ax = b:
1. There is x such that Ax = b.
2. There is y such that AT y = 0 but y T b = 1.

This is not quite enough for our purposes, because a system can be feasible,
but still have no non-negative solutions x ≥ 0. Fortunately, the following lemma
establishes the equivalent results for our system Ax = b, x ≥ 0.
Theorem 2.49 — Farkas’ Lemma. Exactly one of the following is true for the system
Ax = b, x ≥ 0:
1. There is x such that Ax = b, x ≥ 0.
2. There is y such that AT y ≥ 0 but bT y < 0.

Proof. We will first show that the two conditions cannot happen together, and then
that at least one of them must happen.

Suppose we do have both x and y as in the statement of the theorem.

Ax = b =⇒ y T Ax = y T b =⇒ xT AT y = y T b

but this is a contradiction: y^T b < 0, while x ≥ 0 and A^T y ≥ 0 imply x^T A^T y ≥ 0.

The other direction is less trivial, and usually shown using properties of the
Simplex algorithm, mainly duality. We will use another tool, and later use Farkas’
Lemma to prove properties about duality in linear programming. The tool we shall
use is the Projection theorem, which we state without proof:
Theorem 2.50 — Projection Theorem. Let K be a non-empty closed convex set in Rn
(see Figure 2.12), and let b be any point in Rn. The projection of b onto K
is a point p ∈ K that minimizes the Euclidean distance ∥b − p∥. Then p has the
property that for all z ∈ K, (z − p)^T (b − p) ≤ 0 (see Figure 2.13).


Figure 2.13: The Projection Theorem.

(a) Not Convex (b) Convex

Figure 2.12: Examples of convex and non-convex sets in R2 .

We are now ready to prove the other direction of Farkas’ Lemma. Assume that
there is no x such that Ax = b, x ≥ 0; we will show that there is y such that AT y ≥ 0
but y T b < 0.
Let K = {Ax : x ≥ 0} ⊆ Rm (A is an m × n matrix). K is a cone in Rm and it is
convex, non-empty and closed. According to our assumption, Ax = b, x ≥ 0 has no
solution, so b does not belong to K. Let p be the projection of b onto K.
Since p ∈ K, there is a w ≥ 0 such that Aw = p. According to the Projection
Theorem, for all z ∈ K, (z − p)^T (b − p) ≤ 0. That is, for all x ≥ 0, (Ax − p)^T (b − p) ≤ 0.
We define y = p − b, which implies (Ax − p)^T y ≥ 0. Since Aw = p, (Ax − Aw)^T y ≥ 0,
and hence (x − w)^T (A^T y) ≥ 0 for all x ≥ 0 (remember that w was fixed by the choice of b).

 
Set x = w + e_i, where e_i is the unit vector with a 1 in the i-th row (so x is w plus
the i-th unit vector). Note that x is non-negative, because w ≥ 0.
This will extract the i-th column of A, so we conclude that the i-th component
of AT y is non-negative (AT y)i ≥ 0, and since this is true for all i, AT y ≥ 0.
Now it only remains to show that y^T b < 0. We have y^T b = (p − y)^T y = p^T y − y^T y,
since b = p − y. Since (Ax − p)^T y ≥ 0 for all x ≥ 0, taking x to be zero shows that
p^T y ≤ 0. Since b ∉ K, y = p − b ≠ 0, so y^T y > 0. Therefore y^T b = p^T y − y^T y <
0. ■

Using a very similar proof one can show the same for the canonical form:
Theorem 2.51 Exactly one of the following is true for the system Ax ≤ b:
1. There is x such that Ax ≤ b.
2. There is y ≥ 0 such that AT y = 0 but y T b < 0.
The intuition behind the precise form for 2. in the previous theorem lies in the proof
that both cannot happen. The contradiction 0 = 0x = (y T A)x = y T (Ax) = y T b < 0
is obtained if AT y = 0 and y T b < 0.

2.106 Duality
Duality is the most important concept in linear programming. Duality allows to
provide a proof of optimality. This is not only important algorithmically but also it
leads to beautiful combinatorial statements. For example, consider the statement

In a graph, the smallest number of edges in a path between two


specified vertices s and t is equal to the maximum number of s − t cuts
(i.e. subsets of edges whose removal disconnects s and t).

This result is a direct consequence of duality for linear programming.


Duality can be motivated by the problem of trying to find lower bounds on the
value of the optimal solution to a linear programming problem (if the problem is
a maximization problem, then we would like to find upper bounds). We consider
problems in standard form:

min cT x
s.t. Ax = b
x≥0
Suppose we wanted to obtain the best possible lower bound on the optimal cost.
By multiplying the i-th equation of Ax = b by some number y_i and summing up the
resulting equations, we obtain y^T Ax = b^T y. If we impose that the coefficient of
x_j in the resulting inequality is less than or equal to c_j, then b^T y must be a lower bound

on the optimal value since xj is constrained to be non-negative. To get the best


possible lower bound, we want to solve the following problem:

max bT y
s.t. AT y ≤ c
This is another linear program. We call this one the dual of the original one,
called the primal. As we just argued, solving this dual LP will give us a lower bound
on the optimum value of the primal problem. Weak duality says precisely this: if we
denote the optimum value of the primal by z, z = min cT x, and the optimum value
of the dual by w, then w ≤ z. We will use Farkas’ lemma to prove strong duality
which says that these quantities are in fact equal. We will also see that, in general,
the dual of the dual is the primal problem.

■ Example 2.29
z = min x1 + 2x2 + 4x3
x1 + x2 + 2x3 = 5
2x1 + x2 + 3x3 = 8
The first equality gives a lower bound of 5 on the optimum value z, since x1 +
2x2 + 4x3 ≥ x1 + x2 + 2x3 = 5 because of nonnegativity of the xi. We can get an
even better lower bound by taking 3 times the first equality minus the second one.
This gives x1 + 2x2 + 3x3 = 7 ≤ x1 + 2x2 + 4x3, implying a lower bound of 7 on
z. For x = (3, 2, 0)^T, the objective function is precisely 7, implying optimality. The
mechanism of generating lower bounds is formalized by the dual linear program:

max 5y1 + 8y2


y1 + 2y2 ≤ 1
y1 + y2 ≤ 2
2y1 + 3y2 ≤ 4

y1 represents the multiplier for the first constraint and y2 the multiplier for the
second constraint. This LP's objective function also achieves a maximum value of
7, at y = (3, −1)^T. ■

We now formalize the notion of duality. Let P and D be the following pair of
dual linear programs:

(P ) z = min{cT x : Ax = b, x ≥ 0}
(D) w = max{bT y : AT y ≤ c}.

(P ) is called the primal linear program and (D) the dual linear program.
In the proof below, we show that the dual of the dual is the primal. In other
words, if one formulates (D) as a linear program in standard form (i.e. in the same

form as (P )), its dual D(D) can be seen to be equivalent to the original primal (P ).
In any statement, we may thus replace the roles of primal and dual without affecting
the statement.
Proof. The dual problem D is equivalent to min{−b^T y : A^T y + Is = c, s ≥ 0}. Changing
forms we get min{−b^T y^+ + b^T y^− : A^T y^+ − A^T y^− + Is = c, and y^+, y^−, s ≥ 0}.
Taking the dual of this we obtain: max{−c^T x : A(−x) ≤ −b, −A(−x) ≤ b, I(−x) ≤ 0}.
But this is the same as min{c^T x : Ax = b, x ≥ 0} and we are done. ■
We have the following results relating w and z.
Lemma 2.5. (Weak Duality) z ≥ w.
Proof. Suppose x is primal feasible and y is dual feasible. Then, cT x ≥ y T Ax = y T b,
thus z = min{cT x : Ax = b, x ≥ 0} ≥ max{bT y : AT y ≤ c} = w. ■
From the preceding lemma we conclude that the following cases are not possible
(these are dual statements):
1. P is feasible and unbounded and D feasible.
2. P is feasible and D is feasible and unbounded.
We should point out however that both the primal and the dual might be infeasible.
To prove a stronger version of the weak duality lemma, let’s recall the following
corollary of Farkas’ Lemma (Theorem 2.51):
Corollary 2.4 Exactly one of the following is true:
1. ∃x′ : A′ x′ ≤ b′ .
2. ∃y ′ ≥ 0 : (A′ )T y ′ = 0 and (b′ )T y ′ < 0.

Theorem 2.52 (Strong Duality) If P or D is feasible then z = w.

Proof. We only need to show that z ≤ w. Assume without loss of generality (by
duality) that P is feasible. If P is unbounded, then by Weak Duality, we have that
z = w = −∞. Suppose P is bounded, and let x∗ be an optimal solution, i.e. Ax∗ = b,
x∗ ≥ 0 and c^T x∗ = z. We claim that ∃y s.t. A^T y ≤ c and b^T y ≥ z. If so we are done.
Suppose no such y exists. Then, by the preceding corollary, with

A′ = [ A^T ; −b^T ],  b′ = [ c ; −z ],  x′ = y,  y′ = [ x ; λ ],

there exist x ≥ 0, λ ≥ 0 such that

Ax = λb and c^T x < λz.
We have two cases:
• Case 1: λ ≠ 0. Since we can normalize by λ, we can assume that λ = 1. This
means that ∃x ≥ 0 such that Ax = b and c^T x < z. But this is a contradiction
with the optimality of x∗.
• Case 2: λ = 0. This means that ∃x ≥ 0 such that Ax = 0 and c^T x < 0. If this
is the case then for all µ ≥ 0, x∗ + µx is feasible for P and its cost is c^T (x∗ + µx) =
c^T x∗ + µ(c^T x) < z, which is a contradiction. ■


Rules for Taking Dual Problems


If P is a minimization problem then D is a maximization problem. If P is a
maximization problem then D is a minimization problem. In general, using the rules
for transforming a linear program into standard form, we have that the dual of (P ):
z = min cT1 x1 + cT2 x2 + cT3 x3
s.t.

A11 x1 + A12 x2 + A13 x3 = b1


A21 x1 + A22 x2 + A23 x3 ≥ b2
A31 x1 + A32 x2 + A33 x3 ≤ b3
x1 ≥ 0 , x2 ≤ 0 , x3 UIS

(where UIS means “unrestricted in sign” to emphasize that no constraint is on the


variable) is (D)
w = max bT1 y1 + bT2 y2 + bT3 y3
s.t.

AT11 y1 + AT21 y2 + AT31 y3 ≤ c1


AT12 y1 + AT22 y2 + AT32 y3 ≥ c2
AT13 y1 + AT23 y2 + AT33 y3 = c3
y1 UIS , y2 ≥ 0 , y3 ≤ 0

Complementary Slackness
Let P and D be

(P ) z = min{cT x : Ax = b, x ≥ 0}
(D) w = max{bT y : AT y ≤ c},

and let x be feasible in P , and y be feasible in D. Then, by weak duality, we know


that cT x ≥ bT y. We call the difference cT x − bT y the duality gap. Then we have that
the duality gap is zero iff x is optimal in P , and y is optimal in D. That is, the
duality gap can serve as a good measure of how close a feasible x and y are to the
optimal solutions for P and D. The duality gap will be used in the description of
the interior point method to monitor the progress towards optimality.
It is convenient to write the dual of a linear program as

(D) w = max{bT y : AT y + s = c for some s ≥ 0}


Then we can write the duality gap as follows:

c T x − bT y = c T x − x T A T y
= xT (c − AT y)
= xT s

since AT y + s = c.
The following theorem allows to check optimality of a primal and/or a dual
solution.


Theorem 2.53 (Complementary Slackness)


Let x∗ , (y ∗ , s∗ ) be feasible for (P ), (D) respectively. The following are equiva-
lent:
1. x∗ is an optimal solution to (P ) and (y ∗ , s∗ ) is an optimal solution to (D).
2. (s∗ )T x∗ = 0.
3. x∗j s∗j = 0, ∀ j = 1, . . . , n.
4. If s∗j > 0 then x∗j = 0.

Proof. Suppose (1) holds, then, by strong duality, cT x∗ = bT y ∗ . Since c = AT y ∗ + s∗


and Ax∗ = b, we get that (y ∗ )T Ax∗ + (s∗ )T x∗ = (x∗ )T AT y ∗ , and thus, (s∗ )T x∗ = 0
(i.e (2) holds). It follows, since x∗j , s∗j ≥ 0, that x∗j s∗j = 0, ∀ j = 1, . . . , n (i.e. (3) holds).
Hence, if s∗j > 0 then x∗j = 0, ∀ j = 1, . . . , n (i.e. (4) holds). The converse also holds,
and thus the proof is complete. ■

In the example of section 2.106, the complementary slackness equations corre-


sponding to the primal solution x = (3, 2, 0)T would be:

y1 + 2y2 = 1
y1 + y2 = 2

Note that this implies that y1 = 3 and y2 = −1. Since this solution satisfies the
other constraint of the dual, y is dual feasible, proving that x is an optimum solution
to the primal (and therefore y is an optimum solution to the dual).
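The complementary slackness conditions for this pair can also be checked mechanically (an added sketch, using the data of Example 2.29):

```python
A = [[1, 1, 2],
     [2, 1, 3]]
c = [1, 2, 4]
x = [3, 2, 0]
y = [3, -1]

# Dual slacks s = c - A^T y; complementary slackness demands x_j * s_j = 0.
s = [c[j] - (A[0][j] * y[0] + A[1][j] * y[1]) for j in range(3)]
print(s, [x[j] * s[j] for j in range(3)])  # [0, 0, 1] [0, 0, 0]
```

Note that s3 = 1 > 0 and indeed x3 = 0, exactly as condition 4 of Theorem 2.53 requires.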

Size of a Linear Program


Size of the Input
If we want to solve a Linear Program in polynomial time, we need to know what
would that mean, i.e. what would the size of the input be. To this end we introduce
two notions of the size of the input with respect to which the algorithm we present
will run in polynomial time. The first measure of the input size will be the size of
a LP, but we will introduce a new measure L of a LP that will be easier to work
with. Moreover, we have that L ≤ size(LP ), so that any algorithm running in time
polynomial in L will also run in time polynomial in size(LP).
Let’s consider the linear program of the form:

min cT x
s.t.
Ax = b
x≥0

where we are given as inputs the coefficients of A (an m × n matrix), b (an m × 1


vector), and c (an n × 1 vector), with rational entries.
We can further assume, without loss of generality, that the given coefficients are
all integers, since any LP with rational coefficients can be easily transformed into an

equivalent one with integer coefficients (just multiply everything by l.c.d.). In the
rest of these notes, we assume that A, b, c have integer coefficients.
For any integer n, we define its size as follows:

size(n) = 1 + ⌈log2 (|n| + 1)⌉

where the first 1 stands for the fact that we need one bit to store the sign of n,
size(n) represents the number of bits needed to encode n in binary. Analogously, we
define the size of a p × 1 vector d, and of a p × l matrix M as follows:
△ Pp
size(v) =i=1 size(vi )

size(M ) = pi=1 lj=1 size(mij )
P P

We are then ready to talk about the size of a LP.

Definition 2.39 — Size of a linear program.


size(LP) = size(A) + size(b) + size(c).

A more convenient definition of the size of a linear program is given next.

Definition 2.40

L = size(detmax ) + size(bmax ) + size(cmax ) + m + n

where

detmax = max

(| det(A′ )|)
A

bmax = max(|bi |)
i

cmax = max(|cj |)
j

and A′ is any square submatrix of A.

Proposition 2.2 L < size(LP), ∀A, b, c.


Before proving this result, we first need the following lemma:
Lemma 2.6. 1. If n ∈ Z then |n| ≤ 2size(n)−1 − 1.
2. If v ∈ Z then ∥v∥ ≤ ∥v∥1 ≤ 2size(v)−n − 1.
n
2
3. If A ∈ Zn×n then |det(A)| ≤ 2size(A)−n − 1.
Proof. 1. By definition.
n n n
2size(vi )−1 = 2size(v)−n where
X Y Y
2. 1 + ∥v∥ ≤ 1 + ∥v∥1 = 1 + |vi | ≤ (1 + |vi |) ≤
i=1 i=1 i=1
we have used 1.

T.Abraha(PhD) @AKU, 2024 Linear Optimization


304 Chapter 2. Introduction to Linear Optimization

3. Let a1 , . . . , an be the columns of A. Since |det(A)| represents the volume of the


parallelepiped spanned by a1 , . . . , an , we have
n
Y
|det(A)| ≤ ∥ai ∥.
i=1

Hence, by 2,
n n n 2
2size(ai )−n = 2size(A)−n .
Y Y Y
1 + |det(A)| ≤ 1 + ∥ai ∥ ≤ (1 + ∥ai ∥) ≤
i=1 i=1 i=1

We now prove Proposition 2.2.

Proof. If B is a square submatrix of A then, by definition, size(B) ≤ size(A).


Moreover, by lemma 2.6, 1 + |det(B)| ≤ 2size(B)−1 . Hence,

⌈log(1 + |det(B)|)⌉ ≤ size(B) − 1 < size(B) ≤ size(A). (2.45)

Let v ∈ Zp . Then size(v) ≥ size(maxj |vj |) + p − 1 = ⌈log(1 + maxj |vj |)⌉ + p. Hence,

size(b) + size(c) ≥ ⌈log(1 + max |cj |)⌉ + ⌈log(1 + max |bi |)⌉ + m + n. (2.46)
j i

Combining equations (2.45) and (2.46), we obtain the desired result. ■

R detmax ∗ bmax ∗ cmax ∗ 2m+n < 2L , since for any integer n, 2size(n) > |n|.
In what follows we will work with L as the size of the input to our algorithm.

2.106.1 Size of the Output


In order to even hope to solve a linear program in polynomial time, we better make
sure that the solution is representable in size polynomial in L. We know already that
if the LP is feasible, there is at least one vertex which is an optimal solution. Thus,
when finding an optimal solution to the LP, it makes sense to restrict our attention
to vertices only. The following theorem makes sure that vertices have a compact
representation.
Theorem 2.54 Let x be a vertex of the polyhedron defined by Ax = b, x ≥ 0. Then,
!
T p1 p 2 pn
x = ... ,
q q q
where pi (i = 1, . . . , n), q ∈ N,

and

0 ≤ pi < 2L
1 ≤ q < 2L .

T.Abraha(PhD) @AKU, 2024 Linear Optimization


2.107 Complexity of linear programming 305

Proof. Since x is a basic feasible solution, ∃ a basis B such that xB = A−1B b and
xN = 0. Thus, we can set pj = 0, ∀ j ∈ N , and focus our attention on the xj ’s such
that j ∈ B. We know by linear algebra that
1
xB = A−1
B b= cof (AB )b
det(AB )
where cof (AB ) is the cofactor matrix of AB . Every entry of AB consists of a
determinant of some submatrix of A. Let q = |det(AB )|, then q is an integer since AB
has integer components, q ≥ 1 since AB is invertible, and q ≤ detmax < 2L . Finally,
note that pB = qxB = |cof (AB )b|, thus pi ≤ m
j=1 |cof (AB )ij ||bj | ≤ m detmax bmax <
P
L
2 . ■

2.107 Complexity of linear programming


In this section, we show that linear programming is in NP∩ co-NP. This will follow
from duality and the estimates on the size of any vertex given in the previous section.
Let us define the following decision problem:

Definition 2.41 — LP.


Input: Integral A, b, c, and a rational number λ,
Question: Is min{cT x : Ax = b, x ≥ 0} ≤ λ?

Theorem 2.55 LP ∈ NP ∩ co-NP

Proof. First, we prove that LP ∈ NP.


If the linear program is feasible and bounded, the “certificate” for verification of
instances for which min{cT x : Ax = b, x ≥ 0} ≤ λ is a vertex x′ of {Ax = b, x ≥ 0} s.t.
cT x′ ≤ λ. This vertex x′ always exists since by assumption the minimum is finite.
Given x′ , it is easy to check in polynomial time whether Ax′ = b and x′ ≥ 0. We also
need to show that the size of such a certificate is polynomially bounded by the size
of the input. This was shown in section 2.106.1.
If the linear program is feasible and unbounded, then, by strong duality, the
dual is infeasible. Using Farkas’ lemma on the dual, we obtain the existence of x̃:
Ax̃ = 0, x̃ ≥ 0 and cT x̃ = −1 < 0. Our certificate in this case consists of both a vertex
of {Ax = b, x ≥ 0} (to show feasiblity) and a vertex of {Ax = 0, x ≥ 0, cT x = −1}
(to show unboundedness if feasible). By choosing a vertex x′ of {Ax = 0, x ≥ 0,
cT x = −1}, we insure that x′ has polynomial size (again, see Section 2.106.1).
This proves that LP ∈ NP. (Notice that when the linear program is infeasible,
the answer to LP is “no”, but we are not responsible to offer such an answer in order
to show LP ∈ NP).
Secondly, we show that LP ∈ co-NP, i.e. LP ∈ NP, where LP is defined as:
Input: A, b, c, and a rational number λ,
Question: Is min{cT x : Ax = b, x ≥ 0} > λ?
If {x : Ax = b, x ≥ 0} is nonempty, we can use strong duality to show that LP is
indeed equivalent to:

T.Abraha(PhD) @AKU, 2024 Linear Optimization


306 Chapter 2. Introduction to Linear Optimization

Input: A, b, c, and a rational number λ,

Question: Is max{bT y : AT y ≤ c} > λ?

which is also in NP, for the same reason as LP is.

If the primal is infeasible, by Farkas’ lemma we know the existence of a y s.t.


AT y ≥ 0 and bT y = −1 < 0. This completes the proof of the theorem. ■

2.108 Solving a Liner Program in Polynomial Time

The first polynomial-time algorithm for linear programming is the so-called ellipsoid
algorithm which was proposed by Khachian in 1979. The ellipsoid algorithm was in
fact first developed for convex programming (of which linear programming is a special
case) in a series of papers by the russian mathematicians A.Ju. Levin and, D.B. Judin
and A.S. Nemirovskii, and is related to work of N.Z. Shor. Though of polynomial
running time, the algorithm is impractical for linear programming. Nevertheless it
has extensive theoretical applications in combinatorial optimization. For example,
the stable set problem on the so-called perfect graphs can be solved in polynomial
time using the ellipsoid algorithm. This is however a non-trivial non-combinatorial
algorithm.

In 1984, Karmarkar presented another polynomial-time algorithm for linear


programming. His algorithm avoids the combinatorial complexity (inherent in the
simplex algorithm) of the vertices, edges and faces of the polyhedron by staying
well inside the polyhedron (see Figure 2.14). His algorithm lead to many other
algorithms for linear programming based on similar ideas. These algorithms are
known as interior point methods.

T.Abraha(PhD) @AKU, 2024 Linear Optimization


2.108 Solving a Liner Program in Polynomial Time 307

Figure 2.14: Exploring the interior of a convex body.

It still remains an open question whether there exists a strongly polynomial


algorithm for linear programming, i.e. an algorithm whose running time depends on
m and n and not on the size of any of the entries of A, b or c.
In the rest of these notes, we discuss an interior-point method for linear program-
ming and show its polynomiality.
High-level description of an interior-point algorithm:
1. If x (current solution) is close to the boundary, then map the polyhedron onto
another one s.t. x is well in the interior of the new polyhedron (see Figure 2.15).
2. Make a step in the transformed space.
3. Repeat (a) and(b) until we are close enough to an optimal solution.
Before we give description of the algorithm we give a theorem, the corollary
of which will be a key tool used in determinig when we have reached an optimal
solution.

Theorem 2.56 Let x1 , x2 be vertices of Ax = b,


x ≥ 0.

If cT x1 ̸= cT x2 then |cT x1 − cT x2 | > 2−2L .

Proof. By Theorem 2.54, ∃ qi , q2 , such that 1 ≤ q1 , q2 < 2L , and q1 x1 , q2 x2 ∈ Nn .

T.Abraha(PhD) @AKU, 2024 Linear Optimization


308 Chapter 2. Introduction to Linear Optimization

Furthermore,
q 1 cT x 1 q 2 cT x 2
|cT x1 − cT x2 | = −
q1 q2
q1 q2 (cT x1 − cT x2 )
=
q1 q2
1
≥ since cT x1 − cT x2 ̸= 0, q1 , q2 ≥ 1
q1 q2
1
> L L = 2−2L since q1 , q2 < 2L .
2 2

Corollary 2.5 Assume z = min{cT x : Ax = b, x ≥ 0 }.


| {z }
polyhedron P

Assume x is feasible to P , and such that cT x ≤ z + 2−2L .

Then, any vertex x′ such that cT x′ ≤ cT x is an optimal solution of the LP.

Proof. Suppose x′ is not optimal. Then, ∃x∗ , an optimal vertex, such that cT x∗ = z.
Since x′ is not optimal, cT x′ ̸= cT x∗ , and by Theorem 2.56
⇒ cT x′ − cT x∗ > 2−2L
⇒ cT x′ > cT x∗ + 2−2L
= Z + 2−2L
≥ cT x by definition of x
≥ cT x ′ by definition of x′
⇒ cT x ′ > c T x ′ ,
a contradiction. ■
What this corollary tells us is that we do not need to be very precise when
choosing an optimal vertex. More precisely we only need to compute the objective
function with error less than 2−2L . If we find a vertex that is within that margin of
error, then it will be optimal.

Figure 2.15: A centering mapping. If x is close to the boundary, we map the


polyhedron P onto another one P ′ , s.t. the image x′ of x is closer to the center of P ′ .

T.Abraha(PhD) @AKU, 2024 Linear Optimization


2.108 Solving a Liner Program in Polynomial Time 309

2.108.1 Ye’s Interior Point Algorithm


In the rest of these notes we present Ye’s [Ye91] interior point algorithm for linear
programming. Ye’s algorithm (among several others) achieves the best known
asymptotic running time in the literature, and our presentation incorporates some
simplifications made by Freund [Freund91].
We are going to consider the following linear programming problem:

minimize Z = cT x




(P ) subject to Ax = b,



x≥0

and its dual

W = bT y

maximize



(D) subject to AT y + s = c,



s ≥ 0.

The algorithm is primal-dual, meaning that it simultaneously solves both the


primal and dual problems. It keeps track of a primal solution x and a vector of dual
slacks s (i.e. ∃ y : AT y = c − s) such that x > 0 and s > 0. The basic idea of this
algorithm is to stay away from the boundaries of the polyhedron (the hyperplanes
xj ≥ 0 and sj ≥ 0, j = 1, 2, . . . , n) while approaching optimality. In other words, we
want to make the duality gap

c T x − bT y = x T s > 0

very small but stay away from the boundaries. Two tools will be used to achieve this
goal in polynomial time.

Tool 1: Scaling (see Figure 2.15)


Scaling is a crucial ingredient in interior point methods. The two types of scaling
commonly used are projective scaling (the one used by Karmarkar) and affine scaling
(the one we are going to use).
Suppose the current iterate is x > 0 and s > 0, where x = (x1 , x2 , . . . , xn )T , then
the affine scaling maps x to x′ as follows.

x1
   
x1 x1
x2
   

 x2 


 x2


   
 .   . 
x=  −→ x′ =
  
 

 . 


 . 

   

 . 


 . 

xn xn
xn .

Notice this transformation maps x to e = (1, . . . , 1)T .

T.Abraha(PhD) @AKU, 2024 Linear Optimization


310 Chapter 2. Introduction to Linear Optimization
−1
We can express the scaling transformation in matrix form as x′ = X x or
x = Xx′ , where  
 1
x 0 0 . . . 0 
 0 x2 0 . . . 0 


 .. .. .. 

X =  . . . 


 0 0 . . . x 0
 
 n−1 

0 0 ... 0 xn .
Using matrix notation we can rewrite the linear program (P) in terms of the trans-
formed variables as:
minimize Z = cT Xx′
subject to AXx′ = b,
x′ ≥ 0.
T
If we define c = Xc (note that X = X ) and A = AX we can get a linear program
in the original form as follows.
minimize Z = cT x′
subject to Ax′ = b,
x′ ≥ 0.
We can also write the dual problem (D) as:
maximize W = bT y
subject to (AX)T y + Xs = c,
Xs ≥ 0
or, equivalently,
maximize W = bT y
T
subject to A y + s′ = c,
s′ ≥ 0
where s′ = Xs, i.e.
 
s 1 x1
 

 s 2 x2 

′  
s =
 . 

 

 . 

s n xn .

One can easily see that

xj sj = x′j s′j ∀j ∈ {1, . . . , n} (2.47)

and, therefore, the duality gap xT s = j xj sj remains unchanged under affine scaling.
P

As a consequence, we will see later that one can always work equivalently in the
transformed space.

T.Abraha(PhD) @AKU, 2024 Linear Optimization


2.108 Solving a Liner Program in Polynomial Time 311

Tool 2: Potential Function


Our potential function is designed to measure how small the duality gap is and
how far the current iterate is away from the boundaries. In fact we are going to use
the following “logarithmic barrier function”.

Definition 2.42 — Potential Function, G(x, s).


n

G(x, s) = q ln(xT s) −
X
ln(xj sj ), for some q,
j=1

where q is a parameter that must be chosen appropriately.


Note that the first term goes to −∞ as the duality gap tends to 0, and the second
term goes to +∞ as xi → 0 or si → 0 for some i. Two questions arise immediately
concerning this potential function.
Question 1: How do we choose q?
Lemma 2.7. Let x, s > 0 be vectors in Rn×1 . Then
n
n ln xT s −
X
ln xj sj ≥ n ln n.
j=1

Proof. Given any n positive numbers t1 , . . . , tn , we know that their geometric mean
does not exceed their arithmetic mean, i.e.
 1/n  
n n
Y 1 X
 tj  ≤  tj 
j=1 n j=1
.

Taking the logarithms of both sides we have


   
n n
1 X X
ln tj  ≤ ln  tj  − ln n.
n j=1 j=1

Rearranging this inequality we get


   
n
X n
X
n ln  tj  −  ln tj  ≥ n ln n.
j=1 j=1

(In fact the last inequality can be derived directly from the concavity of the logarithmic
function). The lemma follows if we set tj = xj sj . ■
Since our objective is that G → −∞ as xT s → 0 (since our primary goal is to get
close to optimality), according to Lemma 2.7, we should choose some q > n (notice
that ln xT s → −∞ as xT s → 0) . In particular, if we choose q = n + 1, the√algorithm
will terminate after O(nL) iterations.
√ In fact we are going to set q = n + n, which
gives us the smallest number — O( nL) — of iterations by this method.
Question 2: When can we stop?

T.Abraha(PhD) @AKU, 2024 Linear Optimization


312 Chapter 2. Introduction to Linear Optimization

Suppose that xT s ≤ 2−2L , then cT x − Z ≤ cT x − bT y = xT s ≤ 2−2L , where Z is


the optimum value to the primal problem. From Corollary 2.5, the following claim
follows immediately.

Claim 1: If xT s ≤ 2−2L , then any vertex x∗ satisfying cT x∗ ≤ cT x is optimal.


In order to find x∗ from x, two methods can be used. One is based on purely
algebraic techniques (but is a bit cumbersome to describe), while the other (the
cleanest one in literature) is based upon basis reduction for lattices. We shall not
elaborate on this topic, although we’ll get back to this issue when discussing basis
reduction in lattices.

Lemma 2.8. Let x, s be feasible primal-dual vectors such that G(x, s) ≤ −k nL
for some constant k. Then
xT s < e−kL .

Proof. By the definition of G(x, s) and the previous theorem we have:



−k nL ≥ G(x, s)
√ n
= (n + n) ln xT s −
X
ln xj sj
j=1

≥ n ln xT s + n ln n.
Rearranging we obtain

ln xT s ≤ −kL − n ln n
< −kL.
Therefore
xT s < e−kL . ■

The previous lemma and claim tell us that we can stop whenever G(x, s) ≤ −2 nL.
In practice, the algorithm can terminate even earlier, so it is a good idea to check
from time to time if we can get the optimal solution right away.
Please notice that according to Equation (2.47) the affine transformation does
not change the value of the potential function. Hence we can work either in the
original space or in the transformed space when we talk about the potential function.

2.109 Description of Ye’s Interior Point Algorithm


Initialization:
Set i = 0.
√Choose x0 > 0, s0 > 0, and y 0 such that Ax0 = b, AT y 0 + s0 = c and G(x0 , s0 ) =
O( nL). (Details are not covered in class but can be found in the appendix. The
general idea is as follows. By augmenting the linear program with additional variables,
it is easy to obtain a feasible solution. Moreover, by carefully choosing the augmented
linear program, it is possible to have feasible primal and dual solutions x and s such
that L
√ all xj ’s and sj ’s are large (say 2 ). This can be seen to result in a potential of
O( nL).)

T.Abraha(PhD) @AKU, 2024 Linear Optimization


2.109 Description of Ye’s Interior Point Algorithm 313

Iteration:


while G(xi , si ) > −2 nL 
 xi
either a primal step (changing only) 
to get (xi+1 , si+1 )



i
do or a dual step (changing s only) 


 i := i + 1

The iterative step is as follows. Affine scaling maps (xi , si ) to (e, s′ ). In this
transformed space, the point is far away from the boundaries. Either a dual or
primal step occurs, giving (x̃, s̃) and reducing the potential function. The point is
then mapped back to the original space, resulting in (xi+1 , si+1 ).
Next, we are going to describe precisely how the primal or dual step is made such
that
7
G(xi+1 , si+1 ) − G(xi , si ) ≤ − <0
120

holds for either a primal or dual step, yielding an O( nL) total number of iterations.
In order to find the new point (x̃, s̃) given the current iterate (e, s′ ) (remember
we are working in the transformed space), we compute the gradient of the potential
function. This is the direction along which the value of the potential function changes
at the highest rate. Let g denote the gradient. Recall that (e, s′ ) is the map of the
current iterate, we obtain
g = ∇x G(x, s)|(e,s′ )
 
1/x1
q 
. 
= T s −  .. 
 
x s  
1/xn (e,s′ )
q
= T ′ s′ − e (2.48)
e s
We would like to maximize the change in G, so we would like to move in the
direction of −g. However, we must insure the new point is still feasible (i.e. Ax̃ = b).
Let d be the projection of g onto the null space {x : Ax = 0} of A. Thus, we will
move in the direction of −d.

Figure 2.16: Null space of A and gradient direction g.

T.Abraha(PhD) @AKU, 2024 Linear Optimization


314 Chapter 2. Introduction to Linear Optimization
T
Claim 2: d = (I − A(A A )−1 A)g.

Proof. Since g − d is orthogonal to the null space of A, it must be the combination


of some row vectors of A. Hence we have


 Ad = 0
T
 ∃w, s.t. A w = g − d.

This implies

T

 A w = g−d
T (normal equations).
 (A A )w = Ag

Solving the normal equations, we get


T
w = (A A )−1 Ag
and
T T T T
d = g − A (A A )−1 Ag = (I − A (A A )−1 A)g.

A potential problem arises if g is nearly perpendicular to the null space of A. In
this case, ||d|| will be very small, and each primal step will not reduce the potential
greatly. Instead, we will perform a√dual step.
In particular, if ||d|| = ||d||2 = dT d ≥ 0.4, we make a primal step as follows.
1
x̃ = e − d
4||d||
s̃ = s′ .

Claim 3: x̃ > 0.
d
Proof. x˜j = 1 − 41 ||d||
j
≥ 3
4 > 0. ■

This claim insures that the new iterate is still an interior point. For the similar
reason, we will see that s̃ > 0 when we make a dual step.
Proposition 2.3 When a primal step is made, G(x̃, s̃) − G(e, s′ ) ≤ − 120
7
.
If ||d|| < 0.4, we make a dual step. Again, we calculate the gradient

h = ∇s G(x, s)|(e,s′ )
 
1/s′1
q 
.. 
= T ′e− 
 . 

(2.49)
e s  
1/s′n

Notice that hj = gj /sj , thus h and g can be seen to be approximately in the same
direction.

T.Abraha(PhD) @AKU, 2024 Linear Optimization


2.110 Analysis of the Potential Function 315

Suppose the current dual feasible solution is y ′ , s′ such that


T
A y ′ + s′ = c.

Again, we restrict the solution to be feasible, so


T
A y + s̃ = c
T
s̃ − s′ = A (y ′ − y)

Thus, in the dual space, we move perpendicular to the null space and in the direction
of −(g − d).
Thus, we have

s̃ = s′ − (g − d)µ
T
For any µ, ∃y A y + s̃ = c
T ′ T
So, we can choose µ = e qs and get A (y ′ + µw) + s̃ = c.
Therefore,

′ eT s′
s̃ = s− (g − d)
q
eT s′ s′
= s′ − (q T ′ − e − d)
q e s
T
e s ′
= (d + e)
q
x̃ = x′ = e.

One can show that s̃ > 0 as we did in Claim 3. So such move is legal.
Proposition 2.4 When a dual step is made, G(x̃, s̃) − G(e, s′ ) ≤ − 61
According to these two propositions, the potential function decreases by a constant
amount at each √ step. So if we start
√ from an initial interior point (x0 , s0 ) with
G(x0 , s0 ) = O( nL), then after O(√ nL) iterations we will obtain another interior
j j j j
point (x , s ) with G(x , s ) ≤ −k nL. From Lemma 2.8, we know that the duality
gap (xj )T sj satisfies
(xj )T sj ≤ 2−kL ,
and the algorithm terminates by that time. Moreover, each iteration requires O(n3 )
operations. Indeed, in each iteration, the only non-trivial task is the computation of
the projected gradient d. This can be done by solving the linear system (ĀĀT )w = Āg
in O(n3 ) time using Gaussian elimination. Therefore, the overall time complexity of
this algorithm is O(n3.5 L). By using approximate solutions to the linear systems, we
can obtain O(n2.5 ) time per iteration, and total time O(n3 L).

2.110 Analysis of the Potential Function


In this section, we prove the two propositions of the previous section, which concludes
the analysis of Ye’s algorithm.

T.Abraha(PhD) @AKU, 2024 Linear Optimization


316 Chapter 2. Introduction to Linear Optimization

Proof of 1: Proposition 2.3

1
G(x̃, s̃) − G(e, s′ ) = G(e − d, s̃) − G(e, s′ )
4||d||
n n
dT s′
! !
T ′ dj
ln s′j −
X X
= q ln e s − − ln 1 − −
4||d|| j=1 4||d|| j=1
  n n
T ′
ln s′j
X X
−q ln e s + ln 1 +
j=1 j=1
n
d s′
T
! !
X dj
= q ln 1 − − ln 1 − .
4||d||eT s′ j=1 4||d||

Using the relation

x2
−x − ≤ ln(1 − x) ≤ −x (2.50)
2(1 − a)

which holds for |x| ≤ a < 1, we get:

q dT s′ n
dj n d2j
G(x̃, s̃) − G(e, s′ ) ≤ −
X X
+ + for a = 1/4
4||d||eT s′ j=1 4||d|| j=1 16||d||2 2(3/4)
q dT s′ eT d 1
= − T ′
+ +
4||d||e s 4||d|| 24
1 q 1
= (e − T ′ s′ )T d +
4||d|| e s 24
1 1
= (−g)T d +
4||d|| 24
||d|| 2 1
= − +
4||d|| 24
||d|| 1
= − +
4 24
1 1
≤ − +
10 24
7
= − .
120

Note that g T d = ||d||2 , since d is the projection of g. (This is where we use the
fact that d is the projected gradient!)
Before proving Proposition 2.4, we need the following lemma.

Lemma 2.9.
n
X eT s̃ −2
ln(s̃j ) − n ln( )≥ .
j=1 n 15

Proof. Using the equality s̃ = ∆


q (e + d) and Equation 2.50, which holds for |x| ≤ a < 1,

T.Abraha(PhD) @AKU, 2024 Linear Optimization


2.110 Analysis of the Potential Function 317

we see that
Pn eT s̃ Pn ∆ ∆ eT d
j=1 ln(s̃j ) − n ln( n ) = j=1 ln( q (1 + dj )) − n ln( q (1 + n ))

2 T
Pn j d e d
≥ j=1 (dj − 2(3/5) ) − n n

2
≥ − ||d||
6/5

−2
≥ 15


Proof of 2: Proposition 2.4
Using Lemma 2.9 and the inequality
n
X eT s
ln(sj ) ≤ n ln( ),
j=1 n

which follows from the concavity of the logarithm function, we have


T
G(e, s̃) − G(e, s′ ) = q ln( eeT ss̃′ ) − ′
Pn Pn
j=1 ln(s˜j ) + j=1 ln(sj )

T T T ′
≤ q ln( eeT ss̃′ ) + 15
2
− n ln( e ns̃ ) + n ln( e ns )

√ T
= 2
15 + n ln( eeT ss̃′ )
On the other hand,

eT s̃ = (n + eT d)
q
and recall that ∆ = eT s′ ,

eT s̃ 1 1 √
T ′
= (n + eT d) ≤ √ (n + 0.4 n),
e s q n+ n

since, by Cauchy-Schwartz inequality, |eT d| ≤ ||e|| ||d|| = n||d||. Combining the
above inequalities yields
√ √
0.6 √n
G(e, s̃) − G(e, s′ ) ≤ 2
15 + n ln(1 − n+ n
)

2 0.6n
≤ 15 − n+ √
n

2 3
≤ 15 − 10 = − 16

since n + n ≤ 2n.
This completes the analysis of Ye’s algorithm.

T.Abraha(PhD) @AKU, 2024 Linear Optimization


318 Chapter 2. Introduction to Linear Optimization

2.111 Bit Complexity


Throughout the presentation of the algorithm, we assumed that all operations can
be performed exactly. This is a fairly unrealistic assumption. For example, notice
that ∥d∥ might be irrational since it involves a square root. However, none of the
thresholds we set were crucial. We could for example test whether ∥d∥ ≥ 0.4 or
∥d∥ ≤ 0.399. To test this, we need to compute only a few bits of ∥d∥. Also, if we
perform a primal step (i.e. ∥d∥ ≥ 0.4) and compute the first few bits of ∥d∥ so that
the resulting approximation ∥d∥ap satisfies (4/5)∥d∥ ≤ ∥d∥ap ≤ ∥d∥ then if we go
through the analysis of the primal step performed in Proposition 1, we obtain that
the reduction in the potential function is at least 19/352 instead of the previous
7/120. Hence, by rounding ∥d∥ we can still maintain a constant decrease in the
potential function.
Another potential problem is when using Gaussian elimination to compute
the projected gradient. We mentioned that Gaussian elimination requires O(n3 )
arithmetic operations but we need to show that, during the computation, the numbers
involved have polynomial size. For that purpose, consider the use of Gaussian
elimination to solve a system Ax = b where
(1) (1) (1)
 
a11 a12 . . . a1n
(1) (1) (1) 
 
(1)

 a21 a22 . . . a2n 
A=A = .. .. .. ..
.
.
 

 . . . 

(1) (1) (1)
am1 am2 . . . amn

Assume that a11 ̸= 0 (otherwise, we can permute rows or columns). In the first
(1) (1)
iteration, we substract ai1 /a11 times the first row from row i where i = 2, . . . , m,
resulting in the following matrix:
(2) (2) (2)
 
a11 a12 . . . a1n
(2) (2) 
 
(2)

 0 a22 . . . a2n 
A = .. .. .. ..
.
.
 

 . . . 

(2) (2)
0 am2 . . . amn
(i) (i)
In general, A(i+1) is obtained by subtracting aji /aii times row i from row j of A(i)
for j = i + 1, . . . , m.
(i)
Theorem 2.57 For all i ≤ j, k, ajk can be written in the form det(B)/ det(C) where
B and C are some submatrices of A.

Proof. Let Bi denote the i × i submatrix of A(i) consisting of the first i entries of the
(i)
first i rows. Let Bjk denote the i × i submatrix of A(i) consisting of the first i − 1
(i)
rows and row j, and the first i − 1 columns and column k. Since Bi and Bjk are
upper triangular matrices, their determinants are the products of the entries along
the main diagonal and, as a result, we have:
(i) det(Bi )
aii =
det(Bi−1 )

T.Abraha(PhD) @AKU, 2024 Linear Optimization


2.112 Transformation for the Interior Point Algorithm 319

and
(i)
(i) det(Bjk )
ajk = .
det(Bi−1 )
Moreover, remember that row operations do not affect the determinants and, hence,
(i)
the determinants of Bjk and Bi−1 are also determinants of submatrices of the original
matrix A. ■
Using the fact that the size of the determinant of any submatrix of A is at most
the size of the matrix A, we obtain that all numbers occuring during Gaussian
elimination require only O(L) bits.
Finally, we need to round the current iterates x, y and s to O(L) bits. Otherwise,
these vectors would require a constantly increasing number of bits as we iterate. By
rounding up x and s, we insure that these vectors are still strictly positive. It is
fairly easy to check that this rounding does not change the potential function by a
significant amount and so the analysis of the algorithm is still valid. Notice that now
the primal and dual constraints might be slightly violated but this can be taken care
of in the rounding step.

2.112 Transformation for the Interior Point Algorithm


In this appendix, we show how a pair of dual linear programs

M in cT x M ax bT y
(P ) s.t. Ax = b (D) s.t. AT y + s = c
x≥0 s ≥ 0

can be transformed so that we know a strictly feasible primal √ solution x0 and a


strictly feasible vector of dual slacks s0 such that G(x0 ; s0 ) = O( nL) where
n
G(x; s) = q ln(xT s) −
X
ln(xj sj )
j=1

and q = n + n.
Consider the pair of dual linear programs:

M in cT x + kc xn+1

(P ) s.t. 2L
Ax + (b − 2 Ae)xn+1 = b
4L T
(2 e − c) x 4L
+ 2 xn+2 = kb
x≥0 xn+1 ≥ 0 xn+2 ≥ 0

and

M in bT y + kb ym+1

(D ) s.t. T 4L
A y + (2 e − c)ym+1 + s = c
(b − 22L Ae)T y + sn+1 = kc
24L ym+1 + sn+2 = 0
s, sn+1 , sn+2 ≥ 0

T.Abraha(PhD) @AKU, 2024 Linear Optimization


320 Chapter 2. Introduction to Linear Optimization

where kb = 26L (n + 1) − 22L cT e is chosen in such a way that x′ = (x, xn+1 , xn+2 ) =
(22L e, 1, 22L ) is a (strict) feasible solution to (P ′ ) and kc = 26L . Notice that (y ′ , s′ ) =
(y, ym+1 , s, sn+1 , sn+2 ) = (0, −1, 24L e, kc , 24L ) is a feasible solution to (D′ ) with s′ > 0.
x′ and (y ′ , s′ ) serve as our initial feasible solutions.
We have to show: √
1. G(x′ ; s′ ) = O( n′ L) where n′ = n + 2,
2. the pair (P ′ ) − (D′ ) is equivalent to (P ) − (D),
3. the input size L′ for (P ′ ) as defined in the lecture notes does not increase too
much.
The proofs of these statements are simple but heavily use the definition of L and
the fact that vertices have all components bounded by 2L .
We first show 1. Notice first that x′j s′j = 26L for all j, implying that

′ ′ ′
√ T
n
n′ ) ln(x′ s′ ) − ln(x′j s′j )
X
G(x ; s ) = (n +
j=1

= (n′ + n′ ) ln(26L n′ ) − n′ ln(26L )
√ √
= n′ ln(26L ) + (n′ + n′ ) ln(n′ )

= O( n′ L)

In order to show that (P ′ ) − (D′ ) are equivalent to (P ) − (D), we consider an


optimal solution x∗ to (P ) and an optimal solution (y ∗ , s∗ ) to (D) (the case where
(P ) or (D) is infeasible is considered in the problem set). Without loss of generality,
we can assume that x∗ and (y ∗ , s∗ ) are vertices of the corresponding polyhedra. In
particular, this means that x∗j , |yj∗ |, s∗j < 2L .
Proposition 2.5 Let x′ = (x∗ , 0, (kb −(24L e−c)T x∗ )/24L ) and let (y ′ , s′ ) = (y ∗ , 0, , s∗ , kc −
(b − 22L Ae)T y ∗ , 0). Then
1. x′ is a feasible solution to (P ′ ) with x′n+2 > 0,
2. (y ′ , s′ ) is a feasible solution to (D′ ) with s′n+1 > 0,
3. x′ and (y ′ , s′ ) satisfy complementary slackness, i.e. they constitute a pair of
optimal solutions for (P ′ ) − (D′ ).

Proof. To show that x′ is a feasible solution to (P ′ ) with x′n+2 > 0, we only need to
show that kb − (24L e − c)T x∗ > 0 (the reader can easily verify that x′ satisfy all the
equalities defining the feasible region of (P ′ )). This follows from the fact that

(24L e − c)T x∗ ≤ n(24L + 2L )2L = n(25L + 22L ) < n26L

and

kb = 26L (n + 1) − 22L cT e ≥ 26L (n + 1) − 22L n max |cj | ≥ 26L n + 26L − 23L > n26L
j

where we have used the definition of L and the fact that vertices have all their entries
bounded by 2L .
To show that (y ′ , s′ ) is a feasible solution to (D′ ) with s′n+1 > 0, we only need to
show that kc − (b − 22L Ae)T y ∗ > 0. This is true since
(b − 22L Ae)T y ∗ ≤ bT y ∗ − 22L eT AT y ∗

T.Abraha(PhD) @AKU, 2024 Linear Optimization


2.113 Modeling: Linear Programming 321

≤ m max |bi |2L + 22L nm max |aij |2L


i i,j
2L 4L 6L
= 2 +2 <2 = kc .
x′ and (y ′ , s′ ) satisfy complementary slackness since
• x∗′ T s∗ ′= 0 by optimality of x∗ and (y∗, s∗) for (P ) and (D)
• xn+1 sn+1 = 0 and
• n+2s′n+2 = 0.
x ′


This proposition shows that, from an optimal solution to (P ) − (D), we can easily
construct an optimal solution to (P ′ ) − (D′ ) of the same cost. Since this solution
has s′n+1 > 0, any optimal solution x̂ to (P ′ ) must have x̂n+1 = 0. Moreover, since
x′n+2 > 0, any optimal solution (ŷ, ŝ) to (D′ ) must satisfy ŝn+2 = 0 and, as a result,
ŷm+1 = 0. Hence, from any optimal solution to (P ′ ) − (D′ ), we can easily deduce an
optimal solution to (P ) − (D). This shows the equivalence between (P ) − (D) and
(P ′ ) − (D′ ).
By some tedious but straightforward calculations, it is possible to show that
L (corresponding to (P ′ ) − (D′ )) is at most 24L. In other words, (P ) − (D) and

(P ′ ) − (D′ ) have equivalent sizes.

Graphical Representation of Linear Programs


The graphical representation of a linear program is a powerful tool for visualizing
and solving problems. It involves plotting the feasible region and finding the optimal
solution by identifying the intersection point with the objective function.

Algebraic Solution Techniques


There are several algebraic solution techniques used to solve linear programs, includ-
ing:
Graphical Method: The graphical method involves plotting the feasible region
and finding the optimal solution by identifying the intersection point with the
objective function.
Simplex Method: The simplex method is an iterative method that involves
solving a series of linear programs to find the optimal solution.
Dual Simplex Method: The dual simplex method is an iterative method that
involves solving a series of linear programs to find the optimal solution.

2.113 Modeling: Linear Programming


Linear Programming, also known as Linear Optimization, is the starting point for
most forms of optimization. It is the problem of optimizing a linear function over
linear constraints.
In this section, we will define what this means, show how to set up a linear program,
and discuss many examples. Examples will be connected with code in Excel and
Python (using the PuLP or Gurobipy modeling tools) so that you can easily start
solving optimization problems. Tutorials on these tools will come in later chapters.
We begin this section with a simple example.

322 Chapter 2. Introduction to Linear Optimization

■ Example 2.30 Toy Maker


Consider the problem of a toy company that produces toy planes and toy boats.
The toy company can sell its planes for $10 and its boats for $8. It costs $3
in raw materials to make a plane and $2 in raw materials to make a boat. A plane
requires 3 hours to make and 1 hour to finish, while a boat requires 1 hour to make
and 2 hours to finish. The toy company knows it will not sell any more than 35
planes per week. Further, given the number of workers, the company cannot spend
any more than 160 hours per week finishing toys and 120 hours per week making
toys. The company wishes to maximize the profit it makes by choosing how much
of each toy to produce.
We can represent the profit maximization problem of the company as a linear
programming problem. Let x1 be the number of planes the company will produce
and let x2 be the number of boats the company will produce. The profit for each
plane is $10 − $3 = $7 per plane and the profit for each boat is $8 − $2 = $6 per
boat. Thus the total profit the company will make is:

z(x1 , x2 ) = 7x1 + 6x2 (2.51)

The company can spend no more than 120 hours per week making toys and since
a plane takes 3 hours to make and a boat takes 1 hour to make we have:

3x1 + x2 ≤ 120 (2.52)

Likewise, the company can spend no more than 160 hours per week finishing toys
and since it takes 1 hour to finish a plane and 2 hours to finish a boat we have:

x1 + 2x2 ≤ 160 (2.53)

Finally, we know that x1 ≤ 35, since the company will make no more than 35
planes per week. Thus the complete linear programming problem is given as:

max z(x1 , x2 ) = 7x1 + 6x2
s.t. 3x1 + x2 ≤ 120
     x1 + 2x2 ≤ 160                  (2.54)
     x1 ≤ 35
     x1 ≥ 0
     x2 ≥ 0
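Although systematic methods come later, a two-variable LP like (2.54) can be solved by brute force: every vertex of the feasible region is the intersection of two constraint boundaries, so one can enumerate those intersections, discard the infeasible ones, and evaluate the objective at the rest. The following is a minimal pure-Python sketch of that idea (the PuLP and Gurobipy tools mentioned above solve such models far more efficiently):

```python
from itertools import combinations

# Constraints of (2.54), each written as a1*x1 + a2*x2 <= b
constraints = [
    (3, 1, 120),   # making hours
    (1, 2, 160),   # finishing hours
    (1, 0, 35),    # at most 35 planes
    (-1, 0, 0),    # x1 >= 0
    (0, -1, 0),    # x2 >= 0
]

def objective(x1, x2):
    return 7 * x1 + 6 * x2

def feasible(x1, x2, tol=1e-9):
    return all(a1 * x1 + a2 * x2 <= b + tol for a1, a2, b in constraints)

best = None
for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
    det = a1 * c2 - a2 * c1            # Cramer's rule for the 2x2 system
    if det == 0:                       # parallel boundary lines
        continue
    x1 = (b * c2 - a2 * d) / det
    x2 = (a1 * d - b * c1) / det
    if feasible(x1, x2) and (best is None or objective(x1, x2) > best[0]):
        best = (objective(x1, x2), x1, x2)

print(best)  # -> (544.0, 16.0, 72.0): make 16 planes and 72 boats per week
```

The best vertex uses all 120 making hours and all 160 finishing hours, for a weekly profit of $544.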

Exercise 2.9 Chemical Manufacturing A chemical manufacturer produces three
chemicals: A, B and C. These chemicals are produced by two processes: 1 and 2.
Running process 1 for 1 hour costs $4 and yields 3 units of chemical A, 1 unit of
chemical B and 1 unit of chemical C. Running process 2 for 1 hour costs $1 and
produces 1 unit of chemical A and 1 unit of chemical B (but none of chemical C).
To meet customer demand, at least 10 units of chemical A,


5 units of chemical B and 3 units of chemical C must be produced daily. Assume


that the chemical manufacturer wants to minimize the cost of production. Develop
a linear programming problem describing the constraints and objectives of the
chemical manufacturer.
[Hint: Let x1 be the amount of time Process 1 is executed and let x2 be the amount
of time Process 2 is executed. Use the coefficients above to express the cost of
running Process 1 for x1 time and Process 2 for x2 time. Do the same to compute
the amount of chemicals A, B, and C that are produced.] ■

2.114 Modeling and Assumptions in Linear Programming


2.114.1 General models

A Generic Linear Program (LP)


Decision Variables:
xj : continuous variables (xj ∈ R, i.e., a real number), ∀j = 1, . . . , n.
Parameters (known input parameters):
cj : cost coefficients ∀j = 1, . . . , n
aij : constraint coefficients ∀i = 1, . . . , m, j = 1, . . . , n
bi : right hand side coefficient for constraint i, i = 1, . . . , m
The problem we will consider is
max z = c1 x1 + · · · + cn xn
s.t. a11 x1 + · · · + a1n xn ≤ b1
     ...                                  (2.55)
     am1 x1 + · · · + amn xn ≤ bm
For example, in 3 variables and 4 constraints this could look like the following.
This example also uses other types of constraints, i.e., ≥ and =. We will show
later how all these forms can be converted into one another.
Decision Variables:
xj : continuous variables (xj ∈ R, i.e., a real number), ∀j = 1, · · · , 3.
Parameters (known input parameters):
cj : cost coefficients ∀j = 1, . . . , 3
aij : constraint coefficients ∀i = 1, . . . , 4, j = 1, . . . , 3
bi : right hand side coefficient for constraint i, i = 1, . . . , 4

Min z = c1 x1 + c2 x2 + c3 x3 (2.56)
s.t. a11 x1 + a12 x2 + a13 x3 ≥ b1 (2.57)
a21 x1 + a22 x2 + a23 x3 ≤ b2 (2.58)
a31 x1 + a32 x2 + a33 x3 = b3 (2.59)
a41 x1 + a42 x2 + a43 x3 ≥ b4 (2.60)
x1 ≥ 0, x2 ≤ 0, x3 urs (unrestricted in sign). (2.61)


Definition 2.43 Linear Function A function z : Rn → R is linear if
there are constants c1 , . . . , cn ∈ R so that:

z(x1 , . . . , xn ) = c1 x1 + · · · + cn xn (2.62)

For the time being, we will eschew the general form and focus exclusively on
linear programming problems with two variables. Using this limited case, we will
develop a graphical method for identifying optimal solutions, which we will generalize
later to problems with arbitrary numbers of variables.

2.114.2 Assumptions
Inspecting Example 2.30 (or the more general Problem 2.55) we can see there are
several assumptions that must be satisfied when using a linear programming model.
We enumerate these below:
Proportionality Assumption A problem can be phrased as a linear program only
if the contribution to the objective function and the left-hand-side of each
constraint by each decision variable (x1 , . . . , xn ) is proportional to the value of
the decision variable.
Additivity Assumption A problem can be phrased as a linear programming prob-
lem only if the contribution to the objective function and the left-hand-side of
each constraint by any decision variable xi (i = 1, . . . , n) is completely indepen-
dent of any other decision variable xj (j ̸= i) and additive.
Divisibility Assumption A problem can be phrased as a linear programming
problem only if the quantities represented by each decision variable are infinitely
divisible (i.e., fractional answers make sense).
Certainty Assumption A problem can be phrased as a linear programming prob-
lem only if the coefficients in the objective function and constraints are known
with certainty.
The first two assumptions simply assert (in English) that both the objective
function and functions on the left-hand-side of the (in)equalities in the constraints
are linear functions of the variables x1 , . . . , xn .
The third assumption asserts that a valid optimal answer could contain fractional
values for decision variables. It’s important to understand how this assumption
comes into play–even in the toy making example. Many quantities can be divided
into non-integer values (ounces, pounds etc.) but many other quantities cannot be
divided. For instance, can we really expect that it’s reasonable to make 1/2 a plane in
the toy making example? When values must be constrained to true integer values,
the linear programming problem is called an integer programming problem. There is
a vast literature dealing with these problems [PS98, WN99]. For many problems,
particularly when the values of the decision variables may become large, a fractional
optimal answer could be obtained and then rounded to the nearest integer to obtain
a reasonable answer. For example, if our toy problem were re-written so that the
optimal answer was to make 1045.3 planes, then we could round down to 1045.
The final assumption asserts that the coefficients (e.g., profit per plane or boat)


is known with absolute certainty. In traditional linear programming, there is no


lack of knowledge about the make up of the objective function, the coefficients in
the left-hand-side of the constraints or the bounds on the right-hand-sides of the
constraints. There is a literature on stochastic programming [KW94, BN02] that
relaxes some of these assumptions, but this too is outside the scope of the course.
Exercise 2.10 In a short sentence or two, discuss whether the problem given in
Example 2.30 meets all of the assumptions of a scenario that can be modeled by a
linear programming problem. Do the same for Exercise 2.9. [Hint: Can you make
2/3 of a toy? Can you run a process for 1/3 of an hour?] ■

We will begin with a few examples, and then discuss specific problem types that
occur often.


■ Example 2.31 — Production with welding robot. You have 21 units of transpar-
ent aluminum alloy (TAA), LazWeld1, a joining robot leased for 23 hours, and
CrumCut1, a cutting robot leased for 17 hours of aluminum cutting. You also
have production code for a bookcase, desk, and cabinet, along with commitments
to buy any of these you can produce for $18, $16, and $10 apiece, respectively. A
bookcase requires 2 units of TAA, 3 hours of joining, and 1 hour of cutting, a desk
requires 2 units of TAA, 2 hours of joining, and 2 hours of cutting, and a cabinet
requires 1 unit of TAA, 2 hours of joining, and 1 hour of cutting. Formulate an
LP to maximize your revenue given your current resources. ■

Solution: Sets:
• The types of objects = { bookcase, desk, cabinet}.
Parameters:
• Selling price of each object
• Units of TAA needed for each object
• Hours of joining needed for each object
• Hours of cutting needed for each object
• Units of TAA available, and hours of joining and cutting available on the robots
Decision variables:
xi : number of units of product i to produce,
for all i =bookcase, desk, cabinet.

Objective and Constraints:

max z = 18x1 + 16x2 + 10x3 (revenue)


s.t. 2x1 + 2x2 + 1x3 ≤ 21 (T AA)
3x1 + 2x2 + 2x3 ≤ 23 (LazW eld1)
1x1 + 2x2 + 1x3 ≤ 17 (CrumCut1)
x1 , x2 , x3 ≥ 0.

■ Example 2.32 — The Diet Problem. In the future (as envisioned in a bad 70’s
science fiction film) all food is in tablet form, and there are four types: green, blue,
yellow, and red. A balanced, futuristic diet requires at least 20 units of Iron, 25
units of Vitamin B, 30 units of Vitamin C, and 15 units of Vitamin D. Formulate
an LP that ensures a balanced diet at the minimum possible cost. ■

Tablet Iron B C D Cost ($)


Chem 1 6 6 7 4 1.25
Chem 2 4 5 4 9 1.05
Chem 3 5 2 5 6 0.85
Chem 4 3 6 3 2 0.65


Solution: Sets:
• Set of tablets {1, 2, 3, 4}
Parameters:
• Iron in each tablet
• Vitamin B in each tablet
• Vitamin C in each tablet
• Vitamin D in each tablet
• Cost of each tablet
Decision variables:
xi : number of tablet of type i to include in the diet, ∀i ∈ {1, 2, 3, 4}.

Objective and Constraints:

Min z = 1.25x1 + 1.05x2 + 0.85x3 + 0.65x4


s.t. 6x1 + 4x2 + 5x3 + 3x4 ≥ 20
6x1 + 5x2 + 2x3 + 6x4 ≥ 25
7x1 + 4x2 + 5x3 + 3x4 ≥ 30
4x1 + 9x2 + 6x3 + 2x4 ≥ 15
x1 , x2 , x3 , x4 ≥ 0.

■ Example 2.33 — The Next Diet Problem. Progress is important, and our last
problem had too many tablets, so we are going to produce a single, purple, 10
gram tablet for our futuristic diet, which requires at least 20 units of Iron, 25
units of Vitamin B, 30 units of Vitamin C, 15 units of Vitamin D, and 2000
calories. The tablet is made from blending 4 nutritious chemicals; the following
table shows the units of our nutrients per gram, and the cost per gram, of each
chemical. Formulate an LP that ensures a balanced diet at the minimum possible
cost. ■

Chemical Iron B C D Calories Cost ($)


Chem 1 6 6 7 4 1000 1.25
Chem 2 4 5 4 9 250 1.05
Chem 3 5 2 5 6 850 0.85
Chem 4 3 6 3 2 750 0.65

Solution: Sets:
• Set of chemicals {1, 2, 3, 4}
Parameters:
• Iron in each chemical
• Vitamin B in each chemical
• Vitamin C in each chemical
• Vitamin D in each chemical
• Calories in each chemical
• Cost of each chemical


Decision variables:
xi : grams of chemical i to include in the purple tablet, ∀i = 1, 2, 3, 4.
Objective and Constraints:

min z = 1.25x1 + 1.05x2 + 0.85x3 + 0.65x4


s.t. 6x1 + 4x2 + 5x3 + 3x4 ≥ 20
6x1 + 5x2 + 2x3 + 6x4 ≥ 25
7x1 + 4x2 + 5x3 + 3x4 ≥ 30
4x1 + 9x2 + 6x3 + 2x4 ≥ 15
1000x1 + 250x2 + 850x3 + 750x4 ≥ 2000
x1 + x2 + x3 + x4 = 10
x1 , x2 , x3 , x4 ≥ 0.

■ Example 2.34 — Work Scheduling Problem. You are the manager of LP Burger.
The following table shows the minimum number of employees required to staff the
restaurant on each day of the week. Each employee must work for five consecutive
days. Formulate an LP to find the minimum number of employees required to staff
the restaurant.

Day of Week Workers Required


1 = Monday 6
2 = Tuesday 4
3 = Wednesday 5
4 = Thursday 4
5 = Friday 3
6 = Saturday 7
7 = Sunday 7

Solution: This problem has multiple optimal integer solutions, differing in which
days workers begin work, all of which result in 8 total workers hired.
Decision variables:
xi : the number of workers that start 5 consecutive days of work on day i, i = 1, · · · , 7
Objective and Constraints:

Min z = x1 + x2 + x3 + x4 + x5 + x6 + x7
s.t. x1 + x4 + x5 + x6 + x7 ≥ 6
x2 + x5 + x6 + x7 + x1 ≥ 4
x3 + x6 + x7 + x1 + x2 ≥ 5
x4 + x7 + x1 + x2 + x3 ≥ 4
x5 + x1 + x2 + x3 + x4 ≥ 3


x6 + x2 + x3 + x4 + x5 ≥ 7
x7 + x3 + x4 + x5 + x6 ≥ 7
x1 , x2 , x3 , x4 , x5 , x6 , x7 ≥ 0.

One solution is as follows:


LP Solution IP Solution
zLP = 7.333 zI = 8.0
x1 = 0 x1 = 0
x2 = 0.333 x2 = 0
x3 = 1 x3 = 0
x4 = 2.333 x4 = 3
x5 = 0 x5 = 0
x6 = 3.333 x6 = 4
x7 = 0.333 x7 = 1
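The integer solution in the table can be verified mechanically: expand each start-day variable into the five days it covers and check every day's requirement. A small sketch (days are 0-indexed, so the table's x4, x6, x7 become indices 3, 5, 6):

```python
# Staffing requirements for Monday (index 0) through Sunday (index 6)
required = [6, 4, 5, 4, 3, 7, 7]

# The integer solution from the table: x[i] workers start a
# 5-consecutive-day stretch on day i (x4 = 3, x6 = 4, x7 = 1 above).
x = [0, 0, 0, 3, 0, 4, 1]

# A worker who starts on day s works days s, s+1, ..., s+4 (mod 7), so
# the staff level on day d sums x over the five start days covering d.
staffed = [sum(x[(d - k) % 7] for k in range(5)) for d in range(7)]

print(staffed)   # -> [8, 5, 5, 4, 3, 7, 8]
assert all(staffed[d] >= required[d] for d in range(7))
print(sum(x))    # total workers hired -> 8
```

Every day's requirement is met with a total of 8 workers, matching zI = 8.0 in the table.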


■ Example 2.35 — LP Burger - extended. LP Burger has changed its policy and
now allows at most two part-time workers, each of whom works two consecutive
days in a week. Formulate this problem. ■

Solution: Decision variables:


xi : the number of workers that start 5 consecutive days of work on day i, i = 1, · · · , 7
yi : the number of workers that start 2 consecutive days of work on day i, i = 1, · · · , 7.
Objective and Constraints:

Min z = 5(x1 + x2 + x3 + x4 + x5 + x6 + x7 )
+ 2(y1 + y2 + y3 + y4 + y5 + y6 + y7 )
s.t. x1 + x4 + x5 + x6 + x7 + y1 + y7 ≥ 6
x2 + x5 + x6 + x7 + x1 + y2 + y1 ≥ 4
x3 + x6 + x7 + x1 + x2 + y3 + y2 ≥ 5
x4 + x7 + x1 + x2 + x3 + y4 + y3 ≥ 4
x5 + x1 + x2 + x3 + x4 + y5 + y4 ≥ 3
x6 + x2 + x3 + x4 + x5 + y6 + y5 ≥ 7
x7 + x3 + x4 + x5 + x6 + y7 + y6 ≥ 7
y1 + y2 + y3 + y4 + y5 + y6 + y7 ≤ 2
xi ≥ 0, yi ≥ 0, ∀i = 1, · · · , 7.


2.114.3 Knapsack Problem

■ Example 2.36 — Camping Trip. Imagine you are preparing for a week-long camping
trip to the mountains. You have a backpack with a weight capacity of 20 kilograms.
Your goal is to pack the most valuable items to ensure a comfortable and safe trip
without exceeding the backpack’s weight limit. Each item has a weight and a value
associated with it, representing its importance and utility for the trip. The table
below lists the potential items you can take, their weights in kilograms, and their
values on a scale from 1 to 10 (with 10 being the most valuable).
Formulate an integer program to find the optimal set of items to bring on your
trip.

Item Index Weight (kg) Value


Tent A 4 10
Stove B 2 9
Sleeping Bag C 3 8
First Aid Kit D 1 7
Food Supplies E 5 9
Water Filter F 1 6
Clothes G 4 5
Map and Compass H 0.5 7

Solution: Decision variables:


xi : whether to include item i in the backpack, where i ∈ {A, B, C, D, E, F, G, H},
xi = 1 if item i is included, and xi = 0 otherwise.
Objective and Constraints:

Max Z = 10xA + 9xB + 8xC + 7xD + 9xE + 6xF + 5xG + 7xH


s.t. 4xA + 2xB + 3xC + 1xD + 5xE + 1xF + 4xG + 0.5xH ≤ 20
xA , xB , xC , xD , xE , xF , xG , xH ∈ {0, 1}.

2.114.4 Capital Investment

■ Example 2.37 — Capital Allocation Problem. You are a financial planner for an
investment firm. The firm has $100,000 to invest in a portfolio of projects. Each
project has a projected return and requires an initial investment. The goal is to
maximize the total return of the portfolio while not exceeding the available capital.
The following table lists potential projects, their required investments, and their
projected returns.
Formulate an integer program to maximize the total return of the portfolio while
not exceeding the available investment capital.


Project Investment Required ($) Projected Return ($)


A 20,000 30,000
B 30,000 40,000
C 50,000 75,000
D 10,000 15,000
E 40,000 60,000

Solution: Decision variables:


xi : whether to include project i in the portfolio, where i ∈ {A, B, C, D, E}, xi = 1 if
project i is included, and xi = 0 otherwise.
Objective and Constraints:
Max z = 30, 000xA + 40, 000xB + 75, 000xC + 15, 000xD + 60, 000xE
s.t. 20, 000xA + 30, 000xB + 50, 000xC + 10, 000xD + 40, 000xE ≤ 100, 000
xA , xB , xC , xD , xE ∈ {0, 1}.
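This capital-allocation instance has only 2^5 = 32 project subsets, so it too can be checked exhaustively; a sketch:

```python
from itertools import combinations

# (project, investment required, projected return), in thousands of dollars
projects = [("A", 20, 30), ("B", 30, 40), ("C", 50, 75),
            ("D", 10, 15), ("E", 40, 60)]
budget = 100

best = (0, ())
for r in range(len(projects) + 1):
    for subset in combinations(projects, r):
        invested = sum(c for _, c, _ in subset)
        returned = sum(v for _, _, v in subset)
        if invested <= budget and returned > best[0]:
            best = (returned, tuple(name for name, _, _ in subset))

print(best)  # -> (150, ('C', 'D', 'E'))
```

The optimal portfolio invests the full $100,000 in projects C, D, and E for a projected return of $150,000.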

2.114.5 Work Scheduling


2.114.6 Assignment Problem
Consider the assignment of n teams to n projects, where each team ranks the projects:
their favorite project is given a rank of n, their next favorite a rank of n − 1, and their
least favorite project a rank of 1. The assignment problem is formulated as
follows (we denote ranks using the R-parameter):
Variables:
xij : 1 if project i assigned to team j, else 0.
Max z = Σ_{i=1}^{n} Σ_{j=1}^{n} Rij xij
s.t. Σ_{i=1}^{n} xij = 1, ∀j = 1, · · · , n
     Σ_{j=1}^{n} xij = 1, ∀i = 1, · · · , n
     xij ≥ 0, ∀i = 1, · · · , n, j = 1, · · · , n.

■ Example 2.38 Hiring for tasks In this assignment problem, we need to hire three
people (Person 1, Person 2, Person 3) for three tasks (Task 1, Task 2, Task 3). In
the table below, we list the cost of hiring each person for each task, in dollars.
Since each person has a different cost for each task, we must make an assignment
to minimize our total cost.

Cost       Task 1   Task 2   Task 3
Person 1     40       47       80
Person 2     72       36       58
Person 3     24       61       71

Given the specific costs of assigning three people to three tasks, we can write


out the mathematical model explicitly using the given numbers.


Objective Function:
Minimize the total cost of assignments:

Z = 40x11 + 47x12 + 80x13 + 72x21 + 36x22 + 58x23 + 24x31 + 61x32 + 71x33 (2.63)

Subject to Constraints:
Each person is assigned to exactly one task:

x11 + x12 + x13 = 1 (2.64)


x21 + x22 + x23 = 1 (2.65)
x31 + x32 + x33 = 1 (2.66)

Each task is assigned to exactly one person:

x11 + x21 + x31 = 1 (2.67)


x12 + x22 + x32 = 1 (2.68)
x13 + x23 + x33 = 1 (2.69)

Binary constraints on the variables:

xij ∈ {0, 1} ∀i ∈ {1, 2, 3}, ∀j ∈ {1, 2, 3} (2.70)

This explicit model incorporates the specific costs associated with each person-
task assignment and ensures that each person is assigned to exactly one task, each
task is assigned to exactly one person, and the overall cost is minimized.
We could write out this model using more generic notation in the following
way: We define the following sets, parameters, and variables to construct the
mathematical model.
Sets:
• I = {1, 2, 3}, the set of people.
• J = {1, 2, 3}, the set of tasks.
Parameters:
• Cij , the cost of assigning person i ∈ I to task j ∈ J. The costs are given in
the following table:
 
      [ 40  47  80 ]
C =   [ 72  36  58 ]
      [ 24  61  71 ]

Variables:
• xij = 1 if person i is assigned to task j, and xij = 0 otherwise, for all i ∈ I, j ∈ J.

Model:
The objective is to minimize the total cost of assignments:


Minimize Z = Σ_{i∈I} Σ_{j∈J} Cij xij      (2.71)

Subject to the constraints:


1. Each person is assigned to exactly one task:

Σ_{j∈J} xij = 1   ∀i ∈ I      (2.72)

2. Each task is assigned to exactly one person:

Σ_{i∈I} xij = 1   ∀j ∈ J      (2.73)

3. Binary constraints on the variables:

xij ∈ {0, 1} ∀i ∈ I, ∀j ∈ J (2.74)

This model ensures that each person is assigned to exactly one task, each task is
assigned to exactly one person, and the total cost of the assignments is minimized.

The assignment problem has an integrality property: if we remove the binary
restriction on the x variables (so they are just non-negative, i.e., xij ≥ 0), we
still obtain binary assignments, despite the fact that the problem is now an LP. This
property is very interesting and useful. Of course, the objective function might not
be quite what we want; we might be interested in ensuring that the team with the
worst assignment is as well off as possible (a fairness criterion). One way of doing
this is to modify the assignment problem using a max-min objective:
Max-min Assignment-like Formulation

Max z
s.t. Σ_{i=1}^{n} xij = 1, ∀j = 1, · · · , n
     Σ_{j=1}^{n} xij = 1, ∀i = 1, · · · , n
     xij ≥ 0, ∀i = 1, · · · , n, j = 1, · · · , n
     z ≤ Σ_{i=1}^{n} Rij xij , ∀j = 1, · · · , n.

Does this formulation have the integrality property (it is not an assignment problem)?
Consider a very simple example where two teams are to be assigned to two projects
and the teams give the projects the following rankings:

           Project 1   Project 2
Team 1         2           1
Team 2         2           1

Both teams prefer Project 1. For both problems, if we remove the binary restriction
on the x-variables, they

can take values between (and including) zero and one. For the assignment problem
the optimal solution will have z = 3, and fractional x-values will not improve z. For
the max-min assignment problem this is not the case, the optimal solution will have
z = 1.5, which occurs when each team is assigned half of each project (i.e., for Team
1 we have x11 = 0.5 and x21 = 0.5).

2.114.7 Multi period Models


2.114.7.1 Production Planning
2.114.7.2 Crop Planning
2.114.8 Mixing Problems
2.114.9 Financial Planning
2.114.10 Network Flow
To begin a discussion on Network flow, we first need to discuss graphs.
2.114.10.1 Graphs
A graph G = (V, E) is defined by a set of vertices V and a set of edges E that contains
pairs of vertices.
For example, the following graph G can be described by the vertex set V =
{1, 2, 3, 4, 5, 6} and the edge set E = {(4, 6), (4, 5), (5, 1), (1, 2), (2, 5), (2, 3), (3, 4)}.

(Figure: a drawing of this graph on the vertices 1, . . . , 6; the drawing did not survive text extraction.)

In an undirected graph, we do not distinguish the direction of the edge. That


is, for two vertices i, j ∈ V , we can equivalently write (i, j) or (j, i) to represent the
edge.
Alternatively, we will want to consider directed graphs. We denote these as
G = (V, A), where A is a set of arcs; an arc is a directed edge, i.e., an ordered pair
of vertices.
For example, the following directed graph G can be described by the vertex set V =
{1, 2, 3, 4, 5, 6} and the arc set A = {(4, 6), (6, 4), (4, 5), (5, 1), (1, 2), (2, 5), (2, 3), (3, 4)}.


(Figure: a drawing of this directed graph; note that both arcs (4, 6) and (6, 4) are present. The drawing did not survive text extraction.)

Sets A finite network G is described by a finite set of vertices V and a finite set A of
arcs. Each arc (i, j) has two key attributes, namely its tail i ∈ V and its head j ∈ V .
We think of a (single) commodity as being allowed to "flow" along each arc, from
its tail to its head.
Variables Indeed, we have "flow" variables

xij := amount of flow on arc(i, j) from vertex i to vertex j,

for all (i, j) ∈ A.


2.114.10.2 Maximum Flow Problem

max Σ_{(s,i)∈A} xsi      (max total flow from source)      (2.75)
s.t. Σ_{i:(i,v)∈A} xiv − Σ_{j:(v,j)∈A} xvj = 0,   v ∈ V \ {s, t}      (2.76)
     0 ≤ xij ≤ uij ,   ∀(i, j) ∈ A      (2.77)
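Besides LP solvers, this problem admits specialized combinatorial algorithms. The following is a minimal augmenting-path (Edmonds-Karp) sketch on a small made-up network; the capacities below are illustrative assumptions, not taken from any figure in this book:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest s-t path."""
    # residual[u][v] = remaining capacity on arc (u, v); add 0-capacity
    # reverse arcs so flow can later be "undone".
    residual = {u: dict(adj) for u, adj in capacity.items()}
    for u in capacity:
        for v in capacity[u]:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:       # BFS for an augmenting path
            u = queue.popleft()
            for v, cap in residual.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                        # no augmenting path remains
        path, v = [], t
        while parent[v] is not None:           # recover the path s -> t
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# A made-up instance (capacities chosen for illustration only):
network = {"s": {"a": 4, "b": 3}, "a": {"b": 2, "t": 3}, "b": {"t": 4}, "t": {}}
print(max_flow(network, "s", "t"))  # -> 7
```

Here the cut separating {s} from the rest has capacity 4 + 3 = 7, matching the computed flow, as the max-flow/min-cut theorem guarantees.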

Shortest Path Problem

minimize Σ_{u→v} ℓ_{u→v} · x_{u→v}
subject to Σ_w x_{s→w} − Σ_u x_{u→s} = 1
           Σ_w x_{t→w} − Σ_u x_{u→t} = −1
           Σ_w x_{v→w} − Σ_u x_{u→v} = 0   for every vertex v ≠ s, t
           x_{u→v} ≥ 0   for every edge u → v
Or maybe write it like this:

min Σ_{(i,j)∈A} cij xij      (2.78)
s.t. Σ_{(i,j)∈δ+ (i)} xij − Σ_{(j,i)∈δ− (i)} xji = 0,   ∀i ∈ V \ {s, t}      (2.79)
     Σ_{(s,j)∈δ+ (s)} xsj − Σ_{(j,s)∈δ− (s)} xjs = 1      (2.80)
     Σ_{(t,j)∈δ+ (t)} xtj − Σ_{(j,t)∈δ− (t)} xjt = −1      (2.81)
     xij ∈ {0, 1},   ∀(i, j) ∈ A      (2.82)

■ Example 2.39 Max flow example


(Figure: a maximum-flow network with source s, sink t, and intermediate vertices v1 , v2 , v3 , v4 ; the arc capacities shown include 12, 15, 11, 10, 8, 7, and 4. The drawing did not survive text extraction.)

■ Example 2.40 Min Cost Network Flow


(Figure: a min-cost network-flow instance on vertices v1 , . . . , v6 with node supplies v1 = 6, v2 = −5, v3 = −3, v4 = −6, v5 = −2, v6 = 10 and arcs labeled "capacity, cost", e.g. 12, 9 and 10, 1. The drawing did not survive text extraction.)


2.114.10.3 Minimum Cost Network Flow


Parameters We assume that flow on arc (i, j) should be non-negative and should
not exceed
uij := the flow upper bound on arc(i, j),

for (i, j) ∈ A. Associated with each arc (i, j) is a cost

cij := cost per-unit-flow on arc (i, j),

for (i, j) ∈ A. The (total) cost of the flow x is defined to be


Σ_{(i,j)∈A} cij xij .

We assume that we have further data for the nodes. Namely,

bv := the net supply at node v,

for v ∈ V .
A flow is conservative if the net flow out of node v, minus the net flow into node
v, is equal to the net supply at node v, for all nodes v ∈ V .
The (single-commodity min-cost) network-flow problem is to find a minimum-cost
conservative flow that is non-negative and respects the flow upper bounds on the
arcs.
Objective and Constraints We can formulate this as follows:


min Σ_{(i,j)∈A} cij xij      (minimize cost)
s.t. Σ_{(v,i)∈A} xvi − Σ_{(i,v)∈A} xiv = bv ,   for all v ∈ V      (flow conservation)
     0 ≤ xij ≤ uij ,   for all (i, j) ∈ A.

Theorem 2.58 Integrality of Network Flow If the capacities
and demands are all integer values, then there always exists an optimal solution to
the LP that has integer values.

2.114.11 Multi-Commodity Network Flow


In the same vein as the Network Flow Problem:

min Σ_{k=1}^{K} Σ_{e∈A} c_e^k x_e^k
s.t. Σ_{e∈A : t(e)=v} x_e^k − Σ_{e∈A : h(e)=v} x_e^k = b_v^k ,   for v ∈ N , k = 1, 2, . . . , K;
     Σ_{k=1}^{K} x_e^k ≤ ue ,   for e ∈ A;
     x_e^k ≥ 0,   for e ∈ A, k = 1, 2, . . . , K.

Notes: K = 1 is ordinary single-commodity network flow, where integer solutions
come for free when node supplies and arc capacities are integer. For K = 2, however,
there exist instances with integer data whose basic optimal solution is fractional, and
which have no feasible integer flow at all.

R Unfortunately, the same integrality theorem does not hold in the multi-
commodity network flow problem. Nonetheless, if the quantities in each flow are
very large, then the LP solution will likely be very close to an integer valued
solution.

2.115 Modeling Tricks


2.115.1 Maximizing a minimum
When the constraints could be general, we will write x ∈ X to define general
constraints. For instance, we could have X = {x ∈ Rn : Ax ≤ b} or X = {x ∈ Rn :
Ax ≤ b, x ∈ Zn } or many other possibilities.
Consider the problem

max min{x1 , . . . , xn }
such that x ∈ X


Having the minimum on the inside is inconvenient. To remove this, we just define
a new variable y and enforce that y ≤ xi and then we maximize y. Since we are
maximizing y, it will take the value of the smallest xi . Thus, we can recast the
problem as

max y
such that y ≤ xi for i = 1, . . . , n
x∈X

■ Example 2.41 Minimizing an Absolute Value Note that

|t| = max(t, −t).

Thus, if we need to minimize |t| we can instead write

min z      (2.83)
s.t. t ≤ z      (2.84)
     −t ≤ z      (2.85)

3. Geometry of Linear Optimization

3.1 Geometric Interpretation of Linear Programs


3.2 Feasible Regions and Vertices
3.3 Optimality and Boundedness
Consider the following linear programming problem:

min x + y
s.t. x + 2y ≥ 2 (3.1)
3x + 2y ≥ 6.

Observe that (3.1) is equivalent to

min z
s.t. z − x − y = 0
(3.2)
x + 2y ≥ 2
3x + 2y ≥ 6.

Note that the objective function is replaced with z and z is set to the original
objective function in the first constraint of (3.2) since z = x + y if and only if
z − x − y = 0. Then, solving (3.2) is equivalent to finding among all the solutions to
the following system a solution that minimizes z, if it exists.

z −x−y ≥ 0 (1)
−z + x + y ≥ 0 (2)
x + 2y ≥ 2 (3)
3x + 2y ≥ 6 (4)


Since we are interested in the minimum possible value for z we use Fourier-Motzkin
elimination to eliminate the variables x and y.
To eliminate x, we first multiply (4) by 1/3 to obtain:

z − x − y ≥ 0 (1)
−z + x + y ≥ 0 (2)
x + 2y ≥ 2 (3)
x + (2/3)y ≥ 2 (5)

Then eliminate x to obtain

(1) + (2) : 0 ≥ 0
(1) + (3) : z + y ≥ 2 (6)
(1) + (5) : z − (1/3)y ≥ 2 (7)

Note that there is no need to keep the first inequality. To eliminate y, we first
multiply (7) by 3 to obtain:

z +y ≥ 2 (6)
3z − y ≥ 6 (8)

Then eliminate y to obtain

4z ≥ 8 (9)

Multiplying (9) by 1/4 gives z ≥ 2. Hence, the minimum possible value for z among
all the solutions to the system is 2. So the optimal value of (3.2) is 2. To obtain an
optimal solution, set z = 2. Then we have no choice but to set y = 0 and x = 2. One
can check that (x, y) = (2, 0) is a feasible solution with objective function value 2.
We can obtain an independent proof that the optimal value is indeed 2 if we
trace back the computations. Note that the inequality z ≥ 2 is given by

(1/4)(9) ⇐ (1/4)(6) + (1/4)(8)
         ⇐ (1/4)(1) + (1/4)(3) + (3/4)(7)
         ⇐ (1/4)(1) + (1/4)(3) + (3/4)(1) + (3/4)(5)
         ⇐ (1) + (1/4)(3) + (1/4)(4)

This shows that (1/4)(3) + (1/4)(4) gives the inequality x + y ≥ 2. Hence, no feasible
solution to (3.1) can have objective function value less than 2. But we have found
one feasible solution with objective function value 2. Hence, 2 is the optimal value.


3.4 Solving systems of linear inequalities


Before we can solve a linear programming problem, we should be able to solve the
seemingly simpler problem of finding a feasible solution. We will now consider how
one can determine if a system of linear inequalities has a solution.
For the sake of illustrating the principles involved, we limit ourselves to systems
consisting of only ≥-inequalities. Extending the method to work with any systems
of linear constraints is left as an exercise.
Suppose that we want to determine if there exist x, y ∈ R satisfying

x+y ≥ 0
2x + y ≥ 2
−x + y ≥ 1
−x + 2y ≥ −1.

The key is to take one of the variables and see how it is constrained by the
remaining variables. We “isolate” x by rewriting the system as the equivalent system

x ≥ −y
x ≥ 1 − (1/2)y
x ≤ −1 + y
x ≤ 1 + 2y.

Hence, x is constrained by the lower bounds −y and 1 − (1/2)y and the upper bounds
−1 + y and 1 + 2y. Therefore, we can find a value for x satisfying these bounds if
and only if each of the upper bounds is at least each of the lower bounds; that is,

−1 + y ≥ −y
−1 + y ≥ 1 − (1/2)y
1 + 2y ≥ −y
1 + 2y ≥ 1 − (1/2)y.
Simplifying this system gives

2y ≥ 1
(3/2)y ≥ 2
3y ≥ −1
(5/2)y ≥ 0,
or more simply,

y ≥ 1/2
y ≥ 4/3

y ≥ −1/3
y ≥ 0.

Note that this system does not contain the variable x and it has a solution if and
only if y ≥ 4/3. Hence, the original system has a solution if and only if y ≥ 4/3. If we
set y = 2, for example, then x must satisfy

x ≥ −2
x≥0
x≤1
x ≤ 5.

Thus, we can pick x to be any value in the closed interval [0, 1]. In particular,
(x, y) = (0, 2) is one solution to the given system of linear inequalities. There could be
other solutions.
The above example illustrates the process of solving a system of linear inequalities
by constructing a system that has a reduced number of variables. As the number
of variables is finite, the process can be repeated until we obtain a system whose
solvability is apparent (as in the one-variable case).
Observe that the pairing of an upper bound constraint of the form x ≤ q and a
lower bound constraint of the form x ≥ p to obtain q ≥ p is equivalent to adding the
inequalities −x ≥ −q and x ≥ p. This observation leads to the following:
3.4.1 Fourier-Motzkin Elimination


Given: A system of linear inequalities

ai1 x1 + ai2 x2 + · · · + ain xn ≥ bi ,    i = 1, . . . , m,

where aij , bi ∈ R for i = 1, . . . , m and j = 1, . . . , n.

Eliminate xk for some k ∈ {1, . . . , n} using the following steps:
1. For each j ∈ {1, . . . , m},
   • if ajk > 0, multiply the jth inequality by 1/ajk ,
   • if ajk < 0, multiply the jth inequality by −1/ajk .
2. Form a new system of inequalities as follows:
   • copy down all the inequalities in which the coefficient of xk is 0;
   • for each inequality in which xk has a positive coefficient and for each
     inequality in which xk has a negative coefficient, obtain a new inequality
     by adding them together.
Remarks.
1. Step 1 is to ensure that all the nonzero coefficients of xk are 1 or −1.
2. The new system formed in Step 2 will not contain the variable xk . Furthermore,
if x∗1 , . . . , x∗n is a solution to the original system, then x∗1 , . . . , x∗k−1 , x∗k+1 , . . . , x∗n
is a solution to the new system. And if x∗1 , . . . , x∗k−1 , x∗k+1 , . . . , x∗n is a solution
to the new system, then there exists x∗k such that x∗1 , . . . , x∗n is a solution to the




original system. (Why?) Hence, the original system has a solution if and only
if the new system does.
Now, if we apply Fourier-Motzkin elimination repeatedly, we obtain a system
with at most one variable such that it has a solution if and only if the original system
does. Since solving systems of linear inequalities with at most one variable is easy,
we can conclude whether or not the original system has a solution.
Note that if the coefficients are all rational, the system obtained after eliminating
one variable using Fourier-Motzkin elimination will also have only rational coefficients.
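The repeated-elimination procedure just described is mechanical enough to implement directly. The following Python sketch (our own illustration; the function names are ours, not from the text) encodes each inequality as a (coefficients, right-hand side) pair and, in line with the remark above, works over exact rationals via Fraction:

```python
from fractions import Fraction

def fm_eliminate(rows, k=0):
    """One Fourier-Motzkin step: eliminate variable k from a list of
    (coeffs, rhs) pairs, each encoding sum_j coeffs[j]*x_j >= rhs."""
    pos, neg, zero = [], [], []
    for a, b in rows:
        if a[k] > 0:          # scale so the coefficient of x_k becomes +1
            pos.append(([c / a[k] for c in a], b / a[k]))
        elif a[k] < 0:        # scale so the coefficient of x_k becomes -1
            neg.append(([c / -a[k] for c in a], b / -a[k]))
        else:
            zero.append((a, b))
    new = list(zero)
    for ap, bp in pos:        # pair every lower bound with every upper bound
        for an, bn in neg:
            new.append(([x + y for x, y in zip(ap, an)], bp + bn))
    # variable k now has coefficient 0 everywhere; drop that column
    return [([c for j, c in enumerate(a) if j != k], b) for a, b in new]

def fm_feasible(rows):
    """Eliminate every variable; the empty system 0 >= b holds iff each b <= 0."""
    n = len(rows[0][0]) if rows else 0
    for _ in range(n):
        rows = fm_eliminate(rows)
    return all(b <= 0 for _, b in rows)

F = Fraction
# Example 3.1: x1 + x2 - 2x3 >= 2,  -x1 - 3x2 + x3 >= 0,  x2 + x3 >= 1
system = [([F(1), F(1), F(-2)], F(2)),
          ([F(-1), F(-3), F(1)], F(0)),
          ([F(0), F(1), F(1)], F(1))]
print(fm_feasible(system))   # True
```

Running the same function on the system in Exercise 1 below reports infeasibility, matching the hand computation in the Solutions.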

■ Example 3.1 Determine if the following system of inequalities has a solution:

x1 + x2 − 2x3 ≥ 2 (1)
−x1 − 3x2 + x3 ≥ 0 (2)
x2 + x3 ≥ 1 (3)

We first eliminate x1 . The new system is


(1) + (2) : −2x2 − x3 ≥ 2 (4)
x2 + x3 ≥ 1 (3)
We then eliminate x2 . We first normalize the coefficients of x2 :
(1/2) × (4) : −x2 − (1/2)x3 ≥ 1 (5)
x2 + x3 ≥ 1 (3)
So the new system is:
(5) + (3) : (1/2)x3 ≥ 2
So there is a solution. In particular, we can set x3 = 4. Then we must have
x2 = −3 and x1 = 13. ■

Remark. Note that setting x3 to another value larger than 4 will lead to different
solutions to the system. Since there are infinitely many different values that we can
set x3 to, there are infinitely many solutions.

Exercises
1. Use Fourier-Motzkin elimination to determine if there exist x, y, z ∈ R satisfying

x + y + 2z ≥ 1
−x + y + z ≥ 2
x−y+z ≥ 1
−y − 3z ≥ 0.
2. Let a1 , . . . , am ∈ Rn . Let β1 , . . . , βm ∈ R. Let λ1 , . . . , λm ≥ 0. Then the
   inequality (λ1 a1 + · · · + λm am )T x ≥ λ1 β1 + · · · + λm βm is called a nonnegative
   linear combination of the inequalities (ai )T x ≥ βi , i = 1, . . . , m. Show that
   any new inequality created by Fourier-Motzkin Elimination is a nonnegative
   linear combination of the original inequalities.




Solutions

1. We use Fourier-Motzkin elimination to eliminate x. We first copy down the


inequality −y − 3z ≥ 0 and then form one new inequality by adding the first
two inequalities and another by adding the second and third inequalities. The
resulting system is

−y − 3z ≥ 0
2y + 3z ≥ 3
2z ≥ 3.
Note that this system has a solution if and only if the original system does.
We now use Fourier-Motzkin elimination to eliminate y. First we multiply the
second inequality by 1/2 to obtain

−y − 3z ≥ 0
y + (3/2)z ≥ 3/2
2z ≥ 3.
Eliminating y gives

2z ≥ 3
−(3/2)z ≥ 3/2,
or equivalently,

z ≥ 3/2
z ≤ −1,
which clearly has no solution. Hence, there is no x, y, z satisfying the original
system.
2. First of all, observe that a nonnegative linear combination of ≥-inequalities
   that are themselves nonnegative linear combinations of the inequalities in
   Ax ≥ b is again a nonnegative linear combination of inequalities in Ax ≥ b.
   It is easy to see that in Step 1 of Fourier-Motzkin Elimination all inequalities
   are nonnegative linear combinations of the original inequalities. For instance,
   multiplying (ai′ )T x ≥ βi′ by α > 0 is the same as taking the nonnegative
   linear combination (λ1 a1 + · · · + λm am )T x ≥ λ1 β1 + · · · + λm βm with λi = 0
   for all i ̸= i′ and λi′ = α.
In Step 2, new inequalities are formed by adding two inequalities from Step 1.
Hence, they are nonnegative linear combinations of the inequalities from Step 1.
By the observation at the beginning, they are nonnegative linear combinations
of the original system.
Remark. By the observation at the beginning and this result, we see that after
repeated applications of Fourier-Motzkin Elimination, all resulting inequalities
are nonnegative linear combinations of the original inequalities. This is an
important fact that will be exploited later.




3.5 Using an LP solver


Linear programming problems are rarely solved by hand. Problems from industrial
applications often have thousands (and sometimes millions) of variables and con-
straints. Fortunately, there exist a number of commercial as well as open-source
solvers that can handle such large-scale problems. In this chapter, we will take a look
at one of them. Before we do so, we introduce the CPLEX LP format which allows
us to specify linear programming problems in a way close to how we write them.

3.6 CPLEX LP file format


A description of the CPLEX LP file format can be found here.
However, it is perhaps easiest to illustrate with some examples. Consider

min x1 − 2x2 + x3
s.t. x1 − 2x2 + x3 − x4 ≥ 1
x1 + x2 + 2x3 − 2x4 = 3
− x2 + 3x3 − 5x4 ≤ 7
x3 , x4 ≥ 0.

The problem can be specified in LP format as follows:

min x_1 - 2x_2 + x_3


st
x_1 - 2x_2 + x_3 - x_4 >= 1
x_1 + x_2 + 2x_3 - 2x_4 = 3
- x_2 + 3x_3 - 5x_4 <= 7
bounds
x_1 free
x_2 free
end

Any variable that does not appear in the bounds section is automatically assumed
to be nonnegative.
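Because the format is plain, line-oriented text, an LP file is easy to generate from data. The helper below is a small sketch of our own (it handles only preformatted linear expressions and always writes a minimization problem); it reproduces the file above:

```python
def lp_format(objective, constraints, free_vars):
    """Assemble a minimal CPLEX-LP-format string.  `objective` and each
    entry of `constraints` are preformatted strings; variables not listed
    in `free_vars` are left to the format's default lower bound of 0."""
    lines = ["min " + objective, "st"]
    lines += ["  " + c for c in constraints]
    lines.append("bounds")
    lines += ["  " + v + " free" for v in free_vars]
    lines.append("end")
    return "\n".join(lines)

text = lp_format(
    "x_1 - 2x_2 + x_3",
    ["x_1 - 2x_2 + x_3 - x_4 >= 1",
     "x_1 + x_2 + 2x_3 - 2x_4 = 3",
     "- x_2 + 3x_3 - 5x_4 <= 7"],
    ["x_1", "x_2"])
print(text)
# A call like open("eg.lp", "w").write(text) would produce a file a solver can read.
```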

3.7 SoPlex
SoPlex is an open-source linear programming solver. It is free for noncommercial
use. Binaries for Mac OS X and Windows are readily available for download.
One great feature of SoPlex is that it can return exact rational solutions whereas
most other solvers only return solutions as floating-point numbers.
Suppose that the problem for the example at the beginning is saved in LP format
in an ASCII file named eg.lp. The following is the output of running SoPlex in a
macOS command-line terminal:

bash-3.2$ ./soplex-2.2.1.darwin.x86_64.gnu.opt -X --solvemode=2 -f=0 -o=0 eg.lp


SoPlex version 2.2.1 [mode: optimized] [precision: 8 byte] [rational: GMP 6.0.0] [githash: 267a4
Copyright (c) 1996-2016 Konrad-Zuse-Zentrum fuer Informationstechnik Berlin (ZIB)




int:solvemode = 2
real:feastol = 0
real:opttol = 0

Reading (real) LP file <eg.lp> . . .


Reading took 0.00 seconds.

LP has 3 rows 4 columns and 11 nonzeros.

Initial floating-point solve . . .


Simplifier removed 0 rows, 0 columns, 0 nonzeros, 0 col bounds, 0 row bounds
Reduced LP has 3 rows 4 columns 11 nonzeros
Equilibrium scaling LP

type | time | iters | facts | shift |violation | value


L | 0.0 | 0 | 1 | 4.00e+00 | 2.00e+00 | 0.00000000e+00
E | 0.0 | 1 | 2 | 0.00e+00 | 4.00e+00 | 3.00000000e+00
E | 0.0 | 2 | 3 | 0.00e+00 | 0.00e+00 | 1.00000000e+00

Floating-point optimal.
Max. bound violation = 0
Max. row violation = 1/4503599627370496
Max. reduced cost violation = 0
Max. dual violation = 0
Performing rational reconstruction . . .
Tolerances reached.
Solved to optimality.

SoPlex status : problem is solved [optimal]


Solving time (sec) : 0.00
Iterations : 2
Objective value : 1.00000000e+00

Primal solution (name, value):


x_1 7/3
x_2 2/3
All other variables are zero.
bash-3.2$

The option -X asks the solver to display the primal rational solution. The option
--solvemode=2 invokes iterative refinement for solving for a rational solution. The
options -f=0 -o=0 set the primal feasibility and dual feasibility tolerances to 0.
Without these options, one might get only approximate solutions to the problem. If
we remove the last three options and replace -X with -x, we obtain the following
instead:

bash-3.2$ ./soplex-2.2.1.darwin.x86_64.gnu.opt -x eg.lp


SoPlex version 2.2.1 [mode: optimized] [precision: 8 byte] [rational: GMP 6.0.0] [githash: 267a4
Copyright (c) 1996-2016 Konrad-Zuse-Zentrum fuer Informationstechnik Berlin (ZIB)

Reading (real) LP file <eg.lp> . . .


Reading took 0.00 seconds.

LP has 3 rows 4 columns and 11 nonzeros.




Simplifier removed 0 rows, 0 columns, 0 nonzeros, 0 col bounds, 0 row bounds


Reduced LP has 3 rows 4 columns 11 nonzeros
Equilibrium scaling LP
type | time | iters | facts | shift |violation | value
L | 0.0 | 0 | 1 | 4.00e+00 | 2.00e+00 | 0.00000000e+00
E | 0.0 | 1 | 2 | 0.00e+00 | 4.00e+00 | 3.00000000e+00
E | 0.0 | 2 | 3 | 0.00e+00 | 0.00e+00 | 1.00000000e+00
--- transforming basis into original space
L | 0.0 | 0 | 1 | 0.00e+00 | 0.00e+00 | 1.00000000e+00
L | 0.0 | 0 | 1 | 0.00e+00 | 0.00e+00 | 1.00000000e+00

SoPlex status : problem is solved [optimal]


Solving time (sec) : 0.00
Iterations : 2
Objective value : 1.00000000e+00

Primal solution (name, value):


x_1 2.333333333
x_2 0.666666667
All other variables are zero (within 1.0e-16).
bash-3.2$

There are many solver options that one can specify. To view the list of all the
options, simply run the solver without options and arguments.
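The advantage of an exact rational answer is that it can be verified exactly. The sketch below (a quick check of our own) substitutes the solution reported by SoPlex back into the example problem from the beginning of the chapter using Python's Fraction type:

```python
from fractions import Fraction as F

# Rational optimum reported by SoPlex: x1 = 7/3, x2 = 2/3, x3 = x4 = 0
x1, x2, x3, x4 = F(7, 3), F(2, 3), F(0), F(0)

obj = x1 - 2 * x2 + x3                 # objective: x1 - 2x2 + x3
c1 = x1 - 2 * x2 + x3 - x4             # must be >= 1
c2 = x1 + x2 + 2 * x3 - 2 * x4         # must be == 3
c3 = -x2 + 3 * x3 - 5 * x4             # must be <= 7

assert c1 >= 1 and c2 == 3 and c3 <= 7 and x3 >= 0 and x4 >= 0
print(obj)   # 1, exactly -- no floating-point rounding involved
```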

3.8 NEOS server for optimization


If one does not want to download and install SoPlex, one can use the NEOS server
for optimization. In addition to SoPlex, there are many other solvers to choose from.
To solve a linear programming problem using SoPlex on the NEOS server, one
can submit a file in CPLEX LP format here.

Exercises
1. Use SoPlex to obtain the exact optimal value of

min 3x + 2y + 9z
s.t. 53x + 20y + 96z ≥ 2/3
     13x − 7y + 6z ≥ 17
     −x + 71y − 3z ≥ 73
     x , y , z ≥ 0.

Solutions
1. The optimal value is 1882/11679.



4. Simplex Method

4.1 Introduction to the Simplex Algorithm


4.2 Tableau Method
4.3 Pivot Operations and Computational Aspects
4.4 Variants of the Simplex Method
Linear Programs (LPs) with two variables can be solved graphically by plotting the
feasible region along with the level curves of the objective function. We will show
that we can find a point in the feasible region that maximizes the objective function
using the level curves of the objective function.
We will begin with an easy example that is bounded and investigate the structure
of the feasible region. We will then explore other examples.

4.5 Nonempty and Bounded Problem


Consider the problem

max 2X + 5Y
s.t. X + 2Y ≤ 16
5X + 3Y ≤ 45
X, Y ≥ 0

We want to start by plotting the feasible region, that is, the set points (X, Y )
that satisfy all the constraints.
We can plot this by first plotting the four lines
• X + 2Y = 16
• 5X + 3Y = 45
• X =0
• Y =0




and then shading in the side of the space cut out by the corresponding inequality.

The resulting feasible region can then be shaded in as the region that satisfies
all the inequalities.

Notice that the feasible region is nonempty (it has points that satisfy all the
inequalities) and also that it is bounded (the feasible points don’t continue infinitely
in any direction).
We want to identify the extreme points (i.e., the corners) of the feasible region.
Understanding these points will be critical to understanding the optimal solutions of
the model. Notice that all extreme points can be computed by finding the intersection
of 2 of the lines. But! Not all intersections of any two lines are feasible.
We will later use the terminology basic feasible solution for an extreme point of
the feasible region, and basic solution for a point that is the intersection of 2 lines
but need not satisfy all the constraints (and hence may be infeasible).
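The corner search suggested here can be automated: intersect each pair of constraint lines to get the basic solutions, keep the feasible ones, and evaluate the objective there. A small sketch of this idea for the problem above (our own illustration, not an algorithm from the text):

```python
from itertools import combinations

# Constraints of the example, all written as a*X + b*Y <= c.
cons = [(1, 2, 16),      # X + 2Y <= 16
        (5, 3, 45),      # 5X + 3Y <= 45
        (-1, 0, 0),      # -X <= 0, i.e. X >= 0
        (0, -1, 0)]      # -Y <= 0, i.e. Y >= 0

def intersections(cons):
    """Basic solutions: intersection of each pair of constraint lines."""
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det:                               # skip parallel lines
            yield ((c1 * b2 - c2 * b1) / det,   # Cramer's rule
                   (a1 * c2 - a2 * c1) / det)

def is_feasible(p, eps=1e-9):
    return all(a * p[0] + b * p[1] <= c + eps for a, b, c in cons)

# Basic *feasible* solutions are the extreme points; the optimum is at one of them.
bfs = [p for p in intersections(cons) if is_feasible(p)]
best = max(bfs, key=lambda p: 2 * p[0] + 5 * p[1])
print(best, 2 * best[0] + 5 * best[1])   # (0.0, 8.0) 40.0
```

Note that the intersections (16, 0) and (0, 15) are basic solutions that get discarded as infeasible, exactly the distinction drawn above.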




Theorem 4.1 — Optimal Extreme Point. If the feasible region is nonempty and
bounded, then there exists an optimal solution at an extreme point of the feasible
region.

We will explore why this theorem is true, and also what happens when the feasible
region does not satisfy the assumptions of either nonempty or bounded. We illustrate
the idea first using the problem from Example 2.30.




[Figure: the feasible region defined by 3x1 + x2 ≤ 120, x1 + 2x2 ≤ 160, x1 ≤ 35,
x1 ≥ 0, x2 ≥ 0, together with the gradient ∇(7x1 + 6x2 ) and the optimal point
(x∗1 , x∗2 ) = (16, 72).]

Figure 4.1: Feasible Region and Level Curves of the Objective Function: The shaded
region in the plot is the feasible region and represents the intersection of the five
inequalities constraining the values of x1 and x2 . On the right, we see the optimal
solution is the “last” point in the feasible region that intersects a level set as we
move in the direction of increasing profit.

■ Example 4.1 — Continuation of Example 2.30. Let’s continue the example of the
Toy Maker begun in Example 2.30. Solve this problem graphically. ■

Solution: To solve the linear programming problem graphically, begin by drawing


the feasible region. This is shown in the blue shaded region of Figure 4.1.
After plotting the feasible region, the next step is to plot the level curves of the
objective function. In our problem, the level sets will have the form:

7x1 + 6x2 = c =⇒ x2 = −(7/6)x1 + c/6




This is a set of parallel lines with slope −7/6 and intercept c/6 where c can be varied
as needed. The level curves for various values of c are parallel lines. In Figure 4.1
they are shown in colors ranging from red to yellow depending upon the value of c.
Larger values of c are more yellow.
To solve the linear programming problem, follow the level sets along the gradient
(shown as the black arrow) until the last level set (line) intersects the feasible region.
If you are doing this by hand, you can draw a single line of the form 7x1 + 6x2 = c
and then simply draw parallel lines in the direction of the gradient (7, 6). At some
point, these lines will fail to intersect the feasible region. The last line to intersect
the feasible region will do so at a point that maximizes the profit. In this case,
the point that maximizes z(x1 , x2 ) = 7x1 + 6x2 , subject to the constraints given, is
(x∗1 , x∗2 ) = (16, 72).




Note the point of optimality (x∗1 , x∗2 ) = (16, 72) is at a corner of the feasible
region. This corner is formed by the intersection of the two lines: 3x1 + x2 = 120 and
x1 + 2x2 = 160. In this case, the constraints

3x1 + x2 ≤ 120
x1 + 2x2 ≤ 160

are both binding, while the other constraints are non-binding. In general, we will
see that when an optimal solution to a linear programming problem exists, it will
always be at the intersection of several binding constraints; that is, it will occur at a
corner of a higher-dimensional polyhedron.
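The binding-constraint claim at (x∗1 , x∗2 ) = (16, 72) is easy to confirm arithmetically; a quick check:

```python
x1, x2 = 16, 72

assert 3 * x1 + x2 == 120    # binding: holds with equality
assert x1 + 2 * x2 == 160    # binding: holds with equality
assert x1 < 35               # non-binding: strict inequality
assert x1 > 0 and x2 > 0     # non-binding

print(7 * x1 + 6 * x2)       # 544, the optimal objective value
```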
We can now define an algorithm for identifying the solution to a linear programming
problem in two variables with a bounded feasible region (see Algorithm 1):

Algorithm 1 Algorithm for Solving a Two Variable Linear Programming Problem


Graphically–Bounded Feasible Region, Unique Solution Case

Algorithm for Solving a Linear Programming Problem Graphically


Bounded Feasible Region, Unique Solution
1. Plot the feasible region defined by the constraints.
2. Plot the level sets of the objective function.
3. For a maximization problem, identify the level set corresponding to the greatest
(least, for minimization) objective function value that intersects the feasible
region. This point will be at a corner.
4. The point on the corner intersecting the greatest (least) level set is a solution
to the linear programming problem.

The example linear programming problem presented in the previous section has
a single optimal solution. In general, the following outcomes can occur in solving a
linear programming problem:
1. The linear programming problem has a unique solution. (We’ve already seen
this.)
2. There are infinitely many alternative optimal solutions.
3. There is no solution and the problem’s objective function can grow to posi-
tive infinity for maximization problems (or negative infinity for minimization
problems).
4. There is no solution to the problem at all.
Case 3 above can only occur when the feasible region is unbounded; that is, it
cannot be surrounded by a ball with finite radius. We will illustrate each of these
possible outcomes in the next four sections. We will prove that this is true in a later
chapter.

4.6 Infinitely Many Optimal Solutions


It can happen that there is more than one solution. In fact, in this case, there are
infinitely many optimal solutions. We’ll study a specific linear programming problem




with an infinite number of solutions by modifying the objective function in Example


2.30.

■ Example 4.2 — Toy Maker Alternative Solutions. Suppose the toy maker in
Example 2.30 finds that it can sell planes for a profit of $18 each instead of $7 each.
The new linear programming problem becomes:

max z(x1 , x2 ) = 18x1 + 6x2
s.t.  3x1 + x2 ≤ 120
      x1 + 2x2 ≤ 160
      x1 ≤ 35                                   (4.1)
      x1 ≥ 0
      x2 ≥ 0

Solution: Applying our graphical method for finding optimal solutions to linear
programming problems yields the plot shown in Figure 4.2. The level curves for the
function z(x1 , x2 ) = 18x1 + 6x2 are parallel to one face of the polygon boundary of
the feasible region. Hence, as we move further up and to the right in the direction
of the gradient (corresponding to larger and larger values of z(x1 , x2 )) we see that
there is not one point on the boundary of the feasible region that intersects that
level set with greatest value, but instead a side of the polygon boundary described
by the line 3x1 + x2 = 120 where x1 ∈ [16, 35]. Let:

S = {(x1 , x2 )|3x1 + x2 ≤ 120, x1 + 2x2 ≤ 160, x1 ≤ 35, x1 , x2 ≥ 0}

that is, S is the feasible region of the problem. Then for any value of x∗1 ∈ [16, 35]
and any value x∗2 so that 3x∗1 + x∗2 = 120, we will have z(x∗1 , x∗2 ) ≥ z(x1 , x2 ) for all
(x1 , x2 ) ∈ S. Since there are infinitely many values that x1 and x2 may take on, we
see this problem has an infinite number of alternative optimal solutions.
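The constancy of the objective along this edge can also be checked algebraically: substituting x2 = 120 − 3x1 gives z = 18x1 + 6(120 − 3x1 ) = 720 for every x1 . A short numerical confirmation:

```python
# Points on the edge 3x1 + x2 = 120 with x1 in [16, 35] are all feasible
# and all attain the same objective value, 720.
for x1 in (16, 20, 27.5, 35):
    x2 = 120 - 3 * x1
    assert x1 + 2 * x2 <= 160 and x1 <= 35 and x2 >= 0   # feasibility
    assert 18 * x1 + 6 * x2 == 720                       # same objective value
print("every point on the edge attains z = 720")
```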




[Figure annotation: every point on this line is an alternative optimal solution.]

Figure 4.2: An example of infinitely many alternative optimal solutions in a linear


programming problem. The level curves for z(x1 , x2 ) = 18x1 + 6x2 are parallel to one
face of the polygon boundary of the feasible region. Moreover, this side contains the
points of greatest value for z(x1 , x2 ) inside the feasible region. Any combination of
(x1 , x2 ) on the line 3x1 + x2 = 120 for x1 ∈ [16, 35] will provide the largest possible
value z(x1 , x2 ) can take in the feasible region S.

Exercise 4.1 Use the graphical method for solving linear programming problems
to solve the linear programming problem you defined in Exercise 2.9. ■

Based on the example in this section, we can modify our algorithm for finding
the solution to a linear programming problem graphically to deal with situations
with an infinite set of alternative optimal solutions (see Algorithm 2):




Algorithm 2 Algorithm for Solving a Two Variable Linear Programming Problem


Graphically–Bounded Feasible Region Case

Algorithm for Solving a Linear Programming Problem Graphically


Bounded Feasible Region
1. Plot the feasible region defined by the constraints.
2. Plot the level sets of the objective function.
3. For a maximization problem, identify the level set corresponding to the greatest
(least, for minimization) objective function value that intersects the feasible
region. This point will be at a corner.
4. The point on the corner intersecting the greatest (least) level set is a solution
to the linear programming problem.
5. If the level set corresponding to the greatest (least) objective
function value is parallel to a side of the polygon boundary next
to the corner identified, then there are infinitely many alternative
optimal solutions and any point on this side may be chosen as an
optimal solution.

Exercise 4.2 Modify the linear programming problem from Exercise 2.9 to obtain
a linear programming problem with an infinite number of alternative optimal
solutions. Solve the new problem and obtain a description for the set of alternative
optimal solutions. [Hint: Just as in the example, x1 will be bound between two
values corresponding to a side of the polygon. Find those values and the constraint
that is binding. This will provide you with a description of the form: for any
x∗1 ∈ [a, b], with x∗2 chosen so that cx∗1 + dx∗2 = v, the point (x∗1 , x∗2 ) is an alternative
optimal solution to the problem. Now you fill in values for a, b, c, d and v.] ■

4.7 Problems with No Solution


Recall that for any mathematical programming problem, the feasible set or region is
simply a subset of Rn . If this region is empty, then there is no solution to the
mathematical programming problem and the problem is said to be overconstrained.
In this case, we say that the problem is infeasible. We illustrate this case for linear
programming problems with the following example.

■ Example 4.3 — Infeasible Problem. Consider the following linear programming
problem:

max z(x1 , x2 ) = 3x1 + 2x2
s.t.  (1/40)x1 + (1/60)x2 ≤ 1
      (1/50)x1 + (1/50)x2 ≤ 1                   (4.2)
      x1 ≥ 30
      x2 ≥ 20




Solution: The level sets of the objective and the constraints are shown in Figure
4.3.

Figure 4.3: A Linear Programming Problem with no solution. The feasible region of
the linear programming problem is empty; that is, there are no values for x1 and x2
that can simultaneously satisfy all the constraints. Thus, no solution exists.

That the feasible region is empty is shown in Figure 4.3 by the absence of any blue
region–i.e., all the regions are gray, indicating that the constraints cannot be
satisfied simultaneously.
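This infeasibility can also be verified without a plot: under x1 ≥ 30 and x2 ≥ 20, the smallest value the left-hand side of the first constraint can take already exceeds 1. A one-line check:

```python
# Smallest possible value of (1/40)x1 + (1/60)x2 given x1 >= 30 and x2 >= 20:
lhs_min = 30 / 40 + 20 / 60
assert lhs_min > 1        # so (1/40)x1 + (1/60)x2 <= 1 can never hold
print(round(lhs_min, 4))  # 1.0833
```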

Based on this example, we can modify our previous algorithm for finding the solution
to linear programming problems graphically (see Algorithm 3):




Algorithm 3 Algorithm for Solving a Two Variable Linear Programming Problem


Graphically–Bounded Feasible Region Case

Algorithm for Solving a Linear Programming Problem Graphically


Bounded Feasible Region
1. Plot the feasible region defined by the constraints.
2. If the feasible region is empty, then no solution exists.
3. Plot the level sets of the objective function.
4. For a maximization problem, identify the level set corresponding to the greatest
(least, for minimization) objective function value that intersects the feasible
region. This point will be at a corner.
5. The point on the corner intersecting the greatest (least) level set is a solution
to the linear programming problem.
6. If the level set corresponding to the greatest (least) objective
function value is parallel to a side of the polygon boundary next
to the corner identified, then there are infinitely many alternative
optimal solutions and any point on this side may be chosen as an
optimal solution.

4.8 Problems with Unbounded Feasible Regions


Consider the problem

min Z = 5X + 7Y
s.t. X + 3Y ≥ 6
5X + 2Y ≥ 10
Y ≤4
X, Y ≥ 0




As you can see, the feasible region is unbounded. In particular, from any point in
the feasible region, one can always find another feasible point by increasing the X
coordinate (i.e., move to the right in the picture). However, this does not necessarily
mean that the optimization problem is unbounded.
Indeed, the optimal solution is at B, the extreme point in the lower left-hand
corner.

Consider, however, a different problem where we try to maximize the objective

max Z = 5X + 7Y
s.t. X + 3Y ≥ 6
5X + 2Y ≥ 10
Y ≤4
X, Y ≥ 0

Solution: This optimization problem is unbounded! For example, notice that
the point (X, Y ) = (n, 0) is feasible for all n = 6, 7, 8, . . . (smaller values of n violate
X + 3Y ≥ 6). Then the objective function Z = 5n + 0 follows the sequence
30, 35, 40, . . . , which diverges to infinity.
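The divergence along the ray (X, Y ) = (n, 0) can be verified directly; for n ≥ 6 the point satisfies every constraint of the maximization problem, and the objective grows without bound (a small check of our own):

```python
def feasible_point(X, Y):
    return X + 3*Y >= 6 and 5*X + 2*Y >= 10 and Y <= 4 and X >= 0 and Y >= 0

vals = []
for n in (6, 7, 8, 100):
    assert feasible_point(n, 0)      # the ray stays inside the feasible region
    vals.append(5 * n + 7 * 0)       # objective value at (n, 0)
print(vals)   # [30, 35, 40, 500]
```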
Again, we’ll tackle the issue of linear programming problems with unbounded
feasible regions by illustrating the possible outcomes using examples.

■ Example 4.4 Consider the linear programming problem below:


max z(x1 , x2 ) = 2x1 − x2
s.t.  x1 − x2 ≤ 1
      2x1 + x2 ≥ 6                              (4.3)
      x1 , x2 ≥ 0

Solution: The feasible region and level curves of the objective function are shown
in Figure 4.4.




[Figure: the constraint lines x1 − x2 = 1 and 2x1 + x2 = 6 and the gradient
∇z(x1 , x2 ) = (2, −1).]

Figure 4.4: A Linear Programming Problem with Unbounded Feasible Region: Note
that we can continue to make level curves of z(x1 , x2 ) corresponding to larger and
larger values as we move down and to the right. These curves will continue to
intersect the feasible region for any value of v = z(x1 , x2 ) we choose. Thus, we can
make z(x1 , x2 ) as large as we want and still find a point in the feasible region that will
provide this value. Hence, the optimal value of z(x1 , x2 ) subject to the constraints is
+∞. That is, the problem is unbounded.

The feasible region in Figure 4.4 is clearly unbounded since it stretches upward
along the x2 axis infinitely far and also stretches rightward along the x1 axis infinitely
far, bounded below by the line x1 − x2 = 1. There is no way to enclose this region by
a disk of finite radius, hence the feasible region is not bounded.
We can draw more level curves of z(x1 , x2 ) in the direction of increase (down
and to the right) as long as we wish. There will always be an intersection point
with the feasible region because it is infinite. That is, these curves will continue to
intersect the feasible region for any value of v = z(x1 , x2 ) we choose. Thus, we can
make z(x1 , x2 ) as large as we want and still find a point in the feasible region that
will provide this value. Hence, the largest value z(x1 , x2 ) can take when (x1 , x2 ) are
in the feasible region is +∞. That is, the problem is unbounded.
Just because a linear programming problem has an unbounded feasible region
does not imply that there is not a finite solution. We illustrate this case by modifying
example .

■ Example 4.5 — Continuation of Example . Consider the linear programming
problem from Example with the new objective




function: z(x1 , x2 ) = (1/2)x1 − x2 . Then we have the new problem:

max z(x1 , x2 ) = (1/2)x1 − x2
s.t.  x1 − x2 ≤ 1
      2x1 + x2 ≥ 6                              (4.4)
      x1 , x2 ≥ 0

Solution: The feasible region, level sets of z(x1 , x2 ) and gradients are shown
in Figure 4.5. In this case note, that the direction of increase of the objective
function is away from the direction in which the feasible region is unbounded (i.e.,
downward). As a result, the point in the feasible region with the largest z(x1 , x2 )
value is (7/3, 4/3). Again this is a vertex: the binding constraints are x1 − x2 = 1
and 2x1 + x2 = 6 and the solution occurs at the point these two lines intersect.
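The vertex (7/3, 4/3) and the optimal value can be confirmed exactly with rational arithmetic (a quick check of the text's computation):

```python
from fractions import Fraction as F

x1, x2 = F(7, 3), F(4, 3)
assert x1 - x2 == 1          # first constraint binding
assert 2 * x1 + x2 == 6      # second constraint binding

z = F(1, 2) * x1 - x2        # objective (1/2)x1 - x2
print(z)                     # -1/6
```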

[Figure: the constraint lines x1 − x2 = 1 and 2x1 + x2 = 6, the gradients
∇z(x1 , x2 ) = (2, −1) and ∇z(x1 , x2 ) = (1/2, −1), and the optimal point (7/3, 4/3).]

Figure 4.5: A Linear Programming Problem with Unbounded Feasible Region and
Finite Solution: In this problem, the level curves of z(x1 , x2 ) increase in a more
“southerly” direction than in Example –that is, away from the direction in which
the feasible region increases without bound. The point in the feasible region with
largest z(x1 , x2 ) value is (7/3, 4/3). Note again, this is a vertex.

Based on these two examples, we can modify our algorithm for graphically solving
two-variable linear programming problems to deal with the case when the feasible
region is unbounded.




Algorithm 4 Algorithm for Solving a Linear Programming Problem Graphically–


Bounded and Unbounded Case

Algorithm for Solving a Two Variable Linear Programming Problem Graphically


1. Plot the feasible region defined by the constraints.
2. If the feasible region is empty, then no solution exists.
3. If the feasible region is unbounded goto Line 8. Otherwise, Goto Line 4.
4. Plot the level sets of the objective function.
5. For a maximization problem, identify the level set corresponding to the greatest
(least, for minimization) objective function value that intersects the feasible
region. This point will be at a corner.
6. The point on the corner intersecting the greatest (least) level set is a solution
to the linear programming problem.
7. If the level set corresponding to the greatest (least) objective
function value is parallel to a side of the polygon boundary next
to the corner identified, then there are infinitely many alternative
optimal solutions and any point on this side may be chosen as an
optimal solution.
8. (The feasible region is unbounded): Plot the level sets of the objective
function.
9. If the level sets intersect the feasible region at larger and larger objective
values (smaller and smaller for a minimization problem), then the problem is
unbounded and the solution is +∞ (−∞ for minimization problems).
10. Otherwise, identify the level set corresponding to the greatest (least, for
minimization) objective function value that intersects the feasible region. This
point will be at a corner.
11. The point on the corner intersecting the greatest (least) level set is a solution
to the linear programming problem. If the level set corresponding to
the greatest (least) objective function value is parallel to a side of
the polygon boundary next to the corner identified, then there are
infinitely many alternative optimal solutions and any point on this
side may be chosen as an optimal solution.

Exercise 4.3 Does the following problem have a bounded solution? Why?

min z(x1 , x2 ) = 2x1 − x2
s.t.  x1 − x2 ≤ 1
      2x1 + x2 ≥ 6                              (4.5)
      x1 , x2 ≥ 0

[Hint: Use Figure 4.5 and Algorithm 4.] ■

Exercise 4.4 Modify the objective function in Example or Example to produce a


problem with an infinite number of solutions. ■




Exercise 4.5 Modify the objective function in Exercise 4.3 to produce a mini-
mization problem that has a finite solution. Draw the feasible region and level
curves of the objective to “prove” your example works.
[Hint: Think about what direction of increase is required for the level sets of
z(x1 , x2 ) (or find a trick using Exercise 2.4).] ■

4.9 Formal Mathematical Statements


Vectors and Linear and Convex Combinations

Vectors: A vector in Rn has n elements and represents a point (or an arrow from
the origin to the point, denoting a direction) in Rn (Euclidean or real space).
Vectors can be expressed as either row or column vectors.
Vector Addition: Two vectors of the same size can be added, componentwise, e.g.,
for vectors a = (2, 3) and b = (3, 2), a + b = (2 + 3, 3 + 2) = (5, 5).
Scalar Multiplication: A vector can be multiplied by a scalar k (constant) component-
wise. If k > 0 then this does not change the direction represented by the vector,
it just scales the vector.
Inner or Dot Product: Two vectors of the same size can be multiplied to produce
a real number. For example, for a = (2, 3) and b = (3, 2), a · b = 2 · 3 + 3 · 2 = 12.
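These componentwise operations can be sketched in a few lines of plain Python (a small illustrative sketch, not tied to any particular library):

```python
# Vector operations on small tuples: addition, scaling, and dot product.
a = (2, 3)
b = (3, 2)

def vadd(u, v):
    # componentwise vector addition
    return tuple(ui + vi for ui, vi in zip(u, v))

def vscale(k, u):
    # scalar multiplication: for k > 0 the direction is unchanged
    return tuple(k * ui for ui in u)

def vdot(u, v):
    # inner (dot) product: a single real number
    return sum(ui * vi for ui, vi in zip(u, v))

print(vadd(a, b))    # (5, 5)
print(vscale(2, a))  # (4, 6)
print(vdot(a, b))    # 12
```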

Linear Combination: The vector b is a linear combination of a1 , a2 , · · · , ak if
b = λ1 a1 + λ2 a2 + · · · + λk ak for λ1 , λ2 , · · · , λk ∈ R. If λ1 , λ2 , · · · , λk ∈ R≥0 then b
is a non-negative linear combination of a1 , a2 , · · · , ak .

Convex Combination: The vector b is a convex combination of a1 , a2 , · · · , ak
if b = λ1 a1 + · · · + λk ak for λ1 , λ2 , · · · , λk ∈ R≥0 and λ1 + λ2 + · · · + λk = 1. For
example, any convex combination of two points will lie on the line segment between
the points.
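The segment claim is easy to check numerically; the two points below are just illustrative values (plain-Python sketch):

```python
# lam*p + (1-lam)*q sweeps the segment from q (lam=0) to p (lam=1).
def convex_comb(lam, p, q):
    return tuple(lam * pi + (1 - lam) * qi for pi, qi in zip(p, q))

p, q = (0.0, 0.0), (4.0, 2.0)
print(convex_comb(0.0, p, q))  # (4.0, 2.0) -- endpoint q
print(convex_comb(0.5, p, q))  # (2.0, 1.0) -- midpoint of the segment
print(convex_comb(1.0, p, q))  # (0.0, 0.0) -- endpoint p
```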

Linear Independence: Vectors a1 , a2 , · · · , ak are linearly independent if
λ1 a1 + λ2 a2 + · · · + λk ak = 0 implies that λi = 0, i = 1, 2, · · · , k. In R2
two vectors are linearly dependent only if they lie on the same line. Can you have
three linearly independent vectors in R2 ?

Spanning Set: Vectors a1 , a2 , · · · , ak span Rm if any vector in Rm can be
represented as a linear combination of a1 , a2 , · · · , ak , i.e., λ1 a1 + · · · + λk ak can
represent any vector in Rm .

Basis: Vectors a1 , a2 , · · · , ak form a basis of Rm if they span Rm and any smaller


subset of these vectors does not span Rm . Vectors a1 , a2 , · · · , ak can only form a
basis of Rm if k = m and they are linearly independent.


Convex and Polyhedral Sets

Convex Set: Set S in Rn is a convex set if the line segment joining any pair of points
a1 and a2 in S is completely contained in S, that is, λa1 + (1 − λ)a2 ∈ S, ∀λ ∈ [0, 1].

Hyperplanes and Half-Spaces: A hyperplane in Rn divides Rn into 2 half-


spaces (like a line does in R2 ). A hyperplane is the set {x : px = k}, where p is
the gradient to the hyperplane (i.e., the coefficients of our linear expression). The
corresponding half-spaces are the sets of points {x : px ≥ k} and {x : px ≤ k}.

Polyhedral Set: A polyhedral set (or polyhedron) is the set of points in the
intersection of a finite set of half-spaces. Set S = {x : Ax ≤ b, x ≥ 0}, where A is an
m × n matrix, x is an n-vector, and b is an m-vector, is a polyhedral set defined by
m + n hyperplanes (i.e., the intersection of m + n half-spaces).
• Polyhedral sets are convex.
• A polytope is a bounded polyhedral set.
• A polyhedral cone is a polyhedral set where the hyperplanes (that define the
half-spaces) pass through the origin, thus C = {x : Ax ≤ 0} is a polyhedral
cone.
Edges and Faces: An edge of a polyhedral set S is defined by n − 1 hyperplanes,
and a face of S by one or more defining hyperplanes of S, thus an extreme point
and an edge are faces (an extreme point is a zero-dimensional face and an edge a
one-dimensional face). In R2 faces are only edges and extreme points, but in R3
there is a third type of face, and so on...

Extreme Points: x ∈ S is an extreme point of S if:


Definition 1: x is not a convex combination of two other points in S, that is, all line
segments that are completely in S that contain x must have x as an endpoint.
Definition 2: x lies on n linearly independent defining hyperplanes of S.
If more than n hyperplanes pass through an extreme point then it is a degenerate
extreme point, and the polyhedral set is considered degenerate. This just adds a bit
of complexity to the algorithms we will study, but it is quite common.

Unbounded Sets:

Rays: A ray in Rn is the set of points {x0 + λd : λ ≥ 0}, where x0 is the


vertex and d is the direction of the ray.

Convex Cone: A convex cone is a convex set that consists of rays emanating
from the origin. A convex cone is completely specified by its extreme directions. If C
is a convex cone, then for any x ∈ C we have λx ∈ C, λ ≥ 0.

Unbounded Polyhedral Sets: If S is unbounded, it will have directions. d is
a direction of S only if A(x + λd) ≤ b and x + λd ≥ 0 for all λ ≥ 0 and all x ∈ S. In
other words, consider the ray {x0 + λd : λ ≥ 0} in Rn , where x0 is the vertex and d
is the direction of the ray. d ̸= 0 is a direction of set S if for each x0 in S the ray
{x0 + λd : λ ≥ 0} also belongs to S.


Extreme Directions: An extreme direction of S is a direction that cannot be
represented as a positive linear combination of other directions of S. A non-negative
linear combination of extreme directions can be used to represent all other directions
of S. A polyhedral cone is completely specified by its extreme directions.

Let’s define a procedure for finding the extreme directions, using the following
LP’s feasible region. Graphically, we can see that the extreme directions should
follow the s1 = 0 (red) line and the s3 = 0 (orange) line.

max z = −5x1 − x2
s.t. x1 − 4x2 + s1 = 0
     − x1 + x2 + s2 = 1
     − x1 + 2x2 + s3 = 4
     x1 , x2 , s1 , s2 , s3 ≥ 0.

(Figure: the feasible region in the (x1 , x2 ) plane, bounded by the lines s1 = 0,
s2 = 0, and s3 = 0.)

E.g., consider the s3 = 0 (orange) line: to find the extreme direction, start at


extreme point (2,3) and find another feasible point on the orange line, say (4,4) and
subtract (2,3) from (4,4), which yields (2,1).

This is related to the slope in two dimensions: as discussed in class, the rise is 1
and the run is 2, so this direction has a slope of 1/2. But this does not carry over
easily to higher dimensions, where directions cannot be defined by a single number.

To find the extreme directions we can change the right-hand-side to b = 0, which
forms a polyhedral cone (in yellow), and then add the constraint x1 + x2 = 1. The
intersection of the cone and x1 + x2 = 1 forms a line segment.

max z = −5x1 − x2
s.t. x1 − 4x2 + s1 = 0
     − x1 + x2 + s2 = 0
     − x1 + 2x2 + s3 = 0
     x1 + x2 = 1
     x1 , x2 , s1 , s2 , s3 ≥ 0.

(Figure: the polyhedral cone formed by the lines s1 = 0, s2 = 0, and s3 = 0
through the origin, intersected with the line x1 + x2 = 1.)

Magnifying for clarity, and removing the s2 = 0 (teal) line, as it is redundant,


and marking the extreme points of the new feasible region, (4/5, 1/5) and (2/3, 1/3),
with red boxes, we have:

(Figure: magnified view of the segment of x1 + x2 = 1 between the s1 = 0 and
s3 = 0 lines, with the extreme points of the new feasible region marked.)

The extreme directions are thus (4/5, 1/5) and (2/3, 1/3).
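The same normalization can be checked numerically: scaling each direction so its components sum to 1 lands it on the line x1 + x2 = 1 (plain-Python sketch; the direction vectors come from the procedure above):

```python
# Scale a direction so its components sum to 1, i.e. intersect the ray
# from the origin with the line x1 + x2 = 1.
def on_unit_sum_line(d):
    s = sum(d)
    return tuple(di / s for di in d)

d_s3 = (2, 1)  # direction along s3 = 0, computed above as (4,4) - (2,3)
d_s1 = (4, 1)  # direction along s1 = 0, the line x1 - 4x2 = 0
print(on_unit_sum_line(d_s3))  # (2/3, 1/3)
print(on_unit_sum_line(d_s1))  # (4/5, 1/5)
```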

Representation Theorem: Let x1 , x2 , · · · , xk be the set of extreme points of
S, and if S is unbounded, let d1 , d2 , · · · , dl be the set of extreme directions. Then
any x ∈ S is equal to a convex combination of the extreme points plus a non-negative
linear combination of the extreme directions: x = λ1 x1 + · · · + λk xk + µ1 d1 + · · · + µl dl ,
where λ1 + · · · + λk = 1, λj ≥ 0, ∀j = 1, 2, · · · , k, and µj ≥ 0, ∀j = 1, 2, · · · , l.

max z = −5x1 − x2
s.t. x1 − 4x2 + s1 = 0
     − x1 + x2 + s2 = 1
     − x1 + 2x2 + s3 = 4
     x1 , x2 , s1 , s2 , s3 ≥ 0.

(Figure: the feasible region of this LP, with extreme points (0, 0), (0, 1), and (2, 3).)
Represent the point (1/2, 1) as a convex combination of the extreme points of the above
LP. Find λs to solve the following system of equations:

λ1 (0, 0)T + λ2 (0, 1)T + λ3 (2, 3)T = (1/2, 1)T
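A minimal numerical check (assuming numpy is available): augmenting the two coordinate equations with the convexity condition λ1 + λ2 + λ3 = 1 gives a square system we can solve directly.

```python
import numpy as np

# Columns are the extreme points (0,0), (0,1), (2,3); the last row
# enforces that the weights sum to 1 (convexity).
M = np.array([[0.0, 0.0, 2.0],   # x1-coordinates of the extreme points
              [0.0, 1.0, 3.0],   # x2-coordinates of the extreme points
              [1.0, 1.0, 1.0]])  # convexity: weights sum to 1
rhs = np.array([0.5, 1.0, 1.0])
lam = np.linalg.solve(M, rhs)
print(lam)  # approximately [0.5, 0.25, 0.25]
```

Since all three weights are nonnegative and sum to 1, (1/2, 1) is indeed a convex combination of the extreme points.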



5. Duality Theory

5.1 Duality Concepts


5.2 Primal-Dual Relationships
5.3 Economic Interpretation of Duality
5.4 Dual Simplex Method
In this section, we will review matrix concepts critical for the understanding
of general linear programming algorithms.
Let x and y be two vectors in Rn . Recall we denote the dot product of the two
vectors as x · y.

5.5 Matrices
Recall an m × n matrix is a rectangular array of numbers, usually drawn from a field
such as R. We write an m × n matrix with values in R as A ∈ Rm×n . The matrix
consists of m rows and n columns. The element in the ith row and j th column of
A is written as Aij . The j th column of A can be written as A·j , where the · is
interpreted as ranging over every value of i (from 1 to m). Similarly, the ith row of
A can be written as Ai· . When m = n, then the matrix A is called square.

Definition 5.1 — Matrix Addition. If A and B are both in Rm×n , then C = A + B


is the matrix sum of A and B and

Cij = Aij + Bij for i = 1, . . . , m and j = 1, . . . , n (5.1)


■ Example 5.1

[1 2]   [5 6]   [1+5 2+6]   [ 6  8]
[3 4] + [7 8] = [3+7 4+8] = [10 12]     (5.2)

Definition 5.2 — Row/Column Vector. A 1 × n matrix is called a row vector,
and an m × 1 matrix is called a column vector. For the remainder of these notes,
every vector will be thought of as a column vector unless otherwise noted.

It should be clear that any row of matrix A could be considered a row vector in
Rn and any column of A could be considered a column vector in Rm .

Definition 5.3 — Matrix Multiplication. If A ∈ Rm×n and B ∈ Rn×p , then C = AB


is the matrix product of A and B and

Cij = Ai· · B·j (5.3)

Note, Ai· ∈ R1×n (an n-dimensional vector) and B·j ∈ Rn×1 (another n-dimensional
vector), thus making the dot product meaningful.

■ Example 5.2

[1 2] [5 6]   [1(5) + 2(7)  1(6) + 2(8)]   [19 22]
[3 4] [7 8] = [3(5) + 4(7)  3(6) + 4(8)] = [43 50]     (5.4)

Definition 5.4 — Matrix Transpose. If A ∈ Rm×n is an m × n matrix, then the
transpose of A, denoted AT , is an n × m matrix defined as:

ATij = Aji     (5.5)

■ Example 5.3

[1 2]T   [1 3]
[3 4]  = [2 4]     (5.6)
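The examples above can be reproduced with numpy (an assumption here; any matrix library behaves the same way), which also illustrates Exercise 5.2’s claim that matrix multiplication is not commutative:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A + B)   # [[ 6  8] [10 12]]  -- Example 5.1
print(A @ B)   # [[19 22] [43 50]]  -- Example 5.2
print(A.T)     # [[1 3] [2 4]]      -- Example 5.3
# Matrix multiplication is not commutative (Exercise 5.2):
print(np.array_equal(A @ B, B @ A))  # False
```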

The matrix transpose is a particularly useful operation and makes it easy to


transform column vectors into row vectors, which enables multiplication. For example,


suppose x is an n × 1 column vector (i.e., x is a vector in Rn ) and suppose y is an


n × 1 column vector. Then:
x · y = xT y (5.7)

Exercise 5.1 Let A, B ∈ Rm×n . Use the definitions of matrix addition and transpose
to prove that:

(A + B)T = AT + BT (5.8)

[Hint: If C = A + B, then Cij = Aij + Bij , the element in the (i, j) position of
matrix C. This element moves to the (j, i) position in the transpose. The (j, i)
position of AT + BT is ATji + BTji , but ATji = Aij . Reason from this point.] ■

Exercise 5.2 Let A, B ∈ Rm×n . Prove by example that AB ̸= BA; that is, matrix
multiplication is not commutative. [Hint: Almost any pair of matrices you pick
(that can be multiplied) will not commute.] ■

Exercise 5.3 Let A ∈ Rm×n and let, B ∈ Rn×p . Use the definitions of matrix
multiplication and transpose to prove that:

(AB)T = BT AT (5.9)

[Hint: Use similar reasoning to the hint in Exercise 5.1. But this time, note that
Cij = Ai· · B·j , which moves to the (j, i) position. Now figure out what is in the
(j, i) position of BT AT .] ■

Let A and B be two matrices with the same number of rows (so A ∈ Rm×n and
B ∈ Rm×p ). Then the augmented matrix [A|B] is:

[a11 a12 · · · a1n  b11 b12 · · · b1p]
[a21 a22 · · · a2n  b21 b22 · · · b2p]
[ ..   ..      ..    ..   ..      .. ]
[am1 am2 · · · amn  bm1 bm2 · · · bmp]     (5.10)

Thus, [A|B] is a matrix in Rm×(n+p) .

■ Example 5.4 Consider the following matrices:

A = [1 2]     b = [7]
    [3 4],        [8]

Then [A|b] is:

[A|b] = [1 2 7]
        [3 4 8]


Exercise 5.4 By analogy define the augmented matrix [A; B], formed by stacking
A on top of B. Note, this is not a fraction. In your definition, identify the
appropriate requirements on the relationship between the number of rows and
columns that the matrices must have. [Hint: Unlike [A|B], the number of rows
don’t have to be the same, since you’re concatenating on the rows, not columns.
There should be a relation between the numbers of columns though.] ■

5.6 Special Matrices and Vectors


Definition 5.5 — Identity Matrix. The n × n identity matrix is:

     [1 0 · · · 0]
     [0 1 · · · 0]
In = [ ..  ..  .. ]     (5.11)
     [0 0 · · · 1]

When it is clear from context, we may simply write I and omit the subscript
n.
Exercise 5.5 Let A ∈ Rn×n . Show that AIn = In A = A. Hence, I is an
identity for the matrix multiplication operation on square matrices. [Hint: Do the
multiplication out long hand.] ■

Definition 5.6 — Standard Basis Vector. The standard basis vector ei ∈ Rn is:

ei = (0, 0, . . . , 0, 1, 0, . . . , 0),

with i − 1 zeros before the 1 and n − i zeros after it. Note, this definition is only
valid for n ≥ i. Further, the standard basis vector ei is also the ith row or column
of In .

Definition 5.7 — Unit and Zero Vectors. The vector e ∈ Rn is the one vector
e = (1, 1, . . . , 1). Similarly, the zero vector 0 = (0, 0, . . . , 0) ∈ Rn . We assume that
the length of e and 0 will be determined from context.

Exercise 5.6 Let x ∈ Rn , considered as a column vector (our standard assumption).


Define:

y = x / (eT x)

Show that eT y = yT e = 1. [Hint: First remember that eT x is a scalar value (it’s
e · x). Second, remember that a scalar times a vector is just a new vector with
each term multiplied by the scalar. Last, use these two pieces of information to
write the product eT y as a sum of fractions.] ■

5.7 Matrices and Linear Programming Expression


Consider the following system of equations:

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
         ..                                       (5.12)
am1 x1 + am2 x2 + · · · + amn xn = bm

Then we can write this in matrix notation as:

Ax = b     (5.13)

where Aij = aij for i = 1, . . . , m, j = 1, . . . , n and x is a column vector in Rn with
entries xj (j = 1, . . . , n) and b is a column vector in Rm with entries bi (i = 1, . . . , m).
Obviously, if we replace the equalities in Expression 5.12 with inequalities, we can
also express systems of inequalities in the form:

Ax ≤ b     (5.14)

Using this representation, we can write our general linear programming problem
using matrix and vector notation. Expression 2.55 can be written as:

max z(x) = cT x
s.t. Ax ≤ b          (5.15)
     Hx = r

For historical reasons, linear programs are not written in the general form of
Expression 5.15.

Definition 5.8 — Canonical Form. A maximization linear programming problem
is in canonical form if it is written as:

max z(x) = cT x
s.t. Ax ≤ b          (5.16)
     x ≥ 0

A minimization linear programming problem is in canonical form if it is
written as:

min z(x) = cT x
s.t. Ax ≥ b          (5.17)
     x ≥ 0

Definition 5.9 — Standard Form (Max Problem). A maximization linear pro-
gramming problem is in standard form if it is written as:

max z(x) = cT x
s.t. Ax = b          (5.18)
     x ≥ 0

R In the previous definition, a problem is in standard form as long as its


constraints have form Ax = b and x ≥ 0. The problem can be either a
maximization or minimization problem.

Theorem 5.1 Every linear programming problem in canonical form can be put into
standard form.
Proof. Consider the constraint corresponding to the first row of the matrix A:

a11 x1 + a12 x2 + · · · + a1n xn ≤ b1 (5.19)

Then we can add a new slack variable s1 so that s1 ≥ 0 and we have:

a11 x1 + a12 x2 + · · · + a1n xn + s1 = b1 (5.20)

This act can be repeated for each row of A (constraint) yielding m new variables
s1 , . . . , sm , which we can collect into a vector s. Then the new linear programming
problem can be expressed as:

max z(x) = cT x
s.t. Ax + Im s = b
     x, s ≥ 0

Using augmented matrices, we can express this as:

max z(x) = [cT 0T ] [x; s]
s.t. [A | Im ] [x; s] = b
     [x; s] ≥ 0

where [x; s] denotes the column vector x stacked on top of the column vector s.
Clearly, this new linear programming problem is in standard form and any solution
maximizing the original problem will necessarily maximize this one. ■


■ Example 5.5 Consider the Toy Maker problem from Example 2.30. The problem
in canonical form is:

max z(x1 , x2 ) = 7x1 + 6x2
s.t. 3x1 + x2 ≤ 120
     x1 + 2x2 ≤ 160
     x1 ≤ 35
     x1 ≥ 0
     x2 ≥ 0

We can introduce slack variables s1 , s2 and s3 into the constraints (one for each
constraint) and re-write the problem as:

max z(x1 , x2 ) = 7x1 + 6x2
s.t. 3x1 + x2 + s1 = 120
     x1 + 2x2 + s2 = 160
     x1 + s3 = 35
     x1 ≥ 0
     x2 ≥ 0
     s1 , s2 , s3 ≥ 0
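The slack-variable construction of Theorem 5.1 is easy to check numerically on this data (a sketch assuming numpy; the feasible point used below is just an illustration chosen to satisfy all three constraints, not a value from the text):

```python
import numpy as np

# Canonical-form data: Ax <= b, x >= 0.
A = np.array([[3.0, 1.0],
              [1.0, 2.0],
              [1.0, 0.0]])
b = np.array([120.0, 160.0, 35.0])

# Standard form: [A | I] [x; s] = b with s >= 0.
A_std = np.hstack([A, np.eye(3)])

x = np.array([16.0, 72.0])   # one feasible point of the original problem
s = b - A @ x                # slack values that make the equalities hold
print(s)                     # all nonnegative, so x is feasible
print(A_std @ np.concatenate([x, s]) - b)  # zero vector: equalities hold
```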

R We can deal with constraints of the form:

ai1 x1 + ai2 x2 + · · · + ain xn ≥ bi (5.21)

in a similar way. In this case we subtract a surplus variable si to obtain:

ai1 x1 + ai2 x2 + · · · + ain xn − si = bi

Again, we must have si ≥ 0.

Theorem 5.2 Every linear programming problem in standard form can be put into
canonical form.
Proof. Recall that Ax = b if and only if Ax ≤ b and Ax ≥ b. The second inequality
can be written as −Ax ≤ −b. This yields the linear programming problem:

max z(x) = cT x
s.t. Ax ≤ b
     −Ax ≤ −b          (5.22)
     x ≥ 0

Defining the appropriate augmented matrices allows us to convert this linear pro-
gramming problem into canonical form. ■


Exercise 5.7 Complete the “pedantic” proof of the preceding theorem by defining
the correct augmented matrices to show that the linear program in Expression
5.22 is in canonical form. ■

The standard solution method for linear programming models (the Simplex
Algorithm) assumes that all variables are non-negative. Though this assumption
can be easily relaxed, the first implementation we will study imposes this restriction.
The general linear programming problem we posed in Expression 5.15 does not
(necessarily) impose any sign restriction on the variables. We will show that we can
transform a problem in which xi is unrestricted into a new problem in which all
variables are positive. Clearly, if xi ≤ 0, then we simply replace xi by −yi in every
expression and then yi ≥ 0. On the other hand, if we have the constraint xi ≥ li ,
then clearly we can write yi = xi − li and yi ≥ 0. We can then replace xi by yi + li
in every equation or inequality where xi appears. Finally, if xi ≤ ui , but xi may be
negative, then we may write yi = ui − xi . Clearly, yi ≥ 0 and we can replace xi by
ui − yi in every equation or inequality where xi appears.
If xi is unrestricted in sign and has no upper or lower bounds, then let xi = yi − zi
where yi , zi ≥ 0 and replace xi by (yi − zi ) in the objective, equations and inequalities
of a general linear programming problem. Since yi , zi ≥ 0 and may be given any values
as a part of the solution, clearly xi may take any value in R.
Exercise 5.8 Convince yourself that the general linear programming problem
shown in Expression 5.15 can be converted into canonical (or standard) form using
the following steps:
1. Every constraint of the form xi ≤ ui can be dealt with by substituting
yi = ui − xi , yi ≥ 0.
2. Every constraint of the form li ≤ xi can be dealt with by substituting
yi = xi − li , yi ≥ 0.
3. If xi is unrestricted in any way, then we can introduce variables yi and zi so
that xi = yi − zi where yi , zi ≥ 0.
4. Any equality constraints Hx = r can be transformed into inequality con-
straints.
Thus, Expression 5.15 can be transformed to standard form. [Hint: No hint, the
hint is in the problem.] ■

5.8 Gauss-Jordan Elimination and Solution to Linear Equations
In this sub-section, we’ll review Gauss-Jordan Elimination as a solution method
for linear equations. We’ll use Gauss-Jordan Elimination extensively in the coming
chapters.

Definition 5.10 — Elementary Row Operation. Let A ∈ Rm×n be a matrix. Recall


Ai· is the ith row of A. There are three elementary row operations:
1. (Scalar Multiplication of a Row) Row Ai· is replaced by αAi· , where α ∈ R
and α ̸= 0.


2. (Row Swap) Row Ai· is swapped with Row Aj· for i ̸= j.


3. (Scalar Multiplication and Addition) Row Aj· is replaced by αAi· + Aj·
for α ∈ R and i ̸= j.

■ Example 5.6 Consider the matrix:

A = [1 2]
    [3 4]

In an example of scalar multiplication of a row by a constant, we can multiply the
second row by 1/3 to obtain:

B = [1  2 ]
    [1 4/3]

As an example of scalar multiplication and addition, we can multiply the second
row by (−1) and add the result to the first row to obtain:

C = [0  2 − 4/3]   [0 2/3]
    [1    4/3  ] = [1 4/3]

We can then use scalar multiplication and multiply the first row by (3/2) to obtain:

D = [0  1 ]
    [1 4/3]

We can then use scalar multiplication and addition to multiply the first row by
(−4/3) and add it to the second row to obtain:

E = [0 1]
    [1 0]

Finally, we can swap row 2 and row 1 to obtain:

I2 = [1 0]
     [0 1]

Thus using elementary row operations, we have transformed the matrix A into the
matrix I2 . ■

Theorem 5.3 Each elementary row operation can be accomplished by a matrix


multiplication.

Proof. We’ll show that scalar multiplication and row addition can be accomplished
by a matrix multiplication. In Exercise 5.9, you’ll be asked to complete the proof for
the other two elementary row operations.


Let A ∈ Rm×n . Without loss of generality, suppose we wish to multiply row 1 by
α and add it to row 2, replacing row 2 with the result. Let:

    [1 0 0 · · · 0]
    [α 1 0 · · · 0]
E = [ ..  ..  ..  0]     (5.23)
    [0 0 0 · · · 1]

This is simply the identity Im with an α in the (2, 1) position instead of 0. Now
consider EA. Let A·j = [a1j , a2j , . . . , amj ]T be the j th column of A. Then:

[1 0 0 · · · 0] [a1j ]   [a1j           ]
[α 1 0 · · · 0] [a2j ]   [α(a1j ) + a2j ]
[ ..  ..  ..  0] [ .. ] = [ ..           ]     (5.24)
[0 0 0 · · · 1] [amj ]   [amj           ]

That is, we have taken the first element of A·j and multiplied it by α and added it
to the second element of A·j to obtain the new second element of the product. All
other elements of A·j are unchanged. Since we chose an arbitrary column of A, it’s
clear this will occur in each case. Thus EA will be the new matrix with rows the
same as A except for the second row, which will be replaced by the first row of A
multiplied by the constant α and added to the second row of A. To multiply the ith
row of A and add it to the j th row, we would simply make a matrix E by starting
with Im and replacing the ith element of row j with α. ■

Exercise 5.9 Complete the proof by showing that scalar multiplication and row
swapping can be accomplished by a matrix multiplication. [Hint: Scalar multi-
plication should be easy, given the proof above. For row swap, try multiplying
matrix A from Example 5.6 by:

[0 1]
[1 0]

and see what comes out. Can you generalize this idea for arbitrary row swaps?] ■

Matrices of the kind we’ve just discussed are called elementary matrices. Theorem
5.3 will be important when we study efficient methods for solving linear programming
problems. It tells us that any set of elementary row operations can be performed
by finding the right matrix. That is, suppose I list 4 elementary row operations to
perform on matrix A. These elementary row operations correspond to four matrices
E1 , . . . , E4 . Thus the transformation of A under these row operations can be written
using only matrix multiplication as B = E4 · · · E1 A. This representation is much
simpler for a computer to keep track of in algorithms that require the transformation
of matrices by elementary row operations.
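A quick sketch of this idea (assuming numpy): the elementary matrix for "add α · row 1 to row 2" is the identity with α in position (2, 1), and left-multiplying by it performs the row operation; a permutation matrix performs the row swap of Exercise 5.9.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

E = np.eye(2)
E[1, 0] = -3.0   # alpha = -3 in position (2,1): add -3*row1 to row2
print(E @ A)     # [[ 1.  2.] [ 0. -2.]]

# Row swap is also a matrix multiplication (Exercise 5.9):
P = np.array([[0.0, 1.0], [1.0, 0.0]])
print(P @ A)     # [[3. 4.] [1. 2.]]
```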


Definition 5.11 — Row Equivalence. Let A ∈ Rm×n and let B ∈ Rm×n . If there
is a sequence of elementary matrices E1 , . . . , Ek so that:

B = Ek · · · E1 A

then A and B are said to be row equivalent.

5.9 Matrix Inverse


Definition 5.12 — Invertible Matrix. Let A ∈ Rn×n be a square matrix. If there
is a matrix A−1 such that

AA−1 = A−1 A = In (5.25)

then matrix A is said to be invertible (or nonsingular) and A−1 is called its
inverse. If A is not invertible, it is called a singular matrix.

Exercise 5.10 Find the equivalent elementary row operation matrices for Example
5.6. There should be five matrices E1 , . . . , E5 corresponding to the five steps
shown. Show that applying these matrices (in the correct order) to A yields the
identity matrix. Now compute the product B = E5 · · · E1 . Show that B = A−1 .
[Hint: You’ve done most of the work.] ■

The proof of the following theorem is beyond the scope of this class.

Theorem 5.4 If A ∈ Rn×n is a square matrix and X ∈ Rn×n so that XA = In ,


then:
1. AX = In
2. X = A−1
3. A and A−1 can be written as a product of elementary matrices.
The process we’ve illustrated in Example 5.6 is an instance of Gauss-Jordan
elimination and can be used to find the inverse of any matrix (or to solve systems of
linear equations). This process is summarized in Algorithm 5.

Definition 5.13 — Pivoting. In Algorithm 5, when Aii ̸= 0, the process performed
in Steps 4 and 5 is called pivoting on element (i, i).

We illustrate this algorithm in the following example.

■ Example 5.7 Again consider the matrix A from Example 5.6. We can follow the
steps in Algorithm 5 to compute A−1 .


Algorithm 5 Gauss-Jordan Elimination for Matrix Inversion

Gauss-Jordan Elimination
Computing an Inverse
1. Let A ∈ Rn×n . Let X = [A|In ].
2. Let i := 1.
3. If Xii = 0, then use row-swapping on X to replace row i with a row j (j > i)
   so that Xii ̸= 0. If this is not possible, then A is not invertible.
4. Replace Xi· by (1/Xii )Xi· . Element (i, i) of X should now be 1.
5. For each j ̸= i, replace Xj· by (−Xji /Xii )Xi· + Xj· .
6. Set i := i + 1.
7. If i > n, then A has been replaced by In and In has been replaced by A−1
   in X. If i ≤ n, then goto Line 3.

Step 1

X := [1 2 | 1 0]
     [3 4 | 0 1]

Step 2 i := 1
Steps 3 and 4 (i = 1) X11 = 1, so no swapping is required. Furthermore, replacing
X1· by (1/1)X1· will not change X.
Step 5 (i = 1) We multiply row 1 of X by −3 and add the result to row 2 of X
to obtain:

X := [1  2 |  1 0]
     [0 −2 | −3 1]

Step 6 i := 1 + 1 = 2 and i = n, so we return to Step 3.

Step 3 (i = 2) The new element X22 = −2 ̸= 0. Therefore, no swapping is
required.
Step 4 (i = 2) We replace row 2 of X with row 2 of X multiplied by −1/2.

X := [1 2 |  1    0  ]
     [0 1 | 3/2  −1/2]

Step 5 (i = 2) We multiply row 2 of X by −2 and add the result to row 1 of X
to obtain:

X := [1 0 | −2    1  ]
     [0 1 | 3/2  −1/2]

Step 6 (i = 2) i := 2 + 1 = 3. We now have i > n and the algorithm terminates.


Thus using Algorithm 5 we have computed:

A−1 = [ −2    1  ]
      [ 3/2  −1/2]

This value should be the result you obtained in Exercise 5.10. ■
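The result can be verified directly (assuming numpy): multiplying A by the computed inverse must give I2 on both sides, and it must agree with a library inverse.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
A_inv = np.array([[-2.0, 1.0],
                  [1.5, -0.5]])

print(np.allclose(A @ A_inv, np.eye(2)))     # True
print(np.allclose(A_inv @ A, np.eye(2)))     # True
print(np.allclose(np.linalg.inv(A), A_inv))  # True
```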

Exercise 5.11 Does the matrix:

A = [1 2 3]
    [4 5 6]
    [7 8 9]

have an inverse? [Hint: Use Gauss-Jordan elimination to find the answer.] ■

Exercise 5.12 — Bonus. Implement Gauss-Jordan elimination in the programming


language of your choice. Illustrate your implementation by using it to solve the
previous exercise. [Hint: Implement sub-routines for each matrix operation. You
don’t have to write them as matrix multiplication, though in Matlab, it might
speed up your execution. Then use these subroutines to implement each step of
the Gauss-Jordan elimination.] ■

5.10 Solution of Linear Equations


Let A ∈ Rn×n , b ∈ Rn . Consider the problem:

Ax = b (5.26)

There are three possible scenarios:

1. There is a unique solution x = A−1 b.


2. There are no solutions; i.e., there is no vector x so that Ax = b.
3. There are infinitely many solutions; i.e., there is an infinite set X ⊆ Rn such
that for all x ∈ X , Ax = b.

We can use Gauss-Jordan elimination to find a solution x to Equation 5.26 when


it occurs. Instead of operating on the augmented matrix X = [A|In ] in Algorithm 5, we use
the augmented matrix X := [A|b]. If the algorithm terminates with A replaced by
In , then the solution vector resides where b began. That is, X is transformed to
X := [In |A−1 b].
If the algorithm does not terminate with X := [In |A−1 b], then suppose the
algorithm terminates with X := [A′ |b′ ]. There is at least one row in A′ with all zeros.


That is, A′ has the form:

     [1 0 · · · 0]
     [0 1 · · · 0]
     [ ..  ..  .. ]
A′ = [0 0 · · · 0]     (5.27)
     [ ..  ..  .. ]
     [0 0 · · · 0]

In this case, there are two possibilities:


1. For every zero row in A′ , the corresponding element in b′ is 0. In this case,
there are an infinite number of alternative solutions to the problem expressed
by Equation 5.26.
2. There is at least one zero row in A′ whose corresponding element in b′ is not
zero. In this case, there are no solutions to Equation 5.26.
We illustrate these two possibilities in the following example:

■ Example 5.8 Consider the system of equations:

x1 + 2x2 + 3x3 = 7
4x1 + 5x2 + 6x3 = 8
7x1 + 8x2 + 9x3 = 9

This yields matrix:

A = [1 2 3]
    [4 5 6]
    [7 8 9]

and right hand side vector b = [7, 8, 9]T . Applying Gauss-Jordan elimination in
this case yields:

X := [1 0 −1 | −19/3]
     [0 1  2 |  20/3]     (5.28)
     [0 0  0 |   0  ]

Since the third row is all zeros, there are an infinite number of solutions. An easy
way to solve for this set of equations is to let x3 = t, where t may take on any value
in R. Then, row 2 of Expression 5.28 tells us that:

x2 + 2x3 = 20/3 =⇒ x2 + 2t = 20/3 =⇒ x2 = 20/3 − 2t     (5.29)

We then solve for x1 in terms of t. From row 1 of Expression 5.28 we have:

x1 − x3 = −19/3 =⇒ x1 − t = −19/3 =⇒ x1 = t − 19/3     (5.30)


Thus every vector in the set:

X = { (t − 19/3, 20/3 − 2t, t) : t ∈ R }     (5.31)

is a solution to Ax = b.
Conversely, suppose we have the problem:

x1 + 2x2 + 3x3 = 7
4x1 + 5x2 + 6x3 = 8
7x1 + 8x2 + 9x3 = 10

The new right hand side vector is b = [7, 8, 10]T . Applying Gauss-Jordan elimination
in this case yields:

X := [1 0 −1 | 0]
     [0 1  2 | 0]     (5.32)
     [0 0  0 | 1]

Since row 3 of X has a non-zero element in the b′ column, we know this problem
has no solution, since there is no way that we can find values for x1 , x2 and x3
satisfying:

0x1 + 0x2 + 0x3 = 1 (5.33)
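Both outcomes can be detected without carrying out the elimination by hand, by comparing the rank of A with the rank of the augmented matrix (a sketch assuming numpy; this rank test is the Rouché–Capelli criterion, not a method named in the text):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
b1 = np.array([[7.0], [8.0], [9.0]])    # consistent right-hand side
b2 = np.array([[7.0], [8.0], [10.0]])   # inconsistent right-hand side

rank = np.linalg.matrix_rank
print(rank(A))                   # 2 (less than n = 3)
print(rank(np.hstack([A, b1])))  # 2 -> consistent, so infinitely many solutions
print(rank(np.hstack([A, b2])))  # 3 -> inconsistent, so no solution
```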

Exercise 5.13 Solve the problem

x1 + 2x2 = 7
3x1 + 4x2 = 8

using Gauss-Jordan elimination. ■

5.11 Linear Combinations, Span, Linear Independence


Definition 5.14 Let x1 , . . . , xm be vectors in Rn and let α1 , . . . , αm ∈ R be
scalars. Then

α 1 x1 + · · · + α m xm (5.34)

is a linear combination of the vectors x1 , . . . , xm .

Clearly, any linear combination of vectors in Rn is also a vector in Rn .

Definition 5.15 — Span. Let X = {x1 , . . . , xm } be a set of vectors in Rn , then
the span of X is the set:

span(X ) = {y ∈ Rn | y is a linear combination of vectors in X }     (5.35)

Definition 5.16 — Linear Independence. Let x1 , . . . , xm be vectors in Rn . The
vectors x1 , . . . , xm are linearly dependent if there exist α1 , . . . , αm ∈ R, not all
zero, such that

α1 x1 + · · · + αm xm = 0     (5.36)

If the set of vectors x1 , . . . , xm is not linearly dependent, then they are linearly
independent and Equation 5.36 holds just in case αi = 0 for all i = 1, . . . , m.

Exercise 5.14 Consider the vectors x1 = [0, 0]T and x2 = [1, 0]T . Are these vectors
linearly independent? Explain why or why not. ■

■ Example 5.9 In R3 , consider the vectors:

     [1]        [1]        [0]
x1 = [1],  x2 = [0],  x3 = [1]
     [0]        [1]        [1]

We can show these vectors are linearly independent: Suppose there are values
α1 , α2 , α3 ∈ R such that

α1 x1 + α2 x2 + α3 x3 = 0

Then:

[α1 , α1 , 0]T + [α2 , 0, α2 ]T + [0, α3 , α3 ]T = [α1 + α2 , α1 + α3 , α2 + α3 ]T = [0, 0, 0]T

Thus we have the system of linear equations:

α1 + α2 = 0
α1 + α3 = 0
α2 + α3 = 0


which can be written as the matrix expression:

[1 1 0; 1 0 1; 0 1 1] [α1 ; α2 ; α3 ] = [0; 0; 0]

This is just a simple matrix equation, but note that the three vectors we are
focused on: x1 , x2 , and x3 , have become the columns of the matrix on the left-
hand-side. We can use Gauss-Jordan elimination to solve this matrix equation
yielding: α1 = α2 = α3 = 0. Thus these vectors are linearly independent. ■
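The same conclusion can be reached numerically by stacking the vectors as the columns of a matrix and computing its rank; a sketch in Python/NumPy (an illustration, not part of the original text):

```python
import numpy as np

# Columns are x1, x2, x3 from Example 5.9.
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]])

# Rank equal to the number of columns means M @ alpha = 0 forces alpha = 0,
# i.e., the vectors are linearly independent.
independent = (np.linalg.matrix_rank(M) == M.shape[1])
```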

R It is worthwhile to note that any set of vectors that contains the zero vector 0
is a linearly dependent set.

Exercise 5.15 Prove the remark above. ■

Exercise 5.16 Show that the vectors

x1 = [1, 2, 3]T , x2 = [4, 5, 6]T , x3 = [7, 8, 9]T

are not linearly independent. [Hint: Following the example, create a matrix
whose columns are the vectors in question and solve a matrix equation with
right-hand-side equal to zero. Using Gauss-Jordan elimination, show that a zero
row results and thus find the infinite set of values solving the system.] ■

R So far we have only given examples and exercises in which the number of
vectors was equal to the dimension of the space they occupied. Clearly, we
could have, for example, 3 linearly independent vectors in 4 dimensional space.
We illustrate this case in the following example.

■ Example 5.10 Consider the vectors:

x1 = [1, 2, 3]T , x2 = [4, 5, 6]T

Determining linear independence requires us to solve the matrix equation:

[1 4; 2 5; 3 6] [α1 ; α2 ] = [0; 0; 0]


The augmented matrix:

[1 4 | 0; 2 5 | 0; 3 6 | 0]

represents the matrix equation. Using Gauss-Jordan elimination yields:

[1 4 | 0; 0 1 | 0; 0 0 | 0]

This implies the following system of equations:

α1 + 4α2 = 0
α2 = 0
0α1 + 0α2 = 0

The last equation is tautological (true regardless of the values of α1 and α2 ). The
second equation implies α2 = 0. Using this value in the first equation implies that
α1 = 0. This is the unique solution to the problem and thus the vectors are linearly
independent. ■
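The same rank test covers the case of fewer vectors than dimensions; sketched in Python/NumPy (an illustration, not part of the original text):

```python
import numpy as np

# Columns are x1 and x2 from Example 5.10: two vectors in R^3.
M = np.array([[1, 4],
              [2, 5],
              [3, 6]])

# Two columns and rank 2, so the vectors are linearly independent.
independent = (np.linalg.matrix_rank(M) == M.shape[1])
```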

The following theorem is related to the example above. Its proof is outside the
scope of the course. It should be taught in a Linear Algebra course (Math 436).
Proofs can be found in most Linear Algebra textbooks. Again, see [Str87] (Theorem
3.1) for a proof using vector spaces.
Theorem 5.5 Let x1 , . . . , xm ∈ Rn . If m > n, then the vectors are linearly dependent.

5.12 Basis
Definition 5.17 — Basis. Let X = {x1 , . . . , xm } be a set of vectors in Rn . The
set X is called a basis of Rn if X is a linearly independent set of vectors and
every vector in Rn is in the span of X . That is, for any vector w ∈ Rn we can
find scalar values α1 , . . . , αm such that
w = ∑_{i=1}^{m} αi xi (5.37)


■ Example 5.11 We can show that the vectors:

x1 = [1, 1, 0]T , x2 = [1, 0, 1]T , x3 = [0, 1, 1]T

form a basis of R3 . We already know that the vectors are linearly independent. To
show that R3 is in their span, choose an arbitrary vector in R3 : [a, b, c]T . Then we
hope to find coefficients α1 , α2 and α3 so that:

α1 x1 + α2 x2 + α3 x3 = [a, b, c]T

Expanding this, we must find α1 , α2 and α3 so that:

[α1 , α1 , 0]T + [α2 , 0, α2 ]T + [0, α3 , α3 ]T = [a, b, c]T

Just as in Example 5.9, this can be written as an augmented matrix representing a
set of linear equations:

[1 1 0 | a; 1 0 1 | b; 0 1 1 | c] (5.38)

Applying Gauss-Jordan elimination to the augmented matrix yields:

[1 0 0 | 1/2 a + 1/2 b − 1/2 c; 0 1 0 | 1/2 a − 1/2 b + 1/2 c; 0 0 1 | −1/2 a + 1/2 b + 1/2 c] (5.39)

which clearly has a solution for all a, b, and c. Another way of seeing this is to
note that the matrix:

A = [1 1 0; 1 0 1; 0 1 1] (5.40)

is invertible. ■
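Since the matrix A in (5.40) is invertible, the coefficients of any w = [a, b, c]T can be computed by solving a linear system; a sketch in Python/NumPy (an illustration, not part of the original text):

```python
import numpy as np

# Basis vectors of Example 5.11 as the columns of A (Equation 5.40).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

def coordinates(w):
    """Coefficients alpha with A @ alpha = w; they exist for every w since A is invertible."""
    return np.linalg.solve(A, w)

alpha = coordinates(np.array([1.0, 2.0, 3.0]))
# Matches (5.39): alpha = ((a+b-c)/2, (a-b+c)/2, (-a+b+c)/2) = (0, 1, 2) here.
```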

The following theorem on the size of a basis in Rn is outside the scope of this
course.


Theorem 5.6 If X is a basis of Rn , then X contains precisely n vectors.

Exercise 5.17 Show that the vectors

x1 = [1, 2, 3]T , x2 = [4, 5, 6]T , x3 = [7, 8, 9]T

are not a basis for R3 . [Hint: See exercise 5.16.] ■

We will use the following lemma, which is related to the notion of the basis of
Rn when we come to our formal method for solving linear programming problems.
Lemma 5.1. Let {x1 , . . . , xm+1 } be a linearly dependent set of vectors in Rn and
let X = {x1 , . . . , xm } be a linearly independent set. Further assume that xm+1 ≠ 0.
Assume α1 , . . . , αm+1 are a set of scalars, not all zero, so that

∑_{i=1}^{m+1} αi xi = 0 (5.41)

For any j ∈ {1, . . . , m} such that αj ≠ 0, if we replace xj in the set X with xm+1 ,
then this new set of vectors is linearly independent.
Proof. Clearly αm+1 cannot be zero, since we assumed that X is linearly independent.
Since xm+1 ≠ 0, we know there is at least one other αi (i = 1, . . . , m) not zero. Without
loss of generality, assume that αm ≠ 0 (if not, rearrange the vectors to make this
true).
We can solve for xm+1 using this equation to obtain:

xm+1 = − ∑_{i=1}^{m} (αi /αm+1 ) xi (5.42)

Suppose, without loss of generality, we replace xm by xm+1 in X . We now proceed
by contradiction. Assume this new set is linearly dependent. Then there exist
constants β1 , . . . , βm−1 , βm+1 , not all zero, such that:

β1 x1 + · · · + βm−1 xm−1 + βm+1 xm+1 = 0. (5.43)

Again, we know that βm+1 ≠ 0 since the set {x1 , . . . , xm−1 } is linearly independent
because X is linearly independent. Then using Equation 5.42 we see that:

β1 x1 + · · · + βm−1 xm−1 + βm+1 (− ∑_{i=1}^{m} (αi /αm+1 ) xi ) = 0. (5.44)

We can rearrange the terms in this sum as:

(β1 − βm+1 α1 /αm+1 ) x1 + · · · + (βm−1 − βm+1 αm−1 /αm+1 ) xm−1 − (βm+1 αm /αm+1 ) xm = 0 (5.45)

The fact that αm ≠ 0 and βm+1 ≠ 0 and αm+1 ≠ 0 means we have found γ1 , . . . , γm ,
not all zero, such that γ1 x1 + · · · + γm xm = 0, contradicting our assumption that X
was linearly independent. This contradiction completes the proof. ■


R This lemma proves an interesting result. If X is a basis of Rm and xm+1 is
another, non-zero, vector in Rm , we can swap xm+1 for any vector xj in X
as long as when we express xm+1 as a linear combination of vectors in X the
coefficient of xj is not zero. That is, since X is a basis of Rm we can express:

xm+1 = ∑_{i=1}^{m} αi xi

As long as αj ≠ 0, then we can replace xj with xm+1 and still have a basis of
Rm .
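The swap described in the remark can be demonstrated numerically; in this Python/NumPy sketch (an illustration, not part of the original text) the new vector has a zero coefficient on x1, so exchanging x1 destroys the basis while exchanging x2 preserves it:

```python
import numpy as np

# A basis of R^3 (columns) and a new non-zero vector w.
X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
w = np.array([1.0, 2.0, 3.0])

alpha = np.linalg.solve(X, w)   # w = 0*x1 + 1*x2 + 2*x3

def swap_gives_basis(X, w, j):
    """Replace column j of X with w and test whether the result is still a basis."""
    Y = X.copy()
    Y[:, j] = w
    return np.linalg.matrix_rank(Y) == Y.shape[1]

good_swap = swap_gives_basis(X, w, 1)   # alpha[1] = 1 != 0: still a basis
bad_swap = swap_gives_basis(X, w, 0)    # alpha[0] = 0: rank drops to 2
```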

Exercise 5.18 Prove the following theorem: In Rn every set of n linearly indepen-
dent vectors is a basis. [Hint: Let X = {x1 , . . . , xn } be the set. Use the fact that
α1 x1 + · · · + αn xn = 0 has exactly one solution.] ■

5.13 Rank
Definition 5.18 — Row Rank. Let A ∈ Rm×n . The row rank of A is the size of
the largest set of rows (taken as vectors) of A that is linearly independent.

Exercise 5.19 By analogy define the column rank of a matrix. [Hint: You don’t
need a hint.] ■

Theorem 5.7 If A ∈ Rm×n is a matrix, then elementary row operations on A do


not change the row rank.

Proof. Denote the rows of A by R = {a1 , . . . , am }; i.e., ai = Ai· . Suppose we apply


elementary row operations on these rows to obtain a new matrix A′ . First, let us
consider row swapping. Obviously if the size of the largest subset of R that is linearly
independent has value k ≤ min{m, n} then swapping two rows will not change this.
Let us now consider the effect of multiplying ai by α and adding it to aj (where
neither row is the zero row). Then we replace aj with αai + aj . There are two
possibilities:
Case 1: There are no α1 , . . . , αm (not all zero) so that:

α1 a1 + · · · + αm am = 0

Suppose there is some β1 , . . . , βm not all zero so that:

β1 a1 + · · · + βj−1 aj−1 + βj (αai + aj ) + · · · + βm am = 0

In this case, we see immediately that:

β1 a1 + · · · + βj−1 aj−1 + βj αai + βj aj + · · · + βm am = 0

and thus if we have αk = βk for k ̸= i and αi = βi + βj α then we have an α1 , . . . , αm


(not all zero) so that:

α1 a1 + · · · + αm am = 0


which is a contradiction.
Case 2: Suppose that the size of the largest set of linearly independent rows is
k. Denote such a set by S = {as1 , . . . , ask }. There are several possibilities: (i) Both
ai and aj are in this set. In this case, we simply replace our argument above with
constants α1 through αk and the result is the same.
(ii) ai and aj are not in this set, in which case we know that there are αs1 , . . . , αsk
and βs1 , . . . , βsk so that:

αs1 as1 + · · · + αsk ask = ai
βs1 as1 + · · · + βsk ask = aj

But this implies that αai + aj can also be written as a linear combination of the
elements of S and thus the rank of A′ is no larger than the rank of A.
(iii) Now suppose that ai is in the set S. Then there are constants αs1 , . . . , αsk so
that:

αs1 as1 + · · · + αsk ask = aj

Without loss of generality, suppose that ai = as1 ; then:

αai + αs1 ai + αs2 as2 + · · · + αsk ask = αai + aj

Again, this implies that αai + aj is still a linear combination of the elements of S
and so we cannot have increased the size of the largest linearly independent set of
vectors, nor could we have decreased it.
(iv) Finally, suppose that aj ∈ S. Again let aj = as1 . Then there are constants
αs1 , . . . , αsk so that:

αs1 aj + αs2 as2 + · · · + αsk ask = ai

Apply Lemma 5.1 to replace aj in S with some other row vector al . If l = i, then we
reduce to sub-case (iii). If l ̸= i, then we reduce to sub-case (ii).
Finally, suppose we multiply a row by α. This reduces to the case of multiplying
row i by α − 1 and adding it to row i, which is covered in the above analysis. This
completes the proof. ■
We have the following theorem, whose proof is again outside the scope of this
course. There are very nice proofs available in [Str87].

Theorem 5.8 If A ∈ Rm×n is a matrix, then the row rank of A is equal to the
column rank of A. Further, rank(A) ≤ min{m, n}.

Lastly, we will not prove the following theorem, but it should be clear from all
the work we have done up to this point.

Theorem 5.9 If A ∈ Rm×m (i.e., A is a square matrix) and rank(A) = m, then A


is invertible.


Definition 5.19 Suppose that A ∈ Rm×n and let m ≤ n. Then A has full row
rank if rank(A) = m.

■ Example 5.12 Again consider the matrix

A = [1 2 3; 4 5 6; 7 8 9]

By now you should suspect that it does not have full row rank. Recall that the
application of Gauss-Jordan elimination transforms A into the matrix

A′ = [1 0 −1; 0 1 2; 0 0 0]

No further transformation is possible. It’s easy to see that the first two rows of A′
are linearly independent. (Note that the first row vector has a non-zero element in
its first position and zero in its second position, while the second row vector has a
non-zero element in the second position and a zero element in the first position.
Because of this, it’s impossible to find any non-zero linear combination of those
vectors that leads to zero.) Thus we conclude the matrix A has the same rank as
matrix A′ which is 2. ■
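The rank computed by hand above can be confirmed numerically; a Python/NumPy sketch (an illustration, not part of the original text):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

rank = np.linalg.matrix_rank(A)       # 2, since row3 = 2*row2 - row1
full_row_rank = (rank == A.shape[0])  # False: A does not have full row rank
```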

Exercise 5.20 Change one number in matrix A in the preceding example to create
a new matrix B that has full row rank. Show that your matrix has rank 3 using
Gauss-Jordan elimination. ■

5.14 Solving Systems with More Variables than Equations


Suppose now that A ∈ Rm×n where m ≤ n. Let b ∈ Rm . Then the equation:

Ax = b (5.46)

has more variables than equations and is underdetermined and if A has full row
rank then the system will have an infinite number of solutions. We can formulate an
expression to describe this infinite set of solutions.
Since A has full row rank, we may choose any m linearly independent columns of
A corresponding to a subset of the variables, say xi1 , . . . , xim . We can use these to
form the matrix

B = [A·i1 · · · A·im ] (5.47)

from the columns A·i1 , . . . , A·im of A, so that B is invertible. It should be clear


at this point that B will be invertible precisely because we’ve chosen m linearly


independent column vectors. We can then use elementary column operations to write
the matrix A as:

A = [B|N] (5.48)

The matrix N is composed of the n − m other columns of A not in B. We can


similarly sub-divide the column vector x and write:

[B|N] [xB ; xN ] = b (5.49)

where the vector xB are the variables corresponding to the columns in B and
the vector xN are the variables corresponding to the columns of the matrix N.

Definition 5.20 — Basic Variables. For historical reasons, the variables in the
vector xB are called the basic variables and the variables in the vector xN are
called the non-basic variables.

We can use matrix multiplication to expand the left hand side of this expression
as:

BxB + NxN = b (5.50)

The fact that B is composed of all linearly independent columns implies that
applying Gauss-Jordan elimination to it will yield an m × m identity and thus that B
is invertible. We can solve for basic variables xB in terms of the non-basic variables:

xB = B−1 b − B−1 NxN (5.51)

We can find an arbitrary solution to the system of linear equations by choosing values
for the variables the non-basic variables and solving for the basic variable values
using Equation 5.51.

Definition 5.21 (Basic Solution) When we assign xN = 0, the resulting solution


for x is called a basic solution and

xB = B−1 b (5.52)

■ Example 5.13 Consider the problem:

[1 2 3; 4 5 6] [x1 ; x2 ; x3 ] = [7; 8] (5.53)


Then we can let x3 = 0 and:

B = [1 2; 4 5] (5.54)

We then solveᵃ:

[x1 ; x2 ] = B−1 [7; 8] = [−19/3; 20/3] (5.55)

Other basic solutions could be formed by creating B out of columns 1 and 3 or


columns 2 and 3. ■

aThanks to Doug Mercer, who found a typo below that was fixed.
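The basic solution of Example 5.13 can be reproduced mechanically from Definition 5.21; a Python/NumPy sketch (an illustration, not part of the original text):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([7.0, 8.0])

def basic_solution(A, b, basic_cols):
    """Set the non-basic variables to zero and solve B xB = b (Definition 5.21)."""
    B = A[:, basic_cols]
    x = np.zeros(A.shape[1])
    x[basic_cols] = np.linalg.solve(B, b)
    return x

x = basic_solution(A, b, [0, 1])   # basic variables x1, x2; non-basic x3 = 0
# x = [-19/3, 20/3, 0], matching Equation 5.55.
```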

Exercise 5.21 Find the two other basic solutions in Example 5.13 corresponding to

B = [2 3; 5 6]

and

B = [1 3; 4 6]

In each case, determine what the matrix N is. [Hint: Find the solutions any way
you like. Make sure you record exactly which xi (i ∈ {1, 2, 3}) is equal to zero in
each case.] ■

5.15 Solving Linear Programs with Matlab


In this section, we’ll show how to solve Linear Programs using Matlab. Matlab
assumes that all linear programs are input in the following form:

min z(x) = cT x
s.t. Ax ≤ b
     Hx = r (5.56)
     x ≥ l
     x ≤ u

Here c ∈ Rn×1 , so there are n variables in the vector x, A ∈ Rm×n , b ∈ Rm×1 ,


H ∈ Rl×n and r ∈ Rl×1 . The vectors l and u are lower and upper bounds respectively
on the decision variables in the vector x.
The Matlab command for solving linear programs is linprog and it takes the
parameters:
1. c,
2. A,
3. b,


4. H,
5. r,
6. l,
7. u
If there are no inequality constraints, then we set A = [] and b = [] in Matlab; i.e., A
and b are set as the empty matrices. A similar requirement holds on H and r if there
are no equality constraints. If some decision variables have lower bounds and others
don’t, the term -inf can be used to set a lower bound at −∞ (in l). Similarly, the
term inf can be used if the upper bound on a variable (in u) is infinity. The easiest
way to understand how to use Matlab is to use it on an example.

■ Example 5.14 Suppose I wish to design a diet consisting of Ramen noodles and
ice cream. I’m interested in spending as little money as possible but I want to
ensure that I eat at least 1200 calories per day and that I get at least 20 grams of
protein per day. Assume that each serving of Ramen costs $1 and contains 100
calories and 2 grams of protein. Assume that each serving of ice cream costs $1.50
and contains 200 calories and 3 grams of protein.
We can construct a linear programming problem out of this scenario. Let x1
be the amount of Ramen we consume and let x2 be the amount of ice cream we
consume. Our objective function is our cost:

x1 + 1.5x2 (5.57)

Our constraints describe our protein requirements:

2x1 + 3x2 ≥ 20 (5.58)

and our calorie requirements (expressed in terms of 100’s of calories):

x1 + 2x2 ≥ 12 (5.59)

This leads to the following linear programming problem:



min


x1 + 1.5x2

 s.t. 2x1 + 3x2 ≥ 20

(5.60)


 x1 + 2x2 ≥ 12

x1 , x 2 ≥ 0

Before turning to Matlab, let’s investigate this problem in standard form. To
transform the problem to standard form, we introduce surplus variables s1 and s2
and our problem becomes:

min x1 + 1.5x2
s.t. 2x1 + 3x2 − s1 = 20 (5.61)
     x1 + 2x2 − s2 = 12
     x1 , x2 , s1 , s2 ≥ 0


This leads to a set of two linear equations with four variables:

2x1 + 3x2 − s1 = 20
x1 + 2x2 − s2 = 12

We can look at the various results from using Expression 5.51 and Definition 5.21.
Let:

A = [2 3 −1 0; 1 2 0 −1]   b = [20; 12] (5.62)

Then our vector of decision variables is:

x = [x1 ; x2 ; s1 ; s2 ] (5.63)

We can use Gauss-Jordan elimination on the augmented matrix:

[2 3 −1 0 | 20; 1 2 0 −1 | 12]

Suppose we wish to have xB = [s1 s2 ]T . Then we would transform the matrix as:

[−2 −3 1 0 | −20; −1 −2 0 1 | −12]

This would result in x1 = 0, x2 = 0 (because xN = [x1 x2 ]T ) and s1 = −20 and
s2 = −12. Unfortunately, this is not a feasible solution to our linear programming
problem because we require s1 , s2 ≥ 0. Alternatively we could look at the case
when xB = [x1 x2 ]T and xN = [s1 s2 ]T . Then we would perform Gauss-Jordan
elimination on the augmented matrix to obtain:

[1 0 −2 3 | 4; 0 1 1 −2 | 4]

That is, x1 = 4, x2 = 4 and of course s1 = s2 = 0. Notice something interesting:

[−2 3; 1 −2] = −[2 3; 1 2]−1

This is not an accident, it’s because we started with a negative identity matrix
inside the augmented matrix. The point x1 = 4, x2 = 4 is a point of intersection,
shown in Figure 5.1. It also happens to be one of the alternative optimal solutions
of this problem. Notice in Figure 5.1 that the level curves of the objective function
are parallel to one of the sides of the boundary of the feasible region.


Matlab Solution

Figure 5.1: The feasible region for the diet problem is unbounded and there are
alternative optimal solutions, since we are seeking a minimum, we travel in the
opposite direction of the gradient, so toward the origin to reduce the objective
function value. Notice that the level curves hit one side of the boundary of the
feasible region.

If we continued in this way, we could actually construct all the points of
intersection that make up the boundary of the feasible region. We can do one
more: suppose xB = [x1 s1 ]T . Then we would use Gauss-Jordan elimination to
obtain:

[1 2 0 −1 | 12; 0 1 1 −2 | 4]

Notice there are now columns of the identity matrix in the columns corresponding
to x1 and s1 . That’s how we know we’re solving for x1 and s1 . We have x1 = 12
and s1 = 4. By definition x2 = s2 = 0. This corresponds to the point x1 = 12, x2 = 0
shown in Figure 5.1.
Let’s use Matlab to solve this problem. Our original problem is:

min x1 + 1.5x2
s.t. 2x1 + 3x2 ≥ 20
     x1 + 2x2 ≥ 12
     x1 , x2 ≥ 0

This is not in a form Matlab likes, so we change it by multiplying the constraints
by −1 on both sides to obtain:

min x1 + 1.5x2
s.t. − 2x1 − 3x2 ≤ −20
     − x1 − 2x2 ≤ −12
     x1 , x2 ≥ 0


Then we have:

c = [1; 1.5]
A = [−2 −3; −1 −2]
b = [−20; −12]
H = r = []
l = [0; 0]   u = []

The Matlab code to solve this problem is shown in Figure 5.2


%%Solve the Diet Linear Programming Problem
c = [1 1.5]’;
A = [[-2 -3];...
[-1 -2]];
b = [-20 -12]’;
H = [];
r = [];
l = [0 0]’;
u = [];
[x obj] = linprog(c,A,b,H,r,l,u);

Figure 5.2: Matlab input for solving the diet problem. Note that we are solving a
minimization problem. Matlab assumes all problems are minimization problems, so
we don’t need to multiply the objective by −1 like we would if we started with a
maximization problem.

The solution Matlab returns in the x variable is x1 = 3.7184 and x2 = 4.1877;
note this is on the line of alternative optimal solutions, but it is not at either end
of the line. I prefer to have a little less ice cream, so I’d rather have the alternative
optimal solution x1 = x2 = 4. ■
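For readers working outside Matlab, SciPy’s linprog accepts essentially the same data (c, inequality matrix, bounds). This Python sketch assumes SciPy is available and is not part of the original text; like Matlab, it may return any point on the segment of alternative optima, but the optimal value is 10:

```python
import numpy as np
from scipy.optimize import linprog

# Diet problem in the "<=" form derived above: minimize x1 + 1.5*x2.
c = [1.0, 1.5]
A_ub = [[-2.0, -3.0],
        [-1.0, -2.0]]
b_ub = [-20.0, -12.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
optimal_value = res.fun   # 10.0; res.x is some point on the optimal face
```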

Exercise 5.22 In previous example, you could also have just used the problem in
standard form with the surplus variables and had A = b = [] and defined H and r
instead. Use Matlab to solve the diet problem in standard form. Compare your
results to Example 5.14 ■

5.16 Introduction
General LP Objective: min f (x); x ∈ F ⊆ Rn where F is the set of feasible solutions.

There are many types of problems that are included in linear programming. When a
problem can be put in terms of an objective function and competing constraints, it
might be solved with LP. Although we can’t solve sorting with LP, if you go to a


desert island and can only take one algorithm with you, take an LP solver1 . Here is
an example LP problem:

Example 1: Min-conductance of G

ϕ(G) = min_{S ≠ ∅, vol(S) ≤ (1/2)vol(V)} |∂S| / vol(S), where vol(S) = ∑_{i∈S} di

This problem can be translated to the following problem:



1, if i ∈ s
Obj: unknowns is x ∈ Rn ⇒ indicators ⋗1s ⇒
0, if i ∈
/s
T
min P x Lx
Note L is the laplacian, minimize such that the following conditions
i∈[a] xi ·di
hold:
1. x ∈ 0, 1 ⇔ xi (1 − xi ) = 0
2. xi ≥ 1
P

3. xi · di ≤ 12 di
P P

In effect the numerator in the minimization represents the boundary of S and


the denominator represents the volume of S if the constraints hold. As we saw in
last class, solving the former problem is NP-hard, so the LP version will be as well.
Some intuition why this is the case is that conditions 2 and 3 are linear inequalities,
but condition 1 has a polynomial of degree 2, which in general makes the problem
quite difficult.

5.16.1 Linear Programming General Form


Linear programming is a subset of optimization problems that have the following form:

min f (x) = ∑_{i=1}^{n} ci xi s.t. Ax ≥ b, where A is an m × n matrix and b ∈ Rm .

This effectively is a summarization of a series of linear inequalities of the form:

a1,1 x1 + a1,2 x2 + · · · + a1,n xn ≥ b1 .

Remarks:
1. max f (x) = −min[−f (x)]
2. A1 x ≥ b1 ⇔ (−A1 )x ≤ −b1
3. A1 x = b1 ⇔ A1 x ≤ b1 & A1 x ≥ b1
So we can already see how the same problem can be written in different ways. The
general goal of the chapter is to be able to solve equations of this form via an
algorithm that works in polynomial time.

Example 2: Max flow in G


Problem: Given G, capacity k; s, t ∈ V , find f ∈ Rn that maximizes |f | s.t. f
represents s → t flow.

LP Formulation:
max |f | = ∑_{(s,u)∈E} fs,u − ∑_{(u,s)∈E} fu,s s.t.

1 Joke from lecture


1. f obeys capacity constraints: 0 ≤ fu,v ≤ ku,v ∀(u, v) ∈ E
2. f obeys flow conservation: ∀u ∈/ {s, t}: ∑_{(v,u)∈E} fv,u − ∑_{(u,v)∈E} fu,v = 0

Although we can put a max flow problem into an LP solver, that doesn’t mean
that we’ll get as good of a runtime as some of the other algorithms we discussed.

5.17 Standard Form of LP


One of the purposes of today’s lecture is to understand the structure of the problem
we’re trying to solve. Once we sufficiently understand the structure, then we can
design algorithms in later lectures to exploit it. Although not really mathematically
distinct, we have another way of expressing an LP problem that can be more
convenient. It’s nothing more than a rewriting of what we saw earlier, but this will
later become useful. This transformation is analogous to what we did going from
the adjacency matrix to the Laplacian, which had some nice properties.

Definition 5.22 Minimize cT x where c ∈ Rn , such that Ax = b and x ≥ 0.

At first glance this may appear easier than what we saw earlier. We’re just
looking at solutions to a system of linear equations where x is positive. However, it
turns out that the two problems are equivalent.
Reduction from general form to standard form:
We have two things to address. First we are now allowing only positive variables,
and second, now Ax = b is an equality. The reduction proceeds in two steps.
1. To accommodate the desire for all non-negative variables, for all variables xi ∈ R
in the general form, we create two new variables xi+ and xi− with the definition
xi ≜ xi+ − xi− and xi+ , xi− ≥ 0. Since any real number can be written as the
difference of two other non-negative real numbers, this is a safe step to take.
2. Plugging this into the general form would imply Ai x ≥ bi ⇒ Ai (x+ − x− ) ≥ bi ,
which we want now to be an equality. If we need Ai (x+ − x− ) ≥ bi this is
equivalent to saying we need Ai (x+ − x− ) − ξi = bi for some non-negative ξi .
This motivates the introduction of slack variables ξi ≜ Ai (x+ − x− ) − bi where
we constrain ξi ≥ 0.
The slack variables tell us how far off we are from the previous inequalities in
general form. If ξi = 0, we call the ith constraint tight. We have introduced a bunch
of equations which will be captured in the new Ax = b.² Note that our x, A and b in
Ax = b will not be the same as they were for the general form, otherwise we could easily
see that the mathematical equivalence breaks down. Rather these objects will grow
in dimension. In particular x will now contain xi+ , xi− , and ξi . The c in the objective
function may also grow, and in general the objective function will not look the same.
However, the key point is that we will guarantee that an optimal solution to the
new linear program will yield an optimal solution to the original. More specifically,
for each feasible solution x in the general form with objective value v, there is a
corresponding feasible solution x′ in standard form with the same objective value v
and vice versa.
2 Except for the non-negativity constraint on the variables which may be implicitly taken care of.
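The two-step reduction can be written out concretely for a small instance; a Python/NumPy sketch (an illustration, not part of the original notes) that builds the standard-form matrix [A, −A, −I] and maps a general-form feasible point to a standard-form solution:

```python
import numpy as np

# General form: A x >= b (the diet constraints from the previous section).
A = np.array([[2.0, 3.0],
              [1.0, 2.0]])
b = np.array([20.0, 12.0])

m, n = A.shape
# Standard-form variables (x+, x-, xi) with A(x+ - x-) - xi = b, all >= 0.
A_std = np.hstack([A, -A, -np.eye(m)])

x = np.array([4.0, 4.0])        # feasible: A @ x = (20, 12) >= b
x_plus = np.maximum(x, 0.0)
x_minus = np.maximum(-x, 0.0)
xi = A @ x - b                  # slack variables; xi = 0 here, so both constraints are tight
x_std = np.concatenate([x_plus, x_minus, xi])
```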


5.18 Structure of Feasible Solutions for LP


Definition 5.23 x is a feasible solution if it satisfies the constraints (Ax ≥ b for
general form). F ≜ set of feasible x.

We proceed to get some intuition for what F looks like, meanwhile ignoring the
objective for now. First, recall we need to satisfy Ai x ≥ bi , ∀i . To break this down
we can think about A1 · x ≥ b1 where A1 is a vector of the first row of A and b1 is just
a real number. For 2 dimensional x the solutions lie on one side of a line which A1 is
perpendicular to, for 3 dimensions the solutions are confined to one side of a plane,
and in general the solutions will be a half-space on one side of a hyperplane with
A1 normal to the plane. To see why A1 is normal, consider the simplified A1 x ≥ 0.
We’ve defined a hyperplane through the origin and if x sits on the hyperplane, we
should get 0, thus A1 would need to be perpendicular to x and the hyperplane.
Because we must satisfy not just A1 x ≥ b1 but all Ai x ≥ bi , F will have to sit
inside the intersection of all of these half-spaces, which is a polytope.

Definition 5.24 F is bounded if contained in a finite polytope and unbounded


otherwise.


5.18.1 Example of F in two dimensions

x2
4

2
x1 ≤ 2
1 x1 + x2 ≥ 1
x1 − x2 ≥ −1
0 x1
-1 0 1 2 3 4 5
x2 ≥ 0
-1

-2

-3

5.19 How to Find an Optimum Solution


5.19.1 Possibilities
Recall we need to minimize cT x such that x ∈ F . There are 3 possibilities for what
can happen:
1. If F is ∅, then there are no solutions, never mind optimal solutions. Constraints
contradict.
2. When F is unbounded, there may be no finite minimum. For example minimize
x1 subject to x1 ≤ 1. However, if we had to minimize x1 subject to x1 ≥ 1,
then F is still unbounded, but there is a finite solution.
3. There is an optimal solution x and we say v ∗ = cT x.

5.19.2 *Algorithm
This is really more of a mathematical procedure than an algorithm. We can take
a v ∈ R which is a guess that hopefully doesn’t overestimate the optimal value v ∗ .
Now we have cT x = v, which by virtue of fixing v constrains x to lie somewhere in
a hyperplane we’ll call Hv . Now, if v < v ∗ , Hv ∩ F = ∅, otherwise v ∗ would not be the
optimum. So then we may progress by guessing v + ϵ, effectively moving Hv up a
little bit. After some iterations, we will see Hv ∩ F ≠ ∅, which would put v within ϵ of
v ∗ , and we are done.

5.19.3 Remarks
To think a bit more about this geometrically, c is normal to Hv and governs its
orientation. In the two dimensional figure above Hv would be a line with slope
perpendicular to c; as the line moves up in the x1 x2 plane it will eventually intersect
the yellow region F . c drives x in some particular direction for which F only permits
us to go so far.


In general the intersection with F will occur at a vertex of the polytope, which
will become useful because this vertex is the solution to some system of equations.
This motivates our interest in solving systems of equations, which we now turn to.
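Since an optimum over a bounded F is attained at a vertex, a brute-force sketch for tiny instances is to enumerate the intersections of constraint-boundary pairs and keep the feasible ones. This Python/NumPy illustration (not part of the original notes, and far slower than a real LP algorithm) uses the two-dimensional region above:

```python
import numpy as np
from itertools import combinations

def min_over_vertices(A, b, c):
    """Minimize c.x over {x : A x >= b} (bounded, 2-D) by enumerating vertices:
    intersections of pairs of constraint boundaries that satisfy all constraints."""
    best = np.inf
    for i, j in combinations(range(A.shape[0]), 2):
        M = A[[i, j], :]
        if abs(np.linalg.det(M)) < 1e-12:
            continue                       # parallel boundaries: no vertex
        x = np.linalg.solve(M, b[[i, j]])
        if np.all(A @ x >= b - 1e-9):      # keep only feasible intersections
            best = min(best, c @ x)
    return best

# The region from the figure: x1 <= 2, x1 + x2 >= 1, x1 - x2 >= -1, x2 >= 0,
# rewritten as A x >= b.
A = np.array([[-1.0, 0.0], [1.0, 1.0], [1.0, -1.0], [0.0, 1.0]])
b = np.array([-2.0, 1.0, -1.0, 0.0])
value = min_over_vertices(A, b, np.array([1.0, 1.0]))   # minimum of x1 + x2 is 1
```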

5.20 Solving Systems of Equations


We’re interested in solving Ax = b where A is an m by n matrix. The core algorithm
we have for doing this is Gaussian Elimination.

5.20.1 Example

3x1 + 7x2 + 2x3 + · · · + 4xn = 9 (1)
2x1 + 4x2 + 6x3 + · · · + 2xn = 1 (2)
... (more equations)
x1 + 8x2 + 3x3 + · · · + 5xn = 0 (n)

Here we could solve the first equation for x1 , and then plug into the second
equation to eliminate x1 from it. Then we could solve the second equation for x2
and so on3 until we get to the nth equation, which we could then solve for xn and
plug back in to get the rest.
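In practice this procedure is what a library linear solver performs (LU factorization is Gaussian elimination with pivoting); a Python/NumPy sketch (an illustration, not part of the original notes):

```python
import numpy as np

# A small integer system A x = b with a unique solution.
A = np.array([[3.0, 7.0, 2.0],
              [2.0, 4.0, 6.0],
              [1.0, 8.0, 3.0]])
b = np.array([9.0, 1.0, 0.0])

x = np.linalg.solve(A, b)          # internally: LU factorization (Gaussian elimination)
residual = np.max(np.abs(A @ x - b))
```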

5.20.2 Remarks
First we wish to understand the complexity of the solution. Suppose we even start
with all integers defining the system, how many bits do we need to represent x? In
particular, we’d like to prove that the number of bits is not exponential in n because
if it is any algorithm to solve the system would also be exponential in runtime
simply because of the description complexity of the output. We’d also like to better
understand the cases when Gaussian Elimination fails.

5.21 Looking Ahead


Here we will just establish a list of facts from linear algebra that we’ll be using. If
any are a surprise, they would be worth reviewing.
If A is a square matrix, the following are equivalent:
1. A is invertible
2. Det(A) ̸= 0
3. Columns of A are linearly independent
4. Rows of A are linearly independent
5. Ax = b has a unique solution

Next time we’ll continue by proving the following:


Claim 4: The solution to Ax = b has polynomial length description in n if A is
composed of integers.
We’ll also start talking about duality and algorithms for linear programming.
3 If all goes well at least.



6. Interior-Point Methods

6.1 Basics of Interior-Point Methods


6.2 Barrier Methods
6.3 Path-Following Methods
6.4 Comparison with the Simplex Method
6.5 Linear Programming and Extreme Points
In this section we formalize the intuition we’ve obtained in all our work in two-
dimensional linear programming problems. Namely, we noted that if an optimal
solution existed, then it occurred at an extreme point. For the remainder of this
chapter, assume that A ∈ Rm×n has full row rank and b ∈ Rm , and let

X = {x ∈ Rn : Ax ≤ b, x ≥ 0} (6.1)

be a polyhedral set over which we will maximize the objective function z(x1 , . . . , xn ) =
cT x, where c, x ∈ Rn . That is, we will focus on the linear programming problem:

        max  cT x
  P     s.t. Ax ≤ b        (6.2)
             x ≥ 0

Theorem 6.1 If Problem P has an optimal solution, then Problem P has an optimal
extreme point solution.

Proof. Applying the Carathéodory characterization theorem, we know that any point
x ∈ X can be written as:

    x = ∑_{i=1}^{k} λi xi + ∑_{i=1}^{l} µi di        (6.3)


where x1 , . . . xk are the extreme points of X and d1 , . . . , dl are the extreme directions
of X and we know that
    ∑_{i=1}^{k} λi = 1
    λi , µi ≥ 0  ∀i        (6.4)

We can rewrite problem P using this characterization as:

    max  ∑_{i=1}^{k} λi cT xi + ∑_{i=1}^{l} µi cT di
    s.t. ∑_{i=1}^{k} λi = 1        (6.5)
         λi , µi ≥ 0  ∀i
If there is some i such that cT di > 0, then we can simply choose µi as large as we
like, making the objective as large as we like; the problem will have no finite solution.
Therefore, assume that cT di ≤ 0 for all i = 1, . . . , l (in which case, we may simply
choose µi = 0, for all i). Since the set of extreme points x1 , . . . xk is finite, we can
simply set λp = 1 if cT xp has the largest value among all possible values of cT xi ,
i = 1, . . . , k. This is clearly the solution to the linear programming problem. Since xp
is an extreme point, we have shown that if P has a solution, it must have an extreme
point solution. ■

Corollary 6.1 Problem P has a finite solution if and only if cT di ≤ 0 for all
i = 1, . . . l when d1 , . . . , dl are the extreme directions of X.

Proof. This is implicit in the proof of the theorem. ■

Corollary 6.2 Problem P has alternative optimal solutions if there are at least
two extreme points xp and xq so that cT xp = cT xq and so that xp is the extreme
point solution to the linear programming problem.

Proof. Suppose that xp is the extreme point solution to P identified in the proof of
the theorem. Suppose xq is another extreme point solution with cT xp = cT xq . Then
every convex combination of xp and xq is contained in X (since X is convex). Thus
every x with form λxp + (1 − λ)xq and λ ∈ [0, 1] has objective function value:
λcT xp + (1 − λ)cT xq = λcT xp + (1 − λ)cT xp = cT xp
which is the optimal objective function value, by assumption. ■

Exercise 6.1 Let X = {x ∈ Rn : Ax ≤ b, x ≥ 0} and suppose that d1 , . . . dl are


the extreme directions of X (assuming it has any). Show that the problem:

min cT x
s.t. Ax ≤ b (6.6)
x≥0


has a finite optimal solution if (and only if) cT dj ≥ 0 for j = 1, . . . , l. [Hint: Modify
the proof above using the Carathéodory characterization theorem.] ■

6.6 Algorithmic Characterization of Extreme Points


In the previous sections we showed that if a linear programming problem has a
solution, then it must have an extreme point solution. The challenge now is to
identify some simple way of identifying extreme points. To accomplish this, let us
now assume that we write X as:

X = {x ∈ Rn : Ax = b, x ≥ 0} (6.7)

Our work in the previous sections shows that this is possible. Recall we can separate
A into an m × m matrix B and an m × (n − m) matrix N and we have the result:

xB = B−1 b − B−1 NxN (6.8)

We know that B is invertible since we assumed that A had full row rank. If we
assume that xN = 0, then the solution

xB = B−1 b (6.9)

was called a basic solution (See Definition 5.21.) Clearly any basic solution satisfies
the constraints Ax = b but it may not satisfy the constraints x ≥ 0.

Definition 6.1 — Basic Feasible Solution. If xB = B−1 b and xN = 0 is a basic


solution to Ax = b and xB ≥ 0, then the solution (xB , xN ) is called basic feasible
solution.

Theorem 6.2 Every basic feasible solution is an extreme point of X. Likewise,


every extreme point is characterized by a basic feasible solution of Ax = b, x ≥ 0.

Proof. Since Ax = BxB + NxN = b, this represents the intersection of m linearly
independent hyperplanes (since the rank of A is m). Since xN = 0 and xN contains
n − m variables, we have n − m binding, linearly independent hyperplanes from
xN ≥ 0. Thus the point (xB , xN ) is the intersection of m + (n − m) = n
linearly independent hyperplanes. By Theorem 1.10 we know that (xB , xN ) must be
an extreme point of X.
Conversely, let x be an extreme point of X. Clearly x is feasible and by Theorem
1.10 it must represent the intersection of n hyperplanes. The fact that x is feasible
implies that Ax = b. This accounts for m of the intersecting linearly independent
hyperplanes. The remaining n − m hyperplanes must come from x ≥ 0. That
is, n − m variables are zero. Let xN = 0 be the variables for which x ≥ 0 are
binding. Denote the remaining variables xB . We can see that A = [B|N] and that
Ax = BxB + NxN = b. Clearly, xB is the unique solution to BxB = b and thus
(xB , xN ) is a basic feasible solution. ■


6.7 The Simplex Algorithm–Algebraic Form


In this section, we will develop the simplex algorithm algebraically. The idea behind
the simplex algorithm is as follows:
1. Convert the linear program to standard form.
2. Obtain an initial basic feasible solution (if possible).
3. Determine whether the basic feasible solution is optimal. If yes, stop.
4. If the current basic feasible solution is not optimal, then determine which non-
basic variable (zero valued variable) should become basic (become non-zero)
and which basic variable (non-zero valued variable) should become non-basic
(go to zero) to make the objective function value better.
5. Determine whether the problem is unbounded. If yes, stop.
6. If the problem doesn’t seem to be unbounded at this stage, find a new basic
feasible solution from the old basic feasible solution. Go back to Step 3.
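The outline above can be sketched directly in code. The following is a minimal, illustrative NumPy implementation of one simplex iteration; the function name and the explicit inversion of B are our choices for clarity, and a production implementation would factor B rather than invert it:

```python
import numpy as np

def simplex_iteration(A, b, c, basis):
    """One iteration of the simplex method for max c^T x s.t.
    Ax = b, x >= 0.  `basis` lists the m basic column indices.
    Returns ("optimal", basis), ("unbounded", None), or
    ("continue", new_basis).  Illustrative sketch only."""
    m, n = A.shape
    nonbasic = [j for j in range(n) if j not in basis]
    Binv = np.linalg.inv(A[:, basis])
    xB = Binv @ b                          # current basic feasible solution
    y = c[basis] @ Binv                    # c_B^T B^{-1}
    # reduced costs z_j - c_j for the non-basic variables
    reduced = {j: y @ A[:, j] - c[j] for j in nonbasic}
    entering = min(reduced, key=reduced.get)   # most negative (Dantzig rule)
    if reduced[entering] >= 0:
        return "optimal", basis            # no improving direction remains
    a_j = Binv @ A[:, entering]
    if np.all(a_j <= 0):
        return "unbounded", None           # no blocking basic variable
    # minimum ratio test picks the leaving basic variable
    ratios = [xB[i] / a_j[i] if a_j[i] > 0 else np.inf for i in range(m)]
    leaving = int(np.argmin(ratios))
    new_basis = list(basis)
    new_basis[leaving] = entering
    return "continue", new_basis
```

With the toy maker data of Example 6.1 and the all-slack starting basis, iterating this function reproduces the same sequence of pivots worked out later in this chapter.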
Suppose we have a basic feasible solution x = (xB , xN ). We can divide the cost
vector c into its basic and non-basic parts, so we have c = [cB |cN ]T . Then the
objective function becomes:

cT x = cTB xB + cTN xN (6.10)

We can substitute Equation 6.8 into Equation 6.10 to obtain:


   
cT x = cTB (B−1 b − B−1 NxN ) + cTN xN = cTB B−1 b + (cTN − cTB B−1 N) xN        (6.11)

Let J be the set of indices of non-basic variables. Then we can write Equation
6.11 as:

    z(x1 , . . . , xn ) = cTB B−1 b + ∑_{j∈J} (cj − cTB B−1 A·j ) xj        (6.12)

Consider now the fact that xj = 0 for all j ∈ J . Further, we can see that:

    ∂z/∂xj = cj − cTB B−1 A·j        (6.13)

This means that if cj − cTB B−1 A·j > 0 and we increase xj from zero to some new
value, then we will increase the value of the objective function. For historic reasons,
we actually consider the value cTB B−1 A·j − cj , called the reduced cost and denote it
as:
    −∂z/∂xj = zj − cj = cTB B−1 A·j − cj        (6.14)

In a maximization problem, we choose non-basic variables xj with negative reduced
cost to become basic because, in this case, ∂z/∂xj is positive.
Assume we choose xj , a non-basic variable, to become non-zero (because zj − cj < 0).
We wish to know which of the basic variables will become zero as we increase xj
away from zero. We must also be very careful that none of the variables become
negative as we do this.


By Equation 6.8 we know that the only current basic variables will be affected by
increasing xj . Let us focus explicitly on Equation 6.8 where we include only variable
xj (since all other non-basic variables are kept zero). Then we have:

xB = B−1 b − B−1 A·j xj (6.15)

Let b = B−1 b be an m × 1 column vector and let aj = B−1 A·j be another m × 1
column. Then we can write:

    xB = b − aj xj        (6.16)

Let b = [b1 , . . . , bm ]T and aj = [aj1 , . . . , ajm ]T ; then we have:

    [ xB1 ]   [ b1 ]   [ aj1 ]        [ b1 − aj1 xj ]
    [ xB2 ] = [ b2 ] − [ aj2 ] xj  =  [ b2 − aj2 xj ]        (6.17)
    [  ⋮  ]   [  ⋮ ]   [  ⋮  ]        [      ⋮      ]
    [ xBm ]   [ bm ]   [ ajm ]        [ bm − ajm xj ]

We know (a priori) that bi ≥ 0 for i = 1, . . . , m. If aji ≤ 0, then as we increase xj ,
bi − aji xj ≥ 0 no matter how large we make xj . On the other hand, if aji > 0, then as
we increase xj we know that bi − aji xj will get smaller and eventually hit zero. In
order to ensure that all variables remain non-negative, we cannot increase xj beyond
a certain point.
For each i (i = 1, . . . , m) such that aji > 0, the value of xj that will make xBi go to
0 can be found by observing that:

    xBi = bi − aji xj        (6.18)

and if xBi = 0, then we can solve:

    0 = bi − aji xj  =⇒  xj = bi / aji        (6.19)

Thus, the largest possible value we can assign xj and ensure that all variables remain
positive is:
    min { bi / aji : i = 1, . . . , m and aji > 0 }        (6.20)

Expression 6.20 is called the minimum ratio test. We are interested in which index i
is the minimum ratio.
Suppose that in executing the minimum ratio test, we find that xj = bk /ajk .
The variable xj (which was non-basic) becomes basic and the variable xBk becomes
non-basic. All other basic variables remain basic (and positive). In executing this
procedure (of exchanging one basic variable and one non-basic variable) we have
moved from one extreme point of X to another.
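Expression 6.20 translates directly into a short routine. The following sketch (the function name is ours) returns both the largest allowable value of xj and the row index of the blocking basic variable:

```python
import numpy as np

def min_ratio_test(b_bar, a_j):
    """Minimum ratio test (Expression 6.20).  b_bar = B^{-1} b and
    a_j = B^{-1} A_j.  Returns (step, leaving_row); the step is inf and
    the row is None when no entry of a_j is positive (the unbounded
    case discussed later in this chapter)."""
    best, row = np.inf, None
    for i, (bi, aji) in enumerate(zip(b_bar, a_j)):
        if aji > 0 and bi / aji < best:
            best, row = bi / aji, i
    return best, row
```

For the first pivot of the toy maker problem of Example 6.1, `min_ratio_test([120, 160, 35], [3, 1, 1])` reports a step of 35 with the third basic variable leaving, matching the computation done there.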


Theorem 6.3 If zj − cj ≥ 0 for all j ∈ J , then the current basic feasible solution is
optimal.

Proof. We have already shown in Theorem 6.1 that if a linear programming problem
has an optimal solution, then it occurs at an extreme point and we’ve shown in
Theorem 6.2 that there is a one-to-one correspondence between extreme points and
basic feasible solutions. If zj − cj ≥ 0 for all j ∈ J , then ∂z/∂xj ≤ 0 for all non-basic
variables xj . That is, we cannot increase the value of the objective function by
increasing the value of any non-basic variable. Thus, since moving to another basic
feasible solution (extreme point) will not improve the objective function, it follows
we must be at the optimal solution. ■

Theorem 6.4 In a maximization problem, if aji ≤ 0 for all i = 1, . . . , m, and


zj − cj < 0, then the linear programming problem is unbounded.

Proof. The fact that zj − cj < 0 implies that increasing xj will improve the value of
the objective function. Since aji ≤ 0 for all i = 1, . . . , m, we can increase xj indefinitely
without violating feasibility (no basic variable will ever go to zero). Thus the objective
function can be made as large as we like. ■

R We should note that in executing the exchange of one basic variable and
one non-basic variable, we must be very careful to ensure that the resulting
basis consists of m linearly independent columns of the original matrix A. The
conditions for this are provided in Lemma 5.1. Specifically, we must be able
to write the column corresponding to xj , the entering variable, as a linear
combination of the columns of B so that:

    α1 b1 + · · · + αm bm = A·j        (6.21)

and further, if we are exchanging xj for xBi (i = 1, . . . , m), then αi ̸= 0.


We can see this from the fact that aj = B−1 A·j and therefore:

Baj = A·j

and therefore we have:

A·j = B·1 aj1 + · · · + B·m ajm

which shows how to write the column A·j as a linear combination of the
columns of B.

Exercise 6.2 Consider the linear programming problem given in Exercise 6.1.
Under what conditions should a non-basic variable enter the basis? State and
prove an analogous theorem to Theorem 6.3 using your observation. [Hint: Use
the definition of reduced cost. Remember that it is −∂z/∂xj .] ■

■ Example 6.1 Consider the Toy Maker Problem (from Example 2.30). The linear


programming problem given in Equation 2.54 is:





    max  z(x1 , x2 ) = 7x1 + 6x2
    s.t. 3x1 + x2 ≤ 120
         x1 + 2x2 ≤ 160
         x1 ≤ 35
         x1 ≥ 0
         x2 ≥ 0

We can convert this problem to standard form by introducing the slack variables
s1 , s2 and s3 :




    max  z(x1 , x2 ) = 7x1 + 6x2
    s.t. 3x1 + x2 + s1 = 120
         x1 + 2x2 + s2 = 160
         x1 + s3 = 35
         x1 , x2 , s1 , s2 , s3 ≥ 0

which yields the matrices

    c = (7, 6, 0, 0, 0)T        x = (x1 , x2 , s1 , s2 , s3 )T

    A = [ 3 1 1 0 0 ]        b = [ 120 ]
        [ 1 2 0 1 0 ]            [ 160 ]
        [ 1 0 0 0 1 ]            [  35 ]

We can begin with the matrices:

    B = [ 1 0 0 ]        N = [ 3 1 ]
        [ 0 1 0 ]            [ 1 2 ]
        [ 0 0 1 ]            [ 1 0 ]

In this case we have:

    xB = (s1 , s2 , s3 )T    xN = (x1 , x2 )T    cB = (0, 0, 0)T    cN = (7, 6)T

and

    B−1 b = (120, 160, 35)T        B−1 N = [ 3 1 ]
                                           [ 1 2 ]
                                           [ 1 0 ]


Therefore:

    cTB B−1 b = 0    cTB B−1 N = [0 0]    cTB B−1 N − cN = [−7 −6]

Using this information, we can compute:

    cTB B−1 A·1 − c1 = −7
    cTB B−1 A·2 − c2 = −6

and therefore:

    ∂z/∂x1 = 7  and  ∂z/∂x2 = 6
Based on this information, we could choose either x1 or x2 to enter the basis and
the value of the objective function would increase. If we choose x1 to enter the basis,
then we must determine which variable will leave the basis. To do this, we must
investigate the elements of B−1 A·1 and the current basic feasible solution B−1 b.
Since each element of B−1 A·1 is positive, we must perform the minimum ratio
test on each element of B−1 A·1 . We know that B−1 A·1 is just the first column of
B−1 N, which is:

    B−1 A·1 = (3, 1, 1)T

Performing the minimum ratio test, we have:

    min { 120/3, 160/1, 35/1 }

In this case, we see that index 3 (35/1) is the minimum ratio. Therefore, variable
x1 will enter the basis and variable s3 will leave the basis. The new basic and
non-basic variables will be:

    xB = (s1 , s2 , x1 )T    xN = (s3 , x2 )T    cB = (0, 0, 7)T    cN = (0, 6)T

and the matrices become:

    B = [ 1 0 3 ]        N = [ 0 1 ]
        [ 0 1 1 ]            [ 0 2 ]
        [ 0 0 1 ]            [ 1 0 ]

Note we have simply swapped the column corresponding to x1 with the column
corresponding to s3 in the basis matrix B and the non-basic matrix N. We will do
this repeatedly in the example and we recommend the reader keep track of which


variables are being exchanged and why certain columns in B are being swapped
with those in N.
Using the new B and N matrices, the derived matrices are then:

    B−1 b = (15, 125, 35)T        B−1 N = [ −3 1 ]
                                          [ −1 2 ]
                                          [  1 0 ]

The cost information becomes:

    cTB B−1 b = 245    cTB B−1 N = [7 0]    cTB B−1 N − cN = [7 −6]

Using this information, we can compute:

    cTB B−1 A·5 − c5 = 7
    cTB B−1 A·2 − c2 = −6

Based on this information, we can only choose x2 to enter the basis to ensure
that the value of the objective function increases. We can perform the minimum
ratio test to figure out which basic variable will leave the basis. We know that
B−1 A·2 is just the second column of B−1 N, which is:

    B−1 A·2 = (1, 2, 0)T

Performing the minimum ratio test, we have:

    min { 15/1, 125/2 }

In this case, we see that index 1 (15/1) is the minimum ratio. Therefore, variable
x2 will enter the basis and variable s1 will leave the basis. The new basic and
non-basic variables will be:

    xB = (x2 , s2 , x1 )T    xN = (s3 , s1 )T    cB = (6, 0, 7)T    cN = (0, 0)T

and the matrices become:

    B = [ 1 0 3 ]        N = [ 0 1 ]
        [ 2 1 1 ]            [ 0 0 ]
        [ 0 0 1 ]            [ 1 0 ]


The derived matrices are then:

    B−1 b = (15, 95, 35)T        B−1 N = [ −3  1 ]
                                         [  5 −2 ]
                                         [  1  0 ]

The cost information becomes:

    cTB B−1 b = 335    cTB B−1 N = [−11 6]    cTB B−1 N − cN = [−11 6]

Based on this information, we can only choose s3 to (re-enter) the basis to
ensure that the value of the objective function increases. We can perform the
minimum ratio test to figure out which basic variable will leave the basis. We
know that B−1 A·5 is just the first column of B−1 N, which is:

    B−1 A·5 = (−3, 5, 1)T

Performing the minimum ratio test, we have:

    min { 95/5, 35/1 }

In this case, we see that index 2 (95/5) is the minimum ratio. Therefore, variable
s3 will enter the basis and variable s2 will leave the basis. The new basic and
non-basic variables will be:

    xB = (x2 , s3 , x1 )T    xN = (s2 , s1 )T    cB = (6, 0, 7)T    cN = (0, 0)T

and the matrices become:

    B = [ 1 0 3 ]        N = [ 0 1 ]
        [ 2 0 1 ]            [ 1 0 ]
        [ 0 1 1 ]            [ 0 0 ]

The derived matrices are then:

    B−1 b = (72, 19, 16)T        B−1 N = [  3/5  −1/5 ]
                                         [  1/5  −2/5 ]
                                         [ −1/5   2/5 ]

The cost information becomes:

    cTB B−1 b = 544    cTB B−1 N = [11/5 8/5]    cTB B−1 N − cN = [11/5 8/5]


Since the reduced costs are now positive, we can conclude that we’ve obtained an
optimal solution because no improvement is possible. The final solution then is:

    x∗B = (x2 , s3 , x1 )T = B−1 b = (72, 19, 16)T

Simply, we have x1 = 16 and x2 = 72 as we obtained in Example 2.30. The path


of extreme points we actually took in traversing the boundary of the polyhedral
feasible region is shown in Figure 6.1.

Figure 6.1: The Simplex Algorithm: The path around the feasible region is shown
in the figure. Each exchange of a basic and non-basic variable moves us along an
edge of the polygon in a direction that increases the value of the objective function.
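The arithmetic in this example is easy to check numerically. The following NumPy sketch (ours, not the author's) rebuilds the final basis and recovers B−1 b, the objective value, and the positive reduced costs of the non-basic slack variables:

```python
import numpy as np

# Toy maker data in standard form (Example 6.1); columns ordered
# x1, x2, s1, s2, s3.
A = np.array([[3., 1, 1, 0, 0],
              [1., 2, 0, 1, 0],
              [1., 0, 0, 0, 1]])
b = np.array([120., 160, 35])
c = np.array([7., 6, 0, 0, 0])

# Final basis from the example: (x2, s3, x1) = columns (1, 4, 0).
basis = [1, 4, 0]
B = A[:, basis]
cB = c[basis]

xB = np.linalg.solve(B, b)           # B^{-1} b, which is (72, 19, 16)
z = cB @ xB                          # objective value 544
# Reduced costs of the non-basic variables s1 and s2: both positive,
# confirming that no further improvement is possible.
y = np.linalg.solve(B.T, cB)         # y^T = c_B^T B^{-1}
reduced = y @ A[:, [2, 3]] - c[[2, 3]]
```

Solving the two triangular-looking systems with `np.linalg.solve` avoids forming B−1 explicitly, which is also how one would do this at scale.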

Exercise 6.3 Assume that a leather company manufactures two types of belts:
regular and deluxe. Each belt requires 1 square yard of leather. A regular belt
requires 1 hour of skilled labor to produce, while a deluxe belt requires 2 hours of
labor. The leather company receives 40 square yards of leather each week and a
total of 60 hours of skilled labor is available. Each regular belt nets $3 in profit,
while each deluxe belt nets $5 in profit. The company wishes to maximize profit.
1. Ignoring the divisibility issues, construct a linear programming problem
whose solution will determine the number of each type of belt the company
should produce.
2. Use the simplex algorithm to solve the problem you stated above remember-
ing to convert the problem to standard form before you begin.
3. Draw the feasible region and the level curves of the objective function. Verify
that the optimal solution you obtained through the simplex method is the
point at which the level curves no longer intersect the feasible region in the
direction following the gradient of the objective function.


6.8 Simplex Method–Tableau Form


No one executes the simplex algorithm in algebraic form. Instead, several representa-
tions (tableau representations) have been developed to lessen the amount of writing
that needs to be done and to collect all pertinent information into a single table.
To see how a Simplex Tableau is derived, consider Problem P in standard form:

        max  cT x
  P     s.t. Ax = b
             x ≥ 0

We can re-write P in an unusual way by introducing a new variable z and separating


A into its basic and non-basic parts to obtain:

max z
s.t. z − cTB xB − cTN xN = 0
(6.22)
BxB + NxN = b
xB , xN ≥ 0

From the second equation, it’s clear

xB + B−1 NxN = B−1 b (6.23)

We can multiply this equation by cTB to obtain:

cTB xB + cTB B−1 NxN = cTB B−1 b (6.24)

If we add this equation to the equation z − cTB xB − cTN xN = 0 we obtain:

z + 0T xB + cTB B−1 NxN − cTN xN = cTB B−1 b (6.25)

Here 0 is the vector of zeros of appropriate size. This equation can be written as:

    z + 0T xB + (cTB B−1 N − cTN ) xN = cTB B−1 b        (6.26)

We can now represent this set of equations as a large matrix (or tableau):

z xB xN RHS
z 1 0 cTB B−1 N − cTN cTB B−1 b Row 0
xB 0 1 B−1 N B−1 b Rows 1 through m

The augmented matrix shown within the table:

    [ 1  0  cTB B−1 N − cTN    cTB B−1 b ]        (6.27)
    [ 0  1  B−1 N              B−1 b     ]

is simply the matrix representation of the simultaneous equations described by


Equations 6.23 and 6.26. We can see that the first row consists of the


first row of the (m + 1) × (m + 1) identity matrix, the reduced costs of the non-basic
variables and the current objective function value. The remaining rows consist
of the rest of the (m + 1) × (m + 1) identity matrix, the matrix B−1 N and B−1 b, the
current non-zero part of the basic feasible solution.
This matrix representation (or tableau representation) contains all of the infor-
mation we need to execute the simplex algorithm. An entering variable is chosen
from among the columns containing the reduced costs and matrix B−1 N. Naturally,
a column with a negative reduced cost is chosen. We then chose a leaving variable
by performing the minimum ratio test on the chosen column and the right-hand-side
(RHS) column. We pivot on the element at the entering column and leaving row and
this transforms the tableau into a new tableau that represents the new basic feasible
solution.
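The pivot step described above is an ordinary Gauss–Jordan elimination on the tableau. A minimal sketch (the function name is ours):

```python
import numpy as np

def pivot(T, row, col):
    """Pivot the tableau T on entry (row, col): scale the pivot row so
    the pivot entry becomes 1, then eliminate that column from every
    other row.  Returns a new tableau; the caller's T is unchanged."""
    T = T.astype(float)                # astype copies the array
    T[row] /= T[row, col]
    for r in range(T.shape[0]):
        if r != row:
            T[r] -= T[r, col] * T[row]
    return T
```

Applying `pivot` to the initial toy maker tableau at the s2-row/x2-column entry reproduces the tableau obtained after the first exchange in Example 6.2 below.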

■ Example 6.2 Again, consider the toy maker problem. We will execute the simplex
algorithm using the tableau method. Our problem in standard form is given as:




    max  z(x1 , x2 ) = 7x1 + 6x2
    s.t. 3x1 + x2 + s1 = 120
         x1 + 2x2 + s2 = 160
         x1 + s3 = 35
         x1 , x2 , s1 , s2 , s3 ≥ 0

We can assume our initial basic feasible solution has s1 , s2 and s3 as basic variables
and x1 and x2 as non-basic variables. Thus our initial tableau is simply:

        z    x1   x2   s1   s2   s3   RHS
    z   1    −7   −6   0    0    0    0
    s1  0     3    1   1    0    0    120        (6.28)
    s2  0     1    2   0    1    0    160
    s3  0     1    0   0    0    1    35

Note that the columns have been swapped so that the identity matrix is divided
and B−1 N is located in columns 2 and 3. This is because of our choice of basic
variables. The reduced cost vector is in Row 0.
Using this information, we can see that either x1 or x2 can enter. We can
compute the minimum ratio test (MRT) next to the RHS column. If we choose x2
as the entering variable, then the MRT tells us s2 will leave. We put a box around
the element on which we will pivot:

        z    x1   x2    s1   s2   s3   RHS    MRT (x2 )
    z   1    −7   −6    0    0    0    0
    s1  0     3    1    1    0    0    120    120        (6.29)
    s2  0     1   [2]   0    1    0    160    80
    s3  0     1    0    0    0    1    35     −


If we pivot on this element, then we transform the column corresponding to x2


into the identity column:

    (0, 0, 1, 0)T        (6.30)

This process will correctly compute the new reduced costs and B−1 matrix as well
as the new cost information. The new tableau becomes:

        z    x1    x2   s1   s2     s3   RHS
    z   1    −4    0    0    3      0    480
    s1  0    2.5   0    1    −0.5   0    40        (6.31)
    x2  0    0.5   1    0    0.5    0    80
    s3  0    1     0    0    0      1    35

We can see that x1 is a valid entering variable, as it has a negative reduced cost
(−4). We can again place the minimum ratio test values on the right-hand-side of
the matrix to obtain:

        z    x1     x2   s1   s2     s3   RHS    MRT (x1 )
    z   1    −4     0    0    3      0    480
    s1  0    [2.5]  0    1    −0.5   0    40     16        (6.32)
    x2  0    0.5    1    0    0.5    0    80     160
    s3  0    1      0    0    0      1    35     35

We now pivot on the element we have boxed to obtain the new tableauᵃ:

        z    x1   x2   s1     s2     s3   RHS
    z   1    0    0    1.6    2.2    0    544
    x1  0    1    0    0.4    −0.2   0    16        (6.33)
    x2  0    0    1    −0.2   0.6    0    72
    s3  0    0    0    −0.4   0.2    1    19

All the reduced costs of the non-basic variables (s1 and s2 ) are positive, so this
is the optimal solution to the linear programming problem. We can also see that
this solution agrees with our previous computations on the Toy Maker Problem. ■
ᵃ Thanks to Ethan Wright for catching a typo here.

6.9 Identifying Unboundedness


We have already identified a theorem for detecting unboundedness. Recall Theorem
6.4: In a maximization problem, if aji ≤ 0 for all i = 1, . . . , m, and zj − cj < 0, then
the linear programming problem is unbounded.


This condition occurs when a variable xj should enter the basis because ∂z/∂xj >
0 and there is no blocking basis variable. That is, we can arbitrarily increase
the value of xj without causing any variable to become negative. We give an
example:

■ Example 6.3 Consider the linear programming problem from Example :

    max  z(x1 , x2 ) = 2x1 − x2
    s.t. x1 − x2 ≤ 1
         2x1 + x2 ≥ 6
         x1 , x2 ≥ 0

We can convert this problem into standard form by adding a slack variable s1 and
a surplus variable s2 :




    max  z(x1 , x2 ) = 2x1 − x2
    s.t. x1 − x2 + s1 = 1
         2x1 + x2 − s2 = 6
         x1 , x2 , s1 , s2 ≥ 0

This yields the matrices:

    c = (2, −1, 0, 0)T        x = (x1 , x2 , s1 , s2 )T

    A = [ 1 −1 1  0 ]        b = [ 1 ]
        [ 2  1 0 −1 ]            [ 6 ]

We have both slack and surplus variables, so the case when x1 = x2 = 0 is not a
valid initial solution. We can choose a valid solution based on our knowledge of the
problem. Assume that s1 = s2 = 0 and so we have:

    B = [ 1 −1 ]        N = [ 1  0 ]
        [ 2  1 ]            [ 0 −1 ]

In this case we have:

    xB = (x1 , x2 )T    xN = (s1 , s2 )T    cB = (2, −1)T    cN = (0, 0)T

This yields:

    B−1 b = (7/3, 4/3)T        B−1 N = [  1/3 −1/3 ]
                                       [ −2/3 −1/3 ]


We also have the cost information:

    cB B−1 b = 10/3    cB B−1 N = [4/3 −1/3]    cB B−1 N − cN = [4/3 −1/3]
Based on this information, we can construct the tableau for this problem as:

        z    x1   x2   s1     s2     RHS
    z   1    0    0    4/3    −1/3   10/3
    x1  0    1    0    1/3    −1/3   7/3        (6.34)
    x2  0    0    1    −2/3   −1/3   4/3

We see that s2 should enter the basis because cB B−1 A·4 − c4 < 0. But the
column corresponding to s2 in the tableau is all negative. Therefore there is no
minimum ratio test. We can let s2 become as large as we like and we will keep
increasing the objective function without violating feasibility.
What we have shown is that the ray with vertex

    x0 = (7/3, 4/3, 0, 0)T

and direction:

    d = (1/3, 1/3, 0, 1)T

is entirely contained inside the polyhedral set defined by Ax = b. This can be seen
from the fact that:

    xB = B−1 b − B−1 NxN

When applied in this case, we have:

    xB = B−1 b − B−1 A·4 s2

We know that

    −B−1 A·4 = (1/3, 1/3)T

We will be increasing s2 (which acts like λ in the definition of a ray) and leaving s1
equal to 0. It is now easy to see that the ray we described is contained entirely in
the feasible region. This is illustrated in the original constraints in Figure 6.2.



Figure 6.2: Unbounded Linear Program: The existence of a negative column aj in


the simplex tableau for entering variable xj indicates an unbounded problem and
feasible region. The recession direction is shown in the figure.

Based on our previous example, we have the following theorem that extends Theorem
6.4:
Theorem 6.5 In a maximization problem, if aji ≤ 0 for all i = 1, . . . , m, and
zj − cj < 0, then the linear programming problem is unbounded. Furthermore, let
aj = B−1 A·j and let ek be a standard basis column vector in Rn−m where k
corresponds to the position of j in the matrix N. Then the direction:

    d = [ −aj ]        (6.35)
        [  ek ]

is an extreme direction of the feasible region X = {x ∈ Rn : Ax = b, x ≥ 0}.

Proof. The fact that d is a direction is easily verified by the fact that there is an extreme
point x = [xB xN ]T and for all λ ≥ 0 we have:

    x + λd ∈ X        (6.36)

Thus it follows from the proof of Theorem 1.9 that Ad ≤ 0. The fact that d ≥ 0 and
d ̸= 0 follows from our assumptions. Now, we know that we can write A = [B|N].
Further, we know that aj = B−1 A·j . Let us consider Ad:

    Ad = [B|N] [ −aj ] = −BB−1 A·j + Nek        (6.37)
               [  ek ]

Remember, ek is the standard basis vector that has a 1 precisely in the position
corresponding to column A·j in matrix N, so A·j = Nek . Thus we have:

    −BB−1 A·j + Nek = −A·j + A·j = 0        (6.38)


Thus, Ad = 0. We can scale d so that eT d = 1. We know that n − m − 1 elements
of d are zero (because of ek ) and we know that Ad = 0. Thus d can be made to
represent the intersection of n linearly independent hyperplanes in Rn . Thus, d is an
extreme point of the polyhedron D = {d ∈ Rn : Ad ≤ 0, d ≥ 0, eT d = 1}. It follows
from Theorem 1.11 that d is an extreme direction of X. ■
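Theorem 6.5 can be checked numerically on Example 6.3. The following sketch (ours) detects that no entry of aj is positive and then builds the direction of Equation 6.35 in the original variable order, verifying that Ad = 0:

```python
import numpy as np

# Data from Example 6.3 in standard form; columns x1, x2, s1, s2.
A = np.array([[1., -1, 1, 0],
              [2., 1, 0, -1]])
basis, entering = [0, 1], 3          # basic (x1, x2); s2 wants to enter

B = A[:, basis]
a_j = np.linalg.solve(B, A[:, entering])   # B^{-1} A_j = (-1/3, -1/3)
assert np.all(a_j <= 0)              # no positive entry: the problem is unbounded

# The extreme direction of Equation 6.35, written in the original
# variable order (x1, x2, s1, s2): -a_j in the basic slots, 1 for the
# entering variable, 0 elsewhere.
d = np.zeros(4)
d[basis] = -a_j
d[entering] = 1.0                    # d = (1/3, 1/3, 0, 1)
# Ad = 0, so x0 + lambda*d remains feasible for every lambda >= 0.
```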

Exercise 6.4 Consider the problem






    min  z(x1 , x2 ) = 2x1 − x2
    s.t. x1 − x2 + s1 = 1
         2x1 + x2 − s2 = 6
         x1 , x2 , s1 , s2 ≥ 0

Using the rule you developed in Exercise 6.2, show that the minimization problem
has an unbounded feasible solution. Find an extreme direction for this set. [Hint:
The minimum ratio test is the same for a minimization problem. Execute the
simplex algorithm as we did in Example 6.3 and use Theorem 6.5 to find the
extreme direction of the feasible region.] ■

6.10 Identifying Alternative Optimal Solutions


We saw in Theorem 6.3 that if zj − cj ≥ 0 for all j ∈ J (the indices of the non-basic
variables), then the basic feasible solution generated by the current basis is optimal.
Suppose now that zj − cj = 0 for at least one j ∈ J . Then we have a slightly different
result:

Theorem 6.6 In Problem P for a given set of non-basic variables J , if zj − cj ≥ 0 for


all j ∈ J , then the current basic feasible solution is optimal. Further, if zj − cj = 0
for at least one j ∈ J , then there are alternative optimal solutions. Furthermore,
let aj = B−1 A·j . Then the solutions to P are:

    xB = B−1 b − aj xj
    xj ∈ [ 0, min { bi / aji : i = 1, . . . , m, aji > 0 } ]        (6.39)
    xr = 0, ∀r ∈ J , r ̸= j

Proof. It follows from the proof of Theorem 6.3 that the solution must be optimal as
∂z/∂xj ≤ 0 for all j ∈ J , and therefore increasing xj will not improve the value
of the objective function. If there is some j ∈ J so that zj − cj = 0, then ∂z/∂xj = 0
and we may increase the value of xj up to some point specified by the minimum ratio
test, while keeping other non-basic variables at zero. In this case, we will neither
increase nor decrease the objective function value. Since the objective function
value is optimal, it follows that the set of all such values (described in Equation 6.39)
are alternative optimal solutions. ■

■ Example 6.4 Let us consider the toy maker problem again from Examples 2.30


and 6.1 with our adjusted objective

z(x1 , x2 ) = 18x1 + 6x2 (6.40)

Now consider the penultimate basis from Example 6.1 in which we had as basic
variables x1 , s2 and x2 :

    xB = (x1 , x2 , s2 )T    xN = (s1 , s3 )T    cB = (18, 6, 0)T    cN = (0, 0)T

The matrices become:

    B = [ 3 1 0 ]        N = [ 1 0 ]
        [ 1 2 1 ]            [ 0 0 ]
        [ 1 0 0 ]            [ 0 1 ]

The derived matrices are then:

    B−1 b = (35, 15, 95)T        B−1 N = [  0  1 ]
                                         [  1 −3 ]
                                         [ −2  5 ]

The cost information becomes:

    cTB B−1 b = 720    cTB B−1 N = [6 0]    cTB B−1 N − cN = [6 0]

This yields the tableau:

        z    x1   x2   s1   s2   s3   RHS
    z   1    0    0    6    0    0    720
    x1  0    1    0    0    0    1    35        (6.41)
    x2  0    0    1    1    0    −3   15
    s2  0    0    0    −2   1    5    95

Unlike Example 6.1, the reduced cost for s3 is 0. This means that if we allow
s3 to enter the basis, the objective function value will not change. Performing the
minimum ratio test, however, we see that s2 will still leave the basis:

        z    x1   x2   s1   s2   s3    RHS    MRT (s3 )
    z   1    0    0    6    0    0     720
    x1  0    1    0    0    0    1     35     35        (6.42)
    x2  0    0    1    1    0    −3    15     −
    s2  0    0    0    −2   1    [5]   95     19


Therefore any solution of the form:

    s3 ∈ [0, 19]

    [ x1 ]   [ 35 ]   [  1 ]
    [ x2 ] = [ 15 ] − [ −3 ] s3        (6.43)
    [ s2 ]   [ 95 ]   [  5 ]

is an optimal solution to the linear programming problem. This precisely describes
the edge shown in Figure 6.3.

Figure 6.3: Infinite alternative optimal solutions: in the simplex algorithm, when
zj − cj ≥ 0 for all j ∈ J in a maximization problem and zj − cj = 0 for at least one
j ∈ J , there is an infinite set of alternative optimal solutions.
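The parameterized family in Equation 6.43 is easy to verify numerically. The sketch below (plain Python; the toy-maker constraints 3x1 + x2 ≤ 120, x1 + 2x2 ≤ 160 and x1 ≤ 35 are assumed from Example 6.1) checks that every point on the edge is feasible and attains z = 720:

```python
def alternative_optimum(s3):
    """Point on the optimal edge of Equation 6.43, parameterized by s3 in [0, 19]."""
    return (35 - 1 * s3, 15 + 3 * s3, 95 - 5 * s3)

def is_optimal(s3):
    """Check feasibility and that the objective z = 18*x1 + 6*x2 equals 720."""
    x1, x2, s2 = alternative_optimum(s3)
    feasible = (3 * x1 + x2 <= 120 and x1 + 2 * x2 <= 160 and x1 <= 35
                and x1 >= 0 and x2 >= 0 and s2 >= 0)
    return feasible and 18 * x1 + 6 * x2 == 720
```

For s3 = 0 this is the basic feasible solution of Tableau 6.41; for s3 = 19 it is the extreme point reached when s3 enters the basis.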

Exercise 6.5 Consider the diet problem we covered in Example 5.14. I wish
to design a diet consisting of Ramen noodles and ice cream. I’m interested in
spending as little money as possible but I want to ensure that I eat at least 1200
calories per day and that I get at least 20 grams of protein per day. Assume that
each serving of Ramen costs $1 and contains 100 calories and 2 grams of protein.
Assume that each serving of ice cream costs $1.50 and contains 200 calories and 3
grams of protein.
1. Develop a linear programming problem that will help me minimize the cost
of my food intake.
2. Remembering to transform the linear programming problem you found above
into standard form, use the simplex algorithm to show that this problem
has an infinite set of alternative optimal solutions.
3. At an optimal extreme point, find an expression for the set of infinite
alternative optimal extreme points like the one shown in Equation 6.43.
4. Plot the feasible region and the level curves of the objective function. High-
light the face of the polyhedral set on which the alternative optimal solutions
can be found.

6.11 Degeneracy and Convergence


In this section we give an example of degeneracy and its impact on the simplex
algorithm.

■ Example 6.5 Consider the modified form of the toy maker problem originally
stated in Example 1.38:



    max  7x1 + 6x2
    s.t. 3x1 + x2 ≤ 120
         x1 + 2x2 ≤ 160
         x1 ≤ 35                          (6.44)
         (7/4) x1 + x2 ≤ 100
         x1 , x2 ≥ 0

The polyhedral set and level curves of the objective function are shown in Figure 6.4.

Figure 6.4: An optimization problem with a degenerate extreme point: The optimal
solution to this problem is still (16, 72), but this extreme point is degenerate, which
will impact the behavior of the simplex algorithm.


We can convert the problem to standard form by introducing slack variables:





    max  7x1 + 6x2
    s.t. 3x1 + x2 + s1 = 120
         x1 + 2x2 + s2 = 160
         x1 + s3 = 35                     (6.45)
         (7/4) x1 + x2 + s4 = 100
         x1 , x2 , s1 , s2 , s3 , s4 ≥ 0

Suppose we start at the extreme point where x1 = 35 and x2 = 15, so that s2 = 95
and s4 = 95/4 = 23.75. In this case, the matrices are:

        [ 3    1  0  0 ]        [ 1 0 ]
    B = [ 1    2  1  0 ]    N = [ 0 0 ]
        [ 1    0  0  0 ]        [ 0 1 ]
        [ 7/4  1  0  1 ]        [ 0 0 ]

             [ 35   ]            [  0    1   ]
    B−1 b =  [ 15   ]    B−1 N = [  1   −3   ]
             [ 95   ]            [ −2    5   ]
             [ 95/4 ]            [ −1   5/4  ]

    cTB B−1 b = 335    cTB B−1 N − cN = [ 6 −11 ]
The tableau representation is:

          z   x1  x2  s1  s2   s3   s4 | RHS  | MRT (s3 )
    z     1   0   0   6   0  −11    0  | 335  |
    x1    0   1   0   0   0    1    0  |  35  |  35
    x2    0   0   1   1   0   −3    0  |  15  |  −        (6.46)
    s2    0   0   0  −2   1    5    0  |  95  |  19
    s4    0   0   0  −1   0   5/4   1  | 95/4 |  19

From this, we see that the variable s3 should enter the basis (because its reduced
cost is negative). In this case, there is a tie for the leaving variable: we see that
95/5 = 19 = (95/4)/(5/4); therefore, either s2 or s4 could be chosen as the leaving
variable. This is because we will move to a degenerate extreme point when s3
enters the basis.


Suppose we choose s4 as the leaving variable. Then our tableau will become:

          z   x1  x2    s1   s2  s3   s4  | RHS | MRT (s1 )
    z     1   0   0  −14/5   0   0   44/5 | 544 |
    x1    0   1   0    4/5   0   0  −4/5  |  16 |  20
    x2    0   0   1   −7/5   0   0  12/5  |  72 |  −        (6.47)
    s2    0   0   0     2    1   0   −4   |   0 |  0
    s3    0   0   0   −4/5   0   1   4/5  |  19 |  −

We now observe two things:


1. One of the basic variables (s2 ) is zero, even though it is basic. This is the
indicator of degeneracy at an extreme point.
2. The reduced cost of s1 is negative, indicating that s1 should enter the basis.
If we choose s1 as an entering variable, then by the minimum ratio test we will
choose s2 as the leaving variablea . Then the tableau
becomes:
 
z x1 x2 s 1 s2 s3 s4 RHS
 
z 
 1 0 0 0 7/5 0 16/5 544  
 
x1  0 1 0 0 −2/5 0 4/5 16 



 (6.48)
x2 
 0 0 1 0 7/10 0 −2/5 72  
 
s1 
 0 0 0 1 1/2 0 −2 0 

s3 0 0 0 0 2/5 1 −4/5 19

Notice the objective function value cTB B−1 b has not changed, because we
have not actually moved to a new extreme point. We have simply changed from one
representation of the degenerate extreme point to another. This was to be expected:
the fact that the minimum ratio was zero showed that we could not increase s1
and maintain feasibility. As such, s1 = 0 in the new basic feasible solution. The
reduced cost vector cTB B−1 N − cN has changed and we can now terminate the
simplex method. ■

a The minimum ratio test still applies when bj = 0. In this case, we will remain at the same
extreme point.
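Degeneracy at (16, 72) can also be seen directly from the constraints of Equation 6.44: three of them are binding at this point, one more than the number of decision variables. A quick check (a sketch; the constraint data is copied from the example above):

```python
from fractions import Fraction

# Constraints of Equation 6.44 as ((a1, a2), b) pairs for a1*x1 + a2*x2 <= b.
constraints = [((3, 1), 120), ((1, 2), 160), ((1, 0), 35),
               ((Fraction(7, 4), 1), 100)]

def binding_at(x1, x2):
    """Indices of constraints that hold with equality at (x1, x2)."""
    return [i for i, ((a1, a2), rhs) in enumerate(constraints)
            if a1 * x1 + a2 * x2 == rhs]
```

With two variables, any extreme point where more than two constraints bind is degenerate; the starting point (35, 15), by contrast, has exactly two binding constraints.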

Theorem 6.7 Consider Problem P (our linear programming problem). Let B ∈
Rm×m be a basis matrix corresponding to some set of basic variables xB . Let
b = B−1 b. If bj = 0 for some j = 1, . . . , m, then xB = b and xN = 0 is a degenerate
extreme point of the feasible region of Problem P .

Proof. At any basic feasible solution, we have chosen m variables as basic. This
basic feasible solution satisfies BxB = b and thus provides m binding constraints.
The remaining variables are chosen as non-basic and set to zero, thus xN = 0, which
provides n − m binding constraints on the non-negativity constraints (i.e., x ≥ 0).
If there is a basic variable that is zero, then an extra non-negativity constraint is
binding at that extreme point. Thus n + 1 constraints are binding and, by definition,
the extreme point must be degenerate. ■

6.11.1 The Simplex Algorithm and Convergence


Using the work we’ve done in this chapter, we can now state the following implemen-
tation of the Simplex algorithm in matrix form.

Algorithm 6 The Matrix Form of the Simplex Algorithm

1. Given Problem P in standard form with cost vector c, coefficient matrix
   A and right hand side b, identify an initial basic feasible solution xB and
   xN by any means. Let J be the set of indices of non-basic variables. If no
   basic feasible solution can be found, STOP, the problem has no solution.
2. Compute the row vector cTB B−1 N − cTN . This vector contains zj − cj for
   j ∈ J.
3. If zj − cj ≥ 0 for all j ∈ J , STOP, the current basic feasible solution is
   optimal. Otherwise, go to Step 4.
4. Choose a non-basic variable xj with zj − cj < 0. Let aj be the corresponding
   column of B−1 N. If aj ≤ 0, then the problem is unbounded, STOP.
   Otherwise, go to Step 5.
5. Let b = B−1 b. Find the index i solving:

       min{ bi /aji : i = 1, . . . , m and aji > 0 }

6. Set xBi = 0 and xj = bi /aji .
7. Update J and go to Step 2.

Exercise 6.6 State the simplex algorithm in Tableau Form. [Hint: Most of the
simplex algorithm is the same, simply add in the row-operations executed to
compute the new reduced costs and B−1 N.] ■
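The pivoting rules above can be sketched in code. The sketch below is a tableau (rather than matrix) variant of the same rules, and it assumes the inequality form max cT x s.t. Ax ≤ b, x ≥ 0 with b ≥ 0, so that the slack variables give an immediate starting basis; a problem already in equality standard form would instead need the initialization methods of Section 6.13. Exact rational arithmetic avoids round-off:

```python
from fractions import Fraction

def simplex_max(c, A, b):
    """Tableau simplex for max c^T x s.t. Ax <= b, x >= 0 (with b >= 0).

    Row 0 holds z_j - c_j; rows 1..m hold [B^-1 A | B^-1 b].
    Returns (optimal value, x); raises ValueError if unbounded."""
    m, n = len(A), len(c)
    F = Fraction
    # Build the tableau: n structural columns + m slack columns + RHS.
    T = [[F(-ci) for ci in c] + [F(0)] * m + [F(0)]]
    for i in range(m):
        row = [F(a) for a in A[i]] + [F(0)] * m + [F(b[i])]
        row[n + i] = F(1)               # slack column
        T.append(row)
    basis = list(range(n, n + m))       # slacks start in the basis
    while True:
        # Entering variable: most negative reduced cost z_j - c_j (Step 4).
        j = min(range(n + m), key=lambda k: T[0][k])
        if T[0][j] >= 0:
            break                       # Step 3: optimal
        # Minimum ratio test over rows with a positive pivot-column entry (Step 5).
        ratios = [(T[i][-1] / T[i][j], i) for i in range(1, m + 1) if T[i][j] > 0]
        if not ratios:
            raise ValueError("unbounded")
        _, r = min(ratios)
        piv = T[r][j]                   # pivot (Step 6): x_j enters, x_B_r leaves
        T[r] = [v / piv for v in T[r]]
        for i in range(m + 1):
            if i != r and T[i][j] != 0:
                f = T[i][j]
                T[i] = [vi - f * vr for vi, vr in zip(T[i], T[r])]
        basis[r - 1] = j
    x = [F(0)] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i + 1][-1]
    return T[0][-1], x
```

On the toy-maker data max 7x1 + 6x2 with constraints 3x1 + x2 ≤ 120, x1 + 2x2 ≤ 160, x1 ≤ 35, it returns the optimal value 544 at (16, 72), matching Tableau 6.48.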

Theorem 6.8 If the feasible region of Problem P has no degenerate extreme points,
then the simplex algorithm will terminate in a finite number of steps with an
optimal solution to the linear programming problem.

Sketch of Proof. In the absence of degeneracy, the value of the objective function
improves (increases in the case of a maximization problem) each time we exchange a
basic variable and non-basic variable. This is ensured by the fact that the entering
variable always has a negative reduced cost. There are a finite number of extreme
points for each polyhedral set, as shown in Lemma 1.6. Thus, the process of moving
from extreme point to extreme point of X, the polyhedral set in Problem P , must
terminate with the largest possible objective function value. ■

6.12 Simplex Initialization


In the previous chapter, we introduced the Simplex Algorithm and showed how to
manipulate the A, B and N matrices as we execute it. In this chapter, we will
discuss the issue of finding an initial basic feasible solution to start execution of the
Simplex Algorithm.

6.13 Artificial Variables


So far we have investigated linear programming problems that have the form:

max cT x
s.t. Ax ≤ b
x≥0

In this case, we use slack variables to convert the problem to:

max cT x
s.t. Ax + Im xs = b
x, xs ≥ 0

where xs are slack variables, one for each constraint. If b ≥ 0, then our initial basic
feasible solution can be x = 0 and xs = b (that is, our initial basis matrix is B = Im ).
We have also explored small problems where a graphical technique could be used to
identify an initial extreme point of a polyhedral set and thus an initial basic feasible
solution for the problem.
Suppose now we wish to investigate problems in which we do not have a problem
structure that lends itself to easily identifying an initial basic feasible solution. The
simplex algorithm requires an initial BFS to begin execution and so we must develop
a method for finding such a BFS.
For the remainder of this chapter we will assume, unless told otherwise, that we
are interested in solving a linear programming problem provided in Standard Form.
That is:

    P : max  cT x
        s.t. Ax = b        (6.49)
             x ≥ 0

and that b ≥ 0. Clearly our work in Chapter 3 shows that any linear programming
problem can be put in this form.
Suppose to each constraint Ai· x = bi we associate an artificial variable xai . We
can replace constraint i with:

Ai· x + xai = bi (6.50)

Since bi ≥ 0, we will require xai ≥ 0. If xai = 0, then this is simply the original
constraint. Thus if we can find values for the ordinary decision variables x so that
xai = 0, then constraint i is satisfied. If we can identify values for x so that all the
artificial variables are zero and m variables of x are non-zero, then the modified
constraints described by Equation 6.50 are satisfied and we have identified an initial
basic feasible solution.


Obviously, we would like to penalize non-zero artificial variables. This can be


done by writing a new linear programming problem:

    P1 : min  eT xa
         s.t. Ax + Im xa = b        (6.51)
              x, xa ≥ 0

where e is a vector of ones of appropriate dimension.

R We can see that the artificial variables are similar to slack variables, but they
should have zero value because they have no true meaning in the original
problem P . They are introduced artificially to help identify an initial basic
feasible solution to Problem P .

Lemma 6.1. The optimal objective function value in Problem P1 is bounded below
by 0. Furthermore, if the optimal solution to problem P1 has xa = 0, then the values
of x form a feasible solution to Problem P .
Proof. Clearly, setting xa = 0 will produce an objective function value of zero. Since
e > 0, we cannot obtain a smaller objective function value. If at optimality we have
xa = 0, then we know that m of the variables in x are in the basis and the remaining
variables (in both x and xa ) are not in the basis and hence at zero. Thus we have
found a basic feasible solution to Problem P . ■

■ Example 6.6 Consider the following problem:

min x1 + 2x2
s.t. x1 + 2x2 ≥ 12
(6.52)
2x1 + 3x2 ≥ 20
x1 , x 2 ≥ 0

We can convert the problem to standard form by adding two surplus variables:

min x1 + 2x2
s.t. x1 + 2x2 − s1 = 12
(6.53)
2x1 + 3x2 − s2 = 20
x1 , x 2 , s 1 , s 2 ≥ 0

It’s not clear what a good basic feasible solution would be for this. Clearly, we
cannot set x1 = x2 = 0 because we would have s1 = −12 and s2 = −20, which is
not feasible. We can introduce two artificial variables (xa1 and xa2 ) and create a
new problem P1 .

min xa1 + xa2


s.t. x1 + 2x2 − s1 + xa1 = 12
(6.54)
2x1 + 3x2 − s2 + xa2 = 20
x1 , x2 , s1 , s2 , xa1 , xa2 ≥ 0

A basic feasible solution for our artificial problem would let xa1 = 12 and xa2 = 20.


The pertinent matrices in this case are:

        [ 1 2 −1  0 1 0 ]        [ 12 ]
    A = [ 2 3  0 −1 0 1 ]    b = [ 20 ]

        [ 1 0 ]        [ 1 2 −1  0 ]             [ 12 ]
    B = [ 0 1 ]    N = [ 2 3  0 −1 ]    B−1 b =  [ 20 ]

    cB = [ 1 1 ]T    cN = [ 0 0 0 0 ]T

    cTB B−1 b = 32    cTB B−1 N − cTN = [ 3 5 −1 −1 ]
We can construct an initial tableau for this problem as:

          z   x1  x2  s1  s2  xa1  xa2 | RHS
    z     1   3   5  −1  −1   0    0   |  32
    xa1   0   1   2  −1   0   1    0   |  12        (6.55)
    xa2   0   2   3   0  −1   0    1   |  20

This is a minimization problem, so if zj − cj > 0, then entering xj will improve
(decrease) the objective value because ∂z/∂xj < 0. In this case, we could enter
either x1 or x2 to improve the objective value. Let’s assume we enter variable x1 .
Performing the minimum ratio test we see:

          z   x1  x2  s1  s2  xa1  xa2 | RHS | MRT (x1 )
    z     1   3   5  −1  −1   0    0   |  32 |
    xa1   0   1   2  −1   0   1    0   |  12 |  12        (6.56)
    xa2   0   2   3   0  −1   0    1   |  20 |  20/2 = 10

Thus xa2 leaves the basis and x1 enters. The new tableau becomes:

          z   x1   x2   s1   s2   xa1  xa2  | RHS
    z     1   0   1/2  −1   1/2    0  −3/2  |   2
    xa1   0   0   1/2  −1   1/2    1  −1/2  |   2        (6.57)
    x1    0   1   3/2   0  −1/2    0   1/2  |  10


In this case, we see that x2 should enter the basis. Performing the minimum ratio
test, we obtain:

          z   x1   x2   s1   s2   xa1  xa2  | RHS | MRT (x2 )
    z     1   0   1/2  −1   1/2    0  −3/2  |   2 |
    xa1   0   0   1/2  −1   1/2    1  −1/2  |   2 |  4        (6.58)
    x1    0   1   3/2   0  −1/2    0   1/2  |  10 |  20/3

Thus we see that xa1 leaves the basis and we obtain:

          z   x1  x2  s1  s2  xa1  xa2 | RHS
    z     1   0   0   0   0  −1   −1   |  0
    x2    0   0   1  −2   1   2   −1   |  4        (6.59)
    x1    0   1   0   3  −2  −3    2   |  4

At this point, we have eliminated both artificial variables from the basis and we
have identified an initial basic feasible solution to the original problem: x1 = 4,
x2 = 4, s1 = 0 and s2 = 0. The process of moving to a feasible solution in the
original problem is shown in Figure 6.5.

Figure 6.5: Finding an initial feasible point: Artificial variables are introduced into
the problem. These variables allow us to move through non-feasible space. Once
we reach a feasible extreme point, the process of optimizing Problem P1 stops.

We could now continue on to solve the initial problem we were given. At this
point, our basic feasible solution makes x2 and x1 basic variables and s1 and s2
non-basic variables. Our problem data are:

    xB = [ x2 x1 ]T    xN = [ s1 s2 ]T

Note that we keep the basic variables in the order in which we find them at the
end of the solution to our first problem.

        [ 1 2 −1  0 ]        [ 12 ]
    A = [ 2 3  0 −1 ]    b = [ 20 ]

        [ 2 1 ]        [ −1  0 ]             [ 4 ]
    B = [ 3 2 ]    N = [  0 −1 ]    B−1 b =  [ 4 ]

    cB = [ 2 1 ]T    cN = [ 0 0 ]T

    cTB B−1 b = 12    cTB B−1 N − cTN = [ −1 0 ]
Notice that we don’t have to do a lot of work to get this information out of
the last tableau in Expression 6.59. The matrix B−1 is actually positioned in the
columns below the artificial variables. This is because we started with an identity
matrix in this position. As always, the remainder of the matrix holds B−1 N. Thus,
we can read this final tableau as:

          z |  xB  |   s    |  xa  | RHS
    z     1 |  0   |   0    | −eT  |  0        (6.60)
    xB    0 |  I2  | B−1 N  | B−1  | B−1 b

with row labels z and xB = (x2 , x1 ).

In our case from Expression 6.59 we have:

          z   x1  x2  s1  s2  xa1  xa2 | RHS
    z     1   0   0   0   0  −1   −1   |  0
    x2    0   0   1  −2   1   2   −1   |  4        (6.61)
    x1    0   1   0   3  −2  −3    2   |  4

Here the columns of the basic variables hold the identity (up to the row order of
the basis), the columns under s1 and s2 hold B−1 N, the columns under xa1 and
xa2 hold B−1 , and the right hand side holds B−1 b.

We can use this information (and the reduced costs and objective function we
computed) to start our tableau to solve the problem with which we began. Our
next initial tableau will be:
 
          z   x1  x2  s1  s2 | RHS
    z     1   0   0  −1   0  |  12
    x2    0   0   1  −2   1  |   4        (6.62)
    x1    0   1   0   3  −2  |   4

Notice all we’ve done is remove the artificial variables from the problem and
substitute the newly computed reduced costs for s1 and s2 (−1 and 0) into Row
0 of the tableau. We’ve also put the correct objective function value (12) into
Row 0 of the right hand side. We’re now ready to solve the original problem.
However, since this is a minimization problem we can see we’re already at a point
of optimality. Notice that all reduced costs are either negative or zero, suggesting
that entering any non-basic variable will at best keep the objective function value
the same and at worst make the objective function value worse. Thus we conclude
that an optimal solution for our original problem is x∗1 = x∗2 = 4 and s∗1 = s∗2 = 0. ■

Theorem 6.9 Let (x∗ , x∗a ) be an optimal feasible solution to Problem P1 . Problem
P is feasible if and only if x∗a = 0.

Proof. We have already proved in Lemma 6.1 that if x∗a = 0, then x∗ is a feasible
solution to P and thus P is feasible.
Conversely, suppose that P is feasible. Then P has at least one basic feasible
solution because the feasible region of P is a polyhedral set and we are assured by
Lemma 1.6 that this set has at least one extreme point. Now we can simply let
x∗a = 0 and let x∗ be this basic feasible solution to Problem P . Then this is clearly
an optimal solution to Problem P1 because it forces the objective value to its lower
bound (zero). ■

6.14 The Two-Phase Simplex Algorithm


The two phase simplex algorithm applies the results from the previous section to
develop an end-to-end algorithm for solving an arbitrary linear programming problem.

When we solve the Phase I problem, if x∗a ̸= 0 at optimality, then Problem P has
no solution. If x∗a = 0, then there are two possibilities:
1. The basis consists only of variables in the vector x; i.e., no artificial variable is
in the basis.
2. There is some artificial variable xai = 0 and this variable is in the basis; i.e., the
solution is degenerate and the degeneracy is expressed in an artificial variable.

6.14.1 Case I: xa = 0 and is out of the basis


If xa = 0 and there are no elements of the vector xa in the basis, then we have
identified a basic feasible solution x = [xB xN ]T . Simply let the non-zero basic
elements (in x) form xB ; the remaining elements of x make up xN . We can then
begin Phase II using this basic feasible solution.

6.14.2 Case II: xa = 0 and is not out of the basis


If xa = 0 and there is at least one artificial variable still in the basis, then we have
identified a degenerate solution to the Phase I problem. Theoretically we could
proceed directly to Phase II, assigning 0 coefficients to the artificial variables as
long as we ensure that no artificial variable ever becomes positive again. [BJS04]
notes that this can be accomplished by selective pivoting, however it is often more
efficient and simpler to remove the artificial variables completely from the basis
before proceeding to Phase II.
To remove the artificial variables from the basis, let us assume that we can


Algorithm 7 Two-Phase Simplex Algorithm

1. Given a problem of the form of the general maximization (or minimization)
   problem from Equation 2.55, convert it to standard form:

       P : max  cT x
           s.t. Ax = b
                x ≥ 0

   with b ≥ 0.
2. Introduce artificial variables xa and solve the Phase I problem:

       P1 : min  eT xa
            s.t. Ax + Im xa = b
                 x, xa ≥ 0

3. If x∗a = 0, then an initial feasible solution has been identified. This solution
   can be converted into a basic feasible solution as we discuss below. Otherwise,
   there is no solution to Problem P .
4. Use the basic feasible solution identified in Step 3 to start the Simplex
   Algorithm (compute the reduced costs given the c vector).
5. Solve the Phase II problem:

       P : max  cT x
           s.t. Ax = b
                x ≥ 0


arrange Rows 1 through m of the Phase I simplex tableau as follows:

           xB    xBa    xN   xNa | RHS
    xB   [ Ik     0     R1   R3  |  b  ]        (6.63)
    xBa  [ 0    Im−k    R2   R4  |  0  ]

Column swapping ensures we can do this, if we so desire. Our objective is to replace
elements in xBa (the basic artificial variables) with elements from xN (the non-basic,
non-artificial variables). Thus, we will attempt to pivot on elements in the matrix R2 .
Clearly, since the Phase I coefficients of the variables in xN are zero, pivoting on these
elements will not negatively impact the Phase I objective value. Thus, if the element
in position (1, 1) of R2 is non-zero, then we can enter the variable xN1 into the basis
and remove the variable xBa1 . This will produce a new tableau with structure similar
to the one given in Equation 6.63, except there will be k + 1 non-artificial basic
variables and m − k − 1 artificial basic variables. Clearly, if the element in position
(1, 1) in matrix R2 is zero, then we must move to a different element for pivoting.
In executing the procedure discussed above, one of two things will occur:
1. The matrix R2 will be transformed into Im−k , or
2. A point will be reached where there are no longer any variables in xN that can
be entered into the basis because all the remaining elements of R2 are zero.
In the first case, we have removed all the artificial variables from the basis in
Phase I and we can proceed to Phase II with the current basic feasible solution. In
the second case, we will have shown that:

        [ Ik  R1 ]
    A ∼ [ 0   0  ]        (6.64)

This shows that the last m − k rows of A are linearly dependent on the first k rows
and thus the matrix A did not have full row rank. When this occurs, we can discard
the last m − k rows of A and simply proceed with the solution given by xB = b,
xN = 0. This is a basic feasible solution to the new matrix A in which we have
removed the redundant rows.

■ Example 6.7 Once execution of the Phase I simplex algorithm is complete, the
reduced costs of the current basic feasible solution must be computed. These can
be computed during Phase I by adding an additional “z” row to the tableau. In
this case, the initial tableau has the form:

          z   x1  x2  s1  s2  xa1  xa2 | RHS
    zII   1  −1  −2   0   0   0    0   |  0
    z     1   3   5  −1  −1   0    0   |  32        (6.65)
    xa1   0   1   2  −1   0   1    0   |  12
    xa2   0   2   3   0  −1   0    1   |  20

The first row (zII ) is computed for the objective function:

    x1 + 2x2 + 0s1 + 0s2 + 0xa1 + 0xa2 ,        (6.66)
which is precisely the Phase II problem, except we never allow the artificial variables
xa1 and xa2 to carry into the Phase II problem. If we carry out the same steps we
did in Example 6.6 then we obtain the sequence of tableaux:
TABLEAU I

          z   x1  x2  s1  s2  xa1  xa2 | RHS | MRT (x1 )
    zII   1  −1  −2   0   0   0    0   |  0  |
    z     1   3   5  −1  −1   0    0   |  32 |
    xa1   0   1   2  −1   0   1    0   |  12 |  12
    xa2   0   2   3   0  −1   0    1   |  20 |  20/2 = 10

TABLEAU II

          z   x1   x2    s1   s2   xa1  xa2  | RHS | MRT (x2 )
    zII   1   0  −1/2    0  −1/2    0   1/2  |  10 |
    z     1   0   1/2   −1   1/2    0  −3/2  |   2 |
    xa1   0   0   1/2   −1   1/2    1  −1/2  |   2 |  4
    x1    0   1   3/2    0  −1/2    0   1/2  |  10 |  20/3

TABLEAU III

          z   x1  x2  s1  s2  xa1  xa2 | RHS
    zII   1   0   0  −1   0   1    0   |  12
    z     1   0   0   0   0  −1   −1   |   0
    x2    0   0   1  −2   1   2   −1   |   4
    x1    0   1   0   3  −2  −3    2   |   4

We again arrive at the end of Phase I, but we are now prepared to immediately
execute Phase II with the tableau:

          z   x1  x2  s1  s2 | RHS
    zII   1   0   0  −1   0  |  12
    x2    0   0   1  −2   1  |   4
    x1    0   1   0   3  −2  |   4

In this case, we see that we are already at an optimal solution for a minimization
problem because the reduced costs are all less than or equal to zero. We also note
that since the reduced cost of the non-basic variable s2 is zero, there are alternative
optimal solutions. ■
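The zII bookkeeping can be reproduced from the matrices available at the end of Phase I. A sketch, using the basis (x2 , x1 ) with B−1 read off under the artificial columns of Tableau III:

```python
def row_times_matrix(r, M):
    """Multiply a row vector by a matrix (both as nested lists)."""
    return [sum(ri * M[i][j] for i, ri in enumerate(r)) for j in range(len(M[0]))]

B_inv = [[2, -1], [-3, 2]]   # from under (xa1, xa2) in Tableau III
N = [[-1, 0], [0, -1]]       # columns of s1, s2
c_B, c_N = [2, 1], [0, 0]    # Phase II costs of (x2, x1) and of (s1, s2)
b = [12, 20]

y = row_times_matrix(c_B, B_inv)                      # c_B^T B^{-1}
reduced = [z - c for z, c in zip(row_times_matrix(y, N), c_N)]
objective = sum(yi * bi for yi, bi in zip(y, b))      # c_B^T B^{-1} b
```

This reproduces the zII row of the final tableau: reduced costs (−1, 0) for (s1 , s2 ) and objective value 12.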

6.15 The Big-M Method


The Big-M method is similar to the two-phase simplex algorithm, except that it
essentially attempts to execute Phase I and Phase II in a single execution of the
simplex algorithm.
In the Big-M method we modify problem P with artificial variables as we did in
the two-phase simplex method, but we also modify the objective function:

    PM : max  cT x − M eT xa
         s.t. Ax + Im xa = b        (6.67)
              x, xa ≥ 0

Here, M is a large positive constant, much larger than the largest coefficient in the
vector c. The value of M is usually chosen to be at least 100 times larger than the
largest coefficient in the original objective function.
R In the case of a minimization problem, the objective function in the Big-M
method becomes:

min cT x + M eT xa (6.68)

Exercise 6.7 In Exercise 2.4 we showed that every maximization problem can be
written as a minimization problem (and vice-versa). Show that Equation 6.68
follows by changing Problem PM into a minimization problem. ■

Lemma 6.2. Suppose that Problem PM is unbounded. If Problem P is feasible,
then it is unbounded.

Proof. If PM is unbounded, then there is some direction

    dM = [ d  da ]T

of the feasible region of Problem PM . Furthermore, d ≥ 0 and da ≥ 0 and, as a
whole, dM ̸= 0. For this problem to be unbounded, it suffices that:

    cT d − M eT da > 0        (6.69)

by Corollary 6.1.
Since we are free to choose M as large as we like, it follows that for a large value
of M the left-hand side of Inequality 6.69 must be negative unless da = 0.
The fact that dM is a direction implies that Ad + Im da = 0 and therefore Ad = 0.
We know further that d ≥ 0 and d ̸= 0. Thus it follows that we have identified
a direction d of the feasible region for Problem P . Furthermore, following this
direction must result in an unbounded objective function for P , since da = 0 implies
cT d > 0. ■

R Lemma 6.2 tells us that if Problem PM is unbounded, then there is no useful
solution to Problem P . If Problem P has a non-empty feasible region, then it
[Problem P ] is unbounded and thus there is no useful solution. On the other
hand, if Problem P has no feasible solution, there is still no useful solution to
Problem P . In either case, we may need to re-model the problem to obtain a
useful result.


Theorem 6.10 If Problem P is feasible and has a finite solution, then there is an
M > 0 so that the optimal solution to PM has all artificial variables non-basic,
and thus the solution to Problem P can be extracted from the solution to Problem
PM .

Proof. By contrapositive applied to Lemma 6.2, we know that Problem PM is
bounded. By the Carathéodory characterization theorem, we may enumerate the
extreme points of the feasible region of Problem PM : call these y1 , . . . , yk , where:

    y = [ x  xa ]T

Let zM1 , . . . , zMk be the objective function values of Problem PM at each of these
extreme points. Since P is feasible, at least one of these extreme points has xa = 0.
Let us sub-divide the extreme points into Ya = {y1 , . . . , yl } and Y = {yl+1 , . . . , yk },
where the points in Ya are the extreme points at which there is at least one non-zero
artificial variable and the points in Y are the extreme points where all artificial
variables are zero. At any extreme point in Y we know that at most m elements of
the vector x are non-zero and therefore, every extreme point in Y corresponds to
an extreme point of the original Problem P . Since Problem P has a finite solution,
it follows that the optimal solution to Problem P occurs at some point in Y, by
Theorem 6.1. Furthermore, the value of the objective function for Problem P is
precisely the same as the value of the objective function of Problem PM for each
point in Y, because xa = 0. Define:

    zPmin = min{ cT x : y = [x xa ]T ∈ Y }        (6.70)

At each extreme point in Ya , the value of the objective function of Problem PM
is a function of the value of M , which we are free to choose. Therefore, choose M so
that:

    max{ cT x − M eT xa : y = [x xa ]T ∈ Ya } < zPmin        (6.71)

Such a value of M exists since there are only a finite number of extreme points in
Ya . Our choice of M ensures that the optimal solution to PM occurs at an extreme
point where xa = 0 and the x component of y is the solution to Problem P . ■

R Another way to look at the proof of this theorem is to think of defining M in


such a way so that at any extreme point where xa ̸= 0, the objective function
can always be made larger by moving to any extreme point that is feasible to
Problem P . Thus the simplex algorithm will move among the extreme points
seeking to leave those that are not feasible to Problem P because they are less
desirable.

Theorem 6.11 Suppose Problem P is infeasible. Then there is no value of M that
will drive all the artificial variables from the basis of Problem PM .

Proof. If such an M existed, then xa = 0 and the resulting values of x would
represent a feasible solution to Problem P , which contradicts our assumption that
Problem P was infeasible. ■


R The Big-M method is not particularly effective for solving real-world problems.
The introduction of a set of variables with large coefficients (M ) can lead
to round-off errors in the execution of the simplex algorithm. (Remember,
computers can only manipulate numbers in binary, which means that all
floating point numbers are restricted in their precision to the machine precision
of the underlying system. This is generally given in terms of the largest
amount of memory that can be addressed in bits, which has led, in recent times,
to operating system manufacturers selling their OS’s as “32 bit” or “64 bit.”)
When solving real-world problems, these issues can become a real factor with
which to contend.
Another issue is that we have no way of telling how large M should be without
knowing that Problem P is feasible, which is precisely what we want the Big-M
method to tell us! The general rule of thumb provided earlier will suffice.
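The round-off concern is concrete: with IEEE-754 double precision, the spacing between consecutive representable numbers near 10^16 is 2, so a unit cost coefficient added to such an M vanishes entirely. A quick demonstration:

```python
# An M this large absorbs a cost coefficient of 1 completely.
M = 1.0e16
absorbed = (M + 1.0) - M

# A moderate M (e.g. 100 times the largest cost coefficient) preserves it.
moderate_M = 100.0
kept = (moderate_M + 1.0) - moderate_M
```

With `M = 1.0e16` the difference comes back as 0.0, while the moderate value returns 1.0 — one reason the "100 times the largest coefficient" rule of thumb is preferred to picking M astronomically large.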

■ Example 6.8 Suppose we solve the problem from Example 6.6 using the Big-M
method. Our problem is:

min x1 + 2x2
s.t. x1 + 2x2 ≥ 12
(6.72)
2x1 + 3x2 ≥ 20
x1 , x 2 ≥ 0

Again, this problem has standard form:

min x1 + 2x2
s.t. x1 + 2x2 − s1 = 12
(6.73)
2x1 + 3x2 − s2 = 20
x1 , x 2 , s 1 , s 2 ≥ 0

To execute the Big-M method, we’ll choose M = 300 which is larger than 100
times the largest coefficient of the objective function of the original problem. Our
new problem becomes:

min x1 + 2x2 + 300xa1 + 300xa2


s.t. x1 + 2x2 − s1 + xa1 = 12
(6.74)
2x1 + 3x2 − s2 + xa2 = 20
x1 , x2 , s1 , s2 , xa1 , xa2 ≥ 0

Since this is a minimization problem, we add M eT xa to the objective function.
Letting xa1 and xa2 be our initial basic variables, we have the series of tableaux:
TABLEAU I

          z   x1    x2    s1    s2   xa1  xa2 | RHS  | MRT (x1 )
    z     1  899  1498  −300  −300    0    0  | 9600 |
    xa1   0   1     2    −1     0     1    0  |  12  |  12
    xa2   0   2     3     0    −1     0    1  |  20  |  20/2 = 10


TABLEAU II

          z   x1    x2     s1     s2    xa1   xa2   | RHS | MRT (x2 )
    z     1   0   299/2  −300   299/2    0  −899/2  | 610 |
    xa1   0   0    1/2    −1     1/2     1   −1/2   |   2 |  4
    x1    0   1    3/2     0    −1/2     0    1/2   |  10 |  20/3

TABLEAU III

          z   x1  x2  s1  s2   xa1   xa2  | RHS
    z     1   0   0  −1   0   −299  −300  |  12
    x2    0   0   1  −2   1     2    −1   |   4
    x1    0   1   0   3  −2    −3     2   |   4

It is worth noting that this is essentially the same series of tableaux we had when
executing the two-phase method, but we have to deal with the large M coefficients
in our arithmetic. ■
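The Row-0 right-hand sides of the three tableaux above can be checked against the penalized objective of Problem 6.74 directly; a sketch with M = 300:

```python
M = 300

def big_m_objective(x1, x2, xa1, xa2):
    """Penalized objective x1 + 2*x2 + M*(xa1 + xa2) of Problem 6.74."""
    return x1 + 2 * x2 + M * (xa1 + xa2)
```

The starting basis (xa1 , xa2 ) = (12, 20) gives 9600, the intermediate point with x1 = 10 and xa1 = 2 gives 610, and the final solution (4, 4) with no artificial variables gives 12 — matching the three tableaux.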

6.16 The Single Artificial Variable Technique


Consider the system of equations Ax = b that makes up a portion of the feasible
region of Problem P . Suppose we choose some sub-matrix of A to be our basis matrix
B, irrespective of whether the corresponding solution satisfies xB = B−1 b ≥ 0. If A
has full row rank, then clearly such a matrix exists. The resulting basic solution with
basis B is called a crash basis.
If b̄ = B−1 b ≥ 0, then we have (by luck) identified an initial basic feasible solution
and we can proceed directly to execute the simplex algorithm as we did in Chapter
5. Suppose instead that b̄ ̸≥ 0. Then we can form the new system:

Im xB + B−1 NxN + ya xa = b̄          (6.75)

where xa is a single artificial variable and ya is a vector of coefficients for xa (one
entry per constraint row) so that:

    yai = −1   if b̄i < 0,
           0   otherwise.          (6.76)

Lemma 6.3. Suppose we enter xa into the basis by pivoting on the row of the
simplex tableau with the most negative right-hand side. That is, xa is exchanged
with the basic variable xBj having the most negative value. Then the resulting
solution is a basic feasible solution to the constraints:

    Im xB + B−1 NxN + ya xa = b̄
    x, xa ≥ 0                      (6.77)



442 Chapter 6. Interior-Point Methods

Exercise 6.8 Prove Lemma 6.3. ■

The resulting basic feasible solution can be used either as a starting solution for the
two-phase simplex algorithm (with the single artificial variable) or for the Big-M
method. For the two-phase method, we would solve the Phase I problem:

    min  xa
    s.t. Ax + B0 ya xa = b          (6.78)
         x, xa ≥ 0

where B0 is the initial crash basis we used to identify the coefficients of the single
artificial variable. Equation 6.78 is obtained by multiplying both sides of the
constraints in Equation 6.75 by B0 .

■ Example 6.9 Suppose we were interested in the constraint set:

x1 + 2x2 − s1 = 12
2x1 + 3x2 − s2 = 20 (6.79)
x1 , x 2 , s 1 , s 2 ≥ 0

We can choose the crash basis:

    [ −1    0 ]
    [  0   −1 ]          (6.80)

corresponding to the variables s1 and s2 . Then we obtain the system:

− x1 − 2x2 + s1 = −12
(6.81)
− 2x1 − 3x2 + s2 = −20

That is, s1 = −12 and s2 = −20 is our basic solution, which is not feasible. We
append the artificial variable with coefficient vector ya = [−1, −1]T (since both
elements of the right-hand side are negative) to obtain:

− x1 − 2x2 + s1 − xa = −12
(6.82)
− 2x1 − 3x2 + s2 − xa = −20

If we build a tableau for Phase I with this current basic solution, we obtain:

        z   x1   x2   s1   s2   xa | RHS
  z     1    0    0    0    0   −1 |   0
  s1    0   −1   −2    1    0   −1 | −12
  s2    0   −2   −3    0    1   −1 | −20

We enter the variable xa and pivot out variable s2 which has the most negative


right-hand side to obtain the initial feasible tableau:

        z   x1   x2   s1   s2   xa | RHS
  z     1    2    3    0   −1    0 |  20
  s1    0    1    1    1   −1    0 |   8
  xa    0    2    3    0   −1    1 |  20

We can now complete the Phase I process and execute the simplex algorithm until
we drive xa from the basis and reduce the Phase I objective value to 0. At this point
we will have identified an initial basic feasible solution to the initial problem and we
can execute Phase II. ■
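The construction above is easy to mimic numerically. This sketch is our own illustration (the variable names are not from the text): it builds the crash basis from Example 6.9 and derives the artificial column ya via Equation 6.76.

```python
import numpy as np

# Single-artificial-variable setup for Example 6.9.
# Columns of A: x1, x2, s1, s2; the crash basis uses the s1, s2 columns.
A = np.array([[1.0, 2.0, -1.0, 0.0],
              [2.0, 3.0, 0.0, -1.0]])
b = np.array([12.0, 20.0])

B0 = A[:, [2, 3]]                  # crash basis: [[-1, 0], [0, -1]]
b_bar = np.linalg.solve(B0, b)     # basic solution: [-12, -20], infeasible

# Coefficient vector for the artificial variable (Equation 6.76):
ya = np.where(b_bar < 0, -1.0, 0.0)
print(b_bar, ya)
```

With b_bar = [−12, −20], both entries of ya come out −1, matching the example.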

R  Empirical evidence suggests that the single artificial variable technique is
not as efficient as the two-phase or Big-M method. Thus, it is presented as
a historical component of the development of efficient implementations of
the Simplex algorithm and not as a realistic technique for implementation in
production systems.

6.17 Problems that Can’t be Initialized by Hand


In these notes, we have so far considered very small problems that can easily be
solved graphically. For such problems, determining an initial basic feasible solution
requires little more than trial and error, and they hardly require Phase I methods.
To provide an example of a class of problems that can easily generate large
numbers of variables, we will consider a multi-period inventory control problem.
These problems can easily generate large numbers of variables and constraints, even
in small problems.

■ Example 6.10 McLearey’s Shamrock Emporium produces and sells shamrocks


for three days each year: the day before St. Patrick’s Day, St. Patrick’s Day and
the day after St. Patrick’s day. This year, McLearey had 10 shamrocks left over
from last year’s sale. This year, he expects to sell 100 shamrocks the day before St.
Patrick’s Day, 200 shamrocks the day of St. Patrick’s day and 50 shamrocks the
day after St. Patrick’s day.
It costs McLearey $2 to produce each Shamrock and $0.01 to store a Shamrock
over night. Additionally, McLearey can put shamrocks into long term storage for
$0.05 per shamrock.
McLearey can produce at most 150 shamrocks per day. Shamrocks must be
produced within two days of being sold (or put into long term storage) otherwise,
they wilt. Assuming that McLearey must meet his daily demand and will not start
producing Shamrocks early, he wants to know how many shamrocks he should
make and store on each day to minimize his cost.
To determine an answer to this problem, note that we have time as a parameter:
time runs over three days. Let xt be the number of shamrocks McLearey makes on
day t (t = 1, 2, 3) and let yt be the number of shamrocks McLearey stores on day
t. There is also a parameter y0 = 10, the number of shamrocks left over from last
year.


McLearey’s total cost (in cents) can then be written as:

z = 200x1 + 200x2 + 200x3 + y1 + y2 + 5y3 (6.83)

Additionally, there are some constraints linking production, storage and demand.
These constraints are depicted graphically in Figure 6.6. Multiperiod inventory
models operate on a principle of conservation of flow. Manufactured goods and
previous period inventories flow into the box representing each period. Demand
and next period inventories flow out of the box representing each period. This
inflow and outflow must be equal to account for all shamrocks produced. This is
depicted below:

[Figure 6.6 here: a flow diagram in which manufactured shamrocks x1 , x2 , x3 enter
boxes for Day 1, Day 2 and Day 3; inventories y0 , y1 , y2 , y3 link consecutive days;
and demands d1 , d2 , d3 leave each box.]

Figure 6.6: Multiperiod inventory models operate on a principle of conservation


of flow. Manufactured goods and previous period inventories flow into the box
representing each period. Demand and next period inventories flow out of the box
representing each period. This inflow and outflow must be equal to account for all
production.

This means that:

yt−1 + xt = yt + dt ∀t (6.84)

This equation says that at period t the amount of inventory carried over from
period t − 1 plus the number of shamrocks produced in period t must equal the
total demand in period t plus any shamrocks left over at the end of period t.
Clearly we also know that xt ≥ 0 for all t, since you cannot make a negative
number of shamrocks. However, by also requiring yt ≥ 0 for all t, we assert
that our inventory can never be negative. (A negative inventory is a backorder.)
Thus by requiring yt ≥ 0 we are also satisfying the requirement that McLearey
satisfy his demand in each period. Note that when t = 1, yt−1 = y0 , which is the
satisfy his demand in each period. Note, when t = 1, then yt−1 = y0 , which is the
parameter we defined above.
The complete problem describing McLearey’s situation is:




min  200x1 + 200x2 + 200x3 + y1 + y2 + 5y3
s.t. yt−1 + xt = yt + dt   ∀t ∈ {1, 2, 3}
     xt ≤ 150              ∀t ∈ {1, 2, 3}          (6.85)
     xt , yt ≥ 0           ∀t ∈ {1, 2, 3}


Constraints of the form xt ≤ 150 for all t come from the fact that McLearey can
produce at most 150 shamrocks per day.
This simple problem now has 6 variables and 6 constraints plus 6 non-negativity
constraints and it is non-trivial to determine a good initial basic feasible solution,
especially since the problem contains both equality and inequality constraints.
A problem like this can be solved in Matlab (see Chapter ??.5.15), or on a
commercial or open source solver like the GNU Linear Programming Kit (GLPK,
http://www.gnu.org/software/glpk/). In Figure 6.7 we show an example model
file that describes the problem. In Figure 6.8, we show the data section of the
GLPK model file describing McLearey’s problem. Finally, Figure 6.9 shows a
portion of the output generated by the GLPK solver glpsol using this model.
Note that no inventory is held at the end of day 3 (because it is too expensive),
even though it might be beneficial to McLearey to hold inventory for next year.
This is because the problem has no information about any other time periods
and so, in a sense, the end of the world occurs immediately after period 3. This
type of end-of-the-world phenomenon is common in multi-period problems.
#
# This finds the optimal solution for McLearey
#

/* sets */
set DAY;
set DAY2;

/* parameters */
param makeCost {t in DAY};
param holdCost {t in DAY};
param demand {t in DAY};
param start;
param S;

/* decision variables: */
var x {t in DAY} >= 0;
var y {t in DAY} >= 0;

/* objective function */
minimize z: sum{t in DAY} (makeCost[t]*x[t]+holdCost[t]*y[t]);

/* Flow Constraints */
s.t. FLOWA : x[1] - y[1] = demand[1] - start;
s.t. FLOWB{t in DAY2} : x[t] + y[t-1] - y[t] = demand[t];

/* Manufacturing constraints */
s.t. MAKE{t in DAY} : x[t] <= S;

end;

Figure 6.7: Input model to GLPK describing McLearey’s Problem

/* data section */
data;
set DAY := 1 2 3;
set DAY2 := 2 3;

param makeCost:=
1 200
2 200
3 200;

param holdCost:=
1 1
2 1
3 5;

param demand:=
1 100
2 200
3 50;

param start:=10;
param S:=150;
end;

Figure 6.8: Input data to GLPK describing McLearey’s Problem

Problem: Shamrock
Rows: 7
Columns: 6
Non-zeros: 17
Status: OPTIMAL
Objective: z = 68050 (MINimum)

No. Column name St Activity Lower bound Upper bound Marginal


------ ------------ -- ------------- ------------- ------------- -------------
1 x[1] B 140 0
2 x[2] B 150 0
3 x[3] B 50 0
4 y[1] B 50 0
5 y[2] NL 0 0 2
6 y[3] NL 0 0 205

Figure 6.9: Output from glpsol on the McLearey Problem.
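As a cross-check of the GLPK output, model (6.85) can also be handed to a general-purpose LP solver. The sketch below is ours (it assumes SciPy): it encodes the three flow constraints and the capacity bounds with variable order [x1, x2, x3, y1, y2, y3] and reproduces the optimal cost of 68050 cents.

```python
from scipy.optimize import linprog

# McLearey's model (6.85) as a linear program, solved with SciPy
# instead of GLPK; variable order is [x1, x2, x3, y1, y2, y3].
c = [200, 200, 200, 1, 1, 5]          # costs in cents
A_eq = [[1, 0, 0, -1, 0, 0],          # y0 + x1 = y1 + d1, with y0 = 10
        [0, 1, 0, 1, -1, 0],          # x2 + y1 = y2 + d2
        [0, 0, 1, 0, 1, -1]]          # x3 + y2 = y3 + d3
b_eq = [100 - 10, 200, 50]
bounds = [(0, 150)] * 3 + [(0, None)] * 3   # production capacity 150/day

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.fun)   # matches the GLPK objective: 68050 cents
```

The optimal cost is forced here: total production is fixed at 340 shamrocks, and the day-2 demand of 200 exceeds the 150-per-day capacity, so at least 50 shamrocks must be held overnight after day 1.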

■ Example 6.11 A craftsman makes two kinds of jewelry boxes for craft shows.
The oval box requires 30 minutes of machine work and 20 minutes of finishing.
The square box requires 20 minutes of machine work and 40 minutes of finishing.
Machine work is limited to 600 minutes per day and finishing to 800 minutes. If
there is $3 profit on the oval box and $4 on the square box, how many of each
should be produced to maximize profit? ■

Solution. Let x1 denote the number of oval boxes and x2 the number of square

boxes. The constraints read

    30x1 + 20x2 ≤ 600   (time for machine work),
    20x1 + 40x2 ≤ 800   (time for finishing),

and the objective function is

    z = 3x1 + 4x2   (profit).

Clearly we can divide our constraints by a positive number without altering them. If
we divide the first by 10 and the second by 20 we get

    3x1 + 2x2 ≤ 60,
    x1 + 2x2 ≤ 40.

This little trick simplifies the problem considerably.


The corresponding initial simplex tableau reads

      x1   x2   s1   s2   z
    [  3    2    1    0   0 |  60 ]
    [  1    2    0    1   0 |  40 ]
    [ −3   −4    0    0   1 |   0 ]

Now you need to observe one important thing: The position of each column is only
important in the sense that the location gives you the corresponding variable. That
is, the first column corresponds to x1 , the second to x2 , the third to s1 , the fourth
to s2 , and the fifth to z. If we keep track of this information we can of course
interchange columns.
Hence we can rearrange our tableau according to

      s1   s2   z   x1   x2
    [  1    0   0    3    2 |  60 ]
    [  0    1   0    1    2 |  40 ]
    [  0    0   1   −3   −4 |   0 ]

Notice that the columns correspond to s1 , s2 , z, x1 , x2 , respectively. Type this matrix
into your TI85 and save it as “A”.
Now locate the pivot element as usual; in this case it is the 2,5 entry (the entry
printed bold). Next, exchange columns 2 and 5 (since the pivot element is the 2,5
entry) on your calculator by typing “rSwap(AT ,2,5)T ” (to get “rSwap(” on your TI85
press MATRX → F4 → MORE → F2, and to get “T ” (AT is called the transpose of A)
press MATRX → F3 → F2):

      s1   x2   z   x1   s2
    [  1    2   0    3   0 |  60 ]
    [  0    2   0    1   1 |  40 ]
    [  0   −4   1   −3   0 |   0 ]


and note that the columns now correspond to s1 , x2 , z, x1 , s2 . Next, compute the
reduced echelon form by typing “rref Ans”

      s1   x2   z   x1    s2
    [  1    0   0    2    −1  |  20 ]
    [  0    1   0   1/2   1/2 |  20 ]
    [  0    0   1   −1     2  |  80 ]

Since there is still a negative entry in the last row we are not done yet. The new
pivot element is the 1,4 entry (again printed bold) and hence we are going to swap
columns 1 and 4 on the calculator by typing “rSwap(AnsT ,1,4)T ”:

      x1    x2   z   s1    s2
    [  2     0   0    1    −1  |  20 ]
    [ 1/2    1   0    0    1/2 |  20 ]
    [ −1     0   1    0     2  |  80 ]

Again, note that the columns correspond to x1 , x2 , z, s1 , s2 . Computing the reduced
echelon form by typing “rref Ans” gives

      x1   x2   z    s1     s2
    [  1    0   0   1/2   −1/2 |  10 ]
    [  0    1   0  −1/4    3/4 |  15 ]
    [  0    0   1   1/2    3/2 |  90 ]

Since there are no negative entries in the last row we are done. Now we can rearrange
the tableau in such a way that the columns correspond to x1 , x2 , s1 , s2 , z as at the
outset:

      x1   x2    s1     s2   z
    [  1    0   1/2   −1/2   0 |  10 ]
    [  0    1  −1/4    3/4   0 |  15 ]
    [  0    0   1/2    3/2   1 |  90 ]

This is the final simplex tableau. Setting s1 = 0, s2 = 0 and solving for the remaining
variables we get x1 = 10, x2 = 15 and z = 90.
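The final tableau can be verified with any LP solver. The following sketch is our own addition (it assumes SciPy): it solves the simplified problem directly and recovers x1 = 10, x2 = 15 and z = 90.

```python
from scipy.optimize import linprog

# Jewelry-box example: maximize 3*x1 + 4*x2 by minimizing its negation
# (linprog minimizes by default).
c = [-3, -4]
A_ub = [[30, 20],   # machine-work minutes
        [20, 40]]   # finishing minutes
b_ub = [600, 800]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(res.x, -res.fun)   # expect x1 = 10, x2 = 15, profit 90
```

Both resource constraints are tight at this point, consistent with s1 = s2 = 0 in the final tableau.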



7. Advanced Topics In Linear Optimization

7.1 Sensitivity Analysis


7.2 Large-Scale Linear Programming
7.3 Decomposition Methods
7.4 Multi-Objective Optimization
In this section, we will consider the problem of degeneracy and prove (at last) that
there is an implementation of the Simplex Algorithm that is guaranteed to converge
to an optimal solution, assuming one exists.

7.5 Degeneracy Revisited


We’ve already discussed degeneracy. Recall the following theorem from Chapter 5
that defines degeneracy in terms of the simplex tableau:

Theorem 6.7. Consider Problem P (our linear programming problem). Let B ∈


Rm×m be a basis matrix corresponding to some set of basic variables xB . Let
b̄ = B−1 b. If b̄j = 0 for some j = 1, . . . , m, then xB = b̄ and xN = 0 is a degenerate
extreme point of the feasible region of Problem P .

We have seen in Example 6.5 that degeneracy can cause us to take extra steps
on our way from an initial basic feasible solution to an optimal solution. When
the simplex algorithm takes extra steps while remaining at the same degenerate
extreme point, this is called stalling. The problem can become much worse; for
certain entering variable rules, the simplex algorithm can become locked in a cycle of
pivots, each one moving from one characterization of a degenerate extreme point to
the next. The following example, due to Beale and illustrated in Chapter 4 of [BJS04],
demonstrates the point.


■ Example 7.1 Consider the following linear programming problem:

    min  −(3/4)x4 + 20x5 − (1/2)x6 + 6x7
    s.t. x1 + (1/4)x4 − 8x5 − x6 + 9x7 = 0
         x2 + (1/2)x4 − 12x5 − (1/2)x6 + 3x7 = 0          (7.1)
         x3 + x6 = 1
         xi ≥ 0   i = 1, . . . , 7

It is instructive to analyze the A matrix of the constraints of this problem. We
have:

    A = [ 1  0  0   1/4    −8    −1   9 ]
        [ 0  1  0   1/2   −12  −1/2   3 ]          (7.2)
        [ 0  0  1    0      0     1   0 ]

The fact that the A matrix contains an identity matrix embedded within it suggests
that an initial basic feasible solution with basic variables x1 , x2 and x3 would be a
good choice. This leads to a vector of reduced costs given by:
    cTB B−1 N − cTN = [ 3/4  −20  1/2  −6 ]          (7.3)

These yield an initial tableau with structure:

        z   x1   x2   x3    x4    x5    x6   x7 | RHS
  z     1    0    0    0   3/4   −20   1/2   −6 |  0
  x1    0    1    0    0   1/4    −8    −1    9 |  0
  x2    0    0    1    0   1/2   −12  −1/2    3 |  0
  x3    0    0    0    1    0      0     1    0 |  1

If we apply an entering variable rule where we always choose the non-basic
variable with the most positive reduced cost to enter (since this is a minimization
problem), and we choose the leaving variable to be the first of the tied rows in
the minimum ratio test, then we will obtain the following sequence of tableaux:
Tableau I:

        z   x1   x2   x3    x4    x5    x6   x7 | RHS
  z     1    0    0    0   3/4   −20   1/2   −6 |  0
  x1    0    1    0    0   1/4    −8    −1    9 |  0
  x2    0    0    1    0   1/2   −12  −1/2    3 |  0
  x3    0    0    0    1    0      0     1    0 |  1


Tableau II:

        z   x1   x2   x3   x4    x5    x6    x7 | RHS
  z     1   −3    0    0    0     4   7/2   −33 |  0
  x4    0    4    0    0    1   −32    −4    36 |  0
  x2    0   −2    1    0    0     4   3/2   −15 |  0
  x3    0    0    0    1    0     0     1     0 |  1

Tableau III:

        z    x1    x2   x3   x4   x5    x6     x7  | RHS
  z     1    −1    −1    0    0    0     2    −18  |  0
  x4    0   −12     8    0    1    0     8    −84  |  0
  x5    0  −1/2   1/4    0    0    1   3/8  −15/4  |  0
  x3    0     0     0    1    0    0     1      0  |  1

Tableau IV:

        z     x1     x2   x3     x4   x5   x6     x7  | RHS
  z     1      2     −3    0   −1/4    0    0      3  |  0
  x6    0   −3/2      1    0    1/8    0    1  −21/2  |  0
  x5    0   1/16   −1/8    0  −3/64    1    0   3/16  |  0
  x3    0    3/2     −1    1   −1/8    0    0   21/2  |  1

Tableau V:

        z    x1    x2   x3    x4    x5   x6   x7 | RHS
  z     1     1    −1    0   1/2   −16    0    0 |  0
  x6    0     2    −6    0  −5/2    56    1    0 |  0
  x7    0   1/3  −2/3    0  −1/4  16/3    0    1 |  0
  x3    0    −2     6    1   5/2   −56    0    0 |  1

Tableau VI:

        z   x1    x2   x3    x4   x5    x6   x7 | RHS
  z     1    0     2    0   7/4  −44  −1/2    0 |  0
  x1    0    1    −3    0  −5/4   28   1/2    0 |  0
  x7    0    0   1/3    0   1/6   −4  −1/6    1 |  0
  x3    0    0     0    1    0     0     1    0 |  1


Tableau VII:

        z   x1   x2   x3    x4    x5    x6   x7 | RHS
  z     1    0    0    0   3/4   −20   1/2   −6 |  0
  x1    0    1    0    0   1/4    −8    −1    9 |  0
  x2    0    0    1    0   1/2   −12  −1/2    3 |  0
  x3    0    0    0    1    0      0     1    0 |  1

We see that the last tableau (VII) is the same as the first tableau and thus we
have constructed an instance where (using the given entering and leaving variable
rules), the Simplex Algorithm will cycle forever at this degenerate extreme point. ■

7.6 The Lexicographic Minimum Ratio Leaving Variable


Rule
Given the example of the previous section, we require a method for breaking ties
in the case of degeneracy that prevents cycling from occurring. There is a large
literature on cycling prevention rules; however, the most well known is the
lexicographic rule for selecting the leaving variable.

Definition 7.1 — Lexicographic Order. Let x = [x1 , . . . , xn ]T and y = [y1 , . . . , yn ]T


be vectors in Rn . We say that x is lexicographically greater than y if: there
exists m < n so that xi = yi for i = 1, . . . , m, and xm+1 > ym+1 .
Clearly, if there is no such m < n, then xi = yi for i = 1, . . . , n and thus x = y.
We write x ≻ y to indicate that x is lexicographically greater than y. Naturally,
we can write x ⪰ y to indicate that x is lexicographically greater than or equal
to y.

Lexicographic ordering is simply the standard order operation > applied to the
individual elements of a vector in Rn with a precedence on the index of the vector.
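Definition 7.1 matches the ordering most programming languages already use for sequences. A minimal sketch (the helper names are our own):

```python
# Lexicographic comparison of vectors, per Definition 7.1: the first
# differing index decides the order.
def lex_greater(x, y):
    """Return True if x is lexicographically greater than y."""
    for xi, yi in zip(x, y):
        if xi != yi:
            return xi > yi      # first differing index decides
    return False                # equal vectors

def lex_positive(x):
    """Definition 7.2: x is lexicographically greater than the zero vector."""
    return lex_greater(x, [0] * len(x))

print(lex_greater([1, 0, 5], [1, 0, 3]))  # True
print(lex_positive([0, 0, 2]))            # True
```

Note that a lexicographically positive vector may begin with zeros, as long as its first nonzero entry is positive.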

Definition 7.2 A vector x ∈ Rn is lexicographically positive if x ≻ 0 where 0 is


the zero vector in Rn .

Lemma 7.1. Let x and y be two lexicographically positive vectors in Rn . Then


x + y is lexicographically positive. Let c > 0 be a constant in R, then cx is a
lexicographically positive vector.

Exercise 7.1 Prove Lemma 7.1. ■

7.6.1 Lexicographic Minimum Ratio Test


Suppose we are considering a linear programming problem and we have chosen an
entering variable xj according to a fixed entering variable rule. Assume further,
we are given some current basis matrix B and as usual, the right-hand-side vector
of the constraints is denoted b, while the coefficient matrix is denoted A. Then


the minimum ratio test chooses as the leaving variable the basic variable that
achieves the minimum ratio. Consider the following set:

    I0 = { r : b̄r /ājr = min { b̄i /āji : i = 1, . . . , m and āji > 0 } }          (7.4)

In the absence of degeneracy, I0 contains a single element: the row index that has
the smallest ratio of b̄i to āji , where naturally b̄ = B−1 b and āj = B−1 A·j . In this
case, xj is swapped into the basis in exchange for xBr (the rth basic variable).
When we have a degenerate basic feasible solution, then I0 is not a singleton set
and contains all the rows that have tied in the minimum ratio test. In this case, we
can form a new set:
( " #)
a1 a1 i
I1 = r : r = min : i ∈ I0 (7.5)
aj r aj i

Here, ā1 = B−1 A·1 is the first column of the updated tableau. The elements of this
(column) vector are divided by the elements of the (column) vector āj on an
index-by-index basis. If this set is a singleton, then basic variable xBr leaves the
basis. If this set is not a singleton, we may form a new set I2 with column ā2 .
In general, we will have the set:
( " #)
ak ak i
Ik = r : r = min : i ∈ Ik−1 (7.6)
aj r aj i

Lemma 7.2. For any degenerate basis matrix B for any linear programming problem,
we will ultimately find a k so that Ik is a singleton.

Exercise 7.2 Prove Lemma 7.2. [Hint: Assume that the tableau is arranged so that
the identity columns are columns 1 through m. (That is aj = ej for i = 1, . . . , m.)
Show that this configuration will easily lead to a singleton Ik for k < m.] ■

In executing the lexicographic minimum ratio test, we can see that we are essentially
comparing the tied rows in a lexicographic manner. If a set of rows ties in the
minimum ratio test, then we execute a minimum ratio test on the first column of
the tied rows. If there is a tie, then we move on executing a minimum ratio test on
the second column of the rows that tied in both previous tests. This continues until
the tie is broken and a single row emerges as the leaving row.
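The procedure just described can be sketched compactly. In the code below (our illustration; the names and tolerances are ours), columns 0, 1, 2, . . . of the tableau body are used in turn to break ties among the rows in I0:

```python
import numpy as np

# Lexicographic minimum ratio test (Equations 7.4-7.6). T is the
# tableau body (rows = basic variables, columns = x1, ..., xn),
# rhs holds b-bar, and j indexes the entering column.
def lex_min_ratio(T, rhs, j):
    rows = [i for i in range(T.shape[0]) if T[i, j] > 1e-9]
    ratios = [rhs[i] / T[i, j] for i in rows]
    m = min(ratios)
    tied = [i for i, r in zip(rows, ratios) if np.isclose(r, m)]  # the set I0
    k = 0
    while len(tied) > 1:               # break ties with columns 1, 2, ...
        ratios = [T[i, k] / T[i, j] for i in tied]
        m = min(ratios)
        tied = [i for i, r in zip(tied, ratios) if np.isclose(r, m)]
        k += 1
    return tied[0]                     # index of the leaving row

# The tie from Example 7.2: x4 enters and rows x1, x2 both have ratio 0.
T = np.array([[1.0, 0, 0, 0.25, -8, -1.0, 9],
              [0.0, 1, 0, 0.50, -12, -0.5, 3],
              [0.0, 0, 1, 0.00, 0, 1.0, 0]])
rhs = np.array([0.0, 0.0, 1.0])
print(lex_min_ratio(T, rhs, 3))        # row index 1: x2 leaves, as in the text
```

Lemma 7.2 guarantees the loop terminates: the embedded identity columns eventually separate any set of tied rows.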

■Example 7.2 Let us consider the example from Beale again using the lexicographic
minimum ratio test. Consider the tableau shown below.


Tableau I:

        z   x1   x2   x3    x4    x5    x6   x7 | RHS
  z     1    0    0    0   3/4   −20   1/2   −6 |  0
  x1    0    1    0    0   1/4    −8    −1    9 |  0
  x2    0    0    1    0   1/2   −12  −1/2    3 |  0
  x3    0    0    0    1    0      0     1    0 |  1

Again, we choose to enter variable x4 as it has the most positive reduced cost.
Variables x1 and x2 tie in the minimum ratio test. So we consider a new minimum
ratio test on the first column of the tableau:

    min { 1/(1/4) , 0/(1/2) }          (7.7)

From this test, we see that x2 is the leaving variable and we pivot on element 1/2
as indicated in the tableau. Note, we only need to execute the minimum ratio test
on variables x1 and x2 since those were the tied variables in the standard minimum
ratio test. That is, I0 = {1, 2} and we construct I1 from these indexes alone. In
this case I1 = {2}. Pivoting yields the new tableau:
Tableau II:

        z   x1    x2   x3   x4    x5    x6     x7  | RHS
  z     1    0  −3/2    0    0    −2   5/4  −21/2  |  0
  x1    0    1  −1/2    0    0    −2  −3/4   15/2  |  0
  x4    0    0     2    0    1   −24    −1      6  |  0
  x3    0    0     0    1    0     0     1      0  |  1

There is no question this time of the entering or leaving variable; clearly x6 must
enter and x3 must leave, and we obtain:
Tableau III:

        z   x1    x2    x3   x4   x5   x6     x7  |  RHS
  z     1    0  −3/2  −5/4    0   −2    0  −21/2  | −5/4
  x1    0    1  −1/2   3/4    0   −2    0   15/2  |  3/4
  x4    0    0     2     1    1  −24    0      6  |    1
  x6    0    0     0     1    0    0    1      0  |    1

Since this is a minimization problem and the reduced costs of the non-basic variables
are now all negative, we have arrived at an optimal solution. The lexicographic
minimum ratio test successfully prevented cycling. ■

Thanks to Ethan Wright for finding a small typo in this example, that is now fixed.


7.6.2 Convergence of the Simplex Algorithm Under Lexicographic Mini-


mum Ratio Test
Lemma 7.3. Consider the problem:

    P:  max  cT x
        s.t. Ax = b
             x ≥ 0

Suppose the following hold:


1. Im is embedded in the matrix A and is used as the starting basis,
2. a consistent entering variable rule is applied (e.g., largest reduced cost first),
and
3. the lexicographic minimum ratio test is applied as the leaving variable rule.
Then each row of the sequence of augmented matrices [b̄|B−1 ] is lexicographically
positive. Here B is the basis matrix and b̄ = B−1 b.

Proof. The initial basis Im yields the augmented matrix [b̄|Im ]. This matrix clearly
has every row lexicographically positive, since b ≥ 0. Assume that the rows of
[b̄|B−1 ] ≻ 0 for the first n iterations of the simplex algorithm with fixed entering
variable rule and lexicographic minimum ratio test. We will show that this is also
true for step n + 1.
Suppose (after iteration n) we have the following tableau:

 
        z   x1       ···  xj       ···  xm       xm+1          ···  xk       ···  xn      | RHS
  z     1   z1 − c1  ···  zj − cj  ···  zm − cm  zm+1 − cm+1   ···  zk − ck  ···  zn − cn | z̄
  xB1   0   ā11      ···  ā1j      ···  ā1m      ā1,m+1        ···  ā1k      ···  ā1n     | b̄1
   ⋮         ⋮             ⋮             ⋮         ⋮                  ⋮             ⋮        ⋮
  xBi   0   āi1      ···  āij      ···  āim      āi,m+1        ···  āik      ···  āin     | b̄i
   ⋮         ⋮             ⋮             ⋮         ⋮                  ⋮             ⋮        ⋮
  xBr   0   ār1      ···  ārj      ···  ārm      ār,m+1        ···  ārk      ···  ārn     | b̄r
   ⋮         ⋮             ⋮             ⋮         ⋮                  ⋮             ⋮        ⋮
  xBm   0   ām1      ···  āmj      ···  āmm      ām,m+1        ···  āmk      ···  āmn     | b̄m

Table 7.1: Tableau used for Proof of Lemma 7.3

Assume using the entering variable rule of our choice that xk will enter. Let us
consider what happens as we choose a leaving variable and execute a pivot. Suppose
that after executing the lexicographic minimum ratio test, we pivot on element ārk .
Consider the pivoting operation on row i. There are two cases:
Case I (i ∉ I0 ): We replace b̄i with

    b̄i′ = b̄i − (āik /ārk ) b̄r

If āik < 0, then clearly b̄i′ > 0. Otherwise, since i ∉ I0 :

    b̄r /ārk < b̄i /āik   =⇒   (āik /ārk ) b̄r < b̄i   =⇒   0 < b̄i − (āik /ārk ) b̄r


Thus we know that b̄i′ > 0. It follows then that row i of the augmented matrix
[b̄|B−1 ] is lexicographically positive.

Case II (i ∈ I0 ): Then b̄i′ = b̄i − (āik /ārk ) b̄r = 0 since

    b̄r /ārk = b̄i /āik

There are now two possibilities: either i ∈ I1 or i ∉ I1 . If i ∉ I1 , we can argue that

    āi1′ = āi1 − (āik /ārk ) ār1 > 0

for the same reason that b̄i′ > 0 in Case I, namely that the lexicographic
minimum ratio test ensures that:

    ār1 /ārk < āi1 /āik

This confirms (since b̄i′ = 0) that row i of the augmented matrix [b̄|B−1 ] is
lexicographically positive.
If instead i ∈ I1 , then we may proceed to determine whether i ∈ I2 . This process
continues until we identify the j for which Ij is the singleton index r. Such a j
must exist by Lemma 7.2. In each case, we may reason that row i of the
augmented matrix [b̄|B−1 ] is lexicographically positive.
The preceding argument shows that at step n + 1 of the simplex algorithm we
will arrive at an augmented matrix [b̄|B−1 ] for which every row is lexicographically
positive. This completes the proof. ■

R The assumption that we force Im into the basis can be justified in one of two
ways:
1. We may assume that we first execute a Phase I simplex algorithm with
artificial variables. Then the forgoing argument applies.
2. Assume we are provided with a crash basis B and we form the equivalent
problem:

    P′:  max  0T xB + (cTN − cTB B−1 N)xN
         s.t. Im xB + B−1 NxN = B−1 b
              xB , xN ≥ 0

where B−1 b ≥ 0. This problem is clearly equivalent because its initial
simplex tableau will be identical to a simplex tableau generated by
Problem P with basis matrix B. If no such crash basis exists, then the
problem is infeasible.

Lemma 7.4. Under the assumptions of Lemma 7.3, let zi and zi+1 be row vectors
in Rn+1 corresponding to Row 0 from the simplex tableau at iterations i and i + 1
respectively. Assume, however, that we exchange the z column (column 1) and the
RHS column (column n + 2). Then zi+1 − zi is lexicographically positive.


Proof. Consider the tableau in Table 7.1. If we are solving a maximization problem,
then clearly for xk to be an entering variable (as we assumed in the proof of Lemma
7.3) we must have zk − ck < 0. Then the new Row Zero is obtained by adding:

    y = (−(zk − ck )/ārk ) [ 0  ār1  . . .  ārj  . . .  ārm  ār,m+1  . . .  ārk  . . .  ārn  b̄r ]

to the current row zero consisting of [1 z1 − c1 . . . zn − cn z]. That is: zi+1 = zi + y,


or y = zi+1 − zi .
The fact that zk − ck < 0 and ark > 0 (in order to pivot at that element) implies
that −(zk −ck )/ark > 0. Further, Lemma 7.3 asserts that the vector [0 ar1 . . . arn br ]
is lexicographically positive (if we perform the exchange of column 1 and column
n + 2 as we assumed we would). Thus, y is lexicographically positive by Lemma 7.1.
This completes the proof. ■

Theorem 7.1 Under the assumptions of Lemma 7.3, the simplex algorithm converges
in a finite number of steps.

Proof. Assume by contradiction that we begin to cycle. Then there is a sequence of
Row 0 vectors z0 , z1 , . . . , zl so that zl = z0 . Consider yi = zi − zi−1 . By Lemma 7.4,
yi ≻ 0 for i = 1, . . . , l. Then we have:

    y1 + y2 + · · · + yl = (z1 − z0 ) + (z2 − z1 ) + · · · + (zl − zl−1 )
                        = (z1 − z0 ) + (z2 − z1 ) + · · · + (z0 − zl−1 ) = z0 − z0 = 0          (7.8)

But by Lemma 7.1, the sum of lexicographically positive vectors is lexicographically


positive. Thus we have established a contradiction. This cycle cannot exist and
the simplex algorithm must converge by the same argument we used in the proof of
Theorem 6.8. This completes the proof. ■

R Again, the proof of correctness, i.e., that the simplex algorithm with the
lexicographic minimum ratio test finds a point of optimality, is left until the
next chapter when we’ll argue that the simplex algorithm finds a so-called
KKT point.

7.7 Bland’s Rule, Entering Variable Rules and Other Considerations
There are many other anti-cycling rules, each with its own proof of convergence.
Bland’s rule is a simple one: all the variables are ordered (say by giving
them an index) and strictly held in that order. If several variables may enter, then the
variable with the lowest index is chosen to enter. If a tie occurs in the minimum ratio
test, then the variable with the smallest index leaves the basis. It is possible to show that
this rule will prevent cycling. However, it can also lead to excessively long simplex
algorithm execution.
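For concreteness, here is a sketch of Bland's selection on a tableau for a minimization problem in which Row 0 carries the reduced costs zj − cj. All names and tolerances are ours; for simplicity the leaving variable is chosen by lowest row index among the tied rows, whereas a full implementation orders by basic-variable index.

```python
import numpy as np

# Bland's rule sketch. row0 holds the reduced costs z_j - c_j (enter on
# a positive entry, minimization convention of these notes), T is the
# tableau body, and rhs is b-bar. Assumes an entering candidate exists.
def blands_rule(row0, T, rhs):
    # Entering: lowest-index column with positive reduced cost.
    enter = next(j for j in range(len(row0)) if row0[j] > 1e-9)
    # Leaving: lowest index among the rows tied in the minimum ratio test.
    rows = [i for i in range(T.shape[0]) if T[i, enter] > 1e-9]
    ratios = [rhs[i] / T[i, enter] for i in rows]
    m = min(ratios)
    leave = min(i for i, r in zip(rows, ratios) if np.isclose(r, m))
    return enter, leave
```

Unlike Dantzig's rule, the entering column here need not have the most positive reduced cost, which is exactly what makes the rule cheap and cycle-free.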
In general, there are many rules for choosing the entering variable. Two common
ones are:


1. Largest absolute reduced cost: In this case, the variable with most negative
(in maximization) or most positive (in minimization) reduced cost is chosen to
enter.
2. Largest impact on the objective: In this case, the variable whose entry will
cause the greatest increase (or decrease) to the objective function is chosen.
This of course requires pre-computation of the value of the objective function
for each possible choice of entering variable and can be time consuming.
Leaving variable rules (like Bland’s rule or the lexicographic minimum ratio test)
can also be expensive to implement. Practically, many systems ignore these rules
and use floating point error to break ties. This does not ensure that cycling does
not occur, but often is useful in a practical sense. However, care must be taken. In
certain simplex instances, floating point error cannot be counted on to ensure tie
breaking and consequently cycling prevention rules must be implemented. This is
particularly true in network flow problems that are coded as linear programs. It is
also important to note that none of these rules prevent stalling. Stalling prevention is
a complicated thing and there are still open questions on whether certain algorithms
admit or prevent stalling. See Chapter 4 of [BJS04] for a treatment of this subject.


8. Applications in Science and Engineering

8.1 Engineering Applications


8.1.1 Network Flow Problems
8.1.2 Optimal Design and Control
8.1.3 Resource Allocation
8.2 Operations Research
8.2.1 Transportation and Assignment Problems
8.2.2 Production Planning
8.2.3 Scheduling Problems
8.3 Economics and Finance
8.3.1 Portfolio Optimization
8.3.2 Market Equilibrium Models
8.3.3 Game Theory and Linear Programming
8.4 Data Science and Machine Learning
8.4.1 Linear Regression and Classification
8.4.2 Support Vector Machines
8.4.3 Data Fitting and Approximation
8.5 Case Studies
8.5.1 Real-World Applications and Case Studies
8.5.2 Success Stories and Lessons Learned
8.6 The Revised Simplex Method and Optimality Conditions
8.7 The Revised Simplex Method

The tableau method is a substantially data intensive process as we carry the entire
simplex tableau with us as we execute the simplex algorithm. However, consider the
data we need at each iteration of the algorithm:
1. Reduced costs: cTB B−1 A·j − cj for each variable xj where j ∈ J and J is the
set of indices of non-basic variables.
2. Right-hand-side values: b̄ = B−1 b for use in the minimum ratio test.
3. Columns: āj = B−1 A·j for use in the minimum ratio test.
4. z = cTB B−1 b, the current objective function value.
The one value that is clearly critical to the computation is B−1 as it appears in
each and every computation. It would be far more effective to keep only the values
B−1 , cTB B−1 , b̄ and z and compute the reduced cost values and vectors āj as we
need them.
Let w = cTB B−1 , then the pertinent information may be stored in a new revised
simplex tableau with form:

        [  w     z  ]
   xB   [ B−1    b̄ ]        (8.2)
The revised simplex algorithm is detailed in Algorithm 8. In essence, the revised
simplex algorithm allows us to avoid computing aj until we absolutely need to do so.
In fact, if we do not apply Dantzig’s entering variable rule and simply select the first
acceptable entering variable, then we may be able to avoid computing a substantial
number of columns in the tableau.
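To make the bookkeeping concrete, the loop above can be sketched in numpy for a maximization problem in standard form (max cx s.t. Ax = b, x ≥ 0). This is a minimal sketch under simplifying assumptions — a starting basic feasible solution is supplied, Dantzig's entering rule is used, and no anti-cycling rule is applied — not a complete implementation of Algorithm 8. Note that only B−1 , w = cB B−1 and b̄ are carried between iterations; the column āj = B−1 A·j is formed only for the variable that actually enters.

```python
import numpy as np

def revised_simplex(c, A, b, basis):
    """Minimal revised simplex sketch: max c x  s.t.  A x = b, x >= 0.
    `basis` lists the indices of an initial basic feasible solution.
    Returns (basis, basic variable values, objective value)."""
    m, _ = A.shape
    Binv = np.linalg.inv(A[:, basis])          # B^{-1} for the starting basis
    while True:
        w = c[basis] @ Binv                    # w = c_B B^{-1}
        bbar = Binv @ b                        # current basic variable values
        zc = w @ A - c                         # reduced costs z_j - c_j
        j = int(np.argmin(zc))                 # Dantzig: most negative enters
        if zc[j] > -1e-9:
            return basis, bbar, float(c[basis] @ bbar)   # optimal
        aj = Binv @ A[:, j]                    # form a-bar_j only when needed
        ratios = np.where(aj > 1e-9,
                          bbar / np.where(aj > 1e-9, aj, 1.0), np.inf)
        r = int(np.argmin(ratios))             # minimum ratio test
        if not np.isfinite(ratios[r]):
            raise ValueError("problem is unbounded")
        # product-form update of B^{-1}: pivot on row r of column a-bar_j
        E = np.eye(m)
        E[:, r] = -aj / aj[r]
        E[r, r] = 1.0 / aj[r]
        Binv = E @ Binv
        basis[r] = j
```

For instance, on max 3x1 + 5x2 subject to x1 + 2x2 ≤ 60 and x1 + x2 ≤ 40, with slacks s1 , s2 forming the initial basis, the sketch terminates with x1 and x2 basic and objective value 160.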

■ Example 8.1 Consider a software company that is developing a new program. The
company has identified two types of bugs that remain in this software: non-critical
and critical. The company’s actuarial firm predicts that the risks associated with
these bugs are uniform random variables with mean $100 per non-critical bug and
mean $1000 per critical bug. The software currently has 50 non-critical bugs and 5
critical bugs.
Assume that it requires 3 hours to fix a non-critical bug and 12 hours to fix a
critical bug. For each day (8 hour period) beyond two business weeks (80 hours)
that the company fails to ship its product, the actuarial firm estimates it will lose
$500 per day.
We can find the optimal number of bugs of each type the software company
should fix assuming it wishes to minimize its exposure to risk using a linear
programming formulation.
Let x1 be the number of non-critical bugs corrected and x2 be the number of
critical software bugs corrected. Define:

y1 = 50 − x1 (8.3)
y2 = 5 − x2 (8.4)

Here y1 is the number of non-critical bugs that are not fixed while y2 is the number
of critical bugs that are not fixed.
The time (in hours) it takes to fix these bugs is:

3x1 + 12x2 (8.5)


Algorithm 8 Revised Simplex Algorithm

1. Identify an initial basis matrix B and compute B−1 , w, b̄ and z and place
these into a revised simplex tableau:

        [  w     z  ]
   xB   [ B−1    b̄ ]

2. For each j ∈ J use w to compute: zj − cj = wA·j − cj .
3. Choose an entering variable xj (for a maximization problem, we choose a
variable with negative reduced cost; for a minimization problem we choose a
variable with positive reduced cost):
   a. If there is no entering variable, STOP, you are at an optimal solution.
   b. Otherwise, continue to Step 4.
4. Append the column āj = B−1 A·j to the revised simplex tableau:

        [  w     z  |  zj − cj ]
   xB   [ B−1    b̄ |  āj      ]

5. Perform the minimum ratio test and determine a leaving variable (using any
leaving variable rule you prefer).
   a. If āj ≤ 0, STOP, the problem is unbounded.
   b. Otherwise, assume that the leaving variable is xBr , which appears in
row r of the revised simplex tableau.
6. Use row operations and pivot on the leaving variable row of the column:

        [ zj − cj ]
        [ āj      ]

transforming the revised simplex tableau into:

        [  w′      z′  |  0  ]
   xB   [ B′−1     b̄′ |  er ]

where er is an identity column with a 1 in row r (the row that left). The
variable xj is now the rth element of xB .
7. Goto Step 2.


Let:
y3 = (1/8)(80 − 3x1 − 12x2 )        (8.6)
Then y3 is a variable that is unrestricted in sign and determines the amount of
time (in days) either over or under the two-week period that is required to ship
the software. As an unrestricted variable, we can break it into two components:

y3 = z1 − z2 (8.7)

We will assume that z1 , z2 ≥ 0. If y3 > 0, then z1 > 0 and z2 = 0. In this case, the
software is completed ahead of the two-week deadline. If y3 < 0, then z1 = 0 and
z2 > 0. In this case the software is finished after the two-week deadline. Finally, if
y3 = 0, then z1 = z2 = 0 and the software is finished precisely on time.
We can form the objective function as:

z = 100y1 + 1000y2 + 500z2 (8.8)

The linear programming problem is then:

min z = y1 + 10y2 + 5z2
s.t. x1 + y1 = 50
     x2 + y2 = 5                                (8.9)
     (3/8)x1 + (3/2)x2 + z1 − z2 = 10
     x1 , x2 , y1 , y2 , z1 , z2 ≥ 0

Notice we have modified the objective function by dividing by 100. This will make
the arithmetic of the simplex algorithm easier. The matrix of coefficients for this
problem is:
 
      x1    x2    y1   y2   z1   z2
    [ 1     0     1    0    0    0  ]
    [ 0     1     0    1    0    0  ]        (8.10)
    [ 3/8   3/2   0    0    1    −1 ]

Notice there is an identity matrix embedded inside the matrix of coefficients. Thus
a good initial basic feasible solution is {y1 , y2 , z1 }. The initial basis matrix is I3
and naturally, B−1 = I3 as a result. We can see that cB = [1 10 0]T . It follows that
cTB B−1 = w = [1 10 0].
Our initial revised simplex tableau is thus:

   z    [ 1   10   0  | 100 ]
   y1   [ 1   0    0  | 50  ]
   y2   [ 0   1    0  | 5   ]        (8.11)
   z1   [ 0   0    1  | 10  ]


There are three variables that might enter at this point, x1 , x2 and z1 . We can
compute the reduced costs for each of these variables using the columns of the A
matrix, the coefficients of these variables in the objective function and the current
w vector (in row 0 of the revised simplex tableau). We obtain:
 
z1 − c1 = wA·1 − c1 = [1 10 0][1 0 3/8]T − 0 = 1

z2 − c2 = wA·2 − c2 = [1 10 0][0 1 3/2]T − 0 = 10

z6 − c6 = wA·6 − c6 = [1 10 0][0 0 −1]T − 5 = −5

By Dantzig’s rule, we enter variable x2 . We append B−1 A·2 and the reduced cost
to the revised simplex tableau to obtain:

   z    [ 1   10   0  | 100 ]  10         MRT
   y1   [ 1   0    0  | 50  ]  0          −
   y2   [ 0   1    0  | 5   ]  1          5        (8.12)
   z1   [ 0   0    1  | 10  ]  3/2        20/3

After pivoting on the indicated element, we obtain the new tableau:

   z    [ 1   0      0  | 50  ]
   y1   [ 1   0      0  | 50  ]
   x2   [ 0   1      0  | 5   ]        (8.13)
   z1   [ 0   −3/2   1  | 5/2 ]

We can compute reduced costs for the non-basic variables (except for y2 , which we
know will not re-enter the basis on this iteration) to obtain:

z1 − c1 = wA·1 − c1 = 1
z6 − c6 = wA·6 − c6 = −5

In this case, x1 will enter the basis and we augment our revised simplex tableau to
obtain:

   z    [ 1   0      0  | 50  ]  1        MRT
   y1   [ 1   0      0  | 50  ]  1        50
   x2   [ 0   1      0  | 5   ]  0        −        (8.14)
   z1   [ 0   −3/2   1  | 5/2 ]  3/8      20/3


Note that:

            [ 1   0      0 ] [ 1   ]   [ 1   ]
  B−1 A·1 = [ 0   1      0 ] [ 0   ] = [ 0   ]
            [ 0   −3/2   1 ] [ 3/8 ]   [ 3/8 ]

This is the ā1 column that is appended to the right hand side of the tableau along
with z1 − c1 = 1. After pivoting, the tableau becomes:

   z    [ 1   4    −8/3 | 130/3 ]
   y1   [ 1   4    −8/3 | 130/3 ]
   x2   [ 0   1    0    | 5     ]        (8.15)
   x1   [ 0   −4   8/3  | 20/3  ]

We can now check our reduced costs. Clearly, z1 will not re-enter the basis.
Therefore, we need only examine the reduced costs for the variables y2 and z2 .

z4 − c4 = wA·4 − c4 = −6
z6 − c6 = wA·6 − c6 = −7/3

Since all reduced costs are now negative, no further minimization is possible and
we conclude we have arrived at an optimal solution.
Two things are interesting to note: first, the solution for the number of non-
critical software bugs to fix is non-integer. Thus, in reality the company must fix
either 6 or 7 of the non-critical software bugs. The second thing to note is that
this economic model helps to explain why some companies are content to release
software that contains known bugs. In making a choice between releasing a flawless
product or making a quicker (larger) profit, a selfish, profit maximizer will always
choose to fix only those bugs it must fix and release sooner rather than later. ■
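The arithmetic of the final tableau can be double-checked numerically. The sketch below rebuilds the optimal basis {y1 , x2 , x1 } of Example 8.1 directly from the coefficient matrix (8.10), recomputes w = cB B−1 and the basic variable values, and confirms that every reduced cost is non-positive; the column ordering and variable names follow the example.

```python
import numpy as np

# Columns of matrix (8.10), ordered x1, x2, y1, y2, z1, z2
A = np.array([[1.0, 0.0, 1.0, 0.0,  0.0,  0.0],
              [0.0, 1.0, 0.0, 1.0,  0.0,  0.0],
              [3/8, 3/2, 0.0, 0.0,  1.0, -1.0]])
b = np.array([50.0, 5.0, 10.0])
c = np.array([0.0, 0.0, 1.0, 10.0, 0.0, 5.0])   # scaled objective coefficients

basis = [2, 1, 0]                      # y1, x2, x1 -- the rows of tableau 8.15
B = A[:, basis]
w = c[basis] @ np.linalg.inv(B)        # w = c_B B^{-1}
xB = np.linalg.solve(B, b)             # basic variable values
reduced = w @ A - c                    # z_j - c_j for every column
```

All reduced costs come out non-positive, certifying optimality for this minimization problem, and the objective value cB B−1 b = 130/3 ≈ 43.33 (in hundreds of dollars) matches the final tableau.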

Exercise 8.1 Solve the following problem using the revised simplex algorithm.

max x1 + x2
s.t. 2x1 + x2 ≤ 4
     x1 + 2x2 ≤ 6
     x1 , x2 ≥ 0

8.8 Farkas’ Lemma and Theorems of the Alternative


Lemma 8.1 (Farkas’ Lemma). Let A ∈ Rm×n and c ∈ Rn be a row vector. Suppose
x ∈ Rn is a column vector and w ∈ Rm is a row vector. Then exactly one of the
following systems of inequalities has a solution:
1. Ax ≥ 0 and cx < 0 or
2. wA = c and w ≥ 0


R  Before proceeding to the proof, it is helpful to restate the lemma in the
following way:
1. If there is a vector x ∈ Rn so that Ax ≥ 0 and cx < 0, then there is no
vector w ∈ Rm so that wA = c and w ≥ 0.
2. Conversely, if there is a vector w ∈ Rm so that wA = c and w ≥ 0, then
there is no vector x ∈ Rn so that Ax ≥ 0 and cx < 0.

Proof. We can prove Farkas’ Lemma using the fact that a bounded linear program-
ming problem has an extreme point solution. Suppose that System 1 has a solution
x. If System 2 also has a solution w, then

wA = c =⇒ wAx = cx. (8.16)

The fact that System 1 has a solution ensures that cx < 0 and therefore wAx < 0.
However, it also ensures that Ax ≥ 0. The fact that System 2 has a solution implies
that w ≥ 0. Therefore we must conclude that:

w ≥ 0 and Ax ≥ 0 =⇒ wAx ≥ 0. (8.17)

This contradiction implies that if System 1 has a solution, then System 2 cannot
have a solution.
Now, suppose that System 1 has no solution. We will construct a solution for
System 2. If System 1 has no solution, then there is no vector x so that cx < 0 and
Ax ≥ 0. Consider the linear programming problem:
PF :  min  cx
      s.t. Ax ≥ 0        (8.18)

Clearly x = 0 is a feasible solution to this linear programming problem and further-
more is optimal. To see this, note that since there is no x so that cx < 0 and
Ax ≥ 0, it follows that cx ≥ 0 for every feasible x; i.e., 0 is a lower bound for the
linear programming problem PF . At x = 0, the objective achieves its lower bound
and therefore this must be an optimal solution. Therefore PF is bounded and feasible.
We can convert PF to standard form through the following steps:
1. Introduce two new vectors y and z with y, z ≥ 0 and write x = y − z (since x
is unrestricted).
2. Append a vector of surplus variables s to the constraints.
This yields the new problem:




PF′ :  min  cy − cz
       s.t. Ay − Az − Im s = 0        (8.19)
            y, z, s ≥ 0


Applying Theorems 6.1 and 6.2, we see we can obtain an optimal basic feasible
solution for Problem PF′ in which the reduced costs for the variables are all non-positive
(that is, zj − cj ≤ 0 for j = 1, . . . , 2n + m). Here we have n variables in vector y, n
variables in vector z and m variables in vector s. Let B ∈ Rm×m be the basis matrix
at this optimal feasible solution with basic cost vector cB . Let w = cB B−1 (as it
was defined for the revised simplex algorithm).


Consider the columns of the simplex tableau corresponding to a variable xk (in


our original x vector). The variable xk = yk − zk . Thus, these two columns are
additive inverses. That is, the column for yk will be B−1 A·k , while the column for zk
will be B−1 (−A·k ) = −B−1 A·k . Furthermore, the objective function coefficient will
be precisely opposite as well. Thus the fact that zj − cj ≤ 0 for all variables implies
that:

wA·k − ck ≤ 0 and
−wA·k + ck ≤ 0

That is, we obtain

wA = c (8.20)

since this holds for all columns of A.


Consider the surplus variable sk . Surplus variables have zero as their coefficient in
the objective function. Further, their simplex tableau column is simply B−1 (−ek ) =
−B−1 ek . The fact that the reduced cost of this variable is non-positive implies that:

w(−ek ) − 0 = −wek ≤ 0 (8.21)

Since this holds for all surplus variable columns, we see that −w ≤ 0 which implies
w ≥ 0. Thus, the optimal basic feasible solution to Problem PF′ must yield a vector
w that solves System 2.
Lastly, the fact that if System 2 does not have a solution, then System 1 does
follows from contrapositive on the previous fact we just proved. ■

Exercise 8.2 Suppose we have two statements A and B so that:

A ≡ System 1 has a solution.


B ≡ System 2 has a solution.

Our proof showed explicitly that NOT A =⇒ B. Recall that contrapositive is


the logical rule that asserts that:

X =⇒ Y ≡ NOT Y =⇒ NOT X (8.22)

Use contrapositive to prove explicitly that if System 2 has no solution, then System
1 must have a solution. [Hint: NOT NOT X ≡ X.] ■

8.8.1 Geometry of Farkas’ Lemma


Farkas’ Lemma has a pleasant geometric interpretation.1 Consider System 2, namely:

wA = c and w ≥ 0

1 Thanks to Akinwale Akinbiyi for pointing out a typo in this discussion.

T.Abraha(PhD) @AKU, 2024 Linear Optimization


468 Chapter 8. Applications in Science and Engineering

Geometrically, this states that c is inside the positive cone generated by the rows of
A. That is, let w = (w1 , . . . , wm ). Then we have:

c = w1 A1· + · · · + wm Am·        (8.23)

and wi ≥ 0 for i = 1, . . . , m. Thus c is a positive combination of the rows of A. This


is illustrated in Figure 8.1.

Figure 8.1: System 2 has a solution if (and only if) the vector c is contained inside
the positive cone constructed from the rows of A.

On the other hand, suppose System 1 has a solution. Then let y = −x. System
1 states that Ay ≤ 0 and cy > 0. That means that each row of A (as a vector)
must be at a right angle or obtuse to y. (Since Ai· x ≥ 0.) Further, we know that
the vector y must be acute with respect to the vector c. This means that System
1 has a solution only if the vector c is not in the positive cone of the rows of A or
equivalently the intersection of the open half-space {y : cy > 0} and the set of vectors
{y : Ai· y ≤ 0, i = 1, . . . m} is non-empty. This set is the cone of vectors perpendicular
to the rows of A. This is illustrated in Figure 8.2


Figure 8.2: System 1 has a solution if (and only if) the vector c is not contained
inside the positive cone constructed from the rows of A.

T.Abraha(PhD) @AKU, 2024 Linear Optimization


8.8 Farkas’ Lemma and Theorems of the Alternative 469

■ Example 8.2 Consider the matrix:

A = [ 1  0 ]
    [ 0  1 ]

and the vector c = [1 2]. Then clearly, we can see that the vector w = [1 2] will
satisfy System 2 of Farkas’ Lemma, since w ≥ 0 and wA = c.
Contrast this with c′ = [1 −1]. In this case, we can choose x = [0 1]T . Then
Ax = [0 1]T ≥ 0 and c′ x = −1. Thus x satisfies System 1 of Farkas’ Lemma.
These two facts are illustrated in Figure 8.3. Here, we see that c is inside the
positive cone formed by the rows of A, while c′ is not.


Figure 8.3: An example of Farkas’ Lemma: The vector c is inside the positive cone
formed by the rows of A, but c′ is not.
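For small square systems like the one in Example 8.2, which alternative holds can be checked mechanically: solve wA = c exactly and test the sign of w. This is a sketch that applies only when A is square and invertible (a general A would require a linear program or a non-negative least squares solve); the tolerance is an assumption.

```python
import numpy as np

def farkas_system2_holds(A, c, tol=1e-12):
    """For square invertible A, solve w A = c (equivalently A^T w^T = c^T)
    and report whether w >= 0, i.e. whether System 2 of Farkas' Lemma holds."""
    w = np.linalg.solve(A.T, c)
    return w, bool(np.all(w >= -tol))

A = np.eye(2)
w, holds = farkas_system2_holds(A, np.array([1.0, 2.0]))      # c  = [1  2]
wp, holds_p = farkas_system2_holds(A, np.array([1.0, -1.0]))  # c' = [1 -1]
```

As in Figure 8.3, c = [1 2] lies in the positive cone of the rows of A (System 2 holds with w = [1 2]), while c′ = [1 −1] does not, so by the lemma System 1 must hold for c′.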

Exercise 8.3 Consider the following matrix:

A = [ 1  0 ]
    [ 1  1 ]

and the vector c = [1 2]. For this matrix and this vector, does System 1 have a
solution or does System 2 have a solution? [Hint: Draw a picture illustrating the
positive cone formed by the rows of A. Draw in c. Is c in the cone or not?] ■

8.8.2 Theorems of the Alternative


Farkas’ lemma can be manipulated in many ways to produce several equivalent
statements. The collection of all such theorems are called Theorems of the Alternative
and are used extensively in optimization theory in proving optimality conditions. We
state two that will be useful to us.
Corollary 8.1 Let A ∈ Rk×n and E ∈ Rl×n . Let c ∈ Rn be a row vector. Suppose
d ∈ Rn is a column vector and w ∈ Rk is a row vector and v ∈ Rl is a row vector.


Let:

M = [ A ]
    [ E ]

and

u = [ w  v ]

Then exactly one of the following systems has a solution:
1. Md ≤ 0 and cd > 0 or
2. uM = c and u ≥ 0

Proof. Let x = −d. Then Md ≤ 0 implies Mx ≥ 0 and cd > 0 implies cx < 0. This
converts System 1 to the System 1 of Farkas’ Lemma. System 2 is already in the
form found in Farkas’ Lemma. This completes the proof. ■

Exercise 8.4 Prove the following corollary to Farkas’ Lemma:

Corollary 8.2 Let A ∈ Rm×n and c ∈ Rn be a row vector. Suppose d ∈ Rn is a


column vector and w ∈ Rm is a row vector and v ∈ Rn is a row vector. Then
exactly one of the following systems of inequalities has a solution:
1. Ad ≤ 0, d ≥ 0 and cd > 0 or
2. wA − v = c and w, v ≥ 0
[Hint: Write System 2 from this corollary as wA − In v = c and then re-write
the system with an augmented vector [w v] with an appropriate augmented
matrix. Let M be the augmented matrix you identified. Now write System 1
from Farkas’ Lemma using M and x. Let d = −x and expand System 1 until you
obtain System 1 for this problem.] ■

8.9 The Karush-Kuhn-Tucker Conditions


Theorem 8.1 Consider the linear programming problem:

P :  max  cx
     s.t. Ax ≤ b        (8.24)
          x ≥ 0

with A ∈ Rm×n , b ∈ Rm and (row vector) c ∈ Rn . Then x∗ ∈ Rn is an optimal
solution1 to P if and only if there exist (row) vectors w∗ ∈ Rm and v∗ ∈ Rn so
that:

Primal Feasibility:  Ax∗ ≤ b,  x∗ ≥ 0        (8.25)

Dual Feasibility:  w∗ A − v∗ = c,  w∗ ≥ 0,  v∗ ≥ 0        (8.26)

Complementary Slackness:  w∗ (Ax∗ − b) = 0,  v∗ x∗ = 0        (8.27)

1 Thanks to Rich Benjamin for pointing out the fact I was missing “. . . is an optimal solution. . . ”

R The vectors w∗ and v∗ are sometimes called dual variables for reasons that
will be clear in the next chapter. They are also sometimes called Lagrange
Multipliers. You may have encountered Lagrange Multipliers in your Math
230 or Math 231 class. These are the same kind of variables except applied
to linear optimization problems. There is one element in the dual variable
vector w∗ for each constraint of the form Ax ≤ b and one element in the dual
variable vector v∗ for each constraint of the form x ≥ 0.

Proof. Suppose that x∗ is an optimal solution to Problem P . Consider only the
binding constraints at x∗ . For simplicity, write the constraints x ≥ 0 as −x ≤ 0.
Then we can form a new system of equations of the form:

[ AE ] x = bE        (8.28)
[ E  ]

where E is a matrix of negative identity matrix rows corresponding to the variables
xk that are equal to zero. That is, if xk = 0, then −xk = 0 and the negative identity
matrix row −eTk will appear in E where −eTk ∈ R1×n . Here bE are the right hand
sides of the binding constraints. Let:

M = [ AE ]
    [ E  ]

The fact that x∗ is optimal implies that there is no improving direction d at the
point x∗ . That is, there is no d so that Md ≤ 0 and cd > 0. Otherwise, by moving
in this direction we could find a new point x̂ = x∗ + λd (with λ > 0 sufficiently small)
so that:

Mx̂ = Mx∗ + λMd ≤ bE

The (negative) identity rows in E ensure that x̂ ≥ 0. It follows that:

cx̂ = c(x∗ + λd) = cx∗ + λcd > cx∗

That is, this point is both feasible and has a larger objective function value than x∗ .
We can now apply Corollary 8.1 to show that there are vectors w and v so that:

[ w  v ] [ AE ] = c        (8.29)
         [ E  ]

w ≥ 0        (8.30)

T.Abraha(PhD) @AKU, 2024 Linear Optimization


472 Chapter 8. Applications in Science and Engineering

v≥0 (8.31)

Let I be the indices of the rows of A making up AE and let J be the indices of
the variables that are zero (i.e., binding in x ≥ 0). Then we can re-write Equation
8.29 as:

Σ_{i∈I} wi Ai· − Σ_{j∈J} vj ej = c        (8.32)

The vector w has dimension equal to the number of binding constraints of the form
Ai· x = b while the vector v has dimension equal to the number of binding constraints
of the form x ≥ 0. We can extend w to w∗ by adding 0 elements for the constraints
where Ai· x < bi . Similarly we can extend v to v∗ by adding 0 elements for the
constraints where xj > 0. The result is that:

w∗ (Ax∗ − b) = 0 (8.33)
v∗ x ∗ = 0 (8.34)

In doing this we maintain w∗ , v∗ ≥ 0 and simultaneously guarantee that w∗ A−v∗ = c.


To prove the converse we assume that the KKT conditions hold and we are given
vectors x∗ , w∗ and v∗ . We will show that x∗ solves the linear programming problem.
By dual feasibility, we know that Equation 8.32 holds with w ≥ 0 and v ≥ 0 defined
as before and the given point x∗ . Let x be an alternative point. We can multiply
both sides of Equation 8.32 by (x∗ − x). This leads to:

( Σ_{i∈I} wi Ai· x∗ − Σ_{j∈J} vj ej x∗ ) − ( Σ_{i∈I} wi Ai· x − Σ_{j∈J} vj ej x ) = cx∗ − cx        (8.35)

We know that Ai· x∗ = bi for i ∈ I and that x∗j = 0 for j ∈ J. We can use this to
simplify Equation 8.35:

Σ_{i∈I} wi (bi − Ai· x) + Σ_{j∈J} vj ej x = cx∗ − cx        (8.36)

The left hand side must be non-negative, since w ≥ 0 and v ≥ 0, bi − Ai· x ≥ 0 for all
i, and x ≥ 0 and thus it follows that x∗ must be an optimal point since cx∗ − cx ≥ 0.
This completes the proof. ■

R  The expressions:

Primal Feasibility:  Ax∗ ≤ b,  x∗ ≥ 0        (8.37)

Dual Feasibility:  w∗ A − v∗ = c,  w∗ ≥ 0,  v∗ ≥ 0        (8.38)

Complementary Slackness:  w∗ (Ax∗ − b) = 0,  v∗ x∗ = 0        (8.39)

are called the Karush-Kuhn-Tucker (KKT) conditions of optimality. Note
there is one element in w for each row of the constraints Ax ≤ b and one
element in the vector v for each constraint of the form x ≥ 0. The vectors
w and v are sometimes called dual variables and sometimes called Lagrange
Multipliers.
We can think of dual feasibility as expressing the following interesting
fact: at optimality, the gradient of the objective function c can be expressed
as a positive combination of the gradients of the binding constraints written
as less-than-or-equal-to inequalities. That is, the gradient of the constraint
Ai· x ≤ bi is Ai· and the gradient of the constraint −xj ≤ 0 is −ej . More
specifically, the vector c is in the cone generated by the binding constraints at
optimality.

■ Example 8.3 Consider the Toy Maker Problem (Equation 2.54) with Dual
Variables (Lagrange Multipliers) listed next to their corresponding constraints:

max z(x1 , x2 ) = 7x1 + 6x2        Dual Variable
s.t. 3x1 + x2 ≤ 120                (w1 )
     x1 + 2x2 ≤ 160                (w2 )
     x1 ≤ 35                       (w3 )
     x1 ≥ 0                        (v1 )
     x2 ≥ 0                        (v2 )

In this problem we have:

A = [ 3  1 ]        b = [ 120 ]        c = [ 7  6 ]
    [ 1  2 ]            [ 160 ]
    [ 1  0 ]            [ 35  ]

Then the KKT conditions can be written as:

Primal Feasibility:

    [ 3  1 ] [ x1 ]    [ 120 ]
    [ 1  2 ] [ x2 ] ≤  [ 160 ]
    [ 1  0 ]           [ 35  ]

    [ x1 ]    [ 0 ]
    [ x2 ] ≥  [ 0 ]

Dual Feasibility:

                 [ 3  1 ]
    [ w1 w2 w3 ] [ 1  2 ] − [ v1 v2 ] = [ 7  6 ]
                 [ 1  0 ]

    [ w1 w2 w3 ] ≥ [ 0 0 0 ]

    [ v1 v2 ] ≥ [ 0 0 ]

Complementary Slackness:

                 ( [ 3  1 ] [ x1 ]   [ 120 ] )
    [ w1 w2 w3 ] ( [ 1  2 ] [ x2 ] − [ 160 ] ) = 0
                 ( [ 1  0 ]          [ 35  ] )

    [ v1 v2 ] [ x1 ] = 0
              [ x2 ]

Recall that at optimality, we had x1 = 16 and x2 = 72. The binding constraints in
this case were

3x1 + x2 ≤ 120
x1 + 2x2 ≤ 160

To see this, note that 3(16) + 72 = 120 and 16 + 2(72) = 160. Then we should be
able to express c = [7 6] (the vector of coefficients of the objective function) as a
positive combination of the gradients of the binding constraints:
∇(7x1 + 6x2 ) = [ 7  6 ]
∇(3x1 + x2 ) = [ 3  1 ]
∇(x1 + 2x2 ) = [ 1  2 ]

That is, we wish to solve the linear equation:

    [ w1 w2 ] [ 3  1 ] = [ 7  6 ]        (8.40)
              [ 1  2 ]

Note, this is how Equation 8.32 looks when we apply it to this problem. The result
is a system of equations:

3w1 + w2 = 7
w1 + 2w2 = 6

A solution to this system is w1 = 8/5 and w2 = 11/5. This fact is illustrated in
Figure 8.4.
Figure 8.4 shows the gradient cone formed by the binding constraints at the
optimal point for the toy maker problem.


Figure 8.4: The Gradient Cone: At optimality, the cost vector c is obtuse with
respect to the directions formed by the binding constraints. It is also contained
inside the cone of the gradients of the binding constraints, which we will discuss at
length later.

Since x1 , x2 > 0, we must have v1 = v2 = 0. Moreover, since x1 < 35, we know
that x1 ≤ 35 is not a binding constraint and thus its dual variable w3 is also zero.
This leads to the conclusion:

[ x∗1 ]   [ 16 ]
[ x∗2 ] = [ 72 ]        [ w1∗ w2∗ w3∗ ] = [ 8/5  11/5  0 ]        [ v1∗ v2∗ ] = [ 0  0 ]

and the KKT conditions are satisfied. ■
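The multipliers in Example 8.3 are easy to verify numerically. The sketch below solves the 2×2 system (8.40) for the multipliers of the binding constraints at x∗ = (16, 72) and then checks dual feasibility and complementary slackness; the array layout mirrors the example.

```python
import numpy as np

x_star = np.array([16.0, 72.0])
A = np.array([[3.0, 1.0],    # 3x1 +  x2 <= 120
              [1.0, 2.0],    #  x1 + 2x2 <= 160
              [1.0, 0.0]])   #  x1       <= 35
b = np.array([120.0, 160.0, 35.0])
c = np.array([7.0, 6.0])

# Solve w [3 1; 1 2] = [7 6] for the two binding constraints (Equation 8.40)
w_bind = np.linalg.solve(A[:2].T, c)
w = np.append(w_bind, 0.0)           # w3 = 0 since x1 <= 35 is not binding
v = np.zeros(2)                      # v1 = v2 = 0 since x1, x2 > 0
```

This reproduces w∗ = [8/5 11/5 0]: dual feasibility wA − v = c holds with w, v ≥ 0, and the complementary slackness condition w(Ax∗ − b) = 0 is satisfied.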

Exercise 8.5 Consider the problem:

max x1 + x2
s.t. 2x1 + x2 ≤ 4
     x1 + 2x2 ≤ 6
     x1 , x2 ≥ 0

Write the KKT conditions for an optimal point for this problem. (You will have a
vector w = [w1 w2 ] and a vector v = [v1 v2 ]).
Draw the feasible region of the problem. At the optimal point you identified
in Exercise 8.1, identify the binding constraints and draw their gradients. Show
that the objective function is in the positive cone of the gradients of the binding
constraints at this point. (Specifically find w and v.) ■

The Karush-Kuhn-Tucker Conditions for an Equality Problem


The KKT conditions can be modified to deal with problems in which we have equality
constraints (i.e., Ax = b).


Corollary 8.3 Consider the linear programming problem:

P :  max  cx
     s.t. Ax = b        (8.41)
          x ≥ 0

with A ∈ Rm×n , b ∈ Rm and (row vector) c ∈ Rn . Then x∗ ∈ Rn is an optimal
solution to P if and only if there exist (row) vectors w∗ ∈ Rm and v∗ ∈ Rn so that:

Primal Feasibility:  Ax∗ = b,  x∗ ≥ 0        (8.42)

Dual Feasibility:  w∗ A − v∗ = c,  w∗ unrestricted,  v∗ ≥ 0        (8.43)

Complementary Slackness:  v∗ x∗ = 0        (8.44)

Proof. Replace the constraints Ax = b with the equivalent constraints:

Ax ≤ b        (8.45)
−Ax ≤ −b      (8.46)

Let w1 be the vector corresponding to the rows in Ax ≤ b and w2 be the vector
corresponding to the rows in −Ax ≤ −b. Then the KKT conditions are:

Primal Feasibility:  Ax∗ ≤ b,  −Ax∗ ≤ −b,  x∗ ≥ 0        (8.47)

Dual Feasibility:  w1∗ A − w2∗ A − v∗ = c,  w1∗ ≥ 0,  w2∗ ≥ 0,  v∗ ≥ 0        (8.48)

Complementary Slackness:  w1∗ (Ax∗ − b) = 0,  w2∗ (b − Ax∗ ) = 0,  v∗ x∗ = 0        (8.49)

The fact that Ax∗ = b at optimality ensures we can re-write primal feasibility as:

Primal Feasibility:  Ax∗ = b,  x∗ ≥ 0

Furthermore, since Ax∗ = b we know that the complementary slackness conditions

w1∗ (Ax∗ − b) = 0
w2∗ (b − Ax∗ ) = 0

are always satisfied and thus we only require v∗ x∗ = 0 as our complementary slackness
condition.


Finally, let w∗ = w1∗ − w2∗ . Then dual feasibility becomes:

w∗ A − v∗ = w1∗ A − w2∗ A − v∗ = c        (8.50)

Since w1∗ , w2∗ ≥ 0, we know that w∗ is unrestricted in sign. Thus we have:

Dual Feasibility:  w∗ A − v∗ = c,  w∗ unrestricted,  v∗ ≥ 0        (8.51)

This completes the proof. ■

Exercise 8.6 Use the trick of converting a minimization problem to a maximization


problem to identify the KKT conditions for the following problem:

min cx
s.t. Ax ≥ b (8.52)
x≥0

[Hint: Remember, Ax ≥ b is the same as writing −Ax ≤ −b. Now use the KKT
conditions for the maximization problem to find the KKT conditions for this
problem.] ■

8.10 Relating the KKT Conditions to the Tableau


Consider a linear programming problem in Standard Form:

P :  max  cx
     s.t. Ax = b        (8.53)
          x ≥ 0

with A ∈ Rm×n , b ∈ Rm and (row vector) c ∈ Rn .


The KKT conditions for a problem of this type assert that

wA − v = c
vx = 0

at an optimal point x for some vector w unrestricted in sign and v ≥ 0. (Note, for
the sake of notational ease, we have dropped the ∗ notation.)
Suppose at optimality we have a basis matrix B corresponding to a set of basic
variables xB and we simultaneously have non-basic variables xN . We may likewise
divide v into vB and vN .
Then we have:

wA − v = c  =⇒  w [ B  N ] − [ vB  vN ] = [ cB  cN ]        (8.54)

vx = 0  =⇒  [ vB  vN ] [ xB ] = 0        (8.55)
                       [ xN ]


We can rewrite Expression 8.54 as:

[ wB − vB   wN − vN ] = [ cB  cN ]        (8.56)

This simplifies to:

wB − vB = cB
wN − vN = cN

Let w = cB B−1 . Then we see that:

wB − vB = cB =⇒ cB B−1 B − vB = cB =⇒ cB − vB = cB =⇒ vB = 0 (8.57)

Since we know that xB ≥ 0, we know that vB should be equal to zero to ensure


complementary slackness. Thus, this is consistent with the KKT conditions.
We further see that:

wN − vN = cN =⇒ cB B−1 N − vN = cN =⇒ vN = cB B−1 N − cN (8.58)

Thus, the vN are just the reduced costs of the non-basic variables. (vB are the
reduced costs of the basic variables.) Furthermore, dual feasibility requires that
v ≥ 0. Thus we see that at optimality we require:

cB B−1 N − cN ≥ 0 (8.59)

This is precisely the condition for optimality in the simplex tableau.


We now can see the following facts are true about the Simplex Method:
1. At each iteration of the Simplex Method, primal feasibility is satisfied. This
is ensured by the minimum ratio test and the fact that we start at a feasible
point.
2. At each iteration of the Simplex Method, complementary slackness is satisfied.
After all, the vector v is just the reduced cost vector (Row 0) of the Simplex
tableau. If a variable xj is basic (and hence may be non-zero), then its reduced
cost vj = 0. Otherwise, vj may be non-zero.
3. At each iteration of the Simplex Algorithm, we may violate dual feasibility
because we may not have v ≥ 0. It is only at optimality that we achieve dual
feasibility and satisfy the KKT conditions.
We can now prove the following theorem:
Theorem 8.2 Assuming an appropriate cycling prevention rule is used, the simplex
algorithm converges in a finite number of iterations to an optimal solution to the
linear programming problem.

Proof. Convergence is guaranteed by the proof of Theorem 7.1 in which we show


that when the lexicographic minimum ratio test is used, then the simplex algorithm
will always converge. Our work above shows that at optimality, the KKT conditions
are satisfied because the termination criteria for the simplex algorithm are precisely
the same as the criteria in the Karush-Kuhn-Tucker conditions. This completes the
proof. ■


■ Example 8.4 Consider the following linear programming problem:

max z(x1 , x2 ) = 3x1 + 5x2
s.t. x1 + 2x2 ≤ 60        (w1 )
     x1 + x2 ≤ 40         (w2 )
     x1 ≥ 0               (v1 )
     x2 ≥ 0               (v2 )

Note we have assigned dual variables corresponding to each constraint on the
right-hand-side of the constraints. That is, dual variable w1 corresponds to the
constraint x1 + 2x2 ≤ 60. We can write this problem in standard form as:

max z(x1 , x2 ) = 3x1 + 5x2
s.t. x1 + 2x2 + s1 = 60        (w1 )
     x1 + x2 + s2 = 40         (w2 )
     x1 ≥ 0                    (v1 )
     x2 ≥ 0                    (v2 )
     s1 ≥ 0                    (v3 )
     s2 ≥ 0                    (v4 )

Note we have added two new dual variables v3 and v4 for the non-negativity
constraints on the slack variables s1 and s2 . Our dual variable vectors are
w = [w1 w2 ] and v = [v1 v2 v3 v4 ]. We can construct an initial simplex tableau as:

          z   x1   x2   s1   s2   RHS
    z     1   −3   −5    0    0     0
    s1    0    1    2    1    0    60
    s2    0    1    1    0    1    40

In this initial configuration, we note that v1 = −3, v2 = −5, v3 = 0 and v4 = 0.
This is because s1 and s2 are basic variables. We also notice that complementary
slackness is satisfied. That is, at the current values of x1 , x2 , s1 and s2 we have:

    [v1 v2 v3 v4 ] [x1 x2 s1 s2 ]T = 0


Applying the Simplex Algorithm yields the final tableau:

          z   x1   x2   s1   s2   RHS
    z     1    0    0    2    1   160
    x2    0    0    1    1   −1    20
    x1    0    1    0   −1    2    20

The optimal value for v is [0 0 2 1]. Note v ≥ 0 as required. Further, complementary
slackness is still maintained. Notice further that the current value of B−1 can be
found in the portion of the matrix where the identity matrix stood in the initial
tableau. Thus we can compute w as:

    w = cB B−1

Since cB = [5 3] (x2 and x1 are the basic variables at optimality) we see that:

    w = [5 3] [  1  −1 ] = [2 1]
              [ −1   2 ]

That is, w1 = 2 and w2 = 1.


Note that w ≥ 0. This is because w is also a dual variable vector for our
original problem (not in standard form). The KKT conditions for a maximization
problem in canonical form require w ≥ 0 (see Theorem 8.1). Thus, it makes sense
that we have w ≥ 0. Note this does not always have to be the case if we do not
begin with a problem in canonical form.
Last, we can see that the constraints:

x1 + 2x2 ≤ 60
x1 + x2 ≤ 40

are both binding at optimality (since s1 and s2 are both zero). This means we
should be able to express c = [3 5]T as a positive combination of the gradients of the
left-hand-sides of these constraints using w. To see this, note that w1 corresponds
to x1 + 2x2 ≤ 60 and w2 to x1 + x2 ≤ 40. We have:

    ∇(x1 + 2x2 ) = [1 2]T
    ∇(x1 + x2 ) = [1 1]T

Then:

    w1 [1 2]T + w2 [1 1]T = (2) [1 2]T + (1) [1 1]T = [3 5]T


Thus, the objective function gradient is in the dual cone of the binding constraint.
That is, it is a positive combination of the gradients of the left-hand-sides of the
binding constraints at optimality. This is illustrated in Figure 8.5.

Figure 8.5: This figure illustrates the optimal point of the problem given in Example
8.4. Note that at optimality, the objective function gradient is in the dual cone
of the binding constraint. That is, it is a positive combination of the gradients of
the left-hand-sides of the binding constraints at optimality. The gradient of the
objective function is shown in green.

We can also verify that the KKT conditions hold for the problem in standard
form. Naturally, complementary slackness and primal feasibility hold. To see that
dual feasibility holds, note that v = [0 0 2 1] ≥ 0. Further:

    [2 1] [ 1  2  1  0 ] − [0 0 2 1] = [3 5 0 0]
          [ 1  1  0  1 ]

Here [3 5 0 0] is the objective function coefficient vector for the problem in
Standard Form. ■
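The computations in Example 8.4 are easy to reproduce numerically. The following
sketch (array names are ours, not from the text) recovers the basic solution, the
dual vector w = cB B−1, and the reduced cost vector v from the optimal basis:

```python
import numpy as np

# Example 8.4 in standard form: max 3x1 + 5x2
A = np.array([[1.0, 2.0, 1.0, 0.0],   # x1 + 2x2 + s1 = 60
              [1.0, 1.0, 0.0, 1.0]])  # x1 +  x2 + s2 = 40
b = np.array([60.0, 40.0])
c = np.array([3.0, 5.0, 0.0, 0.0])

# Optimal basis found by the simplex algorithm: x2 (column 1), x1 (column 0)
basis = [1, 0]
Binv = np.linalg.inv(A[:, basis])

x_B = Binv @ b           # values of the basic variables: [20, 20]
w = c[basis] @ Binv      # dual variables w = cB B^-1:    [2, 1]
v = w @ A - c            # reduced costs (dual surplus):  [0, 0, 2, 1]

print(x_B, w, v)
```

Note that v ≥ 0 and v is zero exactly in the positions of the basic variables, so
dual feasibility and complementary slackness both hold at this basis.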

Exercise 8.7 Use a full simplex tableau to find the values of the Lagrange multipliers
(dual variables) at optimality for the problem from Exercise 8.5. Confirm that
complementary slackness holds at optimality. Lastly show that dual feasibility
holds by showing that the gradient of the objective function (c) is a positive
combination of the gradients of the binding constraints at optimality. [Hint: Use
the vector w you should have identified.] ■



9. Duality

In the last chapter, we explored the Karush-Kuhn-Tucker (KKT) conditions and


identified constraints called the dual feasibility constraints. In this section, we show
that to each linear programming problem (the primal problem) we may associate
another linear programming problem (the dual linear programming problem). These
two problems are closely related to each other and an analysis of the dual problem
can provide deep insight into the primal problem.

9.1 The Dual Problem


Consider the linear programming problem

    P:  max  cT x
        s.t. Ax ≤ b        (9.1)
             x ≥ 0

Then the dual problem for Problem P is:

    D:  min  wb
        s.t. wA ≥ c        (9.2)
             w ≥ 0

R   Let v be a vector of surplus variables. Then we can transform Problem D
    into standard form as:

        DS:  min  wb
             s.t. wA − v = c        (9.3)
                  w ≥ 0
                  v ≥ 0

    Thus we already see an intimate relationship between duality and the KKT
    conditions. The feasible region of the dual problem (in standard form) is
    precisely the dual feasibility constraints of the KKT conditions for the
    primal problem.


In this formulation, we see that we have assigned a dual variable wi
(i = 1, . . . , m) to each constraint in the system of inequalities Ax ≤ b of the
primal problem. Likewise, the dual variables v can be thought of as corresponding
to the constraints in x ≥ 0.

Lemma 9.1. The dual of the dual problem is the primal problem.
Proof. Rewrite Problem D as:

    max  −bT wT
    s.t. −AT wT ≤ −cT        (9.4)
         wT ≥ 0

Let β = −bT , G = −AT , u = wT and κ = −cT . Then this new problem becomes:

    max  βu
    s.t. Gu ≤ κ        (9.5)
         u ≥ 0

Let xT be the vector of dual variables (transposed) for this problem. We can
formulate the dual problem as:

    min  xT κ
    s.t. xT G ≥ β        (9.6)
         xT ≥ 0

Expanding, this becomes:

    min  −xT cT
    s.t. −xT AT ≥ −bT        (9.7)
         xT ≥ 0

This can be simplified to:

    P:  max  cT x
        s.t. Ax ≤ b        (9.8)
             x ≥ 0

as required. This completes the proof. ■


Lemma 9.1 shows that the notion of dual and primal can be exchanged and that
it is simply a matter of perspective which problem is the dual problem and which is
the primal problem. Likewise, by transforming problems into canonical form, we can
develop dual problems for any linear programming problem.
The process of developing these formulations can be exceptionally tedious, as it
requires enumeration of all the possible combinations of various linear and variable
constraints. The following table summarizes the process of converting an arbitrary
primal problem into its dual. This table can be found in Chapter 6 of [BJS04].


    MINIMIZATION PROBLEM              MAXIMIZATION PROBLEM

    CONSTRAINTS   ≥                   VARIABLES      ≥ 0
                  ≤                                  ≤ 0
                  =                                  UNRESTRICTED

    VARIABLES     ≥ 0                 CONSTRAINTS    ≤
                  ≤ 0                                ≥
                  UNRESTRICTED                       =

Table 9.1: Table of Dual Conversions: To create a dual problem, assign a dual
variable to each constraint of the form Ax ◦ b, where ◦ represents a binary relation.
Then use the table to determine the appropriate sign of the inequality in the dual
problem as well as the nature of the dual variables.
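The correspondences in Table 9.1 are purely mechanical, so they can be encoded
directly. The helper below (the function name and the `'<='`/`'>='`/`'='`/`'free'`
encoding are our own convention, not the text's) maps a minimization problem's
constraint senses and variable signs onto the dual maximization problem:

```python
# Mechanizing Table 9.1 for a minimization primal:
#   min-problem constraint sense -> sign of the corresponding dual variable
#   min-problem variable sign    -> sense of the corresponding dual constraint
def dualize_min(constraint_senses, variable_signs):
    """Dual of: min cx s.t. (Ax sense b), (x sign).  Returns the
    maximization dual's variable signs and constraint senses."""
    var_sign = {'>=': '>= 0', '<=': '<= 0', '=': 'free'}
    con_sense = {'>= 0': '<=', '<= 0': '>=', 'free': '='}
    return ([var_sign[s] for s in constraint_senses],
            [con_sense[s] for s in variable_signs])

dual_vars, dual_cons = dualize_min(['>=', '=', '<='], ['>= 0', 'free'])
print(dual_vars)  # ['>= 0', 'free', '<= 0']
print(dual_cons)  # ['<=', '=']
```

For example, an equality constraint in the minimization problem yields an
unrestricted ("free") dual variable, exactly as in Example 9.1 below.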

■Example 9.1 Consider the problem of finding the dual problem for the Toy Maker
Problem (Example 2.30) in standard form. The primal problem is:

max 7x1 + 6x2


s.t. 3x1 + x2 + s1 = 120 (w1 )
x1 + 2x2 + s2 = 160 (w2 )
x1 + s3 = 35 (w3 )
x1 , x 2 , s 1 , s 2 , s 3 ≥ 0

Here we have placed dual variable names (w1 , w2 and w3 ) next to the constraints
to which they correspond.
The primal problem variables in this case are all positive, so using Table 9.1
we know that the constraints of the dual problem will be greater-than-or-equal-to
constraints. Likewise, we know that the dual variables will be unrestricted in sign
since the primal problem constraints are all equality constraints.
The coefficient matrix is:

        [ 3  1  1  0  0 ]
    A = [ 1  2  0  1  0 ]
        [ 1  0  0  0  1 ]

Clearly we have:

    c = [7 6 0 0 0]

        [ 120 ]
    b = [ 160 ]
        [  35 ]


Since w = [w1 w2 w3 ], we know that wA will be:

    wA = [3w1 + w2 + w3    w1 + 2w2    w1    w2    w3 ]

This vector will be related to c in the constraints of the dual problem. Remember,
in this case, all constraints are greater-than-or-equal-to. Thus we see that
the constraints of the dual problem are:

3w1 + w2 + w3 ≥ 7
w1 + 2w2 ≥ 6
w1 ≥ 0
w2 ≥ 0
w3 ≥ 0

We also have the redundant set of constraints that tell us w is unrestricted


because the primal problem had equality constraints. This will always happen
in cases when you’ve introduced slack variables into a problem to put it in stan-
dard form. This should be clear from the definition of the dual problem for a
maximization problem in canonical form.
Thus the whole dual problem becomes:

min 120w1 + 160w2 + 35w3


s.t. 3w1 + w2 + w3 ≥ 7
w1 + 2w2 ≥ 6
w1 ≥ 0 (9.9)
w2 ≥ 0
w3 ≥ 0
w unrestricted

Again, note that in reality, the constraints we derived from the wA ≥ c part
of the dual problem make the constraints “w unrestricted” redundant, for in fact
w ≥ 0 just as we would expect it to be if we’d found the dual of the Toy Maker
problem given in canonical form. ■
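The primal-dual pair in Example 9.1 can be checked numerically. A sketch using
SciPy (which minimizes by convention, so the max problem is passed with a negated
objective); the original inequality form of the toy maker problem is used so the
dual variables are non-negative:

```python
from scipy.optimize import linprog
import numpy as np

# Toy maker primal: max 7x1 + 6x2 s.t. 3x1+x2<=120, x1+2x2<=160, x1<=35
A = np.array([[3.0, 1.0], [1.0, 2.0], [1.0, 0.0]])
b = np.array([120.0, 160.0, 35.0])
c = np.array([7.0, 6.0])

primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
# Dual: min wb s.t. wA >= c, w >= 0  (pass as -A^T w <= -c for linprog)
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 3)

print(-primal.fun, dual.fun)   # both 544.0
print(dual.x)                  # w = [1.6, 2.2, 0.0]
```

Both problems attain the value 544, and the third dual variable is zero because
the constraint x1 ≤ 35 is not binding at the primal optimum (x1 = 16).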

Exercise 9.1 Identify the dual problem for:

max x1 + x2
s.t. 2x1 + x2 ≥ 4
x1 + 2x2 ≤ 6
x1 , x 2 ≥ 0

Exercise 9.2 Use the table or the definition of duality to determine the dual for


the problem:

    min  cx
    s.t. Ax ≤ b        (9.10)
         x ≥ 0

Compare it to the KKT conditions you derived in Exercise 8.6. ■

9.2 Weak Duality


There is a deep relationship between the objective function value, feasibility and
boundedness of the primal problem and the dual problem. We will explore these
relationships in the following lemmas.
Lemma 9.2 (Weak Duality). For the primal problem P and dual problem D let x
and w be feasible solutions to Problem P and Problem D respectively. Then:

wb ≥ cx (9.11)

Proof. Primal feasibility ensures that:

    Ax ≤ b

Therefore, since w ≥ 0, we have:

    wAx ≤ wb        (9.12)

Dual feasibility ensures that:

    wA ≥ c

Therefore, since x ≥ 0, we have:

    wAx ≥ cx        (9.13)

Combining Equations 9.12 and 9.13 yields Equation 9.11:

    wb ≥ cx

This completes the proof. ■
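Weak duality can be spot-checked by random sampling: whenever a sampled x is
primal feasible and a sampled w is dual feasible, wb ≥ cx must hold. The data
below are the toy maker problem from Example 9.1; the sampling scheme is our
own illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[3.0, 1.0], [1.0, 2.0], [1.0, 0.0]])
b = np.array([120.0, 160.0, 35.0])
c = np.array([7.0, 6.0])

gap_ok = True
for _ in range(1000):
    x = rng.uniform(0, 40, size=2)   # candidate primal point
    w = rng.uniform(0, 10, size=3)   # candidate dual point
    if np.all(A @ x <= b) and np.all(w @ A >= c):   # both feasible?
        gap_ok = gap_ok and bool(w @ b >= c @ x - 1e-9)

print(gap_ok)  # True: no feasible pair violates wb >= cx
```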

R   Lemma 9.2 ensures that the optimal solution w∗ for Problem D must provide
    an upper bound on Problem P , since for any feasible x, we know that:

        w∗ b ≥ cx        (9.14)

    Likewise, the objective value of any feasible solution to Problem P provides
    a lower bound on the optimal objective value of Problem D.


Corollary 9.1 If Problem P is unbounded, then Problem D is infeasible. Likewise,


if Problem D is unbounded, then Problem P is infeasible.

Proof. For any x feasible to Problem P , we know that wb ≥ cx for any feasible w.
The fact that Problem P is unbounded implies that for any V ∈ R we can find an x
feasible to Problem P such that cx > V . If w were feasible to Problem D, then we
would have wb > V for any arbitrarily chosen V . There can be no finite vector w
with this property, and we conclude that Problem D must be infeasible.
The alternative case, that when Problem D is unbounded Problem P is
infeasible, follows by reversing the roles of the two problems. This completes the
proof. ■
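Corollary 9.1 can be observed directly with a solver. The tiny example pair below
is our own: the primal max x1 + x2 subject to x1 − x2 ≤ 1 is unbounded, and its
dual min w subject to w ≥ 1 and −w ≥ 1 is infeasible. SciPy's `linprog` reports
status 3 for an unbounded problem and status 2 for an infeasible one:

```python
from scipy.optimize import linprog

# Primal: max x1 + x2  s.t.  x1 - x2 <= 1, x >= 0   (unbounded)
primal = linprog([-1.0, -1.0], A_ub=[[1.0, -1.0]], b_ub=[1.0],
                 bounds=[(0, None)] * 2)

# Dual: min w  s.t.  w >= 1 and -w >= 1, w >= 0     (infeasible)
dual = linprog([1.0], A_ub=[[-1.0], [1.0]], b_ub=[-1.0, -1.0],
               bounds=[(0, None)])

print(primal.status, dual.status)  # 3 (unbounded), 2 (infeasible)
```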

9.3 Strong Duality


Lemma 9.3. Problem D has an optimal solution w∗ ∈ Rm if and only if there exist
vectors x∗ ∈ Rn and s∗ ∈ Rm such that:

    Primal Feasibility:      w∗ A ≥ c,  w∗ ≥ 0                        (9.15)

    Dual Feasibility:        Ax∗ + s∗ = b,  x∗ ≥ 0,  s∗ ≥ 0           (9.16)

    Complementary Slackness: (w∗ A − c) x∗ = 0,  w∗ s∗ = 0            (9.17)

Furthermore, these KKT conditions are equivalent to the KKT conditions for the
primal problem.

Proof. Following the proof of Lemma 9.1, let β = −bT , G = −AT , u = wT and
κ = −cT . Then the dual problem can be rewritten as:

    max  βu
    s.t. Gu ≤ κ
         u ≥ 0

Let xT ∈ R1×n and sT ∈ R1×m be the dual variables for this problem. Then applying
Theorem 8.1, we obtain the KKT conditions for this problem:

    Primal Feasibility:      Gu∗ ≤ κ,  u∗ ≥ 0                          (9.18)

    Dual Feasibility:        x∗T G − s∗T = β,  x∗T ≥ 0,  s∗T ≥ 0       (9.19)

    Complementary Slackness: x∗T (Gu∗ − κ) = 0,  s∗T u∗ = 0            (9.20)


We can rewrite:

    Gu∗ ≤ κ            ≡  −AT w∗T ≤ −cT                ≡  w∗ A ≥ c
    x∗T G − s∗T = β    ≡  x∗T (−AT ) − s∗T = −bT       ≡  Ax∗ + s∗ = b
    x∗T (Gu∗ − κ) = 0  ≡  x∗T ((−AT )w∗T − (−cT )) = 0 ≡  (w∗ A − c) x∗ = 0
    s∗T u∗ = 0         ≡  s∗T w∗T = 0                  ≡  w∗ s∗ = 0

Thus, we have shown that the KKT conditions for the dual problem are:

    Primal Feasibility:      w∗ A ≥ c,  w∗ ≥ 0

    Dual Feasibility:        Ax∗ + s∗ = b,  x∗ ≥ 0,  s∗ ≥ 0

    Complementary Slackness: (w∗ A − c) x∗ = 0,  w∗ s∗ = 0

To prove the equivalence to the KKT conditions for the primal problem, define:

    s∗ = b − Ax∗        (9.21)
    v∗ = w∗ A − c       (9.22)

That is, s∗ is a vector of slack variables for the primal problem P at optimality and
v∗ is a vector of surplus variables for the dual problem D at optimality. Recall the
KKT conditions for the primal problem are:

    Primal Feasibility:      Ax∗ ≤ b,  x∗ ≥ 0

    Dual Feasibility:        w∗ A − v∗ = c,  w∗ ≥ 0,  v∗ ≥ 0

    Complementary Slackness: w∗ (Ax∗ − b) = 0,  v∗ x∗ = 0

We can rewrite these as:

    Primal Feasibility:      Ax∗ + s∗ = b,  x∗ ≥ 0                    (9.23)

    Dual Feasibility:        w∗ A − v∗ = c,  w∗ ≥ 0,  v∗ ≥ 0          (9.24)

    Complementary Slackness: w∗ s∗ = 0,  v∗ x∗ = 0                    (9.25)


Here the complementary slackness condition w∗ (Ax∗ − b) = 0 is re-written as:

    w∗ (Ax∗ − b) = 0  ≡  −w∗ (b − Ax∗ ) = 0  ≡  −w∗ s∗ = 0  ≡  w∗ s∗ = 0

Likewise, beginning with the KKT conditions for the dual problem (Expressions
9.15 - 9.17), we can write:

    Primal Feasibility:      w∗ A − v∗ = c,  w∗ ≥ 0,  v∗ ≥ 0

    Dual Feasibility:        Ax∗ + s∗ = b,  x∗ ≥ 0,  s∗ ≥ 0

    Complementary Slackness: v∗ x∗ = 0,  w∗ s∗ = 0

Here we substitute v∗ for w∗ A − c in the primal feasibility and complementary
slackness terms. Thus we can see that the KKT conditions for the primal and dual
problems are equivalent. This completes the proof. ■

R Notice that the KKT conditions for the primal and dual problems are equiva-
lent, but the dual feasibility conditions for the primal problem are identical to
the primal feasibility conditions for the dual problem and vice-versa. Thus,
two linear programming problems are dual to each other if they share KKT
conditions with the primal and dual feasibility conditions swapped.

Exercise 9.3 Compute the dual problem for a canonical form minimization problem:

    P:  min  cx
        s.t. Ax ≥ b
             x ≥ 0

Find the KKT conditions for the dual problem you just identified. Use the result
from Exercise 8.6 to show that KKT conditions for Problem P are identical to
the KKT conditions for the dual problem you just found. ■

Lemma 9.4 (Strong Duality). There is a bounded optimal solution x∗ for Problem
P if and only if there is a bounded optimal solution w∗ for Problem D. Furthermore,
cx∗ = w∗ b.
Proof. Suppose that there is a solution x∗ for Problem P . Let s∗ = b − Ax∗ . Clearly
s∗ ≥ 0.
By Theorem 8.1 there exists dual variables w∗ and v∗ satisfying dual feasibility
and complementary slackness. Dual feasibility in the KKT conditions implies that:

v∗ = w∗ A − c (9.26)

We also know that w∗ , v∗ ≥ 0. Complementary Slackness (from Theorem 8.1) states
that v∗ x∗ = 0. But v∗ is defined above and we see that:

    v∗ x∗ = 0  =⇒  (w∗ A − c) x∗ = 0        (9.27)


Likewise, since s∗ = b − Ax∗ , complementary slackness assures us that
w∗ (b − Ax∗ ) = 0. Thus we see that:

    w∗ (b − Ax∗ ) = 0  =⇒  w∗ s∗ = 0        (9.28)
Thus it follows from Lemma 9.3 that w∗ is an optimal solution to Problem D since
it satisfies the KKT conditions. The fact that Problem P has an optimal solution
when Problem D has an optimal solution can be proved in a similar manner starting
with the KKT conditions given in Lemma 9.3 and applying the same reasoning as
above.
Finally, at optimality, we know from Lemma 9.3 that:
(w∗ A − c) x∗ = 0 =⇒ w∗ Ax∗ = cx∗ (9.29)
We also know from Theorem 8.1 that:
w∗ (b − Ax∗ ) = 0 =⇒ w∗ b = w∗ Ax∗ (9.30)
Therefore we have: w∗ b = cx∗ . This completes the proof. ■
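Strong duality is easy to confirm end to end on a small problem. The example LP
below is our own (it does not appear in the text): both problems are solved
independently and the optimal values agree, cx∗ = w∗ b.

```python
from scipy.optimize import linprog
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])
c = np.array([2.0, 3.0])

# Primal: max cx s.t. Ax <= b, x >= 0   (negate c: linprog minimizes)
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)
# Dual:   min wb s.t. wA >= c, w >= 0
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

print(-primal.fun, dual.fun)   # 9.0 9.0
print(primal.x, dual.x)        # x* = [3, 1], w* = [1.5, 0.5]
```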

Corollary 9.2 If Problem P is infeasible, then Problem D is either unbounded or


infeasible. If Problem D is infeasible, then either Problem P is unbounded or
infeasible.
Proof. This result follows by contrapositive from Lemma 9.4. To see this, suppose
that Problem P is infeasible. Then Problem P has no bounded optimal solution.
Therefore, Problem D has no bounded optimal solution (by Lemma 9.4). If Problem
D has no bounded optimal solution, then either Problem D is unbounded or it
is infeasible. A symmetric argument on Problem D completes the proof of the
Lemma. ■

Exercise 9.4 Consider the problem

max x1 + x2
s.t. x1 − x2 ≥ 1
− x1 + x2 ≥ 1
x1 , x 2 ≥ 0

1. Show that this problem is infeasible.


2. Compute its dual.
3. Show that the dual is infeasible, thus illustrating Corollary 9.2.

The following theorem summarizes all of the results we have obtained in the last
two sections.
Theorem 9.1 — Strong Duality Theorem. Consider Problem P and Problem D.
Then exactly one of the following statements is true:
1. Both Problem P and Problem D possess optimal solutions x∗ and w∗
respectively and cx∗ = w∗ b.
2. Problem P is unbounded and Problem D is infeasible.


3. Problem D is unbounded and Problem P is infeasible.


4. Both problems are infeasible.
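The cases of the Strong Duality Theorem can be checked mechanically: solve both
problems and map the solver's status codes (0 optimal, 2 infeasible, 3 unbounded)
onto the four cases. The helper name and the small example problems are our own:

```python
from scipy.optimize import linprog
import numpy as np

def duality_case(A, b, c):
    """Classify (P): max cx, Ax <= b, x >= 0 against its dual (D)."""
    A, c = np.asarray(A), np.asarray(c)
    n, m = A.shape[1], A.shape[0]
    p = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * n)
    d = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * m)
    if p.status == 0 and d.status == 0:
        return 1   # both optimal, cx* = w*b
    if p.status == 3 and d.status == 2:
        return 2   # P unbounded, D infeasible
    if d.status == 3 and p.status == 2:
        return 3   # D unbounded, P infeasible
    return 4       # both infeasible

print(duality_case([[1.0, 2.0], [1.0, 1.0]], [60.0, 40.0], [3.0, 5.0]))  # 1
print(duality_case([[1.0, -1.0]], [1.0], [1.0, 1.0]))                    # 2
```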

9.4 Geometry of the Dual Problem


The geometry of the dual problem is, in essence, exactly the same as the geometry
of the primal problem insofar as they are both linear programming problems and
thus their feasible regions are polyhedral sets1 . Certain examples have a very nice
geometric visualization. Consider the problem:

    max  6x1 + 6x2
    s.t. 3x1 + 2x2 ≤ 6        (9.31)
         2x1 + 3x2 ≤ 6
         x1 , x2 ≥ 0

In this problem we have:

        [ 3  2 ]                        [ 6 ]
    A = [ 2  3 ]        c = [6 6]   b = [ 6 ]

Notice that A is a symmetric matrix, cT = b, and the dual problem is:

    min  6w1 + 6w2
    s.t. 3w1 + 2w2 ≥ 6        (9.32)
         2w1 + 3w2 ≥ 6
         w1 , w2 ≥ 0

This results in a geometry in which the dual feasible region is a reflection of the
primal feasible region (ignoring non-negativity constraints). This is illustrated in
Figure 9.1.

Figure 9.1: The dual feasible region in this problem is a mirror image (almost) of
the primal feasible region. This occurs when the right-hand-side vector b is equal to
the objective function coefficient column vector cT and the matrix A is symmetric.
1 Thanks to Michael Cline for suggesting this section.


We can also illustrate the process of solving this problem using the (revised)
simplex algorithm. In doing so, we first convert Problem 9.31 to standard form:

    max  6x1 + 6x2
    s.t. 3x1 + 2x2 + s1 = 6        (9.33)
         2x1 + 3x2 + s2 = 6
         x1 , x2 , s1 , s2 ≥ 0

This yields an initial revised simplex tableau:

    z   [ 0   0 |  0 ]
    s1  [ 1   0 |  6 ]        (9.34)
    s2  [ 0   1 |  6 ]

This is because our initial cB = [0 0] and thus w = [0 0]. This means at the start
of the simplex algorithm w1 = 0 and w2 = 0 and x1 = 0 and x2 = 0, so we begin at
the origin, which is in the primal feasible region but not in the dual feasible
region. If we iterate and choose x1 as an entering variable, our updated tableau
will be:

    z   [  2    0 | 12 ]
    x1  [ 1/3   0 |  2 ]        (9.35)
    s2  [ −2/3  1 |  2 ]

Notice at this point, x1 = 2, x2 = 0 and w1 = 2 and w2 = 0. This point is again
feasible for the primal problem but still infeasible for the dual. This step is
illustrated in Figure 9.2. Entering x2 yields the final tableau:

    z   [  6/5   6/5 | 72/5 ]
    x1  [  3/5  −2/5 |  6/5 ]        (9.36)
    x2  [ −2/5   3/5 |  6/5 ]

At this final point, x1 = x2 = w1 = w2 = 6/5, which is feasible to both problems.
This step is also illustrated in Figure 9.2.
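The path of the dual vector traced in Figure 9.2 can be reproduced from the three
bases the simplex algorithm visits, using w = cB B−1 at each step. Basis index
lists below follow the tableaux above (columns 0,1 are x1 , x2 and columns 2,3 are
s1 , s2):

```python
import numpy as np

A = np.array([[3.0, 2.0, 1.0, 0.0],
              [2.0, 3.0, 0.0, 1.0]])
c = np.array([6.0, 6.0, 0.0, 0.0])

path = []
for basis in ([2, 3], [0, 3], [0, 1]):   # {s1,s2} -> {x1,s2} -> {x1,x2}
    Binv = np.linalg.inv(A[:, basis])
    path.append(c[basis] @ Binv)         # dual vector w = cB B^-1

print(np.round(path, 3))   # [[0, 0], [2, 0], [1.2, 1.2]]
```

Only the final dual vector, w = (6/5, 6/5), satisfies the dual constraints, matching
the discussion above: the dual vector enters the dual feasible region only at the
last pivot.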


Figure 9.2: The simplex algorithm begins at a feasible point in the feasible region
of the primal problem. In this case, this is also the same starting point in the dual
problem, which is infeasible. The simplex algorithm moves through the feasible
region of the primal problem towards a point in the dual feasible region. At the
conclusion of the algorithm, the algorithm reaches the unique point that is both
primal and dual feasible.

We should note that this problem is atypical in that the primal and dual feasible
regions share one common point. In more standard problems, the two feasible regions
cannot be drawn in this convenient way, but the simplex process is the same. The
simplex algorithm begins at a point in the primal feasible region with a corresponding
dual vector that is not in the feasible region of the dual problem. As the simplex
algorithm progresses, this dual vector approaches and finally enters the dual feasible
region.
Exercise 9.5 Draw the dual feasible region for the following problem.

max 3x1 + 5x2


s.t. x1 + 2x2 ≤ 60
x1 + x2 ≤ 40
x1 , x 2 ≥ 0

Solve the problem using the revised simplex algorithm and trace the path of dual
variables (the w vector) in your plot of the dual feasible region. Also trace the
path of the primal vector x through the primal feasible region. [Hint: Be sure to
draw the area around the dual feasible region. Your dual vector w will not enter
the feasible region until the last simplex pivot.] ■

9.5 Economic Interpretation of the Dual Problem


Consider again the value of the objective function in terms of the values of the
non-basic variables (Equation 6.11):

    z = cx = cB B−1 b + (cN − cB B−1 N) xN        (9.37)


Suppose we are at a non-degenerate optimal point. We've already observed that:

    ∂z/∂xj = −(zj − cj ) = cj − cB B−1 A·j        (9.38)

We can rewrite all these equations in terms of our newly defined term:

    w = cB B−1        (9.39)

to obtain:

    z = wb + (cN − wN) xN        (9.40)

Remember, w is the vector of dual variables corresponding to the constraints in our
original problem P .
Suppose we fix the values of xN . Then we can see that the vector w has individual
elements with the property that:

    ∂z/∂bi = wi        (9.41)

That is, the ith element of w represents the amount that the objective function value
would change at optimality if we could modify the right-hand-side of the ith
constraint. Note, this result holds only in the absence of degeneracy, for
reasons we will see in an example.
Thus, we can think of wi as the shadow price for resource i (the right-hand-side
of the ith constraint). A shadow price is the fair price one would pay for an extra
unit of resource i.

■ Example 9.2 Consider a leather company that requires 1 square yard of leather
to make a regular belt and 1 square yard of leather to make a deluxe belt. If the
leather company can use up to 40 square yards per week to construct belts, then
one constraint it may have is:

x1 + x2 ≤ 40

In the absence of degeneracy, the dual variable (say w1 ) will tell the fair price
we would pay for 1 extra yard of leather. Naturally, if this were not a binding
constraint, then w1 = 0 indicating that extra leather is worth nothing to us since
we already have a surplus of leather. ■

To understand the economics of the situation, suppose that a manufacturer
produces products 1, . . . , n in quantities x1 , . . . , xn . If we can sell each
product for a profit of c1 , . . . , cn , then we wish to find values for x1 , . . . , xn to solve:

          n
    max   Σ  cj xj        (9.42)
         j=1

Simultaneously, suppose that m resources (leather, wood, time etc.) are used to
make these n products and that aij units of resource i are used to manufacture
product j. Then clearly our constraints will be:

    ai1 x1 + · · · + ain xn ≤ bi        (9.43)


where bi is the amount of resource i available to the company. Suppose now that the
company decides to sell off some of its resources (instead of manufacturing products).
Suppose we sell each resource at a price wi (i = 1, . . . , m); we'd like to know what a
fair price for these resources would be. Each unit of product j not manufactured
would result in a loss of profit of cj . At the same time, we would obtain a gain (from
selling the excess resources) of:

     m
     Σ  aij wi        (9.44)
    i=1

because we would save aij units of resource i (i = 1, . . . , m) by not manufacturing
1 unit of xj . Selling these resources is only worthwhile if we make more money in
the sale of the resources than we could by manufacturing the product, that is, if:

     m
     Σ  aij wi ≥ cj        (9.45)
    i=1

If a selfish profit maximizer wishes to buy these items, then he will seek a price per
resource that minimizes the total he could pay for all the items, that is:

         m
    min  Σ  wi bi        (9.46)
        i=1

The Strong Duality Theorem asserts that the optimal solution to this problem
produces fair shadow prices: the total price at which an individual could purchase
all of the company's resources is exactly equal to the amount the company could
make by manufacturing the products itself.

■ Example 9.3 Assume that a leather company manufactures two types of belts:
regular and deluxe. Each belt requires 1 square yard of leather. A regular belt
requires 1 hour of skilled labor to produce, while a deluxe belt requires 2 hours of
labor. The leather company receives 40 square yards of leather each week and a
total of 60 hours of skilled labor is available. Each regular belt nets $3 in profit,
while each deluxe belt nets $5 in profit. The company wishes to maximize profit.
We can compute the fair price the company could sell its time or labor (or the
amount the company should be willing to pay to obtain more leather or more
hours).
The problem for the leather manufacturer is to solve the linear programming
problem:

max 3x1 + 5x2


s.t. x1 + 2x2 ≤ 60
x1 + x2 ≤ 40
x1 , x 2 ≥ 0


The dual problem for this problem is given as:

    min  60w1 + 40w2
    s.t. w1 + w2 ≥ 3
         2w1 + w2 ≥ 5
         w1 , w2 ≥ 0

If we solve the primal problem, we obtain the final revised simplex tableau as:

    z   [  2    1 | 160 ]
    x1  [ −1    2 |  20 ]
    x2  [  1   −1 |  20 ]

Note that both x1 and x2 are in the basis. In this case, we have w1 = 2 and w2 = 1
from Row 0 of the tableau.
We can likewise solve the dual problem by converting it to standard form and
then using the simplex algorithm. We would have:

    min  60w1 + 40w2
    s.t. w1 + w2 − v1 = 3
         2w1 + w2 − v2 = 5
         w1 , w2 , v1 , v2 ≥ 0

In this case, it is more difficult to solve the dual problem because there is no
conveniently obvious initial basic feasible solution (that is, the identity matrix is not
embedded inside the coefficient matrix).
The final full simplex tableau for the dual problem would look like:

          z   w1   w2    v1    v2   RHS
    z     1    0    0   −20   −20   160
    w1    0    1    0     1    −1     2
    w2    0    0    1    −2     1     1
We notice two things. First, the reduced costs of v1 and v2 are precisely the negatives
of the values of x1 and x2 . This was to be expected: these variables are duals of each
other, and in a minimization problem the reduced costs have opposite sign. Second,
w1 = 2 and w2 = 1. These are the same values we determined in the primal simplex
tableau.
Lastly, let’s see what happens if we increase the amount of leather available by 1
square yard. If w2 (the dual variable that corresponds to the leather constraint) is
truly a shadow price, then we should predict our profit will increase by 1 unit. Our
new problem will become:
max 3x1 + 5x2
s.t. x1 + 2x2 ≤ 60
x1 + x2 ≤ 41
x1 , x 2 ≥ 0


At optimality, our new revised simplex tableau will be:

    z   [  2    1 | 161 ]
    x1  [ −1    2 |  22 ]
    x2  [  1   −1 |  19 ]

Thus, if our leather manufacturer could obtain leather at a price under $1 per yard,
he would be a fool not to buy it, because he could make an immediate profit. This is
what economists call thinking at the margin.
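The shadow-price claim in Example 9.3 can be verified by re-solving the leather
problem with one extra yard of leather: the optimal profit rises by exactly
w2 = 1. A sketch with SciPy (which minimizes, so the objective is negated; with
the HiGHS backend, `res.ineqlin.marginals` reports the duals of the minimization,
i.e. the negatives of our w):

```python
from scipy.optimize import linprog
import numpy as np

A = np.array([[1.0, 2.0], [1.0, 1.0]])   # labor, leather
c = np.array([3.0, 5.0])

base = linprog(-c, A_ub=A, b_ub=[60.0, 40.0], bounds=[(0, None)] * 2)
more = linprog(-c, A_ub=A, b_ub=[60.0, 41.0], bounds=[(0, None)] * 2)

print(-base.fun, -more.fun)                  # 160.0 161.0: profit rises by w2 = 1
print(-np.asarray(base.ineqlin.marginals))   # the duals w = [2, 1]
```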

Shadow Prices Under Degeneracy


We have asserted that the value of the dual variable does not necessarily provide a
shadow price at a degenerate constraint. To see this, consider the example of the
degenerate toy maker problem.

■ Example 9.4

    max  7x1 + 6x2
    s.t. 3x1 + x2 + s1 = 120
         x1 + 2x2 + s2 = 160
         x1 + s3 = 35
         (7/4) x1 + x2 + s4 = 100
         x1 , x2 , s1 , s2 , s3 , s4 ≥ 0

Recall the optimal full tableau for this problem was:

          z   x1   x2   s1    s2   s3    s4   RHS
    z     1    0    0    0   7/5    0  16/5   544
    x1    0    1    0    0  −2/5    0   4/5    16
    x2    0    0    1    0  7/10    0  −2/5    72
    s2    0    0    0    1   1/2    0    −2     0
    s3    0    0    0    0   2/5    1  −4/5    19

We can compute the dual variables w for this by using cB and B−1 at optimality.
You'll notice that B−1 can always be found in the columns of the slack variables
for this problem, because we began the simplex algorithm with an identity matrix
in that position. We also know that cB = [7 6 0 0] at optimality. Therefore, we can
compute cB B−1 as:

                           [ 0  −2/5   0   4/5 ]
    cB B−1 = [7 6 0 0]  ·  [ 0  7/10   0  −2/5 ]  = [0  7/5  0  16/5]
                           [ 1   1/2   0    −2 ]
                           [ 0   2/5   1  −4/5 ]


In this case, it would seem that modifying the right-hand-side of constraint 1 would
have no effect. This is true if we were to increase the value by an increment of
1. Suppose however we decreased the value of the right-hand-side by 1. Since we
claim that:

    ∂z/∂b1 = w1        (9.47)

there should be no change to the optimal objective function value. However,
our new optimal point would occur at x1 = 15.6 and x2 = 72.2 with an objective
function value of 542.4. Clearly the value of the dual variable for constraint 1 is
not a true representation of the shadow price of resource 1. This is illustrated in
Figure ??, where we can see that modifying the right-hand-side of Constraint 1
transforms the feasible region in a way that substantially changes the optimal
solution. This is simply not detected because degeneracy in the primal problem
leads to alternative optimal solutions in the dual problem.

[Figure: (a) Original Problem, (b) RHS Decreased, (c) RHS Increased]

It should be noted that a true marginal price can be computed; however, this is
outside the scope of these notes. The reader is referred to [BJS04] (Chapter 6) for
details. ■
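The failure of the naive shadow-price reading under degeneracy is easy to
reproduce numerically: at the optimum of Example 9.4, w1 = 0, yet shrinking b1
by one unit changes the optimal value from 544 to 542.4. A sketch:

```python
from scipy.optimize import linprog
import numpy as np

# Example 9.4 in inequality form (1.75 = 7/4)
A = np.array([[3.0, 1.0], [1.0, 2.0], [1.0, 0.0], [1.75, 1.0]])
c = np.array([7.0, 6.0])

z = -linprog(-c, A_ub=A, b_ub=[120.0, 160.0, 35.0, 100.0],
             bounds=[(0, None)] * 2).fun
z_down = -linprog(-c, A_ub=A, b_ub=[119.0, 160.0, 35.0, 100.0],
                  bounds=[(0, None)] * 2).fun

print(z, z_down)   # 544.0 542.4 -- a drop of 1.6, not the 0 that w1 = 0 predicts
```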

9.6 The Dual Simplex Method


Occasionally when given a primal problem, it is easier to solve the dual problem and
extract the information about the primal problem from the dual.


■ Example 9.5 Consider the problem:

min x1 + 2x2
s.t. x1 + 2x2 ≥ 12
2x1 + 3x2 ≥ 20
x1 , x 2 ≥ 0

To transform this problem to standard form, we would have to introduce surplus
variables to obtain:

    min  x1 + 2x2
    s.t. x1 + 2x2 − s1 = 12
         2x1 + 3x2 − s2 = 20
         x1 , x2 , s1 , s2 ≥ 0

In this case there is no immediately obvious initial basic feasible solution and
we would have to solve a Phase I problem. Consider the dual of the original
minimization problem:

    max  12w1 + 20w2
    s.t. w1 + 2w2 ≤ 1
         2w1 + 3w2 ≤ 2
         w1 , w2 ≥ 0

This is a maximization problem whose standard form is given by:

    max  12w1 + 20w2
    s.t. w1 + 2w2 + v1 = 1
         2w1 + 3w2 + v2 = 2
         w1 , w2 , v1 , v2 ≥ 0

In this case, a reasonable initial basic feasible solution for the dual problem is
to set v1 = 1, v2 = 2 and w1 = w2 = 0 (i.e., w1 and w2 are non-basic variables) and
proceed with the simplex algorithm from this point. ■
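Example 9.5 can be checked numerically: the dual of min x1 + 2x2 subject to
x1 + 2x2 ≥ 12 and 2x1 + 3x2 ≥ 20 is max 12w1 + 20w2 subject to w1 + 2w2 ≤ 1
and 2w1 + 3w2 ≤ 2, and both attain the same optimal value. A sketch (≥
constraints are passed to `linprog` with negated signs):

```python
from scipy.optimize import linprog

# Primal: min x1 + 2x2  s.t.  x1 + 2x2 >= 12, 2x1 + 3x2 >= 20
primal = linprog([1.0, 2.0], A_ub=[[-1.0, -2.0], [-2.0, -3.0]],
                 b_ub=[-12.0, -20.0], bounds=[(0, None)] * 2)

# Dual: max 12w1 + 20w2  s.t.  w1 + 2w2 <= 1, 2w1 + 3w2 <= 2
dual = linprog([-12.0, -20.0], A_ub=[[1.0, 2.0], [2.0, 3.0]],
               b_ub=[1.0, 2.0], bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)   # 12.0 12.0
```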

In cases like the one illustrated in Example 9.5, we can solve the dual problem
directly in the simplex tableau of the primal problem instead of forming the dual
problem and solving it as a primal problem in its own tableau. The resulting
algorithm is called the dual simplex algorithm.
For the sake of space, we will present the dual simplex algorithm for a maximization
problem:

    P:  max  cx
        s.t. Ax = b
             x ≥ 0

We will then show how to adjust the dual simplex algorithm for minimization
problems.


Algorithm 9 The Matrix form of the Dual Simplex Algorithm

Dual Simplex Algorithm in Algebraic Form


1. Choose an initial basic solution xB and corresponding basis matrix B so
that wA·j − cj ≥ 0 for all j ∈ J , where J is the set of non-basic variables
and w = cB B−1 .
2. Construct a simplex tableau using this initial solution.
3. If b = B−1 b ≥ 0, then an optimal solution has been achieved; STOP. Other-
wise, the dual problem is feasible (since zj − cj ≥ 0). GOTO STEP 4.
4. Choose a leaving variable (row) xBi = bi so that bi < 0.
5. Choose the index of the entering variable (column) xj (j ∈ J ) using the
following minimum ratio test:
      (zj − cj )/|aji | = min { (zk − ck )/|aki | : k ∈ J , aki < 0 }

6. If no entering variable can be selected (aki ≥ 0 for all k ∈ J ) then the dual
problem is unbounded and the primal problem is infeasible. STOP.
7. Using a standard simplex pivot, pivot on element aji , thus causing xBi to
become 0 (and thus feasible) and causing xj to enter the basis. GOTO STEP
3.

The pivoting step works because we choose the entering variable specifically so
that the reduced costs will remain non-negative. Just as we chose the leaving
variable in the standard simplex algorithm using a minimum ratio test to ensure
that B−1 b remains non-negative, here we use it to ensure that zj − cj remains
non-negative for all j ∈ J , and thus dual feasibility is maintained.
The convergence of the dual simplex algorithm is outside of the scope of this
course. However, it suffices to understand that we are essentially solving the dual
problem in the primal simplex tableau using the simplex algorithm applied to the
dual problem. Therefore under appropriate cycle prevention rules, the dual simplex
does in fact converge to the optimal (primal) solution.
Theorem 9.2 In the absence of degeneracy, or when using an appropriate cycling
prevention rule, the dual simplex algorithm converges and is correct.

■ Example 9.6 Consider the following linear programming problem:

max − x1 − x2
s.t. 2x1 + x2 ≥ 4
x1 + 2x2 ≥ 2
x1 , x 2 ≥ 0


Then the standard form problem is given as:

max − x1 − x2
s.t. 2x1 + x2 − s1 = 4
x1 + 2x2 − s2 = 2
x1 , x2 , s1 , s2 ≥ 0

The coefficient matrix for this problem is:


 
    A = [ 2  1  −1   0 ]
        [ 1  2   0  −1 ]

In standard form, there is no clearly good choice for a starting basic feasible solution.
However, since this is a maximization problem and we know that x1 , x2 ≥ 0, we
know that the objective function −x1 − x2 must be bounded above by 0. A basic
solution that yields this objective function value occurs when s1 and s2 are both
basic and x1 and x2 are both non-basic.
If we let
 
    B = [ −1   0 ]
        [  0  −1 ]

Then we obtain the infeasible solution:


 
    b̄ = B−1 b = [ −4 ]
                [ −2 ]

Likewise we have:
    w = cB B−1 = [ 0  0 ]

since both s1 and s2 do not appear in the objective function. We can compute the
reduced costs in this case to obtain:

z1 − c1 = wA·1 − c1 = 1 ≥ 0
z2 − c2 = wA·2 − c2 = 1 ≥ 0
z3 − c3 = wA·3 − c3 = 0 ≥ 0
z4 − c4 = wA·4 − c4 = 0 ≥ 0

Thus, the fact that w ≥ 0 and the fact that zj − cj ≥ 0 for all j, shows us that we
have a dual feasible solution and based on our use of a basic solution, we know
that complementary slackness is ensured.


We can now set up our initial simplex tableau for the dual simplex algorithm.
This is given by:
 
          z    x1    x2    s1    s2  | RHS
    z     1     1     1     0     0  |   0
    s1    0    −2    −1     1     0  |  −4
    s2    0    −1    −2     0     1  |  −2

We can choose either s1 or s2 as the leaving variable, since both basic variables
have negative values. For the sake of argument, suppose we choose s2 as the
leaving variable.
Then our entering variable is chosen by comparing:
    (z1 − c1 )/|a21 | = 1/|−1| = 1
    (z2 − c2 )/|a22 | = 1/|−2| = 1/2

Clearly, 1/2 < 1 and therefore, x2 is our entering variable.


 

          z     x1    x2    s1     s2  | RHS
    z     1    1/2     0     0    1/2  |  −1
    s1    0   −3/2     0     1   −1/2  |  −3
    x2    0    1/2     1     0   −1/2  |   1

At this point, we see we have maintained dual feasibility, but we still do not have
primal feasibility. We can therefore choose a new leaving variable (s1 ) corresponding
to the negative element in the RHS. The minimum ratio test shows that this time
x1 will enter and the final simplex tableau will be:
 
          z    x1    x2     s1     s2  | RHS
    z     1     0     0    1/3    1/3  |  −2
    x1    0     1     0   −2/3    1/3  |   2
    x2    0     0     1    1/3   −2/3  |   0

It’s clear this is the optimal solution to the problem since we’ve achieved primal
and dual feasibility and complementary slackness. It’s also worth noting that this
optimal solution is degenerate, since there is a zero in the right hand side. ■
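The two pivots of this example can be reproduced mechanically. The sketch below is an illustration (the pivot routine and the tie-breaking rule are my own choices; the tie-break deliberately picks s2 first to match the example's arbitrary choice), storing the tableau with exact fractions:

```python
from fractions import Fraction as F

def pivot(T, r, c):
    """Standard simplex pivot on tableau row r, column c (exact arithmetic)."""
    T[r] = [v / T[r][c] for v in T[r]]
    for i in range(len(T)):
        if i != r and T[i][c] != 0:
            f = T[i][c]
            T[i] = [a - f * b for a, b in zip(T[i], T[r])]

# Columns: z, x1, x2, s1, s2, RHS -- the initial dual-feasible tableau
# for max -x1 - x2 s.t. 2x1 + x2 - s1 = 4, x1 + 2x2 - s2 = 2.
T = [[F(v) for v in row] for row in [
    [1,  1,  1, 0, 0,  0],   # z-row: reduced costs all >= 0 (dual feasible)
    [0, -2, -1, 1, 0, -4],   # s1-row: basic but infeasible (RHS < 0)
    [0, -1, -2, 0, 1, -2],   # s2-row: basic but infeasible (RHS < 0)
]]

basis = [None, 3, 4]             # column index of the basic variable per row
while True:
    # Leaving row: a basic variable with negative RHS (example picks s2 first).
    rows = [i for i in range(1, len(T)) if T[i][-1] < 0]
    if not rows:
        break                    # primal feasibility reached: optimal
    r = rows[-1]
    # Dual ratio test over nonbasic columns with a negative entry in row r.
    cands = [c for c in range(1, 5) if T[r][c] < 0]
    c = min(cands, key=lambda c: T[0][c] / -T[r][c])
    pivot(T, r, c)
    basis[r] = c

assert T[0][-1] == -2                   # optimal objective value z = -2
assert basis == [None, 1, 2]            # x1 and x2 end up basic
assert [T[1][-1], T[2][-1]] == [2, 0]   # x1 = 2, x2 = 0 (degenerate)
```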

Exercise 9.6 Prove that the minimum ratio test given in the dual simplex algorithm
will maintain dual feasibility from one iteration of the simplex tableau to the next.
[Hint: Prove that the reduced costs remain greater than or equal to zero, just as
we proved that b remains positive for the standard simplex algorithm.] ■



10. More LP Notes

Contributed by Laurent Poirrier


Linear programming basis. Let a linear programming problem be given by

        min  cT x
(P)     s.t. Ax = b
             ℓ ≤ x ≤ u
             x ∈ Rn ,

where we assume A ∈ Rm×n to be full row rank (we will see in the section “Starting
basis” how to make sure that this assumption holds). We first introduce the concept
of a basis:
• There are n variables xj for j ∈ {0, . . . , n − 1}.
• A basis of (P) is a partition of {0, . . . , n − 1} into three disjoint index subsets
B, L and U, such that if B is the matrix formed by taking the columns of A
indexed by B, then B is square and invertible.
Thus, we always have |B| = m, and there are at most (n choose m) different bases,
possibly less than that since some of the combinations may yield a singular B
matrix. Given
a specific basis, we establish some notation:
• For all j ∈ B the variable xj is called a basic variable, and the corresponding
jth column of A is called a basic column.
• For all j ∈ L ∪ U the variable xj is called a nonbasic variable, and the corre-
sponding jth column of A is called a nonbasic column.
• By convention, the vector formed by taking together all the basic variables
is denoted xB . Similarly, cB , ℓB and uB are formed by taking together the
same indices of c, ℓ and u, respectively. The same notation is also used for the
indices in L and U, giving cL , cU , ℓL , ℓU , uL , and uU . We already defined B
as taking together the basic columns of A. The remaining (nonbasic) columns
form the submatrices L and U . Thus, there is a permutation of the columns
of A that is given by [B | L | U ]. For conciseness, we will write A = [B | L | U ],
although it is an abuse of notation.


• B, L and U are sets, so the order of the indices does not matter. However, it
must be consistent in the vectors defined above. For example, (i) cB1 must be
the objective function coefficient associated with the variable xB1 , (ii) ℓB1 and
uB1 must be the bounds on that same variable, and (iii) the first column of B
is the column of A that corresponds to that same variable.
The concept of basis is useful because of the following construction:
• We construct a solution x̄ of (P) as follows. Let us fix the components of x̄
in L or U at their lower or upper bound, respectively: x̄L = ℓL and x̄U = uU .
Given that these components are fixed, we can now compute the unique value
of x̄B such that the equality constraints Ax = b of (P) are satisfied. Indeed,
using the abuse of notation described earlier, we have
[B | L | U ] x̄ = b
B x̄B + Lx̄L + U x̄U = b
B x̄B + LℓL + U uU = b
B x̄B = b − LℓL − U uU
x̄B = B −1 (b − LℓL − U uU ) .

• The solution x̄ constructed above is uniquely defined by the partition B, L, U


(i.e., by the basis). We now see why B was required to be an invertible matrix.
• Any solution x that can be constructed as above for some partition B, L, U is
called a basic solution.
• If a basic solution x̄ satisfies ℓ ≤ x̄ ≤ u, then it is called a basic feasible
solution. Indeed, it satisfies all the constraints of (P). Note that the bound
constraints are automatically satisfied for x̄L and x̄U , so it is enough to verify
that ℓB ≤ x̄B ≤ uB .
• The feasible region of (P) is a polyhedron, and it has been shown that x̄ is a
basic feasible solution if and only if it is a vertex of that feasible region. In
other words, basic feasible solutions and vertices are defined differently, but
they are the same thing in the context of linear programming.
• Clearly, vertices are only a subset of all the feasible solutions to (P). However,
in the context of optimization, it is sufficient to look at vertices because of the
following: If (P) has an optimal solution, then at least one optimal solution
of (P) is a vertex.
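To make this construction concrete, here is a small sketch with invented data (a 2 × 4 instance with bounds 0 ≤ x ≤ 10; none of it comes from the text) that computes x̄B = B−1 (b − LℓL − U uU ) for a given partition B, L, U:

```python
from fractions import Fraction as F

# Invented instance: two equality constraints, four variables, 0 <= x <= 10.
A = [[F(2), F(1), F(1), F(0)],
     [F(1), F(3), F(0), F(1)]]
b = [F(8), F(9)]
lo, up = [F(0)] * 4, [F(10)] * 4

def solve2(B, rhs):
    """Solve a 2x2 system B y = rhs by Cramer's rule (B must be invertible)."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return [(rhs[0] * B[1][1] - B[0][1] * rhs[1]) / det,
            (B[0][0] * rhs[1] - rhs[0] * B[1][0]) / det]

def basic_solution(Bset, Lset, Uset):
    """x_B = B^{-1} (b - L l_L - U u_U), with x_L, x_U fixed at their bounds."""
    rhs = list(b)
    for j in Lset:
        rhs = [rhs[i] - A[i][j] * lo[j] for i in range(2)]
    for j in Uset:
        rhs = [rhs[i] - A[i][j] * up[j] for i in range(2)]
    Bmat = [[A[i][j] for j in Bset] for i in range(2)]
    return solve2(Bmat, rhs)

# Basis B = {0, 1}, nonbasics {2, 3} at their lower bound 0.
xB = basic_solution([0, 1], [2, 3], [])
assert xB == [3, 2]   # check: 2*3 + 1*2 = 8 and 1*3 + 3*2 = 9
# Within bounds, so this basic solution is feasible -- a vertex.
assert all(F(0) <= v <= F(10) for v in xB)
```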
Tableau. A tableau is an equivalent reformulation of (P) that is determined by a
given basis. It lets us easily assess the impact of changing the current basis (making
a pivot) on (a) the objective function value, and (b) primal or dual feasibility.
• A tableau is given by
min c̄T x
s.t Āx = b̄
ℓ≤x≤u
x ∈ Rn .

• c̄T := cT − cTB B −1 A are called the reduced costs corresponding to the basis
B, L, U. They have the property that c̄B = 0, so c̄ expresses the direction of the
objective function only in terms of the nonbasic variables.


• b̄ := B −1 b.
• Ā := B −1 A. If we use the partition Ā = [B̄ | L̄ | Ū ], we have that B̄ = B −1 B = I.
As a consequence, the tableau can be written

    min  c̄TL xL + c̄TU xU
    s.t. xB + L̄ xL + Ū xU = b̄
         ℓ ≤ x ≤ u
         x ∈ Rn .

• We already saw above that the (primal) basic solution corresponding to a basis
(and hence to a tableau) is given by x̄B = B −1 (b − LℓL − U uU ). It is feasible
if ℓB ≤ x̄B ≤ uB . In that case, we say that the basis is primal feasible.
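The identity c̄B = 0 is easy to verify numerically. The sketch below uses invented 2 × 4 data (illustrative only, not from the text) and computes c̄T = cT − cTB B −1 A with exact fractions:

```python
from fractions import Fraction as F

# Invented instance: c_bar = c - c_B^T B^{-1} A with basis B = {0, 1}.
A = [[F(2), F(1), F(1), F(0)],
     [F(1), F(3), F(0), F(1)]]
c = [F(1), F(2), F(0), F(0)]
Bset = [0, 1]

# Invert the 2x2 basis matrix via the adjugate formula.
B = [[A[i][j] for j in Bset] for i in range(2)]
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
Binv = [[B[1][1] / det, -B[0][1] / det],
        [-B[1][0] / det, B[0][0] / det]]

# y^T = c_B^T B^{-1}, then c_bar_j = c_j - y^T A_j for every column j.
cB = [c[j] for j in Bset]
y = [cB[0] * Binv[0][k] + cB[1] * Binv[1][k] for k in range(2)]
cbar = [c[j] - (y[0] * A[0][j] + y[1] * A[1][j]) for j in range(4)]

# Reduced costs of the basic variables vanish, as the text states; the
# nonbasic entries express the objective in terms of x2, x3 only.
assert [cbar[j] for j in Bset] == [0, 0]
assert cbar[2] == F(-1, 5) and cbar[3] == F(-3, 5)
```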
Duality.
• The dual of (P) is
        min  −bT π − ℓT λ − uT µ
(D)     s.t. AT π + Iλ + Iµ = c
             π free, λ ≥ 0, µ ≤ 0.

• A basis of (D) is a partition of {1, . . . , 3n}. However, (D) has a special structure
with two identity matrices in the constraints and no bounds on π. This yields
a characterization of the bases of (D) that follows.
• A basis of (D) needs n basic variables, as many as there are equality constraints
in (D). The π variables do not have bounds, so they are always basic. We now
need to select n − m basic variables among the λ and µ.
• For any given j, the variables λj and µj cannot be both basic, because if they
were, the basis matrix would contain twice the same identity column, and thus
would not be invertible (see Figure 10.1). So we have the following: Consider
the n possible indices j ∈ {0, . . . , n − 1}. For n − m of them, either λj or µj is
basic (but not both). For m of them, neither λj nor µj is basic.

 
    [AT | I | I]   (for any j, the jth columns of the two identity blocks are identical)

Figure 10.1: The constraint matrix of (D)

• Let us consider one of the latter values of j, i.e., one of the m values of j such
that neither λj nor µj is basic. Then, the only nonzeros in the jth row of the
basis matrix come from the jth row of AT (Figure 10.2). Now, considering all


   
    [AT | I | I]  →  basis matrix   (the m rows with neither λj nor µj basic give a
                                     square submatrix of AT with only zeros to its right)

Figure 10.2: A basis of (D)

the m such values of j, they give a square submatrix of AT . That submatrix is


also a submatrix of the basis matrix, with only zeros to its right. Since the
basis matrix must be invertible, that square m × m submatrix of AT must be
invertible too.
• We now see that a basis of (D) is uniquely defined by a partition of {0, . . . , n−1}
into three disjoint sets B, L, U such that |B| = m and if B T is the matrix
formed by taking the rows of AT indexed by B, then B T is invertible. For every
j ∈ L, λj is basic, and for every j ∈ U, µj is basic. Observe that the conditions
for B, L, U to form a basis of (D) are exactly the same as the conditions for
B, L, U to form a basis of (P).
• In order to construct a basic solution of (D), we partition A into [B | L | U ].
Knowing that λj = 0 for all j ∈ / L and µj = 0 for all j ∈/ U, we rewrite the
constraints as

    BT π         = cB
    LT π + λL    = cL
    UT π + µU    = cU
    π free, λ ≥ 0, µ ≤ 0.

The π variables are all basic and their values can be computed directly as π̄ T =
cTB B −1 . Then, the basic λ variables have values λ̄TL = cTL − π̄ T L = cTL − cTB B −1 L
and the basic µ variables have values µ̄TU = cTU − π̄ T U = cTU − cTB B −1 U . For the
basic solution (π̄ T , λ̄T , µ̄T ) to be feasible in (D), we need λ̄ ≥ 0 and µ̄ ≤ 0. The
basis is then called dual feasible. Let c̄ be the reduced costs in the corresponding
primal tableau, i.e., c̄T := cT − cTB B −1 A. It is easy to verify that (π̄ T , λ̄T , µ̄T ) is
feasible if and only if c̄j ≥ 0 for all j ∈ L and c̄j ≤ 0 for all j ∈ U. Observe that
these are the optimality condition of the simplex method on the primal (P).
• To derive the reduced costs of (D) for a given basis B, L, U, we need to express
the objective function in terms of λB , λU , µB , µL only (the nonbasic variables).
Let us write a partitioned version of (D) again, this time without discarding


the nonbasic variables:

    min  −bT π − ℓTB λB − ℓTL λL − ℓTU λU − uTB µB − uTL µL − uTU µU
    s.t. BT π + λB + µB = cB
         LT π + λL + µL = cL
         UT π + λU + µU = cU
         π free, λ ≥ 0, µ ≤ 0.

This gives us
π = B −T (cB − λB − µB ) = (B −T cB ) − B −T (λB + µB )
λL = cL − LT π − µL = (cL − LT B −T cB ) + LT B −T (λB + µB ) − µL
µU = cU − U T π − λU = (cU − U T B −T cB ) + U T B −T (λB + µB ) − λU
where the first term of each right-hand side is constant and can be ignored in
an objective function. After rewriting the objective function and simplifying
the result, we get

min (x̄B − ℓB )λB + (uU − ℓU )λU − (uB − x̄B )µB − (uL − ℓL )µL

where x̄B = B −1 (b − LℓL − U uU ) corresponds to the primal basic solution


associated with B, L, U. The optimality conditions of the simplex method in (D)
are that the reduced costs for λ must be nonnegative and the reduced costs
for µ must be nonpositive. Observe that we can always assume (uU − ℓU ) ≥ 0
otherwise the problem is trivially infeasible. The conditions become x̄B ≥ ℓB
and x̄B ≤ uB . Observe that they correspond exactly to the feasibility of the
primal basic solution x̄ associated with B, L, U.
Pivoting. We can now apply the simplex method to (D). We need to start (and
maintain) a basis B, L, U that is dual feasible, so we need B invertible, c̄j ≥ 0 for all
j ∈ L and c̄j ≤ 0 for all j ∈ U.
At the beginning of each iteration, we select a dual nonbasic variable λB or µB
with a negative reduced cost to become a basic variable. If no such variable can be
found, then we have reached optimality, and we can stop. Otherwise, let j ∈ B be
the index of such a dual variable. Then at the next iteration, we will have either
j ∈ L′ (if it was a component of λB with a negative reduced cost) or j ∈ U ′ (if it was
a component of µB with a negative reduced cost), i.e., that variable will be basic.
That dual variable is said to enter the dual basis.
In a primal view, this corresponds to finding a component j ∈ B of the primal
basic solution x̄ that is infeasible. At the next iteration, we will have j ∈ L′ or j ∈ U ′ .
When adopting a primal view, the very same operation is described as the primal
variable xj leaving the primal basis.
The next step is to choose a primal entering variable. We will choose this variable
carefully in order to maintain an invertible B matrix and reduced costs of the
appropriate sign.
Assume that the primal leaving variable xj is currently basic in row i (it corre-
sponds to the basic variable xBi ). Let us consider the objective and ith row of the


current tableau:

                e ∈ L      f ∈ L      g ∈ U      h ∈ U

    min         c̄e xe  +  c̄f xf  +  c̄g xg  +  c̄h xh
    s.t.        ...
           xj + āie xe + āif xf + āig xg + āih xh = b̄i
                ...
           ℓ ≤ x ≤ u
           x ∈ Rn

where āie , āig > 0 and āif , āih < 0. The four indices e, f, g, h represent the four possible
configurations: variable at upper or lower bound, and āik positive or negative. We
only use the notation e, f, g, h for simplicity: there can be zero or more than one
variable in each configuration. All variables in one given configuration are treated
similarly.
Any āik = 0 can be ignored. They do not interfere with the computations below,
and it can be shown that the B ′ matrix of the next iteration will be invertible if and
only if we do not consider the corresponding columns as candidate entering columns.
Since e, f ∈ L and g, h ∈ U, we currently have c̄e , c̄f ≥ 0 and c̄g , c̄h ≤ 0.
• If xj leaves to its lower bound, we will need c̄′j ≥ 0 at the next iteration,
while maintaining zero reduced costs for all other indices in B. Any such new
objective function can be achieved by adding a nonnegative multiple t of the
ith row of the tableau to the current objective function. The multiplier t will
be called the dual step length.
- We know that c̄e will become c̄′e = c̄e + t āie , which is guaranteed to always
meet c̄′e ≥ 0 because āie > 0.
- Instead, since āif < 0, we will have c̄′f = c̄f + t āif ≥ 0 if and only if t ≤
c̄f /(−āif ).
- For c̄′g = c̄g + t āig ≤ 0, we need t ≤ −c̄g /āig .
- Finally, c̄′h = c̄h + t āih ≤ 0 is guaranteed to always be met.
• If xj leaves to its upper bound, we will need c̄′j ≤ 0 at the next iteration,
while maintaining zero reduced costs for all other indices in B. Any such new
objective function can be achieved by subtracting a nonnegative multiple t of
the ith row of the tableau to the current objective function.
- The condition c̄′e = c̄e − t āie ≥ 0 requires t ≤ c̄e /āie .
- The condition c̄′f = c̄f − t āif ≥ 0 is always satisfied.
- The condition c̄′g = c̄g − t āig ≤ 0 is always satisfied.
- The condition c̄′h = c̄h − t āih ≤ 0 requires t ≤ (−c̄h )/(−āih ).
If the signs of the c̄k and āik coefficients are such that no conditions are imposed
on t, it can be shown that (D) is unbounded, which corresponds to (P) being infeasible
(note that, because of the finite bounds ℓ and u, (P) is never unbounded).
Each of the above conditions defines an upper bound tk on t, i.e., t ≤ tk for all
k ∈ L ∪ U. The most restrictive condition can be selected by computing t = mink tk .
If k is a value of k that yields the minimum, we will have c̄′k = 0 and k can be our
entering variable, i.e., we can set B′ = B \ {j} ∪ {k}. Finding k is called the ratio test.
Figure 10.3 summarizes how to compute tk depending on the signs of āik and c̄k .


    leaving j ∈ B     entering k ∈ L ∪ U    āik      tk
    ---------------------------------------------------------------
    j ∈ L′            k ∈ L                 > 0      (no bound)
    (x̄j < ℓj )                              < 0      c̄k /(−āik )
                      k ∈ U                 > 0      (−c̄k )/āik
                                            < 0      (no bound)
    ---------------------------------------------------------------
    j ∈ U′            k ∈ L                 > 0      c̄k /āik
    (x̄j > uj )                              < 0      (no bound)
                      k ∈ U                 > 0      (no bound)
                                            < 0      (−c̄k )/(−āik )

Figure 10.3: Computing the upper bounds tk on the dual step length t in the ratio
test.
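The case analysis of Figure 10.3 can be expressed as a small helper. The sketch below is an illustration (the function name and boolean encoding of the cases are my own), returning None where the table imposes no bound:

```python
def t_k(leaving_to_lower, k_in_L, a_ik, c_k):
    """Upper bound t_k on the dual step length (per Figure 10.3); None = no bound.

    leaving_to_lower: True if x_j leaves to its lower bound (j in L'),
    k_in_L: True if the candidate entering index k is currently in L.
    """
    if a_ik == 0:
        return None                      # zero entries impose no condition
    if leaving_to_lower:                 # objective gets +t * (row i)
        if k_in_L and a_ik < 0:
            return c_k / -a_ik           # keep c_k' = c_k + t*a_ik >= 0
        if not k_in_L and a_ik > 0:
            return -c_k / a_ik           # keep c_k' = c_k + t*a_ik <= 0
    else:                                # objective gets -t * (row i)
        if k_in_L and a_ik > 0:
            return c_k / a_ik            # keep c_k' = c_k - t*a_ik >= 0
        if not k_in_L and a_ik < 0:
            return -c_k / -a_ik          # keep c_k' = c_k - t*a_ik <= 0
    return None

# The four binding configurations (the e, f, g, h cases of the derivation):
assert t_k(True,  True,  -2.0,  1.0) == 0.5    # f in L, a_if < 0
assert t_k(True,  False,  4.0, -1.0) == 0.25   # g in U, a_ig > 0
assert t_k(False, True,   2.0,  1.0) == 0.5    # e in L, a_ie > 0
assert t_k(False, False, -4.0, -1.0) == 0.25   # h in U, a_ih < 0
# Configurations that never bind impose no condition:
assert t_k(True, True, 3.0, 1.0) is None
```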

Starting basis. Before we can apply the dual simplex method, we need to have a
dual feasible basis. First, this means that we need a set of column indices B such
that B is invertible. A simple way to obtain that is to add m artificial variables z
fixed to zero, as demonstrated in (P+):

        min  cT x + 0T z
(P+)    s.t. Ax + Iz = b
             ℓ ≤ x ≤ u
             0 ≤ z ≤ 0
             x ∈ Rn , z ∈ Rm

We can do that as a very first step before starting the dual simplex method. Then, it
is easier to let n := n + m, cT := [cT 0T ], ℓT := [ℓT 0T ], uT := [uT 0T ] and A := [A I],
so that you can forget about the z variables and have a problem of the form (P),
but with the guarantee that the last m columns of A form an identity matrix (which
is invertible: I −1 = I). Note that having an m × m identity in A also ensures that A
is full row rank.
Once we have B, it is straightforward to construct L and U such that B, L, U is
dual feasible. Having B is enough to compute the reduced costs c̄T = cT − cTB B −1 A.
For all j ∈/ B, we can assign j to L if c̄j ≥ 0 or to U if c̄j ≤ 0. This way, c̄j will always
have the appropriate sign to ensure dual feasibility.
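The construction above can be sketched in a few lines; the 2 × 3 data is invented for illustration, and the sign-based split of the nonbasics mirrors the rule just described:

```python
from fractions import Fraction as F

# Invented original data (m = 2 constraints, n = 3 variables).
A = [[F(1), F(2), F(1)],
     [F(3), F(1), F(2)]]
c = [F(2), F(-1), F(1)]
m, n = 2, 3

# Append m artificial columns fixed to zero by their bounds: A := [A | I].
for i in range(m):
    for k in range(m):
        A[i].append(F(1) if i == k else F(0))
c += [F(0)] * m
n += m

# Starting basis: the artificial columns (an identity block, so B is invertible).
Bset = [3, 4]
# With B = I and c_B = 0, the reduced costs are simply c itself.
cbar = list(c)

# Partition the nonbasics by the sign of their reduced cost (c_bar = 0 may go
# to either set); this guarantees dual feasibility of the starting basis.
Lset = [j for j in range(n) if j not in Bset and cbar[j] >= 0]
Uset = [j for j in range(n) if j not in Bset and cbar[j] < 0]

assert cbar == [F(2), F(-1), F(1), F(0), F(0)]
assert Lset == [0, 2] and Uset == [1]
```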
Summary. We can now give, in Figure 10.4, a precise description of the operations
in the dual simplex method with bounds. We can also make a few observation that
will prove useful in implementing the dual simplex method.
At Step 1, in most cases, there will be multiple candidate values of i such that
x̄Bi violates its bounds. Choosing one to become the leaving variable is called a
pricing rule. In theory, any candidate would work, but in practice it is a good idea


to choose a candidate with a large bound violation, for example one with the largest
violation.
There are a few useful invariants in the dual simplex method that we can use to
verify that our implementation is working as intended. First, we have the matrix
B formed with the columns of A with indices in B. This matrix must always
stay invertible. If B becomes singular, then the ratio test is not working properly.
Specifically, we are choosing an entering variable k such that the tableau element āik
is zero. Second, there is dual feasibility. We must always have c̄j ≥ 0 for all j ∈ L
and c̄j ≤ 0 for all j ∈ U. If we lose dual feasibility, it also means that the ratio test
is not working. In this case, we chose a wrong value for t: not actually mink {tk },
but something larger.
Finally, recall that at any given iteration of the simplex method, we can compute
the corresponding basic solution by letting x̄B = B −1 (b − LℓL − U uU ), x̄L = ℓL and
x̄U = uU . In the dual simplex method, x̄ will not be feasible (until the last iteration,
at which point we stop). However, we can still compute the corresponding dual
objective function value: z̄ = cT x̄. As the dual simplex method makes progress, this
objective should be nondecreasing: from one iteration to the next, it either stays the
same (when t = 0), or increases. If z̄ decreases, it means that we made a mistake in
the choice of the leaving variable.

Exercise 1
Consider the problem
max 2x1 + 3x2
s.t. 2x1 + x2 ≤ 10
x1 + 2x2 ≤ 10
x1 + x2 ≤ 6
x1 , x 2 ≥ 0

a)
Write down a matrix A and vectors b and c so that the problem can be written on
the form
max cT x
s.t. Ax ≤ b
x≥0

b)
Sketch the feasible region of the problem.

c)
Solve the problem using the Simplex method.

d)
We consider the same constraints as in a), but change the objective function to
2x1 + 2x2 . Apply the simplex method again to find all optimal solutions to this


modified problem.

Exercise 2
Find any optimal solution to the problem

max x1 + x2 + x3
s.t. 2x1 − 2x2 + x3 ≤ 4
3x1 − x2 + 2x3 ≤ 2
x1 , x 2 , x 3 ≥ 0

Exercise 3
In the field of compressive sensing one attempts to recover an unknown vector from
an underdetermined set of (linear) measurements, i.e., find an unknown x ∈ RN that
satisfies Ax = p, where
• p ∈ Rm is the vector of measurements, and
• A is the m × N matrix which collects those measurements.
In practical applications m is much smaller than N , and we can’t expect to recover
x in general. But if we have some additional information about x, it turns out that
the knowledge of the measurements in p may still be enough to recover x. The
additional information we will consider is sparsity (a vector is called sparse if it has
mostly components that are zero): For many “magic" matrices A, if it is known that
x is sparse, x can be recovered as the optimal solution to the problem

    min  ∥x∥1
    s.t. Ax = p,                (10.1)

where ∥x∥1 = |x1 | + |x2 | + . . . + |xN |. Note that the variable x here is unconstrained:
It is not required to be non-negative. In the following we will test if this procedure
works for a very small vector and matrix.

a)
Show that x is an optimal solution to (10.1) if and only if it is an optimal solution
to
    max  Σi (−xi+ − xi− )

    s.t. [  A  −A ] [ x+ ]    [  p ]
         [ −A   A ] [ x− ] ≤  [ −p ]          (10.2)

         x+ , x− ≥ 0

This is a linear programming problem in standard form.


Hint: Write xi = xi+ − xi− , where xi+ , xi− ≥ 0.
Let us test this procedure on the sparse vector x = (0, 0, −1), and the matrix

    A = [ 1  0  −1 ]
        [ 0  1  −1 ]

to see if x is recovered from the vector of measurements, which
here can be computed to be p = (1, 1). Of course, it does not sound like rocket


science to recover a vector with 3 components from two measurements. But the
magic is that solving the problem (10.1) also can help recover sparse vectors x when
N is very large and m is very small compared to N !

b)
Solve (10.2), with A and p as given above, using the simplex method. Is the correct
x recovered? Is the optimum unique?
To get started, you can use that the primal dictionary is

ζ = −x1 −x2 −x3 −x4 −x5 −x6


w1 = 1 −x1 +x3 +x4 −x6
w2 = 1 −x2 +x3 +x5 −x6
w3 = −1 +x1 −x3 −x4 +x6
w4 = −1 +x2 −x3 −x5 +x6

where we wrote x+ = (x1 , x2 , x3 ), x− = (x4 , x5 , x6 ), and denoted the slack variables


by wi .
Hint: The starting dictionary above is not primal feasible, but dual feasible. So
write down the dual dictionary or apply the dual simplex method.

A couple of remarks should be made.


• For this particular exercise, simplex is not the easiest way to solve (10.1): That
Ax = p means simply that x1 − x3 = x2 − x3 = 1, so that x1 = x2 = x3 + 1, with
x3 arbitrary. The problem thus boils down to minimizing |x1 | + |x2 | + |x3 | =
2|x3 + 1| + |x3 |, which is easily solved by hand.
• For larger A and p, we depend on an implementation of simplex, since such
problems are too tedious to solve by hand.
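The first remark's reduction can be checked numerically: since Ax = p forces x1 = x2 = x3 + 1, the 1-norm becomes a piecewise-linear function of x3 alone, whose minimum must occur at a breakpoint. A quick sketch (illustrative, not part of the exercise solution method):

```python
# Ax = p forces x1 = x2 = x3 + 1, so the 1-norm reduces to one variable:
#   ||x||_1 = 2*|x3 + 1| + |x3|,
# a piecewise-linear function; its minimum lies at a breakpoint of |.|.
def norm1(x3):
    return 2 * abs(x3 + 1) + abs(x3)

breakpoints = [-1.0, 0.0]
best = min(breakpoints, key=norm1)
assert best == -1.0 and norm1(best) == 1.0

# Substituting back recovers exactly the sparse vector we started from.
x = (best + 1, best + 1, best)
assert x == (0.0, 0.0, -1.0)
```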


Initialization
Add m variables to the problem, fixed to zero by their bounds.
From now on, only consider the enlarged problem:
n := n + m, cT := [cT 0T ], ℓT := [ℓT 0T ], uT := [uT 0T ] and A := [A I],
where 0T is a row vector of size m with all components set to zero.
Build the starting basis:
Set B := {n − m, . . . , n − 1} (the indices of the m artificial columns).
Form the corresponding basis matrix B.
Compute c̄T = cT − cTB B −1 A.
For all j ∈ {0, . . . , n − 1},
if c̄j > 0, set j ∈ L,
if c̄j < 0, set j ∈ U,
if c̄j = 0, we can arbitrarily select either j ∈ L or j ∈ U.
Step 1 (leaving variable)
Form the basis matrix B (from the columns of A indexed by B).
Compute c̄T = cT − cTB B −1 A.
Compute x̄B = B −1 (b − LℓL − U uU ).
Find a component i of xB such that either x̄Bi < ℓBi or x̄Bi > uBi .
If no such i exists, we reached optimality. Stop.
Let j be such that xj corresponds to xBi .
Step 2 (entering variable)
Compute the ith row of B −1 A.
Perform the ratio test: compute k = arg mink∈L∪U {tk }, where tk is defined as in Figure 10.3.
If there is no bound tk , the problem is infeasible. Stop.
Step 3 (pivoting)
Leaving variable:
B := B \ {j}
If x̄Bi < ℓBi , then L := L ∪ {j}.
If x̄Bi > uBi , then U := U ∪ {j}.
Entering variable:
If k ∈ L, then L := L \ {k}.
If k ∈ U, then U := U \ {k}.
B := B ∪ {k}
Go to Step 1.

Figure 10.4: Summary of the dual simplex method with bounds.
