Introduction
10-725 Optimization
Geoff Gordon
Ryan Tibshirani
Administrivia
• http://www.cs.cmu.edu/~ggordon/10725-F12/
• http://groups.google.com/group/10725-f12
Geoff Gordon—10-725 Optimization—Fall 2012 2
Administrivia
• Prerequisites: no formal ones, but class will be
fast-paced
• Algorithms: basic data structures & complexity
• Programming: we assume you can do it
• Linear algebra: matrices are your friends
• ML/stats: source of motivating examples
• Most important: formal thinking
Geoff Gordon—10-725 Optimization—Fall 2012 3
Administrivia
• Coursework: 5 HWs, scribing, midterm, project
• Project: use optimization to do something cool!
‣ groups of 2–3 (no singletons please)
‣ proposal, milestone, final poster session, final paper
• Final poster session: Tue or Wed, Dec 11 or 12,
starting at about 3PM in NSH atrium, lasting 3
hrs
Geoff Gordon—10-725 Optimization—Fall 2012 4
Administrivia
• Scribing
‣ multiple scribes per lecture (coordinate one
writeup); required to do once during term
‣ sign up now to avoid timing problems
• Late days: you have 5 to use wisely
‣ in lieu of any special exceptions for illness, travel,
holidays, etc.—your responsibility to allocate
‣ some deadlines will be non-extendable
Geoff Gordon—10-725 Optimization—Fall 2012 5
Administrivia
• Working together
‣ great to have study groups
‣ always write up your own solutions, closed notes
‣ disclose collaborations on front page of HW
Geoff Gordon—10-725 Optimization—Fall 2012 6
Administrivia
• Office hours
• Recitations: none this week
• Audit forms: please audit r.t. just sitting in
‣ except: postdocs & faculty welcome to sit in
• Waitlist: there shouldn’t be one
• Videos
Geoff Gordon—10-725 Optimization—Fall 2012 7
Most important
• Work hard, have fun!
Geoff Gordon—10-725 Optimization—Fall 2012 8
Optimization example
• Simple economy: m agents, n goods
‣ each agent: production pi ∈ Rn, consumption ci ∈ Rn
• Cost of producing p for agent i:
• Utility of consuming c for agent i: si(p)
2
di(c) 3
2
1.5
1 1
0.5
3
3 2 3
3 1 2
2 1
2 0 0
1 1
0 0
Geoff Gordon—10-725 Optimization—Fall 2012 9
Walrasian equilibrium
� � �
max i [di (ci ) − si (pi )] s.t. i pi = i ci
• Idea: put price λ on good j; agents optimize
j
production/consumption independently
di(c) – λTc
‣ high price → produce ↑, consume ↓ 0.2
0
ï0.2
‣ low price → produce ↓, consume ↑ ï0.4
ï0.6
ï0.8
3
‣ “just right” prices → constraint satisfied 2 3
2
1 1
0 0
Geoff Gordon—10-725 Optimization—Fall 2012 10
Algorithm: tâtonnement
� � �
max i [di (ci ) − si (pi )] s.t. i pi = i ci
λ ← [0 0 0 …]T di(c) – λTc
for k = 1, 2, … 0.2
0
ï0.2
‣ each agent solves for pi ï0.4
ï0.6
ï0.8
and ci at prices λ 3
2 3
2
‣ λ ← λ + tk(c – p) 1 1
0 0
Geoff Gordon—10-725 Optimization—Fall 2012 11
Results for a random market
produce/consume prices
0.3
4
second good
2 0.25
0 0.2
0 2 4 0.2 0.25
first good
Geoff Gordon—10-725 Optimization—Fall 2012 12
Why is tâtonnement cool?
• Algorithm is nearly obvious, given setup
‣ Leon Walras (1874), based on ideas of Antoine
Augustin Cournot (1838)
• But analysis (Arrow and Debreu, 1950s) is
subtle: needs concepts from later in this course
‣ duality, dual decomposition, convergence rates of
gradient descent
• Variants need even more subtlety
Geoff Gordon—10-725 Optimization—Fall 2012 13
“Typical” problem
• Minimize s.t.
• e.g.: f() and g () all linear:
i
• e.g.: f() and g () all convex:
i
• e.g.: f() linear, g () is –min(eig(reshape(x, k, k))):
1
Geoff Gordon—10-725 Optimization—Fall 2012 14
Ubiquitous (and pretty cool)
‣ LPs at least as old as Fourier
‣ first practical algorithm: simplex (Dantzig, 1947)
‣ for a long time, best runtime bounds were
exponential, but practical runtime observed good
‣ many thought LPs were NP-hard
‣ Kachiyan (1979), Karmarkar (1984): LP in P
‣ Spielman & Teng (2002): simplex solves “most” LPs
in poly time
‣ LPs are P-complete: “hardest” poly-time problem
Geoff Gordon—10-725 Optimization—Fall 2012 15
Optimization for ML & stats
• Lots of ML & stats based on optimization
‣
• Exceptions?
‣
• Advantages
‣
Geoff Gordon—10-725 Optimization—Fall 2012 16
Choices
performance (runtime, solution quality, …)
• Set up problem
usually many choices, widely different
• Transformations: duality, relaxations,
approximations
• Algorithms:
‣ first order, interior point, ellipsoid, cutting plane
‣ smooth v. nonsmooth v. some combination
‣ eigensystems
‣ message passing / relaxation
Geoff Gordon—10-725 Optimization—Fall 2012 17
Consequences
• First order (gradient descent, FISTA, Nesterov’s
method) v. higher order (Newton, log barrier,
ellipsoid, affine scaling)
‣ # iters poly in 1/ϵ vs. in log(1/ϵ)
‣ cost of each iteration: O(n) or less, vs. O(n3) or so
• Balanced (#constrs ! #vars) or not?
‣ e.g., ellipsoid handles #constrs = !
Geoff Gordon—10-725 Optimization—Fall 2012 18
Consequences
• Sparsity? Locality? Other special structure?
‣ in solution, in active constraints, in matrices
describing objective or constraints
• E.g., Ax = b: how fast can we compute Ax?
• E.g., simplex vs. log barrier
Geoff Gordon—10-725 Optimization—Fall 2012 19
Consequences
• What degree of “niceness”?
‣ differentiable, strongly convex, self-concordant,
submodular
• Can we split f(x) = g(x) + h(x)?
• Is f(x) “close to” a smooth fn?
• Care more about practical implementation or
analysis?
Geoff Gordon—10-725 Optimization—Fall 2012 20
Some more examples
• Image segmentation • Equilibria in games
• Perceptron, SVM (CE, EFCE, polymatrix)
• MPE in graphical model • Maximum entropy
• Linear regression • Network flow
• Lasso (group, graphical, …) • TSP
• Parsing, grammar learning • Experimental design
• Sensor placement in a • Compressed sensing
sensor network • …
Geoff Gordon—10-725 Optimization—Fall 2012 21
Example: playing poker
• http://www.cs.cmu.edu/~ggordon/poker/
• Problem: compute a minimax equilibrium
• Even this simple game has 2 strategies/player
26
• We reduce to an LP with ~100 variables
• Similar methods have been used for
competition-level 2-player limit Texas Hold’em
‣ abstract the game by clustering information sets
‣ buy a really big workstation, run for days
Geoff Gordon—10-725 Optimization—Fall 2012 22
Dynamic walking
[Schkolnik, Levashov, Manchester,Tedrake, 2010]
http://groups.csail.mit.edu/locomotion/movies/LittleDog_MIT_dynamic_short.f4v
Geoff Gordon—10-725 Optimization—Fall 2012 23