2019 Algorithmic Game Theory Lecture Notes
Palash Dey
Indian Institute of Technology, Kharagpur
palash.dey@cse.iitkgp.ac.in
Copyright © 2019 Palash Dey.
This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/).
Free distribution is strongly encouraged; commercial distribution is expressly forbidden.
Statutory warning: This is a draft version and may contain errors. If you find any error, please send an email to the author.
Contents

Part I: Game Theory

3 Matrix Games
  3.1 Security: the Maxmin Concept
  3.2 Minimax Theorem
  3.3 Application of Matrix Games: Yao’s Lemma
    3.3.1 View of Randomized Algorithm as Distribution over Deterministic Algorithms
    3.3.2 Yao’s Lemma

4 Computing Equilibrium
  4.1 Computing MSNEs of Bimatrix Games by Enumerating Support
  4.2 Important Classes of Succinct Games
  4.3 Potential Games
  4.4 Approximate PSNE
  4.5 Local Search Problem
  4.6 Complexity Class: Polynomial Local Search (PLS)
  4.7 PLS-Completeness
  4.8 PSNE Computation in Congestion Games
  4.9 Complexity Class: Functional NP
  4.10 Complexity Class TFNP: NP Search Problems with Guaranteed Witness
  4.11 Complexity Class PPAD: A Syntactic Subclass of TFNP
  4.12 A Canonical PPAD-complete Problem: Sperner’s Lemma
  4.13 Algorithms for ε-MSNE
    4.13.1 A Simple Polynomial Time Algorithm for 1/2-Approximate MSNE

6 Price of Anarchy
  6.1 Braess’s Paradox
  6.2 Pigou’s Network
  6.3 PoA for Selfish Networks
  6.4 Selfish Load Balancing
  6.5 Selfish Load Balancing Game
  6.6 Price of Anarchy of Selfish Load Balancing

Part II: Mechanism Design

9 Gibbard-Satterthwaite Theorem
  9.1 Statement and Proof of the Gibbard-Satterthwaite Theorem
  9.2 Way-outs from the GS Impossibility

10.3 Groves Mechanism
10.4 Clarke (Pivotal) Mechanism
10.5 Examples of VCG Mechanisms
10.6 Weighted VCG
10.7 Single Parameter Domain
10.8 Implementability in Intermediate Domains
10.9 A Glimpse of Algorithmic Mechanism Design: Knapsack Allocation
Notation: N = {0, 1, 2, . . .} denotes the set of natural numbers and R denotes the set of real numbers. For a set X, its power set is denoted by 2^X.
Part I
Game Theory
Chapter 1
Game theory is a field of mathematics which studies “games.” Intuitively speaking, a game is any “system” where there are multiple parties (called the players of the game), the “outcome” depends on the actions that the individual parties perform, and different parties derive different utilities from the outcome. Let us see some examples of games to get a better feel for the subject. Our first example is arguably the most exciting one from the students’ perspective.
Example 1.0.1 (Grading Game). Consider the “Algorithmic Game Theory” class at IIT Kharagpur in 2019. Suppose the instructor announces that the grading policy will be as follows – the top 10% of students get the EX grade, the next 20% get A, and so on. Can you see the game that this grading policy induces? The players are the students in the class. The outcome of the system is the function which maps students to the grades that they get. Observe that the outcome of the system depends on the actions of all the players.
Example 1.0.2 (Prisoner’s Dilemma). Consider two persons who have committed a crime and have subsequently been caught by the police. However, the police do not have enough evidence to make a strong case against them. The police question them separately about the truth. Each prisoner has the following options – either tell the truth that they have committed the crime (let us call this action C) or lie that they have not committed the crime (let us call this action NC). The prisoners also know the following – if both prisoners play C, then the police have a strong case and both prisoners get 5-year jail terms; if both prisoners play NC, then the police do not have a strong case and both prisoners get 2-year jail terms; if one prisoner plays C and the other plays NC, then the police again have a strong case, and the prisoner who played C gets a 1-year jail term (less punishment for cooperating with the police) whereas the other prisoner gets a 10-year jail term.
Our next example is called congestion games; these model congestion on road networks, Internet traffic, etc.
Example 1.0.3 (Congestion Games). Suppose many people in a city want to go from some point A in the city to some other point B, and each person decides the path that they wish to take. The speed with which people can commute on any road is inversely proportional to the number of people using that road. We can again observe that this situation induces a game, with the commuters being its players and the time that every commuter takes to reach the destination determining the outcome.
The above way of modeling situations in terms of games is so fundamental that we can keep on giving
more and more examples of games from almost every field that we can think about. We now formally define
games. The most common form of games is the normal form game, aka the strategic form game.

Definition 1.1. A normal form game Γ is defined as a tuple ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩ where

• N is the set of players,
• S_i is the set of strategies of player i ∈ N, and
• u_i : ×_{j∈N} S_j −→ R is the utility function of player i ∈ N.
Example 1.1.1 (Prisoners’ Dilemma in Normal Form). The prisoners’ dilemma game in the normal
form can be written as follows.
• Payoff matrix:

                          Player 2
                        C            NC
      Player 1   C   (−5, −5)     (−1, −10)
                 NC  (−10, −1)    (−2, −2)
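To see the dilemma concretely, one can verify by brute force that C strictly dominates NC for both players, so (C, C) is the predicted outcome even though (NC, NC) is better for both. A minimal sketch in Python (the payoff entries are the ones from the matrix above; the function name is ours):

```python
# Payoffs for the prisoner's dilemma: u[(s1, s2)] = (u1, u2).
u = {
    ("C", "C"): (-5, -5), ("C", "NC"): (-1, -10),
    ("NC", "C"): (-10, -1), ("NC", "NC"): (-2, -2),
}

def strictly_dominates(player, s, t):
    """Does strategy s give `player` strictly higher payoff than t
    against every strategy of the opponent?"""
    others = ["C", "NC"]
    if player == 1:
        return all(u[(s, o)][0] > u[(t, o)][0] for o in others)
    return all(u[(o, s)][1] > u[(o, t)][1] for o in others)

print(strictly_dominates(1, "C", "NC"))  # True: C dominates NC for player 1
print(strictly_dominates(2, "C", "NC"))  # True: C dominates NC for player 2
```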
Okay, suppose we have modeled some scenario as a game. Then what? What are the questions we would like to answer? One of the most important questions is prediction – can we predict the “outcome,” which is equivalent to asking who will play what strategy? This is indeed the case for many applications like predicting share prices in a stock market, predicting the amount of revenue that a Government is likely to raise by auctioning, say, 5G spectrum, predicting the congestion of a road network on Independence day, predicting the amount of profit that a company is likely to make, and so on. A little thought towards answering these kinds of prediction-related questions convinces us that one cannot hope to make any “reliable” prediction without making some assumptions on how players behave.
1.2.1 Utility
We assume that each player has a utility function, aka payoff function, which maps the set of outcomes (the set of tuples, called strategy profiles, specifying which player has played what strategy) to the set R of real numbers, denoting the amount of happiness that the player derives from an outcome. Observe that the utility of any player depends not only on the strategy that he/she plays but also on the strategies that the other players play.
1.2.3 Intelligence
We assume that the players have infinite computational power. For example, we assume that the players are competent enough to perform any level of reasoning and can argue like any game theorist to compute the best strategies for themselves under any circumstances.
1.3 Examples of Normal Form Games
We now see some examples of games in their normal form. Our first example is a kind of coordination game where two players, say a husband (player 1) and a wife (player 2), need to independently decide between two options, say going shopping (option A) and going to a cricket stadium to watch a match (option B). If the players play different strategies (that is, they are not together), then both of them are unhappy and both derive 0 utility. If they play the same strategy, then both are happy and derive positive utility. However, if they both end up playing A, then the utility of player 2 is more than that of player 1; if both play B, then the utility of player 1 is more than that of player 2.
Example 1.3.1 (Battle of Sexes). The game battle of sexes can be written in the normal form as follows.
• Payoff matrix:

                          Player 2
                        A          B
      Player 1   A   (1, 2)     (0, 0)
                 B   (0, 0)     (2, 1)
Our next example is another coordination game. Suppose two companies (player 1 and player 2) want
to decide on the product (between product A and product B) for which they will manufacture parts. Assume
that the parts that the companies produce are complementary. So, if the players end up playing different
strategies, then they both lose and derive 0 utility. On the other hand, if the players play the same strategy,
then they both gain positive utility. However, the exact amount of utility that the players derive may not be
the same for both the products (maybe because of some externalities like the existence of other companies
in the market).
Example 1.3.2 (Coordination Games). The coordination game in the normal form can be written as
follows.
• Payoff matrix:

                          Player 2
                        A            B
      Player 1   A   (10, 10)     (0, 0)
                 B   (0, 0)       (1, 1)
We next look at another kind of game which is called strictly competitive games or zero-sum games. Zero-sum games always involve two players. In this kind of game, the utility of the row player in any strategy profile is the negative of the utility of the column player in that strategy profile. Below we see two zero-sum games, namely matching pennies and rock-paper-scissors, in normal form.
Example 1.3.3 (Matching Pennies). The game matching pennies can be written in the normal form as
follows.
• Payoff matrix:

                          Player 2
                        A            B
      Player 1   A   (1, −1)     (−1, 1)
                 B   (−1, 1)     (1, −1)
Example 1.3.4 (Rock-Paper-Scissors). The game Rock-Paper-Scissors can be written in the normal form
as follows.
• Payoff matrix:

                               Player 2
                        Rock       Paper      Scissors
      Player 1  Rock     (0, 0)    (−1, 1)    (1, −1)
                Paper    (1, −1)   (0, 0)     (−1, 1)
                Scissors (−1, 1)   (1, −1)    (0, 0)
Next, let us look at the tragedy of the commons game, which involves more than two players. We have n players for some natural number n. Each of them has two strategies – to build a factory (strategy 1) or to not build a factory (strategy 0). If someone builds a factory, then he/she derives 1 unit of utility from that factory. However, factories harm society (say, because they pollute the environment). Each factory built causes 5 units of harm, but that harm is shared equally by all players.
Example 1.3.5 (Tragedy of the Commons). The tragedy of the commons game in the normal form can
be written as follows.
• The set of players (N): {1, 2, . . . , n} (we denote this set by [n])
• The set of strategies of every player i ∈ [n]: S_i = {0, 1}
• Utility:

  u_i(s_1, . . . , s_i, . . . , s_n) = s_i − 5(s_1 + · · · + s_n)/n
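A quick computation with the utility function above illustrates the tragedy: for n > 5, building raises a player’s own utility by 1 − 5/n > 0 regardless of what the others do, yet the all-build profile gives everyone utility 1 − 5 = −4, strictly worse than the no-build profile. A small illustrative sketch:

```python
def utility(i, s):
    """Utility of player i in profile s (a tuple of 0/1 strategies)."""
    n = len(s)
    return s[i] - 5 * sum(s) / n

n = 10
nobody = (0,) * n
everybody = (1,) * n
# Building is strictly dominant when n > 5: switching s_i from 0 to 1
# changes u_i by 1 - 5/n > 0, no matter what the others play.
print(utility(0, everybody))             # -4.0: everyone builds
print(utility(0, nobody))                #  0.0: nobody builds
print(utility(0, (1,) + (0,) * (n - 1)))  #  0.5: only player 0 builds
```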
Our next example is auctions. An auction (for buying) is any game where we have a set of n sellers, each having a valuation of the item. Each seller submits a bid (bids thus form the strategy sets) for selling the item. Then an allocation function is applied to the profile of bids to decide who eventually sells the item to the buyer, and a payment function is applied to decide how much the seller gets paid. Hence one can think of any number of auctions by simply choosing different allocation and payment functions. We now look at the normal form representation of the first price auction and the second price auction, which are by far the most widely used auctions around.
Example 1.3.6 (First Price and Second Price Auction). The first and second price auctions in the
normal form can be written as follows.
• The set of players (N): {1, 2, . . . , n} (this set corresponds to the sellers of the item)
• The set of strategies of every player i ∈ N: the set of possible bids s_i ≥ 0
• Allocation function: a : ×_{i∈N} S_i −→ R^N defined as a((s_i)_{i∈N}) = (a_i)_{i∈N}, where a_i = 1 (that is, player i wins the auction) if s_i ≤ s_j for every j ∈ N and s_i < s_j for every j ∈ {1, 2, . . . , i − 1} (ties are broken in favor of the smallest index); otherwise a_i = 0.
• Payment function: the winner i is paid p_i, where p_i = s_i in the first price auction and p_i = min_{j∈N\{i}} s_j (the second lowest bid) in the second price auction.
• Utility function:

  u_i(s_1, . . . , s_i, . . . , s_n) = a_i(p_i − v_i)
Chapter 2
We now look at various game-theoretic solution concepts – theories for predicting the outcomes of games. Our first solution concept is the dominant strategy equilibrium.

Definition 2.1. Given a game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, a strategy s*_i ∈ S_i for some player i ∈ N is called a strongly dominant strategy for the player i if the following holds.

u_i(s*_i, s_{-i}) > u_i(s'_i, s_{-i}) for every s'_i ∈ S_i \ {s*_i} and every s_{-i} ∈ S_{-i}

We call a strategy profile (s*_i)_{i∈N} a strongly dominant strategy equilibrium of the game Γ if s*_i is a strongly dominant strategy for the player i for every i ∈ N.

(Notation: we often use the shorthand (s_i, s_{-i}) to denote the strategy profile (s_1, . . . , s_{i−1}, s_i, s_{i+1}, . . . , s_n), and S_{-i} to denote ×_{j∈N\{i}} S_j.)
One can easily verify that a strongly dominant strategy equilibrium, if it exists, is unique in any game. The requirements for a strongly dominant strategy and a strongly dominant strategy equilibrium can be relaxed a little to define what are called a weakly dominant strategy and a weakly dominant strategy equilibrium.
Weakly Dominant Strategy and Weakly Dominant Strategy Equilibrium (WDSE or DSE)
Definition 2.2. Given a game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, a strategy s*_i ∈ S_i for some player i ∈ N is called a weakly dominant strategy for the player i if the following holds.

u_i(s*_i, s_{-i}) ≥ u_i(s'_i, s_{-i}) for every s'_i ∈ S_i and every s_{-i} ∈ S_{-i}, and
for every s'_i ∈ S_i \ {s*_i}, there exists s_{-i} ∈ S_{-i} such that u_i(s*_i, s_{-i}) > u_i(s'_i, s_{-i})

We call a strategy profile (s*_i)_{i∈N} a weakly dominant strategy equilibrium of the game Γ if s*_i is a weakly dominant strategy for the player i for every i ∈ N.
Very Weakly Dominant Strategy and Very Weakly Dominant Strategy Equilibrium
Definition 2.3. Given a game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, a strategy s*_i ∈ S_i for some player i ∈ N is called a very weakly dominant strategy for the player i if the following holds.

u_i(s*_i, s_{-i}) ≥ u_i(s'_i, s_{-i}) for every s'_i ∈ S_i and every s_{-i} ∈ S_{-i}

We call a strategy profile (s*_i)_{i∈N} a very weakly dominant strategy equilibrium of the game Γ if s*_i is a very weakly dominant strategy for the player i for every i ∈ N.
We now prove the first theorem in our course – bidding valuations form a weakly dominant strategy
equilibrium for the second price buying auction. In a buying auction, we have one buyer and n sellers.
Theorem 2.1. Bidding valuations form a weakly dominant strategy equilibrium for the second price
buying auction.
Proof of Theorem 2.1. To prove that bidding valuations form a weakly dominant strategy equilibrium, we need to show that, for every player i ∈ N, the strategy s*_i = v_i is a weakly dominant strategy for the player i. Let s_{-i} ∈ S_{-i} be any strategy profile of the players other than the player i, and let θ_i = min_{j∈N\{i}} s_j be the minimum bid of any player other than i. We consider the following two cases.

Case I: Player i wins the auction. We have u_i(s*_i, s_{-i}) = θ_i − v_i, which is at least 0 since winning means s*_i = v_i ≤ θ_i. Let s'_i ∈ S_i be any other strategy for the player i. If the player i continues to win the auction by playing s'_i, then we have u_i(s'_i, s_{-i}) = θ_i − v_i = u_i(s*_i, s_{-i}). On the other hand, if the player i loses the auction by playing s'_i, then we have u_i(s'_i, s_{-i}) = 0 ≤ θ_i − v_i = u_i(s*_i, s_{-i}).

Case II: Player i does not win the auction. We have u_i(s*_i, s_{-i}) = 0. If the player i continues to lose the auction by playing some other strategy s'_i, then we have u_i(s'_i, s_{-i}) = 0 = u_i(s*_i, s_{-i}). Suppose now that the player i wins the auction by playing the strategy s'_i. If there exists a player j ∈ {1, 2, . . . , i − 1} such that s_j = θ_i = s*_i, then for player i to win the auction we must have s'_i < s*_i, and thus u_i(s'_i, s_{-i}) = θ_i − v_i = 0 = u_i(s*_i, s_{-i}). If for every player j ∈ {1, 2, . . . , i − 1} we have s_j > s*_i, then, since player i loses the auction by bidding s*_i, we must have θ_i < s*_i. Now for player i to win the auction by playing the strategy s'_i, we must have s'_i ≤ θ_i. Thus we have u_i(s'_i, s_{-i}) = θ_i − v_i < s*_i − v_i = 0 = u_i(s*_i, s_{-i}).
Finally, let s_i ∈ S_i \ {s*_i} be any other strategy. If s_i > v_i, then in the profile where every other player bids (s_i + v_i)/2, bidding s_i gives player i a utility of 0 (since player i loses) whereas bidding v_i gives player i a utility of (s_i − v_i)/2 > 0 (player i wins and gets paid (s_i + v_i)/2). Similarly, if s_i < v_i, then in the profile where every other player bids (s_i + v_i)/2, bidding s_i gives player i a utility of (s_i − v_i)/2 (player i wins and receives (s_i + v_i)/2), which is negative, whereas bidding v_i gives player i a utility of 0. Hence s*_i = v_i is a weakly dominant strategy for the player i, and thus (v_i)_{i∈N} forms a weakly dominant strategy equilibrium.
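The weak-dominance argument above can also be checked empirically. The following sketch simulates the second price buying auction (lowest bid wins, ties broken in favor of the smallest index, and the winner is paid the lowest bid among the others) and verifies on random profiles that bidding v_i never does worse than an arbitrary alternative bid; the helper names are ours:

```python
import random

def second_price_utility(i, bids, v_i):
    """Utility of seller i: paid min of others' bids if i wins, else 0."""
    if any(bids[j] < bids[i] for j in range(len(bids))) or \
       any(bids[j] <= bids[i] for j in range(i)):
        return 0.0                      # player i loses the auction
    theta = min(b for j, b in enumerate(bids) if j != i)
    return theta - v_i                  # winner sells at the second lowest bid

random.seed(0)
for _ in range(10000):
    n, i = 4, 0
    v = random.uniform(0, 10)           # player i's valuation
    others = [random.uniform(0, 10) for _ in range(n - 1)]
    truthful = second_price_utility(i, [v] + others, v)
    deviation = second_price_utility(i, [random.uniform(0, 10)] + others, v)
    assert truthful >= deviation        # truthful bidding weakly dominates
print("no profitable deviation found")
```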
It is easy to verify that bidding valuations do not form an SDSE. Can you prove that bidding valuations form the unique weakly dominant strategy equilibrium of the second price auction?
Definition 2.4. Given a game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, we call a strategy profile (s*_i)_{i∈N} a pure strategy Nash equilibrium (PSNE) of the game Γ if the following holds.

For every i ∈ N, u_i(s*_i, s*_{-i}) ≥ u_i(s'_i, s*_{-i}) for every s'_i ∈ S_i
It follows immediately from the definition of PSNE that every WDSE is also a PSNE. Now, coming back to our example games, one can easily verify that the profiles (A, A) and (B, B) both form pure strategy Nash equilibria for the battle of sexes and the coordination games. However, neither the matching pennies game nor the rock-paper-scissors game has any pure strategy Nash equilibrium. To relax the requirement of pure strategy Nash equilibrium, we now allow the players to play according to some probability distribution over their strategies.¹ We call probability distributions over strategy sets mixed strategies. We denote the set of mixed strategies (probability distributions) over a strategy set S by ∆(S). Since we now allow players to play according to some mixed strategy, we need to extend the utility functions to take mixed strategies as input. Given a mixed strategy profile (σ_i)_{i∈N}, the utility of player i ∈ N from this profile is defined as follows. Let N = {1, 2, . . . , n}.

¹ Why did we not relax the requirements of dominant strategy equilibrium by allowing mixed strategies, as we do here to define mixed strategy Nash equilibrium?
u_i((σ_j)_{j∈N}) = Σ_{s_1∈S_1} Σ_{s_2∈S_2} · · · Σ_{s_n∈S_n} ( Π_{j=1}^{n} σ_j(s_j) ) · u_i((s_j)_{j∈N})
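For a game stored explicitly, this expected utility is a straightforward weighted sum over all pure strategy profiles. A minimal sketch (assuming utilities are stored in a dictionary keyed by pure profiles; the function name is ours):

```python
from itertools import product

def expected_utility(i, sigma, u):
    """E[u_i] under the mixed profile sigma.
    sigma: list of dicts, sigma[j][s] = probability that player j plays s.
    u: dict mapping pure profiles (tuples) to tuples of utilities."""
    total = 0.0
    for profile in product(*(list(sj) for sj in sigma)):
        prob = 1.0
        for j, s in enumerate(profile):
            prob *= sigma[j][s]
        total += prob * u[profile][i]
    return total

# Matching pennies, both players mixing (1/2, 1/2):
u = {("A", "A"): (1, -1), ("A", "B"): (-1, 1),
     ("B", "A"): (-1, 1), ("B", "B"): (1, -1)}
sigma = [{"A": 0.5, "B": 0.5}, {"A": 0.5, "B": 0.5}]
print(expected_utility(0, sigma, u))  # 0.0
```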
Definition 2.5. Given a game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, we call a mixed strategy profile (σ*_i)_{i∈N} ∈ ×_{i∈N} ∆(S_i) a mixed strategy Nash equilibrium (MSNE) of the game Γ if the following holds.

For every i ∈ N, u_i(σ*_i, σ*_{-i}) ≥ u_i(σ'_i, σ*_{-i}) for every σ'_i ∈ ∆(S_i)
It is not immediately clear how to check whether a given mixed strategy profile (σ_i)_{i∈N} forms an MSNE of a given game. The following characterization of MSNE is computationally appealing; its proof basically follows from the averaging principle (which says that the average of a set of numbers is at most their maximum).

Theorem 2.2. Given a game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, a mixed strategy profile (σ*_i)_{i∈N} ∈ ×_{i∈N} ∆(S_i) forms an MSNE of the game Γ if and only if we have the following for every player i ∈ N.

For every strategy s_i ∈ S_i, σ*_i(s_i) ≠ 0 ⟹ u_i(s_i, σ*_{-i}) ≥ u_i(s'_i, σ*_{-i}) for every strategy s'_i ∈ S_i
Proof of Theorem 2.2. (If part.) Suppose the mixed strategy profile (σ*_i)_{i∈N} satisfies the above condition. We need to prove that (σ*_i)_{i∈N} forms an MSNE for the game. Let i ∈ N be any player and σ_i ∈ ∆(S_i) be any mixed strategy of player i. Then we have the following.

u_i(σ_i, σ*_{-i}) = Σ_{s_i∈S_i} σ_i(s_i) Σ_{s_{-i}∈S_{-i}} ( Π_{j∈N\{i}} σ*_j(s_j) ) u_i(s_i, s_{-i})
                = Σ_{s_i∈S_i} σ_i(s_i) u_i(s_i, σ*_{-i})
                ≤ Σ_{s_i∈S_i} σ*_i(s_i) u_i(s_i, σ*_{-i})
                = u_i(σ*_i, σ*_{-i})

The third line follows from the fact that σ*_i places probability mass only on those strategies s_i ∈ S_i which maximize the utility of the player i given that the others are playing according to σ*_{-i}.
(Only if part.) Suppose the mixed strategy profile (σ*_i)_{i∈N} forms an MSNE for the game. For the sake of arriving at a contradiction, let us assume that there exist a player i ∈ N and a strategy s_i ∈ S_i for which the condition does not hold. That is, we have σ*_i(s_i) ≠ 0 and there exists a strategy s'_i ∈ S_i such that u_i(s'_i, σ*_{-i}) > u_i(s_i, σ*_{-i}). Without loss of generality, let us assume that we have u_i(s'_i, σ*_{-i}) ≥ u_i(s''_i, σ*_{-i}) for every s''_i ∈ S_i. Then we have the following (we use s̃_i for the summation index to avoid confusion with s_i).

u_i(σ*_i, σ*_{-i}) = Σ_{s̃_i∈S_i} σ*_i(s̃_i) Σ_{s_{-i}∈S_{-i}} ( Π_{j∈N\{i}} σ*_j(s_j) ) u_i(s̃_i, s_{-i})
                 = Σ_{s̃_i∈S_i} σ*_i(s̃_i) u_i(s̃_i, σ*_{-i})
                 < Σ_{s̃_i∈S_i} σ*_i(s̃_i) u_i(s'_i, σ*_{-i}) = u_i(s'_i, σ*_{-i})

The third line follows from the assumption that u_i(s'_i, σ*_{-i}) ≥ u_i(s''_i, σ*_{-i}) for every s''_i ∈ S_i, u_i(s'_i, σ*_{-i}) > u_i(s_i, σ*_{-i}), and σ*_i(s_i) ≠ 0. Hence we have u_i(σ*_i, σ*_{-i}) < u_i(s'_i, σ*_{-i}), which contradicts our assumption that (σ*_i)_{i∈N} forms an MSNE for the game.
It immediately follows from Theorem 2.2 that, if a player mixes two or more strategies (i.e., puts non-zero probability on two or more strategies) in an MSNE, then the expected utility of that player from playing any of those pure strategies is the same (of course, assuming the others are playing according to the MSNE mixed strategy profile), and thus the player is “indifferent” among these pure strategies.

Using Theorem 2.2, we can check that {(1/2, 1/2), (1/2, 1/2)} forms an MSNE for the matching pennies game. Can you prove that {(1/2, 1/2), (1/2, 1/2)} is the only MSNE for the matching pennies game? We can also verify that {(1/3, 1/3, 1/3), (1/3, 1/3, 1/3)} forms an MSNE for the rock-paper-scissors game. Can there exist games which do not have any MSNE? Yes! However, the celebrated Nash Theorem shows that every finite game² has at least one MSNE.
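Theorem 2.2 also yields a finite test for MSNE: within each player’s support, every pure strategy must achieve the maximum expected utility against the others’ mixtures. A minimal sketch of this test for two-player games (function names are ours):

```python
from itertools import product

def eu(i, sigma, u):
    """Expected utility of player i under the mixed profile sigma."""
    return sum(sigma[0][s1] * sigma[1][s2] * u[(s1, s2)][i]
               for s1, s2 in product(sigma[0], sigma[1]))

def is_msne(sigma, u, tol=1e-9):
    """Support characterization (Theorem 2.2) for a two-player game."""
    for i in (0, 1):
        # expected utility of each pure strategy against the opponent's mixture
        pure = {}
        for s in sigma[i]:
            point = {t: float(t == s) for t in sigma[i]}
            prof = [point, sigma[1]] if i == 0 else [sigma[0], point]
            pure[s] = eu(i, prof, u)
        best = max(pure.values())
        if any(sigma[i][s] > tol and pure[s] < best - tol for s in sigma[i]):
            return False
    return True

u = {("A", "A"): (1, -1), ("A", "B"): (-1, 1),
     ("B", "A"): (-1, 1), ("B", "B"): (1, -1)}
print(is_msne([{"A": .5, "B": .5}, {"A": .5, "B": .5}], u))    # True
print(is_msne([{"A": 1.0, "B": 0.0}, {"A": .5, "B": .5}], u))  # False
```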
Figure 2.1: Pictorial depiction of relationship among various game theoretic solution concepts.
² A game is called finite if it involves a finite number of players and the strategy set of every player is a finite set.
Nash Theorem

Theorem (Nash). Every finite normal form game has at least one MSNE.
We note that the condition that the game be finite is necessary in the Nash Theorem since there obviously exist infinite games having no MSNE (can you give an example?). We skip the proof of the Nash theorem since the flavor of the proof does not match the taste of the course. We refer interested readers to [WHJ+, Section 5.3]. However, the existence of MSNEs for an important class of games, called two-player zero-sum games, was known long before the Nash Theorem. Two-player zero-sum games are our next topic of study.
Chapter 3
Matrix Games
One of the many important criticisms of Nash equilibrium is that it does not take into account the notion of the “security of the players.” Let us understand the notion of security with an example.
Example 3.1.1 (Game with unique but insecure Nash equilibrium). Let us consider the following game.
• Payoff matrix:

                          Player 2
                        A              B
      Player 1   A   (2, 2)        (2.5, 1)
                 B   (−100, 2)     (3, 3)
The above line of reasoning indicates another line of “rational thinking” – play the strategy which guarantees the maximum utility without assuming anything about how the other player plays. This worst-case utility in the above sense is called the “security level” of that player, which we now define formally.
Security of Player in Pure Strategies – Maxmin Value in Pure Strategies
Definition 3.1. Given a game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩ in normal form, we define the “security level in pure strategies” or “value” v_i of player i in pure strategies to be the maximum utility that player i can guarantee himself/herself by playing a pure strategy:ᵃ

v_i = max_{s_i∈S_i} min_{s_{-i}∈S_{-i}} u_i(s_i, s_{-i})

A strategy s*_i ∈ S_i that guarantees the player i his/her security level is called a maxmin pure strategy for the player i.

ᵃ For games where strategy sets may be infinite, we need to replace max and min by sup and inf respectively, since max and min may not exist there.
For example, we can see that the security level of both players in the matching pennies and rock-paper-scissors games is −1 in pure strategies. We can naturally extend Definition 3.1 to mixed strategies as follows.
Definition 3.2. Given a game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩ in normal form, we define the “security level in mixed strategies” or “value” v_i of player i in mixed strategies to be the maximum utility that player i can guarantee himself/herself by playing a mixed strategy:

v_i = sup_{σ_i∈∆(S_i)} inf_{σ_{-i}∈×_{j≠i}∆(S_j)} u_i(σ_i, σ_{-i})

A mixed strategy σ*_i ∈ ∆(S_i) that guarantees the player i his/her security level is called a maxmin mixed strategy for the player i.

Again by the averaging principle, it follows that the value v_i of player i in mixed strategies can also be written as follows.

v_i = sup_{σ_i∈∆(S_i)} min_{s_{-i}∈S_{-i}} u_i(σ_i, s_{-i})
Using the above characterization, we can easily compute that the value of both players in the matching pennies and rock-paper-scissors games is 0, which is the same as the utility that each player receives in the unique Nash equilibrium of those games. Actually, the following more general result holds; its proof follows immediately from the definitions of Nash equilibrium and the value of a player. We leave its proof to the reader.

Theorem 3.1. Given a strategic form game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, if (σ*_i)_{i∈N} ∈ ×_{i∈N} ∆(S_i) is an MSNE for Γ, then we have u_i((σ*_j)_{j∈N}) ≥ v_i for every player i ∈ N, where v_i is the value of the player i in mixed strategies.
It turns out that the inequality in Theorem 3.1 holds with equality for two-player zero-sum games; we will prove this in Corollary 3.3. A game is called a zero-sum game or strictly competitive game if the sum of the utilities of all the players is 0 in every strategy profile. Hence, it follows that in a zero-sum game, if we have only two players, then the game can be represented by a matrix (assuming the game is finite) – the matrix corresponds to the utility function of player 1, and the utility function of player 2 is simply the negative of that matrix. This is why two-player zero-sum games are also called “matrix games.” It is assumed that the matrix in a matrix game corresponds to the utility function of the row player. Hence, for a matrix game Γ given by an m × n matrix A, the maxmin value of the row player, denoted by v̲, is given by the following.

v̲ = max_{σ∈∆([m])} min_{j∈[n]} Σ_{i=1}^{m} σ(i) A_{i,j}
Because of the above expression, the value of the row player in a zero-sum game is also called the maxmin value of the game in mixed strategies. Similarly, the value of the column player in mixed strategies, denoted by v̄, is given by the following.

v̄ = min_{σ∈∆([n])} max_{i∈[m]} Σ_{j=1}^{n} σ(j) A_{i,j}
Because of the above expression, the value of the column player in a zero-sum game is also called the minmax value of the game in mixed strategies. The beauty of zero-sum games is that the maxmin value in mixed strategies coincides with the minmax value in mixed strategies. Hence it follows from Theorem 3.1 that, if a Nash equilibrium exists (and we will prove that it always exists for matrix games), then the utility that each player receives is his/her corresponding security level. Moreover, the corresponding maxmin and minmax strategies form an MSNE for the matrix game. The utility that the row player receives in any MSNE of a matrix game is thus called the value of that matrix game. We prove these facts in the next section. From now on, if not mentioned otherwise, we will assume all values to be in mixed strategies.
Lemma 3.1. For every m × n matrix A, we have

max_{σ∈∆([m])} min_{j∈[n]} Σ_{i=1}^{m} σ(i) A_{i,j} ≤ min_{σ∈∆([n])} max_{i∈[m]} Σ_{j=1}^{n} σ(j) A_{i,j}
Proof of Lemma 3.1. Let σ* ∈ ∆([m]) be the mixed strategy which maximizes min_{j∈[n]} Σ_{i=1}^{m} σ(i) A_{i,j}. Then we have the following.

max_{σ∈∆([m])} min_{j∈[n]} Σ_{i=1}^{m} σ(i) A_{i,j} = min_{j∈[n]} Σ_{i=1}^{m} σ*(i) A_{i,j}
    = min_{σ∈∆([n])} Σ_{j=1}^{n} Σ_{i=1}^{m} σ*(i) σ(j) A_{i,j}
    = min_{σ∈∆([n])} Σ_{i=1}^{m} Σ_{j=1}^{n} σ*(i) σ(j) A_{i,j}
    = min_{σ∈∆([n])} Σ_{i=1}^{m} σ*(i) Σ_{j=1}^{n} σ(j) A_{i,j}
    ≤ min_{σ∈∆([n])} max_{i∈[m]} Σ_{j=1}^{n} σ(j) A_{i,j}

The second equality holds by the averaging principle (minimizing over distributions over [n] is the same as minimizing over [n]), and the last inequality holds since a convex combination of numbers is at most their maximum.
We now present the most important result about matrix games, which is known as the minmax theorem.

Minmax Theorem

Theorem 3.2. For every matrix game given by an m × n matrix A, there exist a mixed strategy x* = (x*_1, . . . , x*_m) ∈ ∆([m]) for the row player and a mixed strategy y* = (y*_1, . . . , y*_n) ∈ ∆([n]) for the column player such that the following holds.

min_{y∈∆([n])} x*Ay = max_{x∈∆([m])} xAy*
We now prepare the background to prove Theorem 3.2. For that, let us analyze the situation for the row player. Recall that the row player wants to play a mixed strategy, denoted by a vector x = (x_1, . . . , x_m) say, which guarantees the highest utility in the worst case. Then the row player is faced with the optimization problem of maximizing min_{j∈[n]} Σ_{i=1}^{m} A_{i,j} x_i over x ∈ ∆([m]), which can be written as the following linear program.

maximize t
subject to:  t ≤ Σ_{i=1}^{m} A_{i,j} x_i   ∀ j ∈ [n]
             Σ_{i=1}^{m} x_i = 1
             x_i ≥ 0   ∀ i ∈ [m]
Similarly, the column player faces the problem of minimizing max_{i∈[m]} Σ_{j=1}^{n} A_{i,j} y_j over y ∈ ∆([n]), that is, the following linear program.

minimize w
subject to:  w ≥ Σ_{j=1}^{n} A_{i,j} y_j   ∀ i ∈ [m]
             Σ_{j=1}^{n} y_j = 1
             y_j ≥ 0   ∀ j ∈ [n]
The proof of Theorem 3.2 follows easily from what is called the strong duality theorem for linear pro-
grams. We refer to Luca Trevisan’s blog post on LP duality for a brief but beautiful introduction to linear
programming duality [Tre].
Theorem 3.3 (Strong duality). Let LP1 and LP2 be two linear programs which are duals of each other. If either LP1 or LP2 is feasible and bounded, then so is the other, and OPT(LP1) = OPT(LP2).
Proof of Theorem 3.2. Let us call the linear programs of the row and column players LP1 and LP2 respectively; they are duals of each other. The optimal values of both linear programs are upper and lower bounded respectively by the maximum and the minimum entry of the matrix A; in particular, both are feasible and bounded. Hence, by the strong duality theorem, we have OPT(LP1) = OPT(LP2). Let x* = (x*_1, . . . , x*_m) and y* = (y*_1, . . . , y*_n) be optimal solutions of LP1 and LP2 respectively. Then we have the following, where e_j denotes the j-th standard basis vector.

min_{j∈[n]} x*Ae_j = min_{j∈[n]} Σ_{i=1}^{m} A_{i,j} x*_i = OPT(LP1) = OPT(LP2) = max_{i∈[m]} Σ_{j=1}^{n} A_{i,j} y*_j = max_{i∈[m]} e_iAy*

Now the result follows from the averaging principle, which gives min_{y∈∆([n])} x*Ay = min_{j∈[n]} x*Ae_j and max_{x∈∆([m])} xAy* = max_{i∈[m]} e_iAy*.
Corollary 3.1. In every matrix game, there exists an MSNE where both the players play a mixed strategy which guarantees each of them their value in mixed strategies.

Proof of Corollary 3.1. Let x* = (x*_1, . . . , x*_m) and y* = (y*_1, . . . , y*_n) be as defined in Theorem 3.2. From the proof of Theorem 3.2, we have the following.

min_{y∈∆([n])} x*Ay = max_{x∈∆([m])} xAy*

Since min_{y∈∆([n])} x*Ay ≤ x*Ay* ≤ max_{x∈∆([m])} xAy*, all three quantities are equal. In particular, x*Ay* = max_{x∈∆([m])} xAy*, so x* is a utility-maximizing strategy for the row player given that the column player is playing y*; and x*Ay* = min_{y∈∆([n])} x*Ay, so y* is a utility-maximizing strategy for the column player (whose utility is −xAy) given that the row player is playing x*. Hence (x*, y*) forms an MSNE for the matrix game. Since linear programs can be solved in polynomial time, it follows that x* and y* (and thus an MSNE of the matrix game) can be computed in polynomial time.
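Since the row player’s problem is a linear program, a few lines of code suffice to compute the value and a maxmin strategy of any matrix game. A minimal sketch using scipy.optimize.linprog (which minimizes, so we maximize t by minimizing −t; the function name is ours):

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(A):
    """Value and a maxmin strategy of the row player for payoff matrix A.
    Variables: (x_1, ..., x_m, t); maximize t s.t. A^T x >= t, x in Delta([m])."""
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # minimize -t, i.e., maximize t
    # constraint rows: t - sum_i A[i, j] x_i <= 0 for every column j
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)  # sum_i x_i = 1
    b_eq = [1.0]
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Matching pennies: value 0, maxmin strategy (1/2, 1/2).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, v = solve_matrix_game(A)
print(x, v)
```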
We now show that the maxmin value in mixed strategies coincides with the minmax value in mixed strategies in matrix games. Again the proof follows almost immediately from Theorem 3.2.

Corollary 3.2. For every m × n matrix A, we have max_{x∈∆([m])} min_{y∈∆([n])} xAy = min_{y∈∆([n])} max_{x∈∆([m])} xAy.

Proof of Corollary 3.2. Due to Lemma 3.1, it is enough to prove max_{x∈∆([m])} min_{y∈∆([n])} xAy ≥ min_{y∈∆([n])} max_{x∈∆([m])} xAy. Let x* and y* be the strategies defined in the proof of Theorem 3.2 (that is, the mixed strategies of the row and column players which guarantee their corresponding security levels in mixed strategies). We have the following.

max_{x∈∆([m])} min_{y∈∆([n])} xAy ≥ min_{y∈∆([n])} x*Ay = max_{x∈∆([m])} xAy* ≥ min_{y∈∆([n])} max_{x∈∆([m])} xAy
We next deduce another important result from Theorem 3.2, which shows that in any MSNE of a matrix game, the payoffs of both the row and the column players are exactly their security levels.

Corollary 3.3. Let A be an m × n matrix. Then, in the corresponding matrix game, the payoffs of both the row and the column players at any MSNE are their corresponding security levels.
Proof of Corollary 3.3. Let (x*, y*) ∈ ∆([m]) × ∆([n]) be an MSNE of the matrix game corresponding to A. Let v_r and v_c be the security levels of the row and column players respectively. Then, by the definition of security levels, we have the following.

v_r = max_{x∈∆([m])} min_{y∈∆([n])} xAy,    v_c = max_{y∈∆([n])} min_{x∈∆([m])} (−xAy) = −min_{y∈∆([n])} max_{x∈∆([m])} xAy

Since (x*, y*) is an MSNE for the matrix game, we have x*Ay* ≥ v_r and −x*Ay* ≥ v_c, since otherwise a player could instead play the strategy that guarantees his/her security level; indeed, if there existed a mixed strategy x ∈ ∆([m]) (y ∈ ∆([n]) respectively) giving the row player (column player respectively) more payoff against y* (x* respectively), this would contradict our assumption that (x*, y*) is an MSNE for the matrix game. By Corollary 3.2, we have v_c = −v_r, and hence v_r ≤ x*Ay* ≤ −v_c = v_r. Therefore the payoff of the row player at (x*, y*) is x*Ay* = v_r and the payoff of the column player is −x*Ay* = v_c, their corresponding security levels.
3.3 Application of Matrix Games: Yao’s Lemma

We know that any comparison-based deterministic sorting algorithm must make Ω(n log n) pairwise comparisons to sort n elements. The typical proof of this statement views any comparison-based deterministic sorting algorithm as a decision tree — the internal nodes of the decision tree represent comparisons and the leaf nodes correspond to total orders of the n elements. Since there must be at least n! leaf nodes, it follows that the height of the decision tree, and thus the number of pairwise comparisons made by the algorithm in the worst case, is Ω(log n!) = Ω(n log n). Can we beat this lower bound by designing randomized algorithms for sorting? An algorithm for a problem (sorting, for example) is called a randomized algorithm if it adaptively tosses a fair coin some number of times and, for every input, outputs correctly with probability at least 2/3. We will prove, using Yao’s lemma, that any randomized algorithm for sorting also makes Ω(n log n) comparisons in expectation. In general, Yao’s lemma provides a generic technique to prove lower bounds for randomized algorithms.
3.3.1 View of Randomized Algorithm as Distribution over Deterministic Algorithms

Before presenting Yao’s lemma, let us see another perspective on randomized algorithms. Let A be a randomized algorithm that makes s coin tosses. Observe that if the outcomes of these s random coin tosses are fixed to some string r ∈ {0, 1}^s, then the randomized algorithm A becomes a deterministic algorithm – call it A(r). Hence, A is nothing but a distribution over the deterministic algorithms {A(r) : r ∈ {0, 1}^s}.
3.3.2 Yao’s Lemma
Yao’s Lemma

Lemma 3.2. Let A be a randomized algorithm for some problem Π. For an input x, let T(A, x) be the random variable denoting the cost (e.g., running time, number of comparisons made) of A on x. Let X be the set of all inputs (of length n) to Π, X be a random variable denoting the input chosen from X according to some distribution p on X, and A_Π be the set of all deterministic algorithms for the problem Π. Then we have the following.

max_{x∈X} E[T(A, x)] ≥ min_{a∈A_Π} E[T(a, X)]

Proof. Consider the following matrix game B — the rows are indexed by X, the columns are indexed by A_Π, and the entry of the matrix indexed by (x, a) is the cost of the algorithm a on the input x. The distribution p is a mixed strategy of the row player, and the randomized algorithm A is a mixed strategy of the column player. Then, by the averaging principle underlying Lemma 3.1, we have the following.

max_{x∈X} E[T(A, x)] ≥ E[T(A, X)] = E_A[E_X[T(·, X)]] ≥ min_{a∈A_Π} E[T(a, X)]
Theorem 3.4. Any randomized algorithm for sorting n items makes Ω(n log n) comparison queries in expectation.

Proof. Let A be any randomized algorithm for sorting, A the set of all deterministic comparison-based sorting algorithms, X the set of all possible inputs, and T(A, x) the random variable denoting the number of comparisons that A makes on x. We need to show that max_{x∈X} E[T(A, x)] = Ω(n log n). From Yao’s Lemma, it is enough to show that min_{a∈A} E[T(a, X)] = Ω(n log n) for some random variable X on X. Let us define X to be uniformly distributed over X. We now show that for this choice of X, we have E[T(a, X)] = Ω(n log n) for every deterministic sorting algorithm a ∈ A; i.e., the average number of comparison queries that any deterministic sorting algorithm makes is Ω(n log n). Again viewing deterministic sorting algorithms as decision trees, we need to prove that the average height (the sum of the depths of all the leaf nodes divided by the number of leaf nodes) of any binary tree T with t (= n!) leaf nodes is Ω(log t) (= Ω(n log n)).

If the tree T is balanced (that is, if the depths of any two leaf nodes differ by at most 1), then the result follows immediately. Otherwise, we take two leaf nodes u and v such that the depth of u is at least 2 more than the depth of v. We remove u and make it a child of v – this operation reduces the average depth of the tree. Hence, we have proved that among all binary trees with t leaves, there exists a balanced binary tree whose average height is minimum among all binary trees on t leaf nodes. Hence, the average height of the decision tree of the algorithm a is Ω(log t) = Ω(n log n).
Chapter 4
Computing Equilibrium
The Nash theorem guarantees the existence of a mixed strategy Nash equilibrium in every finite normal form game. However, given a finite game in its normal form, can we compute an MSNE of the game? Before we can answer this, there is another subtle but important issue – what is the guarantee that a Nash equilibrium can be written down in finite space (for example, what if the probability of playing a certain strategy for some player turns out to be an irrational number)? If the input game has exactly 2 players, then we will see that there always exists an MSNE in which all the probabilities involved are rational numbers. However, there exist 3-player normal form games where every MSNE involves irrational probabilities. To tackle this fundamental issue, we change our goal to finding an ε-MSNE – a strategy profile is an ε-MSNE if unilateral deviation can increase the utility of any player by at most an additive ε. Given a normal form game, we denote the computational problem of finding an ε-MSNE by ε-NASH.
4.1 Computing MSNEs of Bimatrix Games by Enumerating Support

Consider a bimatrix game given by two m × n matrices A = (a_{i,j}) and B = (b_{i,j}), corresponding to the utility functions of the row and the column player respectively. By Theorem 2.2, it suffices to guess the supports of the two players’ mixed strategies and check the resulting indifference conditions. That is, for every pair of supports I ⊆ [m] and J ⊆ [n], we check the feasibility of the following program in the variables x_1, . . . , x_m, y_1, . . . , y_n, u, and v.

Σ_{j=1}^{n} a_{i,j} y_j = u  ∀ i ∈ I,    Σ_{j=1}^{n} a_{i,j} y_j ≤ u  ∀ i ∈ [m] \ I
Σ_{i=1}^{m} b_{i,j} x_i = v  ∀ j ∈ J,    Σ_{i=1}^{m} b_{i,j} x_i ≤ v  ∀ j ∈ [n] \ J
Σ_{i∈I} x_i = 1,   Σ_{j∈J} y_j = 1,   x_i ≥ 0 ∀ i ∈ I,   y_j ≥ 0 ∀ j ∈ J,   x_i = 0 ∀ i ∉ I,   y_j = 0 ∀ j ∉ J
The correctness of the above algorithm follows from the indifference principle (Theorem 2.2). The running time of this algorithm is O(2^{m+n} poly(m, n)) since linear programs can be solved in polynomial time. The length of the input is 2mn numbers, and hence the above algorithm runs in time exponential in the input size. Indeed, we do not know any polynomial-time algorithm for computing an MSNE even for bimatrix games, and we will see later why we do not hope to have one.

However, before we proceed, let us shed light on the input size again. To describe a normal form game with n players each having s strategies, we need ns^n numbers.
can be computed in polynomial time by simply iterating over all possible strategy profiles. This observation
trivializes the computation of PSNEs. However, the above algorithm runs in polynomial time because “the
input itself is quite large.” Indeed, in many applications of game theory, network congestion games, for
example, the input is given succinctly and never presented explicitly by listing down the utilities of every
player in every strategy profile. Below we see a list of interesting types of games where the input can be
given succinctly.
1. Graphical games: Here we have a graph whose vertices are the players; the utility of a player depends only on his/her own strategy and the strategies of his/her neighbors in the graph, so the game can be described by listing utilities over neighborhoods only.

2. Sparse games: In this type of game, all the ns^n utilities are zero except a few, which are given explicitly. Graphical games are a form of sparse games.
3. Symmetric games: In these games, the players are all identical. Hence, the utility of a player in a strategy profile depends only on his/her own strategy and on how many players play each of the s strategies. In any strategy profile, all the players who play the same strategy receive the same utility. Such games can be described using s·binom(n+s−1, s−1) numbers. We observe that a two-player bimatrix game given by matrices A and B is symmetric if and only if A = Bᵀ.
4. Anonymous games: This is a generalization of symmetric games. Here all the players are different, but no player can distinguish among the others. So the utility of a player depends on his/her own strategy and on how many other players play each of the strategies. However, players who play the same strategy in a strategy profile may receive different utilities. These games can be described using ns·binom(n+s−2, s−1) numbers.
5. Network congestion games: Here we have a graph G. Every player has a source and a destination vertex in G; the strategy set of each player is the set of all paths from his/her source to his/her destination. The load ℓ_e(P) of an edge e in a strategy profile P is the number of players using that edge. Each edge e also has a congestion function c_e, which is assumed to be a non-decreasing cost function. The cost of player i in the strategy profile P = (P_j)_{j∈N} is Σ_{e∈P_i} c_e(ℓ_e(P)). For network congestion games, we usually work with cost functions, and players try to minimize their cost (so the utility function is the negative of the cost function).
6. Congestion games: Congestion games are a generalization of network congestion games. We have a ground set E of resources (think of them as the edges in network congestion games). The strategy set of each player is some set of subsets of E (think of these subsets as paths). Every element e ∈ E has a function c_e associated with it, which is assumed to be a non-decreasing cost function. Let ℓ_e(P) be the number of players whose strategy includes e in the profile P. The cost of player i in the strategy profile P = (P_j)_{j∈N} is Σ_{e∈P_i} c_e(ℓ_e(P)). For congestion games also, we usually work with cost functions, and players try to minimize their cost (so the utility function is the negative of the cost function).
7. Multi-matrix games: Suppose that in an n-player game, each player has m strategies. For each ordered pair (i, j) of players, we have an m × m utility matrix A^{ij}. The utility of player i in a strategy profile (s_i)_{i∈[n]} is Σ_{j≠i} A^{ij}_{s_i, s_j}. These types of games often arise in networks where player i receives a benefit from each of its neighbors, and his/her utility is the sum of the benefits received.
We will see next that a PSNE can be computed efficiently for some of the above succinct games, but we do not have much hope of an efficient algorithm for computing a PSNE for some others, for example, congestion games.

Theorem 4.1. Every network congestion game, with arbitrary real-valued cost functions, has at least one PSNE.
Proof. Let f be the flow in the given graph G induced by a strategy profile; that is, f_e is the number of players using edge e. We define a potential function on the set of such flows in G as

Φ(f) = Σ_{e∈G} Σ_{i=1}^{f_e} c_e(i)

We will now show that, if there exists a player i who can reduce his cost by shifting to another path P'_i from his current path P_i, then the reduction of the cost of player i is exactly the same as the corresponding reduction of the potential function (this is exactly the requirement for a function to be called a potential function). Let f̂ be the new flow after player i changes his strategy to P'_i from P_i; then we have the following.

Φ(f̂) − Φ(f) = Σ_{e∈P'_i} c_e(f̂_e) − Σ_{e∈P_i} c_e(f_e)
Once we have a potential function, the proof of the existence of a PSNE is straightforward. Since there are only finitely many atomic flows in G, there exists a flow f* which minimizes Φ. Since Φ is a potential function, it follows that any strategy profile corresponding to the flow f* is a PSNE.
Potential Games
Definition 4.1. A game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩ is called a potential game if there exists a function Φ from the set of all strategy profiles to the real numbers which satisfies the following.

∀ i ∈ N, s_i, s'_i ∈ S_i, s_{-i} ∈ S_{-i}:  Φ(s_i, s_{-i}) − Φ(s'_i, s_{-i}) = u_i(s_i, s_{-i}) − u_i(s'_i, s_{-i})
The proof of Theorem 4.1 immediately gives an algorithm for finding a PSNE in any potential game, which is called “best response dynamics”:

• While the current strategy profile s is not a PSNE: pick any player i who can strictly improve his/her utility by deviating, and replace s_i in s by a best response of player i against s_{-i}.
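A minimal sketch of best response dynamics for a congestion game given succinctly by strategy sets and resource cost functions, together with Rosenthal’s potential so that one can watch it strictly decrease (we work with costs, so a best response is cost-minimizing; all names are ours):

```python
def cost(i, profile, cost_fns):
    """Cost of player i: sum over the resources he uses of c_e(load of e)."""
    load = {}
    for strategy in profile:
        for e in strategy:
            load[e] = load.get(e, 0) + 1
    return sum(cost_fns[e](load[e]) for e in profile[i])

def potential(profile, cost_fns):
    """Rosenthal potential: sum_e sum_{k=1}^{load(e)} c_e(k)."""
    load = {}
    for strategy in profile:
        for e in strategy:
            load[e] = load.get(e, 0) + 1
    return sum(cost_fns[e](k) for e, l in load.items() for k in range(1, l + 1))

def best_response_dynamics(strategy_sets, cost_fns, profile):
    """Iterate best responses until no player can strictly reduce his cost."""
    improved = True
    while improved:
        improved = False
        for i, S_i in enumerate(strategy_sets):
            def cost_if(s):  # player i's cost if he unilaterally switches to s
                return cost(i, profile[:i] + (s,) + profile[i + 1:], cost_fns)
            best = min(S_i, key=cost_if)
            if cost_if(best) < cost(i, profile, cost_fns):
                profile = profile[:i] + (best,) + profile[i + 1:]
                improved = True
    return profile  # a PSNE: no player can strictly improve

# Two players, two resources a and b with c_a(x) = x and c_b(x) = 2x;
# starting from both players on b, best responses reach the PSNE ({a}, {b}).
cost_fns = {"a": lambda x: x, "b": lambda x: 2 * x}
S = [(frozenset("a"), frozenset("b"))] * 2
psne = best_response_dynamics(S, cost_fns, (frozenset("b"), frozenset("b")))
print([sorted(s) for s in psne], potential(psne, cost_fns))
```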
The following result follows immediately from the proof of Theorem 4.1.

Corollary 4.1. Best response dynamics converge to a PSNE in every finite potential game.

Proof. Follows from the observation that the potential strictly increases in every step, so the best response dynamics never visit any strategy profile more than once; since there are finitely many profiles, the dynamics must terminate, and they can only terminate at a PSNE.
Corollary 4.1 is appealing also from the point of view of the predictive power of Nash equilibrium. It says that the players can eventually reach a pure strategy Nash equilibrium in potential games by repeatedly playing the game and following best response dynamics. Although Corollary 4.1 guarantees convergence, it does not say anything about the speed of convergence. Obviously, if the potential function happens to take only polynomially many distinct values, then the best response dynamics converge in polynomial time. However, potential functions in typical applications take exponentially (in the number of players and the sum of the numbers of strategies of the players) many values, and it is indeed possible for the best response dynamics to take exponentially many steps to reach any PSNE. To achieve fast convergence, we weaken our requirement – we settle for an ε-PSNE.
4.4 Approximate PSNE
ε-PSNE
Definition 4.2. Given a normal form game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, a strategy profile s = (s_i)_{i∈N} is called an ε-PSNE for Γ if no player can improve his utility by more than an ε fraction of his utility in s by unilaterally deviating. That is, we have the following.

u_i(s'_i, s_{-i}) ≤ (1 + ε) u_i(s)  for every i ∈ N and every s'_i ∈ S_i

For congestion games (where the goal is to minimize cost), the definition of ε-PSNE requires that the cost of no player can drop by more than an ε fraction of his current cost. That is, we should have the following for a strategy profile s to be an ε-PSNE.

C_i(s'_i, s_{-i}) ≥ (1 − ε) C_i(s)  for every i ∈ N and every s'_i ∈ S_i

To ensure fast convergence, we modify the best response dynamics to the ε-best response dynamics:

• While the current strategy profile s is not an ε-PSNE:
  – pick a player i who has a move from s_i to s'_i that guarantees a reduction of his cost by at least an ε fraction of his current cost;
  – replace s_i with s'_i in s.
We now prove that the ε-best response dynamics guarantee fast convergence in network congestion games under some mild conditions.

Theorem 4.2. In an atomic network congestion game, suppose the following holds.

• All the players have the same source and the same destination.
• Every cost function satisfies the “α-bounded jump condition”: c_e(x + 1) ∈ [c_e(x), α·c_e(x)] for every edge e and every positive integer x.
• The max-gain variant of the best response dynamics is used – among all players who have an ε-best response move, pick a player and a move which achieve the largest absolute cost decrease (for that player).

Then an ε-PSNE will be reached in O((nα/ε) log(Φ(s⁰)/Φ(s_min))) iterations, where Φ(s⁰) and Φ(s_min) are respectively the initial and the minimum values of the potential function Φ.
Proof. Let s be a strategy profile which is not an ε-PSNE. The proof has two parts. In the first part, we show that there always exists a player i* ∈ N whose current cost is high. In the second part, we show that if player j ∈ N is chosen by the max-gain version of the ε-best response dynamics, then the drop in the cost of player j is at least a significant fraction of the cost of player i* (in the current profile).

Claim 4.4.1. In every strategy profile s, there exists a player i* ∈ N such that C_{i*}(s) ≥ Φ(s)/n.
Proof. Let us define the cost C(s) of a strategy profile s as the sum of the costs of all the players in s. That is,

C(s) = Σ_{i∈N} C_i(s) = Σ_{e∈E[G]} f_e c_e(f_e),   whereas   Φ(s) = Σ_{e∈E[G]} Σ_{i=1}^{f_e} c_e(i)

Since the cost functions are assumed to be non-decreasing, we have Φ(s) ≤ C(s) = Σ_{i∈N} C_i(s). Thus we have max_{i∈N} C_i(s) ≥ C(s)/n ≥ Φ(s)/n. Let i* ∈ argmax_{i∈N} C_i(s); then we have C_{i*}(s) ≥ Φ(s)/n.
Claim 4.4.2. Let j ∈ N be the player chosen by the max-gain version of the ε-best response dynamics, and suppose the player j moves to strategy s'_j from his current strategy s_j. Then we have the following.

C_j(s) − C_j(s'_j, s_{-j}) ≥ (ε/α) C_i(s)  for every player i ∈ N

Proof. Let us fix any player i ∈ N. We consider two cases.

Case I: Suppose player i has an ε-move. Then, since player j has been chosen by the max-gain version of the ε-best response dynamics, we have

C_j(s) − C_j(s'_j, s_{-j}) ≥ max_{s'_i∈S_i} (C_i(s) − C_i(s'_i, s_{-i})) ≥ ε C_i(s) ≥ (ε/α) C_i(s)

Case II: Suppose player i does not have an ε-move. Since player j has been chosen by the max-gain version of the ε-best response dynamics, player j has an ε-move, and thus we have

C_j(s) − C_j(s'_j, s_{-j}) ≥ ε C_j(s)    (4.1)

Since all the players have the same source and destination, s'_j is a valid strategy for the player i too, and by the α-bounded jump condition (the load of any edge of s'_j increases by at most one if player i also uses it), we have

C_i(s'_j, s_{-i}) ≤ α C_j(s'_j, s_{-j})    (4.2)

Since player i does not have any ε-move, we have

C_i(s) − C_i(s'_j, s_{-i}) < ε C_i(s)    (4.3)

From Equations (4.1) to (4.3), we have (1 − ε) C_i(s) < C_i(s'_j, s_{-i}) ≤ α C_j(s'_j, s_{-j}) ≤ α(1 − ε) C_j(s), and hence C_i(s) < α C_j(s). Now we have the following chain of inequalities.

C_j(s) − C_j(s'_j, s_{-j}) ≥ ε C_j(s) > (ε/α) C_i(s)
Hence the drop in the potential in one iteration is given by

Φ(s) − Φ(s'_j, s_{-j}) = C_j(s) − C_j(s'_j, s_{-j}) ≥ (ε/α) max_{i∈[n]} C_i(s) ≥ (ε/(αn)) Φ(s)

where the last inequality uses Claim 4.4.1. Thus the potential decreases by a multiplicative factor of at least (1 − ε/(αn)) in every iteration, and the claimed bound on the number of iterations follows.
The ε-best response dynamics based algorithm for computing a PSNE in network congestion games (under certain assumptions) can be seen as computing an approximate minimizer of the potential function Φ. Can this result be generalized to any potential game, or at least to any congestion game? We will now see that we cannot hope to have such a general result.
4.5 Local Search Problem

Consider the maximum cut problem: given a graph G = (V, E) with edge weights (w_e)_{e∈E}, find a cut (S, S̄) maximizing the total weight w(S, S̄) of the edges crossing the cut. A natural local search heuristic for it is the following.

• Start with an arbitrary cut (S, S̄).
• While there exists a vertex v whose movement to the other side of the cut increases the cut value: move v to the other side.

Since the maximum cut problem is NP-hard, we do not expect the above local search for the maximum cut problem to reach a “global optimum” in polynomial time. However, can we expect the above local search to find a “local optimum” in polynomial time? A local optimum is a solution that cannot be improved by local moves only. Obviously, if the input graph is unweighted, then the above local search terminates at a local optimum in polynomial time. We will now learn the complexity-theoretic framework which indicates that we do not expect to have a polynomial-time algorithm to find a local optimum for the maximum cut problem.
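A minimal sketch of the local search described above, with the “flip one vertex” neighborhood, for a weighted graph given as a dictionary of edges (names are ours):

```python
def cut_weight(edges, S):
    """Weight of the cut (S, V \\ S); edges maps frozenset({u, v}) -> weight."""
    return sum(w for e, w in edges.items() if len(e & S) == 1)

def local_search_max_cut(vertices, edges):
    S = set()                                   # start from the empty side
    improved = True
    while improved:
        improved = False
        for v in vertices:
            T = S ^ {v}                         # flip v to the other side
            if cut_weight(edges, T) > cut_weight(edges, S):
                S, improved = T, True
    return S                                    # a local optimum

V = {1, 2, 3, 4}
E = {frozenset({1, 2}): 3.0, frozenset({2, 3}): 1.0,
     frozenset({3, 4}): 2.0, frozenset({1, 4}): 1.0}
S = local_search_max_cut(V, E)
print(S, cut_weight(E, S))                      # {1, 3} 7.0
```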
4.6 Complexity Class: Polynomial Local Search (PLS)

The complexity class PLS is intended to capture local search problems for which a local optimum is not known to be computable in polynomial time even after decades of effort. Towards defining it, we need to define a local search problem abstractly (just as, in the classical complexity-theoretic framework, we define decision problems as language membership questions). An abstract local search problem is defined by three polynomial-time algorithms.

(i) An algorithm that, given an instance, produces some initial feasible solution.
(ii) An algorithm that, given an instance and a solution, computes the objective value of the solution.
(iii) An algorithm that says whether a solution is a local optimum and, if not, executes a local move to improve the value of the solution.
The complexity class PLS is the set of all abstract local search problems. Thus, by definition, every problem in PLS admits a local search algorithm, although that algorithm may not run in polynomial time. What we are interested in knowing is whether there exists a polynomial-time algorithm which finds a local optimum of a given local search problem. Loosely speaking, most of the optimization problems that we encounter can be cast as local search problems, and thus the corresponding local search versions belong to the complexity class PLS. Observe that, given any optimization problem, depending on how we define a local move, we can have different local search versions of the problem.
4.7 PLS-Completeness
The notion of PLS-completeness characterizes the hardest problems in the class PLS. A problem P is PLS-complete if the problem P belongs to the class PLS and every problem in PLS “reduces” to P.

Definition 4.3. We say that a PLS problem P1 reduces to another PLS problem P2 if we have the following two polynomial-time algorithms.

(i) An algorithm A that maps every instance x of P1 to an instance A(x) of P2.
(ii) An algorithm B that maps every local optimum of A(x) to a local optimum of x.
Observe that the definition of reduction ensures that if we can solve the problem P2 in polynomial time, then we can also solve the problem P1 in polynomial time. It is known that the local search version of the maximum cut problem (discussed above) is PLS-complete [Sch91].
Computing PSNE in Congestion Games

Theorem 4.3. Computing a PSNE of a given congestion game is PLS-complete.

Proof. The problem of finding a PSNE of a congestion game clearly belongs to the class PLS since congestion games are potential games. To prove PLS-hardness, we reduce from the local search version of the maximum cut problem. Let G = (V, E, w) be an arbitrary instance of maximum cut. Consider the following congestion game.

• The set of players: V, one player per vertex.
• The set of resources: {r_e, r̄_e : e ∈ E}, two resources per edge.
• The player corresponding to a vertex v has exactly two strategies: {r_e : e ∈ δ(v)} and {r̄_e : e ∈ δ(v)}, where δ(v) is the set of edges incident on v.
• Cost of the resource r_e or r̄_e: 0 if at most one player uses it, and w_e otherwise.
The above construction is the algorithm A in Definition 4.3. We now describe how to map a PSNE of the constructed congestion game to a local optimum of the maximum cut instance. Before presenting the algorithm, let us explain the high level idea first. Recall the potential function

Φ(s) = Σ_{e} Σ_{i=1}^{f_e} c_e(i),  where f_e is the number of players using resource e.

Each player has only 2 strategies to choose from – these strategies correspond to the side of the cut to which the corresponding vertex belongs. For the player corresponding to the vertex v, playing {r_e : e ∈ δ(v)} corresponds to v belonging to S, and playing {r̄_e : e ∈ δ(v)} corresponds to v belonging to S̄. We next observe that, for every edge e = {u, v}, the resources {r_e, r̄_e} combined will always be used exactly twice (by the players corresponding to u and v). The cost function is designed in such a way that an edge e contributes 0 to the potential Φ(s) if e belongs to the cut and w_e otherwise. Hence a cut of weight w(S, S̄) corresponds to the value of the potential function being Σ_{e∈E} w_e − w(S, S̄), and a locally maximum cut corresponds to a strategy profile where no local move (that is, a move by any one player) can decrease the potential further. Hence the algorithm B, for mapping a PSNE of the congestion game to a local optimum of the maximum cut instance, goes as follows. Let s be a PSNE of the congestion game constructed above. We define S = {v ∈ V : the strategy of v in s is {r_e : e ∈ δ(v)}}. From the discussion above, it follows that (S, S̄) is a solution of the local search version of the maximum cut problem.
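For small graphs, one can verify the identity Φ(s) = Σ_{e∈E} w_e − w(S, S̄) exhaustively. A small sketch in the spirit of the construction above (the resource costs are folded directly into the potential; names are ours):

```python
from itertools import product

def potential_of_cut(edges, S):
    """Rosenthal potential of the profile encoding the cut (S, V \\ S).
    A resource r_e (or rbar_e) costs 0 if used by at most one player
    and w_e if used by both endpoints of e."""
    phi = 0.0
    for e, w in edges.items():
        u, v = tuple(e)
        # endpoints on the same side use the same copy of the resource twice,
        # contributing c(1) + c(2) = 0 + w_e to the potential
        if (u in S) == (v in S):
            phi += w
    return phi

edges = {frozenset({1, 2}): 3.0, frozenset({2, 3}): 1.0, frozenset({1, 3}): 2.0}
total = sum(edges.values())
for bits in product((0, 1), repeat=3):
    S = {v for v, b in zip((1, 2, 3), bits) if b}
    cut = sum(w for e, w in edges.items() if len(e & S) == 1)
    assert potential_of_cut(edges, S) == total - cut
print("potential identity verified on all cuts")
```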
The proof of Theorem 4.3 actually shows more – it establishes a bijection between local moves in the maximum cut problem and the moves of best response dynamics. Since we know that local search can take exponentially many steps to reach a locally optimum solution, we have the following immediate corollary.
Corollary 4.2. Best response dynamics can take exponentially (in the sum of the number of strategies
of the players) many steps to reach a PSNE for congestion games.
The result in Theorem 4.3 can be extended to symmetric congestion games too by reducing any arbitrary
congestion game to a symmetric congestion game.
Computing PSNE in Symmetric Congestion Games

Theorem 4.4. Computing a PSNE of a given symmetric congestion game is PLS-complete.

Proof. Again, the problem of finding a PSNE in a symmetric congestion game belongs to PLS due to best response dynamics. To show PLS-hardness, we reduce from the problem of finding a PSNE in an arbitrary congestion game Γ. Suppose we have n players in Γ, where the strategy set of the player i is S_i for i ∈ [n], and the set of resources is E. We construct a symmetric congestion game Γ' where the set of players remains the same as in Γ. In Γ', we have a set {r_i : i ∈ [n]} of n new resources; that is, the set of resources in Γ' is E ∪ {r_i : i ∈ [n]}. The idea is to augment the resource r_i onto every strategy in S_i. That is, the (common) strategy set of every player in Γ' is {s_i ∪ {r_i} : s_i ∈ S_i, i ∈ [n]}. The cost of every resource in E remains the same as in Γ. The cost of r_i is 0 if at most one player uses it and ∞ if more than one player uses it, for every i ∈ [n]. We observe that in any PSNE of Γ', every new resource r_i, i ∈ [n], is used by exactly one player – intuitively speaking, the player who uses the resource r_i simulates the i-th player of the original game Γ. Formally, given a PSNE s' of Γ', we map it to a PSNE s of Γ as follows – if s_i ∪ {r_i} is a strategy played in s', then player i plays s_i in s. We can see that the values of the potential functions of the two games coincide on corresponding profiles, and thus the PSNEs of Γ and Γ' are in one-to-one correspondence.
Again, the proof of Theorem 4.4 establishes a bijection between the moves of the best response dynamics of the corresponding games. Thus we conclude the following.
Corollary 4.3. Best response dynamics can take exponentially (in the sum of the number of strategies
of the players) many steps to reach a PSNE even for symmetric congestion games.
We now return to the question of computing an MSNE of a bimatrix game and see why we do not expect to have a polynomial-time algorithm for this problem. To begin with, several natural decision problems about the MSNEs of a given bimatrix game are known to be NP-hard; for example:

(i) does there exist more than one MSNE?
(ii) does there exist an MSNE in which player 1 receives a certain given payoff?
(iii) does there exist an MSNE in which the sum of the payoffs of both players is some given value?
(iv) does there exist an MSNE where the support size of player 1 is at least some given number?
(v) does there exist an MSNE where the support of player 1 contains some given strategy?
(vi) does there exist an MSNE where the support of player 1 does not contain some given strategy?
However, the above results do not shed any light on the complexity of the original problem of computing an MSNE. Towards that, we define the complexity class "Functional NP", or FNP for short. The class FNP consists of all problems in NP with the extra demand that one needs to exhibit a certificate for YES instances. That is, an algorithm for an FNP problem must either say that the instance is a NO instance or, if it says that the instance is a YES instance, also output a certificate. Abstractly, the problems in NP are modeled by languages whereas the problems in FNP are modeled by relations. A binary relation P(x, y), where the length of y is polynomially bounded in the length of x, belongs to the class FNP if there is a deterministic polynomial-time verifier algorithm (a Turing machine, more concretely) that can determine whether a given x is related to a given y. The proof of the Cook-Levin theorem shows that the functional version of SAT, call it functional SAT (where we demand a satisfying assignment in the output if the instance is a YES instance), is complete for the FNP class too. Problems in FNP are also called search problems. The class PLS is a subclass of FNP since every problem in PLS has a polynomial-time algorithm that determines whether a given solution is locally optimal for a given problem instance. The certificates of a problem in PLS are its local optima.
It follows from the definition of FNP that the problem of computing an MSNE of a bimatrix game belongs to the class FNP, since it is enough to check for deviations in pure strategies. Can we prove that MSNE computation is FNP-complete³ to settle its complexity? Unfortunately NO, unless we believe NP = co-NP [MP91].

Theorem 4.5. If the problem of computing an MSNE of a bimatrix game is FNP-complete, then NP = co-NP.

Proof. Suppose the problem of computing an MSNE of a bimatrix game is FNP-complete. Then there is a reduction from functional SAT to MSNE computation for bimatrix games. This implies the existence of two polynomial-time algorithms A and B which do the following for any instance x of SAT: A maps x to a bimatrix game A(x), and B maps any MSNE of A(x) either to a satisfying assignment of x or to the (correct) answer that x is unsatisfiable.

We observe that the existence of A and B implies that every SAT instance has a polynomial-sized certificate for NO instances which can be verified by a polynomial-time verifier: for any NO instance x of SAT, the certificate is the bimatrix game A(x) along with an MSNE s* of the game; the verifier is the algorithm B. Hence SAT belongs to co-NP and thus we have NP = co-NP.
Intuitively speaking, the hurdle in proving MSNE computation to be FNP-complete is that problems in FNP may have NO instances, which is never the case for MSNE computation. This motivates us to define a

³The notion of reduction for the class FNP is exactly the same as for the class PLS.
subclass of FNP which consists of only those search problems where the existence of at least one solution is guaranteed; this class is called TFNP. A syntactic subclass of TFNP that will matter to us, called PPAD, consists of search problems whose solutions can be found by traversing an implicitly represented search graph; such a problem is specified by two ingredients:

(i) an algorithm to produce an initial "intermediate solution";

(ii) given an "intermediate solution", an algorithm to return a next "intermediate solution" or output that the current solution is a local optimum.
Although it is not immediately clear, there exists an algorithm for computing an MSNE of a bimatrix game which traverses such a search graph (indeed, all state-of-the-art algorithms, e.g., the Lemke-Howson algorithm, traverse certain search graphs built from the edges of the best response polytopes). Hence, we conclude that MSNE computation is in PPAD. Can we prove MSNE computation to be PPAD-complete? YES!
⁴For any generic non-succinct game, because of the large input size, the computational task of finding a PSNE can always be done in polynomial time.
Figure 4.1: Pictorial depiction of the relationship among various complexity classes of search problems: PPAD and PLS are subclasses of TFNP, which in turn is a subclass of FNP.
We now discuss a computational problem that belongs to the class PPAD more concretely. The problem is based on what is known as Sperner's Lemma, which turns out to be at the heart of the proof of Nash's Theorem. Consider a subdivided simplex in the 2-dimensional plane. We color the vertices of this subdivided simplex as follows: the top corner is colored red, the left corner is colored green, and the right corner is colored blue; the vertices on the boundary edges can be colored with either color of the corresponding corner endpoints; the internal vertices can be colored arbitrarily using any of red, green, or blue. Refer to Figure 4.2 for an example of a legal coloring. A triangle is called trichromatic if its 3 vertices have all different colors.
Sperner's Lemma says that, in every legal coloring of a subdivided simplex, there is an odd number of (and thus at least one) trichromatic triangles. We now present a constructive proof of Sperner's Lemma. We construct a directed graph G whose set of vertices is the set of triangles whose corners are not colored blue. The graph G also has a special vertex s. For two non-special vertices x and y, we have an edge from x to y if the corresponding triangles share a common edge whose endpoints are colored green and red and, while traversing from the triangle of x to the triangle of y through the common edge, the green endpoint of that edge appears on the right-hand side. The special vertex s has an edge through the green-red sides of triangles that appear on the left side of the outer triangle. Can you see that every non-special vertex of degree one corresponds to a trichromatic triangle and vice versa? Also, observe that the degree of s is always an odd integer. Since the number of vertices of odd degree is always even in every graph, the number of trichromatic triangles is always odd. The above proof can be generalized inductively to n dimensions using n + 1 colors.
We observe that the above proof gives a naïve path-following algorithm to compute a trichromatic triangle of any subdivision, which proves that the problem of finding a trichromatic triangle of a given subdivision (call it Sperner's problem) is in PPAD. It turns out that this problem is PPAD-complete. The following result is one of the cornerstone results in Algorithmic Game Theory.
Figure 4.2: A legal coloring of a triangulation. The picture is taken from Google images.
MSNE is PPAD-complete

Theorem 4.6. The problem of computing an MSNE of a bimatrix game is PPAD-complete.
We omit the proof of Theorem 4.6 since it is quite technical. We refer interested readers to [Rou10, Section 4]. The proof of Theorem 4.6 goes by reducing Sperner's problem to what is known as Brouwer's problem and then reducing Brouwer's problem to the problem of computing an MSNE for bimatrix games. Brouwer's Theorem states that any continuous function f : C −→ C mapping a compact convex set C ⊂ R^n to itself has at least one fixed point: an x ∈ C such that f(x) = x. However, since C can be an uncountably infinite set (which is indeed the case most often), an exact fixed point may not even have a finite description; the above line of reasoning therefore proves that the problem of computing an ε-MSNE (where a unilateral deviation cannot increase the payoff of any player by more than ε) is PPAD-complete. For ε-MSNE to make sense, we assume that the payoff of both players in every strategy profile is a real number in [0, 1], which we can assume without loss of generality by adding a constant to and then multiplying each payoff value by suitable constants – this does not change any MSNE (why?).
For 3 or more players, the proof of PPAD-hardness goes through as it is, since the other players can act as dummies (for example, their payoffs may be a constant function). However, the Lemke-Howson line of argument for proving membership in PPAD does not go through for 3 or more players, and the problem of computing an exact MSNE for games with 3 or more players seems to be strictly harder [EY10].
4.13.1 A Simple Polynomial Time Algorithm for 1/2-Approximate MSNE
We first present a simple algorithm computing a 1/2-approximate MSNE for bimatrix games [DMP06]. The algorithm goes as follows. Let i ∈ [m] be an arbitrary pure strategy of player 1. Let j ∈ [n] be a best response pure strategy of player 2 against i. Let k ∈ [m] be a best response pure strategy of player 1 against j. We define a mixed strategy x* of player 1 as playing i with probability 1/2 and k with probability 1/2. We claim that (x*, j) is a 1/2-approximate MSNE of the given bimatrix game. Let A and B be the m × n payoff matrices of players 1 and 2 respectively, with all entries in [0, 1]. Then we have the following for any pure strategies x ∈ [m] and y ∈ [n] of the row and column players respectively.

u_1(x, j) = e_x A e_j ≤ e_k A e_j ≤ (1/2) e_i A e_j + (1/2) e_k A e_j + 1/2 = u_1(x*, j) + 1/2

u_2(x*, y) = (1/2) e_i B e_y + (1/2) e_k B e_y ≤ (1/2) e_i B e_j + 1/2 ≤ (1/2) e_i B e_j + (1/2) e_k B e_j + 1/2 = u_2(x*, j) + 1/2

Here the first chain uses that k is a best response to j and that the payoffs lie in [0, 1]; the second chain uses that j is a best response to i.
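This procedure is a few lines of code. A Python sketch (payoffs assumed in [0, 1]; the sanity check at the end is our own):

import numpy as np

def half_approx_msne(A, B, i=0):
    """1/2-approximate MSNE of the bimatrix game (A, B) with entries in
    [0, 1], following the [DMP06] recipe above; i is the arbitrary start."""
    m, n = A.shape
    j = int(np.argmax(B[i, :]))    # best response of player 2 against i
    k = int(np.argmax(A[:, j]))    # best response of player 1 against j
    x = np.zeros(m); x[i] += 0.5; x[k] += 0.5   # player 1 mixes i and k
    y = np.zeros(n); y[j] = 1.0                 # player 2 plays j
    return x, y

# Sanity check on a random game: no pure deviation gains more than 1/2.
rng = np.random.default_rng(0)
A, B = rng.random((4, 5)), rng.random((4, 5))
x, y = half_approx_msne(A, B)
assert (A @ y).max() <= x @ A @ y + 0.5 + 1e-9   # player 1's regret
assert (x @ B).max() <= x @ B @ y + 0.5 + 1e-9   # player 2's regret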
Chapter 5
We have seen in the last chapter that computing an MSNE even of a bimatrix game may be a computationally challenging task. Even computing a PSNE for various succinct games, e.g., congestion games, seems to be a computationally intractable task. These results cast doubt on the usefulness of the concept of Nash equilibrium as a tool for predicting outcomes – if computing equilibria is so hard, how do we expect the players to discover them! This motivates us to relax the requirements of Nash equilibrium even further and develop some computationally tractable solution concept.
Definition 5.1. Given a game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, a probability distribution σ over S = Π_{i∈N} S_i is called a correlated equilibrium if for every i ∈ N, we have the following for every s_i, s'_i ∈ S_i.

E_{s∼σ}[u_i(s) | s_i] ≥ E_{s∼σ}[u_i(s'_i, s_{−i}) | s_i]

That is,

Σ_{s∈S} σ(s | s_i) u_i(s) ≥ Σ_{s∈S} σ(s | s_i) u_i(s'_i, s_{−i})
Alternatively, a probability distribution σ over S = Π_{i∈N} S_i is called a correlated equilibrium if for every i ∈ N, we have the following for every (switching) function δ_i : S_i −→ S_i.

E_{s∼σ}[u_i(s)] ≥ E_{s∼σ}[u_i(δ_i(s_i), s_{−i})]
Note that the distribution σ is over the strategy profiles. Any MSNE is also a correlated equilibrium where
the distribution σ is a product distribution. Hence, it follows from Nash Theorem that every finite game has
at least one correlated equilibrium.
One usual interpretation of correlated equilibrium is the following: suppose players agree that they will
collectively play according to some distribution σ over the set of strategy profiles. Some trusted third party
samples a strategy profile (si )i∈N from the distribution σ. The trusted third party conveys each player i
his/her strategy s_i in the sampled strategy profile. Each player i has two options at the time of playing – either to follow the suggestion of the trusted third party and play s_i, or to not follow the suggestion and play some other strategy s'_i. At the time of playing, each player i thus knows two things – (i) the strategy s_i suggested to him/her, and (ii) a posterior distribution on s_{−i} derived from σ and s_i. The requirement for σ to be a correlated equilibrium is that no player can improve his/her expected utility by deviating to any other strategy s'_i.
Example 5.1.1 (Traffic Light). This can be written in normal form as follows.

B Payoff matrix:

                             Player 2
                          Stop        Go
    Player 1   Stop     (0, 0)      (0, 1)
               Go       (1, 0)    (−10, −10)
Let us understand the concept of CE with the example game of traffic lights. The game has two PSNEs – (Stop, Go) and (Go, Stop). The game also has a CE σ which assigns the strategy profiles (Stop, Go) and (Go, Stop) each a probability of 1/2. Clearly σ is not a product distribution and thus does not correspond to any MSNE. One can observe that the requirements in Definition 5.1 can be written as a linear program and thus a CE can be computed in polynomial time. Given a game Γ = ⟨N = [n], (S_i)_{i∈N}, (u_i)_{i∈N}⟩, to write an LP for CE, we introduce a variable x_s for every strategy profile s ∈ S = Π_{i∈N} S_i. Then any solution to the following LP is a CE.

x_s ≥ 0   ∀ s ∈ S

Σ_{s∈S} x_s = 1

Σ_{s_{−i} : (s_i, s_{−i}) ∈ S} x_{(s_i, s_{−i})} u_i(s_i, s_{−i}) ≥ Σ_{s_{−i} : (s_i, s_{−i}) ∈ S} x_{(s_i, s_{−i})} u_i(s'_i, s_{−i})   ∀ i ∈ N, s_i, s'_i ∈ S_i
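The LP is easy to set up in code. A sketch for two-player games using SciPy (assuming SciPy is available; we pick the welfare-maximizing CE as the objective, which is one of many valid choices):

import numpy as np
from scipy.optimize import linprog

def correlated_equilibrium(u1, u2):
    """u1, u2: m x n utility matrices. Variables x[a, b] = sigma(a, b),
    flattened row-major. Returns a CE maximizing the sum of utilities."""
    m, n = u1.shape
    A_ub, b_ub = [], []
    for a in range(m):              # player 1: suggested a, deviation a2
        for a2 in range(m):
            row = np.zeros((m, n))
            row[a, :] = u1[a2, :] - u1[a, :]
            A_ub.append(row.ravel()); b_ub.append(0.0)
    for b in range(n):              # player 2: suggested b, deviation b2
        for b2 in range(n):
            row = np.zeros((m, n))
            row[:, b] = u2[:, b2] - u2[:, b]
            A_ub.append(row.ravel()); b_ub.append(0.0)
    res = linprog(-(u1 + u2).ravel(), A_ub=A_ub, b_ub=b_ub,
                  A_eq=[np.ones(m * n)], b_eq=[1.0],
                  bounds=[(0, None)] * (m * n))
    return res.x.reshape(m, n)

# Traffic-light game: any mixture of the two PSNEs is a welfare-optimal CE.
u1 = np.array([[0., 0.], [1., -10.]])
u2 = np.array([[0., 1.], [0., -10.]])
print(np.round(correlated_equilibrium(u1, u2), 3))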
Coarse Correlated Equilibrium (CCE)

Definition 5.2. Given a game Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, a probability distribution σ over S = Π_{i∈N} S_i is called a coarse correlated equilibrium if for every i ∈ N, we have the following for every s'_i ∈ S_i.

E_{s∼σ}[u_i(s)] ≥ E_{s∼σ}[u_i(s'_i, s_{−i})]

That is,

Σ_{s∈S} σ(s) u_i(s) ≥ Σ_{s∈S} σ(s) u_i(s'_i, s_{−i})

Like a CE, a CCE can be computed by solving the following linear program.

x_s ≥ 0   ∀ s ∈ S

Σ_{s∈S} x_s = 1

Σ_{s∈S} x_s u_i(s) ≥ Σ_{s∈S} x_s u_i(s'_i, s_{−i})   ∀ i ∈ N, s'_i ∈ S_i
In plain English, while a CE guards against conditional unilateral deviations by any player, a CCE guards against unconditional unilateral deviations by any player. Another interpretation of CE and CCE is via non-binding and binding contracts. Think of the players signing a contract to play according to some distribution σ over the set of strategy profiles. The notion of CE captures the dynamics of a non-binding contract – any player is allowed to deviate even after knowing which realization of her strategy she is suggested to play. Whereas CCE models binding contracts – no player is allowed to deviate from his/her suggested strategy after they agree on σ. We leave it to the readers to verify that every CE is also a CCE. The intuitive reason for this is that if a player does not have any incentive to deviate even after knowing the realization of his/her strategy, he/she does not have any incentive to deviate in the first place (before knowing his/her realized strategy). Hence, it follows that every finite normal form game has at least one CCE and it can be found in polynomial time. However, we will see that the algorithm for finding a CCE is much simpler and more natural than the algorithm for finding a CE.
The above framework of external regret models various real-world scenarios like network routing (where A is the set of paths from a source to a destination), investment strategies (where A is the set of investment strategies we have), etc. The adversary above is called an adaptive adversary since, before choosing the utility function π_t on the t-th day, it knows everything that has happened so far, that is, p_1, . . . , p_t and a_1, . . . , a_{t−1}. An adversary is called oblivious if π_t depends only on t and (of course) the algorithm under consideration; specifically, it does not depend on p_1, . . . , p_t and a_1, . . . , a_{t−1}.
Given the external regret framework, how "well" can we expect to perform? To study this, we need to fix a benchmark B. Our goal is to achieve a time-averaged regret (1/T)(B − Σ_{t=1}^T Σ_{a∈A} p_t(a) π_t(a)) that goes to 0 as T goes to ∞ – we call such an algorithm a no-regret algorithm. One obvious benchmark is the "best action sequence" – compare the performance of our algorithm with Σ_{t=1}^T max_{a∈A} π_t(a). However, the following adversary shows that we cannot hope to achieve our goal against this benchmark even when |A| = 2. If p_t puts probability at least 1/2 on strategy 1 (respectively strategy 2), then the adversary sets the utility of strategy 1 (respectively strategy 2) to be 0 and the utility of strategy 2 (respectively strategy 1) to be 1. Obviously we have Σ_{t=1}^T max_{a∈A} π_t(a) = T, whereas Σ_{t=1}^T Σ_{a∈A} p_t(a) π_t(a) ≤ T/2. Hence the time-averaged regret is at least 1/2. The intuitive reason for this strong lower bound is two-fold: (i) the adversary has too much power, and (ii) the benchmark is too strong. To achieve a time-averaged regret of o_T(1), we weaken the benchmark – we compare with a "best fixed" action. That is, we fix our benchmark to be max_{a∈A} Σ_{t=1}^T π_t(a). We now see that, with respect to this benchmark, there exists a simple and natural no-regret algorithm.
Theorem 5.1. Let |A| = n. Then there exists a no-regret algorithm whose expected time-averaged regret is O(√(log n / T)) with respect to every fixed action.
We now present the algorithm in Theorem 5.1, which is popularly known as the multiplicative weights (MW) algorithm, randomized weighted majority, or the Hedge algorithm. The algorithm maintains a weight w_t(a) for every action a ∈ A, initialized as w_0(a) = 1; in round t it plays p_t(a) = w_t(a)/Σ_{b∈A} w_t(b) and then updates w_{t+1}(a) = w_t(a)(1 + ε)^{π_t(a)}.
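A short Python sketch of this update (payoffs π_t(a) ∈ [0, 1] assumed; the driver at the end is a toy illustration of ours):

import numpy as np

def mw_play(payoff_stream, n, eps):
    """Multiplicative weights / Hedge. payoff_stream yields length-n arrays
    pi_t; the generator yields the mixed strategy p_t used in round t."""
    w = np.ones(n)                    # w_0(a) = 1 for every action a
    for pi in payoff_stream:
        p = w / w.sum()               # p_t(a) proportional to w_t(a)
        yield p
        w = w * (1.0 + eps) ** np.asarray(pi)   # w_{t+1}(a) = w_t(a)(1+eps)^{pi_t(a)}

# Toy run: random payoffs; compare the regret with the 2*sqrt(ln n / T) bound.
rng = np.random.default_rng(1)
T, n = 10_000, 8
pis = rng.random((T, n))
eps = np.sqrt(np.log(n) / T)
alg = sum(p @ pi for p, pi in zip(mw_play(pis, n, eps), pis))
opt = pis.sum(axis=0).max()
print((opt - alg) / T, 2 * np.sqrt(np.log(n) / T))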
5.5 Analysis of Multiplicative Weight Algorithm

Proof of Theorem 5.1. We now prove that the time-averaged regret of the multiplicative weight algorithm, for an appropriate choice of ε, is O(√(log n / T)), thereby proving Theorem 5.1. We may assume without loss of generality that the adversary we are working against is an oblivious one (why?). Let π_1, . . . , π_T be the payoff functions in the T iterations, with π_t(a) ∈ [0, 1] for every a ∈ A.
Let Γ_t = Σ_{a∈A} w_t(a) be the sum of the weights of all the actions in the t-th iteration, for t ∈ {0, 1, . . . , T}. The idea of the proof is to relate both the benchmark OPT = max_{a∈A} Σ_{t=1}^T π_t(a) and the expected payoff Σ_{t=1}^T Σ_{a∈A} p_t(a) π_t(a) of the MW algorithm to Γ_T. We begin with relating OPT with Γ_T. Let a* ∈ A be the action such that OPT = Σ_{t=1}^T π_t(a*).

Γ_T ≥ w_T(a*) = w_0(a*) Π_{t=1}^T (1 + ε)^{π_t(a*)} = (1 + ε)^{Σ_{t=1}^T π_t(a*)} = (1 + ε)^{OPT}    (5.1)   [since w_0(a*) = 1]

We now relate the expected payoff Σ_{t=1}^T Σ_{a∈A} p_t(a) π_t(a) of the MW algorithm to Γ_T. The expected payoff in the t-th step is

ν_t := Σ_{a∈A} p_t(a) π_t(a) = Σ_{a∈A} (w_t(a)/Γ_t) π_t(a)

Since (1 + ε)^x ≤ 1 + εx for every x ∈ [0, 1], we have Γ_{t+1} = Σ_{a∈A} w_t(a)(1 + ε)^{π_t(a)} ≤ Σ_{a∈A} w_t(a)(1 + ε π_t(a)) = Γ_t (1 + ε ν_t). In particular, we have Γ_T ≤ Γ_0 Π_{t=1}^T (1 + ε ν_t). Now from Equation (5.1), we have the following.
(1 + ε)^{OPT} ≤ Γ_T ≤ n Π_{t=1}^T (1 + ε ν_t)

⇒ OPT · ln(1 + ε) ≤ ln n + Σ_{t=1}^T ln(1 + ε ν_t)

⇒ OPT (ε − ε²) ≤ ln n + Σ_{t=1}^T ε ν_t      [since ε − ε² ≤ ln(1 + ε) ≤ ε]

⇒ OPT (1 − ε) ≤ (ln n)/ε + Σ_{t=1}^T ν_t

⇒ OPT − Σ_{t=1}^T ν_t ≤ ε OPT + (ln n)/ε
                       ≤ ε T + (ln n)/ε        [OPT ≤ T]
                       ≤ 2 √(T ln n)           [put ε = √(ln n / T)]

⇒ (1/T) (OPT − Σ_{t=1}^T ν_t) ≤ 2 √(ln n / T)
Before we show the connection between no-regret algorithms and various game-theoretic solution concepts, let us make some remarks. In the proof of Theorem 5.1, we have set ε = √(ln n / T) – so we assumed that the time horizon T is known a priori. We show next that the same guarantee can be achieved (up to a constant factor) without knowing the time horizon T.

Remark 5.1. There exists a no-regret algorithm, not knowing the time horizon T, whose expected time-averaged regret is O(√(ln n / T)) with respect to every fixed action.
Proof of Remark 5.1. On the t-th day, set the value of ε to be √(ln n / T_t) where T_t is the smallest power of 2 larger than t. If t is a power of 2, then we reset w_t(a) = 1 for every action a ∈ A. Let us define ℓ = ⌊lg T⌋. We partition the time interval [1, T] into ℓ + 1 epochs ξ_i, i ∈ {0, 1, . . . , ℓ} – ξ_i is the time interval [2^i, min{2^{i+1} − 1, T}]. Without loss of generality we may assume that the length of every epoch is some power of 2 (by adding more rounds of play where the adversary chooses a constant payoff function; this at most doubles T, which does not affect the result we wish to prove). We observe that the value of ε remains the same within an epoch – in epoch ξ_i, the value of ε is ε_i = √(ln n / 2^i). Let OPT_i be the payoff of the optimal action with respect to the epoch ξ_i alone, for i ∈ {0, 1, . . . , ℓ}; that is, OPT_i = max_{a∈A} Σ_{t∈ξ_i} π_t(a). We now have the following.

OPT − Σ_{t=1}^T ν_t ≤ Σ_{i=0}^ℓ (OPT_i − Σ_{t∈ξ_i} ν_t)
                    ≤ Σ_{i=0}^ℓ (ε_i 2^i + (ln n)/ε_i)
                    = √(ln n) Σ_{i=0}^ℓ 2^{i/2 + 1}
                    ≤ 2^{ℓ/2 + 2} √(ln n)
                    ≤ 4 √(T ln n)

Dividing by T, the time-averaged regret is at most 4 √(ln n / T).
Remark 5.2. The problem of designing a no-regret algorithm is sometimes called combining expert advice – at every iteration, each action can be thought of as an expert's advice and the goal is to perform as well as the best expert asymptotically.
One can ask if there exists a no-regret algorithm whose time-averaged regret is smaller than O(√(log n / T)). It turns out that a simple oblivious adversary which picks a payoff vector uniformly at random from {e_i : i ∈ [n]} forces any algorithm to incur a time-averaged regret of Ω(√(log n / T)) (can you prove it?).

We now let every player of a game run a no-regret algorithm against the play of the others; this is called the no-regret dynamic. The following result shows a fundamental connection of the no-regret dynamic with coarse correlated equilibrium.
Theorem 5.2. For ε > 0, suppose that after T iterations the time-averaged regret of every player is at most ε (if every player runs the MW algorithm, T = O(ln n / ε²) iterations suffice). Let σ_t = Π_{i=1}^n p_i^t be the product distribution played in the t-th iteration, for t ∈ [T], and let σ = (1/T) Σ_{t=1}^T σ_t be the time-averaged distribution. Then σ is an ε-coarse correlated equilibrium of the game Γ; that is, for every player i ∈ N and every s'_i ∈ S_i,

E_{s∼σ}[u_i(s)] ≥ E_{s∼σ}[u_i(s'_i, s_{−i})] − ε
Proof. Since the time-averaged regret of player i after T iterations is at most ε with respect to any fixed action, we have the following for every s'_i ∈ S_i.

(1/T) ( Σ_{t=1}^T E_{s∼σ_t}[u_i(s'_i, s_{−i})] − Σ_{t=1}^T E_{s∼σ_t}[u_i(s)] ) ≤ ε

By the definition of σ, the left-hand side equals E_{s∼σ}[u_i(s'_i, s_{−i})] − E_{s∼σ}[u_i(s)], which is exactly the ε-CCE condition.
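The convergence statement is easy to observe empirically. A small self-play sketch (our own toy driver, payoffs assumed in [0, 1]): both players of a bimatrix game run the MW algorithm against each other, and the time-averaged product distributions form an approximate CCE.

import numpy as np

def no_regret_dynamic(A, B, T, eps):
    """Both players of the bimatrix game (A, B) run multiplicative weights
    against each other; returns sigma = (1/T) sum_t sigma_t, the
    time-averaged distribution over strategy profiles."""
    m, n = A.shape
    w1, w2 = np.ones(m), np.ones(n)
    sigma = np.zeros((m, n))
    for _ in range(T):
        p, q = w1 / w1.sum(), w2 / w2.sum()
        sigma += np.outer(p, q) / T
        w1 *= (1 + eps) ** (A @ q)   # payoff of each row against q
        w2 *= (1 + eps) ** (p @ B)   # payoff of each column against p
    return sigma

rng = np.random.default_rng(2)
A, B = rng.random((3, 3)), rng.random((3, 3))
T = 20_000
sigma = no_regret_dynamic(A, B, T, np.sqrt(np.log(3) / T))
# eps-CCE check for player 1: the best unconditional deviation gains little.
print((A @ sigma.sum(axis=0)).max() - (sigma * A).sum())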
No Swap-Regret

Definition 5.3. An online algorithm is said to have no swap regret if its time-averaged expected swap regret

(1/T) ( max_{δ∈F_swap} Σ_{t=1}^T Σ_{a∈A} p_t(a) π_t(δ(a)) − Σ_{t=1}^T Σ_{a∈A} p_t(a) π_t(a) )

goes to 0 as T goes to ∞, where F_swap is the set of all (switching) functions δ : A −→ A.
We show next that if every player runs a no-swap-regret algorithm, then the play converges to a correlated equilibrium. The setup is exactly the same as before. That is, the set of actions available to player i is his/her strategy set S_i. Suppose each player i ∈ [n] chooses a mixed strategy p_i ∈ ∆(S_i). The payoff function π_i (in the external regret framework) given to player i is his/her expected utility u_i(s_i, p_{−i}) for playing a strategy s_i, given that every other player j ∈ [n] \ {i} plays the mixed strategy p_j.
Theorem 5.3. For ε > 0, suppose that after T iterations the time-averaged swap regret of every player is at most ε. Let σ_t = Π_{i=1}^n p_i^t be the product distribution in the t-th iteration for t ∈ [T] and σ = (1/T) Σ_{t=1}^T σ_t be the time-averaged distribution. Then σ is an ε-correlated equilibrium of the game Γ; that is, for every player i ∈ N and every switching function δ_i : S_i −→ S_i, we have

E_{s∼σ}[u_i(s)] ≥ E_{s∼σ}[u_i(δ_i(s_i), s_{−i})] − ε

Proof. Since the time-averaged swap regret of player i after T iterations is at most ε, we have the following for every player i ∈ N and every switching function δ_i : S_i −→ S_i.

(1/T) ( Σ_{t=1}^T Σ_{a∈S_i} p_i^t(a) π_t(δ_i(a)) − Σ_{t=1}^T Σ_{a∈S_i} p_i^t(a) π_t(a) ) ≤ ε

⇒ (1/T) ( Σ_{t=1}^T E_{s∼σ_t}[u_i(δ_i(s_i), s_{−i})] − Σ_{t=1}^T E_{s∼σ_t}[u_i(s)] ) ≤ ε

⇒ E_{s∼σ}[u_i(δ_i(s_i), s_{−i})] − E_{s∼σ}[u_i(s)] ≤ ε
So, we have seen that, if players run no-swap-regret dynamics, then the corresponding distribution converges to a correlated equilibrium. However, does there exist a no-swap-regret algorithm? Obviously, every no-swap-regret algorithm is also a no-external-regret algorithm, while the converse is not true in general. This makes the following result interesting.

Theorem 5.4. If there exists an algorithm with time-averaged external regret R(T, n), then there exists an algorithm with time-averaged swap regret n R(T, n). In particular, if there exists a no-external-regret algorithm, then there exists a no-swap-regret algorithm.
Proof. Let the action set be A = {1, 2, . . . , n}. Let B be an algorithm with time-averaged external regret R(T, n). We take n instances B_1, . . . , B_n of the algorithm B. We build a "master algorithm" H using B_1, . . . , B_n. At every time t = 1, . . . , T, the following happens.

1. Every instance B_i proposes a distribution p_i^t over A.

2. Using p_1^t, . . . , p_n^t, H computes a consensus distribution p^t (let the need dictate how it should be done).

3. H plays p^t and receives the payoff function π_t from the adversary.

4. H provides p^t(i) π_t to B_i as the payoff function, for i ∈ [n].
Figure 5.3: Schematic diagram of the master algorithm in the proof of Theorem 5.4.
The time-averaged expected payoff of H is

(1/T) Σ_{t=1}^T Σ_{i∈A} p^t(i) π_t(i)

Let δ : A −→ A be any switching function. Then the time-averaged expected payoff under this switching function is

(1/T) Σ_{t=1}^T Σ_{i∈A} p^t(i) π_t(δ(i))

Hence the time-averaged expected swap regret of H is at most

(1/T) max_δ ( Σ_{t=1}^T Σ_{i∈A} p^t(i) π_t(δ(i)) − Σ_{t=1}^T Σ_{i∈A} p^t(i) π_t(i) )    (5.2)

Since each B_i has time-averaged external regret at most R(T, n) and receives the payoff functions p^t(i) π_t, we have the following for every action λ ∈ A.

(1/T) ( Σ_{t=1}^T p^t(i) π_t(λ) − Σ_{t=1}^T Σ_{j∈A} p_i^t(j) p^t(i) π_t(j) ) ≤ R(T, n)    (5.3)
From Equations (5.2) and (5.3), it becomes clear how we should set the consensus distribution p^t from p_1^t, . . . , p_n^t: choose p^t satisfying

p^t(i) = Σ_{j∈A} p_j^t(i) p^t(j)   ∀ i ∈ A    (5.4)

Σ_{i∈A} p^t(i) = 1    (5.5)

Indeed, summing Equation (5.3) over i ∈ A with λ = δ(i) and using Equation (5.4) shows that the swap regret in Equation (5.2) is at most n R(T, n). Since Equations (5.4) and (5.5) form the steady-state equations of the Markov chain with transition probabilities Q(j, i) = p_j^t(i), a solution always exists (and it is unique when the chain is aperiodic and irreducible). This concludes the proof of the result.
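A small sketch of how H can compute the consensus distribution in practice (power iteration is our own choice here; any method for finding a stationary distribution of the chain Q(j, i) = p_j^t(i) works):

import numpy as np

def consensus(P, iters=10_000, tol=1e-12):
    """P[j, i] = p_j^t(i) is row-stochastic. Returns p satisfying
    Equation (5.4), p(i) = sum_j P[j, i] p(j), with sum_i p(i) = 1."""
    n = P.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(iters):
        nxt = p @ P                # (p Q)(i) = sum_j p(j) Q(j, i)
        if np.abs(nxt - p).sum() < tol:
            break
        p = nxt
    return p / p.sum()

P = np.array([[0.6, 0.4, 0.0],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
p = consensus(P)
assert np.allclose(p @ P, p)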
Theorem 5.4 and Theorem 5.1 immediately give us the following result.
No Swap-Regret Algorithm

Corollary 5.2. Let |A| = n. Then there exists an algorithm whose expected time-averaged swap regret is O(n √(log n / T)).
Chapter 6
Price of Anarchy
Let Γ = ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩ be a game in normal form. Since players are assumed to be self-interested, the strategy profile that they collectively play in some equilibrium may not be good collectively. The notion of social welfare captures the goodness of strategy profiles. For example, in a network congestion game, a natural social welfare measure of a strategy profile would be the average delay of the players. In general, a social welfare function w : Π_{i∈N} S_i −→ R_{≥0} maps strategy profiles to non-negative real numbers. The notion of price of anarchy (PoA) quantifies how bad a strategy profile in equilibrium can be with respect to a best strategy profile.
[Figure 6.1: Braess's paradox – the network before (left) and after (right) adding a zero-cost edge c(x) = 0 between A and B.]
In Pigou's network (on the left of Figure 6.2), the bottom edge weakly dominates the top edge, and hence in equilibrium every player takes the bottom edge, resulting in an average travel time of 1 for every player. Whereas a strategy profile where half of the players take the top edge and the other half take the bottom edge results in an average travel time of 3/4. Hence the PoA of the network is 4/3.
For the network on the right of Figure 6.2, the cost function is non-linear. However, the bottom edge continues to be a weakly dominant strategy for every player, so the equilibrium average travel time is again 1. Whereas the socially optimal strategy would be to split the total traffic so that an ε fraction (for some appropriate ε) uses the top edge and the remaining (1 − ε) fraction uses the bottom edge, resulting in an average delay of ε + (1 − ε)^{p+1}. Hence the average travel time in the optimal strategy goes to 0 as p → ∞. Hence, as p → ∞, the PoA also goes to ∞.
[Figure 6.2: Pigou's network (left: top edge c(x) = 1, bottom edge c(x) = x) and its non-linear variant (right: top edge c(x) = 1, bottom edge c(x) = x^p).]
The remarkable fact now is that, among all networks, Pigou's network gives the worst PoA bound. Hence we may have a bad PoA because of non-linear cost functions, but not because of some complicated network structure. In the rest of this section, we will make the above statement precise. We begin with formalizing Pigou's network.
Pigou's Network

B Two vertices s and t are joined by two parallel edges, and a total traffic of rate r travels from s to t.

B The cost of one edge is c(x), where x is the amount of traffic using that edge. The function c is non-decreasing, non-negative, and continuous. The cost of the other edge is the constant c(r). We write α(c) for the PoA of this network.

Since the cost function c is non-decreasing and non-negative, α(c) can also be written as

α(c) = sup_{ε ≥ 0} ( r c(r) / (ε c(r) + (r − ε) c(r − ε)) )

Expressing α(c) with ε ranging over all non-negative real numbers will be convenient later.
We have seen in the non-linear Pigou's network that α(c) can get worse if we "increase" the non-linearity of the function c. Hence the idea is to restrict the function c to vary within a class of functions and study how bad the PoA can become. Important examples of such classes include the set of all affine functions {ax + b : a, b ∈ R_{≥0}}, quadratic functions {ax² + bx + c : a, b, c ∈ R_{≥0}}, etc. For a class C of cost functions, we define α(C) as

α(C) = sup_{c∈C} sup_{r>0} α(c) = sup_{c∈C} sup_{r>0} sup_{ε>0} ( r c(r) / (ε c(r) + (r − ε) c(r − ε)) )
Theorem 6.1. For any class C of cost functions, the PoA of any network with one source and one
destination with cost functions from C is at most α(C) with respect to PSNEs.
Proof. Let G be the graph under consideration and f : E[G] −→ R_{≥0} be an s–t flow of G. Our first observation is that f forms a PSNE if and only if every player is following a shortest path with respect to the cost c_e(f_e) on each edge e. Formally, let P denote the set of all paths from s to t in G and let P be any path from s to t. Then we have the following if f is a PSNE.

f_P > 0 ⇒ P ∈ argmin_{P′∈P} Σ_{e∈P′} c_e(f_e)
Indeed, otherwise, because of the continuity of c, a suitably small fraction of players from the path P could be shifted to a shorter path, which would be beneficial for these shifted players. Let f^{eq} be a flow which constitutes a PSNE. Let f* be a socially optimal flow. That is,

f* ∈ argmin_{f is an s–t flow of value 1} Σ_{e∈E} f_e c_e(f_e)

Since the equilibrium flow f^{eq} only uses shortest paths according to the cost function c_e(f^{eq}_e), e ∈ E[G], all paths used by the equilibrium flow f^{eq} have the same cost; call it L. Also, for any path P ∈ P from s to t, its cost Σ_{e∈P} c_e(f^{eq}_e) is at least L. In particular, c(f^{eq}) = L and Σ_{e∈E} f*_e c_e(f^{eq}_e) ≥ L.
Since α(C) is a supremum over all c ∈ C, r > 0, ε > 0, we have the following for every edge e ∈ E[G] (taking c = c_e, r = f^{eq}_e, and ε = f^{eq}_e − f*_e).

α(C) ≥ f^{eq}_e c_e(f^{eq}_e) / ( f*_e c_e(f*_e) + (f^{eq}_e − f*_e) c_e(f^{eq}_e) )

⇒ f*_e c_e(f*_e) ≥ (1/α(C)) f^{eq}_e c_e(f^{eq}_e) − (f^{eq}_e − f*_e) c_e(f^{eq}_e)

⇒ Σ_{e∈E} f*_e c_e(f*_e) ≥ (1/α(C)) Σ_{e∈E} f^{eq}_e c_e(f^{eq}_e) − Σ_{e∈E} (f^{eq}_e − f*_e) c_e(f^{eq}_e)

⇒ c(f*) ≥ c(f^{eq}) / α(C)

⇒ PoA ≤ α(C)

The second-to-last implication uses Σ_{e∈E} (f^{eq}_e − f*_e) c_e(f^{eq}_e) = c(f^{eq}) − Σ_{e∈E} f*_e c_e(f^{eq}_e) ≤ L − L = 0.
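As a concrete check of what this bound gives for a specific class, one can evaluate the Pigou bound for the linear cost function c(x) = x at rate r = 1 (a standard calculation, included here for completeness):

\[
\alpha(c) \;=\; \sup_{0 \le \varepsilon \le 1} \frac{r\,c(r)}{\varepsilon\,c(r) + (r - \varepsilon)\,c(r - \varepsilon)}
\;=\; \sup_{0 \le \varepsilon \le 1} \frac{1}{\varepsilon + (1 - \varepsilon)^2}
\]

The denominator g(ε) = ε + (1 − ε)² is minimized where g′(ε) = 1 − 2(1 − ε) = 0, that is, at ε = 1/2, giving g(1/2) = 3/4 and hence α(c) = 4/3; one can further verify that no affine function ax + b with a, b ≥ 0 does worse.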
Hence it follows from Theorem 6.1 that the PoA of any selfish routing instance with affine cost functions is at most 4/3. We end our discussion of PoA with another important example, namely selfish load balancing. Suppose we have n jobs, where job i has weight w_i, and m machines, where machine j has speed s_j. Given an assignment A : [n] −→ [m] of jobs to machines, the load ℓ_j of machine j is the total time it operates to execute the jobs assigned to it: ℓ_j = Σ_{i∈[n] : A(i)=j} w_i / s_j. The makespan of a schedule is the maximum load of any machine. One of the well-studied optimization problems is to compute a schedule which has the minimum possible makespan. We now study this problem with a game-theoretic twist to it — each job is a player who itself chooses the machine for it.
(i) Set of players: the n jobs.

(ii) Strategy set of every player: [m], that is, the set of machines.

(iii) A strategy profile is a tuple (j_1, . . . , j_n) ∈ [m]^n where j_i is the machine chosen by job i.

(iv) Cost function: c_i(j_1, . . . , j_n) = ℓ_{j_i}, that is, the load of the machine where the i-th job runs.

Theorem 6.2. Every selfish load balancing game has at least one PSNE.
Proof. Let us associate each assignment A : [n] −→ [m] with its sorted load vector (λ_1, . . . , λ_m) where λ_j is the j-th highest load, breaking ties arbitrarily. Given two load vectors λ = (λ_1, . . . , λ_m) and λ′ = (λ′_1, . . . , λ′_m), we say that λ is lexicographically smaller than λ′ if there exists a j ∈ [m] such that λ_k = λ′_k for every 1 ≤ k ≤ j − 1 and λ_j < λ′_j.

Let A be an assignment whose corresponding sorted load vector (λ_1, . . . , λ_m) is lexicographically smallest. We claim that A is a PSNE. Suppose not; then there exists a job i ∈ [n] which benefits from moving from its current machine s to a new machine t. Suppose the loads of machines s and t in the assignment A are respectively λ_a and λ_b for some a, b ∈ [m]. For the movement to benefit job i, the new load of machine t must be strictly below λ_a; in particular λ_b < λ_a. Let the new assignment be A′ with corresponding sorted load vector (λ′_1, . . . , λ′_m). Since only the loads of s (which strictly decreases from λ_a) and t (which stays strictly below λ_a) change, the sorted load vector of A′ is lexicographically smaller than that of A. This contradicts our assumption that (λ_1, . . . , λ_m) is a lexicographically smallest vector.
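The proof suggests a simple procedure: repeatedly let some job make a beneficial move. A minimal Python sketch for identical machines of speed 1 (the representation is our own; termination is guaranteed by the lexicographic potential argument above):

def best_response_psne(weights, m):
    """weights[i] = w_i. Returns (A, load) where A[i] in range(m) is a PSNE
    assignment: no job can reduce the load it experiences by moving."""
    A = [0] * len(weights)            # start with every job on machine 0
    load = [0.0] * m
    load[0] = sum(weights)
    improved = True
    while improved:
        improved = False
        for i, w in enumerate(weights):
            a = A[i]
            j = min(range(m), key=lambda k: load[k])   # least loaded machine
            if load[j] + w < load[a] - 1e-12:          # beneficial move
                load[a] -= w; load[j] += w; A[i] = j
                improved = True
    return A, load

A, load = best_response_psne([3, 3, 2, 2, 2], 2)
print(A, load, max(load))   # a PSNE and its makespan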
Theorem 6.3. The PoA of any selfish load balancing game with n jobs and m identical (having the same speed) machines is at most 2 − 2/(m + 1) with respect to PSNEs.
Proof. Since the machines are identical, by scaling the weights of the jobs appropriately, we may assume without loss of generality that the speed of every machine is 1. Let A be a PSNE assignment, let its makespan be c(A), and let OPT be the minimum makespan possible. Let j* ∈ [m] be a machine having the highest load under the assignment A and let i* be a job of smallest weight assigned to j*. We may assume that at least 2 jobs are assigned to the machine j*, since otherwise we have c(A) = OPT (why?). So we have w_{i*} ≤ (1/2) c(A). Also, for any other machine j ∈ [m] \ {j*}, its load ℓ_j under the assignment A is at least ℓ_{j*} − w_{i*}, since otherwise the job i* has a beneficial move. Hence, we have the following.

ℓ_j ≥ ℓ_{j*} − w_{i*} ≥ c(A) − (1/2) c(A) ≥ (1/2) c(A)
We now bound OPT, which will prove the result.

OPT ≥ (Σ_{i∈[n]} w_i) / m
    = (Σ_{j∈[m]} ℓ_j) / m
    = (ℓ_{j*} + Σ_{j∈[m]\{j*}} ℓ_j) / m
    ≥ (c(A) + (1/2)(m − 1) c(A)) / m
    = ((m + 1) / (2m)) c(A)

Hence the PoA is bounded as

PoA = c(A) / OPT ≤ 2m / (m + 1) = 2 − 2/(m + 1)
Chapter 7
Till now we have studied simultaneous move games in the complete information setting. In this chapter, we relax both of these conditions. Relaxing the notion of simultaneous moves gives rise to what are called extensive form games. Relaxing the assumption of complete information gives birth to Bayesian games.
Bayesian Game

Definition 7.1. A Bayesian game Γ is given by ⟨N, (Θ_i)_{i∈N}, (S_i)_{i∈N}, p, (u_i)_{i∈N}⟩ where

B N is the set of players,

B Θ_i is the set of types of player i ∈ N,

B S_i is the set of strategies of player i ∈ N,

B p ∈ ∆(×_{i∈N} Θ_i) is a common prior distribution over type profiles, and

B u_i : ×_{i∈N} Θ_i × ×_{i∈N} S_i −→ R is the utility function of player i ∈ N.
In a Bayesian game, the game Γ is common knowledge to the set of players. Player i knows the realization θ_i ∈ Θ_i of her type but does not know the realization of the types of the other players. However, since player i knows the joint probability distribution p over Θ, she forms a posterior distribution p(·|θ_i) on Θ_{−i}. Let us see an example.
Example 7.1.1 (Sealed Bid Selling Auction as a Bayesian Game). Suppose there is a seller who wishes to sell an item to one of n potential buyers by following an auction.

B The set of players: N = {1, 2, . . . , n} (this set corresponds to the buyers of the item).

B The set of types: Θ_i = [0, 1] for every i ∈ [n]. This corresponds to the set of valuations of buyer i.

B The set of strategies: S_i = [0, 1] for every i ∈ [n]; a strategy is a bid.

B p is the product distribution: p = ×_{i∈[n]} p_i where p_i is the uniform distribution over [0, 1].

B Allocation function: a : ×_{i∈N} S_i −→ R^N defined as a((s_i)_{i∈N}) = (a_i)_{i∈N} where a_i = 1 if the i-th buyer's bid is the highest (ties broken, say, in favor of the smallest index) and a_i = 0 otherwise.

B Payment function: q : ×_{i∈N} S_i −→ R^N defined as q((s_i)_{i∈N}) = (q_i)_{i∈N} where q_i is the payment made by buyer i; in the first price auction, q_i = s_i if a_i = 1 and q_i = 0 otherwise.

B Utility function:

u_i(θ_1, . . . , θ_n, s_1, . . . , s_n) = a_i (θ_i − q_i)
Any Bayesian game can be converted to an equivalent normal form game (with complete information) which is called the Selten game corresponding to the given Bayesian game. The high-level idea of the Selten game is the following. In the Selten game, we have a player corresponding to each type θ_i ∈ Θ_i of the Bayesian game for every i ∈ N — think of these players as possible instantiations of player i which get realized depending on her type. The utility of the player corresponding to θ_i is the expected utility (with respect to her posterior distribution p(·|θ_i)) of player i in the original Bayesian game under type θ_i. We now formally define the Selten game.
Selten Game

Definition 7.2. Let Γ = ⟨N, (Θ_i)_{i∈N}, (S_i)_{i∈N}, p, (u_i)_{i∈N}⟩ be any Bayesian game. The corresponding normal form game Γ^s = ⟨N^s, (S_{θ_i})_{θ_i∈Θ_i, i∈N}, (U_{θ_i})_{θ_i∈Θ_i, i∈N}⟩ is defined as follows.

B N^s = ∪_{i∈N} Θ_i

B S_{θ_i} = S_i for every θ_i ∈ Θ_i, i ∈ N

B U_{θ_i} is the expected utility (with respect to the posterior p(·|θ_i)) of player i in Γ under type θ_i, when each realized type plays its corresponding strategy.
Observe that, viewing a Bayesian game as its corresponding Selten game, a pure strategy for player i in the Bayesian game is nothing but the pure strategy profile (s_{θ_i})_{θ_i∈Θ_i} ∈ ×_{θ_i∈Θ_i} S_i of the corresponding players in the Selten game. Rephrasing, a pure strategy s_i of a player i in a Bayesian game is a function s_i : Θ_i −→ S_i. This allows us to extend any game-theoretic solution concept that we have learned for normal form games to Bayesian games. For example, pure strategy Bayesian Nash equilibrium is defined as follows.
Pure Strategy Bayesian Nash Equilibrium

Definition 7.3. A strategy profile (s_i(·))_{i∈N} is called a pure strategy Bayesian Nash equilibrium if, for every i ∈ N, every θ_i ∈ Θ_i, and every a_i ∈ S_i,

E_{θ_{−i}}[u_i(θ_i, θ_{−i}, s_i(θ_i), s_{−i}(θ_{−i}))] ≥ E_{θ_{−i}}[u_i(θ_i, θ_{−i}, a_i, s_{−i}(θ_{−i}))]
We conclude our exposition on Bayesian games by exhibiting a pure strategy Bayesian Nash equilibrium for the first price sealed bid auction.

Theorem 7.1. In the Bayesian game corresponding to the first price selling auction, each buyer bidding half of her valuation forms a pure strategy Bayesian Nash equilibrium under the following assumptions.

(i) There are n = 2 buyers.

(ii) The valuation of each buyer is drawn independently and uniformly from [0, 1].

(iii) Each buyer is risk neutral: the bid b_i(θ_i) of player i ∈ [2] is of the form α_i θ_i for some α_i ∈ (0, 1].

Using the above line of argument for player 2, the following b*_2(θ_2) maximizes u_2(θ_1, θ_2, b_1, b_2) against b_1(θ_1) = α_1 θ_1.

b*_2(θ_2) = θ_2/2   if θ_2/2 ≤ α_1
b*_2(θ_2) = α_1     if θ_2/2 > α_1
Choosing α_1 = α_2 = 1/2, we get b*_1(θ_1) = θ_1/2 and b*_2(θ_2) = θ_2/2. We observe that, with α_1 = α_2 = 1/2, the bids b*_1(θ_1) = θ_1/2 and b*_2(θ_2) = θ_2/2 simultaneously maximize both players' utilities and thus form a pure strategy Bayesian Nash equilibrium.

The above proof can be generalized to n risk-neutral players where every player's valuation is uniformly distributed over [0, 1]. It can be shown that ((n−1)/n · θ_i)_{i∈[n]} forms a pure strategy Bayesian Nash equilibrium.
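A quick Monte Carlo illustration (not a proof; the function name and parameters below are our own) that bidding (n−1)/n times one's valuation is self-enforcing: fixing the other buyers at this strategy, a buyer's expected utility is (approximately) maximized at the same bid.

import numpy as np

def expected_utility(n, theta, bid, samples=200_000, seed=0):
    """Expected utility of a buyer with valuation theta bidding `bid` in a
    first-price auction where the other n-1 buyers bid (n-1)/n times their
    i.i.d. uniform [0, 1] valuations."""
    rng = np.random.default_rng(seed)
    others = (n - 1) / n * rng.random((samples, n - 1))
    win = bid > others.max(axis=1)
    return float(np.mean(win * (theta - bid)))

n, theta = 3, 0.9
eq_bid = (n - 1) / n * theta          # = 0.6
for b in [0.4, 0.5, eq_bid, 0.7, 0.8]:
    print(f"bid={b:.3f}  E[u]={expected_utility(n, theta, b):.4f}")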
We now relax the condition that players move simultaneously in a normal form game. The standard way to model a sequential move game is using a game tree. Let us see a few examples of sequential move games.
Example 7.3.1 (Matching Pennies with Observation). We have seen the matching pennies game in normal form where the players move simultaneously. Let us now consider the following new game: player 1 first makes a move, player 2 observes the action played by player 1 (and thus the name "with observation") and makes her move, and then both players receive utilities. Each game tree has three types of nodes: (i) a root node corresponding to the initial decision, (ii) internal nodes corresponding to decisions of players, and (iii) leaf nodes corresponding to outcomes.

[Game tree: player 1 chooses H or T at the root; at each of the two resulting nodes player 2 chooses H or T; the leaves carry the matching pennies payoffs, (1, −1) when the actions match and (−1, 1) otherwise.]
Example 7.3.2 (Matching Pennies without Observation). We have seen the matching pennies game in normal form where the players move simultaneously. Let us now consider the following new game: player 1 first makes a move, player 2 makes a move without observing player 1's action (and thus the name "without observation"), and then both players receive utilities. Observe that, when player 2 plays her action, she does not know which of the two nodes within the dotted box she is in. The dotted box is called an information set.

[Game tree: the same tree as above, with player 2's two decision nodes enclosed in a dotted box indicating a single information set.]
We now formally define extensive form games. For that, we need the concept of an information set.

Information Set

Definition 7.4. An information set of a player is a subset of that player's decision nodes which are indistinguishable to her.

For example, in the matching pennies game with observation, player 1 has one information set {ε} consisting of the empty history (corresponding to the root node) whereas player 2 has two information sets {H} and {T}, consisting of two sub-histories which are distinguishable for player 2. On the other hand, in the matching pennies game without observation, player 1 still has one information set {ε} consisting of the empty history and player 2 also has one information set, namely {H, T}, consisting of two sub-histories which are indistinguishable for player 2. We now define extensive form games.
Definition 7.5. An extensive form game Γ is a tuple ⟨N, (S_i)_{i∈N}, H, P, (I_i)_{i∈N}, C, (u_i)_{i∈N}⟩ where

B N is the set of players and S_i is the set of actions of player i ∈ N.

B H: the set of all paths to leaf nodes, called histories. SH is the set of all proper sub-histories, which are paths to internal nodes.

B P : SH −→ N maps each proper sub-history to the player who moves there.

B I_i: the set of information sets of player i ∈ N.

B C : ∪_{i∈N} I_i −→ 2^{∪_{i∈N} S_i} with C(J) ⊆ S_i for every J ∈ I_i, i ∈ N, maps each information set to the actions available at that information set.

B u_i maps each history to the utility of player i ∈ N.
Perfect Information and Imperfect Information Games

Definition 7.6. An extensive form game is called a perfect information game if every information set is a singleton. Otherwise, the extensive form game is called an imperfect information game.

For example, the matching pennies game with observation is a perfect information game whereas the matching pennies game without observation is an imperfect information game. The strategy set of player i ∈ N is {s_i : I_i −→ S_i | s_i(J) ∈ C(J) ∀ J ∈ I_i}. Like Bayesian games, any extensive form game can be expressed equivalently as a strategic form game.
Definition 7.7. Given an extensive form game Γ = ⟨N, (S_i)_{i∈N}, H, P, (I_i)_{i∈N}, C, (u_i)_{i∈N}⟩, the corresponding strategic form game Γ^s = ⟨N^s, (S′_i)_{i∈N^s}, (u′_i)_{i∈N^s}⟩ is given by

B N^s = N

B S′_i = {s_i : I_i −→ S_i | s_i(J) ∈ C(J) ∀ J ∈ I_i} for every i ∈ N

B u′_i(s_1, . . . , s_n) is the utility that player i receives in Γ at the leaf node reached when the players play according to (s_1, . . . , s_n).
Part II
Mechanism Design
Chapter 8
In the first part of the course, we analyzed various games and explored concepts to predict the behavior of a system consisting of rational and intelligent players. Now we explore the theory of mechanism design, which concerns designing the rules of the game to achieve a certain objective.
On a high level, mechanism design is the same as algorithm design except that some part of the input is held by strategic agents. So, in a generic mechanism design setting, we have a set of n players, each player i ∈ [n] has private data θ_i ∈ Θ_i, and our goal is to compute a function f : ×_{i∈[n]} Θ_i −→ X where X is the set of outcomes. To complete the picture, we need to define the utility function of every player i ∈ [n] (otherwise saying that players are rational and intelligent does not make sense). Each player i ∈ [n] has a utility function u_i : X × Θ −→ R where Θ = ×_{i=1}^n Θ_i. On top of that, we assume a prior distribution p ∈ ∆(Θ). The function f is called a social choice function. Whenever we talk about any social choice function f : ×_{i∈[n]} Θ_i −→ X, it should be understood that the n parts of the input are held by n different players with utility functions u_i, i ∈ [n].
Loosely speaking, the job of the mechanism designer is to come up with a mechanism (an algorithm)
which ensures that the social choice function f, that is under consideration, is indeed computed by the
mechanism on every input θ ∈ Θ. To achieve this goal, the mechanism designer introduces a game among
the players by defining two things: (i) strategy set Si for player i for every i ∈ N and (ii) a function
g : ×i∈[n] Si −→ X which decides the outcome based on strategies played by each player. We now formally
define what is called an indirect mechanism.
Indirect Mechanism

Definition 8.1. An indirect mechanism is a tuple M = ⟨N = [n], X, (Θ_i)_{i∈N}, (S_i)_{i∈N}, P ∈ ∆(Θ), g : ×_{i∈N} S_i −→ X, (u_i : X × Θ −→ R)_{i∈N}⟩ where

B S_i is the strategy set of player i ∈ N.

If N, X, (Θ_i)_{i∈N}, P, and (u_i)_{i∈N} are clear from the context, we will denote an indirect mechanism by ((S_i)_{i∈N}, g(·)).
As usual, we assume that the mechanism M (and hence the game it induces) is common knowledge among the players. A mechanism is called a direct mechanism if we have S_i = Θ_i for every player i ∈ [n] and g = f. Hence, by definition, the set of direct mechanisms forms a strict subset of the set of all indirect mechanisms. Typically N, X, (Θ_i)_{i∈N}, and u_i will be clear from the context, and we will assume that we are always given a prior distribution P ∈ ∆(Θ). Therefore, to avoid cumbersome notation, we will denote a direct mechanism by ((Θ_i)_{i∈N}, f(·)). Observe the simplicity of direct mechanisms: unlike indirect mechanisms, the mechanism designer does not need to worry about the choice of S_i, i ∈ [n], and g. Actually, given a social choice function f, there is nothing left to design in a direct mechanism. The mechanism designer simply asks every player to reveal his/her true type and applies f on the revealed type profile to choose an outcome from X.
Example 8.0.1 (Buying Auction). An auction, say for a buyer to buy an item from n potential sellers, is a mechanism.

B The set of players N: {0, 1, . . . , n}. Say player 0 is the buyer and the remaining players are the sellers.

B The set of outcomes X: tuples (a_0, . . . , a_n, p_0, . . . , p_n) of allocations and payments. Think of a_i = 1 if player i receives the item, 0 if player i neither receives nor gives the item, and −1 if player i gives the item. Similarly, p_i is the payment made by player i; if p_i is negative, that means player i receives payment. The condition Σ_{i=0}^n p_i = 0 guarantees that the mechanism neither generates any surplus money nor requires any money from outside. This is called the budget balance property.

B The set Θ_i is the set of all possible valuations of the item to player i.

B The set of strategies: S_i, i ∈ [n], is the set of possible bids for the item. Observe that the buyer also submits a bid. Think of it as the maximum amount of money the buyer is willing to pay for the item.

B The function g decides the outcome. For different types of auctions, g will be different.
B u_i((a_0, . . . , a_n, p_0, . . . , p_n), (θ_0, . . . , θ_n)) = a_i θ_i − p_i

Any mechanism M, together with the prior P, induces a Bayesian game Γ_M among the players in which the utility of player i under the strategy profile (s_1, . . . , s_n) and type profile (θ_1, . . . , θ_n) is

U_i(s_1, . . . , s_n, θ_1, . . . , θ_n) = u_i(g(s_1, . . . , s_n), θ_1, . . . , θ_n)
Towards answering the above question, we first need to formalize what we mean by computing f in a strategic environment. This notion is called implementability of a social choice function.

Definition 8.2. We say that an (indirect) mechanism M = ((S_i)_{i∈[n]}, g(·)) implements a social choice function f : Θ −→ X if there exists an equilibrium s*(·) = (s*_i(·))_{i∈[n]} of the induced Bayesian game Γ_M such that g(s*_1(θ_1), . . . , s*_n(θ_n)) = f(θ_1, . . . , θ_n) for every type profile (θ_1, . . . , θ_n) ∈ Θ.
If the equilibrium in Definition 8.2 happens to be a very weakly dominant strategy equilibrium, then we say that M implements f in dominant strategy equilibrium. In this case, we say the mechanism M is Dominant Strategy Incentive Compatible (DSIC in short). On the other hand, if the equilibrium is a pure strategy Bayesian Nash equilibrium, then we say that M implements f in Bayesian Nash equilibrium. In this case, we say M is Bayesian Incentive Compatible (BIC in short).

In the context of a buying auction, consider the following two social choice functions, using the same notation as Example 8.0.1.
73
Otherwise : a0 = 1, aj = −1, ai = 0 ∀ i ∈ [n] \ {j}
where ak > aj ∀ k ∈ [j − 1], ak > aj ∀ k ∈ {j + 1, . . . , n}
p0 = min θi , pj = −p0 , pi = 0 ∀ i ∈ [n] \ {j}
i∈[n]\{j}
We have seen that the second price auction mechanism implements f_sp in dominant strategies. One can also observe that there is no direct mechanism that implements f_fp even in a pure strategy Bayesian Nash equilibrium, simply because revealing types truthfully is not even a Nash equilibrium of the induced Bayesian game. We will show next that there is actually no mechanism that implements f_fp.
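A small sketch of the second price (procurement) rule f_sp above and a brute-force check of its dominant strategy property (the tie-breaking convention and helper names are our own assumptions):

def second_price_buying(bids):
    """bids[i] = bid of seller i+1. The lowest bidder (ties to the smallest
    index) sells the item and is paid the second lowest bid."""
    j = min(range(len(bids)), key=lambda i: (bids[i], i))
    price = min(b for i, b in enumerate(bids) if i != j)
    return j + 1, price              # 1-indexed winning seller and payment

# DSIC sanity check for seller 1 (valuation theta[0]): her utility
# (price - valuation if she sells, else 0) never improves by misreporting.
theta = [4.0, 7.0, 5.0]
j, price = second_price_buying(theta)            # truthful play
u_truth = price - theta[0] if j == 1 else 0.0
for lie in [3.0, 4.5, 6.0, 8.0]:
    bids = theta[:]; bids[0] = lie
    j2, price2 = second_price_buying(bids)
    u_lie = price2 - theta[0] if j2 == 1 else 0.0
    assert u_lie <= u_truth + 1e-9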
Theorem 8.1. Suppose there exists an indirect mechanism M = ((S_i)_{i∈[n]}, g(·)) that implements a social choice function f : ×_{i∈[n]} Θ_i −→ X in dominant strategy equilibrium. Then the direct mechanism ((Θ_i)_{i∈[n]}, f(·)) also implements f in dominant strategy equilibrium.

Proof. Since M implements f in dominant strategy equilibrium, there exists a very weakly dominant strategy equilibrium (s*_i(·))_{i∈[n]} of the Bayesian game Γ_M such that we have the following.

g(s*_1(θ_1), . . . , s*_n(θ_n)) = f(θ_1, . . . , θ_n)   ∀ θ ∈ Θ    (8.1)

u_i(g(s*_i(θ_i), s_{−i}), (θ_i, θ_{−i})) ≥ u_i(g(s′_i, s_{−i}), (θ_i, θ_{−i}))   ∀ i ∈ [n], θ ∈ Θ, s′_i ∈ S_i, s_{−i} ∈ S_{−i}    (8.2)

Using Equations (8.1) and (8.2), we have the following for every i ∈ [n], θ_i, θ′_i ∈ Θ_i, θ_{−i} ∈ Θ_{−i}.

u_i(f(θ_i, θ_{−i}), (θ_i, θ_{−i})) = u_i(g(s*_i(θ_i), s*_{−i}(θ_{−i})), (θ_i, θ_{−i})) ≥ u_i(g(s*_i(θ′_i), s*_{−i}(θ_{−i})), (θ_i, θ_{−i})) = u_i(f(θ′_i, θ_{−i}), (θ_i, θ_{−i}))

That is, truth-telling is a very weakly dominant strategy in the direct mechanism ((Θ_i)_{i∈[n]}, f(·)).
The above proof can be closely imitated to prove the revelation principle for BIC mechanisms too. The statement is as follows; we leave its proof to the reader.
Theorem 8.2. Suppose there exists an indirect mechanism M = ((Si )i∈[n] , g(·)) that implements
a social choice function f : ×i∈[n] Θi −→ X in a pure strategy Bayesian Nash equilibrium. Then the
74
direct mechanism ((Θi )i∈[n] , f(·)) also implements f in a pure strategy Bayesian Nash equilibrium.
Due to the revelation principle, we can now easily check the implementability of any social choice function f: the social choice function f : ×_{i∈[n]} Θ_i −→ X is implementable if and only if the direct mechanism ((Θ_i)_{i∈[n]}, f(·)) is incentive compatible. The next question that we would like to investigate is the following: which social choice functions are implementable?

Obviously not all social choice functions are implementable. For example, the social choice function f_fp is not implementable because the corresponding direct mechanism ((Θ_i)_{i∈[n]}, f_fp) is not incentive compatible. We will see that only a very limited class of social choice functions is implementable if we allow the players to have arbitrary utility functions. This result, namely the Gibbard-Satterwaite Theorem, is one of the cornerstone results in game theory. We grab this occasion to introduce some useful properties of social choice functions.
Ex-Post Efficiency

Definition 8.3. A social choice function f : ×_{i∈[n]} Θ_i −→ X is called ex-post efficient or Paretian or simply efficient if, for every type profile θ ∈ ×_{i∈[n]} Θ_i, the outcome f(θ) chosen by f is Pareto optimal. That is, for every type profile θ ∈ ×_{i∈[n]} Θ_i, there does not exist any outcome x ∈ X such that

u_i(x, θ) ≥ u_i(f(θ), θ) ∀ i ∈ [n]  and  u_j(x, θ) > u_j(f(θ), θ) for some j ∈ [n]
8.4.2 Non-Dictatorship

A social choice function f is called a dictatorship if there exists a player, called the dictator, such that the outcome that f chooses for any type profile is best possible for the dictator. Formally, the notion of dictatorship is defined as follows.

Non-Dictatorship

Definition 8.4. A social choice function f : ×_{i∈[n]} Θ_i −→ X is called a dictatorship if there exists a player d ∈ [n], called the dictator, such that we have the following for every type profile θ ∈ Θ.

u_d(f(θ), θ) ≥ u_d(x, θ)   ∀ x ∈ X

A social choice function is called non-dictatorial if it is not a dictatorship.
8.4.3 Individual Rationality

The property of individual rationality deals with voluntary participation of players in a mechanism. That is, when will a rational and self-interested player participate in the mechanism ((Θ_i)_{i∈[n]}, f(·))? To formalize this concept, suppose u_i(θ_i) is the utility of player i ∈ [n] if her type is θ_i ∈ Θ_i and she does not participate in the mechanism. The most well studied notion in this context is ex-post individual rationality or simply individual rationality.

Definition 8.5. A social choice function f : ×_{i∈[n]} Θ_i −→ X is said to be ex-post individually rational (or simply individually rational) if we have the following for every player i ∈ [n] and every type profile θ ∈ Θ.

u_i(f(θ), θ) ≥ u_i(θ_i)
We can see that ex-post individual rationality is relevant in applications where players are allowed to withdraw from the mechanism even after every player has revealed their type. Our next form of individual rationality, called interim individual rationality, is required in applications where a player is allowed to withdraw from the mechanism after she has learned her type but while the other players are yet to declare their types.

Definition 8.6. A social choice function f : ×_{i∈[n]} Θ_i −→ X is said to be interim individually rational if we have the following for every player i ∈ [n] and every θ_i ∈ Θ_i.

E_{θ_{−i}}[u_i(f(θ_i, θ_{−i}), (θ_i, θ_{−i})) | θ_i] ≥ u_i(θ_i)
The last type of individual rationality, called ex-ante individual rationality, ensures that players do not withdraw even before knowing their actual type.

Definition 8.7. A social choice function f : ×_{i∈[n]} Θ_i −→ X is said to be ex-ante individually rational if we have the following for every player i ∈ [n].

E_θ[u_i(f(θ), θ)] ≥ E_{θ_i}[u_i(θ_i)]
Remark: Observe that whether a given social choice function f is interim individually rational or ex-ante individually rational depends on the prior distribution P. Actually, it does not make sense to talk about interim or ex-ante individual rationality without fixing a prior P (whose existence we have always assumed without mentioning it explicitly every time). On the other hand, whether a social choice function is ex-post individually rational does not depend on the prior distribution at all.
Chapter 9
Gibbard-Satterwaite Theorem
We now present and prove the celebrated Gibbard-Satterwaite (GS in short) Theorem. The GS Theorem essentially says that, if players are allowed to have arbitrary utility functions, then, loosely speaking, only dictatorship social choice functions are dominant strategy incentive compatible. For the GS Theorem, we assume that the utility of a player i depends only on the outcome and her own type (that is, it does not depend on the types of the other players). This only makes the GS Theorem stronger. Under this assumption, we observe that any given type θ_i ∈ Θ_i of player i ∈ [n] induces a preference relation R_i on X defined as follows.

x R_i y ⇔ u_i(x, θ_i) ≥ u_i(y, θ_i)
The relation R_i is called the rational preference relation of player i. A player is said to have a strict rational preference relation if R_i is a total order. In the context of the GS Theorem, it is convenient to work directly with strict rational preference relations rather than type sets. We denote the set of all possible strict rational preference relations (which are total orders over X) by L(X). Hence a social choice function, in the context of the GS Theorem, maps a preference profile in L(X)^n to X. A social choice function is called unanimous if f(P_1, . . . , P_n) = x whenever x is the most preferred outcome in every P_i, i ∈ [n]. We now state the GS Theorem.

Theorem 9.1 (Gibbard-Satterwaite). Let |X| ≥ 3 and let f : L(X)^n −→ X be a unanimous social choice function for which the direct mechanism is dominant strategy incentive compatible. Then f is a dictatorship.
There are many known proofs of the GS Theorem, and most of them are completely accessible and use only elementary techniques. Here we present one such simple proof due to Benoît [Ben00]. (We discuss some popular ways to circumvent the theoretical bottleneck that the GS Theorem implies at the end of this chapter.)
Proof. If n = 1, then, by unanimity, f is a dictatorship and thus we have nothing to prove. Hence, let us assume that n ≥ 2. We first show that f is monotone in the following sense.

Claim 9.1.1. Let (P_i, P_{−i}) be a preference profile with f(P_i, P_{−i}) = A. Let P′_i be another preference where the position of an outcome B improves compared to P_i and the relative order of all other outcomes remains unchanged. Then f(P′_i, P_{−i}) is either A or B.

Proof. Suppose f(P′_i, P_{−i}) = C for some C ∈ X \ {A, B}. If C is preferred over A in P_i, then the i-th player in the profile (P_i, P_{−i}) can manipulate by misreporting her preference to be P′_i. On the other hand, if A is preferred over C in P_i, and thus in P′_i, then the i-th player in the profile (P′_i, P_{−i}) can manipulate by misreporting her preference to be P_i.
Fix any outcome, say B ∈ X. Consider an arbitrary profile P_0 where B appears at the last position in every preference. We place B at the first position, one preference at a time starting from the preference of player 1 and keeping the order of all other outcomes the same, until f outputs B. Suppose r is the first player such that, when we place B at the first position of the first r players' preferences, f chooses B. Such an r always exists since, due to unanimity, f chooses B if B appears at the first position of every player's preference. Call the resulting two profiles P_B (B is at the first position of the first r − 1 preferences and at the last position of the rest) and Q_B (B is additionally at the first position of the r-th preference). Hence, f does not choose B in P_B and chooses B in Q_B. We call r the pivotal player for B and will denote it by r_B when there is scope for confusion. In what follows, we will show that the r-th player is a dictator.
Observation 9.1.1. Let P′_B be any profile in which B appears at the last position in the preferences of players r, r + 1, . . . , n (the preferences of the first r − 1 players being arbitrary). Then f does not choose B in P′_B.

Proof. Starting from the profile P_B, replace the preferences of players 1 to n one at a time with their corresponding preferences from P′_B. Suppose there exists an i ∈ [n] such that f chooses the outcome B at the i-th step. Let the profiles at the (i − 1)-th and i-th steps be P^{i−1} and P^i respectively. If i < r, then the i-th player in P^{i−1} can manipulate by misreporting her preference to be the one in P^i. On the other hand, if i ≥ r, then the i-th player in P^i can manipulate by misreporting her preference to be the one in P^{i−1}.
Observation 9.1.2. Let Q′_B be any profile in which B appears at the first position in the preferences of players 1, 2, . . . , r (the preferences of the remaining players being arbitrary). Then f chooses B in Q′_B.

Proof. Starting from the profile Q′_B, replace the preferences of players 1 to n one at a time with their corresponding preferences from Q_B. Suppose f does not choose B in Q′_B; since f chooses B in Q_B, there exists a first step i ∈ [n] at which f switches to choosing B. Let the profiles at the (i − 1)-th and i-th steps be Q^{i−1} and Q^i respectively. If i ≤ r, then the i-th player in Q^{i−1} can manipulate by misreporting her preference to be the one in Q^i. On the other hand, if i > r, then the i-th player in Q^i can manipulate by misreporting her preference to be the one in Q^{i−1}.
We observe that Observations 9.1.1 and 9.1.2 imply that the pivotal player r_B does not depend on the profile P_0 we started with.
Observation 9.1.3. Let A ∈ X \ {B} and let R^A_B be any profile in which player r places A at the first position and every player places B at the last position. Then f chooses A in R^A_B.

Proof. From R^A_B, construct the profile R_1 by putting A at the first position of every preference and keeping the relative order of all other outcomes the same. By unanimity, f chooses A in R_1.

From R_1, put B at the first position in the first r − 1 preferences, keeping every other order the same, to obtain another profile R_2. By Claim 9.1.1 and Observation 9.1.1, f chooses A in R_2.

From R_2, put B at the second position in the r-th preference, keeping every other order the same, to obtain another profile R_3. By Claim 9.1.1, f chooses either B or A in R_3. However, if f chooses B in R_3, then the r-th player can manipulate in R_3 by misreporting her preference according to R_2. Hence f chooses A in R_3.

        1    ···   r−1    r    r+1   ···    n
        B    ···    B     A     A    ···    A
R_3 :   A    ···    A     B     ·    ···    ·
        ·    ···    ·     ·     ·    ···    ·
        ·    ···    ·     ·     B    ···    B
Now suppose f chooses some outcome D ≠ A in R^A_B (note D ≠ B by Observation 9.1.1). From R^A_B, put B at the first position in the first r − 1 preferences, keeping every other order the same, to obtain another profile R^A_{B,1}. By Claim 9.1.1 and Observation 9.1.1, f chooses D in R^A_{B,1}.

From R^A_{B,1}, put B at the second position in the r-th preference, keeping every other order the same, to obtain another profile R^A_{B,2}. Due to Claim 9.1.1, f chooses either D or B in R^A_{B,2}. If f chooses D in R^A_{B,2}, then the r-th player can manipulate by putting B at the first position since, by Observation 9.1.2, f chooses B in the resulting profile. On the other hand, if f chooses B in R^A_{B,2}, then we construct another profile R^A_{B,3} by putting A at the second position in every preference from 1 to r − 1 and at the first position in every preference from r + 1 to n, keeping the order of every other outcome the same (of course, one preference at a time). The function f continues to choose B: otherwise, there exists an i ∈ [n] \ {r} such that, when we modify the preference of the i-th player, B stops being chosen in the resulting profile. Let the profiles at the (i − 1)-th and i-th steps be R′_{i−1} and R′_i. If i < r, then player i can manipulate in R′_i by misreporting her preference to be as in profile R′_{i−1}; and if i > r, then player i can manipulate in R′_{i−1} by misreporting her preference to be as in profile R′_i.

However, the profile R^A_{B,3} is the same as R_3, where we have already argued that f chooses A. This is a contradiction. Hence f chooses A in R^A_B.
Observation 9.1.4. The player r is the pivotal player for every outcome. That is, for any C ∈ X \ {B}, we have r_C = r_B.

Proof. Let A ∈ X \ {B, C} be an outcome other than B and C (such an A exists since |X| ≥ 3). We consider the following profile R_3.

        1    ···  r_B−1   r_B   r_B+1  ···    n
        B    ···    B      B      A    ···    A
R_3 :   ·    ···    ·      ·      ·    ···    ·
        C    ···    C      C      C    ···    C

By Observation 9.1.2, f chooses B in R_3. Using Observation 9.1.3 with respect to C instead of B, f chooses the first outcome of the player r_C in R_3. Hence the first outcome of player r_C must be B, and so r_C ≤ r_B. Exchanging the roles of B and C in the argument so far, we have r_B ≤ r_C. Hence we have r_B = r_C.
Observation 9.1.5. The player r is a dictator for the social choice function f.
Proof. Consider an arbitrary profile R1 where player r places an outcome K ∈ X at the first position. We construct R2 by placing some B ∈ X \ {K} at the last position of every preference, keeping the relative order of every other pair of alternatives in every preference the same as in R1. Then, by Observation 9.1.3, f chooses K in R2. Next we restore B in every preference of R2 to its original position in R1, thereby reconstructing R1; now, by Claim 9.1.1, f chooses either B or K in R1. Repeating the above argument with another outcome C ∈ X \ {K, B}, f chooses either C or K in R1, and thus f chooses K in R1.
(ii) We can weaken our requirement and be satisfied with weaker equilibrium concepts, like Bayesian Nash equilibrium.

(iii) We can look for social choice functions for which manipulation, although possible, is computationally intractable, say NP-hard.
Chapter 10

Mechanism Design with Money

Among the most successful way-outs from the impossibility result of Gibbard and Satterthwaite is to work in the quasi-linear environment. Suppose we have n players. In this environment, every outcome x is a tuple (k, t_1, ..., t_n) where k belongs to some set K of allocations and t_i is the money received by player i (from the system). We adopt the convention that, if t_i < 0, then player i pays −t_i (to the system). Hence, in the quasi-linear environment, the set X of alternatives is

X = { (k, t_1, ..., t_n) : k ∈ K, t_i ∈ R ∀ i ∈ [n], Σ_{i∈[n]} t_i ≤ 0 }.
In the quasi-linear environment, the utility function of player i ∈ [n] has the following form: for an outcome x = (k, t_1, ..., t_n) ∈ X and type θ_i ∈ Θ_i,

u_i(x, θ_i) = v_i(k, θ_i) + t_i,

where v_i(k, θ_i) denotes the valuation of player i for the allocation k when her type is θ_i. A social choice function f(·) = (k(·), t_1(·), ..., t_n(·)) in this environment consists of two parts.

(i) An allocation function k : ×_{i∈[n]} Θ_i → K which decides the allocation for every type profile.

(ii) Payment functions t_i : ×_{i∈[n]} Θ_i → R, i ∈ [n], which decide the payment of player i for every type profile.
We say the allocation function k(·) is allocatively efficient (AE) if, for every θ ∈ Θ, k(θ) ∈ argmax_{k'∈K} Σ_{i=1}^n v_i(k', θ_i), or, alternatively,

Σ_{i=1}^n v_i(k(θ), θ_i) = max_{k'∈K} Σ_{i=1}^n v_i(k', θ_i).
10.2 Budget Balance (BB)

Budget balance is a property of the payment functions t_1(·), ..., t_n(·) in the quasi-linear environment. We say a social choice function f(·) = (k(·), t_1(·), ..., t_n(·)) is strongly (or strictly) budget balanced if, for each θ ∈ Θ (= ×_{i∈[n]} Θ_i), we have

Σ_{i∈[n]} t_i(θ) = 0,

and weakly budget balanced if, for each θ ∈ Θ, we have

Σ_{i∈[n]} t_i(θ) ≤ 0.

We will assume that the payment functions are weakly budget balanced.
Let us see what we achieved by restricting ourselves to quasi-linear environment.
Observation 10.1. If there are at least two players, then no social choice function in the quasi-linear environment is a dictatorship.

Proof. Let n (≥ 2) be the number of players. For the sake of finding a contradiction, let f(·) = (k(·), t_1(·), ..., t_n(·)) be a dictatorial social choice function with dictator d. Let ε > 0 be any positive real number. For any θ ∈ Θ, consider the outcome x ∈ X obtained from f(θ) by transferring ε units of money to the dictator from some other player j ≠ d; that is, x = (k(θ), t_1(θ), ..., t_d(θ) + ε, ..., t_j(θ) − ε, ..., t_n(θ)). We observe that u_d(x, θ_d) = u_d(f(θ), θ_d) + ε > u_d(f(θ), θ_d), which contradicts our assumption that f is a dictatorial social choice function with dictator d, since the dictator must receive a most preferred outcome.
Next we see that, in quasi-linear environments, ex-post efficient social choice functions are characterized by allocative efficiency and budget balance together.

Observation 10.2. A social choice function is ex-post efficient in a quasi-linear environment if and only if it is allocatively efficient and strictly budget balanced.
Proof. For the if direction: let us assume that a social choice function f(·) = (k(·), t_1(·), ..., t_n(·)) is allocatively efficient and strictly budget balanced. Then, for every θ ∈ Θ and every x = (k, t_1, ..., t_n) ∈ X, we have the following, which proves that f is ex-post efficient.

Σ_{i∈[n]} u_i(f(θ), θ_i) = Σ_{i∈[n]} v_i(k(θ), θ_i) + Σ_{i∈[n]} t_i(θ)
                         = Σ_{i∈[n]} v_i(k(θ), θ_i)                      [strict budget balance]
                         ≥ Σ_{i∈[n]} v_i(k, θ_i) + Σ_{i∈[n]} t_i        [allocative efficiency and Σ_{i∈[n]} t_i ≤ 0]
                         = Σ_{i∈[n]} u_i(x, θ_i)
For the only if direction: let us assume that a social choice function f(·) = (k(·), t_1(·), ..., t_n(·)) is ex-post efficient. We first claim that f is allocatively efficient. Suppose not; then there exist a type profile θ ∈ Θ and an allocation k ∈ K such that

Σ_{i∈[n]} v_i(k(θ), θ_i) < Σ_{i∈[n]} v_i(k, θ_i).

In particular, there exists an agent j ∈ [n] such that v_j(k(θ), θ_j) < v_j(k, θ_j). Let ε = Σ_{i∈[n]} v_i(k, θ_i) − Σ_{i∈[n]} v_i(k(θ), θ_i); we observe that ε > 0. We consider the outcome x = (k, t_1, ..., t_n) defined by t_i = t_i(θ) + v_i(k(θ), θ_i) − v_i(k, θ_i) for every i ∈ [n] \ {j}, and t_j = t_j(θ) + v_j(k(θ), θ_j) − v_j(k, θ_j) + ε. Note that Σ_{i∈[n]} t_i = Σ_{i∈[n]} t_i(θ) ≤ 0, so x ∈ X. We now have the following for every agent i ∈ [n] \ {j}:

u_i(x, θ_i) = v_i(k, θ_i) + t_i = t_i(θ) + v_i(k(θ), θ_i) = u_i(f(θ), θ_i),

whereas u_j(x, θ_j) = u_j(f(θ), θ_j) + ε > u_j(f(θ), θ_j). This contradicts our assumption that f is ex-post efficient.
We now show that f must be strictly budget balanced. Suppose not; then there exists a type profile θ ∈ Θ such that Σ_{i∈[n]} t_i(θ) < 0. We consider the outcome x defined as

x = ( k(θ), t_1 = t_1(θ) − Σ_{i∈[n]} t_i(θ), (t_i = t_i(θ))_{i≠1} ).

Then we have Σ_{i∈[n]} t_i = 0, and thus x ∈ X. Also, u_1(x, θ_1) > u_1(f(θ), θ_1), while u_i(x, θ_i) = u_i(f(θ), θ_i) for every agent i ∈ [n] \ {1}. However, this contradicts our assumption that f is ex-post efficient.
Groves Theorem
Theorem 10.1. Let f(·) = (k*(·), t_1(·), ..., t_n(·)) be an allocatively efficient social choice function. Then f(·) is dominant strategy incentive compatible if the payment functions satisfy the following structure (known as the Groves payment (or incentive) scheme):

t_i(θ_i, θ_{−i}) = Σ_{j≠i} v_j(k*(θ), θ_j) + h_i(θ_{−i})    ∀ i ∈ [n],

where each h_i(·) is an arbitrary function of θ_{−i}.
Proof. Suppose not; then there exist a player i ∈ [n] and types θ_i, θ'_i ∈ Θ_i, θ_{−i} ∈ Θ_{−i} such that

v_i(k*(θ'_i, θ_{−i}), θ_i) + t_i(θ'_i, θ_{−i}) > v_i(k*(θ_i, θ_{−i}), θ_i) + t_i(θ_i, θ_{−i})

⇒ Σ_{j∈[n]} v_j(k*(θ'_i, θ_{−i}), θ_j) > Σ_{j∈[n]} v_j(k*(θ_i, θ_{−i}), θ_j),

which contradicts the allocative efficiency of k*(·) at the profile (θ_i, θ_{−i}).
Remark: Observe that, for any type profile θ_{−i} of players other than i, the money received by player i depends on her own type θ_i only through the allocation k*(θ_i, θ_{−i}). The change in money received by player i if she changes her type from θ_i to θ'_i, given that the other players report θ_{−i}, is

t_i(θ_i, θ_{−i}) − t_i(θ'_i, θ_{−i}) = Σ_{j∈[n]\{i}} [ v_j(k*(θ_i, θ_{−i}), θ_j) − v_j(k*(θ'_i, θ_{−i}), θ_j) ].

In other words, the change in the money that player i receives is the externality she imposes on the system.
The direct revelation mechanism of a social choice function f which is allocatively efficient and satisfies the Groves payment scheme is called a Groves mechanism.
A particularly important member of the Groves class is obtained by choosing h_i(θ_{−i}) = −Σ_{j≠i} v_j(k*_{−i}(θ_{−i}), θ_j), where k*_{−i}(·) is an allocatively efficient allocation rule for the system without player i. The resulting payments,

t_i(θ) = Σ_{j≠i} v_j(k*(θ), θ_j) − Σ_{j≠i} v_j(k*_{−i}(θ_{−i}), θ_j),

define the Clarke mechanism. Hence the money that player i receives is the total value of all agents other than i in an efficient allocation in the presence of i, minus their total value in an efficient allocation in the absence of i.

The Clarke mechanism is also sometimes called the pivotal mechanism, since each allocated player plays a pivotal role in deciding the value that the other players receive in an efficient allocation due to her presence in the system.

Another interpretation of the Clarke payment scheme comes from the following observation. Writing

t_i(θ) = γ_i − v_i(k*(θ), θ_i), where γ_i = Σ_{j∈[n]} v_j(k*(θ), θ_j) − Σ_{j≠i} v_j(k*_{−i}(θ_{−i}), θ_j),

the quantity γ_i is always non-negative (for non-negative valuations), and thus player i is given a discount of γ_i from her valuation v_i(k*(θ), θ_i) while calculating the amount she needs to pay. This quantity γ_i is called the Vickrey discount for player i.
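To make the Clarke scheme concrete, here is a minimal Python sketch; the data layout, the function names, and the brute-force search are illustrative assumptions, not part of the text. Here k*_{−i} is modelled as the allocation maximizing the others' total valuation, i.e., the system simply ignores player i's valuation in her absence.

```python
def clarke(K, v):
    """Clarke (pivotal) mechanism by brute force.

    K : finite list of allocations.
    v : list of valuation functions; v[i](k) is player i's value for allocation k.
    Returns an efficient allocation k_star and the Clarke payments t, where
    t[i] is the money *received* by player i (negative means player i pays).
    """
    n = len(v)

    def best(players):
        # an allocation maximizing the total valuation of the given players
        return max(K, key=lambda k: sum(v[j](k) for j in players))

    k_star = best(range(n))                      # efficient allocation k*(theta)
    t = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        k_minus_i = best(others)                 # efficient allocation without i
        # t_i = sum_{j != i} v_j(k*) - sum_{j != i} v_j(k*_{-i})
        t.append(sum(v[j](k_star) for j in others)
                 - sum(v[j](k_minus_i) for j in others))
    return k_star, t
```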
Example 10.5.1 (Vickrey Auction for Selling Multiple Identical Items). VCG mechanism (a.k.a. Vickrey auction) for selling multiple identical units of an item by one seller to multiple buyers.

▷ Suppose we have one seller with 3 identical units of an item and 5 buyers. The set of players is N = {0, 1, 2, ..., 5}: player 0 is the seller, and players 1, ..., 5 are the 5 potential buyers, each wanting to buy one unit of the item. Suppose the valuations of one unit of the item for the buyers are respectively 20, 15, 12, 10, and 8.
▷ Suppose we use the Clarke payment rule. Since VCG mechanisms are DSIC, we may assume without loss of generality that every buyer bids her valuation.
▷ Any allocatively efficient rule (k*(·) and k*_{−i}(·), i ∈ [5], in particular) will allocate the 3 units to the 3 buyers (one unit each) having the highest valuations. Hence buyers 1, 2, and 3 receive one unit each.
▷ The Clarke payment received by buyer 1 is Σ_{j≠1} v_j(k*(θ), θ_j) − Σ_{j≠1} v_j(k*_{−1}(θ_{−1}), θ_j) = (15 + 12) − (15 + 12 + 10) = −10; that is, buyer 1 pays 10. A similar calculation shows that the payment received by buyers 2 and 3 is also −10 (that is, they each pay 10). Thus the Vickrey discounts to buyers 1, 2, and 3 are respectively 20 − 10 = 10, 15 − 10 = 5, and 12 − 10 = 2.
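These numbers can be checked with the clarke sketch given earlier; encoding allocations as 3-element subsets of buyers is an illustrative choice.

```python
from itertools import combinations

values = [20, 15, 12, 10, 8]
K = list(combinations(range(5), 3))        # all ways to give 3 units to 5 buyers
v = [lambda k, i=i: values[i] if i in k else 0 for i in range(5)]

k_star, t = clarke(K, v)
print(k_star, t)   # expected: (0, 1, 2) and payments [-10, -10, -10, 0, 0]
```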
Example 10.5.2 (Combinatorial Auction). VCG mechanism for selling multiple non-identical items by one seller to multiple buyers.

▷ Suppose we have one seller with two items, namely A and B, and 3 buyers. The set of players is N = {0, 1, 2, 3}: player 0 is the seller, and players 1, 2, and 3 are the 3 potential buyers, each interested in buying a bundle of the items. The valuation of each player for the different bundles of items is shown below; the notation ? means that the player is not interested in that bundle.
▷ Suppose we use the Clarke payment rule. Since VCG mechanisms are DSIC, we may assume without loss of generality that every buyer reports her valuation function truthfully.

▷ Any allocatively efficient rule (k*(·) and k*_{−i}(·), i ∈ [3], in particular) will allocate the bundle {A, B} to buyer 1 and nothing to buyers 2 and 3.
[Figure 10.1: a network with vertices s, A, B, and t, where each edge is owned by a service provider: s → A (SP1, cost 10), A → t (SP2, cost 7), s → B (SP3, cost 5), B → t (SP4, cost 10), and B → A (SP5, cost 2).]
Example 10.5.3. VCG mechanism for strategic network formation.

▷ Consider the network in Figure 10.1, each of whose edges belongs to some service provider. Each service provider has a private type denoting the cost she incurs if one uses her edge; her valuation is the negative of this cost if her edge is used and 0 otherwise. We wish to buy a path from s to t in this strategic environment. We refer to the edge held by SPi as edge i, for i ∈ [5].

▷ We use the VCG mechanism to learn the types of the players. The set K of allocations is the set of vectors (x_1, ..., x_5) ∈ {0, 1}^5 such that the edges i with x_i = 1 form a path from s to t.

▷ Any allocatively efficient allocation rule (k*(·) and k*_{−i}(·), i ∈ [5], in particular) will choose a shortest path from s to t. Hence the allocation rule chooses the path s → B → A → t, i.e. (0, 1, 1, 0, 1).
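The following sketch reproduces this example end to end; the edge encoding and all names are illustrative assumptions. The money received by a provider on the winning path works out to her cost plus the increase in shortest-path cost caused by removing her edge.

```python
# Edges of Figure 10.1: provider index -> (tail, head, cost).
edges = {1: ("s", "A", 10), 2: ("A", "t", 7), 3: ("s", "B", 5),
         4: ("B", "t", 10), 5: ("B", "A", 2)}

def all_paths(avoid=frozenset()):
    """All s-t paths, as sets of provider indices, avoiding edges in `avoid`."""
    paths, stack = [], [("s", frozenset(), frozenset())]
    while stack:
        node, used, visited = stack.pop()
        if node == "t":
            paths.append(used)
            continue
        for i, (tail, head, _) in edges.items():
            if tail == node and i not in avoid and head not in visited:
                stack.append((head, used | {i}, visited | {node}))
    return paths

def cost(path):
    return sum(edges[i][2] for i in path)

shortest = min(all_paths(), key=cost)      # {2, 3, 5}, i.e. (0, 1, 1, 0, 1)
for i in sorted(shortest):
    d_minus_i = min(cost(p) for p in all_paths(avoid={i}))
    received = edges[i][2] + (d_minus_i - cost(shortest))  # Clarke payment to SPi
    print(f"SP{i} receives {received}")    # SP2 -> 8, SP3 -> 8, SP5 -> 3
```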
The VCG mechanism shows that an allocatively efficient allocation rule is implementable in dominant strategy equilibrium using suitable payment rules. One can ask which other allocation rules are implementable in dominant strategy equilibrium using some payment rule. It turns out that a generalization of allocatively efficient allocation functions, called affine maximizers, characterizes the class of allocation functions which are implementable in dominant strategy equilibrium (under the conditions of Theorem 10.3 below). The generalization allows giving weights to players and to allocations, and allows restricting the set of feasible allocations.
Affine Maximizer

Definition 10.1. An allocation rule k : ×_{i∈[n]} Θ_i → K is called an affine maximizer if there exist K' ⊆ K, w_1, ..., w_n ∈ R, and c_{k'} ∈ R for every k' ∈ K' such that, for every θ ∈ Θ, we have

k(θ) ∈ argmax_{k'∈K'} ( c_{k'} + Σ_{i∈[n]} w_i v_i(k', θ_i) ).
The Groves payment scheme can be generalized to affine maximizers as follows. The proof of Theorem 10.2 closely follows the proof of Groves' theorem.
Theorem 10.2. Let f(·) = (k*(·), t_1(·), ..., t_n(·)) be a social choice function such that k*(·) is an affine maximizer (with weights w_i > 0). Then f(·) is dominant strategy incentive compatible if the payment functions satisfy the following structure:

t_i(θ_i, θ_{−i}) = Σ_{j≠i} (w_j / w_i) v_j(k*(θ), θ_j) + c_{k*(θ)} / w_i + h_i(θ_{−i})    ∀ i ∈ [n]

Proof. Suppose not; then there exist a player i ∈ [n] and θ_i, θ'_i ∈ Θ_i, θ_{−i} ∈ Θ_{−i} such that

v_i(k*(θ'_i, θ_{−i}), θ_i) + t_i(θ'_i, θ_{−i}) > v_i(k*(θ_i, θ_{−i}), θ_i) + t_i(θ_i, θ_{−i})

⇒ c_{k*(θ'_i, θ_{−i})} + Σ_{j∈[n]} w_j v_j(k*(θ'_i, θ_{−i}), θ_j) > c_{k*(θ_i, θ_{−i})} + Σ_{j∈[n]} w_j v_j(k*(θ_i, θ_{−i}), θ_j),

which contradicts the fact that k*(·) is an affine maximizer.
Roberts' theorem states that, for unrestricted domains, affine maximizers are essentially the only allocation rules which are dominant strategy incentive compatible.

Roberts' Theorem

Theorem 10.3. If |K| ≥ 3, the allocation rule k*(·) is onto, Θ is unrestricted, and the allocation rule k*(·) is implementable in dominant strategy, then k*(·) is an affine maximizer.

The proof of Theorem 10.3 is nontrivial; we refer to [NRTV07, Chapter 12] for a proof. Let us see another useful characterization of DSIC mechanisms in the quasi-linear environment which is (i) easy to prove and (ii) does not depend on the domains Θ_i.
Characterization of DSIC Mechanisms

Proposition 10.1. A social choice function f(·) = (k(·), t_1(·), ..., t_n(·)) is DSIC if and only if the following conditions hold for every i ∈ [n] and θ_{−i} ∈ Θ_{−i}.

(i) The payment t_i(θ) of player i depends on θ_i only via k(θ_i, θ_{−i}). That is, for any two θ_i, θ'_i ∈ Θ_i such that k(θ_i, θ_{−i}) = k(θ'_i, θ_{−i}), we have t_i(θ_i, θ_{−i}) = t_i(θ'_i, θ_{−i}). In other words, fixing θ_{−i}, t_i is a function of the allocation only; write it as t_i(·).

(ii) The allocation function simultaneously optimizes for each player. That is, for every θ_i ∈ Θ_i, k(θ_i, θ_{−i}) ∈ argmax_{x ∈ k(Θ_i, θ_{−i})} ( v_i(x, θ_i) + t_i(x) ), where k(Θ_i, θ_{−i}) = { k(θ'_i, θ_{−i}) : θ'_i ∈ Θ_i } is the set of allocations that player i can induce.
Proof. If part: Let θ_i, θ'_i ∈ Θ_i and θ_{−i} ∈ Θ_{−i}. Then we have

v_i(k(θ_i, θ_{−i}), θ_i) + t_i(θ_i, θ_{−i}) = v_i(k(θ_i, θ_{−i}), θ_i) + t_i(k(θ_i, θ_{−i})) ≥ v_i(k(θ'_i, θ_{−i}), θ_i) + t_i(k(θ'_i, θ_{−i})) = v_i(k(θ'_i, θ_{−i}), θ_i) + t_i(θ'_i, θ_{−i}),

where the inequality follows from (ii) since k(θ'_i, θ_{−i}) ∈ k(Θ_i, θ_{−i}). Hence reporting truthfully is a dominant strategy.
Only if part; property (i): Suppose for some θ_i, θ'_i ∈ Θ_i and θ_{−i} ∈ Θ_{−i} with k(θ_i, θ_{−i}) = k(θ'_i, θ_{−i}), we have t_i(θ_i, θ_{−i}) > t_i(θ'_i, θ_{−i}). Then, in the profile (θ'_i, θ_{−i}), player i is better off misreporting her type to be θ_i instead of θ'_i, which contradicts the DSIC assumption on f.

Only if part; property (ii): Suppose for some θ_i, θ'_i ∈ Θ_i and θ_{−i} ∈ Θ_{−i}, we have

v_i(k(θ'_i, θ_{−i}), θ_i) + t_i(θ'_i, θ_{−i}) > v_i(k(θ_i, θ_{−i}), θ_i) + t_i(θ_i, θ_{−i}).

Then, in the profile (θ_i, θ_{−i}), player i is better off misreporting her type to be θ'_i instead of θ_i, which contradicts the DSIC assumption on f.
We will next show that the assumption |K| ≥ 3 in Theorem 10.3 is essential. The case |K| = 2 falls into the category called the "single-parameter" domain, and in single-parameter domains there are allocation rules which are not affine maximizers but are nevertheless dominant strategy incentive compatible.
Single-Parameter Domain

Definition 10.2. A domain is a (stylized) single-parameter domain if, for every player i ∈ [n], there exist a set K_i ⊆ K of allocations and a real interval [t⁰, t¹] such that the type of player i is a single real number θ_i ∈ [t⁰, t¹] and her valuation is v_i(k, θ_i) = θ_i if k ∈ K_i and 0 otherwise. The number θ_i is the private information of player i, whereas the set K_i and the real interval [t⁰, t¹] are common knowledge. Intuitively speaking, we can think of K_i as the set of allocations where player i "wins."
Hence, in the single-parameter domain, we assume without loss of generality that θ_i is a real number (the single real parameter). We will characterize the set of implementable social choice functions in the single-parameter setting. Towards that, we first define the notion of monotonicity.

Definition 10.3. An allocation rule k* : Θ → K is monotone if, for every i ∈ [n], θ_{−i} ∈ Θ_{−i}, and θ_i ≤ θ'_i, k*(θ_i, θ_{−i}) ∈ K_i implies k*(θ'_i, θ_{−i}) ∈ K_i. That is, a winning player keeps winning if she raises her reported value.

We will show that every monotone allocation rule in a single-parameter domain is implementable using a suitable payment rule. To define the payment rule, we next define the notion of the critical value function.
Definition 10.4. Given an allocation rule k*(·), the critical value c_i(θ_{−i}) of player i for a profile θ_{−i} ∈ Θ_{−i} is defined as

c_i(θ_{−i}) = sup { θ_i ∈ Θ_i : k*(θ_i, θ_{−i}) ∉ K_i }.

For some θ_{−i} ∈ Θ_{−i}, if there does not exist any θ_i ∈ Θ_i such that k*(θ_i, θ_{−i}) ∉ K_i (that is, player i always wins), then c_i(θ_{−i}) is undefined. Intuitively, the critical value is the real number above which the player wins and below which the player loses.
We now state and prove the main result of mechanism design in the single-parameter domain.

Theorem 10.4. A social choice function f(·) = (k*(·), t_1(·), ..., t_n(·)) in a single-parameter domain is DSIC with losers paying nothing if and only if the following conditions hold:

(i) The allocation rule k*(·) is monotone.

(ii) Every winning player pays exactly her critical value. That is,

    t_i(θ_i, θ_{−i}) = −c_i(θ_{−i})   if k*(θ_i, θ_{−i}) ∈ K_i,
    t_i(θ_i, θ_{−i}) = 0              otherwise.

For some θ_{−i} ∈ Θ_{−i}, if c_i(θ_{−i}) is undefined, then we define c_i(θ_{−i}) = λ_i, where λ_i is any fixed real number.
Proof. If part: For any i ∈ [n], θ_i ∈ Θ_i, and θ_{−i} ∈ Θ_{−i}, if player i wins, then her utility is θ_i − c_i(θ_{−i}), and if she loses, then her utility is 0. Hence she prefers winning if and only if θ_i ≥ c_i(θ_{−i}), which is exactly what she achieves by reporting θ_i truthfully (by monotonicity, she wins precisely when her report exceeds the critical value).

Only if part, monotonicity condition: Suppose k*(·) is not monotone for some player i ∈ [n]; then there exist θ_i, θ'_i ∈ Θ_i with θ_i < θ'_i and θ_{−i} ∈ Θ_{−i} such that k*(θ_i, θ_{−i}) ∈ K_i but k*(θ'_i, θ_{−i}) ∉ K_i. Since f is DSIC, player i weakly prefers winning in the profile (θ_i, θ_{−i}); thus θ_i − c_i(θ_{−i}) ≥ 0. Again since f is DSIC, player i weakly prefers losing in the profile (θ'_i, θ_{−i}); thus θ'_i − c_i(θ_{−i}) ≤ 0. Hence θ'_i ≤ c_i(θ_{−i}) ≤ θ_i, which contradicts our assumption that θ_i < θ'_i. Hence k*(·) is monotone.
Only if part, payment rule: From Proposition 10.1, we know that, for every i ∈ [n], θ_i, θ'_i ∈ Θ_i, and θ_{−i} ∈ Θ_{−i} such that k*(θ_i, θ_{−i}), k*(θ'_i, θ_{−i}) ∈ K_i, we have t_i(θ_i, θ_{−i}) = t_i(θ'_i, θ_{−i}); that is, all winning types of player i pay the same amount. Suppose the payment rule is not the one specified in the statement. Then there exist θ_i ∈ Θ_i and θ_{−i} ∈ Θ_{−i} such that player i wins in (θ_i, θ_{−i}) but t_i(θ_i, θ_{−i}) ≠ −c_i(θ_{−i}). First suppose −t_i(θ_i, θ_{−i}) > c_i(θ_{−i}). Then there exists a type θ'_i ∈ Θ_i such that −t_i(θ_i, θ_{−i}) > θ'_i > c_i(θ_{−i}), and by monotonicity player i wins in (θ'_i, θ_{−i}) with utility θ'_i + t_i(θ'_i, θ_{−i}) = θ'_i + t_i(θ_i, θ_{−i}) < 0. Hence, in the profile (θ'_i, θ_{−i}), player i is better off reporting some type θ_i^{lose} < c_i(θ_{−i}) and getting utility 0 by losing, which contradicts the DSIC assumption on f. On the other hand, if −t_i(θ_i, θ_{−i}) < c_i(θ_{−i}), then there exists a type θ'_i ∈ Θ_i such that −t_i(θ_i, θ_{−i}) < θ'_i < c_i(θ_{−i}), and player i loses in (θ'_i, θ_{−i}). Then, in the profile (θ'_i, θ_{−i}), player i is better off misreporting θ_i and receiving utility θ'_i + t_i(θ_i, θ_{−i}) > 0 by winning than reporting her true type θ'_i and receiving utility 0 by losing.
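As a sanity check of Theorem 10.4, here is a minimal sketch (all names are illustrative) of the mechanism it prescribes for the simplest monotone allocation rule: a single-item auction that allocates to the highest report. The critical value of the winner is the highest competing report, so the prescribed payment is exactly the second price.

```python
def single_item_auction(theta):
    """Highest report wins (a monotone rule); winner pays her critical value.

    theta : list of reported values (each player's single real parameter).
    Returns (winner, payments), payments[i] being the money received by i.
    By Theorem 10.4, this mechanism is DSIC and losers pay nothing.
    """
    n = len(theta)
    winner = max(range(n), key=lambda i: theta[i])
    critical = max(theta[j] for j in range(n) if j != winner)  # c_i(theta_{-i})
    payments = [0.0] * n
    payments[winner] = -critical       # the winner pays her critical value
    return winner, payments

print(single_item_auction([20, 15, 12]))   # (0, [-15, 0.0, 0.0])
```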
We see that Theorem 10.4 leaves enough room for implementing allocation rules which are not affine maximizers: all we need is an allocation rule that is monotone in each component. For example, in a single-parameter domain, the allocation rule k*(θ) = argmax_{k∈K} Σ_{i=1}^n v_i(k, θ_i)^λ is implementable for any constant λ ≥ 1. Recall that we have assumed a stylized special case of the single-parameter domain (refer to Definition 10.2) and shown in Theorem 10.4 that monotone allocation rules are exactly the rules which are implementable in dominant strategy in this domain. The celebrated Myerson's lemma shows that, in every single-parameter domain, monotone allocation rules are exactly the allocation rules which are implementable in dominant strategy. We now formally state Myerson's lemma and refer to the lecture notes of Prof. Tim Roughgarden for its proof.
Myerson's Lemma

(i) An allocation rule k is implementable in dominant strategy if and only if it is monotone.

(ii) If k is monotone, then there is a unique payment rule (t_1, ..., t_n) under which players bidding 0 pay 0 (such payment rules are called normalized) and the mechanism (k, t_1, ..., t_n) is DSIC.

(iii) For differentiable k, the payment rule is given by the explicit formula

t_i(θ_i, θ_{−i}) = − ∫_0^{θ_i} z (d/dz) k_i(z, θ_{−i}) dz,

where k_i(·, θ_{−i}) is the allocation curve of player i. For other types of allocation rules k, similar payment rules can be used to implement k.
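As a quick sanity check of the formula, consider the single-item auction discussed above: fixing θ_{−i}, the allocation k_i(z, θ_{−i}) jumps from 0 to 1 at the critical bid B = max_{j≠i} θ_j. Interpreting the derivative as a unit point mass at B,

t_i(θ_i, θ_{−i}) = − ∫_0^{θ_i} z (d/dz) k_i(z, θ_{−i}) dz = − ∫_0^{θ_i} z δ(z − B) dz = −B if θ_i > B, and 0 otherwise,

which is exactly the critical-value (second-price) payment prescribed by Theorem 10.4.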
We finally conclude this chapter by stating that the payment functions are essentially unique in DSIC mechanisms (in particular, in VCG mechanisms). The proof is not very difficult, and we refer interested readers to [NRTV07, Theorem 9.37] for it.

Theorem 10.5. Assume Θ_i is connected in some Euclidean space for every i ∈ [n]. Let (f, t_1, ..., t_n) be a DSIC mechanism. Then a mechanism (f, t'_1, ..., t'_n) is DSIC if and only if, for every i ∈ [n], there exists a function h_i : Θ_{−i} → R such that

t'_i(θ_i, θ_{−i}) = t_i(θ_i, θ_{−i}) + h_i(θ_{−i}).
We have a good understanding of DSIC allocation rules in two extreme cases: (i) when each Θ_i is unrestricted and (ii) when each Θ_i is single-dimensional. For these two extremes we know the set of DSIC allocation rules exactly, by Theorems 10.3 and 10.4. What about other domains? We now present a partial characterization for convex domains – each Θ_i a convex set in some Euclidean space. For this, we introduce the notion of weak monotonicity.
Weak Monotonicity

Definition 10.5. An allocation rule k* : Θ → K is called weakly monotone if, for every i ∈ [n], θ_i, θ'_i ∈ Θ_i, and θ_{−i} ∈ Θ_{−i} with k*(θ_i, θ_{−i}) = x ≠ y = k*(θ'_i, θ_{−i}), we have

v_i(x, θ_i) − v_i(y, θ_i) ≥ v_i(x, θ'_i) − v_i(y, θ'_i).
Theorem 10.6. If a mechanism (k*, t_1, ..., t_n) is DSIC, then k* is weakly monotone. On the other hand, if Θ_i is a convex set for every i ∈ [n], then, for every weakly monotone allocation rule k*, there exist payment rules t_1, ..., t_n such that the mechanism (k*, t_1, ..., t_n) is DSIC.

The first part of the statement above is easy to prove; proving the second part is quite nontrivial, and we refer the interested reader to [BCL+06, ABHM10].

Proof of the first part. Suppose the mechanism (k*, t_1, ..., t_n) is DSIC, and fix i ∈ [n], θ_i, θ'_i ∈ Θ_i, and θ_{−i} ∈ Θ_{−i} with k*(θ_i, θ_{−i}) = x ≠ y = k*(θ'_i, θ_{−i}). DSIC at the profile (θ_i, θ_{−i}) gives v_i(x, θ_i) + t_i(θ_i, θ_{−i}) ≥ v_i(y, θ_i) + t_i(θ'_i, θ_{−i}), and DSIC at the profile (θ'_i, θ_{−i}) gives v_i(y, θ'_i) + t_i(θ'_i, θ_{−i}) ≥ v_i(x, θ'_i) + t_i(θ_i, θ_{−i}). Adding the two inequalities and cancelling the payments, we obtain v_i(x, θ_i) − v_i(y, θ_i) ≥ v_i(x, θ'_i) − v_i(y, θ'_i), which is exactly weak monotonicity.
10.9 A Glimpse of Algorithmic Mechanism Design: Knapsack Allocation

In the classical Knapsack problem, we have a set I of n items and a bag of size W. Each item i ∈ [n] has a weight w_i and a valuation v_i. The goal is to find a subset S ⊆ I of items of maximum total value Σ_{j∈S} v_j subject to the constraint that Σ_{j∈S} w_j ≤ W. The quantity Σ_{j∈S} v_j is called the societal surplus, or simply the surplus, of S. Let us consider the Knapsack problem in a strategic environment: we have n players, each holding an item. The weight w_i of the item of player i is common knowledge, as is the size W of the knapsack; however, the valuation v_i of item i is known to player i only. For a motivating example, think of n advertisers who want to show their ads during a popular TV show. The total duration W of the commercial break of the TV show and the duration w_i of each ad i are publicly known, whereas the valuation v_i of ad i is known to advertiser i only. The utility of advertiser i is v_i if her ad is chosen and 0 otherwise. A natural goal in this application is to select a subset of ads with maximum total valuation subject to the constraint that the total duration of the chosen ads is at most W. We wish to design a mechanism for this strategic setting.
We first observe that the mechanism design problem at hand belongs to a single-parameter domain – the type of player i is completely described by the single real parameter v_i. From Theorem 10.4, we know that only monotone allocation rules are implementable. Is the Knapsack allocation rule monotone? Yes! If an item i ∈ [n] is chosen and we increase its valuation keeping everything else fixed, item i will continue to be chosen. However, computing a Knapsack allocation is NP-hard. Can we have an implementable allocation rule which approximates the Knapsack objective well? Designing such a rule is a two-step process.

(i) Restrict attention, without loss of generality, to DSIC direct mechanisms, so that we may assume that every player reports her valuation truthfully.

(ii) Design an allocation rule which (a) can be computed in polynomial time, (b) approximates our objective function (in this case, the total valuation of the selected items) well, and (c) is monotone. Observe that, due to Theorem 10.4, monotonicity of the allocation rule justifies the assumption of DSIC in step (i).
So, at least for single-parameter domains, algorithmic mechanism design is almost like designing approximation algorithms with the extra requirement that the algorithm be monotone. For the Knapsack problem in the above strategic environment, we now design a 1/2-factor approximation algorithm which is also monotone.

We define a simple allocation rule k_greedy based on a natural greedy algorithm for Knapsack. Without loss of generality, we assume that w_i ≤ W for every i ∈ [n]. By renaming, we assume that

v_1/w_1 ≥ v_2/w_2 ≥ ··· ≥ v_n/w_n.

Let i* be the largest index such that Σ_{j=1}^{i*} w_j ≤ W, and let S = {1, ..., i*} be the corresponding greedy prefix. The allocation rule k_greedy picks either S or a most valuable single item, whichever gives more total valuation.
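A minimal sketch of k_greedy (the representation and names are illustrative):

```python
def k_greedy(values, weights, W):
    """Greedy knapsack allocation: the better of the density-order prefix
    and a single most valuable item. Returns the set of selected items."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    prefix, load = [], 0
    for i in order:                      # take items while they fit
        if load + weights[i] > W:
            break
        prefix.append(i)
        load += weights[i]
    best_single = max(range(n), key=lambda i: values[i])
    prefix_value = sum(values[i] for i in prefix)
    return set(prefix) if prefix_value >= values[best_single] else {best_single}
```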
1/2-Factor Approximation for Knapsack

Proposition 10.2. The allocation rule k_greedy approximates the maximum surplus within a factor of 1/2.
Proof. Let OPT be the maximum surplus. If i* = n, the greedy prefix contains all items and achieves OPT, so assume i* < n. We have

surplus(k_greedy) ≥ max{ Σ_{j=1}^{i*} v_j , v_{i*+1} } ≥ (1/2) Σ_{j=1}^{i*+1} v_j ≥ (1/2) OPT,

where the last inequality holds because the items 1, ..., i* + 1 have the highest value densities and together overflow the knapsack, so their total value is at least the optimal (even fractional) surplus.
Also, k_greedy is obviously monotone – increasing the valuation of an already selected item keeps that item selected. Can we have an even better approximate Knapsack allocation rule? For the classical Knapsack problem, we have a fully polynomial time approximation scheme (FPTAS) – for every ε > 0, there is an algorithm running in time O(n³/ε) which approximates the optimal solution within a factor of (1 + ε). We encourage the reader to verify that the natural allocation rule underlying the FPTAS is not monotone. However, the FPTAS can be modified so that its allocation rule is monotone and thus implementable with suitable payment rules. This is a typical workflow in algorithmic mechanism design: take an NP-hard problem, check whether the best known approximation algorithm induces a DSIC mechanism, and if not, either tweak the existing approximation algorithm or design a new approximation algorithm whose corresponding mechanism is DSIC.
Chapter 11

Mechanism Design without Money

Chapter 10 discussed mechanism design with money – we have an allocation rule and we wish to implement it in a strategic environment by incentivizing the players suitably. We now discuss mechanism design without money. We begin with one of the most successful applications of mechanism design without money, namely stable matching.
Blocking Pair
Definition 11.1. Suppose we have two sets A and B of n vertices each. Every vertex a ∈ A has a preference order ≻_a over B, and every vertex b ∈ B has a preference order ≻_b over A. Given a matching M between A and B, a pair a ∈ A, b ∈ B not matched to each other is called a blocking pair for the matching M if one of the following holds.

(i) Both a and b are unmatched in M.

(ii) The vertex a is unmatched, b is matched with some M(b) ∈ A in M, and a ≻_b M(b).

(iii) The vertex b is unmatched, a is matched with some M(a) ∈ B in M, and b ≻_a M(a).

(iv) Both a and b are matched, to M(a) and M(b) respectively, in M, and we have a ≻_b M(b) and b ≻_a M(a).
Stable Matching
Definition 11.2. Suppose we have two sets A and B of n vertices each. Every vertex a ∈ A has a preference order ≻_a over B, and every vertex b ∈ B has a preference order ≻_b over A. A matching M is called stable if there is no blocking pair for M.
It follows from Definition 11.2 that every stable matching is a perfect matching (an unmatched pair would form a blocking pair). We now show the main result of stable matching theory: every instance of the stable matching problem has a stable matching, and such a matching can be computed in polynomial time. This is the celebrated theorem of Gale and Shapley.
Gale-Shapley Theorem
Theorem 11.1. Every stable matching instance has a stable matching. Moreover it can be computed
in polynomial time.
Proof. Let us call the sets A and B the sets of men and women respectively. We describe an algorithm called the "proposal algorithm by men," or the "men-proposing deferred acceptance algorithm." Initially everyone is unmatched. We pick an arbitrary unmatched man and let him propose to his most preferred woman among those to whom he has not yet proposed. Whenever an unmatched woman receives a proposal, she accepts it and becomes matched. When a matched woman receives a proposal, she accepts it if she prefers the proposer to her currently matched partner; in this case, her previous partner becomes unmatched again. The algorithm terminates when all men (and thus all women) are matched.

Observe that, once a woman gets matched, she never becomes unmatched, although she may change her partner. Also, every woman who has received at least one proposal is matched. Each man proposes to each woman at most once, so there are at most n² proposals in total; moreover, the algorithm cannot get stuck, since an unmatched man always has an unmatched woman to whom he has not yet proposed. Hence the algorithm terminates after at most n² iterations with everyone matched. What we show next is that the matching M computed by the algorithm is stable. Take an arbitrary pair a ∈ A, b ∈ B of a man and a woman who are not matched together by M. We show that (a, b) does not form a blocking pair. If a never proposed to b during the run of the algorithm, then a prefers his partner in M to b, and thus (a, b) cannot form a blocking pair. On the other hand, if a proposed to b and yet b is not matched with a in M, then b prefers her partner in M to a, and thus (a, b) cannot form a blocking pair.
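A compact sketch of the men-proposing deferred acceptance algorithm, under an illustrative data layout: men and women are numbered 0, ..., n − 1, pref_m[a] lists man a's women from most to least preferred, and pref_w[b] does the same for woman b over the men.

```python
from collections import deque

def deferred_acceptance(pref_m, pref_w):
    """Men-proposing deferred acceptance; returns {man: woman}."""
    n = len(pref_m)
    # rank[b][a] = position of man a in woman b's list (smaller is better)
    rank = [{a: r for r, a in enumerate(pref_w[b])} for b in range(n)]
    next_proposal = [0] * n       # index of the next woman each man proposes to
    holds = [None] * n            # holds[b] = man currently matched to woman b
    free = deque(range(n))        # unmatched men
    while free:
        a = free.popleft()
        b = pref_m[a][next_proposal[a]]
        next_proposal[a] += 1
        if holds[b] is None:                    # b accepts her first proposal
            holds[b] = a
        elif rank[b][a] < rank[b][holds[b]]:    # b upgrades to the proposer
            free.append(holds[b])
            holds[b] = a
        else:                                   # b rejects the proposer
            free.append(a)
    return {holds[b]: b for b in range(n)}
```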
We observe that the proposal algorithm described in the proof of Theorem 11.1 does not specify which unmatched man is picked to propose. Can different orders of picking unmatched men result in different stable matchings being output by the algorithm? We will see that the answer is no; to prove this, we will actually prove something stronger. For a man a ∈ A, let us define h(a) to be the most preferred woman (according to ≻_a) with whom a can be matched in any stable matching. The following result shows that, in the matching output by the Gale-Shapley algorithm, every man a is matched with h(a). This matching is called the men-optimal stable matching.
Theorem 11.2. In any matching M computed by the proposal algorithm by men, every man a ∈ A
is matched with h(a) in M.
Proof. Consider any fixed execution of the men-proposing Gale-Shapley algorithm. Let us define R_i = {(a, b) ∈ A × B : b has rejected a in the first i iterations} and R = ∪_{i∈[n²]} R_i. To prove the statement, it is enough to show that, for any pair (a, b) ∈ R, there is no stable matching which matches a with b. We show this claim by induction on the number of iterations i.

For i = 0, R_0 is the empty set and thus the statement is vacuously true; hence the induction starts. Let us assume the statement for i and prove it for i + 1. If no woman rejects any man in the (i + 1)-th iteration, then R_{i+1} = R_i and the claim follows from the induction hypothesis. So let us assume that some woman b ∈ B rejects some man a' ∈ A in the (i + 1)-th iteration, so that R_{i+1} = R_i ∪ {(a', b)}; this happens because b now holds a man a ∈ A with a ≻_b a'. Since the man a holds b, he has proposed to b, so every woman whom a prefers to b has rejected him within the first i iterations; hence, by the induction hypothesis, a is not matched with any woman whom he prefers to b in any stable matching. For the sake of finding a contradiction, let us assume that there is a stable matching M' which matches the woman b with a'. Then a prefers b to M'(a) (note M'(a) ≠ b), and b prefers a to M'(b) = a'; so the pair (a, b) forms a blocking pair for M', which contradicts our assumption that M' is a stable matching.
Along the lines of the men-optimal stable matching, one can define the notions of men-pessimal, women-optimal, and women-pessimal stable matchings. It can be shown, using an argument similar to the proof of Theorem 11.2, that the stable matching computed by the men-proposing Gale-Shapley algorithm is in fact the women-pessimal stable matching.
The notion of stable matching safeguards against deviations by pairs: in a stable matching, there is no man-woman pair who can deviate from the matching under consideration and both become happier by getting matched with one another. One could be more demanding and wish to safeguard against deviations by arbitrary coalitions of men and women. In a stable matching, can there be a coalition of men and women who deviate from the matching under consideration and get matched among themselves in such a way that no one in the coalition is unhappier than before and at least one of them is happier? The next result shows that this cannot happen – it is enough to safeguard against pairs. In the context of the stable matching problem, a matching is called a core solution if no coalition of men and women can get matched among themselves in a way such that no one in the coalition is worse off and at least one of them is better off. More abstractly, a core allocation is defined as follows.
Core Allocation
Definition 11.3. Let K ⊆ ×_{i=1}^n K_i be a set of allocations. Every player i ∈ [n] has a preference ≥_i which is a partial order over K_i. We say an allocation (k_1, ..., k_n) ∈ K is not a core allocation if there exist a subset S ⊆ [n] of players and another allocation (k'_1, ..., k'_n) ∈ K achievable by the players in S among themselves such that k'_i ≥_i k_i for every i ∈ S and k'_j >_j k_j for at least one j ∈ S.

Given a matching M, the corresponding allocation vector is ((M(a))_{a∈A}, (M(b))_{b∈B}), and, given an allocation ((k_a)_{a∈A}, (k_b)_{b∈B}), the corresponding matching is M = {(a, b) ∈ A × B : k_a = b, k_b = a}. Typically, the set K of allocations is clear from the context, and we often skip defining it formally. Observe that, in general, a core allocation need not exist. However, in the case of the stable matching problem, the core allocations are precisely the stable matchings.
Theorem 11.3. In every instance of the stable matching problem, a matching is a core allocation if and only if it is a stable matching.
Proof. By definition, every core matching is a stable matching, since a blocking pair is a deviating coalition of size two. For the other direction, let M be a stable matching. If M is not a core matching, let S be a set of men and women of smallest cardinality who can opt out of the matching M and get matched among themselves in a way which makes, without loss of generality, a man a ∈ S happier by getting matched with a woman b ∈ S; that is, b ≻_a M(a), where M(a) is the partner of a in M. The woman b is matched with a in the deviation and is not worse off; since a ≠ M(b), we therefore have a ≻_b M(b). Then (a, b) forms a blocking pair for M, contradicting our assumption that M is a stable matching.
The last question on stable matching which we discuss here concerns strategy-proofness. Is the men-proposing deferred acceptance algorithm strategy-proof for the players? The following example answers this question negatively. Consider a setting with 3 men m1, m2, m3 and 3 women w1, w2, w3, with preferences such that the men-proposing deferred acceptance algorithm outputs the matching {{m1, w1}, {m2, w2}, {m3, w3}}, where only woman w1 receives 2 proposals whereas the other 2 women receive 1 proposal each. Suppose the woman w1 receives proposals from m2 and m1 (in this order) and rejects m2 after receiving the proposal from m1. One such concrete example is the following, with the men-proposing deferred acceptance algorithm run with the proposal order m2, m1, m2, m3 of men.

Preference of m1: w1 ≻ w3 ≻ w2        Preference of w1: m3 ≻ m1 ≻ m2
Preference of m2: w1 ≻ w2 ≻ w3        Preference of w2: m1 ≻ m3 ≻ m2
Preference of m3: w3 ≻ w1 ≻ w2        Preference of w3: m1 ≻ m2 ≻ m3

Now suppose the woman w1 rejects m1 instead of m2 when she receives the proposal from m1 – this happens if the woman w1 misreports her preference to be m3 ≻ m2 ≻ m1. Then the run of the men-proposing deferred acceptance algorithm with proposal order m2, m1, m1, m3, ... of men results in the matching {{m1, w3}, {m2, w2}, {m3, w1}}, which is better for the woman w1 than the matching {{m1, w1}, {m2, w2}, {m3, w3}}.
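The manipulation can be replayed with the deferred_acceptance sketch above (indices 0, 1, 2 stand for m1, m2, m3 and w1, w2, w3; the sketch picks proposers in its own order, but by Theorem 11.2 the output does not depend on that order):

```python
pref_m = [[0, 2, 1], [0, 1, 2], [2, 0, 1]]      # m1, m2, m3
pref_w = [[2, 0, 1], [0, 2, 1], [0, 1, 2]]      # w1, w2, w3, truthful
print(deferred_acceptance(pref_m, pref_w))      # {0: 0, 1: 1, 2: 2}

pref_w[0] = [2, 1, 0]                           # w1 misreports m3 > m2 > m1
print(deferred_acceptance(pref_m, pref_w))      # m1->w3, m2->w2, m3->w1
```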
So the men-proposing deferred acceptance algorithm can be manipulated by the women. Does there exist any algorithm for stable matching which is strategy-proof for everyone? The answer turns out to be negative [Rot08]. However, the following result shows that no man has an incentive to misreport in the men-proposing deferred acceptance algorithm; that is, the algorithm is strategy-proof for the men.

Theorem 11.4. The men-proposing deferred acceptance algorithm is strategy-proof for men.
Proof. Suppose not; then there exists a man m ∈ A who can get a better partner by misreporting his preference to be ≻'_m instead of ≻_m. Let M and M' be the stable matchings output by the men-proposing deferred acceptance algorithm on the true profile (≻_m, ≻_{−m}) and on the profile (≻'_m, ≻_{−m}) respectively. Let w ∈ B be the partner of m in M'. By assumption, we have w ≻_m M(m). Let m2 = M(w) be the partner of w in M. Since (m, w) does not form a blocking pair for M, we have m2 ≻_w m. Moreover, we have M'(m2) ≻_{m2} M(m2) = w; otherwise (m2, w) would form a blocking pair for M'. Let us define A1 = {a ∈ A : M'(a) ≻_a M(a)}. The above argument shows that, for every man a ∈ A1, the M-partner of M'(a) also belongs to A1. Let us define B1 = {M'(a) : a ∈ A1}; by the preceding observation, B1 = {M(a) : a ∈ A1}.

Obviously we have m ∈ A1, and thus A1, B1 ≠ ∅. Consider the run of the men-proposing deferred acceptance algorithm on the true profile. Let m̂ ∈ A1 be the man who makes the last proposal among all proposals made by men in A1; say m̂ makes this proposal to ŵ in iteration t, and ŵ accepts it, so ŵ = M(m̂) ∈ B1. Since every man in A1 gets a worse partner in M than in M', every man a ∈ A1 has proposed to the woman M'(a) within the first t − 1 iterations and has eventually been rejected by her; thus every woman in B1 has received at least one proposal within the first t − 1 iterations. In particular, ŵ holds some man m'' when m̂ proposes to her in iteration t, and she rejects m''. Since m'' proposes again after iteration t, the choice of m̂ forces m'' ∈ A \ A1.

So we have m̂ ≻_ŵ m''. Also, since M'(ŵ) ∈ A1 proposed to ŵ within the first t − 1 iterations while ŵ holds m'' ≠ M'(ŵ) at iteration t, we have m'' ≻_ŵ M'(ŵ). On the other hand, since m'' proposed to ŵ before being rejected by her, we have ŵ ≻_{m''} M(m''), and since m'' ∉ A1, we have M(m'') ≽_{m''} M'(m''); hence ŵ ≻_{m''} M'(m''). Then (m'', ŵ) forms a blocking pair for M' with respect to the reported preferences (only m misreports, and m'' ≠ m, so the preferences of m'' and ŵ are their true ones), which contradicts the stability of M' guaranteed by Theorem 11.1.
In the stable matching problem, both sides – the men and the women – have preferences over each other. However, in many applications of matching, for example campus house allocation, school admission, etc., only one side has preferences. These situations are modelled by the house allocation problem. In the house allocation problem, we have n agents, each initially occupying a house (so we have n houses in total). Each agent has a preference (a complete order) over the set of n houses. The question we are interested in is whether we can reallocate the houses to make the agents better off. We consider the Top Trading Cycle Algorithm (TTCA), which is credited to Gale in [SS74].
Top Trading Cycle Algorithm (TTCA): We construct a directed graph G on the vertex set [n]. We add an edge from i to j if agent i most prefers the house of agent j among the remaining houses; we allow i = j, which results in a self-loop. Since the out-degree of every vertex is exactly 1, the graph G contains a cycle (a self-loop counts as a cycle), and all cycles are vertex-disjoint. We allocate houses along every cycle: for a cycle i_1 → i_2 → ··· → i_ℓ → i_{ℓ+1} (= i_1), we allocate the house of agent i_{j+1} to agent i_j for every j ∈ [ℓ]. We then delete the agents i_1, ..., i_ℓ and repeat on the remaining agents.
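A short sketch of TTCA under an illustrative representation: pref[i] lists the owners of the houses agent i prefers, best first (including agent i herself somewhere), and the returned dictionary maps each agent to the original owner of the house she receives. Resolving one cycle at a time, rather than all cycles of an iteration at once, yields the same allocation, since the cycles in any round are vertex-disjoint.

```python
def ttca(pref):
    """Top Trading Cycle Algorithm; alloc[i] = owner of the house agent i gets."""
    remaining = set(range(len(pref)))
    alloc = {}
    while remaining:
        # each remaining agent points at the remaining house she likes best
        point = {i: next(j for j in pref[i] if j in remaining) for i in remaining}
        # follow pointers from an arbitrary agent until a vertex repeats
        i, seen = next(iter(remaining)), []
        while i not in seen:
            seen.append(i)
            i = point[i]
        cycle = seen[seen.index(i):]        # the cycle we walked into
        for j in cycle:                     # trade houses along the cycle
            alloc[j] = point[j]
        remaining -= set(cycle)
    return alloc
```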
TTCA is DSIC

Theorem 11.5. The direct revelation mechanism induced by TTCA is DSIC.
Proof. On an instance of the house allocation problem, consider an arbitrary run of TTCA. Suppose TTCA runs for ℓ iterations, and let N_i be the set of agents who are allocated houses in the i-th iteration of the algorithm, for i ∈ [ℓ]. We show, by induction on i, that truth-telling is a best strategy for every agent in ∪_{j=1}^i N_j, for every i ∈ [ℓ]. For the base case i = 1, we need to show that no agent in N_1 can be better off by lying; this is obvious, since every agent in N_1 is allocated her top choice. Inductively, let us assume the statement for i and prove it for i + 1. We observe that no agent in N_{i+1} is ever pointed to by any agent in ∪_{j=1}^i N_j during the first i iterations. Hence, even if an agent outside ∪_{j=1}^i N_j misreports her preference, the allocations in the first i iterations remain the same, and consequently no agent in N_{i+1} can be allocated a house of any agent in ∪_{j=1}^i N_j. TTCA allocates every agent in N_{i+1} her top choice among the houses not initially occupied by someone in ∪_{j=1}^i N_j. Hence no agent in N_{i+1} has an incentive to misreport, and the mechanism induced by TTCA is DSIC.
Showing DSIC alone is not enough here, since the trivial mechanism which allocates to every agent her initial house is also DSIC. So we need to show that TTCA performs well in some sense, which we do next.
Theorem 11.6. For every instance of the house allocation problem, the allocation computed by
TTCA is the unique core allocation of the instance.
Proof. We first prove that the TTCA allocation is a core allocation. Suppose not: let S ⊆ [n] be a set of agents which can redistribute their initial houses among themselves via another allocation k in which no agent in S is worse off and at least one agent in S is better off. Define ℓ and N_i, i ∈ [ℓ], as in the proof of Theorem 11.5. Let i* = min{ i ∈ [ℓ] : at least one agent in N_i ∩ S is better off in k }, and let a ∈ S ∩ N_{i*} be an agent who is better off in k. TTCA allocates to agent a her top house among the houses of the agents in N_{i*} ∪ ··· ∪ N_ℓ. Hence, for a to be better off, the allocation k must give a some house h initially owned by an agent in N_1 ∪ ··· ∪ N_{i*−1}. Let b be the agent to whom TTCA allocates the house h; then b ∈ N_1 ∪ ··· ∪ N_{i*−1}. Since the agent b receives different houses in the allocation k and in the TTCA allocation, it follows that b ∈ S. But then b is an agent of S in N_1 ∪ ··· ∪ N_{i*−1} who is not worse off in k and receives a different house there, i.e., an agent who is better off in k. This contradicts the definition of i*.
We now prove the uniqueness of the core allocation. Any core allocation k must allocate to every agent in N_1 her top choice: otherwise, the agents in N_1 could form a coalition and redistribute their houses among themselves as in the TTCA allocation – no agent would be worse off, since everyone in N_1 receives her top choice, and at least one agent (one who did not receive her top choice in k) would be better off. So every core allocation must agree with the TTCA allocation on the houses of the agents in N_1. Next, TTCA allocates to every agent in N_2 her best choice among the houses in N_2 ∪ ··· ∪ N_ℓ, and every core allocation k must agree with the TTCA allocation on the agents in N_2 as well: otherwise, since k already agrees with the TTCA allocation on N_1, the agents in N_2 could form a coalition and redistribute their houses among themselves as in the TTCA allocation – no agent would be worse off, since everyone in N_2 receives her best choice in N_2 ∪ ··· ∪ N_ℓ, and at least one agent (one who did not receive her best choice in N_2 ∪ ··· ∪ N_ℓ) would be better off. Hence every core allocation must agree with the TTCA allocation on the agents in N_1 ∪ N_2. Continuing inductively, we obtain the result.
Acknowledgement
I have closely followed the material cited in the bibliography below while preparing these lecture notes. I also thank all the students of the CS60025 Algorithmic Game Theory course for their invaluable inputs, with special mention to Aditya Anand, Arnab Maiti, Madhu Sudan Reddy, and Nishant Baranwal Somy.
Bibliography
[ABHM10] Itai Ashlagi, Mark Braverman, Avinatan Hassidim, and Dov Monderer. Monotonicity and implementability. Econometrica, 78(5):1749–1772, 2010.
[BCL+06] Sushil Bikhchandani, Shurojit Chatterji, Ron Lavi, Ahuva Mu’alem, Noam Nisan, and Arunava Sen. Weak monotonicity characterizes deterministic dominant-strategy implementation. Econometrica, 74(4):1109–1132, 2006.
[Ben00] Jean-Pierre Benoît. The Gibbard–Satterthwaite theorem: a simple proof. Econ. Lett., 69(3):319–322, 2000.
[DMP06] Constantinos Daskalakis, Aranyak Mehta, and Christos Papadimitriou. A note on approximate Nash equilibria. In International Workshop on Internet and Network Economics, pages 297–306. Springer, 2006.
[EY10] Kousha Etessami and Mihalis Yannakakis. On the complexity of Nash equilibria and other fixed points. SIAM Journal on Computing, 39(6):2531–2597, 2010.
[GZ89] Itzhak Gilboa and Eitan Zemel. Nash and correlated equilibria: Some complexity considerations.
GEB, 1(1):80–93, 1989.
[JPY88] David S Johnson, Christos H Papadimitriou, and Mihalis Yannakakis. How easy is local search?
J. Comput. Syst. Sci, 37(1):79–100, 1988.
[MP91] Nimrod Megiddo and Christos H Papadimitriou. On total functions, existence theorems and
computational complexity. Theoretical Computer Science, 81(2):317–324, 1991.
[MS96] Dov Monderer and Lloyd S Shapley. Potential games. GEB, 14(1):124–143, 1996.
[NRTV07] Noam Nisan, Tim Roughgarden, Éva Tardos, and Vijay V. Vazirani, editors. Algorithmic Game
Theory. Cambridge University Press, 2007.
[Ros73] Robert W Rosenthal. A class of games possessing pure-strategy Nash equilibria. Int. J. Game Theory, 2(1):65–67, 1973.
[Rot08] Alvin E. Roth. Deferred acceptance algorithms: history, theory, practice, and open questions. Int.
J. Game Theory, 36(3-4):537–569, 2008.
[Sch91] Alejandro A Schäffer. Simple local search problems that are hard to solve. SIAM J. Comput,
20(1):56–87, 1991.
[SS74] Lloyd Shapley and Herbert Scarf. On cores and indivisibility. Journal of Mathematical Economics, 1(1):23–37, 1974.