Adversarial Search and Game Playing
CHAPTER 5
ICE 3201
Bangladesh University of Professionals
Environment Type Discussed in This Lecture
[Decision diagram: the environments considered here are fully observable, turn-taking (semi-dynamic), multi-agent, sequential, and discrete, and may be deterministic or non-deterministic; these properties lead to game tree search, as opposed to game matrices or continuous action games.]
Adversarial Search
Examine the problems that arise when we try to
plan ahead in a world where other agents are
planning against us.
A good example is in board games.
Adversarial games, while much studied in AI, are a
small part of game theory in economics.
Typical AI assumptions
Two agents whose actions alternate
Utility values for each agent are the opposite of the other's
this creates the adversarial situation
Fully observable environments
In game theory terms: Zero-sum games of perfect
information.
We’ll relax these assumptions later.
Search versus Games
Search – no adversary
Solution is (heuristic) method for finding goal
Heuristic techniques can find optimal solution
Evaluation function: estimate of cost from start to goal through given node
Examples: path planning, scheduling activities
Games – adversary
Solution is a strategy (a strategy specifies a move for every possible opponent
reply).
Optimality depends on opponent. Why?
Time limits force an approximate solution
Evaluation function: evaluate “goodness” of game position
Examples: chess, checkers, Othello, backgammon
Types of Games
                                               Deterministic                   Chance moves
Perfect information                            Chess, checkers, go, othello    Backgammon, monopoly
Imperfect information (initial chance moves)   Bridge, Skat                    Poker, scrabble, blackjack

(The slide also linked on-line versions of backgammon, chess, and tic-tac-toe.)
• Theorem of Nobel Laureate Harsanyi: Every game with
chance moves during the game has an equivalent representation
with initial chance moves only.
• A deep result, but computationally it is more tractable to
consider chance moves as the game goes along.
• This is basically the same as the issue of full observability +
nondeterminism vs. partial observability + determinism.
Game Setup
Two players: MAX and MIN
MAX moves first and they take turns until the game is over
Winner gets a reward, loser gets a penalty.
Games as search:
Initial state: e.g. board configuration of chess
Successor function: list of (move,state) pairs specifying legal moves.
Terminal test: Is the game finished?
Utility function: Gives numerical value of terminal states. E.g. win (+1), lose
(-1) and draw (0) in tic-tac-toe or chess
MAX uses search tree to determine next move.
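As a concrete illustration of this setup, here is a minimal sketch of tic-tac-toe as a search problem in Python; the class and method names are my own, chosen to mirror the formulation above, not anything prescribed by the lecture:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class TicTacToe:
    board: Tuple[str, ...] = tuple(" " * 9)  # initial state: empty 3x3 board
    to_move: str = "X"                       # MAX plays X and moves first

    def successors(self) -> List[Tuple[int, "TicTacToe"]]:
        """Legal (move, state) pairs: place to_move's mark in any empty cell."""
        nxt = "O" if self.to_move == "X" else "X"
        return [(i, TicTacToe(self.board[:i] + (self.to_move,) + self.board[i + 1:], nxt))
                for i, c in enumerate(self.board) if c == " "]

    def winner(self) -> Optional[str]:
        lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
                 (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
        for a, b, c in lines:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def terminal_test(self) -> bool:
        return self.winner() is not None or " " not in self.board

    def utility(self) -> int:
        """Win (+1), lose (-1), draw (0) from MAX's perspective, as on this slide."""
        return {"X": 1, "O": -1, None: 0}[self.winner()]
```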
Size of search trees
b = branching factor
d = number of moves by both players
Search tree is O(b^d)
Chess
b ≈ 35
d ≈ 100
- search tree is ~10^154 (!!)
- completely impractical to search this exhaustively
Game-playing emphasizes being able to make optimal decisions in a finite amount of time
Somewhat realistic as a model of a real-world agent
Even if games themselves are artificial
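A quick back-of-the-envelope check of that 10^154 figure, using the b and d values above:

```python
import math

b, d = 35, 100
# b^d spans roughly d * log10(b) orders of magnitude
print(d * math.log10(b))  # ~154.4, i.e. the chess game tree has ~10^154 nodes
```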
Partial Game Tree for Tic-Tac-Toe
Game tree (2-player, deterministic, turns)
How do we search this tree to find the optimal move?
Minimax strategy: Look ahead and reason backwards
Find the optimal strategy for MAX assuming an
infallible MIN opponent
Need to compute this all the way down the tree
Game Tree Search Demo
Assumption: Both players play optimally!
Given a game tree, the optimal strategy can be
determined by using the minimax value of each
node.
Zermelo 1912.
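For reference, the minimax value of a state can be written as the standard recurrence (stated here in the notation of the pseudocode that follows):

$$
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if } \mathrm{TERMINAL\text{-}TEST}(s) \\
\max_{(a,\,s') \in \mathrm{SUCCESSORS}(s)} \mathrm{MINIMAX}(s') & \text{if MAX is to move in } s \\
\min_{(a,\,s') \in \mathrm{SUCCESSORS}(s)} \mathrm{MINIMAX}(s') & \text{if MIN is to move in } s
\end{cases}
$$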
Two-Ply Game Tree
Minimax maximizes the utility for the worst-case outcome for max
The minimax decision
Pseudocode for Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
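A minimal runnable translation of this pseudocode in Python. The successors, terminal_test, and utility arguments are stand-ins for a concrete game's implementation; the TicTacToe sketch from the Game Setup slide could supply them:

```python
import math

def minimax_decision(state, successors, terminal_test, utility):
    """Return the action in SUCCESSORS(state) whose resulting state
    has the highest MIN-VALUE, mirroring MINIMAX-DECISION above."""
    return max(successors(state),
               key=lambda pair: min_value(pair[1], successors, terminal_test, utility))[0]

def max_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _action, s in successors(state):
        v = max(v, min_value(s, successors, terminal_test, utility))
    return v

def min_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _action, s in successors(state):
        v = min(v, max_value(s, successors, terminal_test, utility))
    return v
```

For instance, minimax_decision(TicTacToe(), lambda g: g.successors(), lambda g: g.terminal_test(), lambda g: g.utility()) would (slowly, by full tree expansion) compute an optimal opening move for X.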
Minimax algorithm
Example of Algorithm Execution
MAX to move
Minimax Algorithm
Complete depth-first exploration of the game tree
Assumptions:
Max depth = d, b legal moves at each point
E.g., Chess: d ≈ 100, b ≈ 35

Criterion   Minimax
Time        O(b^d)
Space       O(bd)   (depth-first, so linear in depth)
Practical problem with minimax search
Number of game states is exponential in the number of
moves.
Solution: Do not examine every node
=> pruning
Remove branches that do not influence final decision
Revisit example …
Alpha-Beta Example
Do DF-search until first leaf
Range of possible values: [−∞, +∞] at the root and at its first successor
Alpha-Beta Example (continued)
After the first leaf, the first MIN node's range narrows to [−∞, 3]
Alpha-Beta Example (continued)
The remaining leaves confirm the first MIN node's value, [3, 3]; the root's range becomes [3, +∞]
Alpha-Beta Example (continued)
The second MIN node's first leaf gives [−∞, 2]: this node is worse for MAX, so its remaining successors are pruned
Alpha-Beta Example (continued)
The third MIN node's first leaf (14) gives [−∞, 14]; the root's range is [3, 14]
Alpha-Beta Example (continued)
Its next leaf (5) tightens that node to [−∞, 5] and the root to [3, 5]
Alpha-Beta Example (continued)
Its last leaf (2) settles that node at [2, 2]; the children are now [3, 3], [−∞, 2], [2, 2], so the root's value is [3, 3]
Alpha-beta Algorithm
Depth first search – only considers nodes along a single
path at any time
α = highest-value choice that we can guarantee for MAX
so far in the current subtree.
β = lowest-value choice that we can guarantee for MIN so
far in the current subtree.
Update the values of α and β during the search, and prune the remaining
branches at a node as soon as its value is known to be worse than the
current α or β value for MAX or MIN.
Alpha-beta Demo.
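A minimal runnable sketch of the algorithm in Python, with the same hedges as the minimax sketch above (successors, terminal_test, and utility are stand-ins for a concrete game's implementation):

```python
import math

def alpha_beta_decision(state, successors, terminal_test, utility):
    """Choose MAX's move; identical result to minimax, but with pruning."""
    best_action, best_value = None, -math.inf
    for action, s in successors(state):
        v = min_value(s, best_value, math.inf, successors, terminal_test, utility)
        if v > best_value:
            best_action, best_value = action, v
    return best_action

def max_value(state, alpha, beta, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _, s in successors(state):
        v = max(v, min_value(s, alpha, beta, successors, terminal_test, utility))
        if v >= beta:          # MIN above already has a better (lower) option: prune
            return v
        alpha = max(alpha, v)  # only MAX updates alpha
    return v

def min_value(state, alpha, beta, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _, s in successors(state):
        v = min(v, max_value(s, alpha, beta, successors, terminal_test, utility))
        if v <= alpha:         # MAX above already has a better (higher) option: prune
            return v
        beta = min(beta, v)    # only MIN updates beta
    return v
```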
Effectiveness of Alpha-Beta Search
Worst-Case
branches are ordered so that no pruning takes place. In this case alpha-beta
gives no improvement over exhaustive search
Best-Case
each player’s best move is the left-most alternative (i.e., evaluated first)
in practice, performance is closer to best rather than worst-case
In practice often get O(b^(d/2)) rather than O(b^d)
this is the same as having a branching factor of sqrt(b),
since (sqrt(b))^d = b^(d/2)
i.e., we have effectively gone from b to square root of b
e.g., in chess go from b ~ 35 to b ~ 6
this permits much deeper search in the same amount of time
Typically twice as deep.
Example
- which nodes can be pruned?
[Tree figure: a MAX root over a MIN layer over MAX nodes, with leaf values 3, 4, 1, 2, 5, 6, 7, 8.]
Alpha Beta Pruning
Alpha-beta pruning is a modified version of the minimax
algorithm: an optimization technique for minimax search.
As we have seen, the number of game states the minimax search
algorithm has to examine is exponential in the depth of the tree.
We cannot eliminate the exponent, but we can effectively cut it in
half. There is a technique by which we can compute the correct
minimax decision without checking every node of the game tree;
this technique is called pruning.
Alpha Beta Pruning
Alpha-beta pruning can be applied at any depth of a tree, and
sometimes it prunes not only leaves but entire subtrees.
The two parameters are defined as:
Alpha: The best (highest-value) choice we have found so far at
any point along the path of Maximizer. The initial value of
alpha is -∞.
Beta: The best (lowest-value) choice we have found so far at
any point along the path of Minimizer. The initial value of beta
is +∞.
Condition for Alpha Beta Pruning
The main condition required for alpha-beta pruning is:
α ≥ β
Key Points
The Max player will only update the value of alpha.
The Min player will only update the value of beta.
While backtracking the tree, node values (not alpha and beta
values) are passed up to parent nodes.
Only the alpha and beta values are passed down to child nodes.
Example
Step 1: The Max player makes the first move from node A, where
α = −∞ and β = +∞. These values of alpha and beta are passed
down to node B, where again α = −∞ and β = +∞, and node B
passes the same values to its child D.
Example
Step 2: At node D, the value of α is calculated, as it is Max's
turn. α is compared first with 2 and then with 3; max(2, 3) = 3
becomes the value of α at node D, and the node's value is also 3.
Step 3: The algorithm now backtracks to node B, where the value
of β changes, as it is Min's turn. β = +∞ is compared with the
available successor's value: min(+∞, 3) = 3, hence at node B,
α = −∞ and β = 3.
Example
In the next step, the algorithm traverses the next successor of
node B, which is node E, and the values α = −∞ and β = 3 are
passed down.
Step 4: At node E, Max takes its turn, and the value of alpha
changes. The current value of alpha is compared with 5, so
max(−∞, 5) = 5; hence at node E, α = 5 and β = 3. Since α ≥ β,
the right successor of E is pruned: the algorithm does not
traverse it, and the value at node E is 5.
Example
Step 5: Next, the algorithm backtracks the tree from node B to
node A. At node A, the value of alpha changes to the maximum
available value, 3, since max(−∞, 3) = 3, with β = +∞. These two
values are now passed to the right successor of A, which is node C.
At node C, α = 3 and β = +∞, and the same values are passed on
to node F.
Step 6: At node F, the value of α is compared with the left
child, 0: max(3, 0) = 3. It is then compared with the right
child, 1: max(3, 1) = 3. α remains 3, but the node value of F
becomes 1.
Example
Step 7: Node F returns the node value 1 to node C. At C, α = 3
and β = +∞; here the value of beta changes: it is compared with
1, so min(+∞, 1) = 1. Now at C, α = 3 and β = 1, which again
satisfies the condition α ≥ β, so the next child of C, which is
G, is pruned, and the algorithm does not compute the entire
subtree under G.
Example
Step 8: C now returns the value 1 to A. The best value for A is
max(3, 1) = 3. The final game tree shows which nodes were
computed and which were never computed. Hence the optimal value
for the maximizer is 3 in this example.
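This walkthrough can be checked mechanically. In the hedged sketch below, the tree encodes the example above: leaves 2 and 3 under D, 5 under E, 0 and 1 under F. E's second leaf and both of G's leaves were never shown, so placeholder values are used; the point is precisely that pruning never reads them:

```python
import math

def alpha_beta(node, alpha, beta, is_max, visited):
    """Generic alpha-beta over a nested-list tree; records every leaf it evaluates."""
    if isinstance(node, int):              # leaf
        visited.append(node)
        return node
    v = -math.inf if is_max else math.inf
    for child in node:
        cv = alpha_beta(child, alpha, beta, not is_max, visited)
        if is_max:
            v, alpha = max(v, cv), max(alpha, cv)
        else:
            v, beta = min(v, cv), min(beta, cv)
        if alpha >= beta:                  # the pruning condition from the slides
            break
    return v

# A = MAX(B, C); B = MIN(D, E); C = MIN(F, G)
# D = MAX(2, 3); E = MAX(5, 9*); F = MAX(0, 1); G = MAX(7*, 5*)   (* = placeholder)
tree = [[[2, 3], [5, 9]], [[0, 1], [7, 5]]]
visited = []
print(alpha_beta(tree, -math.inf, math.inf, True, visited))  # 3, as in Step 8
print(visited)  # [2, 3, 5, 0, 1]: E's second leaf and all of G were pruned
```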
Final Comments about Alpha-Beta Pruning
Pruning does not affect final results
Entire subtrees can be pruned.
Good move ordering improves effectiveness of
pruning
Repeated states are again possible.
Store them in memory: a transposition table
Practical Implementation
How do we make these ideas practical in real game trees?
Standard approach:
cutoff test: (where do we stop descending the tree)
depth limit
better: iterative deepening
cutoff only when no big changes are expected to occur next (quiescence search).
evaluation function
When the search is cut off, we evaluate the current state
by estimating its utility using an evaluation function.
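A hedged sketch of this standard approach in Python: the terminal test is replaced by a cutoff test, and the utility by a heuristic evaluation. cutoff_test and eval_fn are illustrative names, not a specific library's API:

```python
import math

def h_alpha_beta(state, depth, alpha, beta, is_max, successors, cutoff_test, eval_fn):
    """Depth-limited alpha-beta: when the cutoff fires, estimate with eval_fn."""
    if cutoff_test(state, depth):      # e.g. terminal state OR depth limit reached
        return eval_fn(state)
    if is_max:
        v = -math.inf
        for _, s in successors(state):
            v = max(v, h_alpha_beta(s, depth + 1, alpha, beta, False,
                                    successors, cutoff_test, eval_fn))
            if v >= beta:
                return v
            alpha = max(alpha, v)
    else:
        v = math.inf
        for _, s in successors(state):
            v = min(v, h_alpha_beta(s, depth + 1, alpha, beta, True,
                                    successors, cutoff_test, eval_fn))
            if v <= alpha:
                return v
            beta = min(beta, v)
    return v
```

A simple cutoff_test would be lambda s, d: d >= LIMIT or s.terminal_test(); quiescence search would additionally keep expanding positions that still look unstable.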
Static (Heuristic) Evaluation Functions
An Evaluation Function:
estimates how good the current board configuration is for a player.
Typically, one estimates how good the position is for the player and how good it is
for the opponent, and subtracts the opponent's score from the player's.
Othello: Number of white pieces - Number of black pieces
Chess: Value of all white pieces - Value of all black pieces
Typical values from -infinity (loss) to +infinity (win) or [-1, +1].
If the board evaluation is X for a player, it’s -X for the opponent.
Many clever ideas about how to use the evaluation function.
e.g. null move heuristic: let opponent move twice.
Example:
Evaluating chess boards,
Checkers
Tic-tac-toe
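For example, a material-count evaluation for chess might look like this sketch. The piece letters and board encoding are my own assumptions (uppercase for the player, lowercase for the opponent); the weights are the traditional textbook piece values:

```python
# Traditional piece values; kings are not counted
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_eval(board):
    """board: iterable of piece letters, one per piece still on the board.
    Returns (player's material) - (opponent's material), as described above."""
    score = 0
    for piece in board:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

print(material_eval(["K", "Q", "P", "k", "r", "r"]))  # 9 + 1 - 5 - 5 = 0
```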
Iterative (Progressive) Deepening
In real games, there is usually a time limit T on making a
move
How do we take this into account?
using alpha-beta we cannot use “partial” results with any confidence
unless the full breadth of the tree has been searched
So, we could be conservative and set a conservative depth-limit
which guarantees that we will find a move in time < T
disadvantage is that we may finish early, could do more search
In practice, iterative deepening search (IDS) is used
IDS runs depth-first search with an increasing depth-limit
when the clock runs out we use the solution found at the previous
depth limit
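A minimal sketch of this scheme, assuming a depth-limited search routine is available (depth_limited_best_move is a placeholder name; a real engine would also poll the clock inside the search, not only between depths):

```python
import time

def iterative_deepening_decision(state, depth_limited_best_move, time_limit):
    """Search with increasing depth limits; when time is up, return the move
    found by the deepest search that completed."""
    deadline = time.monotonic() + time_limit
    best_move, depth = None, 1
    while time.monotonic() < deadline:
        # This whole depth must finish; only then do we trust its result.
        best_move = depth_limited_best_move(state, depth)
        depth += 1
    return best_move
```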
The State of Play
Checkers:
Chinook ended 40-year-reign of human world champion
Marion Tinsley in 1994.
Chess:
Deep Blue defeated human world champion Garry Kasparov in
a six-game match in 1997.
Othello:
human champions refuse to compete against computers: they
are too good.
Go:
human champions refuse to compete against computers: the
programs are too bad; b > 300 (!)
See (e.g.) http://www.cs.ualberta.ca/~games/ for more information
Deep Blue
1957: Herbert Simon
“within 10 years a computer will beat the world chess champion”
1997: Deep Blue beats Kasparov
Parallel machine with 30 processors for “software” and 480
VLSI processors for “hardware search”
Searched 126 million nodes per second on average
Generated up to 30 billion positions per move
Reached depth 14 routinely
Uses iterative-deepening alpha-beta search with
transposition tables
Can explore beyond depth-limit for interesting moves
Summary
Game playing can be effectively modeled as a search problem
Game trees represent alternate computer/opponent moves
Evaluation functions estimate the quality of a given board
configuration for the Max player.
Minimax is a procedure which chooses moves by assuming that
the opponent will always choose the move which is best for them
Alpha-Beta is a procedure which can prune large parts of the
search tree and allow search to go deeper
For many well-known games, computer algorithms based on
heuristic search match or out-perform human world experts.