
Introduction to

Artificial Intelligence
Chapter 2: Solving Problems
by Searching (6)
Adversarial Search
Nguyễn Hải Minh, Ph.D
nhminh@fit.hcmus.edu.vn
Outline
1. Games
2. Optimal Decisions in Games
3. α-β Pruning
4. Imperfect, Real-time Decisions

Games vs. Search Problems
❑Unpredictable opponent
→ a solution must specify a move for every possible opponent reply
❑Competitive environments
→ the agents' goals are in conflict
❑Time limits
→ unlikely to find the goal; must approximate
❑Example of complexity:
o Chess: b = 35, d = 100 → tree size ~ 10^154
o Go: b = 1000 (!)

Types of Games

                        Deterministic                    Chance
Perfect information     Chess, Checkers, Go, Othello     Backgammon, Monopoly
Imperfect information                                    Bridge, poker, scrabble, nuclear war

Primary Assumptions
❑Assume only two players
❑There is no element of chance
o No dice are thrown, no cards are drawn, etc.
❑Both players have complete knowledge of the state of
the game
o Examples are chess, checkers and Go
o Counter-example: poker
❑Zero-sum games
o Each player wins (+1), loses (0), or draws (1/2)
❑Rational Players
o Each player always tries to maximize his/her utility

Game Setup (Formulation)
❑Two players: MAX and MIN
❑MAX moves first and then they take turns until the game is over
o Winner gets reward, loser gets penalty.
❑Games as search:
o S0 – Initial state: how the game is set up at the start
• e.g. board configuration of chess
o PLAYER(s): MAX or MIN is playing
o ACTIONS(s) – Successor function: the set of legal moves available in state s
o RESULT(s, a) – Transition model: the state that results from taking move a in state s
o TERMINAL-TEST(s): Is the game finished?
o UTILITY(s, p) – Utility function: gives a numerical value to terminal state s for player p
• e.g. win (+1), lose (0), and draw (1/2) in tic-tac-toe or chess
(This formulation is sketched in code below.)
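To make the formulation concrete, here is a minimal sketch in Python (an illustration, not part of the original slides; the class and method names simply mirror PLAYER, ACTIONS, RESULT, TERMINAL-TEST, and UTILITY for tic-tac-toe):

# Minimal game formulation for tic-tac-toe (illustrative sketch).
class TicTacToe:
    def initial_state(self):
        # S0: a 3x3 board stored as a tuple of 9 cells; MAX plays 'X', MIN plays 'O'.
        return tuple([None] * 9)

    def player(self, s):
        # PLAYER(s): MAX ('X') moves first, so 'X' is to move whenever the counts are equal.
        return 'X' if s.count('X') == s.count('O') else 'O'

    def actions(self, s):
        # ACTIONS(s): legal moves are the indices of the empty cells.
        return [i for i, cell in enumerate(s) if cell is None]

    def result(self, s, a):
        # RESULT(s, a): place the current player's mark at cell a.
        board = list(s)
        board[a] = self.player(s)
        return tuple(board)

    def terminal_test(self, s):
        # TERMINAL-TEST(s): someone has won, or the board is full.
        return self._winner(s) is not None or all(cell is not None for cell in s)

    def utility(self, s, p):
        # UTILITY(s, p): win (+1), lose (0), draw (1/2), from player p's point of view.
        w = self._winner(s)
        if w is None:
            return 0.5
        return 1 if w == p else 0

    def _winner(self, s):
        lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
        for i, j, k in lines:
            if s[i] is not None and s[i] == s[j] == s[k]:
                return s[i]
        return None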
Tic-Tac-Toe Game Tree

MAX uses the search tree to determine its next move.

Chess
❑Complexity:
o b ~ 35
o d ~ 100
o search tree has ~ 10^154 nodes (!!)
→ completely impractical to search exhaustively

❑Deep Blue (May 11, 1997)
o Kasparov lost a 6-game match against IBM's Deep Blue: 1 win for Kasparov, 2 wins for Deep Blue, and 3 draws.
❑In the future, the focus will be on letting computers LEARN to play chess rather than being TOLD how to play
Deep Blue
❑Ran on a parallel computer with 30 IBM RS/6000 processors doing alpha–beta search.
❑Searched up to 30 billion positions per move, at an average depth of 14 (able to reach 40 plies in some lines).
❑Evaluation function: 8000 features
o highly specific patterns of pieces (~4000 positions)
o database of 700,000 grandmaster games
❑Even working at 200 million positions/sec, Deep Blue would require 10^100 years to evaluate all possible games. (The universe is only ~10^10 years old.)
❑Now: algorithmic improvements have allowed programs running on standard PCs to win World Computer Chess Championships.
o Pruning heuristics reduce the effective branching factor to less than 3

Checkers
❑Complexity:
o search tree has ~ 10^18 nodes
→ would require 100,000 years at 10^6 positions/sec
❑Chinook (1989–2007)
o The first computer program to win a world champion title in competition against humans.
o 1990: won 2 games in competition with world champion Tinsley (final score: 2–4, with 33 draws)
o 1994: 6 draws
❑Chinook's search:
o Ran on regular PCs, using alpha–beta search.
o Plays perfectly by combining alpha–beta search with a database of 39 trillion endgame positions.
Go
❑A million trillion trillion trillion trillion more configurations than chess!
❑Complexity:
o Board: 19x19 → branching factor 361, average game length ~ 200 moves
o ~ 10^174 possible board configurations
o Control of territory is unpredictable until the endgame.
❑AlphaGo (2016) by Google
o Beat 9-dan professional Lee Sedol (4–1)
o Machine learning + Monte Carlo search guided by a "value network" and a "policy network" (implemented with deep neural networks)
o Learned from human games + learned by itself (self-play)

Optimal Decisions in Games
❑In a normal search problem:
o The optimal solution is a sequence of actions leading to a goal state
❑In games:
o The optimal solution is a strategy guaranteeing the best achievable outcome for a player
o That optimal strategy can be determined from the minimax value of each node

The minimax value of a node, computed for MAX:
MINIMAX(s) = UTILITY(s)                                         if TERMINAL-TEST(s)
           = max over a in ACTIONS(s) of MINIMAX(RESULT(s, a))  if PLAYER(s) = MAX
           = min over a in ACTIONS(s) of MINIMAX(RESULT(s, a))  if PLAYER(s) = MIN

A two-ply game tree
[Figure: a two-ply game tree, annotated with MAX's best move, MIN's best moves, and the utility values for MAX at the leaves]


Minimax Algorithm
❑John von Neumann devised a search technique called Minimax
❑You play against an opponent
o Your objectives are in direct opposition
o MAX tries to maximize its own payoff while minimizing its opponent's (MIN's) payoff
❑To implement Minimax, you need to know how good (or bad) your position is
o That measure is called the utility function

Minimax Algorithm
❑The definition of optimal play for MAX assumes MIN also plays optimally:
o it maximizes the worst-case outcome for MAX
❑If MIN does not play optimally, MAX will do at least as well, and usually better
❑Minimax uses depth-first search to traverse the game tree
o a complete depth-first exploration of the game tree (see the sketch below)
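As a concrete illustration, here is a minimal recursive Minimax in Python (a sketch, not the slides' pseudocode; it assumes the game interface from the tic-tac-toe sketch above):

# Minimax decision for the player to move at `state` (assumed to be MAX).
def minimax_decision(game, state):
    player = game.player(state)
    # Choose the action whose resulting state has the highest minimax value.
    return max(game.actions(state),
               key=lambda a: min_value(game, game.result(state, a), player))

def max_value(game, state, player):
    if game.terminal_test(state):
        return game.utility(state, player)
    return max(min_value(game, game.result(state, a), player)
               for a in game.actions(state))

def min_value(game, state, player):
    if game.terminal_test(state):
        return game.utility(state, player)
    return min(max_value(game, game.result(state, a), player)
               for a in game.actions(state))

# Usage (hypothetical): game = TicTacToe(); move = minimax_decision(game, game.initial_state())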

Minimax algorithm
[Figure: example game tree evaluated by Minimax, with MAX's best move highlighted]

Properties of minimax
❑Complete?
o Yes (if tree is finite)
❑Optimal?
o Yes (against an optimal opponent)
❑Time complexity?
o O(b^m)
❑Space complexity?
o O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for "reasonable" games
→ exact solution completely infeasible
QUIZ
Calculate the utility value for the remaining nodes.
Which node should MAX and MIN choose?

Problem with Minimax Search
❑Number of game states is exponential in the
number of moves.
o Solution: do not examine every node
→ pruning: remove branches that cannot influence the final decision

❑Bounded lookahead
o Limit the depth of each search
o This is what human chess players do: look ahead a few moves and see what looks best

α-β pruning
❑Idea:
o If move A is determined to be worse than a move B that has already been examined, then examining A any further is pointless, so A can be pruned.
• α: the value of the best already-explored option along the path to the root for MAX
• β: the value of the best already-explored option along the path to the root for MIN
(A code sketch of α-β search follows below.)
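Here is a minimal α-β search in Python (an illustrative sketch, not the slides' pseudocode; it reuses the assumed game interface from the tic-tac-toe sketch):

import math

# Alpha-beta decision for MAX at the root: same result as Minimax, fewer nodes examined.
def alphabeta_decision(game, state):
    player = game.player(state)
    best_action, best_value = None, -math.inf
    alpha, beta = -math.inf, math.inf
    for a in game.actions(state):
        v = ab_min(game, game.result(state, a), player, alpha, beta)
        if v > best_value:
            best_action, best_value = a, v
        alpha = max(alpha, best_value)
    return best_action

def ab_max(game, state, player, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state, player)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, ab_min(game, game.result(state, a), player, alpha, beta))
        if v >= beta:            # MIN above would never let play reach here: prune.
            return v
        alpha = max(alpha, v)
    return v

def ab_min(game, state, player, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state, player)
    v = math.inf
    for a in game.actions(state):
        v = min(v, ab_max(game, game.result(state, a), player, alpha, beta))
        if v <= alpha:           # MAX above already has a better option: prune.
            return v
        beta = min(beta, v)
    return v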

The α-β algorithm

α-β pruning example
[Figure: the range of possible minimax values maintained for the MAX node and for the MIN nodes as successors are examined]

α-β pruning example

Prune these nodes! Why?
Properties of α-β pruning
❑Pruning does not affect the final result
o Best case: pruning can greatly reduce the tree size
o Worst case: nothing is pruned; same cost as plain Minimax
❑Good move ordering improves the effectiveness of pruning (see the sketch below)
❑With "perfect ordering," time complexity = O(b^(m/2))
→ effectively doubles the depth that can be searched
❑In chess, α-β with good move ordering reduced Deep Blue's effective branching factor from about 38 to about 6
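One common way to get good ordering (an illustrative sketch, not from the slides; it assumes the game interface plus an evaluate(state, player) heuristic such as the material evaluator discussed later) is to sort moves by a shallow evaluation before recursing:

# Order moves so the most promising are searched first, making α-β cutoffs happen earlier.
def ordered_actions(game, state, player, evaluate, maximizing):
    scored = [(evaluate(game.result(state, a), player), a) for a in game.actions(state)]
    # At a MAX node try high-scoring moves first; at a MIN node, low-scoring first.
    scored.sort(key=lambda pair: pair[0], reverse=maximizing)
    return [a for _, a in scored]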
Why is it called α-β?
❑α is the value of the best (i.e., highest-value) choice found so far at any choice point along the path for MAX
❑If v is worse than α, MAX will avoid it
→ prune that branch
❑Define β similarly for MIN

QUIZ
Calculate the utility value for the remaining nodes.
Which node(s) should be pruned?

Imperfect, Real-time Decisions
❑Both Minimax and α-β pruning search all the way down to terminal states
o Searching this deep is usually impractical, because moves must be made in a reasonable amount of time (~ minutes)
❑Standard approach:
o cutoff test: e.g., a depth limit
o evaluation function: estimates the desirability of a position (likely win, loss, or draw?)

Evaluation functions
❑For chess, typically a linear weighted sum of features:

Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)

where each fi(s) is a feature of the position and wi is its weight
o e.g., w1 = 9 with f1(s) = (#white queens) – (#black queens), etc.
o e.g., with q = #queens, r = #rooks, n = #knights, b = #bishops, p = #pawns (each counted as white minus black):
→ Eval(s) = 9q + 5r + 3b + 3n + p
(A code sketch of this material evaluation follows below.)
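A minimal sketch of such a material-only evaluator (illustrative; the piece-count helper is an assumption, not something given in the slides):

# Material-only evaluation: Eval(s) = 9q + 5r + 3b + 3n + p,
# where each count is (#white pieces - #black pieces) of that type.
PIECE_WEIGHTS = {'q': 9, 'r': 5, 'b': 3, 'n': 3, 'p': 1}

def material_eval(piece_counts):
    # piece_counts maps piece type -> (#white - #black), e.g. {'q': 0, 'r': 1, 'b': 0, 'n': 0, 'p': 1}
    return sum(w * piece_counts.get(piece, 0) for piece, w in PIECE_WEIGHTS.items())

# Example: White is up a rook and a pawn -> Eval = 5*1 + 1*1 = 6
assert material_eval({'q': 0, 'r': 1, 'b': 0, 'n': 0, 'p': 1}) == 6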

Cutting off search
❑MinimaxCutoff is identical to MinimaxValue except that
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
❑Does it work in practice?
o With b^m = 10^6 and b = 35, we get m ≈ 4
o 4-ply lookahead is a hopeless chess player!
o 4-ply ≈ human novice
o 8-ply ≈ typical PC, human master
o 12-ply ≈ Deep Blue, Kasparov
(A depth-limited code sketch follows below.)
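Here is a minimal depth-limited α-β sketch of the cutoff idea (illustrative; it assumes the game interface from earlier and an evaluate(state, player) function such as the material evaluator above):

import math

# Like alpha-beta search, but Cutoff? (terminal state OR depth limit) replaces Terminal?,
# and Eval replaces Utility at cutoff nodes.
def h_alphabeta(game, state, player, depth, alpha, beta, maximizing, evaluate, depth_limit):
    if game.terminal_test(state):
        return game.utility(state, player)
    if depth >= depth_limit:
        return evaluate(state, player)
    if maximizing:
        v = -math.inf
        for a in game.actions(state):
            v = max(v, h_alphabeta(game, game.result(state, a), player,
                                   depth + 1, alpha, beta, False, evaluate, depth_limit))
            if v >= beta:
                return v
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for a in game.actions(state):
            v = min(v, h_alphabeta(game, game.result(state, a), player,
                                   depth + 1, alpha, beta, True, evaluate, depth_limit))
            if v <= alpha:
                return v
            beta = min(beta, v)
        return v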
Summary
❑Games are fun to work on!
❑They illustrate several important
points about AI
o perfection is unattainable → must
approximate
o good idea to think about what to think
about

More reading (textbook, Sections 5.5–5.7)
❑Search vs lookup
❑Stochastic games
❑Partially observable games
❑State-of-the-art game programs

Next week
❑Friday (Jun 15):
o Midterm Examination
o Closed-book
o 45 mins

❑Lecture:
o Constraint Satisfaction Problems
