A Tutorial on Google AlphaGo
Bo An
18/3/2016
Slides based on:
Silver, David, et al. “Mastering the game of Go with deep neural networks and tree search.” Nature 529.7587 (2016): 484-489.
Shane Moon’s presentation
Games in AI
Ideal test bed for AI research
Clear results
Clear motivation
Good challenge
Success with search-based approaches
chess (1997, Deep Blue), and others
Not successful in the game of Go
Go is to Chess as Poetry is to Double-entry accounting
It goes to the core of artificial intelligence, which involves the study of learning and decision-making, strategic thinking, knowledge representation, pattern recognition and, perhaps most intriguingly, intuition.
The game of Go
A 4,000-year-old board game from China
Standard board size: 19×19
Two players, Black and White, place stones in turn
Stones cannot be moved, but can be captured and removed from the board
The player with the larger territory wins
AlphaGo vs European Champion (Fan Hui, 2-dan professional rank)
October 5–9, 2015
- Time limit: 1 hour
- AlphaGo Wins (5:0)
AlphaGo vs World Champion (Lee Sedol 9-Dan)
March 9–15, 2016
• Time limit: 2 hours
• Venue: Four Seasons Hotel, Seoul
• AlphaGo Wins (4:1)
Computer Go AI - Definition
Computer Go as an AI problem: given a state s, choose an action a, which leads to a new state s'
s (state): e.g. we can represent the board in a matrix-like form
a (action): where to place the next stone
Given s, pick the best a
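As a rough sketch (not AlphaGo's actual code), the state s can be held in a 19×19 matrix and the AI reduced to a function from that state to an action; the helper names below (new_board, legal_actions, choose_action) and the simplified legality check are my own illustration.

import numpy as np

EMPTY, BLACK, WHITE = 0, 1, 2

def new_board(size=19):
    # s (state): the board stored as a size x size matrix
    return np.zeros((size, size), dtype=np.int8)

def legal_actions(board):
    # a (action): any empty intersection (ko and suicide rules ignored for brevity)
    return list(zip(*np.where(board == EMPTY)))

def choose_action(board, score):
    # "Given s, pick the best a": score(board, a) is whatever evaluation we have for action a
    return max(legal_actions(board), key=lambda a: score(board, a))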
Computer Go AI – An Implementation Idea?
How about simulating all possible board positions?
Search depth d = 1, 2, 3, ..., up to d = max D ~ 19×19 = 361
Process the simulation until the game ends, then report win / lose results
e.g. the next stone placed here wins 13 times, there 37,839 times, there 431,320 times, ...
Choose the “next action / stone” that has the most win-counts in the full-scale simulation
Can also apply the Minimax search learned in CZ3005
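A minimal sketch of that idea, using random (Monte Carlo style) simulations rather than a literally exhaustive enumeration; step(board, move, player), game_over(board) and winner(board) are hypothetical game helpers, and legal_actions is reused from the earlier sketch.

import random
from collections import Counter

def rollout(board, player, step, game_over, winner):
    # Play random legal moves until the game ends, then report who won
    # ("process the simulation until the game ends").
    while not game_over(board):
        board, player = step(board, random.choice(legal_actions(board)), player)
    return winner(board)

def most_winning_move(board, player, n_sims, step, game_over, winner):
    # Count how often each candidate first move leads to a win,
    # then choose the move with the most win-counts.
    wins = Counter()
    for _ in range(n_sims):
        first = random.choice(legal_actions(board))
        b, p = step(board.copy(), first, player)
        if rollout(b, p, step, game_over, winner) == player:
            wins[first] += 1
    return wins.most_common(1)[0][0]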
AlphaGo: Key Ideas
Objective: Reduce search space without sacrificing quality
Key Idea 1: Take advantage of top human players’ data
Deep learning
Key Idea 2: Self-play
Reinforcement learning
Key Idea 3: Looking ahead
Monte Carlo tree search
We learned Minimax search with evaluation functions
Reducing Search Space
1. Reducing “action candidates” (Breadth Reduction)
If there is a model that can tell you that certain moves are not common / probable (e.g. rarely played by experts), remove them from the search candidates in advance.
2. Position evaluation ahead of time (Depth Reduction)
Instead of simulating until the maximum depth, use a function V(s), the “board evaluation of state s”, that measures how good a position is (e.g. V = 1, V = 2, V = 10) and decide win / lose without searching deeper.
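The two reductions can be sketched together as follows; policy(s) and value(s) are stand-ins for the networks introduced on the next slides, and play, opponent, top_k and max_depth are my own illustrative helpers and parameters, not the paper's.

def reduced_search(board, player, policy, value, depth=0, max_depth=3, top_k=5):
    # play(board, a, player) and opponent(player) are hypothetical board helpers;
    # value(board) is assumed to score the position from the root player's view.
    # Depth reduction: stop simulating and trust V(s) once max_depth is reached.
    if depth == max_depth:
        return value(board)
    # Breadth reduction: only expand the top-k moves the policy considers probable.
    probs = policy(board)                      # dict: action -> probability
    candidates = sorted(probs, key=probs.get, reverse=True)[:top_k]
    scores = [reduced_search(play(board, a, player), opponent(player),
                             policy, value, depth + 1, max_depth, top_k)
              for a in candidates]
    # Minimax-style backup: our turn maximizes V, the opponent's turn minimizes it.
    return max(scores) if depth % 2 == 0 else min(scores)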
1. Reducing “action candidates”
Learning: P(next action | current state) = P(a | s)
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Prediction Model: Current State → Next State (s1 → s2, s2 → s3, s3 → s4, ...)
Data: online Go experts (5~9 dan), 160K games, 30M board positions
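A hypothetical sketch of how those (current state, expert next move) training pairs could be assembled; the games format assumed here (a list of (board, move) records per game) is my own illustration, not the paper's pipeline.

def expert_dataset(games):
    # Each game is assumed to be a sequence of (board_state, expert_move) records.
    # 160K games with a few hundred positions each yields the ~30M training pairs.
    states, actions = [], []
    for game in games:
        for board, (row, col) in game:
            states.append(board)            # s: the current board position
            actions.append(row * 19 + col)  # a: the expert's next move as an index in 0..360
    return states, actions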
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Prediction Model: Current Board → Next Action, i.e. f: s → a
There are 19×19 = 361 possible actions (with different probabilities)
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Deep Learning Prediction Model (13-layer CNN): Current Board → Next Action
g: s → p(a|s), then play a = argmax_a p(a|s)
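A simplified PyTorch stand-in for such a 13-layer policy CNN; the paper's network uses 48 input feature planes and 192 filters, but the exact layer shapes and names below are illustrative, not the published architecture.

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    # Simplified sketch of the policy CNN: input planes encoding the board,
    # a stack of convolutions, and a softmax over the 361 possible moves.
    def __init__(self, in_planes=48, filters=192, layers=13):
        super().__init__()
        convs = [nn.Conv2d(in_planes, filters, 5, padding=2), nn.ReLU()]
        for _ in range(layers - 2):
            convs += [nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU()]
        convs += [nn.Conv2d(filters, 1, 1)]           # one score per intersection
        self.net = nn.Sequential(*convs)

    def forward(self, s):                             # s: (batch, in_planes, 19, 19)
        logits = self.net(s).flatten(1)               # (batch, 361)
        return torch.softmax(logits, dim=1)           # p(a|s); play argmax at test time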
Convolutional Neural Network (CNN)
Go: abstraction is the key to winning
CNN: abstraction is its forte
1. Reducing “action candidates”
(1) Imitating expert moves (supervised learning)
Expert Moves Imitator Model (with CNN): Current Board → Next Action
Training: maximize the likelihood of the expert move by stochastic gradient ascent, Δσ ∝ ∂log p_σ(a|s)/∂σ
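A minimal sketch of that training step, assuming the PolicyNet sketch above and mini-batches where the expert move is encoded as an index in 0..360; the optimizer choice and batch format are assumptions.

import torch
import torch.nn.functional as F

def sl_training_step(policy, optimizer, boards, expert_moves):
    # Maximize log p(a|s) for the expert move a, i.e. minimize the cross-entropy
    # between the predicted distribution and the move the expert actually played.
    probs = policy(boards)                                     # (batch, 361)
    loss = F.nll_loss(torch.log(probs + 1e-8), expert_moves)   # expert_moves: (batch,) long indices
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()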
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Improving by playing against itself:
Expert Moves Imitator Model (with CNN) vs. Expert Moves Imitator Model (with CNN)
Return: board positions, win/lose info
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Expert Moves Imitator Model (with CNN): board position → win/loss
z = +1 for a win, z = −1 for a loss
Training: policy gradient scaled by the game outcome, Δρ ∝ ∂log p_ρ(a|s)/∂ρ · z
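A minimal sketch of that REINFORCE-style update, scaled by the game outcome z; tensor shapes and names are assumptions, not the paper's code, and PolicyNet/torch come from the earlier sketches.

import torch

def rl_training_step(policy, optimizer, boards, moves_played, z):
    # Policy gradient: raise log p(a|s) for moves from games that were won (z = +1)
    # and lower it for moves from games that were lost (z = -1).
    probs = policy(boards)                                              # (batch, 361)
    log_p = torch.log(probs.gather(1, moves_played.unsqueeze(1)) + 1e-8).squeeze(1)
    loss = -(z * log_p).mean()                                          # z: (batch,) of +1 / -1
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()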
1. Reducing “action candidates”
(2) Improving through self-play (reinforcement learning)
Older models vs. newer models: the current model (e.g. ver 1.3) plays against a randomly picked earlier version (ver 1.0, 1.1, 1.2, ...), repeated over roughly 1,000,000 updates
It uses the same topology as the expert moves imitator model, and just uses the updated parameters
Return: board positions, win/lose info
The final model wins 80% of the time when playing against the first model
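A sketch of that opponent-pool idea; play_game(p1, p2), assumed to return the board positions, moves and win/lose outcome z for p1, is a hypothetical helper, and the pool/snapshot mechanics are my own illustration.

import copy
import random

def self_play_round(current_policy, opponent_pool, play_game, n_games=128):
    # Play the current model against randomly picked earlier versions,
    # collecting (board positions, moves, win/lose z) for the next training step.
    games = [play_game(current_policy, random.choice(opponent_pool))
             for _ in range(n_games)]
    # Snapshot the current parameters so later versions can play against it too.
    opponent_pool.append(copy.deepcopy(current_policy))
    return games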
2. Board Evaluation
Adds a regression layer to the model
Predicts a value between 0 and 1
Close to 1: a good board position
Close to 0: a bad board position
Value Prediction Model (regression), built on top of the updated model (ver 1,000,000): board position → Win / Lose (0 / 1)
Training: regress the predicted value toward the actual game outcome (win = 1, lose = 0)
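A simplified PyTorch sketch of such a value network (board position in, one number between 0 and 1 out) and its regression training step; layer sizes and names are illustrative, not the paper's exact architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueNet(nn.Module):
    # Convolutional trunk like the policy network, plus a regression head that
    # outputs one value in (0, 1): close to 1 = good position, close to 0 = bad.
    def __init__(self, in_planes=48, filters=192):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_planes, filters, 5, padding=2), nn.ReLU(),
            nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(filters * 19 * 19, 1)

    def forward(self, s):                         # s: (batch, in_planes, 19, 19)
        return torch.sigmoid(self.fc(self.conv(s).flatten(1)))

def value_training_step(value_net, optimizer, boards, outcome):
    # Regress the prediction toward the actual result (win = 1, lose = 0).
    loss = F.mse_loss(value_net(boards).squeeze(1), outcome)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()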
Reducing Search Space
1. Reducing “action candidates” (Breadth Reduction): Policy Network
2. Board Evaluation (Depth Reduction): Value Network
Looking Ahead (Monte Carlo Tree Search)
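In the paper, the tree search repeatedly selects moves that maximize Q plus an exploration bonus proportional to the policy network's prior, expands leaves, evaluates them with the value network (and rollouts), and backs the result up the tree. Below is a minimal sketch of that selection rule only, with class and parameter names of my own choosing.

import math

class Node:
    def __init__(self, prior):
        self.prior = prior        # P(s, a): the policy network's probability for this move
        self.visits = 0           # N(s, a): how often this move has been explored
        self.value_sum = 0.0      # accumulated evaluations backed up through this move

    def q(self):
        # Q(s, a): average of the value-network / rollout evaluations seen so far
        return self.value_sum / self.visits if self.visits else 0.0

def select_move(children, c_puct=5.0):
    # Pick the child maximizing Q + u, where the bonus u is large for moves the
    # policy network likes and shrinks as the move gets visited more often.
    total_visits = sum(child.visits for child in children.values())
    def score(item):
        move, child = item
        u = c_puct * child.prior * math.sqrt(total_visits + 1) / (1 + child.visits)
        return child.q() + u
    return max(children.items(), key=score)[0]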
Results
AlphaGo
Lee Sedol 9-dan vs AlphaGo: energy consumption
A very, very tough calculation ;)
AlphaGo
Taking CPU / GPU resources to virtually infinity
Discussions
AlphaGo’s weakness: make the game state complicated
……
What is the next step?
Poker, Mahjong
                            Chess          RoboCup
Environment                 Static         Dynamic
State change                Turn taking    Real time
Information accessibility   Complete       Incomplete
Sensor readings             Symbolic       Non-symbolic
Control                     Central        Distributed