Add-On DRL CS06

Previous Lectures

• Supervised learning
– classification, regression

• Unsupervised learning
– clustering

• Reinforcement learning
– more general than supervised/unsupervised learning
– learn from interaction w/ environment to achieve a goal

[Diagram: agent-environment interaction loop; the agent sends an action to the environment, which returns a reward and a new state]
Slides from Peter Bodík RAD Lab, UC Berkeley
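The interaction loop in the diagram can be written as a short control loop. A minimal sketch, assuming a hypothetical `env` object with Gym-style `reset()`/`step()` methods and a placeholder random policy (none of these names come from the slides):

```python
import random

def random_policy(state, actions=("UP", "DOWN", "LEFT", "RIGHT")):
    """Placeholder policy: ignores the state and picks a random action."""
    return random.choice(actions)

def run_episode(env, policy=random_policy, max_steps=100):
    """One pass through the agent-environment loop.

    env is assumed to expose reset() -> state and
    step(action) -> (new_state, reward, done); both are illustrative.
    """
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment returns reward and new state
        total_reward += reward
        if done:
            break
    return total_reward
```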


Recall
• examples

• defining an RL problem
– Markov Decision Processes

• solving an RL problem
– Dynamic Programming
– Monte Carlo methods
– Temporal-Difference learning
– Policy Gradient Methods
Robot in a room
actions: UP, DOWN, LEFT, RIGHT
[Grid-world figure: 4x3 room with a START cell, a +1 cell at [4,3], and a -1 cell at [4,2]]

• actions are stochastic: 80% move in the chosen direction, 10% to each side (e.g. UP goes up 80% of the time, LEFT 10%, RIGHT 10%); a transition-model sketch follows this slide

• reward +1 at [4,3], -1 at [4,2]

• reward -0.04 for each step

• what's the strategy to achieve max reward?

• what if the actions were deterministic?
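A minimal sketch of this stochastic transition model; the grid boundaries, coordinate convention (column, row), and wall-bump behaviour are assumptions for illustration:

```python
# Hypothetical encoding of the "robot in a room" dynamics.
# The 80/10/10 action noise is from the slide; everything else is assumed.
ACTIONS = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
# Perpendicular "slip" directions for each intended action.
SIDES = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
         "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}

def transition_probs(state, action, cols=4, rows=3):
    """Return {next_state: probability} for P(s, a, s')."""
    def move(s, a):
        dc, dr = ACTIONS[a]
        c, r = s[0] + dc, s[1] + dr
        # bumping into a wall leaves the robot where it is
        return (c, r) if 1 <= c <= cols and 1 <= r <= rows else s

    probs = {}
    for a, p in [(action, 0.8), (SIDES[action][0], 0.1), (SIDES[action][1], 0.1)]:
        s_next = move(state, a)
        probs[s_next] = probs.get(s_next, 0.0) + p
    return probs

# Example: trying to go UP from [1,1]
# print(transition_probs((1, 1), "UP"))  # {(1, 2): 0.8, (1, 1): 0.1, (2, 1): 0.1}
```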
Other examples
• pole-balancing
• TD-Gammon [Gerry Tesauro]
• helicopter [Andrew Ng]

• no teacher who would say “good” or “bad”


– is reward “10” good or bad?
– rewards could be delayed

• similar to control theory


– more general, fewer constraints

• explore the environment and learn from experience


– not just blind search, try to be smart about it
Robot in a room
actions: UP, DOWN, LEFT, RIGHT
[Same grid-world figure: START cell, +1 at [4,3], -1 at [4,2]; 80% move in the chosen direction, 10% to each side]
reward +1 at [4,3], -1 at [4,2]
reward -0.04 for each step

• states
• actions
• rewards

• what is the solution?


Is this a solution?
[Figure: a candidate policy drawn as arrows on the grid with the +1 and -1 cells]

• only if actions deterministic


– not in this case (actions are stochastic)

• solution/policy
– mapping from each state to an action
Optimal policy
[Figure: the optimal policy shown as arrows on the grid with the +1 and -1 cells]

The optimal policy changes with the per-step reward:
[Figures: optimal policies on the same grid for step rewards of -2, -0.1, -0.04, -0.01, and +0.01]
Markov Decision Process (MDP)
[Diagram: the agent-environment loop again, with action, reward, and new state]

• set of states S, set of actions A, initial state S0
• transition model P(s,a,s') (sketched in code after this slide)
– P( [1,1], up, [1,2] ) = 0.8
• reward function r(s)
– r( [4,3] ) = +1
• goal: maximize cumulative reward in the long run

• policy: mapping from S to A
– π(s) or π(s,a) (deterministic vs. stochastic)

• reinforcement learning
– transitions and rewards usually not available
– how to change the policy based on experience
– how to explore the environment
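A minimal sketch of the robot-in-a-room MDP written down as plain data structures. It reuses transition_probs() from the earlier sketch; the terminal-state handling, discount factor, and names are assumptions for illustration:

```python
# Hypothetical container for a finite MDP (states, actions, P, r, gamma).
GAMMA = 0.99                       # discount factor (assumed, not from the slide)
COLS, ROWS = 4, 3
STATES = [(c, r) for c in range(1, COLS + 1) for r in range(1, ROWS + 1)]
TERMINAL = {(4, 3), (4, 2)}        # the +1 and -1 cells from the slide

def reward(state):
    """r(s) as on the slide: +1 at [4,3], -1 at [4,2], -0.04 per step elsewhere."""
    if state == (4, 3):
        return 1.0
    if state == (4, 2):
        return -1.0
    return -0.04

def P(s, a, s_next):
    """Transition model P(s, a, s'); terminal states are treated as absorbing."""
    if s in TERMINAL:
        return 1.0 if s_next == s else 0.0
    return transition_probs(s, a).get(s_next, 0.0)

# Example matching the slide: P([1,1], up, [1,2]) = 0.8
# print(P((1, 1), "UP", (1, 2)))
```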
Computing return from rewards
• episodic (vs. continuing) tasks
– “game over” after N steps
– optimal policy depends on N; harder to analyze

• additive rewards
– V(s0, s1, …) = r(s0) + r(s1) + r(s2) + …
– infinite value for continuing tasks

• discounted rewards
– V(s0, s1, …) = r(s0) + γ·r(s1) + γ²·r(s2) + …
– value bounded if rewards bounded
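A small sketch of both return definitions from this slide; the reward sequence in the usage comment is a made-up example:

```python
def additive_return(rewards):
    """Undiscounted return: r(s0) + r(s1) + ... (may diverge for continuing tasks)."""
    return sum(rewards)

def discounted_return(rewards, gamma=0.99):
    """Discounted return: r(s0) + gamma*r(s1) + gamma^2*r(s2) + ...

    Bounded whenever the rewards are bounded and 0 <= gamma < 1.
    """
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Example with an assumed reward sequence:
# discounted_return([-0.04, -0.04, -0.04, 1.0], gamma=0.9)
```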
Value functions
[Backup diagram: s, a, r, s']

• state value function: Vπ(s)
– expected return when starting in s and following π

• state-action value function: Qπ(s,a)
– expected return when starting in s, performing a, and then following π

• useful for finding the optimal policy
– can estimate from experience
– pick the best action using Qπ(s,a)

• Bellman equation (written out below)
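The Bellman equations referenced on the slide, reconstructed in standard form using the same r(s) and P(s,a,s') convention as the MDP slide (these are textbook expectation equations for a deterministic policy π, not text recovered from the slide image):

```latex
% Bellman expectation equations for a policy \pi
V^{\pi}(s) = r(s) + \gamma \sum_{s'} P(s, \pi(s), s') \, V^{\pi}(s')

Q^{\pi}(s, a) = r(s) + \gamma \sum_{s'} P(s, a, s') \, V^{\pi}(s')
```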
Optimal value functions
[Backup diagrams: see the textbook by Richard S. Sutton, Chapter 3: Finite Markov Decision Processes, page 48]

• there's a set of optimal policies
– Vπ defines a partial ordering on policies
– they share the same optimal value function

• Bellman optimality equation (written out below)
– system of n non-linear equations
– solve for V*(s)
– easy to extract the optimal policy

• having Q*(s,a) makes it even simpler
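The Bellman optimality equations referenced on the slide, again reconstructed from standard form in the same convention (not recovered from the slide image):

```latex
% Bellman optimality equations
V^{*}(s) = r(s) + \gamma \max_{a} \sum_{s'} P(s, a, s') \, V^{*}(s')

Q^{*}(s, a) = r(s) + \gamma \sum_{s'} P(s, a, s') \, \max_{a'} Q^{*}(s', a')

% Extracting the optimal policy
\pi^{*}(s) = \arg\max_{a} \sum_{s'} P(s, a, s') \, V^{*}(s') = \arg\max_{a} Q^{*}(s, a)
```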


Dynamic programming
• main idea
– use value functions to structure the search for good policies
– need a perfect model of the environment

• two main components


– policy evaluation: compute Vπ from π
– policy improvement: improve π based on Vπ

– start with an arbitrary policy


– repeat evaluation/improvement until convergence
Policy evaluation/improvement
• policy evaluation: π -> Vπ
– the Bellman equations define a system of n equations in n unknowns
– could solve it directly, but we will use the iterative version
– start with an arbitrary value function V0, iterate until Vk converges

• policy improvement: Vπ -> π'
– π' is either strictly better than π, or π' is optimal (if π = π')
(both steps are sketched in code below)
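A compact sketch of iterative policy evaluation and greedy policy improvement for the assumed grid MDP above; it relies on the STATES/ACTIONS/P/reward/GAMMA helpers defined in the earlier sketches, and the convergence threshold is arbitrary:

```python
def policy_evaluation(policy, V=None, theta=1e-6, gamma=GAMMA):
    """Iterative policy evaluation: pi -> V_pi.

    policy maps state -> action; V is an optional starting value function.
    """
    V = dict(V) if V else {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            v_new = reward(s)
            if s not in TERMINAL:   # terminal cells keep their reward as value
                v_new += gamma * sum(P(s, policy[s], s2) * V[s2] for s2 in STATES)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:           # stop once a full sweep barely changes V
            return V

def policy_improvement(V, gamma=GAMMA):
    """Greedy policy improvement: V_pi -> pi'."""
    return {s: max(ACTIONS, key=lambda a: sum(P(s, a, s2) * V[s2] for s2 in STATES))
            for s in STATES}
```

Policy iteration then just alternates the two functions until the policy stops changing.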
Policy/Value iteration
• Policy iteration
– two nested iterations; too slow
– evaluation doesn't need to run until Vπk converges
• just move towards it

• Value iteration
– use the Bellman optimality equation as an update
– converges to V* (see the sketch below)
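A minimal value iteration sketch for the same assumed grid MDP, turning the Bellman optimality equation into an update; it reuses the helpers from the earlier sketches:

```python
def value_iteration(theta=1e-6, gamma=GAMMA):
    """Value iteration: repeatedly apply the Bellman optimality backup.

    Returns an approximation of V* and the greedy policy with respect to it.
    """
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINAL:
                v_new = reward(s)
            else:
                v_new = reward(s) + gamma * max(
                    sum(P(s, a, s2) * V[s2] for s2 in STATES) for a in ACTIONS)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < theta:
            break
    return V, policy_improvement(V, gamma)   # greedy policy w.r.t. the final V
```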
Using DP
• need complete model of the environment and rewards
– robot in a room
• state space, action space, transition model

• can we use DP to solve


– robot in a room?
– backgammon?
– helicopter?
Self Study Example

Iterative policy evaluation:
See the textbook by Richard S. Sutton, Chapter 4: Dynamic Programming, Exercises 4.1 and 4.2, pages 61-67.
Generalized Policy Iteration (GPI)
Basic Idea:

• GPI allows policy evaluation and policy improvement to interact in a less rigid manner than traditional policy iteration.
• Unlike strict policy iteration, where evaluation and improvement alternate in distinct, fully converged phases, GPI lets the two processes interleave at whatever granularity is convenient (one possible interleaving is sketched below).
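A small sketch of one possible GPI schedule, assuming the helpers from the earlier sketches: a fixed number of (non-converged) evaluation sweeps alternating with a greedy improvement step:

```python
def generalized_policy_iteration(sweeps_per_improvement=1, iterations=100, gamma=GAMMA):
    """One possible GPI schedule: partial evaluation interleaved with improvement.

    With sweeps_per_improvement=1 this behaves much like value iteration;
    letting evaluation run to convergence recovers classic policy iteration.
    """
    V = {s: 0.0 for s in STATES}
    policy = {s: "UP" for s in STATES}           # arbitrary initial policy
    for _ in range(iterations):
        for _ in range(sweeps_per_improvement):  # partial policy evaluation
            for s in STATES:
                v = reward(s)
                if s not in TERMINAL:
                    v += gamma * sum(P(s, policy[s], s2) * V[s2] for s2 in STATES)
                V[s] = v
        policy = policy_improvement(V, gamma)    # greedy policy improvement
    return V, policy
```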
Outline
• examples

• defining an RL problem
– Markov Decision Processes

• solving an RL problem
– Dynamic Programming
– Monte Carlo methods
– Temporal-Difference learning
