Lecture 1: Introduction to Reinforcement Learning
David Silver
About RL
Many Faces of Reinforcement Learning
[Figure: Venn diagram. Reinforcement learning sits at the intersection of many fields: Computer Science (Machine Learning), Engineering (Optimal Control), Neuroscience (Reward System), Mathematics (Operations Research), Psychology (Classical/Operant Conditioning), and Economics (Bounded Rationality).]
Branches of Machine Learning
[Figure: Venn diagram. Machine learning branches into Supervised Learning, Unsupervised Learning, and Reinforcement Learning.]
Characteristics of Reinforcement Learning
What makes reinforcement learning different from other machine learning paradigms?
There is no supervisor, only a reward signal
Feedback is delayed, not instantaneous
Time really matters (sequential, non-i.i.d. data)
Agent’s actions affect the subsequent data it receives
Examples of Reinforcement Learning
Fly stunt manoeuvres in a helicopter
Defeat the world champion at Backgammon
Manage an investment portfolio
Control a power station
Make a humanoid robot walk
Play many different Atari games better than humans
The RL Problem
Reward
Rewards
A reward $R_t$ is a scalar feedback signal
Indicates how well the agent is doing at step t
The agent’s job is to maximise cumulative reward
Reinforcement learning is based on the reward hypothesis
Definition (Reward Hypothesis)
All goals can be described by the maximisation of expected cumulative reward
Do you agree with this statement?
Examples of Rewards
Fly stunt manoeuvres in a helicopter
+ve reward for following desired trajectory
−ve reward for crashing
Defeat the world champion at Backgammon
+/−ve reward for winning/losing a game
Manage an investment portfolio
+ve reward for each $ in bank
Control a power station
+ve reward for producing power
−ve reward for exceeding safety thresholds
Make a humanoid robot walk
+ve reward for forward motion
−ve reward for falling over
Play many different Atari games better than humans
+/−ve reward for increasing/decreasing score
Sequential Decision Making
Goal: select actions to maximise total future reward
Actions may have long term consequences
Reward may be delayed
It may be better to sacrifice immediate reward to gain more long-term reward (see the numerical sketch after the examples below)
Examples:
A financial investment (may take months to mature)
Refuelling a helicopter (might prevent a crash in several hours)
Blocking opponent moves (might help winning chances many moves from now)
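As a toy illustration of this trade-off (the numbers below are made up, not from the lecture), compare two reward sequences under the discount factor γ that appears later in the value function:

```python
# Toy comparison (made-up numbers): option A takes reward immediately,
# option B sacrifices immediate reward for a larger delayed payoff.
gamma = 0.9
option_a = [1.0, 0.0, 0.0, 0.0]   # immediate reward, nothing afterwards
option_b = [0.0, 0.0, 0.0, 5.0]   # nothing now, larger reward later

def total_discounted_reward(rewards, gamma=gamma):
    # R_1 + gamma*R_2 + gamma^2*R_3 + ...
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

print(total_discounted_reward(option_a))  # 1.0
print(total_discounted_reward(option_b))  # 5 * 0.9**3 = 3.645
```

Even with discounting, the delayed payoff of option B beats grabbing the small immediate reward.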
Environments
Agent and Environment
[Figure: agent-environment loop. The agent receives observation $O_t$ and reward $R_t$ from the environment and sends back action $A_t$.]
At each step t the agent:
Executes action $A_t$
Receives observation $O_t$
Receives scalar reward $R_t$
The environment:
Receives action $A_t$
Emits observation $O_{t+1}$
Emits scalar reward $R_{t+1}$
t increments at environment step
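A minimal sketch of this interaction loop in Python, assuming a hypothetical environment object with reset() and step(action) methods (the names follow common RL library conventions, not anything defined in the lecture):

```python
import random

class RandomAgent:
    """Illustrative agent that ignores its inputs and acts uniformly at random."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, observation, reward):
        return random.choice(self.actions)

def run_episode(env, agent, max_steps=1000):
    """Run one episode of the agent-environment interaction loop."""
    observation = env.reset()     # initial observation
    reward = 0.0
    total_reward = 0.0
    for t in range(max_steps):
        action = agent.act(observation, reward)        # agent executes A_t
        observation, reward, done = env.step(action)   # env emits O_{t+1}, R_{t+1}
        total_reward += reward
        if done:
            break
    return total_reward
```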
Inside An RL Agent
Major Components of an RL Agent
An RL agent may include one or more of these components:
Policy: agent’s behaviour function
Value function: how good is each state and/or action
Model: agent’s representation of the environment
Policy
A policy is the agent’s behaviour
It is a map from state to action, e.g.
Deterministic policy: $a = \pi(s)$
Stochastic policy: $\pi(a|s) = \mathbb{P}[A_t = a \mid S_t = s]$
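For a small finite state and action space, both kinds of policy can be written down directly as tables; the states and actions below are purely illustrative:

```python
import random

# Deterministic policy: a fixed mapping a = pi(s).
deterministic_policy = {"s1": "left", "s2": "right"}

def act_deterministic(state):
    return deterministic_policy[state]

# Stochastic policy: pi(a|s) = P[A_t = a | S_t = s], one distribution per state.
stochastic_policy = {
    "s1": {"left": 0.9, "right": 0.1},
    "s2": {"left": 0.2, "right": 0.8},
}

def act_stochastic(state):
    actions = list(stochastic_policy[state])
    probs = list(stochastic_policy[state].values())
    return random.choices(actions, weights=probs, k=1)[0]
```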
Value Function
Value function is a prediction of future reward
Used to evaluate the goodness/badness of states
And therefore to select between actions, e.g.
$v_\pi(s) = \mathbb{E}_\pi\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \ldots \mid S_t = s \right]$
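One way to read this definition: for any sampled trajectory, the quantity inside the expectation is a discounted sum of rewards, and averaging it over many trajectories that start in s and follow π gives a crude Monte-Carlo estimate of $v_\pi(s)$. A minimal sketch, assuming we already have such sampled reward sequences:

```python
def discounted_return(rewards, gamma=0.9):
    """R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ... for one trajectory."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

def mc_value_estimate(sampled_reward_sequences, gamma=0.9):
    """Average the sampled returns of trajectories starting in state s under pi."""
    returns = [discounted_return(rs, gamma) for rs in sampled_reward_sequences]
    return sum(returns) / len(returns)

# e.g. three sampled reward sequences from the same start state:
print(mc_value_estimate([[1, 0, 2], [0, 1, 1], [2, 0, 0]]))
```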
Model
A model predicts what the environment will do next
$\mathcal{P}$ predicts the next state
$\mathcal{R}$ predicts the next (immediate) reward, e.g.
$\mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$
$\mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$
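One simple way to realise these two quantities in code is a tabular model estimated from experienced transitions (s, a, r, s'), with visit counts normalised into probabilities. The structure below is an illustrative sketch, not anything prescribed in the lecture:

```python
from collections import defaultdict

transition_counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
reward_sums = defaultdict(float)                            # (s, a) -> sum of rewards
visit_counts = defaultdict(int)                             # (s, a) -> visit count

def update_model(s, a, r, s_next):
    """Record one experienced transition."""
    transition_counts[(s, a)][s_next] += 1
    reward_sums[(s, a)] += r
    visit_counts[(s, a)] += 1

def transition_prob(s, a, s_next):
    """Estimate of P^a_{ss'} = P[S_{t+1} = s' | S_t = s, A_t = a]."""
    n = visit_counts[(s, a)]
    return transition_counts[(s, a)][s_next] / n if n else 0.0

def expected_reward(s, a):
    """Estimate of R^a_s = E[R_{t+1} | S_t = s, A_t = a]."""
    n = visit_counts[(s, a)]
    return reward_sums[(s, a)] / n if n else 0.0
```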
Categorizing RL agents (1)
Value Based
No Policy (Implicit)
Value Function
Policy Based
Policy
No Value Function
Actor Critic
Policy
Value Function
Categorizing RL agents (2)
Model Free
Policy and/or Value Function
No Model
Model Based
Policy and/or Value Function
Model
RL Agent Taxonomy
[Figure: RL agent taxonomy. Venn diagram relating Value-Based, Policy-Based, and Actor-Critic agents, each of which can be either Model-Free or Model-Based.]
Problems within RL
Exploration and Exploitation (1)
Reinforcement learning is like trial-and-error learning
The agent should discover a good policy
From its experiences of the environment
Without losing too much reward along the way
Exploration and Exploitation (2)
Exploration finds more information about the environment
Exploitation exploits known information to maximise reward
It is usually important to explore as well as exploit
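One common way to balance the two is ε-greedy action selection, sketched below; the lecture does not prescribe a particular strategy, so this is only an example:

```python
import random

def epsilon_greedy(action_values, epsilon=0.1):
    """With probability epsilon pick a random action (explore),
    otherwise pick the action with the highest value estimate (exploit).
    `action_values` maps each action to its current value estimate."""
    if random.random() < epsilon:
        return random.choice(list(action_values))
    return max(action_values, key=action_values.get)

# e.g. epsilon_greedy({"left": 0.2, "right": 0.5}) usually returns "right"
```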
Examples
Restaurant Selection
Exploitation: Go to your favourite restaurant
Exploration: Try a new restaurant
Online Banner Advertisements
Exploitation: Show the most successful advert
Exploration: Show a different advert
Oil Drilling
Exploitation: Drill at the best known location
Exploration: Drill at a new location
Game Playing
Exploitation: Play the move you believe is best
Exploration: Play an experimental move