Unit 6

The document provides an overview of reinforcement learning (RL), including its differences from supervised learning, key concepts such as Markov Decision Processes, Q-learning, and Deep Q-learning, as well as practical applications in robotics and automation. It discusses the advantages and disadvantages of RL, emphasizing its ability to solve complex problems and learn from interactions with the environment. Additionally, it outlines the elements of RL, such as policy, reward function, and value function, and highlights the importance of feedback in the learning process.


Reinforcement Learning

Dr. Kavita Mohite (Bhosle)


Introduction to deep reinforcement learning,
Markov Decision Process,
basic framework of reinforcement learning,
challenges of reinforcement learning,
dynamic programming algorithms for reinforcement learning,
Q-learning and Deep Q-Networks,
deep Q recurrent networks,
simple reinforcement learning for Tic-Tac-Toe.
1. Reinforcement learning differs from supervised learning in that, in supervised learning, the training data comes with the answer key, so the model is trained on the correct answers themselves.

2. In reinforcement learning, there is no answer key; the reinforcement agent decides what to do to perform the given task.

3. In the absence of a training dataset, the agent is bound to learn from its own experience.

4. Reinforcement Learning (RL) is the science of decision making. It is about learning the optimal behavior in an environment in order to obtain maximum reward.

5. In RL, the data is accumulated by the learning system itself through trial and error; it is not supplied as part of the input.

6. RL uses algorithms that learn from outcomes and decide which action to take next.

7. After each action, the algorithm receives feedback that helps it determine whether the choice it made was correct, neutral or incorrect.

8. It is a good technique for automated systems that have to make a lot of small decisions without human guidance.

9. RL is an autonomous, self-teaching system that essentially learns by trial and error.

10. It performs actions with the aim of maximizing rewards; in other words, it is learning by doing in order to achieve the best outcomes.
Reinforcement learning –

Input: The input is an initial state from which the model will start.

Output: There are many possible outputs, since there are a variety of solutions to a particular problem.

Training: The training is based on the input; the model returns a state, and the user decides whether to reward or punish the model based on its output.

The model keeps learning continuously.

The best solution is decided based on the maximum reward.
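As a rough illustration of this interaction loop, the sketch below shows an agent acting in a toy environment by trial and error. The environment, its states, actions and reward values are invented for the example and are not part of the original material.

```python
import random

# A toy environment: the agent starts at state 0 and tries to reach state 4.
# step() returns (next_state, reward, done); states, actions and reward
# values here are illustrative assumptions only.
class ToyEnvironment:
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):              # action: -1 (left) or +1 (right)
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else -0.1   # reward or punishment as feedback
        return self.state, reward, done

env = ToyEnvironment()
state = env.reset()                      # input: the initial state
total_reward = 0.0
done = False
while not done:
    action = random.choice([-1, 1])      # an (as yet untrained) policy picks an action
    state, reward, done = env.step(action)
    total_reward += reward               # feedback used to judge the behavior
print("episode finished, total reward:", total_reward)
```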
Difference between Reinforcement learning and Supervised learning:

Reinforcement learning:
RL is all about making decisions sequentially. In simple words, the output depends on the state of the current input, and the next input depends on the output of the previous input.
In RL, decisions are dependent, so we give labels to sequences of dependent decisions.
Example: chess game, text summarization.

Supervised learning:
In supervised learning, the decision is made on the initial input, or the input given at the start.
In supervised learning, decisions are independent of each other, so labels are given to each decision.
Example: object recognition, spam detection.
Positive: Positive reinforcement occurs when an event, occurring because of a particular behavior, increases the strength and frequency of that behavior. In other words, it has a positive effect on behavior.
Advantages: it maximizes performance and sustains change for a long period of time.
Drawback: too much reinforcement can lead to an overload of states, which can diminish the results.

Negative: Negative reinforcement is the strengthening of a behavior because a negative condition is stopped or avoided.
Advantages: it increases behavior and helps maintain a minimum standard of performance.
Drawback: it only provides enough to meet the minimum behavior.

Neutral: feedback that neither strengthens nor weakens the behavior.
Elements of Reinforcement Learning
The elements of reinforcement learning are as follows:
Policy: A policy defines the learning agent's behavior at a given time. It is a mapping from perceived states of the environment to the actions to be taken when in those states.
Reward function: A reward function is used to define the goal in a reinforcement learning problem. It is a function that provides a numerical score based on the state of the environment.
Value function: Value functions specify what is good in the long run. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
Model of the environment: Models are used for planning.
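To make these four elements concrete, here is a small illustrative sketch in Python for a made-up three-state chain problem. The states, actions, numbers and transitions below are assumptions for demonstration only, not from the slides.

```python
# The four elements of RL, written out for a made-up three-state chain problem.
states = ["s0", "s1", "s2"]              # s2 is the goal state (illustrative)
actions = ["left", "right"]

# Policy: a mapping from perceived states to the action taken in those states.
policy = {"s0": "right", "s1": "right", "s2": "right"}

# Reward function: a numerical score based on the state of the environment.
def reward(state):
    return 1.0 if state == "s2" else 0.0

# Value function: the total reward the agent can expect to accumulate from
# each state onward (placeholder numbers; in practice these are learned).
value = {"s0": 0.81, "s1": 0.9, "s2": 1.0}

# Model of the environment: the next state for (state, action); used for planning.
model = {
    ("s0", "right"): "s1", ("s1", "right"): "s2", ("s2", "right"): "s2",
    ("s0", "left"):  "s0", ("s1", "left"):  "s0", ("s2", "left"):  "s1",
}
```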
Various Practical Applications of Reinforcement Learning –

RL can be used in robotics for industrial automation.

RL can be used in machine learning and data processing.

RL can be used to create training systems that provide custom instruction and materials according to the requirements of students.
Applications of Reinforcement Learning
1. Robotics: Robots with pre-programmed behavior are useful in structured environments, such as the assembly line of an automobile manufacturing plant, where the task is repetitive in nature.

2. A master chess player makes a move. The choice is informed by planning: anticipating possible replies and counter-replies.

3. An adaptive controller adjusts the parameters of a petroleum refinery's operation in real time.

RL can be used in large environments in the following situations:

A model of the environment is known, but an analytic solution is not available;

Only a simulation model of the environment is given (the subject of simulation-based optimization);

The only way to collect information about the environment is to interact with it.
Advantages of Reinforcement learning
1. Reinforcement learning can be used to solve very
complex problems that cannot be solved by
conventional techniques.
2. The model can correct the errors that occurred
during the training process.
3. In RL, training data is obtained via the direct interaction of the agent with the environment.
4. Reinforcement learning can handle environments
that are non-deterministic, meaning that the
outcomes of actions are not always predictable. This
is useful in real-world applications where the
environment may change over time or is uncertain.
5. Reinforcement learning can be used to solve a
wide range of problems, including those that
involve decision making, control, and
optimization.
6. Reinforcement learning is a flexible approach
that can be combined with other machine
learning techniques, such as deep learning, to
improve performance.
Disadvantages of Reinforcement learning
1. Reinforcement learning is not preferable for solving simple problems.

2. Reinforcement learning needs a lot of computation.

3. Reinforcement learning is highly dependent on the quality of the reward function. If the reward function is poorly designed, the agent may not learn the desired behavior.

4. Reinforcement learning can be difficult to debug and interpret. It is not always clear why the agent is behaving in a certain way, which can make it difficult to diagnose and fix problems.
Markov Decision Process
It allows machines and software agents to automatically determine the ideal behavior within a specific context, in order to maximize performance.

Simple reward feedback is required for the agent to learn its behavior; this is known as the reinforcement signal.

There are many different algorithms that tackle this issue.

In the problem, an agent is supposed to decide the best action to select based on its current state. When this step is repeated, the problem is known as a Markov Decision Process.
A Markov Decision Process (MDP) model contains:

A set of possible world states S.
A set of models.
A set of possible actions A.
A real-valued reward function R(s, a).
A policy, which is the solution of the Markov Decision Process.
State
A State is a set of tokens that represent every state the agent can be in.

Model
A Model (sometimes called a Transition Model) gives an action's effect in a state. In particular, T(S, a, S') defines a transition T where being in state S and taking an action 'a' takes us to state S' (S and S' may be the same). For stochastic actions (noisy, non-deterministic) we also define a probability P(S'|S, a), which represents the probability of reaching state S' if action 'a' is taken in state S. Note that the Markov property states that the effects of an action taken in a state depend only on that state and not on the prior history.

Actions
An Action set A is the set of all possible actions. A(s) defines the set of actions that can be taken in state S.

Reward
A Reward is a real-valued reward function. R(s) indicates the reward for simply being in state S. R(S, a) indicates the reward for being in state S and taking an action 'a'. R(S, a, S') indicates the reward for being in state S, taking an action 'a' and ending up in state S'.

Policy
A Policy is a solution to the Markov Decision Process. A policy is a mapping from states S to actions A. It indicates the action 'a' to be taken while in state S.
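Putting the pieces together, the sketch below encodes a tiny MDP in Python and solves it with value iteration, one of the dynamic-programming algorithms named in the outline. The states, transition probabilities, rewards and discount factor are all invented for illustration.

```python
# A tiny MDP: states S, actions A(s), transition model P(S'|S,a), reward R(S,a,S').
# All numbers below are illustrative assumptions.
S = ["low", "high"]
A = {"low": ["wait", "work"], "high": ["wait", "work"]}

# T[(s, a)] -> list of (next_state, probability)
T = {
    ("low", "wait"):  [("low", 1.0)],
    ("low", "work"):  [("high", 0.8), ("low", 0.2)],
    ("high", "wait"): [("high", 0.9), ("low", 0.1)],
    ("high", "work"): [("high", 1.0)],
}

def R(s, a, s_next):
    return 10.0 if s_next == "high" else 0.0

gamma = 0.9                      # discount factor
V = {s: 0.0 for s in S}          # value function, initialized to zero

# Value iteration: repeatedly apply the Bellman optimality backup.
for _ in range(100):
    V = {
        s: max(
            sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[(s, a)])
            for a in A[s]
        )
        for s in S
    }

# The policy greedy with respect to V is the solution of the MDP.
policy = {
    s: max(
        A[s],
        key=lambda a: sum(p * (R(s, a, s2) + gamma * V[s2]) for s2, p in T[(s, a)]),
    )
    for s in S
}
print(V, policy)
```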
Q-learning
While in a given state, the agent may choose from a set of allowable actions, which may fetch different rewards (or penalties).

Over time, the learning agent learns to maximize these rewards so as to behave optimally in whatever state it is in.

Q-learning is a basic form of Reinforcement Learning that uses Q-values (also called action values) to iteratively improve the behavior of the learning agent.
Q-learning in Reinforcement Learning
Q-learning is a popular model-free reinforcement learning algorithm used in machine learning and artificial intelligence applications. It falls under the category of temporal-difference learning techniques, in which an agent picks up new information by observing results, interacting with the environment, and getting feedback in the form of rewards.

Components of Q-learning
Q-values or action-values: Q-values are defined for states and actions. Q(S, A) is an estimate of how good it is to take action A in state S. This estimate of Q(S, A) is iteratively improved using the TD-update rule (a sketch is given below).
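The TD-update rule referred to above takes the form Q(S, A) <- Q(S, A) + alpha * [R + gamma * max_a Q(S', a) - Q(S, A)]. The following minimal tabular sketch applies it to a toy chain environment; the environment and the values of alpha, gamma and epsilon are illustrative assumptions, not part of the original slides.

```python
import random
from collections import defaultdict

# Tabular Q-learning on a 1-D chain of 5 states (an illustrative toy problem).
# TD update: Q(S,A) <- Q(S,A) + alpha * (R + gamma * max_a Q(S',a) - Q(S,A))
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                       # move left or right
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount, exploration rate

Q = defaultdict(float)                   # Q[(state, action)], initialized to 0

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        # TD update toward the bootstrapped target R + gamma * max_a Q(S', a).
        target = reward + gamma * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = nxt

# Print the greedy action learned for each state.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```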
Deep Q-Learning
The Q-learning approach is not wrong in itself, but it is only practical for very small environments and quickly loses its feasibility when the number of states and actions in the environment increases. The solution to this problem comes from the realization that the values in the Q-value matrix only have relative importance, i.e., the values only matter with respect to the other values. This thinking leads us to Deep Q-Learning, which uses a deep neural network to approximate the values.

The basic working step for Deep Q-Learning is that the initial state is fed into the neural network and it returns the Q-values of all possible actions as an output.
Define the Q-network: The Q-network is a deep neural network that takes in the current state of the agent and outputs the Q-values for each possible action. The Q-network can be defined using TensorFlow's Keras API.

Initialize the Q-network's parameters: The Q-network's parameters can be initialized using TensorFlow's variable initializers.

Define the loss function: The loss function is used to update the Q-network's parameters. It is typically defined as the mean squared error between the Q-network's predicted Q-values and the target Q-values.

Define the optimizer: The optimizer is used to minimize the loss function and update the Q-network's parameters. TensorFlow provides a wide range of optimizers, such as Adam, RMSprop, etc.

Collect experience: The agent interacts with the environment and collects experience in the form of (state, action, reward, next_state) tuples. A condensed sketch of these steps is given below.
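The following sketch condenses the five steps above using TensorFlow's Keras API. The network sizes, the random placeholder batch of experience, and the omission of a separate target network and replay buffer are simplifying assumptions on my part, not prescriptions from the slides.

```python
import numpy as np
import tensorflow as tf

STATE_DIM, N_ACTIONS = 4, 2              # illustrative sizes (e.g. a CartPole-like task)

# 1. Define the Q-network: state in, one Q-value per possible action out.
q_network = tf.keras.Sequential([
    tf.keras.Input(shape=(STATE_DIM,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(N_ACTIONS),    # linear output: Q(s, a) for every action
])

# 2./3./4. Parameters are initialized by Keras; the loss is the mean squared
# error between predicted and target Q-values, minimized with the Adam optimizer.
q_network.compile(optimizer="adam", loss="mse")

# 5. Suppose a batch of (state, action, reward, next_state, done) experience has
# been collected; build the TD targets and take one training step on them.
gamma = 0.99
states      = np.random.rand(32, STATE_DIM).astype("float32")   # placeholder batch
actions     = np.random.randint(N_ACTIONS, size=32)
rewards     = np.random.rand(32).astype("float32")
next_states = np.random.rand(32, STATE_DIM).astype("float32")
dones       = np.zeros(32, dtype=bool)

q_targets = q_network.predict(states, verbose=0)                # current estimates
next_q    = q_network.predict(next_states, verbose=0).max(axis=1)
q_targets[np.arange(32), actions] = rewards + gamma * next_q * (~dones)

q_network.fit(states, q_targets, verbose=0)                      # one gradient step
```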
Simple reinforcement learning for Tic-Tac-Toe
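The slides give this topic as a heading only; as a hedged illustration, the sketch below learns a value table for Tic-Tac-Toe by self-play with a simple temporal-difference backup. The board representation, learning rate and exploration scheme are my own illustrative choices.

```python
import random

# Minimal tabular value learning for Tic-Tac-Toe via self-play (illustrative sketch).
ALPHA, EPSILON = 0.5, 0.1
values = {}          # board (a tuple of 9 cells) -> estimated value for player 'X'

LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def value(board):
    w = winner(board)
    if w == "X": return 1.0
    if w == "O": return 0.0
    return values.setdefault(board, 0.5)   # unseen non-terminal boards start at 0.5

def play_episode():
    board, player = (" ",) * 9, "X"
    history = []                            # boards seen after each of X's moves
    while winner(board) is None and " " in board:
        moves = [i for i, c in enumerate(board) if c == " "]
        if player == "X" and random.random() > EPSILON:
            move = max(moves, key=lambda i: value(board[:i] + ("X",) + board[i+1:]))
        else:
            move = random.choice(moves)     # explore, or the opponent plays randomly
        board = board[:move] + (player,) + board[move+1:]
        if player == "X":
            history.append(board)
        player = "O" if player == "X" else "X"
    # Back up the final outcome through the states X visited (TD(0)-style update).
    target = value(board)
    for b in reversed(history):
        values[b] = value(b) + ALPHA * (target - value(b))
        target = values[b]

for _ in range(5000):
    play_episode()
print(len(values), "board positions evaluated")
```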
Thank You
