
ML ASSIGNMENT – 2

Reinforcement Learning

Name: Karthik Nivedhan A


Roll number: 221225
ECE Third year

Sources:

What is Reinforcement Learning? - Reinforcement Learning Explained - AWS

Comparison Between Model-Free And Model-Based Reinforcement Learning Algorithms In 2022 - Techyv.com

Types of Reinforcement Learning - GeeksforGeeks

Q1. Reinforcement learning


1.Introduction
Reinforcement learning (RL) is a machine learning technique that trains
software to make decisions that achieve optimal results. It mimics the
trial-and-error learning process that humans use to achieve their goals.
Software actions that work towards the goal are reinforced, while actions that
detract from the goal are ignored.
RL algorithms use a reward-and-punishment paradigm as they process data.
They learn from the feedback of each action and discover for themselves the
best processing paths to reach the final outcome. The algorithms can also
handle delayed gratification: the best overall strategy may require short-term
sacrifices, so the approach they discover may include some punishments or
backtracking along the way. RL is a powerful method for helping artificial
intelligence systems achieve optimal outcomes in unseen environments.

2.Why Reinforcement Learning?

Excels in complex environments


RL algorithms can be used in complex environments with many rules and
dependencies. In the same environment, a human may not be capable of
determining the best path to take, even with superior knowledge of the
environment. Instead, model-free RL algorithms adapt quickly to continuously
changing environments and find new strategies to optimize results.

Requires less human interaction


In supervised machine learning, humans must label data pairs to direct the
algorithm. With an RL algorithm, this labelling isn't necessary: the agent
learns by itself from reward signals. At the same time, RL offers mechanisms
to integrate human feedback, allowing for systems that adapt to human
preferences, expertise, and corrections.

Optimizes for long-term goals


RL inherently focuses on long-term reward maximization, which makes it apt for
scenarios where actions have prolonged consequences. It is particularly well-
suited for real-world situations where feedback isn't immediately available for
every step, since it can learn from delayed rewards.

3.Core Components of Reinforcement Learning


1. Agent: The decision-maker that interacts with the environment. It learns
a policy, which maps states to actions.
2. Environment: The external world with which the agent interacts. It
provides states and rewards to the agent.
3. State: The current situation or configuration of the environment.
4. Action: The choices available to the agent.
5. Reward: A numerical value indicating the desirability of a particular
action or outcome.
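
As a minimal illustration of how these five components map onto code, here is a toy sketch (the GridWorld and RandomAgent names are hypothetical, not from any particular library):

import random

class GridWorld:
    """Environment: a one-dimensional corridor of 5 cells; the goal is the right end."""
    def __init__(self):
        self.state = 0                                   # State: the agent's current cell

    def step(self, action):
        """Apply an action (-1 = left, +1 = right) and return (next_state, reward)."""
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0         # Reward: 1 at the goal cell, else 0
        return self.state, reward

class RandomAgent:
    """Agent: its policy maps the current state to one of the available actions."""
    def policy(self, state):
        return random.choice([-1, +1])                   # Action: the choices available

env, agent = GridWorld(), RandomAgent()
print(env.step(agent.policy(env.state)))                 # one interaction: (next_state, reward)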

4.The Reinforcement Learning Process


1. Initialization: The agent starts in an initial state.
2. Action Selection: The agent chooses an action based on its current
policy.
3. Environment Transition: The environment transitions to a new state
based on the agent's action.
4. Reward: The environment provides a reward to the agent.
5. Policy Update: The agent updates its policy to improve its chances of
receiving higher rewards in the future.
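
A minimal, self-contained sketch of this loop (the corridor environment and random policy below are assumptions chosen only to keep the example short):

import random

def run_episode(num_steps=20):
    """Walk through the five steps above: initialize, select, transition, reward, update."""
    state = 0                                            # 1. Initialization: start in an initial state
    value = {}                                           # crude action-value table used for the update step
    for _ in range(num_steps):
        action = random.choice([-1, +1])                 # 2. Action selection (a random policy here)
        next_state = max(0, min(4, state + action))      # 3. Environment transition to a new state
        reward = 1.0 if next_state == 4 else 0.0         # 4. Reward provided by the environment
        key = (state, action)
        value[key] = value.get(key, 0.0) + 0.1 * (reward - value.get(key, 0.0))  # 5. Policy/value update
        state = next_state
    return value

print(run_episode())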

5.Applications of Reinforcement Learning


Reinforcement learning has found applications in various domains, including:
• Game Playing: AlphaGo, DeepMind's AI that defeated the world champion Go player, is a notable example.
• Robotics: RL has been used to train robots to perform tasks such as grasping objects and navigating environments.
• Finance: RL can be used for algorithmic trading and risk management.
• Healthcare: RL can assist in medical diagnosis and treatment planning.
• Natural Language Processing: RL can be used for tasks like machine translation and dialogue systems.

Q2. Types of RL with examples

Reinforcement learning (RL) can be broadly categorized into two main types:
Model-Based and Model-Free. Each type has its own approach to learning
and decision-making.

• Model-Based RL: Builds a model of the environment to plan actions.
• Model-Free RL: Learns directly from experience without a model.

1.Model-Based Reinforcement Learning


In model-based RL, the agent attempts to build a model of the environment.
This model predicts the next state and reward given a current state and action.
Using this model, the agent can plan its actions and evaluate different policies.
Example: A self-driving car might learn a model of the traffic environment,
predicting the behaviour of other vehicles and pedestrians. Using this model, it
can plan its route and make decisions like changing lanes or braking.
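
A real self-driving stack is far beyond a few lines, but the model-based idea can be sketched with a toy tabular model (all names and numbers below are made up for illustration): the agent records the transitions and rewards it observes to build a model, then plans by one-step lookahead with that model.

from collections import defaultdict

# Learned model of the environment: transition counts and average rewards
transition_counts = defaultdict(lambda: defaultdict(int))   # (state, action) -> {next_state: count}
reward_sums = defaultdict(float)                             # (state, action) -> total reward observed
visit_counts = defaultdict(int)                              # (state, action) -> number of visits

def record(state, action, next_state, reward):
    """Update the learned model from one observed interaction."""
    transition_counts[(state, action)][next_state] += 1
    reward_sums[(state, action)] += reward
    visit_counts[(state, action)] += 1

def plan(state, actions, state_values):
    """One-step lookahead: use the model to pick the action with the best predicted return."""
    def predicted_return(action):
        key = (state, action)
        if visit_counts[key] == 0:
            return 0.0                                       # unknown action: assume neutral
        avg_reward = reward_sums[key] / visit_counts[key]
        total = sum(transition_counts[key].values())
        expected_value = sum(count / total * state_values.get(nxt, 0.0)
                             for nxt, count in transition_counts[key].items())
        return avg_reward + expected_value
    return max(actions, key=predicted_return)

# Example usage with made-up data:
record("cruising", "brake", "stopped", 0.0)
record("cruising", "keep_speed", "collision", -10.0)
print(plan("cruising", ["brake", "keep_speed"], {"stopped": 1.0, "collision": -100.0}))  # -> "brake"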

2.Model-Free Reinforcement Learning


Model-free RL, on the other hand, doesn't require an explicit model of the
environment. Instead, it learns directly from experience, updating its policy
based on the rewards it receives.

Example: A game-playing AI might learn to play chess by playing numerous
games against itself. It doesn't need an explicit model of the game's dynamics;
it simply learns which moves tend to lead to wins and losses.
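
As a toy sketch of the model-free idea (a hypothetical two-armed bandit rather than chess), the agent below never models how rewards are generated; it only averages the rewards it actually receives and acts greedily on those estimates:

import random

# Two actions with unknown reward probabilities. The agent does not model them;
# it only tracks the average reward each action has actually produced (model-free).
true_win_prob = {"a": 0.3, "b": 0.7}       # hidden from the agent
value = {"a": 0.0, "b": 0.0}
counts = {"a": 0, "b": 0}

for _ in range(1000):
    # epsilon-greedy: mostly pick the action with the best current estimate, sometimes explore
    action = random.choice(["a", "b"]) if random.random() < 0.1 else max(value, key=value.get)
    reward = 1.0 if random.random() < true_win_prob[action] else 0.0
    counts[action] += 1
    value[action] += (reward - value[action]) / counts[action]    # incremental average

print(value)                                # value["b"] should approach 0.7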

Q3. Q-Learning
1.Introduction
Q-learning is a popular model-free reinforcement learning algorithm that aims
to learn the optimal action-value function, often referred to as the Q-function.
This function estimates the expected future reward for taking a particular
action in a given state. By maximizing the Q-function, the agent can learn to
make decisions that lead to the highest cumulative reward.

2.Q-Function and Update Rule


The Q-function, denoted Q(s, a), represents the expected future discounted
reward obtained by taking action a in state s and then following the optimal
policy thereafter. The discount factor, denoted by γ, determines the importance
of future rewards relative to immediate ones.

• s: current state
• a: current action
• r: reward received
• s': next state
• a': next action
• α: learning rate
• γ: discount factor

The Q-Learning Update Rule


The Q-learning algorithm iteratively updates the Q-function based on the
Bellman equation:

Q(s, a) ← Q(s, a) + α [ R(s, a) + γ · max_a' Q(s', a') − Q(s, a) ]

where:
• s is the current state
• a is the current action
• R(s, a) is the immediate reward received
• s' is the next state
• a' is the best action in the next state (according to the Q-function)
• α is the learning rate, which determines how much the Q-function is updated based on new experiences
• γ is the discount factor, which weights future rewards relative to immediate ones
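
As a single worked example of this update, suppose Q(s, a) = 2.0, the immediate reward R(s, a) = 1.0, the best next-state value max_a' Q(s', a') = 3.0, and the (assumed) hyperparameters are α = 0.1 and γ = 0.9. Then:

Q(s, a) ← 2.0 + 0.1 × [1.0 + 0.9 × 3.0 − 2.0] = 2.0 + 0.1 × 1.7 = 2.17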

3.Overview of Q-Learning Algorithm


1. Initialize the Q-function: Set all Q-values to zero.
2. Choose an action: Select an action based on the current state and the Q-
function. This can be done using an ε-greedy policy, which chooses the
best action with probability 1 - ε and a random action with probability ε.
3. Observe the reward and next state: Take the chosen action, observe the
resulting reward and next state.
4. Update the Q-function: Use the Q-learning update rule to update the Q-
value for the current state and action.
5. Repeat: Repeat steps 2-4 until convergence or a desired number of
episodes.
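
A minimal, self-contained sketch of these five steps in Python, using the same toy corridor environment as earlier (an illustrative example, not a reference implementation):

import random

NUM_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                       # move left / move right along a 5-cell corridor
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount factor, exploration rate

# Step 1: initialize the Q-function, setting every Q-value to zero
Q = {(s, a): 0.0 for s in range(NUM_STATES) for a in ACTIONS}

def env_step(state, action):
    """Toy environment: move along the corridor; reward 1 on reaching the goal cell."""
    next_state = max(0, min(NUM_STATES - 1, state + action))
    return next_state, (1.0 if next_state == GOAL else 0.0)

def choose_action(state):
    """Step 2: epsilon-greedy action selection (ties broken randomly)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

for episode in range(500):               # Step 5: repeat over many episodes
    state = 0
    while state != GOAL:
        action = choose_action(state)
        # Step 3: take the action, observe the reward and next state
        next_state, reward = env_step(state, action)
        # Step 4: Q-learning update rule
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

print(Q)                                  # Q-values now favour moving right toward the goal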

4.Advantages of Q-Learning
• Simple and effective for small action spaces.
• Doesn't require a model of the environment.
• Q-learning has been applied to a wide range of problems, including game playing, robotics, finance, and healthcare.

In conclusion, Q-learning is a powerful and versatile algorithm for model-free
reinforcement learning. Its ability to learn optimal policies through trial and
error makes it suitable for a wide range of applications.

THANK YOU, MA'AM.
