
Reinforcement Learning: Teaching Machines to Learn from Experience

Understanding the Decision-Making Power of AI

Presented by:

RA2311027010051 - DIYA SHARMA

RA2311027010063 - SHRUTI RAJ

RA2311027010065 - HARSH KUMAR


What is Reinforcement Learning?

Definition
RL is a machine learning approach in which an agent learns by interacting with an environment and receiving rewards or penalties for its actions.

Key Idea
Learn by trial and error, much as humans and animals do.

Real-Life Examples of RL
AlphaGo defeating human champions
Robots learning to walk or grasp objects
AI mastering video games like Atari, Dota, Minecraft
Self-driving cars navigating roads
Components of Reinforcement Learning

Agent: The learner or decision maker
Environment: The world the agent interacts with
State (s): Current situation of the agent
Action (a): Possible moves the agent can make
Reward (r): Feedback from the environment
Policy (π): Strategy the agent follows
Value (V/Q): Long-term value of a state or action
The Reinforcement Learning Loop
Observe State
Agent perceives current environment state.

Choose Action
Agent selects and performs an action.

Receive Reward
Agent gets feedback from environment.

Update Knowledge
Agent adjusts policy or value estimates.

Repeat
Goal: maximize cumulative future rewards.
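The five steps above can be sketched as a short loop. `RandomAgent` and the `reset()`/`step()` environment interface below are illustrative assumptions (loosely Gym-style), not something defined in these slides; any environment exposing those two methods would fit.

```python
import random

class RandomAgent:
    """Placeholder agent: acts at random and ignores feedback."""
    def __init__(self, actions):
        self.actions = actions

    def choose_action(self, state):
        # 2. Choose Action (a real agent would consult its policy here)
        return random.choice(self.actions)

    def update(self, state, action, reward, next_state):
        # 4. Update Knowledge: a real agent would adjust its policy
        #    or value estimates here
        pass

def run_episode(env, agent):
    state = env.reset()                  # 1. Observe State
    total_reward, done = 0.0, False
    while not done:                      # 5. Repeat until the episode ends
        action = agent.choose_action(state)
        next_state, reward, done = env.step(action)   # 3. Receive Reward
        agent.update(state, action, reward, next_state)
        total_reward += reward
        state = next_state
    return total_reward                  # goal: maximize cumulative reward
```

The agent here never learns; it only makes the loop structure concrete. The Q-Learning algorithm later in the deck is one way to fill in `choose_action` and `update`.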
Mathematical Foundation: Markov Decision Processes
A Markov Decision Process (MDP) is a mathematical framework used to describe decision-making in situations where outcomes are
partly random and partly under the control of a decision-maker (agent).

It provides a formal model for environments in reinforcement learning.

🧱 MDP Components (5-Tuple)


An MDP is defined as a 5-tuple:

MDP = ⟨S, A, P, R, γ⟩

States (S): Possible situations the agent can be in.
Actions (A): Choices available to the agent.
Transition Probabilities (P): Likelihood of moving between states.
Reward Function (R): Feedback received after actions.
Discount Factor (γ): Importance of future rewards.
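As a concrete illustration, the 5-tuple can be written out for a toy problem; every state name, action name, and number below is invented for this example, not taken from the slides.

```python
# A toy instantiation of the 5-tuple <S, A, P, R, gamma>.

S = ["cool", "hot"]                      # States
A = ["run", "rest"]                      # Actions
P = {                                    # Transition probabilities P(s' | s, a)
    ("cool", "run"):  {"cool": 0.5, "hot": 0.5},
    ("cool", "rest"): {"cool": 1.0},
    ("hot",  "run"):  {"hot": 1.0},
    ("hot",  "rest"): {"cool": 0.8, "hot": 0.2},
}
R = {("cool", "run"): 2.0, ("cool", "rest"): 1.0,    # Reward R(s, a)
     ("hot",  "run"): -1.0, ("hot", "rest"): 0.0}
gamma = 0.9                              # Discount factor

# Each row of P must be a probability distribution over next states.
assert all(abs(sum(dist.values()) - 1.0) < 1e-9 for dist in P.values())

def discounted_return(rewards, gamma):
    """G = r0 + gamma*r1 + gamma^2*r2 + ... (how gamma weights the future)."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

print(discounted_return([2.0, 2.0, 2.0], gamma))  # 2 + 1.8 + 1.62 = approx 5.42
```

The `discounted_return` helper shows the role of γ: rewards k steps in the future are scaled by γ^k, so γ close to 0 makes the agent short-sighted and γ close to 1 makes it far-sighted.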
Types of Reinforcement Learning

Model-Free vs Model-Based: Different approaches to learning environment dynamics.
Value-Based: Examples: Q-Learning
Policy-Based: Example: the REINFORCE algorithm
Actor-Critic: Combines value and policy methods. Examples: PPO, A3C
Q-Learning Algorithm
Q-Learning is a model-free reinforcement learning algorithm used to learn the optimal action-selection policy for an agent interacting with an environment. The goal of Q-
Learning is to learn a policy that tells an agent the best action to take in each state in order to maximize the total cumulative reward over time.

Key Concepts:
State (s): The condition or configuration of the environment at any given time.
Action (a): The decision or move made by the agent.
Reward (r): The feedback received after taking an action in a state.
Q-value (Q(s, a)): The expected future reward for taking action a in state s. It represents the quality of the action taken in a given state.

Update Rule:

Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') − Q(s,a)]

Parameters:
α: learning rate
γ: discount factor
r: reward
s', a': next state and action
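The update rule can be sketched as tabular Q-Learning on a small invented environment: states 0 to 4 on a line, where action +1 moves right, −1 moves left, and reaching state 4 ends the episode with reward 1. The environment and hyperparameter values are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

random.seed(0)  # fixed seed so the illustration is reproducible

def env_step(state, action):
    """Toy chain environment: move along states 0..4; state 4 is terminal."""
    next_state = min(4, max(0, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward, next_state == 4

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)               # Q(s, a), zero-initialized
    actions = [+1, -1]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: explore with probability epsilon, else exploit
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env_step(state, action)
            # Update rule: Q(s,a) <- Q(s,a) + alpha [r + gamma max_a' Q(s',a') - Q(s,a)]
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q

Q = q_learning()
# The greedy policy learned from Q prefers moving right, toward the reward.
assert all(Q[(s, +1)] > Q[(s, -1)] for s in range(4))
```

Note that Q-Learning is model-free: the agent never sees the transition function, only the (s, a, r, s') samples produced by acting, and the learned Q-values propagate the terminal reward backward along the chain, discounted by γ at each step.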
Summary and Next Steps
RL teaches agents to learn from experience.

Key components include agent, environment, rewards, and policies.

Mathematical foundations guide algorithm design.

Explore advanced RL methods and applications next.
