Introduction to Reinforcement Learning
Revolution History
Branches of ML
Branches of ML - Supervised Learning
• In Supervised Learning, models learn from labeled
training data, where input-output pairs are provided.
• The algorithm generalizes from this labeled data to
make predictions or classifications on new, unseen
data.
• Commonly used in tasks like image recognition,
natural language processing, and regression analysis.
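As a minimal sketch of the idea above, the following uses a 1-nearest-neighbour rule, chosen only for brevity. The feature values, the labels, and the names `training_data` and `predict` are made up for illustration.

```python
# Labeled training data: (input features, output label) pairs.
# The numbers and labels are invented purely for illustration.
training_data = [
    ((1.0, 1.2), "cat"),
    ((0.8, 1.0), "cat"),
    ((3.0, 3.2), "dog"),
    ((3.3, 2.9), "dog"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(features):
    """Return the label of the closest labeled training example (1-NN)."""
    _, label = min(
        training_data,
        key=lambda pair: squared_distance(pair[0], features),
    )
    return label


# Prediction on a new, unseen input.
prediction = predict((2.9, 3.0))
print(prediction)
```

The model never sees the point (2.9, 3.0) during training; it generalizes from the labeled pairs it was given.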
Branches of ML - Unsupervised Learning
• Unsupervised Learning (UL) deals with unlabeled data, aiming to discover
patterns, structures, or relationships within the data
itself.
• Clustering and dimensionality reduction are common
tasks in UL.
• Applications include customer segmentation, anomaly
detection, and feature extraction.
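Clustering can be sketched with a tiny one-dimensional k-means. This is an illustrative toy, not a production implementation; the function name `k_means_1d`, the starting centers, and the spend values are assumptions made up for the example.

```python
def k_means_1d(points, centers, iterations=10):
    """Tiny 1-D k-means: alternate assignment and center-update steps."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled data, e.g. monthly spend per customer (made-up values).
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = k_means_1d(spend, centers=[0.0, 50.0])
print(centers, clusters)
```

No labels are supplied; the two customer segments emerge from the structure of the data itself.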
Branches of ML - Reinforcement Learning
• In RL, an agent learns to make decisions by
interacting with an environment.
• The agent receives feedback in the form of rewards or
penalties, which guides it toward optimal
decision-making strategies.
• RL is well-suited for scenarios where actions influence
future states, making it applicable in gaming, robotics,
and autonomous systems.
Can Machines Think?
The Imitation Game – movie
Computing Machinery and Intelligence – paper (Alan Turing, 1950)
What is intelligence according to you?
– The ability to make decisions that achieve a goal
What is RL?
Example
Learning by interacting with the environment
RL Characteristics
• What makes reinforcement learning different
from other machine learning paradigms?
– There is no supervisor, only a reward signal
– Feedback is delayed, not instantaneous
– Time really matters: the data is sequential, not i.i.d.
– Agent’s actions affect the subsequent data it
receives
Agent Environment Loop
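The loop can be sketched in a few lines: at each step the agent picks an action, and the environment returns the next state and a reward. Everything here is a hypothetical toy made up for illustration: the `LineWorld` environment, its `reset`/`step` interface, and the random agent are assumptions, not part of any library.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..4, with the goal at 4."""

    def __init__(self):
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right).

        Returns (next_state, reward, done).
        """
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.position, reward, done


def random_agent(state, rng):
    """An agent that ignores the state and moves left or right at random."""
    return rng.choice([-1, 1])


def run_episode(env, agent, rng, max_steps=1000):
    """Run one agent-environment loop; return (total_reward, steps_taken)."""
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = agent(state, rng)              # agent acts on the current state
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            return total_reward, t + 1
    return total_reward, max_steps


rng = random.Random(0)
total_reward, steps = run_episode(LineWorld(), random_agent, rng)
print(total_reward, steps)
```

Note that the reward arrives only at the end of the episode, and each action changes the state the agent sees next.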
Reward Hypothesis
• Any goal can be formalized as the outcome of
maximizing a cumulative reward
• Equivalently, we can minimize a cumulative penalty
(a penalty is a negative reward)
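A small sketch of the two views: treating each penalty as a negative reward, the option with the largest cumulative reward is the one with the smallest cumulative penalty. The option names and penalty values are made up for illustration, and no discounting is applied (an assumption made for simplicity).

```python
def cumulative_reward(rewards):
    """Undiscounted cumulative reward: the plain sum of per-step rewards."""
    return sum(rewards)


# Per-step penalties incurred by two hypothetical behaviours (made-up numbers).
penalties = {
    "behaviour_a": [3, 1, 2],
    "behaviour_b": [1, 1, 1],
}

# A penalty is a negative reward.
rewards = {name: [-p for p in ps] for name, ps in penalties.items()}

best_by_reward = max(rewards, key=lambda name: cumulative_reward(rewards[name]))
best_by_penalty = min(penalties, key=lambda name: sum(penalties[name]))
print(best_by_reward, best_by_penalty)
```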
RL Problems
• Flying a helicopter – reward: inverse distance (e.g. to a target position)
• Walking robot – reward: distance covered, speed
• Board games – reward: maximize the score, or +1 (win) / -1 (lose)
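The reward signals listed above might be written as simple functions. These are illustrative sketches only: the function names, the small `eps` term that avoids division by zero, the use of forward speed for the walking robot, and the 0 reward for a draw are all assumptions, not taken from the slides.

```python
import math


def helicopter_reward(position, target, eps=1e-6):
    """Inverse distance to a target position: closer means a larger reward.

    `eps` is an assumed small constant that avoids division by zero.
    """
    return 1.0 / (math.dist(position, target) + eps)


def walking_reward(previous_x, current_x, dt):
    """Forward speed over one time step: distance covered divided by time."""
    return (current_x - previous_x) / dt


def board_game_reward(outcome):
    """+1 for a win, -1 for a loss; 0 for anything else is an assumption."""
    return {"win": 1.0, "lose": -1.0}.get(outcome, 0.0)


print(helicopter_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
print(walking_reward(0.0, 0.5, 0.1))
print(board_game_reward("win"))
```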
Reasons to learn
• Find a solution
– A program that plays chess very well
– A manufacturing robot with a specific purpose
• Adapt online to handle unforeseen
circumstances
– A chess program can learn to adapt to you
– Candy Crush
– A robot that learns to navigate unknown terrains
What is RL?
The science and framework of learning to make decisions from interaction
Thank You