ML Ch-3 Unsupervised Learning
Lecture 3
Agenda
Introduction
Understand the principles of unsupervised learning models
Clustering approaches:
K-Means,
K nearest neighbors,
Hierarchical clustering
Correctly apply and evaluate clustering models
Reinforcement learning
Markov decision process
Unsupervised Learning
In unsupervised learning, the model itself finds the hidden
patterns and insights in the given data, without labeled outputs
to guide it.
The goal of unsupervised learning is to find the underlying
structure of a dataset, group the data according to
similarities, and represent the dataset in a compressed
format.
Why unsupervised learning?
• Unsupervised learning is helpful for finding useful insights in
the data.
• Unsupervised learning resembles how a human learns to
think from their own experience, which makes it closer to real
AI.
• Unsupervised learning works on unlabeled and uncategorized
data, which makes it all the more important.
• In the real world, we do not always have input data with the
corresponding output; to handle such cases, we need
unsupervised learning.
How does it work?
Clustering
It is the process of segregating a large number of items into
smaller groups that share similar characteristics.
For example,
if we have a list of all persons in Asia, we can group them
based on their nationalities:
Group 1: People belonging to India
Group 2: People belonging to China
Group 3: People belonging to Nepal, etc.
Association
It is the process of measuring the degree of association between any two
items.
For example,
if we go to a grocery shop, there is a high probability that we will buy jam if
we have already bought bread there.
This is because bread and jam are two items that are closely associated.
But there is only a low probability that we will buy a biscuit if we have already
bought a book.
This is because biscuits and books are not closely associated.
These kinds of associations can be identified using an extensive data
mining process.
This process is called association rule mining.
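The strength of a rule such as "bread → jam" is commonly measured with support and confidence. The sketch below is only an illustration: the baskets are invented, and the helper names `support` and `confidence` are assumptions rather than anything defined in the lecture.

```python
# Hypothetical shopping baskets, invented for illustration only.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "jam", "butter"},
    {"bread", "butter"},
    {"book", "pen"},
    {"book"},
    {"book", "notebook"},
    {"book", "biscuit"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Bread and jam are closely associated in this toy data; book and biscuit are not.
bread_to_jam = confidence({"bread"}, {"jam"}, transactions)
book_to_biscuit = confidence({"book"}, {"biscuit"}, transactions)
print(bread_to_jam, book_to_biscuit)
```

In this toy data the confidence of bread → jam is higher than that of book → biscuit, which matches the intuition described above.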
Clustering vs association
Clustering: finds commonalities among data points and groups
similar points together.
Association: used for finding relationships between
variables in a large database.
It determines the sets of items that occur together in the dataset.
For example, people who buy item X (say, bread) also tend
to purchase item Y (butter or jam).
Unsupervised learning
Unsupervised learning is often used for more complex tasks than
supervised learning, because there are no labeled outputs to
guide the model.
Applications of unsupervised learning
Clustering automatically splits the dataset into groups based on
their similarities.
Anomaly detection can discover unusual data points in your
dataset.
It is useful for finding fraudulent transactions.
Association mining identifies sets of items that often occur
together in your dataset.
Latent variable models are widely used for data preprocessing,
such as reducing the number of features in a dataset or decomposing
the dataset into multiple components.
Application areas
• Market Segmentation
• Statistical data analysis
• Social network analysis
• Image segmentation
• Anomaly detection, etc.
Types of Clustering methods
Partitioning Clustering
Density-Based Clustering
Distribution Model-Based Clustering
Hierarchical Clustering
Types of Clustering methods…
Partitioning Clustering: divides the data into
non-hierarchical groups.
E.g., K-Means clustering, where K is
the number of groups.
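A partitioning method such as K-Means can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the sample points are invented, the function name `kmeans` is an assumption, and the first K points are used as initial centroids purely to keep the run deterministic (real implementations usually use random or smarter initialization).

```python
import math

def kmeans(points, k, iterations=100):
    """Plain K-Means (Lloyd's algorithm) on a list of coordinate tuples."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Note that K must be chosen before the algorithm runs, which is the defining trait of this family of methods.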
Types of Clustering methods…
Density-Based Clustering: connects highly dense
areas into clusters.
The algorithm identifies the regions of high density in the
dataset and connects them into clusters.
The dense areas in the data space are separated from each other by
sparser areas.
These algorithms can have difficulty clustering the data
points if the dataset has varying densities or high dimensionality.
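One well-known density-based algorithm is DBSCAN, which the lecture does not name; it is used here only as an example of the idea. The sketch below is a simplified version under stated assumptions: the parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a dense point), the label -1 for noise, and the sample points are all illustrative choices.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: label -1 marks noise; 0, 1, ... mark clusters."""
    def neighbors(i):
        # All points (including i itself) within distance eps of point i.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # not dense enough; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a dense point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is dense too: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Because a single `eps` is applied everywhere, a dataset whose clusters have very different densities is hard to handle with one setting, which is the difficulty mentioned above.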
Types of Clustering methods…
Distribution Model-Based Clustering: the data is
divided based on the probability that a data point belongs to
a particular distribution.
The grouping is done by assuming some distribution,
most commonly the Gaussian distribution.
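To make the "probability of belonging to a distribution" concrete, the sketch below computes membership probabilities for a one-dimensional mixture of two Gaussians. The mixture parameters are assumed for illustration (in practice they would be estimated from data, for example with the EM algorithm), and the function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def membership_probabilities(x, components):
    """Posterior probability that x came from each (weight, mean, std) component."""
    weighted = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]

# Hypothetical two-component mixture; the parameters are assumed, not fitted.
components = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]

probs = membership_probabilities(1.0, components)
cluster = probs.index(max(probs))  # assign the point to the most probable component
print(probs, cluster)
```

Unlike a hard partition, every point receives a probability for each cluster; a point halfway between the two means would get roughly equal probabilities for both.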
Types of Clustering methods…
Hierarchical Clustering: can be used as an alternative to
partitioning clustering, as there is no requirement to
pre-specify the number of clusters to be created.
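A bottom-up (agglomerative) variant of hierarchical clustering can be sketched as follows. This is a minimal illustration that assumes single linkage (distance between the closest members of two clusters) and a hypothetical `distance_threshold` stopping rule; the lecture does not prescribe either choice, and the sample points are invented.

```python
import math

def agglomerative(points, distance_threshold):
    """Single-linkage agglomerative clustering; merging stops at the threshold."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > distance_threshold:
            break  # the remaining clusters are too far apart to merge
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]

# Hypothetical data: two tight pairs and one distant point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters = agglomerative(points, distance_threshold=2.0)
print(clusters)
```

The number of clusters is never passed in; it falls out of where the merging stops. Recording every merge instead of stopping early would give the full hierarchy, which can then be cut at any level.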
Clustering Algorithms
K-Means algorithm: It partitions the dataset by
dividing the samples into clusters of equal
variance.
Mean-shift algorithm: It tries to find the dense areas in
the smooth density of data points.
Affinity Propagation: It differs from other
clustering algorithms in that it does not require
the number of clusters to be specified.
In this algorithm, messages are exchanged between pairs
of data points until convergence.
Applications of Clustering
In Identification of Cancer Cells: It divides cancerous and
non-cancerous data into different groups.
In Search Engines: Search results are returned based on the
objects closest to the search query.
In Customer Segmentation: It is used in market research to
segment customers based on their choices and preferences.
In Biology: It is used to classify different species of plants and
animals using image recognition techniques.
In Land Use: It is used to identify areas of similar land use
in a GIS database.
Reinforcement learning
Reinforcement learning is a type of machine learning method where an
intelligent agent (a computer program) interacts with the
environment and learns to act within it.
E.g., how a robotic dog learns the movement of its limbs.
The agent keeps doing three things (taking an
action, changing state or remaining in the same state, and
getting feedback), and by doing so it learns
and explores the environment.
Reinforcement learning…
Terminologies:
Agent: An entity that can perceive/explore the environment and act
upon it.
Environment: The situation in which an agent is present or by which it is surrounded.
Action: The moves taken by an agent within the environment.
State: The situation returned by the environment after each action taken
by the agent.
Reward: The feedback returned to the agent from the environment to
evaluate the action of the agent.
Policy: The strategy applied by the agent to choose the next action based on
the current state.
Value: The expected long-term return with the discount factor, as
opposed to the short-term reward.
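The difference between a short-term reward and a discounted long-term return can be shown with a short calculation. The sketch below computes the discounted return for one sequence of rewards; the function name `discounted_return`, the symbol `gamma` for the discount factor, and the sample rewards are assumptions for illustration. The value of a state is the expectation of this quantity over the sequences the agent may experience.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical episode: no reward for two steps, then a reward of 1.
rewards = [0.0, 0.0, 1.0]
g = discounted_return(rewards, gamma=0.5)
print(g)
```

With a discount factor below 1, a reward that arrives later counts for less than the same reward received immediately.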
Reinforcement Learning…
• Reinforcement learning is a feedback-based machine learning technique in which an
agent learns to behave in an environment by performing
actions and seeing the results of those actions.
• For each good action, the agent gets positive feedback, and for
each bad action, the agent gets negative feedback or a penalty.
• The agent learns automatically from feedback without
any labeled data, unlike in supervised learning.
• Since there is no labeled data, the agent is bound to
learn from its experience only.
Reinforcement Learning…
Reinforcement learning solves a specific type of problem where decision
making is sequential and the goal is long-term, such
as game playing, robotics, etc.
The agent interacts with the environment and explores
it by itself.
The primary goal of an agent in reinforcement learning
is to improve its performance by collecting the
maximum positive reward.
Reinforcement Learning…
Key Features of Reinforcement Learning
In RL, the agent is not instructed about the environment
or about which actions need to be taken.
It is based on a trial-and-error process.
The agent takes the next action and changes state
according to the feedback from the previous action.
The agent may get a delayed reward.
The environment is stochastic, and the agent needs to
explore it to obtain the maximum positive reward.
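These features can be seen in a small trial-and-error loop. The sketch below uses tabular Q-learning, one specific RL algorithm that the lecture does not name, on a hypothetical five-state corridor where only reaching the last state gives a reward. For simplicity the toy environment is deterministic rather than stochastic, and all names and hyperparameters (`alpha`, `gamma`, `epsilon`) are illustrative assumptions.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right

def step(state, action):
    """Toy deterministic environment: reward 1 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action] estimates
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap on episode length
            # Trial and error: explore with probability epsilon (and on ties),
            # otherwise exploit the action that currently looks best.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update: feedback from this action adjusts its estimate.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = train()
# Greedy policy for the non-goal states after training.
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that moving right is correct. The reward arrives only at the end of the corridor (a delayed reward), and the value estimates propagate backwards from the goal over repeated episodes.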
Quiz
1. What is the difference between regression and
classification algorithms?
2. Give real-world examples of problems that can be solved by
regression and by classification.