Logistic Regression in Machine Learning
o Logistic regression is one of the most popular
Machine Learning algorithms, which comes under
the Supervised Learning technique. It is used for
predicting the categorical dependent variable using
a given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable, so the outcome must be a categorical or discrete value such as Yes or No, 0 or 1, True or False, etc. However, instead of giving exact values of 0 and 1, it gives probabilistic values that lie between 0 and 1.
o Logistic Regression is very similar to Linear Regression except in how it is used: Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
o In Logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which predicts values bounded between the two extremes 0 and 1.
o The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight, etc.
o Logistic Regression is a significant machine learning
algorithm because it has the ability to provide
probabilities and classify new data using continuous
and discrete datasets.
o Logistic Regression can be used to classify observations using different types of data and can easily determine the most effective variables for the classification. The image below shows the logistic function:
Note: Logistic regression uses the concept of predictive modeling as regression, which is why it is called logistic regression; however, because it is used to classify samples, it falls under the classification algorithms.
Logistic Function (Sigmoid Function):
o The sigmoid function is a mathematical function
used to map the predicted values to probabilities.
o It maps any real value into another value within a
range of 0 and 1.
o The output of logistic regression must be between 0 and 1 and cannot go beyond this limit, so it forms a curve shaped like an "S". This S-shaped curve is called the sigmoid function or the logistic function.
o In logistic regression, we use the concept of a threshold value, which decides between the classes 0 and 1: probability values above the threshold tend to class 1, and values below the threshold tend to class 0.
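As a minimal sketch of the sigmoid and a threshold, the snippet below uses NumPy; the 0.5 threshold and the sample inputs are illustrative assumptions, not part of any particular model.

import numpy as np

def sigmoid(z):
    # Map any real-valued input to a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probs = sigmoid(z)                    # approx. [0.018, 0.269, 0.5, 0.731, 0.982]
labels = (probs >= 0.5).astype(int)   # illustrative 0.5 threshold to get 0/1 classes
print(probs, labels)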
Assumptions for Logistic Regression:
o The dependent variable must be categorical in
nature.
o The independent variables should not have multicollinearity.
Logistic Regression Equation:
The Logistic regression equation can be obtained from
the Linear Regression equation. The mathematical steps
to get Logistic Regression equations are given below:
o We know the equation of the straight line can be written as:
y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
o In Logistic Regression y can be between 0 and 1 only, so let's divide the above equation by (1-y):
y / (1 - y); which is 0 for y = 0 and infinity for y = 1
o But we need a range between -infinity and +infinity, so taking the logarithm of the equation, it becomes:
log[ y / (1 - y) ] = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
The above equation is the final equation for Logistic
Regression.
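As a quick numeric check that the sigmoid simply inverts this log-odds transformation, the short snippet below uses made-up coefficient values purely for illustration.

import numpy as np

b0, b1 = -1.0, 2.0               # illustrative (made-up) coefficients
x1 = 0.8
z = b0 + b1 * x1                 # the linear part: log(y / (1 - y))
y = 1.0 / (1.0 + np.exp(-z))     # applying the sigmoid recovers the probability y
print(z, np.log(y / (1 - y)))    # both print 0.6, confirming the identity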
Type of Logistic Regression:
On the basis of the categories, Logistic Regression can
be classified into three types:
o Binomial: In binomial Logistic regression, there can
be only two possible types of the dependent
variables, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial Logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
o Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".
EM Algorithm in Machine Learning
The EM algorithm is a latent variable method for finding local maximum likelihood parameters of a statistical model; it was proposed by Arthur Dempster, Nan Laird, and Donald Rubin in 1977. The EM (Expectation-Maximization) algorithm is one of the most commonly used techniques in machine learning for obtaining maximum likelihood estimates of variables that are sometimes observable and sometimes not, and it is equally applicable to data that is entirely unobserved, often called latent. It has various real-world applications in statistics, including obtaining the mode of the posterior marginal distribution of parameters in machine learning and data mining applications.
In most real-life applications of machine learning, several relevant learning features are available, but only a few of them are observable and the rest are unobservable. If a variable is observable, its value can be predicted directly from the training instances. For variables that are latent, i.e., not directly observable, the Expectation-Maximization (EM) algorithm plays a vital role in predicting their values, provided that the general form of the probability distribution governing those latent variables is known to us. In this topic, we will discuss a basic introduction to the EM algorithm, a flow chart of the EM algorithm, its applications, and the advantages and disadvantages of the EM algorithm.
What is an EM algorithm?
The Expectation-Maximization (EM) algorithm is defined as the combination of various unsupervised machine learning algorithms, used to determine local maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates for unobservable variables in statistical models. In other words, it is a technique for finding maximum likelihood estimates when latent variables are present. It is also referred to as a latent variable model.
A latent variable model consists of both observable and unobservable variables, where the observable variables can be predicted while the unobservable ones are inferred from the observed variables. These unobservable variables are known as latent variables.
Key Points:
o It is known as a latent variable model and is used to determine MLE and MAP estimates for latent variables.
o It is used to predict values of parameters in
instances where data is missing or unobservable for
learning, and this is done until convergence of the
values occurs.
EM Algorithm
The EM algorithm is the combination of various unsupervised ML algorithms, such as the k-means clustering algorithm. Being an iterative approach, it consists of two modes. In the first mode, we estimate the missing or latent variables; hence it is referred to as the expectation/estimation step (E-step). The other mode is used to optimize the parameters of the model so that it can explain the data more clearly; this second mode is known as the maximization step (M-step).
o Expectation step (E - step): It involves the
estimation (guess) of all missing values in the
dataset so that after completing this step, there
should not be any missing value.
o Maximization step (M - step): This step involves
the use of estimated data in the E-step and
updating the parameters.
o Repeat E-step and M-step until the convergence of
the values occurs.
The primary goal of the EM algorithm is to use the
available observed data of the dataset to estimate the
missing data of the latent variables and then use that
data to update the values of the parameters in the M-
step.
What is Convergence in the EM algorithm?
Convergence is the intuitive notion from probability: if two successive estimates of a random variable differ only very slightly in probability, they are said to have converged. In other words, whenever the values of the given variables stop changing and match each other, convergence has occurred.
Steps in EM Algorithm
The EM algorithm is completed mainly in 4 steps: the Initialization Step, Expectation Step, Maximization Step, and Convergence Step. These steps are explained as follows:
o 1st Step: The very first step is to initialize the
parameter values. Further, the system is provided
with incomplete observed data with the assumption
that data is obtained from a specific model.
o 2nd Step: This step is known as Expectation or E-
Step, which is used to estimate or guess the values
of the missing or incomplete data using the
observed data. Further, E-step primarily updates the
variables.
o 3rd Step: This step is known as Maximization or M-
step, where we use complete data obtained from
the 2nd step to update the parameter values.
Further, M-step primarily updates the hypothesis.
o 4th step: The last step is to check whether the values of the latent variables are converging or not. If yes, stop the process; otherwise, repeat the process from step 2 until convergence occurs (see the sketch below).
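As an illustration of these four steps, below is a compact sketch of EM for a two-component one-dimensional Gaussian mixture, where the component memberships are the latent variables. The data, initial guesses, and tolerance are purely illustrative assumptions, and the sketch assumes NumPy and SciPy are available.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Observed data: a mixture of two 1-D Gaussians (which component generated each point is latent)
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

# Step 1: initialise the parameters (means, standard deviations, mixing weights)
mu, sigma, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

for _ in range(100):
    # Step 2 (E-step): estimate each component's responsibility for each point
    dens = np.vstack([p * norm.pdf(data, m, s) for p, m, s in zip(pi, mu, sigma)])
    resp = dens / dens.sum(axis=0)

    # Step 3 (M-step): update the parameters using the "completed" data
    nk = resp.sum(axis=1)
    new_mu = (resp * data).sum(axis=1) / nk
    new_sigma = np.sqrt((resp * (data - new_mu[:, None]) ** 2).sum(axis=1) / nk)
    pi = nk / len(data)

    # Step 4: stop when the parameter values converge; otherwise repeat from the E-step
    if np.allclose(new_mu, mu, atol=1e-6) and np.allclose(new_sigma, sigma, atol=1e-6):
        mu, sigma = new_mu, new_sigma
        break
    mu, sigma = new_mu, new_sigma

print(mu, sigma, pi)  # the estimated means should approach 0 and 5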
Applications of EM algorithm
The primary aim of the EM algorithm is to estimate the
missing data in the latent variables through observed
data in datasets. The EM algorithm or latent variable
model has a broad range of real-life applications in
machine learning. These are as follows:
o The EM algorithm is applicable in data clustering in
machine learning.
o It is often used in computer vision and NLP (Natural
language processing).
o It is used to estimate the values of parameters in mixture models such as the Gaussian Mixture Model, and in quantitative genetics.
o It is also used in psychometrics for estimating item
parameters and latent abilities of item response
theory models.
o It is also applicable in the medical and healthcare
industry, such as in image reconstruction and
structural engineering.
o It is used to estimate the parameters of a Gaussian density function.
Advantages of EM algorithm
o The first two basic steps of the EM algorithm, the E-step and the M-step, are very easy to implement for many machine learning problems.
o The likelihood is guaranteed to increase (or at least not decrease) after each iteration.
o The M-step often has a closed-form solution.
Disadvantages of EM algorithm
o The convergence of the EM algorithm is very slow.
o It converges only to a local optimum.
o It takes both forward and backward probabilities into account, in contrast to numerical optimization, which considers only forward probabilities.
Conclusion
In real-world applications of machine learning, the
expectation-maximization (EM) algorithm plays a
significant role in determining the local maximum
likelihood estimates (MLE) or maximum a posteriori
estimates (MAP) for unobservable variables in statistical
models. It is often used for the latent variables, i.e., to
estimate the latent variables through observed data in
datasets. It is generally completed in two important
steps, i.e., the expectation step (E-step) and the
Maximization step (M-Step), where E-step is used to
estimate the missing data in datasets, and M-step is
used to update the parameters after the complete data
is generated in E-step. Further, the importance of the EM
algorithm can be seen in various applications such
as data clustering, natural language processing
(NLP), computer vision, image reconstruction,
structural engineering, etc.
Clustering in Machine Learning
Clustering or cluster analysis is a machine learning
technique, which groups the unlabelled dataset. It can
be defined as "A way of grouping the data points into
different clusters, consisting of similar data points.
The objects with the possible similarities remain in a
group that has less or no similarities with another
group."
It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color, behavior, etc., and divides the data points as per the presence and absence of those patterns.
It is an unsupervised learning method, hence no
supervision is provided to the algorithm, and it deals
with the unlabeled dataset.
After applying this clustering technique, each cluster or group is provided with a cluster-ID. An ML system can use this ID to simplify the processing of large and complex datasets.
The clustering technique is commonly used
for statistical data analysis.
Note: Clustering is somewhere similar to
the classification algorithm, but the difference is the type
of dataset that we are using. In classification, we work
with the labeled data set, whereas in clustering, we work
with the unlabelled dataset.
Example: Let's understand the clustering technique with the real-world example of a shopping mall. When we visit any shopping mall, we can observe that things with similar usage are grouped together: t-shirts are grouped in one section and trousers in another section; similarly, in the vegetable section, apples, bananas, mangoes, etc. are grouped in separate sections so that we can easily find things. The clustering technique works in the same way. Another example of clustering is grouping documents according to topic.
The clustering technique can be widely used in various
tasks. Some most common uses of this technique are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general usages, it is used by Amazon in its recommendation system to provide recommendations based on a user's past searches for products. Netflix also uses this technique to recommend movies and web series to its users based on their watch history.
The below diagram explains the working of the
clustering algorithm. We can see the different fruits are
divided into several groups with similar properties.
Types of Clustering Methods
The clustering methods are broadly divided into Hard clustering (each data point belongs to only one group) and Soft clustering (a data point can belong to more than one group). Various other clustering approaches also exist. Below are the main clustering methods used in Machine Learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering
It is a type of clustering that divides the data into non-
hierarchical groups. It is also known as the centroid-
based method. The most common example of
partitioning clustering is the K-Means Clustering
algorithm.
In this type, the dataset is divided into a set of K groups, where K defines the number of pre-defined groups. The cluster centers are created in such a way that each data point is closer to its own cluster centroid than to the centroid of any other cluster.
Density-Based Clustering
The density-based clustering method connects the
highly-dense areas into clusters, and the arbitrarily
shaped distributions are formed as long as the dense
region can be connected. This algorithm does it by
identifying different clusters in the dataset and connects
the areas of high densities into clusters. The dense areas
in data space are divided from each other by sparser
areas.
These algorithms can face difficulty in clustering the
data points if the dataset has varying densities and high
dimensions.
Distribution Model-Based Clustering
In the distribution model-based clustering method, the data is divided based on the probability of how a data point belongs to a particular distribution. The grouping is done by assuming some distribution for the data, most commonly the Gaussian distribution.
An example of this type is the Expectation-Maximization clustering algorithm, which uses Gaussian Mixture Models (GMM).
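As a minimal sketch of this idea, the snippet below fits a Gaussian Mixture Model with scikit-learn; the blob dataset and the choice of three components are illustrative assumptions.

from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Illustrative data drawn from three roughly Gaussian blobs
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

gmm = GaussianMixture(n_components=3, random_state=42)
labels = gmm.fit_predict(X)          # hard cluster assignments
probs = gmm.predict_proba(X[:5])     # soft membership probabilities per cluster
print(labels[:10], probs)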
Hierarchical Clustering
Hierarchical clustering can be used as an alternative to partitioning clustering, as there is no requirement to pre-specify the number of clusters to be created. In
this technique, the dataset is divided into clusters to
create a tree-like structure, which is also called
a dendrogram. The observations or any number of
clusters can be selected by cutting the tree at the correct
level. The most common example of this method is
the Agglomerative Hierarchical algorithm.
Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data
object may belong to more than one group or cluster.
Each data point has a set of membership coefficients, which indicate its degree of membership in each cluster. The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also known as the Fuzzy k-means algorithm.
Clustering Algorithms
Clustering algorithms can be divided according to the models explained above. Many clustering algorithms have been published, but only a few are commonly used. The choice of clustering algorithm depends on the kind of data we are using: some algorithms need the number of clusters to be specified for the given dataset, whereas others work by finding the minimum distance between observations of the dataset.
Here we are discussing mainly popular Clustering
algorithms that are widely used in machine learning:
1. K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms. It classifies the dataset by dividing the samples into different clusters of equal variance. The number of clusters must be specified in this algorithm. It is fast, requires fewer computations, and has linear complexity of O(n).
2. Mean-shift algorithm: The mean-shift algorithm tries to find the dense areas in the smooth density of data points. It is an example of a centroid-based model that works by updating the candidate centroids to be the center of the points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based
Spatial Clustering of Applications with Noise. It is
an example of a density-based model similar to the
mean-shift, but with some remarkable advantages.
In this algorithm, the areas of high density are
separated by the areas of low density. Because of
this, the clusters can be found in any arbitrary
shape.
4. Expectation-Maximization Clustering using
GMM: This algorithm can be used as an alternative
for the k-means algorithm or for those cases where
K-means can be failed. In GMM, it is assumed that
the data points are Gaussian distributed.
5. Agglomerative Hierarchical algorithm: The
Agglomerative hierarchical algorithm performs the
bottom-up hierarchical clustering. In this, each data
point is treated as a single cluster at the outset and
then successively merged. The cluster hierarchy can
be represented as a tree-structure.
6. Affinity Propagation: It is different from other clustering algorithms as it does not require the number of clusters to be specified. In this algorithm, pairs of data points exchange messages until convergence. It has O(N²T) time complexity, which is the main drawback of this algorithm. A short sketch showing how several of these algorithms are invoked is given after this list.
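As a rough illustration of how these algorithms are typically invoked, the sketch below uses scikit-learn; the dataset, the parameter values (such as eps for DBSCAN), and the choice of two clusters are illustrative assumptions, not recommendations.

from sklearn.datasets import make_moons
from sklearn.cluster import (KMeans, MeanShift, DBSCAN,
                             AgglomerativeClustering, AffinityPropagation)
from sklearn.mixture import GaussianMixture

# Illustrative two-cluster dataset
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

labels = {
    "k-means": KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X),
    "mean-shift": MeanShift().fit_predict(X),
    "DBSCAN": DBSCAN(eps=0.3).fit_predict(X),
    "EM / GMM": GaussianMixture(n_components=2, random_state=0).fit_predict(X),
    "agglomerative": AgglomerativeClustering(n_clusters=2).fit_predict(X),
    "affinity propagation": AffinityPropagation(random_state=0).fit_predict(X),
}
for name, l in labels.items():
    # DBSCAN marks noise points with the label -1, which is not a cluster
    print(name, "->", len(set(l)) - (1 if -1 in l else 0), "clusters found")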
Applications of Clustering
Below are some commonly known applications of
clustering technique in Machine Learning:
o In Identification of Cancer Cells: The clustering
algorithms are widely used for the identification of
cancerous cells. It divides the cancerous and non-
cancerous data sets into different groups.
o In Search Engines: Search engines also work on the
clustering technique. The search result appears
based on the closest object to the search query. It
does it by grouping similar data objects in one
group that is far from the other dissimilar objects.
The accurate result of a query depends on the
quality of the clustering algorithm used.
o Customer Segmentation: It is used in market
research to segment the customers based on their
choice and preferences.
o In Biology: It is used in the biology stream to
classify different species of plants and animals using
the image recognition technique.
o In Land Use: The clustering technique is used to identify areas of similar land use in a GIS database. This can be very useful for deciding the purpose for which a particular piece of land is most suitable.
Hierarchical Clustering in Machine Learning
Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group unlabeled datasets into clusters; it is also known as hierarchical cluster analysis or HCA.
In this algorithm, we develop the hierarchy of clusters in the
form of a tree, and this tree-shaped structure is known as
the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar, but the two differ in how they work; in particular, hierarchical clustering has no requirement to predetermine the number of clusters, as we did in the K-means algorithm.
The hierarchical clustering technique has two approaches:
1. Agglomerative: Agglomerative is a bottom-
up approach, in which the algorithm starts with taking
all data points as single clusters and merging them until
one cluster is left.
2. Divisive: Divisive algorithm is the reverse of the
agglomerative algorithm as it is a top-down
approach.
Why hierarchical clustering?
As we already have other clustering algorithms such as K-Means Clustering, why do we need hierarchical clustering? As we have seen, K-means clustering has some challenges: it needs a predetermined number of clusters, and it always tries to create clusters of the same size. To solve these two challenges, we can opt for the hierarchical clustering algorithm because, in this algorithm, we don't need to know the number of clusters in advance.
In this topic, we will discuss the Agglomerative Hierarchical
clustering algorithm.
Agglomerative Hierarchical clustering
The agglomerative hierarchical clustering algorithm is a popular example of HCA. To group the data into clusters, it follows the bottom-up approach. That means this algorithm considers each data point as a single cluster at the beginning and then starts combining the closest pair of clusters. It does this until all the clusters are merged into a single cluster that contains all the data points.
This hierarchy of clusters is represented in the form of the
dendrogram.
How does Agglomerative Hierarchical Clustering work?
The working of the AHC algorithm can be explained using the
below steps:
o Step-1: Create each data point as a single cluster.
Let's say there are N data points, so the number of
clusters will also be N.
o Step-2: Take two closest data points or clusters and
merge them to form one cluster. So, there will now
be N-1 clusters.
o Step-3: Again, take the two closest clusters and
merge them together to form one cluster. There will
be N-2 clusters.
o Step-4: Repeat Step 3 until only one cluster is left. So, we will get the following clusters. Consider the below images:
o Step-5: Once all the clusters are combined into one
big cluster, develop the dendrogram to divide the
clusters as per the problem.
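The same bottom-up merging described in these steps is available off the shelf; below is a minimal sketch using scikit-learn's AgglomerativeClustering, where the toy data and the choice of three final clusters are illustrative assumptions.

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

# Illustrative data: N points that start as N single-point clusters
X, _ = make_blobs(n_samples=20, centers=3, random_state=1)

# Successively merge the closest pair of clusters until 3 clusters remain
model = AgglomerativeClustering(n_clusters=3)
labels = model.fit_predict(X)
print(labels)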
Note: To better understand hierarchical clustering, it is advised to have a look at k-means clustering.
Measure for the distance between two clusters
As we have seen, the closest distance between the two
clusters is crucial for the hierarchical clustering. There are
various ways to calculate the distance between two clusters,
and these ways decide the rule for clustering. These measures
are called Linkage methods. Some of the popular linkage
methods are given below:
1. Single Linkage: It is the Shortest Distance between
the closest points of the clusters. Consider the below
image:
2. Complete Linkage: It is the farthest distance between
the two points of two different clusters. It is one of the
popular linkage methods as it forms tighter clusters than
single-linkage.
3. Average Linkage: It is the linkage method in which the distance between each pair of data points (one from each cluster) is added up and then divided by the total number of pairs to calculate the average distance between two clusters. It is also one of the most popular linkage methods.
4. Centroid Linkage: It is the linkage method in which
the distance between the centroid of the clusters is
calculated. Consider the below image:
From the above-given approaches, we can apply any of them
according to the type of problem or business requirement.
Working of Dendrogram in Hierarchical Clustering
The dendrogram is a tree-like structure that is mainly used to record each merge step that the HC algorithm performs. In the dendrogram plot, the Y-axis shows the Euclidean distances between the data points (clusters), and the X-axis shows all the data points of the given dataset.
The working of the dendrogram can be explained using the
below diagram:
In the above diagram, the left part is showing how clusters are
created in agglomerative clustering, and the right part is
showing the corresponding dendrogram.
o As we have discussed above, firstly, the data points P2 and P3 combine together and form a cluster; correspondingly, a dendrogram is created, which connects P2 and P3 with a rectangular shape. The height is decided according to the Euclidean distance between the data points.
o In the next step, P5 and P6 form a cluster, and the corresponding dendrogram is created. It is higher than the previous one, as the Euclidean distance between P5 and P6 is a little greater than that between P2 and P3.
o Again, two new dendrograms are created that
combine P1, P2, and P3 in one dendrogram, and P4,
P5, and P6, in another dendrogram.
o At last, the final dendrogram is created that
combines all the data points together.
We can cut the dendrogram tree structure at any level as per
our requirement.
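As a sketch of building and cutting a dendrogram, the snippet below uses SciPy's hierarchy module; the toy data, the ward linkage method, and the cutting distance of 5.0 are illustrative assumptions.

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=15, centers=3, random_state=2)

Z = linkage(X, method="ward")   # record every merge step and its distance
dendrogram(Z)                   # y-axis: merge distance, x-axis: data points
plt.show()

# "Cut the tree" at a chosen distance to obtain flat cluster labels
labels = fcluster(Z, t=5.0, criterion="distance")
print(labels)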
K-Means Clustering Algorithm
K-Means Clustering is an unsupervised learning
algorithm that is used to solve the clustering problems
in machine learning or data science. In this topic, we will learn what the K-means clustering algorithm is and how the algorithm works, along with the Python implementation of k-means clustering.
What is K-Means Algorithm?
K-Means Clustering is an Unsupervised Learning
algorithm, which groups the unlabeled dataset into
different clusters. Here K defines the number of pre-
defined clusters that need to be created in the process,
as if K=2, there will be two clusters, and for K=3, there
will be three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group with similar properties.
It allows us to cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is
associated with a centroid. The main aim of this
algorithm is to minimize the sum of distances between
the data point and their corresponding clusters.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters. The value of k should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two
tasks:
o Determines the best value for K center points or
centroids by an iterative process.
o Assigns each data point to its closest k-center.
Those data points which are near to the particular k-
center, create a cluster.
Hence each cluster has datapoints with some
commonalities, and it is away from other clusters.
The below diagram explains the working of the K-means
Clustering Algorithm:
How does the K-Means Algorithm Work?
The working of the K-Means algorithm is explained in
the below steps:
Step-1: Select the number K to decide the number of
clusters.
Step-2: Select K random points or centroids. (They may be points other than those from the input dataset.)
Step-3: Assign each data point to their closest centroid,
which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid
of each cluster.
Step-5: Repeat the third step, which means reassign each data point to the new closest centroid of each cluster.
Step-6: If any reassignment occurs, then go to step-4
else go to FINISH.
Step-7: The model is ready.
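These steps can be written compactly in code; below is a minimal NumPy sketch that follows them, where the toy data, K=2, and the convergence check are illustrative assumptions (it also ignores corner cases such as empty clusters).

import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])  # toy data

K = 2                                                 # Step 1: choose K
centroids = X[rng.choice(len(X), K, replace=False)]   # Step 2: random initial centroids

while True:
    # Step 3: assign each point to its closest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)

    # Step 4: recompute each centroid as the mean of its assigned points
    new_centroids = np.array([X[assign == k].mean(axis=0) for k in range(K)])

    # Steps 5-6: repeat until the centroids (and hence assignments) stop changing
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(centroids)  # Step 7: the model (final centroids) is ready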
Let's understand the above steps by considering the
visual plots:
Suppose we have two variables M1 and M2. The x-y axis
scatter plot of these two variables is given below:
o Let's take number k of clusters, i.e., K=2, to identify
the dataset and to put them into different clusters.
It means here we will try to group these datasets
into two different clusters.
o We need to choose some random k points or centroids to form the clusters. These points can be either points from the dataset or any other points. So, here we are selecting the below two points as k points, which are not part of our dataset. Consider the below image:
o Now we will assign each data point of the scatter
plot to its closest K-point or centroid. We will
compute it by applying some mathematics that we
have studied to calculate the distance between two
points. So, we will draw a median between both the
centroids. Consider the below image:
From the above image, it is clear that the points on the left side of the line are near the K1 or blue centroid, and the points to the right of the line are close to the yellow centroid. Let's color them blue and yellow for clear visualization.
o As we need to find the closest cluster, so we will
repeat the process by choosing a new centroid. To
choose the new centroids, we will compute the
center of gravity of these centroids, and will find
new centroids as below:
o Next, we will reassign each datapoint to the new
centroid. For this, we will repeat the same process of
finding a median line. The median will be like below
image:
From the above image, we can see that one yellow point is on the left side of the line, and two blue points are to the right of the line. So, these three points will be assigned to the new centroids.
Since reassignment has taken place, we will again go to step-4, which is finding new centroids or K-points.
o We will repeat the process by finding the center of
gravity of centroids, so the new centroids will be as
shown in the below image:
o As we have got the new centroids, we will again draw the median line and reassign the data points. So, the image will be:
o We can see in the above image that there are no dissimilar data points on either side of the line, which means our model is formed. Consider the below image:
As our model is ready, we can now remove the assumed centroids, and the two final clusters will be as shown in the below image:
How to choose the value of "K number of clusters" in K-
means Clustering?
The performance of the K-means clustering algorithm
depends upon highly efficient clusters that it forms. But
choosing the optimal number of clusters is a big task.
There are some different ways to find the optimal
number of clusters, but here we are discussing the most
appropriate method to find the number of clusters or
value of K. The method is given below:
Elbow Method
The Elbow method is one of the most popular ways to
find the optimal number of clusters. This method uses
the concept of WCSS value. WCSS stands for Within
Cluster Sum of Squares, which defines the total
variations within a cluster. The formula to calculate the
value of WCSS (for 3 clusters) is given below:
WCSS = ∑Pi in Cluster1 distance(Pi, C1)² + ∑Pi in Cluster2 distance(Pi, C2)² + ∑Pi in Cluster3 distance(Pi, C3)²
In the above formula of WCSS,
∑Pi in Cluster1 distance(Pi, C1)²: It is the sum of the squares of the distances between each data point and its centroid within Cluster1, and the same for the other two terms.
To measure the distance between data points and
centroid, we can use any method such as Euclidean
distance or Manhattan distance.
To find the optimal value of clusters, the elbow method
follows the below steps:
o It executes the K-means clustering on a given dataset for different K values (ranging from 1 to 10).
o For each value of K, calculates the WCSS value.
o Plots a curve between calculated WCSS values and
the number of clusters K.
o The sharp point of bend, where the plot looks like an arm, is considered as the best value of K.
Since the graph shows a sharp bend that looks like an elbow, this is known as the elbow method. The graph for the elbow method looks like the below image:
Note: We can choose the number of clusters equal to the number of data points. If we do so, the value of WCSS becomes zero, and that will be the endpoint of the plot.
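In practice, the WCSS values for different K can be obtained from scikit-learn, where a fitted model's inertia_ attribute is exactly the WCSS; below is a minimal sketch of the elbow method, with the blob data and the K range of 1 to 10 as illustrative assumptions.

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

wcss = []
for k in range(1, 11):                      # run K-means for K = 1..10
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)                # inertia_ is the WCSS for this K

plt.plot(range(1, 11), wcss, marker="o")    # look for the "elbow" in the curve
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.show()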