Unit 4
Cluster Analysis:
Cluster analysis, or clustering, is a type of unsupervised machine learning technique used to group a
set of objects in such a way that objects in the same group (or cluster) are more similar to each other
than to those in other groups. It's commonly used in data mining and exploratory data analysis to
uncover patterns or structure in data.
Key Concepts in Cluster Analysis:
Clusters: Groups of data points that are similar to each other based on certain features.
Centroid: The center of a cluster, typically represented as the mean of all points within the cluster.
Distance Metric: A measure used to quantify the similarity or dissimilarity between data points.
Common metrics include Euclidean distance, Manhattan distance, and cosine similarity.
Common Clustering Algorithms:
K-Means Clustering
Hierarchical Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
Mean Shift Clustering
Gaussian Mixture Models (GMM)
Choosing the Right Algorithm:
• K-Means: Best for well-separated, spherical clusters. Requires specifying the number of clusters.
• Hierarchical Clustering: Useful for detailed clustering hierarchy and when the number of clusters is
unknown.
• DBSCAN: Ideal for clusters of arbitrary shape and dealing with noise, but requires careful parameter
tuning.
• Mean Shift: Good for finding clusters of arbitrary shape without requiring the number of clusters.
• GMM: Suitable for data that follows a Gaussian distribution or when you need probabilistic
clustering.
Example Workflow:
Data Preparation:
o Clean and preprocess the data (e.g., normalization, handling missing values).
Choose Algorithm:
o Select an appropriate clustering algorithm based on data characteristics and goals.
Apply Algorithm:
o Implement the chosen clustering algorithm on the dataset.
Evaluate Results:
o Assess the quality of the clusters using metrics like silhouette score or visual inspection.
Interpret and Use Clusters:
o Analyse the clusters to derive insights and apply them to business problems or research
questions.
Cluster analysis is a powerful tool for discovering hidden patterns and relationships in data, making
it invaluable in many fields such as marketing, biology, and image processing.
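Below is a minimal Python sketch of the workflow above using scikit-learn; the synthetic make_blobs dataset, the choice of K-Means, and the value K = 3 are illustrative assumptions rather than part of the notes.

# A minimal sketch of the clustering workflow, using scikit-learn.
# The toy dataset and the choice of K-Means with K=3 are assumptions for illustration.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# 1. Data preparation: create toy data and normalize the features.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
X = StandardScaler().fit_transform(X)

# 2-3. Choose and apply an algorithm (here: K-Means with 3 clusters).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# 4. Evaluate the clusters with the silhouette score (closer to 1 is better).
print("Silhouette score:", silhouette_score(X, labels))

# 5. Interpret: e.g. inspect the cluster centroids in feature space.
print("Centroids:\n", kmeans.cluster_centers_)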
Hierarchical Clustering:
Hierarchical clustering is a type of clustering algorithm that builds a hierarchy of clusters either by
progressively merging smaller clusters (agglomerative) or by recursively splitting a larger cluster
(divisive). The result of hierarchical clustering is typically presented as a dendrogram—a tree-like
Steps:
• Consider each point (here labelled A to F) as a single cluster and calculate the distance of each cluster from all the other clusters.
• In the second step, comparable clusters are merged to form a single cluster. Say cluster (B) and cluster (C) are very similar to each other, so we merge them; similarly, clusters (D) and (E) are merged. We are left with the clusters [(A), (BC), (DE), (F)].
• We recalculate the proximity according to the algorithm and merge the two nearest clusters, (DE) and (F), to form the new clusters [(A), (BC), (DEF)].
• Repeating the same process, the clusters (DEF) and (BC) are comparable and are merged together to form a new cluster. We are now left with the clusters [(A), (BCDEF)].
• At last, the two remaining clusters are merged together to form a single cluster [(ABCDEF)].
Steps (Agglomerative Clustering):
Initialization:
o Start with each data point as its own cluster.
Calculate Distances:
o Compute the distance between each pair of clusters using the chosen distance metric and
linkage criterion.
Merge Clusters:
o Identify the pair of clusters with the smallest distance (or largest similarity) and merge them
into a single cluster.
While merging, we check the distance between every pair of clusters and merge the pair with the least distance (greatest similarity). The question, then, is how that distance between clusters is determined.
There are different ways of defining Inter Cluster distance/similarity. Some of them are:
1. Min Distance: Find the minimum distance between any point of one cluster and any point of the other cluster.
Steps (Divisive Clustering):
Initialization:
o Start with all data points in a single cluster.
Split Clusters:
o Identify the cluster that should be split based on the largest dissimilarity or another criterion.
Recalculate Distances:
o Compute distances between the newly created clusters and the remaining clusters.
Repeat:
o Continue splitting clusters until the desired number of clusters is achieved or a stopping
criterion is met.
Example Workflow:
Start with One Cluster: Begin with all data points in one cluster.
Split the Cluster: Identify and split the cluster with the highest dissimilarity.
Recalculate Distances: Update the distances between the newly formed clusters.
Iterate: Continue splitting until the desired number of clusters is achieved.
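Most libraries do not ship a ready-made divisive routine, so the following is only an illustrative Python sketch of the top-down idea: repeatedly bisect (with 2-means) the cluster that currently has the largest within-cluster sum of squares. The function name, the toy data, and the stopping rule are assumptions made for this example.

# Illustrative sketch of a divisive (top-down) strategy: repeatedly split the
# cluster with the largest within-cluster sum of squares using 2-means.
# This is a simplified stand-in, not a standard library routine.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def divisive_clustering(X, n_clusters=4):
    clusters = [np.arange(len(X))]          # start: one cluster with all points
    while len(clusters) < n_clusters:
        # pick the cluster with the largest total squared distance to its mean
        sse = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
        worst = int(np.argmax(sse))
        idx = clusters.pop(worst)
        # split it into two with 2-means
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
        clusters.append(idx[labels == 0])
        clusters.append(idx[labels == 1])
    return clusters

X, _ = make_blobs(n_samples=200, centers=4, random_state=1)
for i, idx in enumerate(divisive_clustering(X, n_clusters=4)):
    print(f"Cluster {i}: {len(idx)} points")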
Distance Metric: A method for measuring the dissimilarity between data points or clusters. Common
metrics include Euclidean distance, Manhattan distance, and others.
Linkage Criteria: The method used to determine the distance between clusters. Common linkage
criteria include:
o Single Linkage: The distance between the closest points in the two clusters (also known as
minimum linkage).
L(R, S) = min D(i, j), i ∈ R, j ∈ S
o Complete Linkage: The distance between the farthest points in the two clusters (also known
as maximum linkage).
L(R, S) = max D(i, j), i ∈ R, j ∈ S
o Average Linkage: The average distance between all pairs of points in the two clusters.
L(R, S) = (1 / (nR × nS)) ∑(i=1 to nR) ∑(j=1 to nS) D(i, j), i ∈ R, j ∈ S
where,
• nR : Number of data-points in R
• nS : Number of data-points in S
o Ward’s Linkage: Minimizes the variance within each cluster by merging clusters that result in
the smallest increase in total within-cluster variance.
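As a quick illustration of how these linkage criteria are used in practice, the following Python sketch runs agglomerative clustering on toy data with SciPy, comparing single, complete, average, and Ward linkage; the two-blob data and the cut into two flat clusters are assumptions.

# Agglomerative clustering with different linkage criteria, using SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])  # toy data

for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method)                     # merge history (linkage matrix)
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut into 2 flat clusters
    print(method, labels)

tree = dendrogram(Z, no_plot=True)   # the dendrogram structure for the last linkage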
Advantages of Hierarchical Clustering:
• No Need to Specify Number of Clusters: Unlike methods like K-Means, hierarchical clustering
does not require specifying the number of clusters in advance.
• Dendrogram Visualization: Provides a clear visual representation of the clustering process and
hierarchy.
• Flexibility: Can handle various types of data and distance metrics.
Disadvantages of Hierarchical Clustering:
• Computational Complexity: Can be computationally intensive, especially for large datasets.
• Scalability: May not scale well with very large datasets due to the quadratic complexity of
distance computations.
• Sensitivity to Noise: Can be sensitive to noise and outliers in the data.
Applications of Hierarchical Clustering:
• Gene Expression Analysis: Grouping genes with similar expression patterns.
• Customer Segmentation: Identifying customer groups with similar purchasing behaviors.
• Document Clustering: Organizing documents into hierarchical categories based on content
similarity.
• Image Analysis: Grouping similar images or features for image recognition tasks.
Hierarchical clustering is a powerful tool for exploratory data analysis and pattern discovery, offering insight into the structure of the data at multiple levels of granularity.
K-Means Clustering:
K-Means partitions the data into K clusters. We first choose the value of K, select K random points as the initial centroids, and then iterate over the following steps:
• Now we will assign each data point of the scatter plot to its closest centroid. We compute this by calculating the distance between each point and each centroid (for example, using Euclidean distance), and we draw a median line (perpendicular bisector) between the two centroids. Consider the image below:
From the image, it is clear that the points on the left side of the line are nearer to K1, the blue centroid, and the points on the right of the line are closer to the yellow centroid. Let's colour them blue and yellow for clear visualization.
• To refine the clusters, we repeat the process by choosing new centroids. Each new centroid is the center of gravity (mean) of the points currently assigned to that cluster, as below:
From the image we can see that one yellow point is on the left side of the line and two blue points are to the right of the line, so these three points will be reassigned to the new centroids.
• As we have the new centroids, we again draw the median line and reassign the data points accordingly.
Once the assignments no longer change, our model is ready; we can remove the assumed centroids, and the two final clusters will be as shown below:
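The assignment and centroid-update steps described above can be sketched in a few lines of NumPy; the toy data, K = 2, and the fixed number of iterations are assumptions made only for illustration.

# One assignment/update loop of K-Means in plain NumPy, mirroring the steps above:
# assign each point to the nearest centroid, then move each centroid to the mean
# (center of gravity) of its assigned points.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
centroids = X[rng.choice(len(X), size=2, replace=False)]   # assumed K = 2

for _ in range(10):
    # assignment step: distance from every point to every centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # update step: each centroid becomes the mean of its assigned points
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print("Final centroids:\n", centroids)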
The performance of the K-means clustering algorithm depends on the quality of the clusters it forms, and choosing the optimal number of clusters is a key task. There are different ways to find the optimal number of clusters; here we discuss one of the most widely used methods:
Elbow Method:
The Elbow method is one of the most popular ways to find the optimal number of clusters. This
method uses the concept of WCSS value. WCSS stands for Within Cluster Sum of Squares, which
defines the total variations within a cluster. The formula to calculate the value of WCSS (for 3
clusters) is given below:
WCSS = ∑(Pi in Cluster1) distance(Pi, C1)² + ∑(Pi in Cluster2) distance(Pi, C2)² + ∑(Pi in Cluster3) distance(Pi, C3)²
In the above formula of WCSS,
∑(Pi in Cluster1) distance(Pi, C1)²: the sum of the squared distances between each data point in Cluster1 and its centroid C1; the other two terms are defined analogously.
To measure the distance between data points and centroid, we can use any method such as
Euclidean distance or Manhattan distance.
To find the optimal value of clusters, the elbow method follows the below steps:
• It executes K-means clustering on the given dataset for different values of K (typically ranging from 1 to 10).
• For each value of K, calculates the WCSS value.
• Plots a curve between calculated WCSS values and the number of clusters K.
• The sharp point of bend, where the plot looks like an arm's elbow, is taken as the best value of K.
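A minimal Python sketch of the elbow method is shown below; scikit-learn exposes the WCSS of a fitted K-Means model as inertia_, and the synthetic data and the K range of 1 to 10 are assumptions.

# Elbow method: run K-Means for K = 1..10 and record the WCSS (inertia_).
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)   # toy data

wcss = []
ks = range(1, 11)
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)          # within-cluster sum of squares for this K

plt.plot(list(ks), wcss, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("WCSS")
plt.title("Elbow method")
plt.show()                            # the 'elbow' in the curve suggests K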
Characteristics:
• Objective Function: Minimizes the within-cluster sum of squares (variance), defined as:
J = ∑(i=1 to k) ∑(x ∈ Ci) ∥x − μi∥²
1. AR, MA, ARMA, and ARIMA Models
AR, MA, ARMA, and ARIMA models are used to forecast the observation at (t+1) based on the historical data of previous time spots recorded for the same observation. However, it is necessary to make sure that the time series is stationary over the historical observation period. If the time series is not stationary, we can apply differencing to the records and check whether the differenced series becomes stationary.
ACF (Auto Correlation Function)
The Auto Correlation Function takes into consideration all the past observations irrespective of their effect on the present or future time period. It calculates the correlation between the t and (t-k) time periods and includes all the lags or intervals between them. The correlation is typically computed using the Pearson correlation formula.
PACF (Partial Auto Correlation Function)
The PACF determines the partial correlation between time periods t and t-k. It does not take into consideration all the time lags between t and t-k. For example, today's stock price may depend on the stock price 3 days ago but not on yesterday's closing price. Hence we consider only the time lags that have a direct impact on the future time period, neglecting the insignificant time lags between the two time slots t and t-k.
How to differentiate when to use ACF and PACF?
Let's take an example of sweets sale and income generated in a village over a year. Under the
assumption that every 2 months there is a festival in the village, we take out the historical data of
sweets sale and income generated for 12 months. If we plot the data by month, we can observe that for the sweets sale we are interested only in alternate months, since the sale of sweets increases every two months. But if we are to forecast the income generated next month, we have to take into consideration all 12 months of the last year.
So in the above situation, we will use ACF to find out the income generated in the future but we will
be using PACF to find out the sweets sold in the next month.
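The ACF and PACF can be inspected with statsmodels, as in the hedged sketch below; the simulated AR(2) series is purely an assumption used to produce plots with a visible pattern.

# Inspecting ACF and PACF with statsmodels on a simulated AR(2) series.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(2, 300):                      # y_t = 0.6*y_{t-1} + 0.3*y_{t-2} + noise
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal()

fig, axes = plt.subplots(1, 2, figsize=(10, 3))
plot_acf(y, lags=20, ax=axes[0])             # all lags, including indirect effects
plot_pacf(y, lags=20, ax=axes[1])            # only the direct effect of each lag
plt.show()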
AR (Auto-Regressive) Model
In an AR model, the current value of the series is regressed on its own previous values, and the impact of each previous time spot is decided by the coefficient factor at that particular period of time. The price of a share of any particular company X may depend on all the previous share prices in the time series. This kind of model calculates the regression over the past values of the time series and uses it to predict the present or future values; hence it is known as the Auto-Regressive (AR) model.
Consider an example of a milk distribution company that produces milk every month in the country. We want to calculate the amount of milk to be produced in the current month considering the milk produced over the last year. We begin by calculating the PACF values of all 12 lags with respect to the current month. If the PACF value of any particular month is more than a significant threshold, only those lags will be considered for the model analysis.
For example, in the PACF plot the values 1, 2, 3 up to 12 display the direct effect (PACF) of the milk production in the current month with respect to the given lag t. If we consider two significant values above the threshold, the model will be termed AR(2).
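A short sketch of fitting such an AR(2) model with statsmodels follows; the simulated series and the choice of two lags (mirroring the two significant PACF values above) are assumptions.

# Fitting an AR(2) model with statsmodels' AutoReg on a simulated series.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(1)
y = np.zeros(200)
for t in range(2, 200):
    y[t] = 0.5 * y[t - 1] + 0.25 * y[t - 2] + rng.normal()

model = AutoReg(y, lags=2).fit()     # AR(2): two significant PACF lags assumed
print(model.params)                  # estimated coefficients
print(model.forecast(steps=3))       # forecast the next 3 periods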
MA (Moving Average) Model
In a Moving Average model, the current value depends not on the previous values themselves but on the unexpected external impacts occurring between time t and t-k. These unexpected impacts are known as Errors or Residuals. The impact of previous time spots is decided by the coefficient factor α at that particular period of time. The price of a share of any particular company X may depend on some company merger that happened overnight, or the company may have shut down due to bankruptcy. This kind of model calculates the residuals or errors of the past time series and uses them to predict the present or future values in the series; it is known as the Moving Average (MA) model.
Consider an example of cake distribution on your birthday. Let's assume that your mom asks you to bring pastries to the party. Every year you misjudge the number of invitees and end up bringing more or fewer pastries than required. The difference between the actual and expected quantities results in an error. To avoid this error for the current year, we apply the moving average model on the time series and calculate the number of pastries needed this year based on the past collective errors. Next, we calculate the ACF values of all the lags in the time series. If the ACF value of any particular month is more than a significant threshold, only those values will be considered for the model analysis.
For example, in the ACF plot the values 1, 2, 3 up to 12 display the total error (ACF) of the pastry count in the current month with respect to the given lag t, considering all the in-between lags between time t and the current month. If we consider two significant values above the threshold, the model will be termed MA(2).
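A corresponding sketch for an MA(2) model is below; since statsmodels fits MA models through its ARIMA class, the order (0, 0, 2) encodes two MA terms, and the simulated error-driven series is an assumption.

# Fitting an MA(2) model via statsmodels' ARIMA class with order (0, 0, 2).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(2)
e = rng.normal(size=200)
y = e.copy()
for t in range(2, 200):              # y_t = e_t + 0.6*e_{t-1} + 0.3*e_{t-2}
    y[t] = e[t] + 0.6 * e[t - 1] + 0.3 * e[t - 2]

ma2 = ARIMA(y, order=(0, 0, 2)).fit()
print(ma2.params)                    # estimated MA coefficients
print(ma2.forecast(steps=3))         # forecast the next 3 periods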
ARMA (Auto-Regressive Moving Average) Model
This model is a combination of the AR and MA models. In it, the impact of previous lags along with the residuals is considered for forecasting the future values of the time series. Here β represents the coefficients of the AR part and α represents the coefficients of the MA part.
Yt = β1·y(t−1) + α1·ε(t−1) + β2·y(t−2) + α2·ε(t−2) + β3·y(t−3) + α3·ε(t−3) + … + βk·y(t−k) + αk·ε(t−k)
Consider the ACF and PACF plots with their respective significant values. Let's assume that we consider only 1 significant value from the AR model and likewise 1 significant value from the MA model. The ARMA model obtained by combining the two will then be termed ARMA(1,1).
ARIMA (Auto-Regressive Integrated Moving Average) Model
The ARIMA model requires the time series to be stationary. To achieve this, we apply the differencing or Integrated method, in which we subtract the (t-1) value from the t value of the time series. If the series is still not stationary after the first differencing, we apply second-order differencing.
The ARIMA model is quite similar to the ARMA model except that it includes one more factor, Integrated (I), i.e. differencing, which is what the I in ARIMA stands for. In short, an ARIMA model combines the number of differences applied to make the series stationary, the number of previous lags, and the residual errors in order to forecast future values.
Consider again the ACF and PACF plots with their respective significant values. Let's assume that we consider only 1 significant value from the AR model and likewise 1 significant value from the MA model. Also, the series was initially non-stationary and we had to perform the differencing operation once in order to convert it into a stationary series. Hence the ARIMA model obtained from the combined values of the other two models, along with the single order of differencing, will be termed ARIMA(1,1,1).
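An ARIMA(1,1,1) fit matching this example can be sketched with statsmodels as follows; the simulated random-walk-with-drift series is an assumption used only to provide a non-stationary input.

# Fitting ARIMA(1,1,1): one AR lag, one order of differencing, one MA term.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
y = np.cumsum(0.5 + rng.normal(size=300))   # non-stationary level series

model = ARIMA(y, order=(1, 1, 1)).fit()     # (p, d, q) = (1, 1, 1)
print(model.params)                         # estimated AR, MA, and variance terms
print(model.forecast(steps=5))              # forecast the next 5 values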
All these models give us an insight, or at least a close enough prediction, for a particular time series. Which model suffices depends on the user's needs: if one model has a lower error rate than the others, it is preferred to choose that model.
Applications
• Economics: Forecasting GDP, inflation rates.
• Finance: Predicting stock prices, interest rates.
• Engineering: Analyzing system behaviors and control systems.
Assumptions
• The time series is stationary, meaning its statistical properties do not change over time.
• The residuals (errors) are normally distributed and uncorrelated.
2. ARCH Model (Autoregressive Conditional Heteroskedasticity)
The ARCH model, introduced by Robert Engle in 1982, is designed to model time series data where
the variance of the errors is not constant but changes over time.
ARCH Model Structure
The ARCH model is specified as:
ϵt=σt zt
σt^2 = α0 + α1·ϵ(t−1)^2 + α2·ϵ(t−2)^2 + ⋯ + αq·ϵ(t−q)^2
where:
• ϵt is the error term at time t.
• σt^2 is the conditional variance of ϵt given past information.
• zt is a white noise error term with zero mean and unit variance.
• α0,α1,…,αq are parameters.
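To make the equations concrete, the following NumPy sketch simulates an ARCH(1) process directly from the definitions above; the parameter values a0 = 0.2 and a1 = 0.5 are assumptions chosen only for illustration.

# Simulating an ARCH(1) process: sigma_t^2 = a0 + a1*eps_{t-1}^2, eps_t = sigma_t*z_t.
import numpy as np

rng = np.random.default_rng(0)
a0, a1 = 0.2, 0.5                            # illustrative parameter values
n = 1000
eps = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = a0 / (1 - a1)                    # unconditional variance as a start
for t in range(1, n):
    sigma2[t] = a0 + a1 * eps[t - 1] ** 2    # conditional variance from past error
    eps[t] = np.sqrt(sigma2[t]) * rng.normal()

print("sample variance:", eps.var(), "theoretical:", a0 / (1 - a1))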
Applications
• Finance: Modeling and forecasting volatility of financial markets, such as stock prices and
exchange rates.
• Econometrics: Analyzing economic time series where volatility changes over time.
Assumptions
• The error terms are conditionally heteroskedastic, meaning the variance changes over time
but is dependent on past errors.
• The model assumes that the conditional variance depends on past squared errors.
3. GARCH Model (Generalized Autoregressive Conditional Heteroskedasticity)
The GARCH model, introduced by Tim Bollerslev in 1986, extends the ARCH model to include past
forecast variances as well as past squared errors. This makes it more flexible and better suited for
capturing the persistence of volatility.
GARCH Model Structure
The GARCH(p, q) model is specified as:
ϵt=σtzt
σt^2 = α0 + α1·ϵ(t−1)^2 + α2·ϵ(t−2)^2 + ⋯ + αq·ϵ(t−q)^2 + β1·σ(t−1)^2 + β2·σ(t−2)^2 + ⋯ + βp·σ(t−p)^2
where:
• ϵt is the error term at time t.
• σt^2 is the conditional variance of ϵt given past information.
• zt is a white noise error term with zero mean and unit variance.
• α0, α1, …, αq and β1, β2, …, βp are parameters.
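A GARCH(1,1) model can be estimated with the third-party arch package, as in the hedged sketch below; the simulated return series and the parameter choices are assumptions made only for this example.

# Fitting a GARCH(1,1) volatility model with the `arch` package (assumed installed).
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = 100 * rng.normal(scale=0.01, size=1000)   # toy "daily returns" in percent

am = arch_model(returns, vol="GARCH", p=1, q=1)     # GARCH(1,1) volatility model
res = am.fit(disp="off")
print(res.params)                                   # mu, omega, alpha[1], beta[1]
print(res.forecast(horizon=5).variance.iloc[-1])    # 5-step-ahead variance forecast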