HIERARCHICAL CLUSTERING PROJECT 1
DATA SCIENCE PERSONIFWY
Definition
       Data science is the study of data to extract meaningful insights for business. It is a
multidisciplinary approach that combines principles and practices from the fields of
mathematics, statistics, artificial intelligence, and computer engineering to analyse large
amounts of data. This analysis helps data scientists to ask and answer questions like what
happened, why it happened, what will happen, and what can be done with the results.
History of data science
       While the term data science is not new, the meanings and connotations have changed
over time. The word first appeared in the ’60s as an alternative name for statistics. In the late
’90s, computer science professionals formalized the term. A proposed definition for data
science saw it as a separate field with three aspects: data design, collection, and analysis. It
still took another decade for the term to be used outside of academia.
What is data science used for?
Data science is used to study data in four main ways:
1. Descriptive analysis
Descriptive analysis examines data to gain insights into what happened or what is happening
in the data environment. It is characterized by data visualizations such as pie charts, bar
charts, line graphs, tables, or generated narratives. For example, a flight booking service
may record data like the number of tickets booked each day. Descriptive analysis will reveal
booking spikes, booking slumps, and high-performing months for this service.
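To make this concrete, here is a minimal sketch in pandas of what such a descriptive analysis could look like. The booking data below is synthetic and the column names are invented purely for illustration.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: tickets booked per day for a flight booking service.
rng = np.random.default_rng(0)
bookings = pd.DataFrame({
    'date': pd.date_range('2023-01-01', periods=365, freq='D'),
    'tickets': rng.poisson(lam=120, size=365),
})

# Aggregate to months to reveal booking spikes, slumps and high-performing months.
monthly = bookings.set_index('date')['tickets'].resample('M').sum()
print(monthly.sort_values(ascending=False).head())   # highest-volume months
monthly.plot(kind='bar', title='Tickets booked per month')
plt.show()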
2. Diagnostic analysis
Diagnostic analysis is a deep-dive or detailed data examination to understand why
something happened. It is characterized by techniques such as drill-down, data discovery,
data mining, and correlations. Multiple data operations and transformations may be
performed on a given data set with each of these techniques to discover unique patterns. For
example, the flight service might drill down on a particularly high-performing month to
better understand the booking spike. This may lead to the discovery that many customers
visit a particular city to attend a monthly sporting event.
3. Predictive analysis
Predictive analysis uses historical data to make accurate forecasts about data patterns that
may occur in the future. It is characterized by techniques such as machine learning,
forecasting, pattern matching, and predictive modeling. In each of these techniques,
computers are trained to reverse-engineer causal connections in the data. For example, the
flight service team might use data science to predict flight booking patterns for the coming
year at the start of each year. The computer program or algorithm may look at past data and
predict booking spikes for certain destinations in May. Having anticipated their customer’s
future travel requirements, the company could start targeted advertising for those cities from
February.
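As a rough illustration only, a naive seasonal forecast of the kind described above could be sketched as follows. The booking history is invented, and a real predictive model would be considerably more sophisticated.
import pandas as pd

# Hypothetical history: three years of monthly booking totals with a recurring May spike.
idx = pd.date_range('2020-01-01', periods=36, freq='MS')
history = pd.Series(1000 + 300 * (idx.month == 5), index=idx)

# Naive seasonal forecast: predict each future month as the average of the same
# calendar month in past years.
forecast = history.groupby(history.index.month).mean()
print(forecast)   # month 5 (May) shows the expected booking spike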
4. Prescriptive analysis
Prescriptive analytics takes predictive data to the next level. It not only predicts what is
likely to happen but also suggests an optimum response to that outcome. It can analyze the
potential implications of different choices and recommend the best course of action. It uses
graph analysis, simulation, complex event processing, neural networks, and recommendation
engines from machine learning.
Back to the flight booking example, prescriptive analysis could look at historical marketing
campaigns to maximize the advantage of the upcoming booking spike. A data scientist could
project booking outcomes for different levels of marketing spend on various marketing
channels. These data forecasts would give the flight booking company greater confidence in
their marketing decisions.
The process of Data Science
       A business problem typically initiates the data science process. A data scientist will
work with business stakeholders to understand what the business needs. Once the problem has
been defined, the data scientist may solve it using the OSEMN data science process:
O – Obtain data
Data can be pre-existing, newly acquired, or downloaded from an internet data repository.
Data scientists can extract data from internal or external databases, company CRM
software, web server logs, and social media, or purchase it from trusted third-party sources.
    S – Scrub data
    Data scrubbing, or data cleaning, is the process of standardizing the data according to a
    predetermined format. It includes handling missing data, fixing data errors, and removing
any data outliers. Some examples of data scrubbing (illustrated in the short sketch after this list) are:
•   Changing all date values to a common standard format.
•   Fixing spelling mistakes or additional spaces.
•   Fixing mathematical inaccuracies or removing commas from large numbers.
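The sketch below shows, with a small made-up DataFrame, how each of these scrubbing steps could look in pandas; the column names and values are purely illustrative.
import pandas as pd

# Hypothetical messy data (column names and values are made up).
raw = pd.DataFrame({
    'booking_date': ['05/01/2023', '06/01/2023', '07/01/2023'],
    'city': [' New  York', 'new york', 'NEW YORK '],
    'revenue': ['1,200', '980', '1,050'],
})

# Change all date values to a common standard format.
raw['booking_date'] = pd.to_datetime(raw['booking_date'], format='%d/%m/%Y')

# Fix additional spaces and inconsistent casing.
raw['city'] = raw['city'].str.strip().str.replace(r'\s+', ' ', regex=True).str.title()

# Remove commas from large numbers and convert to a numeric type.
raw['revenue'] = raw['revenue'].str.replace(',', '').astype(float)

print(raw)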
    E – Explore data
    Data exploration is preliminary data analysis that is used for planning further data modeling
    strategies. Data scientists gain an initial understanding of the data using descriptive statistics
    and data visualization tools. Then they explore the data to identify interesting patterns that
    can be studied or actioned.
    M – Model data
    Software and machine learning algorithms are used to gain deeper insights, predict
    outcomes, and prescribe the best course of action. Machine learning techniques like
    association, classification, and clustering are applied to the training data set. The model
    might be tested against predetermined test data to assess result accuracy. The data model can
    be fine-tuned many times to improve result outcomes.
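A minimal sketch of this train/test workflow, using synthetic data generated with scikit-learn purely for illustration, might look like the following; a real project would use the business data obtained and scrubbed in the earlier steps.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Synthetic data standing in for a prepared business data set.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit a model on the training set and assess result accuracy on the held-out test set.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print('Test accuracy:', accuracy_score(y_test, model.predict(X_test)))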
    N – Interpret results
    Data scientists work together with analysts and businesses to convert data insights into
    action. They make diagrams, graphs, and charts to represent trends and predictions. Data
    summarization helps stakeholders understand and implement results effectively.
    Data Science Technologies
    Data science practitioners work with complex technologies such as:
1. Artificial intelligence: Machine learning models and related software are used for predictive
    and prescriptive analysis.
2. Cloud computing: Cloud technologies have given data scientists the flexibility and
    processing power required for advanced data analytics.
3. Internet of things: IoT refers to various devices that can automatically connect to the
internet. These devices collect data for data science initiatives. They generate massive
amounts of data that can be used for data mining and data extraction.
4. Quantum computing: Quantum computers can perform complex calculations at high speed.
    Skilled data scientists use them for building complex quantitative algorithms.
    Tools for Data Science
    AWS has a range of tools to support data scientists around the globe:
    Data storage
    For data warehousing, Amazon Redshift can run complex queries against structured or
unstructured data. Analysts and data scientists can use AWS Glue to manage and search for
    data. AWS Glue automatically creates a unified catalogue of all data in the data lake, with
    metadata attached to make it discoverable.
    Machine learning
Amazon SageMaker is a fully managed machine learning service that runs on the Amazon
    Elastic Compute Cloud (EC2). It allows users to organize data, build, train and deploy
    machine learning models, and scale operations.
    Analytics
•   Amazon Athena is an interactive query service that makes it easy to analyse data in Amazon
    S3 or Glacier. It is fast, serverless, and works using standard SQL queries.
•   Amazon Elastic MapReduce (EMR) processes big data using open-source frameworks such as Spark and
    Hadoop.
•   Amazon Kinesis allows aggregation and processing of streaming data in real time. It can
    ingest website clickstreams, application logs, and telemetry data from IoT devices.
•   Amazon OpenSearch allows search, analysis, and visualization of petabytes of data.
Challenges faced by Data Science
Multiple data sources
Different types of apps and tools generate data in various formats. Data scientists have to
clean and prepare data to make it consistent. This can be tedious and time-consuming.
Understanding the business problem
Data scientists have to work with multiple stakeholders and business managers to define the
problem to be solved. This can be challenging—especially in large companies with multiple
teams that have varying requirements.
Elimination of bias
Machine learning tools are not completely accurate, and some uncertainty or bias can exist
as a result. Biases are imbalances in the training data or prediction behavior of the model
across different groups, such as age or income bracket. For instance, if the tool is trained
primarily on data from middle-aged individuals, it may be less accurate when making
predictions involving younger and older people. The field of machine learning provides an
opportunity to address biases by detecting them and measuring them in the data and model.
                           HIERARCHICAL CLUSTERING
    INTRODUCTION
➢ It is crucial to understand customer behaviour in any industry. I realized this last year when
    my chief marketing officer asked me – “Can you tell me which existing customers we
    should target for our new product?”
➢ That was quite a learning curve for me. I quickly realized as a data scientist how important it
    is to segment customers so my organization can tailor and build targeted strategies. This is
    where the concept of clustering came in ever so handy!
➢ Problems like segmenting customers are often deceptively tricky because we are not
    working with any target variable in mind. We are officially in the land of unsupervised
    learning where we need to figure out patterns and structures without a set outcome in mind.
    It’s both challenging and thrilling as a data scientist.
➢ Now, there are a few different ways to perform clustering. I will introduce you to one such
    type – hierarchical clustering.
➢ We will learn what hierarchical clustering is, its advantage over the other clustering
    algorithms, the different types of hierarchical clustering and the steps to perform it. We will
    finally take up a customer segmentation dataset and then implement hierarchical clustering
    in Python.
What is Hierarchical Clustering?
Let’s say we have the below points and we want to cluster them into groups:
We can assign each of these points to a separate cluster:
Now, based on the similarity of these clusters, we can combine the most similar clusters
together and repeat this process until only a single cluster is left:
We are essentially building a hierarchy of clusters. That’s why this algorithm is called
hierarchical clustering. I will discuss how to decide the number of clusters in a later section.
For now, let’s look at the different types of hierarchical clustering.
   Types of Hierarchical Clustering
   There are mainly two types of hierarchical clustering:
1. Agglomerative hierarchical clustering
2. Divisive Hierarchical clustering
   Agglomerative Hierarchical Clustering:
           We assign each point to an individual cluster in this technique. Suppose there are 4
   data points. We will assign each of these points to a cluster and hence will have 4 clusters in
   the beginning:
   Then, at each iteration, we merge the closest pair of clusters and repeat this step until only a
   single cluster is left:
   We are merging (or adding) the clusters at each step. Hence, this type of clustering is also
   known as Additive hierarchical clustering.
Divisive Hierarchical Clustering:
        Divisive hierarchical clustering works in the opposite way. Instead of starting with n
clusters (in case of n observations), we start with a single cluster and assign all the points to
that cluster.
So, it doesn’t matter if we have 10 or 1000 data points. All these points will belong to the
same cluster at the beginning:
Now, at each iteration, we split off the point farthest from the rest of the cluster and repeat
this process until each cluster contains only a single point:
We are splitting (or dividing) the clusters at each step, hence the name divisive hierarchical
clustering.
Agglomerative Clustering is widely used in the industry and that will be the focus in this
article. Divisive hierarchical clustering will be a piece of cake once we have a handle on the
agglomerative type.
    Steps to Perform Hierarchical Clustering
➢ We merge the most similar points or clusters in hierarchical clustering – we know this. Now
    the question is – how do we decide which points are similar and which are not? It’s one of
    the most important questions in clustering!
➢ Here’s one way to calculate similarity – Take the distance between the centroids of these
    clusters. The points having the least distance are referred to as similar points and we can
    merge them. We can refer to this as a distance-based algorithm as well (since we are
    calculating the distances between the clusters).
➢ In hierarchical clustering, we have a concept called a proximity matrix. This stores the
    distance between every pair of points. Let’s take an example to understand this matrix as well as the
    steps to perform hierarchical clustering.
    Step 1: First, we assign all the points to an individual cluster:
    Different colors here represent different clusters. You can see that we have 5 different
    clusters for the 5 points in our data.
    Step 2: Next, we will look at the smallest distance in the proximity matrix and merge the
    points with the smallest distance. We then update the proximity matrix:
Here, the smallest distance is 3 and hence we will merge points 1 and 2:
    Let’s look at the updated clusters and accordingly update the proximity matrix:
Here, we have taken the maximum of the two values (7 and 10) to represent the merged
cluster. Instead of the maximum, we could also take the minimum or the average. Now, we
will again calculate the proximity matrix for these clusters:
Step 3: We will repeat step 2 until only a single cluster is left.
So, we will first look at the minimum distance in the proximity matrix and then merge the
closest pair of clusters. We will get the merged clusters as shown below after repeating these
steps:
We started with 5 clusters and finally have a single cluster. This is how Agglomerative
hierarchical clustering works.
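The same steps can be reproduced with SciPy on a handful of made-up points: the proximity matrix holds every pairwise distance, and the linkage matrix records each merge in order. The points below are invented purely for illustration.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage

# Five made-up one-dimensional points, one cluster each to begin with.
points = np.array([[10.0], [7.0], [28.0], [20.0], [35.0]])

# Step 1: the proximity matrix stores the distance between every pair of points.
print(squareform(pdist(points)))

# Steps 2-3: repeatedly merge the closest pair of clusters until one cluster is left.
# Each row of the linkage matrix records one merge: the two cluster ids, the distance
# at which they were merged, and the size of the newly formed cluster.
print(linkage(points, method='single'))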
     Why Hierarchical Clustering?
     We should first know how K-means works before we dive into hierarchical clustering. Trust
     me, it will make the concept of hierarchical clustering all the easier.
     Here’s a brief overview of how K-means works:
1.   Decide the number of clusters (k)
2.   Select k random points from the data as centroids
3.   Assign all the points to the nearest cluster centroid
4.   Calculate the centroid of newly formed clusters
5.   Repeat steps 3 and 4
➢ It is an iterative process. It will keep running until the centroids of the newly formed
    clusters stop changing or the maximum number of iterations is reached.
➢ But there are certain challenges with K-means. It always tries to make clusters of the same
    size. Also, we must decide the number of clusters at the beginning of the algorithm. In
    practice, we rarely know how many clusters we should have at that stage, and this is a key
    challenge with K-means (a minimal K-means sketch follows this list for reference).
➢ This is a gap hierarchical clustering bridges with aplomb. It takes away the problem of
     having to pre-define the number of clusters. Sounds like a dream! So, let’s see what
     hierarchical clustering is and how it improves on K-means.
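For comparison, here is a minimal K-means sketch on synthetic data (generated with make_blobs, purely for illustration); notice that n_clusters has to be fixed up front, which is exactly the limitation hierarchical clustering removes.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data for illustration only.
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# The number of clusters (k) must be decided before the algorithm runs.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(kmeans.cluster_centers_)   # final centroids after the iterations converge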
     How it works
 1. Make each data point a cluster.
   2. Take the two closest clusters and make them one cluster.
   3. Repeat step 2 until there is only one cluster.
   Dendrograms
   We can use a dendrogram to visualize the history of groupings and figure out the optimal
   number of clusters.
1. Determine the largest vertical distance that doesn’t intersect any of the other clusters
2. Draw a horizontal line at both extremities
3. The optimal number of clusters is equal to the number of vertical lines going through the
   horizontal line
For example, in the case below, the best choice for the number of clusters would be 4.
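The same rule can be applied programmatically: the heights in the linkage matrix are the merge distances, so the largest gap between successive heights suggests where to cut the tree. A small sketch on synthetic data (make_blobs, illustrative only):
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=4, random_state=0)   # synthetic data
Z = linkage(X, method='ward')

heights = Z[:, 2]                      # merge distances, in increasing order
gap = np.diff(heights).argmax()        # index of the largest vertical gap
cut = (heights[gap] + heights[gap + 1]) / 2
labels = fcluster(Z, t=cut, criterion='distance')
print('Suggested number of clusters:', labels.max())

dendrogram(Z)
plt.axhline(y=cut, color='red', linestyle='--')   # horizontal cut through the largest gap
plt.show()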
Linkage Criteria
Similar to gradient descent, you can tweak certain parameters to get drastically different
results.
The linkage criteria refer to how the distance between clusters is calculated.
Single Linkage
The distance between two clusters is the shortest distance between a point in one cluster and
a point in the other.
Complete Linkage
The distance between two clusters is the longest distance between a point in one cluster and
a point in the other.
Average Linkage
The distance between two clusters is the average of the distances between every point in one
cluster and every point in the other cluster.
Ward Linkage
The distance between two clusters is the sum of squared differences within all clusters;
Ward's method merges the pair of clusters that leads to the smallest increase in total
within-cluster variance.
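In practice the linkage criterion is just a parameter. A quick sketch on synthetic data (make_blobs, illustrative only) shows how the choice changes the distance at which clusters are merged:
from scipy.cluster.hierarchy import linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)   # synthetic data

# Cluster the same data under each linkage criterion and compare the final merge distance.
for method in ['single', 'complete', 'average', 'ward']:
    Z = linkage(X, method=method)
    print(method, 'final merge distance:', round(Z[-1, 2], 2))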
Euclidean Distance
The shortest distance between two points. For example, if x=(a,b) and y=(c,d), the Euclidean
distance between x and y is √((a−c)² + (b−d)²)
Manhattan Distance
Imagine you were in the downtown center of a big city and you wanted to get from point A to
point B. You wouldn’t be able to cut across buildings, rather you’d have to make your way
by walking along the various streets. For example, if x=(a,b) and y=(c,d), the Manhattan
distance between x and y is |a−c|+|b−d|
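Both measures are one-liners in NumPy; for example, with x = (1, 2) and y = (4, 6):
import numpy as np

x = np.array([1.0, 2.0])   # (a, b)
y = np.array([4.0, 6.0])   # (c, d)

euclidean = np.sqrt(np.sum((x - y) ** 2))   # sqrt((a-c)^2 + (b-d)^2) = 5.0
manhattan = np.sum(np.abs(x - y))           # |a-c| + |b-d| = 7.0
print(euclidean, manhattan)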
Example 1 for Hierarchical Clustering
Let’s look at a concrete example of how we could go about labelling data using hierarchical
agglomerative clustering.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from sklearn.cluster import AgglomerativeClustering
import scipy.cluster.hierarchy as sch
In this tutorial, we use a CSV file containing a list of customers with their gender, age,
annual income and spending score.
If you want to follow along, you can get the dataset from the SuperDataScience website.
Because we want to display the data on a two-dimensional graph later, we take only two
variables (annual income and spending score).
dataset = pd.read_csv('./data.csv')
X = dataset.iloc[:, [3, 4]].values
Looking at the dendrogram, the highest vertical distance that doesn’t intersect with any
clusters is the middle green one. Given that 5 vertical lines cross the threshold, the optimal
number of clusters is 5.
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
We create an instance of Agglomerative Clustering using the Euclidean distance as the
measure of distance between points and ward linkage to calculate the proximity of clusters.
model = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')
model.fit(X)
labels = model.labels_
The labels_ attribute returns an array of integers in which each value corresponds to one of
the distinct clusters.
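Note: in recent scikit-learn releases the affinity argument of AgglomerativeClustering has been renamed to metric (it is deprecated from version 1.2 and removed in 1.4), so on a newer installation the equivalent call would be AgglomerativeClustering(n_clusters=5, metric='euclidean', linkage='ward').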
We can use a shorthand notation to display all the samples belonging to a category as a
specific color.
plt.scatter(X[labels==0, 0], X[labels==0, 1], s=50, marker='o', color='red')
plt.scatter(X[labels==1, 0], X[labels==1, 1], s=50, marker='o', color='blue')
plt.scatter(X[labels==2, 0], X[labels==2, 1], s=50, marker='o', color='green')
plt.scatter(X[labels==3, 0], X[labels==3, 1], s=50, marker='o', color='purple')
plt.scatter(X[labels==4, 0], X[labels==4, 1], s=50, marker='o', color='orange')
plt.show()
Example 2
Hierarchical Clustering for Customer Data
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly as py
import plotly.graph_objs as go
import warnings
warnings.filterwarnings('ignore')
from sklearn import preprocessing
import scipy.cluster.hierarchy as sch
from sklearn.cluster import AgglomerativeClustering
Data Exploration
In [2]:
df = pd.read_csv('../input/customer-segmentation-tutorial-in-python/Mall_Customers.csv')
df.head()
Out [2]:
      CustomerID     Gender     Age    Annual Income (k$)       Spending Score (1-100)
 0    1              Male       19     15                       39
 1    2              Male       21     15                       81
 2    3              Female     20     16                       6
 3    4              Female     23     16                       77
 4    5              Female     31     17                       40
In [3]:
df.isnull().sum()
Out [3]:
CustomerID                0
Gender                    0
Age                      0
Annual Income (k$)       0
Spending Score (1-100)   0
dtype : int64
In [4]:
df.describe()
Out [4]:
             CustomerID      Age           Annual Income (k$)         Spending Score (1-100)
 count       200.000000      200.000000    200.000000                 200.000000
 mean        100.500000      38.850000     60.560000                  50.200000
 std         57.879185       13.969007     26.264721                  25.823522
 min         1.000000        18.000000     15.000000                  1.000000
 25%         50.750000       28.750000     41.500000                  34.750000
 50%         100.500000      36.000000     61.500000                  50.000000
 75%         150.250000      49.000000     78.000000                  73.000000
 max         200.000000      70.000000     137.000000                 99.000000
In [5]:
plt.figure(1 , figsize = (15 , 6))
n=0
for x in ['Age' , 'Annual Income (k$)' , 'Spending Score (1-100)']:
   n += 1
   plt.subplot(1 , 3 , n)
   plt.subplots_adjust(hspace = 0.5 , wspace = 0.5)
   sns.distplot(df[x] , bins = 15)
   plt.title('Distplot of {}'.format(x))
plt.show()
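Note: sns.distplot is deprecated in recent seaborn releases; sns.histplot(df[x], bins=15, kde=True) is the closest modern equivalent if the call above raises a warning or error.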
Label Encoding
Label encoding converts categorical labels into numeric form so that they become
machine-readable. Machine learning algorithms can then work with these labels directly.
In [6]:
label_encoder = preprocessing.LabelEncoder()
df['Gender'] = label_encoder.fit_transform(df['Gender'])
df.head()
Out [6]:
      CustomerID   Gender   Age   Annual Income (k$)   Spending Score (1-100)
 0    1            1        19    15                   39
 1    2            1        21    15                   81
 2    3            0        20    16                   6
 3    4            0        23    16                   77
 4    5            0        31    17                   40
Heatmap
A heat map is a data visualization technique that shows magnitude of a phenomenon as color
in two dimensions. The variation in color may be by hue or intensity, giving obvious visual
cues to the reader about how the phenomenon is clustered or varies over space.
In [7]:
plt.figure(1, figsize = (16 ,8))
sns.heatmap(df)
plt.show()
Dendrogram
A dendrogram is a diagram representing a tree. This diagrammatic representation is
frequently used in different contexts: in hierarchical clustering, it illustrates the arrangement
of the clusters produced by the corresponding analyses.
In [8]:
plt.figure(1, figsize = (16 ,8))
dendrogram = sch.dendrogram(sch.linkage(df, method = "ward"))
plt.title('Dendrogram')
plt.xlabel('Customers')
plt.ylabel('Euclidean distances')
plt.show()
Agglomerative Clustering
This is a "bottom-up" approach: each observation starts in its own cluster, and pairs of
clusters are merged as one moves up the hierarchy.
In [9]:
hc = AgglomerativeClustering(n_clusters = 5, affinity = 'euclidean', linkage ='average')
y_hc = hc.fit_predict(df)
y_hc
Out [9]:
array ([ 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4,
           3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 2,
           3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
           2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 0, 1, 2, 1, 0, 1, 0, 1,
           0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
           0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
           0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
           0, 1])
In [10]:
df['cluster'] = pd.DataFrame(y_hc)
In [11]:
trace1 = go.Scatter3d(
    x= df['Age'],
    y= df['Spending Score (1-100)'],
    z= df['Annual Income (k$)'],
    mode='markers',
    marker=dict(
        color = df['cluster'],
        size= 10,
        line=dict(
             color= df['cluster'],
             width= 12
        ),
        opacity=0.8
    )
)
data = [trace1]
layout = go.Layout(
    title= 'Clusters using Agglomerative Clustering',
    scene = dict(
             xaxis = dict(title = 'Age'),
             yaxis = dict(title = 'Spending Score'),
             zaxis = dict(title = 'Annual Income')
        )
)
fig = go.Figure(data=data, layout=layout)
py.offline.iplot(fig)
In [12]:
X = df.iloc[:, [3,4]].values
plt.scatter(X[y_hc==0, 0], X[y_hc==0, 1], s=100, c='red', label ='Cluster 1')
plt.scatter(X[y_hc==1, 0], X[y_hc==1, 1], s=100, c='blue', label ='Cluster 2')
plt.scatter(X[y_hc==2, 0], X[y_hc==2, 1], s=100, c='green', label ='Cluster 3')
   plt.scatter(X[y_hc==3, 0], X[y_hc==3, 1], s=100, c='purple', label ='Cluster 4')
   plt.scatter(X[y_hc==4, 0], X[y_hc==4, 1], s=100, c='orange', label ='Cluster 5')
   plt.title('Clusters of Customers (Hierarchical Clustering Model)')
   plt.xlabel('Annual Income(k$)')
   plt.ylabel('Spending Score(1-100)')
   plt.show()
   Cluster Analysis
1. Green - Low Income, Low Spending
2. Orange - Low Income, High Spending
3. Red - Medium Income, Medium Spending
4. Purple - High Income, Low Spending
5. Blue - High Income, High Spending
   In [13]:
   df.head()
   Out [13]:
      CustomerID   Gender   Age   Annual Income (k$)   Spending Score (1-100)   cluster
 0    1            1        19    15                   39                       3
 1    2            1        21    15                   81                       4
 2    3            0        20    16                   6                        3
 3    4            0        23    16                   77                       4
 4    5            0        31    17                   40                       3
In [14]:
df.to_csv("segmented_customers.csv", index = False)
Conclusion
Thus, we have analysed the customer data and performed hierarchical clustering using the
agglomerative clustering algorithm. This kind of cluster analysis helps design better
customer acquisition strategies and supports business growth.
DATA SCIENCE PERSONIFWY BATCH 6
FROM:
Bhumika Reddy Goddilla
bhumikareddy.1445050@gmail.com