The document compares hard assignment clustering (like K-means) with soft assignment clustering (model-based). K-means assigns data points to the nearest centroid and assumes spherical clusters, while model-based clustering uses statistical models to capture complex shapes and provides probabilistic cluster memberships. K-means is efficient for simple cases, whereas model-based clustering offers flexibility for overlapping and varied cluster shapes.
K-means Clustering vs. Model-Based Clustering

Clustering approach
• K-means: Hard assignment. Each data point is assigned to exactly one cluster, based on the closest centroid (distance-based).
• Model-based: Soft assignment. Each data point has a probability of belonging to each cluster, based on a statistical model.

Assumptions
• K-means: Assumes clusters are spherical and of similar size; no explicit data distribution is assumed.
• Model-based: Assumes the data come from a mixture of probability distributions (e.g., Gaussian mixtures) whose parameters are estimated.

Cluster shape
• K-means: Detects spherical clusters.
• Model-based: Can detect clusters of various shapes (elliptical, with different covariance structures).

Output
• K-means: A partition of the data (hard cluster labels).
• Model-based: Probabilistic cluster memberships and model parameters (means, covariances).

Algorithm
• K-means: Iteratively minimizes the within-cluster sum of squares using centroid updates.
• Model-based: Maximum likelihood estimation (e.g., Expectation-Maximization) based on mixture models.

Example
• K-means: Grouping customers into 3 clusters based on spending patterns using Euclidean distance.
• Model-based: Clustering the Old Faithful geyser data into 3 clusters modeled as Gaussian distributions with different means and covariances.
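The comparison above describes K-means as minimizing the within-cluster sum of squares through centroid updates. The sketch below shows that loop in plain Python (standard library only): an assignment step that gives each point exactly one label, followed by an update step that moves each centroid to the mean of its points, repeated until the labels stop changing. The function name `kmeans`, the use of the first k points as initial centroids, and the toy "customer" coordinates (read as purchase frequency and amount spent) are illustrative assumptions, not the method from the original source.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd-style K-means: alternate hard assignment and centroid updates.

    Assumption: the first k points serve as initial centroids (a simplification;
    practical implementations use random restarts or k-means++ seeding).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point gets exactly one label, its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Within-cluster sum of squares: the objective that K-means decreases.
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


# Hypothetical customers: (purchases per month, average amount spent), made-up values.
customers = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (1.2, 0.8), (5.1, 4.9), (9.2, 1.1)]
labels, centroids, wcss = kmeans(customers, k=3)
print(labels, centroids, round(wcss, 3))
```

With these made-up points, the three groups separate cleanly. Because the result depends on the starting centroids, practical tools such as scikit-learn's `sklearn.cluster.KMeans` run several initializations (its `n_init` parameter) and use k-means++ seeding by default.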
Examples
• K-means example: Suppose an online store groups customers into 3 clusters by purchase frequency and amount spent. K-means assigns each customer to the nearest cluster centroid, updating the centroids iteratively until they stabilize (10).
• Model-based example: The Old Faithful geyser dataset can be modeled as a mixture of three Gaussian clusters with elliptical shapes, where each data point has a probability of belonging to each cluster. This approach captures more complex cluster shapes and provides probabilistic assignments (6).
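The model-based example can be sketched as a two-dimensional Gaussian mixture fitted by Expectation-Maximization: the E-step computes each point's probability of belonging to each component, and the M-step re-estimates the weights, means, and full 2x2 covariance matrices (which is what allows elliptical clusters) from those probabilities. This is a minimal sketch under stated assumptions: the function names are hypothetical, it runs a fixed number of iterations instead of checking convergence, the caller supplies the initial means, a small `reg` term is added to the covariance diagonal for numerical stability, nothing handles a component that loses all its points, and the eight points are toy data, not the Old Faithful dataset.

```python
import math


def log_gaussian_2d(x, mean, cov):
    """Log-density of a 2-D Gaussian with symmetric covariance [[a, b], [b, d]]."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Mahalanobis term (x - mu)^T Sigma^-1 (x - mu) via the closed-form 2x2 inverse.
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return -0.5 * q - math.log(2 * math.pi) - 0.5 * math.log(det)


def responsibilities(x, weights, means, covs):
    """Soft assignment: posterior probability that x belongs to each component."""
    logs = [math.log(w) + log_gaussian_2d(x, m, c) for w, m, c in zip(weights, means, covs)]
    top = max(logs)  # log-sum-exp trick avoids underflow for distant points
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_gmm_2d(points, init_means, n_iter=50, reg=1e-6):
    """Fit a 2-D Gaussian mixture by EM (assumes no component loses all its points)."""
    k, n = len(init_means), len(points)
    means = [tuple(m) for m in init_means]
    covs = [((1.0, 0.0), (0.0, 1.0)) for _ in range(k)]  # start from unit covariances
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: membership probabilities of every point under current parameters.
        resp = [responsibilities(p, weights, means, covs) for p in points]
        # M-step: re-estimate each component from responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj + reg
            syy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj + reg
            sxy = sum(r[j] * (p[0] - mx) * (p[1] - my) for r, p in zip(resp, points)) / nj
            means[j] = (mx, my)
            covs[j] = ((sxx, sxy), (sxy, syy))
    return weights, means, covs


# Toy data: two small groups of points (illustrative only).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
weights, means, covs = fit_gmm_2d(points, init_means=[(0.0, 0.0), (11.0, 11.0)])
print(weights, means)
print(responsibilities((5.5, 5.5), weights, means, covs))
```

The point (5.5, 5.5) lies halfway between the two fitted components, so its memberships come out close to 0.5 each, something a hard label cannot express. For real work, scikit-learn's `sklearn.mixture.GaussianMixture` (for example with `covariance_type="full"`) exposes the same kind of soft memberships through its `predict_proba` method.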
Summary
K-means is simple, fast, and effective for spherical clusters with hard assignments, while model-based clustering offers a flexible probabilistic framework that can model clusters with different shapes and overlapping boundaries, providing richer information about cluster membership.
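The difference in membership information can be seen on a single boundary point. The sketch below compares a K-means style hard assignment with soft memberships computed under an assumed model of two equal-weight, isotropic Gaussian clusters that share a variance `sigma2`; the function names, centroids, and test points are illustrative. Under that assumption the most probable cluster is always the nearest centroid, so what the soft version adds is how confident that choice is.

```python
import math


def hard_assign(x, centroids):
    """K-means style: index of the single nearest centroid."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    return d2.index(min(d2))


def soft_assign(x, centroids, sigma2=1.0):
    """Membership probabilities under equal-weight isotropic Gaussians with variance sigma2."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids]
    logits = [-v / (2 * sigma2) for v in d2]
    top = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


centroids = [(0.0, 0.0), (4.0, 0.0)]
for x in [(0.5, 0.0), (1.9, 0.0), (2.1, 0.0)]:
    print(x, hard_assign(x, centroids), [round(p, 3) for p in soft_assign(x, centroids)])
```

Points at x = 1.9 and x = 2.1 receive different hard labels, yet their soft memberships are only about 0.60 / 0.40 and 0.40 / 0.60, flagging both as ambiguous, while the point at x = 0.5 belongs to the first cluster with probability close to 1.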