Data Mining
Clustering with Expectation Maximization
Color Clustering by K-means Algorithm
Form K-means clusters from a set of n-dimensional vectors
1. Set ic (iteration count) to 1
2. Randomly choose a set of K means m1(1), …, mK(1).
3. For each vector xi, compute D(xi, mk(ic)) for k = 1, …, K and assign xi to the cluster Cj with the nearest mean.
4. Increment ic by 1 and update the means to get m1(ic), …, mK(ic).
5. Repeat steps 3 and 4 until the clusters no longer change, i.e., Ck(ic) = Ck(ic+1) for all k.
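The loop above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the lecture's code; the toy 2-D data, K = 2, and the random seed are assumptions.

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Minimal K-means over the rows of X (shape: n_samples x n_features)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=K, replace=False)]   # step 2: random initial means
    labels = None
    for _ in range(max_iter):
        # step 3: distance of every vector to every mean, assign to the nearest
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                           # step 5: clusters unchanged
        labels = new_labels
        # step 4: update each mean from its current members
        means = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    return labels, means

# toy 2-D feature vectors (values assumed for illustration)
X = np.array([[3, 4], [6, 5], [9, 8], [14, 11]], dtype=float)
print(kmeans(X, K=2))
```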
K-Means Classifier
[Diagram] Feature vectors x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi} are fed into the classifier (K-Means), which produces:
• Classification results: x1→C(x1), x2→C(x2), …, xi→C(xi)
• Cluster parameters: m1 for C1, m2 for C2, …, mk for Ck
K-Means Classifier (Cont.)
Input (known): feature vectors x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}
Output (unknown): cluster parameters m1, …, mk and classification results C(x1), C(x2), …, C(xi)
Both unknowns are estimated by alternating between them:
• Initial guess of cluster parameters m1, m2, …, mk
• Classification results (1): C(x1), C(x2), …, C(xi) → Cluster parameters (1): m1, m2, …, mk
• Classification results (2): C(x1), C(x2), …, C(xi) → Cluster parameters (2): m1, m2, …, mk
• … and so on, until Classification results (ic) and Cluster parameters (ic) no longer change.
K-Means (Cont.)
• Boot Step:
– Initialize K clusters: C1, …, CK
Each cluster Cj is represented by its mean mj
• Iteration Step:
– Estimate the cluster of each data point: xi → C(xi)
– Re-estimate the cluster parameters: mj = mean{ xi | xi ∈ Cj }
K-Means Example
K-Means → EM
• Boot Step:
– Initialize K clusters: C1, …, CK
(μj, Σj) and P(Cj) for each cluster j.
• Iteration Step:
– Expectation: estimate the cluster membership of each data point, p(Cj | xi)
– Maximization: re-estimate the cluster parameters (μj, Σj), p(Cj) for each cluster j
EM Classifier
[Diagram] Feature vectors x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi} are fed into the classifier (EM), which produces:
• Classification results: p(Cj|x1), p(Cj|x2), …, p(Cj|xi)
• Cluster parameters: (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μk, Σk), p(Ck) for Ck
EM Classifier (Cont.)
Input (known): feature vectors x1={r1, g1, b1}, x2={r2, g2, b2}, …, xi={ri, gi, bi}
Output (unknown): cluster parameters (μ1, Σ1), p(C1) for C1; …; (μk, Σk), p(Ck) for Ck, and classification results p(Cj|x1), p(Cj|x2), …, p(Cj|xi)
Expectation Step
Input (known): feature vectors x1={r1, g1, b1}, …, xi={ri, gi, bi}
Input (estimation): cluster parameters (μj, Σj), p(Cj) for each cluster Cj
Output: classification results p(Cj|x1), …, p(Cj|xi)

$$p(C_j \mid x_i) = \frac{p(x_i \mid C_j)\, p(C_j)}{p(x_i)} = \frac{p(x_i \mid C_j)\, p(C_j)}{\sum_j p(x_i \mid C_j)\, p(C_j)}$$
Maximization Step
Input (known): feature vectors x1={r1, g1, b1}, …, xi={ri, gi, bi}
Input (estimation): classification results p(Cj|x1), …, p(Cj|xi)
Output: cluster parameters (μj, Σj), p(Cj) for each cluster Cj

$$\mu_j = \frac{\sum_i p(C_j \mid x_i)\, x_i}{\sum_i p(C_j \mid x_i)}, \qquad
\Sigma_j = \frac{\sum_i p(C_j \mid x_i)\,(x_i - \mu_j)(x_i - \mu_j)^T}{\sum_i p(C_j \mid x_i)}, \qquad
p(C_j) = \frac{\sum_i p(C_j \mid x_i)}{N}$$
EM Algorithm
• Boot Step:
– Initialize K clusters: C1, …, CK
(μj, Σj) and P(Cj) for each cluster j.
• Iteration Step:
– Expectation Step

$$p(C_j \mid x_i) = \frac{p(x_i \mid C_j)\, p(C_j)}{p(x_i)} = \frac{p(x_i \mid C_j)\, p(C_j)}{\sum_j p(x_i \mid C_j)\, p(C_j)}$$

– Maximization Step

$$\mu_j = \frac{\sum_i p(C_j \mid x_i)\, x_i}{\sum_i p(C_j \mid x_i)}, \qquad
\Sigma_j = \frac{\sum_i p(C_j \mid x_i)\,(x_i - \mu_j)(x_i - \mu_j)^T}{\sum_i p(C_j \mid x_i)}, \qquad
p(C_j) = \frac{\sum_i p(C_j \mid x_i)}{N}$$
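The two steps above map directly onto code for a Gaussian mixture. The sketch below is illustrative only, not the lecture's implementation; the SciPy dependency, the identity-covariance initialization, the small covariance regularizer, and the toy data are assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    """EM for a K-component Gaussian mixture over the rows of X."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=K, replace=False)]   # initial means
    sigma = np.array([np.eye(d)] * K)              # initial covariances: identity
    pi = np.full(K, 1.0 / K)                       # initial priors p(Cj)
    for _ in range(n_iter):
        # Expectation: responsibilities p(Cj | xi) via Bayes' rule
        lik = np.stack([pi[j] * multivariate_normal.pdf(X, mu[j], sigma[j])
                        for j in range(K)], axis=1)          # shape (n, K)
        resp = lik / lik.sum(axis=1, keepdims=True)
        # Maximization: re-estimate mu_j, Sigma_j, p(Cj) from the responsibilities
        Nj = resp.sum(axis=0)
        mu = (resp.T @ X) / Nj[:, None]
        for j in range(K):
            diff = X - mu[j]
            # small ridge keeps the covariance invertible (implementation detail, not from the slides)
            sigma[j] = (resp[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
        pi = Nj / n
    return mu, sigma, pi

X = np.array([[3, 4], [6, 5], [9, 8], [14, 11]], dtype=float)  # toy data (assumed)
print(em_gmm(X, K=2))
```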
Simpler Way to Understand
• Imagine you have a puzzle where some
pieces are missing. The EM algorithm helps
you complete the puzzle by guessing what those missing pieces look like.
Steps You Would Follow
• Guess and Improve (Expectation Step)
– First, you make a guess about what the missing puzzle pieces might look like. This is like saying, “Hmm, I think the missing pieces could be this color and shape.” Your guess doesn’t have to be perfect; it’s just a starting point.
Steps You Would Follow
• Make it better (Maximization Step)
– Then you look at the pieces you have and the ones you guessed. You figure out how to adjust your guess so that it matches the pieces you have as closely as possible. This step is like tweaking your guess to fit the puzzle better.
Steps You Would Follow
• Repeat until done
– You keep doing these two steps over and over, making your guess better and better each time. It’s like refining your guess until the puzzle is complete.
EM
– The EM algorithm is like a smart helper that
makes educated guesses and keeps improving
them until the puzzle is solved. It’s great for
figuring out things when you don’t have all the
information you need.
EM Algorithm
• In Actual Terms
– The Expectation-Maximization algorithm is an iterative statistical technique used for estimating the parameters of probabilistic models when some of the data is missing or unobserved. EM is particularly useful when you have incomplete or partially observed data and you want to estimate the underlying hidden variables or parameters of a statistical model.
EM
– The Expectation-Maximization algorithm is an iterative optimization method, often used together with other unsupervised machine learning algorithms, to find maximum likelihood or maximum a posteriori estimates of parameters in statistical models that involve unobserved latent variables.
EM
– The EM algorithm is commonly used for latent variable models and can handle missing data. It consists of an expectation step (E-step) and a maximization step (M-step), forming an iterative process that improves the model fit.
EM
– In the E-step, the algorithm computes the expectation of the log-likelihood with respect to the latent variables, using the current parameter estimates.
– In the M-step, the algorithm determines the parameters that maximize the expected log-likelihood obtained in the E-step, and the model parameters are updated based on the estimated latent variables.
EM
– By iteratively repeating these steps, the EM algorithm
seeks to maximize the likelihood of the observed data.
It is commonly used for unsupervised learning tasks, such as clustering, where latent variables are inferred, and has applications in various fields, including machine learning, computer vision, and natural language processing.
Problem
Step 1
Step 2
Step 3
Image Segmentation using EM
• Step 1: Feature Extraction
• Step 2: Image Segmentation using EM
Symbols
• The feature vector for pixel i is called xi.
• There are going to be K segments; K is
given.
• The j-th segment has a Gaussian distribution with parameters θj = (μj, Σj).
• The αj's are the weights (which sum to 1) of the Gaussians. Θ is the collection of parameters: Θ = (α1, …, αK, θ1, …, θK)
Initialization
• Each of the K Gaussians will have parameters θj = (μj, Σj), where
– μj is the mean of the j-th Gaussian.
– Σj is the covariance matrix of the j-th Gaussian.
• The covariance matrices are initialized to the identity matrix.
• The means can be initialized by finding the
average feature vectors in each of K windows
in the image; this is data-driven initialization.
E-Step
$$p(j \mid x_i, \Theta) = \frac{\alpha_j\, f_j(x_i \mid \theta_j)}{\sum_{k=1}^{K} \alpha_k\, f_k(x_i \mid \theta_k)}$$

$$f_j(x \mid \theta_j) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_j|^{1/2}}\, e^{-\frac{1}{2}(x-\mu_j)^T \Sigma_j^{-1} (x-\mu_j)}$$
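These two formulas can be evaluated directly. A minimal NumPy sketch (the helper names and the toy means and covariances are assumptions):

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """f_j(x | theta_j): multivariate Gaussian density."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

def responsibilities(x, alphas, mus, sigmas):
    """p(j | x, Theta): posterior weight of each of the K Gaussians for feature x."""
    weighted = np.array([a * gaussian_density(x, m, s)
                         for a, m, s in zip(alphas, mus, sigmas)])
    return weighted / weighted.sum()

# toy check with identity covariances (values assumed)
mus = [np.array([3.0, 4.0]), np.array([14.0, 11.0])]
sigmas = [np.eye(2), np.eye(2)]
print(responsibilities(np.array([3.0, 4.0]), [0.5, 0.5], mus, sigmas))
```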
M-Step
$$\mu_j^{new} = \frac{\sum_{i=1}^{N} x_i\, p(j \mid x_i, \Theta^{old})}{\sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})}$$

$$\Sigma_j^{new} = \frac{\sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})\, (x_i - \mu_j^{new})(x_i - \mu_j^{new})^T}{\sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})}$$

$$\alpha_j^{new} = \frac{1}{N} \sum_{i=1}^{N} p(j \mid x_i, \Theta^{old})$$
Sample Results
EM Clustering with Dataset
Data points: X1 = (3, 4), X2 = (6, 5), X3 = (9, 8), X4 = (14, 11); K = 2.

Step 1: Randomly choose the starting means μ1 = (3, 4) and μ2 = (14, 11), each with covariance Σ = I = [[1, 0], [0, 1]].

Step 2: Compute P(x1 | c1) = P((3, 4) | (3, 4)):

$$p(x_1 \mid c_1) = \frac{1}{[(2\pi)^2 (1-0)]^{1/2}}\, e^{-\frac{1}{2}\begin{pmatrix}3-3\\4-4\end{pmatrix}^{T} \begin{pmatrix}1&0\\0&1\end{pmatrix}^{-1} \begin{pmatrix}3-3\\4-4\end{pmatrix}}
= \frac{1}{2\pi}\, e^{-\frac{1}{2}\begin{pmatrix}0\\0\end{pmatrix}^{T}\begin{pmatrix}0\\0\end{pmatrix}}
= \frac{1}{2\pi}\, e^{0} = \frac{1}{2\pi} \approx 0.159 \quad \text{(higher)}$$
EM Clustering with Dataset
P(x1 | c2) = P((3, 4) | (14, 11)):

$$p(x_1 \mid c_2) = \frac{1}{[(2\pi)^2]^{1/2}}\, e^{-\frac{1}{2}\begin{pmatrix}3-14\\4-11\end{pmatrix}^{T} \begin{pmatrix}1&0\\0&1\end{pmatrix}^{-1} \begin{pmatrix}3-14\\4-11\end{pmatrix}}
= \frac{1}{2\pi}\, e^{-\frac{1}{2}\begin{pmatrix}-11\\-7\end{pmatrix}^{T}\begin{pmatrix}-11\\-7\end{pmatrix}}
= \frac{1}{2\pi}\, e^{-170/2}$$
EM Clustering with Dataset
P(x2 | c1) = P((6, 5) | (3, 4)):

$$p(x_2 \mid c_1) = \frac{1}{[(2\pi)^2]^{1/2}}\, e^{-\frac{1}{2}\begin{pmatrix}6-3\\5-4\end{pmatrix}^{T} \begin{pmatrix}1&0\\0&1\end{pmatrix}^{-1} \begin{pmatrix}6-3\\5-4\end{pmatrix}}
= \frac{1}{2\pi}\, e^{-\frac{1}{2}\begin{pmatrix}3\\1\end{pmatrix}^{T}\begin{pmatrix}3\\1\end{pmatrix}}
= \frac{1}{2\pi}\, e^{-5} \approx 0.00107 \quad \text{(higher)}$$
EM Clustering with Dataset
P(x2 | c2) = P((6, 5) | (14, 11)):

$$p(x_2 \mid c_2) = \frac{1}{[(2\pi)^2]^{1/2}}\, e^{-\frac{1}{2}\begin{pmatrix}6-14\\5-11\end{pmatrix}^{T} \begin{pmatrix}1&0\\0&1\end{pmatrix}^{-1} \begin{pmatrix}6-14\\5-11\end{pmatrix}}
= \frac{1}{2\pi}\, e^{-\frac{1}{2}\begin{pmatrix}-8\\-6\end{pmatrix}^{T}\begin{pmatrix}-8\\-6\end{pmatrix}}
= \frac{1}{2\pi}\, e^{-\frac{1}{2}(64+36)} = \frac{1}{2\pi}\, e^{-50}$$
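The four likelihoods above can be checked numerically. A small sketch (the helper `gauss2d` is hypothetical; it hard-codes the identity covariance from Step 1):

```python
import numpy as np

def gauss2d(x, mu):
    """2-D Gaussian density with identity covariance, as in the worked example."""
    diff = np.asarray(x, float) - np.asarray(mu, float)
    return np.exp(-0.5 * diff @ diff) / (2 * np.pi)

mu1, mu2 = (3, 4), (14, 11)
print(gauss2d((3, 4), mu1))   # p(x1|c1) = 1/(2*pi)          ~ 0.159
print(gauss2d((3, 4), mu2))   # p(x1|c2) = e^(-170/2)/(2*pi) ~ 0
print(gauss2d((6, 5), mu1))   # p(x2|c1) = e^(-5)/(2*pi)     ~ 0.00107
print(gauss2d((6, 5), mu2))   # p(x2|c2) = e^(-50)/(2*pi)    ~ 0
```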
EM Applications
• After the E step, the cluster assignments are:
– C1: x1, x2
– C2: x3, x4
• M step: calculate the new centroids
μ1 = average of x1 and x2 = ((3+6)/2, (4+5)/2) = (9/2, 9/2) = (4.5, 4.5)
μ2 = average of x3 and x4 = ((9+14)/2, (8+11)/2) = (23/2, 19/2) = (11.5, 9.5)
• After the second iteration you get the final result if the cluster assignments no longer change.
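A quick numeric check of the centroid update (a sketch; the hard assignment C1 = {x1, x2}, C2 = {x3, x4} comes from the likelihoods above):

```python
import numpy as np

X = np.array([[3, 4], [6, 5], [9, 8], [14, 11]], dtype=float)
assign = np.array([0, 0, 1, 1])        # C1: x1, x2  /  C2: x3, x4
mu1 = X[assign == 0].mean(axis=0)      # -> [4.5, 4.5]
mu2 = X[assign == 1].mean(axis=0)      # -> [11.5, 9.5]
print(mu1, mu2)
```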
Object Class Recognition in CBIR
• The Goal:
Automatic image labeling (annotation)
to enable object-based image retrieval
Problem Statement
Known: Some images and their corresponding
descriptions
[Figure: example images with descriptions {trees, grass, cherry trees}, {cheetah, trunk}, {mountains, sky}, {beach, sky, trees, water}]
To solve: What object classes are present in new images?
[Figure: new images with unknown labels]
Abstract Regions
[Figure: original images are decomposed into abstract regions (color regions, texture regions, line clusters), which are associated with labels such as boat, water, sky, and building]
Object Model Learning (Ideal)
[Figure: an image whose regions are individually labeled (sky, tree, water, boat), combined with its description, yields a learned model per object: Sky = …, Tree = …, Water = …, Boat = … (region attributes → object)]
Object Model Learning (Real)
[Figure: in practice only the image-level description {sky, tree, water, boat} is known; which region belongs to which object is unknown, so the learned models Sky = ?, Tree = ?, Water = ?, Boat = ? must be estimated (region attributes → object)]
Model Initial Estimation
• Estimate the initial model of an object using
all the region features from all images that
contain the object
[Figure: initial models estimated for “Tree” and “Sky”]
EM Variant
[Figure: EM refines the initial model for “trees” into a final model for “trees”, and the initial model for “sky” into a final model for “sky”]
Object Model Learning
Assumptions
• The feature distribution of each
object within a region is a Gaussian;
• Each image is a set of regions, each
of which can be modeled as a
mixture of multivariate Gaussian
distributions.
1. Initialization Step (Example)
[Figure: images and descriptions — I1 contains {O1, O2}, I2 contains {O1, O3}, I3 contains {O2, O3}. The initial object models qO1(0), qO2(0), qO3(0) are built with every region weight set to W = 0.5]
2. Iteration Step (Example)
[Figure: E-step — using the current models qO1(p), qO2(p), qO3(p), the region weights are re-estimated (e.g., W = 0.8 for regions that fit an object, W = 0.2 for those that do not). M-step — the re-weighted regions are used to compute the updated models qO1(p+1), qO2(p+1), qO3(p+1)]
Image Labeling
[Figure: a test image is decomposed into color regions, each region is compared against the object model database (e.g., Tree, Sky), and the region scores are combined into an image-level score]
To calculate p(tree | image), take the maximum p(tree | region) over the regions of the image:

$$p(o \mid F_I^a) = \max_{r^a \in F_I^a} p(o \mid r^a)$$
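A tiny sketch of this labeling rule (the per-region posterior values are assumed for illustration):

```python
# p(o | image) = max over the image's regions r of p(o | r)
region_posteriors = {            # hypothetical per-region scores
    "tree": [0.12, 0.83, 0.40],
    "sky":  [0.75, 0.10, 0.20],
}
image_scores = {obj: max(scores) for obj, scores in region_posteriors.items()}
print(image_scores)              # {'tree': 0.83, 'sky': 0.75}
```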
Experiments
• 860 images
• 18 keywords: mountains (30), orangutan (37), track
(40), tree trunk (43), football field (43), beach (45),
prairie grass (53), cherry tree (53), snow (54), zebra
(56), polar bear (56), lion (71), water (76),
chimpanzee (79), cheetah (112), sky (259), grass
(272), tree (361).
• A set of cross-validation experiments (80% as the
training set and the other 20% as the test set)
ROC Charts
[Figure: ROC charts (true positive rate vs. false positive rate) — left: independent treatment of color and texture; right: using the intersection of color and texture]
Sample Results
cheetah
Sample Results (Cont.)
grass
Sample Results (Cont.)
lion
Any Questions?