The Expectation-Maximization Algorithm
A common task in signal processing is the estimation of the parameters of a probability distribution function. Perhaps the most frequently encountered estimation problem is the estimation of the mean of a signal in noise. In many parameter estimation problems the situation is more complicated because direct access to the data necessary to estimate the parameters is impossible, or some of the data are missing. Such difficulties arise when an outcome is a result of an accumulation of simpler outcomes, or when outcomes are clumped together, for example, in a binning or histogram operation. There may also be data dropouts or clustering in such a way that the number of underlying data points is unknown (censoring and/or truncation). The EM (expectation-maximization) algorithm is ideally suited to problems of this sort, in that it produces maximum-likelihood (ML) estimates of parameters when there is a many-to-one mapping from an underlying distribution to the distribution governing the observation. In this article, the EM algorithm is presented at a level suitable for signal processing practitioners who have had some exposure to estimation theory. (A brief summary of ML estimation is provided in Box 1 for review.)

The EM algorithm consists of two major steps: an expectation step, followed by a maximization step. The expectation is with respect to the unknown underlying variables, using the current estimate of the parameters and conditioned upon the observations. The maximization step then provides a new estimate of the parameters. These two steps are iterated until convergence. The concept is illustrated in Fig. 1.
The EM algorithm was discovered and employed inde-
pendently by several different researchers until Dempster [1]
brought their ideas together, proved convergence, and coined
the term “EM algorithm.” Since that seminal work, hundreds
of papers employing the EM algorithm in many areas have
been published. A large list of references is found at [2]. A
typical application area of the EM algorithm is in genetics,
where the observed data (the phenotype) is a function of the
underlying, unobserved gene pattern (the genotype), e.g. [3].
Another area is estimating parameters of mixture distribu-
tions, e.g. [4]. The EM algorithm has also been widely used
in econometric, clinical, and sociological studies that have
unknown factors affecting the outcomes [5]. Some applica-
tions to the theory of statistical methods are found in [6].
In the area of signal processing applications, the largest
area of interest in the EM algorithm is in maximum likelihood
tomographic image reconstruction, e.g. [7, 8]. Another com-
monly cited application is training of hidden Markov models,
especially for speech recognition, e.g. [9]. The books [10, 11]
have chapters with extensive development on hidden Markov
models (HMMs).
Other signal processing and engineering applications began appearing in about 1985. These include: parameter estimation [12, 13]; ARMA modeling [14, 15]; image modeling, reconstruction, and processing [16, 17]; simultaneous detection and estimation [18, 19, 20]; pattern recognition and neural network training [21, 22, 23]; direction finding [24]; noise suppression [25]; spectroscopy [27]; signal and sequence detection [28]; time-delay estimation [29]; and specialized developments of the EM algorithm itself [30]. The EM algorithm has been the subject for multiprocessing algorithm development [31]. The EM algorithm is also related to algorithms used in information theory to compute channel capacity and rate-distortion functions [32, 33], since the expectation step in the EM algorithm produces a result similar to entropy. The EM algorithm is philosophically similar to ML detection in the presence of unknown phase (incoherent detection) or other unknown parameters: the likelihood function is averaged with respect to the unknown quantity (i.e., the expected value of the likelihood function is computed) before detection, which is a maximization step (see, e.g., [34, Chap. 5]).

Ector's Problem: An Introductory Example

The image-processing example introduced by Ector and Hatter (see the "Tale of Two Distributions" sidebar), although somewhat contrived, illustrates most of the principles of the EM algorithm as well as the notational conventions of this article. In many aspects it is similar to a problem that is of practical interest - the emission tomography (ET) problem discussed later in this article.

Suppose that in an image pattern-recognition problem there are two general classes to be distinguished: a class of dark objects and a class of light objects. The class of dark objects may be further subdivided into two shapes: round and square. Using a pattern recognizer, it is desired to determine the probability of a dark object. For the sake of the example, assume that the objects are known to be trinomially distributed. Let the random variable X_1 represent the number of round dark objects, X_2 represent the number of square dark objects, and X_3 represent the number of light objects, and let [x_1, x_2, x_3]^T = x be the vector of values the random variables take for some image. (In this article the convention is that vectors are printed in bold font and scalars are printed in math italic. All vectors by convention are taken as column vectors. Uppercase letters are random variables.) Assume further that enough is known about the probabilities of the different classes so that the probability may be written as in Eq. (1), where p is an unknown parameter of the distribution and n = x_1 + x_2 + x_3. The notation f(x_1, x_2, x_3 | p) is typical throughout the article; it is used to indicate the probability function, which may be either a probability density function (pdf) or a probability mass function (pmf).
A feature extractor is employed that can distinguish which objects are light and which are dark, but cannot distinguish shape. Let [y_1, y_2]^T = y be the number of dark objects and the number of light objects detected, respectively, so that y_1 = x_1 + x_2 and y_2 = x_3, and let the corresponding random variables be Y_1 and Y_2. There is a many-to-one mapping between {x_1, x_2} and y_1. For example, if y_1 = 3, there is no way to tell from the measurements whether x_1 = 1 and x_2 = 2 or x_1 = 2 and x_2 = 1. The EM algorithm is specifically designed for problems with such many-to-one mappings. Then (see Box 2), the expectation step imputes the unobserved counts by their conditional expectations given the observation and the current parameter estimate p^[k]: x_1^[k+1] = E[X_1 | y_1, p^[k]] and x_2^[k+1] = E[X_2 | y_1, p^[k]].
Maximization Step (M-step). Use the data from the expectation step as if it were actually measured data to determine an ML estimate of the parameter. This estimated data is sometimes called "imputed" data. In this example, with x_1^[k+1] and x_2^[k+1] imputed and x_3 available, the ML estimate of the parameter is obtained by taking the derivative of log f(x_1^[k+1], x_2^[k+1], x_3 | p) with respect to p, equating it to zero, and solving for p; the result is the updated estimate p^[k+1] of Eq. (7).

As a numerical example, suppose that the true parameter is p = 0.5 and n = 100 samples are drawn, with y_1 = 63. (The true values of x_1 and x_2 are 25 and 38, respectively, but the algorithm does not know this.) Table 1 illustrates the result of the algorithm starting from p^[0] = 0. The final estimate, p = 0.52, is in fact the ML estimate of p that would have been obtained by maximizing Eq. (1) with respect to p, had the x data been available.
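As a concrete sketch of the iteration just described: the trinomial class probabilities (1/4, (1+p)/4, (2-p)/4) used below are an assumption made for illustration, but they are consistent with the numbers quoted in this example (y_1 = 63 and a final estimate of 0.52), and the function name is illustrative.

```python
def em_trinomial(y1, y2, p=0.0, n_iter=20):
    """EM sketch when only y1 = x1 + x2 (dark) and y2 = x3 (light) are observed.
    Assumed class probabilities: (1/4, (1+p)/4, (2-p)/4)."""
    x3 = y2
    for _ in range(n_iter):
        # E-step: impute x1 and x2 by their conditional means given y1 and p
        # (X1 given Y1 = y1 is binomial with probability (1/4) / (1/4 + (1+p)/4)).
        x1 = y1 * (1.0 / 4) / (1.0 / 4 + (1.0 + p) / 4)
        x2 = y1 - x1
        # M-step: ML estimate of p from the imputed complete data, obtained by
        # setting d/dp [x2*log((1+p)/4) + x3*log((2-p)/4)] = 0.
        p = (2 * x2 - x3) / (x2 + x3)
    return p

print(em_trinomial(y1=63, y2=37))   # converges to about 0.52
```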
In general, let y denote the observed (incomplete) data, let x denote the complete data, and let f(x | θ) denote the likelihood of the complete data as a function of the parameter vector θ. Let

Q(θ | θ^[k]) = E[ log f(x | θ) | y, θ^[k] ],    (9)

where θ^[k] is the estimate of the parameters after k iterations. The second argument in Q(θ | θ^[k]) is the conditioning argument to the expectation and is regarded as fixed and known at every E-step. The first argument conditions the likelihood of the complete data.

For the M-step, let θ^[k+1] be that value of θ which maximizes Q(θ | θ^[k]):

θ^[k+1] = arg max_θ Q(θ | θ^[k]).    (10)

It is important to note that the maximization is with respect to the first argument of the Q function, the conditioner of the complete data likelihood. The EM algorithm consists of choosing an initial θ^[0], then performing the E-step and the M-step successively until convergence. Convergence may be determined by examining when the parameters quit changing, i.e., stop when ||θ^[k] - θ^[k-1]|| < ε for some ε and some appropriate distance measure ||·||.

The general form of the EM algorithm as stated in Eqs. (9) and (10) may be specialized and simplified somewhat by restriction to distributions in the exponential family. These are pdfs (or pmfs) of the form

f(x | θ) = b(x) exp[ c(θ)^T t(x) ] / a(θ),    (11)

where θ is a vector of parameters for the family [35, 36]. The function t(x) is called the sufficient statistic of the family (a statistic is sufficient if it provides all of the information necessary to estimate the parameters of the distribution from the data [35, 36]). Members of the exponential family include most distributions of engineering interest, including Gaussian, Poisson, binomial, uniform, Rayleigh, and others. For exponential families, the E-step can be written in terms of the sufficient statistic: let t^[k+1] = E[ t(x) | y, θ^[k] ]. As a conditional expectation is an estimator, t^[k+1] is an estimate of the sufficient statistic. (The EM algorithm is sometimes called the estimation/maximization algorithm because, for exponential families, the first step is an estimator. It has also been called the expectation/modification algorithm [9].) In light of the fact that the M-step will be maximizing

E[ log b(x) | y, θ^[k] ] + c(θ)^T t^[k+1] - log a(θ)

with respect to θ, and that E[ log b(x) | y, θ^[k] ] does not depend upon θ, it is sufficient to write:

E-step: Compute t^[k+1] = E[ t(x) | y, θ^[k] ].    (12)
M-step: Determine θ^[k+1] to maximize c(θ)^T t^[k+1] - log a(θ).    (13)
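In outline, the general iteration of Eqs. (9) and (10) is a simple loop. The following Python sketch uses hypothetical problem-specific callables e_step and m_step, with the parameter-change stopping rule described above.

```python
import numpy as np

def em(y, theta0, e_step, m_step, eps=1e-8, max_iter=200):
    """Generic EM loop: alternate E- and M-steps (Eqs. (9) and (10)) until
    the parameter estimate stops changing.  e_step/m_step are user-supplied."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        stats = e_step(y, theta)                                # E-step: expectations given y and theta
        theta_new = np.asarray(m_step(y, stats), dtype=float)   # M-step: maximize Q
        if np.linalg.norm(theta_new - theta) < eps:             # stop when ||theta[k] - theta[k-1]|| < eps
            return theta_new
        theta = theta_new
    return theta
```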
Convergence of the EM Algorithm

For every iterative algorithm, the question of convergence needs to be addressed: does the algorithm come finally to a solution, or does it iterate ad nauseam, ever learning but never coming to a knowledge of the truth? For the EM algorithm, the convergence may be stated simply: at every iteration of the EM algorithm, a value of the parameter is computed so that the likelihood function does not decrease. That is, at every iteration the estimated parameter provides an increase in the likelihood function, until a local maximum is achieved, at which point the likelihood function cannot increase further (but it will not decrease). Box 3 contains a more precise statement of this convergence for the general EM algorithm.

Despite the convergence theorem in Box 3, there is no guarantee that the convergence will be to a global maximum. For likelihood functions with multiple maxima, convergence will be to a local maximum which depends on the initial starting point θ^[0].

The convergence rate of the EM algorithm is also of interest. Based on mathematical and empirical examinations, it has been determined that the convergence rate is usually slower than the quadratic convergence typically available with a Newton's-type method [4].
However, as observed by Dempster [1], the convergence near the maximum (at least for exponential families) depends upon the eigenvalues of the Hessian of the update function M, so that rapid convergence may be possible. In any event, even with potentially slow convergence there are advantages to EM algorithms over Newton's algorithms. In the first place, no Hessian needs to be computed. Also, there is no chance of "overshooting" the target or diverging away from the maximum. The EM algorithm is guaranteed to be stable and to converge to an ML estimate. Further discussion of convergence appears in [37, 38].
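A quick numerical check of this monotonicity property on the introductory trinomial example (same assumed class probabilities as in the earlier sketch): the observed-data log-likelihood never decreases from one iteration to the next.

```python
import math

def loglik_observed(p, y1, y2):
    # Observed-data log-likelihood (up to constants): Y1 ~ Binomial(n, 1/2 + p/4).
    return y1 * math.log(0.5 + p / 4) + y2 * math.log(0.5 - p / 4)

p, y1, y2 = 0.0, 63, 37
prev = loglik_observed(p, y1, y2)
for k in range(10):
    x1 = y1 * 0.25 / (0.25 + (1 + p) / 4)      # E-step
    x2 = y1 - x1
    p = (2 * x2 - y2) / (x2 + y2)              # M-step
    cur = loglik_observed(p, y1, y2)
    assert cur >= prev - 1e-12                 # the likelihood does not decrease
    prev = cur
print(p)   # about 0.52
```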
Box 2: Combinations and Conditional Expectations of Multinomials

Let X_1, X_2, X_3 have a multinomial distribution with class probabilities (p_1, p_2, p_3), so that

P(X_1 = x_1, X_2 = x_2, X_3 = x_3) = [(x_1 + x_2 + x_3)! / (x_1! x_2! x_3!)] p_1^{x_1} p_2^{x_2} p_3^{x_3}.

Let Y = X_1 + X_2. Summing P(X_1 = i, X_2 = y - i, X_3 = x_3) over i and applying the binomial theorem shows that (X_1 + X_2) and X_3 are binomially distributed with class probabilities (p_1 + p_2, p_3). To compute E[X_1 | Y = y], it is first necessary to find the conditional probability P(X_1 = x_1 | Y = y). This probability can be shown to be binomial with y trials and success probability p_1/(p_1 + p_2), so that

E[X_1 | Y = y] = y p_1 / (p_1 + p_2).
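A brief Monte Carlo check of this conditional-expectation result, with arbitrary illustrative probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
p1, p2, p3, n = 0.25, 0.375, 0.375, 100
x = rng.multinomial(n, [p1, p2, p3], size=200_000)
y = x[:, 0] + x[:, 1]                            # Y = X1 + X2
mask = y == 60                                   # condition on a particular value Y = 60
print(x[mask, 0].mean(), 60 * p1 / (p1 + p2))    # both approximately 24.0
```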
ET Image Reconstruction

In ET [7], tissues within a body are stimulated to emit photons. These photons are detected by detectors surrounding the tissue. For purposes of computation the body is divided into B boxes. The number of photons generated in each box is denoted by n(b), b = 1, 2, ..., B. The number of photons detected in each detector is denoted by y(d), d = 1, 2, ..., D, as shown in Fig. 3. Let y = [y(1), y(2), ..., y(D)] denote the vector of observations. The generation of the photons from box b can be described as a Poisson process with mean λ(b). Based upon the geometry of the sensors and the body it is possible to determine p(b,d), the probability that a photon generated in box b is detected at detector d. The detector variables y(d) are Poisson distributed,

f(y | λ(d)) = P(y(d) = y) = e^{-λ(d)} λ(d)^y / y!,

and it can be shown that

λ(d) = E[y(d)] = Σ_{b=1}^{B} λ(b) p(b,d).

Assuming that each box generates independently of every other box and that the detectors operate independently, the likelihood function of the complete data x(b,d), the number of photons emitted in box b and detected at detector d, is a product of Poisson probabilities. The E-step imputes x^[k+1](b,d) = E[x(b,d) | y, λ^[k]]; the resulting Q function contains terms of the form x^[k+1](b,d) log p(b,d) - log x^[k+1](b,d)!, summed over d = 1, ..., D, and the M-step update is

λ^[k+1](b) = Σ_{d=1}^{D} x^[k+1](b,d) p(b,d).
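As an illustration, the following Python sketch implements the iteration in its standard Shepp-Vardi form [7]; it shows the E- and M-step structure and may differ in detail from the equations above.

```python
import numpy as np

def em_et(y, P, n_iter=100):
    """ML-EM sketch for emission tomography: y(d) is Poisson with mean
    sum_b lambda(b) p(b,d).  P has shape (B, D)."""
    B, D = P.shape
    lam = np.ones(B)                                   # initial intensity estimate
    for _ in range(n_iter):
        lam_d = lam @ P                                # predicted detector means, shape (D,)
        # E-step: expected counts x(b,d) given y(d) and the current intensities
        x = (lam[:, None] * P) * (y / np.maximum(lam_d, 1e-12))[None, :]
        # M-step: ML estimate of lambda(b) from the imputed counts
        lam = x.sum(axis=1) / P.sum(axis=1)
    return lam
```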
In the single-microphone active noise cancellation (ANC) application (Fig. 4 shows the single-microphone ANC system and Fig. 5 the processor block diagram; see [41, 49]), the observed signal is

y(t) = s(t) + σ_v v(t),

where s(t) is the desired signal, modeled as an AR(p) process, and v(t) is unit-variance noise. The signal samples, including the p initial values s(1-p), s(2-p), ..., are collected into the vector s. The complete data set is x = [y^T, s^T]^T. If we knew s, estimation of the AR parameters would be straightforward using familiar spectrum estimation techniques. The likelihood function for the complete data is Gaussian. Then the model may be written in state-space form,

s_p(t) = Φ s_p(t-1) + g u(t)
y(t) = h^T s_p(t) + σ_v v(t),

where s_p(t) is the state vector of the p most recent signal samples, Φ is the companion matrix built from the AR coefficients, g routes the driving noise u(t) into the state, and h picks off the current signal sample. The expectations in Eqs. (20), (21), and (22) are first and second moments of Gaussians, conditioned upon the observations.
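The state-space form above is the usual companion-matrix realization of an AR(p) model. As a small illustrative helper (the function name and structure are a sketch, not taken from the article), the matrices Φ, g, and h can be built as follows.

```python
import numpy as np

def ar_state_space(phi, g_scale=1.0):
    """Companion-form matrices for s_p(t) = Phi s_p(t-1) + g u(t),
    y(t) = h^T s_p(t) + sigma_v v(t); phi = [phi_1, ..., phi_p]."""
    p = len(phi)
    Phi = np.zeros((p, p))
    Phi[0, :] = phi                   # first row carries the AR coefficients
    if p > 1:
        Phi[1:, :-1] = np.eye(p - 1)  # remaining rows shift the state downward
    g = np.zeros(p); g[0] = g_scale   # driving noise enters the first state only
    h = np.zeros(p); h[0] = 1.0       # observation picks off the current sample s(t)
    return Phi, g, h
```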
Let the elements of the HMM be parameterized by θ; i.e., there is a mapping θ → (A(θ), π(θ), f(y|s, θ)). The mapping is assumed to be appropriately smooth. In practice, the initial probability and transition probabilities are some of the elements of θ. The parameter estimation problem for an HMM is this: given a sequence of observations, y = (y_1, y_2, ..., y_T), determine the parameter θ which maximizes the likelihood function.
Since the expectation is conditioned upon the observations, the only random component comes from the state variable, and the E-step can thus be written as a sum over the possible state sequences, weighted by their conditional probabilities given the observations and the current parameter estimate. The conditional probability of the state sequence is computed from the forward and backward probabilities of the HMM, and maximizing the resulting Q function subject to the probability constraints (with λ a Lagrange multiplier) leads to the re-estimation formulas for the HMM parameters.
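As a sketch of what the E- and M-steps amount to for a discrete-output HMM, the following is the standard Baum-Welch recursion (see [9, 45]); the variable names, random initialization, and fixed iteration count are illustrative only.

```python
import numpy as np

def baum_welch(y, K, M, n_iter=50, seed=0):
    """EM (Baum-Welch) re-estimation for a discrete-output HMM.
    y: observation symbols in {0,...,M-1}; K: number of states."""
    y = np.asarray(y)
    T = len(y)
    rng = np.random.default_rng(seed)
    pi = rng.random(K); pi /= pi.sum()                          # initial-state probabilities
    A = rng.random((K, K)); A /= A.sum(axis=1, keepdims=True)   # transition probabilities
    B = rng.random((K, M)); B /= B.sum(axis=1, keepdims=True)   # output probabilities

    for _ in range(n_iter):
        # E-step: scaled forward-backward recursions.
        alpha = np.zeros((T, K)); beta = np.zeros((T, K)); c = np.zeros(T)
        alpha[0] = pi * B[:, y[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, y[t]]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        beta[T - 1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (B[:, y[t + 1]] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta                                    # P(state at time t | y, theta)
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = np.zeros((K, K))                                   # expected transition counts
        for t in range(T - 1):
            xi += (alpha[t][:, None] * A) * (B[:, y[t + 1]] * beta[t + 1])[None, :] / c[t + 1]
        # M-step: re-estimate the parameters from the expected counts.
        pi = gamma[0]
        A = xi / gamma[:-1].sum(axis=0)[:, None]
        for m in range(M):
            B[:, m] = gamma[y == m].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B
```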
For the multiuser detection problem, the received signal r(t) is passed through a bank of matched filters, one per user; the filter outputs at symbol interval i are collected into the vector y(i) = [y_1(i), y_2(i), ..., y_K(i)]^T, and a signal detection algorithm produces the bit decisions b_1(i), b_2(i), ..., b_K(i). Because the interference among the users is similar to intersymbol interference, optimal detection requires dealing with the entire sequence of matched filter vectors. The likelihood f(y | b, a) involves quantities R(b) and S(b) that depend upon the bits and correlations, and a constant c that makes the density integrate to 1. Note that even though the noise is Gaussian, which is in the exponential family, the overall likelihood function is not Gaussian because of the presence of the random bits - it is actually a mixture of Gaussians. For the special case of only a single user the likelihood function simplifies considerably.

The E-step requires the likelihood of the unobserved data. From Eq. (31), f(y | b, a) is Gaussian. To compute the E-step,

E[ log f(x | a) | y, a^[k] ] = Σ_{b ∈ {+1,-1}^{(M+1)K}} f(b | y, a^[k]) log f(x | a).    (33)

The conditional probability f(b | y, a^[k]) required for the expectation is obtained from f(y | b, a^[k]) and the prior probabilities of the bits.
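For intuition about the single-user case, here is a hedged sketch under the assumed model y(i) = a b(i) + noise, with the bits b(i) in {-1, +1} unobserved; it illustrates the mixture-of-Gaussians structure noted above and is not the article's multiuser formulation.

```python
import numpy as np

def em_single_user(y, sigma2, a=0.1, n_iter=50):
    """EM amplitude estimation for y(i) = a*b(i) + noise, with b(i) in {-1,+1} unknown."""
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        b_hat = np.tanh(a * y / sigma2)   # E-step: posterior mean of each bit
        a = np.mean(y * b_hat)            # M-step: maximize expected log-likelihood (b^2 = 1)
    return a

rng = np.random.default_rng(1)
bits = rng.choice([-1.0, 1.0], size=2000)
obs = 0.8 * bits + rng.normal(scale=0.5, size=2000)
print(em_single_user(obs, sigma2=0.25))   # close to the true amplitude 0.8
```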
When the complete-data distribution is in the exponential family, the algorithm may be specialized as in Eqs. (12) and (13). Otherwise, it will be necessary to use the general statement of the EM algorithm (Eqs. (9) and (10)). In many cases, the type of conditioning exhibited in Eqs. (19), (24) or (32) may be used: the observed data is conditioned upon data not observed so that the likelihood function may be computed. In general, if the complete data set is x = (y, z) for some unobserved z, then

E[ log f(x | θ) | y, θ^[k] ] = ∫ f(z | y, θ^[k]) log f(x | θ) dz,

since, conditioned upon y, the only random component of x is z.

Analytically, the most difficult portion of the EM algorithm is the E-step. This is also often the most difficult computational step; for the general EM algorithm, the expectation must be computed over all values of the unobserved variables. There may be, as in the case of the HMM, efficient algorithms to ease the computation, but even these cannot completely eliminate the computational burden.

In most instances where the EM algorithm applies, there are other algorithms that also apply, such as gradient descent (see, e.g., [49]). As already observed, however, these algorithms may have problems of their own, such as requiring derivatives or setting of convergence-rate parameters. Because of its generality and the guaranteed convergence, the EM algorithm is a good choice to consider for many estimation problems. Future work will include application in new and different areas, as well as developments to improve convergence speed and computational structure.

Todd K. Moon is Associate Professor at the Electrical and Computer Engineering Department and Center for Self-Organizing Intelligent Systems at Utah State University.
References

1. A.P. Dempster, N.M. Laird, and D.B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Statistical Soc., Ser. B, vol. 39, no. 1, pp. 1-38, 1977.
2. For an extensive list of references to papers describing applications of the EM algorithm, see http://www.engineering.usu.edu/Departments/ece/Publications/Moon on the World-Wide Web.
3. C. Jiang, "The use of mixture models to detect effects of major genes on quantitative characteristics in a plant-breeding experiment," Genetics, vol. 136, no. 1, pp. 383-394, 1994.
4. R. Redner and H.F. Walker, "Mixture densities, maximum-likelihood estimation and the EM algorithm (review)," SIAM Rev., vol. 26, no. 2, pp. 195-237, 1984.
5. J. Schmee and G.J. Hahn, "Simple method for regression analysis with censored data," Technometrics, vol. 21, no. 4, pp. 417-432, 1979.
6. R. Little and D. Rubin, "On jointly estimating parameters and missing data by maximizing the complete-data likelihood," Am. Statistician, vol. 37, no. 3, pp. 218-220, 1983.
7. L.A. Shepp and Y. Vardi, "Maximum likelihood reconstruction for emission tomography," IEEE Trans. Med. Imaging, vol. 1, pp. 113-122, October 1982.
8. D.L. Snyder and D.G. Politte, "Image reconstruction from list-mode data in an emission tomography system having time-of-flight measurements," IEEE Trans. Nucl. Sci., vol. 30, no. 3, pp. 1843-1849, 1983.
9. L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
10. L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. Prentice-Hall, 1993.
11. J.R. Deller, J.G. Proakis, and J.H.L. Hansen, Discrete-Time Processing of Speech Signals. Macmillan, 1993.
12. M. Segal and E. Weinstein, "Parameter estimation of continuous dynamical linear systems given discrete time observations," Proc. IEEE, vol. 75, no. 5, pp. 727-729, 1987.
13. S. Zabin and H. Poor, "Efficient estimation of class-A noise parameters via the EM algorithm," IEEE Trans. Inform. Theory, vol. 37, no. 1, pp. 60-72, 1991.
14. A. Isaksson, "Identification of ARX models subject to missing data," IEEE Trans. Automat. Contr., vol. 38, no. 5, pp. 813-819, 1993.
15. I. Ziskind and D. Hertz, "Maximum likelihood localization of narrowband autoregressive sources via the EM algorithm," IEEE Trans. Sig. Proc., vol. 41, no. 8, pp. 2719-2724, 1993.
16. R. Lagendijk, J. Biemond, and D. Boekee, "Identification and restoration of noisy blurred images using the expectation-maximization algorithm," IEEE Trans. ASSP, vol. 38, no. 7, pp. 1180-1191, 1990.
17. A. Katsaggelos and K. Lay, "Maximum likelihood blur identification and image restoration using the EM algorithm," IEEE Trans. Sig. Proc., vol. 39, no. 3, pp. 729-733, 1991.
18. A. Ansari and R. Viswanathan, "Application of EM algorithm to the detection of direct sequence signal in pulsed noise jamming," IEEE Trans. Commun., vol. 41, no. 8, pp. 1151-1154, 1993.
19. M. Feder, "Parameter estimation and extraction of helicopter signals observed with a wide-band interference," IEEE Trans. Sig. Proc., vol. 41, no. 1, pp. 232-244, 1993.
20. G. Kaleh, "Joint parameter estimation and symbol detection for linear and nonlinear unknown channels," IEEE Trans. Commun., vol. 42, no. 7, pp. 2506-2513, 1994.
21. W. Byrne, "Alternating minimization and Boltzmann machine learning," IEEE Trans. Neural Networks, vol. 3, no. 4, pp. 612-620, 1992.
22. M. Jordan and R. Jacobs, "Hierarchical mixtures of experts and the EM algorithm," Neural Comp., vol. 6, no. 2, pp. 181-214, 1994.
23. R. Streit and T. Luginbuhl, "ML training of probabilistic neural networks," IEEE Trans. Neural Networks, vol. 5, no. 5, pp. 764-783, 1994.
24. M. Miller and D. Fuhrmann, "Maximum likelihood narrow-band direction finding and the EM algorithm," IEEE Trans. ASSP, vol. 38, no. 9, pp. 1560-1577, 1990.
25. S. Vaseghi and P. Rayner, "Detection and suppression of impulsive noise in speech communication systems," IEE Proc.-I, vol. 137, no. 1, pp. 38-46, 1990.
26. E. Weinstein, A. Oppenheim, M. Feder, and J. Buck, "Iterative and sequential algorithms for multisensor signal enhancement," IEEE Trans. Sig. Proc., vol. 42, no. 4, pp. 846-859, 1994.
27. S.E. Bialkowski, "Expectation-maximization (EM) algorithm for regression, deconvolution, and smoothing of shot-noise limited data," Journal of Chemometrics, 1991.
28. C. Georghiades and D. Snyder, "The EM algorithm for symbol unsynchronized sequence detection," IEEE Trans. Commun., vol. 39, no. 1, pp. 54-61, 1991.
29. N. Antoniadis and A. Hero, "Time-delay estimation for filtered Poisson processes using an EM-type algorithm," IEEE Trans. Sig. Proc., vol. 42, no. 8, pp. 2112-2123, 1994.
30. M. Segal and E. Weinstein, "The cascade EM algorithm," Proc. IEEE, vol. 76, no. 10, pp. 1388-1390, 1988.
31. C. Gyulai, S. Bialkowski, G.S. Stiles, and L. Powers, "A comparison of three multi-platform message-passing interfaces on an expectation-maximization algorithm," in Proceedings of the 1993 World Conference on Transputers, 1993.
32. R.E. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Trans. Inform. Theory, vol. 18, pp. 460-473, July 1972.
33. I. Csiszar and G. Tusnady, "Information geometry and alternating minimization procedures," Statistics and Decisions, Supplement Issue 1, 1984.
34. J.G. Proakis, Digital Communications. McGraw-Hill, 3rd ed., 1995.
35. R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. Wiley, 1973.
36. P.J. Bickel and K.A. Doksum, Mathematical Statistics. Holden-Day, 1977.
37. C. Wu, "On the convergence properties of the EM algorithm," Ann. Statist., vol. 11, no. 1, pp. 95-103, 1983.
38. R.A. Boyles, "On the convergence of the EM algorithm," J. Roy. Statist. Soc. B, vol. 45, no. 1, pp. 47-50, 1983.
39. B. Widrow and S.D. Stearns, Adaptive Signal Processing. Prentice-Hall, 1985.
40. J.C. Stevens and K.K. Ahuja, "Recent advances in active noise control," AIAA Journal, vol. 29, no. 7, pp. 1058-1067, 1991.
41. M. Feder, A. Oppenheim, and E. Weinstein, "Maximum likelihood noise cancellation using the EM algorithm," IEEE Trans. ASSP, vol. 37, no. 2, pp. 204-216, 1989.
42. S.M. Kay, Modern Spectral Estimation. Prentice-Hall, 1988.
43. Y. Singer, "Dynamical encoding of cursive handwriting," Biol. Cybern., vol. 71, no. 3, pp. 227-237, 1994.
44. J. Picone, "Continuous speech recognition using hidden Markov models," IEEE Signal Processing Magazine, vol. 7, p. 41, July 1990.
45. L.E. Baum, T. Petrie, G. Soules, and N. Weiss, "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains," Ann. Math. Stat., vol. 41, no. 1, pp. 164-171, 1970.
46. S. Verdu, "Optimum multiuser asymptotic efficiency," IEEE Trans. Commun., vol. COM-34, no. 9, pp. 890-896, September 1986.
47. H.V. Poor, "On parameter estimation in DS/SSMA formats," in Proceedings of the International Conference on Advances in Communications and Control Systems, 1988.
48. R. Lupas and S. Verdu, "Near-far resistance of multiuser detectors in asynchronous channels," IEEE Trans. Commun., vol. 38, pp. 496-508, April 1990.
49. A.V. Oppenheim, E. Weinstein, K.C. Zangi, M. Feder, and D. Gauger, "Single-sensor active noise cancellation based on the EM algorithm," ICASSP, 1992.
50. L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis. Addison-Wesley, 1991.
51. H.L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: John Wiley and Sons, 1968.