Power Spectral Density Estimation
The ensemble of all possible noise waveforms of a random process is denoted X(t, S), where t represents the time index and S represents the set (sample space) of all possible sample functions. A single waveform in the ensemble is denoted by x(t, s). For notational convenience, we usually drop the variable s (or S), so that the random process is denoted X(t) and a single realization is denoted x(t).
Having defined a random process X(t) as an ensemble of sample functions, consider the values of the process at any set of time instants t1 > t2 > · · · > tn, where n is any positive integer. In general, the samples Xti ≡ X(ti), i = 1, 2, . . . , n, are n random variables characterized statistically by their joint probability density function, denoted p(xt1, xt2, . . . , xtn) for any n.
Now suppose the process is shifted in time by an amount τ. If the joint probability density function of the shifted samples is identical to that of the original samples, that is, if

p(xt1+τ, xt2+τ, . . . , xtn+τ) = p(xt1, xt2, . . . , xtn)

for all τ and all n, the random process is said to be stationary in the strict sense. In other words, the statistical properties of a stationary random process are invariant to a translation of the time axis. Conversely, if the joint probability density functions differ, the random process is nonstationary.
The mean (first moment) of the process is m(ti) = E(Xti). In general, its value depends on the time instant ti, since the probability density function of Xti may depend on ti. When the process is stationary, however, p(xti+τ) = p(xti) for all τ. Hence the probability density function is independent of time and, consequently, so is the first moment (it is a constant).
Consider now the two random variables Xti = X(ti), i = 1, 2, corresponding to samples of X(t) taken at t = t1 and t = t2. The statistical correlation between Xt1 and Xt2 is measured by the joint moment

E(Xt1 Xt2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xt1 xt2 p(xt1, xt2) dxt1 dxt2   (12.1.3)
Since the joint moment depends on the time instants t1 and t2, it is denoted γxx(t1, t2) and is called the autocorrelation function of the random process. When the process X(t) is stationary, the joint probability density function of the pair (Xt1, Xt2) is identical to that of the pair (Xt1+τ, Xt2+τ) for any arbitrary τ. This implies that the autocorrelation function of X(t) depends only on the time difference t1 − t2 = τ. Hence, for a stationary real random process, the autocorrelation function is
γxx(τ) = E[Xt1+τ Xt1]   (12.1.4)

On the other hand,

γxx(−τ) = E(Xt1−τ Xt1) = E(Xt1 Xt1+τ) = γxx(τ)   (12.1.5)

Therefore, γxx(τ) is an even function. Note also that γxx(0) = E(X²t1) is the average power of the random process.
There exist nonstationary processes with the property that the mean value of the process is constant and the autocorrelation function satisfies γxx(t1, t2) = γxx(t1 − t2). Such a process is said to be wide-sense stationary. Clearly, wide-sense stationarity is a less stringent condition than strict-sense stationarity. In our treatment, we require only that the processes be wide-sense stationary.
Related to the autocorrelation function is the autocovariance function, which is defined as

cxx(t1, t2) = E{[Xt1 − m(t1)][Xt2 − m(t2)]} = γxx(t1, t2) − m(t1)m(t2)   (12.1.6)

where m(t1) = E(Xt1) and m(t2) = E(Xt2) are the mean values of Xt1 and Xt2, respectively. When the process is stationary,

cxx(t1, t2) = cxx(t1 − t2) = cxx(τ) = γxx(τ) − m²x   (12.1.7)

where τ = t1 − t2. Furthermore, the variance of the process is σ²x = cxx(0) = γxx(0) − m²x.
Two random processes X(t) and Y(t) are said to be jointly stationary if their joint probability density functions are invariant to a translation of the time axis for any sets of time instants {ti} and {tj} and for any positive integer values of m and n.
The cross-correlation function of X(t) and Y(t), denoted γxy(t1, t2), is defined as the joint moment

γxy(t1, t2) ≡ E(Xt1 Yt2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xt1 yt2 p(xt1, yt2) dxt1 dyt2   (12.1.8)
and the cross-covariance is

cxy(t1, t2) = γxy(t1, t2) − mx(t1)my(t2)   (12.1.9)

When the random processes are jointly and individually stationary, we have γxy(t1, t2) = γxy(t1 − t2) and cxy(t1, t2) = cxy(t1 − t2). In this case,

γxy(−τ) = E(Xt1 Yt1+τ) = γyx(τ)   (12.1.10)
The random processes X(t) and Y(t) are said to be statistically independent if and only if

p(xt1, xt2, . . . , xtn, yt′1, yt′2, . . . , yt′m) = p(xt1, . . . , xtn) p(yt′1, . . . , yt′m)

for all choices of the time instants ti, t′j and for all positive integers n and m. The processes are said to be uncorrelated if

γxy(t1, t2) = E(Xt1)E(Yt2)   (12.1.11)

so that cxy(t1, t2) = 0.
A complex-valued random process Z(t) is defined as

Z(t) = X(t) + jY(t)   (12.1.12)

where X(t) and Y(t) are random processes. The joint probability density function of the complex-valued random variables Zti ≡ Z(ti), i = 1, 2, . . . , n, is determined by the joint probability density function of the components (Xti, Yti), i = 1, 2, . . . , n. Thus the probability density function that characterizes Zti, i = 1, 2, . . . , n, is

p(xt1, xt2, . . . , xtn, yt1, yt2, . . . , ytn)
A complex-valued random process Z(t) arises in the representation of the in-phase and quadrature components of the lowpass equivalent of a narrowband random signal or noise. An important characteristic of such a process is its autocorrelation function, which is defined as

γzz(t1, t2) = E(Zt1 Z*t2)
            = E[(Xt1 + jYt1)(Xt2 − jYt2)]   (12.1.13)
            = γxx(t1, t2) + γyy(t1, t2) + j[γyx(t1, t2) − γxy(t1, t2)]
When the random processes X(t) and Y(t) are jointly and individually stationary, the autocorrelation function of Z(t) becomes

γzz(t1, t2) = γzz(t1 − t2) = γzz(τ)

where τ = t1 − t2. The complex conjugate of (12.1.13) is

γ*zz(τ) = E(Z*t1 Zt1−τ) = γzz(−τ)   (12.1.14)
Now suppose that Z(t) = X(t) + jY(t) and W(t) = U(t) + jV(t) are two complex-valued random processes. Their cross-correlation function is defined as

γzw(t1, t2) = E(Zt1 W*t2)
            = E[(Xt1 + jYt1)(Ut2 − jVt2)]   (12.1.15)
            = γxu(t1, t2) + γyv(t1, t2) + j[γyu(t1, t2) − γxv(t1, t2)]

When X(t), Y(t), U(t), and V(t) are pairwise stationary, the cross-correlation functions in (12.1.15) become functions of the time difference τ = t1 − t2. In addition, we have

γ*zw(τ) = E(Z*t1 Wt1−τ) = E(Z*t1+τ Wt1) = γwz(−τ)   (12.1.16)
By the Wiener–Khintchine theorem, the power density spectrum of a stationary random process is the Fourier transform of its autocorrelation function,

Γxx(F) = ∫_{−∞}^{∞} γxx(τ) e^{−j2πFτ} dτ   (12.1.17)

and, conversely,

γxx(τ) = ∫_{−∞}^{∞} Γxx(F) e^{j2πFτ} dF   (12.1.18)

Note that

γxx(0) = ∫_{−∞}^{∞} Γxx(F) dF = E(X²t) ≥ 0   (12.1.19)

Since E(X²t) = γxx(0) represents the average power of the random process, which is the area under Γxx(F), it follows that Γxx(F) is the distribution of power as a function of frequency. For this reason, Γxx(F) is called the power density spectrum of the random process.
If the random process is real, γxx(τ) is real and even, and hence Γxx(F) is real and even. If the random process is complex, γxx(τ) = γ*xx(−τ) and, therefore,

Γ*xx(F) = ∫_{−∞}^{∞} γ*xx(τ) e^{j2πFτ} dτ = ∫_{−∞}^{∞} γ*xx(−τ) e^{−j2πFτ} dτ = ∫_{−∞}^{∞} γxx(τ) e^{−j2πFτ} dτ = Γxx(F)

Hence Γxx(F) is always real.
For two jointly stationary random processes, the Fourier transform of the cross-correlation function,

Γxy(F) = ∫_{−∞}^{∞} γxy(τ) e^{−j2πFτ} dτ   (12.1.20)

is the cross-power density spectrum. It is easily shown that Γ*xy(F) = Γyx(F). For real random processes, the condition is Γyx(F) = Γxy(−F).
Similar statistical properties hold for discrete-time random processes X(n), whose realizations are the sample sequences x(n). In particular, the mean of the process is mx(n) = E(Xn), and the autocorrelation sequence is

γxx(n, k) = E(Xn Xk) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xn xk p(xn, xk) dxn dxk   (12.1.22)
For a stationary process, the autocovariance sequence depends only on the difference m = n − k and is given by

cxx(m) = γxx(m) − m²x

where mx = E(Xn) is the mean of the random process. The variance is defined as σ² = cxx(0) = γxx(0) − m²x.

For a stationary complex-valued process Z(n) = X(n) + jY(n) and a second such process W(n) = U(n) + jV(n), we have

γzz(m) = γxx(m) + γyy(m) + j[γyx(m) − γxy(m)]   (12.1.25)

γzw(m) = γxu(m) + γyv(m) + j[γyu(m) − γxv(m)]   (12.1.26)
As in the case of a continuous-time random process, a discrete-time random process has infinite energy but a finite average power, given by

E(X²n) = γxx(0)   (12.1.27)

Using the Wiener–Khintchine theorem, we obtain the power density spectrum of the discrete-time random process by computing the Fourier transform of the autocorrelation sequence γxx(m), that is,

Γxx(f) = Σ_{m=−∞}^{∞} γxx(m) e^{−j2πfm}   (12.1.28)
The inverse transform is

γxx(m) = ∫_{−1/2}^{1/2} Γxx(f) e^{j2πfm} df   (12.1.29)

Since γxx(0) = ∫_{−1/2}^{1/2} Γxx(f) df is the average power of the process, Γxx(f) describes the distribution of power as a function of frequency; that is, Γxx(f) is the power density spectrum of the random process X(n). The properties we established for Γxx(F) also hold for Γxx(f).
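To make (12.1.28) concrete, here is a minimal numerical sketch (ours, not from the text): it evaluates the power density spectrum of a hypothetical autocorrelation sequence γxx(m) = a^|m| by truncating the sum, and checks the result against the known closed form for this example. The value of a and the truncation length are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical autocorrelation gamma_xx(m) = a^|m| (illustrative assumption)
a = 0.8
m = np.arange(-64, 65)           # truncate the infinite sum in Eq. (12.1.28)
gamma = a ** np.abs(m)

f = np.linspace(-0.5, 0.5, 513)  # frequency grid over |f| <= 1/2
# Gamma_xx(f) = sum_m gamma_xx(m) e^{-j 2 pi f m}
Gamma = np.array([np.sum(gamma * np.exp(-2j * np.pi * fk * m)) for fk in f])

print(np.max(np.abs(Gamma.imag)))   # ~0: the PSD of a real process is real
# Closed form for this gamma: (1 - a^2) / (1 - 2 a cos(2 pi f) + a^2)
closed = (1 - a**2) / (1 - 2 * a * np.cos(2 * np.pi * f) + a**2)
print(np.max(np.abs(Gamma.real - closed)))  # small truncation error
```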
In practice, we usually have available only a single realization of the random process, and hence we cannot compute ensemble averages directly. Under this condition, we may attempt to estimate the ensemble averages using time averages from a single realization.

To illustrate this point, consider the estimation of the mean and the autocorrelation of the random process from a single realization x(n). Since we are interested only in these two moments, we define ergodicity with respect to these parameters. For a detailed treatment of the requirements for mean ergodicity and correlation ergodicity given below, the reader may refer to the book by Papoulis (1984).
The mean of a stationary process is

mx = E(Xn)   (12.1.30)

and its time-average estimate from a single realization x(n), −N ≤ n ≤ N, is

m̂x = (1/(2N + 1)) Σ_{n=−N}^{N} x(n)   (12.1.31)
In general, we regard m̂x in (12.1.31) as an estimate of the statistical mean whose value varies with the particular realization of the random process. Since m̂x is a random variable with a probability density function p(m̂x), we compute its expected value over all possible realizations of X(n). Since summation and expectation are linear operations, they can be interchanged, so that

E(m̂x) = (1/(2N + 1)) Σ_{n=−N}^{N} E[x(n)] = (1/(2N + 1)) Σ_{n=−N}^{N} mx = mx   (12.1.32)

Since the mean value of the estimate equals the statistical mean, we say that the estimate m̂x is unbiased.
Next, we compute the variance of m̂x. We have

var(m̂x) = E(|m̂x|²) − |mx|²

But

E(|m̂x|²) = (1/(2N + 1)²) Σ_{n=−N}^{N} Σ_{k=−N}^{N} E[x*(n)x(k)]
          = (1/(2N + 1)²) Σ_{n=−N}^{N} Σ_{k=−N}^{N} γxx(k − n)
          = (1/(2N + 1)) Σ_{m=−2N}^{2N} (1 − |m|/(2N + 1)) γxx(m)

Therefore,

var(m̂x) = (1/(2N + 1)) Σ_{m=−2N}^{2N} (1 − |m|/(2N + 1)) γxx(m) − |mx|²
         = (1/(2N + 1)) Σ_{m=−2N}^{2N} (1 − |m|/(2N + 1)) cxx(m)   (12.1.33)
If var(m̂x) → 0 as N → ∞, the estimate converges with probability 1 to the statistical mean mx. Therefore, the process X(n) is mean ergodic if

lim_{N→∞} (1/(2N + 1)) Σ_{m=−2N}^{2N} (1 − |m|/(2N + 1)) cxx(m) = 0   (12.1.34)

Under this condition, the estimate m̂x in the limit as N → ∞ becomes equal to the statistical mean, that is,

mx = lim_{N→∞} (1/(2N + 1)) Σ_{n=−N}^{N} x(n)   (12.1.35)

A sufficient condition for (12.1.34) to hold is that cxx(m) → 0 as m → ∞. This condition is satisfied by most zero-mean processes encountered in the physical world.
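As an illustration (a sketch of ours, not part of the text), the following code computes the time-average mean estimate (12.1.31) from single realizations of increasing length for a process whose samples are independent with an assumed mean of 2; for such a process cxx(m) = 0 for m ≠ 0, so the ergodicity condition (12.1.34) holds and the estimate approaches mx.

```python
import numpy as np

rng = np.random.default_rng(0)
mx = 2.0                                     # assumed ensemble mean
for N in (10, 100, 1000, 10000):
    x = mx + rng.standard_normal(2 * N + 1)  # one realization, i.i.d. samples
    m_hat = x.sum() / (2 * N + 1)            # Eq. (12.1.31)
    print(N, m_hat)                          # converges toward mx as N grows
```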
A natural time-average estimate of the autocorrelation of the process, from a single realization, is

rxx(m) = (1/(2N + 1)) Σ_{n=−N}^{N} x*(n) x(n + m)   (12.1.37)
Again, we regard rxx(m), for any given m, as a random variable, since it is a function of the particular realization. Its expected value (the mean over all realizations) is

E[rxx(m)] = (1/(2N + 1)) Σ_{n=−N}^{N} E[x*(n)x(n + m)] = (1/(2N + 1)) Σ_{n=−N}^{N} γxx(m) = γxx(m)   (12.1.38)

Therefore, the expected value of the time-average autocorrelation is equal to the statistical autocorrelation; that is, rxx(m) is an unbiased estimate of γxx(m).
To determine the variance of the estimate rxx(m), we compute the expected value of |rxx(m)|² and subtract the square of the mean value. Thus

var[rxx(m)] = E[|rxx(m)|²] − |γxx(m)|²   (12.1.39)

But

E[|rxx(m)|²] = (1/(2N + 1)²) Σ_{n=−N}^{N} Σ_{k=−N}^{N} E[x*(n)x(n + m)x(k)x*(k + m)]   (12.1.40)

The expected value of the term x*(n)x(n + m)x(k)x*(k + m) is simply the autocorrelation of the random process defined as

vm(n) = x*(n) x(n + m)
where γvv^(m)(n) denotes the autocorrelation sequence of vm(n). The variance is then

var[rxx(m)] = (1/(2N + 1)) Σ_{n=−2N}^{2N} (1 − |n|/(2N + 1)) γvv^(m)(n) − |γxx(m)|²   (12.1.42)
If var[rxx(m)] → 0 as N → ∞, the estimate rxx(m) converges with probability 1 to the statistical autocorrelation γxx(m). Under these conditions, the process is correlation ergodic, and the time-average correlation is identical to the statistical average, that is,

lim_{N→∞} (1/(2N + 1)) Σ_{n=−N}^{N} x*(n) x(n + m) = γxx(m)   (12.1.43)

In our treatment of random signals, we assume that the random processes are mean ergodic and correlation ergodic, so that we can work with time averages of the mean and the autocorrelation obtained from a single realization of the process.
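A companion sketch (again ours, with an assumed white process so that γxx(m) = δ(m)) estimates the autocorrelation by the time average (12.1.37); terms that fall outside the observed record are simply dropped, a common practical convention.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000
x = rng.standard_normal(2 * N + 1)   # one realization of a white process

def r_xx(x, m, N):
    # (1/(2N+1)) sum_n x*(n) x(n+m); lags beyond the record are dropped
    n = np.arange(0, 2 * N + 1 - m)
    return np.sum(np.conj(x[n]) * x[n + m]) / (2 * N + 1)

for m in range(4):
    print(m, r_xx(x, m, N))          # ~1 at m = 0 and ~0 elsewhere
```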
The z-transform of the autocorrelation sequence of a stationary random process is

Γxx(z) = Σ_{m=−∞}^{∞} γxx(m) z^{−m}   (12.2.1)

from which we obtain the power density spectrum by evaluating Γxx(z) on the unit circle [that is, by substituting z = exp(j2πf)].

Now assume that log Γxx(z) is analytic (possesses derivatives of all orders) in an annular region of the z-plane that includes the unit circle (that is, r1 < |z| < r2, where r1 < 1 and r2 > 1). Then log Γxx(z) can be expanded in a Laurent series of the form

log Γxx(z) = Σ_{m=−∞}^{∞} v(m) z^{−m}   (12.2.2)

where the {v(m)} are the series coefficients. We can interpret {v(m)} as the sequence whose z-transform is V(z) = log Γxx(z). Equivalently, we can evaluate log Γxx(z) on the unit circle,

log Γxx(f) = Σ_{m=−∞}^{∞} v(m) e^{−j2πfm}   (12.2.3)
Random Signals
In this appendix, we collect and summarize a number of results and establish the notation relating to the representation of random signals. We make no attempt here to provide a detailed discussion of the difficult and subtle mathematical issues of the underlying theory. Although our approach is not rigorous, we have summarized the important results and the mathematical assumptions implicit in their derivation. Detailed presentations of the theory of random signals are found in texts such as Davenport (1970), Papoulis (1984), Gray and Davisson (2004), Kay (2006), and Bertsekas and Tsitsiklis (2008).
A random signal is modeled as a member of an ensemble of discrete-time signals characterized by a set of probability density functions; for discrete-time signals, the index n is associated with the time index. In other words, each sample value x[n] of a random signal is assumed to have resulted from a mechanism that is governed by a probability law. An individual random variable xn is described by the probability distribution function

Pxn(xn, n) = Probability[xn ≤ xn],   (1)
where xn denotes the random variable and xn is a particular value of xn .1 If xn takes
on a continuous range of values, it is equivalently specified by the probability density
function
pxn(xn, n) = ∂Pxn(xn, n)/∂xn,   (2)
or the probability distribution function

Pxn(xn, n) = ∫_{−∞}^{xn} pxn(x, n) dx.   (3)
1 In this appendix, boldface type is used to denote the random variables and regular type denotes
dummy variables of probability functions.
2 AVERAGES
It is often useful to characterize a random variable by averages such as the mean and
variance. Since a random process is an indexed set of random variables, we may likewise
characterize the process by statistical averages of the random variables making up the
random process. Such averages are called ensemble averages. We begin the discussion
of averages with some definitions.
2.1 Definitions
The average, or mean, of a random process is defined as
mxn = E{xn} = ∫_{−∞}^{∞} x pxn(x, n) dx,   (8)
where E denotes an operator called mathematical expectation. In general, the mean
(expected value) may depend on n. In addition, if g(·) is a single-valued function, then
g(xn ) is a random variable, and the set of random variables {g(xn )} defines a new random
process. To compute averages of this new process, we can derive probability distributions
of the new random variables. Alternatively, it can be shown that

E{g(xn)} = ∫_{−∞}^{∞} g(x) pxn(x, n) dx.   (9)
If the random variables are discrete—i.e., if they have quantized values—the integrals
become summations over all possible values of the random variable. In that case, E{g(xn)} has the form

E{g(xn)} = Σ_x g(x) p̂xn(x, n).   (10)
In cases where we are interested in the relationship between multiple random
processes, we must be concerned with multiple sets of random variables. For example,
for two sets of random variables, {xn } and {ym }, the expected value of a function of the
two random variables is defined as
E{g(xn, ym)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) pxn,ym(x, n, y, m) dx dy,   (11)

where pxn,ym(x, n, y, m) is the joint probability density of the random variables xn and ym.
The mathematical expectation operator is a linear operator; that is, it can be shown
that
1. E{xn + ym } = E{xn } + E{ym }; i.e., the average of a sum is the sum of the averages.
2. E{axn } = aE{xn }; i.e., the average of a constant times xn is equal to the constant
times the average of xn .
In general, the average of a product of two random variables is not equal to the
product of the averages. When this property holds, however, the two random variables
are said to be linearly independent or uncorrelated. That is, xn and ym are linearly inde-
pendent or uncorrelated if
E{xn ym} = E{xn} · E{ym}.   (12)
It is easy to see from Eqs. (11) and (12) that a sufficient condition for linear indepen-
dence is
pxn,ym(xn, n, ym, m) = pxn(xn, n) · pym(ym, m).   (13)
However, Eq. (13) is a stronger statement of independence than Eq. (12). As previously
stated, random variables satisfying Eq. (13) are said to be statistically independent. If
Eq. (13) holds for all values of n and m, the random processes {xn } and {ym } are said to be
statistically independent. Statistically independent random processes are also linearly
independent; but the converse is not true: Linear independence does not imply statistical
independence.
It can be seen from Eqs. (9)–(11) that averages generally are functions of the time
index. For stationary processes, the mean is the same for all the random variables that
constitute the process; i.e., the mean of a stationary process is a constant, which we
denote simply mx .
In addition to the mean of a random process, as defined in Eq. (8), a number
of other averages are particularly important within the context of signal processing.
These are defined next. For notational convenience, we assume that the probability
distributions are continuous. Corresponding definitions for discrete random processes
can be obtained by applying Eq. (10).
The mean-square value of xn is the average of |xn|²; i.e.,

E{|xn|²} = mean square = ∫_{−∞}^{∞} |x|² pxn(x, n) dx.   (14)
The mean-square value is sometimes referred to as the average power.
The variance of xn is the mean-square value of (xn − mxn); i.e.,
var[xn ] = E{|(xn − mxn )|2 } = σx2n . (15)
Since the average of a sum is the sum of the averages, it follows that Eq. (15) can be
written as
var[xn ] = E{|xn |2 } − |mxn |2 . (16)
In general, the mean-square value and the variance are functions of time; however, they
are constant for stationary processes.
The mean, mean square, and variance are simple averages that provide only a small
amount of information about a process. A more useful average is the autocorrelation
sequence, which is defined as
φxx[n, m] = E{xn x*m} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xn x*m pxn,xm(xn, n, xm, m) dxn dxm,   (17)
where ∗ denotes complex conjugation. The autocovariance sequence of a random pro-
cess is defined as
γxx [n, m] = E{(xn − mxn )(xm − mxm )∗ }, (18)
which can be written as
γxx [n, m] = φxx [n, m] − mxn m∗xm . (19)
Note that, in general, both the autocorrelation and autocovariance are two-dimensional
sequences, i.e., functions of two discrete variables.
The autocorrelation sequence is a measure of the dependence between values of
the random processes at different times. In this sense, it partially describes the time
variation of a random signal. A measure of the dependence between two different
random signals is obtained from the cross-correlation sequence. If {xn } and {ym } are
two random processes, their cross-correlation is
φxy[n, m] = E{xn y*m} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y* pxn,ym(x, n, y, m) dx dy,   (20)

where pxn,ym(x, n, y, m) is the joint probability density of xn and ym. The cross-covariance function is defined as

γxy[n, m] = E{(xn − mxn)(ym − mym)*} = φxy[n, m] − mxn m*ym.   (21)
As we have pointed out, the statistical properties of a random process generally
vary with time. However, a stationary random process is characterized by an equilibrium
condition in which the statistical properties are invariant to a shift of time origin. This
means that the 1st-order probability distribution is independent of time. Similarly, all the joint probability functions are also invariant to a shift of time origin; i.e., the 2nd-order joint probability distributions depend only on the time difference (m − n). First-order averages such as the mean and variance are independent of time; 2nd-order averages, such as the autocorrelation φxx[n, m], depend only on the time difference (m − n).
Thus, for a stationary process, we can write
mx = E{xn }, (22)
σx2 = E{|(xn − mx )|2 }, (23)
both independent of n. If we now denote the time difference by m, we have
φxx [n + m, n] = φxx [m] = E{xn+m x∗n }. (24)
That is, the autocorrelation sequence of a stationary random process is a one-dimensional sequence, a function of the time difference m.
In many instances, we encounter random processes that are not stationary in the
strict sense—i.e., their probability distributions are not time invariant—but Eqs. (22)–
(24) still hold. Such random processes are said to be wide-sense stationary.
The time average of a random process is defined as

⟨xn⟩ = lim_{L→∞} (1/(2L + 1)) Σ_{n=−L}^{L} xn.   (25)

Similarly, the time-average autocorrelation is

⟨xn+m x*n⟩ = lim_{L→∞} (1/(2L + 1)) Σ_{n=−L}^{L} xn+m x*n.   (26)
It can be shown that the preceding limits exist if {xn } is a stationary process with finite
mean. As defined in Eqs. (25) and (26), these time averages are functions of an infinite
set of random variables and thus are properly viewed as random variables themselves.
However, under the condition known as ergodicity, the time averages in Eqs. (25) and
(26) are equal to constants in the sense that the time averages of almost all possible
sample sequences are equal to the same constant. Furthermore, they are equal to the
corresponding ensemble average.2 That is, for any single sample sequence {x[n]} for
−∞ < n < ∞,
⟨x[n]⟩ = lim_{L→∞} (1/(2L + 1)) Σ_{n=−L}^{L} x[n] = E{xn} = mx   (27)

and

⟨x[n + m]x*[n]⟩ = lim_{L→∞} (1/(2L + 1)) Σ_{n=−L}^{L} x[n + m]x*[n] = E{xn+m x*n} = φxx[m].   (28)
The time-average operator ⟨·⟩ has the same properties as the ensemble-average operator E{·}. Thus, we generally do not distinguish between the random variable xn and its value in a sample sequence, x[n]. For example, the expression E{x[n]} should be interpreted as E{xn} = ⟨x[n]⟩. In general, for ergodic processes, time averages equal ensemble averages.
In practice, it is common to assume that a given sequence is a sample sequence of
an ergodic random process so that averages can be computed from a single sequence.
Of course, we generally cannot compute with the limits in Eqs. (27) and (28), but instead
the quantities
m̂x = (1/L) Σ_{n=0}^{L−1} x[n],   (29)

σ̂x² = (1/L) Σ_{n=0}^{L−1} |x[n] − m̂x|²,   (30)
and
⟨x[n + m]x*[n]⟩L = (1/L) Σ_{n=0}^{L−1} x[n + m]x*[n]   (31)
or similar quantities are often computed as estimates of the mean, variance, and autocor-
relation. m̂x and σ̂x2 are referred to as the sample mean and sample variance, respectively.
The estimation of averages of a random process from a finite segment of data is a prob-
lem of statistics.
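The following sketch (ours; the sequence is a hypothetical example) computes the finite-record estimates of Eqs. (29)–(31) for a single sample sequence; as in Eq. (31), lags that extend beyond the record are dropped.

```python
import numpy as np

rng = np.random.default_rng(2)
L = 1000
x = 0.5 + rng.standard_normal(L)              # a hypothetical sample sequence

m_hat = x.sum() / L                           # sample mean, Eq. (29)
var_hat = np.sum(np.abs(x - m_hat) ** 2) / L  # sample variance, Eq. (30)

def phi_hat(x, m):
    # <x[n + m] x*[n]>_L, Eq. (31); terms with n + m >= L are dropped
    L = len(x)
    return np.sum(x[m:] * np.conj(x[:L - m])) / L

print(m_hat, var_hat, phi_hat(x, 0), phi_hat(x, 1))
```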
Property 3

φxx[−m] = φ*xx[m],   (38a)
γxx[−m] = γ*xx[m],   (38b)
φxy[−m] = φ*yx[m],   (38c)
γxy[−m] = γ*yx[m].   (38d)
Property 4
|φxy [m]|2 ≤ φxx [0]φyy [0], (39a)
|γxy [m]|2 ≤ γxx [0]γyy [0]. (39b)
In particular,
|φxx [m]| ≤ φxx [0], (40a)
|γxx [m]| ≤ γxx [0]. (40b)
Although the Fourier transform of a random signal does not exist except in a generalized
sense, the autocovariance and autocorrelation sequences of such a signal are aperiodic
sequences for which the transform does exist. The spectral representation of the cor-
relation functions plays an important role in describing the input–output relations for
a linear time-invariant system when the input is a random signal. Therefore, it is of
interest to consider the properties of correlation and covariance sequences and their
corresponding Fourier and z-transforms.
We define Φxx(e^jω), Γxx(e^jω), Φxy(e^jω), and Γxy(e^jω) as the DTFTs of φxx[m], γxx[m], φxy[m], and γxy[m], respectively. Since these functions are all DTFTs of sequences, they must be periodic with period 2π. From Eqs. (36a) and (36b), it follows that, over one period |ω| ≤ π,

Φxx(e^jω) = Γxx(e^jω) + 2π|mx|² δ(ω),   |ω| ≤ π,   (43a)

and

Φxy(e^jω) = Γxy(e^jω) + 2π mx m*y δ(ω),   |ω| ≤ π.   (43b)
In the case of zero-mean processes (mx = 0 and my = 0), the correlation and covariance functions are identical, so that Φxx(e^jω) = Γxx(e^jω) and Φxy(e^jω) = Γxy(e^jω).
From the inverse Fourier transform equation, it follows that

γxx[m] = (1/2π) ∫_{−π}^{π} Γxx(e^jω) e^{jωm} dω,   (44a)

φxx[m] = (1/2π) ∫_{−π}^{π} Φxx(e^jω) e^{jωm} dω,   (44b)

and, consequently,

E{|x[n]|²} = φxx[0] = (1/2π) ∫_{−π}^{π} Φxx(e^jω) dω,   (45a)

σx² = γxx[0] = (1/2π) ∫_{−π}^{π} Γxx(e^jω) dω.   (45b)
Thus, the area under Pxx (ω) for −π ≤ ω ≤ π is proportional to the average power in the
signal. In fact, the integral of Pxx (ω) over a band of frequencies is proportional to the
power in the signal in that band. For this reason, the function Pxx (ω) is called the power
density spectrum, or simply, the power spectrum. When Pxx (ω) is a constant independent
of ω, the random process is referred to as a white-noise process, or simply, white noise.
When Pxx (ω) is constant over a band and zero otherwise, we refer to it as bandlimited
white noise.
From Eq. (38a), it can be shown that Pxx(ω) = P*xx(ω); i.e., Pxx(ω) is always real valued. Furthermore, for real random processes, φxx[m] = φxx[−m], so in the real case Pxx(ω) is both real and even; i.e.,
Fourier Analysis of Signals Using the Discrete Fourier Transform
0 INTRODUCTION
Because the discrete Fourier transform (DFT) can be computed efficiently, it plays a
central role in a wide variety of signal-processing applications, including filtering and
spectrum analysis. In this chapter, we take an introductory look at Fourier analysis of
signals using the DFT.
In applications and algorithms based on explicit evaluation of the Fourier trans-
form, it is ideally the discrete-time Fourier transform (DTFT) that is desired, although
it is the DFT that can actually be computed. For finite-length signals, the DFT provides
frequency-domain samples of the DTFT, and the implications of this sampling must be
clearly understood and accounted for. For example, in linear filtering or convolution
implemented by multiplying DFTs rather than DTFTs, a circular convolution is imple-
mented, and special care must be taken to ensure that the results will be equivalent to a
linear convolution. In addition, in many filtering and spectrum analysis applications, the
signals do not inherently have finite length. As we will discuss, this inconsistency between the finite-length requirement of the DFT and the reality of indefinitely long signals can be accommodated exactly or approximately through the concepts of windowing, block processing, and the time-dependent Fourier transform.
[Figure 1: Basic steps in the DFT analysis of a continuous-time signal: sc(t) → antialiasing lowpass filter Haa(jΩ) → xc(t) → continuous-to-discrete-time conversion → x[n] → multiplication by the window w[n] → v[n] → DFT → V[k].]
One of the major applications of the DFT is in analyzing the frequency content of
continuous-time signals. For example, as we describe in Section 4.1, in speech analysis
and processing, frequency analysis is particularly useful in identifying and modeling the
resonances of the vocal cavity. Another example, introduced in Section 4.2, is Doppler
radar, in which the velocity of a target is represented by the frequency shift between
the transmitted and received signals.
The basic steps in applying the DFT to continuous-time signals are indicated in
Figure 1. The antialiasing filter is incorporated to eliminate or minimize the effect of
aliasing when the continuous-time signal is converted to a sequence by sampling. The
need for multiplication of x[n] by w[n], i.e., windowing, is a consequence of the finite-
length requirement of the DFT. In many cases of practical interest, sc (t) and, conse-
quently, x[n] are very long or even indefinitely long signals (such as with speech or
music). Therefore, a finite-duration window w[n] is applied to x[n] prior to computation
of the DFT. Figure 2 illustrates the Fourier transforms of the signals in Figure 1. Fig-
ure 2(a) shows a continuous-time spectrum that tapers off at high frequencies but is not
bandlimited. It also indicates the presence of some narrowband signal energy, repre-
sented by the narrow peaks. The frequency response of an antialiasing filter is illustrated
in Figure 2(b). As indicated in Figure 2(c), the resulting continuous-time Fourier transform Xc(jΩ) contains little useful information about Sc(jΩ) for frequencies above the cutoff frequency of the filter. Since Haa(jΩ) cannot be ideal, the Fourier components of the input in the passband and the transition band also will be modified by the frequency response of the filter.
The conversion of xc(t) to the sequence of samples x[n] is represented in the frequency domain by periodic replication, frequency normalization, and amplitude scaling; i.e.,

X(e^jω) = (1/T) Σ_{r=−∞}^{∞} Xc(j(ω/T) + j(2πr/T)).   (1)
This is illustrated in Figure 2(d). In a practical implementation, the antialiasing filter can-
not have infinite attenuation in the stopband. Therefore, some nonzero overlap of the
terms in Eq. (1), i.e., aliasing, can be expected; however, this source of error can be made
negligibly small either with a high-quality continuous-time filter or through the use of
initial oversampling followed by more effective discrete-time lowpass filtering and deci-
mation. If x[n] is a digital signal, so that A/D conversion is incorporated in the second system in Figure 1, then quantization error is also introduced. This error can be modeled as a
Figure 2 Illustration of the Fourier transforms of the system of Figure 1. (a) Fourier
transform of continuous-time input signal. (b) Frequency response of antialiasing
filter. (c) Fourier transform of output of antialiasing filter. (d) Fourier transform of
sampled signal. (e) Fourier transform of window sequence. (f) Fourier transform
of windowed signal segment and frequency samples obtained using DFT samples.
noise sequence added to x[n]. The noise can be made negligible through the use of
fine-grained quantization.
The sequence x[n] is typically multiplied by a finite-duration window w[n], since
the input to the DFT must be of finite duration. This produces the finite-length sequence
v[n] = w[n]x[n]. The effect in the frequency domain is a periodic convolution, i.e.,
V(e^jω) = (1/2π) ∫_{−π}^{π} X(e^jθ) W(e^j(ω−θ)) dθ.   (2)
Figure 2(e) illustrates the Fourier transform of a typical window sequence. Note that
the main lobe is assumed to be concentrated around ω = 0, and, in this illustration,
the side lobes are very small, suggesting that the window tapers at its edges. Properties
of windows such as the Bartlett, Hamming, Hanning, Blackman, and Kaiser windows
are discussed in Section 2. At this point, it is sufficient to observe that convolution of
W(ejω ) with X (ejω ) will tend to smooth sharp peaks and discontinuities in X (ejω ). This
is depicted by the continuous curve plotted in Figure 2(f).
The final operation in Figure 1 is the computation of the DFT. The DFT of the
windowed sequence v[n] = w[n]x[n] is
V[k] = Σ_{n=0}^{N−1} v[n] e^{−j(2π/N)kn},   k = 0, 1, . . . , N − 1,   (3)
where we assume that the window length L is less than or equal to the DFT length N.
V [k], the DFT of the finite-length sequence v[n], corresponds to equally spaced samples
of the DTFT of v[n]; i.e.,
V[k] = V(e^jω)|_{ω=2πk/N}.   (4)
Figure 2(f) also shows V[k] as the samples of V(e^jω). Since the spacing between DFT frequencies is 2π/N, and the relationship between the normalized discrete-time frequency variable and the continuous-time frequency variable is ω = ΩT, the DFT frequencies correspond to the continuous-time frequencies

Ωk = 2πk/(NT).   (5)
The use of this relationship between continuous-time frequencies and DFT frequencies
is illustrated by Examples 1 and 2.
Example 1 Fourier Analysis Using the DFT
Consider a bandlimited continuous-time signal xc(t) such that Xc(jΩ) = 0 for |Ω| ≥ 2π(2500). We wish to use the system of Figure 1 to estimate the continuous-time spectrum Xc(jΩ). Assume that the antialiasing filter Haa(jΩ) is ideal, and the sampling rate for the C/D converter is 1/T = 5000 samples/s. If we want the DFT samples V[k] to be equivalent to samples of Xc(jΩ) that are at most 2π(10) rad/s or 10 Hz apart, what is the minimum value that we should use for the DFT size N?
From Eq. (5), we see that adjacent samples in the DFT correspond to continuous-time frequencies separated by 2π/(NT). Therefore, we require

2π/(NT) ≤ 20π,

from which it follows that N ≥ 1/(10T) = 500; i.e., the minimum DFT size is N = 500.
Many commercial real-time spectrum analyzers are based on the principles embodied in Figures 1 and 2. It should be clear from the preceding discussion, however, that numerous factors affect the interpretation of the DFT of a windowed segment of the sampled signal in terms of the continuous-time Fourier transform of the original input sc(t). To accommodate and mitigate the effects of these factors, care must be taken in filtering and sampling the input signal. Furthermore, to interpret the results correctly, the
effects of the time-domain windowing and of the frequency-domain sampling inherent
in the DFT must be clearly understood. For the remainder of the discussion, we will as-
sume that the issues of antialiasing filtering and continuous-to-discrete-time conversion
have been satisfactorily handled and are negligible. In the next section, we concentrate
specifically on the effects of windowing and of the frequency-domain sampling imposed
by the DFT. We choose sinusoidal signals as the specific class of examples to discuss,
because sinusoids are perfectly bandlimited and they are easily computed. However,
most of the issues raised by the examples apply more generally.
In previous sections, we discussed and illustrated Fourier analysis for sinusoidal signals
with stationary (nontime-varying) parameters and for nonstationary signals such as
speech and radar. In cases where the signal can be modeled by a sum of sinusoids or a
linear system excited by a periodic pulse train, the Fourier transforms of finite-length
segments of the signal have a convenient and natural interpretation in terms of Fourier
transforms, windowing, and linear system theory. However, more noise-like signals, such
as the example of unvoiced speech in Section 4.1, are best modeled as random signals.
Random processes are often used to model signals when the process that generates
the signal is too complex for a reasonable deterministic model. Typically, when the input
to an LTI system is modeled as a stationary random process, many of the essential
characteristics of the input and output are adequately represented by averages, such
as the mean value (dc level), variance (average power), autocorrelation function, or
power density spectrum. Consequently, it is of particular interest to estimate these for
a given signal. An estimate of the mean value of a stationary random process from a finite segment of data is the sample mean, defined as

m̂x = (1/L) Σ_{n=0}^{L−1} x[n].   (63)

Similarly, an estimate of the variance is the sample variance,

σ̂x² = (1/L) Σ_{n=0}^{L−1} (x[n] − m̂x)².   (64)
The sample mean and the sample variance, which are themselves random variables, are
unbiased and asymptotically unbiased estimators, respectively; i.e., the expected value
of m̂x is the true mean mx and the expected value of σ̂x2 approaches the true variance σx2
as L approaches ∞. Furthermore, they are both consistent estimators; i.e., they improve
with increasing L, since their variances approach zero as L approaches ∞.
In the remainder of this chapter, we study the estimation of the power spectrum9
of a random signal using the DFT. We will see that there are two basic approaches
to estimating the power spectrum. One approach, which we develop in this section, is
referred to as periodogram analysis and is based on direct Fourier transformation of
finite-length segments of the signal. The second approach, developed in Section 6, is to
first estimate the autocovariance sequence and then compute the Fourier transform of
this estimate. In either case, we are typically interested in obtaining unbiased consistent
estimators. Unfortunately, the analysis of such estimators is very difficult, and generally,
only approximate analyses can be accomplished. Even approximate analyses are beyond
the scope of this text, and we refer to the results of such analyses only in a qualitative
way. Detailed discussions are given in Blackman and Tukey (1958), Hannan (1960),
Jenkins and Watts (1968), Koopmans (1995), Kay and Marple (1981), Marple (1987),
Kay (1988) and Stoica and Moses (2005).
9 The term power spectrum is commonly used interchangeably with the more precise term power density spectrum.
random signals.) Consequently, a good estimate of Pxx(ω) will provide a useful estimate of Pss(Ω). The window w[n] in Figure 1 selects a finite-length segment (L samples) of x[n], which we denote v[n], the Fourier transform of which is
V(e^jω) = Σ_{n=0}^{L−1} w[n]x[n] e^{−jωn}.   (66)

The periodogram is then defined as

I(ω) = (1/LU) |V(e^jω)|²,   (67)

where U is a normalizing constant discussed below. Equivalently, the periodogram can be written as

I(ω) = (1/LU) Σ_{m=−(L−1)}^{L−1} cvv[m] e^{−jωm},   (68)

where

cvv[m] = Σ_{n=0}^{L−1} x[n]w[n]x[n + m]w[n + m].   (69)
We note that the sequence cvv [m] is the aperiodic correlation sequence for the finite-
length sequence v[n] = w[n]x[n]. Consequently, the periodogram is in fact the Fourier
transform of the aperiodic correlation of the windowed data sequence.
Explicit computation of the periodogram can be carried out only at discrete fre-
quencies. From Eqs. (66) and (67), we see that if the DTFT of w[n]x[n] is replaced by its
DFT, we will obtain samples at the DFT frequencies ωk = 2πk/N for k = 0, 1, . . . , N −1.
Specifically, samples of the periodogram are given by

I[k] = I(ωk) = (1/LU) |V[k]|²,   (70)
where V [k] is the N-point DFT of w[n]x[n]. If we want to choose N to be greater than the
window length L, appropriate zero-padding would be applied to the sequence w[n]x[n].
If a random signal has a nonzero mean, its power spectrum has an impulse at
zero frequency. If the mean is relatively large, this component will dominate the spec-
trum estimate, causing low-amplitude, low-frequency components to be obscured by
leakage. Therefore, in practice the mean is often estimated using Eq. (63), and the
resulting estimate is subtracted from the random signal before computing the power
spectrum estimate. Although the sample mean is only an approximate estimate of the
zero-frequency component, subtracting it from the signal often leads to better estimates
at neighboring frequencies.
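As an illustration, here is a minimal sketch (ours, not from the text) of computing periodogram samples per Eqs. (67), (70), and (79): the sample mean is removed first, a Hann window is applied (an arbitrary choice for the example), and the DFT length N exceeds the window length L via zero padding.

```python
import numpy as np

rng = np.random.default_rng(3)
L, N = 256, 512                          # window length and DFT length
x = 1.0 + rng.standard_normal(L)         # hypothetical segment with nonzero mean

x = x - x.mean()                         # subtract the sample mean, Eq. (63)
w = np.hanning(L)                        # window choice assumed for the example
U = np.sum(w ** 2) / L                   # normalizing constant, Eq. (79)

V = np.fft.fft(w * x, N)                 # zero-padded DFT of w[n]x[n]
I = np.abs(V) ** 2 / (L * U)             # periodogram samples, Eq. (70)
print(I[:4])
```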
Cww(e^jω) approaches a periodic impulse train. The scale can be adjusted by choosing the normalizing constant U so that

(1/(2πLU)) ∫_{−π}^{π} |W(e^jω)|² dω = (1/(LU)) Σ_{n=0}^{L−1} (w[n])² = 1,   (78)

or

U = (1/L) Σ_{n=0}^{L−1} (w[n])².   (79)
For the rectangular window, we should choose U = 1, while other data windows would
require a value of 0 < U < 1 if w[n] is normalized to a maximum value of 1. Alter-
natively, the normalization can be absorbed into the amplitude of w[n]. Therefore, if
properly normalized, the periodogram and modified periodogram are both asymptoti-
cally unbiased; i.e., the bias approaches zero as the window length increases.
To examine whether the periodogram is a consistent estimate or becomes a consis-
tent estimate as the window length increases, it is necessary to consider the behavior of
the variance of the periodogram. An expression for the variance of the periodogram is
very difficult to obtain even in the simplest cases. However, it has been shown (see Jenk-
ins and Watts, 1968) that over a wide range of conditions, as the window length increases,
var[I(ω)] ≈ P²xx(ω).   (80)
That is, the variance of the periodogram estimate is approximately the same size as the
square of the power spectrum that we are estimating. Therefore, since the variance does
not asymptotically approach zero with increasing window length, the periodogram is
not a consistent estimate.
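A quick simulation (our sketch, under the assumption of unit-variance white noise so that Pxx(ω) = 1) illustrates Eq. (80): the variance of the periodogram at a fixed frequency stays near P²xx(ω) = 1 no matter how large the window length L becomes.

```python
import numpy as np

rng = np.random.default_rng(4)
for L in (16, 64, 256, 1024):
    # rectangular window (U = 1); evaluate the periodogram at one DFT bin
    trials = np.array([
        np.abs(np.fft.fft(rng.standard_normal(L))[L // 4]) ** 2 / L
        for _ in range(2000)
    ])
    print(L, trials.var())   # stays near 1: no decay with increasing L
```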
The properties of the periodogram estimate of the power spectrum just discussed
are illustrated in Figure 24, which shows periodogram estimates of white noise using rectangular windows of lengths L = 16, 64, 256, and 1024.

[Figure 24: Periodogram estimates of white noise for rectangular windows of lengths L = 16, 64, 256, and 1024; panel (a) plots amplitude versus sample number (k).]
[Figure: the windowed sequences x[n]w[n] and x[n + m]w[n + m] involved in the aperiodic autocorrelation of the windowed data.]
Fourier Analysis of Signals Using the Discrete Fourier Transform
and Watts, 1968) that if N = L, the periodogram estimates at the DFT frequencies
2πk/N become uncorrelated. Since, as N increases, the DFT frequencies get closer
together, this behavior is inconsistent with our goal of obtaining a good estimate of
the power spectrum. We would prefer to obtain a smooth spectrum estimate without
random variations resulting from the estimation process. This can be accomplished by
averaging multiple independent periodogram estimates to reduce the fluctuations.
Suppose that x[n] is divided into segments of length L, offset by R samples, and that the periodogram Ir(ω) of each windowed segment, r = 0, 1, . . . , K − 1, is computed. The average of these K periodograms is

Ī(ω) = (1/K) Σ_{r=0}^{K−1} Ir(ω).   (84)
To examine the bias and variance of Ī(ω), let us take L = R, so that the segments do
not overlap, and assume that φxx [m] is small for m > L; i.e., signal samples more than
L apart are approximately uncorrelated. If we assume that the periodograms Ir (ω) are
identically distributed independent random variables, then the expected value of Ī(ω) is
E{Ī(ω)} = (1/K) Σ_{r=0}^{K−1} E{Ir(ω)},   (85)
or, since we assume that the periodograms are independent and identically distributed,

E{Ī(ω)} = E{Ir(ω)},   (86)

the expected value of a single periodogram computed with a window of length L. When the window w[n] is the rectangular window, the
method of averaging periodograms is called Bartlett’s procedure, and in this case it can
be shown that
cww[m] = L − |m| for |m| ≤ (L − 1), and cww[m] = 0 otherwise,   (88)
and, therefore,
Cww(e^jω) = (sin(ωL/2) / sin(ω/2))².   (89)
That is, the expected value of the average periodogram spectrum estimate is the convo-
lution of the true power spectrum with the Fourier transform of the triangular sequence
cww [n] that results as the autocorrelation of the rectangular window. Thus, the average
periodogram is also a biased estimate of the power spectrum.
To examine the variance,we use the fact that,in general,the variance of the average
of K independent identically distributed random variables is 1/K times the variance of
each individual random variable. (See Bertsekas and Tsitsiklis, 2008.) Therefore, the
variance of the average periodogram is
var[Ī(ω)] = (1/K) var[Ir(ω)],   (90)
or, with Eq. (80), it follows that

var[Ī(ω)] ≈ (1/K) P²xx(ω).   (91)
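The following sketch (ours; white noise and a rectangular window are assumed) implements the averaging of Eq. (84) with non-overlapping segments (R = L, Bartlett's procedure) and shows the roughly 1/K variance reduction predicted by Eq. (91).

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(2 ** 14)          # white noise, so Pxx(w) = 1

def bartlett(x, L):
    K = len(x) // L                       # non-overlapping segments, R = L
    segs = x[:K * L].reshape(K, L)        # rectangular window, U = 1
    I = np.abs(np.fft.fft(segs, axis=1)) ** 2 / L
    return I.mean(axis=0), K              # averaged periodogram, Eq. (84)

for L in (1024, 256, 64):
    I_bar, K = bartlett(x, L)
    # spread across (roughly uncorrelated) interior bins approximates var[I_bar]
    print(K, I_bar[1:L // 2].var())       # shrinks roughly as 1/K
```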
10 Note that the spectrogram is normally computed such that the windowed segments overlap considerably as r varies, while in periodogram averaging R is normally equal to the window length or half the window length.
CHAPTER 10: POWER SPECTRAL DENSITY

INTRODUCTION
Motivated by situations in which x(t) is the voltage across (or current through) a unit resistor, we refer to x²(t) as the instantaneous power in the signal x(t). When x(t) is WSS, the expected instantaneous power is given by
E[x²(t)] = Rxx(0) = (1/2π) ∫_{−∞}^{∞} Sxx(jω) dω,   (10.1)
where Sxx (jω) is the CTFT of the autocorrelation function Rxx (τ ). Furthermore,
when x(t) is ergodic in correlation, so that time averages and ensemble averages
are equal in correlation computations, then (10.1) also represents the time-average
power in any ensemble member. Note that since Rxx (τ ) = Rxx (−τ ), we know
Sxx (jω) is always real and even in ω; a simpler notation such as Pxx (ω) might
therefore have been more appropriate for it, but we shall stick to Sxx (jω) to avoid
a proliferation of notational conventions, and to keep apparent the fact that this
quantity is the Fourier transform of Rxx (τ ).
The integral above suggests that we might be able to consider the expected (instantaneous) power (or, assuming the process is ergodic, the time-average power) in a frequency band of width dω to be given by (1/2π)Sxx(jω) dω. To examine this thought further, consider extracting a band of frequency components of x(t)
by passing x(t) through an ideal bandpass filter, shown in Figure 10.1.
[FIGURE 10.1: Ideal bandpass filter (passband around ±ω0) to extract a band of frequencies from the input x(t).]
Because of the way we are obtaining y(t) from x(t), the expected power in the
output y(t) can be interpreted as the expected power that x(t) has in the selected
passband. Using the fact that Syy(jω) = |H(jω)|² Sxx(jω), where H(jω) is the frequency response of the (unit-gain) bandpass filter, we conclude that

(1/2π) ∫_{passband} Sxx(jω) dω   (10.4)

is the expected power of x(t) in the selected passband.
It is useful to have a name for the Laplace transform of the autocorrelation function;
we shall refer to Sxx (s) as the complex PSD.
Exactly parallel results apply for the DT case, leading to the conclusion that
Sxx (ejΩ ) is the power spectral density of x[n].
10.2 EINSTEIN-WIENER-KHINCHIN THEOREM ON EXPECTED TIME-AVERAGED POWER

The previous section defined the PSD as the transform of the autocorrelation function, and provided an interpretation of this transform. We now develop an alternative route to the PSD. Consider a random realization x(t) of a WSS process. We have already mentioned the difficulties with trying to take the CTFT of x(t) directly, so we proceed indirectly. Let xT(t) be the signal obtained by windowing x(t), so it equals x(t) in the interval (−T, T) but is 0 outside this interval. Thus

xT(t) = wT(t) x(t),   (10.5)
where we define the window function wT (t) to be 1 for |t| < T and 0 otherwise. Let
XT(jω) denote the Fourier transform of xT(t); note that because the signal xT(t) is nonzero only over the finite interval (−T, T), its Fourier transform is typically well defined. We know that the energy spectral density (ESD) S̄xx(jω) of xT(t) is given by

S̄xx(jω) = |XT(jω)|²   (10.6)

and that this ESD is actually the Fourier transform of the deterministic autocorrelation xT(τ) ∗ xT(−τ). We thus have the CTFT pair

xT(τ) ∗ xT(−τ) = ∫_{−∞}^{∞} wT(α)wT(α − τ)x(α)x(α − τ) dα ⇔ |XT(jω)|²,   (10.7)
or, dividing both sides by 2T (which is valid, since scaling a signal by a constant scales its Fourier transform by the same amount),

(1/2T) ∫_{−∞}^{∞} wT(α)wT(α − τ)x(α)x(α − τ) dα ⇔ (1/2T) |XT(jω)|².   (10.8)
The quantity on the right is what we defined (for the DT case) as the periodogram
of the finite-length signal xT (t).
Because the Fourier transform operation is linear, the Fourier transform of the
expected value of a signal is the expected value of the Fourier transform. We
may therefore take expectations of both sides in the preceding equation. Since
E[x(α)x(α − τ )] = Rxx (τ ), we conclude that
Rxx(τ)Λ(τ) ⇔ (1/2T) E[|XT(jω)|²],   (10.9)

where Λ(τ) is a triangular pulse of height 1 at the origin, decaying to 0 at |τ| = 2T; it is the result of carrying out the convolution wT(τ) ∗ wT(−τ) and dividing by 2T.
Parametric Signal Modeling
0 INTRODUCTION
It is often convenient to use several different representations of signals and systems. The
representation as a linear combination of sinusoidal and complex exponential signals
led to the Fourier series, the Fourier transform, and the frequency domain characteriza-
tion of signals and LTI systems. Although these representations are particularly useful
because of their generality, they are not always the most efficient representation for
signals with a known structure.
This chapter introduces another powerful approach to signal representation called
parametric signal modeling. In this approach, a signal is represented by a mathematical
model that has a predefined structure involving a limited number of parameters. A
given signal s[n] is represented by choosing the specific set of parameters that results in
the model output ŝ[n] being as close as possible in some prescribed sense to the given
signal. A common example is to model the signal as the output of a discrete-time linear
system as shown in Figure 1. Such models, which are comprised of the input signal v[n]
and the system function H(z) of the linear system, become useful with the addition of
constraints that make it possible to solve for the parameters of H(z) given the signal to be represented.

[Figure 1: Linear system model for a signal s[n]: input v[n] → LTI system H(z) → output ŝ[n].]

For example, if the input v[n] is specified, and the system function is
assumed to be a rational function of the form
q
bk z−k
k=0
H(z) = , (1)
p
−k
1− ak z
k=1
then the signal is modeled by the values of the ak's and bk's or, equivalently, by the poles
and zeros of H(z), along with knowledge of the input. The input signal v[n] is generally
assumed to be a unit impulse δ[n] for deterministic signals, or white noise if the signal
s[n] is viewed as a random signal. When the model is appropriately chosen, it is possible
to represent a large number of signal samples by a relatively small set of parameters.
Parametric signal modeling has a wide range of applications, including data com-
pression, spectrum analysis, signal prediction, deconvolution, filter design, system iden-
tification, signal detection, and signal classification. In data compression, for example, it
is the set of model parameters that is transmitted or stored, and the receiver then uses
the model with those parameters to regenerate the signal. In filter design, the model pa-
rameters are chosen to best approximate, in some sense, the desired frequency response,
or equivalently, the desired impulse response, and the model with these parameters then
corresponds to the designed filter. The two key elements for success in all of the appli-
cations are an appropriate choice of model and an accurate estimate of the parameters
for the model.
The model represented by Eq. (1) in general has both poles and zeros. While there
are a variety of techniques for determining the full set of numerator and denominator
coefficients in Eq. (1), the most successful and most widely used have concentrated on
restricting q to be zero, in which case H(z) in Figure 1 has the form

H(z) = G / (1 − Σ_{k=1}^{p} ak z^{−k}) = G / A(z),   (2)
where we have replaced the parameter b0 by the parameter G to emphasize its role
as an overall gain factor. Such models are aptly termed “all-pole” models.1 By its very
nature, it would appear that an all-pole model would be appropriate only for modeling
signals of infinite duration. While this may be true in a theoretical sense, this choice
for the system function of the model works well for signals found in many applications,
and as we will show, the parameters can be computed in a straightforward manner from
finite-duration segments of the given signal.
1 Detailed discussions of this case and the general pole/zero case are given in Kay (1988), Therrien (1992), Hayes (1996), and Stoica and Moses (2005).
The input and output of the all-pole system in Eq. (2) satisfy the linear constant-coefficient difference equation

ŝ[n] = Σ_{k=1}^{p} ak ŝ[n − k] + G v[n],   (3)
which indicates that the model output at time n is comprised of a linear combination
of past samples plus a scaled input sample. As we will see, this structure suggests that
the all-pole model is equivalent to the assumption that the signal can be approximated
as a linear combination of (or equivalently, is linearly predictable from) its previous
values. Consequently, this method for modeling a signal is often also referred to as
linear predictive analysis or linear prediction.2
2 When used in the context of speech processing, linear predictive analysis is often referred to as linear predictive coding (LPC). (See Rabiner and Schafer, 1978, and Quatieri, 2002.)
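As a small illustration of Eq. (3) (a sketch of ours with assumed coefficients, not an example from the text), the model output can be generated by filtering the input with 1/A(z):

```python
import numpy as np
from scipy.signal import lfilter

a = np.array([1.2, -0.6])        # assumed a1, a2 (a stable example)
G = 1.0
v = np.zeros(64)
v[0] = 1.0                       # deterministic case: v[n] = delta[n]

# s_hat[n] = sum_k a_k s_hat[n-k] + G v[n]  <=>  filtering v[n] with G/A(z)
A = np.concatenate(([1.0], -a))  # A(z) = 1 - a1 z^{-1} - a2 z^{-2}
s_hat = lfilter([G], A, v)
print(s_hat[:5])                 # impulse response of the all-pole model
```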
An inverse filter with system function A(z) is sought so that its output g[n] would be equal to the scaled input Gv[n]. In this for-
mulation, then, we choose the parameters of the inverse filter (and therefore implicitly
the parameters of the model system) to minimize the mean-squared error between g[n]
and Gv[n]. As we will see, this leads to a set of well-behaved linear equations.
From Figure 2 and Eq. (5) it follows that g[n] and s[n] satisfy the difference equa-
tion
g[n] = s[n] − Σ_{k=1}^{p} ak s[n − k].   (6)
The modeling error is defined as the difference between the inverse filter output and the scaled input,

ê[n] = g[n] − Gv[n].   (7)

If v[n] is an impulse, then, for n > 0, the error ê[n] corresponds to the error between s[n] and the linear prediction of s[n] using the model parameters. Thus, it is convenient to also express Eq. (7) as

ê[n] = e[n] − Gv[n],   (8)

where e[n] is the prediction error given by

e[n] = s[n] − Σ_{k=1}^{p} ak s[n − k].   (9)
For a signal that exactly fits the all-pole model of Eq. (3), the modeling error ê[n] will
be zero, and the prediction error e[n] will be the scaled input, i.e.,
e[n] = Gv[n]. (10)
This formulation in terms of inverse filtering leads to considerable simplification,
since v[n] is assumed known and e[n] can be computed from s[n] using Eq. (9). The
parameter values ak are then chosen to minimize

E = ⟨|ê[n]|²⟩,   (11)

where the notation ⟨·⟩ denotes a summing operation for finite-energy deterministic signals and an ensemble-averaging operation for random signals. Minimizing E in Eq. (11) results in an inverse filter that minimizes the total energy in the modeling error in the case of deterministic signals, or the mean-squared value of the modeling error in the case of random signals. For convenience, we will often refer to ⟨·⟩ as the averaging operator, where its interpretation as a sum or as an ensemble average should be clear
from the context. Again, note that in solving for the parameters ak specifying the inverse
system of Figure 2, the all-pole system is implicitly specified, as well.
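The inverse-filtering view can be checked numerically; the sketch below (ours, reusing the assumed coefficients from the previous sketch) synthesizes a signal that exactly fits the all-pole model driven by white noise, passes it through A(z), and confirms that the prediction error equals the scaled input, as in Eq. (10).

```python
import numpy as np
from scipy.signal import lfilter

a = np.array([1.2, -0.6])            # assumed model coefficients
A = np.concatenate(([1.0], -a))      # inverse (prediction error) filter A(z)
G = 1.0

rng = np.random.default_rng(6)
w = rng.standard_normal(4096)        # white-noise input (random-signal case)
s = lfilter([G], A, w)               # signal that exactly fits the model

e = lfilter(A, [1.0], s)             # e[n] = s[n] - sum_k a_k s[n-k], Eq. (9)
print(np.max(np.abs(e - G * w)))     # ~0: the residual is the scaled input
```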
To find the optimal parameter values, we substitute Eq. (8) into Eq. (11) to obtain

E = ⟨(e[n] − Gv[n])²⟩,   (12)

or equivalently,

E = ⟨e²[n]⟩ + G²⟨v²[n]⟩ − 2G⟨v[n]e[n]⟩.   (13)
To find the parameters that minimize E, we differentiate Eq. (12) with respect
to the ith filter coefficient ai and set the derivative equal to zero, leading to the set of
equations
∂E/∂ai = (∂/∂ai)⟨e²[n]⟩ − 2G⟨v[n]s[n − i]⟩ = 0,   i = 1, 2, . . . , p,   (14)

where we have assumed that G is independent of ai and, of course, so is v[n], and consequently that

(∂/∂ai)(G²⟨v²[n]⟩) = 0.   (15)
For models that will be of interest to us, v[n] will be an impulse if s[n] is a causal
finite-energy signal and white noise if s[n] is a wide-sense stationary random process. With v[n] an impulse and s[n] zero for n < 0, the product v[n]s[n − i] = 0 for i = 1, 2, . . . , p. With v[n] as white noise,

⟨v[n]s[n − i]⟩ = 0,   i = 1, 2, . . . , p,   (16)

since, for any value of n, the input of a causal system with white-noise input is uncorrelated with the output values prior to time n. Thus, for both cases, Eq. (14) reduces to

∂E/∂ai = (∂/∂ai)⟨e²[n]⟩ = 0,   i = 1, 2, . . . , p.   (17)
In other words, choosing the coefficients to minimize the average squared modeling error ⟨ê²[n]⟩ is equivalent to minimizing the average squared prediction error ⟨e²[n]⟩. Expanding Eq. (17) and invoking the linearity of the averaging operator, we obtain from Eq. (17) the equations

⟨s[n]s[n − i]⟩ − Σ_{k=1}^{p} ak ⟨s[n − k]s[n − i]⟩ = 0,   i = 1, . . . , p.   (18)
Defining

φss[i, k] = ⟨s[n − i]s[n − k]⟩,   (19)

Eqs. (18) can be rewritten more compactly as

Σ_{k=1}^{p} ak φss[i, k] = φss[i, 0],   i = 1, 2, . . . , p.   (20)
Equations (20) comprise a system of p linear equations in p unknowns. The model parameters ak, k = 1, 2, . . . , p, are obtained by solving this set of linear equations, using known values of φss[i, k] for i = 1, 2, . . . , p and k = 0, 1, . . . , p, or by first computing these correlation values from s[n].
When the signal exactly fits the all-pole model, the prediction error reduces to the scaled input,

e[n] = s[n] − Σ_{k=1}^{p} ak s[n − k] = Gv[n].   (21)
As indicated by Eq. (17), minimizing the inverse modeling error E in Eq. (11) is
equivalent to minimizing the averaged prediction error ⟨e²[n]⟩. If the signal s[n] were
produced by the model system, and if v[n] is an impulse, and if s[n] truly fits the all-pole
model, then the signal at any n > 0 is linearly predictable from past values, i.e., the
prediction error is zero. If v[n] is white noise, then the prediction error is white.
The interpretation in terms of prediction is depicted in Figure 3, where the transfer
function of the prediction filter P(z) is
P(z) = Σ_{k=1}^{p} ak z^{−k}. (22)
This system is referred to as the pth -order linear predictor for the signal s[n]. Its output
is
s̃[n] = Σ_{k=1}^{p} ak s[n − k], (23)
and as Figure 3 shows, the prediction error signal is e[n] = s[n] − s̃[n]. The sequence
e[n] represents the amount by which the linear predictor fails to exactly predict the
signal s[n]. For this reason, e[n] is also sometimes called the prediction error residual or
simply the residual. With this point of view, the coefficients ak are called the prediction
coefficients. As is also shown in Figure 3, the prediction error filter is related to the linear
predictor by
A(z) = 1 − P(z) = 1 − Σ_{k=1}^{p} ak z^{−k}. (24)
Figure 3 Linear prediction formulation for all-pole signal modeling: the linear predictor P(z) produces s̃[n], and e[n] = s[n] − s̃[n] is the output of the prediction error filter A(z).
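To make the computation concrete, here is a minimal sketch, assuming NumPy/SciPy are available; the names (s for the signal, a for the prediction coefficients, prediction_error) are illustrative, not from the text:

import numpy as np
from scipy.signal import lfilter

def prediction_error(s, a):
    # a = [a1, ..., ap]; A(z) = 1 - sum_k a_k z^{-k}, as in Eq. (24).
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    # e[n] = hA[n] * s[n]: FIR filtering of s[n] by the prediction error filter.
    return lfilter(A, [1.0], s)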
To use the optimum inverse filter or equivalently the optimum linear predictor as a basis
for parametric signal modeling, it is necessary to be more specific about the assumed
input v[n] and about the method of computing the averaging operator ⟨·⟩. To this end,
we consider separately the case of deterministic signals and the case of random signals.
In both cases, we will use averaging operations that assume knowledge of the signal to
be modeled over all time −∞ < n < ∞. In Section 3, we discuss some of the practical
considerations when only a finite-length segment of the signal s[n] is available.
For the deterministic case, the averaging operator is taken to be an infinite sum, ⟨·⟩ = Σ_{n=−∞}^{∞} (·). With this definition of the averaging operator, φss[i, k] in Eq. (19) is given by
φss[i, k] = Σ_{n=−∞}^{∞} s[n − i]s[n − k], (26)
and equivalently,
φss[i, k] = Σ_{n=−∞}^{∞} s[n]s[n − (i − k)]. (27)
These equations are called the autocorrelation normal equations and also the Yule–
Walker equations. They provide a basis for computing the parameters a1 , . . . , ap from
the autocorrelation function of the signal. In Section 2.5, we discuss an approach to
choosing the gain factor G.
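As a concrete illustration (not part of the original text), the following sketch solves the Yule–Walker equations for the coefficients ak, assuming NumPy/SciPy; the signal name s, the order p, and the helper name are illustrative:

import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker_coeffs(s, p):
    # Autocorrelation values r_ss[0..p]; the infinite sums of Eqs. (26)-(27)
    # reduce to finite sums for a finite-length signal.
    s = np.asarray(s, dtype=float)
    r = np.array([np.dot(s[:len(s) - m], s[m:]) for m in range(p + 1)])
    # Eq. (20) with phi_ss[i, k] = r_ss[i - k] is a symmetric Toeplitz system:
    # first column r[0..p-1], right-hand side r[1..p].
    return solve_toeplitz((r[:p], r[:p]), r[1:])

Because the system matrix is Toeplitz, solve_toeplitz exploits that structure; a plain np.linalg.solve on the full matrix would give the same coefficients.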
Figure 4 Linear system model for a random signal s[n]: white noise w[n] drives the LTI system H(z) to produce ŝ[n].
ŝ[n] = Σ_{k=1}^{p} ak ŝ[n − k] + Gw[n], (31)
where the input has autocorrelation function E{w[n + m]w[n]} = δ[m], zero mean
(E{w[n]} = 0), and unit average power (E{(w[n])2 } = δ[0] = 1), with E{·} representing
the expectation or probability average operator.3
The resulting model for analysis is the same as that depicted in Figure 2, but the
desired output g[n] changes. In the case of random signals, we want to make g[n] as
much like a white-noise signal as possible, rather than the unit sample sequence that
was desired in the deterministic case. For this reason, the optimal inverse filter for
random signals is often referred to as a whitening filter.
We also choose the operator ⟨·⟩ in Eq. (11) as an appropriate one for random signals, specifically the mean-squared value or equivalently the average power. Then Eq. (11) becomes
E = E{(ê[n])²}. (32)
If s[n] is assumed to be a sample function of a stationary random process, then φss[i, k] in Eq. (19) would be the autocorrelation function
φss[i, k] = E{s[n − i]s[n − k]} = rss[i − k]. (33)
The system coefficients can be found as before from Eq. (20). Thus, the system
coefficients satisfy a set of equations of the same form as Eq. (30), i.e.,
Σ_{k=1}^{p} ak rss[i − k] = rss[i], i = 1, 2, . . . , p. (34)
Therefore, modeling random signals again results in the Yule–Walker equations, with
the autocorrelation function in this case being defined by the probabilistic average
³Computation of E{·} requires knowledge of the probability densities. In the case of stationary random signals, only one density is required. In the case of ergodic random processes, a single infinite time average could be used. In practical applications, however, such averages must be approximated by estimates obtained from finite time averages.
As outlined in more detail in Problem 2, if Eq. (36) is expanded, and Eq. (20) is substi-
tuted into the result, it follows that in general,
E = φss[0, 0] − Σ_{k=1}^{p} ak φss[0, k]. (37)
Equation (37) is true for any appropriate choice of the averaging operator. In particular,
for averaging definitions for which φss [i, k] = rss [i − k], Eq. (37) becomes
E = rss[0] − Σ_{k=1}^{p} ak rss[k]. (38)
The solution of Eqs. (20) also constrains the autocorrelation function rhh[m] of the all-pole model's impulse response: it can be shown that
rhh[m] = c rss[m], |m| = 0, 1, . . . , p, (39)
where c is any constant. The fact that the equality in Eq. (39) is required follows from the form of the recursive solution of the Yule–Walker equations as developed in Section 6. In words, the autocorrelation normal equations require that for the lags |m| = 0, 1, . . . , p, the autocorrelation functions of the model output and the signal being modeled are proportional.
Figure 5 The deterministic signal sd[n] and the random signal sr[n] used in the first-order example.
In the deterministic case, the output of the optimum inverse filter will be a unit impulse, and in the random signal case, the output of the optimum inverse filter will be a zero-mean white-noise sequence with unit average power. To show that the optimum inverse filter will be exact, note that for a 1st-order model, Eqs. (30) or (34) reduce to
rsdsd[0] a1 = rsdsd[1], (42)
so from Eq. (41), it follows that the optimum predictor coefficient for both the deter-
ministic and the random signal is
a1 = rsdsd[1]/rsdsd[0] = (α/(1 − α²)) / (1/(1 − α²)) = α. (43)
From Eq. (38), the minimum mean-squared error is
E = 1/(1 − α²) − a1 · α/(1 − α²) = (1 − α²)/(1 − α²) = 1, (44)
which is the size of the unit impulse in the deterministic case and the average power
of the white-noise sequence in the random case.
As mentioned earlier, and as is clear in this example, when the signal is generated
by an all-pole system excited by either an impulse or white noise, all-pole modeling can
determine the parameters of the all-pole system exactly. This requires prior knowledge
of the model order p and the autocorrelation function. This was possible to obtain
for this example, because a closed-form expression was available for the infinite sum
required to compute the autocorrelation function. In a practical setting, it is generally
necessary to estimate the autocorrelation function from a finite-length segment of the
given signal. Problem 14 considers the effect of finite autocorrelation estimates (to be
discussed next) for the deterministic signal sd [n] of this section.
To use the results of Sections 1 and 2 for modeling of either deterministic or random signals, we require a priori knowledge of the correlation functions φss[i, k] that are needed to form the system equations satisfied by the coefficients ak, or we must estimate these from the given signal. Furthermore, we may want to apply block processing or short-time
analysis techniques to represent the time-varying properties of a nonstationary signal,
such as speech. In this section, we will discuss two distinct approaches to the computa-
tion of the correlation estimates for practical application of the concepts of parametric
signal modeling. These two approaches have come to be known as the autocorrelation
method and the covariance method.
In the autocorrelation method, the samples of the signal s[n] are assumed to be zero for all n outside the interval 0 ≤ n ≤ M, even if they have been extracted from
a longer sequence. This, of course, imposes a limit to the exactness that can be expected
of the model, since the IIR impulse response of an all-pole model will be used to model
the finite-length segment of s[n].
Although the prediction error sequence need not be computed explicitly to solve
for the filter coefficients, it is nevertheless informative to consider its computation in
some detail. The impulse response of the prediction error filter is, by the definition of
A(z) in Eq. (24),
hA[n] = δ[n] − Σ_{k=1}^{p} ak δ[n − k]. (45)
It can be seen that since the signal s[n] has finite length M + 1 and hA[n], the impulse response of the prediction error filter A(z), has length p + 1, the prediction error sequence e[n] = hA[n] ∗ s[n] will always be identically zero outside the interval 0 ≤ n ≤ M + p.
Figure 6 shows an example of the prediction error signal for a linear predictor with p = 5. In the upper plot, hA[n − m], the (time-reversed and shifted) impulse response of the prediction error filter, is shown as a function of m for three different values of n.
The dark lines with square dots depict hA [n − m], and the lighter lines with round dots
show the sequence s[m] for 0 ≤ m ≤ 30. On the left side is hA [0 − m], which shows
that the first nonzero prediction error sample is e[0] = s[0]. This, of course, is consistent
with Eq. (9). On the extreme right is hA [M + p − m], which shows that the last nonzero
error sample is e[M + p] = −ap s[M]. The second plot in Figure 6 shows the error signal
e[n] for 0 ≤ n ≤ M + p. From the point of view of linear prediction, it follows that the
first p samples (dark lines and dots) are predicted from samples that are assumed to
be zero. Similarly, the samples of the input for n ≥ M + 1 are assumed to be zero to
obtain a finite-length signal. The linear predictor attempts to predict the zero samples
Figure 6 (top) The signal s[m] and the time-reversed, shifted impulse responses hA[0 − m], hA[n − m], and hA[M + p − m]; (bottom) the prediction error e[n] = hA[n] ∗ s[n] = s[n] − Σ_{k=1}^{p} ak s[n − k] for 0 ≤ n ≤ M + p.
Figure 7 The sequences s[n] and s[n + m] involved in computing the autocorrelation rss[m]; their product is nonzero only for 0 ≤ n ≤ M − m.
in the interval M + 1 ≤ n ≤ M + p from prior samples that are nonzero and part of the original signal. Indeed, if s[0] ≠ 0 and s[M] ≠ 0, then it will be true that both e[0] = s[0] and e[M + p] = −ap s[M] will be nonzero. That is, the prediction error (total-squared
error E) can never be exactly zero if the signal is defined to be zero outside the interval
0 ≤ n ≤ M. Furthermore, the total-squared prediction error for a pth -order predictor
would be
E^(p) = Σ_{n=−∞}^{∞} e²[n] = Σ_{n=0}^{M+p} e²[n], (46)
i.e., the limits of summation can be infinite for convenience, but practically speaking,
they are finite.
When the signal is assumed to be identically zero outside the interval 0 ≤ n ≤ M,
the correlation function φss [i, k] reduces to the autocorrelation function rss [m] where the
values needed in Eq. (30) are for m = |i − k|. Figure 7 shows the shifted sequences used
in computing rss [m] with s[n] denoted by round dots and s[n + m] by square dots. Note
that for a finite-length signal, the product s[n]s[n + m] is nonzero only over the interval
0 ≤ n ≤ M − m when m ≥ 0. Since rss is an even function, i.e., rss[−m] = rss[m] = rss[|m|], it follows that the autocorrelation values needed for the Yule–Walker equations can be computed as
rss[|m|] = Σ_{n=−∞}^{∞} s[n]s[n + |m|] = Σ_{n=0}^{M−|m|} s[n]s[n + |m|]. (47)
For the finite-length sequence s[n], Eq. (47) has all the necessary properties of an auto-
correlation function and rss [m] = 0 for m > M. But of course rss [m] is not the same as
the autocorrelation function of the infinite length signal from which the segment was
extracted.
Equation (47) can be used to compute estimates of the autocorrelation func-
tion for either deterministic or random signals.4 Often, the finite-length input signal
is extracted from a longer sequence of samples. This is the case, for example, in ap-
plications to speech processing, where voiced segments (e.g., vowel sounds) of speech
are treated as deterministic and unvoiced segments (fricative sounds) are treated as
⁴In the context of random signals, Eq. (47) is a biased estimate of the autocorrelation function. When p ≪ M, as is often the case, this statistical bias is generally negligible.
random signals.⁵ According to the previous discussion, the first p and last p samples of the prediction error can be large, owing to the attempt to predict nonzero samples from zero samples and to predict zero samples from nonzero samples. Since this can bias the estimation of the predictor coefficients, a signal-tapering window, such as a Hamming window, is generally applied to the signal before computation of the autocorrelation function.
Figure 8 (top) The signal s[m] and the shifted impulse responses hA[p − m], hA[n − m], and hA[M − m] for the covariance method; (bottom) the prediction error e[n] = hA[n] ∗ s[n] = s[n] − Σ_{k=1}^{p} ak s[n − k], computed only for p ≤ n ≤ M.
⁵In both cases, the deterministic autocorrelation function in Eq. (47) is used as an estimate.
⁶The definitions of total-squared prediction error in Eqs. (48) and (46) are distinctly different, so we use the subscript cov to distinguish them.
Figure 9 The sequences s[n − i] and s[n − k] involved in computing φss[i, k] in the covariance method; all required samples lie in 0 ≤ n ≤ M.
The prediction error computation for the covariance method is illustrated in Figure 8, which is organized as the upper part of Figure 6, but in this case, the prediction error is only computed over the interval p ≤ n ≤ M, as needed in the total-squared error
E_cov^(p) = Σ_{n=p}^{M} e²[n]. (48)
As shown by the prediction error
filter impulse responses hA [n − m] in the upper plot, there are no end effects when the
prediction error is computed in this way, since all the signal samples needed to compute
the prediction error are available. Because of this, it is possible for the prediction error
to be precisely zero over the entire interval p ≤ n ≤ M, if the signal from which the
finite length segment was extracted was generated as the output of an all-pole system.
Seen another way, if s[n] is the output of an all-pole system with an input that is zero for
n > 0, then as seen from Eqs. (9) and (10) the prediction error will be zero for n > 0.
The covariance function inherits the same definition of the averaging operator,
i.e.,
φss[i, k] = Σ_{n=p}^{M} s[n − i]s[n − k]. (49)
The shifted sequences s[n − i] (light lines and round dots) and s[n − k] (dark lines and square dots) are shown in Figure 9. This figure shows that since we need φss[i, k] only for i = 1, 2, . . . , p and k = 0, 1, . . . , p, the segment s[n] for 0 ≤ n ≤ M contains all the samples that are needed to compute φss[i, k] in Eq. (49).
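A sketch of the covariance-method computation under these definitions (all names are illustrative assumptions):

import numpy as np

def covariance_method(s, p):
    # phi_ss[i, k] = sum_{n=p}^{M} s[n-i] s[n-k], Eq. (49), for 0 <= i, k <= p.
    s = np.asarray(s, dtype=float)
    M = len(s) - 1
    phi = np.empty((p + 1, p + 1))
    for i in range(p + 1):
        for k in range(p + 1):
            phi[i, k] = np.sum(s[p - i:M + 1 - i] * s[p - k:M + 1 - k])
    # Solve Eq. (20): sum_k a_k phi[i, k] = phi[i, 0] for i = 1, ..., p.
    return np.linalg.solve(phi[1:, 1:], phi[1:, 0])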
Prediction Error
Both the averaged prediction error ⟨e²[n]⟩ and the averaged modeling error ⟨ê²[n]⟩ are nonnegative and nonincreasing with increasing model order p. In the autocorrelation method based on estimates obtained from finite-length signals, the averaged modeling or prediction error will never be zero, because the autocorrelation values will not be exact. Furthermore, even with an exact model the prediction error cannot vanish, since it equals Gv[n], as indicated by Eq. (10). In the covariance method, the prediction error for
n > 0 can be exactly zero if the original signal was generated by an all-pole model. This
will be demonstrated in Example 2.
4 MODEL ORDER
An important issue in parametric signal modeling is the model order p, the choice of
which has a major impact on the accuracy of the model. A common approach to choosing
p is to examine the averaged prediction error (often referred to as the residual) from the optimum pth-order model. Let ak^(p) be the parameters for the optimal pth-order predictor found using Eq. (30). The prediction error energy for the pth-order model using the autocorrelation method is⁷
E^(p) = Σ_{n=−∞}^{∞} ( s[n] − Σ_{k=1}^{p} ak^(p) s[n − k] )². (51)
For the zeroth-order predictor (p = 0), there are no delay terms in Eq. (51), i.e., the “predictor” is just the identity system, so e[n] = s[n]. Consequently, for p = 0,
E^(0) = Σ_{n=−∞}^{∞} s²[n] = rss[0]. (52)
⁷Recall that E_cov^(p) denotes the total-squared prediction error for the covariance method, while we use E^(p) with no subscript to denote the total-squared prediction error for the autocorrelation method.
Plotting the normalized mean-squared prediction error V^(p) = E^(p)/E^(0) as a function of p shows how increasing p changes this error energy. In the autocorrelation method, we showed that the averaged prediction error can never be precisely zero, even if the signal s[n] was generated by an all-pole system and the model order is the same as the order of the generating system. In the covariance method, however, if the all-pole model is a perfect model for the signal s[n], E_cov^(p) will become identically zero at the correct choice of p, since the averaged prediction error only considers values for p ≤ n ≤ M. Even if s[n] is not perfectly modeled by an all-pole system, there is often a value of p above which increasing p has little or no effect on either V^(p) or V_cov^(p) = E_cov^(p)/E_cov^(0). This threshold is an efficient choice of model order for representing the signal as an all-pole model.
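A sketch of such a model-order study with the autocorrelation method (names are illustrative; Eq. (38) gives the minimum error at each order):

import numpy as np
from scipy.linalg import solve_toeplitz

def normalized_errors(s, pmax):
    s = np.asarray(s, dtype=float)
    r = np.array([np.dot(s[:len(s) - m], s[m:]) for m in range(pmax + 1)])
    V = [1.0]  # V^(0) = E^(0)/E^(0) = 1, by Eq. (52)
    for p in range(1, pmax + 1):
        a = solve_toeplitz((r[:p], r[:p]), r[1:p + 1])  # Yule-Walker, Eq. (20)
        E = r[0] - np.dot(a, r[1:p + 1])                # minimum error, Eq. (38)
        V.append(E / r[0])
    return np.array(V)

Plotting the returned values against p yields the kind of curve shown in Figure 10.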
Figure 10 Normalized mean-squared prediction error V^(p) (autocorrelation method) and V_cov^(p) (covariance method) as a function of model order p.
To obtain an understanding of the nature of the spectrum estimate in Eq. (56) for
the deterministic case, it is useful to recall that the DTFT of the finite-length signal s[n]
is
S(e^jω) = Σ_{n=0}^{M} s[n] e^{−jωn}, (57)
and that, consequently, |S(e^jω)|² is the DTFT of the autocorrelation function of s[n],
|S(e^jω)|² = Σ_{m=−M}^{M} rss[m] e^{−jωm}, (58)
where, due to the finite length of s[n], rss[m] = 0 for |m| > M. The values of rss[m] for m = 0, 1, 2, . . . , p are used in the computation of the all-pole model using the autocorrelation method. Thus, it is reasonable to suppose that there is a relationship
between the Fourier spectrum of the signal, |S(ejω )|2 , and the all-pole model spectrum,
|Ŝ(ejω )|2 = |H(ejω )|2 .
One approach to illuminating this relationship is to obtain an expression for
the averaged prediction error in terms of the DTFT of the signal s[n]. Recall that
the prediction error is e[n] = hA [n] ∗ s[n], where hA [n] is the impulse response of the
prediction error filter. From Parseval’s Theorem, the averaged prediction error is
E = Σ_{n=0}^{M+p} (e[n])² = (1/2π) ∫_{−π}^{π} |S(e^jω)|² |A(e^jω)|² dω, (59)
where S(ejω ) is the DTFT of s[n] as given by Eq. (57). Since H(z) = G/A(z), Eq. (59)
can be expressed in terms of H(ejω ) as
E = (G²/2π) ∫_{−π}^{π} ( |S(e^jω)|² / |H(e^jω)|² ) dω. (60)
Since the integrand in Eq. (60) is positive, and |H(ejω )|2 > 0 for −π < ω ≤ π, it
therefore follows from Eq. (60) that minimizing E is equivalent to minimizing the ratio
of the energy spectrum of the signal s[n] to the magnitude-squared of the frequency
response of the linear system in the all-pole model. The implication of this is that
the all-pole model spectrum will attempt to match the energy spectrum of the sig-
nal more closely at frequencies where the signal spectrum is large, since frequencies
where |S(ejω )|2 > |H(ejω )|2 contribute more to the mean-squared error than frequen-
cies where the opposite is true. Thus, the all-pole model spectrum estimate favors a
good fit around the peaks of the signal spectrum. This will be illustrated by the discus-
sion in Section 5.1. Similar analysis and reasoning also applies to the case in which s[n]
is random.
Figure 11 (a) Voiced speech segment (8 kHz sampling rate). (b) Autocorrelation function of the segment in (a).
Figure 12 (a) Comparison of DTFT and all-pole model spectra for voiced speech
segment in Figure 11(a). (b) Normalized prediction error as a function of p.
This example illustrates that the choice of the model order p controls the de-
gree of smoothing of the DTFT spectrum. Figure 12(b) shows that as p increases, the
mean-squared prediction error decreases quickly and then levels off, as in our previ-
ous example. Recall that in Sections 2.4 and 2.5, we argued that the all-pole model
with appropriately chosen gain results in a match between the autocorrelation func-
tions of the signal and the all-pole model up to p correlation lags as in Eq. (39).
This implies that as p increases, the all-pole model spectrum will approach the DTFT
spectrum, and when p → ∞, it follows that rhh [m] = rss [m] for all m, and therefore,
|H(ejω )|2 = |S(ejω )|2 . However, this does not mean that H(ejω ) = S(ejω ) because H(z)
is an IIR system, and S(z) is the z-transform of a finite-length sequence. Also note
that as p → ∞, the averaged prediction error does not approach zero, even though |H(e^jω)|² → |S(e^jω)|². As we have discussed, this occurs because the total error in Eq. (11) is the modeling error ê[n] = e[n] − Gv[n], not e[n] itself. Said differently, the linear predictor must always predict the first nonzero sample from the zero-valued samples that precede it.
Figure 13 (a) Unvoiced speech segment (8 kHz sampling rate). (b) Autocorrelation function of the segment in (a).
The other main class of speech sounds is comprised of the unvoiced sounds such
as fricatives. These sounds are produced by creating random turbulent air flow in the
vocal tract; therefore, they are best modeled in terms of an all-pole system excited by
white noise. Figure 13 shows an example of a 201-point Hamming-windowed segment of
unvoiced speech and its corresponding autocorrelation function. Note that the autocor-
relation function shows no indication of periodicity in either the signal waveform or the
autocorrelation function. A comparison of the DTFT of the signal in Figure 13(a) with
two all-pole model spectra computed from the autocorrelation function in Figure 13(b)
is shown in Figure 14(a). From the point of view of spectrum analysis of random signals,
the magnitude-squared of the DTFT is a periodogram. Thus, it contains a component
that is randomly varying with frequency. Again, by choice of the model order, the peri-
odogram can be smoothed to any desired degree.
Figure 14 (a) Comparison of DTFT and all-pole model spectra for unvoiced speech
segment in Figure 13(a). (b) Normalized prediction error as a function of p.
Figure 15 shows the zeros of the prediction error system function A(z) (poles of the model system) for the two spectrum estimates in Figure 12(a). For p = 12, the zeros of A(z) are denoted by the open circles. Five complex conjugate pairs of zeros are close to the unit circle, and their manifestations as poles are clearly evident in the heavy-line curve of Figure 12(a). For the case p = 40, the zeros of A(z) are denoted by the large filled dots. Observe that most of the zeros are close to the unit circle, and they are more or less evenly distributed around the unit circle. This produces the peaks in the model spectrum that are spaced approximately at multiples of the normalized radian frequency corresponding to the fundamental frequency of the speech signal, i.e., at angles that are multiples of 2π(296 Hz)/(8 kHz).
Figure 15 Zeros of the prediction error system function A(z) for p = 12 (open circles) and p = 40 (filled dots).
That is, the sum of two sinusoids can be represented as the impulse response of an LTI
system whose system function has both poles and zeros. The numerator polynomial
would be a somewhat complicated function of the amplitudes, frequencies, and phase
shifts. What is important for our discussion is that the numerator is a 3rd -order poly-
nomial and the denominator is a 4th -order polynomial, the roots of which are all on
the unit circle at angles equal to ±ω1 and ±ω2 . The difference equation describing this
system with impulse excitation has the form
s[n] − Σ_{k=1}^{4} ak s[n − k] = Σ_{k=0}^{3} bk δ[n − k], (63)
where the coefficients ak would result from multiplying the denominator factors. Note
that
s[n] − Σ_{k=1}^{4} ak s[n − k] = 0 for n ≥ 4, (64)
which suggests that the signal s[n] can be predicted with no error by a 4th -order predictor
except at the very beginning (0 ≤ n ≤ 3). The coefficients of the denominator can be
estimated from the signal by applying the covariance method to a short segment of the signal selected so as not to include the first four samples. In the ideal case for which Eq. (61) accurately represents the signal (e.g., high SNR), the roots of the resulting polynomial provide good estimates of the frequencies of the component sinusoids.

Figure 16 (a) The sinusoidal signal of Eq. (65). (b) Spectrum estimates (log magnitude in dB): the DTFT, the autocorrelation method, and the covariance method, with the true frequencies marked.
Figure 16(a) shows a plot of 101 samples of the signal⁸
s[n] = 20 cos(0.2πn − 0.1π) + 22 cos(0.22πn + 0.9π). (65)
Because the two frequencies are close together, it is necessary to use a large number of
samples to resolve the two frequencies by Fourier analysis. However, since the signal fits
the all-pole model perfectly, the covariance method can be used to obtain very accurate
estimates of the frequencies from very short segments of the signal. This is illustrated
in Figure 16(b).
The DTFT of the 101 samples (with rectangular window) shows no indication that there are two distinct sinusoid frequencies around ω = 0.21π. Recall that the main-lobe width for an (M + 1)-point rectangular window is Δω = 4π/(M + 1). Consequently, a 101-point rectangular window can clearly resolve two frequencies only if they are no closer than about 0.04π rad. Correspondingly, the DTFT does not show two spectral peaks.
Similarly, use of the autocorrelation method results in the spectrum estimate
shown by the heavy line. This estimate also contains only one spectral peak. The predic-
⁸The tapering of the segment of the signal in Figure 16(a) is not a result of windowing. It is caused by the “beating” of the two cosines of nearly the same frequency. The period of the beat frequency (the difference between 0.22π and 0.2π) is 100 samples.
tion error polynomial obtained with the autocorrelation method factors into two real zeros and a complex conjugate pair. The two real poles contribute no peaks, and the complex poles are close to the unit circle,
but at ±0.21π, which is halfway between the two frequencies. Thus, the windowing
inherent in the autocorrelation method causes the resulting model to lock onto the
average frequency 0.21π.
On the other hand, in the factored prediction error polynomial obtained with the covariance method (with rounding of the magnitudes and angles), the angles of the zeros are almost exactly equal to the frequencies of the two sinusoids. Also shown in Figure 16(b) is the frequency response of the model, i.e.,
|Hcov(e^jω)|² = 1/|Acov(e^jω)|², (68)
plotted in dB. In this case, the prediction error is very close to zero, which, if used
to estimate the gain of the all-pole model, would lead to an indeterminate estimate.
Therefore, the gain is arbitrarily set to one, which leads to a plot of Eq. (68) on a similar
scale to the other estimates. Since the poles are almost exactly on the unit circle, the
magnitude spectrum becomes exceedingly large at the pole frequencies. Note that the
roots of the prediction error polynomial give an accurate estimate of the frequencies.
This method, of course, does not provide accurate information about the amplitudes
and phases of the sinusoidal components.
In both the autocorrelation and covariance methods of computing the correlation val-
ues, the predictor coefficients that minimize the mean-squared inverse filter error and
equivalently the prediction error satisfy a set of linear equations of the general form:
⎡ φss[1, 1]  φss[1, 2]  φss[1, 3]  · · ·  φss[1, p] ⎤ ⎡ a1 ⎤   ⎡ φss[1, 0] ⎤
⎢ φss[2, 1]  φss[2, 2]  φss[2, 3]  · · ·  φss[2, p] ⎥ ⎢ a2 ⎥   ⎢ φss[2, 0] ⎥
⎢ φss[3, 1]  φss[3, 2]  φss[3, 3]  · · ·  φss[3, p] ⎥ ⎢ a3 ⎥ = ⎢ φss[3, 0] ⎥ . (69)
⎢     ⋮          ⋮          ⋮      · · ·      ⋮     ⎥ ⎢ ⋮  ⎥   ⎢     ⋮     ⎥
⎣ φss[p, 1]  φss[p, 2]  φss[p, 3]  · · ·  φss[p, p] ⎦ ⎣ ap ⎦   ⎣ φss[p, 0] ⎦
(L–D.1) This step initializes the mean-squared prediction error to be the energy of the signal. That is, a zeroth-order predictor (no prediction) yields no reduction in prediction error energy, since the prediction error e[n] is identical to the signal s[n].
The next line in Figure 17 states that steps (L–D.2) through (L–D.5) are repeated p
times, with each repetition of those steps increasing the order of the predictor by one.
In other words, the algorithm computes a predictor of order i from the predictor of
order i − 1 starting with i − 1 = 0.
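A compact sketch of the recursion just outlined (variable names are illustrative; r holds the autocorrelation values rss[0], . . . , rss[p]):

import numpy as np

def levinson_durbin(r, p):
    a = np.zeros(p + 1)      # a[1..i] hold the order-i predictor coefficients
    E = r[0]                 # (L-D.1): E^(0) = r_ss[0], as in Eq. (52)
    for i in range(1, p + 1):
        # Reflection coefficient for the step from order i-1 to order i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / E
        a_prev = a[1:i].copy()
        a[i] = k
        a[1:i] = a_prev - k * a_prev[::-1]   # update lower-order coefficients
        E *= (1.0 - k * k)                   # updated prediction error energy
    return a[1:], E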
14
Power Spectrum Estimation
In this chapter we address the estimation of the spectral characteristics of signals characterized as random processes. Many of the phenomena that occur in nature are best characterized statistically in terms of averages. For example, meteorological phenomena such as fluctuations in air temperature and pressure are best characterized statistically as random processes. The thermal noise voltages generated in resistors and electronic devices are additional examples of physical signals that are modeled as random processes.

Because of the random fluctuations in such signals, we must adopt a statistical viewpoint, which deals with the average characteristics of random signals. In particular, the autocorrelation function of a random process is the appropriate statistical average to use in characterizing random signals in the time domain, and the Fourier transform of the autocorrelation function, which yields the power density spectrum, provides the transformation from the time domain to the frequency domain.

Power spectrum estimation methods have a relatively long history. For a historical perspective, the reader may consult the papers of Robinson (1982) and the book by Marple (1987). We will cover the classical power spectrum estimation methods based on the periodogram, originally introduced by Schuster (1898), and the work of Yule (1927), which gave rise to the modern parametric or model-based methods. These methods were subsequently developed and applied by Walker (1931), Bartlett (1948), Parzen (1957), Blackman and Tukey (1958), Burg (1967), and others. We also describe the method of Capon (1969) and methods based on the eigenanalysis of the data correlation matrix.
If the statistical characteristics of the signal are nonstationary, we cannot select an arbitrarily long data record to estimate the spectrum. In that case, the length of the data record we select is determined by how rapidly the statistical parameters of the signal vary in time. Ultimately, our goal is to select a data record as short as possible, yet long enough to allow us to resolve the spectral characteristics of the different signal components in the data record that have closely spaced spectra.

One of the problems we usually encounter in power spectrum estimation methods based on a finite-length data record is the distortion of the spectrum we are attempting to estimate. This problem arises both in computing the spectrum of a deterministic signal and in estimating the power spectrum of a random signal. Since it is easier to observe the effect of the finite length of the data record on a deterministic signal, we treat that case first. Afterward, we consider only random signals and the estimation of their power spectra.
The quantity |Xa(F)|² represents the distribution of signal energy as a function of frequency and is known as the energy density spectrum of the signal, that is,
Sxx(F) = |Xa(F)|², (14.1.2)
as described in Chapter 4. Hence the total energy in the signal is simply the integral of Sxx(F) over all F [i.e., the total area under Sxx(F)].

It is also interesting to note that Sxx(F) may be viewed as the Fourier transform of another function, Rxx(τ), known as the autocorrelation function of the finite-energy signal xa(t), defined as
Rxx(τ) = ∫_{−∞}^{∞} xa*(t) xa(t + τ) dt, (14.1.3)
so that Rxx(τ) and Sxx(F) form a Fourier transform pair.
Now suppose that we compute the energy density spectrum of the signal xa(t) from its samples taken at a rate of Fs samples per second. To ensure that no spectral aliasing results from the sampling process, the signal is assumed to be prefiltered, so that, for practical purposes, its bandwidth is limited to B hertz. Then the sampling frequency Fs is selected so that Fs > 2B.

The sampled version of xa(t) is a sequence x(n), −∞ < n < ∞, which has the Fourier transform (voltage spectrum)
X(ω) = Σ_{n=−∞}^{∞} x(n) e^{−jωn}
or, equivalently,
X(f) = Σ_{n=−∞}^{∞} x(n) e^{−j2πfn}. (14.1.5)
Recall that X(f) can be expressed in terms of the voltage spectrum of the analog signal xa(t) as
X(F/Fs) = Fs Σ_{k=−∞}^{∞} Xa(F − kFs). (14.1.6)
In the absence of aliasing, within the fundamental range |F| ≤ Fs/2, the voltage spectrum of the sampled signal is identical (up to the scale factor Fs) to the voltage spectrum of the analog signal. As a consequence, the energy density spectrum of the sampled signal is
Sxx(F/Fs) = |X(F/Fs)|² = Fs² |Xa(F)|². (14.1.8)
The energy density spectrum can likewise be obtained as the Fourier transform of the autocorrelation of the sequence {x(n)}.
The relations above lead us to distinguish between two distinct methods for computing the energy density spectrum of a signal xa(t) from its samples x(n). One is the direct method, which involves computing the Fourier transform of {x(n)} and then the energy density spectrum as
Sxx(f) = |X(f)|² = | Σ_{n=−∞}^{∞} x(n) e^{−j2πfn} |². (14.1.9)
The second method is the indirect method, because it requires two steps. First, the autocorrelation rxx(k) is computed from x(n), and then the Fourier transform of the autocorrelation is computed as in (14.1.10) to obtain the energy density spectrum.

In practice, however, only the finite-duration sequence x(n), 0 ≤ n ≤ N − 1, is available for computing the spectrum of the signal. In effect, limiting the duration of the sequence x(n) to N points is equivalent to multiplying x(n) by a rectangular window. Thus we have
x̃(n) = x(n)w(n) = { x(n), 0 ≤ n ≤ N − 1; 0, otherwise. } (14.1.12)
From our discussion of FIR filter design based on the use of windows to limit the duration of the impulse response, recall that multiplication of two sequences is equivalent to convolving their voltage spectra. Consequently, the frequency-domain relation corresponding to (14.1.12) is
X̃(f) = X(f) ∗ W(f) = ∫_{−1/2}^{1/2} X(α) W(f − α) dα. (14.1.13)
EXAMPLE 14.1.1
Solution. The spectral characteristic W(f) for the rectangular window of length N = 61 is illustrated in Figure 10.2.2(a). Note that the main-lobe width of the window function is Δω = 4π/61, or Δf = 2/61, which is narrow compared with X(f).

The convolution of X(f) with W(f) is illustrated in Figure 14.1.1. Observe that energy has leaked into the frequency band 0.1 < |f| ≤ 0.5, where X(f) = 0. Part of this is due to the width of the main lobe of W(f), which causes X(f) to spread outside the range |f| ≤ 0.1. The sidelobe energy in X̃(f), however, is due to the presence of the sidelobes of W(f), which are convolved with X(f). The spreading of X(f) for |f| > 0.1 and the sidelobes in the range 0.1 ≤ |f| ≤ 0.5 together constitute the leakage.
As in the case of FIR filter design, we can reduce sidelobe leakage by selecting windows that have small sidelobes. This implies windows with a smooth taper in the time domain instead of the abrupt cutoff of the rectangular window. Although such window functions reduce sidelobe leakage, they increase the smoothing or spreading of the spectral characteristic X(f). For example, using a Blackman window of length N = 61 in Example 14.1.1 results in
Figure 14.1.1. Spectrum (energy density in dB) obtained by convolving the spectrum of a rectangular window of length N = 61 with the ideal lowpass spectrum of Example 14.1.1.
the spectral characteristic X̃(f) shown in Figure 14.1.2. The sidelobe leakage has been reduced considerably, but the spectral width has increased by about 50%.

The spectral spreading caused by windowing is especially a problem when we wish to resolve signals with closely spaced frequency components. For example, a signal with spectral characteristic X(f) = X1(f) + X2(f), as shown in Figure 14.1.3, cannot be resolved as two separate signals unless the width of the window function is significantly narrower than the frequency separation Δf. Thus we observe that using smooth time-domain windows reduces leakage at the expense of a decrease in frequency resolution.
Figure 14.1.2. Spectrum (energy density in dB) obtained by convolving the spectrum of a Blackman window of length N = 61 with the ideal lowpass spectrum of Example 14.1.1.
Figure 14.1.3. Spectrum X(f) = X1(f) + X2(f) of two narrowband signal components separated in frequency by Δf.
The spectrum given by (14.1.14) can be computed numerically at a set of N frequency points by means of the DFT. Thus
X̃(k) = Σ_{n=0}^{N−1} x̃(n) e^{−j2πkn/N}. (14.1.15)
Then
|X̃(k)|² = Sx̃x̃(f)|_{f=k/N} = Sx̃x̃(k/N), (14.1.16)
and hence
Sx̃x̃(k/N) = | Σ_{n=0}^{N−1} x̃(n) e^{−j2πkn/N} |². (14.1.17)
For a stationary random process, the autocorrelation function is
γxx(τ) = E[x*(t) x(t + τ)], (14.1.18)
where E[·] denotes the statistical average. Then, by the Wiener–Khintchine theorem, the power density spectrum of the stationary random process is the Fourier transform of the autocorrelation function, that is,
Γxx(F) = ∫_{−∞}^{∞} γxx(τ) e^{−j2πFτ} dτ. (14.1.19)
In practice, we work with a single realization of the random process, from which we estimate the power spectrum of the process. The true autocorrelation function γxx(τ) is not known and, as a consequence, we cannot compute the Fourier transform in (14.1.19) to obtain Γxx(F). Instead, from a single realization of the random process we can compute the time-average autocorrelation function
Rxx(τ) = (1/2T0) ∫_{−T0}^{T0} x*(t) x(t + τ) dt, (14.1.20)
where 2T0 is the observation interval.
If the stationary random process is ergodic in the first and second moments (mean and autocorrelation function), then
lim_{T0→∞} Rxx(τ) = γxx(τ). (14.1.21)
This relation justifies the use of the time-average autocorrelation Rxx(τ) as an estimate of the statistical autocorrelation function γxx(τ). Moreover, the Fourier transform of Rxx(τ) provides an estimate Pxx(F) of the power density spectrum, that is,
Pxx(F) = ∫_{−T0}^{T0} Rxx(τ) e^{−j2πFτ} dτ = (1/2T0) | ∫_{−T0}^{T0} x(t) e^{−j2πFt} dt |². (14.1.22)
The true power density spectrum is the expected value of Pxx(F) in the limit as T0 → ∞,
Γxx(F) = lim_{T0→∞} E[Pxx(F)]. (14.1.23)
From (14.1.20) and (14.1.22), we again have two possible methods for computing Pxx(F): the direct method given by (14.1.22), or the indirect method, in which we first obtain Rxx(τ) and then compute the Fourier transform.
Let us now consider estimating the power density spectrum from samples of a single realization of the random process. In particular, we assume that xa(t) is sampled at a rate Fs > 2B, where B is the highest frequency contained in the power density spectrum of the random process. Thus we obtain a finite-duration sequence x(n), 0 ≤ n ≤ N − 1, by sampling xa(t). From these samples we compute the time-average autocorrelation sequence
r′xx(m) = (1/(N − m)) Σ_{n=0}^{N−m−1} x*(n) x(n + m), m = 0, 1, . . . , N − 1. (14.1.24)
The mean value of this estimate is
E[r′xx(m)] = (1/(N − m)) Σ_{n=0}^{N−m−1} E[x*(n) x(n + m)] = γxx(m), (14.1.26)
where γxx(m) is the true (statistical) autocorrelation sequence of x(n). Hence r′xx(m) is an unbiased estimate of γxx(m). The variance of the estimate r′xx(m) is approximately
var[r′xx(m)] ≈ (N/(N − m)²) Σ_{n=−∞}^{∞} [ |γxx(n)|² + γ*xx(n − m) γxx(n + m) ], (14.1.27)
provided that
Σ_{n=−∞}^{∞} |γxx(n)|² < ∞.
Since E[r′xx(m)] = γxx(m) and the variance of the estimate converges to zero as N → ∞, the estimate r′xx(m) is said to be consistent.
For large values of the lag parameter m, the estimate r′xx(m) given by (14.1.24) has a large variance, especially as m approaches N. This is due to the fact that very few data points enter the estimate at large lags. As an alternative to (14.1.24) we can use the estimate
rxx(m) = (1/N) Σ_{n=0}^{N−m−1} x*(n) x(n + m), 0 ≤ m ≤ N − 1,
rxx(m) = (1/N) Σ_{n=|m|}^{N−1} x*(n) x(n + m), m = −1, −2, . . . , 1 − N, (14.1.29)
whose mean value is
E[rxx(m)] = (1/N) Σ_{n=0}^{N−|m|−1} E[x*(n) x(n + m)] = ((N − |m|)/N) γxx(m) = (1 − |m|/N) γxx(m), (14.1.30)
so this estimate is biased.
However, this estimate has a smaller variance, given approximately by
var[rxx(m)] ≈ (1/N) Σ_{n=−∞}^{∞} [ |γxx(n)|² + γ*xx(n − m) γxx(n + m) ], (14.1.31)
and its variance converges to zero as N → ∞. Therefore, the estimate rxx(m) is asymptotically unbiased and is also a consistent estimate of γxx(m).
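A sketch of the two estimates, (14.1.24) and (14.1.29), side by side (the names are illustrative assumptions):

import numpy as np

def autocorr_estimates(x, m):
    # Lag-m products x*(n) x(n+m), summed over the available data.
    x = np.asarray(x)
    N = len(x)
    s = np.sum(np.conj(x[:N - m]) * x[m:])
    return s / (N - m), s / N   # unbiased (14.1.24), biased (14.1.29)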
We will use the estimate rxx(m) given by (14.1.29) in our treatment of power spectrum estimation. The corresponding estimate of the power density spectrum is
Pxx(f) = Σ_{m=−(N−1)}^{N−1} rxx(m) e^{−j2πfm}. (14.1.33)
If we substitute the expression for rxx(m) given in (14.1.29) into Equation (14.1.33), the estimate Pxx(f) can also be expressed as
Pxx(f) = (1/N) | Σ_{n=0}^{N−1} x(n) e^{−j2πfn} |² = (1/N) |X(f)|², (14.1.34)
where X(f) is the Fourier transform of the sample sequence x(n). This well-known form of the power density spectrum estimate is called the periodogram. It was originally introduced by Schuster (1898) to detect and measure “hidden periodicities” in data.
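A sketch of the periodogram (14.1.34), evaluated at the DFT frequencies f = k/N via the FFT (the names are illustrative assumptions):

import numpy as np

def periodogram(x):
    x = np.asarray(x)
    N = len(x)
    return np.abs(np.fft.fft(x)) ** 2 / N   # P_xx(k/N), Eq. (14.1.34)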
From (14.1.33), the mean value of the periodogram estimate Pxx(f) is
E[Pxx(f)] = E[ Σ_{m=−(N−1)}^{N−1} rxx(m) e^{−j2πfm} ] = Σ_{m=−(N−1)}^{N−1} E[rxx(m)] e^{−j2πfm}
= Σ_{m=−(N−1)}^{N−1} (1 − |m|/N) γxx(m) e^{−j2πfm}. (14.1.35)
The interpretation we give to (14.1.35) is that the mean of the estimated spectrum is the Fourier transform of the windowed autocorrelation function
γ̃xx(m) = (1 − |m|/N) γxx(m), (14.1.36)
where the window function is the Bartlett (triangular) window. Hence the mean of the estimated spectrum is
E[Pxx(f)] = Σ_{m=−∞}^{∞} γ̃xx(m) e^{−j2πfm} = ∫_{−1/2}^{1/2} Γxx(α) WB(f − α) dα, (14.1.37)
where WB(f) is the spectral characteristic of the Bartlett window. The relation (14.1.37) shows that the mean of the estimated spectrum is the convolution of the true power density spectrum Γxx(f) with the Fourier transform WB(f) of the Bartlett window. Consequently, the mean of the estimated spectrum is a smoothed version of the true spectrum and suffers from the same spectral leakage problems, which are due to the finite number of data points.

Observe that the estimated spectrum is asymptotically unbiased, that is,
lim_{N→∞} E[ Σ_{m=−(N−1)}^{N−1} rxx(m) e^{−j2πfm} ] = Σ_{m=−∞}^{∞} γxx(m) e^{−j2πfm} = Γxx(f).
In general, however, the variance of the estimate Pxx(f) does not decrease to zero as N → ∞. For example, when the data sequence is a Gaussian random process, it is easily shown that the variance is (see Problem 14.4)
var[Pxx(f)] = Γ²xx(f) [ 1 + ( sin 2πfN / (N sin 2πf) )² ], (14.1.38)
which does not converge to zero as N → ∞ but instead approaches Γ²xx(f). Therefore, we conclude that the periodogram is not a consistent estimate of the true power density spectrum (i.e., it does not converge to the true power density spectrum).

In summary, the estimated autocorrelation rxx(m) is a consistent estimate of the true autocorrelation function γxx(m). However, its Fourier transform Pxx(f), the periodogram, is not a consistent estimate of the true power density spectrum. We have noted that Pxx(f) is an asymptotically unbiased estimate of Γxx(f), but for a finite-duration sequence the mean value of Pxx(f) contains a bias, which is evident from (14.1.37) as a distortion of the true power density spectrum. Thus the estimated spectrum suffers from the smoothing effects and the leakage embodied in the Bartlett window. Ultimately, the smoothing and leakage limit our ability to resolve closely spaced spectra.
The leakage and frequency-resolution problems just described, as well as the fact that the periodogram is not a consistent estimate of the power spectrum, motivate the power spectrum estimation methods described in Sections 14.2, 14.3, and 14.4. The methods described in Section 14.2 are classical nonparametric methods, which make no assumptions about the data sequence. The emphasis of the classical methods is on obtaining a consistent estimate of the power spectrum through averaging or smoothing operations performed directly on the periodogram or on the autocorrelation. As we shall see, the effect of these operations is to reduce the frequency resolution further, while the variance of the estimate is decreased.

The spectrum estimation methods described in Section 14.3 are based on a model of how the data were generated. In general, the model-based methods developed over the past few decades provide significantly higher resolution than the classical methods.

Sections 14.4 and 14.5 describe additional methods. Section 14.4 treats filter-bank methods for estimating the power spectrum. The methods described in Section 14.5 are based on the eigenvalue/eigenvector decomposition of the data correlation matrix.
We should emphasize that zero padding and evaluating the DFT at L > N points does not improve the frequency resolution of the spectral estimate. It simply provides us with a method for interpolating the values of the measured spectrum at more frequencies. The frequency resolution of the spectral estimate Pxx(f) is determined by the length N of the data record.
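The following sketch (illustrative names) makes the point concrete: padding to L > N points only interpolates the same underlying estimate.

import numpy as np

def periodogram_padded(x, L):
    # L-point DFT of the N available samples; resolution is still set by N.
    x = np.asarray(x)
    return np.abs(np.fft.fft(x, n=L)) ** 2 / len(x)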
EXAMPLE 14.1.2
A sequence of N = 16 samples is obtained by sampling an analog signal consisting of two frequency components. The resulting discrete-time sequence consists of two sinusoids whose frequency separation is Δf. Evaluate the power spectrum P(f) = (1/N)|X(f)|² at the frequencies fk = k/L, k = 0, 1, . . . , L − 1, for L = 8, 16, 32, and 128, for the values Δf = 0.06 and Δf = 0.01.

Solution. By zero padding, we augment the data sequence and obtain the power spectrum estimate Pxx(k/L). The results for Δf = 0.06 are plotted in Figure 14.1.4. Note that zero padding does not change the resolution, but it has the effect of interpolating the spectrum Pxx(f). In this case, the frequency separation Δf is large enough for the two frequency components to be resolvable.
Figure 14.1.4. Power spectrum estimates Pxx(k/L) for Δf = 0.06 and L = 8, 16, 32, and 128.
Figure 14.1.5. Power spectrum estimates Pxx(k/L) for Δf = 0.01 and L = 8, 16, 32, and 128.
The spectral estimates for Δf = 0.01 are shown in Figure 14.1.5. In this case, the two spectral components are not resolvable. Again, the effect of zero padding is to provide more interpolation, giving us a better picture of the estimated spectrum, although it does not improve the frequency resolution.

When only a few points of the periodogram are needed, the Goertzel algorithm described in Chapter 8 may provide a more efficient computation. Since the Goertzel algorithm can be interpreted as a linear filtering method for computing the DFT, it is clear that the periodogram estimate can be obtained by passing the signal through a parallel bank of tuned filters and squaring their outputs (see Problem 14.5).
Since the estimates are based entirely on a finite data record, the frequency resolution of these methods is, at best, equal to the spectral width of the rectangular window of length N, which is approximately 1/N at the −3 dB points. We will be more precise in specifying the frequency resolution of the particular methods. All the estimation techniques described in this section decrease the frequency resolution in order to reduce the variance of the spectral estimate.

First, we describe the estimates and derive the mean and variance of each. A comparison of the three methods is given in Section 14.2.4. Although the spectral estimates are expressed as a function of the continuous frequency variable f, in practice the estimates are computed at discrete frequencies via the FFT algorithm. The FFT-based computational requirements are treated in Section 14.2.5.
In the Bartlett method, the N-point sequence is subdivided into K nonoverlapping segments of length M = N/K, where
xi(n) = x(n + iM),  n = 0, 1, . . . , M − 1,  i = 0, 1, . . . , K − 1. (14.2.1)
For each segment we compute the periodogram
Pxx^(i)(f) = (1/M) | Σ_{n=0}^{M−1} xi(n) e^{−j2πfn} |²,  i = 0, 1, . . . , K − 1. (14.2.2)
Finally, we average the periodograms of the K segments to obtain the Bartlett power spectrum estimate [Bartlett (1948)]
Pxx^B(f) = (1/K) Σ_{i=0}^{K−1} Pxx^(i)(f). (14.2.3)
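A sketch of this procedure, Eqs. (14.2.1)-(14.2.3), with illustrative names:

import numpy as np

def bartlett_psd(x, K):
    # Split x into K nonoverlapping length-M segments and average their
    # periodograms, evaluated here at the DFT frequencies k/M.
    x = np.asarray(x)
    M = len(x) // K
    segs = x[:K * M].reshape(K, M)
    return np.mean(np.abs(np.fft.fft(segs, axis=1)) ** 2 / M, axis=0)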
The statistical properties of this estimate are easily obtained. First, the mean value is
E[Pxx^B(f)] = (1/K) Σ_{i=0}^{K−1} E[Pxx^(i)(f)] = E[Pxx^(i)(f)]. (14.2.4)
From (14.1.35) and (14.1.37) we obtain the expected value of the single periodogram as
E[Pxx^(i)(f)] = Σ_{m=−(M−1)}^{M−1} (1 − |m|/M) γxx(m) e^{−j2πfm}
= (1/M) ∫_{−1/2}^{1/2} Γxx(α) ( sin π(f − α)M / sin π(f − α) )² dα, (14.2.5)
where
WB(f) = (1/M) ( sin πfM / sin πf )² (14.2.6)
is the frequency characteristic of the Bartlett window
wB(n) = { 1 − |n|/M, |n| ≤ M − 1; 0, otherwise. } (14.2.7)
From (14.2.5) we observe that the true spectrum is now convolved with the frequency characteristic WB(f) of the Bartlett window. The effect of reducing the data length from N points to M = N/K results in a window whose spectral width has increased by a factor of K. Consequently, the frequency resolution has been reduced by the factor K.

In return for this reduction in resolution, we have reduced the variance. The variance of the Bartlett estimate is
var[Pxx^B(f)] = (1/K²) Σ_{i=0}^{K−1} var[Pxx^(i)(f)] = (1/K) var[Pxx^(i)(f)]. (14.2.8)
If we use (14.1.38) in (14.2.8), we obtain
var[Pxx^B(f)] = (1/K) Γ²xx(f) [ 1 + ( sin 2πfM / (M sin 2πf) )² ]. (14.2.9)
Therefore, the variance of the Bartlett power spectrum estimate has been reduced by the factor K.
In the Welch method, one modification to the Bartlett method is to allow the data segments to overlap. The segments can be represented as
xi(n) = x(n + iD),  n = 0, 1, . . . , M − 1,  i = 0, 1, . . . , L − 1, (14.2.10)
where iD is the starting point of the ith sequence. Observe that if D = M, the segments do not overlap and the number L of data segments is identical to the number K in the Bartlett method. However, if D = M/2, there is 50% overlap between successive data segments and L = 2K segments are obtained. Alternatively, we can form K data segments, each of length 2M.
The second modification that Welch made to the Bartlett method is to window the data segments prior to computing the periodogram. The result is a “modified” periodogram
P̃xx^(i)(f) = (1/MU) | Σ_{n=0}^{M−1} xi(n) w(n) e^{−j2πfn} |²,  i = 0, 1, . . . , L − 1, (14.2.11)
where U is a normalization factor for the power in the window function, selected as
U = (1/M) Σ_{n=0}^{M−1} w²(n). (14.2.12)
The Welch power spectrum estimate is the average of these modified periodograms, that is,
Pxx^W(f) = (1/L) Σ_{i=0}^{L−1} P̃xx^(i)(f). (14.2.13)
The mean value of the Welch estimate is
E[Pxx^W(f)] = (1/L) Σ_{i=0}^{L−1} E[P̃xx^(i)(f)] = E[P̃xx^(i)(f)]. (14.2.14)
Using
γxx(n) = ∫_{−1/2}^{1/2} Γxx(α) e^{j2παn} dα, (14.2.16)
the expected value can again be expressed as a convolution of Γxx(f) with the spectral characteristic of the window.
The variance of the Welch estimate is
var[Pxx^W(f)] = (1/L²) Σ_{i=0}^{L−1} Σ_{j=0}^{L−1} E[P̃xx^(i)(f) P̃xx^(j)(f)] − {E[Pxx^W(f)]}². (14.2.20)
In the case of no overlap between successive data segments (L = K), Welch showed that
var[Pxx^W(f)] = (1/L) var[P̃xx^(i)(f)] ≈ (1/L) Γ²xx(f). (14.2.21)
In the case of 50% overlap between successive data segments (L = 2K), the variance of the Welch power spectrum estimate with the Bartlett (triangular) window, also derived by Welch, is
var[Pxx^W(f)] ≈ (9/8L) Γ²xx(f). (14.2.22)
Although we have considered only the triangular window in the variance computations, other window functions can be used. In general, they will yield a different variance. In addition, the segment overlap can be made more or less than the 50% considered in this section, in order to improve the relevant characteristics of the estimate.
In the Blackman–Tukey method, the estimated autocorrelation sequence is windowed first and then Fourier transformed, i.e.,
Pxx^BT(f) = Σ_{m=−(M−1)}^{M−1} rxx(m) w(m) e^{−j2πfm}, (14.2.23)
where the window function w(n) has length 2M − 1 and is zero for |m| ≥ M. With this definition of w(n), the limits of the sum in Equation (14.2.23) can be extended to (−∞, ∞). Hence the equivalent frequency-domain expression for (14.2.23) is the convolution integral
Pxx^BT(f) = ∫_{−1/2}^{1/2} Pxx(α) W(f − α) dα, (14.2.24)
where Pxx(f) is the periodogram. It is clear from (14.2.24) that the effect of windowing the autocorrelation is to smooth the periodogram estimate, thereby decreasing the variance of the estimate at the cost of reduced resolution.
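A sketch of Eqs. (14.2.23)-(14.2.24), using the biased autocorrelation estimate (14.1.29) and a triangular lag window (illustrative names; nfft must satisfy nfft >= 2M - 1):

import numpy as np

def blackman_tukey_psd(x, M, nfft):
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Biased autocorrelation estimate r_xx[m], m = 0, ..., M-1, Eq. (14.1.29).
    r = np.array([np.dot(x[:N - m], x[m:]) / N for m in range(M)])
    w = np.bartlett(2 * M - 1)        # triangular lag window of length 2M-1
    rw = r * w[M - 1:]                # r_xx[m] w(m) for m = 0, ..., M-1
    c = np.zeros(nfft)
    c[:M] = rw                        # nonnegative lags
    c[nfft - M + 1:] = rw[1:][::-1]   # negative lags wrapped to the end
    return np.real(np.fft.fft(c))     # Eq. (14.2.23) at frequencies k/nfft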
The window sequence w(n) should be symmetric (even) about m = 0 to ensure that the power spectrum estimate is real. Furthermore, it is desirable to select the window spectrum to be nonnegative, that is,
W(f) ≥ 0, |f| ≤ 1/2. (14.2.25)
This condition ensures that Pxx^BT(f) ≥ 0 for |f| ≤ 1/2, which is a desirable property for any power spectrum estimate. We should point out, however, that some of the window functions we have introduced do not satisfy this condition. For example, despite their low sidelobe levels, the Hamming and Hann (or Hanning) windows do not satisfy the property in (14.2.25) and, consequently, can produce negative spectrum estimates in some parts of the frequency range.
The expected value of the Blackman–Tukey power spectrum estimate is
E[Pxx^BT(f)] = ∫_{−1/2}^{1/2} E[Pxx(α)] W(f − α) dα, (14.2.26)
Similarly, working in the time domain, the expected value of the Blackman–Tukey power spectrum estimate is
E[Pxx^BT(f)] = Σ_{m=−(M−1)}^{M−1} E[rxx(m)] w(m) e^{−j2πfm}
= Σ_{m=−(M−1)}^{M−1} γxx(m) wB(m) w(m) e^{−j2πfm}, (14.2.29)
Clearly, we should select the length of the window w(n) such that M ≪ N; that is, w(n) should be narrower than wB(m), to provide additional smoothing of the periodogram. Under this condition, (14.2.28) becomes
E[Pxx^BT(f)] ≈ ∫_{−1/2}^{1/2} Γxx(θ) W(f − θ) dθ, (14.2.31)
since
∫_{−1/2}^{1/2} WB(α − θ) W(f − α) dα = ∫_{−1/2}^{1/2} WB(α) W(f − θ − α) dα ≈ W(f − θ). (14.2.32)
The variance of the Blackman–Tukey power spectrum estimate is
var[Pxx^BT(f)] = E{[Pxx^BT(f)]²} − {E[Pxx^BT(f)]}², (14.2.33)
where the mean can be approximated as in (14.2.31). The second moment in Equation (14.2.33) is
E{[Pxx^BT(f)]²} = ∫_{−1/2}^{1/2} ∫_{−1/2}^{1/2} E[Pxx(α) Pxx(θ)] W(f − α) W(f − θ) dα dθ. (14.2.34)
On the assumption that the random process is Gaussian (see Problem 14.5), we find that
E[Pxx(α) Pxx(θ)] = Γxx(α) Γxx(θ) [ 1 + ( sin π(θ + α)N / (N sin π(θ + α)) )² + ( sin π(θ − α)N / (N sin π(θ − α)) )² ]. (14.2.35)
Substituting (14.2.35) into (14.2.34), we obtain
E{[Pxx^BT(f)]²} = {E[Pxx^BT(f)]}²
+ ∫_{−1/2}^{1/2} ∫_{−1/2}^{1/2} Γxx(α) Γxx(θ) W(f − α) W(f − θ) [ ( sin π(θ + α)N / (N sin π(θ + α)) )² + ( sin π(θ − α)N / (N sin π(θ − α)) )² ] dα dθ. (14.2.36)
The first term of (14.2.36) cancels according to (14.2.33), leaving the second term of (14.2.36), which is the variance. For the case N ≫ M, the functions sin π(θ + α)N/(N sin π(θ + α)) and sin π(θ − α)N/(N sin π(θ − α)) are relatively narrow compared with W(f) in the vicinity of θ = −α and θ = α, respectively. Therefore,
∫_{−1/2}^{1/2} Γxx(θ) W(f − θ) [ ( sin π(θ + α)N / (N sin π(θ + α)) )² + ( sin π(θ − α)N / (N sin π(θ − α)) )² ] dθ
≈ [ Γxx(−α) W(f + α) + Γxx(α) W(f − α) ] / N. (14.2.37)
With this approximation, the variance of Pxx^BT(f) is
var[Pxx^BT(f)] ≈ (1/N) ∫_{−1/2}^{1/2} Γxx(α) W(f − α) [ Γxx(−α) W(f + α) + Γxx(α) W(f − α) ] dα
≈ (1/N) ∫_{−1/2}^{1/2} Γ²xx(α) W²(f − α) dα, (14.2.38)
where in the last step we have made the approximation
∫_{−1/2}^{1/2} Γxx(α) Γxx(−α) W(f − α) W(f + α) dα ≈ 0. (14.2.39)
We make one additional approximation in (14.2.38). When W(f) is narrow compared with the true power spectrum Γxx(f), Equation (14.2.38) is further approximated as
var[Pxx^BT(f)] ≈ Γ²xx(f) (1/N) ∫_{−1/2}^{1/2} W²(θ) dθ
≈ Γ²xx(f) (1/N) Σ_{m=−(M−1)}^{M−1} w²(m). (14.2.40)
Periodogram power spectrum estimate. The mean of the periodogram is the convolution of Γxx(f) with the spectral characteristic of the Bartlett window,
WB(f) = (1/N) ( sin πfN / sin πf )². (14.2.44)
For large N (i.e., N → ∞),
E[Pxx(f)] → Γxx(f) ∫_{−1/2}^{1/2} WB(θ) dθ = wB(0) Γxx(f) = Γxx(f),
var[Pxx(f)] → Γ²xx(f). (14.2.45)
Hence, as indicated previously, the periodogram is an asymptotically unbiased estimate of the power spectrum, but it is not consistent, because its variance does not approach zero as N increases toward infinity. Asymptotically, the periodogram is characterized by the quality factor

$$Q_P = \frac{\Gamma_{xx}^2(f)}{\Gamma_{xx}^2(f)} = 1 \qquad (14.2.46)$$

The fact that $Q_P$ is fixed and independent of the data length N is another indication of the poor quality of this estimate.
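A brief Monte Carlo sketch of this behavior for unit-variance white Gaussian noise, whose true spectrum is $\Gamma_{xx}(f) = 1$; the sample sizes, frequency bin, and trial count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

for N in (128, 1024, 8192):
    k = N // 8                      # DFT bin at f = 1/8
    est = np.array([np.abs(np.fft.rfft(rng.standard_normal(N))[k]) ** 2 / N
                    for _ in range(2000)])
    print(f"N = {N:5d}:  mean = {est.mean():.3f},  var = {est.var():.3f}")

# The mean stays near Gamma_xx(f) = 1 (asymptotically unbiased), but the
# variance stays near Gamma_xx^2(f) = 1 instead of decaying with N.
```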
Bartlett power spectrum estimate. The mean and variance of the Bartlett power spectrum estimate are

$$E\left[P_{xx}^{B}(f)\right] = \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W_B(f-\theta)\, d\theta \qquad (14.2.47)$$

$$\operatorname{var}\left[P_{xx}^{B}(f)\right] = \frac{1}{K}\,\Gamma_{xx}^2(f)\left[1 + \left(\frac{\sin 2\pi f M}{M \sin 2\pi f}\right)^2\right] \qquad (14.2.48)$$

and

$$W_B(f) = \frac{1}{M}\left(\frac{\sin \pi f M}{\sin \pi f}\right)^2 \qquad (14.2.49)$$
When $N \to \infty$ and $M \to \infty$, with $K = N/M$ held fixed, we have

$$E\left[P_{xx}^{B}(f)\right] \to \Gamma_{xx}(f)\int_{-1/2}^{1/2} W_B(f)\, df = \Gamma_{xx}(f)\, w_B(0) = \Gamma_{xx}(f)$$
$$\operatorname{var}\left[P_{xx}^{B}(f)\right] \to \frac{1}{K}\,\Gamma_{xx}^2(f) \qquad (14.2.50)$$
We observe that the Bartlett power spectrum estimate is asymptotically unbiased and, if K is allowed to increase as N increases, the estimate is also consistent. Hence, asymptotically, this estimate is characterized by the quality factor

$$Q_B = K = \frac{N}{M} \qquad (14.2.51)$$
The frequency resolution of the Bartlett estimate, measured by taking the 3-dB width of the main lobe of the rectangular window, is

$$\Delta f = \frac{0.9}{M} \qquad (14.2.52)$$
Hence $M = 0.9/\Delta f$ and

$$Q_B = \frac{N}{0.9/\Delta f} = 1.1\, N\, \Delta f \qquad (14.2.53)$$
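For reference, here is a minimal Python sketch of Bartlett's procedure, averaging the periodograms of $K = N/M$ non-overlapping segments; the function name and test values are ours, and the variance check mirrors (14.2.50):

```python
import numpy as np

def bartlett_psd(x, M):
    # Average the periodograms of K = len(x)//M non-overlapping length-M segments.
    K = len(x) // M
    segments = x[:K * M].reshape(K, M)
    periodograms = np.abs(np.fft.rfft(segments, axis=1)) ** 2 / M
    return periodograms.mean(axis=0)

# For unit-variance white noise, the variance at a fixed bin drops by about K.
rng = np.random.default_rng(1)
N, M = 4096, 64
est = np.array([bartlett_psd(rng.standard_normal(N), M)[M // 8]
                for _ in range(500)])
print(f"K = {N // M}, sample variance = {est.var():.4f}")   # near 1/K
```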
Welch power spectrum estimate. The mean and variance of the Welch power spectrum estimate are

$$E\left[P_{xx}^{W}(f)\right] = \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W(f-\theta)\, d\theta \qquad (14.2.54)$$

where

$$W(f) = \frac{1}{MU}\left|\sum_{n=0}^{M-1} w(n)\, e^{-j2\pi f n}\right|^2 \qquad (14.2.55)$$

and

$$\operatorname{var}\left[P_{xx}^{W}(f)\right] = \begin{cases} \dfrac{1}{L}\,\Gamma_{xx}^2(f), & \text{no overlap} \\[2ex] \dfrac{9}{8L}\,\Gamma_{xx}^2(f), & \text{50\% overlap and triangular window} \end{cases} \qquad (14.2.56)$$
When $N \to \infty$ and $M \to \infty$, the mean converges to

$$E\left[P_{xx}^{W}(f)\right] \to \Gamma_{xx}(f) \qquad (14.2.57)$$

The frequency resolution of the Welch estimate, measured by the 3-dB width of the main lobe of the triangular window, is

$$\Delta f = \frac{1.28}{M} \qquad (14.2.59)$$
Consequently, the quality factor expressed in terms of $\Delta f$ and N is

$$Q_W = \begin{cases} 0.78\, N\, \Delta f, & \text{no overlap} \\[1ex] 1.39\, N\, \Delta f, & \text{50\% overlap and triangular window} \end{cases} \qquad (14.2.60)$$
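A minimal Python sketch of Welch's procedure with a triangular data window and 50% overlap, using the normalization $U = \frac{1}{M}\sum_{n=0}^{M-1} w^2(n)$ implied by (14.2.55); the function name and parameter values are illustrative assumptions:

```python
import numpy as np

def welch_psd(x, M, overlap=0.5):
    w = np.bartlett(M)                        # triangular data window
    U = np.mean(w ** 2)                       # window power normalization
    step = max(1, int(M * (1.0 - overlap)))
    starts = range(0, len(x) - M + 1, step)
    # Modified periodogram of each windowed segment, then the average
    P = [np.abs(np.fft.rfft(x[s:s + M] * w)) ** 2 / (M * U) for s in starts]
    return np.mean(P, axis=0)

# For unit-variance white noise the estimate should hover around Gamma_xx(f) = 1.
rng = np.random.default_rng(2)
print(welch_psd(rng.standard_normal(4096), M=64)[8])
```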
Blackman–Tukey power spectrum estimate. The mean and variance of this estimate are approximated as

$$E\left[P_{xx}^{BT}(f)\right] \approx \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W(f-\theta)\, d\theta$$
$$\operatorname{var}\left[P_{xx}^{BT}(f)\right] \approx \Gamma_{xx}^2(f)\,\frac{1}{N}\sum_{m=-(M-1)}^{M-1} w^2(m) \qquad (14.2.61)$$
where w(m) is the window sequence used to taper the estimated autocorrelation sequence. For the rectangular and Bartlett (triangular) windows we have

$$\frac{1}{N}\sum_{m=-(M-1)}^{M-1} w^2(m) = \begin{cases} \dfrac{2M}{N}, & \text{rectangular window} \\[2ex] \dfrac{2M}{3N}, & \text{triangular window} \end{cases} \qquad (14.2.62)$$
It is apparent from (14.2.61) that the mean value of the estimate is asymptotically unbiased. Its quality factor for the triangular window is

$$Q_{BT} = 1.5\,\frac{N}{M} \qquad (14.2.63)$$
Since the window length is $2M - 1$, the frequency resolution measured at the 3-dB points is

$$\Delta f = \frac{1.28}{2M} = \frac{0.64}{M} \qquad (14.2.64)$$
and hence

$$Q_{BT} = \frac{1.5}{0.64}\, N\, \Delta f = 2.34\, N\, \Delta f \qquad (14.2.65)$$
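A minimal Python sketch of the Blackman–Tukey procedure: form the biased autocorrelation estimate up to lag $M-1$, taper it with a triangular lag window (whose spectrum is nonnegative, so the resulting estimate is nonnegative, in line with (14.2.25)), and evaluate the transform as a cosine series; names and test values are our own:

```python
import numpy as np

def blackman_tukey_psd(x, M, nfreq=512):
    N = len(x)
    r = np.array([x[:N - m] @ x[m:] for m in range(M)]) / N   # biased r_xx(m)
    w = 1.0 - np.arange(M) / M                                # triangular lag window
    f = np.linspace(0.0, 0.5, nfreq)
    m = np.arange(1, M)
    # Real, even sequence: P(f) = r(0) + 2 sum_{m=1}^{M-1} r(m) w(m) cos(2*pi*f*m)
    return f, r[0] + 2.0 * np.cos(2.0 * np.pi * np.outer(f, m)) @ (r[1:] * w[1:])

rng = np.random.default_rng(3)
f, P = blackman_tukey_psd(rng.standard_normal(4096), M=64)
print(P.min() >= 0, P.mean())   # nonnegative estimate with mean near 1
```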
These results are summarized in Table 14.1. It is apparent from these results that the Welch and Blackman–Tukey power spectrum estimates are somewhat better than the Bartlett estimate. However, the differences in performance are relatively insignificant. The important point is that the quality factor increases with the data length N. This characteristic behavior is not shared by the periodogram estimate. Furthermore, the quality factor depends on the product of the data length N and the frequency resolution $\Delta f$. For a desired level of quality, $\Delta f$ can be decreased (the frequency resolution increased) by increasing the data length N, and vice versa.
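As a worked arithmetic comparison under assumed values $N = 4000$ samples and $\Delta f = 0.01$ (chosen only for illustration), the quality factors from (14.2.53), (14.2.60), and (14.2.65) are:

```python
N, df = 4000, 0.01
print("Bartlett:             Q_B  =", 1.1 * N * df)    # 44.0
print("Welch (50% overlap):  Q_W  =", 1.39 * N * df)   # 55.6
print("Blackman-Tukey:       Q_BT =", 2.34 * N * df)   # 93.6
```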