

12.1.1 Random Processes

Many physical phenomena encountered in nature are best characterized statistically. For example, meteorological phenomena such as air temperature and air pressure fluctuate randomly as a function of time. The thermal-noise voltages generated in the resistors of electronic devices, such as a radio or television receiver, are also randomly fluctuating phenomena. These are just a few examples of random signals. Such signals are usually modeled as signals of infinite duration and infinite energy.

Suppose that we take the set of waveforms corresponding to the air temperature in different cities around the world. For each city there is a corresponding waveform, which is a function of time, as illustrated in Figure 12.1.1. The set of all possible waveforms is called an ensemble of time functions or, equivalently, a random process. The temperature waveform for any particular city is a single realization or sample function of the random process.

Similarly, the thermal-noise voltage generated in a resistor is a single realization or sample function of the random process consisting of all the noise voltage waveforms generated by the set of all resistors.

Figure 12.1.1. Sample functions of a random process.


The ensemble of all possible noise waveforms of a random process is denoted X(t, S), where t represents the time index and S represents the set (sample space) of all possible sample functions. A single waveform in the ensemble is denoted by x(t, s). Usually, we drop the variable s (or S) for notational convenience, so that the random process is denoted X(t) and a single realization is denoted x(t).
Having defined a random process X(t) as an ensemble of sample functions, consider the values of the process at any set of time instants t1 > t2 > ··· > tn, where n is any positive integer. In general, the samples Xti ≡ X(ti), i = 1, 2, ..., n, are n random variables characterized statistically by their joint probability density function, denoted p(xt1, xt2, ..., xtn) for any n.

12.1.2 Stationary Random Processes

Suppose that we have n samples of the random process X(t) at t = ti, i = 1, 2, ..., n, and a second set of n samples displaced in time relative to the first set by an amount τ. Thus the second set of samples is Xti+τ ≡ X(ti + τ), i = 1, 2, ..., n, as shown in Figure 12.1.1. This second set of n random variables is characterized by its joint probability density function p(xt1+τ, ..., xtn+τ). The joint probability density functions of the two sets of random variables may or may not be identical. If they are identical, then

p(xt1, xt2, ..., xtn) = p(xt1+τ, xt2+τ, ..., xtn+τ)   (12.1.1)

for all τ and all n, and the random process is said to be stationary in the strict sense. In other words, the statistical properties of a stationary random process are invariant to a translation of the time axis. On the other hand, if the joint probability density functions differ, the random process is nonstationary.

12.1.3 Statistical Averages

Consider a random process X(t) sampled at the time instant t = ti. Then X(ti) is a random variable with probability density function p(xti). The lth moment of the random variable is defined as the expected value of X^l(ti), that is,

E(X^l_ti) = ∫_{−∞}^{∞} x^l_ti p(xti) dxti   (12.1.2)

In general, the value of the lth moment depends on the time instant ti if the probability density function of Xti depends on ti. When the process is stationary, however, p(xti+τ) = p(xti) for all τ. Hence the probability density function is independent of time and, consequently, the lth moment is independent of time (a constant).
Next, consider the two random variables Xti = X(ti), i = 1, 2, corresponding to samples of X(t) taken at t = t1 and t = t2. The statistical correlation between Xt1 and Xt2 is measured by the joint moment

E(Xt1 Xt2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xt1 xt2 p(xt1, xt2) dxt1 dxt2   (12.1.3)

Since the joint moment depends on the time instants t1 and t2, it is denoted γxx(t1, t2) and is called the autocorrelation function of the random process. When the process X(t) is stationary, the joint probability density function of the pair (Xt1, Xt2) is identical to the joint probability density function of the pair (Xt1+τ, Xt2+τ) for any arbitrary τ. This implies that the autocorrelation function of X(t) depends on the time difference t1 − t2 = τ. Hence, for a stationary real random process, the autocorrelation function is

γxx(τ) = E[Xt1+τ Xt1]   (12.1.4)

On the other hand,

γxx(−τ) = E(Xt1−τ Xt1) = E(Xt1 Xt1+τ) = γxx(τ)   (12.1.5)

Therefore, γxx(τ) is an even function. Note also that γxx(0) = E(X²t1) is the average power of the random process.
There exist nonstationary processes with the property that the mean value of the process is constant and the autocorrelation function satisfies γxx(t1, t2) = γxx(t1 − t2). Such a process is said to be wide-sense stationary. Clearly, wide-sense stationarity is a less stringent condition than strict-sense stationarity. In our treatment we shall require only that the processes be wide-sense stationary.
Related to the autocorrelation function is the autocovariance, defined as

cxx(t1, t2) = E{[Xt1 − m(t1)][Xt2 − m(t2)]} = γxx(t1, t2) − m(t1) m(t2)   (12.1.6)

where m(t1) = E(Xt1) and m(t2) = E(Xt2) are the mean values of Xt1 and Xt2, respectively. When the process is stationary,

cxx(t1, t2) = cxx(t1 − t2) = cxx(τ) = γxx(τ) − m²x   (12.1.7)

where τ = t1 − t2. Furthermore, the variance of the process is σ²x = cxx(0) = γxx(0) − m²x.
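The ensemble averages above can be approximated numerically by averaging across many independently generated realizations of a process. The short sketch below (the AR(1) model and all parameter values are illustrative assumptions, not part of the text) estimates the mean, the autocorrelation γxx(τ), and the autocovariance cxx(τ) of a wide-sense stationary process in exactly this way.

```python
import numpy as np

# Illustrative sketch: ensemble estimates of the mean, gamma_xx(tau) and c_xx(tau)
# obtained by averaging over many realizations of an assumed AR(1) process
# x(n) = a*x(n-1) + w(n), driven by unit-variance white Gaussian noise.
rng = np.random.default_rng(0)
num_realizations, num_samples, a = 5000, 200, 0.8

x = np.zeros((num_realizations, num_samples))
w = rng.normal(size=(num_realizations, num_samples))
for n in range(1, num_samples):
    x[:, n] = a * x[:, n - 1] + w[:, n]

t1 = 150                                   # reference instant, past the start-up transient
mean_est = x[:, t1].mean()                 # estimate of E[X_{t1}]
gamma_est = [np.mean(x[:, t1 + tau] * x[:, t1]) for tau in range(5)]   # gamma_xx(tau)
cov_est = [g - mean_est**2 for g in gamma_est]                         # c_xx(tau), Eq. (12.1.7)

# Theory for this example: mean = 0 and gamma_xx(tau) = a**tau / (1 - a**2).
print(round(mean_est, 3), [round(g, 3) for g in gamma_est])
```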

12.1.4 Statistical Averages for Joint Random Processes

Let X(t) and Y(t) be two random processes and let Xti ≡ X(ti), i = 1, 2, ..., n, and Yt'j ≡ Y(t'j), j = 1, 2, ..., m, denote the random variables at the time instants t1 > t2 > ··· > tn and t'1 > t'2 > ··· > t'm, respectively. The two sets of random variables are characterized statistically by the joint probability density function

p(xt1, xt2, ..., xtn, yt'1, yt'2, ..., yt'm)

for any set of time instants {ti} and {t'j} and for any positive integer values of m and n.
The cross-correlation function of X(t) and Y(t), denoted γxy(t1, t2), is defined as the joint moment

γxy(t1, t2) ≡ E(Xt1 Yt2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xt1 yt2 p(xt1, yt2) dxt1 dyt2   (12.1.8)

and the cross-covariance is

cxy(t1, t2) = γxy(t1, t2) − mx(t1) my(t2)   (12.1.9)

When the random processes are jointly and individually stationary, we have γxy(t1, t2) = γxy(t1 − t2) and cxy(t1, t2) = cxy(t1 − t2). In this case,

γxy(−τ) = E(Xt1 Yt1+τ) = E(Xt1−τ Yt1) = γyx(τ)   (12.1.10)

The random processes X(t) and Y(t) are said to be statistically independent if and only if

p(xt1, xt2, ..., xtn, yt'1, yt'2, ..., yt'm) = p(xt1, ..., xtn) p(yt'1, ..., yt'm)

for all choices of ti, t'j and for all positive integers n and m. The processes are said to be uncorrelated if

γxy(t1, t2) = E(Xt1) E(Yt2)   (12.1.11)

so that cxy(t1, t2) = 0.
A complex-valued random process Z(t) is defined as

Z(t) = X(t) + jY(t)   (12.1.12)

where X(t) and Y(t) are random processes. The joint probability density function of the complex-valued random variables Zti ≡ Z(ti), i = 1, 2, ..., n, is determined by the joint probability density function of the components (Xti, Yti), i = 1, 2, ..., n. Thus the probability density function that characterizes Zti, i = 1, 2, ..., n, is

p(xt1, xt2, ..., xtn, yt1, yt2, ..., ytn)

A complex-valued random process Z(t) is encountered in the representation of the in-phase and quadrature components of the lowpass equivalent of a narrowband random signal or noise. An important characteristic of such a process is its autocorrelation function, which is defined as

γzz(t1, t2) = E(Zt1 Z*t2)
           = E[(Xt1 + jYt1)(Xt2 − jYt2)]   (12.1.13)
           = γxx(t1, t2) + γyy(t1, t2) + j[γyx(t1, t2) − γxy(t1, t2)]

When the random processes X(t) and Y(t) are jointly and individually stationary, the autocorrelation function of Z(t) becomes

γzz(t1, t2) = γzz(t1 − t2) = γzz(τ)

where τ = t1 − t2. The complex conjugate of (12.1.13) is

γ*zz(τ) = E(Z*t1 Zt1−τ) = γzz(−τ)   (12.1.14)

Now, suppose that Z(t) = X(t) + jY(t) and W(t) = U(t) + jV(t) are two complex-valued random processes. Their cross-correlation function is defined as

γzw(t1, t2) = E(Zt1 W*t2)
            = E[(Xt1 + jYt1)(Ut2 − jVt2)]   (12.1.15)
            = γxu(t1, t2) + γyv(t1, t2) + j[γyu(t1, t2) − γxv(t1, t2)]

When X(t), Y(t), U(t), and V(t) are pairwise stationary, the cross-correlation functions in (12.1.15) become functions of the time difference τ = t1 − t2. In addition, we have

γ*zw(τ) = E(Z*t1 Wt1−τ) = E(Z*t1+τ Wt1) = γwz(−τ)   (12.1.16)
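As a numerical illustration (the signals and the delay below are assumptions made only for this example), the stationary cross-correlation γzw(m) of two complex sequences can be estimated by an ensemble average, and the location of its peak reveals the relative delay between the two processes.

```python
import numpy as np

# Sketch: ensemble estimate of the cross-correlation gamma_zw(m) = E[Z_t W*_{t-m}]
# for two jointly stationary complex processes.  Here W is an assumed delayed,
# noisy copy of Z, so the estimate should peak at m = -3.
rng = np.random.default_rng(1)
R, N, delay = 2000, 64, 3
z = (rng.normal(size=(R, N)) + 1j * rng.normal(size=(R, N))) / np.sqrt(2)
w = np.roll(z, delay, axis=1) + 0.1 * (rng.normal(size=(R, N)) + 1j * rng.normal(size=(R, N)))

def gamma_zw(m, t1=32):
    # stationarity implies the estimate does not depend on the reference instant t1
    return np.mean(z[:, t1] * np.conj(w[:, t1 - m]))

print({m: np.round(gamma_zw(m), 2) for m in range(-5, 2)})   # peak near m = -3
```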

12.1.5 Power Density Spectrum

A stationary random process is an infinite-energy signal, and hence its Fourier transform does not exist. The spectral characteristic of a random process is obtained, according to the Wiener–Khintchine theorem, by computing the Fourier transform of the autocorrelation function. That is, the distribution of power with frequency is given by the function

Γxx(F) = ∫_{−∞}^{∞} γxx(τ) e^{−j2πFτ} dτ   (12.1.17)

The inverse Fourier transform is given by

γxx(τ) = ∫_{−∞}^{∞} Γxx(F) e^{j2πFτ} dF   (12.1.18)

Observe that

γxx(0) = ∫_{−∞}^{∞} Γxx(F) dF = E(X²t) ≥ 0   (12.1.19)

Since E(X²t) = γxx(0) represents the average power of the random process, which is the area under Γxx(F), it follows that Γxx(F) is the distribution of power as a function of frequency. For this reason, Γxx(F) is called the power density spectrum of the random process.
If the random process is real, γxx(τ) is real and even, and hence Γxx(F) is real and even. If the random process is complex, γxx(τ) = γ*xx(−τ) and, hence,

Γ*xx(F) = ∫_{−∞}^{∞} γ*xx(τ) e^{j2πFτ} dτ = ∫_{−∞}^{∞} γ*xx(−τ) e^{−j2πFτ} dτ = ∫_{−∞}^{∞} γxx(τ) e^{−j2πFτ} dτ = Γxx(F)

Therefore, Γxx(F) is always real.

The definition of the power density spectrum can be extended to two jointly stationary random processes X(t) and Y(t), which have a cross-correlation function γxy(τ). The Fourier transform of γxy(τ),

Γxy(F) = ∫_{−∞}^{∞} γxy(τ) e^{−j2πFτ} dτ   (12.1.20)

is called the cross-power density spectrum. It is easily shown that Γ*xy(F) = Γyx(−F). For real random processes, the condition is Γyx(F) = Γxy(−F).

12.1.6 Discrete-Time Random Signals

This characterization of continuous-time random signals carries over easily to discrete-time signals. Such signals are usually obtained by uniformly sampling a continuous-time random process.
A discrete-time random process X(n) consists of an ensemble of sample sequences x(n). The statistical properties of X(n) are similar to the characterization of X(t), with the restriction that n is now an integer (time) variable. More specifically, we state here the form of the moments that are used in this text.
The lth moment of X(n) is defined as

E(X^l_n) = ∫_{−∞}^{∞} x^l_n p(xn) dxn   (12.1.21)

and the autocorrelation sequence is

γxx(n, k) = E(Xn Xk) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xn xk p(xn, xk) dxn dxk   (12.1.22)

Similarly, the autocovariance is

cxx(n, k) = γxx(n, k) − E(Xn) E(Xk)   (12.1.23)

For a stationary process, we have the special forms (m = n − k)

γxx(n − k) = γxx(m)
cxx(n − k) = cxx(m) = γxx(m) − m²x   (12.1.24)

where mx = E(Xn) is the mean of the random process. The variance is defined as σ² = cxx(0) = γxx(0) − m²x.
For a complex-valued stationary process Z(n) = X(n) + jY(n), we have

γzz(m) = γxx(m) + γyy(m) + j[γyx(m) − γxy(m)]   (12.1.25)

and the cross-correlation of two complex-valued sequences is

γzw(m) = γxu(m) + γyv(m) + j[γyu(m) − γxv(m)]   (12.1.26)

As in the case of a continuous-time random process, a discrete-time random process has infinite energy but a finite average power, which is given by

E(X²n) = γxx(0)   (12.1.27)

By use of the Wiener–Khintchine theorem, we obtain the power density spectrum of the discrete-time random process by computing the Fourier transform of the autocorrelation sequence γxx(m), that is,

Γxx(f) = Σ_{m=−∞}^{∞} γxx(m) e^{−j2πfm}   (12.1.28)

The inverse transform is

γxx(m) = ∫_{−1/2}^{1/2} Γxx(f) e^{j2πfm} df   (12.1.29)

Observe that the average power is

γxx(0) = ∫_{−1/2}^{1/2} Γxx(f) df   (12.1.30)

so that Γxx(f) is the distribution of power as a function of frequency; that is, Γxx(f) is the power density spectrum of the random process X(n). The properties established above for Γxx(F) also hold for Γxx(f).
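The discrete-time Wiener–Khintchine relation (12.1.28) can be checked numerically. The sketch below assumes a simple autocorrelation sequence γxx(m) = a^|m| (an illustrative choice, not taken from the text), evaluates the sum directly on a frequency grid, and compares the result with the known closed form for that particular sequence.

```python
import numpy as np

# Minimal sketch of Eq. (12.1.28): Gamma_xx(f) as the DTFT of an assumed
# autocorrelation sequence gamma_xx(m) = a**|m|, truncated at |m| <= 50.
a = 0.8
m = np.arange(-50, 51)
gamma = a ** np.abs(m)

f = np.linspace(-0.5, 0.5, 1001)
Gamma = np.array([np.sum(gamma * np.exp(-2j * np.pi * fk * m)) for fk in f]).real

# Closed form for this example: (1 - a**2) / (1 - 2 a cos(2 pi f) + a**2).
Gamma_theory = (1 - a**2) / (1 - 2 * a * np.cos(2 * np.pi * f) + a**2)
print(np.max(np.abs(Gamma - Gamma_theory)))   # small; only the truncation of the sum remains
```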

12.1.7 Time Averages for a Discrete-Time Random Process

Although we have characterized a random process in terms of statistical (ensemble) averages, such as the mean and the autocorrelation, in practice we usually have available only a single realization of the random process. Let us consider the problem of obtaining the averages of the random process from a single realization. For this to be possible, the random process must be ergodic.
By definition, a random process X(n) is ergodic if, with probability 1, all the statistical averages can be determined from a single sample function of the process. In effect, the random process is ergodic if the time averages obtained from a single realization are equal to the statistical (ensemble) averages. Under this condition we can attempt to estimate the ensemble averages using time averages from a single realization.
To illustrate this point, let us consider the estimation of the mean and the autocorrelation of the random process from a single realization x(n). Since we are interested only in these two moments, we define ergodicity with respect to these parameters. Detailed treatments of the requirements for mean ergodicity and autocorrelation ergodicity, which are given below, can be found in the book by Papoulis (1984).

12.1.8 Mean-Ergodic Processes

Given a stationary random process X(n) with mean

mx = E(Xn)

we define the time-average mean as

m̂x = (1/(2N+1)) Σ_{n=−N}^{N} x(n)   (12.1.31)

In general, we regard the value m̂x in (12.1.31) as an estimate of the statistical mean, whose value will vary with the different realizations of the random process. Since m̂x is a random variable with a probability density function p(m̂x), we compute the expected value of m̂x over all possible realizations of X(n). Since summation and expectation are linear operations, they can be interchanged, so that

E(m̂x) = (1/(2N+1)) Σ_{n=−N}^{N} E[x(n)] = (1/(2N+1)) Σ_{n=−N}^{N} mx = mx   (12.1.32)

Since the mean value of the estimate is equal to the statistical mean, we say that the estimate m̂x is unbiased.
Next, we compute the variance of m̂x. We have

var(m̂x) = E(|m̂x|²) − |mx|²

But

E(|m̂x|²) = (1/(2N+1)²) Σ_{n=−N}^{N} Σ_{k=−N}^{N} E[x*(n) x(k)]
          = (1/(2N+1)²) Σ_{n=−N}^{N} Σ_{k=−N}^{N} γxx(k − n)
          = (1/(2N+1)) Σ_{m=−2N}^{2N} (1 − |m|/(2N+1)) γxx(m)

Therefore,

var(m̂x) = (1/(2N+1)) Σ_{m=−2N}^{2N} (1 − |m|/(2N+1)) γxx(m) − |mx|²
         = (1/(2N+1)) Σ_{m=−2N}^{2N} (1 − |m|/(2N+1)) cxx(m)   (12.1.33)

If var(m̂x) → 0 as N → ∞, the estimate converges with probability 1 to the statistical mean mx. Therefore, the process X(n) is mean ergodic if

lim_{N→∞} (1/(2N+1)) Σ_{m=−2N}^{2N} (1 − |m|/(2N+1)) cxx(m) = 0   (12.1.34)

Under this condition, the estimate m̂x, in the limit as N → ∞, becomes equal to the statistical mean, that is,

mx = lim_{N→∞} (1/(2N+1)) Σ_{n=−N}^{N} x(n)   (12.1.35)

Thus the time-average mean, in the limit as N → ∞, is equal to the ensemble mean.

A sufficient condition for (12.1.34) is that

Σ_{m=−∞}^{∞} |cxx(m)| < ∞   (12.1.36)

which implies that cxx(m) → 0 as m → ∞. This condition holds for most zero-mean processes encountered in the physical world.
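A quick numerical illustration of mean ergodicity follows; the AR(1) model, its mean, and all parameter values are assumptions chosen only for the sketch. Time averages m̂x from single realizations cluster around the ensemble mean, and their variance shrinks as the averaging length grows, consistent with (12.1.33) and (12.1.34).

```python
import numpy as np

# Sketch: time-average mean estimates m_hat_x from many independent realizations
# of an assumed AR(1) process with ensemble mean m_x.  As the averaging length L
# grows, the spread of m_hat_x across realizations shrinks (mean ergodicity).
rng = np.random.default_rng(2)
a, m_x = 0.9, 1.0
R, N = 500, 4000

x = np.zeros((R, N))
w = rng.normal(size=(R, N))
for n in range(1, N):
    x[:, n] = a * x[:, n - 1] + w[:, n]
x += m_x                                   # zero-mean AR(1) fluctuation plus a constant mean

for L in (100, 1000, 4000):
    m_hat = x[:, :L].mean(axis=1)          # time average over the first L samples
    print(L, round(m_hat.mean(), 3), round(m_hat.var(), 4))   # variance decreases with L
```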

12.1.9 Correlation-Ergodic Processes

Now consider the estimate of the autocorrelation γxx(m) from a single realization of the process. Following our previous notation, we denote the estimate (for a complex-valued signal, in general) as

rxx(m) = (1/(2N+1)) Σ_{n=−N}^{N} x*(n) x(n + m)   (12.1.37)

Again, we regard rxx(m) as a random variable for any given m, since it is a function of the particular realization. The expected value (mean value over all realizations) is

E[rxx(m)] = (1/(2N+1)) Σ_{n=−N}^{N} E[x*(n) x(n + m)] = (1/(2N+1)) Σ_{n=−N}^{N} γxx(m) = γxx(m)   (12.1.38)

Therefore, the expected value of the time-average autocorrelation is equal to the statistical average, so we have an unbiased estimate of γxx(m).
To determine the variance of the estimate rxx(m), we compute the expected value of |rxx(m)|² and subtract the square of the mean value. Thus

var[rxx(m)] = E[|rxx(m)|²] − |γxx(m)|²   (12.1.39)

But

E[|rxx(m)|²] = (1/(2N+1)²) Σ_{n=−N}^{N} Σ_{k=−N}^{N} E[x*(n) x(n + m) x(k) x*(k + m)]   (12.1.40)

The expected value of the term x*(n) x(n + m) x(k) x*(k + m) is simply the autocorrelation of a random process defined as

vm(n) = x*(n) x(n + m)

Hence, (12.1.40) can be written as

E[|rxx(m)|²] = (1/(2N+1)²) Σ_{n=−N}^{N} Σ_{k=−N}^{N} γ^(m)_vv(n − k)
             = (1/(2N+1)) Σ_{n=−2N}^{2N} (1 − |n|/(2N+1)) γ^(m)_vv(n)   (12.1.41)

and the variance is

var[rxx(m)] = (1/(2N+1)) Σ_{n=−2N}^{2N} (1 − |n|/(2N+1)) γ^(m)_vv(n) − |γxx(m)|²   (12.1.42)

If var[rxx(m)] → 0 as N → ∞, the estimate rxx(m) converges with probability 1 to the statistical autocorrelation γxx(m). Under these conditions, the process is correlation ergodic and the time-average correlation is identical to the statistical average, that is,

lim_{N→∞} (1/(2N+1)) Σ_{n=−N}^{N} x*(n) x(n + m) = γxx(m)   (12.1.43)

In our treatment of random signals, we assume that the random processes are mean ergodic and correlation ergodic, so that we can work with time averages of the mean and the autocorrelation obtained from a single realization of the process.
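The time-average estimate (12.1.37) is easy to compute from one record. The sketch below (the AR(1) signal and its parameters are assumptions made for illustration) forms rxx(m) from a single long realization and compares it with the known autocorrelation of that model, as correlation ergodicity predicts.

```python
import numpy as np

# Sketch of the time-average autocorrelation estimate (12.1.37),
#   r_xx(m) = (1/(2N+1)) * sum_{n=-N}^{N} x*(n) x(n+m),
# computed from a single realization of an assumed zero-mean AR(1) process.
rng = np.random.default_rng(3)
a, Nhalf = 0.8, 20000
n_total = 2 * Nhalf + 1

x = np.zeros(n_total + 50)                 # extra samples so that x(n+m) exists for m <= 50
w = rng.normal(size=x.size)
for n in range(1, x.size):
    x[n] = a * x[n - 1] + w[n]

def r_xx(m):
    return np.mean(np.conj(x[:n_total]) * x[m:n_total + m])

for m in range(5):
    # theory for this model: gamma_xx(m) = a**m / (1 - a**2)
    print(m, round(r_xx(m), 3), round(a**m / (1 - a**2), 3))
```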

12.2 Innovations Representation of a Stationary Random Process

In this section we demonstrate that a wide-sense stationary random process can be represented as the output of a causal and causally invertible linear system excited by a white-noise process. The condition that the system be causally invertible also allows us to represent the wide-sense stationary random process by the output of the inverse system, which is a white-noise process.
Consider a wide-sense stationary process {x(n)} with autocorrelation {γxx(m)} and power spectral density Γxx(f), |f| ≤ 1/2. We assume that Γxx(f) is real and continuous for all |f| ≤ 1/2. The z-transform of the autocorrelation {γxx(m)} is

Γxx(z) = Σ_{m=−∞}^{∞} γxx(m) z^{−m}   (12.2.1)

from which we obtain the power spectral density by evaluating Γxx(z) on the unit circle [that is, by substituting z = exp(j2πf)].
Now, suppose that log Γxx(z) is analytic (possesses derivatives of all orders) in an annular region of the z-plane that includes the unit circle (that is, r1 < |z| < r2 with r1 < 1 and r2 > 1). Then log Γxx(z) can be expanded in a Laurent series of the form

log Γxx(z) = Σ_{m=−∞}^{∞} v(m) z^{−m}   (12.2.2)

where the {v(m)} are the series coefficients. We can interpret {v(m)} as the sequence whose z-transform is V(z) = log Γxx(z). Equivalently, we can evaluate log Γxx(z) on the unit circle,

log Γxx(f) = Σ_{m=−∞}^{∞} v(m) e^{−j2πfm}   (12.2.3)
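The coefficients {v(m)} in (12.2.2) and (12.2.3) are simply the inverse Fourier coefficients of log Γxx(f), that is, the cepstral coefficients of the power spectral density. The sketch below (the particular PSD and the FFT-based approximation are assumptions made for illustration) computes them numerically and checks them against the closed form available for that PSD.

```python
import numpy as np

# Hedged sketch of (12.2.2)-(12.2.3): v(m) are the inverse Fourier coefficients of
# log Gamma_xx(f), approximated by sampling log Gamma_xx(f) densely and using the
# inverse FFT (the real cepstrum of the assumed power spectral density).
a, K = 0.8, 4096
f = np.arange(K) / K                                               # frequencies on [0, 1)
Gamma = (1 - a**2) / (1 - 2 * a * np.cos(2 * np.pi * f) + a**2)    # assumed example PSD

v = np.fft.ifft(np.log(Gamma)).real                                # v(m), m = 0, 1, ..., K-1

# For this PSD, log Gamma_xx(z) = log(1-a^2) - log(1 - a z^{-1}) - log(1 - a z),
# so v(0) = log(1 - a^2) and v(m) = a**|m| / |m| for m != 0.
print(round(v[0], 4), round(np.log(1 - a**2), 4))
print(np.round(v[1:4], 4), [round(a**m / m, 4) for m in range(1, 4)])
```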

Random Signals

In this appendix, we collect and summarize a number of results and establish the notation
relating to the representation of random signals. We make no attempt here to provide a
detailed discussion of the difficult and subtle mathematical issues of the underlying the-
ory. Although our approach is not rigorous, we have summarized the important results
and the mathematical assumptions implicit in their derivation. Detailed presentations
of the theory of random signals are found in texts such as Davenport (1970), Papoulis
(1984), Gray and Davidson (2004), Kay (2006), and Bertsekas and Tsitsiklis (2008).

1 DISCRETE-TIME RANDOM PROCESSES

The fundamental concept in the mathematical representation of random signals is that
of a random process. In our discussion of random processes as models for discrete-time
signals, we assume that the reader is familiar with the basic concepts of probability, such
as random variables, probability distributions, and averages.
In using the random-process model in practical signal-processing applications, we
consider a particular sequence to be one of an ensemble of sample sequences. Given a
discrete-time signal, the structure, i.e., the underlying probability law, of the correspond-
ing random process is generally not known and must somehow be inferred. It may be
possible to make reasonable assumptions about the structure of the process, or it may
be possible to estimate the properties of a random-process representation from a finite
segment of a typical sample sequence.
Formally, a random process is an indexed family of random variables {xn } charac-
terized by a set of probability distribution functions that, in general, may be a function
of the index n. In using the concept of a random process as a model for discrete-time
signals, the index n is associated with the time index. In other words, each sample value
x[n] of a random signal is assumed to have resulted from a mechanism that is governed
by a probability law. An individual random variable xn is described by the probability
distribution function
Pxn (xn , n) = Probability [xn ≤ xn ], (1)
where xn denotes the random variable and xn is a particular value of xn .1 If xn takes
on a continuous range of values, it is equivalently specified by the probability density
function
pxn(xn, n) = ∂Pxn(xn, n) / ∂xn,   (2)
or the probability distribution function
 xn
Pxn (xn , n) = pxn (x, n)dx. (3)
−∞

The interdependence of two random variables xn and xm of a random process is
described by the joint probability distribution function
Pxn ,xm (xn , n, xm , m) = Probability [xn ≤ xn and xm ≤ xm ] (4)
and by the joint probability density
pxn,xm(xn, n, xm, m) = ∂²Pxn,xm(xn, n, xm, m) / (∂xn ∂xm).   (5)
Two random variables are statistically independent if knowledge of the value of
one does not affect the probability density of the other. If all the random variables of a
collection of random variables, {xn }, are statistically independent, then
Pxn,xm(xn, n, xm, m) = Pxn(xn, n) · Pxm(xm, m),   m ≠ n.   (6)
A complete characterization of a random process requires the specification of all
possible joint probability distributions. As we have indicated, these probability distribu-
tions may be a function of the time indices m and n. In the case where all the probability
distributions are independent of a shift of time origin, the random process is said to be
stationary. For example, the 2nd -order distribution of a stationary process satisfies
Pxn+k,xm+k(xn+k, n + k, xm+k, m + k) = Pxn,xm(xn, n, xm, m) for all k.   (7)
In many of the applications of discrete-time signal processing, random processes
serve as models for signals in the sense that a particular signal can be considered a sample
sequence of a random process. Although the details of such signals are unpredictable—
making a deterministic approach to signal representation inappropriate—certain
average properties of the ensemble can be determined, given the probability law of
the process. These average properties often serve as a useful, although incomplete,
characterization of such signals.

1 In this appendix, boldface type is used to denote the random variables and regular type denotes
dummy variables of probability functions.


2 AVERAGES

It is often useful to characterize a random variable by averages such as the mean and
variance. Since a random process is an indexed set of random variables, we may likewise
characterize the process by statistical averages of the random variables making up the
random process. Such averages are called ensemble averages. We begin the discussion
of averages with some definitions.

2.1 Definitions
The average, or mean, of a random process is defined as
mxn = E{xn} = ∫_{−∞}^{∞} x pxn(x, n) dx,   (8)
where E denotes an operator called mathematical expectation. In general, the mean
(expected value) may depend on n. In addition, if g(·) is a single-valued function, then
g(xn ) is a random variable, and the set of random variables {g(xn )} defines a new random
process. To compute averages of this new process, we can derive probability distributions
of the new random variables. Alternatively, it can be shown that

E{g(xn)} = ∫_{−∞}^{∞} g(x) pxn(x, n) dx.   (9)
If the random variables are discrete—i.e., if they have quantized values—the integrals
become summations over all possible values of the random variable. In that case E{g(x)}
has the form

E{g(xn)} = Σ_x g(x) p̂xn(x, n).   (10)
In cases where we are interested in the relationship between multiple random
processes, we must be concerned with multiple sets of random variables. For example,
for two sets of random variables, {xn } and {ym }, the expected value of a function of the
two random variables is defined as
E{g(xn, ym)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) pxn,ym(x, n, y, m) dx dy,   (11)

where pxn,ym(x, n, y, m) is the joint probability density of the random variables xn
and ym.
The mathematical expectation operator is a linear operator; that is, it can be shown
that

1. E{xn + ym } = E{xn } + E{ym }; i.e., the average of a sum is the sum of the averages.
2. E{axn } = aE{xn }; i.e., the average of a constant times xn is equal to the constant
times the average of xn .

In general, the average of a product of two random variables is not equal to the
product of the averages. When this property holds, however, the two random variables
are said to be linearly independent or uncorrelated. That is, xn and ym are linearly inde-
pendent or uncorrelated if
E{xn ym} = E{xn} · E{ym}.   (12)


It is easy to see from Eqs. (11) and (12) that a sufficient condition for linear indepen-
dence is
pxn ,ym (xn , n, ym , m) = pxn (xn , n) · pym (ym , m). (13)
However, Eq. (13) is a stronger statement of independence than Eq. (12). As previously
stated, random variables satisfying Eq. (13) are said to be statistically independent. If
Eq. (13) holds for all values of n and m, the random processes {xn } and {ym } are said to be
statistically independent. Statistically independent random processes are also linearly
independent; but the converse is not true: Linear independence does not imply statistical
independence.
It can be seen from Eqs. (9)–(11) that averages generally are functions of the time
index. For stationary processes, the mean is the same for all the random variables that
constitute the process; i.e., the mean of a stationary process is a constant, which we
denote simply mx .
In addition to the mean of a random process, as defined in Eq. (8), a number
of other averages are particularly important within the context of signal processing.
These are defined next. For notational convenience, we assume that the probability
distributions are continuous. Corresponding definitions for discrete random processes
can be obtained by applying Eq. (10).
The mean-square value of xn is the average of |xn |2 ; i.e.,
E{|xn|²} = mean square = ∫_{−∞}^{∞} |x|² pxn(x, n) dx.   (14)
The mean-square value is sometimes referred to as the average power.
The variance of xn is the mean-square value of [xn − mxn]; i.e.,
var[xn] = E{|xn − mxn|²} = σ²xn.   (15)
Since the average of a sum is the sum of the averages, it follows that Eq. (15) can be
written as
var[xn ] = E{|xn |2 } − |mxn |2 . (16)
In general, the mean-square value and the variance are functions of time; however, they
are constant for stationary processes.
The mean, mean square, and variance are simple averages that provide only a small
amount of information about a process. A more useful average is the autocorrelation
sequence, which is defined as
φxx[n, m] = E{xn x*m} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xn x*m pxn,xm(xn, n, xm, m) dxn dxm,   (17)
where ∗ denotes complex conjugation. The autocovariance sequence of a random pro-
cess is defined as
γxx [n, m] = E{(xn − mxn )(xm − mxm )∗ }, (18)
which can be written as
γxx [n, m] = φxx [n, m] − mxn m∗xm . (19)


Note that, in general, both the autocorrelation and autocovariance are two-dimensional
sequences, i.e., functions of two discrete variables.
The autocorrelation sequence is a measure of the dependence between values of
the random processes at different times. In this sense, it partially describes the time
variation of a random signal. A measure of the dependence between two different
random signals is obtained from the cross-correlation sequence. If {xn } and {ym } are
two random processes, their cross-correlation is
φxy[n, m] = E{xn y*m} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y* pxn,ym(x, n, y, m) dx dy,   (20)

where pxn,ym(x, n, y, m) is the joint probability density of xn and ym. The cross-covariance
function is defined as
γxy[n, m] = E{(xn − mxn)(ym − mym)*} = φxy[n, m] − mxn m*ym.   (21)
As we have pointed out, the statistical properties of a random process generally
vary with time. However, a stationary random process is characterized by an equilibrium
condition in which the statistical properties are invariant to a shift of time origin. This
means that the 1st -order probability distribution is independent of time. Similarly, all the
joint probability functions are also invariant to a shift of time origin; i.e., the 2nd -order
joint probability distributions depend only on the time difference (m − n). First-order
averages such as the mean and variance are independent of time; 2nd -order averages,
such as the autocorrelation φxx [n, m], are dependent on the time difference (m − n).
Thus, for a stationary process, we can write
mx = E{xn }, (22)
σx2 = E{|(xn − mx )|2 }, (23)
both independent of n. If we now denote the time difference by m, we have
φxx [n + m, n] = φxx [m] = E{xn+m x∗n }. (24)
That is,the autocorrelation sequence of a stationary random process is a one-dimensional
sequence, a function of the time difference m.
In many instances, we encounter random processes that are not stationary in the
strict sense—i.e., their probability distributions are not time invariant—but Eqs. (22)–
(24) still hold. Such random processes are said to be wide-sense stationary.

2.2 Time Averages


In a signal-processing context, the notion of an ensemble of signals is a convenient math-
ematical concept that allows us to use the theory of probability to represent the signals.
However, in a practical situation, we always have available at most a finite number
of finite-length sequences rather than an infinite ensemble of sequences. For example,
we might wish to infer the probability law or certain averages of the random-process
representation from measurements on a single member of the ensemble. When the
probability distributions are independent of time, intuition suggests that the amplitude


distribution (histogram) of a long segment of an individual sequence of samples should
be approximately equal to the single probability density that describes each of the ran-
dom variables of the random-process model. Similarly, the arithmetic average of a large
number of samples of a single sequence should be very close to the mean of the process.
To formalize these intuitive notions, we define the time average of a random process as

⟨xn⟩ = lim_{L→∞} (1/(2L+1)) Σ_{n=−L}^{L} xn.   (25)

Similarly, the time autocorrelation sequence is defined as

⟨xn+m x*n⟩ = lim_{L→∞} (1/(2L+1)) Σ_{n=−L}^{L} xn+m x*n.   (26)

It can be shown that the preceding limits exist if {xn } is a stationary process with finite
mean. As defined in Eqs. (25) and (26), these time averages are functions of an infinite
set of random variables and thus are properly viewed as random variables themselves.
However, under the condition known as ergodicity, the time averages in Eqs. (25) and
(26) are equal to constants in the sense that the time averages of almost all possible
sample sequences are equal to the same constant. Furthermore, they are equal to the
corresponding ensemble average.2 That is, for any single sample sequence {x[n]} for
−∞ < n < ∞,

⟨x[n]⟩ = lim_{L→∞} (1/(2L+1)) Σ_{n=−L}^{L} x[n] = E{xn} = mx   (27)

and
⟨x[n + m] x*[n]⟩ = lim_{L→∞} (1/(2L+1)) Σ_{n=−L}^{L} x[n + m] x*[n] = E{xn+m x*n} = φxx[m].   (28)

The time-average operator ⟨·⟩ has the same properties as the ensemble-average operator
E{·}. Thus, we generally do not distinguish between the random variable xn and its value
in a sample sequence, x[n]. For example, the expression E{x[n]} should be interpreted as
E{xn} = ⟨x[n]⟩. In general, for ergodic processes, time averages equal ensemble averages.
In practice, it is common to assume that a given sequence is a sample sequence of
an ergodic random process so that averages can be computed from a single sequence.
Of course, we generally cannot compute with the limits in Eqs. (27) and (28), but instead
the quantities

m̂x = (1/L) Σ_{n=0}^{L−1} x[n],   (29)

σ̂x² = (1/L) Σ_{n=0}^{L−1} |x[n] − m̂x|²,   (30)

² A more precise statement is that the random variables ⟨xn⟩ and ⟨xn+m x*n⟩ have means equal to mx
and φxx[m], respectively, and their variances are zero.


and
⟨x[n + m] x*[n]⟩_L = (1/L) Σ_{n=0}^{L−1} x[n + m] x*[n]   (31)

or similar quantities are often computed as estimates of the mean, variance, and autocor-
relation. m̂x and σ̂x2 are referred to as the sample mean and sample variance, respectively.
The estimation of averages of a random process from a finite segment of data is a prob-
lem of statistics.
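In practice these finite-record estimates are one-liners. The sketch below (the synthetic record is an assumption made for illustration) computes the sample mean of Eq. (29), the sample variance of Eq. (30), and the time autocorrelation estimate of Eq. (31) from a single length-L sequence.

```python
import numpy as np

# Sketch: sample mean, sample variance, and the time autocorrelation estimate
# of Eqs. (29)-(31) from one finite record of an assumed white-noise sequence.
rng = np.random.default_rng(4)
L = 10000
x = rng.normal(loc=0.5, scale=2.0, size=L)        # example record (mean 0.5, variance 4)

m_hat = x.mean()                                  # Eq. (29), sample mean
var_hat = np.mean(np.abs(x - m_hat) ** 2)         # Eq. (30), sample variance

def phi_hat(m):
    # Eq. (31) with the sum truncated to the L-m products available in a finite record
    return np.sum(x[m:] * np.conj(x[:L - m])) / L

print(round(m_hat, 3), round(var_hat, 3), [round(phi_hat(m), 3) for m in range(3)])
```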

3 PROPERTIES OF CORRELATION AND COVARIANCE SEQUENCES OF STATIONARY PROCESSES

Several useful properties of correlation and covariance functions follow in a straightforward way from the definitions. These properties are given in this section.
Consider two real stationary random processes {xn } and {yn } with autocorrelation,
autocovariance, cross-correlation, and cross-covariance being given, respectively, by
φxx [m] = E{xn+m x∗n }, (32)
γxx [m] = E{(xn+m − mx )(xn − mx )∗ }, (33)
φxy [m] = E{xn+m y∗n }, (34)
γxy [m] = E{(xn+m − mx )(yn − my )∗ }, (35)
where mx and my are the means of the two processes. The following properties are easily
derived by simple manipulations of the definitions:
Property 1
γxx [m] = φxx [m] − |mx |2 , (36a)
γxy [m] = φxy [m] − mx m∗y . (36b)
These results follow directly from Eqs. (19) and (21), and they indicate that the corre-
lation and covariance sequences are identical for zero-mean processes.
Property 2
φxx [0] = E[|xn |2 ] = Mean-square value, (37a)
γxx [0] = σx2 = Variance. (37b)

Property 3

φxx [−m] = φxx [m], (38a)

γxx [−m] = γxx [m], (38b)

φxy [−m] = φyx [m], (38c)

γxy [−m] = γyx [m]. (38d)


Property 4
|φxy [m]|2 ≤ φxx [0]φyy [0], (39a)
|γxy [m]|2 ≤ γxx [0]γyy [0]. (39b)
In particular,
|φxx [m]| ≤ φxx [0], (40a)
|γxx [m]| ≤ γxx [0]. (40b)

Property 5. If yn = xn−n0 , then


φyy [m] = φxx [m], (41a)
γyy [m] = γxx [m]. (41b)

Property 6. For many random processes, the random variables become uncorrelated as they become more separated in time. If this is true,

lim_{m→∞} γxx[m] = 0,   (42a)
lim_{m→∞} φxx[m] = |mx|²,   (42b)
lim_{m→∞} γxy[m] = 0,   (42c)
lim_{m→∞} φxy[m] = mx m*y.   (42d)
The essence of these results is that the correlation and covariance are finite-energy
sequences that tend to die out for large values of m. Thus, it is often possible to represent
these sequences in terms of their Fourier transforms or z-transforms.
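Properties 3 and 4 are easy to confirm numerically for a concrete example. The following sketch (the correlated test sequence is an assumption, not from the text) estimates φxx[m] at positive and negative lags from one long realization and checks the symmetry and the bound |φxx[m]| ≤ φxx[0].

```python
import numpy as np

# Sketch: numerical check of phi_xx[-m] = phi_xx[m] (Property 3) and
# |phi_xx[m]| <= phi_xx[0] (Property 4) for an assumed real stationary sequence.
rng = np.random.default_rng(5)
x = np.convolve(rng.normal(size=200_000), [1.0, 0.5, 0.25], mode="same")

def phi(m):
    # time-average estimate of E{x[n+m] x[n]} at (possibly negative) lag m
    if m >= 0:
        return np.mean(x[m:] * x[:x.size - m])
    return np.mean(x[:x.size + m] * x[-m:])

print(round(phi(2), 4), round(phi(-2), 4))               # Property 3: equal
print(all(abs(phi(m)) <= phi(0) for m in range(1, 10)))  # Property 4: True
```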

4 FOURIER TRANSFORM REPRESENTATION OF RANDOM SIGNALS

Although the Fourier transform of a random signal does not exist except in a generalized
sense, the autocovariance and autocorrelation sequences of such a signal are aperiodic
sequences for which the transform does exist. The spectral representation of the cor-
relation functions plays an important role in describing the input–output relations for
a linear time-invariant system when the input is a random signal. Therefore, it is of
interest to consider the properties of correlation and covariance sequences and their
corresponding Fourier and z-transforms.
We define Φxx(e^jω), Γxx(e^jω), Φxy(e^jω), and Γxy(e^jω) as the DTFTs of φxx[m], γxx[m],
φxy[m], and γxy[m], respectively. Since these functions are all DTFTs of sequences, they
must be periodic with period 2π. From Eqs. (36a) and (36b), it follows that, over one
period |ω| ≤ π,

Φxx(e^jω) = Γxx(e^jω) + 2π|mx|² δ(ω),   |ω| ≤ π,   (43a)

and

Φxy(e^jω) = Γxy(e^jω) + 2π mx m*y δ(ω),   |ω| ≤ π.   (43b)


In the case of zero-mean processes (mx = 0 and my = 0), the correlation and covariance
functions are identical so that Φxx(e^jω) = Γxx(e^jω) and Φxy(e^jω) = Γxy(e^jω).
From the inverse Fourier transform equation, it follows that
γxx[m] = (1/2π) ∫_{−π}^{π} Γxx(e^jω) e^{jωm} dω,   (44a)

φxx[m] = (1/2π) ∫_{−π}^{π} Φxx(e^jω) e^{jωm} dω,   (44b)

and, consequently,

E{|x[n]|²} = φxx[0] = (1/2π) ∫_{−π}^{π} Φxx(e^jω) dω,   (45a)

σx² = γxx[0] = (1/2π) ∫_{−π}^{π} Γxx(e^jω) dω.   (45b)

Sometimes it is notationally convenient to define the quantity

Pxx(ω) = Φxx(e^jω),   (46)

in which case Eqs. (45a) and (45b) are expressed as

E{|x[n]|²} = (1/2π) ∫_{−π}^{π} Pxx(ω) dω,   (47a)

σx² = (1/2π) ∫_{−π}^{π} Pxx(ω) dω.   (47b)

Thus, the area under Pxx (ω) for −π ≤ ω ≤ π is proportional to the average power in the
signal. In fact, the integral of Pxx (ω) over a band of frequencies is proportional to the
power in the signal in that band. For this reason, the function Pxx (ω) is called the power
density spectrum, or simply, the power spectrum. When Pxx (ω) is a constant independent
of ω, the random process is referred to as a white-noise process, or simply, white noise.
When Pxx (ω) is constant over a band and zero otherwise, we refer to it as bandlimited
white noise.
From Eq. (38a), it can be shown that Pxx(ω) = P*xx(ω); i.e., Pxx(ω) is always real
valued. Furthermore, for real random processes, φxx[m] = φxx[−m], so in the real case,
Pxx(ω) is both real and even; i.e.,

Pxx (ω) = Pxx (−ω). (48)

An additional important property is that the power density spectrum is nonnegative;


i.e., Pxx (ω) ≥ 0 for all ω.
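As a concrete check (the white-noise example and the block-averaged estimator below are assumptions, anticipating the periodogram discussion later), the power density spectrum of zero-mean white noise is flat at the level σx², and by Eq. (47b) the average of its samples over one period recovers the variance.

```python
import numpy as np

# Sketch: for zero-mean white noise, Pxx(omega) is flat at sigma_x^2, and the
# average of its samples over one period equals the variance, as in Eq. (47b).
rng = np.random.default_rng(6)
sigma2 = 1.5
x = rng.normal(scale=np.sqrt(sigma2), size=1_000_000)

N = 1024
blocks = x[: (x.size // N) * N].reshape(-1, N)
Pxx = np.mean(np.abs(np.fft.fft(blocks, axis=1)) ** 2, axis=0) / N   # averaged periodograms

print(round(Pxx.mean(), 3), round(x.var(), 3), sigma2)   # all approximately equal
print(round(Pxx.std() / Pxx.mean(), 3))                  # small relative ripple: flat spectrum
```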
The cross power density spectrum is defined as

Pxy(ω) = Φxy(e^jω).   (49)

Fourier Analysis of Signals Using the Discrete Fourier Transform

0 INTRODUCTION

Because the discrete Fourier transform (DFT) can be computed efficiently, it plays a
central role in a wide variety of signal-processing applications, including filtering and
spectrum analysis. In this chapter, we take an introductory look at Fourier analysis of
signals using the DFT.
In applications and algorithms based on explicit evaluation of the Fourier trans-
form, it is ideally the discrete-time Fourier transform (DTFT) that is desired, although
it is the DFT that can actually be computed. For finite-length signals, the DFT provides
frequency-domain samples of the DTFT, and the implications of this sampling must be
clearly understood and accounted for. For example, in linear filtering or convolution
implemented by multiplying DFTs rather than DTFTs, a circular convolution is imple-
mented, and special care must be taken to ensure that the results will be equivalent to a
linear convolution. In addition, in many filtering and spectrum analysis applications, the
signals do not inherently have finite length. As we will discuss, this inconsistency between
the finite-length requirement of the DFT and the reality of indefinitely long signals can
be accommodated exactly or approximately through the concepts of windowing, block
processing, and the time-dependent Fourier transform.

Figure 1 Processing steps in the discrete-time Fourier analysis of a continuous-time signal. (The system comprises an antialiasing lowpass filter Haa(jΩ) applied to sc(t), continuous-to-discrete-time conversion of xc(t) to x[n], multiplication by a window w[n], and the DFT producing V[k].)

1 FOURIER ANALYSIS OF SIGNALS USING THE DFT

One of the major applications of the DFT is in analyzing the frequency content of
continuous-time signals. For example, as we describe in Section 4.1, in speech analysis
and processing, frequency analysis is particularly useful in identifying and modeling the
resonances of the vocal cavity. Another example, introduced in Section 4.2, is Doppler
radar, in which the velocity of a target is represented by the frequency shift between
the transmitted and received signals.
The basic steps in applying the DFT to continuous-time signals are indicated in
Figure 1. The antialiasing filter is incorporated to eliminate or minimize the effect of
aliasing when the continuous-time signal is converted to a sequence by sampling. The
need for multiplication of x[n] by w[n], i.e., windowing, is a consequence of the finite-
length requirement of the DFT. In many cases of practical interest, sc (t) and, conse-
quently, x[n] are very long or even indefinitely long signals (such as with speech or
music). Therefore, a finite-duration window w[n] is applied to x[n] prior to computation
of the DFT. Figure 2 illustrates the Fourier transforms of the signals in Figure 1. Fig-
ure 2(a) shows a continuous-time spectrum that tapers off at high frequencies but is not
bandlimited. It also indicates the presence of some narrowband signal energy, repre-
sented by the narrow peaks. The frequency response of an antialiasing filter is illustrated
in Figure 2(b). As indicated in Figure 2(c), the resulting continuous-time Fourier trans-
form Xc(jΩ) contains little useful information about Sc(jΩ) for frequencies above the
cutoff frequency of the filter. Since Haa(jΩ) cannot be ideal, the Fourier components of
the input in the passband and the transition band also will be modified by the frequency
response of the filter.
The conversion of xc(t) to the sequence of samples x[n] is represented in the fre-
quency domain by periodic replication, frequency normalization, and amplitude scaling;
i.e.,

X(e^jω) = (1/T) Σ_{r=−∞}^{∞} Xc(j(ω/T) + j(2πr/T)).   (1)

This is illustrated in Figure 2(d). In a practical implementation, the antialiasing filter can-
not have infinite attenuation in the stopband. Therefore, some nonzero overlap of the
terms in Eq. (1), i.e., aliasing, can be expected; however, this source of error can be made
negligibly small either with a high-quality continuous-time filter or through the use of
initial oversampling followed by more effective discrete-time lowpass filtering and deci-
mation. If x[n] is a digital signal,so thatA/D conversion is incorporated in the second sys-
tem in Figure 1, then quantization error is also introduced. This error can be modeled as a


Figure 2 Illustration of the Fourier transforms of the system of Figure 1. (a) Fourier
transform of continuous-time input signal. (b) Frequency response of antialiasing
filter. (c) Fourier transform of output of antialiasing filter. (d) Fourier transform of
sampled signal. (e) Fourier transform of window sequence. (f) Fourier transform
of windowed signal segment and frequency samples obtained using DFT samples.


noise sequence added to x[n]. The noise can be made negligible through the use of
fine-grained quantization.
The sequence x[n] is typically multiplied by a finite-duration window w[n], since
the input to the DFT must be of finite duration. This produces the finite-length sequence
v[n] = w[n]x[n]. The effect in the frequency domain is a periodic convolution, i.e.,
V(e^jω) = (1/2π) ∫_{−π}^{π} X(e^jθ) W(e^j(ω−θ)) dθ.   (2)
Figure 2(e) illustrates the Fourier transform of a typical window sequence. Note that
the main lobe is assumed to be concentrated around ω = 0, and, in this illustration,
the side lobes are very small, suggesting that the window tapers at its edges. Properties
of windows such as the Bartlett, Hamming, Hanning, Blackman, and Kaiser windows
are discussed in Section 2. At this point, it is sufficient to observe that convolution of
W(ejω ) with X (ejω ) will tend to smooth sharp peaks and discontinuities in X (ejω ). This
is depicted by the continuous curve plotted in Figure 2(f).
The final operation in Figure 1 is the computation of the DFT. The DFT of the
windowed sequence v[n] = w[n]x[n] is

V[k] = Σ_{n=0}^{N−1} v[n] e^{−j(2π/N)kn},   k = 0, 1, ..., N − 1,   (3)

where we assume that the window length L is less than or equal to the DFT length N.
V [k], the DFT of the finite-length sequence v[n], corresponds to equally spaced samples
of the DTFT of v[n]; i.e.,

V[k] = V(e^jω)|_{ω=2πk/N}.   (4)

Figure 2(f) also shows V [k] as the samples of V(ejω ). Since the spacing between DFT fre-
quencies is 2π/N, and the relationship between the normalized discrete-time frequency
variable and the continuous-time frequency variable is ω = ΩT, the DFT frequencies
correspond to the continuous-time frequencies

Ωk = 2πk/(NT).   (5)
The use of this relationship between continuous-time frequencies and DFT frequencies
is illustrated by Examples 1 and 2.
Example 1 Fourier Analysis Using the DFT
Consider a bandlimited continuous-time signal xc(t) such that Xc(jΩ) = 0 for |Ω| ≥
2π(2500). We wish to use the system of Figure 1 to estimate the continuous-time spec-
trum Xc(jΩ). Assume that the antialiasing filter Haa(jΩ) is ideal, and the sampling
rate for the C/D converter is 1/T = 5000 samples/s. If we want the DFT samples V[k]
to be equivalent to samples of Xc(jΩ) that are at most 2π(10) rad/s or 10 Hz apart,
what is the minimum value that we should use for the DFT size N?
From Eq. (5), we see that adjacent samples in the DFT correspond to
continuous-time frequencies separated by 2π/(NT). Therefore, we require that

2π/(NT) ≤ 20π,

which implies that

N ≥ 500

satisfies the condition. If we wish to use a radix-2 FFT algorithm to compute the DFT
in Figure 1, we would choose N = 512 for an equivalent continuous-time frequency
spacing of ΔΩ = 2π(5000/512) = 2π(9.77) rad/s.

Example 2 Relationship Between DFT Values


Consider the problem posed in Example 1, in which 1/T = 5000, N = 512, and xc (t)
is real-valued and is sufficiently bandlimited to avoid aliasing with the given sampling
rate. If it is determined that V [11] = 2000(1 + j), what can be said about other values
of V[k] or about Xc(jΩ)?
Referring to the symmetry properties of the DFT, V [k] = V ∗ [((−k))N ], k =
0, 1, . . . , N − 1, and consequently, V [N − k] = V ∗ [k], so it follows in this case that
V [512 − 11] = V [501] = V ∗ [11] = 2000(1 − j).
We also know that the DFT sample k = 11 corresponds to the continuous-time fre-
quency Ω11 = 2π(11)(5000)/512 = 2π(107.4), and similarly, k = 501 corresponds to
the frequency −2π(11)(5000)/512 = −2π(107.4). Although windowing smooths the
spectrum, we can say that
Xc(jΩ11) = Xc(j2π(107.4)) ≈ T · V[11] = 0.4(1 + j).
Note that the factor T is required to compensate for the factor 1/T introduced by
sampling, as in Eq. (1). We can again exploit symmetry to conclude that
Xc(−jΩ11) = Xc(−j2π(107.4)) ≈ T · V*[11] = 0.4(1 − j).
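The frequency bookkeeping in these two examples can be reproduced in a few lines. The sketch below (the test sinusoid is an assumption made for illustration) evaluates Eq. (5) for k = 11 with the stated N and T, and verifies the conjugate-symmetry relation V[N − k] = V*[k] used in Example 2.

```python
import numpy as np

# Sketch of the relationships used in Examples 1 and 2:
# Omega_k = 2*pi*k/(N*T) from Eq. (5), and V[N-k] = V*[k] for a real input.
T, N = 1 / 5000.0, 512
k = 11
f_k = k / (N * T)                                    # continuous-time frequency in Hz
print(round(f_k, 1))                                 # ~107.4 Hz, as in Example 2

x = np.cos(2 * np.pi * 107.4 * np.arange(N) * T)     # assumed real sampled signal
V = np.fft.fft(x)
print(np.allclose(V[N - k], np.conj(V[k])))          # symmetry V[N-k] = V*[k] holds
```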

Many commercial real-time spectrum analyzers are based on the principles em-
bodied in Figures 1 and 2. It should be clear from the preceding discussion, however,
that numerous factors affect the interpretation of the DFT of a windowed segment of the
sampled signal in terms of the continuous-time Fourier transform of the original input
sc (t). To accommodate and mitigate the effects of these factors, care must be taken in fil-
tering and sampling the input signal. Furthermore, to interpret the results correctly, the
effects of the time-domain windowing and of the frequency-domain sampling inherent
in the DFT must be clearly understood. For the remainder of the discussion, we will as-
sume that the issues of antialiasing filtering and continuous-to-discrete-time conversion
have been satisfactorily handled and are negligible. In the next section, we concentrate
specifically on the effects of windowing and of the frequency-domain sampling imposed
by the DFT. We choose sinusoidal signals as the specific class of examples to discuss,
because sinusoids are perfectly bandlimited and they are easily computed. However,
most of the issues raised by the examples apply more generally.


Figure 23 Illustration of time-dependent Fourier analysis of Doppler radar signal. (a) Sequence of short-time Fourier transforms of the Doppler radar signal (radian frequency versus time). (b) Doppler frequency estimated by picking the largest peak in the time-dependent Fourier transform.

5 FOURIER ANALYSIS OF STATIONARY RANDOM SIGNALS: THE PERIODOGRAM

In previous sections, we discussed and illustrated Fourier analysis for sinusoidal signals
with stationary (nontime-varying) parameters and for nonstationary signals such as
speech and radar. In cases where the signal can be modeled by a sum of sinusoids or a
linear system excited by a periodic pulse train, the Fourier transforms of finite-length
segments of the signal have a convenient and natural interpretation in terms of Fourier
transforms, windowing, and linear system theory. However, more noise-like signals, such
as the example of unvoiced speech in Section 4.1, are best modeled as random signals.
Random processes are often used to model signals when the process that generates
the signal is too complex for a reasonable deterministic model. Typically, when the input
to an LTI system is modeled as a stationary random process, many of the essential
characteristics of the input and output are adequately represented by averages, such
as the mean value (dc level), variance (average power), autocorrelation function, or
power density spectrum. Consequently, it is of particular interest to estimate these for
a given signal. An estimate of the mean value of a stationary random process from a


finite-length segment of data is the sample mean, defined as

m̂x = (1/L) Σ_{n=0}^{L−1} x[n].   (63)

Similarly, an estimate of the variance is the sample variance, defined as

σ̂x² = (1/L) Σ_{n=0}^{L−1} (x[n] − m̂x)².   (64)

The sample mean and the sample variance, which are themselves random variables, are
unbiased and asymptotically unbiased estimators, respectively; i.e., the expected value
of m̂x is the true mean mx and the expected value of σ̂x2 approaches the true variance σx2
as L approaches ∞. Furthermore, they are both consistent estimators; i.e., they improve
with increasing L, since their variances approach zero as L approaches ∞.
In the remainder of this chapter, we study the estimation of the power spectrum9
of a random signal using the DFT. We will see that there are two basic approaches
to estimating the power spectrum. One approach, which we develop in this section, is
referred to as periodogram analysis and is based on direct Fourier transformation of
finite-length segments of the signal. The second approach, developed in Section 6, is to
first estimate the autocovariance sequence and then compute the Fourier transform of
this estimate. In either case, we are typically interested in obtaining unbiased consistent
estimators. Unfortunately, the analysis of such estimators is very difficult, and generally,
only approximate analyses can be accomplished. Even approximate analyses are beyond
the scope of this text, and we refer to the results of such analyses only in a qualitative
way. Detailed discussions are given in Blackman and Tukey (1958), Hannan (1960),
Jenkins and Watts (1968), Koopmans (1995), Kay and Marple (1981), Marple (1987),
Kay (1988) and Stoica and Moses (2005).

5.1 The Periodogram


Let us consider the problem of estimating the power density spectrum Pss(Ω) of a
continuous-time signal sc (t). An intuitive approach to the estimation of the power spec-
trum is suggested by Figure 1 and the associated discussion in Section 1. Based on that
approach, we now assume that the input signal sc (t) is a stationary random signal. The
antialiasing lowpass filter creates a new stationary random signal whose power spectrum
is bandlimited, so that the signal can be sampled without aliasing. Then, x[n] is a station-
ary discrete-time random signal whose power density spectrum Pxx (ω) is proportional
to Pss(Ω) over the bandwidth of the antialiasing filter; i.e.,

Pxx(ω) = (1/T) Pss(ω/T),   |ω| < π,   (65)
where we have assumed that the cutoff frequency of the antialiasing filter is π/T and
that T is the sampling period. (See Problem 39 for a further consideration of sampling of

9The term power spectrum is commonly used interchangeably with the more precise term power density
spectrum.


random signals.) Consequently, a good estimate of Pxx (ω) will provide a useful estimate
of Pss(Ω). The window w[n] in Figure 1 selects a finite-length segment (L samples) of
x[n], which we denote v[n], the Fourier transform of which is

V(e^jω) = Σ_{n=0}^{L−1} w[n] x[n] e^{−jωn}.   (66)

Consider as an estimate of the power spectrum the quantity


I(ω) = (1/(LU)) |V(e^jω)|²,   (67)
where the constant U anticipates a need for normalization to remove bias in the spectrum
estimate. When the window w[n] is the rectangular window sequence, this estimator for
the power spectrum is called the periodogram. If the window is not rectangular, I(ω)
is called the modified periodogram. Clearly, the periodogram has some of the basic
properties of the power spectrum. It is nonnegative, and for real signals, it is a real and
even function of frequency. Furthermore, it can be shown (Problem 33) that

I(ω) = (1/(LU)) Σ_{m=−(L−1)}^{L−1} cvv[m] e^{−jωm},   (68)

where

cvv[m] = Σ_{n=0}^{L−1} x[n] w[n] x[n + m] w[n + m].   (69)

We note that the sequence cvv [m] is the aperiodic correlation sequence for the finite-
length sequence v[n] = w[n]x[n]. Consequently, the periodogram is in fact the Fourier
transform of the aperiodic correlation of the windowed data sequence.
Explicit computation of the periodogram can be carried out only at discrete fre-
quencies. From Eqs. (66) and (67), we see that if the DTFT of w[n]x[n] is replaced by its
DFT, we will obtain samples at the DFT frequencies ωk = 2πk/N for k = 0, 1, . . . , N −1.
Specifically, samples of the periodogram are given by
I[k] = I(ωk) = (1/(LU)) |V[k]|²,        (70)
where V[k] is the N-point DFT of w[n]x[n]. If N is chosen to be greater than the
window length L, appropriate zero-padding is applied to the sequence w[n]x[n].
If a random signal has a nonzero mean, its power spectrum has an impulse at
zero frequency. If the mean is relatively large, this component will dominate the spec-
trum estimate, causing low-amplitude, low-frequency components to be obscured by
leakage. Therefore, in practice the mean is often estimated using Eq. (63), and the
resulting estimate is subtracted from the random signal before computing the power
spectrum estimate. Although the sample mean is only an approximate estimate of the
zero-frequency component, subtracting it from the signal often leads to better estimates
at neighboring frequencies.
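As a concrete illustration of Eqs. (66), (67), and (70), the following NumPy sketch (added here for illustration; the function name, window length, and data are our own choices, not part of the text) computes samples of the modified periodogram at the DFT frequencies, subtracting the sample mean before windowing as just suggested.

    import numpy as np

    def periodogram_samples(x, window, nfft):
        # Samples I[k] of the (modified) periodogram, Eq. (70), at the N DFT frequencies.
        # x      : length-L data segment (one record of the random signal)
        # window : length-L data window w[n]; all ones gives the ordinary periodogram
        # nfft   : DFT length N >= L; w[n]x[n] is zero-padded up to N
        L = len(window)
        x = np.asarray(x[:L], dtype=float)
        x = x - x.mean()                      # subtract the sample mean (cf. Eq. (63))
        v = window * x                        # v[n] = w[n] x[n], Eq. (66)
        U = np.sum(window ** 2) / L           # normalizing constant (Eq. (79), discussed below)
        V = np.fft.fft(v, nfft)               # N-point DFT of w[n] x[n]
        return np.abs(V) ** 2 / (L * U)       # I[k] = |V[k]|^2 / (L U), Eq. (70)

    # Example: length-256 rectangular window, N = 1024, pseudorandom white-noise data
    rng = np.random.default_rng(0)
    x = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=256)   # unit-variance white noise
    I = periodogram_samples(x, np.ones(256), nfft=1024)      # plot I[k] for 0 <= k <= N/2

Note that zero-padding to N > L only interpolates samples of the same underlying DTFT; it does not change the resolution or the statistical properties of the estimate.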


5.2 Properties of the Periodogram


The nature of the periodogram estimate of the power spectrum can be determined by
recognizing that, for each value of ω, I(ω) is a random variable. By computing the mean
and variance of I(ω), we can determine whether the estimate is biased and whether it
is consistent.
From Eq. (68), the expected value of I(ω) is
E{I(ω)} = (1/(LU)) Σ_{m=−(L−1)}^{L−1} E{cvv[m]} e^{−jωm}.        (71)

The expected value of cvv [m] can be expressed as



E{cvv[m]} = Σ_{n=0}^{L−1} E{x[n] w[n] x[n + m] w[n + m]}
                                                                   (72)
          = Σ_{n=0}^{L−1} w[n] w[n + m] E{x[n] x[n + m]}.
Since we are assuming that x[n] is stationary,
E{x[n]x[n + m]} = φxx [m], (73)
and Eq. (72) can then be rewritten as
E{cvv [m]} = cww [m]φxx [m], (74)
where cww [m] is the aperiodic autocorrelation of the window, i.e.,

cww[m] = Σ_{n=0}^{L−1} w[n] w[n + m].        (75)
That is, the mean of the aperiodic autocorrelation of the windowed signal is equal to the
aperiodic autocorrelation of the window multiplied by the true autocorrelation function;
i.e., in an average sense, the autocorrelation function of the data window appears as a
window on the true autocorrelation function.
From Eq. (71), Eq. (74), and the modulation–windowing property of Fourier trans-
forms, it follows that

E{I(ω)} = (1/(2πLU)) ∫_{−π}^{π} Pxx(θ) Cww(e^{j(ω−θ)}) dθ,        (76)
where Cww (ejω ) is the Fourier transform of the aperiodic autocorrelation of the window,
i.e.,
Cww (ejω ) = |W(ejω )|2 . (77)
According to Eq. (76), both the periodogram and the modified periodogram are
biased estimates of the power spectrum, since E{I(ω)} is not equal to Pxx (ω). Indeed, we
see that the bias arises as a result of convolution of the true power spectrum with the
Fourier transform of the aperiodic autocorrelation of the data window. If we increase
the window length, we expect that W(ejω ) should become more concentrated around
ω = 0, and thus Cww (ejω ) should look increasingly like a periodic impulse train. If
the scale factor 1/(LU) is correctly chosen, then E{I(ω)} should approach Pxx (ω) as


Cww (ejω ) approaches a periodic impulse train. The scale can be adjusted by choosing
the normalizing constant U so that

(1/(2πLU)) ∫_{−π}^{π} |W(e^{jω})|² dω = (1/(LU)) Σ_{n=0}^{L−1} (w[n])² = 1,        (78)

or

U = (1/L) Σ_{n=0}^{L−1} (w[n])².        (79)

For the rectangular window, we should choose U = 1, while other data windows would
require a value of 0 < U < 1 if w[n] is normalized to a maximum value of 1. Alter-
natively, the normalization can be absorbed into the amplitude of w[n]. Therefore, if
properly normalized, the periodogram and modified periodogram are both asymptoti-
cally unbiased; i.e., the bias approaches zero as the window length increases.
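For example, the following short NumPy computation (an illustration added here, not from the text) evaluates Eq. (79) for a rectangular and a Hann window of the same length; the tapered window gives a value of U between 0 and 1, as stated above.

    import numpy as np

    L = 256
    w_rect = np.ones(L)
    w_hann = np.hanning(L)                 # tapered window with maximum value near 1

    U_rect = np.sum(w_rect ** 2) / L       # Eq. (79): exactly 1 for the rectangular window
    U_hann = np.sum(w_hann ** 2) / L       # approximately 0.37 for the Hann window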
To examine whether the periodogram is a consistent estimate or becomes a consis-
tent estimate as the window length increases, it is necessary to consider the behavior of
the variance of the periodogram. An expression for the variance of the periodogram is
very difficult to obtain even in the simplest cases. However, it has been shown (see Jenk-
ins and Watts, 1968) that over a wide range of conditions, as the window length increases,
var[I(ω)] ≈ [Pxx(ω)]².        (80)
That is, the variance of the periodogram estimate is approximately the same size as the
square of the power spectrum that we are estimating. Therefore, since the variance does
not asymptotically approach zero with increasing window length, the periodogram is
not a consistent estimate.
The properties of the periodogram estimate of the power spectrum just discussed
are illustrated in Figure 24, which shows periodogram estimates of white noise us-
ing rectangular windows of lengths L = 16, 64, 256, and 1024. The sequence x[n]

Figure 24 Periodograms of pseudorandom white-noise sequence. (a) Window length L = 16 and DFT length N = 1024.


Figure 24 (continued) (b) L = 64 and N = 1024. (c) L = 256 and N = 1024. (d) L = 1024 and N = 1024.


was obtained from a pseudorandom-number generator whose output was scaled so
that |x[n]| ≤ √3. A good random-number generator produces a uniform distribution of
amplitudes, and the measured sample-to-sample correlation is small. Thus, the power
spectrum of the output of the random-number generator could be modeled in this case
by Pxx (ω) = σx2 = 1 for all ω. For each of the four rectangular windows, the periodogram
was computed with normalizing constant U = 1 and at frequencies ωk = 2πk/N for
N = 1024 using the DFT. That is,
I[k] = I(ωk) = (1/L) |V[k]|² = (1/L) | Σ_{n=0}^{L−1} w[n] x[n] e^{−j(2π/N)kn} |².        (81)
In Figure 24, the DFT values are connected by straight lines for purposes of display.
Recall that I(ω) is real and an even function of ω, so we only need to plot I[k] for
0 ≤ k ≤ N/2 corresponding to 0 ≤ ω ≤ π. Note that the spectrum estimate fluctuates
more rapidly as the window length L increases. This behavior can be understood by
recalling that, although we view the periodogram method as a direct computation of
the spectrum estimate, we have seen that the underlying correlation estimate of Eq. (69)
is, in effect, Fourier transformed to obtain the periodogram. Figure 25 illustrates a win-
dowed sequence, x[n]w[n], and a shifted version, x[n + m]w[n + m], as required in Eq. (69).
From this figure, we see that (L − m) signal values are involved in computing a partic-
ular correlation lag value cvv [m]. Thus, when m is close to L, only a few values of x[n]
are involved in the computation, and we expect that the estimate of the correlation
sequence will be considerably more inaccurate for these values of m and consequently
will also show considerable variation between adjacent values of m. On the other hand,
when m is small, many more samples are involved, and the variability of cvv [m] with m
should not be as great. The variability at large values of m manifests itself in the Fourier
transform as fluctuations at all frequencies, and thus, for large L, the periodogram
estimate tends to vary rapidly with frequency. Indeed, it can be shown (see Jenkins

Figure 25 Illustration of sequences involved in Eq. (69). (a) A finite-length sequence x[n]w[n]. (b) Shifted sequence x[n + m]w[n + m] for m > 0.


and Watts, 1968) that if N = L, the periodogram estimates at the DFT frequencies
2πk/N become uncorrelated. Since, as N increases, the DFT frequencies get closer
together, this behavior is inconsistent with our goal of obtaining a good estimate of
the power spectrum. We would prefer to obtain a smooth spectrum estimate without
random variations resulting from the estimation process. This can be accomplished by
averaging multiple independent periodogram estimates to reduce the fluctuations.

5.3 Periodogram Averaging


The averaging of periodograms in spectrum estimation was first studied extensively
by Bartlett (1953); later, after fast algorithms for computing the DFT were developed,
Welch (1970) combined these computational algorithms with the use of a data window
w[n] to develop the method of averaging modified periodograms. In periodogram aver-
aging, a data sequence x[n], 0 ≤ n ≤ Q − 1, is divided into segments of L samples each,
with a window of length L applied to each; i.e., we form the segments
xr [n] = x[rR + n]w[n], 0 ≤ n ≤ L − 1. (82)
If R < L the segments overlap, and for R = L the segments are contiguous. Note
that Q denotes the length of the available data. The total number of segments depends
on the values of, and relationship among, R, L, and Q. Specifically, there will be K full-
length segments, where K is the largest integer for which (K − 1)R + (L − 1) ≤ Q − 1.
The periodogram of the r th segment is
Ir(ω) = (1/(LU)) |Xr(e^{jω})|²,        (83)
where Xr (ejω ) is the DTFT of xr [n]. Each Ir (ω) has the properties of a periodogram, as
described previously. Periodogram averaging consists of averaging the K periodogram
estimates Ir (ω); i.e., we form the time-averaged periodogram defined as

Ī(ω) = (1/K) Σ_{r=0}^{K−1} Ir(ω).        (84)

To examine the bias and variance of Ī(ω), let us take L = R, so that the segments do
not overlap, and assume that φxx [m] is small for m > L; i.e., signal samples more than
L apart are approximately uncorrelated. If we assume that the periodograms Ir (ω) are
identically distributed independent random variables, then the expected value of Ī(ω) is

E{Ī(ω)} = (1/K) Σ_{r=0}^{K−1} E{Ir(ω)},        (85)

or, since we assume that the periodograms are independent and identically distributed,

E{Ī(ω)} = E{Ir (ω)} for any r. (86)


From Eq. (76), it follows that

E{Ī(ω)} = E{Ir(ω)} = (1/(2πLU)) ∫_{−π}^{π} Pxx(θ) Cww(e^{j(ω−θ)}) dθ,        (87)


where L is the window length. When the window w[n] is the rectangular window, the
method of averaging periodograms is called Bartlett’s procedure, and in this case it can
be shown that

cww[m] = { L − |m|,   |m| ≤ (L − 1),
         { 0,          otherwise,                                   (88)

and, therefore,

Cww(e^{jω}) = [ sin(ωL/2) / sin(ω/2) ]².        (89)
That is, the expected value of the average periodogram spectrum estimate is the convo-
lution of the true power spectrum with the Fourier transform of the triangular sequence
cww [n] that results as the autocorrelation of the rectangular window. Thus, the average
periodogram is also a biased estimate of the power spectrum.
To examine the variance,we use the fact that,in general,the variance of the average
of K independent identically distributed random variables is 1/K times the variance of
each individual random variable. (See Bertsekas and Tsitsiklis, 2008.) Therefore, the
variance of the average periodogram is
var[Ī(ω)] = (1/K) var[Ir(ω)],        (90)

or, with Eq. (80), it follows that

var[Ī(ω)] ≈ (1/K) [Pxx(ω)]².        (91)

Consequently, the variance of Ī(ω) is inversely proportional to the number of peri-


odograms averaged, and as K increases, the variance approaches zero.
From Eq. (89), we see that as L, the length of the segment xr [n], increases, the
main lobe of Cww (ejω ) decreases in width, and consequently, from Eq. (87), E{Ī(ω)}
more closely approximates Pxx (ω). However, for fixed total data length Q, the total
number of segments (assuming that L = R) is Q/L; therefore, as L increases, K decreases.
Correspondingly, from Eq. (91), the variance of Ī(ω) will increase. Thus, as is typical in
statistical estimation problems, for a fixed data length there is a trade-off between bias
and variance. However, as the data length Q increases, both L and K can be allowed to
increase, so that as Q approaches ∞, the bias and variance of Ī(ω) can approach zero.
Consequently, periodogram averaging provides an asymptotically unbiased, consistent
estimate of Pxx (ω).
The preceding discussion assumed that nonoverlapping rectangular windows were
used in computing the time-dependent periodograms. Welch (1970) showed that if a
different window shape is used, the variance of the average periodogram still behaves
as in Eq. (91). Welch also considered the case of overlapping windows and showed
that if the overlap is one-half the window length, the variance is further reduced by
almost a factor of 2, due to the doubling of the number of sections. Greater overlap
does not continue to reduce the variance, because the segments become increasingly
correlated (less nearly independent) as the overlap increases.


5.4 Computation of Average Periodograms Using the DFT
As with the periodogram, the average periodogram can be explicitly evaluated only
at a discrete set of frequencies. Because of the availability of the FFT algorithms for
computing the DFT, a particularly convenient and widely used choice is the set of
frequencies ωk = 2πk/N for an appropriate choice of N. From Eq. (84), we see that if
the DFT of xr [n] is substituted for the Fourier transform of xr [n] in Eq. (83), we obtain
samples of Ī(ω) at the DFT frequencies ωk = 2πk/N for k = 0, 1, . . . , N − 1. Specifically,
with Xr [k] denoting the DFT of xr [n],
Ir[k] = Ir(ωk) = (1/(LU)) |Xr[k]|²,        (92a)

Ī[k] = Ī(ωk) = (1/K) Σ_{r=0}^{K−1} Ir[k].        (92b)

It is worthwhile to note the relationship between periodogram averaging and


the time-dependent Fourier transform as discussed in detail in Section 3. Equation
(92a) shows that, except for the introduction of the normalizing constant 1/(LU), each
individual periodogram is simply the magnitude-squared of the time-dependent Fourier
transform at time rR and frequency 2πk/N. Thus, for each frequency index k, the average
power spectrum estimate at the frequency corresponding to k is the time average of the
time-sampled time-dependent Fourier transform. This can be visualized by considering
the spectrograms in Figure 22. The value Ī[k] is simply the average along a horizontal
line at frequency 2πk/N (or 2πk/(NT ) in analog frequency).10 Averaging the wideband
spectrogram implies that the resulting power spectrum estimate will be smooth when
considered as a function of frequency, while the narrowband condition corresponds to
longer time windows and thus, less smoothness in frequency.
We denote Ir (2πk/N ) as the sequence Ir [k] and Ī(2πk/N ) as the sequence Ī[k].
According to Eqs. (92a) and (92b), the average periodogram estimate of the power
spectrum is computed at N equally spaced frequencies by averaging the squared magnitudes of
the DFTs of the windowed data segments, normalized by the factor LU. This method
of power spectrum estimation provides a very convenient framework within which to
trade off between resolution and variance of the spectrum estimate. It is particularly
simple and efficient to implement using FFT algorithms. An important advantage of the
method over those to be discussed in Section 6 is that the spectrum estimate is always
nonnegative.
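A minimal NumPy sketch of Eqs. (82), (92a), and (92b) follows (added here for illustration; the segmentation parameters and function name are our own choices, not part of the text).

    import numpy as np

    def averaged_periodogram(x, window, R, nfft):
        # Average of modified periodograms at the DFT frequencies 2*pi*k/N.
        # x      : available data x[n], 0 <= n <= Q-1
        # window : length-L data window w[n]
        # R      : segment hop (R = L: contiguous segments, R = L//2: half overlap)
        # nfft   : DFT length N used to sample each periodogram
        L = len(window)
        U = np.sum(window ** 2) / L                    # normalizing constant, Eq. (79)
        K = (len(x) - L) // R + 1                      # number of full-length segments
        I_bar = np.zeros(nfft)
        for r in range(K):
            xr = window * x[r * R : r * R + L]         # x_r[n] = x[rR + n] w[n], Eq. (82)
            Xr = np.fft.fft(xr, nfft)                  # DFT of the windowed segment
            I_bar += np.abs(Xr) ** 2 / (L * U)         # I_r[k], Eq. (92a)
        return I_bar / K                               # averaged estimate, Eq. (92b)

With a fixed record length Q, increasing L improves resolution but decreases K and therefore increases the variance, exactly the trade-off described above.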

5.5 An Example of Periodogram Analysis


Power spectrum analysis is a valuable tool for modeling signals, and it also can be used
to detect signals, particularly when it comes to finding hidden periodicities in sampled

10 Note that the spectrogram is normally computed such that the windowed segments overlap con-
siderably as r varies, while in periodogram averaging R is normally equal to the window length or half the
window length.

CHAPTER 10

Power Spectral Density

INTRODUCTION

Understanding how the strength of a signal is distributed in the frequency domain,


relative to the strengths of other ambient signals, is central to the design of any
LTI filter intended to extract or suppress the signal. We know this well in the case
of deterministic signals, and it turns out to be just as true in the case of random
signals. For instance, if a measured waveform is an audio signal (modeled as a
random process since the specific audio signal isn’t known) with additive disturbance
signals, we might want to build a lowpass LTI filter to extract the audio
and suppress the disturbance signals. We would need to decide where to place the
cutoff frequency of the filter.
There are two immediate challenges we confront in trying to find an appropriate
frequency-domain description for a WSS random process. First, individual sample
functions typically don’t have transforms that are ordinary, well-behaved functions
of frequency; rather, their transforms are only defined in the sense of generalized
functions. Second, since the particular sample function is determined as the outcome
of a probabilistic experiment, its features will actually be random, so we have
to search for features of the transforms that are representative of the whole class
of sample functions, i.e., of the random process as a whole.
It turns out that the key is to focus on the expected power in the signal. This is a
measure of signal strength that meshes nicely with the second-moment characterizations
we have for WSS processes, as we show in this chapter. For a process that
is second-order ergodic, this will also correspond to the time average power in any
realization. We introduce the discussion using the case of CT WSS processes, but
the DT case follows very similarly.

10.1 EXPECTED INSTANTANEOUS POWER AND POWER SPECTRAL DENSITY

Motivated by situations in which x(t) is the voltage across (or current through) a
unit resistor, we refer to x2 (t) as the instantaneous power in the signal x(t). When
x(t) is WSS, the expected instantaneous power is given by

E[x²(t)] = Rxx(0) = (1/2π) ∫_{−∞}^{∞} Sxx(jω) dω,        (10.1)


where Sxx (jω) is the CTFT of the autocorrelation function Rxx (τ ). Furthermore,
when x(t) is ergodic in correlation, so that time averages and ensemble averages
are equal in correlation computations, then (10.1) also represents the time-average
power in any ensemble member. Note that since Rxx (τ ) = Rxx (−τ ), we know
Sxx (jω) is always real and even in ω; a simpler notation such as Pxx (ω) might
therefore have been more appropriate for it, but we shall stick to Sxx (jω) to avoid
a proliferation of notational conventions, and to keep apparent the fact that this
quantity is the Fourier transform of Rxx (τ ).
The integral above suggests that we might be able to consider the expected (instantaneous)
power (or, assuming the process is ergodic, the time-average power)
in a frequency band of width dω to be given by (1/2π)Sxx (jω) dω. To examine
this thought further, consider extracting a band of frequency components of x(t)
by passing x(t) through an ideal bandpass filter, shown in Figure 10.1.

FIGURE 10.1 Ideal bandpass filter H(jω) (unit gain over narrow bands of width Δ centered at ±ω0) used to extract a band of frequencies from the input x(t), producing the output y(t).

Because of the way we are obtaining y(t) from x(t), the expected power in the
output y(t) can be interpreted as the expected power that x(t) has in the selected
passband. Using the fact that

Syy (jω) = |H(jω)|2 Sxx (jω) , (10.2)

we see that this expected power can be computed as


E{y²(t)} = Ryy(0) = (1/2π) ∫_{−∞}^{+∞} Syy(jω) dω = (1/2π) ∫_{passband} Sxx(jω) dω.        (10.3)

Thus

(1/2π) ∫_{passband} Sxx(jω) dω        (10.4)

is indeed the expected power of x(t) in the passband. It is therefore reasonable to


call Sxx(jω) the power spectral density (PSD) of x(t). Note that the instantaneous
power of y(t), and hence the expected instantaneous power E[y²(t)], is always
nonnegative, no matter how narrow the passband. It follows that, in addition to
being real and even in ω, the PSD is always nonnegative, Sxx (jω) ≥ 0 for all ω.
While the PSD Sxx (jω) is the Fourier transform of the autocorrelation function, it


is useful to have a name for the Laplace transform of the autocorrelation function;
we shall refer to Sxx (s) as the complex PSD.
Exactly parallel results apply for the DT case, leading to the conclusion that
Sxx (ejΩ ) is the power spectral density of x[n].

10.2 EINSTEIN-WIENER-KHINCHIN THEOREM ON EXPECTED TIME-AVERAGED POWER

The previous section defined the PSD as the transform of the autocorrelation function,
and provided an interpretation of this transform. We now develop an alternative
route to the PSD. Consider a random realization x(t) of a WSS process.
We have already mentioned the difficulties with trying to take the CTFT of x(t)
directly, so we proceed indirectly. Let xT (t) be the signal obtained by windowing
x(t), so it equals x(t) in the interval (−T , T ) but is 0 outside this interval. Thus

xT (t) = wT (t) x(t) , (10.5)

where we define the window function wT (t) to be 1 for |t| < T and 0 otherwise. Let
XT (jω) denote the Fourier transform of xT (t); note that because the signal xT (t) is
nonzero only over the finite interval (−T, T ), its Fourier transform is typically well
defined. We know that the energy spectral density (ESD) S̄xx(jω) of xT (t) is
given by

S̄xx(jω) = |XT (jω)|²        (10.6)

and that this ESD is actually the Fourier transform of the deterministic autocorrelation xT (τ) ∗ xT (−τ). We thus have the CTFT pair

xT (τ) ∗ xT (−τ) = ∫_{−∞}^{∞} wT (α) wT (α − τ) x(α) x(α − τ) dα  ⇔  |XT (jω)|²,        (10.7)

or, dividing both sides by 2T (which is valid, since scaling a signal by a constant
scales its Fourier transform by the same amount),
(1/2T) ∫_{−∞}^{∞} wT (α) wT (α − τ) x(α) x(α − τ) dα  ⇔  (1/2T) |XT (jω)|².        (10.8)

The quantity on the right is what we defined (for the DT case) as the periodogram
of the finite-length signal xT (t).
Because the Fourier transform operation is linear, the Fourier transform of the
expected value of a signal is the expected value of the Fourier transform. We
may therefore take expectations of both sides in the preceding equation. Since
E[x(α)x(α − τ )] = Rxx (τ ), we conclude that
Rxx(τ) Λ(τ)  ⇔  (1/2T) E[|XT (jω)|²],        (10.9)

where Λ(τ) is a triangular pulse of height 1 at the origin and decaying to 0 at
|τ| = 2T, the result of carrying out the convolution wT (τ) ∗ wT (−τ) and dividing by


2T. Now taking the limit as T goes to ∞, we arrive at the result

Rxx(τ)  ⇔  Sxx(jω) = lim_{T→∞} (1/2T) E[|XT (jω)|²].        (10.10)
This is the Einstein-Wiener-Khinchin theorem (proved by Wiener, and independently
by Khinchin, in the early 1930s, but, as only recently recognized,
stated by Einstein in 1914).
The result is important to us because it underlies a basic method for estimating
Sxx (jω): with a given T , compute the periodogram for several realizations of the
random process (i.e., in several independent experiments), and average the results.
Increasing the number of realizations over which the averaging is done will reduce
the noise in the estimate, while repeating the entire procedure for larger T will
improve the frequency resolution of the estimate.
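The following discrete-time sketch (ours; the first-order FIR system driving the process is an arbitrary illustrative choice) mimics this procedure: periodograms of independent realizations of a WSS process are averaged, and the average settles toward the true power spectral density.

    import numpy as np

    rng = np.random.default_rng(1)
    M, n_realizations, nfft = 512, 200, 1024
    S_hat = np.zeros(nfft)
    for _ in range(n_realizations):
        w = rng.standard_normal(M)                  # unit-variance white noise
        x = np.convolve(w, [1.0, 0.5])[:M]          # x[n] = w[n] + 0.5 w[n-1]
        X = np.fft.fft(x, nfft)
        S_hat += np.abs(X) ** 2 / M                 # periodogram of this realization
    S_hat /= n_realizations                         # approaches |1 + 0.5 exp(-j*Omega)|^2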

10.2.1 System Identification Using Random Processes as Input

Consider the problem of determining or “identifying” the impulse response h[n]


of a stable LTI system from measurements of the input x[n] and output y[n], as
indicated in Figure 10.2.

FIGURE 10.2 System with impulse response h[n] to be determined (input x[n], output y[n]).

The most straightforward approach is to choose the input to be a unit impulse


x[n] = δ[n], and to measure the corresponding output y[n], which by definition is
the impulse response. It is often the case in practice, however, that we do not wish
to — or are unable to — pick this simple input.
For instance, to obtain a reliable estimate of the impulse response in the presence of
measurement errors, we may wish to use a more “energetic” input, one that excites
the system more strongly. There are generally limits to the amplitude we can use
on the input signal, so to get more energy we have to cause the input to act over
a longer time. We could then compute h[n] by evaluating the inverse transform
of H(ejΩ ), which in turn could be determined as the ratio Y (ejΩ )/X(ejΩ ). Care
has to be taken, however, to ensure that X(e^{jΩ}) ≠ 0 for any Ω; in other words,
the input has to be sufficiently “rich”. In particular, the input cannot be just a
finite linear combination of sinusoids (unless the LTI system is such that knowledge
of its frequency response at a finite number of frequencies serves to determine the
frequency response at all frequencies — which would be the case with a lumped
system, i.e., a finite-order system, except that one would need to know an upper
bound on the order of the system so as to have a sufficient number of sinusoids
combined in the input).
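A small numerical illustration of this frequency-domain identification idea (ours; the FIR system and the noise-free measurement are simplifying assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    h_true = np.array([1.0, -0.6, 0.2])       # system to be identified (illustrative FIR)
    x = rng.standard_normal(256)              # energetic, spectrally "rich" input
    y = np.convolve(x, h_true)                # measured output (no measurement noise here)

    nfft = 512                                # long enough to hold the linear convolution
    X = np.fft.fft(x, nfft)
    Y = np.fft.fft(y, nfft)
    h_est = np.real(np.fft.ifft(Y / X))       # valid only if X[k] != 0 at every bin
    print(np.round(h_est[:5], 3))             # approximately [1.0, -0.6, 0.2, 0.0, 0.0]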

Parametric Signal Modeling

0 INTRODUCTION

It is often convenient to use several different representations of signals and systems. The
representation as a linear combination of sinusoidal and complex exponential signals
led to the Fourier series, the Fourier transform, and the frequency domain characteriza-
tion of signals and LTI systems. Although these representations are particularly useful
because of their generality, they are not always the most efficient representation for
signals with a known structure.
This chapter introduces another powerful approach to signal representation called
parametric signal modeling. In this approach, a signal is represented by a mathematical
model that has a predefined structure involving a limited number of parameters. A
given signal s[n] is represented by choosing the specific set of parameters that results in
the model output ŝ[n] being as close as possible in some prescribed sense to the given
signal. A common example is to model the signal as the output of a discrete-time linear
system as shown in Figure 1. Such models, which are comprised of the input signal v[n]
and the system function H(z) of the linear system, become useful with the addition of
constraints that make it possible to solve for the parameters of H(z) given the signal

Figure 1 Linear system model for a signal s[n]: input v[n], LTI system H(z), output ŝ[n].

to be represented. For example, if the input v[n] is specified, and the system function is
assumed to be a rational function of the form

H(z) = ( Σ_{k=0}^{q} bk z^{−k} ) / ( 1 − Σ_{k=1}^{p} ak z^{−k} ),        (1)

then the signal is modeled by the values of the ak s and bk s or equivalently, by the poles
and zeros of H(z), along with knowledge of the input. The input signal v[n] is generally
assumed to be a unit impulse δ[n] for deterministic signals, or white noise if the signal
s[n] is viewed as a random signal. When the model is appropriately chosen, it is possible
to represent a large number of signal samples by a relatively small set of parameters.
Parametric signal modeling has a wide range of applications, including data com-
pression, spectrum analysis, signal prediction, deconvolution, filter design, system iden-
tification, signal detection, and signal classification. In data compression, for example, it
is the set of model parameters that is transmitted or stored, and the receiver then uses
the model with those parameters to regenerate the signal. In filter design, the model pa-
rameters are chosen to best approximate, in some sense, the desired frequency response,
or equivalently, the desired impulse response, and the model with these parameters then
corresponds to the designed filter. The two key elements for success in all of the appli-
cations are an appropriate choice of model and an accurate estimate of the parameters
for the model.

1 ALL-POLE MODELING OF SIGNALS

The model represented by Eq. (1) in general has both poles and zeros. While there
are a variety of techniques for determining the full set of numerator and denominator
coefficients in Eq. (1), the most successful and most widely used have concentrated on
restricting q to be zero, in which case, H(z) in Figure 1 has the form
H(z) = G / ( 1 − Σ_{k=1}^{p} ak z^{−k} ) = G/A(z),        (2)

where we have replaced the parameter b0 by the parameter G to emphasize its role
as an overall gain factor. Such models are aptly termed “all-pole” models.1 By its very
nature, it would appear that an all-pole model would be appropriate only for modeling
signals of infinite duration. While this may be true in a theoretical sense, this choice
for the system function of the model works well for signals found in many applications,
and as we will show, the parameters can be computed in a straightforward manner from
finite-duration segments of the given signal.

1 Detailed discussion of this case and the general pole/zero case are given in Kay (1988), Therrien
(1992), Hayes (1996) and Stoica and Moses (2005).


The input and output of the all-pole system in Eq. (2) satisfy the linear constant–
coefficient difference equation

ŝ[n] = Σ_{k=1}^{p} ak ŝ[n − k] + Gv[n],        (3)
which indicates that the model output at time n is comprised of a linear combination
of past samples plus a scaled input sample. As we will see, this structure suggests that
the all-pole model is equivalent to the assumption that the signal can be approximated
as a linear combination of (or equivalently, is linearly predictable from) its previous
values. Consequently, this method for modeling a signal is often also referred to as
linear predictive analysis or linear prediction.2

1.1 Least-Squares Approximation


The goal in all-pole modeling is to choose the input v[n] and the parameters G, and
a1 , . . . , ap in Eq. (3) such that ŝ[n] is a close approximation in some sense to s[n],
the signal to be modeled. If, as is usually the case, v[n] is specified in advance (e.g.,
v[n] = δ[n]), a direct approach to determining the best values for the parameters might
be to minimize the total energy in the error signal ese[n] = (s[n] − ŝ[n]), thereby obtaining
a least-squares approximation to s[n]. Specifically, for deterministic signals, the model
parameters might be chosen to minimize the total squared error
Σ_{n=−∞}^{∞} (s[n] − ŝ[n])² = Σ_{n=−∞}^{∞} ( s[n] − Σ_{k=1}^{p} ak ŝ[n − k] − Gv[n] )².        (4)
In principle, the ak s minimizing this squared error can be found by differentiating the
expression in Eq. (4) with respect to each parameter, setting that derivative to zero, and
solving the resulting equations. However, this results in a nonlinear system of equations,
the solution of which is computationally difficult, in general. Although this least-squares
problem is too difficult for most practical applications, the basic least-squares principle
can be applied to slightly different formulations with considerable success.

1.2 Least-Squares Inverse Model


A formulation based on inverse filtering provides a relatively straightforward and
tractable solution for the parameter values in the all-pole model. In any approach to
approximation, it is recognized at the outset that the model output will in most cases
not be exactly equal to the signal to be modeled. The inverse filtering approach is based
on the recognition that if the given signal s[n] is in fact the output of the filter H(z) in
the model of Figure 1, then with s[n] as the input to the inverse of H(z), the output will
be v[n]. Consequently, as depicted in Figure 2 and with H(z) assumed to be an all-pole
system as specified in Eq. (2), the inverse filter, whose system function
A(z) = 1 − Σ_{k=1}^{p} ak z^{−k},        (5)

2When used in the context of speech processing, linear predictive analysis is often referred to as linear
predictive coding (LPC). (See Rabiner and Schafer, 1978 and Quatieri, 2002.)


Figure 2 Inverse filter formulation for all-pole signal modeling: input s[n], inverse system A(z), output g[n].

is sought so that its output g[n] would be equal to the scaled input Gv[n]. In this for-
mulation, then, we choose the parameters of the inverse filter (and therefore implicitly
the parameters of the model system) to minimize the mean-squared error between g[n]
and Gv[n]. As we will see, this leads to a set of well-behaved linear equations.
From Figure 2 and Eq. (5) it follows that g[n] and s[n] satisfy the difference equa-
tion

g[n] = s[n] − Σ_{k=1}^{p} ak s[n − k].        (6)

The modeling error ê[n] is now defined as



ê[n] = g[n] − Gv[n] = s[n] − Σ_{k=1}^{p} ak s[n − k] − Gv[n].        (7)

If v[n] is an impulse, then, for n > 0, the error ê[n] corresponds to the error between
s[n] and the linear prediction of s[n] using the model parameters. Thus, it is convenient
to also express Eq. (7) as
ê[n] = e[n] − Gv[n], (8)
where e[n] is the prediction error given by

e[n] = s[n] − Σ_{k=1}^{p} ak s[n − k].        (9)

For a signal that exactly fits the all-pole model of Eq. (3), the modeling error ê[n] will
be zero, and the prediction error e[n] will be the scaled input, i.e.,
e[n] = Gv[n]. (10)
This formulation in terms of inverse filtering leads to considerable simplification,
since v[n] is assumed known and e[n] can be computed from s[n] using Eq. (9). The
parameter values ak are then chosen to minimize
 
E = ⟨|ê[n]|²⟩,        (11)

where the notation ⟨ · ⟩ denotes a summing operation for finite energy determinis-
tic signals and an ensemble averaging operation for random signals. Minimizing E in
Eq. (11) results in an inverse filter that minimizes the total energy in the modeling error
in the case of deterministic signals or the mean-squared value of the modeling error in
the case of random signals. For convenience, we will often refer to ⟨ · ⟩ as the averaging
operator where its interpretation as a sum or as an ensemble average should be clear
from the context. Again, note that in solving for the parameters ak specifying the inverse
system of Figure 2, the all-pole system is implicitly specified, as well.


To find the optimal parameter values, we substitute Eq. (8) into Eq. (11) to obtain
 
E = ⟨(e[n] − Gv[n])²⟩,        (12)
or equivalently,
   
E = ⟨e²[n]⟩ + G²⟨v²[n]⟩ − 2G⟨v[n]e[n]⟩.        (13)

To find the parameters that minimize E, we differentiate Eq. (12) with respect
to the ith filter coefficient ai and set the derivative equal to zero, leading to the set of
equations
∂E/∂ai = ∂⟨e²[n]⟩/∂ai − 2G⟨v[n]s[n − i]⟩ = 0,    i = 1, 2, . . . , p,        (14)
where we have assumed that G is independent of ai and, of course, so is v[n], and
consequently that
∂(G²⟨v²[n]⟩)/∂ai = 0.        (15)
For models that will be of interest to us, v[n] will be an impulse if s[n] is a causal
finite-energy signal and white noise if s[n] is a wide-sense stationary random pro-
cess. With v[n] an impulse and s[n] zero for n < 0, the product v[n]s[n − i] = 0 for
i = 1, 2, . . . , p. With v[n] as white noise,

⟨v[n]s[n − i]⟩ = 0,    i = 1, 2, . . . , p,        (16)

since, for any value of n, the white-noise input to a causal system is uncorrelated
with the output values prior to time n.
to
∂E/∂ai = ∂⟨e²[n]⟩/∂ai = 0,    i = 1, 2, . . . , p.        (17)
In other words, choosing the coefficients to minimize the average squared modeling
error ⟨ê²[n]⟩ is equivalent to minimizing the average squared prediction error ⟨e²[n]⟩.
Expanding Eq. (17) and invoking the linearity of the averaging operator, we obtain
from Eq. (17) the equations

⟨s[n]s[n − i]⟩ − Σ_{k=1}^{p} ak ⟨s[n − k]s[n − i]⟩ = 0,    i = 1, . . . , p.        (18)
Defining
φss[i, k] = ⟨s[n − i]s[n − k]⟩,        (19)
Eqs. (18) can be rewritten more compactly as

Σ_{k=1}^{p} ak φss[i, k] = φss[i, 0],    i = 1, 2, . . . , p.        (20)
Equations (20) comprise a system of p linear equations in p unknowns. The model
parameters ak, k = 1, 2, . . . , p, are obtained by solving this set of linear equations,
using values of φss[i, k] for i = 1, 2, . . . , p and k = 0, 1, . . . , p that are either known in
advance or first computed from s[n].
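In code, once the values φss[i, k] are available, solving Eq. (20) is a standard linear-algebra step. The sketch below (ours; the (p+1)×(p+1) array layout is an assumption made here for convenience) returns the predictor coefficients.

    import numpy as np

    def solve_normal_equations(phi):
        # Solve Eq. (20) for a_1, ..., a_p given phi[i, k] = <s[n-i] s[n-k]>,
        # stored for i = 0, ..., p and k = 0, ..., p (any valid averaging operator).
        Phi = phi[1:, 1:]                 # left-hand-side matrix, entries phi_ss[i, k]
        psi = phi[1:, 0]                  # right-hand-side vector, entries phi_ss[i, 0]
        return np.linalg.solve(Phi, psi)  # predictor coefficients a_1, ..., a_p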


1.3 Linear Prediction Formulation of All-Pole Modeling


As suggested earlier, an alternative and useful interpretation of all-pole signal modeling
stems from the interpretation of Eq. (3) as a linear prediction of the output in terms of
past values, with the prediction error e[n] being the scaled input Gv[n], i.e.,


e[n] = s[n] − Σ_{k=1}^{p} ak s[n − k] = Gv[n].        (21)

As indicated by Eq. (17), minimizing the inverse modeling error E in Eq. (11) is
equivalent to minimizing the averaged prediction error ⟨e²[n]⟩. If the signal s[n] were
produced by the model system, and if v[n] is an impulse, and if s[n] truly fits the all-pole
model, then the signal at any n > 0 is linearly predictable from past values, i.e., the
prediction error is zero. If v[n] is white noise, then the prediction error is white.
The interpretation in terms of prediction is depicted in Figure 3, where the transfer
function of the prediction filter P(z) is


P(z) = Σ_{k=1}^{p} ak z^{−k}.        (22)

This system is referred to as the pth -order linear predictor for the signal s[n]. Its output
is


s̃[n] = Σ_{k=1}^{p} ak s[n − k],        (23)

and as Figure 3 shows, the prediction error signal is e[n] = s[n] − s̃[n]. The sequence
e[n] represents the amount by which the linear predictor fails to exactly predict the
signal s[n]. For this reason, e[n] is also sometimes called the prediction error residual or
simply the residual. With this point of view, the coefficients ak are called the prediction
coefficients. As is also shown in Figure 3, the prediction error filter is related to the linear
predictor by


A(z) = 1 − P(z) = 1 − Σ_{k=1}^{p} ak z^{−k}.        (24)

Figure 3 Linear prediction formulation for all-pole signal modeling: the linear predictor P(z) forms s̃[n] from s[n], and the prediction error filter A(z) produces e[n] = s[n] − s̃[n].
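Given a set of prediction coefficients, the residual is obtained simply by filtering s[n] with A(z). A brief SciPy sketch (ours, added for illustration):

    import numpy as np
    from scipy.signal import lfilter

    def prediction_error(s, a):
        # e[n] = s[n] - sum_k a_k s[n-k]: output of the prediction error filter
        # A(z) = 1 - sum_k a_k z^{-k} of Eq. (24), with s[n] taken as 0 for n < 0.
        A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
        return lfilter(A, [1.0], s)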


2 DETERMINISTIC AND RANDOM SIGNAL MODELS

To use the optimum inverse filter or equivalently the optimum linear predictor as a basis
for parametric signal modeling, it is necessary to be more specific about the assumed
input v[n] and about the method of computing the averaging operator ⟨ · ⟩. To this end,
we consider separately the case of deterministic signals and the case of random signals.
In both cases, we will use averaging operations that assume knowledge of the signal to
be modeled over all time −∞ < n < ∞. In Section 3, we discuss some of the practical
considerations when only a finite-length segment of the signal s[n] is available.

2.1 All-Pole Modeling of Finite-Energy Deterministic Signals
In this section, we assume an all-pole model that is causal and stable and also that both
the input v[n] and the signal s[n] to be modeled are zero for n < 0. We further assume
that s[n] has finite energy and is known for all n ≥ 0. We choose the operator ⟨ · ⟩ in
Eq. (11) as the total energy in the modeling error sequence ê[n], i.e.,

E = ⟨|ê[n]|²⟩ = Σ_{n=−∞}^{∞} |ê[n]|².        (25)

With this definition of the averaging operator, φss [i, k] in Eq. (19) is given by


φss[i, k] = Σ_{n=−∞}^{∞} s[n − i] s[n − k],        (26)

and equivalently,

φss[i, k] = Σ_{n=−∞}^{∞} s[n] s[n − (i − k)].        (27)

The coefficients φss [i, k] in Eq. (20) are now


φss [i, k] = rss [i − k], (28)
where for real signals s[n], rss[m] is the deterministic autocorrelation function

rss[m] = Σ_{n=−∞}^{∞} s[n + m] s[n] = Σ_{n=−∞}^{∞} s[n] s[n − m].        (29)

Therefore, Eq. (20) takes the form



Σ_{k=1}^{p} ak rss[i − k] = rss[i],    i = 1, 2, . . . , p.        (30)

These equations are called the autocorrelation normal equations and also the Yule–
Walker equations. They provide a basis for computing the parameters a1 , . . . , ap from
the autocorrelation function of the signal. In Section 2.5, we discuss an approach to
choosing the gain factor G.
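Numerically, Eq. (30) can be solved directly once the autocorrelation values are available. The brief SciPy sketch below (ours) exploits the fact, discussed in Section 3.3, that the system matrix is Toeplitz; it uses a general Toeplitz solver rather than the recursive solution developed in Section 6.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def yule_walker(r, p):
        # Solve Eq. (30) for a_1, ..., a_p given autocorrelation values
        # r = [r_ss[0], r_ss[1], ..., r_ss[p]] (deterministic or probabilistic).
        r = np.asarray(r, dtype=float)
        return solve_toeplitz(r[:p], r[1:p + 1])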


Figure 4 Linear system model for a random signal s[n]: white-noise input w[n], LTI system H(z), output ŝ[n].

2.2 Modeling of Random Signals


For all-pole modeling of zero-mean, wide-sense stationary, random signals, we assume
that the input to the all-pole model is zero-mean, unit-variance, white noise as indicated
in Figure 4. The difference equation for this system is


ŝ[n] = Σ_{k=1}^{p} ak ŝ[n − k] + Gw[n],        (31)

where the input has autocorrelation function E{w[n + m]w[n]} = δ[m], zero mean
(E{w[n]} = 0), and unit average power (E{(w[n])2 } = δ[0] = 1), with E{·} representing
the expectation or probability average operator.3
The resulting model for analysis is the same as that depicted in Figure 2, but the
desired output g[n] changes. In the case of random signals, we want to make g[n] as
much like a white-noise signal as possible, rather than the unit sample sequence that
was desired in the deterministic case. For this reason, the optimal inverse filter for
random signals is often referred to as a whitening filter.
We also choose the operator ⟨ · ⟩ in Eq. (11) as an appropriate one for random
signals, specifically the mean-squared value or equivalently the average power. Then
Eq. (11) becomes

E = E{(ê[n])2 }. (32)

If s[n] is assumed to be a sample function of a stationary random process, then φss [i, k]
in Eq. (19) would be the autocorrelation function

φss [i, k] = E{s[n − i]s[n − k]} = rss [i − k]. (33)

The system coefficients can be found as before from Eq. (20). Thus, the system
coefficients satisfy a set of equations of the same form as Eq. (30), i.e.,


Σ_{k=1}^{p} ak rss[i − k] = rss[i],    i = 1, 2, . . . , p.        (34)

Therefore, modeling random signals again results in the Yule–Walker equations, with
the autocorrelation function in this case being defined by the probabilistic average

rss [m] = E {s[n + m]s[n]} = E {s[n]s[n − m]} . (35)

3 Computation of E{·} requires knowledge of the probability densities. In the case of stationary random
signals, only one density is required. In the case of ergodic random processes, a single infinite time average
could be used. In practical applications, however, such averages must be approximated by estimates obtained
from finite time averages.


2.3 Minimum Mean-Squared Error


For modeling of either deterministic signals (Section 2.1) or random signals (Section
2.2), the minimum value of the averaged prediction error in Figure 3 can be expressed
in terms of the correlation values that appear in Eq. (20) for the optimum predictor
coefficients. To see this, we write E as

E = ⟨ ( s[n] − Σ_{k=1}^{p} ak s[n − k] )² ⟩.        (36)

As outlined in more detail in Problem 2, if Eq. (36) is expanded, and Eq. (20) is substi-
tuted into the result, it follows that in general,


E = φss[0, 0] − Σ_{k=1}^{p} ak φss[0, k].        (37)

Equation (37) is true for any appropriate choice of the averaging operator. In particular,
for averaging definitions for which φss [i, k] = rss [i − k], Eq. (37) becomes


E = rss[0] − Σ_{k=1}^{p} ak rss[k].        (38)

2.4 Autocorrelation Matching Property


An important and useful property of the all-pole model resulting from the solution of
Eq. (30) for deterministic signals and Eq. (34) for random signals is referred to as the
autocorrelation matching property (Makhoul, 1973). Equations (30) and (34) represent
a set of p equations to be solved for the model parameters ak for k = 1, . . . , p. The
coefficients in these equations on both the left- and right-hand sides of the equations are
comprised of the (p+1) correlation values rss [m], m = 0, 1, . . . , p, where the correlation
function is appropriately defined, depending on whether the signal to be modeled is
deterministic or random.
The basis for verifying the autocorrelation matching property is to observe that the
signal ŝ[n] obviously fits the model when the model system H(z) in Figure 1 is specified
as the all-pole system in Eq. (2). If we were to consider again applying all-pole modeling
to ŝ[n], we would of course again obtain Eqs. (30) or (34), but this time, with rŝŝ [m] in
place of rss [m]. The solution must again be the same parameter values ak , k = 1, 2, . . . , p,
since ŝ[n] fits the model, and this solution will result if

rss[m] = c rŝŝ[m],    0 ≤ m ≤ p,        (39)

where c is any constant. The fact that the equality in Eq. (39) is required follows from the
form of the recursive solution of the Yule–Walker equations as developed in Section 6. In
words, the autocorrelation normal equations require that for the lags |m| = 0, 1, . . . , p
the autocorrelation functions of the model output and the signal being modeled are
proportional.


2.5 Determination of the Gain Parameter G


With the approach that we have taken, determination of the optimal choice for the
coefficients ak of the model does not depend on the system gain G. From the perspective
of the inverse filtering formulation in Figure 2, one possibility is to choose G so that
⟨(ŝ[n])²⟩ = ⟨(s[n])²⟩. For finite-energy deterministic signals, this corresponds to matching
the total energy in the model output to the total energy in the signal that is being
modeled. For random signals, it is the average power that is matched. In both cases, this
corresponds to choosing G, so that rŝŝ [0] = rss [0]. With this choice, the proportionality
factor c in Eq. (39) is unity.
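In code, once the coefficients from Eq. (30) or (34) are available, this choice of G can be evaluated directly. The sketch below (ours) uses the standard linear-prediction result that the energy/power-matching gain satisfies G² = E, with E the minimum error of Eq. (38); we state this identity here as an assumption rather than deriving it in this excerpt.

    import numpy as np

    def model_gain(r, a):
        # Gain chosen so that r_shat_shat[0] = r_ss[0] (energy/power matching).
        # Assumes the standard result G^2 = E, with E given by Eq. (38).
        p = len(a)
        E = r[0] - np.dot(a, r[1:p + 1])       # Eq. (38)
        return np.sqrt(E)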

Example 1 1st -Order System


Figure 5 shows two signals, both of which are outputs of a 1st -order system with system
function
H(z) = 1 / (1 − αz^{−1}).        (40)
The signal sd [n] = h[n] = αn u[n] is the output when the input is a unit impulse δ[n],
while the signal sr [n] is the output when the input to the system is a zero mean, unit
variance white-noise sequence. Both signals extend over the range −∞ < n < ∞, as
suggested by Figure 5.

Figure 5 Examples of deterministic (sd[n]) and random (sr[n]) outputs of a 1st-order all-pole system.

The autocorrelation function for the signal sd[n] is

rsd sd[m] = rhh[m] = Σ_{n=0}^{∞} α^{n+m} α^{n} = α^{|m|}/(1 − α²),        (41)
the autocorrelation function of sr [n] is also given by Eq. (41) since sr [n] is the response
of the system to white noise, for which the autocorrelation function is a unit impulse.
Since both signals were generated with a 1st -order all-pole system, a 1st -order
all-pole model will be an exact fit. In the deterministic case, the output of the optimum


inverse filter will be a unit impulse, and in the random signal case, the output of the
optimum inverse filter will be a zero-mean white-noise sequence with unit average
power. To show that the optimum inverse filter will be exact, note that for a 1st -order
model, Eqs. (30) or (34) reduce to
rsd sd [0]a1 = rsd sd [1], (42)
so from Eq. (41), it follows that the optimum predictor coefficient for both the deter-
ministic and the random signal is
α
rsd sd [1] 1 − α2
a1 = = = α. (43)
rsd sd [0] 1
1 − α2
From Eq. (38), the minimum mean-squared error is
1 α 1 − α2
E= − a1 = = 1, (44)
1−α2 1−α2 1 − α2
which is the size of the unit impulse in the deterministic case and the average power
of the white-noise sequence in the random case.

As mentioned earlier, and as is clear in this example, when the signal is generated
by an all-pole system excited by either an impulse or white noise, all-pole modeling can
determine the parameters of the all-pole system exactly. This requires prior knowledge
of the model order p and the autocorrelation function. This was possible to obtain
for this example, because a closed-form expression was available for the infinite sum
required to compute the autocorrelation function. In a practical setting, it is generally
necessary to estimate the autocorrelation function from a finite-length segment of the
given signal. Problem 14 considers the effect of finite autocorrelation estimates (to be
discussed next) for the deterministic signal sd [n] of this section.
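A quick numerical check of Example 1 (ours; α = 0.8 is an arbitrary illustrative value) using the exact autocorrelation values of Eq. (41):

    alpha = 0.8
    r0 = 1.0 / (1.0 - alpha ** 2)       # r_sd_sd[0], Eq. (41)
    r1 = alpha / (1.0 - alpha ** 2)     # r_sd_sd[1], Eq. (41)

    a1 = r1 / r0                        # Eq. (43): optimum coefficient, equals alpha
    E = r0 - a1 * r1                    # Eq. (44): minimum error, equals 1
    print(a1, E)                        # 0.8  1.0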

3 ESTIMATION OF THE CORRELATION FUNCTIONS

To use the results of Sections 1 and 2 for modeling of either deterministic or random sig-
nals, we require a priori knowledge of the correlation functions φss [i, k] that are needed
to form the system equations satisfied by the coefficients ak , or we must estimate these
from the given signal. Furthermore, we may want to apply block processing or short-time
analysis techniques to represent the time-varying properties of a nonstationary signal,
such as speech. In this section, we will discuss two distinct approaches to the computa-
tion of the correlation estimates for practical application of the concepts of parametric
signal modeling. These two approaches have come to be known as the autocorrelation
method and the covariance method.

3.1 The Autocorrelation Method


Suppose that we have available a set of M + 1 signal samples s[n] for 0 ≤ n ≤ M, and we
wish to compute the coefficients for an all-pole model. In the autocorrelation method,
it is assumed that the signal ranges over −∞ < n < ∞, with the signal samples taken to


be zero for all n outside the interval 0 ≤ n ≤ M, even if they have been extracted from
a longer sequence. This, of course, imposes a limit to the exactness that can be expected
of the model, since the IIR impulse response of an all-pole model will be used to model
the finite-length segment of s[n].
Although the prediction error sequence need not be computed explicitly to solve
for the filter coefficients, it is nevertheless informative to consider its computation in
some detail. The impulse response of the prediction error filter is, by the definition of
A(z) in Eq. (24),

hA[n] = δ[n] − Σ_{k=1}^{p} ak δ[n − k].        (45)

It can be seen that since the signal s[n] has finite length M + 1 and hA [n], the impulse
response of the prediction error filter A(z), has length p + 1, the prediction error sequence
e[n] = hA [n] ∗ s[n] will always be identically zero outside the interval 0 ≤ n ≤ M + p.
Figure 6 shows an example of the prediction error signal for a linear predictor with
p = 5. In the upper plot, hA [n − m], the (time-reversed and shifted) impulse response
of the prediction error filter, is shown as a function of m for three different values of n.
The dark lines with square dots depict hA [n − m], and the lighter lines with round dots
show the sequence s[m] for 0 ≤ m ≤ 30. On the left side is hA [0 − m], which shows
that the first nonzero prediction error sample is e[0] = s[0]. This, of course, is consistent
with Eq. (9). On the extreme right is hA [M + p − m], which shows that the last nonzero
error sample is e[M + p] = −ap s[M]. The second plot in Figure 6 shows the error signal
e[n] for 0 ≤ n ≤ M + p. From the point of view of linear prediction, it follows that the
first p samples (dark lines and dots) are predicted from samples that are assumed to
be zero. Similarly, the samples of the input for n ≥ M + 1 are assumed to be zero to
obtain a finite-length signal. The linear predictor attempts to predict the zero samples

Figure 6 Illustration (for p = 5) of computation of the prediction error e[n] = hA[n] ∗ s[n] = s[n] − Σ_{k=1}^{p} ak s[n − k] for the autocorrelation method. (Square dots denote samples of hA[n − m] and light round dots denote samples of s[m] for the upper plot and e[n] for the lower plot.)


Figure 7 Illustration of computation of the autocorrelation function for a finite-length sequence. (Square dots denote samples of s[n + m], and light round dots denote samples of s[n].)

in the interval M + 1 ≤ n ≤ M + p from prior samples that are nonzero and part of the
original signal. Indeed, if s[0] ≠ 0 and s[M] ≠ 0, then it will be true that both e[0] = s[0]
and e[M + p] = −ap s[M] will be nonzero. That is, the prediction error (total-squared
error E) can never be exactly zero if the signal is defined to be zero outside the interval
0 ≤ n ≤ M. Furthermore, the total-squared prediction error for a pth -order predictor
would be

E^(p) = ⟨e[n]²⟩ = Σ_{n=−∞}^{∞} e[n]² = Σ_{n=0}^{M+p} e[n]²,        (46)

i.e., the limits of summation can be infinite for convenience, but practically speaking,
they are finite.
When the signal is assumed to be identically zero outside the interval 0 ≤ n ≤ M,
the correlation function φss [i, k] reduces to the autocorrelation function rss [m] where the
values needed in Eq. (30) are for m = |i − k|. Figure 7 shows the shifted sequences used
in computing rss [m] with s[n] denoted by round dots and s[n + m] by square dots. Note
that for a finite-length signal, the product s[n]s[n + m] is nonzero only over the interval
0 ≤ n ≤ M − m when m ≥ 0. Since rss is an even function, i.e., rss[−m] = rss[m] = rss[|m|],
it follows that the autocorrelation values needed for the Yule–Walker equations can be
computed as,

 
rss[|m|] = Σ_{n=−∞}^{∞} s[n] s[n + |m|] = Σ_{n=0}^{M−|m|} s[n] s[n + |m|].        (47)

For the finite-length sequence s[n], Eq. (47) has all the necessary properties of an auto-
correlation function and rss [m] = 0 for m > M. But of course rss [m] is not the same as
the autocorrelation function of the infinite length signal from which the segment was
extracted.
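A direct implementation of Eq. (47) (ours, added for illustration) for a finite-length segment:

    import numpy as np

    def autocorrelation_estimate(s, max_lag):
        # r_ss[m] of Eq. (47) for m = 0, 1, ..., max_lag, with s[n] given for
        # 0 <= n <= M and assumed zero outside that interval.
        s = np.asarray(s, dtype=float)
        return np.array([np.dot(s[:len(s) - m], s[m:]) for m in range(max_lag + 1)])

The p + 1 values rss[0], . . . , rss[p] computed this way are exactly what the Yule–Walker equations (30) require.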
Equation (47) can be used to compute estimates of the autocorrelation func-
tion for either deterministic or random signals.4 Often, the finite-length input signal
is extracted from a longer sequence of samples. This is the case, for example, in ap-
plications to speech processing, where voiced segments (e.g., vowel sounds) of speech
are treated as deterministic and unvoiced segments (fricative sounds) are treated as
4 In the context of random signals, Eq. (47) is a biased estimate of the autocorrelation function. When
p ≪ M, as is often the case, this statistical bias is generally negligible.


random signals.5 According to the previous discussion, the first p and last p samples of
the prediction error can be large due to the attempt to predict nonzero samples from
zero samples and to predict zero samples from nonzero samples. Since this can bias the
estimation of the predictor coefficients, a signal-tapering window, such as a Hamming
window, is generally applied to the signal before computation of the autocorrelation
function.

3.2 The Covariance Method


An alternative choice for the averaging operator for the prediction error for a pth -order
predictor is

E_cov^(p) = ⟨(e[n])²⟩ = Σ_{n=p}^{M} (e[n])².        (48)

As in the autocorrelation method, the averaging is over a finite interval (p ≤ n ≤ M),


but the difference is that the signal to be modeled is known over the larger interval
0 ≤ n ≤ M. The total-squared prediction error only includes values of e[n] that can be
computed from samples within the interval 0 ≤ n ≤ M. Consequently, the averaging
takes place over a shorter interval p ≤ n ≤ M. This is significant, since it relieves
the inconsistency between the all-pole model and the finite-length signal.6 In this case,
we only seek to match the signal over a finite interval rather than over all n as in
the autocorrelation method. The upper plot in Figure 8 shows the same signal s[m] as

Figure 8 Illustration (for p = 5) of computation of the prediction error e[n] = hA[n] ∗ s[n] = s[n] − Σ_{k=1}^{p} ak s[n − k] for the covariance method. (In the upper plot, square dots denote samples of hA[n − m], and light round dots denote samples of s[m].)

5 In both cases, the deterministic autocorrelation function in Eq. (47) is used as an estimate.
6The definitions of total-squared prediction error in Eqs. (48) and (46) are distinctly different, so we
use the subscript cov to distinguish them.


Figure 9 Illustration of computation of the covariance function for a finite-length sequence. (Square dots denote samples of s[n − k] and light round dots denote samples of s[n − i].)

in the upper part of Figure 6, but in this case, the prediction error is only computed
over the interval p ≤ n ≤ M as needed in Eq. (48). As shown by the prediction error
filter impulse responses hA [n − m] in the upper plot, there are no end effects when the
prediction error is computed in this way, since all the signal samples needed to compute
the prediction error are available. Because of this, it is possible for the prediction error
to be precisely zero over the entire interval p ≤ n ≤ M, if the signal from which the
finite length segment was extracted was generated as the output of an all-pole system.
Seen another way, if s[n] is the output of an all-pole system with an input that is zero for
n > 0, then as seen from Eqs. (9) and (10) the prediction error will be zero for n > 0.
The covariance function inherits the same definition of the averaging operator,
i.e.,

φss[i, k] = Σ_{n=p}^{M} s[n − i] s[n − k].        (49)

The shifted sequences s[n − i] (light lines and round dots) and s[n − k] (dark lines and
square dots) are shown in Figure 9. This figure shows that since we need φss [i, k] only
for i = 0, 1, . . . , p and k = 1, 2, . . . , p, the segment s[n] for 0 ≤ n ≤ M contains all the
samples that are needed to compute φss [i, k] in Eq. (49).
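A direct implementation of Eq. (49) (ours, added for illustration), producing the (p+1)×(p+1) array of covariance values used in the normal equations:

    import numpy as np

    def covariance_values(s, p):
        # phi_ss[i, k] = sum_{n=p}^{M} s[n-i] s[n-k] for i, k = 0, ..., p, Eq. (49).
        s = np.asarray(s, dtype=float)
        M = len(s) - 1
        phi = np.zeros((p + 1, p + 1))
        for i in range(p + 1):
            for k in range(p + 1):
                phi[i, k] = np.dot(s[p - i : M + 1 - i], s[p - k : M + 1 - k])
        return phi   # solve phi[1:, 1:] a = phi[1:, 0] for the predictor coefficients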

3.3 Comparison of Methods


The autocorrelation and covariance methods have many similarities, but there are many
important differences in the methods and the resulting all-pole models. In this section,
we summarize some of the differences that we have already demonstrated and call
attention to some others.

Prediction Error
Both the averaged prediction error ⟨e²[n]⟩ and the averaged modeling error ⟨ê²[n]⟩ are
nonnegative and nonincreasing with increasing model order p. In the autocorrelation
method based on estimates obtained from finite-length signals, the averaged modeling
or prediction error will never be zero, because the autocorrelation values will not be ex-
act. Furthermore, the minimum value of the prediction error even with an exact model
is Gv[n] as indicated by Eq. (10). In the covariance method, the prediction error for
n > 0 can be exactly zero if the original signal was generated by an all-pole model. This
will be demonstrated in Example 2.


Equations for Predictor Coefficients


In both methods, the predictor coefficients that minimize the averaged prediction error satisfy a general set of linear equations expressed in matrix form as Φa = ψ. The coefficients of the all-pole model are obtained by inverting the matrix Φ; i.e., a = Φ⁻¹ψ. In the covariance method, the elements φss[i, k] of the matrix Φ are computed using Eq. (49). In the autocorrelation method, the covariance values become autocorrelation values, i.e., φss[i, k] = rss[|i − k|], and are computed using Eq. (47). In both cases, the matrix Φ is symmetric and positive-definite, but in the autocorrelation method, the matrix Φ is also a Toeplitz matrix. This implies numerous special properties of the solution, and
it implies that the solution of the equations can be done more efficiently than would
be true in general. In Section 6, we will explore some of these implications for the
autocorrelation method.
Stability of the Model System
The prediction error filter has a system function A(z) that is a polynomial in z−1 . There-
fore, it can be represented in terms of its zeros as

$$A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} = \prod_{k=1}^{p} \left(1 - z_k z^{-1}\right). \tag{50}$$
In the autocorrelation method, the zeros of the prediction error filter A(z) are
guaranteed to lie strictly within the unit circle of the z plane; i.e., |zk | < 1. This means
that the poles of the causal system function H(z) = G/A(z) of the model lie inside the
unit circle, which implies that the model system is stable. A simple proof of this assertion
is given by Lang and McClellan (1979) and McClellan (1988). Problem 10 develops a
proof that depends on the lattice filter interpretation of the prediction error system to
be discussed in Section 7.1. In the covariance method as we have formulated it, no such
guarantee can be given.
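
A quick numerical check of this stability property is sketched below (Python/NumPy, illustrative only; the coefficient values are placeholders): it factors A(z) as in Eq. (50) and verifies that every zero has magnitude less than one, as is guaranteed for an autocorrelation-method solution.

import numpy as np

def prediction_error_zeros(a):
    """Zeros of A(z) = 1 - a_1 z^-1 - ... - a_p z^-p (poles of the model H(z) = G/A(z))."""
    # numpy.roots expects polynomial coefficients in descending powers of z
    return np.roots(np.concatenate(([1.0], -np.asarray(a))))

# Hypothetical predictor coefficients; for autocorrelation-method solutions all |z_k| < 1,
# so the causal model system is stable.
a_example = [1.45, -0.81]
zk = prediction_error_zeros(a_example)
print(zk, np.all(np.abs(zk) < 1.0))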

4 MODEL ORDER

An important issue in parametric signal modeling is the model order p, the choice of
which has a major impact on the accuracy of the model. A common approach to choosing
p is to examine the averaged prediction error (often referred to as the residual) from the optimum pth-order model. Let $a_k^{(p)}$ be the parameters for the optimal pth-order predictor found using Eq. (30). The prediction error energy for the pth-order model using the autocorrelation method is⁷

$$\mathcal{E}^{(p)} = \sum_{n=-\infty}^{\infty} \left( s[n] - \sum_{k=1}^{p} a_k^{(p)} s[n-k] \right)^2. \tag{51}$$

For the zeroth-order predictor (p = 0), there are no delay terms in Eq. (51), i.e., the “predictor” is just the identity system, so e[n] = s[n]. Consequently, for p = 0,


$$\mathcal{E}^{(0)} = \sum_{n=-\infty}^{\infty} s^2[n] = r_{ss}[0]. \tag{52}$$

7 Recall that $\mathcal{E}_{\mathrm{cov}}^{(p)}$ denotes the total-squared prediction error for the covariance method, while we use $\mathcal{E}^{(p)}$ with no subscript to denote the total-squared prediction error for the autocorrelation method.


Plotting the normalized mean-squared prediction error $V^{(p)} = \mathcal{E}^{(p)}/\mathcal{E}^{(0)}$ as a function of p shows how increasing p changes this error energy. In the autocorrelation method, we showed that the averaged prediction error can never be precisely zero, even if the signal s[n] was generated by an all-pole system and the model order is the same as the order of the generating system. In the covariance method, however, if the all-pole model is a perfect model for the signal s[n], $\mathcal{E}_{\mathrm{cov}}^{(p)}$ will become identically zero at the correct choice of p, since the averaged prediction error only considers values for p ≤ n ≤ M. Even if s[n] is not perfectly modeled by an all-pole system, there is often a value of p above which increasing p has little or no effect on either $V^{(p)}$ or $V_{\mathrm{cov}}^{(p)} = \mathcal{E}_{\mathrm{cov}}^{(p)}/\mathcal{E}_{\mathrm{cov}}^{(0)}$. This threshold is an efficient choice of model order for representing the signal as an all-pole model.

Example 2 Model Order Selection


To demonstrate the effect of model order, consider a signal s[n] generated by exciting the 10th-order system

$$H(z) = \frac{0.6}{1 - 1.03z^{-1} + 0.79z^{-2} - 1.34z^{-3} + 0.78z^{-4} - 0.92z^{-5} + 1.22z^{-6} - 0.43z^{-7} + 0.6z^{-8} - 0.29z^{-9} - 0.23z^{-10}} \tag{53}$$

with an impulse v[n] = δ[n]. The samples of s[n] for 0 ≤ n ≤ 30 are shown as the sequence in the upper plots in Figures 6 and 8. This signal was used as the signal to be modeled by an all-pole model with both the autocorrelation method and the covariance method. Using the 31 samples of s[n], the appropriate autocorrelation and covariance values were computed, and the predictor coefficients were obtained by solving Eqs. (30) and (34), respectively. The normalized mean-squared prediction errors are plotted in Figure 10. Note that in both the autocorrelation and covariance methods, the normalized error drops abruptly at p = 1 and then decreases more slowly as p increases. At p = 10, the covariance method gives zero error, while the autocorrelation method gives a nonzero averaged error for p ≥ 10. This is consistent with our discussion of the prediction error in Section 3.

Figure 10 Normalized mean-squared prediction error $V^{(p)}$ and $V_{\mathrm{cov}}^{(p)}$ as a function of model order p in Example 2 (autocorrelation method and covariance method).


While Example 2 is an ideal simulation, the general nature of the dependence of


averaged prediction error as a function of p is typical of what happens when all-pole
modeling is applied to sampled signals. The graph of V (p) as a function of p tends to
flatten out at some point, and that value of p is often selected as the value to be used
in the model. In applications such as speech analysis, it is possible to choose the model
order based on physical models for the production of the signal to be modeled. (See
Rabiner and Schafer, 1978.)

5 ALL-POLE SPECTRUM ANALYSIS

All-pole signal modeling provides a method of obtaining high-resolution estimates of a


signal’s spectrum from truncated or windowed data. The use of parametric signal mod-
eling in spectrum analysis is based on the fact that if the data fits the model, then a finite
segment of the data can be used to determine the model parameters and, consequently,
also its spectrum. Specifically, in the deterministic case
|Ŝ(ejω )|2 = |H(ejω )|2 |V(ejω )|2 = |H(ejω )|2 (54)
since |V(ejω )|2 = 1 for a unit impulse excitation to the model system. Likewise, for
random signals the power spectrum of the output of the model is
Pŝŝ (ejω ) = |H(ejω )|2 Pww (ejω ) = |H(ejω )|2 , (55)
since Pww (ejω ) = 1 for the white-noise input. Thus, we can obtain an estimate of the
spectrum of a signal s[n] by computing an all-pole model for the signal and then com-
puting the magnitude-squared of the frequency response of the model system. For both
the deterministic and random cases, the spectrum estimate takes the form
$$\text{Spectrum estimate} = |H(e^{j\omega})|^2 = \frac{G^2}{\left| 1 - \displaystyle\sum_{k=1}^{p} a_k e^{-j\omega k} \right|^2}. \tag{56}$$
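
The following short sketch (Python/NumPy, illustrative only; the coefficient values are placeholders) evaluates the spectrum estimate of Eq. (56) on a dense frequency grid by dividing G² by |A(e^{jω})|², computed with an FFT of the prediction error filter coefficients.

import numpy as np

def allpole_spectrum(a, G, nfft=1024):
    """Evaluate |H(e^jw)|^2 = G^2 / |A(e^jw)|^2 at nfft frequencies (Eq. (56))."""
    A = np.fft.rfft(np.concatenate(([1.0], -np.asarray(a))), n=nfft)  # A(e^jw) on 0..pi
    w = np.linspace(0, np.pi, len(A))
    return w, (G ** 2) / np.abs(A) ** 2

# Example with hypothetical 2nd-order model coefficients
w, P = allpole_spectrum(a=[1.45, -0.81], G=1.0)
print(w[np.argmax(P)])   # frequency (rad/sample) at which the model spectrum peaks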

To obtain an understanding of the nature of the spectrum estimate in Eq. (56) for
the deterministic case, it is useful to recall that the DTFT of the finite-length signal s[n]
is

$$S(e^{j\omega}) = \sum_{n=0}^{M} s[n]\, e^{-j\omega n}. \tag{57}$$

Furthermore, note that



$$r_{ss}[m] = \sum_{n=0}^{M-|m|} s[n+m]\, s[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} |S(e^{j\omega})|^2 e^{j\omega m}\, d\omega, \tag{58}$$

where, due to the finite length of s[n], rss [m] = 0 for |m| > M. The values of rss [m]
for m = 0, 1, 2, . . . , p are used in the computation of the all-pole model using the
autocorrelation method. Thus, it is reasonable to suppose that there is a relationship


between the Fourier spectrum of the signal, |S(ejω )|2 , and the all-pole model spectrum,
|Ŝ(ejω )|2 = |H(ejω )|2 .
One approach to illuminating this relationship is to obtain an expression for
the averaged prediction error in terms of the DTFT of the signal s[n]. Recall that
the prediction error is e[n] = hA [n] ∗ s[n], where hA [n] is the impulse response of the
prediction error filter. From Parseval’s Theorem, the averaged prediction error is


$$\mathcal{E} = \sum_{n=0}^{M+p} (e[n])^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |S(e^{j\omega})|^2\, |A(e^{j\omega})|^2\, d\omega, \tag{59}$$

where S(ejω ) is the DTFT of s[n] as given by Eq. (57). Since H(z) = G/A(z), Eq. (59)
can be expressed in terms of H(ejω ) as

$$\mathcal{E} = \frac{G^2}{2\pi} \int_{-\pi}^{\pi} \frac{|S(e^{j\omega})|^2}{|H(e^{j\omega})|^2}\, d\omega. \tag{60}$$
Since the integrand in Eq. (60) is positive, and |H(ejω )|2 > 0 for −π < ω ≤ π, it
therefore follows from Eq. (60) that minimizing E is equivalent to minimizing the ratio
of the energy spectrum of the signal s[n] to the magnitude-squared of the frequency
response of the linear system in the all-pole model. The implication of this is that
the all-pole model spectrum will attempt to match the energy spectrum of the sig-
nal more closely at frequencies where the signal spectrum is large, since frequencies
where |S(ejω )|2 > |H(ejω )|2 contribute more to the mean-squared error than frequen-
cies where the opposite is true. Thus, the all-pole model spectrum estimate favors a
good fit around the peaks of the signal spectrum. This will be illustrated by the discus-
sion in Section 5.1. Similar analysis and reasoning also applies to the case in which s[n]
is random.

5.1 All-Pole Analysis of Speech Signals


All-pole modeling is widely used in speech processing both for speech coding, where the
term linear predictive coding (LPC) is often used, and for spectrum analysis. (See Atal
and Hanauer, 1971, Makhoul, 1975, Rabiner and Schafer, 1978, and Quatieri, 2002.)
To illustrate many of the ideas discussed in this chapter, we discuss in some detail
the use of all-pole modeling for spectrum analysis of speech signals. This method is
typically applied in a time-dependent manner by periodically selecting short segments
of the speech signal for analysis in much the same way as is done in time-dependent
Fourier analysis. Since the time-dependent Fourier transform is essentially a sequence
of DTFTs of finite-length segments, the above discussion of the relationship between the
DTFT and the all-pole spectrum characterizes the relationship between time-dependent
Fourier analysis and time-dependent all-pole model spectrum analysis, as well.
Figure 11 shows a 201-point Hamming-windowed segment of a speech signal s[n]
in the top panel and the corresponding autocorrelation function rss [m] below. During
this time interval, the speech signal is voiced (vocal cords vibrating), as evidenced by the
periodic nature of the signal. This periodicity is reflected in the autocorrelation function
as the peak at about 27 samples (27/8 = 3.375 ms for 8 kHz sampling rate) and integer
multiples thereof.


Figure 11 (a) Windowed voiced speech waveform (sample number, 8 kHz sampling rate). (b) Corresponding autocorrelation function (lag number, 8 kHz sampling rate; samples connected by straight lines).

When applying all-pole modeling to voiced speech, it is useful to think of the


signal as being deterministic, but with an excitation function that is a periodic train of
impulses. This accounts for the periodic nature of the autocorrelation function when
several periods of the signal are included in the window as in Figure 11(a).
Figure 12 shows a comparison of the DTFT of the signal in Figure 11(a) with
spectra computed from all-pole modeling with two different model orders and using
the autocorrelation function in Figure 11(b). Note that the DTFT of s[n] shows peaks at
multiples of the fundamental frequency F0 = 8 kHz/27 = 296 Hz, as well as many other
less prominent peaks and dips that can be attributed to windowing effects. If the first
13 samples of rss [m] in Figure 11(b) are used to compute an all-pole model spectrum
(p = 12), the result is the smooth curve shown with the heavy line in Figure 12(a). With
the filter order as 12 and the fundamental period of 27 samples, this spectrum estimate in
effect ignores the spectral structure owing to the periodicity of the signal and produces
a much smoother spectrum estimate. If 41 values of rss [m] are used, however, we obtain
the spectrum plotted with the thin line. Since the period of the signal is 27, a value of
p = 40 includes the periodicity peak in the autocorrelation function and thus, the all-
pole spectrum tends to represent much of the fine detail in the DTFT spectrum. Note
that both cases support our assertion above that the all-pole model spectrum estimate
tends to favor good representation at the peaks of the DTFT spectrum.


Figure 12 (a) Comparison of DTFT and all-pole model spectra (p = 12 and p = 40) for the voiced speech segment in Figure 11(a); log magnitude in dB versus frequency (0–4000 Hz). (b) Normalized prediction error $V^{(p)} = \mathcal{E}^{(p)}/\mathcal{E}^{(0)}$ as a function of p.

This example illustrates that the choice of the model order p controls the de-
gree of smoothing of the DTFT spectrum. Figure 12(b) shows that as p increases, the
mean-squared prediction error decreases quickly and then levels off, as in our previ-
ous example. Recall that in Sections 2.4 and 2.5, we argued that the all-pole model
with appropriately chosen gain results in a match between the autocorrelation func-
tions of the signal and the all-pole model up to p correlation lags as in Eq. (39).
This implies that as p increases, the all-pole model spectrum will approach the DTFT
spectrum, and when p → ∞, it follows that rhh [m] = rss [m] for all m, and therefore,
|H(ejω )|2 = |S(ejω )|2 . However, this does not mean that H(ejω ) = S(ejω ) because H(z)
is an IIR system, and S(z) is the z-transform of a finite-length sequence. Also note
that as p → ∞, the averaged prediction error does not approach zero, even though
|H(ejω )|2 → |S(ejω )|2 . As we have discussed, this occurs because the total error in
Eq. (11) is the prediction error ẽ[n] minus Gv[n]. Said differently, the linear predic-
tor must always predict the first nonzero sample from the zero-valued samples that
precede it.


Figure 13 (a) Windowed unvoiced speech waveform (sample number, 8 kHz sampling rate). (b) Corresponding autocorrelation function (lag number, 8 kHz sampling rate; samples connected by straight lines).

The other main class of speech sounds is comprised of the unvoiced sounds such
as fricatives. These sounds are produced by creating random turbulent air flow in the
vocal tract; therefore, they are best modeled in terms of an all-pole system excited by
white noise. Figure 13 shows an example of a 201-point Hamming-windowed segment of
unvoiced speech and its corresponding autocorrelation function. Note that there is no indication of periodicity in either the signal waveform or the autocorrelation function. A comparison of the DTFT of the signal in Figure 13(a) with
two all-pole model spectra computed from the autocorrelation function in Figure 13(b)
is shown in Figure 14(a). From the point of view of spectrum analysis of random signals,
the magnitude-squared of the DTFT is a periodogram. Thus, it contains a component
that is randomly varying with frequency. Again, by choice of the model order, the peri-
odogram can be smoothed to any desired degree.

5.2 Pole Locations


In speech processing, the poles of the all-pole model have a close relationship to the resonance frequencies of the vocal tract; thus, it is often useful to factor the polynomial A(z) to obtain its zeros for representation as in Eq. (50). As discussed in Section 3.3, the
zeros zk of the prediction error filter are the poles of the all-pole model system function.
It is the poles of the system function that are responsible for the peaks in the spectrum
estimates discussed in Section 5.1. The closer a pole is to the unit circle, the more peaked
is the spectrum for frequencies close to the angle of the pole.


Figure 14 (a) Comparison of DTFT and all-pole model spectra (p = 12 and p = 40) for the unvoiced speech segment in Figure 13(a); log magnitude in dB versus frequency (0–4000 Hz). (b) Normalized prediction error $V^{(p)} = \mathcal{E}^{(p)}/\mathcal{E}^{(0)}$ as a function of p.

Figure 15 shows the zeros of the prediction error system function A(z) (poles of
the model system) for the two spectrum estimates in Figure 12(a). For p = 12, the
zeros of A(z) are denoted by the open circles. Five complex conjugate pairs of zeros are
close to the unit circle, and their manifestations as poles are clearly evident in heavy
line curve of Figure 12(a). For the case p = 40, the zeros of A(z) are denoted by the
large filled dots. Observe that most of the zeros are close to the unit circle, and they
are more or less evenly distributed around the unit circle. This produces the peaks
in the model spectrum that are spaced approximately at multiples of the normalized
radian frequency corresponding to the fundamental frequency of the speech signal; i.e.,
at angles 2π(296 Hz)/8 kHz.


Figure 15 Zeros of the prediction error filters (poles of the model systems) for p = 12 and p = 40, used to obtain the spectrum estimates in Figure 12 (imaginary part versus real part).

5.3 All-Pole Modeling of Sinusoidal Signals


As another important example, we consider the use of the poles of an all-pole model to
estimate frequencies of sinusoidal signals. To see why this is possible, consider the sum
of two sinusoids
s[n] = [A1 cos(ω1 n + θ1 ) + A2 cos(ω2 n + θ2 )] u[n]. (61)
The z-transform of s[n] has the form
$$S(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2} + b_3 z^{-3}}{(1 - e^{j\omega_1} z^{-1})(1 - e^{-j\omega_1} z^{-1})(1 - e^{j\omega_2} z^{-1})(1 - e^{-j\omega_2} z^{-1})}. \tag{62}$$

That is, the sum of two sinusoids can be represented as the impulse response of an LTI
system whose system function has both poles and zeros. The numerator polynomial
would be a somewhat complicated function of the amplitudes, frequencies, and phase
shifts. What is important for our discussion is that the numerator is a 3rd -order poly-
nomial and the denominator is a 4th -order polynomial, the roots of which are all on
the unit circle at angles equal to ±ω1 and ±ω2 . The difference equation describing this
system with impulse excitation has the form

$$s[n] - \sum_{k=1}^{4} a_k s[n-k] = \sum_{k=0}^{3} b_k \delta[n-k], \tag{63}$$

where the coefficients ak would result from multiplying the denominator factors. Note
that

$$s[n] - \sum_{k=1}^{4} a_k s[n-k] = 0 \quad \text{for } n \ge 4, \tag{64}$$

which suggests that the signal s[n] can be predicted with no error by a 4th -order predictor
except at the very beginning (0 ≤ n ≤ 3). The coefficients of the denominator can be


Figure 16 Spectrum estimation for a sinusoidal signal. (a) Sinusoidal signal (amplitude versus time index n, 0 ≤ n ≤ 100). (b) Spectrum estimates (log magnitude in dB versus frequency ω/π): DTFT, autocorrelation method, covariance method, and true frequencies.

estimated from the signal by applying the covariance method to a short segment of the
signal selected so as not to include the first four samples. In the ideal case for which
Eq. (61) accurately represents the signal (e.g., high SNR), the roots of the resulting
polynomial provide good estimates of the frequencies of the component sinusoids.
Figure 16(a) shows a plot of 101 samples of the signal8
s[n] = 20 cos(0.2πn − 0.1π) + 22 cos(0.22πn + 0.9π). (65)
Because the two frequencies are close together, it is necessary to use a large number of
samples to resolve the two frequencies by Fourier analysis. However, since the signal fits
the all-pole model perfectly, the covariance method can be used to obtain very accurate
estimates of the frequencies from very short segments of the signal. This is illustrated
in Figure 16(b).
The DTFT of the 101 samples (with rectangular window) shows no indication that
there are two distinct sinusoid frequencies around ω = 0.21π. Recall that the main lobe
width for an (M + 1)-point rectangular window is Δω = 4π/(M + 1). Consequently, a 101-point rectangular window can clearly resolve two frequencies only if they are no closer than about 0.04π rad. Correspondingly, the DTFT does not show two spectral peaks.
Similarly, use of the autocorrelation method results in the spectrum estimate
shown by the heavy line. This estimate also contains only one spectral peak.
8 The tapering of the segment of the signal in Figure 16(a) is not a result of windowing. It is caused by
the “beating” of the two cosines of nearly the same frequency. The period of the beat frequency (difference
between 0.22π and 0.2π) is 100 samples.


The prediction error polynomial (in factored form) obtained with the autocorrelation method is

$$A_a(z) = (1 - 0.998e^{j0.21\pi} z^{-1})(1 - 0.998e^{-j0.21\pi} z^{-1})(1 - 0.426 z^{-1})(1 - 0.1165 z^{-1}) \tag{66}$$

The two real poles contribute no peaks, and the complex poles are close to the unit circle,
but at ±0.21π, which is halfway between the two frequencies. Thus, the windowing
inherent in the autocorrelation method causes the resulting model to lock onto the
average frequency 0.21π.
On the other hand, the factored prediction error polynomial obtained with the
covariance method is (with rounding of the magnitudes and angles) given by

$$A_c(z) = (1 - e^{j0.2\pi} z^{-1})(1 - e^{-j0.2\pi} z^{-1})(1 - e^{j0.22\pi} z^{-1})(1 - e^{-j0.22\pi} z^{-1}). \tag{67}$$

In this case, the angles of the zeros are almost exactly equal to the frequencies of the
two sinusoids. Also shown in Figure 16(b) is the frequency response of the model, i.e.,

$$|H_{\mathrm{cov}}(e^{j\omega})|^2 = \frac{1}{|A_{\mathrm{cov}}(e^{j\omega})|^2} \tag{68}$$

plotted in dB. In this case, the prediction error is very close to zero, which, if used
to estimate the gain of the all-pole model, would lead to an indeterminate estimate.
Therefore, the gain is arbitrarily set to one, which leads to a plot of Eq. (68) on a similar
scale to the other estimates. Since the poles are almost exactly on the unit circle, the
magnitude spectrum becomes exceedingly large at the pole frequencies. Note that the
roots of the prediction error polynomial give an accurate estimate of the frequencies.
This method, of course, does not provide accurate information about the amplitudes
and phases of the sinusoidal components.
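
As an illustration of this idea (a sketch in Python/NumPy, not code from the text), the following applies the covariance-method normal equations with p = 4 to the two-sinusoid signal of Eq. (65) and reads the frequency estimates from the angles of the roots of the prediction error polynomial.

import numpy as np

# Two closely spaced sinusoids, as in Eq. (65)
n = np.arange(101)
s = 20 * np.cos(0.2 * np.pi * n - 0.1 * np.pi) + 22 * np.cos(0.22 * np.pi * n + 0.9 * np.pi)

p, M = 4, len(s) - 1
# Covariance-method normal equations (Eq. (49)): phi[i,k] = sum_{n=p}^{M} s[n-i] s[n-k]
phi = np.array([[np.dot(s[p - i:M + 1 - i], s[p - k:M + 1 - k])
                 for k in range(p + 1)] for i in range(p + 1)])
a = np.linalg.solve(phi[1:, 1:], phi[1:, 0])

# Roots of A(z) = 1 - sum_k a_k z^-k; their angles estimate the sinusoid frequencies
zk = np.roots(np.concatenate(([1.0], -a)))
print(np.sort(np.abs(np.angle(zk))) / np.pi)   # approximately [0.2, 0.2, 0.22, 0.22]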

6 SOLUTION OF THE AUTOCORRELATION NORMAL EQUATIONS

In both the autocorrelation and covariance methods of computing the correlation val-
ues, the predictor coefficients that minimize the mean-squared inverse filter error and
equivalently the prediction error satisfy a set of linear equations of the general form:
$$\begin{bmatrix} \phi_{ss}[1,1] & \phi_{ss}[1,2] & \phi_{ss}[1,3] & \cdots & \phi_{ss}[1,p] \\ \phi_{ss}[2,1] & \phi_{ss}[2,2] & \phi_{ss}[2,3] & \cdots & \phi_{ss}[2,p] \\ \phi_{ss}[3,1] & \phi_{ss}[3,2] & \phi_{ss}[3,3] & \cdots & \phi_{ss}[3,p] \\ \vdots & \vdots & \vdots & & \vdots \\ \phi_{ss}[p,1] & \phi_{ss}[p,2] & \phi_{ss}[p,3] & \cdots & \phi_{ss}[p,p] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \phi_{ss}[1,0] \\ \phi_{ss}[2,0] \\ \phi_{ss}[3,0] \\ \vdots \\ \phi_{ss}[p,0] \end{bmatrix}. \tag{69}$$


In matrix notation, these linear equations have the representation

$$\Phi a = \psi. \tag{70}$$

Since φ[i, k] = φ[k, i], in both the autocorrelation and covariance methods, the matrix Φ is symmetric, and, because it arises in a least-squares problem, it is also positive-definite, which guarantees that it is invertible. In general, this leads to efficient solution methods, such as the Cholesky decomposition (see Press, et al., 2007), that are based on matrix factorization and applicable when Φ is symmetric and positive definite. However, in the specific case of the autocorrelation method, or any method for which φss[i, k] = rss[|i − k|], Eqs. (69) become the autocorrelation normal equations (also referred to as the Yule–Walker equations):
$$\begin{bmatrix} r_{ss}[0] & r_{ss}[1] & r_{ss}[2] & \cdots & r_{ss}[p-1] \\ r_{ss}[1] & r_{ss}[0] & r_{ss}[1] & \cdots & r_{ss}[p-2] \\ r_{ss}[2] & r_{ss}[1] & r_{ss}[0] & \cdots & r_{ss}[p-3] \\ \vdots & \vdots & \vdots & & \vdots \\ r_{ss}[p-1] & r_{ss}[p-2] & r_{ss}[p-3] & \cdots & r_{ss}[0] \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} r_{ss}[1] \\ r_{ss}[2] \\ r_{ss}[3] \\ \vdots \\ r_{ss}[p] \end{bmatrix}. \tag{71}$$
In this case, in addition to the matrix Φ being symmetric and positive-definite, it is also a Toeplitz matrix, i.e., all the elements on each subdiagonal are equal. This property leads to an efficient algorithm, referred to as the Levinson–Durbin recursion, for solving the equations.
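
For orientation, the sketch below (Python/NumPy, illustrative only; the test signal is a placeholder) builds the Toeplitz system of Eq. (71) from autocorrelation values rss[0..p] and solves it with a general-purpose solver, rather than with the more efficient Levinson–Durbin recursion described next.

import numpy as np

def autocorrelation_lpc(s, p):
    """Solve the autocorrelation normal equations (Eq. (71)) by direct matrix inversion."""
    M = len(s) - 1
    r = np.array([np.dot(s[:M + 1 - m], s[m:]) for m in range(p + 1)])       # rss[0..p]
    Phi = np.array([[r[abs(i - k)] for k in range(p)] for i in range(p)])     # Toeplitz matrix
    a = np.linalg.solve(Phi, r[1:])
    return a, r

# Hypothetical usage on a short test signal
s = np.cos(0.2 * np.pi * np.arange(64)) * (0.95 ** np.arange(64))
a, r = autocorrelation_lpc(s, p=2)
print(a)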

6.1 The Levinson–Durbin Recursion


The Levinson–Durbin algorithm for computing the predictor coefficients that minimize
the total-squared prediction error results from the high degree of symmetry in the matrix Φ and, furthermore, as Eq. (71) confirms, from the fact that the elements of the right-hand-side vector ψ are primarily the same values that populate the matrix Φ. Equations (L–D.1) to
(L–D.6) in Figure 17 define the computations. A derivation of these equations is given
in Section 6.2, but before developing the details of the derivation, it is helpful to simply
examine the steps of the algorithm.

(L–D.1) This step initializes the mean-squared prediction error to be the energy of the
signal. That is, a zeroth -order predictor (no predictor) yields no reduction in
prediction error energy, since the prediction error e[n] is identical to the signal
s[n].

The next line in Figure 17 states that steps (L–D.2) through (L–D.5) are repeated p
times, with each repetition of those steps increasing the order of the predictor by one.
In other words, the algorithm computes a predictor of order i from the predictor of
order i − 1 starting with i − 1 = 0.

(L–D.2) This step computes a quantity ki . The sequence of parameters ki , i = 1, 2, . . . , p


which we refer to as the k-parameters, plays a key role in generating the next
set of predictor coefficients.9
9 For reasons to be discussed in Section 7, the k-parameters are also called PARCOR (for PARtial
CORrelation) coefficients or also, reflection coefficients.


14
Power Spectrum Estimation

In this chapter we treat the estimation of the spectral characteristics of signals that are characterized as random processes. Many of the phenomena that occur in nature are best characterized statistically in terms of averages. For example, meteorological phenomena such as fluctuations in air temperature and air pressure are best characterized statistically as random processes. Thermal noise voltages generated in resistors and electronic devices are further examples of physical signals that are modeled as random processes.
Because of the random fluctuations in such signals, we must adopt a statistical viewpoint, which deals with the average characteristics of random signals. In particular, the autocorrelation function of a random process is the appropriate statistical average to use in characterizing random signals in the time domain, and the Fourier transform of the autocorrelation function, which yields the power density spectrum, provides the transformation from the time domain to the frequency domain.
Power spectrum estimation methods have a relatively long history. For a historical perspective, the reader may consult the papers of Robinson (1982) and the book by Marple (1987). We cover the classical power spectrum estimation methods based on the periodogram, originally introduced by Schuster (1898) and by Yule (1927), which gave rise to the modern parametric or model-based methods. These methods were subsequently developed and applied by Walker (1931), Bartlett (1948), Parzen (1957), Blackman and Tukey (1958), Burg (1967), and others. We also describe the method of Capon (1969) and methods based on the eigenanalysis of the data correlation matrix.

14.1 Estimation of Spectra from Finite-Duration Observations of Signals

The basic problem that we consider in this chapter is the estimation of the power density spectrum of a signal from observation of the signal over a finite time interval. As we will see, the finite length of the data record is an important limitation on the quality of the power spectrum estimate. When dealing with signals that are statistically stationary, the longer the data record, the better the estimate that we can extract from the data. On the other hand, if the statistical parameters of the signal are nonstationary, we cannot select an arbitrarily long data record to estimate the spectrum. In that case, the length of the data record that we select is determined by how rapidly the statistical parameters of the signal vary with time. Ultimately, our goal is to select a data record that is as short as possible yet still allows us to resolve the spectral characteristics of the different signal components in the data record that have closely spaced spectra.
One of the problems that we often encounter with power spectrum estimation methods based on a finite-length data record is the distortion of the spectrum that we are attempting to estimate. This problem occurs both in the computation of the spectrum of a deterministic signal and in the estimation of the power spectrum of a random signal. Since it is easier to observe the effect of the finite length of the data record on a deterministic signal, we treat this case first. Thereafter, we consider only random signals and the estimation of their power spectra.

14.1.1 Computation of the Energy Density Spectrum

Consider the computation of the spectrum of a deterministic signal from a finite sequence of data. The sequence x(n) is usually the result of sampling a continuous-time signal xa(t) at some uniform sampling rate Fs. Our objective is to obtain an estimate of the true spectrum from a finite-duration sequence x(n).
Recall that if xa(t) is a finite-energy signal, that is,

$$E = \int_{-\infty}^{\infty} |x_a(t)|^2\, dt < \infty$$

then its Fourier transform exists and is given by

$$X_a(F) = \int_{-\infty}^{\infty} x_a(t)\, e^{-j2\pi Ft}\, dt$$

From Parseval's theorem we have

$$E = \int_{-\infty}^{\infty} |x_a(t)|^2\, dt = \int_{-\infty}^{\infty} |X_a(F)|^2\, dF \tag{14.1.1}$$

The quantity |Xa(F)|² represents the distribution of signal energy as a function of frequency, and it is called the energy density spectrum of the signal, that is,

$$S_{xx}(F) = |X_a(F)|^2 \tag{14.1.2}$$

as described in Chapter 4. Hence the total energy of the signal is simply the integral of Sxx(F) over all F [i.e., the total area under Sxx(F)].
It is also interesting to note that Sxx(F) can be viewed as the Fourier transform of another function, Rxx(τ), called the autocorrelation function of the finite-energy signal xa(t), which is defined as

$$R_{xx}(\tau) = \int_{-\infty}^{\infty} x_a^*(t)\, x_a(t+\tau)\, dt \tag{14.1.3}$$

Indeed, it easily follows that

$$\int_{-\infty}^{\infty} R_{xx}(\tau)\, e^{-j2\pi F\tau}\, d\tau = S_{xx}(F) = |X_a(F)|^2 \tag{14.1.4}$$

so that Rxx(τ) and Sxx(F) form a Fourier transform pair.


Now suppose that we compute the energy density spectrum of the signal xa(t) from its samples taken at a rate of Fs samples per second. To ensure that there is no spectral aliasing resulting from the sampling process, the signal is assumed to be prefiltered, so that, for practical purposes, its bandwidth is limited to B hertz. Then the sampling frequency Fs is selected such that Fs > 2B.
The sampled version of xa(t) is a sequence x(n), −∞ < n < ∞, which has the Fourier transform (voltage spectrum)

$$X(\omega) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j\omega n}$$

or, equivalently,

$$X(f) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j2\pi fn} \tag{14.1.5}$$

Recall that X(f) can be expressed in terms of the voltage spectrum of the analog signal xa(t) as

$$X\!\left(\frac{F}{F_s}\right) = F_s \sum_{k=-\infty}^{\infty} X_a(F - kF_s) \tag{14.1.6}$$

where f = F/Fs is the normalized frequency variable.
In the absence of aliasing, within the fundamental range |F| ≤ Fs/2, we have

$$X\!\left(\frac{F}{F_s}\right) = F_s\, X_a(F), \qquad |F| \le F_s/2 \tag{14.1.7}$$

Hence the voltage spectrum of the sampled signal is identical to the voltage spectrum of the analog signal. As a consequence, the energy density spectrum of the sampled signal is

$$S_{xx}\!\left(\frac{F}{F_s}\right) = \left| X\!\left(\frac{F}{F_s}\right) \right|^2 = F_s^2\, |X_a(F)|^2 \tag{14.1.8}$$

We note that the autocorrelation of the sampled signal, defined as

$$r_{xx}(k) = \sum_{n=-\infty}^{\infty} x^*(n)\, x(n+k) \tag{14.1.9}$$

has the Fourier transform (Wiener–Khintchine theorem)

$$S_{xx}(f) = \sum_{k=-\infty}^{\infty} r_{xx}(k)\, e^{-j2\pi kf} \tag{14.1.10}$$

Hence the energy density spectrum can be obtained as the Fourier transform of the autocorrelation of the sequence {x(n)}.
The relations above lead us to distinguish between two distinct methods for computing the energy density spectrum of a signal xa(t) from its samples x(n). One is the direct method, which involves computing the Fourier transform of {x(n)}, and then

$$S_{xx}(f) = |X(f)|^2 = \left| \sum_{n=-\infty}^{\infty} x(n)\, e^{-j2\pi fn} \right|^2 \tag{14.1.11}$$


The second method is the indirect method because it requires two steps. First, the autocorrelation rxx(k) is computed from x(n), and then the Fourier transform of the autocorrelation is computed as in (14.1.10) to obtain the energy density spectrum.
In practice, however, only the finite-duration sequence x(n), 0 ≤ n ≤ N − 1, is available for computing the spectrum of the signal. In effect, limiting the duration of the sequence x(n) to N points is equivalent to multiplying x(n) by a rectangular window. Thus we have

$$\tilde{x}(n) = x(n)\, w(n) = \begin{cases} x(n), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases} \tag{14.1.12}$$

From our discussion of FIR filter design based on the use of windows to limit the duration of the impulse response, recall that multiplication of two sequences is equivalent to convolving their voltage spectra. Consequently, the frequency-domain relation corresponding to (14.1.12) is

$$\tilde{X}(f) = X(f) * W(f) = \int_{-1/2}^{1/2} X(\alpha)\, W(f-\alpha)\, d\alpha \tag{14.1.13}$$

Recall from our discussion in Section 10.2.1 that the convolution of the window function W(f) with X(f) smooths the spectrum X(f), provided that the spectrum W(f) is relatively narrow compared with X(f). But this condition implies that the window w(n) must be sufficiently long (i.e., N must be sufficiently large) so that W(f) is narrow compared with X(f). Even if W(f) is narrow compared with X(f), the convolution of X(f) with the sidelobes of W(f) results in sidelobe energy in X̃(f) in frequency bands where the true signal spectrum X(f) = 0. This sidelobe energy is called leakage. The following example illustrates the leakage problem.

EXAMPLE 14.1.1

A signal with the (voltage) spectrum

$$X(f) = \begin{cases} 1, & |f| \le 0.1 \\ 0, & \text{otherwise} \end{cases}$$

is convolved with the rectangular window of length N = 61. Determine the spectrum X̃(f) given by (14.1.13).

Solution. The spectral characteristic W(f) for the rectangular window of length N = 61 is illustrated in Figure 10.2.2(a). Note that the width of the main lobe of the window function is Δω = 4π/61, or Δf = 2/61, which is narrow compared with X(f).
The convolution of X(f) with W(f) is illustrated in Figure 14.1.1. We observe that energy has leaked into the frequency band 0.1 < |f| ≤ 0.5, where X(f) = 0. Part of this is due to the width of the main lobe of W(f), which causes X(f) to spread outside the range |f| ≤ 0.1. The sidelobe energy in X̃(f), however, is due to the presence of the sidelobes of W(f), which are convolved with X(f). The spreading of X(f) for |f| > 0.1 and the sidelobes in the range 0.1 ≤ |f| ≤ 0.5 constitute the leakage.

As in the case of FIR filter design, we can reduce sidelobe leakage by selecting windows that have low sidelobes. This implies windows that taper smoothly to zero in the time domain instead of having the abrupt cutoff of the rectangular window. Although such window functions reduce sidelobe leakage, they result in an increase in smoothing or spreading of the spectral characteristic X(f). For example, the use of a Blackman window of length N = 61 in Example 14.1.1 results in


Figure 14.1.1. Spectrum obtained by convolving a rectangular window of length M = 61 with the ideal lowpass spectrum of Example 14.1.1 (energy density in dB versus frequency in cycles/sample).

the spectral characteristic X̃(f) shown in Figure 14.1.2. The sidelobe leakage has been reduced considerably, but the spectral width has increased by about 50%.
The spreading of the spectrum being estimated that is caused by windowing is especially a problem when we wish to resolve signals with closely spaced frequency components. For example, a signal with spectral characteristic X(f) = X1(f) + X2(f), as shown in Figure 14.1.3, cannot be resolved as two separate signals unless the width of the window function is significantly narrower than the frequency separation Δf. Thus we observe that using smooth time-domain windows reduces leakage at the expense of a decrease in frequency resolution.

Figure 14.1.2. Spectrum obtained by convolving a Blackman window of length M = 61 with the ideal lowpass spectrum of Example 14.1.1 (energy density in dB versus frequency in cycles/sample).


Figure 14.1.3. Two narrowband signal spectra X1(f) and X2(f).


It is clear from this discussion that the energy density spectrum of the windowed sequence {x̃(n)} is an approximation to the desired spectrum of the sequence {x(n)}. The spectral density obtained from {x̃(n)} is

$$S_{\tilde{x}\tilde{x}}(f) = |\tilde{X}(f)|^2 = \left| \sum_{n=0}^{N-1} \tilde{x}(n)\, e^{-j2\pi fn} \right|^2 \tag{14.1.14}$$

The spectrum given by (14.1.14) can be computed numerically at a set of N frequency points by means of the DFT. Thus

$$\tilde{X}(k) = \sum_{n=0}^{N-1} \tilde{x}(n)\, e^{-j2\pi kn/N} \tag{14.1.15}$$

Then

$$|\tilde{X}(k)|^2 = S_{\tilde{x}\tilde{x}}(f)\big|_{f=k/N} = S_{\tilde{x}\tilde{x}}\!\left(\frac{k}{N}\right) \tag{14.1.16}$$

and hence

$$S_{\tilde{x}\tilde{x}}\!\left(\frac{k}{N}\right) = \left| \sum_{n=0}^{N-1} \tilde{x}(n)\, e^{-j2\pi kn/N} \right|^2 \tag{14.1.17}$$

which is a distorted version of the true spectrum Sxx(k/N).

14.1.2 Estimation of the Autocorrelation and Power Spectrum of Random Signals: The Periodogram

The finite-energy signals considered in the preceding section possess a Fourier transform and are characterized in the spectral domain by their energy density spectrum. In contrast, the important class of signals characterized as stationary random processes do not have finite energy and hence do not possess a Fourier transform. Such signals have finite average power and are therefore characterized by a power density spectrum. If x(t) is a stationary random process, its autocorrelation function is

$$\gamma_{xx}(\tau) = E[x^*(t)\, x(t+\tau)] \tag{14.1.18}$$

where E[·] denotes the statistical average. Then, via the Wiener–Khintchine theorem, the power density spectrum of the stationary random process is the Fourier transform of the autocorrelation function, that is,

$$\Gamma_{xx}(F) = \int_{-\infty}^{\infty} \gamma_{xx}(\tau)\, e^{-j2\pi F\tau}\, d\tau \tag{14.1.19}$$


In practice, we deal with a single realization of the random process from which we estimate the power spectrum of the process. The true autocorrelation function γxx(τ) is not known and, as a consequence, we cannot compute the Fourier transform in (14.1.19) to obtain Γxx(F). Instead, from a single realization of the random process we can compute the time-average autocorrelation function

$$R_{xx}(\tau) = \frac{1}{2T_0} \int_{-T_0}^{T_0} x^*(t)\, x(t+\tau)\, dt \tag{14.1.20}$$

where 2T0 is the observation interval. If the stationary random process is ergodic in the first and second moments (mean and autocorrelation function), then

$$\gamma_{xx}(\tau) = \lim_{T_0\to\infty} R_{xx}(\tau) = \lim_{T_0\to\infty} \frac{1}{2T_0} \int_{-T_0}^{T_0} x^*(t)\, x(t+\tau)\, dt \tag{14.1.21}$$

This relation justifies the use of the time-average autocorrelation Rxx(τ) as an estimate of the statistical autocorrelation function γxx(τ). Furthermore, the Fourier transform of Rxx(τ) provides an estimate Pxx(F) of the power density spectrum, that is,

$$P_{xx}(F) = \int_{-T_0}^{T_0} R_{xx}(\tau)\, e^{-j2\pi F\tau}\, d\tau = \frac{1}{2T_0} \int_{-T_0}^{T_0} \left[ \int_{-T_0}^{T_0} x^*(t)\, x(t+\tau)\, dt \right] e^{-j2\pi F\tau}\, d\tau = \frac{1}{2T_0} \left| \int_{-T_0}^{T_0} x(t)\, e^{-j2\pi Ft}\, dt \right|^2 \tag{14.1.22}$$

The true power density spectrum is the expected value of Pxx(F) in the limit as T0 → ∞,

$$\Gamma_{xx}(F) = \lim_{T_0\to\infty} E[P_{xx}(F)] = \lim_{T_0\to\infty} E\left[ \frac{1}{2T_0} \left| \int_{-T_0}^{T_0} x(t)\, e^{-j2\pi Ft}\, dt \right|^2 \right] \tag{14.1.23}$$

From (14.1.20) and (14.1.22), we again have two possible approaches for computing Pxx(F): the direct method given by (14.1.22), or the indirect method, in which we first obtain Rxx(τ) and then compute its Fourier transform.
Let us now consider the estimation of the power density spectrum from samples of a single realization of the random process. In particular, we assume that xa(t) is sampled at a rate Fs > 2B, where B is the highest frequency contained in the power density spectrum of the random process. Thus we obtain a finite-duration sequence x(n), 0 ≤ n ≤ N − 1, by sampling xa(t). From these samples we compute the time-average autocorrelation sequence

$$r'_{xx}(m) = \frac{1}{N-m} \sum_{n=0}^{N-m-1} x^*(n)\, x(n+m), \qquad m = 0, 1, \ldots, N-1 \tag{14.1.24}$$

and for negative values of m we have $r'_{xx}(m) = [\,r'_{xx}(-m)\,]^*$. Then we compute the Fourier transform

$$P'_{xx}(f) = \sum_{m=-N+1}^{N-1} r'_{xx}(m)\, e^{-j2\pi fm} \tag{14.1.25}$$


The normalization factor N − m in (14.1.24) results in an estimate with mean value

$$E[r'_{xx}(m)] = \frac{1}{N-m} \sum_{n=0}^{N-m-1} E[x^*(n)\, x(n+m)] = \gamma_{xx}(m) \tag{14.1.26}$$

where γxx(m) is the true (statistical) autocorrelation sequence of x(n). Hence r′xx(m) is an unbiased estimate of γxx(m). The variance of the estimate r′xx(m) is approximately

$$\mathrm{var}[r'_{xx}(m)] \approx \frac{N}{(N-m)^2} \sum_{n=-\infty}^{\infty} \left[ |\gamma_{xx}(n)|^2 + \gamma_{xx}^*(n-m)\, \gamma_{xx}(n+m) \right] \tag{14.1.27}$$

which is a result given by Jenkins and Watts (1968). Clearly,

$$\lim_{N\to\infty} \mathrm{var}[r'_{xx}(m)] = 0 \tag{14.1.28}$$

provided that

$$\sum_{n=-\infty}^{\infty} |\gamma_{xx}(n)|^2 < \infty$$

Since E[r′xx(m)] = γxx(m) and the variance of the estimate converges to zero as N → ∞, the estimate r′xx(m) is said to be consistent.
For large values of the lag parameter m, the estimate r′xx(m) given by (14.1.24) has a large variance, especially as m approaches N. This is due to the fact that very few data points enter the estimate for large lags. As an alternative to (14.1.24), we can use the estimate

$$r_{xx}(m) = \frac{1}{N} \sum_{n=0}^{N-m-1} x^*(n)\, x(n+m), \qquad 0 \le m \le N-1$$
$$r_{xx}(m) = \frac{1}{N} \sum_{n=|m|}^{N-1} x^*(n)\, x(n+m), \qquad m = -1, -2, \ldots, 1-N \tag{14.1.29}$$

which has a bias of |m|γxx(m)/N, since its mean value is

$$E[r_{xx}(m)] = \frac{1}{N} \sum_{n=0}^{N-|m|-1} E[x^*(n)\, x(n+m)] = \frac{N-|m|}{N}\, \gamma_{xx}(m) = \left( 1 - \frac{|m|}{N} \right) \gamma_{xx}(m) \tag{14.1.30}$$

However, this estimate has a smaller variance, given approximately by

$$\mathrm{var}[r_{xx}(m)] \approx \frac{1}{N} \sum_{n=-\infty}^{\infty} \left[ |\gamma_{xx}(n)|^2 + \gamma_{xx}^*(n-m)\, \gamma_{xx}(n+m) \right] \tag{14.1.31}$$

We note that rxx(m) is asymptotically unbiased, that is,

$$\lim_{N\to\infty} E[r_{xx}(m)] = \gamma_{xx}(m) \tag{14.1.32}$$

and its variance converges to zero as N → ∞. Hence the estimate rxx(m) is also a consistent estimate of γxx(m).
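
The two lag-domain estimators can be compared numerically with a short sketch like the following (Python/NumPy, illustrative only; the white-noise record is a placeholder): it computes the unbiased estimate of Eq. (14.1.24) and the biased estimate of Eq. (14.1.29) for a real-valued signal.

import numpy as np

def autocorr_estimates(x):
    """Unbiased (Eq. 14.1.24) and biased (Eq. 14.1.29) autocorrelation estimates, m >= 0, real x."""
    N = len(x)
    full = np.correlate(x, x, mode='full')[N - 1:]   # sum_n x(n) x(n+m) for m = 0..N-1
    unbiased = full / (N - np.arange(N))             # divide by N - m
    biased = full / N                                # divide by N
    return unbiased, biased

rng = np.random.default_rng(0)
x = rng.standard_normal(256)        # white noise: gamma_xx(m) is 1 at m = 0, else 0
r_u, r_b = autocorr_estimates(x)
print(r_u[0], r_b[0])               # both near 1 at lag 0
print(r_u[250], r_b[250])           # r_u[250] uses only 6 products, so it is far noisier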


We shall use the estimate rxx(m) given by (14.1.29) in our treatment of power spectrum estimation. The corresponding estimate of the power density spectrum is

$$P_{xx}(f) = \sum_{m=-(N-1)}^{N-1} r_{xx}(m)\, e^{-j2\pi fm} \tag{14.1.33}$$

If we substitute the expression for rxx(m) given in (14.1.29) into Equation (14.1.33), the estimate Pxx(f) can also be expressed as

$$P_{xx}(f) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi fn} \right|^2 = \frac{1}{N} |X(f)|^2 \tag{14.1.34}$$

where X(f) is the Fourier transform of the sample sequence x(n). This well-known form of the power density spectrum estimate is called the periodogram. It was originally introduced by Schuster (1898) to detect and measure “hidden periodicities” in data.
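
A direct numerical evaluation of Eq. (14.1.34) on a uniform frequency grid is sketched below (Python/NumPy, illustrative only; the test signal is a placeholder).

import numpy as np

def periodogram(x, nfft=None):
    """Periodogram P_xx(f_k) = |X(f_k)|^2 / N at f_k = k/nfft (Eq. (14.1.34))."""
    N = len(x)
    X = np.fft.fft(x, n=nfft)          # DFT of the N data samples (zero-padded if nfft > N)
    return np.abs(X) ** 2 / N

# Hypothetical example: a sinusoid in white noise
rng = np.random.default_rng(1)
n = np.arange(256)
x = np.cos(2 * np.pi * 0.2 * n) + rng.standard_normal(256)
P = periodogram(x)
print(np.argmax(P[:128]) / 256)        # spectral peak near f = 0.2 cycles/sample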
From (14.1.33), the mean value of the periodogram estimate Pxx(f) is

$$E[P_{xx}(f)] = E\left[ \sum_{m=-(N-1)}^{N-1} r_{xx}(m)\, e^{-j2\pi fm} \right] = \sum_{m=-(N-1)}^{N-1} E[r_{xx}(m)]\, e^{-j2\pi fm} = \sum_{m=-(N-1)}^{N-1} \left( 1 - \frac{|m|}{N} \right) \gamma_{xx}(m)\, e^{-j2\pi fm} \tag{14.1.35}$$

The interpretation that we give to (14.1.35) is that the mean of the estimated spectrum is the Fourier transform of the windowed autocorrelation function

$$\tilde{\gamma}_{xx}(m) = \left( 1 - \frac{|m|}{N} \right) \gamma_{xx}(m) \tag{14.1.36}$$

where the window function is the Bartlett (triangular) window. Hence the mean of the estimated spectrum is

$$E[P_{xx}(f)] = \sum_{m=-\infty}^{\infty} \tilde{\gamma}_{xx}(m)\, e^{-j2\pi fm} = \int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\, W_B(f-\alpha)\, d\alpha \tag{14.1.37}$$

where WB(f) is the spectral characteristic of the Bartlett window. The relation (14.1.37) shows that the mean of the estimated spectrum is the convolution of the true power density spectrum Γxx(f) with the Fourier transform WB(f) of the Bartlett window. Consequently, the mean of the estimated spectrum is a smoothed version of the true spectrum, and it suffers from the same spectral leakage problems, which are due to the finite number of data points.
We note that the estimated spectrum is asymptotically unbiased, that is,

$$\lim_{N\to\infty} E\left[ \sum_{m=-(N-1)}^{N-1} r_{xx}(m)\, e^{-j2\pi fm} \right] = \sum_{m=-\infty}^{\infty} \gamma_{xx}(m)\, e^{-j2\pi fm} = \Gamma_{xx}(f)$$

However, in general, the variance of the estimate Pxx(f) does not decay to zero as N → ∞. For example, when the data sequence is a Gaussian random process, the variance is easily shown to be (see Problem 14.4)

$$\mathrm{var}[P_{xx}(f)] = \Gamma_{xx}^2(f) \left[ 1 + \left( \frac{\sin 2\pi fN}{N \sin 2\pi f} \right)^2 \right] \tag{14.1.38}$$


which, in the limit as N → ∞, becomes

$$\lim_{N\to\infty} \mathrm{var}[P_{xx}(f)] = \Gamma_{xx}^2(f) \tag{14.1.39}$$

Hence we conclude that the periodogram is not a consistent estimate of the true power density spectrum (i.e., it does not converge to the true power density spectrum).
In summary, the estimated autocorrelation rxx(m) is a consistent estimate of the true autocorrelation function γxx(m). However, its Fourier transform Pxx(f), the periodogram, is not a consistent estimate of the true power density spectrum. We have already noted that Pxx(f) is an asymptotically unbiased estimate of Γxx(f), but for a finite-duration sequence the mean value of Pxx(f) contains a bias, which from (14.1.37) is evident as a distortion of the true power density spectrum. Thus the estimated spectrum suffers from the smoothing effects and the leakage embodied in the Bartlett window. Ultimately, the smoothing and the leakage limit our ability to resolve closely spaced spectra.
The leakage and frequency-resolution problems just described, as well as the fact that the periodogram is not a consistent estimate of the power spectrum, provide the motivation for the power spectrum estimation methods described in Sections 14.2, 14.3, and 14.4. The methods described in Section 14.2 are classical nonparametric methods, which make no assumptions about the data sequence. The emphasis of the classical methods is on obtaining a consistent estimate of the power spectrum through averaging or smoothing operations performed directly on the periodogram or on the autocorrelation. As we will see, the effect of these operations is a further reduction in frequency resolution in exchange for a reduction in the variance of the estimate.
The spectrum estimation methods described in Section 14.3 are based on a model of how the data were generated. In general, the model-based methods developed over the past few decades provide significantly higher resolution than the classical methods.
Sections 14.4 and 14.5 describe additional methods. Section 14.4 treats filter-bank-based methods for estimating the power spectrum. The methods described in Section 14.5 are based on the eigenvalue/eigenvector decomposition of the data correlation matrix.

14.1.3 Use of the DFT in Power Spectrum Estimation

As indicated by Equations (14.1.14) and (14.1.34), the estimated energy density spectrum Sxx(f) and the periodogram Pxx(f), respectively, can be computed using the DFT, which in turn is computed efficiently by an FFT algorithm. If we have N data points, we compute, as a minimum, the N-point DFT. For example, this computation yields samples of the periodogram

$$P_{xx}\!\left(\frac{k}{N}\right) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/N} \right|^2, \qquad k = 0, 1, \ldots, N-1 \tag{14.1.40}$$

at the frequencies fk = k/N.
In practice, however, such a sparse sampling of the spectrum does not provide a very good representation of the continuous spectrum estimate Pxx(f). This is easily remedied by evaluating Pxx(f) at additional frequencies. Equivalently, we can effectively increase the length of the sequence by zero padding and then evaluate Pxx(f) at a denser set of frequencies. Thus, if we increase the data sequence length to L points by zero padding and evaluate the L-point DFT, we have

$$P_{xx}\!\left(\frac{k}{L}\right) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi nk/L} \right|^2, \qquad k = 0, 1, \ldots, L-1 \tag{14.1.41}$$


We emphasize that zero padding and evaluating the DFT at L > N points does not improve the frequency resolution of the spectral estimate. It simply provides us with a method of interpolating the values of the measured spectrum at additional frequencies. The frequency resolution of the spectral estimate Pxx(f) is determined by the length N of the data record.
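
This point can be checked directly with a short sketch (Python/NumPy, illustrative only; the two-sinusoid signal and the padded lengths are placeholders): zero padding adds interpolated points between the same spectral samples but does not separate components that the N-point record cannot resolve.

import numpy as np

n = np.arange(16)                      # N = 16 data points, as in Example 14.1.2
df = 0.01                              # closely spaced components
x = np.sin(2 * np.pi * 0.135 * n) + np.cos(2 * np.pi * (0.135 + df) * n)

for L in (16, 32, 128):                # increasing amounts of zero padding
    P = np.abs(np.fft.fft(x, n=L)) ** 2 / len(x)   # Eq. (14.1.41)
    print(L, np.argmax(P[:L // 2]) / L)            # one broad peak near 0.135; no second peak appears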

EXAMPLE 14.1.2

A sequence of N = 16 samples is obtained by sampling an analog signal consisting of two frequency components. The resulting discrete-time sequence is

x(n) = sin 2π(0.135)n + cos 2π(0.135 + Δf)n,  n = 0, 1, . . . , 15

where Δf is the frequency separation. Evaluate the power spectrum P(f) = (1/N)|X(f)|² at the frequencies fk = k/L, k = 0, 1, . . . , L − 1, for L = 8, 16, 32, and 128, and for the values Δf = 0.06 and Δf = 0.01.

Solution. By zero padding, we increase the length of the data sequence and obtain the power spectrum estimate Pxx(k/L). The results for Δf = 0.06 are plotted in Figure 14.1.4. Note that zero padding does not change the resolution, but it has the effect of interpolating the spectrum Pxx(f). In this case, the frequency separation Δf is sufficiently large that the two frequency components are resolvable.

Figure 14.1.4. Spectra of two sinusoids with frequency separation Δf = 0.06, evaluated with L = 8, 16, 32, and 128 frequency points.


Figure 14.1.5. Spectra of two sinusoids with frequency separation Δf = 0.01, evaluated with L = 8, 16, 32, and 128 frequency points.

The spectral estimates for Δf = 0.01 are shown in Figure 14.1.5. In this case the two spectral components are not resolvable. Again, the effect of zero padding is to provide more interpolation, giving us a better picture of the estimated spectrum, but it does not improve the frequency resolution.
When only a few points of the periodogram are needed, the Goertzel algorithm described in Chapter 8 may provide a more efficient computation. Since the Goertzel algorithm has been interpreted as a linear filtering approach to computing the DFT, it is clear that the periodogram estimate can be obtained by passing the signal through a bank of parallel tuned filters and squaring their outputs (see Problem 14.5).

14.2 Nonparametric Methods for Power Spectrum Estimation

The power spectrum estimation methods described in this section are the classical methods developed by Bartlett (1948), Blackman and Tukey (1958), and Welch (1967). These methods make no assumptions about how the data were generated and hence are called nonparametric methods.


Since the estimates are based entirely on a finite record of data, the frequency resolution of these methods is, at best, equal to the spectral width of a rectangular window of length N, which is approximately 1/N at the −3 dB points. We shall be more precise in specifying the frequency resolution of the specific methods. All the estimation techniques described in this section decrease the frequency resolution in order to reduce the variance of the spectral estimate.
First we describe the estimates and derive the mean and variance of each. A comparison of the three methods is given in Section 14.2.4. Although the spectral estimates are expressed as functions of the continuous frequency variable f, in practice the estimates are computed at discrete frequencies via the FFT algorithm. The FFT-based computational requirements are considered in Section 14.2.5.

14.2.1 The Bartlett Method: Averaging Periodograms

Bartlett's method for reducing the variance of the periodogram involves a three-step process. First, the N-point sequence is subdivided into K nonoverlapping segments, where each segment has length M. This results in the K data segments

$$x_i(n) = x(n + iM), \qquad i = 0, 1, \ldots, K-1, \quad n = 0, 1, \ldots, M-1 \tag{14.2.1}$$

For each segment, we compute the periodogram

$$P_{xx}^{(i)}(f) = \frac{1}{M} \left| \sum_{n=0}^{M-1} x_i(n)\, e^{-j2\pi fn} \right|^2, \qquad i = 0, 1, \ldots, K-1 \tag{14.2.2}$$

Finally, we average the periodograms for the K segments to obtain the Bartlett power spectrum estimate [Bartlett (1948)]

$$P_{xx}^{B}(f) = \frac{1}{K} \sum_{i=0}^{K-1} P_{xx}^{(i)}(f) \tag{14.2.3}$$
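
A compact numerical rendering of these three steps is sketched below (Python/NumPy, illustrative only; the segment length and test data are placeholders).

import numpy as np

def bartlett_psd(x, M, nfft=None):
    """Bartlett estimate: average the periodograms of K = N//M nonoverlapping segments."""
    K = len(x) // M
    segs = np.reshape(x[:K * M], (K, M))                     # Eq. (14.2.1)
    P = np.abs(np.fft.fft(segs, n=nfft, axis=1)) ** 2 / M    # Eq. (14.2.2), one row per segment
    return P.mean(axis=0)                                    # Eq. (14.2.3)

rng = np.random.default_rng(2)
x = rng.standard_normal(4096)            # white noise: the true spectrum is flat
P_B = bartlett_psd(x, M=256)
print(P_B.mean(), P_B.std())             # mean near 1; variance roughly 1/K of a single periodogram's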

The statistical properties of this estimate are easily derived. First, the mean value is

$$E[P_{xx}^{B}(f)] = \frac{1}{K} \sum_{i=0}^{K-1} E[P_{xx}^{(i)}(f)] = E[P_{xx}^{(i)}(f)] \tag{14.2.4}$$

From (14.1.35) and (14.1.37), the expected value of a single periodogram is

$$E[P_{xx}^{(i)}(f)] = \sum_{m=-(M-1)}^{M-1} \left( 1 - \frac{|m|}{M} \right) \gamma_{xx}(m)\, e^{-j2\pi fm} = \frac{1}{M} \int_{-1/2}^{1/2} \Gamma_{xx}(\alpha) \left( \frac{\sin \pi (f-\alpha) M}{\sin \pi (f-\alpha)} \right)^{2} d\alpha \tag{14.2.5}$$

where

$$W_B(f) = \frac{1}{M} \left( \frac{\sin \pi f M}{\sin \pi f} \right)^{2} \tag{14.2.6}$$

is the frequency characteristic of the Bartlett window



 |m|
1− , |m| ≤ M − 1
wB (n) = M (14.2.7)

0, en otro caso

A partir de (14.2.5) observe que el espectro real ahora se ha convolucionado con la característica en frecuencia
WB ( f ) de la ventana de Bartlett. El efecto de reducir la longitud de los datos de N puntos a M = N/K da lugar
a una ventana cuya anchura espectral se ha incrementado en un factor de K. En consecuencia, la resolución en
frecuencia se ha reducido en un factor K.
A cambio de esta reducción en la resolución, hemos reducido la varianza. La varianza del estimado de
Bartlett es
\mathrm{var}[P_{xx}^{B}(f)] = \frac{1}{K^{2}} \sum_{i=0}^{K-1} \mathrm{var}[P_{xx}^{(i)}(f)] = \frac{1}{K}\, \mathrm{var}[P_{xx}^{(i)}(f)] \qquad (14.2.8)
Si utilizamos (14.1.38) en (14.2.8), obtenemos

\mathrm{var}[P_{xx}^{B}(f)] = \frac{1}{K}\, \Gamma_{xx}^{2}(f) \left[ 1 + \left( \frac{\mathrm{sen}\, 2\pi f M}{M\, \mathrm{sen}\, 2\pi f} \right)^{2} \right] \qquad (14.2.9)

Por tanto, la varianza del estimado del espectro de potencia de Bartlett se ha reducido por el factor K.
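Como comprobación orientativa de este resultado, el siguiente esbozo (hipotético, con NumPy; no forma parte del texto original) compara numéricamente la varianza del periodograma de un único segmento con la del estimado de Bartlett con K = 16 segmentos, para muchas realizaciones de ruido blanco; el cociente de varianzas debería aproximarse a K.

import numpy as np

rng = np.random.default_rng(1)
N, M = 4096, 256                    # K = N/M = 16 segmentos
K = N // M
k = 10                              # índice de frecuencia de prueba (f = k/M)
per_vals, bart_vals = [], []
for _ in range(500):
    x = rng.standard_normal(N)
    # Periodograma de un único segmento de longitud M (sin promediar)
    per_vals.append(np.abs(np.fft.fft(x[:M])[k]) ** 2 / M)
    # Estimado de Bartlett: promedio de los K periodogramas
    segs = x.reshape(K, M)
    bart_vals.append(np.mean(np.abs(np.fft.fft(segs, axis=1)[:, k]) ** 2 / M))
print(np.var(per_vals) / np.var(bart_vals))   # aproximadamente K = 16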

14.2.2 Método de Welch: promediado de periodogramas modificados
Welch (1967) realizó dos modificaciones básicas al método de Bartlett. En primer lugar, permitió que los
segmentos de datos se solaparan. Por tanto, los segmentos de datos pueden representarse como

x_i(n) = x(n + iD), \qquad n = 0, 1, \ldots, M-1, \quad i = 0, 1, \ldots, L-1 \qquad (14.2.10)

donde iD es el punto de partida de la secuencia i. Observe que si D = M, los segmentos no se solapan y el número
L de segmentos de datos es idéntico al número K en el método de Bartlett. Sin embargo, si D = M/2, existe un
50 % de solapamiento entre segmentos de datos sucesivos y se obtienen L = 2K segmentos. Alternativamente,
podemos formar K segmentos de datos de longitud 2M cada uno.
La segunda modificación hecha por Welch al método de Bartlett consiste en enventanar los segmentos de
datos antes de calcular el periodograma. El resultado es un periodograma “modificado”

\tilde{P}_{xx}^{(i)}(f) = \frac{1}{MU} \left| \sum_{n=0}^{M-1} x_i(n)\, w(n)\, e^{-j2\pi f n} \right|^{2}, \qquad i = 0, 1, \ldots, L-1 \qquad (14.2.11)

donde U es un factor de normalización para la potencia de la función de ventana y se selecciona como

U = \frac{1}{M} \sum_{n=0}^{M-1} w^{2}(n) \qquad (14.2.12)

El estimado del espectro de potencia de Welch es la media de estos periodogramas modificados, es decir,

P_{xx}^{W}(f) = \frac{1}{L} \sum_{i=0}^{L-1} \tilde{P}_{xx}^{(i)}(f) \qquad (14.2.13)
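Como ilustración, el siguiente esbozo en Python (NumPy; nombres hipotéticos que no proceden del texto) implementa de forma directa las Ecuaciones (14.2.10)–(14.2.13); en la práctica también puede emplearse la función welch del paquete scipy.signal, que realiza un procedimiento equivalente.

import numpy as np

def welch_psd(x, M, D, window):
    # Estimado de Welch (14.2.10)-(14.2.13): promedio de periodogramas
    # "modificados" de segmentos de longitud M desplazados D muestras.
    x = np.asarray(x)
    w = np.asarray(window)                    # ventana w(n) de longitud M
    U = np.mean(w ** 2)                       # factor de normalización (14.2.12)
    L = (len(x) - M) // D + 1                 # número de segmentos
    P = np.zeros(M)
    for i in range(L):
        xi = x[i * D:i * D + M] * w           # segmento enventanado x_i(n) w(n)
        P += np.abs(np.fft.fft(xi)) ** 2 / (M * U)   # periodograma modificado (14.2.11)
    return P / L                              # promedio (14.2.13)

# Uso orientativo: ventana triangular (de Bartlett) y solapamiento del 50 %.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
Pxx_W = welch_psd(x, M=256, D=128, window=np.bartlett(256))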


El valor medio del estimado de Welch es

E[P_{xx}^{W}(f)] = \frac{1}{L} \sum_{i=0}^{L-1} E[\tilde{P}_{xx}^{(i)}(f)] = E[\tilde{P}_{xx}^{(i)}(f)] \qquad (14.2.14)

Pero el valor esperado del periodograma modificado es

E[\tilde{P}_{xx}^{(i)}(f)] = \frac{1}{MU} \sum_{n=0}^{M-1} \sum_{m=0}^{M-1} w(n)\, w(m)\, E[x_i(n)\, x_i^{*}(m)]\, e^{-j2\pi f (n-m)}
                           = \frac{1}{MU} \sum_{n=0}^{M-1} \sum_{m=0}^{M-1} w(n)\, w(m)\, \gamma_{xx}(n-m)\, e^{-j2\pi f (n-m)} \qquad (14.2.15)

Puesto que

\gamma_{xx}(n) = \int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\, e^{j2\pi \alpha n}\, d\alpha \qquad (14.2.16)

sustituyendo la expresión de γxx(n) dada en (14.2.16) en (14.2.15), obtenemos

E[\tilde{P}_{xx}^{(i)}(f)] = \int_{-1/2}^{1/2} \Gamma_{xx}(\alpha) \left[ \frac{1}{MU} \sum_{n=0}^{M-1} \sum_{m=0}^{M-1} w(n)\, w(m)\, e^{-j2\pi (n-m)(f-\alpha)} \right] d\alpha
                           = \int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\, W(f-\alpha)\, d\alpha \qquad (14.2.17)

donde, por definición,

W(f) = \frac{1}{MU} \left| \sum_{n=0}^{M-1} w(n)\, e^{-j2\pi f n} \right|^{2} \qquad (14.2.18)

El factor de normalización U garantiza que

\int_{-1/2}^{1/2} W(f)\, df = 1 \qquad (14.2.19)

La varianza del estimado de Welch es

\mathrm{var}[P_{xx}^{W}(f)] = \frac{1}{L^{2}} \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} E[\tilde{P}_{xx}^{(i)}(f)\, \tilde{P}_{xx}^{(j)}(f)] - \{ E[P_{xx}^{W}(f)] \}^{2} \qquad (14.2.20)

En el caso de que no se produzca ningún solapamiento entre segmentos de datos sucesivos (L = K), Welch demostró que

\mathrm{var}[P_{xx}^{W}(f)] = \frac{1}{L}\, \mathrm{var}[\tilde{P}_{xx}^{(i)}(f)] \approx \frac{1}{L}\, \Gamma_{xx}^{2}(f) \qquad (14.2.21)
En el caso de un solapamiento del 50 % entre segmentos de datos sucesivos (L = 2K), la varianza del estimado del espectro de potencia de Welch con la ventana de Bartlett (triangular), también obtenida en los estudios de Welch, es

\mathrm{var}[P_{xx}^{W}(f)] \approx \frac{9}{8L}\, \Gamma_{xx}^{2}(f) \qquad (14.2.22)


Aunque en los cálculos de la varianza sólo hemos considerado la ventana triangular, pueden emplearse otras funciones de ventana, que en general darán lugar a una varianza diferente. Además, es posible hacer que el solapamiento entre segmentos de datos sea mayor o menor que el 50 % considerado en esta sección, con el fin de mejorar las características del estimado que resulten de interés.

14.2.3 Método de Blackman y Tukey: suavizado del periodograma
Blackman y Tukey (1958) propusieron y analizaron el método en el que la secuencia de autocorrelación de las
muestras se hace pasar a través de una ventana en primer lugar y a continuación se le aplica la transformada
de Fourier para proporcionar el estimado del espectro de potencia. El razonamiento para aplicar una función
de ventana a la secuencia de autocorrelación estimada rxx (m) es que, para retardos largos, los estimados son
menos fiables porque se emplea un número menor (N − m) de puntos de datos en los mismos. Para valores de
m próximos a N, la varianza de estos estimados es muy alta y, por tanto, estos estimados deben tener un peso
menor en la formación del espectro de potencia estimado. Por tanto, el estimado de Blackman–Tukey es
P_{xx}^{BT}(f) = \sum_{m=-(M-1)}^{M-1} r_{xx}(m)\, w(m)\, e^{-j2\pi f m} \qquad (14.2.23)

donde la función de ventana w(m) tiene una longitud de 2M − 1 y es cero para |m| ≥ M. Con esta definición de w(m), los límites del sumatorio de la Ecuación (14.2.23) pueden ampliarse a (−∞, ∞). Por tanto, la expresión equivalente en el dominio de la frecuencia de (14.2.23) es la integral de convolución

P_{xx}^{BT}(f) = \int_{-1/2}^{1/2} P_{xx}(\alpha)\, W(f-\alpha)\, d\alpha \qquad (14.2.24)

donde Pxx(f) es el periodograma. Está claro a partir de (14.2.24) que el efecto de la ventana sobre la autocorrelación es suavizar el estimado del periodograma, de modo que disminuya la varianza del estimado a costa de reducir la resolución.
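El siguiente esbozo en Python (NumPy; la función y sus parámetros son hipotéticos, no proceden del texto) ilustra el cálculo del estimado de Blackman–Tukey de (14.2.23) para una señal real: se estima la autocorrelación, se le aplica una ventana triangular de retardos de longitud 2M − 1 y se evalúa la transformada de Fourier en una rejilla de frecuencias.

import numpy as np

def blackman_tukey_psd(x, M, nfft=1024):
    # Estimado de Blackman-Tukey (14.2.23) para una señal real x(n):
    # autocorrelación estimada r_xx(m), enventanada para |m| <= M-1,
    # seguida de la transformada de Fourier.
    x = np.asarray(x, dtype=float)
    N = len(x)
    # r_xx(m) = (1/N) sum_n x(n+m) x(n),  m = 0, ..., M-1 (estimado sesgado)
    r = np.array([np.dot(x[m:], x[:N - m]) / N for m in range(M)])
    w = np.bartlett(2 * M - 1)[M - 1:]        # mitad derecha de la ventana triangular
    rw = r * w                                # r_xx(m) w(m) para m >= 0
    f = np.arange(nfft) / nfft                # rejilla de frecuencias en [0, 1)
    # Suma sobre m = -(M-1), ..., M-1 aprovechando la simetría r_xx(-m) = r_xx(m)
    P = rw[0] + 2 * sum(rw[m] * np.cos(2 * np.pi * f * m) for m in range(1, M))
    return f, P

# Uso orientativo sobre ruido blanco de varianza unidad
rng = np.random.default_rng(0)
f, Pxx_BT = blackman_tukey_psd(rng.standard_normal(2048), M=64)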
La secuencia de ventana w(n) debe ser simétrica (par) alrededor de m = 0, para garantizar que el estimado
del espectro de potencia es real. Además, es deseable seleccionar el espectro de la ventana para que sea no
negativo, es decir,
W ( f ) ≥ 0, | f | ≤ 1/2 (14.2.25)
Esta condición asegura que P_{xx}^{BT}(f) ≥ 0 para |f| ≤ 1/2, que es una propiedad deseable para cualquier estimado
del espectro de potencia. Sin embargo, debemos indicar que algunas de las funciones de ventana que hemos
presentado no satisfacen esta condición. Por ejemplo, a pesar de sus bajos niveles en el lóbulo secundario, las
ventanas de Hamming y Hann (o Hanning) no satisfacen la propiedad dada en (14.2.25) y, por tanto, pueden
dar lugar a estimados del espectro negativos en algunas partes del rango de frecuencia.
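Esta propiedad puede comprobarse numéricamente. El siguiente esbozo (hipotético, con NumPy; la función lag_window_spectrum no procede del texto) evalúa la transformada de Fourier de una ventana de retardos simétrica de longitud 2M − 1 y examina su valor mínimo: para la ventana triangular (de Bartlett) resulta no negativo (salvo error numérico), mientras que para la ventana de Hann aparecen valores negativos, en consonancia con lo indicado arriba.

import numpy as np

def lag_window_spectrum(w, nfft=4096):
    # Transformada de Fourier (real) de una ventana de retardos simétrica w
    # de longitud impar 2M-1, referida a su muestra central m = 0.
    k = np.arange(nfft)
    desfase = np.exp(2j * np.pi * k / nfft * (len(w) - 1) / 2)   # recentrado en m = 0
    return np.real(np.fft.fft(w, nfft) * desfase)

M = 33
W_triangular = lag_window_spectrum(np.bartlett(2 * M - 1))
W_hann = lag_window_spectrum(np.hanning(2 * M - 1))
print(W_triangular.min())   # >= 0 (salvo error numérico): satisface (14.2.25)
print(W_hann.min())         # < 0: la ventana de Hann no satisface (14.2.25)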
El valor esperado del estimado del espectro de potencia según el método de Blackman–Tukey es

E[P_{xx}^{BT}(f)] = \int_{-1/2}^{1/2} E[P_{xx}(\alpha)]\, W(f-\alpha)\, d\alpha \qquad (14.2.26)

donde, a partir de (14.1.37), tenemos

E[P_{xx}(\alpha)] = \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W_B(\alpha-\theta)\, d\theta \qquad (14.2.27)

y W_B(f) es la transformada de Fourier de la ventana de Bartlett. Sustituyendo (14.2.27) en (14.2.26) se obtiene la doble integral de convolución

E[P_{xx}^{BT}(f)] = \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W_B(\alpha-\theta)\, W(f-\alpha)\, d\alpha\, d\theta \qquad (14.2.28)


De manera similar, trabajando en el dominio del tiempo, el valor esperado del estimado del espectro de potencia de Blackman–Tukey es

E[P_{xx}^{BT}(f)] = \sum_{m=-(M-1)}^{M-1} E[r_{xx}(m)]\, w(m)\, e^{-j2\pi f m}
                  = \sum_{m=-(M-1)}^{M-1} \gamma_{xx}(m)\, w_B(m)\, w(m)\, e^{-j2\pi f m} \qquad (14.2.29)

donde la ventana de Bartlett es

w_B(m) = \begin{cases} 1 - \dfrac{|m|}{N}, & |m| < N \\ 0, & \text{en otro caso} \end{cases} \qquad (14.2.30)

Evidentemente, deberíamos seleccionar la longitud de la ventana w(n) de forma que M ≪ N, es decir, w(n) debe ser más estrecha que w_B(m) para proporcionar un suavizado adicional del periodograma. Bajo esta condición, (14.2.28) se transforma en

E[P_{xx}^{BT}(f)] \approx \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W(f-\theta)\, d\theta \qquad (14.2.31)

ya que

\int_{-1/2}^{1/2} W_B(\alpha-\theta)\, W(f-\alpha)\, d\alpha = \int_{-1/2}^{1/2} W_B(\alpha)\, W(f-\theta-\alpha)\, d\alpha \approx W(f-\theta) \qquad (14.2.32)

La varianza del estimado del espectro de potencia de Blackman–Tukey es

\mathrm{var}[P_{xx}^{BT}(f)] = E\{[P_{xx}^{BT}(f)]^{2}\} - \{E[P_{xx}^{BT}(f)]\}^{2} \qquad (14.2.33)

donde la media puede aproximarse como en (14.2.31). El momento de segundo orden en la Ecuación (14.2.33) es

E\{[P_{xx}^{BT}(f)]^{2}\} = \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} E[P_{xx}(\alpha)\, P_{xx}(\theta)]\, W(f-\alpha)\, W(f-\theta)\, d\alpha\, d\theta \qquad (14.2.34)

Suponiendo que el proceso aleatorio es gaussiano (véase el Problema 14.5), tenemos que

E[P_{xx}(\alpha)\, P_{xx}(\theta)] = \Gamma_{xx}(\alpha)\, \Gamma_{xx}(\theta) \left[ 1 + \left( \frac{\mathrm{sen}\, \pi(\theta+\alpha)N}{N\, \mathrm{sen}\, \pi(\theta+\alpha)} \right)^{2} + \left( \frac{\mathrm{sen}\, \pi(\theta-\alpha)N}{N\, \mathrm{sen}\, \pi(\theta-\alpha)} \right)^{2} \right] \qquad (14.2.35)

Sustituyendo (14.2.35) en (14.2.34) tenemos

E\{[P_{xx}^{BT}(f)]^{2}\} = \left[ \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W(f-\theta)\, d\theta \right]^{2}
  + \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\, \Gamma_{xx}(\theta)\, W(f-\alpha)\, W(f-\theta)
    \left[ \left( \frac{\mathrm{sen}\, \pi(\theta+\alpha)N}{N\, \mathrm{sen}\, \pi(\theta+\alpha)} \right)^{2} + \left( \frac{\mathrm{sen}\, \pi(\theta-\alpha)N}{N\, \mathrm{sen}\, \pi(\theta-\alpha)} \right)^{2} \right] d\alpha\, d\theta \qquad (14.2.36)


El primer término de (14.2.36) es simplemente la media al cuadrado de P_{xx}^{BT}(f), que se sustrae de acuerdo con (14.2.33). Queda entonces el segundo término de (14.2.36), que es la varianza. Para el caso en que N ≫ M, las funciones \mathrm{sen}\, \pi(\theta+\alpha)N / [N\, \mathrm{sen}\, \pi(\theta+\alpha)] y \mathrm{sen}\, \pi(\theta-\alpha)N / [N\, \mathrm{sen}\, \pi(\theta-\alpha)] son relativamente estrechas comparadas con W(f) en las vecindades de θ = −α y θ = α, respectivamente. Por tanto,

\int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W(f-\theta) \left[ \left( \frac{\mathrm{sen}\, \pi(\theta+\alpha)N}{N\, \mathrm{sen}\, \pi(\theta+\alpha)} \right)^{2} + \left( \frac{\mathrm{sen}\, \pi(\theta-\alpha)N}{N\, \mathrm{sen}\, \pi(\theta-\alpha)} \right)^{2} \right] d\theta \approx \frac{\Gamma_{xx}(-\alpha)\, W(f+\alpha) + \Gamma_{xx}(\alpha)\, W(f-\alpha)}{N} \qquad (14.2.37)
Con esta aproximación, la varianza de P_{xx}^{BT}(f) es

\mathrm{var}[P_{xx}^{BT}(f)] \approx \frac{1}{N} \int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\, W(f-\alpha)\, [\Gamma_{xx}(-\alpha)\, W(f+\alpha) + \Gamma_{xx}(\alpha)\, W(f-\alpha)]\, d\alpha
                             \approx \frac{1}{N} \int_{-1/2}^{1/2} \Gamma_{xx}^{2}(\alpha)\, W^{2}(f-\alpha)\, d\alpha \qquad (14.2.38)
donde en el último paso hemos hecho la aproximación

\int_{-1/2}^{1/2} \Gamma_{xx}(\alpha)\, \Gamma_{xx}(-\alpha)\, W(f-\alpha)\, W(f+\alpha)\, d\alpha \approx 0 \qquad (14.2.39)

Realizamos una aproximación más en (14.2.38). Cuando W(f) es estrecha comparada con el espectro de potencia real Γxx(f), la Ecuación (14.2.38) se aproxima aún más como

\mathrm{var}[P_{xx}^{BT}(f)] \approx \Gamma_{xx}^{2}(f) \left[ \frac{1}{N} \int_{-1/2}^{1/2} W^{2}(\theta)\, d\theta \right] \approx \Gamma_{xx}^{2}(f) \left[ \frac{1}{N} \sum_{m=-(M-1)}^{M-1} w^{2}(m) \right] \qquad (14.2.40)

14.2.4 Prestaciones de los estimadores no paramétricos del espectro de potencia
En esta sección comparamos la calidad de los estimados del espectro de potencia de Bartlett, Welch, y Blackman y Tukey. Como medida de la calidad, utilizamos el cociente entre la media al cuadrado del estimado del espectro de potencia y su varianza, es decir,

Q_A = \frac{\{E[P_{xx}^{A}(f)]\}^{2}}{\mathrm{var}[P_{xx}^{A}(f)]} \qquad (14.2.41)
donde A = B, W , o BT son los tres estimados del espectro de potencia. El recíproco de esta magnitud, denominado
variabilidad, también se puede emplear como medida de rendimiento.
Como referencia, el periodograma tiene la media y la varianza siguientes:

E[P_{xx}(f)] = \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W_B(f-\theta)\, d\theta \qquad (14.2.42)

\mathrm{var}[P_{xx}(f)] = \Gamma_{xx}^{2}(f) \left[ 1 + \left( \frac{\mathrm{sen}\, 2\pi f N}{N\, \mathrm{sen}\, 2\pi f} \right)^{2} \right] \qquad (14.2.43)


donde

W_B(f) = \frac{1}{N} \left( \frac{\mathrm{sen}\, \pi f N}{\mathrm{sen}\, \pi f} \right)^{2} \qquad (14.2.44)
Para N grande (es decir, N → ∞),

E[P_{xx}(f)] \to \Gamma_{xx}(f) \int_{-1/2}^{1/2} W_B(\theta)\, d\theta = \Gamma_{xx}(f)\, w_B(0) = \Gamma_{xx}(f)
\mathrm{var}[P_{xx}(f)] \to \Gamma_{xx}^{2}(f) \qquad (14.2.45)

Por tanto, como se ha indicado previamente, el periodograma es un estimado asintóticamente insesgado del espectro de potencia, pero no es consistente, porque su varianza no tiende a cero cuando N tiende a infinito. Asintóticamente, el periodograma se caracteriza por el factor de calidad

Q_P = \frac{\Gamma_{xx}^{2}(f)}{\Gamma_{xx}^{2}(f)} = 1 \qquad (14.2.46)

El hecho de que QP sea fijo e independiente de la longitud de los datos N es otra indicación de la mala calidad
de este estimado.

Estimado del espectro de potencia de Bartlett. La media y la varianza del estimado del espectro de potencia de Bartlett son

E[P_{xx}^{B}(f)] = \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W_B(f-\theta)\, d\theta \qquad (14.2.47)

\mathrm{var}[P_{xx}^{B}(f)] = \frac{1}{K}\, \Gamma_{xx}^{2}(f) \left[ 1 + \left( \frac{\mathrm{sen}\, 2\pi f M}{M\, \mathrm{sen}\, 2\pi f} \right)^{2} \right] \qquad (14.2.48)

y

W_B(f) = \frac{1}{M} \left( \frac{\mathrm{sen}\, \pi f M}{\mathrm{sen}\, \pi f} \right)^{2} \qquad (14.2.49)
Cuando N → ∞ y M → ∞, manteniéndose K = N/M fijo, tenemos que

E[P_{xx}^{B}(f)] \to \Gamma_{xx}(f) \int_{-1/2}^{1/2} W_B(f)\, df = \Gamma_{xx}(f)\, w_B(0) = \Gamma_{xx}(f)
\mathrm{var}[P_{xx}^{B}(f)] \to \frac{1}{K}\, \Gamma_{xx}^{2}(f) \qquad (14.2.50)

Observe que el estimado del espectro de potencia de Bartlett es asintóticamente insesgado y, si se permite que K aumente al aumentar N, el estimado también es consistente. Por tanto, asintóticamente, este estimado se caracteriza por el factor de calidad

Q_B = K = \frac{N}{M} \qquad (14.2.51)
La resolución en frecuencia del estimado de Bartlett, medida tomando la anchura a 3 dB del lóbulo principal de la ventana rectangular, es

\Delta f = \frac{0.9}{M} \qquad (14.2.52)


Por tanto, M = 0.9/∆f y el factor de calidad se transforma en

Q_B = \frac{N}{0.9/\Delta f} = 1.1\, N \Delta f \qquad (14.2.53)

Estimado del espectro de potencia de Welch. La media y la varianza del estimado del espectro de potencia de Welch son

E[P_{xx}^{W}(f)] = \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W(f-\theta)\, d\theta \qquad (14.2.54)

donde

W(f) = \frac{1}{MU} \left| \sum_{n=0}^{M-1} w(n)\, e^{-j2\pi f n} \right|^{2} \qquad (14.2.55)
y

\mathrm{var}[P_{xx}^{W}(f)] = \begin{cases} \dfrac{1}{L}\, \Gamma_{xx}^{2}(f), & \text{sin solapamiento} \\[2mm] \dfrac{9}{8L}\, \Gamma_{xx}^{2}(f), & \text{para un 50 % de solapamiento y ventana triangular} \end{cases} \qquad (14.2.56)
Cuando N → ∞ y M → ∞, la media converge a

E[P_{xx}^{W}(f)] \to \Gamma_{xx}(f) \qquad (14.2.57)

y la varianza converge hacia cero, por lo que el estimado es consistente.
Bajo las dos condiciones dadas por (14.2.56), el factor de calidad es

Q_W = \begin{cases} L = \dfrac{N}{M}, & \text{sin solapamiento} \\[2mm] \dfrac{8L}{9} = \dfrac{16N}{9M}, & \text{para un 50 % de solapamiento y ventana triangular} \end{cases} \qquad (14.2.58)

Por otro lado, la anchura espectral de la ventana triangular en el punto a 3 dB es

\Delta f = \frac{1.28}{M} \qquad (14.2.59)
En consecuencia, el factor de calidad expresado en términos de ∆f y N es

Q_W = \begin{cases} 0.78\, N \Delta f, & \text{sin solapamiento} \\ 1.39\, N \Delta f, & \text{para un 50 % de solapamiento y ventana triangular} \end{cases} \qquad (14.2.60)

Estimado del espectro de potencia de Blackman–Tukey. La media y la varianza de este estimado se aproximan como sigue:

E[P_{xx}^{BT}(f)] \approx \int_{-1/2}^{1/2} \Gamma_{xx}(\theta)\, W(f-\theta)\, d\theta
\mathrm{var}[P_{xx}^{BT}(f)] \approx \Gamma_{xx}^{2}(f) \left[ \frac{1}{N} \sum_{m=-(M-1)}^{M-1} w^{2}(m) \right] \qquad (14.2.61)
2

i i
i i

Capítulo 14 Estimación del espectro de potencia 875

Estimado                              Factor de calidad
Bartlett                              1.11 N∆f
Welch (50 % de solapamiento)          1.39 N∆f
Blackman–Tukey                        2.34 N∆f

Tabla 14.1. Calidad de los estimados del espectro de potencia.

donde w(m) es la secuencia de ventana utilizada para enventanar la secuencia de autocorrelación estimada. Para las ventanas rectangular y de Bartlett (triangular) tenemos

\frac{1}{N} \sum_{n=-(M-1)}^{M-1} w^{2}(n) = \begin{cases} 2M/N, & \text{ventana rectangular} \\ 2M/3N, & \text{ventana triangular} \end{cases} \qquad (14.2.62)

Es evidente a partir de (14.2.61) que el valor medio del estimado es asintóticamente insesgado. Su factor de calidad para la ventana triangular es

Q_{BT} = 1.5\, \frac{N}{M} \qquad (14.2.63)
Como la longitud de la ventana es 2M − 1, la resolución en frecuencia medida en los puntos a 3 dB es

\Delta f = \frac{1.28}{2M} = \frac{0.64}{M} \qquad (14.2.64)
y, por tanto,

Q_{BT} = \frac{1.5}{0.64}\, N \Delta f = 2.34\, N \Delta f \qquad (14.2.65)
Estos resultados se resumen en la Tabla 14.1. Es evidente a partir de los resultados que hemos obtenido que
los estimados del espectro de potencia de Welch y Blackman–Tukey son algo mejores que el estimado de Bartlett.
Sin embargo, las diferencias en lo que se refiere a las prestaciones son relativamente poco significativas. Lo
importante es que el factor de calidad aumenta cuando aumenta la longitud N de los datos. Este comportamiento
característico no es compartido por el estimado del periodograma. Además, el factor de calidad depende del
producto de la longitud de los datos N por la resolución en frecuencia ∆f. Para obtener un determinado
nivel de calidad, puede disminuirse ∆ f (aumentar la resolución de frecuencia) aumentando la longitud N de los
datos, y viceversa.
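Como ejemplo numérico orientativo (los valores concretos son ilustrativos, no proceden del texto), el siguiente fragmento en Python evalúa los factores de calidad de la Tabla 14.1 para un registro de N = 1000 muestras y una resolución deseada de ∆f = 0.01.

# Factores de calidad de la Tabla 14.1 para N = 1000 y ∆f = 0.01
N, df = 1000, 0.01
print("Bartlett              :", 1.11 * N * df)   # 11.1
print("Welch (50 % solap.)   :", 1.39 * N * df)   # 13.9
print("Blackman-Tukey        :", 2.34 * N * df)   # 23.4

Con estos valores se aprecia que, para la misma longitud de datos y la misma resolución, el estimado de Blackman–Tukey presenta el mayor factor de calidad, en consonancia con la discusión anterior.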

14.2.5 Requisitos de cálculo de los estimados no paramétricos del espectro de potencia
El otro aspecto importante de los estimados no paramétricos del espectro de potencia son los requisitos de
cálculo. Para esta comparación, suponemos que los estimados se basan en una cantidad fija de datos N y una
resolución especificada ∆ f . Se supone que se emplea el algoritmo FFT de raíz 2 para todos los cálculos. Sólo
contabilizaremos el número de multiplicaciones complejas necesarias para calcular el estimado del espectro de
potencia.

Estimado del espectro de potencia de Bartlett.

Longitud de la FFT: M = 0.9/\Delta f

Número de transformadas FFT: \frac{N}{M} = 1.11\, N \Delta f

Número de operaciones: \frac{N}{M} \left( \frac{M}{2} \log_2 M \right) = \frac{N}{2} \log_2 \frac{0.9}{\Delta f}
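A título ilustrativo (valores hipotéticos, no procedentes del texto), el siguiente fragmento en Python evalúa estas cantidades para N = 1000 muestras y ∆f = 0.01; en la práctica, M se redondearía a una potencia de 2 para poder emplear el algoritmo FFT de raíz 2.

import math

N, df = 1000, 0.01
M = 0.9 / df                        # longitud de cada FFT (aquí, 90)
num_fft = N / M                     # número de transformadas FFT ≈ 1.11 N ∆f
num_ops = (N / 2) * math.log2(M)    # multiplicaciones complejas ≈ (N/2) log2(0.9/∆f)
print(M, num_fft, num_ops)          # 90.0, ≈11.1, ≈3246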
