IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 7, JULY 2003
Combined RLS-LMS Initialization for Per Tone
Equalizers in DMT-Receivers
Geert Ysebaert, Koen Vanbleu, Gert Cuypers, Marc Moonen, and Thierry Pollet
Abstract—In discrete multitone receivers, the classical equalizer structure consists of a (real) time domain equalizer (TEQ)
combined with complex one-tap frequency domain equalizers. An
alternative receiver is based on a per tone equalization (PTEQ),
which optimizes the signal-to-noise ratio (SNR) on each tone separately and, hence, the total bitrate. In this paper, a new initialization scheme for the PTEQ is introduced, based on a combination of
least mean squares (LMS) and recursive least squares (RLS) adaptive filtering. It will be shown that the proposed method has only
slightly slower convergence than full square-root RLS (SR-RLS)
while complexity as well as memory cost are reduced considerably.
Hence, in terms of complexity and convergence speed, the proposed
algorithm is in between LMS and RLS.
Index Terms—Adaptive filtering, ADSL, discrete multitone,
equalization, LMS, RLS.
I. INTRODUCTION
ASYMMETRIC digital subscriber lines (ADSLs) provide
high bitrates over the existing telephone network. ADSLs
employ a transmission scheme based on discrete multitone
(DMT) [1]. DMT divides the available bandwidth into parallel
subchannels or tones, which are then modulated separately.
Mathematically, this operation is performed by means of an
inverse fast Fourier transform (IFFT). After IFFT modulation,
a guard time sequence of samples—called a cyclic prefix—is
inserted between successive symbols to ensure that samples
from one symbol do not interfere with the samples from
another symbol. At the receiver, the cyclic prefix is removed
and demodulation is performed by means of an FFT.
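The modulation chain described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the transmitter of the ADSL standard: the names `dmt_modulate` and `dmt_demodulate`, the toy sizes `N = 16` and `nu = 4`, and the three-tap channel `h` are all hypothetical, and the Hermitian symmetry that a real modem imposes to obtain a real time signal is ignored.

```python
import numpy as np

def dmt_modulate(X, nu):
    """IFFT-modulate one frequency-domain symbol and prepend nu prefix samples."""
    x = np.fft.ifft(X)
    return np.concatenate([x[-nu:], x])   # copy the last nu samples to the front

def dmt_demodulate(y, nu):
    """Drop the cyclic prefix and return to the frequency domain."""
    return np.fft.fft(y[nu:])

rng = np.random.default_rng(0)
N, nu = 16, 4                              # hypothetical toy sizes
X = rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)   # 4-QAM on every tone
h = np.array([1.0, 0.5, 0.2])              # channel shorter than the prefix

tx = dmt_modulate(X, nu)
rx = np.convolve(tx, h)[:N + nu]           # linear convolution with the channel
Y = dmt_demodulate(rx, nu)

# With a sufficiently long prefix the channel acts as one complex gain per tone,
# so a one-tap division per tone recovers the transmitted symbol.
H = np.fft.fft(h, N)
X_hat = Y / H
```

Because the channel here is shorter than the prefix, the linear convolution looks circular to the FFT, which is why a single complex division per tone suffices in this toy case.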
It is known that the mitigation of intersymbol interference
(ISI) with a cyclic prefix is only effective if the length of the impulse response is shorter than the cyclic prefix. A long prefix,
Manuscript received January 4, 2002; revised January 21, 2003. G. Ysebaert
and G. Cuypers are with the I. W. T., and K. Vanbleu is with the F. W. O. Vlaanderen. This work was carried out at the ESAT Laboratory of the Katholieke Universiteit Leuven and was supported by the Belgian State, Prime Minister’s Office—Federal Office for Scientific, Technical, and Cultural Affairs—Interuniversity Poles of Attraction Program (2002-2007)—IUAP P5/22 (“Dynamical
Systems and Control: Computation, Identification and Modeling”) and P5/11
(“Mobile multimedia communication systems and networks”), the Concerted
Research Action GOA-MEFISTO-666 (Mathematical Engineering for Information and Communication Systems Technology) of the Flemish Government, Research Project FWO G.0295.97 (“Design and implementation of adaptive digital
signal processing algorithms for broadband applications”), and by Alcatel-Bell.
The associate editor coordinating the review of this paper and approving it for
publication was Dr. Zhi-Quan (Tom) Luo.
G. Ysebaert, K. Vanbleu, G. Cuypers, and M. Moonen are with the
Katholieke Universiteit Leuven, ESAT/SCD-SISTA, B-3001 Leuven, Belgium (e-mail: ysebaert@esat.kuleuven.ac.be; vanbleu@esat.kuleuven.ac.be;
cuypers@esat.kuleuven.ac.be; moonen@esat.kuleuven.ac.be).
T. Pollet is with Access to Networks, Research, and Innovation, ALCATEL,
Antwerpen, Belgium (e-mail: Thierry.Pollet@alcatel.be).
Digital Object Identifier 10.1109/TSP.2003.812727
however, introduces a large overhead and, hence, results in a
small useful bitrate. A well-known solution is to add a $T$-tap
time domain equalizer (TEQ) to shorten the effective channel
impulse response. In the literature, many algorithms exist to initialize this TEQ [2]–[6]. In [4], the TEQ is calculated based on a
minimum mean-square-error (MMSE) criterion, whereas in [5],
the TEQ is obtained by maximizing the shortening signal-to-noise ratio (SSNR). Both formulations define the TEQ initialization problem without a direct relation to the resulting bitrate.
In general, these methods suffer from sensitivity to the so-called
synchronization delay and unpredictable behavior. Algorithms
exist for calculating the TEQ, which optimize the SNR and,
hence, obtain optimal bitrates, but they are based on difficult
nonlinear optimization procedures [3]. A general disadvantage,
which is independent of the used criterion, is that the TEQ equalizes all tones in a combined way, which limits system performance.
In [7] and [8], Van Acker et al. proposed a new equalizer
scheme based on a “per tone” equalization (PTEQ). This structure is able to optimize the SNR for each tone separately and
hence achieves substantial bitrate improvements.1 Moreover,
when radio frequency interference (RFI) is present, the gain
in performance compared with a TEQ-based receiver is even
higher [9]. In summary, the PTEQ always performs at least as well as the traditional TEQ-based receiver. This motivates the search for cheap procedures to initialize
the PTEQ coefficients.
This paper addresses the problem of adaptively initializing
the per tone equalizer coefficients in a cheap way. The problem
consists of solving several least squares problems in parallel to
determine $T$ taps per used tone. Due to bad conditioning of
the problem, an initialization based on a straightforward least
mean squares (LMS) adaptive filtering [10] will require many
more training symbols than that foreseen by the ADSL standard.
As an alternative to LMS, a recursive least squares (RLS) filter
adaptation could be implemented to give faster convergence at
the expense of extra computational complexity. In [11], a reasonably cheap RLS-based initialization scheme is presented,
where most of the RLS processing is shared over all used tones
since $T-1$ PTEQ inputs are common for all the tones. Here,
we extend the ideas of [11] by focusing on a mixture of RLS
and LMS to combine the advantages of both schemes, i.e., fast
convergence and low complexity, respectively. We will show
that the obtained adaptive algorithm reduces the computational
complexity approximately by a factor of four compared with the scheme of [11] while still using a reasonably small number of training symbols. In other words, both convergence speed and computational complexity are in between RLS and LMS. Apart from the lower complexity, the proposed algorithm also exhibits a substantially lower memory cost. The memory needed to store all the filter coefficients and RLS-LMS dependent parameters is approximately halved in size compared with full RLS, which makes the proposed scheme tractable for hardware implementation.
1 Strictly speaking, power allocation and bandwidth optimization should also be considered when speaking about achieving the optimal bitrate. However, in this paper, we will only consider the “equalization-problem.”
1053-587X/03$17.00 © 2003 IEEE
YSEBAERT et al.: COMBINED RLS-LMS INITIALIZATION FOR PER TONE EQUALIZERS
In the search for adaptive filters with fast convergence and
low complexity, several schemes have been previously developed to find “intermediate” solutions between LMS and RLS
algorithms. In the scope of acoustic echo cancellation, a set of
algorithms were presented to link the advantages of LMS and
RLS, i.e., the class of affine projection algorithms (APA) [12],
[13]. According to [12], APA algorithms are especially useful
in case of a large number of filter taps and small block lengths.
However, this is not the case for per tone equalization in ADSL,
where the number of taps is typically small. Moreover, PTEQ
initialization requires several APA problems to be solved in parallel, where it is not possible to exploit the common part of the
PTEQ inputs in a cheap way.
In the past, the computational complexity of full RLS
schemes has also been reduced using so-called fast RLS
techniques [10]. These schemes are, in particular, suitable
for filtering problems where incoming signals are filtered
by a tapped delay line. The complexity reductions attained
in these algorithms rely on the signal shift nature of the
filtering problem. However, extensions to “linear combiner”
problems—as is the case with the per tone equalizer—are not
possible.
Although the initialization procedure described in this paper
is only treated for per tone equalization, it can be used in general
for problems where many RLS problems have to be solved simultaneously but where inputs can be shared over the different
RLS schemes to reduce the complexity and memory requirements.
The paper is organized as follows. In Section II, the data
model is introduced, whereas an overview of the per tone equalizer structure is given in Section III to make the paper self-contained. Section IV describes the combination of RLS and LMS.
In Section V, the computational complexity is calculated, and
a comparison is made with normalized LMS and RLS. Finally,
simulation results are presented in Section VI followed by conclusions in Section VII.
II. DATA MODEL
The goal of an equalizer is to reconstruct the transmitted signal by removing interference from neighboring symbols. An easy way to take into account ISI as well as intercarrier interference (ICI) is to consider a data model with three consecutive complex valued symbols, namely, $X^{(k-1)}_{1:N}$, $X^{(k)}_{1:N}$, and $X^{(k+1)}_{1:N}$. Here, $X^{(k)}_{1:N}$ is a complex frequency domain symbol transmitted at time $k$, with $N$ the size of the modulating IFFT. The $N \times 1$ vector $X^{(k)}_{1:N}$ includes elements $X^{(k)}_i$ that are chosen from a complex QAM constellation with a size depending on the SNR of the corresponding frequency bin or tone ($i$ is the tone index). The $k$th symbol is the symbol of interest, whereas the $(k-1)$th and $(k+1)$th cause interference on this symbol.
The frequency domain symbols are transformed to the time domain by means of an IFFT, which is mathematically represented by the $N$-point IDFT matrix $\mathcal{I}_N$. Before transmission of a DMT symbol, the last $\nu$ samples of the IFFT output are copied to the front of the symbol to form a cyclic prefix. This operation is performed by the matrix
$$P_\nu = \begin{bmatrix} \begin{array}{cc} 0_{\nu \times (N-\nu)} & I_\nu \end{array} \\ I_N \end{bmatrix} \tag{1}$$
of size $(N+\nu) \times N$. The DMT symbols are transmitted over a channel modeled as a finite impulse response (FIR) filter (without loss of generality). The vector with received time domain samples may now be specified as
(2)
with $N + \nu$ the length of a symbol with prefix added, additive noise indexed by the sample index, and two zero matrices completing the channel matrix. The channel vector denotes the channel impulse response in reversed order, i.e.,
(3)
where the last samples represent the head of the channel impulse response, whereas the tail corresponds to the first samples. Head and tail are calculated in such a way that the zero reference delay maximizes the energy in a window of consecutive channel coefficients. $\delta$ is the synchronization delay and is a design parameter. It represents a delay relative to the zero reference delay.
III. PER TONE EQUALIZATION
The aim of the receiver is to reconstruct the transmitted symbol $X^{(k)}_{1:N}$ from the received samples. Traditional approaches insert a (real) $T$-tap TEQ prior to demodulation to shorten the channel impulse response to the cyclic prefix length plus one. Afterwards, one-tap FEQs remove the remaining magnitude and phase distortion introduced by the overall channel. Hence, the FEQ output for tone $i$ can be written as
(4)
which involves the $i$th row of the DFT matrix and an $N \times T$ Toeplitz matrix of received samples. In [8], a per tone approach is taken where the TEQ operations are transferred to the frequency domain in order to design a PTEQ adjusted to each tone separately. Hence, (4) can be rewritten as
(5)
where the FEQ is incorporated in the PTEQ for tone $i$, and the bar denotes the same vector with its coefficients in reversed order. Apparently, the FFT operation of (4) is turned into a sliding FFT in (5). However, the filtering with the PTEQ can be computed efficiently using only one FFT, $T-1$ so-called “difference terms,” and a modified PTEQ denoted with $\bar{v}_i$ [8].
Van Acker et al. showed that the PTEQ $\bar{v}_i$ for tone $i$ can be found as the solution of an MMSE problem [8]
(6)
with2 $E\{\cdot\}$ the expectation operator. This criterion indicates that the PTEQ of tone $i$ constructs a linear combination of $T-1$ (real) difference terms with the (complex) FFT output for that tone and that the coefficients should be such that the filter output is as close as possible to the transmitted constellation point $X^{(k)}_i$. It is important to notice that the difference terms are common for all the tones, which in particular will considerably reduce the computational complexity of the initialization scheme.
This PTEQ is optimal (compared to the TEQ) in the sense that it optimizes the SNR for each tone separately. Note that the PTEQ is not necessarily real anymore but will have complex values in the general case. Still, it has been shown in [8] that the overall complexity during data transmission is roughly the same for a PTEQ and a TEQ based structure.
2 The bar in $\bar{v}$ indicates that its coefficients are in a different order than $v$, i.e., $\bar{v} = [\, v_{T-1} \; \cdots \; v_0 \,]^T$.
IV. EQUALIZER INITIALIZATION: COMBINED RLS AND LMS
A. Normalized Least-Mean-Square Algorithm
Direct initialization of all the PTEQ coefficients, based on the knowledge of the channel impulse response and the signal and noise power spectral densities, turns out to be computationally too expensive to implement in hardware. Therefore, several techniques have been devised to initialize the equalizer coefficients based on adaptive filtering with training symbols. During data transmission, further equalizer coefficient adaptation can be based on so-called “decision directed operation.”
With the normalized least-mean-square algorithm (NLMS) [10], each equalizer coefficient can be updated according to the following algorithm.
Algorithm: Normalized LMS
Initialize filter coefficients for each tone
For each symbol
  Tone-independent part
    (7)
  Tone-dependent part
  For each used tone
    (8)
    (9)
    (10)
  end
end
Here, the input vector contains $T-1$ difference terms (which are common for all tones) and the FFT output for tone $i$, $\mu$ is the normalized stepsize, a small positive constant prevents overflow when the input energy becomes very small, the per tone loop runs over the set of used tones, $*$ indicates complex conjugation, and the desired symbol is either the decision on the filter output (decision directed mode) or a training symbol (training mode). A signal flow graph (SFG) is shown in Fig. 2, and the functionality of the building blocks is explained in Fig. 1.
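The NLMS update for a single tone can be sketched as follows. This is a textbook-style sketch rather than a transcription of (7)-(10): the function name `nlms_step`, the output convention $w^H u$, the stepsize `mu = 0.5`, the regularization `alpha`, and the white synthetic inputs are all assumptions made for illustration. The input vector is built like a PTEQ input, with $T-1$ real entries standing in for difference terms and one complex entry standing in for the FFT output.

```python
import numpy as np

def nlms_step(w, u, x, mu=0.5, alpha=1e-6):
    """One NLMS iteration for a linear combiner with output w^H u."""
    e = x - np.vdot(w, u)                       # a priori error (vdot conjugates w)
    norm = alpha + np.vdot(u, u).real           # regularized input energy
    w = w + (mu / norm) * u * np.conj(e)        # normalized gradient step
    return w, e

rng = np.random.default_rng(0)
T = 4                                           # hypothetical number of taps
w_true = rng.standard_normal(T) + 1j * rng.standard_normal(T)
w = np.zeros(T, dtype=complex)

for _ in range(500):
    diff_terms = rng.standard_normal(T - 1)     # real, shared by all tones
    fft_out = rng.standard_normal() + 1j * rng.standard_normal()
    u = np.append(diff_terms, fft_out)          # per tone input vector
    x = np.vdot(w_true, u)                      # noise-free training symbol
    w, e = nlms_step(w, u, x)

nlms_error = np.max(np.abs(w - w_true))
```

The inputs in this toy run are white, so convergence is quick. The slow convergence discussed in the text arises when the input correlation matrix has a large eigenvalue spread, which this sketch deliberately does not reproduce.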
It is well known that the convergence speed of NLMS is determined by the eigenvalue spread of the input correlation matrix
(11)
Simulations show that this matrix typically has a large eigenvalue spread, and as a consequence, convergence is slow. Iterative initialization based on LMS-based schemes has low complexity but requires, unfortunately, an excessively large number of training symbols. The low cost can easily be understood from Fig. 2.
Fig. 1. Building blocks for SFGs.
B. Square-Root Recursive Least Squares
To overcome the slow convergence of NLMS, a square-root recursive least squares scheme (SR-RLS)3 [10] can be used. The scheme is based on equalizer coefficient updating similar to (10) but now with a transformed input vector
(12)
which is called the Kalman gain vector for tone $i$. The transformation is constructed based on the updating of a lower triangular matrix derived from the (upper triangular) Cholesky factor of the sample covariance matrix, i.e.,
(13)
with $\lambda$ a forgetting factor and $H$ indicating complex conjugate transpose. The factor $\lambda$ determines a window, with an effective length of $1/(1-\lambda)$, of the most recently received symbols. The following formulas describe the SR-RLS algorithm.
Algorithm: SR-RLS
Initialize filter coefficients and the triangular matrix for each tone
For each symbol
  Tone-independent part
  1) Form the matrix-vector product
    (14)
  Notice that the tone index is omitted since the difference terms are common for all the tones.
  2) For each difference term, determine Givens transformations [10] such that
    (15)
  3) Update the triangular matrix using the previously defined transformations, and apply exponential weighting with $\lambda$
    (16)
    (17)
  Tone-dependent part:
  For each used tone
  1) Form the product
    (18)
  2) Determine the Givens transformation [10] such that
    (19)
  3) Update the per tone part of the triangular matrix, and apply exponential weighting
    (20)
    (21)
  4) Update the filter coefficients
    (22)
  end
end
In step 2, each Givens transformation represents a rotation matrix acting upon the $j$th and the last component of a vector such that the $j$th component is zeroed. This algorithm ensures that a Kalman gain vector, as defined in (12), is obtained in a cheap way. See [10] for further details.
3 The square-root RLS algorithm is sometimes referred to as the inverse QR-RLS algorithm [10].
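The elementary operation in step 2, a rotation acting on the $j$th and the last component so that the $j$th component is zeroed, can be illustrated in isolation. The sketch below is not the SR-RLS recursion itself: it only shows real Givens rotations applied to a plain vector, and the helper names `givens` and `zero_leading_entries` are hypothetical.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] @ [a, b] = [r, 0] for real a, b."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0                   # nothing to rotate
    return a / r, b / r

def zero_leading_entries(v):
    """Rotate v[0], ..., v[n-2] into the last component, one rotation each."""
    v = np.array(v, dtype=float)
    for j in range(len(v) - 1):
        c, s = givens(v[-1], v[j])
        # Rotation in the plane of the last and the j-th component;
        # the right-hand side is evaluated before either entry is overwritten.
        v[-1], v[j] = c * v[-1] + s * v[j], -s * v[-1] + c * v[j]
    return v

rotated = zero_leading_entries([3.0, 4.0, 12.0])
```

Each rotation preserves the Euclidean norm, so after all leading entries are zeroed the last component carries the full norm of the original vector (13 for the example input).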
The convergence behavior for tone $i$ now depends on the eigenvalues of
(23)
(24)
(25)
(26)
where we used the fact that for large $k$, the sample covariance matrix becomes independent of the iteration index in a stationary environment, and hence, its inverse will be equal to the inverse of the input correlation matrix times the effective window length $1/(1-\lambda)$. In other words, the resulting matrix has no eigenvalue spread, and hence, unlike with LMS, convergence is fast.
This algorithm is depicted in Fig. 3. See [11] for a detailed description of this signal flow graph. One can see that every used tone has a $T$-tap PTEQ with as its inputs the complex FFT output for that tone and $T-1$ real difference terms. It is important to see that the $T-1$ real difference terms give rise to a real triangular part of the triangular matrix, which is common for all the tones. The FFT output is taken as the last input to the RLS-structure and makes only the last (bottom) row of the triangular matrix complex and different for different tones.
Fig. 2. Signal flow graph for NLMS performed on each tone.
Fig. 3. Signal flow graph for SR-RLS performed on each tone.
C. Combined RLS and LMS
In Fig. 3, one can see that a large part of the computational cost is due to the per tone RLS-part, whereas the common part with the real difference terms is reasonably cheap. It would be interesting if the per tone part could be simplified while preserving the fast convergence of RLS.
A solution exists in replacing the per tone RLS part with a much cheaper LMS equivalent. The resulting structure will be referred to as RLS-LMS. This means that the PTEQ coefficients will be updated in the direction determined by a modified Kalman vector [compare with (12)]
(27)
The first $T-1$ elements of this vector are transformed versions of the difference terms (cf. RLS), whereas the last element is treated in an LMS way. In general, an appropriate scaling of the last element is required in order to have equal order of magnitude of the elements. Hence, the FFT output is scaled with the inverse of the averaged FFT output energy. In this way, the RLS part is used to improve the eigenvalue spread of the common, real difference terms, whereas the complex FFT output is scaled and treated in LMS sense. Mathematically, the RLS-LMS algorithm can be written as follows:
Algorithm: RLS combined with LMS
Initialize the tone-independent triangular matrix and the tone-dependent filter coefficients and accumulated FFT output energy.
For each symbol
  Tone-independent part: See SR-RLS algorithm description
  Tone-dependent part
  For each used tone
  1) Accumulate the energy of the FFT output of tone $i$ in order to scale the FFT output in a similar way as the difference terms
    (28)
  2) Update the filter coefficients for each used tone
    (29)
  end
end
The stepsize $\mu$ is added to ensure convergence. Note that the accumulated energy, the common triangular matrix, and the first $T-1$ elements of the modified Kalman vector have real values.
This procedure is convergent in the mean and the convergence depends on the eigenvalues of the cross correlation matrix $X$.
(For a proof, see Appendix A.) The RLS part will remove a part
of the eigenvalue spread of the input correlation matrix, which
would be observed if ordinary LMS were used. As a result, the overall eigenvalue spread of the proposed scheme only
depends on three unequal eigenvalues (see Appendix B). Hence,
we can state that RLS-LMS solves in a sense an LMS problem
with three distinct eigenvalues. In general, LMS algorithms do
not experience a lot of convergence problems in the case of an
ill conditioned three-dimensional problem.
As a second argument for the favorable convergence of the
combined approach, we mention the fact that the noise is averaged due to the exponential weighting. A pure LMS update
is characterized by a large eigenvalue spread as well as a noisy
version of the true gradient, leading both to slow convergence
[10]. In the proposed algorithm, the noise is averaged, resulting
in a more reliable update direction. These arguments explain intuitively why the combined approach works as well.
The RLS-LMS SFG is depicted in Fig. 4, which clearly illustrates the reduced per tone complexity.
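The structure of the combined scheme can be sketched numerically. The code below is an interpretation under explicit assumptions, not the algorithm of (27)-(29): the common part uses the conventional exponentially weighted RLS recursion on an inverse correlation estimate `P` instead of the square-root form with Givens rotations, the output convention is $w^H u$, and the function name `rls_lms_step`, the values `lam = 0.98` and `mu = 1.0`, the initial values, and the synthetic noise-free data are all hypothetical.

```python
import numpy as np

def rls_lms_step(W, P, E, d, y, x, lam=0.98, mu=1.0):
    """One training symbol for all tones at once.

    W: (n_tones, T) complex weights      P: (T-1, T-1) real, shared by all tones
    E: (n_tones,) accumulated energy     d: (T-1,) real difference terms
    y: (n_tones,) complex FFT outputs    x: (n_tones,) training symbols
    """
    # Tone-independent part: RLS gain for the real difference terms (computed once)
    Pd = P @ d
    k = Pd / (lam + d @ Pd)
    P = (P - np.outer(k, Pd)) / lam
    P = 0.5 * (P + P.T)                           # keep P symmetric numerically
    # Tone-dependent part: accumulate FFT output energy, then an LMS-style update
    E = lam * E + np.abs(y) ** 2
    n_tones = len(y)
    U = np.column_stack([np.tile(d, (n_tones, 1)), y])       # inputs per tone
    G = np.column_stack([np.tile(k, (n_tones, 1)), y / E])   # update directions
    e = x - np.sum(np.conj(W) * U, axis=1)        # a priori error per tone
    W = W + mu * G * np.conj(e)[:, None]
    return W, P, E, e

rng = np.random.default_rng(1)
T, n_tones, n_symbols = 4, 3, 4000
W_true = rng.standard_normal((n_tones, T)) + 1j * rng.standard_normal((n_tones, T))
W = np.zeros((n_tones, T), dtype=complex)
P = 100.0 * np.eye(T - 1)
E = np.full(n_tones, 1e-3)

for _ in range(n_symbols):
    d = rng.standard_normal(T - 1)
    noise = (rng.standard_normal(n_tones) + 1j * rng.standard_normal(n_tones)) / np.sqrt(2)
    y = noise + 0.5 * d[0]                        # FFT output correlated with d
    U = np.column_stack([np.tile(d, (n_tones, 1)), y])
    x = np.sum(np.conj(W_true) * U, axis=1)       # noise-free training symbols
    W, P, E, e = rls_lms_step(W, P, E, d, y, x)

final_error = np.max(np.abs(W - W_true))
```

The point of the sketch is the cost split: the matrix recursion on `P` runs once per symbol regardless of the number of tones, while each tone only needs one real energy accumulator and a vector update. With `lam = 0.98` the stepsize `mu = 1.0` stays well below the effective memory $1/(1-\lambda) = 50$.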
The stepsize $\mu$ must be smaller than the effective memory $1/(1-\lambda)$ of the algorithm due to exponential weighting, i.e.,
$$\mu < \frac{1}{1-\lambda}. \tag{30}$$
Fig. 4. Signal flow graph for RLS combined with an LMS-based per tone structure.
(For a proof, see Appendix B.) A large stepsize results, of
course, in a large excess error but makes a fast convergence
possible. Hence, like in LMS-based schemes, a tradeoff has to
be made.
V. COMPUTATIONAL COMPLEXITY
In this section, we determine the computational complexity
for the different initialization algorithms, based on, respectively,
NLMS, SR-RLS, and the combined RLS-LMS. In our complexity calculations, we only consider the number of real multiplications and real additions, i.e., a multiplication of two complex numbers is counted as four real multiplications. Note again
that the difference terms are real and that the PTEQ coefficients
are complex valued.
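As a worked instance of this counting rule (the symbols $a$, $b$, $c$, and $d$ are introduced here only for illustration), the product of two complex numbers expands as

```latex
(a + jb)(c + jd) = (ac - bd) + j\,(ad + bc)
```

which takes four real multiplications and two real additions. Scaling a complex number by a real one, as happens when a real difference term multiplies a complex coefficient, takes only two real multiplications.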
1) Results of the complexity calculations for NLMS are
summarized in Table I. Note that the per tone part should
be multiplied by the number of used tones to obtain
the overall complexity.
2) Referring to Fig. 3, SR-RLS contains a real common part,
due to $T-1$ real common difference terms, as well as a
complex per tone part. The complexity of the common
part is given in Table II, whereas the complexity of the
per tone part can be found in Table III. Since the common
part is the same for SR-RLS as well as RLS-LMS, it can
be described by (14)–(17). The complexity calculations
of the per tone part can be derived in a similar way. The
Kalman vector is in this case a complex vector, which
results in extra complexity for the filter update compared
with NLMS.
3) The combined RLS-LMS contains the same common part
as SR-RLS but has a significantly lower number of computations per tone. Comparing Tables III and IV, one can
see that a substantial per tone complexity reduction is
achieved.
For a typical ADSL downstream case, where [...] MHz, [...], [...], and where approximately [...] tones are used, the overall complexity for the respective algorithms becomes
1) NLMS: [...] add./s and [...] mult./s
2) RLS: [...] add./s and [...] mult./s
3) RLS-LMS: [...] add./s and [...] mult./s.
Hence, a four- to five-fold complexity reduction per iteration is obtained in the downstream case. Apart from the reduced complexity, the memory requirements for the per tone part are lowered as well. In the full SR-RLS scheme, the main part of the memory is assigned to store the complex filter coefficients and the elements of the lower triangular matrix for each tone. Due to the real common difference terms, a large part of this matrix is real and common for all tones, whereas $T$ complex numbers of its last row remain to be stored for each tone. For RLS-LMS, the latter part disappears, and instead, only the real value of the accumulated FFT output energy needs storage. Hence, when $T$ is small compared with the number of used tones, the common part is almost negligible, and approximately a two-fold reduction in memory is obtained with RLS-LMS compared with full SR-RLS.
In the upstream case, the reduction in computational complexity is smaller since fewer tones are of interest. However, the two-fold reduction for the memory requirements of the per tone part still holds. Hence, RLS-LMS is still favorable for upstream initialization.
TABLE I
COMPLEXITY CALCULATION FOR THE TONE-INDEPENDENT AND TONE-DEPENDENT PART OF NLMS
TABLE II
COMPLEXITY CALCULATION FOR THE COMMON PART OF SR-RLS AND FOR THE COMBINED RLS-LMS APPROACH
TABLE III
COMPLEXITY CALCULATION FOR THE PER TONE PART OF SR-RLS
TABLE IV
COMPLEXITY CALCULATION FOR THE PER TONE PART OF RLS-LMS
VI. SIMULATION RESULTS
A. Convergence Behavior
As a performance measure for the simulations, we will use the overall bitrate as well as the SNR for tone $i$ averaged over [...] simulation runs, using the following formulas [8]:
(31)
(32)
(33)
where $b_i$ is the number of bits assigned to tone $i$, $\Gamma$ is the SNR gap, $\gamma_m$ the noise margin, and $\gamma_c$ the coding gain. In our simulations, the following values were used: [...], [...], [...] dB, [...] dB, [...] dB, and [...] MHz.
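A bitrate measure of this kind is commonly computed with the gap approximation, sketched below. This is the widely used textbook form and is offered as an assumption; it is not claimed to match (31)-(33) term by term. The function names, the absence of rounding to integer bit loads, and all numerical values in the example are hypothetical.

```python
import math

def bits_per_tone(snr_db, gap_db, margin_db, coding_gain_db):
    """Bits carried by one tone under the gap approximation (no rounding)."""
    effective_snr_db = snr_db - gap_db - margin_db + coding_gain_db
    return math.log2(1.0 + 10.0 ** (effective_snr_db / 10.0))

def bitrate(snr_db_per_tone, gap_db, margin_db, coding_gain_db, symbol_rate_hz):
    """Sum the per tone bit loads and scale by the DMT symbol rate."""
    total_bits = sum(
        bits_per_tone(snr, gap_db, margin_db, coding_gain_db)
        for snr in snr_db_per_tone
    )
    return symbol_rate_hz * total_bits

# Ten tones whose effective SNR is exactly 0 dB carry one bit each.
example_rate = bitrate([6.0] * 10, gap_db=9.0, margin_db=0.0,
                       coding_gain_db=3.0, symbol_rate_hz=4000.0)
```

The symbol rate passed in would be the sampling rate divided by the symbol length including the prefix, which is how the per tone bit loads turn into bits per second.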
Fig. 5. Comparison of different initialization procedures for the per tone equalizers with $T = 24$ for standard channel CSA#4 with 24 DSL NEXT disturbers (downstream).
Fig. 6. Convergence speed of SR-RLS and RLS-LMS for CSA #4 with 24 DSL NEXT disturbers.
The convergence in terms of bitrate is used to measure the performance since it captures the convergence of the SNRs of the
different tones in a uniquely determined way. Moreover, it is the
only relevant way to combine the tone-dependent SNRs into one
measure representing all the tones.
Simulations were performed on a standard loop CSA#4 [4], [14] with additive white Gaussian noise of -140 dBm/Hz and 24 DSL NEXT disturbers. Other parameters are [...]. Front end filters to separate up- and downstream transmission are included in the channel. It is assumed that echo is perfectly cancelled. The used tones are from tone 33 up to tone 255.
In Fig. 5, the initialization procedures for SR-RLS, NLMS,
and RLS-LMS are compared for the downstream case, where
the initial values for the filter coefficients are white and where
the RLS matrix is initialized with 10 on the diagonal. The bitrate is plotted
as a function of the number of iterations (symbols). Two types
of training sequences are used:
• a white training sequence where data on different tones
and different symbols are uncorrelated;
• a pseudo random binary sequence (PRBS), as defined in
the ADSL standard, used in the startup phase of the ADSL
modem [14]. This PRBS is repetitive with a period of 512
symbols (downstream).
In Fig. 5, we can clearly see that the convergence speed of
RLS-LMS lies between that of SR-RLS and NLMS. SR-RLS
initialization needs approximately 200 symbols to converge,
whereas the combination of RLS and LMS is close to the
optimal value after roughly 1000 symbols. Notice that after
1000 symbols, NLMS has still not converged. The learning
curve obtained with a PRBS training sequence shows almost no
performance difference compared with a completely random
training signal.
In upstream ADSL [over plain old telephone service (POTS)], a PRBS sequence with a period of 64 symbols is used. Due to this highly repetitive character, all adaptive initialization procedures suffer from unacceptable performance (the schemes converge but stay far below the MMSE solution). Hence, we will limit ourselves to the downstream case.
TABLE V
CONVERGENCE BEHAVIOR OF SR-RLS AND RLS-LMS FOR THE CSA #1–8 LOOPS
B. Convergence Versus Equalizer Taps
Fig. 6 shows the number of symbols required to reach 98.5%
of the MMSE bitrate as a function of the number of equalizer
taps per tone. SR-RLS and RLS-LMS are considered with a
random training signal or a PRBS for loop CSA#4 with additive white Gaussian noise of -140 dBm/Hz and 24 DSL NEXT
disturbers. Other parameters are [...]. The
figure suggests that the convergence speed decreases with the
number of taps. However, this is an observation that is valid for
SR-RLS as well. The PRBS training sequence shows a slightly
slower convergence behavior.
C. Different Channel Models
To illustrate the performance of the proposed algorithm for a wide range of channel models, simulations were executed for the downstream CSA #1–8 channels [4] with additive white Gaussian noise of -140 dBm/Hz and 24 DSL NEXT disturbers. Table V indicates the number of iterations required to reach, with SR-RLS and RLS-LMS, 99% of the bitrate obtained with the MMSE PTEQ, where [...], [...], and [...]. The worst-case convergence difference between SR-RLS and RLS-LMS is obtained for loop CSA #8, where RLS-LMS is approximately four times slower than SR-RLS.
D. Eigenvalues of $X$
Fig. 7 depicts the eigenvalue spread of $X$ on a logarithmic scale for RLS-LMS with $T = 16$ for a standard loop T1.601#13 [14]. Here, $X$ is determined based on 3000 symbols. It is seen that $T-2$ eigenvalues are equal and that 2 eigenvalues are clearly different. This is in accordance with the derivation in Appendix B. We have also shown that the $T-2$ eigenvalues are approximately equal to $1-\lambda$, the inverse of the effective window length $1/(1-\lambda)$. Since [...] in this simulation, we expect that $T-2$ eigenvalues will approximately be equal to [...]. This value can also be found in Fig. 7. The upper bound for the eigenvalues was determined to be $2(1-\lambda)$, which is equal to [...] in our simulation. In addition, this value is approximately confirmed by Fig. 7.
Fig. 7. Eigenvalues of $X$ with $T = 16$ for standard channel T1.601#13 with 24 DSL NEXT disturbers (downstream).
VII. CONCLUSIONS
In this paper, we presented a new scheme to initialize the per tone equalizers for DMT-based receivers. The per tone equalizers form an upper bound for the bitrate obtained with the more traditional TEQ based receivers, which motivates the need for cheap initialization algorithms. The presented initialization algorithm is based on a combination of RLS and LMS. We showed that the behavior of the new algorithm is situated between SR-RLS and LMS. More specifically, the RLS-LMS scheme achieves convergence in an acceptably small number of training symbols for a complexity lower than SR-RLS and with reduced memory requirements. It was proven that the algorithm is convergent in the mean, and an upper bound for the stepsize was derived.
APPENDIX A
PROOF OF CONVERGENCE IN THE MEAN OF RLS-LMS
In this Appendix, we will prove that the convergence of RLS-LMS depends on the cross correlation matrix $X$. For compact notation, the tone index $i$ will be omitted in the formulas. Consider
(34)
where the error term is the estimation error at the optimal Wiener solution. The weight error vector can be written, with (29), as
(35)
(36)
(37)
(38)
with $I$ the identity matrix of size $T$. After applying (34), this becomes
(39)
(40)
Taking expected values of both sides yields
(41)
(42)
where we assumed that the input vector is orthogonal to the estimation error (orthogonality principle [10]), that the transformation applied to the input becomes independent of the time index (which holds for stationary inputs and large $k$), and that the input vector is independent of the weight error vector (cf. traditional “independence assumption” [10]).
Now, convergence is assured if all eigenvalues of $X$ satisfy the following relation:
(43)
or more specifically (if all eigenvalues are positive, see below), the following must hold:
(44)
As a first conclusion, we see that convergence in the mean depends on the eigenvalues of the cross correlation matrix $X$.
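The argument of this appendix can be condensed into three lines. The following is a sketch in notation assumed here rather than taken from (34)-(44): $\boldsymbol{\epsilon}_k$ is the weight error vector, $\mathbf{u}_k$ the input vector, $\mathbf{g}_k$ the modified Kalman vector, $e_{o,k}$ the estimation error at the Wiener solution, and $\theta_j$ the eigenvalues of $X$, which are taken to be real. The filter output is assumed to be $\mathbf{w}^H \mathbf{u}$.

```latex
\begin{align*}
\boldsymbol{\epsilon}_{k+1}
  &= \left(I - \mu\, \mathbf{g}_k \mathbf{u}_k^{H}\right)\boldsymbol{\epsilon}_k
     + \mu\, \mathbf{g}_k\, e_{o,k}^{*} \\
E\{\boldsymbol{\epsilon}_{k+1}\}
  &\approx \left(I - \mu X\right) E\{\boldsymbol{\epsilon}_k\},
  \qquad X = E\{\mathbf{g}_k \mathbf{u}_k^{H}\} \\
|1 - \mu\,\theta_j| < 1 \;\; \forall j
  \quad &\Longleftrightarrow \quad
  0 < \mu < \frac{2}{\theta_{\max}}
  \qquad (\theta_j > 0)
\end{align*}
```

The second line uses the orthogonality and independence assumptions stated above, and the third line is the usual condition for the mean weight error to decay along every eigendirection of $X$.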
1926
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 7, JULY 2003
APPENDIX B
UPPERBOUND FOR THE STEPSIZE
In the following, we will derive more specific upper and lower bounds for the stepsize . With and approaching infinity, one can state approximately⁴
(45)
(46)
with the number of most recent input symbols taken into account by the exponential forgetting with . is defined as in (13), but only difference terms are considered.
Equivalently, we can write the inverse of the FFT correlation for tone
(47)
where we used (28) for the accumulated energy of the FFT output.
The previous equations are now used to determine a more explicit expression for in (48)–(50).
(48)
(49)
(50)
One can easily prove that the previous matrix has eigenvalues equal to and two eigenvalues equal to , with equal to
(51)
The value of has a very specific structure, which is well known in subspace theory. It represents, in fact, the cosine of the principal angle [15] between the subspace spanned by the FFT samples and the subspace spanned by the difference terms. This can easily be seen if we consider the data-driven case where the expectation operator is left out.
When samples of the FFT output for tone are stacked, we obtain a vector (size ) spanning the subspace formed by the FFT output of that tone. If we do the same for the difference terms, we obtain the matrix [size ], where difference terms consecutive in time are put into the rows. The orthogonal projection of on can be written mathematically as
(52)
The cosine squared of the angle between and is given by
(53)
(54)
(55)
(56)
(57)
Hence, is always positive [see (51)] and smaller than one. Therefore, , which means that is positive definite. In other words, the algorithm converges in the mean if
(58)
This equation indicates that the stepsize must be smaller than the effective number of samples in the memory of the system , where is the forgetting factor.
⁴Ergodicity is assumed, i.e., expected values are replaced by their time averages.
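The two geometric ideas used in this appendix can be checked numerically. The sketch below is illustrative only and rests on assumptions: it uses real-valued data (the FFT outputs in the paper are complex), the helper names `cos_squared` and `effective_memory` are hypothetical, and the effective memory of exponential forgetting is taken to be the usual geometric-series value 1 / (1 - lam) for a forgetting factor `lam`. The first function computes the squared cosine of the angle between a vector and a subspace as the ratio of projected energy to total energy, which always lies between zero and one.

```python
def dot(u, v):
    """Inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(columns, tol=1e-12):
    """Gram-Schmidt on a list of vectors; linearly dependent ones are dropped."""
    basis = []
    for c in columns:
        w = list(c)
        for q in basis:
            coeff = dot(q, w)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def cos_squared(y, columns):
    """Squared cosine of the angle between y and span(columns).

    Equals ||P y||^2 / ||y||^2, with P the orthogonal projector onto the span.
    """
    basis = orthonormal_basis(columns)
    projected_energy = sum(dot(q, y) ** 2 for q in basis)
    return projected_energy / dot(y, y)


def effective_memory(lam):
    """Effective number of samples for forgetting factor lam (assumed 1/(1-lam))."""
    return 1.0 / (1.0 - lam)


# Hypothetical data: y plays the role of the stacked FFT outputs of one tone,
# the vectors in `diffs` play the role of the stacked difference terms.
y = [1.0, 1.0, 0.0]
diffs = [[1.0, 0.0, 0.0]]
c2 = cos_squared(y, diffs)

# The weights lam**i of exponential forgetting sum to the effective memory.
lam = 0.9
weights_sum = sum(lam ** i for i in range(1000))
```

A vector lying inside the subspace gives a squared cosine of one and a vector orthogonal to it gives zero, which matches the statement that the quantity in (51) is positive and does not exceed one.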
REFERENCES
[1] J. A. C. Bingham, “Multicarrier modulation for data transmission: An
idea whose time has come,” IEEE Commun. Mag., vol. 28, pp. 5–14,
May 1990.
[2] T. Pollet, H. Steendam, and M. Moeneclaey, “Performance degradation
of multi-carrier systems caused by an insufficient guard interval duration,” in Proc. Int. Workshop Copper Wire Access Syst. Bridging Last
Copper Drop, 1997, pp. 265–270.
[3] N. Al-Dhahir and J. M. Cioffi, “Optimum finite-length equalization for
multicarrier transceivers,” IEEE Trans. Commun., vol. 44, pp. 56–64,
Jan. 1996.
[4] N. Al-Dhahir and J. M. Cioffi, “Efficiently computed reduced-parameter input-aided MMSE
equalizers for ML detection: A unified approach,” IEEE Trans. Inform.
Theory, vol. 42, pp. 903–915, May 1996.
[5] P. J. W. Melsa, R. C. Younce, and C. E. Rohrs, “Impulse response shortening for discrete multitone transceivers,” IEEE Trans. Commun., vol.
44, pp. 1662–1672, Dec. 1996.
[6] M. Van Bladel and M. Moeneclaey, “Time-domain equalization for multicarrier communication,” in Proc. IEEE Global Telecommun. Conf.,
1995, pp. 167–171.
[7] T. Pollet, M. Peeters, M. Moonen, and L. Vandendorpe, “Equalization
for DMT-based broadband modems,” IEEE Commun. Mag., pp.
106–113, May 2000.
[8] K. Van Acker, G. Leus, M. Moonen, O. van de Wiel, and T. Pollet, “Per
tone equalization for DMT-based systems,” IEEE Trans. Commun., vol.
49, pp. 109–119, Jan. 2001.
[9] K. Van Acker, T. Pollet, G. Leus, and M. Moonen, “Combination of per
tone equalization and windowing in DMT-receivers,” Signal Process.,
vol. 81, pp. 1571–1579, 2001.
[10] S. Haykin, Adaptive Filter Theory, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[11] K. Van Acker, G. Leus, M. Moonen, and T. Pollet, “RLS-based initialization for per tone equalizers in DMT-receivers,” in Proc. Eur. Signal
Process. Conf., Tampere, Finland, Sept. 2000.
[12] M. Montazeri and P. Duhamel, “A set of algorithms linking NLMS and
block RLS algorithms,” IEEE Trans. Signal Processing, vol. 43, pp.
444–453, Feb. 1995.
[13] S. L. Gay and S. Tavathia, “The fast affine projection algorithm,” in Proc.
IEEE Int. Conf. Acoust., Speech Signal Process., 1995, pp. 3023–3026.
[14] “Draft new recommendation G.992.1: ADSL transceivers,” Int.
Telecommun. Union, Tech. Rep., July 1999.
[15] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd
ed. Baltimore, MD: Johns Hopkins Univ. Press, 1996.
Geert Ysebaert was born in Leuven, Belgium,
in 1976. In 1999, he received the Master degree
in electrical engineering from the Katholieke
Universiteit Leuven (KU Leuven). He is currently
pursuing the Ph.D. degree with the Electrical
Engineering Department (ESAT), KU Leuven, under
the supervision of Prof. M. Moonen, and he is
supported by the Flemish Institute for Scientific and
Technological Research in Industry (IWT).
His research interests are in the area of digital
signal processing for DSL communications.
Koen Vanbleu was born in Bonheiden, Belgium,
in 1976. In 1999, he received the Master degree
in electrical engineering from the Katholieke
Universiteit Leuven (KU Leuven), Leuven, Belgium.
Currently, he is pursuing the Ph.D. degree with
the SCD-SISTA Laboratory of the Department of
Electrical Engineering (ESAT), KU Leuven, where
he is supported by the Belgian National Fund for
Scientific Research (FWO)-Flanders.
He is working in the field of digital signal processing for telecommunication applications under the
supervision of Prof. M. Moonen.
Gert Cuypers was born in Leuven, Belgium, in
1975. In 1998, he received the Master degree in electrical engineering from the Katholieke Universiteit
Leuven (KU Leuven). He is currently pursuing the
Ph.D. degree with the Electrical Engineering Department (ESAT), KU Leuven, under the supervision
of Prof. M. Moonen and is supported by the Flemish
Institute for Scientific and Technological Research
in Industry (IWT).
In the world of radio amateurs, he is also known
as on4dsp. His research interests are in the area of
digital signal processing for telecommunications.
Marc Moonen received the electrical engineering
and the Ph.D. degrees in applied sciences from
the Katholieke Universiteit Leuven (KU Leuven),
Leuven, Belgium, in 1986 and 1990, respectively.
Since 2000, he has been an Associate Professor at
the Electrical Engineering Department, KU Leuven,
where he is currently heading a research team of 16
Ph.D. candidates and postdoctoral students, working
in the area of signal processing for digital communications, wireless communications, DSL, and audio
signal processing.
Dr. Moonen received the 1994 KU Leuven Research Council Award, the
1997 Alcatel Bell (Belgium) Award (with P. Vandaele), and was a 1997 “Laureate of the Belgium Royal Academy of Science.” He was chairman of the
IEEE Benelux Signal Processing Chapter from 1998 to 2002 and is currently
a European Association for Signal, Speech and Image Processing (EURASIP)
AdCom Member, and a member of the editorial board of Integration, the VLSI
Journal, the EURASIP Journal on Applied Signal Processing, and the IEEE
TRANSACTIONS ON CIRCUITS AND SYSTEMS II.
Thierry Pollet received the diploma degree in
electrical engineering from the University of Ghent,
Ghent, Belgium, in 1989.
From 1989 to 1996, he was with the Communications Engineering Laboratory, University of Ghent,
as a Research Assistant. In 1996, he joined the Alcatel Corporate Research Center, Antwerp, Belgium.
Currently, he is a Project Manager for the Strategic
Program Access to Networks. His main interests are
high-speed copper transmission, digital communications, equalization, and synchronization.