IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 7, JULY 2003

Combined RLS-LMS Initialization for Per Tone Equalizers in DMT-Receivers

Geert Ysebaert, Koen Vanbleu, Gert Cuypers, Marc Moonen, and Thierry Pollet

Abstract: In discrete multitone receivers, the classical equalizer structure consists of a (real) time domain equalizer (TEQ) combined with complex one-tap frequency domain equalizers. An alternative receiver is based on per tone equalization (PTEQ), which optimizes the signal-to-noise ratio (SNR) on each tone separately and, hence, the total bitrate. In this paper, a new initialization scheme for the PTEQ is introduced, based on a combination of least mean squares (LMS) and recursive least squares (RLS) adaptive filtering. It will be shown that the proposed method has only slightly slower convergence than full square-root RLS (SR-RLS), while complexity as well as memory cost are reduced considerably. Hence, in terms of complexity and convergence speed, the proposed algorithm is in between LMS and RLS.

Index Terms: Adaptive filtering, ADSL, discrete multitone, equalization, LMS, RLS.

I. INTRODUCTION

Asymmetric digital subscriber lines (ADSLs) provide high bitrates over the existing telephone network. ADSLs employ a transmission scheme based on discrete multitone (DMT) [1]. DMT divides the available bandwidth into parallel subchannels or tones, which are then modulated separately. Mathematically, this operation is performed by means of an inverse fast Fourier transform (IFFT). After IFFT modulation, a guard time sequence of samples, called a cyclic prefix, is inserted between successive symbols to ensure that samples from one symbol do not interfere with the samples from another symbol. At the receiver, the cyclic prefix is removed and demodulation is performed by means of an FFT.
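The modulation chain described above (IFFT, cyclic prefix insertion, channel, prefix removal, FFT) can be sketched in a few lines. This is only an illustrative sketch, not the authors' simulation setup: the symbol size `N = 8`, the prefix length `NU = 3`, the three-tap channel `h`, and all function names are assumptions chosen for the example, a naive DFT stands in for the FFT, and the Hermitian symmetry that a real DMT modem imposes on the tones is left out for brevity.

```python
import cmath

def idft(X):
    """Naive N-point inverse DFT (stands in for the IFFT modulator)."""
    N = len(X)
    return [sum(X[n] * cmath.exp(2j * cmath.pi * n * t / N) for n in range(N)) / N
            for t in range(N)]

def dft(x):
    """Naive N-point DFT (stands in for the FFT demodulator)."""
    N = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * n * t / N) for t in range(N))
            for n in range(N)]

def add_cyclic_prefix(x, nu):
    """Copy the last nu samples of the symbol to its front."""
    return x[-nu:] + x

def fir_channel(x, h):
    """Linear convolution with the channel h, truncated to len(x) samples."""
    return [sum(h[m] * x[t - m] for m in range(len(h)) if t - m >= 0)
            for t in range(len(x))]

def dmt_roundtrip(X, h, nu):
    """Modulate, add prefix, pass the channel, drop prefix, demodulate, equalize."""
    N = len(X)
    tx = add_cyclic_prefix(idft(X), nu)
    rx = fir_channel(tx, h)
    Y = dft(rx[nu:])                          # remove the prefix, then demodulate
    H = dft(list(h) + [0.0] * (N - len(h)))   # channel frequency response per tone
    return [Y[n] / H[n] for n in range(N)]    # one complex tap per tone

# Hypothetical example values (not from the paper).
N, NU = 8, 3
X = [1+1j, -1+1j, 1-1j, -1-1j, 1+1j, 1-1j, -1+1j, -1-1j]   # QPSK points
h = [1.0, 0.5, -0.2]                                         # shorter than the prefix
X_hat = dmt_roundtrip(X, h, NU)
max_err = max(abs(a - b) for a, b in zip(X_hat, X))
```

Because the assumed channel is shorter than the prefix, the block that remains after prefix removal is a circular convolution of the symbol with the channel, so a single complex tap per tone recovers the transmitted points up to rounding error.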
Manuscript received January 4, 2002; revised January 21, 2003. G. Ysebaert and G. Cuypers are with the I. W. T., and K. Vanbleu is with the F. W. O. Vlaanderen. This work was carried out at the ESAT Laboratory of the Katholieke Universiteit Leuven and was supported by the Belgian State, Prime Minister’s Office, Federal Office for Scientific, Technical, and Cultural Affairs, Interuniversity Poles of Attraction Program (2002-2007), IUAP P5/22 (“Dynamical Systems and Control: Computation, Identification and Modeling”) and P5/11 (“Mobile multimedia communication systems and networks”), the Concerted Research Action GOA-MEFISTO-666 (Mathematical Engineering for Information and Communication Systems Technology) of the Flemish Government, Research Project FWO G.0295.97 (“Design and implementation of adaptive digital signal processing algorithms for broadband applications”), and by Alcatel-Bell. The associate editor coordinating the review of this paper and approving it for publication was Dr. Zhi-Quan (Tom) Luo.
G. Ysebaert, K. Vanbleu, G. Cuypers, and M. Moonen are with the Katholieke Universiteit Leuven, ESAT/SCD-SISTA, B-3001 Leuven, Belgium (e-mail: ysebaert@esat.kuleuven.ac.be; vanbleu@esat.kuleuven.ac.be; cuypers@esat.kuleuven.ac.be; moonen@esat.kuleuven.ac.be).
T. Pollet is with Access to Networks, Research, and Innovation, ALCATEL, Antwerpen, Belgium (e-mail: Thierry.Pollet@alcatel.be).
Digital Object Identifier 10.1109/TSP.2003.812727

It is known that the mitigation of intersymbol interference (ISI) with a cyclic prefix is only effective if the length of the channel impulse response is shorter than the cyclic prefix. A long prefix, however, introduces a large overhead and, hence, results in a small useful bitrate. A well-known solution is to add a T-tap time domain equalizer (TEQ) to shorten the effective channel impulse response. In the literature, many algorithms exist to initialize this TEQ [2]–[6].
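The overhead argument can be made concrete with a small worked calculation. Writing N for the IFFT size and ν for the prefix length, only N out of every N + ν transmitted samples carry data. The values N = 512 and ν = 32 below are assumed as typical ADSL downstream settings for illustration; they are not quoted from this paper.

```latex
\eta = \frac{N}{N+\nu}
\qquad\Rightarrow\qquad
\eta\big|_{N=512,\ \nu=32} = \frac{512}{544} \approx 0.941,
\qquad
\eta\big|_{N=512,\ \nu=64} = \frac{512}{576} \approx 0.889
```

Under these assumed values, doubling the prefix raises the overhead from about 6% to about 11% of the transmitted samples, which is why shortening the channel is preferred over lengthening the prefix.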
In [4], the TEQ is calculated based on a minimum mean-square-error (MMSE) criterion, whereas in [5], the TEQ is obtained by maximizing the shortening signal-to-noise ratio (SSNR). Both formulations define the TEQ initialization problem without a direct relation to the resulting bitrate. In general, these methods suffer from sensitivity to the so-called synchronization delay and from unpredictable behavior. Algorithms exist for calculating the TEQ that optimize the SNR and, hence, obtain optimal bitrates, but they are based on difficult nonlinear optimization procedures [3]. A general disadvantage, which is independent of the criterion used, is that the TEQ equalizes all tones in a combined way, which limits system performance.

In [7] and [8], Van Acker et al. proposed a new equalizer scheme based on “per tone” equalization (PTEQ). This structure is able to optimize the SNR for each tone separately and hence achieves substantial bitrate improvements (see footnote 1). Moreover, when radio frequency interference (RFI) is present, the gain in performance compared with a TEQ-based receiver is even higher [9]. In summary, the PTEQ has a performance that is always better than the traditional TEQ-based receiver. This motivates the search for cheap initialization procedures for the PTEQ coefficients.

This paper addresses the problem of adaptively initializing the per tone equalizer coefficients in a cheap way. The problem consists of solving several least squares problems in parallel to determine T taps per used tone. Due to bad conditioning of the problem, an initialization based on straightforward least mean squares (LMS) adaptive filtering [10] will require many more training symbols than foreseen by the ADSL standard. As an alternative to LMS, a recursive least squares (RLS) filter adaptation could be implemented to give faster convergence at the expense of extra computational complexity.
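The effect of bad conditioning on LMS-type versus RLS-type adaptation can be illustrated on a toy problem. The sketch below is not the per tone equalizer itself: it identifies a hypothetical real two-tap filter `w_true` from strongly correlated inputs, and the stepsize, forgetting factor, initial matrix `P`, and number of symbols are all assumptions made for the example. It uses the conventional (non square-root) RLS recursion rather than the SR-RLS form discussed later.

```python
import random

def nlms_step(w, u, d, mu=0.5, eps=1e-8):
    """One normalized LMS update of the weights w for input u and desired output d."""
    e = d - sum(wi * ui for wi, ui in zip(w, u))
    norm = eps + sum(ui * ui for ui in u)       # eps guards against a tiny input norm
    return [wi + mu * e * ui / norm for wi, ui in zip(w, u)]

def rls_step(w, P, u, d, lam=0.99):
    """One conventional RLS update with forgetting factor lam."""
    n = len(u)
    Pu = [sum(P[r][c] * u[c] for c in range(n)) for r in range(n)]
    denom = lam + sum(u[r] * Pu[r] for r in range(n))
    k = [p / denom for p in Pu]                 # gain vector
    e = d - sum(wi * ui for wi, ui in zip(w, u))
    w = [wi + ki * e for wi, ki in zip(w, k)]
    # P is symmetric, so u^T P equals (P u)^T.
    P = [[(P[r][c] - k[r] * Pu[c]) / lam for c in range(n)] for r in range(n)]
    return w, P

def run(num_symbols=50, seed=1):
    """Return the final weight error norms (NLMS, RLS) on an ill-conditioned problem."""
    rng = random.Random(seed)
    w_true = [1.0, -1.0]                        # hypothetical filter to identify
    w_nlms, w_rls = [0.0, 0.0], [0.0, 0.0]
    P = [[1e4, 0.0], [0.0, 1e4]]
    for _ in range(num_symbols):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        u = [a, a + 0.05 * b]                   # nearly identical inputs: large eigenvalue spread
        d = sum(wt * ui for wt, ui in zip(w_true, u))
        w_nlms = nlms_step(w_nlms, u, d)
        w_rls, P = rls_step(w_rls, P, u, d)
    err = lambda w: sum((wi - wt) ** 2 for wi, wt in zip(w, w_true)) ** 0.5
    return err(w_nlms), err(w_rls)

nlms_err, rls_err = run()
```

In this noiseless toy setting the RLS weights are essentially converged after a few dozen symbols, whereas the NLMS error along the weakly excited direction decays only slowly, which mirrors the tradeoff between convergence speed and complexity described above.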
In [11], a reasonably cheap RLS-based initialization scheme is presented, where most of the RLS processing is shared over all used tones since PTEQ inputs are common for all the tones. Here, we extend the ideas of [11] by focusing on a mixture of RLS and LMS to combine the advantages of both schemes, i.e., fast convergence and low complexity, respectively. We will show that the obtained adaptive algorithm reduces the computational complexity approximately by a factor of four compared with the scheme of [11] while still using a reasonably small number of training symbols. In other words, both convergence speed and computational complexity are in between RLS and LMS.

Apart from the lower complexity, the proposed algorithm also exhibits a substantially lower memory cost. The memory needed to store all the filter coefficients and RLS-LMS dependent parameters is approximately halved compared with full RLS, which makes the proposed scheme tractable for hardware implementation.

In the search for adaptive filters with fast convergence and low complexity, several schemes have been previously developed to find “intermediate” solutions between LMS and RLS algorithms. In the context of acoustic echo cancellation, a set of algorithms was presented to link the advantages of LMS and RLS, i.e., the class of affine projection algorithms (APA) [12], [13]. According to [12], APA algorithms are especially useful in the case of a large number of filter taps and small block lengths. However, this is not the case for per tone equalization in ADSL, where the number of taps is typically small.

Footnote 1: Strictly speaking, power allocation and bandwidth optimization should also be considered when speaking about achieving the optimal bitrate. However, in this paper, we will only consider the “equalization-problem.”

1053-587X/03$17.00 © 2003 IEEE
Moreover, PTEQ initialization requires several APA problems to be solved in parallel, where it is not possible to exploit the common part of the PTEQ inputs in a cheap way.

In the past, the computational complexity of full RLS schemes has been reduced as well, using so-called fast RLS techniques [10]. These schemes are particularly suitable for filtering problems where incoming signals are filtered by a tapped delay line. The complexity reductions attained in these algorithms rely on the shift structure of the input signal. However, extensions to “linear combiner” problems, as is the case with the per tone equalizer, are not possible.

Although the initialization procedure described in this paper is only treated for per tone equalization, it can be used in general for problems where many RLS problems have to be solved simultaneously but where inputs can be shared over the different RLS schemes to reduce the complexity and memory requirements.

The paper is organized as follows. In Section II, the data model is introduced, whereas an overview of the per tone equalizer structure is given in Section III to make the paper self-contained. Section IV describes the combination of RLS and LMS. In Section V, the computational complexity is calculated, and a comparison is made with normalized LMS and RLS. Finally, simulation results are presented in Section VI, followed by conclusions in Section VII.

II. DATA MODEL

The goal of an equalizer is to reconstruct the transmitted signal by removing interference from neighboring symbols. An easy way to take into account ISI as well as intercarrier interference (ICI) is to consider a data model with three consecutive complex valued symbols, namely, the symbols transmitted at times k − 1, k, and k + 1. Here, each symbol is a complex frequency domain vector, with N the size of the modulating IFFT.
The symbol vector includes elements that are chosen from a complex QAM constellation with a size depending on the SNR of the corresponding frequency bin or tone (i is the tone index). The kth symbol is the symbol of interest, whereas the (k − 1)th and (k + 1)th symbols cause interference on this symbol.

The frequency domain symbols are transformed to the time domain by means of an IFFT, which is mathematically represented by the N-point IDFT matrix. Before transmission of a DMT symbol, the last ν samples of the IFFT output are copied to the front of the symbol to form a cyclic prefix. This operation is performed by the matrix

[Equation (1)]

of size (N + ν) × N. The DMT symbols are transmitted over a channel modeled as a finite impulse response (FIR) filter of a given length (without loss of generality). The received time domain samples may now be specified as

[Equation (2)]

where the remaining quantities are the length of a symbol with prefix added, the additive noise, the sample index, and two zero matrices. The channel vector denotes the channel impulse response in reversed order, i.e.,

[Equation (3)]

where the last samples represent the head of the channel impulse response, whereas the tail corresponds to the first samples. Head and tail are calculated in such a way that the zero reference delay maximizes the energy in consecutive channel coefficients. The synchronization delay is a design parameter. It represents a delay relative to the zero reference delay.

III. PER TONE EQUALIZATION

The aim of the receiver is to reconstruct the transmitted symbol from the received samples. Traditional approaches insert a (real) T-tap TEQ prior to demodulation to shorten the channel impulse response to the cyclic prefix length plus one. Afterwards, one-tap FEQs remove the remaining magnitude and phase distortion introduced by the overall channel. Hence, the FEQ output for tone i can be written as

[Equation (4)]

which involves the ith row of the DFT matrix and a Toeplitz matrix of received samples. In [8], a per tone approach is taken where the TEQ operations are transferred to the frequency domain in order to design a PTEQ adjusted to each tone separately. Hence, (4) can be rewritten as

[Equation (5)]

where the FEQ is incorporated in the PTEQ for tone i, and the PTEQ vector equals the original equalizer vector with its coefficients in reversed order. Apparently, the FFT operation of (4) is turned into a sliding FFT in (5). However, the filtering with the PTEQ can be computed efficiently using only one FFT, so-called “difference terms,” and a modified PTEQ for tone i [8].

Van Acker et al. showed that the PTEQ for tone i can be found as the solution of an MMSE problem [8]

[Equation (6)]

which is stated in terms of the modified PTEQ (see footnote 2) and the expectation operator. This criterion indicates that the PTEQ of tone i constructs a linear combination of T − 1 (real) difference terms with the (complex) FFT output for that tone and that the coefficients should be such that the filter output is as close as possible to the transmitted constellation point. It is important to notice that the difference terms are common for all the tones, which in particular will considerably reduce the computational complexity of the initialization scheme. This PTEQ is optimal (compared to the TEQ) in the sense that it optimizes the SNR for each tone separately. Note that the PTEQ is not necessarily real anymore but will have complex values in the general case. Still, it has been shown in [8] that the overall complexity during data transmission is roughly the same for a PTEQ and a TEQ based structure.

Footnote 2: The notation of the modified PTEQ indicates that its coefficients are in a different order than those of the original PTEQ vector.

IV. EQUALIZER INITIALIZATION: COMBINED RLS AND LMS

A. Normalized Least-Mean-Square Algorithm

Direct initialization of all the PTEQ coefficients, based on the knowledge of the channel impulse response and the signal and noise power spectral densities, turns out to be computationally too expensive to implement in hardware. Therefore, several techniques have been devised to initialize the equalizer coefficients based on adaptive filtering with training symbols. During data transmission, further equalizer coefficient adaptation can be based on so-called “decision directed operation.” With the normalized least-mean-square algorithm (NLMS) [10], each equalizer coefficient can be updated according to the following algorithm.

Algorithm: Normalized LMS
Initialize filter coefficients for each tone
For each training symbol
  Tone-independent part
    [Equation (7)]
  Tone-dependent part
  For each used tone
    [Equations (8)–(10)]
  end
end

Here, one input vector contains the difference terms (which are common for all tones), and another contains the FFT output for tone i. The remaining quantities are the normalized stepsize, a constant that prevents overflow when the input energy becomes very small, the set of used tones, the complex conjugation operator, and the desired symbol, which is either the decision on the filter output (decision directed mode) or a training symbol (training mode). A signal flow graph (SFG) is shown in Fig. 2, and the functionality of the building blocks is explained in Fig. 1.

[Fig. 1. Building blocks for SFGs.]

[Fig. 2. Signal flow graph for NLMS performed on each tone.]

It is well known that the convergence speed of NLMS is determined by the eigenvalue spread of the input correlation matrix

[Equation (11)]

Simulations show that this matrix typically has a large eigenvalue spread, and as a consequence, convergence is slow. Iterative initialization schemes based on LMS have low complexity but require, unfortunately, an excessively large number of training symbols. The low cost can easily be understood from Fig. 2.

B. Square-Root Recursive Least Squares

To overcome the slow convergence of NLMS, a square-root recursive least squares scheme (SR-RLS, see footnote 3) [10] can be used. The scheme is based on equalizer coefficient updating similar to (10) but now with a transformed input vector

[Equation (12)]

which is called the Kalman gain vector for tone i. The transformation is constructed based on the updating of a lower triangular matrix that is derived from the (upper triangular) Cholesky factor of the sample covariance matrix, i.e.,

[Equation (13)]

with λ a forgetting factor and the superscript H indicating complex conjugate transpose. The factor λ determines a window, with an effective length of 1/(1 − λ), of the most recently received symbols. The following formulas describe the SR-RLS algorithm.

Algorithm: SR-RLS
Initialize the triangular matrix and the filter coefficients for each tone
For each training symbol
  Tone-independent part:
  1) Form the matrix-vector product
     [Equation (14)]
     Notice that the tone index is omitted since the difference terms are common for all the tones.
  2) For each component j, determine Givens transformations [10] such that
     [Equation (15)]
  3) Update the common triangular part using the previously defined transformations, and apply exponential weighting with λ
     [Equations (16), (17)]
  Tone-dependent part:
  For each used tone
  1) Form the product
     [Equation (18)]
  2) Determine the Givens transformation [10] such that
     [Equation (19)]
  3) Update the tone-dependent row, and apply exponential weighting
     [Equations (20), (21)]
  4) Update the filter coefficients
     [Equation (22)]
  end
end

In step 2, each Givens transformation represents a rotation matrix acting upon the jth and the last component of a vector such that the jth component is zeroed. This algorithm ensures that a Kalman gain vector, as defined in (12), is obtained in a cheap way. See [10] for further details.

Footnote 3: The square-root RLS algorithm is sometimes referred to as the inverse QR-RLS algorithm [10].

The convergence behavior for tone i now depends on the eigenvalues of

[Equations (23)–(26)]

where we used the fact that, for a large number of iterations, the matrix maintained by the algorithm becomes independent of the iteration index in a stationary environment and, hence, will be equal to the inverse of the input correlation matrix times the effective window length. In other words, the resulting matrix has no eigenvalue spread, and hence, unlike with LMS, convergence is fast.

This algorithm is depicted in Fig. 3. See [11] for a detailed description of this signal flow graph. One can see that every used tone has a T-tap PTEQ with as its inputs the complex FFT output for that tone and T − 1 real difference terms. It is important to see that the T − 1 real difference terms give rise to a real triangular part of the triangular matrix, which is common for all the tones. The FFT output is taken as the last input to the RLS-structure and makes only the last (bottom) row of this matrix complex and different for different tones.

[Fig. 3. Signal flow graph for SR-RLS performed on each tone.]

C. Combined RLS and LMS

In Fig. 3, one can see that a large part of the computational cost is due to the per tone RLS-part, whereas the common part is reasonably cheap. It would be interesting if the per tone part could be simplified while preserving the fast convergence of RLS. A solution exists in replacing the per tone RLS part with a much cheaper LMS equivalent. The resulting structure will be referred to as RLS-LMS. This means that the PTEQ coefficients will be updated in the direction determined by a modified Kalman vector [compare with (12)]

[Equation (27)]

The first T − 1 elements of this vector are transformed versions of the difference terms (cf. RLS), whereas the last element is treated in an LMS way. In general, an appropriate scaling of the last element is required in order to have equal order of magnitude of the elements. Hence, the FFT output is scaled with the inverse of the averaged FFT output energy. In this way, the RLS part is used to improve the eigenvalue spread of the common, real difference terms, whereas the complex FFT output is scaled and treated in LMS sense. Mathematically, the RLS-LMS algorithm can be written as follows:

Algorithm: RLS combined with LMS
Initialize the tone-independent triangular matrix and the tone-dependent filter coefficients and accumulated FFT output energy.
For each training symbol
  Tone-independent part: See SR-RLS algorithm description
  Tone-dependent part
  For each used tone
  1) Accumulate the energy of the FFT output of tone i in order to scale the FFT output in a similar way as the difference terms
     [Equation (28)]
  2) Update the filter coefficients for each used tone
     [Equation (29)]
  end
end

The stepsize μ is added to ensure convergence. Note that several quantities in this update, including the first T − 1 elements of the modified Kalman vector, have real values. This procedure is convergent in the mean, and the convergence depends on the eigenvalues of the cross correlation matrix. (For a proof, see Appendix A.) The RLS part will remove a part of the eigenvalue spread of the input correlation matrix that would be observed if ordinary LMS were used. As a result, the overall eigenvalue spread of the proposed scheme only depends on three unequal eigenvalues (see Appendix B). Hence, we can state that RLS-LMS solves in a sense an LMS problem with three distinct eigenvalues. In general, LMS algorithms do not experience a lot of convergence problems in the case of an ill conditioned three-dimensional problem.

As a second argument for the favorable convergence of the combined approach, we mention the fact that the noise is averaged due to the exponential weighting. A pure LMS update is characterized by a large eigenvalue spread as well as a noisy version of the true gradient, both leading to slow convergence [10]. In the proposed algorithm, the noise is averaged, resulting in a more reliable update direction. These arguments explain intuitively why the combined approach works well. The RLS-LMS SFG is depicted in Fig. 4, which clearly illustrates the reduced per tone complexity.

[Fig. 4. Signal flow graph for RLS combined with an LMS-based per tone structure.]

The stepsize must be smaller than the effective memory of the algorithm due to exponential weighting, i.e.,

μ < 1/(1 − λ)   (30)

(For a proof, see Appendix B.) A large stepsize results, of course, in a large excess error but makes a fast convergence possible. Hence, like in LMS-based schemes, a tradeoff has to be made.

V. COMPUTATIONAL COMPLEXITY

In this section, we determine the computational complexity for the different initialization algorithms, based on, respectively, NLMS, SR-RLS, and the combined RLS-LMS. In our complexity calculations, we only consider the number of real multiplications and real additions, i.e., a multiplication of two complex numbers is counted as four real multiplications. Note again that the difference terms are real and that the PTEQ coefficients are complex valued.
1) Results of the complexity calculations for NLMS are summarized in Table I. Note that the per tone part should be multiplied by the number of used tones to obtain the overall complexity.
2) Referring to Fig. 3, SR-RLS contains a real common part, due to real common difference terms, as well as a complex per tone part. The complexity of the common part is given in Table II, whereas the complexity of the per tone part can be found in Table III. Since the common part is the same for SR-RLS as well as RLS-LMS, it can be described by (14)–(17). The complexity calculations of the per tone part can be derived in a similar way. The Kalman vector is in this case a complex vector, which results in extra complexity for the filter update compared with NLMS.
3) The combined RLS-LMS contains the same common part as SR-RLS but has a significantly lower number of computations per tone. Comparing Tables III and IV, one can see that a substantial per tone complexity reduction is achieved.
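The counting convention used in this section can be sketched as follows. This is only an illustration of how real operations are tallied, not a reproduction of Tables I to IV: the function name is hypothetical, and the example counts just the cost of forming one PTEQ filter output from T − 1 real difference terms and one complex FFT output, each weighted by a complex coefficient.

```python
def pteq_output_cost(T):
    """Real operations needed to form one per tone filter output (illustrative count).

    Assumptions: T - 1 real inputs and 1 complex input, all coefficients complex;
    a complex-by-complex product costs 4 real multiplications and 2 real additions,
    a real-by-complex product costs 2 real multiplications and no additions.
    """
    mult = 2 * (T - 1) + 4        # real-by-complex products plus one complex product
    add = 2 + 2 * (T - 1)         # inside the complex product, then T - 1 complex sums
    return {"mult": mult, "add": add}

# T = 24 is the number of taps quoted in the caption of Fig. 5.
cost = pteq_output_cost(24)
```

For T = 24 this convention gives 50 real multiplications and 48 real additions per tone and per symbol for the filter output alone; the coefficient update adds further operations on top of that, which is where the per tone parts of the different algorithms differ.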
For a typical ADSL downstream case, where MHz, , and where approximately , tones are used, the overall complexity for the respective algorithms becomes
1) NLMS: add./s, mult./s
2) RLS: add./s, mult./s
3) RLS-LMS: add./s, mult./s.
Hence, a four- to five-fold complexity reduction per iteration is obtained in the downstream case.

[Table I. Complexity calculation for the tone-independent and tone-dependent part of NLMS.]

[Table II. Complexity calculation for the common part of SR-RLS and for the combined RLS-LMS approach.]

[Table III. Complexity calculation for the per tone part of SR-RLS.]

[Table IV. Complexity calculation for the per tone part of RLS-LMS.]

Apart from the reduced complexity, the memory requirements for the per tone part are lowered as well. In the full SR-RLS scheme, the main part of the memory is assigned to store the complex filter coefficients and the elements of the lower triangular matrix for each tone. Due to the real common difference terms, most of this matrix is real and common for all tones, whereas the complex numbers of its last row remain to be stored for each tone. For RLS-LMS, the latter part disappears, and instead, only the real value of the accumulated FFT output energy needs storage. Hence, when the number of taps is small compared with the number of used tones, the common part is almost negligible, and approximately a two-fold reduction in memory is obtained with RLS-LMS compared with full SR-RLS.

In the upstream case, the reduction in computational complexity is smaller since fewer tones are of interest. However, the two-fold reduction for the memory requirements of the per tone part still holds. Hence, RLS-LMS is still favorable for upstream initialization.

VI. SIMULATION RESULTS

A. Convergence Behavior

As a performance measure for the simulations, we will use the overall bitrate as well as the SNR for tone i averaged over simulation runs, using the following formulas [8]:

[Equations (31)–(33): bitrate, number of bits per tone, and SNR per tone]

where the symbols denote the number of bits assigned to tone i, the SNR gap, the noise margin, and the coding gain. In our simulations, the following values were used: , , dB, dB, dB, and MHz.

The convergence in terms of bitrate is used to measure the performance since it captures the convergence of the SNRs of the different tones in a uniquely determined way. Moreover, it is the only relevant way to combine the tone-dependent SNRs into one measure representing all the tones.

Simulations were performed on a standard loop CSA#4 [4], [14] with additive white Gaussian noise of −140 dBm/Hz and 24 DSL NEXT disturbers. Other parameters are , , , , and . Front end filters to separate up- and downstream transmission are included in the channel. It is assumed that echo is perfectly cancelled. The used tones range from tone 33 up to tone 255.

In Fig. 5, the initialization procedures for SR-RLS, NLMS, and RLS-LMS are compared for the downstream case, where the initial values for the filter coefficients are white and where the triangular matrix is initialized with 10 on the diagonal. The bitrate is plotted as a function of the number of iterations (symbols). Two types of training sequences are used:
• a white training sequence where data on different tones and different symbols are uncorrelated;
• a pseudo random binary sequence (PRBS), as defined in the ADSL standard, used in the startup phase of the ADSL modem [14]. This PRBS is repetitive with a period of 512 symbols (downstream).

[Fig. 5. Comparison of different initialization procedures for the per tone equalizers with T = 24 for standard channel CSA#4 with 24 DSL NEXT disturbers (downstream).]

In Fig. 5, we can clearly see the fast convergence of RLS-LMS, which is in between SR-RLS and NLMS. SR-RLS initialization needs approximately 200 symbols to converge, whereas the combination of RLS and LMS is close to the optimal value after roughly 1000 symbols. Notice that after 1000 symbols, NLMS has still not converged. The learning curve obtained with a PRBS training sequence shows almost no performance difference compared with a completely random training signal.

In upstream ADSL [over plain old telephone service (POTS)], a PRBS sequence with a period of 64 symbols is used. Due to this highly repetitive character, all adaptive initialization procedures suffer from unacceptable performance (the schemes converge but stay far below the MMSE solution). Hence, we will limit ourselves to the downstream case.

B. Convergence Versus Equalizer Taps

Fig. 6 shows the number of symbols required to reach 98.5% of the MMSE bitrate as a function of the number of equalizer taps per tone. SR-RLS and RLS-LMS are considered with a random training signal or a PRBS for loop CSA#4 with additive white Gaussian noise of −140 dBm/Hz and 24 DSL NEXT disturbers. Other parameters are and . The figure suggests that the convergence speed decreases with the number of taps. However, this is an observation that is valid for SR-RLS as well. The PRBS training sequence shows a slightly slower convergence behavior.

[Fig. 6. Convergence speed of SR-RLS and RLS-LMS for CSA #4 with 24 DSL NEXT disturbers.]

C. Different Channel Models

To illustrate the performance of the proposed algorithm for a wide range of channel models, simulations were executed for the downstream CSA #1–8 channels [4] with additive white Gaussian noise of −140 dBm/Hz and 24 DSL NEXT disturbers.

[Table V. Convergence behavior of SR-RLS and RLS-LMS for the CSA #1–8 loops.]

Table V indicates the number of iterations required to reach, with SR-RLS and RLS-LMS, 99% of the bitrate obtained with
the MMSE PTEQ, where […].

The worst-case convergence difference between SR-RLS and RLS-LMS is obtained for loop CSA #8, where […] is approximately 4, i.e., RLS-LMS is about four times slower than SR-RLS.

D. Eigenvalues of X

Fig. 7 depicts the eigenvalue spread of X on a logarithmic scale for RLS-LMS with T = 16 for a standard loop T1.601#13 [14]. Here, X is determined based on 3000 symbols. It is seen that […] eigenvalues are equal and that 2 eigenvalues are clearly different. This is in accordance with the derivation in Appendix B. We have also shown that the […] eigenvalues are equal to […], with […] approximately equal to […]. Since […] in this simulation, we expect that […] eigenvalues will approximately be equal to […]. This value can also be found in Fig. 7. The upperbound for […], which is equal to the eigenvalue […], was determined to be […] in our simulation. In addition, this value is approximately confirmed by Fig. 7.

Fig. 7. Eigenvalues of X with T = 16 for standard channel T1.601#13 with 24 DSL NEXT disturbers (downstream).

VII. CONCLUSIONS

In this paper, we presented a new scheme to initialize the per tone equalizers for DMT-based receivers. The per tone equalizers form an upperbound for the bitrate obtained with the more traditional TEQ-based receivers, which motivates the need for cheap initialization algorithms. The presented initialization algorithm is based on a combination of RLS and LMS. We showed that the behavior of the new algorithm is situated between SR-RLS and LMS. More specifically, the RLS-LMS scheme achieves convergence in an acceptably small number of training symbols at a complexity lower than that of SR-RLS and with reduced memory requirements. It was proven that the algorithm is convergent in the mean, and upperbounds for the stepsize were derived.

APPENDIX A
PROOF OF CONVERGENCE IN THE MEAN OF RLS-LMS

In this Appendix, we will prove that the convergence of RLS-LMS depends on the cross correlation matrix X. For compact notation, the tone index will be omitted in the formulas. Consider

    […]    (34)

where […] is the estimation error at the optimal Wiener solution […]. The weight error vector can be written, with (29), as

    […]    (35)–(38)

with […] the identity matrix of size […]. After applying (34), this becomes

    […]    (39), (40)

where […]. Taking expected values of both sides yields

    […]    (41), (42)

where we assumed that the input vector […] is orthogonal to the estimation error […] (orthogonality principle [10]), that […] becomes independent of the time index […] (which holds for stationary inputs and […]), and that the input vector […] is independent of […] (cf. the traditional “independence assumption” [10]).

Now, convergence is assured if all eigenvalues […] of […] satisfy the following relation:

    […]    (43)

or more specifically (if all […]s are positive, see below), the following must hold:

    […]    (44)

As a first conclusion, we see that convergence in the mean depends on the eigenvalues of the cross correlation matrix X.

APPENDIX B
UPPERBOUND FOR THE STEPSIZE

In the following, we will derive more specific upper and lower bounds for the stepsize […]. With […] and […] approaching infinity, one can state approximately⁴

    […]    (45), (46)

with […] the number of most recent input symbols taken into account by the exponential forgetting with […]. […] is defined as in (13), but only difference terms are considered. Equivalently, we can write the inverse of the FFT correlation for tone […]

    […]    (47)

where we used (28) for the accumulated energy of the FFT output. The previous equations are now used to determine a more explicit expression for […] in (48)–(50).

    […]    (48)–(50)

⁴ Ergodicity is assumed, i.e., expected values are replaced by their time averages.

One can easily prove that the previous matrix has […] eigenvalues equal to […] and two eigenvalues equal to […], with […] equal to

    […]    (51)

The value of […] has a very specific structure, which is well known in subspace theory. It represents, in fact, the cosine of the principal angle [15] between the subspace spanned by the FFT samples and the subspace spanned by the difference terms. This can easily be seen if we consider the data-driven case where the expectation operator is left out. When […] samples of the FFT output for tone […] are stacked, we obtain a vector […] (size […]) spanning the subspace formed by the FFT output of that tone. If we do the same for the difference terms, we obtain the matrix […] [size […]], where difference terms consecutive in time are put into the rows. The orthogonal projection of […] on […] can be written mathematically as

    […]    (52)

The cosine squared of the angle between […] and […] is given by

    […]    (53)–(57)

Hence, […] is always positive [see (51)] and smaller than one. Therefore, […], which means that […] is positive definite. In other words, the algorithm converges in the mean if

    […]    (58)

This equation indicates that the stepsize must be smaller than the effective number of samples in the memory of the system […], where […] is the forgetting factor.

REFERENCES

[1] J. A. C. Bingham, “Multicarrier modulation for data transmission: An idea whose time has come,” IEEE Commun. Mag., vol. 28, pp. 5–14, May 1990.
[2] T. Pollet, H. Steendam, and M. Moeneclaey, “Performance degradation of multi-carrier systems caused by an insufficient guard interval duration,” in Proc. Int. Workshop Copper Wire Access Syst. Bridging Last Copper Drop, 1997, pp. 265–270.
[3] N. Al-Dhahir and J. M. Cioffi, “Optimum finite-length equalization for multicarrier transceivers,” IEEE Trans. Commun., vol. 44, pp. 56–64, Jan. 1996.
[4] N. Al-Dhahir and J. M. Cioffi, “Efficiently computed reduced-parameter input-aided MMSE equalizers for ML detection: A unified approach,” IEEE Trans. Inform. Theory, vol. 42, pp. 903–915, May 1996.
[5] P. J. W.
Melsa, R. C. Younce, and C. E. Rohrs, “Impulse response shortening for discrete multitone transceivers,” IEEE Trans. Commun., vol. 44, pp. 1662–1672, Dec. 1996.
[6] M. Van Bladel and M. Moeneclaey, “Time-domain equalization for multicarrier communication,” in Proc. IEEE Global Telecommun. Conf., 1995, pp. 167–171.
[7] T. Pollet, M. Peeters, M. Moonen, and L. Vandendorpe, “Equalization for DMT-based broadband modems,” IEEE Commun. Mag., pp. 106–113, May 2000.
[8] K. Van Acker, G. Leus, M. Moonen, O. van de Wiel, and T. Pollet, “Per tone equalization for DMT-based systems,” IEEE Trans. Commun., vol. 49, pp. 109–119, Jan. 2001.
[9] K. Van Acker, T. Pollet, G. Leus, and M. Moonen, “Combination of per tone equalization and windowing in DMT-receivers,” Signal Process., vol. 81, pp. 1571–1579, 2001.
[10] S. Haykin, Adaptive Filter Theory, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[11] K. Van Acker, G. Leus, M. Moonen, and T. Pollet, “RLS-based initialization for per tone equalizers in DMT-receivers,” in Proc. Eur. Signal Process. Conf., Tampere, Finland, Sept. 2000.
[12] M. Montazeri and P. Duhamel, “A set of algorithms linking NLMS and block RLS algorithms,” IEEE Trans. Signal Processing, vol. 43, pp. 444–453, Feb. 1995.
[13] S. L. Gay and S. Tavathia, “The fast affine projection algorithm,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 1995, pp. 3023–3026.
[14] “Draft new recommendation G.992.1: ADSL transceivers,” Int. Telecommun. Union, Tech. Rep., July 1999.
[15] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins Univ. Press, 1996.

Geert Ysebaert was born in Leuven, Belgium, in 1976. In 1999, he received the Master degree in electrical engineering from the Katholieke Universiteit Leuven (KU Leuven). He is currently pursuing the Ph.D. degree with the Electrical Engineering Department (ESAT), KU Leuven, under the supervision of Prof. M.
Moonen, and he is supported by the Flemish Institute for Scientific and Technological Research in Industry (IWT). His research interests are in the area of digital signal processing for DSL communications.

Koen Vanbleu was born in Bonheiden, Belgium, in 1976. In 1999, he received the Master degree in electrical engineering from the Katholieke Universiteit Leuven (KU Leuven), Leuven, Belgium. Currently, he is pursuing the Ph.D. degree with the SCD-SISTA Laboratory of the Department of Electrical Engineering (ESAT), KU Leuven, where he is supported by the Belgian National Fund for Scientific Research (FWO)-Flanders. He is working in the field of digital signal processing for telecommunication applications under the supervision of Prof. M. Moonen.

Gert Cuypers was born in Leuven, Belgium, in 1975. In 1998, he received the Master degree in electrical engineering from the Katholieke Universiteit Leuven (KU Leuven). He is currently pursuing the Ph.D. degree with the Electrical Engineering Department (ESAT), KU Leuven, under the supervision of Prof. M. Moonen and is supported by the Flemish Institute for Scientific and Technological Research in Industry (IWT). In the world of radio amateurs, he is also known as on4dsp. His research interests are in the area of digital signal processing for telecommunications.

Marc Moonen received the electrical engineering and the Ph.D. degrees in applied sciences from the Katholieke Universiteit Leuven (KU Leuven), Leuven, Belgium, in 1986 and 1990, respectively. Since 2000, he has been an Associate Professor at the Electrical Engineering Department, KU Leuven, where he is currently heading a research team of 16 Ph.D. candidates and postdoctoral students, working in the area of signal processing for digital communications, wireless communications, DSL, and audio signal processing. Dr. Moonen received the 1994 KU Leuven Research Council Award, the 1997 Alcatel Bell (Belgium) Award (with P.
Vandaele), and was a 1997 “Laureate of the Belgium Royal Academy of Science.” He was chairman of the IEEE Benelux Signal Processing Chapter from 1998 to 2002 and is currently a European Association for Signal, Speech and Image Processing (EURASIP) AdCom Member, and a member of the editorial board of Integration, the VLSI Journal, the EURASIP Journal on Applied Signal Processing, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II.

Thierry Pollet received the diploma degree in electrical engineering from the University of Ghent, Ghent, Belgium, in 1989. From 1989 to 1996, he was with the Communications Engineering Laboratory, University of Ghent, as a Research Assistant. In 1996, he joined the Alcatel Corporate Research Center, Antwerp, Belgium. Currently, he is a Project Manager for the Strategic Program Access to Networks. His main interests are high-speed copper transmission, digital communications, equalization, and synchronization.