
EP1521242A1 - Speech coding method applying noise reduction by modifying the codebook gain - Google Patents

Speech coding method applying noise reduction by modifying the codebook gain

Info

Publication number
EP1521242A1
EP1521242A1
Authority
EP
European Patent Office
Prior art keywords
signal
time interval
noise
fixed gain
gain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP03022249A
Other languages
German (de)
French (fr)
Inventor
Christophe Dr. Beaugeant
Nicolas Dütsch
Herbert Dr. Heiss
Hervé Dr. Taddei
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Siemens Corp
Original Assignee
Siemens AG
Siemens Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG, Siemens Corp filed Critical Siemens AG
Priority to EP03022249A priority Critical patent/EP1521242A1/en
Priority to PCT/EP2004/051712 priority patent/WO2005031708A1/en
Publication of EP1521242A1 publication Critical patent/EP1521242A1/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02168: Noise filtering characterised by the method used for estimating noise, the estimation exclusively taking place during speech pauses
    • G10L21/0232: Processing in the frequency domain

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention refers to a method for encoding an acoustic signal (y(n)) containing a speech component and a noise component by using an analysis through synthesis method, wherein for encoding the acoustic signal a synthesised signal is compared with the acoustic signal for a time interval, said synthesised signal being described by using a fixed codebook and an associated fixed gain, comprising the steps:
  • Extracting an estimated fixed gain (ĝ_n) of the noise component from the acoustic signal (y(n)) for the time interval;
  • Representing the noise component for the time interval by its fixed gain (ĝ_n);
  • Estimating a signal to noise ratio of the speech component to the noise component for the time interval based on the signal to noise ratio of an earlier time interval and a ratio of the acoustic signal (y(n)) to the noise component in the time interval;
  • Determining a modification factor (γ_c) based on the estimate of the signal to noise ratio;
  • Deriving an estimate of a fixed gain (ĝ_s(m)) of the speech component by modifying the fixed gain of the acoustic signal with the modification factor (γ_c).

Description

    Field of the Invention
  • The invention refers to a speech coding method applying noise reduction.
  • For over forty years noise reduction methods have been developed in speech processing. Most of the methods are performed in the frequency domain. They commonly comprise three major components:
  • a) a spectral analysis/synthesis system (typically a short-term windowed FFT (Fast Fourier Transform) for analysis and an IFFT (Inverse Fast Fourier Transform) for synthesis),
  • b) a noise estimation procedure, and
  • c) a spectral gain computation according to a suppression rule, which is used for suppressing the noise.
  • The suppression rule modifies only the spectral amplitude, not the phase. It has been shown that there is no need to modify the phase in speech enhancement processing. Nevertheless, this approximation is only valid for a Signal to Noise Ratio (SNR) greater than 6 dB; this condition is, however, assumed to be satisfied by the majority of noise reduction algorithms.
  • Methods for spectral weighting noise reduction are often based on the following hypothesis:
    • The noise is additive (i.e. y(t) = s(t) + n(t)), uncorrelated with the speech signal and locally stationary. s and y represent the clean and the noisy speech signal respectively.
    • There are silence periods in the speech signal.
    • The human auditory system is not sensitive to the received speech phase.
  • A scheme of the treatment of a speech signal with noise reduction is depicted in Fig. 1. The speech component s(p), where p denotes a time interval, is superimposed with a noise component n(p). This results in the total signal y(p). The total signal y(p) undergoes an FFT. The results are the Fourier components Y(p, f_k), where f_k denotes a quantized frequency. Now the noise reduction NR is applied, producing modified Fourier components Ŝ(p, f_k). After an IFFT this leads to a clean speech signal estimate ŝ(p).
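The frequency-domain scheme of Fig. 1 can be sketched in a few lines of Python. This is an illustration, not part of the patent: the noise power spectrum `noise_psd` is assumed to be known (e.g. estimated during speech pauses), and a Wiener-type suppression rule is used as an example. Only the magnitude is modified; the noisy phase is kept, as stated above.

```python
import numpy as np

def spectral_noise_reduction(y, noise_psd, frame_len=256, hop=128):
    """Classical frequency-domain noise reduction: FFT, spectral gain, IFFT.

    noise_psd: externally estimated noise power per rfft bin (assumption).
    The spectral gain modifies the amplitude only; the phase of Y is kept.
    """
    window = np.hanning(frame_len)
    out = np.zeros(len(y))
    for start in range(0, len(y) - frame_len + 1, hop):
        frame = y[start:start + frame_len] * window
        Y = np.fft.rfft(frame)                                   # analysis
        snr = np.maximum(np.abs(Y) ** 2 / noise_psd - 1.0, 0.0)  # crude SNR estimate
        gain = snr / (1.0 + snr)                                 # Wiener-type suppression rule
        S = gain * Y                                             # amplitude modified, phase kept
        out[start:start + frame_len] += np.fft.irfft(S) * window # synthesis + overlap-add
    return out
```

The per-bin gain is bounded in [0, 1), so noise-only content is attenuated while high-SNR bins pass nearly unchanged.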
  • A problem of any spectral weighting noise reduction method is its computational complexity, e.g. if the following steps have to be performed successively:
  • a) decoding
  • b) FFT analysis
  • c) Speech enhancement, e.g. noise reduction
  • d) Inverse FFT analysis
  • e) encoding
  • The above list is typical of classical noise reduction performed within a communications network.
  • Based on the foregoing description, it is an object of the invention to provide a noise reduction method for speech processing systems that can be implemented with low computational effort.
  • This object is solved by the subject matter disclosed in the independent claims. Advantageous embodiments of the present invention will be presented in the dependent claims.
  • In a method for transmitting speech data, said speech data are encoded by using an analysis-through-synthesis method. For the analysis through synthesis, a synthesised signal is produced to approximate the original signal. The production of the synthesised signal is performed by using at least a fixed codebook with a respective fixed gain and optionally an adaptive codebook with an adaptive gain. The entries of the codebooks and the gains are chosen such that the synthesised signal resembles the original signal.
  • Parameters describing these quantities will be transmitted from a sender to a receiver, e.g. from a near-end speaker to a far-end speaker or vice versa.
  • The invention is based on the idea of modifying the fixed gain determined for the signal containing a noise component and a speech component. The objective of this modification is to obtain a useful estimate of the fixed gain of the speech component, i.e. of the clean signal.
  • The modification is done using a modification factor, which is determined on the basis of an estimate of the signal to noise ratio. This signal to noise ratio is calculated consecutively, using also the past of this quantity. Thereby the noise component is represented by its fixed gain.
  • One advantage of this procedure is its low computational complexity, particularly if the speech enhancement through noise reduction is done independently from an encoding/decoding unit, e.g. at a certain position within a network, where for a noise reduction method in the time domain all the steps of decoding, FFT, speech enhancement, IFFT and encoding would have to be performed one after the other. This is not necessary for a noise reduction method based on the modification of parameters.
  • Another advantage is that by using the parameters for any modification, a repeated encoding and decoding process, the so-called "tandeming", can be avoided, because the modification takes place in the parameters themselves. Any tandeming decreases the speech quality. Furthermore, the delay due to the additional encoding/decoding, which in GSM is typically 5 ms, can be avoided.
  • Thus the parameters, which are actually transmitted, do not need to be transformed into a signal for applying the noise reduction. The procedure is furthermore also applicable within a communications network.
  • An encoding apparatus set up for performing the above described encoding method includes at least a processing unit. The encoding apparatus may be part of a communications device, e.g. a cellular phone or it may be also situated in a communication network or a component thereof.
  • In the following the invention will be described by means of preferred embodiments with reference to the accompanying drawings in which:
  • Fig. 1: Scheme of a noise reduction in the frequency domain
  • Fig. 2: shows schematically the function of the AMR encoder;
  • Fig. 3: shows schematically the function of the AMR decoder;
  • Fig. 4: Scheme of a noise reduction method in the parameter domain
  • 1. Function of an encoder (Fig. 2)
  • First the function of a speech codec is described by a special implementation of a CELP-based codec, the AMR (Adaptive Multi-Rate) codec. The codec consists of a multi-rate speech codec (the AMR codec can switch between the bit rates 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75 kbit/s), a source-controlled rate scheme including Voice Activity Detection (VAD), a comfort noise generation system and an error concealment mechanism to compensate for the effects of transmission errors.
  • Fig. 2 shows the scheme of the AMR encoder. It uses an LTP (long-term prediction) filter, which is transformed into an equivalent structure called the adaptive codebook. This codebook stores former LPC-filtered excitation signals. Instead of subtracting a long-term prediction as the LTP filter does, an adaptive codebook search is performed to get an excitation vector from previous LPC-filtered speech samples. The amplitude of this excitation is adjusted by a gain factor g_a.
  • The encoding of the speech is now described with reference to the numbers given in Fig. 2:
  • 1. The speech signal is processed block-wise and thus partitioned into frames and sub-frames. Each frame is 20 ms long (160 samples at 8 kHz sampling frequency) and is divided into 4 sub-frames of equal length.
  • 2. LPC analysis of a Hamming-windowed frame.
  • 3. Because of stability reasons, the LPC filter coefficients are transformed to Line Spectrum Frequencies (LSF). Afterwards these coefficients are quantized in order to save bit rate. This step and the previous are done once per frame (except in 12.2 kbit/s mode; the LPC coefficients are calculated and quantised twice per frame) whereas the steps 4 - 9 are performed on sub-frame basis.
  • 4. The sub-frames are filtered by an LPC filter with retransformed and quantised LSF coefficients. Additionally, the filter is modified to improve the subjective listening quality.
  • 5. As the encoding is processed block by block, the decaying part of the filter output, which is longer than the block length, has to be considered when processing the next sub-frame. In order to speed up the minimisation of the residual power described in the following, the zero-input response of the synthesis filter due to previous sub-frames is subtracted.
  • 6. The power of the LPC-filtered error signal e(n) depends on four variables: the excitation of the adaptive codebook, the excitation of the fixed codebook and the respective gain factors g_a and g_f. In order to find the global minimum of the power of the residual signal, and as no closed-form solution of this problem exists, all possible combinations of these four parameters would have to be tested exhaustively. As this minimisation is too complex, the problem is divided into subproblems, which of course results in a suboptimal solution. First the adaptive codebook is searched to get the optimal lag L and gain factor g_a,L. Afterwards the optimal excitation scaled with the optimal gain factor is synthesis-filtered and subtracted from the target signal. This adaptive codebook search corresponds to an LTP filtering.
  • 7. In the second step of the minimisation problem the fixed codebook is searched. The search is equivalent to the previous adaptive codebook search, i.e. the codebook vector that minimises the error criterion is sought. Afterwards the optimal fixed gain is determined. The resulting coding parameters are the index of the fixed codebook vector J and the optimal gain factor g_f,J.
  • 8. The scaling factors of the codebooks are quantized jointly (except in the 12.2 kbit/s mode, where both gains are quantized separately), resulting in a quantization index, which is also transmitted to the decoder.
  • 9. Completing the processing of the sub-frame, the optimal excitation signal is computed and saved in the adaptive codebook. The synthesis filter states are also saved so that this decaying part can be subtracted in the next sub-frame.
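The sequential, suboptimal codebook search of steps 6 and 7 can be sketched as follows. This is a toy illustration, not the AMR search itself: perceptual weighting and synthesis filtering are omitted, and the function names are illustrative. For each candidate vector c, the least-squares-optimal gain is g = ⟨target, c⟩/⟨c, c⟩, and the best entry maximises the resulting reduction in residual energy, ⟨target, c⟩²/⟨c, c⟩.

```python
import numpy as np

def codebook_search(target, codebook):
    """Pick the codebook vector and gain minimizing ||target - g*c||^2."""
    best_idx, best_gain, best_score = 0, 0.0, -np.inf
    for i, c in enumerate(codebook):
        cc = float(np.dot(c, c))
        if cc == 0.0:
            continue
        tc = float(np.dot(target, c))
        score = tc * tc / cc              # energy removed from the residual
        if score > best_score:
            best_idx, best_gain, best_score = i, tc / cc, score
    return best_idx, best_gain

def two_stage_search(target, adaptive_cb, fixed_cb):
    """Sequential (suboptimal) search: adaptive codebook first, then fixed."""
    i_a, g_a = codebook_search(target, adaptive_cb)
    residual = target - g_a * adaptive_cb[i_a]   # subtract the adaptive contribution
    i_f, g_f = codebook_search(residual, fixed_cb)
    return (i_a, g_a), (i_f, g_f)
```

Splitting the joint four-variable minimisation into two one-codebook searches is exactly the complexity/optimality trade-off described in step 6.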
  • 2. Function of a decoder (Fig. 3)
  • Now the decoder is described with reference to Fig. 3. As shown in the previous section, the encoder transforms the speech signal into parameters which describe the speech. We will refer to these parameters, namely the LSF (or LPC) coefficients, the lag of the adaptive codebook, the index of the fixed codebook and the codebook gains, as "speech coding parameters". This domain will be called "(speech) codec parameter domain" and the signals of this domain are subscripted with the frame index k.
  • Fig. 3 shows the signal flow of the decoder. The decoder receives the speech coding parameters and computes the excitation signal of the synthesis filter. This excitation signal is the sum of the excitations of the fixed and adaptive codebook scaled with their respective gain factors. After the synthesis-filtering is performed, the speech signal is post-processed.
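The decoder signal flow just described can be sketched as a minimal Python function. This is not the AMR decoder (post-processing and interpolation are omitted; all names are illustrative): it sums the two scaled codebook excitations and runs them through the LPC synthesis filter 1/A(z), with A(z) = 1 + Σ a_i z^{-i}.

```python
import numpy as np

def decode_subframe(fixed_exc, adaptive_exc, g_f, g_a, lpc_a, filt_state):
    """Sum the scaled codebook excitations, then synthesis-filter them.

    lpc_a: coefficients a_1..a_p of A(z); filt_state: the last p output
    samples (most recent first), the memory carried between sub-frames.
    """
    excitation = g_f * np.asarray(fixed_exc) + g_a * np.asarray(adaptive_exc)
    state = list(filt_state)
    out = np.empty_like(excitation, dtype=float)
    for n, e in enumerate(excitation):
        y = e - sum(a * s for a, s in zip(lpc_a, state))  # 1/A(z) recursion
        out[n] = y
        state = [y] + state[:-1]                          # shift the filter memory
    return out, state
```

Returning the updated filter state mirrors the decoder's need to carry the decaying part of the synthesis filter into the next sub-frame.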
  • 3. Embodiment of a noise reduction rule
  • Now an embodiment is described in which the fixed codebook gain of a CELP codec is modified through a certain noise reduction rule such that the processed fixed codebook gain is assumed to be noise-free. Therefore the following steps are performed, which lead to a less noisy signal after processing.
  • a) A noisy signal y(t) is coded through a CELP (Code Excited Linear Prediction) codec, e.g. the AMR codec.
  • b) There the signal is described or coded through the so-named 'parameters', i.e. the fixed codebook entry, the fixed codebook gain, the adaptive codebook entry, the adaptive codebook gain, the LPC coefficients etc.
  • c) With a special processing the fixed codebook gain g_y(m) of the signal is extracted from these parameters.
  • d) A noise reduction is applied to g_y(m).
  • d1) Accordingly, an estimate ĝ_n(m) of the noise fixed gain is needed.
  • d2) Furthermore, a reduction rule is required to be applied to the noisy fixed gain g_y(m). An 'unnoisy' or clean fixed gain, i.e. an estimate of the speech gain ĝ_s(m), is thus obtained.
  • e) The coded signal is then recomputed by exchanging the noisy fixed gain g_y(m) for the estimate of the speech gain ĝ_s(m) and leaving the other codec parameters unchanged. The resulting set of codec parameters is assumed to code a clean signal.
  • f) Optionally a postfilter is applied in order to control the noise reduction rule and to avoid artefacts stemming from the reduction rule.
  • g) If the encoded signal, modified in the way described above, is decoded, a clean signal in the time domain is obtained.
  • This procedure is depicted schematically in Fig. 4. In an encoder, a (total) signal containing a speech component and a noise component is encoded. By the encoding process a fixed gain g_y(m) of the total signal is calculated. This fixed gain g_y(m) of the total signal is subject to a gain modification which is based on a noise gain estimation. By the noise gain estimation an estimate ĝ_n(m) of the fixed gain of the noise is determined, which is used for the gain modification. The result of the gain modification is an estimate ĝ_s(m) of the fixed gain of the clean speech or speech component. This parameter is transmitted from a sender to a receiver, where it is decoded. This procedure will now be described in detail:
  • a) Gain subtraction. One possibility to obtain parameters representing the clean signal is gain subtraction. Assuming that the fixed codebook gain g_y(m) of the noisy speech is the sum of the clean fixed codebook gain of the clean speech and the fixed codebook gain of the noise, the fixed codebook gain is modified accordingly: ĝ_s(m) = g_y(m) - ĝ_n(m), where m denotes a time interval, e.g. a frame or a subframe, ĝ_n(m) is the estimate of the fixed gain of the noise component and ĝ_s(m) the estimate of the clean codebook gain. How the estimate ĝ_n(m) of the fixed gain of the noise component can be calculated is described in the next section with reference to a different embodiment.
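The gain subtraction of a) is a one-line rule. In this sketch the clamp to a non-negative floor is an assumption added here (the text itself does not state it) to keep the estimated clean gain meaningful when the noise gain is overestimated:

```python
def gain_subtraction(g_y, g_n_hat, floor=0.0):
    """Gain subtraction: g_s_hat(m) = g_y(m) - g_n_hat(m).

    `floor` (assumption, not from the patent text) prevents a negative
    gain estimate when g_n_hat exceeds g_y.
    """
    return max(g_y - g_n_hat, floor)
```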
  • b) Minimisation. Alternatively to the gain subtraction, the fixed gain can be modified by using a modification factor γ_c(m), which can be derived from an estimated signal to noise ratio SNR̂. As the estimated noise fixed codebook gain ĝ_n(m) can be larger than the decoded fixed codebook gain g_y(m), the estimated clean speech fixed codebook gain ĝ_s(m) is bounded by a positive threshold g_s,min for the least-square-error minimisation. Let g_y(m), ĝ_s(m) and ĝ_n(m) be respectively the noisy speech fixed codebook gain, the estimated clean speech fixed codebook gain and the estimated noise-only fixed codebook gain. The fixed codebook gain is modified according to the following equation: ĝ_s(m) = γ_c(m)·g_y(m). Based on the MMSE criterion (a transposition of the frequency-domain MMSE criterion of [3] to the codec parameter domain), γ_c(m) is computed using: γ_c(m) = SNR̂(m) / (1 + SNR̂(m)), where SNR̂(m) is the estimate of the signal to noise ratio. The computation of SNR̂(m) is based on an "a priori" SNR estimation, which is computed recursively, i.e. it depends on the previous time interval, e.g. subframe. The following estimate has proved useful in practice: SNR̂(m) = β·γ_c(m-1)·(g_y(m-1)/ĝ_n(m))^δ(m) + (1-β)·(g_y(m)/ĝ_n(m))^δ(m), with the exponential weighting factor δ(m) switching between two values (its defining equation is contained in an image not reproduced here).
    Values used in advantageous embodiments are δ1 = 2 and δ2 = 0.75. β denotes a weighting factor taking account of the past of the speech signal, e.g., as in the formula above, by considering the quantities of the previous subframe. The greater β, the more the past is emphasised. Useful values for β have been found especially in the range β ∈ [0.7, 0.8].
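The minimisation rule of b) can be sketched as two small functions. This is an illustration under stated assumptions: since the switching condition for δ(m) is contained in a figure not reproduced here, a fixed exponent `delta` is used, and the function names are not from the patent.

```python
def snr_estimate(gamma_prev, g_y_prev, g_y, g_n_hat, beta=0.75, delta=2.0):
    """Recursive 'a priori'-style SNR estimate in the codec parameter domain.

    First term: weighted contribution of the previous subframe (its gain
    already scaled by the previous modification factor); second term:
    the current noisy-to-noise gain ratio.
    """
    past = beta * gamma_prev * (g_y_prev / g_n_hat) ** delta
    present = (1.0 - beta) * (g_y / g_n_hat) ** delta
    return past + present

def modification_factor(snr):
    """Wiener-like MMSE rule: gamma_c = SNR / (1 + SNR)."""
    return snr / (1.0 + snr)
```

A usage step per subframe would be `g_s_hat = modification_factor(snr_estimate(...)) * g_y`, i.e. the fixed gain itself is scaled rather than any time-domain signal.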
  • c) Estimation of ĝ_n. For determining the estimate SNR̂ according to the formula above, furthermore the knowledge of the estimate ĝ_n of the noise fixed gain is required.
    The estimation of ĝ_n is based on the principle of minimum statistics, wherein the short-term minimum of the estimated noisy signal power P(m) is searched: P(m) = α(m)·P(m-1) + (1-α(m))·g_y²(m), with the time-varying smoothing factor α(m) (its defining equation is contained in an image not reproduced here), where σ̂²_N is the estimate of the noise power, found by using the minimal value of the noisy signal power P on a window of length D. As a noisy speech signal contains speech and noise, the minimum value which is present over a certain time period, and which occurs e.g. in speech pauses, represents the noise power σ̂²_N.
    α_max is a constant; an advantageous embodiment uses e.g. α_max = 0.96. To reduce the number of comparisons required for estimating the noise power σ̂²_N, the window of length D is divided into U sub-windows of length V. The minimum value in the window of length D is the minimum of the set of minima over the sub-windows. A buffer Min_I of U elements contains the minima of the last U sub-windows. It is renewed each time V values of P have been computed: the oldest element of the buffer is deleted and replaced by the minimum of the last V values of P. The minimum on the window of length D, σ̂²_N, for each sub-frame m is the minimum between the minimum of the buffer and the last computed value of P. σ̂²_N can be increased by a gain parameter o_min to compensate the bias of the estimation. A bias might be due to a continued overestimation of the noise, e.g. if a continually present murmuring is considered as noise only. The value of ĝ_n is finally given by: ĝ_n(m) = √(σ̂²_N(m)).
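The minimum-statistics tracker of c) can be sketched as a small class. This is an illustration, not the patent's exact procedure: the time-varying α(m) is defined in a figure not reproduced here, so a constant `alpha` is used, and the class and parameter names are assumptions.

```python
from collections import deque

class MinimumStatisticsNoiseEstimator:
    """Track the minimum of the smoothed power P(m) over a window D = U*V
    using U sub-window minima (the buffer Min_I of the text)."""

    def __init__(self, U=8, V=12, alpha=0.96, omin=1.5):
        self.U, self.V, self.alpha, self.omin = U, V, alpha, omin
        self.P = 0.0                      # smoothed noisy power P(m)
        self.sub_min = float("inf")       # minimum of the current sub-window
        self.count = 0                    # samples seen in current sub-window
        self.min_buf = deque(maxlen=U)    # minima of the last U sub-windows

    def update(self, g_y):
        """Feed one fixed gain g_y(m); return the noise gain estimate g_n_hat."""
        self.P = self.alpha * self.P + (1.0 - self.alpha) * g_y ** 2
        self.sub_min = min(self.sub_min, self.P)
        self.count += 1
        if self.count == self.V:                  # sub-window complete:
            self.min_buf.append(self.sub_min)     # oldest of U minima is dropped
            self.sub_min, self.count = float("inf"), 0
        sigma2 = min(min(self.min_buf, default=self.P), self.P)
        return (self.omin * sigma2) ** 0.5        # bias compensation, then sqrt
```

Each update costs one comparison plus, once per V samples, a buffer rotation, which is the complexity saving the sub-window scheme is for.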
  • d) Postfiltering to control the noise reduction. The noise reduction as described above may cause some artefacts during the voice activity periods, e.g. the speech signal may be attenuated due to an overestimation of the noise component. To counter this effect, a postfiltering is performed: in case the noise reduction is not significant, e.g. where the SNR is high and thus γ_c is close to 1, γ_c is forced to be 1. The energy E_u of the subframe, which contains e.g. 40 speech samples in a CELP codec, can be described as a sum over the subframe of the squared total excitation, i.e. the adaptive excitation g_p·v(n) plus the scaled fixed excitation c(n) (the exact equation is contained in an image not reproduced here), wherein n is the summation index, g_p is the adaptive codebook gain, v(n) is the adaptive codebook excitation and c(n) is the fixed codebook excitation. After the noise reduction, the excitation energy Ê_u is obtained analogously with the modified fixed gain (equation contained in an image not reproduced here). The final value of γ_c(m) is determined by comparing these energies against a threshold Th_dB (equation contained in an image not reproduced here). Typically Th_dB can be chosen equal to 1.
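One plausible reading of the postfilter of d) can be sketched as follows. Since the exact comparison is contained in an image not reproduced here, this is an assumption: the energy drop caused by the noise reduction is measured in dB, and γ_c is forced to 1 when the drop stays below Th_dB.

```python
import math

def postfilter_gamma(gamma_c, e_before, e_after, th_db=1.0):
    """If the excitation-energy drop is below th_db (in dB), force
    gamma_c to 1 so that high-SNR speech is left untouched.

    The dB comparison is an assumption; the patent's exact rule is in
    an image not reproduced in the text.
    """
    drop_db = 10.0 * math.log10(e_before / e_after) if e_after > 0 else float("inf")
    return 1.0 if drop_db < th_db else gamma_c
```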

Claims (11)

  1. Method for encoding an acoustic signal (y(n)) containing a speech component and a noise component by using an analysis through synthesis method, wherein for encoding the acoustic signal a synthesised signal is compared with the acoustic signal for a time interval, said synthesised signal being described by using a fixed codebook and an associated fixed gain, comprising the steps:
    a) Extracting an estimated fixed gain (ĝ_n) of the noise component from the acoustic signal (y(n)) for the time interval;
    b) Representing the noise component for the time interval by its fixed gain (ĝ_n);
    c) Estimating a signal to noise ratio of the speech component to the noise component for the time interval based on the signal to noise ratio of an earlier time interval and a ratio of the acoustic signal (y(n)) to the noise component in the time interval;
    d) Determining a modification factor (γ_c) based on the estimate of the signal to noise ratio;
    e) Deriving an estimate of a fixed gain (ĝ_s(m)) of the speech component by modifying the fixed gain of the acoustic signal with the modification factor (γ_c).
  2. Method according to claim 1, wherein the synthesised signal is further described by an adaptive codebook and an associated adaptive gain.
  3. Method according to claim 2, comprising the further steps:
    f) Comparing the energy of the signal in a time interval with its estimate based on the estimated fixed gain (ĝ_s(m)) of the speech component;
    g) Modifying the modification factor in dependence on the result of said comparison.
  4. Method according to any of the previous claims, wherein the extracting in step a) is done by searching the minimum value of a power of the signal (y(n)) in a time interval, a part of a time interval or a set of time intervals.
  5. Method according to any of the previous claims, wherein the estimating in step c) is performed by using SNR̂(m) = β·γ_c(m-1)·(g_y(m-1)/ĝ_n(m))^δ(m) + (1-β)·(g_y(m)/ĝ_n(m))^δ(m), wherein SNR̂(m) is the estimate of the signal to noise ratio for the current time interval m, δ is an exponential weighting factor, β is a weighting factor, γ_c(m-1) is the modification factor for the previous time interval (m-1), g_y(m) is the fixed gain of the signal in the current time interval, g_y(m-1) is the fixed gain of the signal in the preceding time interval and ĝ_n(m) is an estimate of the fixed gain of the noise component in the current time interval.
  6. Method according to any of the previous claims, wherein the determining of the modification factor in step d) is performed by using γ_c(m) = SNR̂(m)/(1 + SNR̂(m)), wherein SNR̂(m) is the estimate of the signal to noise ratio for the current time interval m.
  7. Method according to any of the previous claims, wherein the deriving in step e) is performed by using ĝ s (m) = γ c (m)·g y (m), wherein γ c (m) is the modification factor for the time interval m and g y (m) is the fixed gain of the signal in the time interval m.
  8. Method according to any of the previous claims, wherein said time intervals are frames or subframes.
  9. Noise reducing apparatus with a processing unit set up for performing a method according to any of the claims 1 to 8.
  10. Communications device, in particular a mobile phone, with a noise reducing apparatus according to claim 9.
  11. Communications network with a noise reducing apparatus according to claim 9.
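The per-frame procedure of claims 1 to 8 can be sketched as follows. This is an illustrative reading of the claims, not the patent's reference implementation: the function and variable names (reduce_noise, g_y_frames, window), the default values of β, δ and the window length, and the sliding-window minimum used for the noise gain are all assumptions made for the example.

```python
# Hypothetical sketch of the coded-domain gain-modification loop of claims 1-8.
# Operates only on the fixed codebook gains g_y(m) of the noisy signal; the
# speech payload itself is never decoded.

def estimate_noise_gain(g_y_history):
    """Claim 4: estimate the noise gain as the minimum fixed gain over a
    set of recent time intervals (minimum-statistics style tracking)."""
    return min(g_y_history)

def reduce_noise(g_y_frames, beta=0.6, delta=1.0, window=8):
    """Return modified fixed gains g_s(m) for a sequence of frame gains g_y(m)."""
    g_s = []                 # estimated speech gains, claim 1 step e)
    gamma_prev = 1.0         # modification factor of the previous frame
    g_y_prev = g_y_frames[0]
    history = []
    for g_y in g_y_frames:
        history.append(g_y)
        if len(history) > window:
            history.pop(0)
        g_n = estimate_noise_gain(history)                  # claim 4
        # Claim 5: decision-directed SNR estimate mixing the previous
        # frame's result with the current gain ratio.
        snr = (beta * gamma_prev * g_y_prev / (g_n * delta)
               + (1.0 - beta) * g_y / (g_n * delta))
        gamma = snr / (1.0 + snr)                           # claim 6
        g_s.append(gamma * g_y)                             # claim 7
        gamma_prev, g_y_prev = gamma, g_y
    return g_s
```

Because γ c (m) = SN̂R(m)/(1 + SN̂R(m)) always lies in (0, 1), each modified gain ĝ s (m) is strictly smaller than the noisy gain g y (m), which is the intended attenuation of the noise contribution in the coded domain.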
EP03022249A 2003-10-01 2003-10-01 Speech coding method applying noise reduction by modifying the codebook gain Withdrawn EP1521242A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP03022249A EP1521242A1 (en) 2003-10-01 2003-10-01 Speech coding method applying noise reduction by modifying the codebook gain
PCT/EP2004/051712 WO2005031708A1 (en) 2003-10-01 2004-08-04 Speech coding method applying noise reduction by modifying the codebook gain

Publications (1)

Publication Number Publication Date
EP1521242A1 true EP1521242A1 (en) 2005-04-06

Family

ID=34306816

Family Applications (1)

Application Number Title Priority Date Filing Date
EP03022249A Withdrawn EP1521242A1 (en) 2003-10-01 2003-10-01 Speech coding method applying noise reduction by modifying the codebook gain

Country Status (2)

Country Link
EP (1) EP1521242A1 (en)
WO (1) WO2005031708A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019081089A1 (en) * 2017-10-27 2019-05-02 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Noise attenuation at a decoder
CN114023352B (en) * 2021-11-12 2022-12-16 华南理工大学 Voice enhancement method and device based on energy spectrum depth modulation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6026356A (en) * 1997-07-03 2000-02-15 Nortel Networks Corporation Methods and devices for noise conditioning signals representative of audio information in compressed and digitized form
WO2001002929A2 (en) * 1999-07-02 2001-01-11 Tellabs Operations, Inc. Coded domain noise control
US20020184010A1 (en) * 2001-03-30 2002-12-05 Anders Eriksson Noise suppression
EP1301018A1 (en) * 2001-10-02 2003-04-09 Alcatel Apparatus and method for modifying a digital signal in the coded domain

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHANDRAN R ET AL: "COMPRESSED DOMAIN NOISE REDUCTION AND ECHO SUPPRESSION FOR NETWORK SPEECH ENHANCEMENT", PROCEEDINGS OF THE 43RD. IEEE MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS. MWSCAS 2000. LANSING, MI, NEW YORK, NY: IEEE, US, vol. 1 OF 3, 8 August 2000 (2000-08-08) - 11 August 2000 (2000-08-11), pages 10 - 13, XP002951730, ISBN: 0-7803-6476-7 *
MARTIN R ET AL: "Optimized estimation of spectral parameters for the coding of noisy speech", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, vol. 3, 5 June 2000 (2000-06-05), Istanbul, Turkey, pages 1479 - 1482, XP010507630 *

Also Published As

Publication number Publication date
WO2005031708A1 (en) 2005-04-07

Similar Documents

Publication Publication Date Title
EP1363273B1 (en) A speech communication system and method for handling lost frames
CA2399706C (en) Background noise reduction in sinusoidal based speech coding systems
US6782360B1 (en) Gain quantization for a CELP speech coder
US6931373B1 (en) Prototype waveform phase modeling for a frequency domain interpolative speech codec system
JP2971266B2 (en) Low delay CELP coding method
US6996523B1 (en) Prototype waveform magnitude quantization for a frequency domain interpolative speech codec system
US20050065792A1 (en) Simple noise suppression model
EP1313091A2 (en) Speech analysis, synthesis, and quantization methods
EP0899718B1 (en) Nonlinear filter for noise suppression in linear prediction speech processing devices
EP1301018A1 (en) Apparatus and method for modifying a digital signal in the coded domain
US10672411B2 (en) Method for adaptively encoding an audio signal in dependence on noise information for higher encoding accuracy
EP2608200B1 (en) Estimation of speech energy based on code excited linear prediction (CELP) parameters extracted from a partially-decoded CELP-encoded bit stream
EP1521242A1 (en) Speech coding method applying noise reduction by modifying the codebook gain
EP1521243A1 (en) Speech coding method applying noise reduction by modifying the codebook gain
EP1521241A1 (en) Transmission of speech coding parameters with echo cancellation
EP1944761A1 (en) Disturbance reduction in digital signal processing
EP0984433A2 (en) Noise suppresser speech communications unit and method of operation
Fapi et al. Noise reduction within network through modification of LPC parameters
KR20110124528A (en) Signal preprocessing method and apparatus for high quality encoding in speech encoder

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

AKX Designation fees paid
REG Reference to a national code

Ref country code: DE

Ref legal event code: 8566

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20060414