
CA2120902A1 - Speech coder employing analysis-by-synthesis techniques with a pulse excitation - Google Patents

Speech coder employing analysis-by-synthesis techniques with a pulse excitation

Info

Publication number
CA2120902A1
Authority
CA
Canada
Prior art keywords
signal
shift
subframe
residual signal
long
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002120902A
Other languages
French (fr)
Inventor
Luca Cellario
Daniele Sereno
Willem Bastiaan Kleijn
Peter Kroon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TIM SpA
AT&T Corp
Original Assignee
SIP Societa Italiana per l'Esercizio delle Telecomunicazioni SpA
American Telephone and Telegraph Co Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SIP Societa Italiana per l'Esercizio delle Telecomunicazioni SpA and American Telephone and Telegraph Co Inc
Publication of CA2120902A1
Current legal status: Abandoned


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/10 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G10L2019/0001 Codebooks
    • G10L2019/0003 Backward prediction of gain
    • G10L2019/0011 Long term prediction filters, i.e. pitch estimation
    • G10L2019/0012 Smoothing of parameters of the decoder interpolation
    • G10L2019/0013 Codebook search algorithms
    • G10L2019/0014 Selection criteria for distances
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

SPEECH CODER EMPLOYING ANALYSIS-BY-SYNTHESIS TECHNIQUES
WITH A PULSE EXCITATION
ABSTRACT
In an analysis-by-synthesis coder, the original speech signal undergoes small time shifts to match in time the signal to be coded with the replica reproduced by the long-term synthesis filter. The shift is determined at each subframe by an exhaustive search within a range of possible values so as to minimize the error signal energy. Once the optimal shift has been determined, the optimal excitation is searched for. The excitation is chosen from a codebook containing words with very few pulses arranged in a deterministic structure, all of which are obtained from a limited number of key words. The deterministic codebook structure allows a fast search for the optimal excitation, without the need to store the codebook or to actually perform the synthesis filtering of the candidate excitations.

Description

The present invention relates to speech coders employing analysis-by-synthesis techniques, and more particularly to a coder for low-bit-rate applications, preferably at the lower limit of the range of rates at which such coders can be used with good performance, e.g. rates within the 4 - 8 kbit/s range.
An example of this type of applications is represented by speech coders to be used for the so-called half-rate channel of the European mobile radio system.
In coders using analysis-by-synthesis techniques, for each block of speech signal samples to be coded, the excitation signal for the synthesis filter simulating the speech production apparatus is chosen within a set of excitation signals so as to minimize a perceptually meaningful measure of distortion. This is commonly achieved by comparing the synthesized samples with the corresponding samples of the original signal and simultaneously weighting the difference, in a suitable filter, with a function that takes into account how human perception evaluates the resulting distortion.
In its most general form, the synthesis filter includes a cascade of two elements that impose short-term and long-term spectral features, respectively, on the excitation signal. The former are linked to the correlation among subsequent samples, which generates a non-flat spectral envelope, and the latter are linked to the correlation between adjacent pitch periods, on which the fine spectral structure of the signal depends. With such a scheme, the coded signal includes information relating to the excitation, to the short-term synthesis parameters (short-term linear prediction coefficients or other quantities related to them) and to the long-term ones (long-term delay and linear prediction coefficients).
The insertion of long-term features into the coded signal greatly enhances the naturalness of the signal, especially if the delay is updated at each subframe during the analysis-by-synthesis cycle; however, the related information would require most of the bits available for coding. Especially for low-bit-rate applications, it is therefore of particular interest to search for solutions that reduce the amount of information to be transmitted to the decoder while preserving signal quality.
In the paper "Generalized analysis-by-synthesis coding and its application to pitch prediction" presented by W.B. Kleijn, R.P. Ramachandran and P. Kroon at the ICASSP 92 Conference, San Francisco (California, USA), March 23-26 1992, paper I-337, it is suggested for this purpose to interpolate the long-term analysis delay, the delay being updated at each frame. A direct interpolation, without adequate arrangements, would provide non-optimal delay values and would cause time misalignments between the long-term spectral features of the original signal and those of the synthesized signal, which generate significant distortion.
To avoid these drawbacks, the paper suggests modifying the original signal so that the long-term predictor parameters become known functions of time and allow direct interpolation without degrading performance. The suggested modifications consist of limited time oscillations and small amplitude scalings of the original signal. Time oscillations can be carried out in a discrete manner. The need to insert these time oscillations, and therefore to determine their optimal amount, obviously increases the coder complexity.
To solve this problem, according to the present invention, a coding system is provided in which discrete time shifts are introduced on the residual signal before long-term analysis, and in which the search for the optimal excitation signal and the optimal shift is carried out so as to reduce computational complexity.
The characteristics of the invention are disclosed in the appended claims. A preferred embodiment of the invention will now be described with reference to the enclosed drawings, in which:
- Fig. 1 is a block diagram of the coder;
- Fig. 2 is a functional diagram of some blocks of the coder;
- Fig. 3 is a block diagram of the decoder.
Before describing the coder/decoder structure in detail, the principles on which it is based will be summarized. The coder receives samples x(n) of the speech signal to be coded, grouped into blocks (commonly called 'frames') of a fixed number Lf of contiguous samples. Every frame of Lf samples is then divided into subframes of Ls contiguous samples. The coder must determine a set of parameters to be transmitted to the decoder so that the decoder is able to synthesize a signal that approximates the original signal. To achieve this, an analysis-by-synthesis procedure is used, through which the coder analyzes the effects of the possible values of each parameter and chooses the value that yields the best approximation of the original signal. For this purpose, the coder contains a replica of the decoder to produce, for each of said values, the corresponding output signal. To generate these output signals, both long-term and short-term correlations of the speech signal are exploited, imposed on an excitation signal through respective synthesis filters. At each frame, the coder carries out a linear prediction analysis (short-term or LPC analysis) and computes the short-term residual signal, which is used to compute the parameters (delay and coefficient) of the long-term synthesis filter. (The coefficient is unique in the preferred embodiment, since a first-order filter is used.) To improve the resolution of the long-term correlation information, both the delay and the coefficient are interpolated when the delays of the current frame and of the previous frame are close in value. To reduce the effects of time mismatches between the original signal and the reconstructed one, small time shifts can be introduced in the original speech signal at each subframe: the shift amount is determined through an exhaustive search in a range of possible values so as to minimize the energy of the error (difference between original signal and reconstructed signal). After the optimal shift has been determined, the search for the optimal excitation signal is carried out.
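As a rough illustration of the exhaustive shift search, here is a minimal C sketch (C being the language of the appendix listing). It is hypothetical and deliberately simplified: the function name best_shift, integer-sample shifts and a plain squared-error criterion are assumptions, whereas the coder described here shifts an upsampled residual and evaluates the error on the synthesized, perceptually weighted signal.

```c
/* Hypothetical, simplified exhaustive shift search (not the patent's code).
 * x: signal to be shifted; must be readable from x[-smax] to x[len-1+smax].
 * y: replica (e.g. the long-term synthesis output) for one subframe.
 * Returns the shift s in [-smax, smax] minimizing sum_n (x[n+s] - y[n])^2;
 * on ties the most negative shift is kept. */
static int best_shift(const double *x, const double *y, int len, int smax)
{
    int best = 0;
    double best_e = -1.0;                /* sentinel: no candidate yet */
    for (int s = -smax; s <= smax; s++) {
        double e = 0.0;
        for (int n = 0; n < len; n++) {
            double diff = x[n + s] - y[n];
            e += diff * diff;            /* error energy for this shift */
        }
        if (best_e < 0.0 || e < best_e) {
            best_e = e;
            best = s;
        }
    }
    return best;
}
```

Because every candidate in the range is evaluated, the cost grows linearly with the number of admissible shifts, which is one reason to keep the shifts small.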
In the following, to make the description clearer, the possible excitation signals will be considered as words chosen from a certain codebook; that is, reference is made to a type of coder known as CELP (Code Excited Linear Prediction), even though, as will be seen, every word is made up of an extremely small number of pulses (preferably 1 or 2) with deterministically predefined amplitudes and positions, and the codebook is not stored. The coded signal will include information related to the short-term and long-term synthesis filter parameters and to the optimal excitation, transmitted as usual in the form of suitably coded indexes.
In the decoder, starting from these indexes, an excitation signal corresponding to the one used by the coder is retrieved and filtered in the chain of a long-term synthesis filter and a short-term synthesis filter to provide a reconstructed signal that can be further filtered (post-filtering), based for example on the short-term synthesis parameters, to improve the subjective signal quality. The reconstructed signal is then converted back into analogue form and supplied to the utilization devices.
By way of example, in the following description reference will be made to frames of length Lf = 160 samples (which, at an 8-kHz sampling frequency, correspond to a speech signal segment of duration T = 20 ms), divided into 8 subframes of length Ls = 20 samples. For reasons related to the introduction of time shifts, it is necessary to have available, in addition to the Lf samples of a frame, a group of H+K samples of the following frame (e.g. H = 24, K = 8).
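The example figures can be checked with a few lines of C. The constant and function names are hypothetical; the values are the ones quoted above.

```c
/* Example parameters quoted in the text (8 kHz sampling). */
enum {
    FS_HZ = 8000,  /* sampling frequency */
    LF    = 160,   /* frame length Lf, in samples */
    LS    = 20,    /* subframe length Ls, in samples */
    H     = 24,    /* look-ahead for time shifts (example value) */
    K     = 8      /* additional look-ahead (example value) */
};

/* Subframes per frame: Lf / Ls = 160 / 20 = 8. */
static int subframes_per_frame(void) { return LF / LS; }

/* Frame duration T in ms: 1000 * 160 / 8000 = 20. */
static int frame_ms(void) { return 1000 * LF / FS_HZ; }

/* Size of buffer MT: N = Lf + H + K = 192 samples. */
static int buffer_len(void) { return LF + H + K; }
```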
With reference to Fig. 1, the input signal samples x(n) present on a connection 1 are temporarily stored in a buffer MT arranged to store N = Lf+H+K samples, and every T ms a block of Lf samples is written and read. The samples read from MT are supplied to a high-pass filter FPA, whose task is to remove d.c. drifts and low-frequency noise, and the filtered signal xf(n) is supplied to short-term analysis circuits STA and to a linear prediction filter LPC.
Circuits STA determine, for each frame, a set of P linear prediction coefficients ai (e.g. 10), convert these coefficients into a group of frequency-domain parameters commonly known as LSP (Line Spectrum Pairs), and carry out a quantization, for example a scalar one, of the differences between adjacent parameters. Indexes j(~), which are part of the coded signal, are transmitted to the decoder through a connection 2a after binary coding in circuits that are not shown. Conversion into line spectrum pairs is desirable since, as is well known, line spectrum pairs have better properties than the coefficients as regards quantization, interpolation and checking of synthesis filter stability. Before computing the line spectrum pairs, block STA also smooths the spectrum information related to formants to match it to the resolution of the quantization circuit. This is accomplished by multiplying the computed coefficients ai by a respective factor $\gamma_1^i$, whose value is typically less than 1 but quite near 1. This operation reduces the risk, in the case of particularly narrow formants, of reproducing after quantization formants that are equally narrow but shifted with respect to the original ones, and therefore removes a possible cause of degradation of coded signal quality.
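Assuming the per-coefficient factor has the usual bandwidth-expansion form $\gamma_1^i$ (an assumption about a symbol that is partly illegible in the source), the smoothing step amounts to the following C sketch; the function name bandwidth_expand is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: a[i-1] holds a_i (i = 1..p); each a_i is scaled by
 * gamma^i (assumed form of the smoothing factor, gamma slightly below 1). */
static void bandwidth_expand(double *a, size_t p, double gamma)
{
    double g = gamma;            /* running power gamma^(i) */
    for (size_t i = 0; i < p; i++) {
        a[i] *= g;               /* a_{i+1} <- a_{i+1} * gamma^(i+1) */
        g *= gamma;
    }
}
```

Multiplying $a_i$ by $\gamma^i$ moves the filter poles towards the origin by the factor $\gamma$, which widens the formant bandwidths and makes very narrow peaks less sensitive to quantization.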
The circuit STA computes the coefficients ai according to the classical autocorrelation method, as described in "Digital Signal Processing of Speech Signals" by L.R. Rabiner and R.W. Schafer (Prentice-Hall, Englewood Cliffs, N.J., USA, 1978), p. 401. For the computation, STA operates on a set of Lf+P input samples (in particular, the samples that occupy the last Lf+P positions in MT), obtained through a trapezoidal window that weights all samples with a maximum weight (in particular 1) except the first and the last P ones, whose weights are determined by simple linear interpolation between the minimum and the maximum weight: in this way the tapering required by the autocorrelation method to give good results is limited to the overlap area between contiguous windows. The forward positioning of the window also takes into account the fact that, when coding the initial subframes of a frame (e.g. the first 3), instead of the linear prediction coefficients computed for the frame itself, coefficients are used which are obtained by converting line spectrum pair values determined through interpolation between the values related to the previous frame and those related to the current frame.
This ensures a gradual transition between current frame parameters and previous frame parameters.
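A minimal sketch of the trapezoidal window, under stated assumptions: the ramp end points (starting at a minimum weight w_min and rising linearly towards 1) and the function name are hypothetical, since the text gives only the general shape.

```c
/* Hypothetical trapezoidal analysis window of length len (= Lf + P):
 * weight 1 in the middle, linear ramps over the first and last p samples.
 * Ramp sample k (k = 0..p-1) gets w_min + (1 - w_min) * k / p.
 * Requires 2 * p <= len. */
static void trapezoid_window(double *w, int len, int p, double w_min)
{
    for (int n = 0; n < len; n++)
        w[n] = 1.0;                                  /* flat part */
    for (int k = 0; k < p; k++) {
        double v = w_min + (1.0 - w_min) * k / p;    /* k = 0 gives w_min */
        w[k] = v;                                    /* rising edge */
        w[len - 1 - k] = v;                          /* falling edge */
    }
}
```

Only the 2P edge samples are tapered, so consecutive windows overlap just in the ramp regions, as described above.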
The transformation of linear prediction coefficients into line spectrum pairs is carried out, for example, as described by P. Kabal and R.P. Ramachandran in the article "The computation of line spectral frequencies using Chebyshev polynomials", IEEE Transactions on Acoustics, Speech and Signal Processing, December 1986.
The operations of STA are typical of any linear prediction coder, and therefore a more detailed description is not necessary.
The indexes j(~) are also supplied to a linear prediction coefficient reconstructing circuit STR1, which supplies filter LPC, short-term synthesis filters STS1, STS' and spectral weighting filters SW, SW' with quantized values of the coefficients, obtained by applying the inverse of the procedures used to transform the coefficients into line spectrum pairs.
STR1 also computes the interpolated values to be used in the first three subframes. For simplicity, in the following the quantized values are also designated ai.
The filter LPC receives the filtered speech signal samples xf(n) and filters them according to the conventional function $1 - A(z)$, where $A(z) = \sum_{i=1}^{P} a_i z^{-i}$, generating the short-term prediction residual rs(n), which is supplied both to a low-pass filter FPB, which produces a filtered residual signal rf(n), and to time shift circuits TS, which produce a modified residual signal rm(n). As is well known, low-pass filtering facilitates the operations of the subsequent long-term analysis circuits LTA.
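The inverse filter $1 - A(z)$ is the difference equation $r_s(n) = x_f(n) - \sum_{i=1}^{P} a_i x_f(n-i)$. The C sketch below assumes zero filter memory before the first sample, whereas a real coder would carry the last P samples over from the previous frame; the function name is hypothetical.

```c
#include <stddef.h>

/* Short-term residual through 1 - A(z), A(z) = sum_{i=1}^{p} a_i z^-i.
 * a[i-1] holds a_i; samples before x[0] are taken as zero (assumption). */
static void lpc_residual(const double *x, double *r, size_t len,
                         const double *a, size_t p)
{
    for (size_t n = 0; n < len; n++) {
        double pred = 0.0;                 /* short-term prediction of x[n] */
        for (size_t i = 1; i <= p && i <= n; i++)
            pred += a[i - 1] * x[n - i];
        r[n] = x[n] - pred;                /* prediction error */
    }
}
```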
The circuits LTA must determine, at each frame, and supply the following long-term synthesis filter LTS1 with the delay d (pitch period) with which a sample of an excitation signal is used to generate a reconstructed signal and the gain or coefficient b with which said sample is weighted.
Block LTA computes the delay d by maximizing the autocorrelation function
$R[r_f(k)] = \sum_{n=x}^{N-1-k} r_f(n)\, r_f(n+k)$  (1)
where k can vary between the minimum and maximum values allowed for the delay d (e.g., 20 and 120), and x is a preset number chosen so that the window used for the calculation is long enough to yield a satisfactory value for d. Since the window must include the most recent samples, as already said, its length is a compromise between two opposing needs: the longer the window, the more accurate the estimate; on the other hand, the shorter the window, the closer its center is to the end of the frame being coded (Lf samples), and therefore the closer the estimate is to that end, as required for interpolation. For example, x can be K. In the preferred embodiment, the delay is never less than the length of a subframe, which considerably simplifies subsequent operations. The value computed with (1) can also be subjected to corrections, examined later, aimed at guaranteeing as smooth a contour as possible for d and at compensating for synchronism losses due to the time shift.
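The open-loop delay estimate of equation (1) is a plain maximization over k. A hypothetical C sketch (function names assumed, none of the later corrections applied) is:

```c
/* Equation (1): R[rf(k)] = sum_{n=x}^{N-1-k} rf(n) * rf(n+k). */
static double autocorr(const double *rf, int N, int x, int k)
{
    double s = 0.0;
    for (int n = x; n <= N - 1 - k; n++)
        s += rf[n] * rf[n + k];
    return s;
}

/* Delay d: the k in [dmin, dmax] that maximizes (1); ties keep the smaller k. */
static int pitch_delay(const double *rf, int N, int x, int dmin, int dmax)
{
    int best = dmin;
    double best_r = autocorr(rf, N, x, dmin);
    for (int k = dmin + 1; k <= dmax; k++) {
        double r = autocorr(rf, N, x, k);
        if (r > best_r) {
            best_r = r;
            best = k;
        }
    }
    return best;
}
```

With the example values (N = 192, dmin = 20, dmax = 120), an impulse train with period 40 in rf yields d = 40.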
The value of the coefficient b is determined so as to minimize the energy of the error signal rl(n) at the output of LTS1, given by the equation
$r_l(n) = r_f(n) - b\, r_f(n-d)$  (2)
For the value d of the delay to be used for the current frame, b is given by $b = R[r_f(d)]/E(r_f)$, where $E(r_f)$ indicates the energy
$E(r_f) = \sum_{n=x}^{N-1-d} r_f^2(n)$  (3)
A minimum and a maximum value, 0 and 1 respectively, are also set for b. Values less than 0 are excluded because they would correspond to a sign inversion of the signal, which would also require transmitting a sign bit, while values greater than 1 make the filter unstable, as is well known. The value of b computed using (2) can also be subjected to corrections aimed at guaranteeing the best quality of the coded signal.
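In the same notation, a C sketch of the coefficient computation with the [0, 1] clipping might look as follows; it is illustrative only, uses hypothetical function names, and omits the later corrections and the quantization.

```c
/* Equation (1) evaluated at a single delay d. */
static double corr_at(const double *rf, int N, int x, int d)
{
    double s = 0.0;
    for (int n = x; n <= N - 1 - d; n++)
        s += rf[n] * rf[n + d];
    return s;
}

/* Equation (3): E(rf) = sum_{n=x}^{N-1-d} rf^2(n). */
static double energy_delayed(const double *rf, int N, int x, int d)
{
    double s = 0.0;
    for (int n = x; n <= N - 1 - d; n++)
        s += rf[n] * rf[n];
    return s;
}

/* b = R[rf(d)] / E(rf), clipped to [0, 1]. */
static double ltp_coefficient(const double *rf, int N, int x, int d)
{
    double e = energy_delayed(rf, N, x, d);
    if (e <= 0.0)
        return 0.0;                 /* silent segment: no long-term contribution */
    double b = corr_at(rf, N, x, d) / e;
    if (b < 0.0) b = 0.0;           /* negative b would need a sign bit */
    if (b > 1.0) b = 1.0;           /* b > 1 makes 1/(1 - b z^-d) unstable */
    return b;
}
```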
Furthermore, in certain frames, instead of the values of d and b computed with (1) and (2), values obtained by linear interpolation between those computed for the previous frame and those computed for the current frame can be used.
Together with d and b, the prediction gain G is also computed: this quantity represents the ratio between the energies of the input and output signals of the long-term predictor and gives a measure of long-term prediction efficiency. Gain G is defined by the expression
$G = \dfrac{1}{1 - b\,R[r_f(d)]/E'(r_f)}$, where $E'(r_f) = \sum_{n=x+d}^{N-1} r_f^2(n)$  (4)
Gain G makes it possible to establish whether the speech segment being coded is voiced, which is indicated by values of G and b that both exceed respective thresholds Gthr, bthr. In the case of a voiced sound, LTA generates a flag V that is used in deciding whether to carry out the interpolation and to introduce the time shift.
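Equation (4) and the voicing test can be sketched in C as below. The thresholds Gthr and bthr are passed as parameters because the text gives no values, and the handling of zero or perfectly predictable input, like the function names, is an assumption.

```c
#include <float.h>

/* Equation (4): G = 1 / (1 - b * R[rf(d)] / E'(rf)),
 * with E'(rf) = sum_{n=x+d}^{N-1} rf^2(n). */
static double prediction_gain(const double *rf, int N, int x, int d, double b)
{
    double r = 0.0, e_out = 0.0;
    for (int n = x; n <= N - 1 - d; n++)
        r += rf[n] * rf[n + d];            /* R[rf(d)], equation (1) */
    for (int n = x + d; n <= N - 1; n++)
        e_out += rf[n] * rf[n];            /* E'(rf) */
    if (e_out <= 0.0)
        return 1.0;                        /* no signal: no gain (assumption) */
    double denom = 1.0 - b * r / e_out;    /* residual-to-input energy ratio */
    return denom > 0.0 ? 1.0 / denom : DBL_MAX;  /* perfectly predictable: cap */
}

/* Voicing flag V: both G and b above their thresholds. */
static int is_voiced(double G, double b, double g_thr, double b_thr)
{
    return G > g_thr && b > b_thr;
}
```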
A first correction of the delay d is based on searching for the local maximum of function (1) also in a given neighborhood (e.g., ±15%) of the value obtained at the previous frame: if this local maximum differs from the main maximum by less than a certain limit, this new value is used, since it provides a smoother contour that can therefore be interpolated.
This secondary search is carried out only if the signal in the previous frame was strongly voiced and had been subjected to interpolation. Moreover, the correction, if any, is carried out before computing b and G, so as to use the already corrected value of d for these computations.
A second correction is linked to the presence of the time shift mechanism, which inserts a variable delay h whose effects are comparable to those of a non-synchronous operation of the coder. To try to recover synchronism, the value of d computed by LTA, possibly corrected as described above, is changed by adding to it a corrective term d' linked to the amount of the shift itself and given by the expression
$d' = \mathrm{fi}\cdot d/(r\,L_f)$
where fi is the shift accumulated up to that frame, expressed as a number of samples of the residual signal upsampled by a factor r, while d and Lf have the meaning given above. Upsampling will be discussed in greater detail with reference to circuits TS. The correction can be carried out if interpolation is required in the current frame and if the speech segment is not voiced.
The first condition is necessary since, if there is no interpolation, no shift is carried out; moreover, the signal must not be voiced because in that case even a minimal modification of d with respect to the exact value can usually be perceived. Before the corrective term is added to d, its absolute value is limited to a maximum value |d'|max, for example 1.
Furthermore, the correction is carried out only if it does not modify the decision about interpolation (described later) and does not take the value of d outside the provided range of values.
As regards b, a first correction consists of clipping b to a first upper limit b1, since, if b is too high, an excessive energy increase would occur, giving rise to noise. Limit b1 is linked to the ratio between the energies in a pitch period of the current frame (index 0) and of the previous one (index -1), and is given by the expression
$b_1 = \left[E''(r_f)_0 / E''(r_f)_{-1}\right]^{d/(2L_f)}$
where $E''(r_f)$ denotes the quantity $\sum_{n=N-1-d}^{N-1} r_f^2(n)$, which is indeed the energy in a pitch period d. The correction is carried out if the energy in the previous frame exceeds a certain threshold.
A further limitation of b is applied when G is low (less than Gthr), which indicates a speech segment with low periodicity, while b is relatively high (greater than a second limit b2): in this case the value b2 is used, since using the actual value could produce artifacts in the coded signal.
As regards interpolation, it is carried out if the relative variation of d between two consecutive frames does not exceed, in absolute value, a predetermined amount (e.g., 15%) and if the values of b in these frames are both positive. The actual computation of the values of d and b to be used in case of interpolation is carried out in the long-term synthesis filter LTS1, to which LTA sends a flag F when the above-mentioned conditions are verified. The same flag is also supplied to circuits EM, which determine the optimal time shift and excitation.
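The interpolation test can be sketched as follows. Measuring the relative variation against the previous frame's delay, and doing the percentage comparison in integer arithmetic, are assumptions of this hypothetical helper.

```c
#include <stdlib.h>

/* Hypothetical flag F: interpolate when |d_cur - d_prev| / d_prev <= max_pct %
 * and b is positive in both frames. 100*|dd| <= max_pct*d_prev avoids floats. */
static int interpolation_allowed(int d_prev, int d_cur,
                                 double b_prev, double b_cur, int max_pct)
{
    if (d_prev <= 0)
        return 0;
    int delay_ok = 100 * abs(d_cur - d_prev) <= max_pct * d_prev;
    return delay_ok && b_prev > 0.0 && b_cur > 0.0;
}
```

Because the decoder holds the same reconstructed d and b values, it can evaluate the same test and no extra bit is needed, as the next paragraph notes.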
Information about interpolation is also required by the synthesis filter in the decoder; however, it is not necessary to transmit it, since it can be immediately recreated in that filter, by the comparison between the values of d and b related to two frames, exactly like in the coder.
The values of d and b determined at each frame are converted as usual into the respective indexes j(d), j(b), which are the long-term analysis information to be inserted into the coded signal, and are transmitted to the decoder, after suitable coding, through connections 2b, 2c. Index j(b) is determined through a quantization operation during which, in addition to limiting the maximum value to 1, values of b less than half of the first quantized value are forced to 0. No quantization of d is needed, since d is already a discrete quantity; it is nevertheless preferable to transmit d in the form of an index for uniformity with the other information. Converting the values of d into indexes practically consists of shifting them so that the range of possible values begins from 1 instead of from a value dmin. In the described example (101 values of d and j(d)), 7 bits are needed to code index j(d), and these bits also allow values of j(d) outside the provided range to be coded. One of these additional values (e.g., value 127) is used to signal that b has been forced to 0, and it is supplied to the decoder in place of the index j(d) corresponding to the actual value of d since, if b = 0, the long-term synthesis filter does not contribute to the reconstructed signal and the delay information is useless. In addition to the information about the forcing of b to 0, the index j(b) corresponding to the minimum value of b is also transmitted.
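With the example range dmin = 20, dmax = 120, the encoder-side mapping of the delay to its 7-bit index, including the escape value, can be sketched in C as follows (constant and function names are hypothetical).

```c
/* Example delay range and escape index from the text. */
enum { DMIN = 20, DMAX = 120, J_B_ZERO = 127 };

/* Hypothetical encoder mapping d -> j(d): 20..120 becomes 1..101;
 * if b has been forced to 0, the escape value 127 is sent instead. */
static int delay_index(int d, int b_is_zero)
{
    if (b_is_zero)
        return J_B_ZERO;
    return d - DMIN + 1;
}
```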
To simplify, circuits generating indexes j(b), j(d) are included into block LTA.
It must be noted that the correction of d to take possible shifts into account is carried out after the corrections of b, since circuit LTA can decide on the nature of the sound, and on the need to carry out interpolation and therefore the shift, only on the basis of the corrected values of b.
The operations performed by LTA are described in detail in the appendix, which includes a program listing in C. Given the listing, a person skilled in the art will have no difficulty in designing devices that perform the described functions.
Indexes j(d), j(b) are converted back into quantized (reconstructed) values of the respective parameters by reconstructing circuits LTR1, composed of simple read-only memories addressed by the indexes. During this reconstruction, LTR1 provides the actual values of d and b if j(d) indicates a value allowed for the delay (that is, if j(d) is in the range 1 to 101). If j(d) has any value outside the allowed range (i.e., from 102 to 127), LTR1 provides value 0 for b and value dmin for d. Interpreting all indexes j(d) that do not correspond to an allowed delay value, and not only the one actually used for this purpose, as an indication that b has been forced to 0 makes it possible to reconstruct the value b = 0 even if errors affect the least significant bits of that index. In any case, should the reconstruction of b = 0 fail, circuits LTR1 generate the minimum value of b, since they have the corresponding index j(b) at their disposal. For simplicity, in the following, the reconstructed (or quantized) values will also be denoted by b, d.
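On the decoder side, the rule "any out-of-range index means b = 0" can be sketched as below. The function name and the treatment of index 0 (grouped with the out-of-range values) are assumptions; the j(b) table lookup is omitted.

```c
/* Example delay range from the text. */
enum { DMIN = 20, DMAX = 120 };

/* Hypothetical LTR1-style reconstruction of d from j(d).
 * j(d) in 1..101 -> d = j(d) + DMIN - 1, b taken from j(b) as usual.
 * Any other index (102..127, e.g. 127 hit by bit errors) -> b = 0, d = DMIN. */
static void decode_delay(int jd, int *d, int *b_forced_zero)
{
    if (jd >= 1 && jd <= DMAX - DMIN + 1) {
        *d = jd + DMIN - 1;
        *b_forced_zero = 0;
    } else {
        *d = DMIN;
        *b_forced_zero = 1;
    }
}
```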
The long-term synthesis filter LTS1 generates a reconstructed short-term residual signal ss(n) by filtering an excitation signal s1(n) according to the conventional function $1/P(z) = 1/(1 - b z^{-d})$. The excitation consists of shape information (innovation), represented by one of the words s(n) of an innovation codebook IC1, a positive or null amplitude parameter g (innovation gain), chosen from a codebook of innovation gains IG1, and sign information, represented by a parameter (innovation sign) whose value is ±1. Signal s1(n) is therefore given by $s_1(n) = \pm g \cdot s(n) = g_1 \cdot s(n)$ and is obtained through a multiplier M1. For simplicity, we suppose that the sign parameter is also read from codebook IG1. Even though, to facilitate understanding, codebooks IC1, IG1 are represented as circuit blocks (which could suggest memories containing them), as said above, the particular structure of the innovation codebook makes its storage superfluous. The structure of the innovation and gain codebooks will be examined later.
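Without interpolation, the filter $1/(1 - b z^{-d})$ is the recursion $s_s(n) = g_1 s(n) + b\, s_s(n-d)$. The C sketch below assumes the output buffer already holds at least d past samples before ss[0]; the function name is hypothetical.

```c
/* Hypothetical long-term synthesis without interpolation:
 * ss[n] = g1 * s[n] + b * ss[n - d],  n = 0 .. len-1.
 * ss must point into a buffer holding at least d past samples before ss[0];
 * g1 is the signed innovation gain (plus or minus g). */
static void ltp_synthesis(double *ss, const double *s, int len,
                          double g1, double b, int d)
{
    for (int n = 0; n < len; n++)
        ss[n] = g1 * s[n] + b * ss[n - d];
}
```

Since the delay is never shorter than a subframe in the preferred embodiment, within one subframe ss[n - d] always refers to samples produced in earlier subframes.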
In order to obtain a sample of the reconstructed residual sS(n), LTS1 must weight with the factor b the sample related to instant n − d. If no interpolation has to be performed, the operation of LTS1 is quite conventional. In case of interpolation, the values of d and b are computed sample by sample according to the equations

$$d(n) = d(-1) + (n+1)\,\Delta d, \qquad b(n) = b(-1) + (n+1)\,\Delta b \qquad (5)$$

with n = 0, ..., Lf − 1, $\Delta d = [d_0 - d(-1)]/L_f$ and $\Delta b = [b_0 - b(-1)]/L_f$. Symbols d0, b0 denote the values related to the current frame, and d(−1), b(−1) those related to the previous frame. The interpolation is therefore linear and extends over a whole frame, so the values of d(n) and b(n) vary sample by sample. In general d(n) will not be an integer. The value of signal sS(n) at the continuous time instant n − d(n) then does not coincide with that of an actually available sample and must be evaluated. According to the invention, this evaluation is performed through a second-order polynomial interpolation (that is, through a parabola) centered about the discrete time instant nearest to n − d(n). The value thus evaluated is then multiplied by the interpolated value b(n).
The interpolation procedure adopted has a far lower computational complexity than more sophisticated interpolation methods based on signal filtering. Its effect, however, is essentially low-pass. This is useful for the good operation of the coder, since it prevents the reconstructed signal from having too marked periodicity properties.
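The long-term synthesis with interpolated parameters can be sketched in C as follows. This is a minimal sketch under stated assumptions. The output buffer is assumed to hold enough past samples for the largest delay plus two. The nearest-sample rounding convention (halves rounded up) is our own choice, and the function names are hypothetical. The parabola is the Lagrange polynomial through the three samples centred on the nearest discrete instant, so it reproduces any quadratic exactly.

```c
/* Nearest integer to t (halves rounded up), without libm. */
static long nearest_int(double t)
{
    long m = (long)t;               /* truncation toward zero */
    if ((double)m > t) m--;         /* now m = floor(t) */
    if (t - (double)m >= 0.5) m++;  /* round to nearest */
    return m;
}

/* Second-order (parabolic) interpolation of x at continuous time t,
   through the three samples centred on the nearest discrete instant. */
static double parabolic(const double *x, double t)
{
    long m = nearest_int(t);
    double mu = t - (double)m;      /* fractional offset in [-0.5, 0.5] */
    double ym1 = x[m - 1], y0 = x[m], yp1 = x[m + 1];
    return y0 + 0.5 * mu * (yp1 - ym1)
              + 0.5 * mu * mu * (yp1 - 2.0 * y0 + ym1);
}

/* One frame of 1/(1 - b z^-d) with d, b interpolated as in eq. (5):
   d(n) = d(-1) + (n+1)*dd, n = 0..Lf-1.  s points at the first output
   sample; s[-1], s[-2], ... must hold the filter memory. */
void lts_frame(const double *exc, double *s, int Lf,
               double d_prev, double d_cur, double b_prev, double b_cur)
{
    double dd = (d_cur - d_prev) / Lf;
    double db = (b_cur - b_prev) / Lf;
    for (int n = 0; n < Lf; n++) {
        double dn = d_prev + (n + 1) * dd;
        double bn = b_prev + (n + 1) * db;
        s[n] = exc[n] + bn * parabolic(s, (double)n - dn);
    }
}
```

With a constant integer delay the fractional offset is zero and the routine reduces to the conventional long-term filter.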
The reconstructed short-term residual sS(n) is supplied to the short-term synthesis filter STS1, whose transfer function is 1/[1 − A(z)]. This filter generates the reconstructed speech signal y(n). That signal is supplied to a spectral weighting filter SW whose transfer function is, as usual, [1 − A(z)]/[1 − Aw(z)], where $A_w(z) = \sum_{i=1}^{P} a_{wi} z^{-i}$ with $a_{wi} = a_i \gamma^i$. Here γ is an experimentally determined corrective factor that determines the band widening around formants. The reconstructed and weighted signal yw(n) is subtracted in an adder SM from the modified reconstructed and weighted signal xw(n). The latter is obtained by filtering the output signal from TS in the cascade of two filters STS', SW', respectively identical to STS1 and SW. At the output of SM a weighted error signal e(n) is obtained. This signal is supplied to an error energy minimizing circuit EM, which performs all the operations necessary to determine the optimal shift and excitation.
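A short C sketch of the bandwidth-expanded coefficients a_wi = a_i γ^i and of the weighting filter [1 − A(z)]/[1 − Aw(z)] follows. The coefficient arrays are assumed to be indexed 1..P with element 0 unused, the filter starts from zero state, and the function names are hypothetical. With γ = 1 the filter reduces to the identity, which the usage check exploits.

```c
/* Bandwidth-expanded coefficients a_wi = a_i * gamma^i, i = 1..P. */
void weight_coeffs(const double *a, double *aw, int P, double gamma)
{
    double gi = 1.0;
    for (int i = 1; i <= P; i++) {
        gi *= gamma;
        aw[i] = a[i] * gi;
    }
}

/* W(z) = [1 - A(z)] / [1 - Aw(z)] on N samples, zero initial state:
   out(n) = y(n) - sum a_i y(n-i) + sum aw_i out(n-i). */
void weight_filter(const double *y, double *out, int N,
                   const double *a, const double *aw, int P)
{
    for (int n = 0; n < N; n++) {
        double v = y[n];
        for (int i = 1; i <= P && i <= n; i++)
            v += -a[i] * y[n - i] + aw[i] * out[n - i];
        out[n] = v;
    }
}
```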
The purpose of circuit TS is to align in time the signal to be coded with the replica that the long-term synthesis filter is able to produce. In particular, it avoids shifts between the pitch peaks in the signal predicted by LTS1 and those in the original signal. For this purpose, at each subframe TS shifts the time window of Ls samples that locates the subframe itself by a certain amount Δh. The shift to be applied is determined by unit EM with a fast search procedure, within a range of values defined by a maximum allowable shift. The shift is applied to the residual signal and not to the original one. The resulting distortion is then smoothed by the subsequent filtering in STS', SW' and is therefore substantially imperceptible. The shift applied in a subframe is algebraically added to the one accumulated up to that time, providing a global shift h. The shift variation per subframe is limited in order to avoid too sudden variations. The global shift also cannot exceed a certain maximum value (H samples of the original signal).
This explains why H samples of the following frame have also been loaded in MT. Limiting the shift variation avoids excessive distortions. The limit on the global shift, instead, is set by the delay that can be tolerated in the coding procedures, and therefore by the availability of future samples. The time shift has a resolution finer than one sampling period of the original signal, so the residual signal must be upsampled.
Taking all this into account, circuit TS will include an upsampling circuit US (in practice an interpolating filter), which supplies at its output the upsampled residual r̃s(ℓ). TS also includes a shifting element SH, which receives from EM the information about the shift amount h and generates the modified upsampled residual r̃m(ℓ). In the example the upsampling ratio r is 8, so the upsampled signal has a sampling frequency of 64 kHz; this ratio provides a suitable resolution for all desired purposes. Moreover, for the correct operation of the interpolating filter, a certain number of samples following the ones of interest must always be available. This is why the further K samples of the following frame are also loaded in MT.
It is not necessary to actually carry out the downsampling to obtain a modified residual signal with an 8-kHz sampling frequency. This operation can be carried out implicitly, when necessary, by simply reading one sample of r̃m(ℓ) every r, with a suitable phase.
Element SH will in practice be a memory that loads, at each subframe, the rLs samples of the upsampled residual plus a certain number of following and previous samples linked to the maximum allowed shift in a frame. In practice this is a number of samples equal to twice the maximum shift, as will be explained in the description of the optimal shift search. SH is addressed for reading by the error energy minimizing unit EM, so as to supply the following circuits with Ls samples adequately shifted with respect to the incoming subframe.
Turning back to the innovation codebook, this includes a certain number of words, each having Ls samples, of which only a very limited number are different from 0. This choice has two motivations. First, since the codebook is quite small, it would be illusory to expect to find in it words with many pulses (that is, non-null samples) that are all actually suitable. Second, it reduces the amount of computation necessary when searching for the optimal excitation. In the preferred embodiment of the present invention, the codebook is composed of two parts. The first part includes Ls words having a single non-null sample, with amplitude equal to 1 and positive sign, and Ls − 1 null samples. The non-null sample occupies a different position in each word, so the words can be obtained from one another by simply shifting the non-null sample by one position. For this first part of the codebook, signal s(n) can be represented as

$$s(n) = \delta(n - n_1) \qquad (5)$$

where δ is the well-known unit impulse function and n, n1 can take values between 0 and Ls − 1.
The second part includes words with two samples whose amplitude is 1, and Ls − 2 null samples. These words are generated starting from a limited number of key-words (in particular 3) with the method described in European Patent Application EP-A-0396121 in the name of CSELT. In the example considered, the three key-words all have the first pulse in position 0 and the second pulse in a respective key position n2(1), n2(2), n2(3). The other words are obtained by shifting the pulse pair towards the end of the word, until the second pulse reaches that end or the first pulse reaches the respective key position.
Key positions are chosen so as to give rise to Ni2 (in particular 21) possible positions of the pulse pair. For each of these positions there are two words that differ from one another in the sign of the second pulse, as described in said European Application. This brings the total number of words in the innovation codebook to Ls + 2Ni2 (62 in the example). For this second part of the codebook, an innovation word is represented by the equation

$$s(n) = \delta(n - n_1) \pm \delta(n - n_2) \qquad (6)$$

with n = 0, ..., Ls − 1, n1 = 0, ..., Ls − 1 − n2(p), n2 = n2(p), ..., Ls − 1, p = 1, ..., Nip. Here n2(p) denotes the generic key position and Nip is the number of key positions used (3 in the example).
The innovation codebook has few non-null samples, and its words are obtained by shifting samples by one position starting from a limited number of keys. This simple deterministic structure enables a fast search procedure for the optimal excitation that requires neither codebook storage nor actual filtering of the candidate excitation signal.
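The codebook structure can be enumerated without storing any waveforms, as in this C sketch. The subframe length Ls = 20 follows from Ls + 2Ni2 = 62 with Ni2 = 21. The key positions {10, 13, 16} are hypothetical values, chosen only so that the three keys give 21 pair positions, since the actual keys are not listed in this section. The type and function names are also hypothetical.

```c
#define LS 20        /* from Ls + 2*Ni2 = 62 with Ni2 = 21 */
#define NKEYS 3
/* Hypothetical key positions n2(p): (20-10)+(20-13)+(20-16) = 21 pairs. */
static const int n2key[NKEYS] = {10, 13, 16};

/* A word: pulse +1 at n1 and, if n2 >= 0, pulse sign2 at n2. */
typedef struct { int n1, n2, sign2; } cb_word;

/* Fills cb with all innovation words; returns the number of words. */
int build_codebook(cb_word *cb)
{
    int k = 0;
    for (int n1 = 0; n1 < LS; n1++)              /* first part: one pulse */
        cb[k++] = (cb_word){n1, -1, 0};
    for (int p = 0; p < NKEYS; p++)              /* second part: pairs */
        for (int n1 = 0; n1 <= LS - 1 - n2key[p]; n1++)
            for (int sg = 1; sg >= -1; sg -= 2)  /* +/- second pulse */
                cb[k++] = (cb_word){n1, n1 + n2key[p], sg};
    return k;
}

/* Expands a word into its Ls samples s(n), eqs. (5) and (6). */
void expand_word(cb_word w, double *s)
{
    for (int n = 0; n < LS; n++) s[n] = 0.0;
    s[w.n1] = 1.0;
    if (w.n2 >= 0) s[w.n2] = (double)w.sign2;
}
```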


During the search for the optimal innovation, the test with words of the first part of the codebook must be carried out only if long-term analysis has indicated a voiced sound or, otherwise, when strong energy concentrations are noted in short signal sections. Such strong concentrations can in fact signal the onset of a voiced section that cannot yet be classified as such. Classification is based on long-term analysis, and the previous signal sections contained no useful features to detect such an onset. Under these conditions, filter LTS1 would therefore not be able to supply a correct predicted signal. For a good quality of the coded signal, it is mandatory that pitch pulses be correctly reproduced. Single-pulse words therefore prove useful to compensate for an inadequate operation of the long-term synthesis filter (in voiced sections), or for the impossibility of its correct operation (in onsets). Single-pulse words, instead, must not be used to reproduce unvoiced sounds that are not onsets. There their use is counterproductive, even if one of them actually provides the minimum error signal energy, since the subjective effect is usually worse.
The manner in which strong energy concentrations in short times are detected will be described afterwards.
Words in the codebook are identified by a respective index j(s). The index of the optimal word, adequately coded, is transmitted to the decoder through a connection 2d. In the described example the codebook includes 62 words, to which as many indexes j(s) correspond. Without modifying the number of bits coding j(s), two further values of j(s) are therefore available that do not correspond to any word in the codebook. These are used to represent a null innovation gain, as will be explained later. This mirrors what has been done for the long-term prediction delay and coefficient. When generating the indexes, only one of the two values of j(s) not corresponding to an innovation word is used to indicate g = 0, while when decoding g is set to 0 for both values of j(s).
Gain g is quantized using a codebook built so as to save coding bits with respect to what would actually be necessary to represent all possible values provided in the codebook. For each subframe, the gain information is represented by two indexes j(gmax), j(gnor) and by the sign σ. The first index is linked to the maximum value of g in the frame, and the second to the difference between that maximum value and the actual value. This information is transmitted to the decoder through a connection 2e.
The codebook includes a number Nig of possible absolute values of g, which can be represented as Nig = Nim + Nin − 1, where Nim and Nin are two different powers of 2. For example, we can have Nim = 2^4 and Nin = 2^2, or Nim = 2^4 and Nin = 2^3. At each subframe, the optimal value of g determined with the error minimizing procedure described later is quantized. This generates a respective index j(g), which is not transmitted but is reconstructed in the decoder. At the end of the frame, the index j(gmax) related to the maximum frame gain is identified. It is transmitted as such if it is not less than Nin; otherwise, index j(gmax) is forced to the value Nin. In this way j(gmax) can only take Nim values, and the number of coding bits is therefore limited. Once j(gmax) has been identified, index j(gnor) is computed for every subframe with the equation j(gnor) = j(gmax) − j(g), so j(gnor) can take values between 0 and Nim + Nin − 2. The actual value of index j(gnor) is transmitted only if it is not greater than Nin − 1. Otherwise the gain is deemed 0, that is, the innovation is silenced for subframes where the gain is very small with respect to the maximum one. In that case the index j(s) of the innovation word is forced to one of the values that do not correspond to any codebook word, to signal the transmission of a word with null gain. In this way a reduced differential dynamic range is used, and the bits that would have been needed to represent the gain over the whole dynamic range are saved. The cost is a slight performance loss due to possible innovation silencing. To minimize the effect of channel errors on the innovation index j(s), in case of silencing the value Nin − 1 is transmitted for index j(gnor) anyway.
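The gain index coding described above can be sketched in C as follows. Several details are assumptions of this sketch: the struct layout, the limit of 8 subframes per frame, and the convention that the decoder returns index 0 for a silenced subframe. In the real bit stream, silencing is signalled through the spare value of j(s).

```c
/* Hypothetical container for one frame of gain indexes. */
typedef struct {
    int jgmax;        /* sent once per frame, always >= Nin */
    int jgnor[8];     /* sent per subframe, 0..Nin-1 */
    int silenced[8];  /* 1 -> j(s) forced to a spare "null gain" value */
} gain_code;

/* jg[i] is the quantizer index j(g) (1..Nim+Nin-1) of subframe i. */
gain_code encode_gains(const int *jg, int nsub, int Nin)
{
    gain_code c;
    int jmax = jg[0];
    for (int i = 1; i < nsub; i++) if (jg[i] > jmax) jmax = jg[i];
    c.jgmax = (jmax < Nin) ? Nin : jmax;     /* only Nim possible values */
    for (int i = 0; i < nsub; i++) {
        int d = c.jgmax - jg[i];             /* 0 .. Nim+Nin-2 */
        c.silenced[i] = (d > Nin - 1);
        c.jgnor[i] = c.silenced[i] ? Nin - 1 : d;  /* Nin-1 sent anyway */
    }
    return c;
}

/* Decoder side: j(g) = j(gmax) - j(gnor); 0 stands for g = 0 here. */
int decode_gain_index(const gain_code *c, int i)
{
    return c->silenced[i] ? 0 : c->jgmax - c->jgnor[i];
}
```

With Nin = 4, a subframe whose index is 5 below the frame maximum is silenced, while all others are reconstructed exactly.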
The gain codebook can be a logarithmic codebook, so that the ratio between two consecutive values is constant. The ratio must take into account several requirements:
- consecutive values in dB must be as close as possible, to allow a quantization as accurate as possible;
- the global dynamic range between the minimum gain g(1) and the maximum one g(Nim + Nin − 1) must be wide enough to cover the different types of sound and a reasonable set of different voice levels;
- the differential dynamic range between g(x − Nim + 1) and g(x) must be wide enough to make the probability of silencing reasonably low.
For example, with the above values of Nim, Nin, the ratio between two consecutive gain levels can range from 3 to 6 dB.
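A logarithmic gain table and a nearest-level quantizer might look like the following C sketch. The linear ratio q between consecutive levels is passed in directly; q = 2 corresponds to about 6.02 dB, at the top of the 3 to 6 dB range mentioned above. Quantization chooses the nearest level in the log domain by comparing x² with the product of adjacent levels. gmin and the function names are hypothetical.

```c
/* Logarithmic table g(1..Nig): constant ratio q between consecutive
   levels (q = 10^(step_dB/20), given here in linear form). */
void build_gain_table(double *g, int Nig, double gmin, double q)
{
    g[1] = gmin;
    for (int k = 2; k <= Nig; k++) g[k] = g[k - 1] * q;
}

/* Returns j(g) in 1..Nig for a positive gain x.  The boundary between
   levels k and k+1 is their geometric mean, so x^2 is compared with
   g(k) * g(k+1), i.e. nearest level in dB. */
int quantize_gain(const double *g, int Nig, double x)
{
    int k = 1;
    while (k < Nig && x * x > g[k] * g[k + 1]) k++;
    return k;
}
```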
The fast search procedure for the optimal shift and excitation will now be described, referring also to the operative diagram in Fig. 2. This diagram corresponds to the set of blocks M1, LTS1, STS1, STS', SM, SW, SW' of Fig. 1. In Fig. 2 the same symbols as in Fig. 1 are used, with the exception of blocks STW1 and STW2. STW1 represents the filter resulting from the series of filters STS1, SW, and STW2 the one resulting from STS', SW'; each is a filter with transfer function 1/[1 − Aw(z)]. In this figure, each of the filters has been divided into two elements. The first is an element with null input (LTSa, STW1a, STW2a), which provides the contribution of the initial conditions, that is, of the filtering memories from previous subframes. The second is an element (STW1b, STW2b) that is reset at each subframe (filtering with null initial conditions), as indicated by signal R supplied by a time base, not shown. Only the short-term filtering of the excitation is performed with null initial conditions, since delay d has been assumed to be not less than a subframe.
The optimal shift determination is composed of three steps:
- evaluation of the need to perform a shift;
- determination of a suitable range of shift values;
- search for the optimal shift in the range.
In the first step, it is checked whether three conditions are satisfied:
- the subframe is not silence, which is shown by the fact that the energy of rs(n) is greater than a given threshold;
- the signal is voiced or has been subjected to interpolation, which is shown by flags F, V coming from LTA;
- a peak of rs(n) actually occurs in the subframe. This is shown by the fact that the average power of rs(n) in the subframe (that is, the energy divided by the number Ls of samples) is greater than or equal to the average power in a period of length d that ends with the last sample of the subframe itself.
The reason for the first condition is obvious. As regards the second and the third, the shift must be performed only if there is a pitch peak in the subframe.
This occurs first of all in voiced sections. The fact that an interpolation occurred means that the values of the parameters obtained in two subsequent frames are very close. This suggests a certain periodicity in the signal segment to be coded. Enabling the shift in this case too can therefore further reduce the risk of misalignment between the reconstructed signal and the original signal.
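The three conditions of the first step can be checked as in this C sketch. It reads the third condition as a comparison of average powers: the subframe versus the last pitch period of d samples. It assumes at least d samples of history before the end of the subframe and uses a hypothetical silence threshold e_thr. The flags correspond to V and F from LTA, and the function name is hypothetical.

```c
/* First step of the optimal shift determination.  The subframe is
   rs[start .. start+Ls-1] (8 kHz residual); d >= Ls is the delay. */
int shift_needed(const double *rs, int start, int Ls, int d,
                 int voiced, int interpolated, double e_thr)
{
    double e_sub = 0.0, e_per = 0.0;
    int last = start + Ls - 1;
    for (int n = start; n <= last; n++) e_sub += rs[n] * rs[n];
    for (int n = last - d + 1; n <= last; n++) e_per += rs[n] * rs[n];

    if (e_sub <= e_thr) return 0;            /* silence */
    if (!voiced && !interpolated) return 0;  /* flags V, F from LTA */
    /* pitch peak in the subframe: its average power is not below the
       average power over the pitch period ending with its last sample */
    return e_sub / Ls >= e_per / d;
}
```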
Energies and powers can be computed indifferently on the upsampled signal or on the original one. During these computations, the maximum absolute value of r̃s in the current subframe and its position are also obtained; they will be used in determining the shift. To determine the position of the maximum, it is mandatory to operate on the upsampled signal to get maximum resolution.
The second step determines the lower and upper extremes hmin, hmax of a range that extends around the shift value h accumulated so far in the frame. Values hmax, hmin are initially fixed so that the differences hmax − h and h − hmin have a prearranged value Δh, for example 20 samples of the upsampled signal r̃s. There is therefore a maximum number of possible values (41 in the example) among which the optimal shift can be searched for. The actual extreme values hmin, hmax may not be symmetrical with respect to value h; that is, the range can be limited on one or both sides of the accumulated value h. This is because it is necessary to avoid shifting the subframe too much, both into the past, with possible duplication of a maximum of r̃s previously taken into account, and into the future, with consequent loss of a maximum. This check is made possible by storing the maximum of r̃s in the subframe. However, unless the range has been limited on both sides, the search for the optimal shift tries to keep the range width constant. It does so by also taking into account some values beyond the extreme that is not subjected to limitation.
In any case, the shift to be carried out must not cause value H to be exceeded.
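The range determination of the second step could be sketched as follows. How the one-sided limits lo_lim and hi_lim are derived from the stored residual maxima is not detailed here, so they are simply passed in as hypothetical inputs. All values are in upsampled samples, dh is the half-width (20 in the example), and Hr = rH is the global limit. The function name is hypothetical.

```c
/* Test range [*hmin, *hmax] around the accumulated shift h. */
void shift_range(int h, int dh, int lo_lim, int hi_lim, int Hr,
                 int *hmin, int *hmax)
{
    int lo = h - dh, hi = h + dh;
    int lo_cut = lo < lo_lim, hi_cut = hi > hi_lim;
    if (lo_cut) lo = lo_lim;
    if (hi_cut) hi = hi_lim;
    /* keep the width 2*dh+1 by extending the side that was not limited */
    if (lo_cut && !hi_cut) hi = lo + 2 * dh;
    if (hi_cut && !lo_cut) lo = hi - 2 * dh;
    /* the global shift must never exceed +/- r*H */
    if (lo < -Hr) lo = -Hr;
    if (hi > Hr) hi = Hr;
    *hmin = lo;
    *hmax = hi;
}
```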
The optimal shift value within the test range is the one minimizing the energy of an error signal e1(n). This signal is the difference between the reconstructed and weighted modified signal xw(n) (Fig. 1) and the contribution yw1(n) of the excitation filtering memories. The optimal shift is obtained with a fast search procedure that reduces the amount of computation needed.
Two facts are exploited for this fast search. On one hand, the output signal xw(n) from STW2 can be expressed as

$$x_w(n) = \tilde r_m(rn) + \sum_{i=1}^{P} a_{wi}\, x_w(n-i) = \tilde r_s(rn + h) + \sum_{i=1}^{P} a_{wi}\, x_w(n-i) \qquad (7)$$

where n ranges from 0 to Ls − 1. On the other hand, the same signal is the sum of the output xw1 of STW2a and the output xw2 of STW2b. The summation in (7) represents signal xw1, which can be computed once and for all, like the corresponding contribution yw1 of chain LTSa, STW1a. Therefore an error e0 = xw1 − yw1, which appears at the output of an adder SMa, can also be computed once and for all. Error e1 can then be written as e1 = e0 + xw2, where xw2 depends on r̃s and therefore on the shift. It is then necessary to determine xw2 for all values of the shift and to compute for each one the respective energy of e1. The value of h that provides the minimum energy is stored, together with the corresponding signal xw(n).
The procedure adopted according to the invention to determine xw2 takes into account that, for a given shift value, signal xw2 is given by

$$x_{w2}(n) = \tilde r_s(rn + h) + \sum_{i=1}^{\min(n,P)} a_{wi}\, x_{w2}(n-i) \qquad (8)$$

The upper limit of the summation is the minimum between n and P. When filtering with null initial conditions, samples with n − i < 0, that is, samples of the previous subframe, must not be taken into account. Values of xw2 are actually computed according to (8) for a first group of r possible shifts, ranging from hmax to hmax − r + 1. Obviously, the tests are stopped if hmin is reached before all r shifts have been examined. For the other shift values, from hmax − r to hmin, xw2 is not computed with (8) but according to the equation

$$x_{w2}(n) = x_{w2}(n-1) + Q(n)\,\tilde r_s(h), \quad n = L_s-1, \ldots, 1, \qquad x_{w2}(0) = \tilde r_s(h) \qquad (9)$$

In (9), Q(n) denotes the truncated pulse response of filter STW (computed only for Ls values of n), with Q(0) = 1. On the right-hand side, xw2(n − 1) is the value previously obtained for shift h + r: shifting by r upsampled samples delays the input sequence by exactly one 8-kHz sample.
Since Q is determined once and for all, it can immediately be noted that (9) requires far fewer computations than (8): one multiplication per sample instead of up to P.
It must further be stated that r sequences of xw2 values must actually be computed according to (8) and (9). That is one sequence for each of the r upsampled signal samples corresponding to an 8-kHz sampling period, since (9) relates shifts that differ by r.
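The two ways of obtaining xw2 can be compared in this C sketch: eq. (8) computed directly and eq. (9) updated recursively. The pointer rsu is assumed to address the upsampled residual so that rsu[r*n + h] is the input sample for position n and shift h, with negative h allowed. The function names are hypothetical.

```c
/* Truncated pulse response Q(0..Ls-1) of 1/(1 - Aw(z)); Q(0) = 1. */
void pulse_response(const double *aw, int P, int Ls, double *Q)
{
    for (int n = 0; n < Ls; n++) {
        Q[n] = (n == 0) ? 1.0 : 0.0;
        for (int i = 1; i <= P && i <= n; i++) Q[n] += aw[i] * Q[n - i];
    }
}

/* Eq. (8): zero-state output for input rsu[r*n + h], n = 0..Ls-1. */
void xw2_direct(const double *rsu, int h, int r, int Ls,
                const double *aw, int P, double *x)
{
    for (int n = 0; n < Ls; n++) {
        double v = rsu[r * n + h];
        for (int i = 1; i <= P && i <= n; i++) v += aw[i] * x[n - i];
        x[n] = v;
    }
}

/* Eq. (9): x holds the response for shift h + r on entry and for shift
   h on exit.  The input for h is the one for h + r delayed by one
   sample, with rsu[h] entering at n = 0; the descending loop lets the
   update run in place. */
void xw2_update(const double *rsu, int h, int Ls, const double *Q, double *x)
{
    for (int n = Ls - 1; n >= 1; n--) x[n] = x[n - 1] + Q[n] * rsu[h];
    x[0] = rsu[h];
}
```

Because (9) links shifts that differ by r, a full search would seed r chains with (8), one per phase, and then step each chain down by r, as described above.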
Once the energy of e1(n) has been minimized and the optimal shift found, minimization of the energy of e(n) is started to find the optimal excitation. Unit EM directly computes an expression of the energy to be minimized as a function of the position of the pulses in the innovation word. For this purpose it employs the pulse response Q computed during the search for the optimal shift. Using the pulse response is more convenient than actually performing the filtering, because every word includes at most two non-null samples. Moreover, in the more general case of words with 2 pulses, the global pulse response is the sum of two responses spaced by a distance equal to the key. The responses for all other words linked to a key are then obtained simply by translating it one sample at a time. To simplify, the following mathematical expressions do not indicate the range of the summation index for summations extended to all samples in a subframe.
Error e(n), for a generic excitation word, is given by e(n) = e1(n) − yw2(n) = e1(n) − g1 u(n), where u(n) is the output signal from STW1b. The energy of e(n) is given by

$$E(e) = \sum e^2(n) = \sum [e_1(n) - g_1 u(n)]^2 \qquad (10)$$

which can be written as $E(e) = \sum e_1^2 - 2 g_1 \sum e_1 u + g_1^2 \sum u^2$. The first and the last summations represent the energies of signals e1, u. The second one represents the cross-correlation R(e1u)(k) between them, evaluated for k = 0 and in the following simply called R(e1u). We therefore have

$$E(e) = E(e_1) - 2 g_1 R(e_1u) + g_1^2 E(u) \qquad (11)$$
Minimizing E(e) is the same as maximizing the difference of energies

$$\Delta E = E(e_1) - E(e) = 2 g_1 R(e_1u) - g_1^2 E(u) \qquad (12)$$

For each word of the examined codebook, the maximum of (12) is obtained for the value $g_0 = R(e_1u)/E(u)$ of g1. This appears immediately by computing the derivative with respect to g1 and setting it equal to 0. The corresponding value is

$$\Delta E_0 = R(e_1u)^2/E(u) = g_0 R(e_1u) \qquad (13)$$
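Equations (11) to (13) translate directly into the following C sketch, which returns g0 and ΔE0 for one filtered candidate u(n). The function name is hypothetical. The case E(u) = 0 is handled by returning a null gain, an assumption the text does not discuss.

```c
/* Optimal gain g0 = R(e1u)/E(u) and energy reduction
   dE0 = R(e1u)^2 / E(u) for one filtered word u, eqs. (12)-(13). */
double best_gain(const double *e1, const double *u, int Ls, double *dE0)
{
    double R = 0.0, Eu = 0.0;
    for (int n = 0; n < Ls; n++) {
        R += e1[n] * u[n];     /* cross-correlation at lag 0 */
        Eu += u[n] * u[n];     /* energy of the filtered word */
    }
    if (Eu <= 0.0) { *dE0 = 0.0; return 0.0; }
    *dE0 = R * R / Eu;
    return R / Eu;
}
```

The usage check confirms that the residual energy with g0 equals E(e1) − ΔE0.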
The particular structure of the innovation codebook allows E(u) and R(e1u), which depend on the position of the pulse or pulses in the word, to be obtained directly. This is done by exploiting the pulse response of filter STW1, which is equal to the previously determined pulse response of filter STW2.
In fact

$$E(u) = \sum [Q(n-n_1) \pm Q(n-n_2)]^2 = \sum Q^2(n-n_1) + \sum Q^2(n-n_2) \pm 2 \sum Q(n-n_1)\,Q(n-n_2)$$

or, more simply,

$$E(u) = E_q(n_1) + E_q(n_2) \pm \rho(n_1,n_2) \qquad (14)$$

where the ± sign is that of the second pulse in (6). Here Eq is the energy of the adequately truncated signal Q, that is, computed for a number of samples determined by the position of n1, n2, and $\rho(n_1,n_2) = 2\sum Q(n-n_1)\,Q(n-n_2)$. Moreover, R(e1u) can be written

$$R(e_1u) = R[e_1q(n_1)] \pm R[e_1q(n_2)] \qquad (15)$$

where $R[e_1q(K)] = \sum_{n=0}^{L_s-1-K} e_1(n+K)\,Q(n)$. For single-pulse words, relations (14) and (15) clearly reduce to E(u) = Eq(n1) and R(e1u) = R[e1q(n1)].
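The following C sketch evaluates (14) and (15) from the truncated pulse response Q alone, without filtering the candidate word. The parameter sg is the sign of the second pulse (pass n2 < 0 for single-pulse words), and ρ includes the factor 2. All function names are hypothetical. The usage check compares the result against an explicit filtering of the word.

```c
/* Eq(k): energy of Q truncated to the Ls-k samples a pulse at k excites. */
double Eq_of(const double *Q, int Ls, int k)
{
    double s = 0.0;
    for (int n = 0; n <= Ls - 1 - k; n++) s += Q[n] * Q[n];
    return s;
}

/* rho(n1,n2) = 2 * sum_n Q(n-n1) Q(n-n2) within the subframe, n1 < n2. */
double rho_of(const double *Q, int Ls, int n1, int n2)
{
    double s = 0.0;
    for (int m = 0; m <= Ls - 1 - n2; m++) s += Q[m] * Q[m + n2 - n1];
    return 2.0 * s;
}

/* R[e1q(K)] = sum_{n=0}^{Ls-1-K} e1(n+K) Q(n). */
double Re1q_of(const double *e1, const double *Q, int Ls, int K)
{
    double s = 0.0;
    for (int n = 0; n <= Ls - 1 - K; n++) s += e1[n + K] * Q[n];
    return s;
}

/* E(u) and R(e1u) for pulses +1 at n1 and sg at n2 (n2 < 0: none). */
void word_stats(const double *e1, const double *Q, int Ls,
                int n1, int n2, int sg, double *Eu, double *R)
{
    *Eu = Eq_of(Q, Ls, n1);
    *R = Re1q_of(e1, Q, Ls, n1);
    if (n2 >= 0) {
        *Eu += Eq_of(Q, Ls, n2) + sg * rho_of(Q, Ls, n1, n2);
        *R += sg * Re1q_of(e1, Q, Ls, n2);
    }
}
```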
The operations performed at each subframe by EM to determine the optimal excitation can be considered as divided into three steps.
a) Before examining the effect of each innovation word, as soon as values ai are available, EM computes and stores the possible values of the three addends in (14).
This computation is carried out only for the first 4 subframes since, as already said, the filter coefficients ai do not change in the following subframes. Terms Eq can be computed with a simple iterative procedure, according to the equation

$$E_q(L_s-1-n) = E_q(L_s-n) + Q^2(n) \qquad (16)$$

with n = 1, ..., Ls − 1 and Eq(Ls − 1) = 1. Moreover, since the codebook includes only Ni2 possible pairs of values n1, n2, ρ is computed only for these pairs, according to the expressions

$$\rho_k = 2Q[n_2(p)] \ \ \text{(pair of key } p \text{ with } n_2 = L_s-1\text{)}, \qquad \rho_k = \rho_{k+1} + 2Q[n+n_2(p)]\,Q(n)$$

where n2(p) has the meaning already given, n = 1, ..., Ls − 1 − n2(p), and k = Ni2, ..., 1 denotes the generic pair of values n1, n2.
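The recursions for Eq and ρ can be sketched in C as below. rho_table handles the pairs of one key p, stored by the position n1 of the first pulse. This indexing and the function names are assumptions of the sketch. The usage check compares both tables with the direct sums.

```c
/* Eq. (16): all Eq(k), k = 0..Ls-1, by one backward recursion. */
void Eq_table(const double *Q, int Ls, double *Eq)
{
    Eq[Ls - 1] = Q[0] * Q[0];                  /* = 1 since Q(0) = 1 */
    for (int n = 1; n <= Ls - 1; n++)
        Eq[Ls - 1 - n] = Eq[Ls - n] + Q[n] * Q[n];
}

/* rho for every pair of one key: rho[n1], n1 = 0..Ls-1-key.  Start from
   the last pair (second pulse at Ls-1), where rho = 2*Q(key)*Q(0), then
   move the pair back one sample at a time, adding one product per step. */
void rho_table(const double *Q, int Ls, int key, double *rho)
{
    int last = Ls - 1 - key;                   /* n1 of the last pair */
    rho[last] = 2.0 * Q[0] * Q[key];
    for (int n = 1; n <= last; n++)
        rho[last - n] = rho[last - n + 1] + 2.0 * Q[n + key] * Q[n];
}
```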
b) As soon as the optimal value of e1 is available, and still before the search procedure, EM computes and stores the values R[e1q].
c) After these operations, EM computes the values of E(u), R(e1u) word by word, determining the value g0 and the related ΔE0. It stores the index of the word and the related value of g that yielded the energy minimum.
As said above, if the sound is not voiced, the tests with words of the first part of the codebook are carried out only if strong energy concentrations in short times are noted, since these can indicate the onset of a voiced signal section. For this purpose, within the subframe, the energy of a group of samples of the modified residual (e.g. 5 samples) is computed. The computation starts from the beginning of the subframe, and the window selecting the group is shifted by one sample at a time until the whole subframe has been scanned; the group showing the maximum energy is stored. Furthermore, the average power (that is, the energy divided by the number of samples) in the window where the maximum occurred and the average power in the subframe are also computed. Tests with single-pulse words are enabled if the subframe energy and the ratio between the average powers in the window and in the subframe are greater than suitable thresholds. Moreover, if the optimal innovation is a single-pulse word, the absolute value of gain g is limited to a maximum value |g|max. This value equals |rs|max multiplied by a parameter approximately equal to 1, where |rs|max is the residual maximum computed during the operations to determine hmin, hmax. The purpose of this limitation is also to prevent the insertion into the signal of a pulse with too high an energy with respect to the maximum residual amplitude in the same subframe.
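The onset test and the gain limit for single-pulse words might be coded as in this C sketch. The thresholds e_thr, ratio_thr and the proportionality parameter k (approximately 1) stand for the unspecified "suitable thresholds" and parameter. Their names, the function names and the values used in the check are hypothetical.

```c
/* Onset test: slide a window of W samples (5 in the example) over the
   modified residual rm, keep the maximum window energy, and compare
   the average powers of that window and of the subframe. */
int single_pulse_enabled(const double *rm, int Ls, int W,
                         double e_thr, double ratio_thr)
{
    double e_sub = 0.0, e_win = 0.0, e_best;
    for (int n = 0; n < Ls; n++) e_sub += rm[n] * rm[n];
    for (int n = 0; n < W; n++) e_win += rm[n] * rm[n];
    e_best = e_win;
    for (int n = W; n < Ls; n++) {         /* shift window by one sample */
        e_win += rm[n] * rm[n] - rm[n - W] * rm[n - W];
        if (e_win > e_best) e_best = e_win;
    }
    if (e_sub <= e_thr) return 0;
    return (e_best / W) > ratio_thr * (e_sub / Ls);
}

/* Clamp |g| of a single-pulse word to k * max|rs|. */
double limit_single_pulse_gain(double g, double rs_absmax, double k)
{
    double gmax = k * rs_absmax;
    if (g > gmax) return gmax;
    if (g < -gmax) return -gmax;
    return g;
}
```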
At the end of each subframe, the initial conditions in filters LTSa, STW1a, STW2a have to be updated. To update LTSa, that is sS(n), a pulse or a pair of pulses (corresponding to the optimal innovation word) must be added to sS1(n). To update yw(n), one or two pulse responses (corresponding to signal u(n)), adequately shifted and multiplied by gain g, must be added to yw1(n). This supplies the value of yw2 corresponding to the optimal excitation. The pulse response is also exploited to update STW2a.
Furthermore, since filters STW have order P, only the last P samples of these responses (from Ls to Ls − P) are of interest.
The operations of EM are also included in the appendix.
The decoder structure will now be described, referring to the diagram in Fig. 3. There, blocks corresponding to the ones already described with reference to Fig. 1 are shown by the same reference symbols, followed by digit 2. The various reconstructed signals are also shown with the same reference symbols used for the original signals in the coder.
The decoder receives from the coder, through connections 2a to 2e, the short-term coefficient indexes, indexes j(d), j(b), j(s), j(gmax), j(gnor), and the sign σ of the innovation gain. At each subframe, index j(s) either selects an innovation word s(n) in codebook IC2 or indicates a subframe that provides no innovation contribution (g = 0). If a word has been selected, it is multiplied in M2 by gain g. The absolute value of g is selected in codebook IG2 by the index j(g) = j(gmax) − j(gnor), and its sign is σ. This provides the reconstructed excitation signal (or fixed codebook contribution) s1(n).
This signal is filtered in the long-term synthesis filter LTS2 to provide the reconstructed short-term residual sS(n). In order to operate exactly like its replica LTS1 in the coder, filter LTS2 must receive from reconstruction circuit LTR2 the parameters d, b and flag F, which indicates the possible need to interpolate d and b. Therefore LTR2 will include a read-only memory with two tables addressed by indexes j(d), j(b), like LTR1 (Fig. 1). It will also include a circuit that stores the values of d, b related to two consecutive frames. This circuit carries out the comparisons, described in connection with the coder, that determine whether interpolation of d, b is necessary. Signal sS(n) outgoing from LTS2 is filtered in the short-term synthesis filter STS2. This filter uses coefficients ai generated in coefficient reconstructing circuit STR2 from the corresponding indexes. In STS2 too, interpolated coefficients are used for the first subframes of each frame. The reconstructed speech signal y(n) is further filtered in an adaptive filter PF, which uses coefficients obtained from the linear prediction coefficients ai. PF inserts into the reconstructed speech signal a distortion that improves the perceptual effect. At the output of PF there is a filtered reconstructed signal yp(n). The use of filters like PF in speech coding is well known to those skilled in the art and needs no further explanation.
It will be noted that the decoder does not take into account the possible shift carried out in the coder. The purpose of the shift is simply to make the synthesized signal as good a replica as possible of the original signal, so the decoder only requires information related to excitation and filters.
It is clear that what has been described is provided only by way of non-limiting example, and that variations and modifications are possible without departing from the scope of the present invention. Thus, for example, the description of the innovation has referred to samples whose amplitude is 1. It is also possible, however, to use samples whose amplitudes are chosen in a finite set of values (e.g., ±1, ±√2, ±1/√2). Obviously, in this case the coded signal will also include information about the relative amplitude of the innovation samples. Generalizing equations (14), (15) to the case of pulses with non-unitary amplitude is immediate. The choice of sample amplitudes in a finite set of values is not limiting, because the relative amplitudes of the samples are quantized anyway.
To simplify the drawings, no timing signals for the various blocks have been shown; the timing sequence of operations, however, clearly follows from the description.


APPENDIX
Due to the formal requirements of the C language, some quantities in the program listing are written somewhat differently from the description, or with different symbols. Differences that only concern the presence of subscripts, parentheses, etc. are not discussed in detail, since there is no risk of ambiguity. Symbols n_, h_, rs_, Eh, h, Re1h, Nik, n2key, id, ib, is in the listing correspond to n, h, rs, Eq, Q, Re1q, Nip, n2(p), j(d), j(b), j(s) in the specification. The letters "thr" added to certain symbols (such as Ers, Erf, ...) indicate threshold values for the respective quantities; these values have been discussed in the specification. EPSILON, RO are proportionality factors, also discussed in the specification. DELTA indicates increments in the value of the respective quantity (DELTAd corresponds to d' in the specification).
1) LONG TERM ANALYSIS
/* Search for the long term predictor delay: */

Rrfdmax=-DBL_MAX;
for (d_=Ls; d_<=D; d_++) {
    Rrf[d_]=0.;
    for (n=K; n<=Lf+H+K-1-d_; n++) Rrf[d_]+=rf[n+d_]*rf[n];

    if (Rrf[d_]>Rrfdmax) {
        d[0]=d_;
        Rrfdmax=Rrf[d_];
    }
}

/* Secondary search for the long term predictor delay around the previous value: */
dmin=sround((1.-EPSILONdthr)*d[-1]);
dmax=sround((1.+EPSILONdthr)*d[-1]);
if (dmin<Ls) dmin=Ls;
if (dmax>D) dmax=D;
if (voiced[-1]&&interpolation[-1]&&(d[0]<dmin||d[0]>dmax)) {
    Rrfdmax_=-DBL_MAX;
    for (d_=dmin; d_<=dmax; d_++) if (Rrf[d_]>Rrfdmax_) {
        d__=d_;
        Rrfdmax_=Rrf[d_];
    }
    if (Rrfdmax_/Rrfdmax>=RORrfthr)
        d[0]=d__;
}
/* Computation of the long term predictor coefficient and gain: */

Erf=Erf_=Erf__[0]=0.;
if (K-1+d[0]<=Lf+H+K-1-d[0]) {
    for (n=K; n<=K-1+d[0]; n++)
        Erf+=rf[n]*rf[n];
    for (; n<=Lf+H+K-1-d[0]; n++)
        Erf_+=rf[n]*rf[n];
    for (; n<=Lf+H+K-1; n++)
        Erf__[0]+=rf[n]*rf[n];
    Erf+=Erf_;
    Erf_+=Erf__[0];
}
else {
    for (n=K; n<=Lf+H+K-1-d[0]; n++) Erf+=rf[n]*rf[n];
    for (; n<=K-1+d[0]; n++) Erf__[0]+=rf[n]*rf[n];
    for (; n<=Lf+H+K-1; n++) Erf_+=rf[n]*rf[n];
    Erf__[0]+=Erf_;
}
Erf__[0]+=rf[Lf+H+K-1-d[0]]*rf[Lf+H+K-1-d[0]];

b[0]=(Erf>=Erfthr)?Rrf[d[0]]/Erf:0.;

G=(Erf>=Erfthr&&Erf_>=Erfthr_)?1./(1.-b[0]*Rrf[d[0]]/Erf_):1.;
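A minimal sketch of the two formulas above, b = R/E and G = 1/(1 - bR/E'), computed directly over a plain array rather than through the listing's reuse of partial energy sums. The function `ltp_coeff_gain` is hypothetical, and the single threshold `e_thr` stands in for both Erfthr and Erfthr_.

```c
#include <assert.h>

/* Hypothetical helper: one-tap long-term predictor at lag d over x[0..len-1].
   E0 = energy of the lagged segment x[0..len-1-d], E1 = energy of the target
   segment x[d..len-1], R = cross-correlation between the two segments. */
void ltp_coeff_gain(const double *x, int len, int d, double e_thr,
                    double *b, double *G)
{
    assert(x && b && G && d >= 1 && d < len);
    double R = 0.0, E0 = 0.0, E1 = 0.0;
    for (int n = 0; n <= len - 1 - d; n++) {
        R  += x[n + d] * x[n];
        E0 += x[n] * x[n];
        E1 += x[n + d] * x[n + d];
    }
    *b = (E0 >= e_thr) ? R / E0 : 0.0;            /* optimal coefficient */
    /* prediction gain: input energy E1 over residual energy E1 - b*R */
    *G = (E0 >= e_thr && E1 >= e_thr) ? 1.0 / (1.0 - *b * R / E1) : 1.0;
}
```

For x = {1, 2, 1, 3} and d = 1 this gives R = 7, E0 = 6, E1 = 14, hence b = 7/6 and G = 14/(14 - 49/6) = 2.4. A perfectly predictable segment makes the denominator of G vanish, so a real coder would need a guard there.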
/* Correction of the long term predictor coefficient: */

bmax1=(Erf__[-1]>=Erfthr_)?pow(Erf__[0]/Erf__[-1],(double)d[0]/(2*Lf)):DBL_MAX;

if (b[0]>bmax1) b[0]=bmax1;

if (b[0]>bmax2&&G<Gthr)
    b[0]=bmax2;

/* Zero-clipping and quantization of the long term predictor coefficient
with preservation of the zero value: */

if (b[0]>=bq[1]/2.) {
    ib=sq(Nlc,bq,b[0]);
    b[0]=bq[ib];
}
else {
    b[0]=0.;
    ib=1;
}
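The quantizer `sq()` is not defined in this part of the listing. The sketch below assumes it is a nearest-level scalar quantizer over the 1-based table `bq[1..Nlc]`; both `quantize_nearest` and `ltp_quantize_coeff` are hypothetical stand-ins used only to show the zero-clipping rule.

```c
#include <assert.h>

/* Assumed stand-in for sq(): index of the level in bq[1..nlev] closest to v
   (bq[0] is unused, matching the 1-based indexing of the listing). */
int quantize_nearest(int nlev, const double *bq, double v)
{
    assert(nlev >= 1 && bq);
    int best = 1;
    for (int i = 2; i <= nlev; i++) {
        double di = v - bq[i], db = v - bq[best];
        if (di * di < db * db) best = i;
    }
    return best;
}

/* Zero-clipping: values below half of the smallest level become 0 and are
   sent with the index of the smallest level (the zero is then signalled
   through the delay index, as in the listing's id=Nld). */
double ltp_quantize_coeff(int nlev, const double *bq, double b, int *ib)
{
    assert(ib);
    if (b >= bq[1] / 2.0) {
        *ib = quantize_nearest(nlev, bq, b);
        return bq[*ib];
    }
    *ib = 1;
    return 0.0;
}
```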

/* Interpolation and voiced decisions: */

if ((double)abs(d[0]-d[-1])/d[-1]<=EPSILONdthr&&b[0]>0.&&b[-1]>0.) interpolation[0]=1;
else interpolation[0]=0;
if (G>=Gthr&&b[0]>=bthr) voiced[0]=1;
else voiced[0]=0;
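The two per-frame flags reduce to simple predicates. The functions below are a hypothetical restatement of the decisions above (relative delay change within a tolerance and positive coefficients in both frames for interpolation; gain and coefficient thresholds for voicing).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: flag F, interpolation allowed between frames. */
int ltp_interpolation_flag(int d_cur, int d_prev, double b_cur, double b_prev,
                           double eps)
{
    assert(d_prev > 0);
    return (double)abs(d_cur - d_prev) / d_prev <= eps
           && b_cur > 0.0 && b_prev > 0.0;
}

/* Hypothetical helper: flag V, frame classified as voiced. */
int ltp_voiced_flag(double G, double b, double Gthr, double bthr)
{
    return G >= Gthr && b >= bthr;
}
```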

/* Selective correction of the long term predictor delay: */
if (interpolation[0]&&!voiced[0]) {
    DELTAd=sround((double)(h_*d[0])/(GAMMA*Lf));

    if (DELTAd<-absDELTAdmax)
        DELTAd=-absDELTAdmax;
    else if (DELTAd>absDELTAdmax) DELTAd=absDELTAdmax;

    if (d[0]+DELTAd>=Ls&&d[0]+DELTAd<=D&&(double)abs(d[0]+DELTAd-d[-1])/
        d[-1]<=EPSILONdthr)
        d[0]+=DELTAd;
}
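The delay correction can be sketched as a standalone function. It assumes `h_up` is the accumulated shift in upsampled samples (upsampling factor `gamma`), as h_ is in the listing, and that `sround` rounds half away from zero; that rounding convention and all names here are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: correct delay d by an amount proportional to the
   accumulated shift, clamp the step to +/-max_step, and apply it only if the
   new delay stays in [lag_lo, lag_hi] and still meets the interpolation
   criterion with respect to d_prev. */
int ltp_correct_delay(int d, int d_prev, int h_up, int gamma, int frame_len,
                      int max_step, int lag_lo, int lag_hi, double eps)
{
    assert(gamma > 0 && frame_len > 0 && d_prev > 0 && max_step >= 0);
    double x = (double)h_up * d / ((double)gamma * frame_len);
    int step = (int)(x < 0.0 ? x - 0.5 : x + 0.5);   /* round half away */
    if (step < -max_step) step = -max_step;
    else if (step > max_step) step = max_step;
    int dn = d + step;
    if (dn >= lag_lo && dn <= lag_hi &&
        (double)abs(dn - d_prev) / d_prev <= eps)
        return dn;
    return d;
}
```

With d = 40, a 160-sample frame, gamma = 4 and an accumulated shift of 32 upsampled samples, the step is 32*40/640 = 2, giving 42.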

/* Representation of the long term predictor delay or the zero
coefficient: */

if (b[0]>0.)
    id=d[0]-Ls+1;
else id=Nld;

2) MINIMIZATION OF ERROR ENERGY
In this part of the listing, 'bang' denotes recognition of an onset.

/* Preparation for time shift and innovation searches: */
if (interpolation[0]||voiced[0]) {
    Ers=Ers_=0.;
    for (n=o+Ls-d[0]+h; n<=o+h-1; n++) Ers+=rs[n]*rs[n];
    for (; n<=o+Ls+h-1; n++) Ers_+=rs[n]*rs[n];

    absrsmax=-DBL_MAX;
    for (n=GAMMA*o+h_; n<=GAMMA*(o+Ls)+h_-1; n++) {
        absrs=fabs(rs_[n]);

        if (absrs>absrsmax) {
            np[0]=n;
            absrsmax=absrs;
        }
    }
    np[0]-=GAMMA*o;

    if (Ers_>=Ersthr&&d[0]*Ers_/(Ls*(Ers+Ers_))>=ROPrsthr) {
        hmin=-GAMMA*H;
        if (peak&&np[-1]<=h_-1&&hmin<np[-1]) hmin=np[-1];
        hmax=GAMMA*H;
        if (hmax>np[0]) hmax=np[0];
        DELTAhpL=h_-hmin;
        DELTAhpH=hmax-h_;
        if (DELTAhpL>=sround(GAMMA*DELTAhp)&&DELTAhpH>=sround(GAMMA*DELTAhp)) {
            hmin=h_-sround(GAMMA*DELTAhp);
            hmax=h_+sround(GAMMA*DELTAhp);
        }
        else if (DELTAhpL>=DELTAhpH) hmin=max(hmax-2*sround(GAMMA*DELTAhp),hmin);
        else hmax=min(hmin+2*sround(GAMMA*DELTAhp),hmax);
        peak=1;
    }
    else {
        hmin=hmax=h_;
        peak=0;
    }
}
else {
    if (h_<0) hmin=hmax=min(h_+sround(GAMMA*DELTAhr),0);
    else hmin=hmax=max(h_-sround(GAMMA*DELTAhr),0);
    peak=0;
}
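The test that enables the shift search (the subframe is not silence and contains a pitch peak) compares the power in the subframe with the power over one delay period. A sketch, assuming `rs` already points at the (shifted) subframe start so that the d-ls samples before it can be read with negative indices; the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: 1 if the subframe energy reaches e_thr and the
   subframe power is at least ro_thr times the power over the last d samples
   (the d-ls samples before the subframe plus the subframe itself). */
int subframe_has_pitch_peak(const double *rs, int ls, int d,
                            double e_thr, double ro_thr)
{
    assert(rs && ls > 0 && d >= ls);
    double e_before = 0.0, e_sub = 0.0;
    for (int n = ls - d; n <= -1; n++) e_before += rs[n] * rs[n];
    for (int n = 0; n <= ls - 1; n++)  e_sub    += rs[n] * rs[n];
    if (e_sub < e_thr) return 0;                      /* silence */
    /* (e_sub/ls) / ((e_before + e_sub)/d) */
    return d * e_sub / (ls * (e_before + e_sub)) >= ro_thr;
}
```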

if (i<= /* rest of the condition illegible in the source */ ) {
    h__[0]=Eh[Ls-1]=1.;
    for (n=1,n_=Ls-2; n<=Ls-1; n++,n_--) {
        h__[n]=0.;
        for (k=1; k<=min(n,P); k++) h__[n]+=aw[1][k]*h__[n-k];
        Eh[n_]=Eh[n_+1]+h__[n]*h__[n];
    }
    for (i=Nik,j=Ni2; i>=1; i--) {
        ro[j]=2.*h__[n2key[i]];
        for (n=1,j--; n<=Ls-1-n2key[i]; n++,j--)
            ro[j]=ro[j+1]+2.*h__[n+n2key[i]]*h__[n];
    }
}
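The first half of the block above computes the impulse response Q(n) of the weighted all-pole filter and, for every pulse position m, the energy of the response truncated at the subframe end. A self-contained sketch, assuming the filter is 1/(1 - sum a[k] z^-k) with `a[0]` unused; the function name is hypothetical and the cross terms `ro[]` are not reproduced.

```c
#include <assert.h>

/* Hypothetical helper: q[0..ls-1] = impulse response of
   1/(1 - sum_{k=1}^{p} a[k] z^-k); eq[m] = sum_{n=0}^{ls-1-m} q[n]^2, the
   energy left inside the subframe for a unit pulse at position m. */
void impulse_response_energies(const double *a, int p, int ls,
                               double *q, double *eq)
{
    assert(a && q && eq && ls >= 1 && p >= 0);
    q[0] = 1.0;
    eq[ls - 1] = 1.0;
    for (int n = 1, m = ls - 2; n <= ls - 1; n++, m--) {
        q[n] = 0.0;
        for (int k = 1; k <= (n < p ? n : p); k++)
            q[n] += a[k] * q[n - k];
        eq[m] = eq[m + 1] + q[n] * q[n];   /* one more sample fits */
    }
}
```

With a single coefficient a[1] = 0.5 and a 4-sample subframe, q = {1, 0.5, 0.25, 0.125} and eq[0] = 1.328125.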

/* Search for the time shift of the short term prediction residual:
*/

for (n=o; n<=o+Ls-1; n++) {
    xw[n]=0.;
    for (k=1; k<=P; k++)
        xw[n]+=aw[1][k]*xw[n-k];

    el[n-o]=xw[n]-yw[n];
}

Eelmin=DBL_MAX;
/* hc: candidate value of the shift, in upsampled samples */
for (i=0,hc=hmax; i<=GAMMA-1&&hc>=hmin; i++,hc--) {
    Eel=0.;
    for (n=0,n_=GAMMA*o+hc; n<=Ls-1; n++,n_+=GAMMA) {
        xw2[i][n]=rs_[n_];
        for (k=1; k<=min(n,P); k++)
            xw2[i][n]+=aw[1][k]*xw2[i][n-k];

        el_=el[n]+xw2[i][n];
        Eel+=el_*el_;
    }
    if (Eel<Eelmin) {
        h_=hc;
        Eelmin=Eel;
    }
}
for (i=0,n_=GAMMA*o+hc; hc>=hmin; i=(i<GAMMA-1)?i+1:0,hc--,n_--) {
    Eel=0.;
    for (n=Ls-1; n>=1; n--) {
        xw2[i][n]=xw2[i][n-1]+h__[n]*rs_[n_];
        el_=el[n]+xw2[i][n];
        Eel+=el_*el_;
    }
    xw2[i][0]=rs_[n_];
    el_=el[0]+xw2[i][0];
    Eel+=el_*el_;

    if (Eel<Eelmin) {
        h_=hc;
        Eelmin=Eel;
    }
}

h=sround((double)h_/GAMMA);
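The second loop of the shift search avoids refiltering: when the input is the previous input delayed by one sample with a new first sample x0, the zero-state response satisfies y_new[n] = y_old[n-1] + Q[n]*x0. The sketch below (hypothetical names, filter 1/(1 - sum a[k] z^-k) assumed) shows both the direct filtering and that update.

```c
#include <assert.h>

/* Zero-state response y[0..ls-1] of 1/(1 - sum a[k] z^-k) to x[0..ls-1]. */
void zero_state_filter(const double *a, int p, const double *x, int ls,
                       double *y)
{
    for (int n = 0; n < ls; n++) {
        y[n] = x[n];
        for (int k = 1; k <= (n < p ? n : p); k++)
            y[n] += a[k] * y[n - k];
    }
}

/* In-place update of y when the input gains a new first sample x0 and the
   old input moves one position later; q is the impulse response (q[0] = 1). */
void zero_state_shift_update(const double *q, double x0, int ls, double *y)
{
    assert(q && y && ls >= 1);
    for (int n = ls - 1; n >= 1; n--)      /* backwards: y[n-1] is still old */
        y[n] = y[n - 1] + q[n] * x0;
    y[0] = x0;
}
```

The update costs ls multiply-adds instead of roughly ls*p for direct filtering, which is why the listing filters explicitly only for the first GAMMA candidate shifts.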

/* Computation of the weighted speech and the long term component of the
weighted error and further preparation for the innovation search: */

for (n=o,n_=GAMMA*o+h_; n<=o+Ls-1; n++,n_+=GAMMA) {
    xw[n]=rs_[n_];
    for (k=1; k<=P; k++) xw[n]+=aw[1][k]*xw[n-k];

    el[n-o]=xw[n]-yw[n];

    sqrs[n-o]=rs_[n_]*rs_[n_];
}
for (k=0; k<=Ls-1; k++) {
    Relh[k]=el[k];
    for (n=1; n<=Ls-1-k; n++) Relh[k]+=el[n+k]*h__[n];
}
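The array Relh holds, for every pulse position k, the correlation between the partial error signal and the impulse response placed at k. A hypothetical standalone version:

```c
#include <assert.h>

/* Hypothetical helper: rq[k] = sum_{n=0}^{ls-1-k} e[n+k]*q[n], the
   correlation between target e and the response q starting at position k. */
void correlate_with_shifted_response(const double *e, const double *q, int ls,
                                     double *rq)
{
    assert(e && q && rq && ls >= 1);
    for (int k = 0; k < ls; k++) {
        rq[k] = 0.0;
        for (int n = 0; n <= ls - 1 - k; n++)
            rq[k] += e[n + k] * q[n];
    }
}
```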

/* Preparation for the selective innovation gain clipping and the selective
innovation search: */
if (voiced[0])
    absg1max=nu*absrsmax;
else {
    Ers=0.;
    for (n=0; n<=B-1; n++)
        Ers+=sqrs[n];

    Ersmax=Ers_=Ers;
    for (; n<=Ls-1; n++) {
        Ers+=sqrs[n];
        Ers_+=sqrs[n]-sqrs[n-B];
        if (Ers_>Ersmax) Ersmax=Ers_;
    }

    if (Ers>=Ersthr_&&Ls*Ersmax/(B*Ers)>=ROPrsthr_)
        bang=1;
    else bang=0;
}
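The onset test slides a window of B samples, one sample at a time, over the squared modified residual and compares the power of the most energetic window with the power of the whole subframe. A sketch with a hypothetical function name:

```c
#include <assert.h>

/* Hypothetical helper: onset ("bang") test on squared samples sq[0..ls-1]. */
int detect_onset(const double *sq, int ls, int B, double e_thr, double ro_thr)
{
    assert(sq && B >= 1 && B <= ls);
    double e = 0.0;
    for (int n = 0; n < B; n++) e += sq[n];
    double win = e, win_max = e;
    for (int n = B; n < ls; n++) {
        e += sq[n];                       /* energy of the whole subframe */
        win += sq[n] - sq[n - B];         /* slide the window by one sample */
        if (win > win_max) win_max = win;
    }
    /* (win_max/B) / (e/ls) >= ro_thr, evaluated only if e >= e_thr */
    return e >= e_thr && ls * win_max / (B * e) >= ro_thr;
}
```

A burst concentrated in two of eight samples gives a power ratio of 4, whereas a flat subframe gives 1.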
/* Selective search for the innovation parameters with selective
innovation gain clipping: */

n1[1]=n2[1]=0;
g1[1]=g2[1]=0.;
DELTAEelemax=-DBL_MAX;
if (voiced[0]||bang) for (n1_=0,i=1; n1_<=Ls-1; n1_++,i++) {
    Eu=Eh[n1_];
    Relu=Relh[n1_];

    g1_=Relu/Eu;

    DELTAEele=g1_*Relu;
    if (DELTAEele>DELTAEelemax) {
        n1[1]=n1_;
        is[1]=i;
        g1[1]=g1_;
        DELTAEelemax=DELTAEele;
    }
}
else i=Ls+1;
if (voiced[0])
    if (g1[1]<-absg1max) g1[1]=-absg1max;
    else if (g1[1]>absg1max) g1[1]=absg1max;
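For a single pulse at position m, the optimal gain is g = R/E and the resulting reduction of the error energy is g*R = R^2/E, which is the quantity maximized above. A hypothetical standalone version, with the optional gain clipping enabled when `gmax > 0`:

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: best single-pulse position given rq[m] (correlation)
   and eq[m] (truncated response energy); gain optionally clipped. */
int best_single_pulse(const double *rq, const double *eq, int ls,
                      double gmax, double *gain)
{
    assert(rq && eq && gain && ls >= 1);
    int best = 0;
    double dmax = -DBL_MAX;
    *gain = 0.0;
    for (int m = 0; m < ls; m++) {
        double g = rq[m] / eq[m];
        double reduction = g * rq[m];     /* = rq[m]^2 / eq[m] */
        if (reduction > dmax) {
            dmax = reduction;
            best = m;
            *gain = g;
        }
    }
    if (gmax > 0.0) {
        if (*gain < -gmax) *gain = -gmax;
        else if (*gain > gmax) *gain = gmax;
    }
    return best;
}
```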

for (j=1,k=1; j<=Nik; j++)
    for (n1_=0,n2_=n2key[j]; n2_<=Ls-1;
         n1_++,n2_++,i++,k++) {
        Eu=Eh[n1_]+Eh[n2_]-ro[k];
        Relu=Relh[n1_]-Relh[n2_];

        g1_=Relu/Eu;

        DELTAEele=g1_*Relu;
        if (DELTAEele>DELTAEelemax) {
            n1[1]=n1_;
            n2[1]=n2_;
            is[1]=i;
            g1[1]=g1_;
            g2[1]=-g1_;
            DELTAEelemax=DELTAEele;
        }
        i++;

        Eu=Eh[n1_]+Eh[n2_]+ro[k];
        Relu=Relh[n1_]+Relh[n2_];

        g1_=Relu/Eu;

        DELTAEele=g1_*Relu;
        if (DELTAEele>DELTAEelemax) {
            n1[1]=n1_;
            n2[1]=n2_;
            is[1]=i;
            g1[1]=g1_;
            g2[1]=g1_;
            DELTAEelemax=DELTAEele;
        }
    }
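The two-pulse search can be sketched as follows. For clarity the cross term between the two shifted responses is computed directly, whereas the listing precomputes it recursively in `ro[]`; the key-word spacings `sep[]` and all other names are hypothetical. Each pulse pair is tried with signs (+,-) and (+,+), as in the listing.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical two-pulse search: pulses at a and b = a + sep[j], shifted
   together until b reaches the end of the subframe. Returns the best
   error-energy reduction and the corresponding positions and gains. */
double best_pulse_pair(const double *q, const double *eq, const double *rq,
                       int ls, const int *sep, int nkey,
                       int *n1, int *n2, double *g1, double *g2)
{
    assert(q && eq && rq && sep && n1 && n2 && g1 && g2 && ls >= 2);
    double best = -DBL_MAX;
    for (int j = 0; j < nkey; j++) {
        for (int a = 0, b = sep[j]; b < ls; a++, b++) {
            double cross = 0.0;              /* <q(.-a), q(.-b)> in subframe */
            for (int n = 0; n <= ls - 1 - b; n++)
                cross += q[n + sep[j]] * q[n];
            for (int s = -1; s <= 1; s += 2) {   /* sign of second pulse */
                double eu = eq[a] + eq[b] + 2.0 * s * cross;
                double ru = rq[a] + s * rq[b];
                double g = ru / eu;
                if (g * ru > best) {
                    best = g * ru;
                    *n1 = a; *n2 = b; *g1 = g; *g2 = s * g;
                }
            }
        }
    }
    return best;
}
```

With an identity response (q = {1, 0, 0, 0}), unit energies and correlations {0, 2, 0, -2}, the best word places opposite-sign pulses at positions 1 and 3 with gain 2.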

Claims (47)

1. A method of coding/decoding speech signals, including, in a coding step, the operations of:
- sampling the original speech signals at a first sampling rate and dividing the resulting sequence of samples [x(n)] into a plurality of blocks of subsequent samples, each block comprising a first predetermined number Ls of samples or an integer multiple of said first number;
- performing a short-term analysis of the original speech signal to determine a group of linear prediction coefficients (ai) to be used for a linear prediction filtering, a short-term synthesis filtering and a spectral weighting filtering, generating a representation of said coefficients in the frequency domain, and inserting into the coded signal information [j(?)] related to the value of said representation, said information being valid for a period equal to the duration of a block or of a group of consecutive blocks of samples;
- obtaining, through said linear prediction filtering, a short-term residual signal [rs(n)] for said block or group of blocks of samples;
- subjecting said residual signal [rs(n)] to a long-term analysis, to determine long-term analysis parameters comprising a long-term synthesis filtering delay d and coefficient b, and inserting into the coded signal information [j(d), j(b)] related to the values of said parameters, said information being valid for a time equal to the duration of a block or a group of consecutive blocks of samples;
- reproducing every block of speech signal samples to be coded with a reconstructed and weighted speech signal [yw(n)], obtained by subjecting to long-term synthesis filtering, short-term synthesis filtering and spectral weighting filtering an excitation signal chosen within a set of excitation signals, each comprising an amplitude contribution (excitation gain) and a shape contribution (innovation), the latter being composed of a limited number of pulses, much less than said first number of samples, with predefined positions and amplitudes belonging to a respective finite set;
- subjecting a set of samples of said residual signal [rs(n)] to a time shift by discrete steps, each set of residual signal samples having a number of samples equal to the number of samples in a block of speech signal samples to be coded, to align in time the residual signal with a reconstructed residual signal [ss(n)] obtained as result of the short-term synthesis filtering of an excitation signal, the shift generating a modified residual signal [rm(n)] that is subjected to a short-term synthesis filtering and to a spectral weighting filtering, identical to those carried out for the excitation signals, to generate a reconstructed and weighted modified speech signal [xw(n)];
- determining an optimal excitation signal for each block of samples, by minimizing the energy of a weighted error signal [e(n)] represented by the difference between the reconstructed and weighted modified signal [xw(n)] and the reconstructed and weighted signal [yw(n)], and inserting into the coded signal information [j(s), j(gmax), j(gnor), .sigma.] that identifies the optimal excitation signal;
wherein:
- the innovation pulses are the only non-null samples of words composed of said first number Ls of samples;
- the innovation words for a first subset of excitation signals include a pair of pulses, a limited group of words of the first set being key-words in which the two pulses are placed in predetermined key positions, and the other words in the subset are obtained from each of the key-words by simultaneously shifting the pulses by one position at a time towards a word end, until one of the pulses reaches said end or the key position of the other pulse in the starting word, the shifting direction being the same for all words; and
- the innovation words for a second subset of excitation signals include only one pulse whose position is different for each signal;
and in that for said determination of the optimal excitation signal the energy of said weighted error signal is directly computed, by exploiting a pulse response Q(n) of filters that carry out synthesis and spectral weighting filterings of the excitation signal, with the following operations:
- determining said pulse response Q(n) and an energy Eq thereof for each of the possible pulse positions in excitation signals;
- determining a first partial error signal [e1(n)], represented by the difference between the reconstructed and weighted signal [xw(n)] and a contribution [yw1(n)] of the excitation signal filtering memory, and an energy of the same error signal;
- determining a first correlation R(e1q) between said first partial error signal [e1(n)] and the pulse response Q(n) for each of the pulses of an excitation signal;
- determining for each excitation signal, starting from said pulse responses, a signal [u(n)] representative of a contribution of the filtering with null initial conditions of the excitation signal;
- determining the energy E(u) of said signal [u(n)] representative of the contribution of a filtering with null initial conditions of the excitation signal and a second correlation R(e1u) between said signal [u(n)] representative of the contribution of the filtering with null initial conditions of the excitation signal and the first partial error signal [e1(n)];
- determining, for each excitation signal, an optimal value of the amplitude contribution as ratio between said second correlation and the energy of the signal resulting from filtering at null initial conditions;
- computing, as function of said second correlation R(e1u), of said energy Eu of the signal representative of the contribution of the filtering with null initial conditions of excitation and of said energy E(e1) of the first partial error signal, the value of error signal energy for each excitation signal.
2. A method according to claim 1, wherein said pulses have unitary amplitude.
3. A method according to claim 1, wherein the sequence of speech signal samples is divided into frames that are composed of a plurality of consecutive subframes each corresponding to one of said blocks and include a second predetermined number Lf of samples, wherein said short-term analysis is carried out for each frame, and wherein for said short-term analysis in a frame a sample window is analysed, whose length is Lf+P (P = number of linear prediction coefficients in each group), that encompasses a current frame and the subsequent frame and also includes a predefined number H+K of samples of said subsequent frame, said window being a trapezoidal window that weights all samples with maximum weight, apart from the first and the last P samples, for which the weighting factors are determined through linear interpolation between a minimum weight and the maximum weight.
4. A method according to claim 3, wherein for the initial subframes of each frame, the linear prediction coefficients ai are coefficients obtained as result of an interpolation between the values provided by short-term analysis for the current frame and those provided for the previous frame, the interpolation being carried out by operating on said representation.
5. A method according to claim 1, wherein the linear prediction residual is subjected to low-pass filtering before long-term analysis, thereby providing a filtered residual signal [rf(n)].
6. A method according to claim 2, wherein the linear prediction residual is subjected to low-pass filtering before long-term analysis, thereby providing a filtered residual signal [rf(n)].
7. A method according to claim 3, wherein the linear prediction residual is subjected to low-pass filtering before long-term analysis, thereby providing a filtered residual signal [rf(n)].
8. A method according to claim 4, wherein the linear prediction residual is subjected to low-pass filtering before long-term analysis, thereby providing a filtered residual signal [rf(n)].
9. A method according to claim 1, wherein the sequence of speech signal samples is divided into frames that are composed of a plurality of consecutive subframes each corresponding to one of said blocks and include a second predetermined number Lf of samples, wherein said long-term analysis is carried out for each frame, and wherein to determine said long-term analysis parameters, a sample window of the filtered residual signal [rf(n)] is analysed, that encompasses a current frame and the subsequent frame and also includes a predefined number H+K of samples of said subsequent frame.
10. A method according to claim 9, wherein said long-term analysis further includes the operation of determining, for each frame, a long-term prediction gain G, representative of the ratio between energies of filtered residual signal at the input of, and the output from, means that carry out said analysis, the gain being also determined at each frame.
11. A method according to claim 9, wherein said long-term analysis further includes the operations of:
- classifying a speech signal segment corresponding to a frame as voiced or unvoiced, depending on the value of said long-term analysis coefficient b and on prediction gain G, and generating a first flag (V) in case the segment is classified as voiced;
- comparing values of long-term analysis delay d and coefficient b related to a current frame with those related to a previous frame and generating, when delay variation is less than a predefined amount and coefficient values in both frames are positive, a second flag (F) that enables interpolation between delay and coefficient values computed for said previous frame and those computed for the current frame.
12. A method according to claim 10, wherein said long-term analysis further includes the operations of:
- classifying a speech signal segment corresponding to a frame as voiced or unvoiced, depending on the value of said long-term analysis coefficient b and on prediction gain G, and generating a first flag (V) in case the segment is classified as voiced;

- comparing values of long-term analysis delay d and coefficient b related to a current frame with those related to a previous frame and generating, when delay variation is less than a predefined amount and coefficient values in both frames are positive, a second flag (F) that enables interpolation between delay and coefficient values computed for said previous frame and those computed for the current frame.
13. A method according to claim 9, 10, 11 or 12, wherein long-term analysis delay d is determined as the maximum of the autocorrelation function of the filtered residual within the window used for the analysis itself, wherein, before determining long-term analysis coefficient b and prediction gain G for the current frame, the local maximum of said autocorrelation function is determined even in a neighborhood of the maximum of the same function in the previous frame, if said first and second flags had been generated in said previous frame, and said local maximum is used as delay for current frame if it is different by an amount that is less than a predefined value from the maximum in the window related to current frame.
14. A method according to claim 9, 10, 11 or 12, wherein the value of long-term analysis coefficient b is clipped to a first maximum value b1, linked to the ratio between energy of the filtered residual signal in the current frame and in the previous frame in an interval whose length is equal to the long-term analysis delay.
15. A method according to claim 9, 10, 11 or 12, wherein the value of long-term analysis coefficient b is clipped to a second maximum value b2, if it exceeds such value while the prediction gain G is less than a gain threshold Gthr.
16. A method according to claim 11, wherein said interpolation of long-term analysis delay d and coefficient b is a linear interpolation extended over a whole frame and, in case of a non-integer interpolated delay value, the value of a corresponding sample of the reconstructed residual signal ss(n) is evaluated with a second-order polynomial interpolation centered around the integer delay value that is nearest to said interpolated value.
17. A method according to claim 12, wherein said interpolation of long-term analysis delay d and coefficient b is a linear interpolation extended over a whole frame and, in case of a non-integer interpolated delay value, the value of a corresponding sample of the reconstructed residual signal ss(n) is evaluated with a second-order polynomial interpolation centered around the integer delay value that is nearest to said interpolated value.
18. A method according to claim 9, 10, 11, 12, 16 or 17, wherein information related to long-term analysis coefficient b inserted in the coded signal are indexes representative of quantized coefficient values, and information related to long-term analysis delay d allow representing also delay values that are outside an interval of allowed delays, wherein coefficient values that are less than a predefined fraction of a minimum quantized value are forced to 0 and, in case of forcing to 0, delay information representative of a value that is outside said interval of allowed delays and the index representative of said minimum quantized value, are inserted in the coded signal.
19. A method according to claim 11, wherein, to determine the optimal excitation, excitation signals of said second subset are used if said first flag (V) has been generated or, if said flag has not been generated, if analysis of the energy distribution in the modified residual signal shows an energy concentration in short times, that indicates the onset of a voiced sound.
20. A method according to claim 12, wherein, to determine the optimal excitation, excitation signals of said second subset are used if said first flag (V) has been generated or, if said flag has not been generated, if analysis of the energy distribution in the modified residual signal shows an energy concentration in short times, that indicates the onset of a voiced sound.
21. A method according to claim 19, wherein, to determine the optimal excitation, the excitation signals of the two subsets are normalized with different normalization factors, linked to the number of pulses present in respective subset signals.
22. A method according to claim 20, wherein, to determine the optimal excitation, the excitation signals of the two subsets are normalized with different normalization factors, linked to the number of pulses present in respective subset signals.
23. A method according to claim 19, wherein, if said first flag (V) has been generated, the amplitude contribution for excitation signals of said second subset is limited in such a way as not to exceed a threshold that is proportional to the absolute value of the residual signal.
24. A method according to claim 20, wherein, if said first flag (V) has been generated, the amplitude contribution for excitation signals of said second subset is limited in such a way as not to exceed a threshold that is proportional to the absolute value of the residual signal.
25. A method according to claim 19, wherein said analysis of the energy distribution of the modified residual signal is carried out at each subframe and includes the operations of:
- dividing the subframe into a plurality of partially overlapping windows, a first and a last window coinciding with a respective initial or final part of the subframe, the windows following the first one being each shifted by one sample with respect to the previous window;
- determining the energy and the power of the modified residual signal in the whole subframe and the energy in each one of said windows;
- determining the power for the window whose energy is maximum and determining the ratio between the power in said window and the power in the subframe; and
- comparing said maximum energy and said power ratio with respective thresholds, said energy concentration being recognized if said maximum energy and said ratio are not less than respective thresholds.
26. A method according to claim 20, wherein said analysis of the energy distribution of the modified residual signal is carried out at each subframe and includes the operations of:
- dividing the subframe into a plurality of partially overlapping windows, a first and a last window coinciding with a respective initial or final part of the subframe, the windows following the first one being each shifted by one sample with respect to the previous window;
- determining the energy and the power of the modified residual signal in the whole subframe and the energy in each one of said windows;

- determining the power for the window whose energy is maximum and determining the ratio between the power in said window and the power in the subframe; and
- comparing said maximum energy and said power ratio with respective thresholds, said energy concentration being recognized if said maximum energy and said ratio are not less than respective thresholds.
27. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum.
28. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, and wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values.
29. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, and wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift.
30. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein said analysis of the modified residual signal energy includes the operations of:
- comparing the energy itself with an energy threshold, which, when reached, shows that the corresponding speech signal segment is not silence;
- determining the modified residual signal power in the subframe and in an interval whose length is equal to the long-term analysis delay, and the ratio between such powers; and
- comparing such ratio with a power threshold, which, when exceeded, shows the presence of a pitch peak in the subframe.
31. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein the shift for a subframe is determined, before determining an optimal excitation signal, within an interval that extends around the shift accumulated in previous subframes of the same frame, and it is the value that minimizes energy of said first partial error signal [e1(n)].
32. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein said analysis of the modified residual signal energy includes the operations of:
- comparing the energy itself with an energy threshold, which, when reached, shows that the corresponding speech signal segment is not silence;
- determining the modified residual signal power in the subframe and in an interval whose length is equal to the long-term analysis delay, and the ratio between such powers; and
- comparing such ratio with a power threshold, which, when exceeded, shows the presence of a pitch peak in the subframe,
and wherein the shift for a subframe is determined, before determining an optimal excitation signal, within an interval that extends around the shift accumulated in previous subframes of the same frame, and it is the value that minimizes energy of said first partial error signal [e1(n)].
33. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein to determine the shift, an upsampling of the residual signal is carried out, at a second rate that is a multiple of the first rate, the shift in a subframe being equal to one or more samples of the upsampled residual signal.
34. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein said analysis of the modified residual signal energy includes the operations of:
- comparing the energy itself with an energy threshold, which, when reached, shows that the corresponding speech signal segment is not silence;
- determining the modified residual signal power in the subframe and in an interval whose length is equal to the long-term analysis delay, and the ratio between such powers; and
- comparing such ratio with a power threshold, which, when exceeded, shows the presence of a pitch peak in the subframe.
wherein the shift for a subframe is determined, before determining an optimal excitation signal, within an interval that extends around the shift accumulated in previous subframes of the same frame, and it is the value that minimizes the energy of said first partial error signal [e1(n)], and wherein said first partial error signal is computed as the sum of a signal [xw2(n)] representative of the modified residual signal filtered with null initial conditions and a second partial error signal [e0(n)], which is the difference between the memory contribution [xw1(n)] of the modified residual signal filtering and the memory contribution [yw1(n)] of the excitation filtering, the signal [xw2(n)] representative of the modified residual filtered with null initial conditions related to a sample in a subframe being obtained by carrying out the actual filtering of the modified residual signal for shift values between the upper end of the interval and an intermediate value between the two extreme values, while for each of the remaining shifts in the interval it is iteratively obtained from the value related to the previous sample and from said pulse response.
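The silence and pitch-peak test of claim 34 might look like the following minimal sketch. The claim does not say where the interval of length d lies or which power is the numerator, so two choices here are assumptions: the reference interval is taken to be the d samples ending at the subframe end, and the ratio is taken as subframe power over reference-interval power. The function name `shift_enabled` is hypothetical.

```python
import numpy as np

def shift_enabled(res, start, L, d, energy_thr, ratio_thr, eps=1e-12):
    """Hypothetical silence / pitch-peak test on the modified residual.

    res        : modified residual signal (1-D array)
    start, L   : first sample and length of the subframe
    d          : long-term analysis delay (length of the reference interval)
    energy_thr : energy threshold; reaching it means "not silence"
    ratio_thr  : power-ratio threshold; exceeding it means "pitch peak"
    """
    sub = np.asarray(res[start:start + L], dtype=float)
    energy = float(np.dot(sub, sub))
    if energy < energy_thr:              # silence: no time shift
        return False
    # Assumption: the reference interval is the d samples ending at the
    # end of the subframe.
    ref = np.asarray(res[max(0, start + L - d):start + L], dtype=float)
    p_sub = energy / L
    p_ref = float(np.dot(ref, ref)) / len(ref)
    # Assumption: ratio = subframe power / reference-interval power.
    return p_sub / (p_ref + eps) > ratio_thr
```

A single large pulse inside the subframe gives a ratio well above 1. A flat residual gives a ratio of about 1, which reads as "no pitch peak" for any threshold above 1.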
35. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein to determine the shift, an upsampling of the residual signal is carried out, at a second rate that is a multiple of the first rate, the shift in a subframe being equal to one or more samples of the upsampled residual signal, and wherein said first partial error signal is computed as the sum of a signal [xw2(n)] representative of the modified residual signal filtered with null initial conditions and a second partial error signal [e0(n)], which is the difference between the memory contribution [xw1(n)] of the modified residual signal filtering and the memory contribution [yw1(n)] of the excitation filtering, the signal [xw2(n)] representative of the modified residual filtered with null initial conditions related to a sample in a subframe being obtained by carrying out the actual filtering of the modified residual signal for shift values between the upper end of the interval and an intermediate value between the two extreme values, while for each of the remaining shifts in the interval it is iteratively obtained from the value related to the previous sample and from said pulse response.
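The shift search with the iterative zero-state computation can be sketched in Python as follows. The sketch assumes that the modified residual at the first rate for a shift `s` (in upsampled samples) is obtained by taking every `U`-th upsampled sample starting at `base + s`. Under that assumption, the window for shift `s` is the window for `s + U` delayed by one sample. This gives the recursion xw2_s(n) = xw2_{s+U}(n-1) + h(n)·x_s(0), so only the top `U` shifts need actual filtering. Treating those top `U` shifts as the claim's "intermediate value" is an interpretation. `search_shift`, `zero_state` and their parameters are hypothetical names.

```python
import numpy as np

def zero_state(x, h):
    """Weighted synthesis filtering with null initial conditions, via the
    impulse response h (len(h) >= len(x) makes the result exact)."""
    return np.convolve(x, h)[:len(x)]

def search_shift(r_up, base, shifts, U, h, e0):
    """Pick the shift minimizing the energy of e1(n) = xw2(n) + e0(n).

    r_up   : upsampled residual, upsampling factor U
    base   : index in r_up of the unshifted subframe start
    shifts : contiguous candidate shifts, in upsampled samples
    h      : impulse response of the weighted synthesis filter (len >= L)
    e0     : second partial error signal (memory contributions), length L
    """
    L = len(e0)

    def window(s):
        # modified residual at the first rate for shift s
        return r_up[base + s: base + s + U * L: U]

    xw2 = {}
    for s in sorted(shifts, reverse=True):     # from the upper end down
        if s + U in xw2:
            # Recursion: xw2_s(n) = xw2_{s+U}(n-1) + h(n) * x_s(0)
            y = np.concatenate(([0.0], xw2[s + U][:-1]))
            xw2[s] = y + h[:L] * window(s)[0]
        else:
            xw2[s] = zero_state(window(s), h)  # actual filtering
    energies = {s: float(np.sum((xw2[s] + e0) ** 2)) for s in shifts}
    best = min(energies, key=energies.get)
    return best, energies[best]
```

The recursion replaces an L-tap convolution per shift with one shift-and-add. This is the same trick commonly used for overlapping adaptive-codebook searches.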
36. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein said analysis of the modified residual signal energy includes the operations of:
- comparing the energy itself with an energy threshold, which, when reached, shows that the corresponding speech signal segment is not silence;
- determining the modified residual signal power in the subframe and in an interval whose length is equal to the long-term analysis delay, and the ratio between such powers; and
- comparing such ratio with a power threshold, which, when exceeded, shows the presence of a pitch peak in the subframe.
wherein the shift for a subframe is determined, before determining an optimal excitation signal, within an interval that extends around the shift accumulated in previous subframes of the same frame, and it is the value that minimizes the energy of said first partial error signal [e1(n)], and wherein said first partial error signal is computed as the sum of a signal [xw2(n)] representative of the modified residual signal filtered with null initial conditions and a second partial error signal [e0(n)], which is the difference between the memory contribution [xw1(n)] of the modified residual signal filtering and the memory contribution [yw1(n)] of the excitation filtering, the signal [xw2(n)] representative of the modified residual filtered with null initial conditions related to a sample in a subframe being obtained by carrying out the actual filtering of the modified residual signal for shift values between the upper end of the interval and an intermediate value between the two extreme values, while for each of the remaining shifts in the interval it is iteratively obtained from the value related to the previous sample and from said pulse response, and wherein the determination of said interval of shift values is carried out through the following operations:
- fixing for the interval ends two symmetrical values with respect to the accumulated value;
- determining the residual signal peak position in the upsampled residual signal and comparing it with the peak position in the previous subframe;
- limiting the interval extension on one or both sides of the accumulated value to avoid an excessive shift of the subframe in the past and/or in the future, with consequent duplication or loss of residual signal peaks.
37. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein to determine the shift, an upsampling of the residual signal is carried out, at a second rate that is a multiple of the first rate, the shift in a subframe being equal to one or more samples of the upsampled residual signal, and wherein said first partial error signal is computed as the sum of a signal [xw2(n)] representative of the modified residual signal filtered with null initial conditions and a second partial error signal [e0(n)], which is the difference between the memory contribution [xw1(n)] of the modified residual signal filtering and the memory contribution [yw1(n)] of the excitation filtering, the signal [xw2(n)] representative of the modified residual filtered with null initial conditions related to a sample in a subframe being obtained by carrying out the actual filtering of the modified residual signal for shift values between the upper end of the interval and an intermediate value between the two extreme values, while for each of the remaining shifts in the interval it is iteratively obtained from the value related to the previous sample and from said pulse response.
38. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein to determine the shift, an upsampling of the residual signal is carried out, at a second rate that is a multiple of the first rate, the shift in a subframe being equal to one or more samples of the upsampled residual signal, and wherein said first partial error signal is computed as the sum of a signal [xw2(n)] representative of the modified residual signal filtered with null initial conditions and a second partial error signal [e0(n)], which is the difference between the memory contribution [xw1(n)] of the modified residual signal filtering and the memory contribution [yw1(n)] of the excitation filtering, the signal [xw2(n)] representative of the modified residual filtered with null initial conditions related to a sample in a subframe being obtained by carrying out the actual filtering of the modified residual signal for shift values between the upper end of the interval and an intermediate value between the two extreme values, while for each of the remaining shifts in the interval it is iteratively obtained from the value related to the previous sample and from said pulse response, and wherein the determination of said interval of shift values is carried out through the following operations:
- fixing for the interval ends two symmetrical values with respect to the accumulated value;
- determining the residual signal peak position in the upsampled residual signal and comparing it with the peak position in the previous subframe;
- limiting the interval extension on one or both sides of the accumulated value to avoid an excessive shift of the subframe in the past and/or in the future, with consequent duplication or loss of residual signal peaks.
39. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein said analysis of the modified residual signal energy includes the operations of:
- comparing the energy itself with an energy threshold, which, when reached, shows that the corresponding speech signal segment is not silence;
- determining the modified residual signal power in the subframe and in an interval whose length is equal to the long-term analysis delay, and the ratio between such powers; and
- comparing such ratio with a power threshold, which, when exceeded, shows the presence of a pitch peak in the subframe;
wherein the shift for a subframe is determined, before determining an optimal excitation signal, within an interval that extends around the shift accumulated in previous subframes of the same frame, and it is the value that minimizes the energy of said first partial error signal [e1(n)], and wherein said first partial error signal is computed as the sum of a signal [xw2(n)] representative of the modified residual signal filtered with null initial conditions and a second partial error signal [e0(n)], which is the difference between the memory contribution [xw1(n)] of the modified residual signal filtering and the memory contribution [yw1(n)] of the excitation filtering, the signal [xw2(n)] representative of the modified residual filtered with null initial conditions related to a sample in a subframe being obtained by carrying out the actual filtering of the modified residual signal for shift values between the upper end of the interval and an intermediate value between the two extreme values, while for each of the remaining shifts in the interval it is iteratively obtained from the value related to the previous sample and from said pulse response, and wherein the determination of said interval of shift values is carried out through the following operations:
- fixing for the interval ends two symmetrical values with respect to the accumulated value;
- determining the residual signal peak position in the upsampled residual signal and comparing it with the peak position in the previous subframe;
- limiting the interval extension on one or both sides of the accumulated value to avoid an excessive shift of the subframe in the past and/or in the future, with consequent duplication or loss of residual signal peaks;
and wherein, in case of interval limitation on one side only of the accumulated value, the search for the shift also takes into account a certain number of values beyond the interval end not affected by the limitation, such that the total number of tested values is equal to the number of values included between said symmetrical values.
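The interval construction of claims 36 and 39 can be sketched as follows. The function starts from a symmetrical interval around the accumulated shift and applies optional one-sided limits. When only one side is limited, it extends the other end so that the number of tested values stays the same. The limits `max_past` and `max_future` are hypothetical inputs standing in for the result of the peak-position comparison, which the claims do not detail. Mapping "past" to negative shifts is also an assumption.

```python
def candidate_shifts(acc, half_width, max_past=None, max_future=None):
    """Hypothetical sketch of the shift interval of claims 36 and 39.

    acc        : shift accumulated in previous subframes of the frame
    half_width : half-width of the symmetrical interval around acc
    max_past   : assumed limit on how far below acc the search may go
    max_future : assumed limit on how far above acc the search may go
    """
    n_total = 2 * half_width + 1                  # symmetrical interval size
    lo = acc - half_width if max_past is None else acc - min(half_width, max_past)
    hi = acc + half_width if max_future is None else acc + min(half_width, max_future)
    lo_limited = lo > acc - half_width
    hi_limited = hi < acc + half_width
    if lo_limited and not hi_limited:
        hi = lo + n_total - 1                     # extend past the unlimited end
    elif hi_limited and not lo_limited:
        lo = hi - n_total + 1
    return list(range(lo, hi + 1))
```

When both sides are limited, the sketch simply tests the reduced interval. It does not enforce the per-frame maximum total shift, which a full implementation would also have to check.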
40. A method according to claim 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, wherein, if only the second flag (F) has been generated, long-term analysis delay d is varied by an amount that is proportional to the magnitude of the shift accumulated up to the previous frame, the absolute value of the variation being limited to a predefined maximum, wherein said delay variation is disabled if it causes the decision about interpolation to be altered and the delay to go out of a predetermined interval of values, wherein the residual signal is subjected to said time shift in a subframe if at least one of said first and second flags has been generated and if an analysis of the modified residual signal energy in the subframe shows that the corresponding speech signal segment is not silence and includes a pitch peak, the shift related to a subframe being accumulated with that of the previous subframes of the same frame, so that the total shift in a frame remains less than a maximum shift, and wherein to determine the shift, an upsampling of the residual signal is carried out, at a second rate that is a multiple of the first rate, the shift in a subframe being equal to one or more samples of the upsampled residual signal, wherein said first partial error signal is computed as the sum of a signal [xw2(n)] representative of the modified residual signal filtered with null initial conditions and a second partial error signal [e0(n)], which is the difference between the memory contribution [xw1(n)] of the modified residual signal filtering and the memory contribution [yw1(n)] of the excitation filtering, the signal [xw2(n)] representative of the modified residual filtered with null initial conditions related to a sample in a subframe being obtained by carrying out the actual filtering of the modified residual signal for shift values between the upper end of the interval and an intermediate value between the two extreme values, while for each of the remaining shifts in the interval it is iteratively obtained from the value related to the previous sample and from said pulse response, and wherein, in case of interval limitation on one side only of the accumulated value, the search for the shift also takes into account a certain number of values beyond the interval end not affected by the limitation, such that the total number of tested values is equal to the number of values included between said symmetrical values.
41. A method according to claim 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or 26, and further comprising a decoding step where, starting from the information [j(Φ), j(d), j(b), j(s), j(gnor), j(gmax), σ] about the linear prediction coefficient representation, the long-term analysis parameters and the excitation signal, said representation is reconstructed, reconstructed linear prediction coefficients are obtained therefrom, the long-term analysis parameters are reconstructed, an excitation signal is chosen in a set of excitation signals corresponding to the one used in the coding step, and said signal is subjected to a short-term and a long-term synthesis filtering, identical to the ones carried out in the coding step, by using reconstructed linear prediction coefficients ai and long-term analysis delay d and coefficient b, to generate a reconstructed block of speech signal samples [y(n)] for each excitation signal [s(n)], wherein every block of reconstructed speech signal [y(n)], during the initial part of a validity period of linear prediction coefficients, is generated by carrying out the short-term synthesis filtering with reconstructed linear prediction coefficients ai obtained as result of an interpolation between reconstructed values related to an immediately previous validity period and reconstructed values related to the current period, and in that the values of long-term analysis delay d and coefficient b, related to two consecutive validity periods, are compared and, if the delay variation is less than a predefined amount and the coefficient is positive in both periods, a flag corresponding to that second flag is generated, to enable carrying out, during long-term synthesis filtering, an interpolation between the long-term analysis parameter values related to said two validity periods.
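On the decoding side, claim 41 generates the interpolation flag from two consecutive sets of long-term parameters. The sketch below shows that flag test and a plain long-term synthesis filter. It assumes an integer delay and the common form 1 / (1 - b·z^-d). The claim does not specify the filter form, and any fractional-delay handling is left out. Both function names are hypothetical.

```python
def ltp_interp_flag(d_prev, b_prev, d_cur, b_cur, max_delay_change):
    """Flag enabling long-term parameter interpolation (claim 41):
    small delay change and positive coefficient in both periods."""
    return abs(d_cur - d_prev) < max_delay_change and b_prev > 0 and b_cur > 0


def ltp_synthesis(exc, b, d, history):
    """Long-term synthesis 1 / (1 - b z^-d) with integer delay d (assumed form).

    history : past output samples (at least d of them), oldest first
    Returns the output block and the updated history (same length).
    """
    buf = list(history)
    out = []
    for e in exc:
        y = e + b * buf[-d]          # buf[-d] is the output d samples ago
        buf.append(y)
        out.append(y)
    return out, buf[-len(history):]
```

Carrying `history` across blocks keeps the filter memory continuous, which the decoder needs so that its long-term synthesis stays identical to the coder's.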
42. An apparatus for coding/decoding speech signals using analysis-by-synthesis techniques, including a coder composed of:
- means (MT) for sampling a speech signal at a first rate and dividing the sample sequence into blocks comprising a first number of samples;
- short-term analysis means (STA, STR1) for computing a group of linear prediction coefficients ai for one or more blocks of samples, for transforming said coefficients into a representation thereof in the frequency domain, for obtaining from said representation indexes j(Φ) identifying the coefficients themselves, to be inserted into the coded signal, and for reconstructing the coefficients starting from said indexes, every group of linear prediction coefficients being valid for a period of time equal to the duration of one or more blocks of samples;
- a linear prediction filter (LPC) that receives blocks of signal samples from the sampling means (MT) and linear prediction coefficients ai from the short-term analysis means (STA, STR1) and generates a short-term prediction residual signal rs(n);
- long-term analysis means (LTA, LTR1) for obtaining, from said residual signal, parameters for a long-term synthesis filtering, which parameters comprise a delay (d) and a coefficient (b), and for transforming said parameters into indexes [j(b), j(d)] to be inserted into the coded signal, the long-term analysis parameters being valid for a period of time equal to the duration of one or more blocks of samples;
- a first filtering system (LTS1, STS1, SW) that:
includes the series of a long-term synthesis filter (LTS1), that receives from the long-term analysis means (LTA, LTR1) said parameters, and of a short-term synthesis filter (STS1) and a spectral weighting filter (SW), that receive from said short-term analysis means (STA, STR1) said linear prediction coefficients ai; receives signals belonging to a set of excitation signals each including a shape contribution composed of a number of pulses, of predefined amplitudes and positions, said pulse number being much less than said first number; and generates a reconstructed signal yw(n) for each one of the excitation signals;
- means (TS) for time shifting, by discrete steps, a set of samples yw(n) of said residual signal to align it in time with a reconstructed residual signal ss(n) generated by the long-term synthesis filter (LTS1) of said first filtering system, the set of samples of residual signal having a number of samples equal to said first number of samples, every shift step being chosen within an interval of allowed values;
- a second filtering system (STS', SW'), that includes the series of a short-term synthesis filter and a spectral weighting filter identical to those (STS1, SW) of the first filtering system, is supplied with a modified residual signal generated by the time shift means for each of the values of said interval, and generates a reconstructed and weighted modified residual signal, said first and second filtering systems (LTS1, STS1, SW1, STS', SW') separately determining a contribution representative of the memory of previous filtering and a contribution representative of a filtering with null initial conditions;
- means (SM, EM) for generating a weighted error signal [e(n)] by comparing signals generated by the first and the second filtering systems, for identifying an optimal excitation signal and an optimal shift, by minimizing the energy of said weighted error signal, and for inserting in the coded signal information that identifies the optimal excitation signal;
and further comprising, at the decoding side:
- means (LTR2, STR2) for reconstructing the linear-prediction coefficients and long-term analysis parameters starting from said indexes;
- a third filtering system (LTS2, STS2), including the series of a long-term synthesis filter and a short-term synthesis filter, identical to those (LTS1, STS1) of the first filtering system, for filtering an excitation signal selected, through information related to optimal excitation, in a set corresponding to the set used on the coding side and to generate a block of reconstructed speech signal samples;
wherein:
- the innovation pulses are the only non-null samples of words composed of said first number Ls of samples;
- the innovation words for a first subset of excitation signals include a pair of pulses, a limited group of words of the first subset being key-words in which the two pulses are placed in predetermined key positions and the other words in the subset being obtained from each of the key-words by simultaneously shifting the pulses by one position at a time towards a word end, till one of the pulses reaches said end or the key position of the other pulse in the starting word, the shifting direction being the same for all words; and
- the innovation words for a second subset of excitation signals include only one pulse whose position is different for each signal;
and in that, in said error signal generating means (SM, EM), the means to minimize error energy are composed of a processing unit arranged to:
- determine said pulse response [Q(n)] and an energy (Eq) thereof for each one of the possible pulse positions in excitation signals;
- determine a first partial error signal [e1(n)], represented by the difference between the reconstructed and weighted modified signal [xw(n)] and a contribution [yw1(n)] of the excitation signal filtering memory, and an energy of the error signal itself;
- determine a first correlation [R(e1q)] between said first partial error signal [e1(n)] and the pulse response for each of the pulses of an excitation signal;
- determine, for each excitation signal, starting from said pulse responses, a signal [u(n)] representative of a contribution of the filtering with null initial conditions of the excitation signal;
- determine the energy [E(u)] of said signal [u(n)] representative of the contribution of a filtering with null initial conditions of the excitation signal and a second correlation R(e1u) between said signal [u(n)] representative of the contribution of the filtering with null initial conditions of the excitation signal and the first partial error signal [e1(n)];
- determine, for each excitation signal, an optimal value of the amplitude contribution as ratio between said second correlation and the energy of the signal resulting from filtering with null initial conditions;
- compute, as a function of said second correlation R(e1u), of said energy (Eu) of the signal representative of the contribution of the filtering with null initial conditions of the excitation and of said energy [E(e1)] of the first partial error signal, the error signal energy value for each excitation signal.
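The codebook structure and the error-energy computation of claim 42 can be combined in one minimal sketch. Words are tuples of pulse positions. Several choices are assumptions: unit amplitudes, shifting toward higher indices, and including the word in which a pulse first reaches the stopping position. The search uses the standard least-squares result for a single gain. The optimal amplitude is g = R(e1u) / E(u), and the residual error energy is E(e1) - R(e1u)² / E(u). R(e1u) is obtained from the per-pulse correlations R(e1q) by linearity. `pulse_codebook`, `pulse_responses` and `search` are hypothetical names.

```python
import numpy as np

def pulse_codebook(L, key_pairs):
    """Innovation words as tuples of pulse positions (unit amplitudes assumed)."""
    words = []
    for p1, p2 in key_pairs:                 # key-word with pulses at p1 < p2
        k_max = min(L - 1 - p2, p2 - p1)     # stop at the word end or at p2
        for k in range(k_max + 1):
            words.append((p1 + k, p2 + k))   # shift both pulses toward the end
    words += [(p,) for p in range(L)]        # single-pulse subset
    return words

def pulse_responses(h, L):
    """Row p is the zero-state response q_p(n) to a unit pulse at position p."""
    Q = np.zeros((L, L))
    for p in range(L):
        Q[p, p:] = h[:L - p]
    return Q

def search(e1, h, words):
    """Return (word, gain, error energy) minimizing |e1 - g*u|^2."""
    L = len(e1)
    Q = pulse_responses(h, L)
    R_e1q = Q @ e1                           # first correlation, per position
    E_e1 = float(e1 @ e1)
    best = None
    for w in words:
        idx = list(w)
        u = Q[idx].sum(axis=0)               # zero-state response of the word
        E_u = float(u @ u)
        if E_u == 0.0:
            continue
        R_e1u = float(R_e1q[idx].sum())      # second correlation, by linearity
        err = E_e1 - R_e1u ** 2 / E_u
        if best is None or err < best[2]:
            best = (w, R_e1u / E_u, err)
    return best
```

Precomputing `R_e1q` once per subframe is what makes the per-word correlation cheap. E(u) is computed directly here for clarity, although it could also be built from the per-pulse energies plus cross terms.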
43. An apparatus according to claim 42, wherein a low-pass filter (FPB) is provided between said linear prediction filter (LPC) and said long-term analysis means (LTA, LTR1).
44. An apparatus according to claim 42, wherein the short-term analysis means (STA, STR1) in the coder and the means (STR2) for reconstructing linear prediction coefficients in the decoder include means for carrying out, on said representation in the frequency domain, a linear interpolation between values related to two consecutive validity periods, and supply the short-term synthesis filters (STS1, STS', STS2) of said filtering systems with the interpolated values in an initial part of a validity period of a set of coefficients.
45. An apparatus according to claim 43, wherein the short-term analysis means (STA, STR1) in the coder and the means (STR2) for reconstructing linear prediction coefficients in the decoder include means for carrying out, on said representation in the frequency domain, a linear interpolation between values related to two consecutive validity periods, and supply the short-term synthesis filters (STS1, STS', STS2) of said filtering systems with the interpolated values in an initial part of a validity period of a set of coefficients.
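The linear interpolation of claims 44 and 45 can be sketched as below. It assumes that the frequency-domain representation is a vector such as line spectral frequencies (the claims do not name it), that a validity period is split into `n_sub` subframes, and that the first `n_interp` of them form the "initial part". The weighting scheme is an assumption. Because a convex combination of two increasing sequences is increasing, interpolating ordered LSF vectors keeps them ordered.

```python
def interpolated_lsf(prev_lsf, cur_lsf, n_sub, n_interp):
    """Hypothetical per-subframe coefficient vectors for one validity period.

    The first n_interp subframes use a linear interpolation between the
    previous and current vectors; the remaining subframes use the current one.
    """
    out = []
    for k in range(n_sub):
        if k < n_interp:
            w = (k + 1) / (n_interp + 1)     # weight of the current vector
            out.append([(1 - w) * p + w * c for p, c in zip(prev_lsf, cur_lsf)])
        else:
            out.append(list(cur_lsf))
    return out
```

With one interpolated subframe, the first subframe uses the midpoint of the two vectors and the rest use the current vector unchanged.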
46. An apparatus according to claim 42, 43, 44 or 45, wherein the long-term analysis means (LTA, LTR1) in the coder and the means (LTR2) for reconstructing the long-term analysis parameters in the decoder include comparing means for comparing parameters related to two consecutive validity periods and generating a flag (F) to enable carrying out an interpolation between the parameters when they satisfy predetermined conditions, and the long-term synthesis filter (LTS1, LTS2) of the first and second filtering systems is associated to means that, when said flag is present, carry out a second-order polynomial interpolation of said parameters, extended to a whole validity period thereof, and supply the respective long-term synthesis filter (LTS1, LTS2) with the interpolated parameters.
47. An apparatus according to claim 42, 43, 44 or 45, wherein the long-term analysis means (LTA, LTR1) in the coder and the means (LTR2) for reconstructing the long-term analysis parameters in the decoder include comparing means for comparing parameters related to two consecutive validity periods and generating a flag (F) to enable carrying out an interpolation between the parameters when they satisfy predetermined conditions, and the long-term synthesis filter (LTS1, LTS2) of the first and second filtering systems is associated to means that, when said flag is present, carry out a second-order polynomial interpolation of said parameters, extended to a whole validity period thereof, and supply the respective long-term synthesis filter (LTS1, LTS2) with the interpolated parameters, and wherein the time shift means (TS) include a circuit (US) for upsampling the residual signal, and storing means (SH) for storing, for each block of samples to be coded, a first group of upsampled residual signal samples corresponding to said first number Ls of samples, and two further groups of upsampled residual signal samples, respectively preceding and following said first group and including a number of samples linked to the maximum allowed shift, and for supplying the second filtering system (STS', SW'), upon command by the energy minimizing means (EM), with a fourth group of upsampled residual signal samples, including as many samples as those of the first group and shifted with respect to the first group by said optimal shift.
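The storage arrangement of claim 47 stores a core group of upsampled residual samples plus a guard group on each side. It then reads out a shifted group of the same size. This can be sketched as follows. The sketch makes three assumptions. First, linear interpolation (`numpy.interp`) stands in for the circuit US, whereas a real coder would use a proper interpolation filter. Second, the guard length is given in upsampled samples and equals the maximum allowed shift. Third, the shifted group is read back at the first rate by taking every `U`-th sample. All function names are hypothetical.

```python
import numpy as np

def upsample_linear(x, U):
    """Placeholder upsampler by an integer factor U (linear interpolation)."""
    n = np.arange(len(x))
    t = np.arange((len(x) - 1) * U + 1) / U
    return np.interp(t, n, x)

def store_groups(r_up, core_start, Ls, U, guard):
    """Keep the core group (U*Ls upsampled samples for the block) plus
    'guard' samples before and after it (guard >= maximum allowed shift)."""
    return np.asarray(r_up[core_start - guard: core_start + U * Ls + guard])

def shifted_group(buf, shift, Ls, U, guard):
    """Ls first-rate samples of the residual shifted by 'shift' upsampled samples."""
    if abs(shift) > guard:
        raise ValueError("shift exceeds the stored margin")
    start = guard + shift
    return buf[start: start + U * Ls: U]
```

For a ramp input `x = np.arange(20.0)` upsampled by 4, a block starting at upsampled index 20 with `Ls = 4` reads back `[5, 6, 7, 8]` at zero shift. It reads back `[5.25, 6.25, 7.25, 8.25]` when shifted by one upsampled sample.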
CA002120902A 1993-04-09 1994-04-08 Speech coder employing analysis-by-synthesis techniques with a pulse excitation Abandoned CA2120902A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IT93TO000244A IT1264766B1 (en) 1993-04-09 1993-04-09 VOICE CODER USING PULSE EXCITATION ANALYSIS TECHNIQUES.
ITTO93A000244 1993-04-09

Publications (1)

Publication Number Publication Date
CA2120902A1 (en) 1994-10-10

Family

ID=11411368

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002120902A Abandoned CA2120902A1 (en) 1993-04-09 1994-04-08 Speech coder employing analysis-by-synthesis techniques with a pulse excitation

Country Status (5)

Country Link
EP (1) EP0619574A1 (en)
JP (1) JPH075899A (en)
CA (1) CA2120902A1 (en)
FI (1) FI941648A (en)
IT (1) IT1264766B1 (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2729246A1 (en) * 1995-01-06 1996-07-12 Matra Communication SYNTHETIC ANALYSIS-SPEECH CODING METHOD
FR2729247A1 (en) * 1995-01-06 1996-07-12 Matra Communication SYNTHETIC ANALYSIS-SPEECH CODING METHOD
FR2729244B1 (en) * 1995-01-06 1997-03-28 Matra Communication SYNTHESIS ANALYSIS SPEECH CODING METHOD
US5664054A (en) * 1995-09-29 1997-09-02 Rockwell International Corporation Spike code-excited linear prediction
EP1553564A3 (en) * 1996-08-02 2005-10-19 Matsushita Electric Industrial Co., Ltd. Voice encoding device, voice decoding device, recording medium for recording program for realizing voice encoding /decoding and mobile communication device
JP2000509847A (en) * 1997-02-10 2000-08-02 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Transmission system for transmitting audio signals
US6334648B1 (en) 1997-03-21 2002-01-01 Girsberger Holding Ag Vehicle seat
US6766289B2 (en) 2001-06-04 2004-07-20 Qualcomm Incorporated Fast code-vector searching
US7236928B2 (en) 2001-12-19 2007-06-26 Ntt Docomo, Inc. Joint optimization of speech excitation and filter parameters
JP3981399B1 (en) * 2006-03-10 2007-09-26 松下電器産業株式会社 Fixed codebook search apparatus and fixed codebook search method
DE602007005729D1 (en) 2006-06-19 2010-05-20 Sharp Kk Signal processing method, signal processing device and recording medium
WO2010058931A2 (en) * 2008-11-14 2010-05-27 Lg Electronics Inc. A method and an apparatus for processing a signal

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
NL8500843A (en) * 1985-03-22 1986-10-16 Koninkl Philips Electronics Nv MULTIPULS EXCITATION LINEAR-PREDICTIVE VOICE CODER.
US4890328A (en) * 1985-08-28 1989-12-26 American Telephone And Telegraph Company Voice synthesis utilizing multi-level filter excitation
US5293449A (en) * 1990-11-23 1994-03-08 Comsat Corporation Analysis-by-synthesis 2,4 kbps linear predictive speech codec

Also Published As

Publication number Publication date
FI941648A0 (en) 1994-04-08
JPH075899A (en) 1995-01-10
IT1264766B1 (en) 1996-10-04
EP0619574A1 (en) 1994-10-12
ITTO930244A1 (en) 1994-10-09
ITTO930244A0 (en) 1993-04-09
FI941648A (en) 1994-10-10

Similar Documents

Publication Publication Date Title
EP0673017B1 (en) Excitation signal synthesis during frame erasure or packet loss
EP0747882B1 (en) Pitch delay modification during frame erasures
US5884253A (en) Prototype waveform speech coding with interpolation of pitch, pitch-period waveforms, and synthesis filter
EP0673018B1 (en) Linear prediction coefficient generation during frame erasure or packet loss
Campbell Jr et al. The DoD 4.8 kbps standard (proposed federal standard 1016)
AU700205B2 (en) Improved adaptive codebook-based speech compression system
US5371853A (en) Method and system for CELP speech coding and codebook for use therewith
KR100389178B1 (en) Voice/unvoiced classification of speech for use in speech decoding during frame erasures
US5327520A (en) Method of use of voice message coder/decoder
CA2183283C (en) An improved rcelp coder
US7398205B2 (en) Code excited linear prediction speech decoder and method thereof
US5729655A (en) Method and apparatus for speech compression using multi-mode code excited linear predictive coding
US5359696A (en) Digital speech coder having improved sub-sample resolution long-term predictor
US6055496A (en) Vector quantization in celp speech coder
US20050065785A1 (en) Indexing pulse positions and signs in algebraic codebooks for coding of wideband signals
US20040023677A1 (en) Method, device and program for coding and decoding acoustic parameter, and method, device and program for coding and decoding sound
USRE43190E1 (en) Speech coding apparatus and speech decoding apparatus
EP0673015B1 (en) Computational complexity reduction during frame erasure or packet loss
EP0450064B2 (en) Digital speech coder having improved sub-sample resolution long-term predictor
US5884251A (en) Voice coding and decoding method and device therefor
CA2120902A1 (en) Speech coder employing analysis-by-synthesis techniques with a pulse excitation
EP0747884B1 (en) Codebook gain attenuation during frame erasures
JPH0771045B2 (en) Speech encoding method, speech decoding method, and communication method using these
US5692101A (en) Speech coding method and apparatus using mean squared error modifier for selected speech coder parameters using VSELP techniques
CA2219358A1 (en) Speech signal quantization using human auditory models in predictive coding systems

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued