GB2137054A - Speech encoder - Google Patents
- Publication number
- GB2137054A (application number GB8306685A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- signal
- speech
- filter
- encoder
- weighting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
Abstract
The invention relates to a speech encoder using linear predictive coding and proposes a code comprising the parameters of a linear predictor and an excitation signal consisting of a plurality of pulses of which the timing and the amplitude are selected for each frame of speech. To enable the excitation signal pulses for the recursive filter to be evaluated in real time, the speech signal is passed through a pole-zero filter 38 to suppress the effects of reverberations, and the output of the filter 38 is correlated with the time weighted impulse response of the recursive filter with the encoded parameters.
Description
SPECIFICATION
Speech encoder
This invention relates to a speech encoder, this
being a circuit for converting a speech signal into a
pulse train. The pulse train may either be
transmitted, encrypted, or stored and from it the
original speech can be reproduced.
When speech is encoded in this manner, it is
important to reduce as much as possible the
number of pulses necessary to characterise the
speech in order to reduce the bit rate of the
transmitted signal or to reduce the amount of
storage space required to store a particular speech
signal. However, as the bit rate of the pulse train is
reduced the quality of the reproduced signal is
degraded. The invention seeks to provide a speech
encoder in which the speech quality is of
acceptable standard but in which the bit rate is
reduced.
There is already known in the art a system of
speech encoding which makes use of the
technique of linear predictive coding (LPC). In
order to explain the principles employed in this
method of encoding, reference will first be made
to Figure 1 which shows a linear predictor.
The linear predictor in Figure 1 is a recursive
digital filter comprising a summation circuit 10
which has an input line 12 and an output line 14.
The output line 14 is connected to a shift register or to a tapped delay line 16, each tapping of which
is fed back to the summation circuit by way of a
respective multiplication circuit 18_1 to 18_n.
Assume that it is desired to produce a particular
sequence of output signals corresponding to a
sampled speech signal. At any given instant, the
output signal has a first component determined by
the weighted summed outputs from the tappings
of the delay line and a second component
determined by the value of the input signal at that
instant. The first of these two components may be
regarded as the predicted value based on previous
values of the output signal and the second as the
residual error. If the weighting parameters p_1 to p_n of the circuits 18 are optimised then the residual
error will be minimised. To enable the
reproduction by a linear predictor of an original
speech signal it is only necessary to transmit or
store in each frame the weighting parameters and
an excitation signal. The residual error, if used as
the excitation, yields perfect reproduction of the
original speech.
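The relationship between the predictor, the residual error and the excitation can be sketched in a few lines of Python. This is an illustrative sketch only, not the circuit of Figure 1: the function names, the sample values and the second-order parameters `p` are assumptions chosen for the example, and `p[0]` stands for p_1.

```python
def residual(speech, p):
    """Non-recursive inverse filter: e[n] = s[n] - sum_i p_i * s[n-i]."""
    out = []
    for n, s in enumerate(speech):
        # p[i] holds p_(i+1), so it multiplies the sample delayed by i+1
        pred = sum(p[i] * speech[n - 1 - i] for i in range(len(p)) if n - 1 - i >= 0)
        out.append(s - pred)
    return out


def synthesize(excitation, p):
    """Recursive predictor: y[n] = x[n] + sum_i p_i * y[n-i]."""
    y = []
    for n, x in enumerate(excitation):
        pred = sum(p[i] * y[n - 1 - i] for i in range(len(p)) if n - 1 - i >= 0)
        y.append(x + pred)
    return y


# Hypothetical sampled "speech" and predictor parameters (assumed values)
speech = [1.0, 0.9, 0.5, -0.2, -0.7, -0.6, 0.1, 0.8]
p = [1.2, -0.5]

# Using the residual itself as the excitation rebuilds the original samples
rebuilt = synthesize(residual(speech, p), p)
```

Here `rebuilt` matches `speech` to within rounding error, which is the "perfect reproduction" case; any cheaper excitation trades some of that accuracy for a lower bit rate.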
The technique described above works well for
speech signals because the operation simulates
the acoustic properties of the human vocal tract.
When a sound is uttered a vibration is transmitted
down the vocal tract which is configured to
produce the desired sound.
The configuration of the vocal tract, being due
to physical movement of articulatory organs, can
only change quite slowly. The analogy between
the configuration of the vocal tract and the
weighting parameters allows much of the
information in the speech signal to be transmitted
at a low data rate. While this ensures good intelligibility, the quality and naturalness of the reproduced speech are largely dependent on the excitation signal used.
In a system which has been proposed in the past, the parameters of the predictor are transmitted or stored and the excitation signal is selected either as white noise or as a regular series of pulses depending on the type of sound to be produced. Even using such crude simulation of the residual signal it was possible to produce recognisable speech. However, though the quality was acceptable for certain applications, for example military applications where maximum signal compression was of most importance, it fell below acceptable commercial standards.
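As a rough illustration of this earlier two-state scheme, the excitation for a frame might be generated as follows. The function name, the frame length of 160 samples, the pitch period of 40 samples and the unit pulse height are all assumptions made for the example; the text above does not specify them.

```python
import random


def classical_excitation(length, voiced, pitch_period=40, seed=0):
    """Hypothetical two-state source: regular pulses if voiced, else white noise."""
    if voiced:
        # Regular series of unit pulses, one every pitch_period samples
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)  # seeded so the example is repeatable
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


voiced_frame = classical_excitation(160, voiced=True)
unvoiced_frame = classical_excitation(160, voiced=False)
```

Only a voiced/unvoiced decision and a pitch value need to be sent with the predictor parameters in such a scheme, which is why the bit rate is low and the excitation is crude.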
In order to improve the quality of the reproduced speech, it is necessary to put more information into the excitation signal so that it should resemble the residual signal more closely.
With this aim in mind, it has been proposed that in each frame the predictor should be excited by a train of pulses, in which the timing and the magnitude of each pulse in the train should be selected in order to minimise the difference between the re-synthesised speech and the original speech signal. In this last case, the excitation signal does not depend on the type of sound to be produced but for each frame the ideal excitation pulse train is computed.
The above method was described and tested by B. S. Atal & J. R. Remde of Bell Laboratories, whose paper "A new model of LPC excitation for producing natural-sounding speech at low bit rates" appears in the transactions of the IEEE International Conference on Acoustics, Speech and Signal Processing, 1982, pp 614. However, the method described in that paper is not capable of generating the output signals in real time and, because of the computational complexity required to implement it, is impracticable for all commercial uses.
The present invention is intended to encode and decode speech using linear predictive coding in which the LPC filter is excited by a series of sparse pulses whose positions and amplitudes are capable of being computed in real time.
According to the present invention, there is provided an encoder for encoding speech signals, comprising means for sampling frames of the speech signal to be encoded, a linear predictive analyser for determining for each frame the weighting parameters of a linear predictor to minimise the residual signal for the sampled frame, and means for producing an excitation signal for transmission or storage in conjunction with the parameters to enable each frame of the speech signal to be resynthesised, in which the means for producing an excitation signal comprises means for correlating a signal derived from the speech signal in that frame with the time weighted impulse response of a linear predictor having the weighting parameters determined by the analyser.
The expression "time weighted" is intended to signify that the response has the same shape but decays more rapidly, this being achieved by multiplying the parameter p_n by a factor k^n, where k < 1.
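The effect of this time weighting can be checked numerically. In the sketch below the helper names, the second-order parameters and the value k = 0.8 are assumptions made for illustration; `p[0]` stands for p_1.

```python
def time_weight(p, k):
    """Scale p_n by k**n for n = 1, 2, ... (p[0] is p_1)."""
    return [c * k ** (i + 1) for i, c in enumerate(p)]


def impulse_response(p, length):
    """Response of the recursive predictor to a single unit pulse at n = 0."""
    h = []
    for n in range(length):
        x = 1.0 if n == 0 else 0.0
        pred = sum(p[i] * h[n - 1 - i] for i in range(len(p)) if n - 1 - i >= 0)
        h.append(x + pred)
    return h


p = [1.2, -0.5]   # assumed stable example parameters
k = 0.8           # assumed weighting factor, k < 1

h = impulse_response(p, 20)
hw = impulse_response(time_weight(p, k), 20)
```

Sample n of the weighted response `hw` equals k^n times sample n of the unweighted response `h`, so the shape is preserved while the tail dies away faster.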
A linear recursive filter, if excited by a single pulse, may have an impulse response of very long duration, although provided that the filter is not unstable the response will eventually decay rather than oscillate. The effect of a long time response is that responses from consecutive excitation pulses tend to run into each other and it is difficult when performing a correlation to separate the pulse response of one excitation from another.
In the preferred embodiment of the invention, the speech signal is passed through a weighting filter, preferably a pole-zero filter, which has the effect of damping reverberations. The weighting filter has a non-recursive part the weighting parameters of which are of the same magnitude as, but of opposite sign to, those of the linear predictor in the decoder. In the analogy mentioned above, one may regard the purpose of the non-recursive side of the weighting filter as negating the effect of the vocal tract on the pulses originally generated within the throat of the speaker. The other side of the filter, the recursive part, has weighting coefficients which are related to those of the linear predictor but are weighted by a factor which follows a power law of k^n (k < 1), so that time-weighting of the impulse response is achieved.
If one correlates the speech signal, after passing it through such a weighting filter, with the impulse response of a filter which consists only of the recursive side of the weighting filter when excited by a single excitation pulse, then the correlator will produce a high correlation output at the times when impulses should be applied to the linear prediction filter in order to simulate the speech signal.
Thus, in the preferred embodiment, the weighting filter is followed by a correlator of which the output is fed to an impulse selector. The purpose of the impulse selector is to select from amongst the peaks of the output of the correlator a number of peaks having the highest magnitude. These peaks determine the time at which the residual signal should be applied to the linear predictor in the decoder in order to resynthesise the speech signal.
It is also preferred that the excitation pulses should have an amplitude related to the amplitude of the peak produced by the correlator. Because the auto-correlation functions of the pulse responses of the LPC filter are not constant but vary with the weighting parameters, it is preferred that the excitation pulse amplitude should be derived by dividing the correlator output by the value of the auto-correlation function of the impulse response of the filter with the prevailing time weighted parameters.
The invention will now be described further, by way of example, with reference to the accompanying drawings, in which:
Figure 1 is, as earlier described, a diagram of a
linear predictive filter;
Figure 2 is a block circuit diagram of an encoder in accordance with the present invention; and
Figure 3 is a diagram showing a weighting filter.
In Figure 2, the speech signal to be encoded is received over an input line 30. The input signal is applied to a known circuit 32 which is a linear prediction analyser. This circuit computes the values of the weighting parameters of a digital recursive filter which would minimise the residual signal and outputs these parameters. As is known, a linear prediction analyser more readily computes so called reflection co-efficients which are not the same as the weighting parameters but from which these parameters can be computed. The reflection co-efficients are applied to a line 34.
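One common way of obtaining predictor parameters from reflection coefficients is the step-up recursion used in the Levinson-Durbin procedure. The sketch below assumes that recursion and one particular sign convention (a predictor of the form s[n] ≈ sum of p_i s[n-i]); the text above does not say which convention or method the analyser 32 uses, so the function name and the example values are illustrative only.

```python
def reflection_to_predictor(reflection):
    """Step-up recursion: build p_1..p_m one order at a time.

    Assumed convention: at order m, p_m = k_m and
    p_i(new) = p_i(old) - k_m * p_(m-i)(old) for i < m.
    """
    p = []
    for km in reflection:
        p = [p[i] - km * p[len(p) - 1 - i] for i in range(len(p))] + [km]
    return p


# Hypothetical reflection coefficients for a second-order analysis
params = reflection_to_predictor([0.5, 0.2])
```

With these example values the recursion gives p_1 = 0.4 and p_2 = 0.2.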
The speech signal is also applied via a line 36 to a weighting filter 38 which will now be described by reference to Figure 3. The weighting filter comprises an input line 40 connected to a summation circuit 42 having an output line 44. A multi-tapped delay line (or shift register) 46 is connected to the input line 40 and a similar multi-tapped delay line 48 is connected to the output line 44. The tappings of the delay line 46 are connected by way of a first set of weighting circuits 50 to the circuit 42, which also receives signals from the tappings of the delay line 48 through weighting circuits 52. The values of the parameters used in the multiplication circuits of the weighting filter 38 in Figure 3 are derived from the linear prediction analyser 32.
In a block 60, the weighting parameters p_1 to p_n, equivalent to the reflection coefficients, are computed. In the coefficient weighting circuit 62, two sets of parameters are derived from the parameters p_1 to p_n for setting the parameters of the weighting filter 38. The first set of parameters is applied to the weighting circuits 50 and is equal to -p_1 to -p_n. Thus the combination of the summation circuit with the delay line 46 and the weighting circuits 50 results in a digital non-recursive filter having parameters which are the opposite of those used in the receiving circuit to resynthesize the speech signal. As previously stated, the effect of the non-recursive part of the weighting filter is to negate the effect of the vocal tract.
The second set of parameters evaluated by the coefficient weighting circuit 62 is equal to k·p_1 to k^n·p_n, where k is less than 1. Thus, the delay line 48 and the weighting circuits 52 produce in conjunction with the summation circuit 42 a recursive digital filter whose pulse response is similar to that of the filter used to resynthesize the speech but with more rapid decay. The effect of the combination of the non-recursive and recursive filters which constitute the weighting filter 38, which is also termed a pole-zero filter, is to produce from the speech signal one in which reverberations are more severely damped, reducing the interaction between the effects of consecutive excitation pulses.
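The pole-zero weighting filter described above can be sketched as follows. This is a minimal software sketch, not the hardware of Figure 3: the function names, the second-order parameters, k = 0.8 and the test signal are assumptions, and `p[0]` stands for p_1.

```python
def recursive_filter(x, p):
    """Recursive predictor: y[n] = x[n] + sum_i p_i * y[n-i]."""
    y = []
    for n, v in enumerate(x):
        y.append(v + sum(p[i] * y[n - 1 - i] for i in range(len(p)) if n - 1 - i >= 0))
    return y


def weighting_filter(speech, p, k):
    """Pole-zero filter: zeros -p_1..-p_n, poles k*p_1..k^n*p_n."""
    zeros = [-c for c in p]                               # weighting circuits 50
    poles = [c * k ** (i + 1) for i, c in enumerate(p)]   # weighting circuits 52
    out = []
    for n, s in enumerate(speech):
        acc = s
        for i in range(len(p)):
            if n - 1 - i >= 0:
                acc += zeros[i] * speech[n - 1 - i]   # non-recursive part (delay line 46)
                acc += poles[i] * out[n - 1 - i]      # recursive part (delay line 48)
        out.append(acc)
    return out


p, k = [1.2, -0.5], 0.8          # assumed example values
pulse = [0.0] * 30
pulse[3] = 2.0                   # one excitation pulse of amplitude 2 at sample 3

speech = recursive_filter(pulse, p)      # synthetic "speech" made from that pulse
weighted = weighting_filter(speech, p, k)
expected = recursive_filter(pulse, [c * k ** (i + 1) for i, c in enumerate(p)])
```

For this synthetic input, `weighted` equals the same pulse passed through the time-weighted recursive filter alone (`expected`): the non-recursive part cancels the original resonances and the recursive part puts back a faster-decaying version of them.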
The output of the digital weighting filter 38 is applied to a correlator 64 connected to a circuit 66 which evaluates the impulse response of a digital recursive filter of the same construction as that shown in Fig. 1 but with weighting parameters k·p_1 to k^n·p_n.
The correlator 64 may consist of a shift register whose tappings are connected to multiplication circuits, the multiplication factors of which are determined by the impulse response evaluating circuit 66. When there is a high level of correlation between the output of the weighting filter 38 and the impulse response evaluated by the circuit 66, a high output is produced by the correlator. The output of the correlator 64 thus contains peaks which coincide with impulses in the excitation signal which, if applied to the linear predictor at the decoder, will cause a good approximation to the original speech signal to be produced.
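A software analogue of such a tapped-delay-line correlator is sketched below. The function names, the template length of 16 samples, the parameter values and the alignment of the output (the peak is reported at the sample where a pulse starts) are assumptions made for the example. The weighted signal is a stand-in for the output of filter 38 when the frame contains a single pulse of amplitude 2 at sample 5.

```python
def time_weighted_response(p, k, length):
    """Impulse response of the recursive filter with parameters k^n * p_n."""
    pw = [c * k ** (i + 1) for i, c in enumerate(p)]
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        acc += sum(pw[i] * h[n - 1 - i] for i in range(len(pw)) if n - 1 - i >= 0)
        h.append(acc)
    return h


def correlate(signal, template):
    """corr[t] = sum_j signal[t + j] * template[j], skipping taps past the frame end."""
    out = []
    for t in range(len(signal)):
        out.append(sum(signal[t + j] * h
                       for j, h in enumerate(template) if t + j < len(signal)))
    return out


template = time_weighted_response([1.2, -0.5], 0.8, 16)   # role of circuit 66

weighted = [0.0] * 40
for j, h in enumerate(template):
    weighted[5 + j] += 2.0 * h        # assumed filter-38 output for one pulse at sample 5

corr = correlate(weighted, template)
peak = max(range(len(corr)), key=lambda t: corr[t])
```

The largest correlator output falls at sample 5, the position of the pulse that produced the signal.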
However, in order to reduce the bit rate, it is necessary to select from amongst the correlator output only a small number of pulses and these should coincide with the impulses of maximum energy in the excitation signal.
The purpose of the pulse selector circuit 70 in Figure 2 is to select the timing of the pulses which are to be encoded. One could merely store the output values from the correlator and select the highest peaks, but this could result in consecutive high values being used to produce excitation pulses when they are truly the flanks of the same pulse. Therefore, it is preferable that the pulse selector circuit locate local maxima and minima and disregard the values adjacent to these peaks. One possible algorithm would be to disregard high values adjacent a local maximum or minimum if they are not separated from the local maximum or minimum by a zero crossing or a turning point.
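One simple reading of this selection rule is sketched below. It is an assumption-laden illustration rather than the selector circuit itself: the function name, the restriction to positive maxima and negative minima, the handling of frame edges (end samples are ignored) and the example correlator values are all choices made for the example.

```python
def select_pulses(corr, count):
    """Pick the `count` strongest local extrema of the correlator output."""
    peaks = []
    for t in range(1, len(corr) - 1):
        prev, cur, nxt = corr[t - 1], corr[t], corr[t + 1]
        is_max = cur > 0 and cur > prev and cur >= nxt   # positive turning point
        is_min = cur < 0 and cur < prev and cur <= nxt   # negative turning point
        if is_max or is_min:
            peaks.append(t)
    # Keep the extrema of largest magnitude, then restore time order
    strongest = sorted(peaks, key=lambda t: abs(corr[t]), reverse=True)[:count]
    return sorted(strongest)


# Hypothetical correlator output: samples 2 and 3 are the crest and flank of one pulse
corr = [0.1, 0.9, 2.0, 1.8, 0.3, -0.2, -1.5, -0.4, 0.2, 0.6, 0.1]
positions = select_pulses(corr, 2)
```

Taking the two largest magnitudes directly would return samples 2 and 3, both belonging to the same peak; restricting the choice to turning points returns samples 2 and 6 instead.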
The amplitude of the selected pulses will be related to the amplitude of an optimal excitation signal. In order to normalise these pulses to take into account the different values of the autocorrelation function of the impulse responses, the impulse response circuit 66 additionally evaluates the auto-correlation function of each pulse response and applies a signal over a line 72 to a divider circuit 74. In the divider circuit 74, the selected pulses are divided by the auto-correlation value and the output signal from the divider is fed to a multiplexer 76 which encodes the reflection coefficients received over the line 34 and the signals from the divider 74 to produce the encoded signal on output line 78 for transmission or storage.
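The normalisation performed by the divider can be illustrated as follows. The sketch assumes that the zero-lag value of the auto-correlation (the energy of the time-weighted impulse response) is the divisor and that the selected pulses do not overlap; the function names, the four-sample template and the signal are illustrative values, not taken from the specification.

```python
def correlate_at(signal, template, t):
    """Correlator output at time t: sum_j signal[t + j] * template[j]."""
    return sum(signal[t + j] * h for j, h in enumerate(template) if t + j < len(signal))


def pulse_amplitudes(signal, template, positions):
    """Divide each selected correlator value by the zero-lag autocorrelation."""
    energy = sum(h * h for h in template)   # assumed divisor (signal on line 72)
    return [correlate_at(signal, template, t) / energy for t in positions]


template = [1.0, 0.96, 0.6016, 0.270336]    # assumed truncated time-weighted response

# Assumed weighted signal: pulses of amplitude 2.0 at sample 3 and -0.5 at sample 8
weighted = [0.0] * 12
for j, h in enumerate(template):
    weighted[3 + j] += 2.0 * h
    weighted[8 + j] += -0.5 * h

amplitudes = pulse_amplitudes(weighted, template, [3, 8])
```

Dividing by the energy of the response recovers the pulse amplitudes 2.0 and -0.5 in this non-overlapping case; where responses overlap, the values obtained this way are only approximations.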
Claims (8)
1. An encoder for encoding speech signals, comprising means for sampling frames of the speech signal to be encoded, a linear prediction analyser for determining for each frame the weighting parameters of a linear predictor to minimise the residual signal for the sampled frame, and means for producing an excitation signal for transmission or storage in conjunction with the parameters to enable each frame of the speech signal to be resynthesised, in which the means for producing an excitation signal comprises means for correlating a signal derived from the speech signal in that frame with the time weighted impulse response of a linear predictor having the weighting parameters determined by the analyser.
2. A signal encoder as claimed in Claim 1, in which the signal derived from the speech signal is obtained by means of a weighting filter which is operative to damp reverberations within the speech signal caused by resonances in the vocal tract and precedes the correlating means.
3. A signal encoder as claimed in Claim 2, in which the weighting filter comprises a pole-zero filter.
4. A signal encoder as claimed in any preceding claim, in which the correlating means comprises a tapped delay line, means for multiplying the tapped signals by the said time weighted impulse response, and means for summing the outputs of the multiplication circuits.
5. A signal encoder as claimed in any preceding claim in which the output of the correlating means is connected to a pulse selector which is operative to select a number of pulses from the correlator output.
6. A signal encoder as claimed in Claim 5, in which the pulse selector comprises means for detecting local peaks and means for selecting amongst the local peaks, those having the highest and lowest amplitudes.
7. A signal encoder as claimed in any preceding claim in which the magnitude of the transmitted pulses is determined by dividing the output of the correlating means by the auto-correlation function of the said time weighted impulse response.
8. A signal encoder constructed, arranged and adapted to operate substantially as herein described with reference to and as illustrated in the accompanying drawings.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB8306685A GB2137054B (en) | 1983-03-11 | 1983-03-11 | Speech encoder |
DE8484301302T DE3463192D1 (en) | 1983-03-11 | 1984-02-28 | Speech encoder |
EP19840301302 EP0119033B1 (en) | 1983-03-11 | 1984-02-28 | Speech encoder |
CA000449198A CA1202419A (en) | 1983-03-11 | 1984-03-09 | Speech encoder |
JP4698884A JPS59178032A (en) | 1983-03-11 | 1984-03-12 | Voice encoder |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB8306685A GB2137054B (en) | 1983-03-11 | 1983-03-11 | Speech encoder |
Publications (3)
Publication Number | Publication Date |
---|---|
GB8306685D0 GB8306685D0 (en) | 1983-04-20 |
GB2137054A true GB2137054A (en) | 1984-09-26 |
GB2137054B GB2137054B (en) | 1987-08-26 |
Family
ID=10539363
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB8306685A Expired GB2137054B (en) | 1983-03-11 | 1983-03-11 | Speech encoder |
Country Status (2)
Country | Link |
---|---|
JP (1) | JPS59178032A (en) |
GB (1) | GB2137054B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4944013A (en) * | 1985-04-03 | 1990-07-24 | British Telecommunications Public Limited Company | Multi-pulse speech coder |
Also Published As
Publication number | Publication date |
---|---|
GB2137054B (en) | 1987-08-26 |
GB8306685D0 (en) | 1983-04-20 |
JPS59178032A (en) | 1984-10-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5060269A (en) | Hybrid switched multi-pulse/stochastic speech coding technique | |
EP0195487B1 (en) | Multi-pulse excitation linear-predictive speech coder | |
EP0515138B1 (en) | Digital speech coder | |
Atal | High-quality speech at low bit rates: Multi-pulse and stochastically excited linear predictive coders | |
US5359696A (en) | Digital speech coder having improved sub-sample resolution long-term predictor | |
US4945565A (en) | Low bit-rate pattern encoding and decoding with a reduced number of excitation pulses | |
JP3068196B2 (en) | Multipulse analysis speech processing system and method | |
CA2132006C (en) | Method for generating a spectral noise weighting filter for use in a speech coder | |
EP0578436B1 (en) | Selective application of speech coding techniques | |
US5719993A (en) | Long term predictor | |
GB2137054A (en) | Speech encoder | |
EP0162585B1 (en) | Encoder capable of removing interaction between adjacent frames | |
EP0149724A1 (en) | Method and apparatus for coding digital signals | |
EP0119033B1 (en) | Speech encoder | |
JPH058839B2 (en) | ||
JPH043879B2 (en) | ||
JP3749838B2 (en) | Acoustic signal encoding method, acoustic signal decoding method, these devices, these programs, and recording medium thereof | |
CA2127483C (en) | Speech signal encoding system capable of transmitting a speech signal at a low bit rate without carrying out a large volume of calculation | |
JP3103108B2 (en) | Audio coding device | |
JP3274451B2 (en) | Adaptive postfilter and adaptive postfiltering method | |
JP3071800B2 (en) | Adaptive post filter | |
JPH043878B2 (en) | ||
JPH032900A (en) | Vocal cord/vocal meatus type voice analyzing device | |
JPH01179999A (en) | Pitch extracting device | |
JPH03130800A (en) | Voice encoding system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PCNP | Patent ceased through non-payment of renewal fee |