EP0987680B1 - Audiosignalverarbeitung - Google Patents
Audiosignalverarbeitung (Audio signal processing)
- Publication number
- EP0987680B1 (application EP19990202980 / EP99202980A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- component
- speech
- evolution surface
- phase
- concordant
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/097—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters using prototype waveform decomposition or prototype waveform interpolative [PWI] coders
Definitions
- the present invention relates to audio signal processing. It has particular utility in relation to the separation of voiced speech and unvoiced speech in low bit-rate speech coders.
- Low bit-rate speech coders are becoming increasingly commercially important as they enable a more efficient utilisation of the portion of the radio spectrum available to mobile phones.
- Speech can be classified into three parts - voiced speech, unvoiced speech and silence. Any one of these may be corrupted by the addition of background noise.
- voiced speech can be viewed as a succession of repeated waveforms.
- PWI Prototype Waveform Interpolation
- these methods involve sending information describing repeated pitch period waveforms only once, thereby reducing the amount of bits required to encode the speech signal.
- Initial PWI speech coding methods only encoded voiced speech, the other portions of the speech signal being coded using other methods (e.g. Code Excited Linear Prediction methods).
- One example of such a hybrid coding technique is described in "Encoding Speech Using Prototype Waveforms", W.B. Kleijn, IEEE Transactions on Speech and Audio Processing, Vol. 1, pp. 386-399, October 1993.
- a concordant component is a component whose phase changes slowly in comparison to a discordant component whose phase changes more rapidly, said method comprising the steps of:
- concordant is intended to refer to signals whose phase changes slowly in comparison to discordant signals whose phase changes more rapidly.
- the present inventors have found that the rate of evolution of the phase information is useful in distinguishing between voiced speech (the concordant component of speech) and unvoiced speech/noise (the discordant component of speech).
- FIR Finite Impulse Response
- a conventional FIR filter might be approximated by a series of shorter FIR filters.
- By decomposing a filtering process into a plurality of filtering stages and, in one or more of the intervals between those filtering stages, substituting phase information from an earlier stage for phase information from the most recent stage, a filtering process results which repeatedly uses the earlier phase information. Filtering a signal tends to smooth its phase, and hence a filtered signal contains less information distinguishing its concordant and discordant parts. By reinstating the earlier phase information, the concordant or discordant component can be more thoroughly removed in the subsequent filtering stage(s). The result is an audio signal filtering process which is better able to extract a concordant or discordant component of an audio signal.
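A minimal sketch of this filter-then-reinstate-phase idea, in Python/NumPy. The stage count, the taps and the test signal are illustrative assumptions, not values from the patent; the point is only the structure: filter, keep the filtered magnitudes, put the earlier phase back, repeat.

```python
import numpy as np

def staged_filter_with_phase_reinstatement(x, taps, stages=3):
    """Run several short filtering stages over a complex sequence,
    reinstating the original phase between stages (sketch only)."""
    earlier_phase = np.angle(x)        # phase information from the earliest stage
    y = np.asarray(x, dtype=complex)
    for _ in range(stages):
        y = np.convolve(y, taps, mode="same")        # one short filtering stage
        y = np.abs(y) * np.exp(1j * earlier_phase)   # substitute the earlier phase
    return y

# Usage: smooth the evolution of one frequency bin with a 3-tap average.
evolution = np.exp(1j * np.cumsum(np.random.default_rng(0).normal(size=32)))
smoothed = staged_filter_with_phase_reinstatement(evolution, np.array([0.25, 0.5, 0.25]))
```

Because the erratic phase is re-introduced before every stage, each low-pass pass attacks the discordant part afresh rather than operating on an already phase-smoothed signal.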
- each low-pass filtering step involves the application of an identical low-pass filter. This minimises the complexity of the processing method.
- the phase information derived from the initial evolution surface is used in all of said component steps. This maximises the effectiveness of the extraction method.
- One way in which the discordant component can be calculated is to calculate the concordant component according to the first aspect of the present invention and subtract this from the original signal.
- one way in which the concordant component can be calculated is to calculate the discordant component according to the first aspect of the present invention and subtract this from the original signal.
- an audio signal processor operable to extract one of a concordant component and a discordant component of a predetermined segment of an audio signal wherein a concordant component is a component whose phase changes slowly in comparison to a discordant component whose phase changes more rapidly, said apparatus comprising:
- a speech coding apparatus including:
- a method of waveform interpolation speech coding comprising:
- a mobile telephone network operating in accordance with a first embodiment of the present invention is operable to allow a first user A to converse with a second user B.
- User A's mobile phone is operable to transmit a radio signal representing parameters modelling user A's speech.
- the radio signal is received by a base station 17 which converts it to a digital electrical signal which it forwards to the Public Switched Telephone Network (PSTN) 20.
- PSTN Public Switched Telephone Network
- the Public Switched Telephone Network 20 is operated to make a connection between base station 17 and a base station 22 currently serving user B.
- the digital electrical signal is passed across the connection, and, on receiving the signal, the base station 22 converts the digital electrical signal to parameters representing user A's speech.
- the base station 22 transmits a radio signal representing those parameters to user B's mobile phone 24.
- User B's mobile phone receives the radio signal and converts it back to an analogue electrical signal which is used to drive a loudspeaker 32 to reproduce A's voice.
- a similar communications path exists in the other direction from user B to user A.
- the mobile phone network selects an appropriate bit-rate for the parameters representing the user's speech from a full bit-rate (6.7 kbit/s), an intermediate bit-rate (4.6 kbit/s) and a half bit-rate (2.3 kbit/s).
- ADC Analogue to Digital Converter
- WI Waveform Interpolation
- the parameters are passed to a quantiser 16 which is operable to provide a variable rate parameter stream. The quantiser may simply forward the full-rate parameter stream or, if required, reduce the bit-rate of the parameter stream still further to the intermediate rate (4.6 kbit/s) or the half-rate (2.3 kbit/s).
- the variable rate parameter stream undergoes further channel coding before being converted to a radio signal for transmission over the radio communication path to the base station 17.
- User B's mobile phone recovers the variable rate parameter stream and, if required, uses interpolation to generate the 6.7 kbit/s parameter stream before passing the parameters to a decoder 28.
- the decoder 28 processes the parameter stream to provide a digitally coded reconstruction of user A's speech which is then converted to an analogue electrical signal by the Digital to Analogue Converter (DAC) 30, which signal is used to drive the loudspeaker 32.
- DAC Digital to Analogue Converter
- the encoder 14 of user A's mobile phone receives the digitally coded speech signal from the Analogue to Digital Converter and carries out a number of processes (Figure 2) on the digitally coded speech signal to provide the stream of parameters representing user A's speech.
- the encoder first divides the digitally coded speech signal into 10ms frames.
- Linear Predictive Coding (LPC) techniques 34,36,38
- LPC Linear Predictive Coding
- a pitch period detection process 40 provides a measure (expressed as a number of sample instants) of the pitch of the current frame of speech.
- the residual signal then passes to a waveform extraction process which is carried out to obtain a characteristic waveform for each one of four 2.5ms sub-frames of each frame.
- Each characteristic waveform has a length equal to the pitch period of the signal at that sub-frame.
- since voiced speech normally has a pitch period in the range 2 ms to 18.75 ms, it will be realised that the characteristic waveforms will normally overlap one another to a significant degree.
- the residual signal for voiced speech has a sharp spike in each pitch period and the window used to isolate the pitch period concerned is movable by a few sample points so as to ensure the spike is not close to the edge of the window.
- cw[i,k] represents the characteristic waveform for the ith sub-frame and res(x) means the value of the xth sample of the residual signal.
- the pitch period from the pitch detector is p_i and, if required, q is increased from 0 to 4 in order to shift the spike in the residual away from the edge of the window.
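The windowing just described can be sketched as follows. The names `res` and `start` and the edge-guard test are illustrative assumptions; the text fixes only the window length p_i and the shift q running from 0 to 4.

```python
import numpy as np

def extract_characteristic_waveform(res, start, p_i, q_max=4, guard=2):
    """Cut one pitch-period window cw[i, k] = res(start + q + k) from the
    residual, nudging it by q samples so the spike avoids the edges."""
    for q in range(q_max + 1):
        window = res[start + q : start + q + p_i]
        peak = int(np.argmax(np.abs(window)))
        if guard <= peak < p_i - guard:   # spike clear of both window edges
            return window
    return res[start : start + p_i]       # fallback: no shift kept it clear
```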
- the characteristic waveforms (of length p_i) thus extracted then undergo a Discrete Fourier Transform (DFT) 44 to produce, for each residual sub-frame, a characteristic spectrum.
- DFT Discrete Fourier Transform
- CS[i,ω] is a complex value associated with a frequency interval ω and the ith sub-frame of the residual, the complex values for all frequency intervals forming a complex spectrum for the ith sub-frame of the residual.
- cw[i,k] and p_i are as defined above.
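A one-line sketch of transform stage 44: a direct DFT over the variable, pitch-length window. Using `np.fft.fft` here is an implementation assumption (the patent later notes that a fixed window length would permit an FFT; NumPy's routine accepts any length).

```python
import numpy as np

def characteristic_spectrum(cw):
    """CS[i, w]: one complex value per frequency interval w, which together
    form the complex spectrum of the i-th sub-frame's waveform."""
    return np.fft.fft(cw)   # length-p_i transform of the extracted window

# magnitude and phase spectra follow directly:
# np.abs(characteristic_spectrum(cw)), np.angle(characteristic_spectrum(cw))
```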
- the characteristic spectra are generally obtained from signal segments which overlap at least the signal segments used in deriving the previous and subsequent characteristic spectra.
- for voiced speech segments there will be little difference between the magnitude of the complex values associated with each frequency interval of a spectrum and the corresponding magnitude values of the spectra derived from adjacent segments of the signal.
- the time offset between the adjacent signals manifests itself as a phase offset between adjacent spectra.
- the phase spectra (consisting of the phase, or, in mathematical language, argument of the complex spectral values) are operated on by alignment process 46.
- each characteristic spectrum is aligned with another characteristic spectrum which may precede it by as many as four sub-frames.
- the interval (measured in sub-frames) between the characteristic spectra which are aligned with one another increases with increasing pitch period as follows:
- the alignment process shifts the phase values of one of the characteristic spectra to be aligned until the correlation between phase values of the two spectra reaches a maximum.
- the offset that is required to do this provides a phase correction for each one of the 76 frequency bins in the characteristic spectrum associated with a given sub-frame.
- the 'aligned' phase values are calculated by summing the original phase values and the phase correction (each is expressed in radians).
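The alignment 46 can be sketched as a search over trial time shifts, each of which contributes a linear per-bin phase correction that is added to the original phase values. The trial grid and the circular-correlation score are assumptions; the text specifies only shifting until the correlation between the phase values reaches a maximum.

```python
import numpy as np

def align_phases(phase_ref, phase_new, n_trials=128):
    """Return phase_new plus the per-bin phase correction (in radians)
    that best aligns it with phase_ref."""
    n = len(phase_new)
    omegas = 2.0 * np.pi * np.arange(n) / n
    best_phase, best_score = phase_new, -np.inf
    for delta in np.linspace(0.0, n, n_trials, endpoint=False):
        candidate = phase_new + omegas * delta   # linear phase ramp = time shift
        # score on the unit circle so 2*pi wrap-around cannot hurt
        score = np.real(np.sum(np.exp(1j * (candidate - phase_ref))))
        if score > best_score:
            best_score, best_phase = score, candidate
    return best_phase
```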
- the phase spectrum is then combined with the magnitude spectrum associated with the sub-frame to provide an aligned characteristic spectrum for each sub-frame:
- CS_aligned[i,ω] = |CS_norm[i,ω]| · e^(i·∠CS_aligned[i,ω])
- ∠CS_aligned[i,ω] represents the phase value obtained for the frequency interval ω associated with the ith sub-frame following the alignment procedure.
- a normal representation of a spectrum has a series of bars spaced along a frequency axis and representing consecutive frequency intervals.
- the height of each bar is proportional to the magnitude of the complex spectral value associated with the corresponding frequency interval. It is possible to visualise a further axis arranged perpendicularly to the frequency axis which represents the time at which a spectrum was obtained. Another spectrum derived a time interval later can then be visualised aligned with and parallel to the first spectrum and spaced therefrom in accordance with the scaling of the time axis. If this process is repeated for several spectra then a surface defined by the tops of the bars can be envisaged or computed from the individual magnitudes.
- A simplified illustration of such a visualisation of the 'aligned' characteristic spectra output by alignment stage 46 is shown in Figure 3A (note that the alignment does not alter the magnitudes of the complex values forming the characteristic spectra and hence Figure 3A equally well represents the normalised characteristic spectra). For ease of illustration, only 11 spectral values are shown, rather than the 76 actually used in the embodiment.
- the so-called 'evolution' of a spectral magnitude associated with a given frequency interval can be envisaged as the variation in that spectral magnitude over spectra derived from consecutive time intervals.
- the evolution of the magnitude associated with the second lowest frequency interval from time t_0 to t_4 in Figure 3A is therefore the succession of values V1, V2, V3, V4, V5.
- the complex spectra in fact contain phase values as well as the magnitudes associated with a given frequency interval.
- the present inventors have found that an evolution of the complex spectral values associated with unvoiced speech is more erratic than an analogous evolution derived from voiced speech.
- the phase component of the complex value varies more erratically for unvoiced speech.
- Figure 3B illustrates how a complex spectral value derived from unvoiced speech might evolve (the length of the line represents the magnitude, the angle φ represents the phase).
- Figure 3C shows an evolution likely to be associated with voiced speech.
- a Slowly Evolving Spectrum generation process 48 receives the aligned characteristic spectra and processes them to obtain a Slowly Evolving Spectrum. Conventionally, this has been done by storing, say, seven consecutive spectra and then applying a moving average filter to the evolution of the complex values associated with each frequency interval ( Figure 4 ).
- SES[i,ω] represents the complex spectral values of a modified spectrum for the ith sub-frame of the residual signal and a_m represents the coefficients of the moving average filter.
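A sketch of that conventional moving-average step, assuming the seven stored spectra sit in a (7, n_bins) complex array and the formula SES[i,ω] = Σ_m a_m · CS_aligned[i+m,ω] implied by the symbol definitions above (a centred 7-tap filter is an assumption):

```python
import numpy as np

def conventional_ses(cs_aligned, a):
    """Moving-average filter along the evolution (time) axis, applied
    independently to every frequency interval.
    cs_aligned: (7, n_bins) complex spectra; a: seven taps a_m."""
    return np.tensordot(a, cs_aligned, axes=(0, 0))   # -> (n_bins,) spectrum
```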
- a series of operations are carried out on stored aligned characteristic spectra including the one associated with the current sub-frame and the six respectively associated with the six nearest sub-frames ( Figure 5 ).
- a counter is set to zero (step 60).
- a moving average filter 62 is then applied to the evolutions of the complex spectral values associated with respective frequency intervals to provide a modified spectrum 64 to be associated with the current sub-frame.
- the phase values of the modified spectrum are then replaced (step 66) by the phase values of the aligned characteristic spectrum associated with the current sub-frame to provide a hybrid characteristic spectrum 67 associated with the current sub-frame.
- the counter is then increased by one (step 68) and a check is made on the value of the counter (step 70). If it has not yet reached six then the filtering 62 and phase replacement 66 steps are carried out on the hybrid characteristic spectrum just obtained.
- the magnitude values of the hybrid characteristic spectrum 67 obtained after the sixth replacement operation are output by the Slowly Evolving Spectrum generation process ( Figure 2 , 48) as the Slowly Evolving Spectrum 71 for the current sub-frame.
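Pulling steps 60 to 70 together gives the loop below. One plausible reading, and it is an assumption, is that the hybrid spectrum takes the current sub-frame's slot in the seven-spectrum stack before the next filtering pass; the six iterations and the final output of magnitudes only come straight from the text.

```python
import numpy as np

def slowly_evolving_spectrum(cs_aligned, a, n_iter=6, cur=3):
    """Figure 5 sketch: filter the evolutions (62), reinstate the current
    sub-frame's aligned phase (66), repeat n_iter times, return SES 71.
    cs_aligned: (7, n_bins) complex stack, row `cur` = current sub-frame."""
    aligned_phase = np.angle(cs_aligned[cur])
    stack = cs_aligned.copy()
    for _ in range(n_iter):
        modified = np.tensordot(a, stack, axes=(0, 0))          # filtering 62
        hybrid = np.abs(modified) * np.exp(1j * aligned_phase)  # replacement 66
        stack[cur] = hybrid    # the next pass filters the hybrid just obtained
    return np.abs(hybrid)      # SES magnitude values 71
```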
- the Slowly Evolving Spectrum (SES) 71 is passed to the Rapidly Evolving Spectrum generation process 50.
- the Rapidly Evolving Spectrum (RES) generation process 50 subtracts the SES magnitude values from the corresponding magnitude values of the aligned characteristic spectrum associated with the current sub-frame to provide the magnitude values of the RES.
- Both the SES magnitude values and the RES magnitude values are then arranged into Mel-scaled frequency intervals and the SES magnitude values 52 and RES magnitude values 54 for one out of every two sub-frames are forwarded to the quantiser ( Figure 1 , 16).
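The RES side of process 50 is then a per-bin magnitude subtraction; flooring at zero is an added guard, since the text specifies only the subtraction (the subsequent Mel grouping and 2:1 sub-frame decimation are omitted here):

```python
import numpy as np

def res_magnitudes(cs_aligned_cur, ses_mag):
    """RES magnitudes: the part of the aligned characteristic spectrum
    not explained by the slowly evolving (concordant) component."""
    return np.maximum(np.abs(cs_aligned_cur) - ses_mag, 0.0)
```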
- the stream of parameters (pitch 41, RES magnitude values 54, SES magnitude values 52, LSFs 37) output by the WI encoder 14 are received at the decoder 28 in user B's mobile phone 24.
- the processes carried out in the decoder 28 are now described with reference to Figure 6 .
- the SES magnitude values 52 are passed to a phase generation process 80 which generates phase values to be associated with the magnitude values on the basis of known assumptions.
- the phase values are generated in the way described in the Applicant's International Patent Application No. PCT/GB97/02037 published as WO 98/05029 .
- the phase values and the SES magnitude values are combined to provide a complex SES characteristic spectrum.
- the RES magnitude values 54 are combined with random phase values 82 to generate a complex RES characteristic spectrum.
- Interpolation processes 84, 86 are then carried out on the two types of spectra to obtain one spectrum of each type every 2.5 ms.
- the two spectra thus created are then combined 88 to provide an approximation to a characteristic spectrum for each sub-frame.
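Decoder-side recombination might look like the sketch below; representing "combined 88" as a plain complex sum, and drawing the RES phase uniformly from [0, 2π), are assumptions (the SES phase comes from the phase generation process 80):

```python
import numpy as np

def combine_decoder_spectra(ses_mag, ses_phase, res_mag, rng=None):
    """Rebuild complex SES and RES spectra and sum them into an
    approximate characteristic spectrum for one sub-frame."""
    if rng is None:
        rng = np.random.default_rng()
    ses = ses_mag * np.exp(1j * ses_phase)                       # from process 80
    random_phase = rng.uniform(0.0, 2.0 * np.pi, res_mag.shape)  # random phase 82
    return ses + res_mag * np.exp(1j * random_phase)             # combination 88
```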
- the approximate characteristic spectrum is then passed, together with the pitch 41, to a cubic interpolation synthesis process 90 which operates in a known manner to reconstruct an approximation to the residual signal originally derived in the LSF analysis process in the encoder ( Figure 2 , 38).
- a filter 92 which is the inverse of the analysis filter ( Figure 2 , 38) is then used to provide an approximation of the audio signal originally passed to the encoder ( Figure 1 , 14).
- the SES generation process ( Figure 2 , 48) is better able to reduce the SES magnitude values associated with unvoiced speech than the processes used in prior-art PWI encoders.
- the erratic evolution of the phase values does result in the low-pass filtering operation ( Figure 4 , 57) reducing the magnitude values of the resultant SEW for the corresponding frequency interval.
- the present invention improves on this since it gives extra weight to the phase information in the characteristic spectrum (it will be recalled that it is the phase information that especially distinguishes unvoiced speech/noise from voiced speech).
- Extra weighting of the phase information is achieved by replacing the phase values at each stage of the iterative filtering process and thereby reintroducing the erratic phase values that particularly distinguish voiced and unvoiced speech before the next filtering stage.
- the result is low SES magnitudes associated with unvoiced speech and hence a less buzzy output than known encoders.
- phase values obtained from any earlier filtering stage could be used to replace the phase resulting after a later filtering stage. Such a method would still provide a degree of improvement over the prior art.
- the above described processes (40, 42, 44, 46) which extract SES magnitude values from the residual signal could be used to derive a voicing measure for each of the frequency bands for each sub-frame.
- the voicing measure might simply be the ratio of the output SES magnitude to the original characteristic spectrum magnitude for a given frequency interval.
- Such a set of processes might be useful in a Multi-Band Excitation speech coder.
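As a sketch, the suggested per-band voicing measure is the ratio just described; the small epsilon and the clip to [0, 1] are illustrative guards, not part of the text:

```python
import numpy as np

def voicing_measure(ses_mag, orig_mag, eps=1e-12):
    """Ratio of output SES magnitude to the original characteristic-spectrum
    magnitude per frequency band (near 1: voiced, near 0: unvoiced)."""
    return np.clip(ses_mag / (orig_mag + eps), 0.0, 1.0)
```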
- the alignment stage 46 might be included within the repeated processes contained within the loop illustrated in Figure 5 . This would correct any drift introduced by the filtering process.
- each of the characteristic spectra corresponds to a single pitch period of the residual signal.
- the characteristic waveforms could be of a fixed length allowing the use of an efficient Fast Fourier Transform (FFT) algorithm to calculate the characteristic spectra.
- FFT Fast Fourier Transform
- the characteristic spectra might then contain peaks and troughs corresponding to the fundamental of the input signal (which, of course, need not be a residual signal).
- the application of the iterative process described in relation to Figure 5 would then retain the peaks but reduce the troughs further.
- Such a method is likely to have application in noise reduction algorithms that might be applied to speech, music or any other at least partly periodic audio signals.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Claims (12)
- A method of extracting one of a concordant component and a discordant component of a predetermined segment of an audio signal, wherein a concordant component is a component whose phase changes slowly in comparison to a discordant component whose phase changes more rapidly, said method comprising the steps of: forming an initial evolution surface from a series of combined magnitude and phase spectra representing segments of the signal around the predetermined segment; modifying the initial evolution surface to obtain a modified evolution surface representing one of the concordant component and the discordant component of the signal; and extracting said one of the concordant component and the discordant component of the predetermined segment from the modified evolution surface; wherein the modifying step comprises: a plurality of component filtering steps and, prior to at least one of the filtering steps, the substitution of phase information derived from the initial evolution surface or an earlier one of the component steps for the phase information derived from the most recent component step.
- A method according to claim 1, wherein the component steps comprise respective low-pass filtering steps, the modifying step providing a modified evolution surface which represents the concordant component of the predetermined segment.
- A method according to claim 2, wherein each low-pass filtering step involves the application of an identical low-pass filter.
- A method according to any preceding claim, wherein phase information derived from the initial evolution surface is used in all of the component steps.
- A method according to any preceding claim, further comprising the step of calculating the other of the concordant component and the discordant component by subtracting the one of the two components from the initial evolution surface.
- A method according to claim 1, wherein the component steps comprise respective high-pass filtering steps, the modifying step providing a modified evolution surface which represents the discordant component of the predetermined segment.
- A method according to claim 1, wherein the audio signal is substantially periodic and each predetermined segment represents a different pitch period.
- A method of separating voiced speech from unvoiced speech and noise, the method comprising the steps of any preceding claim, wherein the audio signal represents speech, the voiced speech corresponding to the concordant component and the unvoiced speech and noise corresponding to the discordant component.
- A method of speech coding including the separation method of claim 8, wherein more information is used to encode the voiced speech than is used to encode the unvoiced speech and noise.
- An audio signal processor operable to extract one of a concordant component and a discordant component of a predetermined segment of an audio signal, wherein a concordant component is a component whose phase changes slowly in comparison to a discordant component whose phase changes more rapidly, said apparatus comprising: means operable to form an initial evolution surface from a series of combined magnitude and phase spectra representing segments of the signal around the predetermined segment; means operable to modify the initial evolution surface to obtain a modified evolution surface representing one of the concordant component and the discordant component of the signal; and means operable to extract said one of the concordant component and the discordant component of the predetermined segment from the modified evolution surface; the apparatus further comprising: means operable to carry out a plurality of filtering steps and, prior to at least one of the filtering steps, to substitute phase information derived from the initial evolution surface or an earlier one of the component steps for the phase information derived from the most recent component step.
- Speech coding apparatus comprising: a storage medium on which is stored processor-readable code processable to encode input speech data, the code comprising: initial evolution surface generation code processable to generate initial evolution surface data comprising combined magnitude and phase data for segments of the input speech data; separation code processable to derive separate phase data and magnitude data from the input speech data; evolution surface modification code processable to generate a modified evolution surface representing one of a voiced component and an unvoiced/noise component of the input speech data; and component extraction code processable to extract the one of the voiced component and the unvoiced/noise component from the input speech data; wherein the evolution surface modification code comprises: evolution surface filtering code processable to filter the initial evolution surface data a plurality of times; evolution surface decomposition code processable to derive magnitude data and phase data subsequent to one or more of the filtering steps; and earlier phase reinstatement code processable to replace the phase data obtained on processing the evolution surface decomposition code with an earlier version of the phase data.
- A method of waveform interpolation speech coding comprising: forming an initial evolution surface from a series of combined characteristic waveforms or spectra representing respective segments of the speech; wherein the forming includes aligning each of the characteristic waveforms or spectra with an earlier characteristic waveform or spectrum of the series; and the earlier waveform or spectrum is separated from the characteristic waveform or spectrum with which it is aligned by a variable number of members of the series, the variable number varying in accordance with the pitch of the signal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19990202980 EP0987680B1 (de) | 1998-09-17 | 1999-09-13 | Audiosignalverarbeitung |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP98307574 | 1998-09-17 | ||
EP98307574 | 1998-09-17 | ||
EP19990202980 EP0987680B1 (de) | 1998-09-17 | 1999-09-13 | Audiosignalverarbeitung |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0987680A1 EP0987680A1 (de) | 2000-03-22 |
EP0987680B1 true EP0987680B1 (de) | 2008-07-16 |
Family
ID=26151440
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP19990202980 Expired - Lifetime EP0987680B1 (de) | 1998-09-17 | 1999-09-13 | Audiosignalverarbeitung |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP0987680B1 (de) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6397175B1 (en) * | 1999-07-19 | 2002-05-28 | Qualcomm Incorporated | Method and apparatus for subsampling phase spectrum information |
ATE553472T1 (de) | 2000-04-24 | 2012-04-15 | Qualcomm Inc | Prädikitve dequantisierung von stimmhaften sprachsignalen |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5517595A (en) * | 1994-02-08 | 1996-05-14 | At&T Corp. | Decomposition in noise and periodic signal waveforms in waveform interpolation |
WO1998005029A1 (en) * | 1996-07-30 | 1998-02-05 | British Telecommunications Public Limited Company | Speech coding |
US5924061A (en) * | 1997-03-10 | 1999-07-13 | Lucent Technologies Inc. | Efficient decomposition in noise and periodic signal waveforms in waveform interpolation |
- 1999-09-13: EP application EP19990202980 granted as patent EP0987680B1 (de), status: not active, Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
EP0987680A1 (de) | 2000-03-22 |
Legal Events
Code | Title | Description
---|---|---
PUAI | Public reference made under article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): DE FI FR GB IT SE
AX | Request for extension of the European patent | Free format text: AL;LT;LV;MK;RO;SI
17P | Request for examination filed | Effective date: 20000921
AKX | Designation fees paid | Free format text: DE FI FR GB IT SE
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1
RIC1 | Information provided on IPC code assigned before grant | IPC: G10L 11/06 20060101ALN20071113BHEP; IPC: G10L 19/08 20060101AFI20071113BHEP
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE FI FR GB IT SE
REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D
REF | Corresponds to | Ref document number: 69939086; Country of ref document: DE; Date of ref document: 20080828; Kind code of ref document: P
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: FI; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20080716
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT
26N | No opposition filed | Effective date: 20090417
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: IT; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20080716
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: SE; Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; Effective date: 20081016
REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 18
REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 19
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: DE; Payment date: 20170928; Year of fee payment: 19; Ref country code: FR; Payment date: 20170928; Year of fee payment: 19
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: GB; Payment date: 20180919; Year of fee payment: 20
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R119; Ref document number: 69939086; Country of ref document: DE
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20190402
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20180930
REG | Reference to a national code | Ref country code: GB; Ref legal event code: PE20; Expiry date: 20190912
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION; Effective date: 20190912