EP1100076A2 - Multimodal speech coder with smoothing of the gain factor (Multimodaler Sprachkodierer mit Glättung des Gewinnfaktors) - Google Patents
Multimodal speech coder with smoothing of the gain factor
- Publication number
- EP1100076A2 (application number EP00124232A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- gain
- voice
- excitation
- spectral parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 238000009499 grossing Methods 0.000 title claims abstract description 42
- 230000005284 excitation Effects 0.000 claims abstract description 132
- 230000003595 spectral effect Effects 0.000 claims abstract description 100
- 230000003044 adaptive effect Effects 0.000 claims abstract description 55
- 238000001914 filtration Methods 0.000 claims description 66
- 238000000034 method Methods 0.000 claims description 54
- 238000003786 synthesis reaction Methods 0.000 claims description 32
- 230000008569 process Effects 0.000 claims description 30
- 230000002194 synthesizing effect Effects 0.000 claims description 27
- 230000015572 biosynthetic process Effects 0.000 claims description 23
- 230000001052 transient effect Effects 0.000 claims description 4
- 238000012935 Averaging Methods 0.000 claims description 2
- 239000000284 extract Substances 0.000 abstract 1
- 230000004044 response Effects 0.000 description 16
- 238000013139 quantization Methods 0.000 description 10
- 230000006866 deterioration Effects 0.000 description 8
- 238000010586 diagram Methods 0.000 description 5
- 230000009467 reduction Effects 0.000 description 4
- 238000006243 chemical reaction Methods 0.000 description 2
- 230000001186 cumulative effect Effects 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 230000002411 adverse Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013144 data compression Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000000873 masking effect Effects 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
Images
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/083—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being an excitation gain
Definitions
- the present invention relates to a voice coding apparatus and a voice decoding apparatus, and methods thereof, for satisfactorily coding a background noise signal superimposed on a voice signal even at a low bit rate.
- CELP (Code-Excited Linear Prediction) coding
- M. Schroeder and B. Atal, "Code-excited linear prediction: High quality speech at very low bit rates", Proc. ICASSP, pp. 937-940, 1985 (Literature 1), and Kleijn et al., "Improved speech quality and efficient vector quantization in SELP", Proc. ICASSP, pp. 155-158, 1988 (Literature 2).
- a spectral parameter representing a spectral characteristic of voice signal is extracted from the voice signal for each frame (of 20 msec., for instance) by executing linear prediction (LPC) analysis.
- the frame is divided into sub-frames (of 5 msec., for instance), and pitch prediction of voice signal in each sub-frame is executed by using an adaptive codebook.
- the pitch prediction is executed by extracting parameters in the adaptive codebook (i.e., delay parameter corresponding to the pitch cycle and gain parameter) for each sub-frame on the basis of past excitation signal.
- An excitation signal is obtained as a result of the pitch prediction, and it is quantized by selecting an optimum excitation codevector from an excitation codebook (or vector quantization codebook), which is constituted by noise signals of predetermined kinds, and calculating an optimum gain.
- the selection of the excitation codevector is executed such as to minimize the error power level between a signal synthesized from a selected noise signal and the residue signal.
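As a rough, illustrative sketch of this analysis-by-synthesis selection (not the patent's exact procedure; the helper name, codebook size and weighted impulse response below are placeholders), each candidate codevector can be synthesized through the weighted filter, given its optimum gain, and the candidate with the smallest error power retained:

```python
import numpy as np

def select_excitation(target, codebook, h_w):
    """Pick the excitation codevector and gain minimizing the weighted error power."""
    best = (None, 0.0, np.inf)          # (index, gain, error power)
    for j, c in enumerate(codebook):
        y = np.convolve(c, h_w)[: len(target)]             # synthesize the candidate
        g = np.dot(target, y) / (np.dot(y, y) + 1e-12)     # optimum gain for this candidate
        err = np.sum((target - g * y) ** 2)                # weighted error power
        if err < best[2]:
            best = (j, g, err)
    return best

# toy usage with random placeholder data (40-sample sub-frame, 64 noise codevectors)
rng = np.random.default_rng(0)
target = rng.standard_normal(40)
codebook = rng.standard_normal((64, 40))
h_w = 0.7 ** np.arange(20)              # dummy weighted impulse response
print(select_excitation(target, codebook, h_w)[:2])
```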
- a multiplexer combines an index representing the kind of the selected codevector with the gain, the spectral parameter and the adaptive codebook parameters, and transmits the resultant signal. A description of the receiving side is omitted here.
- An object of the present invention is to solve the above problems and provide a voice coding and a voice decoding apparatus which are less subject to sound quality deterioration of the background noise, with relatively little computational effort, even at low bit rates.
- a voice coding apparatus including a spectral parameter calculating part for obtaining a spectral parameter for each predetermined frame of an input voice signal and quantizing the obtained spectral parameter, an adaptive codebook part for dividing the frame into a plurality of sub-frames, obtaining a delay and a gain from a past quantized excitation signal for each of the sub-frames by using an adaptive codebook and obtaining a residue by predicting the voice signal, an excitation quantizing part for quantizing the excitation signal of the voice signal by using the spectral parameter, and a gain quantizing part for quantizing the gain of the adaptive codebook and the gain of the excitation signal, comprising: a mode discriminating part for extracting a predetermined feature quantity from the voice signal and judging the pertinent mode to be either one of the plurality of predetermined modes on the basis of the extracted feature quantity; a smoothing part for executing time-wise smoothing of at least either one of the gain of the excitation signal, the gain of the adaptive codebook, the spectral parameter and the level of the excitation signal; and a part for locally reproducing a synthesized voice signal by using the smoothed signal or signals.
- the mode discriminating part executes mode discriminating for each frame.
- the feature quantity is pitch prediction gain.
- the mode discriminating part averages the pitch prediction gains each obtained for each sub-frame over the full frame and classifies the frame into one of a plurality of predetermined modes by comparing the average value with a plurality of predetermined threshold values.
- the plurality of predetermined modes substantially correspond to a silence, a transient, a weak voice and a strong voice time section, respectively.
- a voice decoding apparatus including a multiplexer part for separating spectral parameter, pitch, gain and excitation signal as voice data from a voice signal, an excitation signal restoring part for restoring an excitation signal from the separated pitch, excitation signal and gain, a synthesizing filter part for synthesizing a voice signal on the basis of the restored excitation signal and the spectral parameter, and a post-filter part for post-filtering the synthesized voice signal by using the spectral parameter, comprising: an inverse filter part for estimating an excitation signal through an inverse post-filtering and inverse synthesis filtering on the basis of the output signal of the post-filter part and the spectral parameter, and a smoothing part for executing time-wise smoothing of at least either one of the level of the estimated excitation signal, the gain and the spectral parameter, the smoothed signal or signals being fed to the synthesis filter part, the synthesized signal output thereof being fed to the post-filter part to synthesize a voice signal.
- a voice decoding apparatus including a multiplexer part for separating mode discrimination data, spectral parameter, pitch, gain and excitation signal on the basis of a feature quantity of a voice signal to be decoded, an excitation signal restoring part for restoring an excitation signal from the separated pitch, excitation signal and gain, a synthesis filter part for synthesizing the voice signal by using the restored excitation signal and the spectral parameter, and a post-filter part for post-filtering the synthesized voice signal by using the spectral parameter, comprising: an inverse filter part for estimating the voice signal on the basis of the output signal of the post-filter part and the spectral parameter through an inverse post-filtering and inverse synthesis filtering, a smoothing part for executing time-wise smoothing of at least either one of the level of the estimated excitation signal, the gain and the spectral parameter, the smoothed signal being fed to the synthesis filter part, the synthesis signal output thereof being fed to the post-filter part.
- the mode discriminating part executes mode discriminating for each frame.
- the feature quantity is the pitch prediction gain.
- the mode discrimination is executed by averaging the pitch prediction gains each obtained for each sub-frame over the full frame and comparing the average value thus obtained with a plurality of predetermined threshold values.
- the plurality of predetermined modes substantially correspond to a silence, a transient, a weak voice and a strong voice time section, respectively.
- a voice decoding apparatus for locally reproducing a synthesized voice signal on the basis of a signal obtained through time-wise smoothing of at least either one of spectral parameter of the voice signal, gain of an adaptive codebook, gain of an excitation codebook and RMS of an excitation signal.
- a voice decoding apparatus for obtaining a residue signal from a signal obtained after post-filtering through an inverse post-synthesis filtering process, executing a voice signal synthesizing process afresh on the basis of a signal obtained through time-wise smoothing of at least one of RMS of residue signal, spectral parameter of received signal, gain of adaptive codebook and gain of excitation codebook and executing a post-filtering process afresh, thereby feeding out a final synthesized signal.
- a voice decoding apparatus for obtaining a residue signal from a signal obtained after post-filtering through an inverse post-synthesis filtering process, and in a mode determined on the basis of a feature quantity of a voice signal to be decoded or in the case of presence of the feature quantity in a predetermined range, executing a voice signal synthesizing process afresh on the basis of a signal obtained through time-wise smoothing of at least either one of RMS of the residue signal, spectral parameter of a received signal, gain of an adaptive codebook and gain of an excitation codebook, and executing a post-filtering process afresh, thereby feeding out a final synthesized signal.
- a voice coding method including a step for obtaining a spectral parameter for each predetermined frame of an input voice signal and quantizing the obtained spectral parameter, a step for dividing the frame into a plurality of sub-frames, obtaining a delay and a gain from a past quantized excitation signal for each of the sub-frames by using an adaptive codebook and obtaining a residue by predicting the voice signal, a step for quantizing the excitation signal of the voice signal by using the spectral parameter, and a step for quantizing the gain of the adaptive codebook and the gain of the excitation signal, further comprising steps of: extracting a predetermined feature quantity from the voice signal and judging the pertinent mode to be either one of the plurality of predetermined modes on the basis of the extracted feature quantity; executing time-wise smoothing of at least either one of the gain of the excitation signal, the gain of the adaptive codebook, the spectral parameter and the level of the excitation signal; and locally reproducing a synthesized voice signal by using the smoothed value or values.
- a voice decoding method including a step for separating spectral parameter, pitch, gain and excitation signal as voice data from a voice signal, a step for restoring an excitation signal from the separated pitch, excitation signal and gain, a step for synthesizing a voice signal on the basis of the restored excitation signal and the spectral parameter, and a step for post-filtering the synthesized voice signal by using the spectral parameter, further comprising steps of: estimating an excitation signal through an inverse post-filtering and inverse synthesis filtering on the basis of the post-filtered signal and the spectral parameter; and executing time-wise smoothing of at least either one of the level of the estimated excitation signal, the gain and the spectral parameter, the smoothed signal or signals being fed to the synthesis filtering, the synthesized signal output thereof being fed to the post-filtering to synthesize a voice signal.
- a voice decoding method including a step for separating a mode discrimination data, spectral parameter, pitch, gain and excitation signal on the basis of a feature quantity of a voice signal to be decoded, a step for restoring an excitation signal from the separated pitch, excitation signal and gain, a step for synthesizing the voice signal by using the restored excitation signal and the spectral parameter, and a step for post-filtering the synthesized voice signal by using the spectral parameter, comprising steps of: estimating the voice signal on the basis of the post-filtered signal and the spectral parameter through an inverse post-filtering and inverse synthesis filtering; and executing time-wise smoothing of at least either one of the level of the estimated excitation signal, the gain and the spectral parameter; the smoothed signal being fed to the synthesis filtering, the synthesis signal output thereof being fed to the post-filtering.
- a voice decoding method for locally reproducing a synthesized voice signal on the basis of a signal obtained through time-wise smoothing of at least either one of spectral parameter of the voice signal, gain of an adaptive codebook, gain of an excitation codebook and RMS of an excitation signal.
- a voice decoding method for obtaining a residue signal from a signal obtained after post-filtering through an inverse post-synthesis filtering process, executing a voice signal synthesizing process afresh on the basis of a signal obtained through time-wise smoothing of at least one of RMS of residue signal, spectral parameter of received signal, gain of adaptive codebook and gain of excitation codebook and executing a post-filtering process afresh, thereby feeding out a final synthesized signal.
- a voice decoding method for obtaining a residue signal from a signal obtained after post-filtering through an inverse post-synthesis filtering process, and in a mode determined on the basis of a feature quantity of a voice signal to be decoded or in the case of presence of the feature quantity in a predetermined range, executing a voice signal synthesizing process afresh on the basis of a signal obtained through time-wise smoothing of at least either one of RMS of the residue signal, spectral parameter of a received signal, gain of an adaptive codebook and gain of an excitation codebook, and executing a post-filtering process afresh, thereby feeding out a final synthesized signal.
- Fig. 1 is a block diagram showing a first embodiment of the voice coding apparatus according to the present invention.
- a frame circuit 110 divides a voice signal inputted from an input terminal 100 into frames (of 20 msec., for instance).
- a sub-frame divider circuit 120 divides each voice signal frame into sub-frames (of 5 msec. for instance) shorter than the frame.
- the spectral parameter may be calculated by using well-known methods such as LPC analysis or the Burg analysis. In this description, the use of the Burg analysis is assumed. The Burg analysis is detailed in Nakamizo, "Signal analysis and system identification", Corona Co., Ltd., pp. 82-87, 1988 (Literature 4), and is not described here.
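For orientation only, the following is a compact, textbook-style sketch of the Burg recursion for estimating linear prediction coefficients from one frame; it is not taken from Literature 4 or from the patent, and the function name and frame length are illustrative.

```python
import numpy as np

def burg_lpc(x, order):
    """Burg's method: returns A(z) = [1, a1, ..., a_order] (convention A(z) = 1 + sum a_i z^-i)."""
    f = np.asarray(x, dtype=float).copy()      # forward prediction errors
    b = f.copy()                               # backward prediction errors
    a = np.array([1.0])
    for _ in range(order):
        fm, bm = f[1:], b[:-1]
        k = -2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm) + 1e-12)  # reflection coeff.
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]                    # Levinson-type order update
        f, b = fm + k * bm, bm + k * fm        # update the error sequences
    return a

# toy usage: 10-th order analysis of one 20 ms frame at 8 kHz (160 samples)
rng = np.random.default_rng(1)
print(burg_lpc(rng.standard_normal(160), 10)[:3])
```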
- the circuit 200 converts the linear prediction coefficients α_i (i being 1 to 10), calculated by the Burg analysis, to LSP parameters suited for quantization and interpolation.
- the circuit 200 converts the linear prediction coefficients obtained in the 2-nd and 4-th sub-frames by the Burg method to LSP parameter data, obtains LSP parameter data in the 1-st and 3-rd sub-frames by interpolation, inversely converts the 1-st and 3-rd sub-frame LSP parameter data to restore linear prediction coefficients, and thus feeds out the 1-st to 4-th sub-frame linear prediction coefficients α_il (i being 1 to 10, l being 1 to 5) to an acoustic weighting circuit 230.
- the circuit 200 further feeds out the 4-th sub-frame LSP parameter data to a spectral parameter quantizing circuit 210.
- a spectral parameter quantizing circuit 210 efficiently quantizes the LSP parameters in a predetermined sub-frame, and feeds out a quantized LSP value minimizing the distortion given as D_j = Σ_i W(i)·[LSP(i) - QLSP(i)_j]², where LSP(i), QLSP(i)_j and W(i) are the i-th LSP before the quantization, the j-th result obtained after the quantization and the weighting coefficient, respectively.
- An LSP codebook 211 is referred to by the spectral parameter quantizing circuit 210.
- the LSP parameter may be vector quantized by a well-known method. Specific examples of the method are described in Japanese Patent Laid-Open No. 4-171500 (Japanese Patent Application No. 2-297600) (Literature 6), Japanese Patent Laid-Open No. 4-363000 (Japanese patent Application No. 3-261925) (Literature 7), Japanese Patent Laid-Open No. 5-6199 (Japanese Patent Application 3-155049) (Literature 8) and T.
- the spectral parameter quantizing circuit 210 restores the 1-st to 4-th LSP parameters from the quantized LSP parameter data obtained in the 4-th sub-frame. Specifically, the circuit 210 restores the 1-st to 3-rd sub-frame LSP parameters by executing linear interpolation from the 4-th sub-frame quantized LSP parameter data in the prevailing frame and immediately preceding frames. The 1-st to 4-th sub-frame LSP parameters can be restored by linear interpolation after selecting one kind of codevector corresponding to minimum error power level between the LSP parameter data before and after the quantization.
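The two operations above, the weighted-distortion codebook search and the linear interpolation of the quantized frame-end LSPs across sub-frames, might be sketched as follows; the weighting vector, codebook contents and function names are illustrative placeholders, not the methods of Literatures 6 to 8.

```python
import numpy as np

def quantize_lsp(lsp, codebook, w):
    """Return the index j minimizing sum_i w[i] * (lsp[i] - codebook[j, i])**2."""
    d = np.sum(w * (codebook - lsp) ** 2, axis=1)    # weighted distortion per codevector
    return int(np.argmin(d))

def interpolate_lsp(prev_frame_lsp, curr_frame_lsp, n_sub=4):
    """Linearly interpolate between the previous and current frame-end quantized LSPs
    to obtain one LSP vector per sub-frame (the last row equals curr_frame_lsp)."""
    r = (np.arange(1, n_sub + 1) / n_sub)[:, None]
    return (1.0 - r) * prev_frame_lsp + r * curr_frame_lsp

# toy usage: 10-dimensional LSPs, 128-entry codebook, flat weights
rng = np.random.default_rng(2)
codebook = np.sort(rng.uniform(0.0, np.pi, (128, 10)), axis=1)
lsp = np.sort(rng.uniform(0.0, np.pi, 10))
j = quantize_lsp(lsp, codebook, np.ones(10))
per_subframe = interpolate_lsp(codebook[0], codebook[j])
```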
- the spectral parameter quantizing circuit 210 converts the thus restored 1-st to 3-rd sub-frame LSP parameters and the quantized 4-th sub-frame LSP parameter data to the linear prediction coefficients α_il (i being 1 to 10, l being 1 to 5) for each sub-frame, and feeds out the coefficient data thus obtained to an impulse response calculating circuit 310.
- the circuit 210 further feeds out an index representing the codevector of the quantized 4-th sub-frame LSP parameter to a multiplexer 400.
- An acoustic weighting circuit 230 receives the linear prediction coefficients α_il (i being 1 to 10, l being 1 to 5) for each sub-frame, executes acoustic weighting of the sub-frame voice signal in the manner described in Literature 1 noted above, and feeds out an acoustically weighted signal.
- the subtracter 235 subtracts the response signal x_z(n) from the acoustically weighted signal for one sub-frame as shown by the equation below, and feeds out x'_w(n) to the adaptive codebook circuit 470.
- x'_w(n) = x_w(n) - x_z(n)
- the impulse response calculating circuit 310 calculates a predetermined number of samples of the impulse response h_w(n) of the acoustic weighting filter, whose z-transform is expressed by the following equation, and feeds out the calculated data to the adaptive codebook circuit 470 and an excitation quantizing circuit 350.
- a mode discriminating circuit 300 executes mode discrimination for each frame by extracting a feature quantity from the frame circuit output signal.
- the pitch prediction gain may be used as the feature quantity.
- the circuit 800 averages the pitch prediction gains obtained in the individual sub-frames over the full frame, and executes classification into a plurality of predetermined modes by comparing the average value with a plurality of predetermined threshold values.
- modes 0 to 3 are set substantially for a silence, a transient, a weak voice and a strong voice time sections, respectively.
- the circuit 800 feeds out mode discrimination data thus obtained to the excitation quantizing circuit 350, a gain quantizing circuit 365 and the multiplexer 400.
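A hedged illustration of this classification is given below; the sub-frame length, delay range and threshold values are arbitrary placeholders (the patent does not disclose specific thresholds), and the helper names are hypothetical.

```python
import numpy as np

def pitch_gain_db(sig, start, length, delay):
    """Prediction gain (dB) of a one-tap pitch predictor s(n) ~ g*s(n-delay) on one sub-frame."""
    s = sig[start:start + length]
    p = sig[start - delay:start - delay + length]
    g = np.dot(s, p) / (np.dot(p, p) + 1e-12)
    resid = s - g * p
    return 10.0 * np.log10(np.dot(s, s) / (np.dot(resid, resid) + 1e-12) + 1e-12)

def discriminate_mode(sig, frame_start, sub_len=40, n_sub=4,
                      delays=range(20, 147), thresholds=(1.0, 4.0, 7.0)):
    """Average the best per-sub-frame pitch prediction gains over the frame and
    threshold the average into mode 0 (silence) ... mode 3 (strong voice)."""
    gains = []
    for k in range(n_sub):
        start = frame_start + k * sub_len
        gains.append(max(pitch_gain_db(sig, start, sub_len, d) for d in delays))
    avg = float(np.mean(gains))
    return int(np.searchsorted(thresholds, avg))

# toy usage: low-level noise-like input (frame_start must exceed the largest delay)
rng = np.random.default_rng(3)
print(discriminate_mode(0.01 * rng.standard_normal(1000), frame_start=200))
```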
- the adaptive codebook circuit 470 receives the past excitation signal v(n) from the gain quantizing circuit 370, the output signal x'_w(n) from the subtracter 235 and the acoustically weighted impulse response h_w(n) from the impulse response calculating circuit 310, then obtains a delay T corresponding to the pitch such as to minimize the distortion given by the following equation, and feeds out an index representing the delay to the multiplexer 400.
- y_w(n - T) = v(n - T) * h_w(n)
- In Equation (8), the symbol * represents convolution.
- the adaptive codebook circuit 470 then obtains the gain β given as β = Σ_n x'_w(n)·y_w(n - T) / Σ_n y_w(n - T)².
- the delay may be obtained as decimal sample value instead of integer sample value.
- P. Kroon et al "Pitch predictors with high temporal resolution", Proc. ICASSP, pp. 661-664, 1990 (Literature 11).
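A simplified integer-delay version of this adaptive codebook search is sketched below (fractional delays per Literature 11 are omitted; the delay range and helper name are illustrative):

```python
import numpy as np

def adaptive_codebook_search(x_w, v_past, h_w, delays=range(20, 147)):
    """Find the delay T and gain beta minimizing || x_w - beta * (v(n-T) conv h_w) ||^2.
    v_past is the past excitation buffer (must hold at least max(delays) samples)."""
    L = len(x_w)
    best = (None, 0.0, np.inf)                         # (T, beta, error power)
    for T in delays:
        seg = v_past[len(v_past) - T: len(v_past) - T + L]
        if len(seg) < L:                               # T shorter than the sub-frame: repeat it
            seg = np.tile(seg, int(np.ceil(L / len(seg))))[:L]
        y = np.convolve(seg, h_w)[:L]                  # weighted synthesis of delayed excitation
        beta = np.dot(x_w, y) / (np.dot(y, y) + 1e-12)
        err = np.sum((x_w - beta * y) ** 2)
        if err < best[2]:
            best = (T, beta, err)
    return best
```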
- the adaptive codebook circuit 470 further executes pitch prediction as in the following Equation (10), and feeds out the prediction residue signal e_w(n) to the excitation quantizing circuit 355.
- e_w(n) = x'_w(n) - β·v(n - T) * h_w(n)
- the excitation quantizing circuit 355 receives mode discrimination data, and switches the excitation signal quantizing methods on the basis of the discriminated mode.
- It is assumed that M pulses are provided in the modes 1 to 3, and that an amplitude or polarity codebook of B bits is provided for collective quantization of the amplitudes of the M pulses. The following description assumes the case of using a polarity codebook.
- the polarity codebook is stored in an excitation codebook 351.
- the excitation quantizing circuit 355 reads out individual polarity codevectors stored in the excitation codebook 351, allots a pulse position to each read-out codevector, and selects a plurality of sets of codevector and pulse position, which minimize the following equation (11).
- h_w(n) is the acoustically weighted impulse response.
- the Equation (11) may be minimized by selecting a set of polarity codevector g ik and pulse position m i , which maximizes the following equation (12).
- the positions which can be allotted for the individual pulses in the modes 1 to 3, can be restricted as shown in Literature 3.
- the excitation quantizing circuit 355 feeds out the selected plurality of sets of polarity codevector and position to the gain quantizing circuit 370.
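Equations (11) and (12) are not reproduced above; as a stand-in, the following greedy sketch places pulses one at a time by maximizing the usual squared-correlation-over-energy criterion against the remaining target. The position grid, pulse count and helper name are illustrative, and the joint search over polarity codevectors is simplified to per-pulse signs.

```python
import numpy as np

def greedy_pulse_search(x_w, h_w, n_pulses, allowed_positions, L):
    """Place n_pulses pulses; each maximizes d(m)^2 / phi(m, m), where d is the target
    correlated with the shifted weighted impulse response and phi its energy."""
    hw = np.zeros(L)
    hw[: min(L, len(h_w))] = h_w[:L]
    H = np.zeros((L, L))
    for m in range(L):                       # column m = h_w delayed by m samples
        H[m:, m] = hw[: L - m]
    target = x_w.astype(float).copy()
    energies = np.sum(H ** 2, axis=0)
    pulses = []
    for _ in range(n_pulses):
        d = H.T @ target
        crit = d[allowed_positions] ** 2 / (energies[allowed_positions] + 1e-12)
        m = int(allowed_positions[int(np.argmax(crit))])
        g = d[m] / (energies[m] + 1e-12)     # amplitude; its sign carries the polarity
        pulses.append((m, g))
        target -= g * H[:, m]                # remove this pulse's contribution
    return pulses

# toy usage: 4 pulses on a 40-sample sub-frame, every position allowed
rng = np.random.default_rng(4)
pulses = greedy_pulse_search(rng.standard_normal(40), 0.6 ** np.arange(20),
                             n_pulses=4, allowed_positions=np.arange(40), L=40)
```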
- In a predetermined mode (i.e., the mode 0 in this case), a plurality of extents of shifting the pulse positions of all the pulses are predetermined by determining the pulse positions at a predetermined interval, as shown in Table 2.
- the shift extents are transmitted by quantizing them in two bits.
- the gain quantizing circuit 370 receives the mode discrimination data from the mode discriminating circuit 300. In the modes 1 to 3, the circuit 370 receives a plurality of sets of polarity codevector and pulse position, and in the mode 0 it receives the set of pulse position and corresponding polarity for each shift extent.
- the gain quantizing circuit 370 reads out gain codevectors from the gain codebook 380. In the modes 1 to 3, the circuit 370 executes gain codevector retrieval for the plurality of selected sets of polarity codevector and pulse position such as to minimize the following Equation (15), and selects the one set of gain and polarity codevectors which minimizes the distortion.
- both excitation gains, i.e., the gain of the adaptive codebook and the gain of the pulse excitation, are simultaneously vector quantized.
- the gain quantizing circuit 370 feeds the index representing selected polarity codevector, the code representing pulse position and the index representing gain codevector to the multiplexer 400.
- the gain quantizing circuit 370 receives a plurality of shift extents and polarity corresponding to each pulse position in each shift extent case, executes gain codevector retrieval, and selects one set of gain codevector and shift extent, which minimizes the following Equation (16).
- β'_k and G'_k constitute the k-th codevector in a two-dimensional gain codebook stored in the gain codebook 380
- ⁇ (j) is the j-th shift extent
- g' k is the selected gain codevector.
- the circuit 370 feeds out index representing the selected gain codevector and code representing the shift extent to the multiplexer 400.
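Equations (15) and (16) are likewise not reproduced; generically, the retrieval amounts to scanning a two-dimensional gain codebook for the pair (adaptive codebook gain, excitation gain) that minimizes the weighted error, as in this hypothetical sketch:

```python
import numpy as np

def search_gain_codebook(x_w, y_adaptive, y_excitation, gain_codebook):
    """Return the index k of the (beta_k, G_k) pair minimizing || x_w - beta*y_a - G*y_e ||^2."""
    best_k, best_err = None, np.inf
    for k, (beta, G) in enumerate(gain_codebook):
        err = np.sum((x_w - beta * y_adaptive - G * y_excitation) ** 2)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```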
- a smoothing circuit 450 receives the mode data and, when the received mode data is in a predetermined mode (for instance the mode 0), executes time-wise smoothing of at least either one of gain of excitation signal in gain codevector, gain of adaptive codebook, RMS of excitation signal and spectral parameter.
- G_t(m) = β·G_t(m-1) + (1-β)·G'_t(m), where m is the sub-frame number and G'_t(m) is the gain before the smoothing.
- β_t(m) = β·β_t(m-1) + (1-β)·β'_t(m)
- RMS(m) = β·RMS(m-1) + (1-β)·RMS'(m)
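Read literally, Equations (17) to (20) are first-order recursive averages applied per sub-frame in the selected mode. A minimal sketch follows, with the smoothing constant beta and the class name chosen arbitrarily here:

```python
class ParameterSmoother:
    """First-order recursive (leaky) smoothing of codec parameters, e.g. applied only in mode 0."""

    def __init__(self, beta=0.9):
        self.beta = beta
        self.state = {}

    def smooth(self, name, value):
        prev = self.state.get(name, value)            # first observation initializes the state
        out = self.beta * prev + (1.0 - self.beta) * value
        self.state[name] = out
        return out

# per sub-frame, in the smoothing mode only:
smoother = ParameterSmoother(beta=0.9)
g_exc = smoother.smooth("excitation_gain", 0.8)
g_acb = smoother.smooth("adaptive_codebook_gain", 0.5)
rms = smoother.smooth("excitation_rms", 120.0)
```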
- a weighting signal calculating circuit 360 receives the mode discrimination data and the smoothed signal output of the smoothing circuit and, in the cases of the modes 1 to 3, obtains the drive excitation signal v(n) as in the above Equation (21).
- the weighting signal calculating circuit 360 feeds out v(n) to the adaptive codebook circuit 470.
- the weighting signal calculating circuit 360 obtains drive excitation signal v(n) in the manner as given by Equation (22).
- the weighting signal calculating circuit 360 feeds out v(n) to the adaptive codebook circuit 470.
- the weighting signal calculating circuit 360 calculates the response signal x_w(n) for each sub-frame by using the output parameters of the spectral parameter calculating circuit 200, the spectral parameter quantizing circuit 210 and the smoothing circuit 450. In the modes 1 to 3, the circuit 360 calculates x_w(n) as given by Equation (23), and feeds out the calculated x_w(n) to the response signal calculating circuit 240.
- the weighting signal calculating circuit 360 receives the smoothed LSP parameter obtained in the smoothing circuit 450, and converts this parameter to smoothed linear prediction coefficients.
- the circuit 360 then calculates the response signal x_w(n) as given by Equation (24), and feeds out the response signal x_w(n) to the response signal calculating circuit 240.
- a demultiplexer 500 separates, from the received signal, the index representing the gain codevector, the index representing the delay of the adaptive codebook, the mode discrimination data of the voice signal, the index of the excitation codevector and the index of the spectral parameter, and feeds out the individual separated parameters.
- a gain decoding circuit 510 receives index of gain codevector and mode discrimination data, and reads out and feeds out the gain codevector from a gain codebook 380 on the basis of the received index.
- An adaptive codebook circuit 520 receives the mode discrimination data and the delay of the adaptive codebook, generates an adaptive codevector, multiplies it by the adaptive codebook gain of the gain codevector, and feeds out the resultant product.
- When the mode discrimination data is in the modes 1 to 3, an excitation restoring circuit 540 generates an excitation signal on the basis of the polarity codevector, pulse position data and gain codevector read out from the excitation codebook 351, and feeds out the generated excitation signal to an adder 550.
- the adder 550 generates the drive excitation signal v(n) by using the outputs of the adaptive codebook circuit 520 and the excitation restoring circuit 540, and feeds out the generated v(n) to a synthesizing filter circuit 560.
- a spectral parameter decoding circuit 570 decodes the spectral parameter, executes conversion thereof to linear prediction coefficient, and feeds out the coefficient data thus obtained to a synthesizing filter circuit 560.
- the synthesizing filter circuit 560 receives the drive excitation signal v(n) and linear prediction coefficient, and calculates reproduced signal s(n).
- a post-filtering circuit 600 executes post-filtering for masking quantization noise with respect to the reproduced signal s(n), and feeds out the post-filtered output signal sp(n).
- the post-filter has a transfer characteristic given by Equation (25).
- An inverse post/synthesizing filter circuit 610 constitutes an inverse filter of the post-filter and the synthesizing filter, and calculates a residue signal e(n).
- the inverse filter has a transfer characteristic given by Equation (26).
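Equations (25) and (26) are not shown above. Assuming a conventional pole-zero short-term post-filter P(z) = A(z/γ1)/A(z/γ2) (an assumption; the γ values below are arbitrary), the inverse post/synthesis filtering can be sketched as applying the swapped filter followed by A(z):

```python
import numpy as np
from scipy.signal import lfilter

def bandwidth_expand(a, gamma):
    """Replace A(z) by A(z/gamma): multiply coefficient a_i by gamma**i."""
    return a * gamma ** np.arange(len(a))

def postfilter(s, a, gamma_num=0.5, gamma_den=0.8):
    """Assumed short-term post-filter P(z) = A(z/gamma_num) / A(z/gamma_den)."""
    return lfilter(bandwidth_expand(a, gamma_num), bandwidth_expand(a, gamma_den), s)

def inverse_post_and_synthesis(sp, a, gamma_num=0.5, gamma_den=0.8):
    """Estimate the residue e(n): undo the post-filter (swap numerator and denominator),
    then apply the inverse synthesis filter A(z), where a = [1, a1, ..., a10]."""
    s_hat = lfilter(bandwidth_expand(a, gamma_den), bandwidth_expand(a, gamma_num), sp)
    return lfilter(a, [1.0], s_hat)
```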
- a smoothing circuit 620 executes time-wise smoothing of at least either one of gain of excitation signal in gain codevector, gain of adaptive codebook, RMS of residue signal and spectral parameter.
- the gain of excitation signal, the gain of adaptive codebook and the spectral parameter are smoothed in manners as given by the above Equations (17), (18) and (20), respectively.
- RMS'_e(m) is the RMS of the m-th sub-frame residue signal before the smoothing.
- RMS_e(m) = β·RMS_e(m-1) + (1-β)·RMS'_e(m)
- the smoothing circuit 620 restores the drive excitation signal by using the smoothed parameter or parameters.
- the instant case concerns the restoration of the drive excitation signal by smoothing the RMS of the residue signal, as given by the following Equation (28).
- e'(n) = [e(n) / RMS'_e(m)] · RMS_e(m)
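Taken together, Equations (27) and (28) as read above normalize each sub-frame of the estimated residue by its own RMS and rescale it to the smoothed RMS trajectory before re-synthesis. A small sketch under that reading (β and the function names are illustrative):

```python
import numpy as np
from scipy.signal import lfilter

def smooth_and_rescale(e, rms_smoothed_prev, beta=0.9):
    """Smooth the sub-frame residue RMS and rescale the residue to the smoothed level."""
    rms = float(np.sqrt(np.mean(e ** 2))) + 1e-12                   # RMS'_e(m)
    rms_smoothed = beta * rms_smoothed_prev + (1.0 - beta) * rms    # RMS_e(m)
    return (e / rms) * rms_smoothed, rms_smoothed

def resynthesize(e_new, a):
    """Drive the synthesis filter 1/A(z) with the rescaled residue; post-filtering follows."""
    return lfilter([1.0], a, e_new)
```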
- the synthesizing filter 560 receives drive excitation signal e(n) obtained by using the smoothed parameter or parameters, and calculates reproduced signal s(n). As an alternative, it is possible to use smoothed linear prediction coefficient.
- the post filter 600 receives the pertinent reproduced signal, executes post-filtering thereof to obtain final reproduced signal sp(n), and feeds out this signal.
- Fig. 3 is a block diagram showing a third embodiment.
- parts like those in Fig. 2 are designated by like reference numerals, and are no longer described.
- an inverse post/synthesizing filter circuit 630 and a smoothing circuit 640 receive the mode discrimination data from a demultiplexer 500 and, when the discrimination data indicates a predetermined mode (for instance the mode 0), execute their operations. These operations are the same as those of the inverse post/synthesizing filter circuit 610 and the smoothing circuit 620 in Fig. 2, and are therefore not described again.
- a synthesized signal is locally reproduced by using the data obtained by time-wise smoothing of at least either one of spectral parameter, gain of adaptive codebook, gain of excitation codebook and RMS of excitation signal.
- a residue signal is obtained from a signal obtained after post-filtering in an inverse post-synthesis filtering process
- a voice signal synthesizing process is executed afresh on the basis of a signal obtained as a result of time-wise smoothing of at least either one of the RMS of the residue signal, the spectral parameter of the received signal, the gain of the adaptive codebook, and the gain of the excitation codebook
- a post-filtering process is executed afresh, thereby feeding out a final synthesized signal.
- These processes may thus be added as pure post-processes to the prior art decoding apparatus without any change or modification thereof. It is thus possible to suppress local time-wise parameter variations in the background noise part and provide synthesized voice less subject to sound quality deterioration.
- a parameter smoothing process is executed in a predetermined mode or in the case of presence of the feature quantity in a predetermined range. It is thus possible to execute the process only in a particular time section (for instance a silence time section). Thus, even in the case of coding voice with background noise superimposed thereon at a low bit rate, the background noise part can be satisfactorily coded without adversely affecting the voice time section.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP31953499A JP2001142499A (ja) | 1999-11-10 | 1999-11-10 | 音声符号化装置ならびに音声復号化装置 (Voice coding apparatus and voice decoding apparatus) |
JP31953499 | 1999-11-10 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1100076A2 true EP1100076A2 (de) | 2001-05-16 |
EP1100076A3 EP1100076A3 (de) | 2003-12-10 |
Family
ID=18111328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP00124232A Withdrawn EP1100076A3 (de) | 1999-11-10 | 2000-11-09 | Multimodaler Sprachkodierer mit Glättung des Gewinnfaktors |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1100076A3 (de) |
JP (1) | JP2001142499A (de) |
CA (1) | CA2325322A1 (de) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8457953B2 (en) | 2007-03-05 | 2013-06-04 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and arrangement for smoothing of stationary background noise |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008170488A (ja) * | 2007-01-06 | 2008-07-24 | Yamaha Corp | 波形圧縮装置、波形伸長装置、プログラムおよび圧縮データの生産方法 (Waveform compression device, waveform expansion device, program, and method of producing compressed data) |
DE602008005250D1 (de) | 2008-01-04 | 2011-04-14 | Dolby Sweden Ab | Audiokodierer und -dekodierer |
CN101615910B (zh) * | 2009-05-31 | 2010-12-22 | 华为技术有限公司 (Huawei Technologies Co., Ltd.) | 压缩编码的方法、装置和设备以及压缩解码方法 (Compression coding method, apparatus and device, and compression decoding method) |
US8737602B2 (en) | 2012-10-02 | 2014-05-27 | Nvoq Incorporated | Passive, non-amplified audio splitter for use with computer telephony integration |
- 1999
  - 1999-11-10 JP JP31953499A patent/JP2001142499A/ja active Pending
- 2000
  - 2000-11-09 CA CA002325322A patent/CA2325322A1/en not_active Abandoned
  - 2000-11-09 EP EP00124232A patent/EP1100076A3/de not_active Withdrawn
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5267317A (en) * | 1991-10-18 | 1993-11-30 | At&T Bell Laboratories | Method and apparatus for smoothing pitch-cycle waveforms |
EP0731348A2 (de) * | 1995-03-07 | 1996-09-11 | Advanced Micro Devices, Inc. | System zur Speicherung von und zum Zugriff auf Sprachinformation |
GB2312360A (en) * | 1996-04-12 | 1997-10-22 | Olympus Optical Co | Voice Signal Coding Apparatus |
Non-Patent Citations (2)
Title |
---|
MURASHIMA A ET AL: "A post-processing technique to improve coding quality of celp under background noise" 2000 IEEE WORKSHOP ON SPEECH CODING, 17 September 2000 (2000-09-17), pages 102-104, XP010520055 * |
TANIGUCHI T ET AL: "Enhancement of VSELP Coded Speech under Background Noise" 1995 IEEE WORKSHOP ON SPEECH CODING FOR TELECOMMUNICATIONS, 20 September 1995 (1995-09-20), pages 67-68, XP010269480 * |
Also Published As
Publication number | Publication date |
---|---|
EP1100076A3 (de) | 2003-12-10 |
CA2325322A1 (en) | 2001-05-10 |
JP2001142499A (ja) | 2001-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5826226A (en) | Speech coding apparatus having amplitude information set to correspond with position information | |
EP0802524B1 (de) | Sprachkodierer | |
EP0957472B1 (de) | Vorrichtung zur Sprachkodierung und -dekodierung | |
EP0834863B1 (de) | Sprachkodierer mit niedriger Bitrate | |
US7680669B2 (en) | Sound encoding apparatus and method, and sound decoding apparatus and method | |
EP1005022A1 (de) | Verfahren und Vorrichtung zur Sprachkodierung | |
US6009388A (en) | High quality speech code and coding method | |
CA2205093C (en) | Signal coder | |
CA2336360C (en) | Speech coder | |
EP1100076A2 (de) | Multimodaler Sprachkodierer mit Glättung des Gewinnfaktors | |
EP1154407A2 (de) | Positionsinformationskodierung in einem Multipuls-Anregungs-Sprachkodierer | |
JP3153075B2 (ja) | 音声符号化装置 | |
JP3299099B2 (ja) | 音声符号化装置 | |
JP3471542B2 (ja) | 音声符号化装置 | |
JPH09146599A (ja) | 音声符号化装置 | |
JPH09319399A (ja) | 音声符号化装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
AX | Request for extension of the european patent |
Free format text: AL;LT;LV;MK;RO;SI |
|
PUAL | Search report despatched |
Free format text: ORIGINAL CODE: 0009013 |
|
AK | Designated contracting states |
Kind code of ref document: A3 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
AX | Request for extension of the european patent |
Extension state: AL LT LV MK RO SI |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: 7G 10L 21/02 B Ipc: 7G 10L 19/14 A |
|
17P | Request for examination filed |
Effective date: 20031031 |
|
AKX | Designation fees paid |
Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
17Q | First examination report despatched |
Effective date: 20041215 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20050426 |