EP1100076A2 - Multimodal speech coder with smoothing of the gain factor (Multimodaler Sprachkodierer mit Glättung des Gewinnfaktors) - Google Patents
Multimodal speech coder with smoothing of the gain factor
- Publication number
- EP1100076A2 (application number EP00124232A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- gain
- voice
- excitation
- spectral parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 238000009499 grossing Methods 0.000 title claims abstract description 42
- 230000005284 excitation Effects 0.000 claims abstract description 132
- 230000003595 spectral effect Effects 0.000 claims abstract description 100
- 230000003044 adaptive effect Effects 0.000 claims abstract description 55
- 238000001914 filtration Methods 0.000 claims description 66
- 238000000034 method Methods 0.000 claims description 54
- 238000003786 synthesis reaction Methods 0.000 claims description 32
- 230000008569 process Effects 0.000 claims description 30
- 230000002194 synthesizing effect Effects 0.000 claims description 27
- 230000015572 biosynthetic process Effects 0.000 claims description 23
- 230000001052 transient effect Effects 0.000 claims description 4
- 238000012935 Averaging Methods 0.000 claims description 2
- 239000000284 extract Substances 0.000 abstract 1
- 230000004044 response Effects 0.000 description 16
- 238000013139 quantization Methods 0.000 description 10
- 230000006866 deterioration Effects 0.000 description 8
- 238000010586 diagram Methods 0.000 description 5
- 230000009467 reduction Effects 0.000 description 4
- 238000006243 chemical reaction Methods 0.000 description 2
- 230000001186 cumulative effect Effects 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 230000002411 adverse Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013144 data compression Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 230000000873 masking effect Effects 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
Images
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/083—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being an excitation gain
Definitions
- the present invention relates to a voice coding apparatus and a voice decoding apparatus, and methods thereof, for satisfactorily coding a background noise signal superimposed on a voice signal even at a low bit rate.
- CELP (Code-Excited Linear Prediction) coding
- M. Schroeder and B. Atal, "Code-excited linear prediction: High quality speech at very low bit rates", Proc. ICASSP, pp. 937-940, 1985 (Literature 1), and Kleijn et al., "Improved speech quality and efficient vector quantization in SELP", Proc. ICASSP, pp. 155-158, 1988 (Literature 2).
- a spectral parameter representing a spectral characteristic of voice signal is extracted from the voice signal for each frame (of 20 msec., for instance) by executing linear prediction (LPC) analysis.
- the frame is divided into sub-frames (of 5 msec., for instance), and pitch prediction of voice signal in each sub-frame is executed by using an adaptive codebook.
- the pitch prediction is executed by extracting parameters in the adaptive codebook (i.e., delay parameter corresponding to the pitch cycle and gain parameter) for each sub-frame on the basis of past excitation signal.
- An excitation signal is obtained as a result of the pitch prediction, and it is quantized by selecting an optimum excitation codevector from an excitation codebook (or vector quantization codebook), which is constituted by noise signals of predetermined kinds, and calculating an optimum gain.
- the selection of the excitation codevector is executed such as to minimize the error power level between a signal synthesized from a selected noise signal and the residue signal.
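As a rough, illustrative sketch of this analysis-by-synthesis selection (not the patent's exact procedure; the helper name, codebook size and weighted impulse response below are placeholders), each candidate codevector can be synthesized through the weighted filter, given its optimum gain, and the candidate with the smallest error power retained:

```python
import numpy as np

def select_excitation(target, codebook, h_w):
    """Pick the excitation codevector and gain minimizing the weighted error power."""
    best = (None, 0.0, np.inf)          # (index, gain, error power)
    for j, c in enumerate(codebook):
        y = np.convolve(c, h_w)[: len(target)]             # synthesize the candidate
        g = np.dot(target, y) / (np.dot(y, y) + 1e-12)     # optimum gain for this candidate
        err = np.sum((target - g * y) ** 2)                # weighted error power
        if err < best[2]:
            best = (j, g, err)
    return best

# toy usage with random placeholder data (40-sample sub-frame, 64 noise codevectors)
rng = np.random.default_rng(0)
target = rng.standard_normal(40)
codebook = rng.standard_normal((64, 40))
h_w = 0.7 ** np.arange(20)              # dummy weighted impulse response
print(select_excitation(target, codebook, h_w)[:2])
```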
- a multiplexer combines an index representing the kind of the selected codevector with the gain, the spectral parameter and the adaptive codebook parameters, and transmits the resultant signal. A description of the receiving side is omitted here.
- An object of the present invention is to solve the above problems and provide a voice coding and a voice decoding apparatus which are less subject to sound quality deterioration of the background noise, with relatively little computational effort, even at low bit rates.
- a voice coding apparatus including a spectral parameter calculating part for obtaining a spectral parameter for each predetermined frame of an input voice signal and quantizing the obtained spectral parameter, an adaptive codebook part for dividing the frame into a plurality of sub-frames, obtaining a delay and a gain from a past quantized excitation signal for each of the sub-frames by using an adaptive codebook and obtaining a residue by predicting the voice signal, an excitation quantizing part for quantizing the excitation signal of the voice signal by using the spectral parameter, and a gain quantizing part for quantizing the gain of the adaptive codebook and the gain of the excitation signal, comprising: a mode discriminating part for extracting a predetermined feature quantity from the voice signal and judging the pertinent mode to be either one of the plurality of predetermined modes on the basis of the extracted feature quantity; a smoothing part for executing time-wise smoothing of at least either one of the gain of the excitation signal, the gain of the adaptive codebook, the spectral parameter and the level of the excitation signal; and a part for locally reproducing a synthesized voice signal by using the smoothed signal or signals.
- the mode discriminating part executes mode discriminating for each frame.
- the feature quantity is pitch prediction gain.
- the mode discriminating part averages the pitch prediction gains each obtained for each sub-frame over the full frame and classifies the frame into one of a plurality of predetermined modes by comparing the average value with a plurality of predetermined threshold values.
- the plurality of predetermined modes substantially correspond to a silence, a transient, a weak voice and a strong voice time section, respectively.
- a voice decoding apparatus including a multiplexer part for separating spectral parameter, pitch, gain and excitation signal as voice data from a voice signal, an excitation signal restoring part for restoring an excitation signal from the separated pitch, excitation signal and gain, a synthesizing filter part for synthesizing a voice signal on the basis of the restored excitation signal and the spectral parameter, and a post-filter part for post-filtering the synthesized voice signal by using the spectral parameter, comprising: an inverse filter part for estimating an excitation signal through an inverse post-filtering and inverse synthesis filtering on the basis of the output signal of the post-filter part and the spectral parameter, and a smoothing part for executing time-wise smoothing of at least either one of the level of the estimated excitation signal, the gain and the spectral parameter, the smoothed signal or signals being fed to the synthesis filter part, the synthesized signal output thereof being fed to the post-filter part to synthesize a voice signal.
- a voice decoding apparatus including a multiplexer part for separating mode discrimination data, spectral parameter, pitch, gain and excitation signal on the basis of a feature quantity of a voice signal to be decoded, an excitation signal restoring part for restoring an excitation signal from the separated pitch, excitation signal and gain, a synthesis filter part for synthesizing the voice signal by using the restored excitation signal and the spectral parameter, and a post-filter part for post-filtering the synthesized voice signal by using the spectral parameter, comprising: an inverse filter part for estimating the voice signal on the basis of the output signal of the post-filter part and the spectral parameter through an inverse post-filtering and inverse synthesis filtering, a smoothing part for executing time-wise smoothing of at least either one of the level of the estimated excitation signal, the gain and the spectral parameter, the smoothed signal being fed to the synthesis filter part, the synthesis signal output thereof being fed to the post-filter part.
- the mode discriminating part executes mode discriminating for each frame.
- the feature quantity is the pitch prediction gain.
- the mode discrimination is executed by averaging the pitch prediction gains each obtained for each sub-frame over the full frame and comparing the average value thus obtained with a plurality of predetermined threshold values.
- the plurality of predetermined modes substantially correspond to a silence, a transient, a weak voice and a strong voice time section, respectively.
- a voice decoding apparatus for locally reproducing a synthesized voice signal on the basis of a signal obtained through time-wise smoothing of at least either one of spectral parameter of the voice signal, gain of an adaptive codebook, gain of an excitation codebook and RMS of an excitation signal.
- a voice decoding apparatus for obtaining a residue signal from a signal obtained after post-filtering through an inverse post-synthesis filtering process, executing a voice signal synthesizing process afresh on the basis of a signal obtained through time-wise smoothing of at least one of RMS of residue signal, spectral parameter of received signal, gain of adaptive codebook and gain of excitation codebook and executing a post-filtering process afresh, thereby feeding out a final synthesized signal.
- a voice decoding apparatus for obtaining a residue signal from a signal obtained after post-filtering through an inverse post-synthesis filtering process, and in a mode determined on the basis of a feature quantity of a voice signal to be decoded or in the case of presence of the feature quantity in a predetermined range, executing a voice signal synthesizing process afresh on the basis of a signal obtained through time-wise smoothing of at least either one of RMS of the residue signal, spectral parameter of a received signal, gain of an adaptive codebook and gain of an excitation codebook, and executing a post-filtering process afresh, thereby feeding out a final synthesized signal.
- a voice coding method including a step for obtaining a spectral parameter for each predetermined frame of an input voice signal and quantizing the obtained spectral parameter, a step for dividing the frame into a plurality of sub-frames, obtaining a delay and a gain from a past quantized excitation signal for each of the sub-frames by using an adaptive codebook and obtaining a residue by predicting the voice signal, a step for quantizing the excitation signal of the voice signal by using the spectral parameter, and a step for quantizing the gain of the adaptive codebook and the gain of the excitation signal, further comprising steps of: extracting a predetermined feature quantity from the voice signal and judging the pertinent mode to be either one of the plurality of predetermined modes on the basis of the extracted feature quantity; executing time-wise smoothing of at least either one of the gain of the excitation signal, the gain of the adaptive codebook, the spectral parameter and the level of the excitation signal; and locally reproducing a synthesized voice signal by using the smoothed value or values.
- a voice decoding method including a step for separating spectral parameter, pitch, gain and excitation signal as voice data from a voice signal, a step for restoring an excitation signal from the separated pitch, excitation signal and gain, a step for synthesizing a voice signal on the basis of the restored excitation signal and the spectral parameter, and a step for post-filtering the synthesized voice signal by using the spectral parameter, further comprising steps of: estimating an excitation signal through an inverse post-filtering and inverse synthesis filtering on the basis of the post-filtered signal and the spectral parameter; and executing time-wise smoothing of at least either one of the level of the estimated excitation signal, the gain and the spectral parameter, the smoothed signal or signals being fed to the synthesis filtering, the synthesized signal output thereof being fed to the post-filtering to synthesize a voice signal.
- a voice decoding method including a step for separating a mode discrimination data, spectral parameter, pitch, gain and excitation signal on the basis of a feature quantity of a voice signal to be decoded, a step for restoring an excitation signal from the separated pitch, excitation signal and gain, a step for synthesizing the voice signal by using the restored excitation signal and the spectral parameter, and a step for post-filtering the synthesized voice signal by using the spectral parameter, comprising steps of: estimating the voice signal on the basis of the post-filtered signal and the spectral parameter through an inverse post-filtering and inverse synthesis filtering; and executing time-wise smoothing of at least either one of the level of the estimated excitation signal, the gain and the spectral parameter; the smoothed signal being fed to the synthesis filtering, the synthesis signal output thereof being fed to the post-filtering.
- a voice decoding method for locally reproducing a synthesized voice signal on the basis of a signal obtained through time-wise smoothing of at least either one of spectral parameter of the voice signal, gain of an adaptive codebook, gain of an excitation codebook and RMS of an excitation signal.
- a voice decoding method for obtaining a residue signal from a signal obtained after post-filtering through an inverse post-synthesis filtering process, executing a voice signal synthesizing process afresh on the basis of a signal obtained through time-wise smoothing of at least one of RMS of residue signal, spectral parameter of received signal, gain of adaptive codebook and gain of excitation codebook and executing a post-filtering process afresh, thereby feeding out a final synthesized signal.
- a voice decoding method for obtaining a residue signal from a signal obtained after post-filtering through an inverse post-synthesis filtering process, and in a mode determined on the basis of a feature quantity of a voice signal to be decoded or in the case of presence of the feature quantity in a predetermined range, executing a voice signal synthesizing process afresh on the basis of a signal obtained through time-wise smoothing of at least either one of RMS of the residue signal, spectral parameter of a received signal, gain of an adaptive codebook and gain of an excitation codebook, and executing a post-filtering process afresh, thereby feeding out a final synthesized signal.
- Fig. 1 is a block diagram showing a first embodiment of the voice coding apparatus according to the present invention.
- a frame circuit 110 divides a voice signal inputted from an input terminal 100 into frames (of 20 msec., for instance).
- a sub-frame divider circuit 120 divides each voice signal frame into sub-frames (of 5 msec. for instance) shorter than the frame.
- the spectral parameter may be calculated by using well-known methods such as LPC analysis or the Burg analysis. In this description, the use of the Burg analysis is assumed. The Burg analysis is detailed in Nakamizo, "Signal analysis and system identification", Corona Co., Ltd., pp. 82-87, 1988 (Literature 4), and is not described here.
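For orientation only, the following is a compact, textbook-style sketch of the Burg recursion for estimating linear prediction coefficients from one frame; it is not taken from Literature 4 or from the patent, and the function name and frame length are illustrative.

```python
import numpy as np

def burg_lpc(x, order):
    """Burg's method: returns A(z) = [1, a1, ..., a_order] (convention A(z) = 1 + sum a_i z^-i)."""
    f = np.asarray(x, dtype=float).copy()      # forward prediction errors
    b = f.copy()                               # backward prediction errors
    a = np.array([1.0])
    for _ in range(order):
        fm, bm = f[1:], b[:-1]
        k = -2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm) + 1e-12)  # reflection coeff.
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]                    # Levinson-type order update
        f, b = fm + k * bm, bm + k * fm        # update the error sequences
    return a

# toy usage: 10-th order analysis of one 20 ms frame at 8 kHz (160 samples)
rng = np.random.default_rng(1)
print(burg_lpc(rng.standard_normal(160), 10)[:3])
```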
- the circuit 200 converts the linear prediction coefficients α_i (i being 1 to 10), calculated by the Burg analysis, to LSP parameters suited for quantization and interpolation.
- the circuit 200 converts the linear prediction coefficients obtained in the 2-nd and 4-th sub-frames by the Burg method to LSP parameter data, obtains LSP parameter data in the 1-st and 3-rd sub-frames by interpolation, inversely converts the 1-st and 3-rd sub-frame LSP parameter data to restore linear prediction coefficients, and thus feeds out the 1-st to 4-th sub-frame linear prediction coefficients α_il (i being 1 to 10, l being 1 to 5) to an acoustic weighting circuit 230.
- the circuit 200 further feeds out the 4-th sub-frame LSP parameter data to a spectral parameter quantizing circuit 210.
- a spectral parameter quantizing circuit 210 efficiently quantizes the LSP parameters in a predetermined sub-frame, and feeds out a quantized LSP value minimizing the distortion given as D_j = Σ_i W(i)·[LSP(i) - QLSP(i)_j]², where LSP(i), QLSP(i)_j and W(i) are the i-th LSP before the quantization, the j-th result obtained after the quantization and the weighting coefficient, respectively.
- An LSP codebook 211 is referred to by the spectral parameter quantizing circuit 210.
- the LSP parameter may be vector quantized by a well-known method. Specific examples of the method are described in Japanese Patent Laid-Open No. 4-171500 (Japanese Patent Application No. 2-297600) (Literature 6), Japanese Patent Laid-Open No. 4-363000 (Japanese patent Application No. 3-261925) (Literature 7), Japanese Patent Laid-Open No. 5-6199 (Japanese Patent Application 3-155049) (Literature 8) and T.
- the spectral parameter quantizing circuit 210 restores the 1-st to 4-th LSP parameters from the quantized LSP parameter data obtained in the 4-th sub-frame. Specifically, the circuit 210 restores the 1-st to 3-rd sub-frame LSP parameters by executing linear interpolation from the 4-th sub-frame quantized LSP parameter data in the prevailing frame and immediately preceding frames. The 1-st to 4-th sub-frame LSP parameters can be restored by linear interpolation after selecting one kind of codevector corresponding to minimum error power level between the LSP parameter data before and after the quantization.
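The two operations above, the weighted-distortion codebook search and the linear interpolation of the quantized frame-end LSPs across sub-frames, might be sketched as follows; the weighting vector, codebook contents and function names are illustrative placeholders, not the methods of Literatures 6 to 8.

```python
import numpy as np

def quantize_lsp(lsp, codebook, w):
    """Return the index j minimizing sum_i w[i] * (lsp[i] - codebook[j, i])**2."""
    d = np.sum(w * (codebook - lsp) ** 2, axis=1)    # weighted distortion per codevector
    return int(np.argmin(d))

def interpolate_lsp(prev_frame_lsp, curr_frame_lsp, n_sub=4):
    """Linearly interpolate between the previous and current frame-end quantized LSPs
    to obtain one LSP vector per sub-frame (the last row equals curr_frame_lsp)."""
    r = (np.arange(1, n_sub + 1) / n_sub)[:, None]
    return (1.0 - r) * prev_frame_lsp + r * curr_frame_lsp

# toy usage: 10-dimensional LSPs, 128-entry codebook, flat weights
rng = np.random.default_rng(2)
codebook = np.sort(rng.uniform(0.0, np.pi, (128, 10)), axis=1)
lsp = np.sort(rng.uniform(0.0, np.pi, 10))
j = quantize_lsp(lsp, codebook, np.ones(10))
per_subframe = interpolate_lsp(codebook[0], codebook[j])
```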
- the spectral parameter quantizing circuit 210 converts the thus restored 1-st to 3-rd sub-frame LSP parameters and the quantized 4-th sub-frame LSP parameter data to the linear prediction coefficients α_il (i being 1 to 10, l being 1 to 5) for each sub-frame, and feeds out the coefficient data thus obtained to an impulse response calculating circuit 310.
- the circuit 210 further feeds out an index representing the codevector of the quantized 4-th sub-frame LSP parameter to a multiplexer 400.
- An acoustic weighting circuit 230 receives the linear prediction coefficients α_il (i being 1 to 10, l being 1 to 5) for each sub-frame, executes acoustic weighting of the sub-frame voice signal in the manner described in Literature 1 noted above, and feeds out an acoustically weighted signal.
- the subtracter 235 subtracts the response signal x_z(n) from the acoustically weighted signal for one sub-frame as shown by the equation below, and feeds out x'_w(n) to the adaptive codebook circuit 470.
- x'_w(n) = x_w(n) - x_z(n)
- the impulse response calculating circuit 310 calculates a predetermined number of samples of the impulse response h_w(n) of the acoustic weighting filter, whose z-transform is expressed by the following equation, and feeds out the calculated data to the adaptive codebook circuit 470 and an excitation quantizing circuit 350.
- a mode discriminating circuit 300 executes mode discrimination for each frame by extracting a feature quantity from the frame circuit output signal.
- the pitch prediction gain may be used as the feature quantity.
- the circuit 800 averages the pitch prediction gains obtained in the individual sub-frames over the full frame, and executes classification into a plurality of predetermined modes by comparing the average value with a plurality of predetermined threshold values.
- modes 0 to 3 are set substantially for a silence, a transient, a weak voice and a strong voice time sections, respectively.
- the circuit 800 feeds out mode discrimination data thus obtained to the excitation quantizing circuit 350, a gain quantizing circuit 365 and the multiplexer 400.
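A hedged illustration of this classification is given below; the sub-frame length, delay range and threshold values are arbitrary placeholders (the patent does not disclose specific thresholds), and the helper names are hypothetical.

```python
import numpy as np

def pitch_gain_db(sig, start, length, delay):
    """Prediction gain (dB) of a one-tap pitch predictor s(n) ~ g*s(n-delay) on one sub-frame."""
    s = sig[start:start + length]
    p = sig[start - delay:start - delay + length]
    g = np.dot(s, p) / (np.dot(p, p) + 1e-12)
    resid = s - g * p
    return 10.0 * np.log10(np.dot(s, s) / (np.dot(resid, resid) + 1e-12) + 1e-12)

def discriminate_mode(sig, frame_start, sub_len=40, n_sub=4,
                      delays=range(20, 147), thresholds=(1.0, 4.0, 7.0)):
    """Average the best per-sub-frame pitch prediction gains over the frame and
    threshold the average into mode 0 (silence) ... mode 3 (strong voice)."""
    gains = []
    for k in range(n_sub):
        start = frame_start + k * sub_len
        gains.append(max(pitch_gain_db(sig, start, sub_len, d) for d in delays))
    avg = float(np.mean(gains))
    return int(np.searchsorted(thresholds, avg))

# toy usage: low-level noise-like input (frame_start must exceed the largest delay)
rng = np.random.default_rng(3)
print(discriminate_mode(0.01 * rng.standard_normal(1000), frame_start=200))
```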
- the adaptive codebook circuit 470 receives the past excitation signal v(n) from the gain quantizing circuit 370, the output signal x'_w(n) from the subtracter 235 and the acoustically weighted impulse response h_w(n) from the impulse response calculating circuit 310, then obtains a delay T corresponding to the pitch such as to minimize the distortion given by the following equation, and feeds out an index representing the delay to the multiplexer 400.
- y_w(n - T) = v(n - T) * h_w(n)
- In Equation (8), the symbol * represents convolution.
- the adaptive codebook circuit 470 then obtains the gain β given as β = Σ_n x'_w(n)·y_w(n - T) / Σ_n y_w(n - T)².
- the delay may be obtained as decimal sample value instead of integer sample value.
- P. Kroon et al "Pitch predictors with high temporal resolution", Proc. ICASSP, pp. 661-664, 1990 (Literature 11).
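A simplified integer-delay version of this adaptive codebook search is sketched below (fractional delays per Literature 11 are omitted; the delay range and helper name are illustrative):

```python
import numpy as np

def adaptive_codebook_search(x_w, v_past, h_w, delays=range(20, 147)):
    """Find the delay T and gain beta minimizing || x_w - beta * (v(n-T) conv h_w) ||^2.
    v_past is the past excitation buffer (must hold at least max(delays) samples)."""
    L = len(x_w)
    best = (None, 0.0, np.inf)                         # (T, beta, error power)
    for T in delays:
        seg = v_past[len(v_past) - T: len(v_past) - T + L]
        if len(seg) < L:                               # T shorter than the sub-frame: repeat it
            seg = np.tile(seg, int(np.ceil(L / len(seg))))[:L]
        y = np.convolve(seg, h_w)[:L]                  # weighted synthesis of delayed excitation
        beta = np.dot(x_w, y) / (np.dot(y, y) + 1e-12)
        err = np.sum((x_w - beta * y) ** 2)
        if err < best[2]:
            best = (T, beta, err)
    return best
```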
- the adaptive codebook circuit 470 further executes pitch prediction as in the following Equation (10), and feeds out the prediction residue signal e_w(n) to the excitation quantizing circuit 355.
- e_w(n) = x'_w(n) - β·v(n - T) * h_w(n)
- the excitation quantizing circuit 355 receives mode discrimination data, and switches the excitation signal quantizing methods on the basis of the discriminated mode.
- It is assumed that M pulses are provided in the modes 1 to 3, and that an amplitude or polarity codebook of B bits is provided for collective quantization of the amplitudes of the M pulses. The following description assumes the case of using a polarity codebook.
- the polarity codebook is stored in an excitation codebook 351.
- the excitation quantizing circuit 355 reads out individual polarity codevectors stored in the excitation codebook 351, allots a pulse position to each read-out codevector, and selects a plurality of sets of codevector and pulse position, which minimize the following equation (11).
- h_w(n) is the acoustically weighted impulse response.
- the Equation (11) may be minimized by selecting a set of polarity codevector g ik and pulse position m i , which maximizes the following equation (12).
- the positions which can be allotted for the individual pulses in the modes 1 to 3, can be restricted as shown in Literature 3.
- the excitation quantizing circuit 355 feeds out the selected plurality of sets of polarity codevector and position to the gain quantizing circuit 370.
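Equations (11) and (12) are not reproduced above; as a stand-in, the following greedy sketch places pulses one at a time by maximizing the usual squared-correlation-over-energy criterion against the remaining target. The position grid, pulse count and helper name are illustrative, and the joint search over polarity codevectors is simplified to per-pulse signs.

```python
import numpy as np

def greedy_pulse_search(x_w, h_w, n_pulses, allowed_positions, L):
    """Place n_pulses pulses; each maximizes d(m)^2 / phi(m, m), where d is the target
    correlated with the shifted weighted impulse response and phi its energy."""
    hw = np.zeros(L)
    hw[: min(L, len(h_w))] = h_w[:L]
    H = np.zeros((L, L))
    for m in range(L):                       # column m = h_w delayed by m samples
        H[m:, m] = hw[: L - m]
    target = x_w.astype(float).copy()
    energies = np.sum(H ** 2, axis=0)
    pulses = []
    for _ in range(n_pulses):
        d = H.T @ target
        crit = d[allowed_positions] ** 2 / (energies[allowed_positions] + 1e-12)
        m = int(allowed_positions[int(np.argmax(crit))])
        g = d[m] / (energies[m] + 1e-12)     # amplitude; its sign carries the polarity
        pulses.append((m, g))
        target -= g * H[:, m]                # remove this pulse's contribution
    return pulses

# toy usage: 4 pulses on a 40-sample sub-frame, every position allowed
rng = np.random.default_rng(4)
pulses = greedy_pulse_search(rng.standard_normal(40), 0.6 ** np.arange(20),
                             n_pulses=4, allowed_positions=np.arange(40), L=40)
```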
- In a predetermined mode (i.e., the mode 0 in this case), a plurality of extents of shifting the pulse positions of all the pulses are predetermined by determining the pulse positions at a predetermined interval, as shown in Table 2.
- the shift extents are transmitted by quantizing them in two bits.
- the gain quantizing circuit 370 receives the mode discrimination data from the mode discriminating circuit 300. In the modes 1 to 3, the circuit 370 receives a plurality of sets of polarity codevector and pulse position, and in the mode 0 it receives the set of pulse position and corresponding polarity for each shift extent.
- the gain quantizing circuit 370 reads out gain codevectors from the gain codebook 380. In the modes 1 to 3, the circuit 370 executes gain codevector retrieval for the plurality of selected sets of polarity codevector and pulse position such as to minimize the following Equation (15), and selects the one set of gain and polarity codevectors which minimizes the distortion.
- both excitation gains, i.e., the gain of the adaptive codebook and the gain of the pulse excitation, are simultaneously vector quantized.
- the gain quantizing circuit 370 feeds the index representing selected polarity codevector, the code representing pulse position and the index representing gain codevector to the multiplexer 400.
- the gain quantizing circuit 370 receives a plurality of shift extents and polarity corresponding to each pulse position in each shift extent case, executes gain codevector retrieval, and selects one set of gain codevector and shift extent, which minimizes the following Equation (16).
- β'_k and G'_k constitute the k-th codevector in a two-dimensional gain codebook stored in the gain codebook 380
- ⁇ (j) is the j-th shift extent
- g' k is the selected gain codevector.
- the circuit 370 feeds out index representing the selected gain codevector and code representing the shift extent to the multiplexer 400.
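Equations (15) and (16) are likewise not reproduced; generically, the retrieval amounts to scanning a two-dimensional gain codebook for the pair (adaptive codebook gain, excitation gain) that minimizes the weighted error, as in this hypothetical sketch:

```python
import numpy as np

def search_gain_codebook(x_w, y_adaptive, y_excitation, gain_codebook):
    """Return the index k of the (beta_k, G_k) pair minimizing || x_w - beta*y_a - G*y_e ||^2."""
    best_k, best_err = None, np.inf
    for k, (beta, G) in enumerate(gain_codebook):
        err = np.sum((x_w - beta * y_adaptive - G * y_excitation) ** 2)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```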
- a smoothing circuit 450 receives the mode data and, when the received mode data is in a predetermined mode (for instance the mode 0), executes time-wise smoothing of at least either one of gain of excitation signal in gain codevector, gain of adaptive codebook, RMS of excitation signal and spectral parameter.
- G_t(m) = β·G_t(m-1) + (1-β)·G'_t(m), where m is the sub-frame number and G'_t(m) is the gain before the smoothing.
- β_t(m) = β·β_t(m-1) + (1-β)·β'_t(m)
- RMS(m) = β·RMS(m-1) + (1-β)·RMS'(m)
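Read literally, Equations (17) to (20) are first-order recursive averages applied per sub-frame in the selected mode. A minimal sketch follows, with the smoothing constant beta and the class name chosen arbitrarily here:

```python
class ParameterSmoother:
    """First-order recursive (leaky) smoothing of codec parameters, e.g. applied only in mode 0."""

    def __init__(self, beta=0.9):
        self.beta = beta
        self.state = {}

    def smooth(self, name, value):
        prev = self.state.get(name, value)            # first observation initializes the state
        out = self.beta * prev + (1.0 - self.beta) * value
        self.state[name] = out
        return out

# per sub-frame, in the smoothing mode only:
smoother = ParameterSmoother(beta=0.9)
g_exc = smoother.smooth("excitation_gain", 0.8)
g_acb = smoother.smooth("adaptive_codebook_gain", 0.5)
rms = smoother.smooth("excitation_rms", 120.0)
```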
- a weighting signal calculating circuit 360 receives the mode discrimination data and the smoothed signal output of the smoothing circuit and, in the cases of the modes 1 to 3, obtains the drive excitation signal v(n) as in the above Equation (21).
- the weighting signal calculating circuit 360 feeds out v(n) to the adaptive codebook circuit 470.
- the weighting signal calculating circuit 360 obtains drive excitation signal v(n) in the manner as given by Equation (22).
- the weighting signal calculating circuit 360 feeds out v(n) to the adaptive codebook circuit 470.
- the weighting signal calculating circuit 360 calculates the response signal x_w(n) for each sub-frame by using the output parameters of the spectral parameter calculating circuit 200, the spectral parameter quantizing circuit 210 and the smoothing circuit 450. In the modes 1 to 3, the circuit 360 calculates x_w(n) as given by Equation (23), and feeds out the calculated x_w(n) to the response signal calculating circuit 240.
- the weighting signal calculating circuit 360 receives the smoothed LSP parameter obtained in the smoothing circuit 450, and converts this parameter to smoothed linear prediction coefficients.
- the circuit 360 then calculates the response signal x_w(n) as given by Equation (24), and feeds out the response signal x_w(n) to the response signal calculating circuit 240.
- a demultiplexer 500 separates, from the received signal, the index representing the gain codevector, the index representing the delay of the adaptive codebook, the mode discrimination data of the voice signal, the index of the excitation codevector and the index of the spectral parameter, and feeds out the individual separated parameters.
- a gain decoding circuit 510 receives index of gain codevector and mode discrimination data, and reads out and feeds out the gain codevector from a gain codebook 380 on the basis of the received index.
- An adaptive codebook circuit 520 receives the mode discrimination data and the delay of the adaptive codebook, generates an adaptive codevector, multiplies it by the adaptive codebook gain of the gain codevector, and feeds out the resultant product.
- When the mode discrimination data is in the modes 1 to 3, an excitation restoring circuit 540 generates an excitation signal on the basis of the polarity codevector, pulse position data and gain codevector read out from the excitation codebook 351, and feeds out the generated excitation signal to an adder 550.
- the adder 550 generates the drive excitation signal v(n) by using the outputs of the adaptive codebook circuit 520 and the excitation restoring circuit 540, and feeds out the generated v(n) to a synthesizing filter circuit 560.
- a spectral parameter decoding circuit 570 decodes the spectral parameter, executes conversion thereof to linear prediction coefficient, and feeds out the coefficient data thus obtained to a synthesizing filter circuit 560.
- the synthesizing filter circuit 560 receives the drive excitation signal v(n) and linear prediction coefficient, and calculates reproduced signal s(n).
- a post-filtering circuit 600 executes post-filtering for masking quantization noise with respect to the reproduced signal s(n), and feeds out the post-filtered output signal sp(n).
- the post-filter has a transfer characteristic given by Equation (25).
- An inverse post/synthesizing filter circuit 610 constitutes an inverse filter of the post-filter and the synthesizing filter, and calculates a residue signal e(n).
- the inverse filter has a transfer characteristic given by Equation (26).
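Equations (25) and (26) are not shown above. Assuming a conventional pole-zero short-term post-filter P(z) = A(z/γ1)/A(z/γ2) (an assumption; the γ values below are arbitrary), the inverse post/synthesis filtering can be sketched as applying the swapped filter followed by A(z):

```python
import numpy as np
from scipy.signal import lfilter

def bandwidth_expand(a, gamma):
    """Replace A(z) by A(z/gamma): multiply coefficient a_i by gamma**i."""
    return a * gamma ** np.arange(len(a))

def postfilter(s, a, gamma_num=0.5, gamma_den=0.8):
    """Assumed short-term post-filter P(z) = A(z/gamma_num) / A(z/gamma_den)."""
    return lfilter(bandwidth_expand(a, gamma_num), bandwidth_expand(a, gamma_den), s)

def inverse_post_and_synthesis(sp, a, gamma_num=0.5, gamma_den=0.8):
    """Estimate the residue e(n): undo the post-filter (swap numerator and denominator),
    then apply the inverse synthesis filter A(z), where a = [1, a1, ..., a10]."""
    s_hat = lfilter(bandwidth_expand(a, gamma_den), bandwidth_expand(a, gamma_num), sp)
    return lfilter(a, [1.0], s_hat)
```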
- a smoothing circuit 620 executes time-wise smoothing of at least either one of gain of excitation signal in gain codevector, gain of adaptive codebook, RMS of residue signal and spectral parameter.
- the gain of excitation signal, the gain of adaptive codebook and the spectral parameter are smoothed in manners as given by the above Equations (17), (18) and (20), respectively.
- RMS'_e(m) is the RMS of the m-th sub-frame residue signal before the smoothing.
- RMS_e(m) = β·RMS_e(m-1) + (1-β)·RMS'_e(m)
- the smoothing circuit 620 restores the drive excitation signal by using the smoothed parameter or parameters.
- the instant case concerns the restoration of the drive excitation signal by smoothing the RMS of the residue signal, as given by the following Equation (28).
- e'(n) = [e(n) / RMS'_e(m)] · RMS_e(m)
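Taken together, Equations (27) and (28) as read above normalize each sub-frame of the estimated residue by its own RMS and rescale it to the smoothed RMS trajectory before re-synthesis. A small sketch under that reading (β and the function names are illustrative):

```python
import numpy as np
from scipy.signal import lfilter

def smooth_and_rescale(e, rms_smoothed_prev, beta=0.9):
    """Smooth the sub-frame residue RMS and rescale the residue to the smoothed level."""
    rms = float(np.sqrt(np.mean(e ** 2))) + 1e-12                   # RMS'_e(m)
    rms_smoothed = beta * rms_smoothed_prev + (1.0 - beta) * rms    # RMS_e(m)
    return (e / rms) * rms_smoothed, rms_smoothed

def resynthesize(e_new, a):
    """Drive the synthesis filter 1/A(z) with the rescaled residue; post-filtering follows."""
    return lfilter([1.0], a, e_new)
```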
- the synthesizing filter 560 receives drive excitation signal e(n) obtained by using the smoothed parameter or parameters, and calculates reproduced signal s(n). As an alternative, it is possible to use smoothed linear prediction coefficient.
- the post filter 600 receives the pertinent reproduced signal, executes post-filtering thereof to obtain final reproduced signal sp(n), and feeds out this signal.
- Fig. 3 is a block diagram showing a third embodiment.
- parts like those in Fig. 2 are designated by like reference numerals, and are no longer described.
- an inverse post/synthesizing filter circuit 630 and a smoothing circuit 640 receive the mode discrimination data from a demultiplexer 500 and, when the discrimination data indicates a predetermined mode (for instance the mode 0), execute their operations. These operations are the same as those of the inverse post/synthesizing filter circuit 610 and the smoothing circuit 620 in Fig. 2, and are therefore not described again.
- a synthesized signal is locally reproduced by using the data obtained by time-wise smoothing of at least either one of spectral parameter, gain of adaptive codebook, gain of excitation codebook and RMS of excitation signal.
- a residue signal is obtained from a signal obtained after post-filtering in an inverse post-synthesis filtering process
- a voice signal synthesizing process is executed afresh on the basis of a signal obtained as a result of time-wise smoothing of at least either one of the RMS of the residue signal, the spectral parameter of the received signal, the gain of the adaptive codebook, and the gain of the excitation codebook
- a post-filtering process is executed afresh, thereby feeding out a final synthesized signal.
- These processes may thus be added as pure post-processes to the prior art decoding apparatus without any change or modification thereof. It is thus possible to suppress local time-wise parameter variations in the background noise part and provide synthesized voice less subject to sound quality deterioration.
- a parameter smoothing process is executed in a predetermined mode or in the case of presence of the feature quantity in a predetermined range. It is thus possible to execute the process only in a particular time section (for instance a silence time section). Thus, even in the case of coding voice with background noise superimposed thereon at a low bit rate, the background noise part can be satisfactorily coded without adversely affecting the voice time section.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP31953499A JP2001142499A (ja) | 1999-11-10 | 1999-11-10 | 音声符号化装置ならびに音声復号化装置 (Voice coding apparatus and voice decoding apparatus) |
JP31953499 | 1999-11-10 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1100076A2 true EP1100076A2 (de) | 2001-05-16 |
EP1100076A3 EP1100076A3 (de) | 2003-12-10 |
Family
ID=18111328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP00124232A Withdrawn EP1100076A3 (de) | 1999-11-10 | 2000-11-09 | Multimodaler Sprachkodierer mit Glättung des Gewinnfaktors |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1100076A3 (de) |
JP (1) | JP2001142499A (de) |
CA (1) | CA2325322A1 (de) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8457953B2 (en) | 2007-03-05 | 2013-06-04 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and arrangement for smoothing of stationary background noise |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008170488A (ja) * | 2007-01-06 | 2008-07-24 | Yamaha Corp | 波形圧縮装置、波形伸長装置、プログラムおよび圧縮データの生産方法 (Waveform compression device, waveform expansion device, program, and method of producing compressed data) |
DE602008005250D1 (de) | 2008-01-04 | 2011-04-14 | Dolby Sweden Ab | Audiokodierer und -dekodierer |
CN101615910B (zh) * | 2009-05-31 | 2010-12-22 | 华为技术有限公司 (Huawei Technologies Co., Ltd.) | 压缩编码的方法、装置和设备以及压缩解码方法 (Compression coding method, apparatus and device, and compression decoding method) |
US8737602B2 (en) | 2012-10-02 | 2014-05-27 | Nvoq Incorporated | Passive, non-amplified audio splitter for use with computer telephony integration |
- 1999
  - 1999-11-10 JP JP31953499A patent/JP2001142499A/ja active Pending
- 2000
  - 2000-11-09 CA CA002325322A patent/CA2325322A1/en not_active Abandoned
  - 2000-11-09 EP EP00124232A patent/EP1100076A3/de not_active Withdrawn
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5267317A (en) * | 1991-10-18 | 1993-11-30 | At&T Bell Laboratories | Method and apparatus for smoothing pitch-cycle waveforms |
EP0731348A2 (de) * | 1995-03-07 | 1996-09-11 | Advanced Micro Devices, Inc. | System zur Speicherung von und zum Zugriff auf Sprachinformation |
GB2312360A (en) * | 1996-04-12 | 1997-10-22 | Olympus Optical Co | Voice Signal Coding Apparatus |
Non-Patent Citations (2)
Title |
---|
MURASHIMA A ET AL: "A post-processing technique to improve coding quality of celp under background noise" 2000 IEEE WORKSHOP ON SPEECH CODING, 17 September 2000 (2000-09-17), pages 102-104, XP010520055 * |
TANIGUCHI T ET AL: "Enhancement of VSELP Coded Speech under Background Noise" 1995 IEEE WORKSHOP ON SPEECH CODING FOR TELECOMMUNICATIONS, 20 September 1995 (1995-09-20), pages 67-68, XP010269480 * |
Also Published As
Publication number | Publication date |
---|---|
EP1100076A3 (de) | 2003-12-10 |
CA2325322A1 (en) | 2001-05-10 |
JP2001142499A (ja) | 2001-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5826226A (en) | Speech coding apparatus having amplitude information set to correspond with position information | |
EP0802524B1 (de) | Sprachkodierer | |
EP0957472B1 (de) | Vorrichtung zur Sprachkodierung und -dekodierung | |
EP0834863B1 (de) | Sprachkodierer mit niedriger Bitrate | |
US7680669B2 (en) | Sound encoding apparatus and method, and sound decoding apparatus and method | |
EP1005022A1 (de) | Verfahren und Vorrichtung zur Sprachkodierung | |
US6009388A (en) | High quality speech code and coding method | |
CA2205093C (en) | Signal coder | |
CA2336360C (en) | Speech coder | |
EP1100076A2 (de) | Multimodaler Sprachkodierer mit Glättung des Gewinnfaktors | |
EP1154407A2 (de) | Positionsinformationskodierung in einem Multipuls-Anregungs-Sprachkodierer | |
JP3153075B2 (ja) | 音声符号化装置 | |
JP3299099B2 (ja) | 音声符号化装置 | |
JP3471542B2 (ja) | 音声符号化装置 | |
JPH09146599A (ja) | 音声符号化装置 | |
JPH09319399A (ja) | 音声符号化装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
AX | Request for extension of the european patent |
Free format text: AL;LT;LV;MK;RO;SI |
|
PUAL | Search report despatched |
Free format text: ORIGINAL CODE: 0009013 |
|
AK | Designated contracting states |
Kind code of ref document: A3 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
AX | Request for extension of the european patent |
Extension state: AL LT LV MK RO SI |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: 7G 10L 21/02 B Ipc: 7G 10L 19/14 A |
|
17P | Request for examination filed |
Effective date: 20031031 |
|
AKX | Designation fees paid |
Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
17Q | First examination report despatched |
Effective date: 20041215 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20050426 |