
CN101154383A - Method and device for noise suppression, phonetic feature extraction, speech recognition and training voice model - Google Patents

Info

Publication number
CN101154383A
Authority
CN
China
Prior art keywords
noise
estimation
phonetic feature
speech
square error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2006101412409A
Other languages
Chinese (zh)
Other versions
CN101154383B (en
Inventor
丁沛
何磊
鄢翔
赵蕤
郝杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp
Priority to CN2006101412409A
Publication of CN101154383A
Application granted
Publication of CN101154383B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention provides a noise suppression method, a method for extracting speech features, a speech recognition method and a method for training a speech model, together with the corresponding devices. According to one aspect of the invention, a noise suppression method for a noisy speech spectrum comprises: performing log-spectral minimum mean-square error estimation on the noisy speech spectrum according to a noise estimate spectrum, so as to reduce the noise contained in the speech spectrum, wherein the log-spectral minimum mean-square error estimation is performed by computing a gain function through the following steps: computing the gain function by Taylor series accumulation; computing the gain function by numerical integration; and combining the result of the Taylor series accumulation with the result of the numerical integration.

Description

Method and device for noise suppression, speech feature extraction, speech recognition and speech model training
Technical Field
The present invention relates generally to speech recognition technology and, in particular, to techniques for reducing noise in a speech spectrum.
Background
Current speech recognition systems achieve very high recognition accuracy on clean speech. In noisy environments, however, their performance degrades sharply, because noise causes a mismatch between the acoustic model and the acoustic features.
Work on noise robustness has concentrated mainly on front-end design, whose purpose is to reduce the mismatch that noise introduces in the speech feature space. Minimum mean-square error (MMSE) estimation is a speech enhancement algorithm that effectively suppresses background noise and thereby raises the signal-to-noise ratio (SNR) of the input signal. MMSE estimation is described in detail in Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-32, pp. 1109-1121, 1984 (hereinafter "document 1"), the entire content of which is incorporated herein by reference. That document uses MMSE estimation to estimate the short-time spectral amplitude (STSA), proposes a system based on the MMSE STSA estimator, and compares it with widely used systems based on Wiener filtering and on the spectral subtraction algorithm.
Although the distortion measure used by Y. Ephraim and D. Malah in document 1, the squared error of the spectrum, is mathematically tractable and gives good results, it is not the optimal choice. It is well known that a distortion measure based on the squared error of the log spectrum is better suited to speech processing; see, for example, R. M. Gray, A. Buzo, A. H. Gray, Jr. and Y. Matsuyama, "Distortion measures for speech processing," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 367-376, Aug. 1980, the entire content of which is incorporated herein by reference. This distortion measure is therefore widely used in speech analysis and recognition.
Log-spectral minimum mean-square error (LogMMSE) estimation is described in detail in Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-33, pp. 443-445, 1985 (hereinafter "document 2"), the entire content of which is incorporated herein by reference. LogMMSE outperforms MMSE because it achieves a lower residual noise level without affecting the quality of the speech itself. In the LogMMSE enhancement algorithm, the gain function is computed either by Taylor series accumulation or by numerical integration.
However, this framework has the following two problems:
1. Taylor series accumulation is accurate only when the input value is small, whereas numerical integration is accurate only when the input value is large.
2. Computing the gain function by Taylor series accumulation or by numerical integration is computationally expensive.
Summary of the Invention
To solve the above problems in the prior art, the invention provides a noise suppression method, a method for extracting speech features, a speech recognition method and a method for training a speech model, as well as a noise suppression device, a device for extracting speech features, a speech recognition device and a device for training a speech model.
According to one aspect of the invention, there is provided a noise suppression method for a noisy speech spectrum, comprising: performing log-spectral minimum mean-square error estimation on the noisy speech spectrum according to a noise estimate spectrum, so as to reduce the noise in the noisy speech spectrum; wherein the log-spectral minimum mean-square error estimation is performed with a piecewise linear function in place of the gain function.
According to another aspect of the invention, there is provided a noise suppression method for a noisy speech spectrum, comprising: performing log-spectral minimum mean-square error estimation on the noisy speech spectrum according to a noise estimate spectrum, so as to reduce the noise in the noisy speech spectrum; wherein the log-spectral minimum mean-square error estimation is performed by computing the gain function through the following steps: computing the gain function by Taylor series accumulation; computing the gain function by numerical integration; and merging the result of the Taylor series accumulation with the result of the numerical integration.
According to another aspect of the invention, there is provided a method for extracting speech features, comprising: transforming noisy speech into a noisy speech spectrum; reducing the noise in the noisy speech spectrum using the above noise suppression method; and extracting speech features from the noise-reduced speech spectrum.
According to another aspect of the invention, there is provided a speech recognition method, comprising: extracting speech features using the above method for extracting speech features; and recognizing speech according to the extracted speech features.
According to another aspect of the invention, there is provided a method for training a speech model, comprising: extracting speech features using the above method for extracting speech features; and training the speech model according to the extracted speech features.
According to another aspect of the invention, there is provided a noise suppression device for a noisy speech spectrum, comprising: an estimation unit that performs log-spectral minimum mean-square error estimation on the noisy speech spectrum according to a noise estimate spectrum, so as to reduce the noise in the noisy speech spectrum; wherein the estimation unit performs the log-spectral minimum mean-square error estimation with a piecewise linear function in place of the gain function.
According to another aspect of the invention, there is provided a noise suppression device for a noisy speech spectrum, comprising: an estimation unit that performs log-spectral minimum mean-square error estimation on the noisy speech spectrum according to a noise estimate spectrum, so as to reduce the noise in the noisy speech spectrum; wherein the estimation unit comprises: a Taylor series accumulation calculation unit that computes the gain function by Taylor series accumulation; a numerical integration calculation unit that computes the gain function by numerical integration; and a combination unit that merges the result computed by the Taylor series accumulation calculation unit with the result computed by the numerical integration calculation unit.
According to another aspect of the invention, there is provided a device for extracting speech features, comprising: a transforming unit that transforms noisy speech into a noisy speech spectrum; the above noise suppression device, which reduces the noise in the noisy speech spectrum; and an extracting unit that extracts the speech features from the noise-reduced speech spectrum.
According to another aspect of the invention, there is provided a speech recognition device, comprising: the above device for extracting speech features, which extracts speech features; and a speech recognition unit that recognizes speech according to the extracted speech features.
According to another aspect of the invention, there is provided a device for training a speech model, comprising: the above device for extracting speech features, which extracts speech features; and a model-training unit that trains the speech model according to the extracted speech features.
Brief Description of the Drawings
The above features, advantages and objects of the invention will be better understood from the following description of specific embodiments, taken in conjunction with the accompanying drawings.
Fig. 1 is a flowchart of a noise suppression method according to an embodiment of the invention;
Figs. 2A-2D show an example of the process of setting the segmentation points of the piecewise linear function, where Fig. 2A shows the curve of a gain function, Fig. 2B shows the curve of the derivative of the gain function, Fig. 2C shows the curve of the difference between the gain function and the piecewise linear function, and Fig. 2D shows the curve of the piecewise linear function after segmentation;
Fig. 3 is a flowchart of a noise suppression method according to another embodiment of the invention;
Figs. 4A-4C show an example of merging Taylor series accumulation and numerical integration, where Fig. 4A shows the gain function obtained by Taylor series accumulation, Fig. 4B shows the gain function obtained by numerical integration, and Fig. 4C shows the gain function obtained by merging the two computation methods;
Fig. 5 shows an example of computing the merging threshold;
Fig. 6 is a flowchart of a method for extracting speech features according to another embodiment of the invention;
Fig. 7 is a flowchart of a speech recognition method according to another embodiment of the invention;
Fig. 8 is a flowchart of a method for training a speech model according to another embodiment of the invention;
Fig. 9 is a block diagram of a noise suppression device according to another embodiment of the invention;
Fig. 10 is a block diagram of a noise suppression device according to another embodiment of the invention;
Fig. 11 is a block diagram of a device for extracting speech features according to another embodiment of the invention;
Fig. 12 is a block diagram of a speech recognition device according to another embodiment of the invention; and
Fig. 13 is a block diagram of a device for training a speech model according to another embodiment of the invention.
Detailed Description
To facilitate understanding of the embodiments below, the principles of minimum mean-square error (MMSE) estimation and log-spectral minimum mean-square error (LogMMSE) estimation are first briefly introduced.
MMSE estimation is a speech enhancement algorithm that uses an estimated spectrum of the background noise to suppress the noise in a noisy speech spectrum, yielding a noise-suppressed speech spectrum. Specifically, MMSE estimation is carried out according to the following formulas:
$$y(t) = x(t) + d(t), \quad 0 \le t \le T \tag{1}$$
$$\hat{A}_k = E\{A_k \mid y(t),\ 0 \le t \le T\} \tag{2}$$
where $y(t)$ denotes the signal comprising the speech signal $x(t)$ and the noise signal $d(t)$, $A_k$ denotes the amplitude of the $k$-th spectral component of the speech signal $x(t)$, and $\hat{A}_k$ denotes the speech spectrum obtained by MMSE estimation of $A_k$. Derivation gives:
$$\hat{A}_k = C\,\frac{\sqrt{\upsilon_k}}{\gamma_k}\,M(\upsilon_k)\,R_k \tag{3}$$
where
$$\upsilon_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k \tag{4}$$
Here $\hat{A}_k$ denotes the noise-suppressed speech spectrum, $R_k$ denotes the noisy speech spectrum, $C$ is a constant, $\xi_k$ is the a priori signal-to-noise ratio (SNR) obtained from the noise estimate spectrum, $\gamma_k$ is the a posteriori SNR obtained from the noise estimate spectrum and the noisy speech spectrum, $M(\upsilon_k)$ is the confluent hypergeometric function, and $k$ indexes the $k$-th spectral component. For details, see document 1 by Y. Ephraim and D. Malah cited above.
LogMMSE estimation is also a speech enhancement algorithm. It achieves a lower residual noise level without affecting the quality of the speech itself. Specifically, LogMMSE estimation is carried out according to the following formula:
$$\hat{A}_k = \exp\{E[\ln A_k \mid y(t),\ 0 \le t \le T]\} \tag{5}$$
Unlike formula (2) used for MMSE estimation, the logarithm of the amplitude $A_k$ of the $k$-th spectral component of the speech signal $x(t)$ is taken. Derivation gives:
$$\hat{A}_k = \frac{\xi_k}{1+\xi_k}\,\exp\left\{\frac{1}{2}\int_{\upsilon_k}^{\infty}\frac{e^{-t}}{t}\,dt\right\} R_k \tag{6}$$
The gain function $G(\upsilon_k)$ is defined as:
$$G(\upsilon_k) \equiv \frac{\hat{A}_k}{R_k} \tag{7}$$
where $\upsilon_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k$.
The noise-suppressed speech spectrum $\hat{A}_k$ is thus obtained as:
$$\hat{A}_k = G(\upsilon_k)\,R_k \tag{8}$$
For details, see document 2 by Y. Ephraim and D. Malah cited above.
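As an illustration of formulas (6) to (8), the following is a minimal sketch, not the implementation described in this patent. The function names, the truncation of the infinite upper limit at `upper`, the substitution $t = e^{u}$ and the step count are all assumptions made for the example.

```python
import math

def exp_integral_e1(v, upper=50.0, steps=20000):
    """Approximate the integral of exp(-t)/t from v to infinity.

    The substitution t = exp(u) turns the integrand into exp(-exp(u)),
    which is smooth, so a plain trapezoidal rule in u is adequate.
    Truncating the upper limit at `upper` is an assumption of this sketch.
    """
    if v <= 0.0:
        raise ValueError("v must be positive")
    if v >= upper:
        return 0.0
    a, b = math.log(v), math.log(upper)
    h = (b - a) / steps
    total = 0.5 * (math.exp(-math.exp(a)) + math.exp(-math.exp(b)))
    for i in range(1, steps):
        total += math.exp(-math.exp(a + i * h))
    return total * h

def log_mmse_gain(xi, gamma):
    """Gain of formulas (6)-(7) for a priori SNR xi and a posteriori SNR gamma."""
    v = xi / (1.0 + xi) * gamma          # formula (4)
    return xi / (1.0 + xi) * math.exp(0.5 * exp_integral_e1(v))

# Hypothetical values for a single spectral component k.
gain = log_mmse_gain(xi=1.0, gamma=2.0)
noisy_amplitude = 0.8                     # R_k
clean_estimate = gain * noisy_amplitude   # formula (8)
```

Evaluating the integral for every spectral component of every frame is what makes the direct computation expensive, which is the cost the embodiments below aim to reduce.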
Embodiments of the invention are described in detail below with reference to the drawings.
Fig. 1 is a flowchart of a noise suppression method according to an embodiment of the invention. As shown in Fig. 1, first, in step 101, a noisy speech spectrum is input. The noisy speech spectrum is a speech spectrum obtained, for example by a fast Fourier transform, from audio data that contains both background noise and speech; it is therefore a spectrum in which background noise and speech are superimposed.
Then, in step 105, log-spectral minimum mean-square error estimation is performed on the noisy speech spectrum according to a pre-estimated noise estimate spectrum. The noise estimate spectrum is obtained by estimating, in advance, background noise that contains no speech. It can be obtained in many ways, for example by averaging background noise spectra collected several times; the invention places no particular restriction on this. Specifically, the estimation is performed according to formula (8) above, with the gain function $G(\upsilon_k)$ in formula (8) replaced by a piecewise linear function, which gives:
$$\hat{A}_k = L(\upsilon_k)\,R_k \tag{9}$$
where $\upsilon_k = \frac{\xi_k}{1+\xi_k}\,\gamma_k$, $\hat{A}_k$ denotes the noise-suppressed speech spectrum, $R_k$ denotes the noisy speech spectrum, $\xi_k$ is the a priori signal-to-noise ratio (SNR) obtained from the noise estimate spectrum, $\gamma_k$ is the a posteriori SNR obtained from the noise estimate spectrum and the noisy speech spectrum, $L(\upsilon_k)$ is the piecewise linear function, and $k$ indexes the $k$-th spectral component.
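The quantities entering formula (9) can be sketched as follows. The text only states that the noise estimate may be an average of several background noise spectra and that $\xi_k$ and $\gamma_k$ are derived from it; the specific choices below (averaging power rather than amplitude, $\gamma_k = R_k^2/\lambda_k$, and $\xi_k = \max(\gamma_k - 1, \text{floor})$) are assumptions borrowed from common practice, and all function names are hypothetical.

```python
def average_noise_power(noise_frames):
    """Average the per-bin power of several speech-free spectra (assumed estimator)."""
    count = len(noise_frames)
    bins = len(noise_frames[0])
    return [sum(frame[k] ** 2 for frame in noise_frames) / count
            for k in range(bins)]

def snr_terms(noisy_amplitudes, noise_power, xi_floor=1e-3):
    """Return (xi_k, gamma_k, v_k) per bin; the xi_k rule is an assumption."""
    terms = []
    for r, lam in zip(noisy_amplitudes, noise_power):
        gamma = r * r / lam                  # a posteriori SNR
        xi = max(gamma - 1.0, xi_floor)      # a priori SNR (assumed rule)
        v = xi / (1.0 + xi) * gamma          # argument of G or L
        terms.append((xi, gamma, v))
    return terms

# Two hypothetical noise-only frames and one noisy frame, two bins each.
noise_power = average_noise_power([[1.0, 2.0], [1.0, 2.0]])
terms = snr_terms([2.0, 2.0], noise_power)
```

Each $\upsilon_k$ produced this way is the input at which the gain function, or its piecewise linear replacement, is evaluated.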
In this embodiment, the gain function $G(\upsilon_k)$ can be approximated by a piecewise linear function $L(\upsilon_k)$ whose segmentation points are set in advance. For example, the approximation can be constructed by the following steps.
Specifically, Figs. 2A-2D show an example of the process of setting the segmentation points of the piecewise linear function, where Fig. 2A shows the curve of a gain function G(v), Fig. 2B shows the curve of the derivative of the gain function, Fig. 2C shows the curve of the difference between the gain function and the piecewise linear function, and Fig. 2D shows the curve of the piecewise linear function L(v) after segmentation. The segmentation process is as follows.
First, the derivative of the gain function G(v) is computed, as shown in Fig. 2B. For convenience, this example uses the part of the curve where the derivative value lies in the range 0.05-0.50.
Next, the initial segmentation points of the piecewise linear function L(v) are set, as shown in Fig. 2B. In this example, the initial segmentation points are placed where the derivative value equals 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40 and 0.45.
Next, the difference between the piecewise linear function L(v) and the gain function G(v) is computed between every two consecutive initial segmentation points, as shown in Fig. 2C.
Next, the difference computed between every two consecutive segmentation points is compared with a preset threshold; in this example the threshold is set to 0.037. If the difference exceeds 0.037, a new segmentation point is inserted between the two consecutive segmentation points, for example at their midpoint (such as midway between segmentation points 0.10 and 0.15).
The difference computation and the subsequent steps are repeated until no difference exceeds the threshold. The piecewise linear function shown in Fig. 2D is thereby obtained.
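The refinement loop described above can be sketched as follows. This is a simplified illustration under stated assumptions: the initial segmentation points are given directly as input values rather than located through the derivative, the difference on a segment is measured as the largest sampled gap between the function and the chord, and `gain_shape` (the exponential factor of formula (6)) is a hypothetical stand-in for the gain curve of Fig. 2A.

```python
import math

def gain_shape(v, upper=50.0, steps=200):
    """exp(0.5 * integral_v^inf exp(-t)/t dt), via a trapezoid in u = ln t."""
    if v >= upper:
        return 1.0
    a, b = math.log(v), math.log(upper)
    h = (b - a) / steps
    total = 0.5 * (math.exp(-math.exp(a)) + math.exp(-math.exp(b)))
    for i in range(1, steps):
        total += math.exp(-math.exp(a + i * h))
    return math.exp(0.5 * total * h)

def max_gap(f, lo, hi, samples=16):
    """Largest sampled |f(v) - chord(v)| strictly inside [lo, hi]."""
    f_lo, f_hi = f(lo), f(hi)
    worst = 0.0
    for i in range(1, samples):
        v = lo + (hi - lo) * i / samples
        chord = f_lo + (f_hi - f_lo) * (v - lo) / (hi - lo)
        worst = max(worst, abs(f(v) - chord))
    return worst

def refine_cut_points(f, cuts, threshold):
    """Insert midpoints until every segment's gap is within the threshold."""
    cuts = sorted(cuts)
    while True:
        refined = [cuts[0]]
        inserted = False
        for lo, hi in zip(cuts, cuts[1:]):
            if max_gap(f, lo, hi) > threshold:
                refined.append(0.5 * (lo + hi))   # new point at the midpoint
                inserted = True
            refined.append(hi)
        cuts = refined
        if not inserted:
            return cuts

# Hypothetical initial points; 0.037 is the threshold used in the example above.
cuts = refine_cut_points(gain_shape, [0.1, 1.0, 2.0, 5.0], 0.037)
```

Segments where the curve bends sharply end up with densely spaced points, while nearly straight regions keep their initial spacing.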
Returning to Fig. 1, after the log-spectral minimum mean-square error estimation has been performed with the piecewise linear function $L(\upsilon_k)$ in place of the gain function $G(\upsilon_k)$, the speech spectrum whose noise has been reduced by the estimation is output in step 110.
With the noise suppression method of this embodiment, replacing the gain function with a piecewise linear function greatly reduces the computational cost of log-spectral minimum mean-square error estimation while preserving noise suppression performance.
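The cost saving comes from the fact that evaluating $L(\upsilon_k)$ needs only a table lookup and one linear interpolation per spectral component. A minimal sketch, assuming a precomputed table of segmentation points and function values (the numbers below are hypothetical) and assuming that inputs outside the table are clamped to the end values:

```python
from bisect import bisect_right

def piecewise_linear(v, cuts, values):
    """Evaluate L(v) by linear interpolation between tabulated points."""
    if v <= cuts[0]:
        return values[0]      # clamping below the table is an assumption
    if v >= cuts[-1]:
        return values[-1]     # clamping above the table is an assumption
    i = bisect_right(cuts, v) - 1
    t = (v - cuts[i]) / (cuts[i + 1] - cuts[i])
    return values[i] + t * (values[i + 1] - values[i])

def suppress(noisy_spectrum, v_values, cuts, values):
    """Formula (9): multiply each R_k by L(v_k)."""
    return [piecewise_linear(v, cuts, values) * r
            for v, r in zip(v_values, noisy_spectrum)]

# Hypothetical table and inputs.
cuts = [0.1, 1.0, 5.0]
values = [2.5, 1.1, 1.0]
cleaned = suppress([2.0, 4.0], [0.55, 10.0], cuts, values)
```

The binary search keeps the per-component cost logarithmic in the number of segmentation points, with no exponentials or integrals at run time.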
Based on the same inventive concept, Fig. 3 is a flowchart of a noise suppression method according to another embodiment of the invention. This embodiment is described below with reference to the figure. Description of parts identical to those of the preceding embodiment is omitted where appropriate.
As shown in Fig. 3, first, in step 301, a noisy speech spectrum is input. The noisy speech spectrum is a speech spectrum obtained, for example by a fast Fourier transform, from audio data that contains both background noise and speech; it is therefore a spectrum in which background noise and speech are superimposed.
Then, in step 305, log-spectral minimum mean-square error estimation is performed on the noisy speech spectrum. Specifically, in this step the estimation is performed according to formula (8), with the gain function computed by Taylor series accumulation, which yields the curve shown in Fig. 4A. The Taylor series accumulation method used in this embodiment may be any method known to those skilled in the art; the invention places no restriction on it, and it is not described further here.
As Fig. 4A shows, the gain function values obtained by Taylor series accumulation are very accurate when the input variable is small, but inaccurate when the input variable is large.
Then, in step 310, log-spectral minimum mean-square error estimation is performed according to the noise estimate spectrum and formula (8), with the gain function computed by numerical integration, which yields the curve shown in Fig. 4B. The numerical integration method used in this embodiment may be any method known to those skilled in the art; the invention places no restriction on it, and it is not described further here.
As Fig. 4B shows, the behaviour is the opposite of that of Taylor series accumulation: the gain function values obtained by numerical integration are very accurate when the input variable is large, but inaccurate when the input variable is small.
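The two complementary accuracy regimes can be reproduced with a small experiment on the integral in formula (6). The sketch below is illustrative only: the text leaves both methods open, so the truncated power series (ten terms) and the fixed-step trapezoidal rule (step 0.01, upper limit 40) are assumptions chosen to make the effect visible.

```python
import math

EULER_GAMMA = 0.5772156649015329

def e1_taylor(v, terms=10):
    """Truncated series: -gamma - ln v - sum_{n>=1} (-v)^n / (n * n!)."""
    total = -EULER_GAMMA - math.log(v)
    power, factorial = 1.0, 1.0
    for n in range(1, terms + 1):
        power *= -v           # (-v)^n
        factorial *= n        # n!
        total -= power / (n * factorial)
    return total

def e1_trapezoid(v, upper=40.0, step=0.01):
    """Fixed-step trapezoidal rule on exp(-t)/t over [v, upper] (v < upper)."""
    n = max(1, math.ceil((upper - v) / step))
    h = (upper - v) / n
    total = 0.5 * (math.exp(-v) / v + math.exp(-upper) / upper)
    for i in range(1, n):
        t = v + i * h
        total += math.exp(-t) / t
    return total * h

# Compare the two methods at a small and a large input value.
for v in (0.01, 0.1, 5.0):
    print(v, e1_taylor(v), e1_trapezoid(v))
```

With these settings the series is reliable near zero but drifts once the omitted high-order terms grow, while the fixed step cannot resolve the steep $1/t$ behaviour near small $v$ and only becomes reliable for larger inputs.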
Then, in step 315, the result computed by the Taylor series accumulation method and the result computed by the numerical integration method are merged.
Specifically, the inaccurate part of the gain function values obtained by Taylor series accumulation in Fig. 4A can be replaced with the gain function values obtained by numerical integration, or the inaccurate part of the gain function values obtained by numerical integration in Fig. 4B can be replaced with the gain function values obtained by Taylor series accumulation. Alternatively, any point within the range where both methods are accurate (for example, the point where the two curves of Fig. 4A and Fig. 4B are closest) can be taken as a merging threshold; the gain function values computed by Taylor series accumulation below the merging threshold are then merged with the gain function values computed by numerical integration above the merging threshold.
Preferably, the merging threshold is determined as follows.
First, the gain function values computed by the numerical integration method are subtracted from the gain function values computed by the Taylor series accumulation method; optionally, the absolute value of the result is taken and, optionally, a logarithmic transformation is applied, giving the curve shown in Fig. 5. Then the input variable at which the curve of Fig. 5 attains its minimum is selected as the merging threshold.
Once the merging threshold has been determined, the gain function values computed by Taylor series accumulation below the merging threshold are merged with the gain function values computed by numerical integration above the merging threshold, as shown in Figs. 4A-4C, so that accurate gain function values are obtained.
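The threshold selection and merging can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the gain is represented only by the factor $\exp\{\tfrac{1}{2}\int_{v}^{\infty} e^{-t}/t\,dt\}$ of formula (6) (the prefactor $\xi_k/(1+\xi_k)$ is common to both methods and is omitted), the Taylor and integration settings are arbitrary, and the search grid is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def gain_taylor(v, terms=10):
    """Gain factor with the integral evaluated by a truncated power series."""
    e1 = -EULER_GAMMA - math.log(v)
    power, factorial = 1.0, 1.0
    for n in range(1, terms + 1):
        power *= -v
        factorial *= n
        e1 -= power / (n * factorial)
    return math.exp(0.5 * e1)

def gain_numeric(v, upper=40.0, step=0.01):
    """Gain factor with the integral evaluated by a fixed-step trapezoid."""
    n = max(1, math.ceil((upper - v) / step))
    h = (upper - v) / n
    total = 0.5 * (math.exp(-v) / v + math.exp(-upper) / upper)
    for i in range(1, n):
        t = v + i * h
        total += math.exp(-t) / t
    return math.exp(0.5 * total * h)

def find_merge_threshold(grid):
    """Pick the grid point minimising log|Taylor gain - numeric gain|."""
    def log_abs_diff(v):
        diff = abs(gain_taylor(v) - gain_numeric(v))
        return math.log(diff) if diff > 0.0 else -math.inf
    return min(grid, key=log_abs_diff)

def merged_gain(v, threshold):
    """Taylor result below the threshold, numerical integration at or above it."""
    return gain_taylor(v) if v < threshold else gain_numeric(v)

grid = [0.05 * i for i in range(1, 101)]   # hypothetical search grid, 0.05 to 5.00
threshold = find_merge_threshold(grid)
```

Because the logarithm is monotonic, it does not move the minimum; it only makes the dip easier to see when the difference is plotted, as in Fig. 5.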
Returning to Fig. 3, after the log-spectral minimum mean-square error estimation has been performed by merging the Taylor series accumulation method and the numerical integration method, the speech spectrum whose noise has been reduced by the estimation is output in step 320.
With the noise suppression method of this embodiment, performing log-spectral minimum mean-square error estimation by merging the Taylor series accumulation method and the numerical integration method achieves the theoretically expected noise removal performance, and thus remedies the inaccuracy that arises when either method is used alone.
Based on the same inventive concept, Fig. 6 is a flowchart of a method for extracting speech features according to another embodiment of the invention. This embodiment is described below with reference to the figure. Description of parts identical to those of the preceding embodiments is omitted where appropriate.
As shown in Fig. 6, first, in step 601, noisy speech is input. The noisy speech comprises the speech uttered by a speaker together with background noise.
Then, in step 605, the noisy speech is transformed into a noisy speech spectrum, for example by using a fast Fourier transform (FFT) to convert the time-domain speech into a frequency-domain speech spectrum.
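As a small illustration of the time-to-frequency transform in step 605, the sketch below computes the magnitude spectrum of one frame with a naive discrete Fourier transform. It produces the same values an FFT would, only more slowly; the frame length, the absence of a window function and the function name are assumptions made for the example.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Return |X[k]| for k = 0 .. N/2 using a direct O(N^2) DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum

# Hypothetical 8-sample frame: a cosine with exactly two cycles per frame.
frame = [math.cos(2 * math.pi * 2 * i / 8) for i in range(8)]
spectrum = magnitude_spectrum(frame)
```

In a practical front end the same computation would be applied to successive overlapping frames, and each resulting spectrum would be passed to the noise suppression step.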
Then, in step 610, the noise in the noisy speech spectrum is reduced using the noise suppression method described above for the embodiment of Fig. 1 and Fig. 2. That method performs log-spectral minimum mean-square error estimation according to formula (9) above, with a piecewise linear function in place of the gain function. The noise reduction process is the same as in that embodiment and is not repeated here.
Alternatively, the noise in the noisy speech spectrum can be reduced using the noise suppression method described above for the embodiment of Fig. 3 to Fig. 5. That method performs log-spectral minimum mean-square error estimation according to formula (8) above, by merging the Taylor series accumulation method and the numerical integration method. The noise reduction process is the same as in that embodiment and is not repeated here.
Finally, in step 615, speech features are extracted from the noise-reduced speech spectrum. Specifically, the speech features can be extracted by conventional methods such as Mel-frequency cepstral coefficients (MFCC) or linear predictive cepstral coefficients (LPCC); the invention places no particular restriction on this.
As the above description shows, the method for extracting speech features of this embodiment can reduce noise by log-spectral minimum mean-square error estimation according to formula (9) before speech features are extracted from the noisy speech spectrum. Because a piecewise linear function replaces the gain function, the computational cost of the estimation is greatly reduced while noise suppression performance is preserved. The quality of the speech features can therefore be improved.
Alternatively, the method for extracting speech features of this embodiment can reduce noise by log-spectral minimum mean-square error estimation according to formula (8) before speech features are extracted from the noisy speech spectrum. Because the estimation merges the Taylor series accumulation method and the numerical integration method, it achieves the theoretically expected noise removal performance and remedies the inaccuracy that arises when either method is used alone. The quality of the speech features can therefore be improved.
Based on the same inventive concept, Fig. 7 is a flowchart of a speech recognition method according to another embodiment of the invention. This embodiment is described below with reference to the figure. Description of parts identical to those of the preceding embodiments is omitted where appropriate.
As shown in Fig. 7, first, in step 701, speech features are extracted using the method for extracting speech features described above with reference to the embodiment of Fig. 6. The extraction process is the same as in that embodiment and is not repeated here.
Then, in step 705, speech recognition is performed according to the extracted speech features. For example, the extracted speech features are compared with templates trained in advance, so as to identify the content of the speech; the invention places no particular restriction on this.
As the above description shows, the speech recognition method of this embodiment can reduce noise by log-spectral minimum mean-square error estimation according to formula (9) before speech features are extracted from the noisy speech spectrum. Because a piecewise linear function replaces the gain function, the computational cost of the estimation is greatly reduced while noise suppression performance is preserved, so the quality of the speech features can be improved. The performance of speech recognition can therefore be improved.
Alternatively, the speech recognition method of this embodiment can reduce noise by log-spectral minimum mean-square error estimation according to formula (8) before speech features are extracted from the noisy speech spectrum. Because the estimation merges the Taylor series accumulation method and the numerical integration method, it achieves the theoretically expected noise removal performance and remedies the inaccuracy that arises when either method is used alone. The performance of speech recognition can therefore be improved.
Under the same inventive concept, Fig. 8 is a flowchart of a method for training a speech model according to another embodiment of the present invention. This embodiment is described below with reference to that figure. Description of parts identical to those of the preceding embodiments is omitted as appropriate.
As shown in Fig. 8, first, in step 801, speech features are extracted using the method for extracting speech features described above with reference to the embodiment of Fig. 6. The extraction process is the same as in that embodiment and is not repeated here.
Then, in step 805, the speech model is trained on the extracted speech features.
As described above, the method for training a speech model of this embodiment can, before extracting speech features from the noisy speech spectrum, reduce noise by performing log-spectral minimum mean-square error (MMSE) estimation according to formula (9) above, in which a piecewise linear function replaces the gain function. This greatly reduces the computational cost of the log-spectral MMSE estimation while maintaining the noise suppression performance, so the quality of the speech features can be improved. The quality of the trained model can therefore be improved.
Alternatively, the method for training a speech model of this embodiment can, before extracting speech features from the noisy speech spectrum, reduce noise by performing log-spectral MMSE estimation according to formula (8) above, in which the estimation is carried out by combining the Taylor series accumulation method and the numerical integration method. This achieves the theoretically expected noise removal performance and remedies the inaccuracy of using either the Taylor series accumulation method or the numerical integration method alone. The quality of the trained model can therefore be improved.
Under the same inventive concept, Fig. 9 is a block diagram of a noise suppression device according to an embodiment of the present invention. This embodiment is described below with reference to that figure. Description of parts identical to those of the preceding embodiments is omitted as appropriate.
As shown in Fig. 9, the noise suppression device 900 for a noisy speech spectrum of this embodiment comprises a log-spectral minimum mean-square error (MMSE) estimation unit 901, which performs log-spectral MMSE estimation on the noisy speech spectrum according to a noise estimation spectrum, so as to reduce the noise in the noisy speech spectrum. The log-spectral MMSE estimation unit 901 uses a piecewise linear function in place of the gain function and performs the estimation according to formula (9) above. The details are the same as in the description of the noise suppression method in the embodiment of Figs. 1 and 2 and are not repeated here.
The noise suppression device 900 of this embodiment may further comprise a segmentation point storage unit 905 for storing the segmentation points of the piecewise linear function, and a noise estimation storage unit 910 for storing a noise estimate obtained by estimating the background noise in advance. Alternatively, the noise estimate may be input to the log-spectral MMSE estimation unit 901 from outside.
As described above, because the noise suppression device 900 of this embodiment uses a piecewise linear function in place of the gain function, the computational cost of the log-spectral MMSE estimation is greatly reduced while the noise suppression performance is maintained.
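The following minimal sketch illustrates how stored segmentation points might be obtained and then used to evaluate a piecewise linear function in place of a gain function. It loosely follows the refinement steps recited in claim 3 (insert a new point wherever the linear segment deviates from the gain function by more than a threshold), but it inserts the midpoint, which is an assumption, and it does not use the derivative. The function `stand_in_gain` is a placeholder smooth curve and is not the gain function of the patent; all names, the initial points and the threshold are hypothetical.

```python
import bisect
import math

def stand_in_gain(v):
    """Placeholder smooth curve; NOT the gain function of the patent."""
    return 1.0 - math.exp(-v)

def max_gap(g, a, b, samples=16):
    """Largest |chord - g| sampled at interior points of [a, b]."""
    ga, gb = g(a), g(b)
    worst = 0.0
    for i in range(1, samples):
        x = a + (b - a) * i / samples
        chord = ga + (gb - ga) * (x - a) / (b - a)
        worst = max(worst, abs(chord - g(x)))
    return worst

def refine_breakpoints(g, initial, threshold):
    """Insert midpoints until no segment deviates from g by more than threshold."""
    points = sorted(initial)
    changed = True
    while changed:
        changed = False
        refined = [points[0]]
        for a, b in zip(points, points[1:]):
            if max_gap(g, a, b) > threshold:
                refined.append((a + b) / 2.0)  # midpoint insertion is an assumption
                changed = True
            refined.append(b)
        points = refined
    return points

def piecewise_linear(points, values, v):
    """Evaluate L(v) by linear interpolation, clamping outside the stored range."""
    if v <= points[0]:
        return values[0]
    if v >= points[-1]:
        return values[-1]
    i = bisect.bisect_right(points, v)
    a, b = points[i - 1], points[i]
    return values[i - 1] + (values[i] - values[i - 1]) * (v - a) / (b - a)

# Done once, offline; the results play the role of the stored segmentation points.
breakpoints = refine_breakpoints(stand_in_gain, [0.0, 5.0, 10.0], threshold=1e-3)
table = [stand_in_gain(p) for p in breakpoints]

def suppress(noisy_magnitude, v):
    """Apply A_hat = L(v) * R to one spectral component."""
    return piecewise_linear(breakpoints, table, v) * noisy_magnitude
```

At run time only a table lookup and one linear interpolation are needed per spectral component, which is where the reduction in computation comes from.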
Under the same inventive concept, Fig. 10 is a block diagram of a noise suppression device according to another embodiment of the present invention. This embodiment is described below with reference to that figure. Description of parts identical to those of the preceding embodiments is omitted as appropriate.
As shown in Fig. 10, the noise suppression device 1000 for a noisy speech spectrum of this embodiment comprises a log-spectral minimum mean-square error (MMSE) estimation unit 1001, which performs log-spectral MMSE estimation on the noisy speech spectrum according to a noise estimation spectrum, so as to reduce the noise in the noisy speech spectrum. The details are the same as in the description of the noise suppression method in the embodiment described with reference to Figs. 3 to 5 and are not repeated here.
Specifically, the log-spectral MMSE estimation unit 1001 comprises a Taylor series accumulation calculation unit 10011, which uses formula (8) and computes the gain function by Taylor series accumulation to perform the log-spectral MMSE estimation, yielding the curve shown in Fig. 4A. The Taylor series accumulation calculation unit 10011 used in this embodiment may be any device known to those skilled in the art that can perform Taylor series accumulation; the present invention places no limitation on this, and it is not described further here.
As can be seen in Fig. 4A, when the input variable is small, the gain function values computed by the Taylor series accumulation calculation unit 10011 are very accurate, whereas when the input variable is large, the computed gain function values are inaccurate.
In addition, the log-spectral MMSE estimation unit 1001 comprises a numerical integration calculation unit 10012, which uses formula (8) and computes the gain function by numerical integration to perform the log-spectral MMSE estimation, yielding the curve shown in Fig. 4B. The numerical integration calculation unit 10012 used in this embodiment may be any device known to those skilled in the art that can perform numerical integration; the present invention places no limitation on this, and it is not described further here.
As can be seen in Fig. 4B, in contrast to the result computed by the Taylor series accumulation calculation unit 10011, the gain function values computed by the numerical integration calculation unit 10012 are very accurate when the input variable is large and inaccurate when the input variable is small.
In addition, the log-spectral MMSE estimation unit 1001 comprises a combination unit 10013 for combining the result computed by the Taylor series accumulation calculation unit 10011 with the result computed by the numerical integration calculation unit 10012.
Specifically, the inaccurate part of the gain function values computed by the Taylor series accumulation calculation unit 10011 in Fig. 4A may be replaced with the gain function values computed by the numerical integration calculation unit 10012, or the inaccurate part of the gain function values computed by the numerical integration calculation unit 10012 in Fig. 4B may be replaced with the gain function values computed by the Taylor series accumulation calculation unit 10011. Alternatively, any point within the range in which both the Taylor series accumulation calculation unit 10011 and the numerical integration calculation unit 10012 are accurate (for example, the point where the two curves of Figs. 4A and 4B are closest) may be taken as a combining threshold, and the gain function values computed by the Taylor series accumulation calculation unit 10011 below the combining threshold are combined with the gain function values computed by the numerical integration calculation unit 10012 above the combining threshold.
Preferably, the combination unit 10013 comprises: a subtraction unit, which subtracts the gain function values computed by the numerical integration calculation unit 10012 from the gain function values computed by the Taylor series accumulation calculation unit 10011; an optional absolute value operation unit, which takes the absolute value of the result obtained by the subtraction unit; an optional logarithmic operation unit, which applies a logarithmic transformation to the result obtained by the absolute value operation unit, yielding the curve shown in Fig. 3; and a selection unit, which selects the input variable corresponding to the minimum of the curve of Fig. 3 as the combining threshold.
After the combining threshold is determined, the combination unit 10013 combines the gain function values computed by the Taylor series accumulation calculation unit 10011 below the combining threshold with the gain function values computed by the numerical integration calculation unit 10012 above the combining threshold, as shown in Figs. 4A to 4C, thereby obtaining accurate gain function values.
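The combining procedure can be sketched as follows. Formula (8) is not reproduced in this part of the description, so the sketch assumes the standard log-spectral MMSE gain, G = ξ/(1+ξ) · exp(½ ∫ from υ to ∞ of e^(−t)/t dt), in which the integral is the exponential integral E1(υ). The number of series terms, the integration step count, the upper integration limit, the search grid and all function names are hypothetical choices made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329

def e1_taylor(x, terms=20):
    """Truncated series E1(x) = -gamma - ln(x) - sum_{n>=1} (-x)^n / (n * n!)."""
    total = -EULER_GAMMA - math.log(x)
    power_over_fact = 1.0  # holds (-x)^n / n!
    for n in range(1, terms + 1):
        power_over_fact *= -x / n
        total -= power_over_fact / n
    return total

def e1_numeric(x, upper=50.0, steps=2000):
    """Midpoint-rule estimate of the integral of exp(-t)/t from x to `upper`."""
    h = (upper - x) / steps
    return h * sum(math.exp(-t) / t for t in (x + (i + 0.5) * h for i in range(steps)))

def find_threshold(grid):
    """Grid point where log|taylor - numeric| is smallest (subtract, abs, log, select)."""
    def log_gap(x):
        # The tiny offset only guards against log(0) when the two results coincide.
        return math.log(abs(e1_taylor(x) - e1_numeric(x)) + 1e-300)
    return min(grid, key=log_gap)

def e1_combined(x, threshold):
    """Taylor result below the threshold, numerical integration at or above it."""
    return e1_taylor(x) if x < threshold else e1_numeric(x)

def log_mmse_gain(xi, gamma, threshold):
    """Assumed standard log-spectral MMSE gain for a priori SNR xi, a posteriori SNR gamma."""
    v = xi / (1.0 + xi) * gamma
    return xi / (1.0 + xi) * math.exp(0.5 * e1_combined(v, threshold))

threshold = find_threshold([0.1 * i for i in range(1, 101)])
```

With these settings the truncated series degrades for large inputs and the fixed-step integration degrades for small inputs, which mirrors the behavior described for Figs. 4A and 4B; the selected threshold falls where the two results agree best.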
With the noise suppression device 1000 of this embodiment, the log-spectral minimum mean-square error (MMSE) estimation is performed by combining the Taylor series accumulation method and the numerical integration method by means of the Taylor series accumulation calculation unit 10011, the numerical integration calculation unit 10012 and the combination unit 10013. This achieves the theoretically expected noise removal performance and remedies the inaccuracy of using either the Taylor series accumulation method or the numerical integration method alone.
Under the same inventive concept, Fig. 11 is a block diagram of a device for extracting speech features according to another embodiment of the present invention. This embodiment is described below with reference to that figure. Description of parts identical to those of the preceding embodiments is omitted as appropriate.
As shown in Fig. 11, the device 1100 for extracting speech features of this embodiment comprises: an input unit 1501, which inputs noisy speech; a transformation unit 1105, which transforms the noisy speech into a noisy speech spectrum; the noise suppression device 900 or the noise suppression device 1000 described above, which reduces the noise in the noisy speech spectrum; and an extraction unit 1110, which extracts the speech features from the noise-reduced speech spectrum. The details are the same as in the description of the method for extracting speech features in the embodiment of Fig. 6 and are not repeated here.
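The chain of units in device 1100 (transform, noise suppression, extraction) can be sketched as below. This is only an illustration under several assumptions: a naive discrete Fourier transform stands in for the fast Fourier transform, the a priori signal-to-noise ratio is crudely estimated as the a posteriori ratio minus one, `placeholder_gain` is a simple stand-in rather than the gain function of the patent, and log band energies are a hypothetical choice of speech feature.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Magnitude of the DFT of one frame (naive O(N^2) transform, for clarity only)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def suppress(spectrum, noise, gain):
    """Scale each noisy component R_k by gain(xi_k, gamma_k)."""
    out = []
    for r, d in zip(spectrum, noise):
        gamma = (r * r) / (d * d)    # a posteriori SNR from the noise estimate
        xi = max(gamma - 1.0, 1e-3)  # crude a priori SNR estimate (assumption)
        out.append(gain(xi, gamma) * r)
    return out

def log_band_energies(spectrum, bands=4):
    """Hypothetical feature: log energy in equal-width frequency bands."""
    size = len(spectrum) // bands
    return [math.log(sum(a * a for a in spectrum[b * size:(b + 1) * size]) + 1e-12)
            for b in range(bands)]

def placeholder_gain(xi, gamma):
    """Stand-in gain; NOT the gain function of the patent."""
    return xi / (1.0 + xi)

n = 32
frame = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]  # toy input frame
spectrum = magnitude_spectrum(frame)
noise = [0.5] * len(spectrum)                                  # toy noise estimate
clean = suppress(spectrum, noise, placeholder_gain)
features = log_band_energies(clean)
```

Either the piecewise linear gain or the combined Taylor series and numerical integration gain could be passed as the `gain` argument in place of the placeholder.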
As described above, the device 1100 for extracting speech features of this embodiment can, before extracting speech features from the noisy speech spectrum, reduce noise by performing log-spectral minimum mean-square error (MMSE) estimation according to formula (9) above, in which a piecewise linear function replaces the gain function. This greatly reduces the computational cost of the log-spectral MMSE estimation while maintaining the noise suppression performance. The quality of the speech features can therefore be improved.
Alternatively, the device 1100 for extracting speech features of this embodiment can, before extracting speech features from the noisy speech spectrum, reduce noise by performing log-spectral MMSE estimation according to formula (8) above, in which the estimation is carried out by combining the Taylor series accumulation method and the numerical integration method. This achieves the theoretically expected noise removal performance and remedies the inaccuracy of using either the Taylor series accumulation method or the numerical integration method alone. The quality of the speech features can therefore be improved.
Under the same inventive concept, Fig. 12 is a block diagram of a speech recognition device according to another embodiment of the present invention. This embodiment is described below with reference to that figure. Description of parts identical to those of the preceding embodiments is omitted as appropriate.
As shown in Fig. 12, the speech recognition device 1200 of this embodiment comprises: the device 1100 for extracting speech features described above, which extracts speech features; and a speech recognition unit 1201, which performs speech recognition on the extracted speech features. The details are the same as in the description of the speech recognition method in the embodiment of Fig. 7 and are not repeated here.
As described above, the speech recognition device 1200 of this embodiment can, before extracting speech features from the noisy speech spectrum, reduce noise by performing log-spectral minimum mean-square error (MMSE) estimation according to formula (9) above, in which a piecewise linear function replaces the gain function. This greatly reduces the computational cost of the log-spectral MMSE estimation while maintaining the noise suppression performance. The performance of speech recognition can therefore be improved.
Alternatively, the speech recognition device 1200 of this embodiment can, before extracting speech features from the noisy speech spectrum, reduce noise by performing log-spectral MMSE estimation according to formula (8) above, in which the estimation is carried out by combining the Taylor series accumulation method and the numerical integration method. This achieves the theoretically expected noise removal performance and remedies the inaccuracy of using either the Taylor series accumulation method or the numerical integration method alone. The performance of speech recognition can therefore be improved.
Under the same inventive concept, Fig. 13 is a block diagram of a device for training a speech model according to another embodiment of the present invention. This embodiment is described below with reference to that figure. Description of parts identical to those of the preceding embodiments is omitted as appropriate.
As shown in Fig. 13, the device 1300 for training a speech model of this embodiment comprises: the device 1100 for extracting speech features described above, which extracts speech features; and a model training unit 1301, which trains the speech model on the extracted speech features. The details are the same as in the description of the method for training a speech model in the embodiment of Fig. 8 and are not repeated here.
As described above, the device 1300 for training a speech model of this embodiment can, before extracting speech features from the noisy speech spectrum, reduce noise by performing log-spectral minimum mean-square error (MMSE) estimation according to formula (9) above, in which a piecewise linear function replaces the gain function. This greatly reduces the computational cost of the log-spectral MMSE estimation while maintaining the noise suppression performance, so the quality of the speech features can be improved. The quality of the trained model can therefore be improved.
Alternatively, the device 1300 for training a speech model of this embodiment can, before extracting speech features from the noisy speech spectrum, reduce noise by performing log-spectral MMSE estimation according to formula (8) above, in which the estimation is carried out by combining the Taylor series accumulation method and the numerical integration method. This achieves the theoretically expected noise removal performance and remedies the inaccuracy of using either the Taylor series accumulation method or the numerical integration method alone. The quality of the trained model can therefore be improved.
Although the noise suppression method, the method for extracting speech features, the speech recognition method and the method for training a speech model, as well as the noise suppression device, the device for extracting speech features, the speech recognition device and the device for training a speech model of the present invention have been described in detail above through some exemplary embodiments, these embodiments are not exhaustive, and those skilled in the art can make various changes and modifications within the spirit and scope of the present invention. The present invention is therefore not limited to these embodiments; its scope is defined only by the claims.

Claims (23)

1. A noise suppression method for a noisy speech spectrum, comprising:
performing log-spectral minimum mean-square error estimation on the noisy speech spectrum according to a noise estimation spectrum, so as to reduce the noise in the noisy speech spectrum;
wherein the log-spectral minimum mean-square error estimation is performed with a piecewise linear function in place of a gain function.
2. The noise suppression method according to claim 1, wherein the gain function is transformed into the piecewise linear function using predefined segmentation points, and the log-spectral minimum mean-square error estimation is performed.
3. The noise suppression method according to claim 2, wherein the predefined segmentation points of the piecewise linear function are obtained by the following steps:
calculating the derivative of the gain function;
setting initial segmentation points of the piecewise linear function;
calculating, between every two consecutive segmentation points of the initial segmentation points, the difference between the piecewise linear function and the gain function;
if the difference is greater than a threshold, inserting a new segmentation point between the two consecutive segmentation points; and
repeating the step of calculating the difference and the subsequent step until no difference is greater than the threshold.
4. The noise suppression method according to any one of claims 1 to 3, wherein the log-spectral minimum mean-square error estimation is performed by the following formula:
$\hat{A}_k = L(\upsilon_k) R_k$,
where $\upsilon_k = \frac{\xi_k}{1 + \xi_k} \gamma_k$,
and where $\hat{A}_k$ denotes the noise-suppressed speech spectrum, $R_k$ denotes the noisy speech spectrum, $\xi_k$ is the a priori signal-to-noise ratio obtained from the noise estimation spectrum, $\gamma_k$ is the a posteriori signal-to-noise ratio obtained from the noise estimation spectrum and the noisy speech spectrum, $L(\upsilon_k)$ is the piecewise linear function, and $k$ denotes the $k$-th spectral component.
5. A noise suppression method for a noisy speech spectrum, comprising:
performing log-spectral minimum mean-square error estimation on the noisy speech spectrum according to a noise estimation spectrum, so as to reduce the noise in the noisy speech spectrum;
wherein the log-spectral minimum mean-square error estimation is performed by calculating a gain function through the following steps:
calculating the gain function by Taylor series accumulation;
calculating the gain function by numerical integration; and
combining the result of the Taylor series accumulation with the result of the numerical integration.
6. The noise suppression method according to claim 5, wherein the combining step comprises: combining the result of the Taylor series accumulation and the result of the numerical integration at the point where they are closest to each other.
7. The noise suppression method according to claim 6, wherein the combining step comprises:
subtracting the result of the numerical integration from the result of the Taylor series accumulation;
selecting, as a threshold, the value at which the absolute value of the subtraction result is smallest; and
combining the result of the Taylor series accumulation and the result of the numerical integration according to the threshold.
8. The noise suppression method according to claim 7, wherein the combining step comprises combining the result of the Taylor series accumulation below the threshold with the result of the numerical integration above the threshold.
9. A method for extracting speech features, comprising:
transforming noisy speech into a noisy speech spectrum;
reducing the noise in the noisy speech spectrum using the noise suppression method according to any one of claims 1 to 8; and
extracting speech features from the noise-reduced speech spectrum.
10. The method for extracting speech features according to claim 9, wherein the transforming step comprises a fast Fourier transform.
11. A speech recognition method, comprising:
extracting speech features using the method for extracting speech features according to claim 9 or 10; and
recognizing speech according to the extracted speech features.
12. A method for training a speech model, comprising:
extracting speech features using the method for extracting speech features according to claim 9 or 10; and
training the speech model according to the extracted speech features.
13. A noise suppression device for a noisy speech spectrum, comprising:
an estimation unit, which performs log-spectral minimum mean-square error estimation on the noisy speech spectrum according to a noise estimation spectrum, so as to reduce the noise in the noisy speech spectrum;
wherein the estimation unit performs the log-spectral minimum mean-square error estimation with a piecewise linear function in place of a gain function.
14. The noise suppression device according to claim 13, wherein the gain function is transformed into the piecewise linear function using predefined segmentation points, and the log-spectral minimum mean-square error estimation is performed.
15. The noise suppression device according to claim 13 or 14, wherein the estimation unit performs the log-spectral minimum mean-square error estimation by the following formula:
$\hat{A}_k = L(\upsilon_k) R_k$,
where $\upsilon_k = \frac{\xi_k}{1 + \xi_k} \gamma_k$,
and where $\hat{A}_k$ denotes the noise-suppressed speech spectrum, $R_k$ denotes the noisy speech spectrum, $\xi_k$ is the a priori signal-to-noise ratio obtained from the noise estimation spectrum, $\gamma_k$ is the a posteriori signal-to-noise ratio obtained from the noise estimation spectrum and the noisy speech spectrum, $L(\upsilon_k)$ is the piecewise linear function, and $k$ denotes the $k$-th spectral component.
16. A noise suppression device for a noisy speech spectrum, comprising:
an estimation unit, which performs log-spectral minimum mean-square error estimation on the noisy speech spectrum according to a noise estimation spectrum, so as to reduce the noise in the noisy speech spectrum;
wherein the estimation unit comprises:
a Taylor series accumulation calculation unit, which calculates the gain function by Taylor series accumulation;
a numerical integration calculation unit, which calculates the gain function by numerical integration; and
a combination unit for combining the result calculated by the Taylor series accumulation calculation unit with the result calculated by the numerical integration calculation unit.
17. The noise suppression device according to claim 16, wherein the combination unit combines the result calculated by the Taylor series accumulation calculation unit and the result calculated by the numerical integration calculation unit at the point where they are closest to each other.
18. The noise suppression device according to claim 17, wherein the combination unit comprises:
a subtraction unit, which subtracts the result calculated by the numerical integration calculation unit from the result calculated by the Taylor series accumulation calculation unit; and
a selection unit for selecting, as a threshold, the value at which the absolute value of the result obtained by the subtraction unit is smallest;
wherein the combination unit combines the result calculated by the Taylor series accumulation calculation unit and the result calculated by the numerical integration calculation unit according to the threshold.
19. The noise suppression device according to claim 18, wherein the combination unit combines the result calculated by the Taylor series accumulation calculation unit below the threshold with the result calculated by the numerical integration calculation unit above the threshold.
20. A device for extracting speech features, comprising:
a transformation unit, which transforms noisy speech into a noisy speech spectrum;
the noise suppression device according to any one of claims 13 to 19, for reducing the noise in the noisy speech spectrum; and
an extraction unit, which extracts the speech features from the noise-reduced speech spectrum.
21. The device for extracting speech features according to claim 20, wherein the transformation unit is configured to perform the transformation by a fast Fourier transform.
22. A speech recognition device, comprising:
the device for extracting speech features according to claim 20 or 21, for extracting speech features; and
a speech recognition unit, which recognizes speech according to the extracted speech features.
23. A device for training a speech model, comprising:
the device for extracting speech features according to claim 20 or 21, for extracting speech features; and
a model training unit, which trains the speech model according to the extracted speech features.
CN2006101412409A 2006-09-29 2006-09-29 Method and device for noise suppression, phonetic feature extraction, speech recognition and training voice model Expired - Fee Related CN101154383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2006101412409A CN101154383B (en) 2006-09-29 2006-09-29 Method and device for noise suppression, phonetic feature extraction, speech recognition and training voice model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2006101412409A CN101154383B (en) 2006-09-29 2006-09-29 Method and device for noise suppression, phonetic feature extraction, speech recognition and training voice model

Publications (2)

Publication Number Publication Date
CN101154383A true CN101154383A (en) 2008-04-02
CN101154383B CN101154383B (en) 2010-10-06

Family

ID=39256000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2006101412409A Expired - Fee Related CN101154383B (en) 2006-09-29 2006-09-29 Method and device for noise suppression, phonetic feature extraction, speech recognition and training voice model

Country Status (1)

Country Link
CN (1) CN101154383B (en)


Also Published As

Publication number Publication date
CN101154383B (en) 2010-10-06

Similar Documents

Publication Publication Date Title
CN101154383B (en) Method and device for noise suppression, phonetic feature extraction, speech recognition and training voice model
CN101089952B (en) Method and device for controlling noise, smoothing speech manual, extracting speech characteristic, phonetic recognition and training phonetic mould
US10614827B1 (en) System and method for speech enhancement using dynamic noise profile estimation
US7877254B2 (en) Method and apparatus for enrollment and verification of speaker authentication
Govindan et al. Adaptive wavelet shrinkage for noise robust speaker recognition
CN104464728A (en) Speech enhancement method based on Gaussian mixture model (GMM) noise estimation
CN113744725B (en) Training method of voice endpoint detection model and voice noise reduction method
Shokouhi et al. Robust overlapped speech detection and its application in word-count estimation for prof-life-log data
CN101853665A (en) Method for eliminating noise in voice
van Hout et al. A novel approach to soft-mask estimation and log-spectral enhancement for robust speech recognition
CN104205214B (en) noise reduction method and device
Elshamy et al. An iterative speech model-based a priori SNR estimator
KR100969138B1 (en) Noise Mask Estimation Method using Hidden Markov Model and Apparatus
Gupta et al. Speech enhancement using MMSE estimation and spectral subtraction methods
Astudillo et al. An uncertainty propagation approach to robust ASR using the ETSI advanced front-end
Elshamy et al. Two-stage speech enhancement with manipulation of the cepstral excitation
Abka et al. Speech recognition features: Comparison studies on robustness against environmental distortions
CN101223574A (en) Speech recognition device and method using voiceband signal
Alam et al. Smoothed nonlinear energy operator-based amplitude modulation features for robust speech recognition
JP4325044B2 (en) Speech recognition system
Tu et al. Computational auditory scene analysis based voice activity detection
Arakawa et al. Model-based Wiener filter for noise robust speech recognition
Li et al. Sub-band based log-energy and its dynamic range stretching for robust in-car speech recognition
Shome et al. Non-negative frequency-weighted energy-based speech quality estimation for different modes and quality of speech
Panda Psychoacoustic model compensation with robust feature set for speaker verification in additive noise

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20101006

Termination date: 20160929
