
CN104064196B - A method of improving speech recognition accuracy based on speech front-end noise elimination


Info

Publication number
CN104064196B
Authority
CN
China
Prior art keywords
speech
noise
channel
env
frame
Prior art date
Legal status
Expired - Fee Related
Application number
CN201410281240.3A
Other languages
Chinese (zh)
Other versions
CN104064196A (en)
Inventor
刘明
王明江
Current Assignee
Harbin Institute of Technology Shenzhen
Original Assignee
Harbin Institute of Technology Shenzhen
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology Shenzhen
Priority to CN201410281240.3A
Publication of CN104064196A
Application granted
Publication of CN104064196B
Legal status: Expired - Fee Related
Anticipated expiration


Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention provides, for large-scale isolated-word speech recognition, a method that eliminates noise through speech front-end processing and thereby improves recognition accuracy. The method solves the problem of low recognition accuracy caused by speech endpoint detection errors during MFCC extraction when the speech contains noise. Computational auditory scene analysis (CASA) is used at the front end of speech recognition; compared with traditional denoising methods such as noise reduction and speech enhancement, it can effectively separate noise from noisy speech by simulating the auditory nervous system of the human ear. In recognition experiments on 10,240 noisy utterances, accuracy rose from 83% to 95.5% compared with recognition without front-end noise processing.

Description

A method of improving speech recognition accuracy based on speech front-end noise elimination

Technical Field

The invention relates to the field of isolated-word speech recognition, and in particular to a method for improving the accuracy of large-scale isolated-word speech recognition.

Background Art

The most widely studied and applied feature in speech recognition is the Mel-frequency cepstral coefficient (MFCC); low-frequency MFCC parameters offer high spectral resolution and are well suited to speech recognition. In current practice, Mel-scale cepstral parameters have largely replaced the cepstral parameters derived from linear predictive coding, because they take into account the characteristics of human speech production and hearing and therefore show better robustness in speech recognition.

However, recognition with MFCC parameters degrades in the presence of strong background noise. Because noise exists everywhere in nature, any human speech is mixed with noise, even in a seemingly quiet environment. In the time domain, background noise is superimposed on the speech waveform; consequently, during speech endpoint detection, waveform segments with strong noise and weak speech are inevitably treated as useful speech frames, and the MFCC features extracted from them are unreliable or even unusable.

The human auditory system can distinguish and track a speech signal of interest in a noisy environment, "hearing" the desired content even when multiple sounds are present simultaneously. Auditory scene analysis (ASA) is a theory built on this physiological phenomenon of hearing. Computational auditory scene analysis (CASA) simulates the neural auditory system of the human ear, so its processing of speech signals is closer to human auditory perception of mixed sounds. It can therefore be used to separate noise from the speech signal and obtain a relatively clean speech signal; in effect, a front-end processing stage is added to the speech recognition pipeline, improving the accuracy of noisy speech recognition. The key to speech enhancement with CASA is selecting suitable features to separate the target speech from the background noise; usable features include spectral energy, fundamental frequency (pitch), and cross-channel correlation.

Summary of the Invention

To solve the problems in the prior art, the present invention proposes a method for improving the accuracy of large-scale isolated-word speech recognition through speech front-end noise elimination, solving the problem of low recognition accuracy caused by speech endpoint detection errors during MFCC extraction in the presence of noise.

The present invention is realized through the following technical solution:

A method for improving speech recognition accuracy based on speech front-end noise elimination, characterized in that the method uses computational auditory scene analysis (CASA) to eliminate noise at the speech recognition front end, and comprises the following steps:

A. The noisy speech, sampled at 16 kHz, is first passed through a 32-channel Gammatone filterbank with center frequencies from 50 Hz to 8 kHz; a rectangular window with a time resolution of 20 ms is then applied to the filtered signal, at a frame rate of 100 Hz;
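
As an illustration only (not the patent's implementation), the following Python sketch realizes step A with a 4th-order time-domain Gammatone filterbank and rectangular framing; the ERB-scale channel spacing, the 64 ms impulse-response length, and the per-channel gain normalization are assumptions the patent does not specify.

```python
import numpy as np

FS = 16000                     # sampling rate (Hz)
FRAME_LEN = FS * 20 // 1000    # 20 ms window -> 320 samples
HOP = FS // 100                # 100 Hz frame rate -> 160-sample hop

def gammatone_filterbank(x, n_ch=32, f_lo=50.0, f_hi=8000.0, fs=FS):
    """Pass x through n_ch 4th-order Gammatone filters with ERB-spaced centers."""
    erb = lambda f: 24.7 * (4.37 * f / 1000.0 + 1.0)           # ERB bandwidth
    erb_rate = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)    # ERB-rate scale
    inv_erb_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    cfs = inv_erb_rate(np.linspace(erb_rate(f_lo), erb_rate(f_hi), n_ch))
    t = np.arange(int(0.064 * fs)) / fs                        # 64 ms impulse response
    out = np.empty((n_ch, len(x)))
    for i, fc in enumerate(cfs):
        b = 1.019 * erb(fc)
        h = t**3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        out[i] = np.convolve(x, h / np.abs(h).max())[:len(x)]
    return out, cfs

def frame_channel(ch, frame_len=FRAME_LEN, hop=HOP):
    """Cut one filtered channel into overlapping rectangular-window frames."""
    n_frames = 1 + (len(ch) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return ch[idx]    # shape (n_frames, frame_len)
```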

B. Compute the noise envelope and the speech envelope of the auditory spectrum for the i-th frequency channel and the j-th frame:

$$\mathrm{env}_L(i,j)=\left|\sum_{n=0}^{N-1}x_L^{i,j}(n)\right|,\qquad \mathrm{env}_R(i,j)=\left|\sum_{n=0}^{N-1}x_R^{i,j}(n)\right|,$$

where i and j denote the i-th frequency channel and the j-th frame, respectively; N is the number of samples in a frame; x is the time-domain amplitude of the signal; and the subscripts L and R denote the two different channels;
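
A minimal sketch of step B under the same caveat: for every (channel, frame) pair the envelope is the absolute value of the frame's sample sum, computed here for both input channels at once.

```python
import numpy as np

def envelopes(frames_l, frames_r):
    """Step B: env(i, j) = |sum_n x^{i,j}(n)|.
    frames_l, frames_r: arrays of shape (n_ch, n_frames, N) for the two channels."""
    env_l = np.abs(frames_l.sum(axis=-1))   # shape (n_ch, n_frames)
    env_r = np.abs(frames_r.sum(axis=-1))
    return env_l, env_r
```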

C. Compute the cross-correlation function between the noise channel and the speech channel:

$$CC_{i,j}(\tau)=\frac{\frac{1}{N}\sum_{n=0}^{N-1}\left|x_S^{i,j}(n)\,x_N^{i,j}(n-\tau)\right|}{\sqrt{\frac{1}{N}\sum_{n=0}^{N-1}\left|x_S^{i,j}(n)\right|^{2}}\,\sqrt{\frac{1}{N}\sum_{n=0}^{N-1}\left|x_N^{i,j}(n-\tau)\right|^{2}}},$$

where τ is the characteristic time lag between speech and noise; τ ranges from -16 to 16, corresponding to a relative time range of -1 ms to 1 ms at a 16 kHz sampling rate;
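
A sketch of the normalized cross-correlation of step C; the circular shift used here to realize the lag τ is an assumption of this sketch, since the patent does not say how frame boundaries are handled.

```python
import numpy as np

def cross_correlation(x_s, x_n, max_lag=16):
    """Step C: normalized cross-correlation CC_{i,j}(tau) between one
    speech-channel frame x_s and one noise-channel frame x_n,
    for tau in [-max_lag, max_lag] (i.e., ±1 ms at 16 kHz)."""
    lags = np.arange(-max_lag, max_lag + 1)
    cc = np.empty(len(lags))
    norm_s = np.sqrt(np.mean(np.abs(x_s) ** 2))
    for k, tau in enumerate(lags):
        x_n_tau = np.roll(x_n, tau)              # circular shift: an assumption
        norm_n = np.sqrt(np.mean(np.abs(x_n_tau) ** 2))
        cc[k] = np.mean(np.abs(x_s * x_n_tau)) / (norm_s * norm_n + 1e-12)
    return lags, cc
```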

D. Compute the ITD (interaural time difference) and ILD (interaural level difference) of the noise channel and the speech channel from the cross-correlation function:

$$\mathrm{ITD}(i,j)=\arg\max_{\tau}CC_{i,j}(\tau),\qquad \mathrm{ILD}(i,j)=20\log_{10}\!\left[\frac{\mathrm{env}_L(i,j)}{\mathrm{env}_R(i,j)}\right];$$
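
Step D reads the ITD off the cross-correlation peak and the ILD off the envelope ratio; in the sketch below, the small eps guarding against division by zero is an addition of this sketch, not part of the patent.

```python
import numpy as np

def itd_ild(lags, cc, env_l, env_r, eps=1e-12):
    """Step D: ITD(i,j) = argmax_tau CC_{i,j}(tau);
    ILD(i,j) = 20*log10(env_L / env_R), in dB."""
    itd = lags[np.argmax(cc)]
    ild = 20.0 * np.log10((env_l + eps) / (env_r + eps))
    return itd, ild
```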

E. Sum the cross-correlation functions over all frames and all frequency channels and take the extremum of the sum, which is the characteristic time lag τ between speech and noise:

$$\tau=\arg\max_{\tau}\sum_{i,j}CC_{i,j}(\tau);$$

then determine which channel carries the speech signal: when τ is negative, the first (L) channel signal is pure speech; otherwise, the second (R) channel signal is pure speech;
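
A sketch of step E, assuming the per-unit cross-correlations have been stacked into a single array:

```python
import numpy as np

def characteristic_delay(cc_all, lags):
    """Step E: sum CC_{i,j}(tau) over all channels i and frames j, then take
    the argmax. cc_all has shape (n_ch, n_frames, n_lags)."""
    tau_max = lags[np.argmax(cc_all.sum(axis=(0, 1)))]
    speech_on_first = tau_max < 0   # negative tau -> first (L) channel is speech
    return tau_max, speech_on_first
```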

F. Use a simple 3-state, one-way state-transition HMM to compute the mask m(i,j) for the signal in the i-th frequency channel and the j-th frame; the mask information is used to estimate the speech envelope, where

$$m(i,j)=\frac{1}{1+\exp\{[\mathrm{ITD}(i,j)-0.5][\mathrm{ILD}(i,j)-0.5]\}};$$

combining this mask with the envelopes from step B gives the envelope spectrum of the noise-separated speech:

$$\mathrm{env}_M(i,j)=\begin{cases}\mathrm{env}_L(i,j)\cdot m(i,j), & \tau_{\max}<0\\ \mathrm{env}_R(i,j)\cdot m(i,j), & \tau_{\max}\ge 0;\end{cases}$$
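
A sketch of step F's mask and the masked envelope; note it implements only the closed-form logistic expression given above, while the patent's 3-state HMM smoothing of the mask is deliberately left out.

```python
import numpy as np

def tf_mask(itd, ild):
    """Step F: m(i,j) = 1 / (1 + exp{[ITD(i,j)-0.5][ILD(i,j)-0.5]}).
    itd and ild are arrays of shape (n_ch, n_frames)."""
    return 1.0 / (1.0 + np.exp((itd - 0.5) * (ild - 0.5)))

def masked_envelope(env_l, env_r, mask, tau_max):
    """Apply the mask to whichever channel step E identified as speech."""
    return (env_l if tau_max < 0 else env_r) * mask
```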

G. By taking the logarithmic energy, extract a 12-dimensional spectral coefficient vector for each speech frame; the resulting coefficient vector can be used directly as the feature parameter for speech recognition:

$$c(j,k)=\sum_{i=1}^{I}\ln\!\left[\mathrm{env}_M(i,j)\right]\cos\!\left[\frac{k\pi}{I}(i-0.5)\right],$$

where I is the number of Gammatone filters, taking the value 32, and j and k denote the j-th frame and the k-th spectral coefficient within it, respectively.
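
Finally, a sketch of step G, which is in effect a DCT of the log masked envelopes across the 32 channels; the eps guard against log(0) is an addition of this sketch.

```python
import numpy as np

def spectral_coefficients(env_m, n_coef=12, eps=1e-12):
    """Step G: c(j,k) = sum_{i=1}^{I} ln(env_M(i,j)) * cos(k*pi/I * (i-0.5)),
    giving a 12-dimensional feature vector per frame.
    env_m has shape (I, n_frames) with I = 32 Gammatone channels."""
    I = env_m.shape[0]
    i = np.arange(1, I + 1)                  # channel index i = 1..I
    log_env = np.log(env_m + eps)            # (I, n_frames)
    basis = np.cos(np.outer(np.arange(1, n_coef + 1), np.pi / I * (i - 0.5)))
    return (basis @ log_env).T               # (n_frames, n_coef)
```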

The beneficial effects of the present invention are as follows: the invention provides, for large-scale isolated-word speech recognition, a speech front-end processing method that eliminates noise and thereby improves recognition accuracy. It solves the problem of low recognition accuracy caused by speech endpoint detection errors during MFCC extraction in the presence of noise. Experimental results show that, at the cost of a moderate increase in computation, the algorithm effectively improves the accuracy of large-scale isolated-word speech recognition in noisy environments.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the speech front-end noise elimination process of the present invention.

Detailed Description

The present invention is further described below with reference to the accompanying drawing and specific embodiments.

The working principle of the present invention is as follows: the input noisy speech signal can be modeled as two channels carrying pure speech and pure noise, respectively. CASA therefore mimics the role of the human ears, using the interaural time difference (ITD) and interaural level difference (ILD) of the signals arriving at the two channels to determine the sound source, that is, to focus attention on the pure speech signal. CASA uses the ITD and ILD to estimate the mask of each time-frequency unit (T-F unit) in the time-frequency domain; the T-F mask indicates which T-F regions are noise and which are speech. Finally, the T-F regions containing speech information are resynthesized to recover the "clean" speech.

As shown in Fig. 1, the method of the present invention for improving speech recognition accuracy based on speech front-end noise elimination uses computational auditory scene analysis (CASA) to eliminate noise at the speech recognition front end, and comprises the following steps:

Steps A to G of the method are carried out exactly as set out in the Summary of the Invention above, with the same formulas and parameter values.

The above is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the present invention should not be considered limited to these descriptions. For a person of ordinary skill in the art to which the present invention belongs, several simple deductions or substitutions may be made without departing from the concept of the present invention, and all of these shall be regarded as falling within the protection scope of the present invention.

Claims (1)

1. A method for improving speech recognition accuracy based on speech front-end noise elimination, characterized in that the method uses computational auditory scene analysis (CASA) to eliminate noise at the speech recognition front end, and comprises the following steps:

A. The noisy speech, sampled at 16 kHz, is first passed through a 32-channel Gammatone filterbank with center frequencies from 50 Hz to 8 kHz; a rectangular window with a time resolution of 20 ms is applied to the filtered signal, at a frame rate of 100 Hz;

B. Compute the noise envelope and the speech envelope of the auditory spectrum for the i-th frequency channel and the j-th frame:

$$\mathrm{env}_L(i,j)=\left|\sum_{n=0}^{N-1}x_L^{i,j}(n)\right|,\qquad \mathrm{env}_R(i,j)=\left|\sum_{n=0}^{N-1}x_R^{i,j}(n)\right|,$$

where i and j denote the i-th frequency channel and the j-th frame, respectively; N is the number of samples in a frame; x is the time-domain amplitude of the signal; and the subscripts L and R denote the two different channels;

C. Compute the cross-correlation function between the noise channel and the speech channel:

$$CC_{i,j}(\tau)=\frac{\frac{1}{N}\sum_{n=0}^{N-1}\left|x_S^{i,j}(n)\,x_N^{i,j}(n-\tau)\right|}{\sqrt{\frac{1}{N}\sum_{n=0}^{N-1}\left|x_S^{i,j}(n)\right|^{2}}\,\sqrt{\frac{1}{N}\sum_{n=0}^{N-1}\left|x_N^{i,j}(n-\tau)\right|^{2}}},$$

where τ is the characteristic time lag between speech and noise, ranging from -16 to 16, corresponding to a relative time range of -1 ms to 1 ms at a 16 kHz sampling rate;

D. Compute the ITD and ILD of the noise channel and the speech channel from the cross-correlation function:

$$\mathrm{ITD}(i,j)=\arg\max_{\tau}CC_{i,j}(\tau),\qquad \mathrm{ILD}(i,j)=20\log_{10}\!\left[\frac{\mathrm{env}_L(i,j)}{\mathrm{env}_R(i,j)}\right];$$

E. Sum the cross-correlation functions over all frames and all frequency channels and take the extremum of the sum, which is the characteristic time lag τ between speech and noise:

$$\tau=\arg\max_{\tau}\sum_{i,j}CC_{i,j}(\tau);$$

determine which channel carries the speech signal: when τ is negative, the L-channel signal is pure speech; otherwise, the R-channel signal is pure speech;

F. Use a simple 3-state, one-way state-transition HMM to compute the mask m(i,j) of the signal in the i-th frequency channel and the j-th frame; the mask information is used to estimate the speech envelope, where

$$m(i,j)=\frac{1}{1+\exp\{[\mathrm{ITD}(i,j)-0.5][\mathrm{ILD}(i,j)-0.5]\}};$$

combining this mask with the envelopes from step B gives the envelope spectrum of the noise-separated speech:

$$\mathrm{env}_M(i,j)=\begin{cases}\mathrm{env}_L(i,j)\cdot m(i,j), & \tau_{\max}<0\\ \mathrm{env}_R(i,j)\cdot m(i,j), & \tau_{\max}\ge 0;\end{cases}$$

G. By taking the logarithmic energy, extract a 12-dimensional spectral coefficient vector for each speech frame; the resulting spectral coefficient vector is used directly as the feature parameter for speech recognition:

$$c(j,k)=\sum_{i=1}^{I}\ln\!\left[\mathrm{env}_M(i,j)\right]\cos\!\left[\frac{k\pi}{I}(i-0.5)\right],$$

where I is the number of Gammatone filters, taking the value 32, and j and k denote the j-th frame and the k-th spectral coefficient within it, respectively.
CN201410281240.3A, filed 2014-06-20: A method of improving speech recognition accuracy based on speech front-end noise elimination; granted as CN104064196B; Expired - Fee Related.

Priority Applications (1)

Application number CN201410281240.3A (granted as CN104064196B); priority date 2014-06-20; filing date 2014-06-20; title: A method of improving speech recognition accuracy based on speech front-end noise elimination


Publications (2)

Publication Number Publication Date
CN104064196A CN104064196A (en) 2014-09-24
CN104064196B (en) 2017-08-01

Family

ID=51551874


Country Status (1)

Country Link
CN (1) CN104064196B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108305614A (en) * 2017-01-11 2018-07-20 中兴通讯股份有限公司 A kind of method of speech processing and device
CN108806708A (en) * 2018-06-13 2018-11-13 中国电子科技集团公司第三研究所 Voice de-noising method based on Computational auditory scene analysis and generation confrontation network model
CN109288649B (en) * 2018-10-19 2020-07-31 奥弗锐(福建)电子科技有限公司 Intelligent voice control massage chair
CN110070863A (en) * 2019-03-11 2019-07-30 华为技术有限公司 A kind of sound control method and device
CN111796790B (en) * 2019-04-09 2023-09-08 深圳市冠旭电子股份有限公司 Sound effect adjusting method and device, readable storage medium and terminal equipment
CN110191387A (en) * 2019-05-31 2019-08-30 深圳市荣盛智能装备有限公司 Automatic starting control method, device, electronic equipment and the storage medium of earphone
CN111523389A (en) * 2020-03-25 2020-08-11 中国平安人寿保险股份有限公司 Intelligent emotion recognition method and device, electronic equipment and storage medium
CN115273880A (en) * 2022-07-21 2022-11-01 百果园技术(新加坡)有限公司 Voice noise reduction method, model training method, device, equipment, medium and product

Citations (2)

Publication number Priority date Publication date Assignee Title
CN102157156A (en) * 2011-03-21 2011-08-17 清华大学 Single-channel voice enhancement method and system
CN103456312A (en) * 2013-08-29 2013-12-18 太原理工大学 Single channel voice blind separation method based on computational auditory scene analysis

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
TWI412023B (en) * 2010-12-14 2013-10-11 Univ Nat Chiao Tung A microphone array structure and method for noise reduction and enhancing speech


Non-Patent Citations (4)

Title
An Auditory Scene Analysis Approach to Monaural Speech Segregation; G. N. Hu et al.; Signals and Communication Technology; 2006-12-31; pp. 485-515 *
Research on monaural speech separation based on computational auditory scene analysis; Zhao Liheng; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2013-01-15, No. 01; pp. 1-82 *
An improved speech enhancement algorithm based on computational auditory scene analysis; Wang Yu et al.; Journal of East China University of Science and Technology (Natural Science Edition); 2012-10, Vol. 38, No. 5; pp. 617-621 *
A blind speech separation method based on computational auditory scene analysis; Wang Weihua et al.; Journal of Harbin Engineering University; 2008-04, Vol. 29, No. 4; pp. 395-399 *

Also Published As

Publication number Publication date
CN104064196A (en) 2014-09-24

Similar Documents

Publication Publication Date Title
CN104064196B (en) A method of improving speech recognition accuracy based on speech front-end noise elimination
Das et al. Fundamentals, present and future perspectives of speech enhancement
CN103236260B (en) Speech recognition system
CN107886967B (en) Bone conduction voice enhancement method of deep bidirectional gate recurrent neural network
CN102054480B (en) A Monophonic Aliasing Speech Separation Method Based on Fractional Fourier Transform
Han et al. Deep neural network based spectral feature mapping for robust speech recognition.
CN102969000B (en) Multi-channel speech enhancement method
CN103854662A (en) Self-adaptation voice detection method based on multi-domain joint estimation
CN104464728A (en) Speech enhancement method based on Gaussian mixture model (GMM) noise estimation
Ismail et al. Mfcc-vq approach for qalqalahtajweed rule checking
CN106653004B (en) Speaker identification feature extraction method for sensing speech spectrum regularization cochlear filter coefficient
CN103971697B (en) Sound enhancement method based on non-local mean filtering
Chang et al. Spectro-temporal features for noise-robust speech recognition using power-law nonlinearity and power-bias subtraction
CN103886859B (en) Phonetics transfer method based on one-to-many codebook mapping
CN107895582A (en) A speaker-adaptive speech emotion recognition method for multi-source information domain
Li et al. A Convolutional Neural Network with Non-Local Module for Speech Enhancement.
CN104064197B (en) Method for improving speech recognition robustness on basis of dynamic information among speech frames
Roy et al. On supervised LPC estimation training targets for augmented Kalman filter-based speech enhancement
Tomchuk Spectral masking in MFCC calculation for noisy speech
Chougule et al. Channel robust MFCCs for continuous speech speaker recognition
Hepsiba et al. Computational intelligence for speech enhancement using deep neural network
Bawa et al. Spectral-warping based noise-robust enhanced children ASR system
Thomsen et al. Speech enhancement and noise-robust automatic speech recognition
Soni et al. Comparing front-end enhancement techniques and multiconditioned training for robust automatic speech recognition
Guo et al. Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 2017-08-01
Termination date: 2021-06-20
Termination date: 20210620