CN1188835C - System and method for reducing noise

Info

Publication number: CN1188835C
Application number: CNB971824304A
Authority: CN
Inventors: A·P·毛罗
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 1997-09-02
Filing date: 1997-09-30
Publication date: 2005-02-09
Anticipated expiration: 2017-09-30
Also published as: US6122384A; CN1312938A; KR20010023579A; KR100546468B1

Abstract

A noise suppression system and method for a speech processing system (108). A gain estimator (220) determines the gain, and hence the level of noise suppression, for each frame of the input signal. If no speech is present in a frame, the gain is set to a predetermined minimum value. If speech is present in the frame, the adjuster (224) determines a gain factor for each channel of a predefined set of frequency channels. For each channel, the gain factor is a function of the speech SNR within the channel. The channel SNR is generated by the SNR estimator (210b) from the channel energy estimate provided by the energy estimator (206b) and the channel noise energy estimate provided by the noise energy estimator (214b). The noise energy estimator (214b) updates its estimate during frames in which the speech detector (208) determines that no speech is present.

Description

Noise suppression system and method

Field of the Invention

The present invention relates to speech processing. More particularly, it relates to a noise suppression system and method for use in speech processing.

Background of the Invention

Transmitting speech by digital techniques is becoming widespread, particularly in cellular telephone and personal communication system (PCS) applications. This has created interest in improving speech processing techniques, and noise suppression is one area under active improvement.

Noise suppression in a speech communication system generally improves the overall quality of the desired audio signal by filtering ambient background noise out of the desired speech. Such speech enhancement is especially needed in environments with abnormally high ambient background noise, such as aircraft, moving vehicles, or noisy factories.

One noise suppression technique is spectral subtraction, also called spectral gain modification. In this approach, the input audio signal is divided into frequency channels, and individual channels are attenuated according to their noise energy content. A background noise estimate for each frequency channel is used to generate a signal-to-noise ratio (SNR) for the speech in that channel, and the SNR is used to compute a gain factor for the channel. The gain factor then determines the attenuation applied to that channel. The attenuated channels are recombined to produce the noise-suppressed output signal.

In certain applications involving relatively high background noise, most noise suppression techniques show significant performance limitations. One example is the vehicle speakerphone option of a cellular mobile communication system, which gives the driver hands-free operation. The hands-free microphone is typically located at some distance from the user (e.g., mounted on a helmet). Because of road and wind noise, the more distant microphone delivers a signal with poor SNR to the land-end party. Although the speech received at the land end is usually intelligible, continuous exposure to this background noise often increases listener fatigue.

For a noise suppression system to work properly, it is important to determine the SNR of the speech accurately. Because of the limitations of the noise detectors currently in use, however, the SNR of the speech signal is difficult to determine accurately. Spectral subtraction techniques update the background noise estimate when speech is absent: the measured spectral energy is then attributed to noise, and the noise estimate is updated from it. To obtain an accurate noise energy estimate for computing the SNR, it is therefore important to distinguish periods in which speech is present from periods in which it is absent.

One exemplary speech detection technique uses a speech metric calculator to make the noise update decision. A speech metric is a measure of the overall speech-like character of the channel energy. First, the raw SNR estimates are used to index a speech metric table to obtain a speech metric value for each channel. The individual channel speech metric values are summed to produce an energy parameter, which is compared with a background noise update threshold. If the speech metric sum is equal to or greater than the threshold, the signal is said to contain speech. If the sum is less than the threshold, the input frame is deemed to be noise and a background noise update is performed. However, with high background noise, a sudden burst of background noise, or a gradually increasing noise source, the measured SNR will be large, which yields a high speech metric and prevents the noise estimate from being updated.

A further refinement of the speech metric calculator technique measures the channel energy deviation. This method assumes that noise has constant spectral energy over time, whereas speech has spectral energy that varies over time. The channel energy is therefore integrated over time; speech is detected if the channel energy deviation is large, and noise is detected if the deviation is small. A speech detector that measures channel energy deviation will detect a sudden increase in noise. The channel energy deviation method gives inaccurate results, however, when the input speech signal has constant energy. Moreover, with a gradually increasing noise source, the changing input energy produces a large energy deviation, which prevents the noise estimate from being updated even though an update is needed.

In addition to an accurate speech detector, a noise suppression system must adjust the channel gains appropriately. The channel gains should be adjusted so that noise is suppressed without sacrificing speech quality. One method of channel gain adjustment computes the gain as a function of the total noise estimate and the SNR of the speech signal. In general, an increase in the total noise estimate lowers the gain factor for a given SNR, and a lower gain factor means greater attenuation. This technique imposes a minimum gain value to prevent the channel gain from being attenuated excessively when the total noise estimate is very large. Using a hard-clamped minimum gain value creates a trade-off between noise suppression and speech quality. When the clamp is low, noise suppression improves but speech quality degrades. When the clamp is high, noise suppression degrades but speech quality improves. U.S. Patent No. 4,811,404 discloses a method and apparatus for suppressing background noise in high background noise environments. That patent adds a signal-to-noise ratio (SNR) threshold mechanism that reduces background noise by offsetting the gain rise of the gain table until a certain SNR threshold is reached.

To provide an improved noise suppression system, the limitations of current techniques for speech detection and channel gain computation need to be addressed. These problems and deficiencies are solved by the present invention in the manner described below.

Summary of the Invention

The present invention is a noise suppression system and method for use in a speech processing system. One object of the invention is to provide a speech detector that determines whether speech is present in an input signal. A reliable speech detector is needed to determine the signal-to-noise ratio (SNR) of the speech accurately. When speech is judged to be absent, the input signal is assumed to consist entirely of noise, and the noise energy can be measured. The noise energy is then used to determine the SNR. Another object of the invention is to provide an improved gain determination element for suppressing noise.

In accordance with the present invention, the noise suppression system includes a speech detector that determines whether speech is present in a frame of the input signal. The speech decision may be based on a measure of the SNR of the speech in the input signal. An SNR estimator estimates the SNR from a signal energy estimate produced by an energy estimator and a noise energy estimate produced by a noise energy estimator. The speech decision may also be based on the encoding rate of the input signal. In a variable rate communication system, each input frame is assigned an encoding rate, selected from a predetermined set of rates, according to the content of the frame. The rate usually depends on the level of speech activity, so a frame containing speech is assigned a high rate and a frame containing no speech is assigned a low rate. The speech decision may further be based on one or more mode measures that characterize the input signal. If it is determined that no speech is present in the input frame, the noise energy estimator updates the noise energy estimate.

A channel gain estimator determines the gain for each frame of the input signal. If no speech is present in the frame, the gain is set to a predetermined minimum value. Otherwise, the gain is determined from the frequency content of the frame. In the preferred embodiment, a gain factor is determined for each of a set of predefined frequency channels. For each channel, the gain is determined from the SNR of the speech in that channel, using a function suited to the characteristics of the frequency band in which the channel lies. In general, within a predefined frequency band, the gain is set to increase linearly with increasing SNR. In addition, the minimum gain for each frequency band may be adjusted according to the characteristics of the environment; for example, a user-selectable minimum gain may be implemented. The SNR of a channel is determined from the channel energy estimate produced by the energy estimator and the channel noise energy estimate produced by the noise energy estimator. The gain factors are used to adjust the gain of the signal in the respective channels, and the gain-adjusted channels are combined to produce the noise-suppressed output signal.
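The gain rule described above (minimum gain when no speech is present, otherwise a gain that rises linearly with channel SNR) can be sketched as follows. This is only an illustration: the numeric values of `min_gain_db` and `slope`, the 0 dB upper limit, and the choice to express gain in decibels are assumptions made for the example and are not taken from the text.

```python
def channel_gain_db(snr_db, speech_present, min_gain_db=-12.0, slope=0.5,
                    max_gain_db=0.0):
    """Return a channel gain in dB.

    min_gain_db, slope and max_gain_db are hypothetical illustration values.
    """
    if not speech_present:
        # No speech in the frame: use the preset minimum gain.
        return min_gain_db
    # Gain rises linearly with SNR, clamped between the minimum and 0 dB.
    gain = min_gain_db + slope * snr_db
    return max(min_gain_db, min(max_gain_db, gain))


def gain_db_to_linear(gain_db):
    """Convert an amplitude gain in dB to a linear multiplier."""
    return 10.0 ** (gain_db / 20.0)
```

A per-band design would simply call `channel_gain_db` with a different `min_gain_db` and `slope` for each frequency band, which is one way to realise a user-selectable minimum gain.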

According to the present invention, there is provided a noise suppressor for suppressing background noise in an audio signal, comprising: a signal-to-noise ratio estimator for generating channel SNR estimates for a first set of predefined frequency channels of said audio signal; a gain estimator for generating a gain factor for each of said frequency channels from a corresponding one of said channel SNR estimates, wherein said gain factor is derived using a gain function that defines the gain factor as an increasing function of SNR; a gain adjuster for adjusting the gain level of each of said frequency channels according to a corresponding one of said gain factors; and a speech detector for determining the presence of speech in said audio signal, wherein said speech detector uses a further signal-to-noise ratio estimator and a rate decision element, which determines the encoding rate of said audio signal from a set of variable rates, to detect the presence of speech.

According to the present invention, there is also provided a method for suppressing background noise in an audio signal, comprising the steps of: transforming said audio signal into a frequency representation of said audio signal; detecting an encoding rate associated with said audio signal; determining from the encoding rate of said audio signal whether speech is present in said audio signal; generating channel signal-to-noise ratio (SNR) estimates for a set of predefined frequency channels of said frequency representation; if it is determined that speech is present in said audio signal, determining a gain factor for each of said frequency channels, wherein a gain function is defined for each of a set of frequency bands and, for each of said frequency bands, defines a gain factor that increases with increasing SNR, so that for each of said frequency channels the channel gain factor is determined from the gain function of the frequency band whose range contains the frequency channel; adjusting the gain level of each of said frequency channels according to the corresponding channel gain factor; and inverse transforming the gain-adjusted frequency representation to produce a noise-suppressed audio signal.

Brief Description of the Drawings

The features, objects, and advantages of the present invention will become more apparent from the description below when read with the following drawings, in which like parts are identified by like reference numerals, and in which:

FIG. 1 is a block diagram of a communication system that uses a noise suppressor;

FIG. 2 is a block diagram of a noise suppressor in accordance with the present invention;

FIG. 3 is a graph of frequency-based gain factors for achieving noise suppression in accordance with the present invention; and

FIG. 4 is a flowchart of an embodiment of the processing steps in noise suppression performed by the processing elements of FIG. 2.

Best Mode for Carrying Out the Invention

In speech communication systems, a noise suppressor is commonly used to suppress unwanted ambient background noise. Most noise suppressors operate by estimating the background noise characteristics of the input data signal in one or more frequency bands and subtracting an average of the estimates from the input signal. The estimate of the average background noise is updated during periods when speech is absent. A noise suppressor needs an accurate determination of the background noise level in order to operate correctly. In addition, the level of noise suppression must be adjusted properly according to the speech and noise characteristics of the input signal. These requirements are addressed by the noise suppression system of the present invention.

FIG. 1 shows an exemplary speech processing system 100 in accordance with the present invention. System 100 comprises microphone 102, A/D converter 104, speech processor 106, transmitter 110, and antenna 112. Microphone 102 may be located in a cellular telephone together with the other elements of FIG. 1. Alternatively, microphone 102 may be the hands-free microphone of the vehicle speakerphone option of a cellular communication system. The vehicle speakerphone assembly is sometimes called a car kit. Noise suppression is particularly important where microphone 102 is part of a car kit. Because the hands-free microphone is generally located at some distance from the user, the speech SNR of the received acoustic signal is generally poor owing to road and wind noise.

Referring to FIG. 1, microphone 102 receives an input audio signal containing speech and/or background noise. Microphone 102 converts the input audio signal into an electro-acoustic signal denoted s(t). Analog-to-digital converter 104 may convert the electro-acoustic signal from an analog signal into pulse code modulated (PCM) samples. In the exemplary embodiment, A/D converter 104 outputs the PCM samples at 64 kbps as the signal s(n) shown in FIG. 1. The digital signal s(n) is received by speech processor 106, which includes, among other elements, noise suppressor 108. Noise suppressor 108 suppresses the noise in signal s(n) in accordance with the present invention. In a car kit application, noise suppressor 108 determines the level of background ambient noise and adjusts the signal gain to lessen the effect of that noise. Besides noise suppressor 108, speech processor 106 generally also includes a speech encoder, or vocoder (not shown), which compresses speech by extracting parameters related to a model of human speech production. Speech processor 106 may also include an echo canceller (not shown), which removes the acoustic echo caused by feedback between a loudspeaker (not shown) and microphone 102.

After processing by speech processor 106, the signal is provided to transmitter 110, which performs modulation according to a predetermined format such as code division multiple access (CDMA), time division multiple access (TDMA), or frequency division multiple access (FDMA). In the exemplary embodiment, transmitter 110 modulates the signal according to the CDMA modulation format described in U.S. Patent No. 4,901,307, entitled "Spread Spectrum Multiple Access Communication System Using Satellite or Terrestrial Repeaters," which is incorporated herein by reference. The transmitter then upconverts and amplifies the modulated signal and transmits it through antenna 112.

It should be recognized that noise suppressor 108 may be implemented in speech processing systems other than system 100 of FIG. 1. For example, noise suppressor 108 may be used in an e-mail application that includes a voice mail option. For such an application, transmitter 110 and antenna 112 of FIG. 1 are not needed. Instead, speech processor 106 formats the noise-suppressed signal for transmission over the e-mail network.

FIG. 2 shows an embodiment of noise suppressor 108. As shown in FIG. 2, the input audio signal is received by preprocessor 202. Preprocessor 202 prepares the input signal for noise suppression by performing pre-emphasis and frame generation. Pre-emphasis redistributes the power spectral density of the speech signal by emphasizing its high-frequency speech components. It essentially performs a high-pass filtering function, emphasizing the important speech components so as to raise the SNR of these components in the frequency domain. Preprocessor 202 may also generate frames from the samples of the input signal. In the preferred embodiment, 10 millisecond frames of 80 samples per frame are generated. For greater processing accuracy, the frames may contain overlapping samples. The frames are generated by windowing the samples of the input signal and padding them with zeros. The preprocessed signal is provided to transform element 204. In the preferred embodiment, transform element 204 produces a 128-point fast Fourier transform (FFT) for each frame of the input signal. It should be understood, however, that other means may be used to analyze the frequency components of the input signal.
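The preprocessing stage can be illustrated with a minimal sketch. It assumes a first-order pre-emphasis filter with a coefficient of 0.8 (the text gives neither the filter order nor a coefficient), uses non-overlapping 80-sample frames, and omits the window function for brevity; each frame is zero-padded to the 128-point transform length.

```python
def pre_emphasize(samples, coeff=0.8):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1].

    The filter form and coeff value are assumptions for illustration.
    """
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


def make_frames(samples, frame_len=80, fft_len=128):
    """Split samples into frame_len-sample frames, zero-padded to fft_len."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = list(samples[start:start + frame_len])
        frame.extend([0.0] * (fft_len - frame_len))  # zero padding
        frames.append(frame)
    return frames
```

Each returned frame can then be passed to a 128-point transform.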

The transformed components are provided to channel energy estimator 206a, which produces an energy estimate for each of N channels of the transformed signal. For each channel, one technique for updating the channel energy estimate smooths the channel energy of the current frame into the running estimate as follows:

$E_u(t) = \alpha E_{ch} + (1 - \alpha) E_u(t-1) \qquad (1)$

Here the updated estimate E_u(t) is defined as a function of the current channel energy E_ch and the previously estimated channel energy E_u(t-1). The exemplary embodiment sets α = 0.55.
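Equation (1) is a first-order recursive smoother. A minimal sketch, using the α = 0.55 given for the exemplary embodiment (the function and variable names are illustrative only):

```python
ALPHA = 0.55  # smoothing constant from the exemplary embodiment


def update_channel_energy(current_energy, previous_estimate, alpha=ALPHA):
    """Apply equation (1): E_u(t) = alpha * E_ch + (1 - alpha) * E_u(t-1)."""
    return alpha * current_energy + (1.0 - alpha) * previous_estimate


# Feeding a constant channel energy makes the estimate converge toward it.
estimate = 0.0
for _ in range(20):
    estimate = update_channel_energy(10.0, estimate)
```

With α = 0.55 the current frame carries slightly more weight than the accumulated history, so the estimate tracks changes within a few frames.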

The preferred embodiment determines an energy estimate for a low-frequency channel and an energy estimate for a high-frequency channel, so that N = 2. The low-frequency channel corresponds to frequencies from 250 to 2250 Hz, and the high-frequency channel corresponds to frequencies from 2250 to 3500 Hz. The current channel energy of the low-frequency channel can be determined by summing the energies of the FFT points corresponding to 250 to 2250 Hz, and that of the high-frequency channel by summing the energies of the FFT points corresponding to 2250 to 3500 Hz.
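The two-channel energy computation can be sketched as below. The 8 kHz sampling rate is an assumption (the text does not state it explicitly), as is the decision to assign the boundary bin at 2250 Hz to the high channel. A plain DFT is used instead of an FFT to keep the example dependency-free.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000  # assumed sampling rate
FFT_LEN = 128


def dft_power(frame):
    """Return |X[k]|^2 for k = 0..N/2 using a direct DFT."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        power.append(abs(acc) ** 2)
    return power


def band_energy(power, lo_hz, hi_hz, sample_rate=SAMPLE_RATE_HZ,
                fft_len=FFT_LEN):
    """Sum bin energies over the half-open frequency range [lo_hz, hi_hz)."""
    lo_bin = int(round(lo_hz * fft_len / sample_rate))
    hi_bin = int(round(hi_hz * fft_len / sample_rate))
    return sum(power[lo_bin:hi_bin])


def channel_energies(frame):
    """Return (low-channel energy, high-channel energy) for one frame."""
    power = dft_power(frame)
    return band_energy(power, 250, 2250), band_energy(power, 2250, 3500)
```

Under these assumptions the bin spacing is 62.5 Hz, so the low channel covers bins 4 to 35 and the high channel covers bins 36 to 55.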

The energy estimates are provided to speech detector 208, which determines whether speech is present in the received audio signal. SNR estimator 210a of speech detector 208 receives the energy estimates. SNR estimator 210a determines the speech signal-to-noise ratio (SNR) for each of the N channels from the channel energy estimate and the channel noise energy estimate. The channel noise energy estimate is provided by noise energy estimator 214a and generally corresponds to the estimated noise energy smoothed over previous frames that contained no speech.
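A per-channel SNR estimate can be sketched as the ratio of the two energy estimates. Expressing it in decibels and guarding against zero energies with a small floor are assumptions of this example; the text only states that the SNR is derived from the channel energy estimate and the channel noise energy estimate.

```python
import math


def channel_snr_db(channel_energy, noise_energy, floor=1e-10):
    """Return 10*log10(channel_energy / noise_energy), with a small floor.

    The dB scale and the floor value are illustrative assumptions.
    """
    signal = max(channel_energy, floor)
    noise = max(noise_energy, floor)
    return 10.0 * math.log10(signal / noise)
```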

Speech detector 208 also includes rate decision element 212, which selects the data rate of the input signal from a predetermined set of data rates. In some communication systems, data is encoded so that the data rate can change from frame to frame. Such a system is called a variable rate communication system. A speech encoder that encodes data according to a variable rate scheme is generally called a variable rate vocoder. An embodiment of a variable rate vocoder is described in U.S. Patent No. 5,414,796, entitled "Variable Rate Vocoder," which is incorporated herein by reference. Using a variable rate communication channel eliminates unnecessary transmission when there is no useful speech to send. Algorithms within the vocoder produce a varying number of information bits in each frame according to variations in speech activity. For example, a vocoder with a set of four rates may produce 20 millisecond data frames containing 16, 40, 80, or 171 information bits, depending on the activity of the speaker. Each data frame is to be transmitted in a fixed amount of time by varying the transmission rate of the communication.

Because the rate of a frame depends on the speech activity during that time frame, the rate determination provides information about whether speech is present. In a system that uses variable rates, a determination that a frame should be encoded at the highest rate generally indicates the presence of speech, whereas a determination that a frame should be encoded at the lowest rate generally indicates the absence of speech. Intermediate rates generally indicate transitions between the presence and absence of speech.
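The mapping from encoding rate to a speech indication can be sketched as a simple classifier. The four frame sizes are the example values given for a four-rate vocoder; the label strings and the function name are hypothetical.

```python
RATE_BITS = (16, 40, 80, 171)  # information bits per 20 ms frame (example set)


def speech_hint_from_rate(bits_per_frame, rates=RATE_BITS):
    """Classify a frame from its encoding rate alone."""
    if bits_per_frame not in rates:
        raise ValueError("unknown rate: %r" % (bits_per_frame,))
    if bits_per_frame == max(rates):
        return "speech"        # highest rate: speech generally present
    if bits_per_frame == min(rates):
        return "no_speech"     # lowest rate: speech generally absent
    return "transition"        # intermediate rates: onset or offset
```

How a "transition" frame is treated (for example, whether the noise estimate is frozen) is a design choice that this sketch leaves open.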

Rate decision element 212 may be implemented with any of a number of rate decision algorithms. One such algorithm is disclosed in U.S. Patent No. 5,911,128, issued June 8, 1999, entitled "Method and Apparatus for Reduced Variable Rate Vocoding," which is incorporated herein by reference. That technique provides a set of rate decision criteria referred to as mode measures. The first mode measure is the target matching signal-to-noise ratio (TMSNR) from the previous encoded frame, which indicates how well the encoding model is performing by comparing the synthesized speech signal with the input speech signal. The second mode measure is the normalized autocorrelation function (NACF), which measures periodicity in the speech frame. The third mode measure is the zero crossings (ZC) parameter, which measures the high-frequency content of the input speech frame. The fourth mode measure is the prediction gain differential (PGD), which determines whether the encoder is maintaining its prediction efficiency. The fifth mode measure is the energy differential (ED), which compares the energy in the current frame with the average frame energy. Using these mode measures, the rate decision logic selects an encoding rate for the input frame.

It should be understood that, although FIG. 2 shows rate decision element 212 as an element of noise suppressor 108, the rate information may instead be provided to noise suppressor 108 by another element of speech processor 106 (FIG. 1). For example, speech processor 106 may include a variable rate vocoder (not shown) that determines the encoding rate for each frame of the input signal. Rather than having noise suppressor 108 perform the rate decision independently, the variable rate vocoder may supply the rate information to noise suppressor 108.

It should also be understood that, instead of using the rate decision to determine the presence of speech, speech detector 208 may use a subset of the mode measures on which the rate decision is based. For example, rate decision element 212 may be replaced by an NACF element (not shown), which measures the periodicity in the speech frame as described above. The NACF is evaluated according to the following relationship:

$NACF = \max_{T \in [t_1, t_2]} \frac{\sum_{n=0}^{N-1} e(n) \cdot e(n-T)}{0.5 \cdot \sum_{n=0}^{N-1} \left[ e^2(n) + e^2(n-T) \right]} \qquad (2)$

Here N is the number of samples in the speech frame, and t1 and t2 are the bounds on the lag T, in samples, over which the NACF is evaluated. The NACF is computed from the formant residual signal e(n). Formant frequencies are the resonant frequencies of speech. A short-term filter is used to filter the speech signal to obtain the formant frequencies. The residual signal that remains after filtering with the short-term filter is the formant residual signal; it contains the long-term speech information, such as the pitch of the signal.
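The NACF relationship can be sketched directly from its definition: for each candidate lag T between t1 and t2, correlate the residual with a copy of itself delayed by T, normalise by the mean energy of the two segments, and keep the maximum. The sketch assumes that `residual` holds at least t2 samples of history before the frame so that e(n - T) is always available, and that a lag with zero energy contributes 0; both are assumptions of this example.

```python
def nacf(residual, frame_start, frame_len, t1, t2):
    """Normalized autocorrelation of the formant residual, maximised over lag.

    residual    : sequence of residual samples e(n), including past history
    frame_start : index of the first sample of the current frame
    frame_len   : number of samples N in the frame
    t1, t2      : inclusive lag bounds in samples
    """
    if frame_start < t2 or frame_start + frame_len > len(residual):
        raise ValueError("not enough residual samples for the lag range")
    values = []
    for lag in range(t1, t2 + 1):
        num = 0.0
        den = 0.0
        for n in range(frame_start, frame_start + frame_len):
            cur = residual[n]
            past = residual[n - lag]
            num += cur * past
            den += cur * cur + past * past
        values.append(num / (0.5 * den) if den > 0.0 else 0.0)
    return max(values)
```

A strongly periodic residual yields a value near 1 at the lag equal to its period, while an aperiodic residual yields a much smaller value.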

The NACF mode measure is suitable for determining the presence of speech because the periodicity of a signal that contains speech differs from that of a signal that does not. Speech signals are typically characterized by periodic components, whereas a signal without speech generally has no periodic components. The NACF measure is therefore a good indicator that speech detector 208 can use.

Speech detector 208 may use a measure such as the NACF in place of the rate decision in situations where a rate decision cannot be generated. For example, if a rate decision is not available from a variable rate vocoder, and noise suppressor 108 lacks the processing power to generate its own rate decision, a mode measure such as the NACF offers a suitable alternative. This may be the case in a car kit application, where processing power is limited.

此外应该理解的是，语音检测器208可以单独根据速率判断、模式量度或SNR估值作出语音是否存在的判断。虽然增加量度应该可改进判断的精度，但是单独一个量度已经可以得到合适的结果。Furthermore, it should be understood that speech detector 208 may make a determination of whether speech is present based solely on rate determinations, mode metrics, or SNR estimates. While adding metrics should improve the precision of the judgment, a single metric already yields adequate results.

The rate decision (or mode metric) and the SNR estimates generated by SNR estimator 210a are provided to speech decision unit 216, which decides from these inputs whether speech is present in the input signal. This decision determines whether the noise energy estimate should be updated. SNR estimator 210a uses the noise energy estimate to determine the SNR of the speech in the input signal, and the SNR is in turn used to compute the level of attenuation applied to the input signal for noise suppression. If speech is judged to be present, speech decision unit 216 opens switch 218a, preventing noise energy estimator 214a from updating the noise energy estimate. If speech is judged to be absent, the input signal is assumed to be noise, and speech decision unit 216 closes switch 218a so that noise energy estimator 214a updates the noise estimate. Although FIG. 2 shows switch 218a, it should be understood that an enable signal provided by speech decision unit 216 to noise energy estimator 214a can perform the same function.

In the preferred embodiment, the SNR is estimated for two channels, and speech decision unit 216 generates the noise update decision according to the following procedure:

           
if (rate == min)
    if ((chsnr1 > T1) OR (chsnr2 > T2))
        if (ratecount > T3)
            update noise estimate
        else
            ratecount++
    else
        update noise estimate
        ratecount = 0
else
    ratecount = 0


The channel SNR estimates provided by SNR estimator 210a are denoted chsnr1 and chsnr2. The rate of the input signal, provided by rate decision unit 212, is denoted rate. A counter, ratecount, tracks the number of frames that meet certain conditions, as described below.

Speech decision unit 216 determines that speech is absent and that the noise estimate should be updated if the rate is the minimum of the variable rates, chsnr1 is greater than threshold T1 or chsnr2 is greater than threshold T2, and ratecount is greater than threshold T3. If the rate is the minimum and chsnr1 is greater than T1 or chsnr2 is greater than T2, but ratecount does not exceed T3, then ratecount is incremented and the noise estimate is not updated. By counting frames that have the minimum rate but high energy in at least one channel, ratecount detects a sudden increase in the noise level or a noise source whose level rises gradually. The counter, which indicates that a high-SNR signal contains no speech, keeps counting until speech is detected in the signal. The preferred embodiment sets T1 = T2 = 5 dB and T3 = 100 frames, where 10 ms frames are evaluated.

If the rate is the minimum, chsnr1 is less than T1, and chsnr2 is less than T2, speech decision unit 216 determines that speech is absent and that the noise estimate should be updated. In addition, ratecount is reset to zero.

If the rate is not the minimum, speech decision unit 216 determines that the frame contains speech and does not update the noise estimate; ratecount is reset to zero.
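The rate-based procedure can be written as a small state-update function. The following Python sketch is one reading of the pseudocode, not the patented implementation: the function name, the boolean `rate_is_min` argument, and returning the counter instead of holding it in shared state are assumptions made here. The default thresholds take the preferred-embodiment values of 5 dB for the two SNR thresholds and 100 frames for the count threshold.

```python
def rate_noise_update(rate_is_min, chsnr1, chsnr2, rate_count,
                      t1=5.0, t2=5.0, t3=100):
    """Return (update_noise_estimate, new_rate_count) for one frame."""
    if not rate_is_min:
        # Frame is treated as speech: no update, counter reset.
        return False, 0
    if chsnr1 > t1 or chsnr2 > t2:
        if rate_count > t3:
            # High SNR has persisted at minimum rate: treat it as a new noise level.
            return True, rate_count
        return False, rate_count + 1
    # Minimum rate and low SNR in both channels: plain background noise.
    return True, 0

# Example: a loud, steady noise source switched on while the vocoder stays at minimum rate.
count = 0
first_update_frame = None
for frame in range(1, 201):
    update, count = rate_noise_update(True, 12.0, 3.0, count)
    if update and first_update_frame is None:
        first_update_frame = frame
```

With the default thresholds the noise estimate starts tracking the new level after roughly one second of 10 ms frames, which is the hangover that the counter provides.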

Instead of the rate metric, a mode metric such as the NACF may be used to determine the presence of speech. Speech decision unit 216 may use the NACF metric to determine the presence of speech and make the noise update decision according to the following procedure:

           
if (pitchPresent == FALSE)
    if ((chsnr1 > TH1) OR (chsnr2 > TH2))
        if (pitchCount > TH3)
            update noise estimate
        else
            pitchCount++
    else
        update noise estimate
        pitchCount = 0
else
    pitchCount = 0


Here pitchPresent is defined as follows:

           
if (NACF > TT1)
    pitchPresent = TRUE
    NACFcount = 0
elseif (TT2 ≤ NACF ≤ TT1)
    if (NACFcount > TT3)
        pitchPresent = TRUE
    else
        pitchPresent = FALSE
        NACFcount++
else
    pitchPresent = FALSE
    NACFcount = 0


The channel SNR estimates provided by SNR estimator 210a are again denoted chsnr1 and chsnr2. A NACF unit (not shown) generates the metric pitchPresent, defined above, which indicates whether pitch is present. A counter, pitchCount, tracks the number of frames that meet certain conditions, as described below.

The metric pitchPresent indicates that pitch is present if the NACF is greater than threshold TT1. Pitch is also judged to be present if the NACF stays in the intermediate range (TT2 ≤ NACF ≤ TT1) for more than TT3 frames. A counter, NACFcount, tracks the number of frames for which TT2 ≤ NACF ≤ TT1. In the preferred embodiment, TT1 = 0.6, TT2 = 0.4, and TT3 = 8 frames, where 10 ms frames are evaluated.
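The pitchPresent rule can be sketched as a pure function of the current NACF value and the running counter. The function name and the choice to pass the counter in and out explicitly are assumptions made for illustration; the default thresholds are the preferred-embodiment values TT1 = 0.6, TT2 = 0.4, and TT3 = 8.

```python
def update_pitch_present(nacf_value, nacf_count, tt1=0.6, tt2=0.4, tt3=8):
    """Return (pitch_present, new_nacf_count) for one frame."""
    if nacf_value > tt1:
        # Strong periodicity: pitch is present immediately.
        return True, 0
    if tt2 <= nacf_value <= tt1:
        if nacf_count > tt3:
            # Moderate periodicity has persisted long enough to count as pitch.
            return True, nacf_count
        return False, nacf_count + 1
    # Weak periodicity: no pitch, restart the count.
    return False, 0

# Example: a moderately periodic signal (NACF = 0.5) held for ten frames.
count = 0
history = []
for _ in range(10):
    present, count = update_pitch_present(0.5, count)
    history.append(present)
```

In this reading a mid-range NACF needs nine consecutive frames to build up the counter before the tenth frame is declared pitched.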

Speech decision unit 216 determines that speech is absent and that the noise estimate should be updated if the pitchPresent metric indicates that no pitch is present (pitchPresent = FALSE), chsnr1 is greater than threshold TH1 or chsnr2 is greater than threshold TH2, and pitchCount is greater than threshold TH3. If pitchPresent = FALSE and chsnr1 is greater than TH1 or chsnr2 is greater than TH2, but pitchCount does not exceed TH3, then pitchCount is incremented and the noise estimate is not updated. The counter pitchCount is used to detect a sudden increase in the noise level or a noise source whose level rises gradually. The preferred embodiment sets TH1 = TH2 = 5 dB and TH3 = 100 frames, where 10 ms frames are evaluated.

If pitchPresent indicates that no pitch is present, chsnr1 is less than TH1, and chsnr2 is less than TH2, speech decision unit 216 determines that speech is absent and that the noise estimate should be updated. In addition, pitchCount is reset to zero.

If pitchPresent indicates that pitch is present (pitchPresent = TRUE), speech decision unit 216 determines that the frame contains speech and does not update the noise estimate; pitchCount is reset to zero.
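The pitch-based noise update decision has the same shape as the rate-based one, with pitchPresent taking the place of the rate test. The sketch below is an illustrative Python rendering; the function name and the explicit counter argument are assumptions, and the defaults read the preferred-embodiment values as 5 dB for both SNR thresholds and 100 frames for the count threshold.

```python
def pitch_noise_update(pitch_present, chsnr1, chsnr2, pitch_count,
                       th1=5.0, th2=5.0, th3=100):
    """Return (update_noise_estimate, new_pitch_count) for one frame."""
    if pitch_present:
        # Pitched frame is treated as speech: no update, counter reset.
        return False, 0
    if chsnr1 > th1 or chsnr2 > th2:
        if pitch_count > th3:
            return True, pitch_count
        return False, pitch_count + 1
    # No pitch and low SNR in both channels: update and reset.
    return True, 0

# Example: one unpitched, high-SNR frame early in a noise burst.
decision = pitch_noise_update(False, 9.0, 2.0, 10)
```

Here `decision` is `(False, 11)`: the frame is counted but the noise estimate is left alone until the burst has lasted longer than the count threshold.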

Once speech has been judged absent, switch 218a is closed so that noise energy estimator 214a updates the noise estimate. Noise energy estimator 214a generally generates a noise energy estimate for each of the N channels of the input signal. Because no speech is present, the energy is assumed to be contributed entirely by noise. For each channel, the updated noise energy is estimated by smoothing the current channel energy with the channel energy of previous frames that contained no speech. For example, the updated estimate can be obtained from the following relationship:

E_u(t) = βE_ch + (1-β)E_u(t-1)    (3)

Here the updated estimate E_u(t) is defined as a function of the current channel energy E_ch and the previous channel noise energy estimate E_u(t-1). The embodiment sets β = 0.1. The updated channel noise energy estimates are provided to SNR estimator 210a, where they are used to obtain the updated channel SNR estimates for the next frame of the input signal.
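Equation (3) is a first-order recursive average applied independently to each channel. A minimal Python sketch follows; the function name and the list-based representation of the per-channel energies are assumptions, while β = 0.1 is the value given for the embodiment.

```python
def update_noise_energy(prev_noise, channel_energy, beta=0.1):
    """Apply E_u(t) = beta * E_ch + (1 - beta) * E_u(t-1) to every channel."""
    return [beta * e_ch + (1.0 - beta) * e_u
            for e_u, e_ch in zip(prev_noise, channel_energy)]

# Example: channel 0 sees more energy than its current noise estimate,
# channel 1 already matches its estimate.
updated = update_noise_energy([10.0, 4.0], [20.0, 4.0])
```

With β = 0.1 each speech-free frame moves the estimate one tenth of the way toward the current channel energy, so a single loud frame shifts it only slightly.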

The decision as to whether speech is present is also provided to channel gain estimator 220, which determines the gain, and thus the level of noise suppression, for the input signal frame. If speech decision unit 216 has determined that speech is absent, the frame gain is set to a predetermined minimum gain level. Otherwise, the gain is determined as a function of frequency. In the preferred embodiment, the gain is computed from the curves of FIG. 3. Although FIG. 3 is shown in graphical form, it should be understood that the function it depicts may be implemented as a look-up table within channel gain estimator 220.

As FIG. 3 shows, the embodiment of the invention defines a separate gain curve for each of L frequency bands. Although L may be any number greater than or equal to 1, FIG. 3 shows three bands (L = 3). The gain factor of a low-band channel is thus determined from the low-band curve, the gain factor of a mid-band channel from the mid-band curve, and the gain factor of a high-band channel from the high-band curve.

Although noise suppression can be performed with a single gain curve for the input signal (L = 1), using multiple frequency bands reduces the degradation of speech quality. In ambient noise such as road or wind noise, the energy of the noise signal is higher at low frequencies and generally decreases as frequency increases.

In FIG. 3, linear equations with fixed slopes and y-intercepts are used to determine the gain factor for each frequency band. The determination of the gain factors can be described by the following equations:

gain[low band] (dB) = slope1 * SNR + lowBandYintercept;    (4)

gain[mid band] (dB) = slope2 * SNR + midBandYintercept;    (5)

gain[high band] (dB) = slope3 * SNR + highBandYintercept.    (6)

The preferred embodiment defines the low band as 125-375 Hz, the mid band as 375-2625 Hz, and the high band as 2625-4000 Hz. The slopes and y-intercepts are determined experimentally. Although a different slope may be used for each band, the preferred embodiment uses the same slope of 0.39 for every band. In addition, lowBandYintercept is set to -17 dB, midBandYintercept to -13 dB, and highBandYintercept to -13 dB.
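Equations (4) to (6) with the preferred-embodiment constants can be sketched as follows. The function names, the use of a representative channel frequency to pick the band, and the assignment of the shared band edges (375 Hz to the mid band, 2625 Hz to the high band) are assumptions made for illustration. The sketch applies the linear equations only and does not model any limiting of the gain that the curves of FIG. 3 may include.

```python
SLOPE = 0.39                     # same slope for every band (dB of gain per dB of SNR)
LOW_BAND_Y_INTERCEPT = -17.0     # dB
MID_BAND_Y_INTERCEPT = -13.0     # dB
HIGH_BAND_Y_INTERCEPT = -13.0    # dB

def band_y_intercept(freq_hz):
    """Pick the y-intercept of the band containing freq_hz."""
    if not 125.0 <= freq_hz <= 4000.0:
        raise ValueError("frequency outside the 125-4000 Hz range covered by the bands")
    if freq_hz < 375.0:
        return LOW_BAND_Y_INTERCEPT
    if freq_hz < 2625.0:
        return MID_BAND_Y_INTERCEPT
    return HIGH_BAND_Y_INTERCEPT

def channel_gain_db(freq_hz, snr_db):
    """Gain factor in dB for a channel, from the linear equations (4)-(6)."""
    return SLOPE * snr_db + band_y_intercept(freq_hz)

# Example: the same 20 dB channel SNR is attenuated more in the low band.
low_gain = channel_gain_db(250.0, 20.0)    # 0.39 * 20 - 17
mid_gain = channel_gain_db(1000.0, 20.0)   # 0.39 * 20 - 13
```

The 4 dB difference between the intercepts is what makes suppression stronger at low frequencies, where road and wind noise carry most of their energy.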

An optional feature would give the user of a device that includes the noise suppressor a means of selecting the desired y-intercept. Stronger noise suppression (a lower y-intercept) could then be chosen at the cost of some degradation in speech quality. The y-intercept may also be a variable that depends on some metric determined by noise suppressor 108. For example, stronger noise suppression (a lower y-intercept) may be desirable when excessive noise energy is detected over a predetermined time interval, and weaker noise suppression (a higher y-intercept) may be desirable when a condition such as babble is detected. During babble, background talkers are present, and weaker noise suppression may be warranted to avoid cutting off the main talker. Another optional feature would provide a selectable slope for the gain curves. It should also be understood that curves other than those described by equations (4)-(6) may be better suited to determining the gain factors in particular circumstances.

For each frame that contains speech, a gain factor is determined for each of the M frequency channels of the input signal, where M is the predetermined number of channels evaluated. The preferred embodiment evaluates 16 channels (M = 16). Referring to FIG. 3, the low-band curve is used to determine the gain factor of a channel whose frequency components lie in the low band, the mid-band curve is used for a channel whose frequency components lie in the mid band, and the high-band curve is used for a channel whose frequency components lie in the high band.

For each channel evaluated, the gain factor is obtained from the appropriate curve using the channel SNR. As shown in FIG. 2, the channel SNR is estimated by channel energy estimator 206b, noise energy estimator 214b, and SNR estimator 210b. For each frame of the input signal, channel energy estimator 206b generates an energy estimate for each of the M channels of the transformed input signal. The channel energy estimates can be updated using the relationship of equation (1) above. If speech decision unit 216 determines that there is no speech in the input signal, switch 218b is closed and noise energy estimator 214b updates the channel noise energy estimates. For each of the M channels, the updated noise energy estimate is based on the channel energy estimate determined by channel energy estimator 206b and can be computed using the relationship of equation (3) above. The channel noise energy estimates are provided to SNR estimator 210b. SNR estimator 210b thus determines a channel SNR estimate for each speech frame from the channel energy estimates for that frame and the channel noise energy estimates provided by noise energy estimator 214b.
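This passage does not give the formula that the SNR estimator applies, so the following Python sketch simply assumes the conventional definition, the ratio of channel energy to channel noise energy expressed in decibels. The function name, the list-based inputs, and the small `floor` that guards against division by zero are all hypothetical.

```python
import math

def channel_snr_db(channel_energy, noise_energy, floor=1e-12):
    """Per-channel SNR in dB, assumed here to be 10 * log10(E_ch / E_noise)."""
    return [10.0 * math.log10(max(e_ch, floor) / max(e_n, floor))
            for e_ch, e_n in zip(channel_energy, noise_energy)]

# Example: channel 0 carries 100 times the noise energy, channel 1 only noise.
snr = channel_snr_db([100.0, 2.0], [1.0, 2.0])
```

A channel whose energy equals its noise estimate comes out at 0 dB, so with the linear gain equations it receives a gain equal to the y-intercept of its band.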

Those skilled in the art will recognize that the functions performed by channel energy estimator 206a, noise energy estimator 214a, switch 218a, and SNR estimator 210a are similar to those performed by channel energy estimator 206b, noise energy estimator 214b, switch 218b, and SNR estimator 210b, respectively. Thus, although they are shown as separate processing units in FIG. 2, channel energy estimators 206a and 206b may be combined into one processing unit, noise energy estimators 214a and 214b may be combined into one processing unit, switches 218a and 218b may be combined into one unit, and SNR estimators 210a and 210b may be combined into one unit. As a combined unit, the channel energy estimator would determine channel energy estimates both for the N channels used for speech detection and for the M channels used to determine the channel gain factors. Note that N = M is possible. Likewise, the noise energy estimator and the SNR estimator would operate on both the N channels and the M channels. The SNR estimator would then provide N SNR estimates to speech decision unit 216 and M SNR estimates to channel gain estimator 220.

Channel gain estimator 220 provides the channel gain factors to gain adjuster 224. Gain adjuster 224 also receives the FFT-transformed input signal from transform unit 204. The gain of the transformed signal is adjusted according to the channel gain factors. For example, in the embodiment described above (M = 16), each transform (FFT) point that belongs to a particular one of the 16 channels is adjusted by the gain factor of that channel.
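The gain adjustment can be sketched as a per-channel scaling of FFT bins. Several details below are assumptions rather than statements from the text: the function name, the representation of each channel as an inclusive `(first_bin, last_bin)` range, and the conversion of a gain in dB to an amplitude factor of 10^(dB/20).

```python
def apply_channel_gains(spectrum, channel_bins, gains_db):
    """Scale the FFT bins of each channel by that channel's gain factor.

    spectrum     : list of complex FFT points
    channel_bins : list of (first_bin, last_bin) inclusive ranges, one per channel
    gains_db     : list of channel gain factors in dB, one per channel
    Bins that belong to no channel are passed through unchanged.
    """
    out = list(spectrum)
    for (first, last), gain_db in zip(channel_bins, gains_db):
        factor = 10.0 ** (gain_db / 20.0)   # dB to linear amplitude (assumed convention)
        for k in range(first, last + 1):
            out[k] = spectrum[k] * factor
    return out

# Example: one channel covering bins 1-2 is attenuated by 20 dB.
adjusted = apply_channel_gains([1 + 0j, 2 + 0j, 4j, 8 + 0j], [(1, 2)], [-20.0])
```

Scaling real and imaginary parts by the same factor changes only the magnitude of each bin, so the phase of the transformed signal is preserved for the inverse transform.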

The gain-adjusted signal generated by gain adjuster 224 is then provided to inverse transform unit 226, which in the preferred embodiment generates the inverse fast Fourier transform (IFFT) of the signal. The inverse-transformed signal is provided to post-processing unit 228. If the input frames were formed with overlapping samples, post-processing unit 228 adjusts the output signal for the overlap. If the signal underwent pre-emphasis, post-processing unit 228 also performs de-emphasis, which attenuates the frequency components that were emphasized during pre-emphasis. The pre-emphasis/de-emphasis process contributes to noise suppression by reducing noise components that lie outside the range of the processed frequency components.
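This passage does not specify the pre-emphasis and de-emphasis filters, so the Python sketch below assumes the common first-order pair, y(n) = x(n) - a·x(n-1) for pre-emphasis and z(n) = y(n) + a·z(n-1) for de-emphasis. The function names and the coefficient value 0.8 are hypothetical.

```python
def pre_emphasis(samples, coeff=0.8):
    """First-order high-frequency boost: y(n) = x(n) - coeff * x(n-1)."""
    out, prev = [], 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out

def de_emphasis(samples, coeff=0.8):
    """Inverse filter: z(n) = y(n) + coeff * z(n-1)."""
    out, prev = [], 0.0
    for y in samples:
        prev = y + coeff * prev
        out.append(prev)
    return out

# Example: with no processing in between, de-emphasis undoes pre-emphasis.
original = [1.0, 0.5, -0.25, 0.0, 2.0]
restored = de_emphasis(pre_emphasis(original))
```

Because the two filters are exact inverses, any spectral tilt introduced before the transform is removed after the inverse transform.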

It should be understood that the processing blocks of the noise suppressor shown in FIG. 2 may be implemented in a digital signal processor (DSP) or an application-specific integrated circuit (ASIC). The functional description of the invention enables a person of ordinary skill to implement it in a DSP or an ASIC without undue experimentation.

The flowchart of FIG. 4 shows some of the steps involved in the processing described with reference to FIGS. 2 and 3. Although the steps are shown in sequence, those skilled in the art will recognize that the order of some steps may be interchanged.

The process begins at step 402. In step 404, transform unit 204 transforms the input audio signal into a transformed signal, typically an FFT signal. In step 406, SNR estimator 210b determines the speech SNR for the M channels of the input signal from the channel energy estimates provided by channel energy estimator 206b and the channel noise energy estimates provided by noise energy estimator 214b. In step 408, channel gain estimator 220 determines the gain factors for the M channels of the input signal according to the channel frequencies. If there is no speech in the input signal frame, channel gain estimator 220 sets the gain to the minimum level. Otherwise, a gain factor is determined for each of the M channels according to a predetermined function. For example, referring to FIG. 3, a function defined by linear equations with fixed slopes and y-intercepts may be used, where each linear equation defines the gain for a predetermined frequency band. In step 410, gain adjuster 224 adjusts the gains of the M channels of the transformed signal using the M gain factors. In step 412, inverse transform unit 226 inverse transforms the gain-adjusted transformed signal to produce the noise-suppressed audio signal.

In step 414, SNR estimator 210a determines the speech SNR for the N channels of the input signal from the channel energy estimates provided by channel energy estimator 206a and the channel noise energy estimates provided by noise energy estimator 214a. In step 416, rate decision unit 212 determines the encoding rate of the input signal by analyzing the input signal. Alternatively, one or more mode metrics such as the NACF may be determined. In step 418, speech decision unit 216 determines whether speech is present in the input signal from the SNR provided by SNR estimator 210a and the rate provided by rate decision unit 212 and/or the mode metric. If decision block 420 finds that speech is absent, the input signal is assumed to consist entirely of noise, and noise energy estimator 214a updates the noise estimate in step 422, using the channel energy determined by channel energy estimator 206a. Whether or not speech is detected, the procedure continues with the next frame of the signal.

The invention has been described above by way of embodiments. Those skilled in the art can make various modifications to the invention without exercising inventive faculty. The scope and spirit of the invention are therefore defined by the appended claims.

Claims

1. A noise suppressor for suppressing audio signal background noise, characterized in that it comprises:

a signal-to-noise ratio (SNR) estimator for generating channel SNR estimates for a first predefined set of frequency channels of the audio signal;

a gain estimator for generating a gain factor for each of said frequency channels based on a corresponding one of said channel SNR estimates, wherein said gain factor is derived using a gain function that defines the gain factor as an increasing function of SNR;

a gain adjuster for adjusting the gain level of each of said frequency channels according to one of said corresponding gain factors; and

a speech detector for determining the presence of speech in the audio signal, wherein the speech detector utilizes an estimator and a rate decision unit to detect the presence of speech.

2. The noise suppressor of claim 1, wherein said gain function is frequency dependent.

3. The noise suppressor according to claim 1, characterized in that said gain function is implemented as a look-up table.

4. The noise suppressor of claim 1, wherein said gain function is a linear function with a fixed slope and y-intercept.

5. The noise suppressor of claim 4, wherein said y-intercept is user selectable.

6. The noise suppressor of claim 4, wherein the y-intercept is adjusted based on a measured characteristic of noise within the audio signal.

7. The noise suppressor of claim 4, wherein said slope is user selectable.

8. The noise suppressor of claim 4, wherein said slope is adjusted based on a measured characteristic of noise within said audio signal.

9. The noise suppressor of claim 1, further comprising:

a noise energy estimator for generating an updated channel noise energy estimate for each of said frequency channels when said speech detector determines that there is no speech within said audio signal, said updated channel noise energy estimate being provided to said SNR estimator to generate the channel SNR estimates.

10. The noise suppressor of claim 9, wherein said speech detector comprises:

a signal-to-noise ratio (SNR) estimator for generating channel SNR estimates for a second predefined set of frequency channels of the audio signal;

a speech decision unit for determining whether speech is present according to the channel SNR estimates for the second set of frequency channels.

11. The noise suppressor as claimed in claim 10, wherein said speech detector further comprises:

a mode metric unit for determining at least one mode metric characterizing said audio signal;

wherein the speech decision unit determines the presence of speech based on the at least one mode metric.

12. The noise suppressor of claim 11, wherein the mode metric comprises a normalized autocorrelation function (NACF) metric.

13. A noise suppressor for suppressing background noise of an audio signal, characterized in that it comprises:

means for detecting an encoding rate associated with said audio signal, wherein said audio signal is encoded according to an encoding rate;

means for determining whether speech is present in said audio signal based on a coding rate;

means for generating a channel signal-to-noise ratio (SNR) estimate for a predefined set of frequency channels of the audio signal;

means for determining a gain factor for each of said frequency channels if said means for determining whether speech is present in said audio signal determines that speech is present, wherein a gain function is defined for each of a set of frequency bands, the gain function for each of said frequency bands defines a gain factor that increases as the SNR increases, and the channel gain factor is determined according to the gain function of the frequency band that includes the frequency channel; and

means for adjusting the gain level of each of said frequency channels according to said corresponding channel gain factor.

14. A noise suppressor as claimed in claim 13, wherein said means for determining a gain factor determines a minimum gain factor for each of said frequency channels if said means for determining whether speech is present determines that speech is not present.

15. The noise suppressor of claim 13, wherein the gain function is implemented as a look-up table.

16. The noise suppressor of claim 13, wherein said gain function is a linear function with a fixed slope and y-intercept.

17. The noise suppressor of claim 16, wherein each of said y-intercepts is user selectable.

18. The noise suppressor of claim 16, wherein each of said y-intercepts is adjusted based on a measured characteristic of noise within said audio signal.

19. The noise suppressor of claim 16, wherein each of said slopes is user selectable.

20. The noise suppressor of claim 16, wherein each of said slopes is adjusted based on a measured characteristic of noise within said audio signal.

21. The noise suppressor of claim 13, further comprising:

means for generating an updated channel noise energy estimate for each of said frequency channels when said means for determining whether speech is present determines that there is no speech in said audio signal, said updated channel noise energy estimate being provided to said means for generating a channel SNR estimate to update the channel SNR estimate.

22. The noise suppressor of claim 13, wherein said means for determining whether speech is present further comprises:

means for generating channel SNR estimates for a second predefined set of frequency channels of the audio signal.

23. The noise suppressor of claim 13, wherein said means for determining whether speech is present comprises:

means for determining at least one mode metric characterizing said audio signal; and

means for determining the presence or absence of speech based on said at least one mode metric.

24. The noise suppressor of claim 23, wherein said means for determining whether speech is present further comprises:

means for generating channel SNR estimates for a second set of predefined frequency channels of the audio signal;

wherein said means for determining whether speech is present further bases its determination on said channel SNR estimates.

25. The noise suppressor of claim 23, wherein said mode metric comprises a normalized autocorrelation function (NACF) metric.

26. A method for suppressing background noise in an audio signal, comprising the steps of:

transforming said audio signal into a frequency representation of the audio signal;

detecting a coding rate associated with said audio signal;

determining whether speech is present in the audio signal according to the encoding rate of the audio signal;

generating channel signal-to-noise ratio (SNR) estimates for a predefined set of frequency channels of said frequency representation;

if it is determined that speech is present within the audio signal, determining a gain factor for each of said frequency channels, wherein a gain function is defined for each of a set of frequency bands, the gain function for each of said frequency bands defines a gain factor that increases with increasing SNR, and, for each of said frequency channels, the channel gain factor is determined according to the gain function of the frequency band that includes the frequency channel;

adjusting the gain level of each of said frequency channels according to said corresponding channel gain factor; and

inverse transforming the gain-adjusted frequency representation to produce a noise-suppressed audio signal.

27. The method of claim 26, further comprising the step of:

determining a minimum gain factor for each of said frequency channels if speech is determined to be absent.

28. The method of claim 26, wherein each of said gain functions is a linear function with a fixed slope and y-intercept.

29. The method of claim 26, further comprising the steps of:

generating an updated channel noise energy estimate for each of said frequency channels when said step of determining whether speech is present determines that speech is absent within said audio signal, said updated channel noise energy estimate being used to generate said channel SNR estimate.

30. The method of claim 26, wherein the step of determining whether speech is present comprises:

generating channel SNR estimates for a second predefined set of frequency channels of the audio signal;

determining whether speech is present according to the channel SNR estimates for the second set of frequency channels.

31. The method of claim 30, wherein the step of determining whether speech is present further comprises:

determining at least one mode metric characterizing the audio signal; and

determining whether speech is present based on the at least one mode metric.

32. The method of claim 31, wherein the mode metric comprises a normalized autocorrelation function (NACF) metric.