CN101185120B

CN101185120B - Systems, methods, and apparatus for highband burst suppression

Info

Publication number: CN101185120B
Application number: CN2006800182696A
Authority: CN
Inventors: 科恩·贝尔纳德·福斯; 阿南塔帕德马纳卜汉·A·坎达达伊
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2005-04-01
Filing date: 2006-04-03
Publication date: 2012-05-30
Anticipated expiration: 2026-04-03
Also published as: ES2350494T3; CN101185125A; CN101180676B; CN101180677A; UA95776C2; ES2351935T3; CN101185126B; CN101185120A; CN101185125B; UA94041C2; CN101180676A; CN101180677B; CN101185126A; ES2358125T3; CN101185124A; CN101185127A; CN101184979A; CN101184979B; UA91853C2; UA93677C2

Abstract

In one embodiment, a highband burst suppressor includes a first burst detector configured to detect bursts in a lowband speech signal, and a second burst detector configured to detect bursts in a corresponding highband speech signal. The lowband and highband speech signals may be different (possibly overlapping) frequency regions of a wideband speech signal. The highband burst suppressor also includes an attenuation control signal calculator configured to calculate an attenuation control signal according to a difference between outputs of the first and second burst detectors. A gain control element is configured to apply the attenuation control signal to the highband speech signal. In one example, the attenuation control signal indicates an attenuation when a burst is found in the highband speech signal but is absent from a corresponding region in time of the lowband speech signal.

Description

Systems, methods, and apparatus for highband burst suppression

This application claims the benefit of U.S. Provisional Patent Application No. 60/667,901, filed April 1, 2005, entitled "CODING THE HIGH-FREQUENCY BAND OF WIDEBAND SPEECH." This application also claims the benefit of U.S. Provisional Patent Application No. 60/673,965, filed April 22, 2005, entitled "PARAMETER CODING IN A HIGH-BAND SPEECH CODER."

Technical Field

The present invention relates to signal processing.

Background

Voice communications over the public switched telephone network (PSTN) have traditionally been limited in bandwidth to the frequency range of 300-3400 Hz. Newer networks for voice communications, such as cellular telephony and voice over IP (VoIP), may not have the same bandwidth limits, and it may be desirable to transmit and receive voice communications that include a wideband frequency range over such networks. For example, it may be desirable to support an audio frequency range that extends down to 50 Hz and/or up to 7 or 8 kHz. It may also be desirable to support other applications, such as high-quality audio or audio/video conferencing, that may have audio speech content in ranges outside the traditional PSTN limits.

Extending the range supported by a speech coder into higher frequencies may improve intelligibility. For example, the information that differentiates fricatives such as "s" and "f" is largely in the high frequencies. Highband extension may also improve other qualities of speech, such as presence. For example, even a voiced vowel may have spectral energy far above the PSTN limit.

In the course of research on wideband speech signals, the inventors occasionally observed pulses of high energy, or "bursts", in the upper part of the spectrum. These highband bursts typically last only a few milliseconds (typically 2 milliseconds, with a maximum length of about 3 milliseconds), may span up to several kilohertz (kHz) in frequency, and appear to occur randomly during different types of speech sounds, both voiced and unvoiced. For some speakers, a highband burst may occur in every sentence, while for other speakers such bursts may not occur at all. Although these events do not generally occur frequently, they do appear to be ubiquitous, as the inventors have found instances of them in wideband speech samples from several different databases and from several other sources.

Highband bursts have a wide frequency range but typically occur only in the higher band of the spectrum (for example, the region from 3.5 to 7 kHz) and not in the lower band. For example, FIG. 1 shows a spectrogram of the word "can." In this wideband speech signal, a highband burst may be seen at 0.1 seconds, extending across a wide frequency region around 6 kHz (in this figure, darker regions indicate higher intensity). It is possible that at least some highband bursts are generated by an interaction between the speaker's mouth and the microphone and/or are due to clicks emitted by the speaker's mouth during speech.

Summary

According to one embodiment, a method of signal processing includes processing a wideband speech signal to obtain a lowband speech signal and a highband speech signal; determining that a burst is present in a region of the highband speech signal; and determining that the burst is absent from a corresponding region of the lowband speech signal. The method also includes attenuating the highband speech signal over the region, based on determining that the burst is present and on determining that the burst is absent.

According to one embodiment, an apparatus includes a first burst detector configured to detect bursts in a lowband speech signal; a second burst detector configured to detect bursts in a corresponding highband speech signal; an attenuation control signal calculator configured to calculate an attenuation control signal according to a difference between outputs of the first and second burst detectors; and a gain control element configured to apply the attenuation control signal to the highband speech signal.

Brief Description of the Drawings

FIG. 1 shows a spectrogram of a signal that includes a highband burst.

FIG. 2 shows a spectrogram of a signal in which a highband burst has been suppressed.

FIG. 3 shows a block diagram of an arrangement including a filter bank A110 and a highband burst suppressor C200 according to an embodiment.

FIG. 4 shows a block diagram of an arrangement including filter bank A110, highband burst suppressor C200, and a filter bank B120.

FIG. 5a shows a block diagram of an implementation A112 of filter bank A110.

FIG. 5b shows a block diagram of an implementation B122 of filter bank B120.

FIG. 6a shows bandwidth coverage of the low and high bands for one example of filter bank A110.

FIG. 6b shows bandwidth coverage of the low and high bands for another example of filter bank A110.

FIG. 6c shows a block diagram of an implementation A114 of filter bank A112.

FIG. 6d shows a block diagram of an implementation B124 of filter bank B122.

FIG. 7 shows a block diagram of an arrangement including filter bank A110, highband burst suppressor C200, and a highband speech encoder A200.

FIG. 8 shows a block diagram of an arrangement including filter bank A110, highband burst suppressor C200, filter bank B120, and a wideband speech encoder A100.

FIG. 9 shows a block diagram of a wideband speech encoder A102 that includes highband burst suppressor C200.

FIG. 10 shows a block diagram of an implementation A104 of wideband speech encoder A102.

FIG. 11 shows a block diagram of an arrangement including wideband speech encoder A104 and a multiplexer A130.

FIG. 12 shows a block diagram of an implementation C202 of highband burst suppressor C200.

FIG. 13 shows a block diagram of an implementation C12 of burst detector C10.

FIGS. 14a and 14b show block diagrams of implementations C52-1 and C52-2 of initial region indicator C50-1 and terminal region indicator C50-2, respectively.

FIG. 15 shows a block diagram of an implementation C62 of coincidence detector C60.

FIG. 16 shows a block diagram of an implementation C22 of attenuation control signal generator C20.

FIG. 17 shows a block diagram of an implementation C14 of burst detector C12.

FIG. 18 shows a block diagram of an implementation C16 of burst detector C14.

FIG. 19 shows a block diagram of an implementation C18 of burst detector C16.

FIG. 20 shows a block diagram of an implementation C24 of attenuation control signal generator C22.

Detailed Description

Unless expressly limited by its context, the term "calculating" is used herein to indicate any of its ordinary meanings, such as computing, generating, and selecting from a list of values. Where the term "comprising" is used in the present description and claims, it does not exclude other elements or operations.

Highband bursts are quite audible in the original speech signal, but they do not contribute to intelligibility, and the quality of the signal may be improved by suppressing them. Highband bursts may also be detrimental to encoding of the highband speech signal, so that suppressing the bursts in the highband speech signal may improve the efficiency of coding the signal, and in particular the efficiency of encoding its temporal envelope.

Highband bursts may negatively affect highband coding systems in several ways. First, a burst may make the energy envelope of the speech signal much less smooth over time by introducing a sharp peak at the time of the burst. Unless the coder models the temporal envelope of the signal with high resolution (which increases the amount of information to be sent to the decoder), the energy of the burst may become smeared over time in the decoded signal and cause artifacts. Second, highband bursts tend to dominate the spectral envelope as modeled by, for example, a set of parameters such as linear prediction filter coefficients. Such modeling is typically performed for each frame of the speech signal (about 20 milliseconds). Consequently, the frame containing the click may be synthesized according to a spectral envelope that differs from those of the preceding and following frames, which may lead to a perceptually objectionable discontinuity.

Highband bursts may cause another problem for a speech coding system in which the excitation signal of the highband synthesis filter is derived from, or otherwise represents, a narrowband residual. In such a case, the presence of a highband burst may complicate encoding of the highband speech signal, because the highband speech signal then includes a structure that is absent from the narrowband speech signal.

Embodiments include systems, methods, and apparatus configured to detect bursts that are present in a highband speech signal but absent from a corresponding lowband speech signal, and to reduce the level of the highband speech signal during each such burst. Potential advantages of such embodiments include avoiding artifacts in the decoded signal and/or avoiding a loss of coding efficiency without noticeably degrading the quality of the original signal. FIG. 2 shows a spectrogram of the wideband signal shown in FIG. 1 after suppression of the highband burst according to such a method.

FIG. 3 shows a block diagram of an arrangement including a filter bank A110 and a highband burst suppressor C200 according to an embodiment. Filter bank A110 is configured to filter wideband speech signal S10 to produce lowband speech signal S20 and highband speech signal S30. Highband burst suppressor C200 is configured to output, based on highband speech signal S30, a processed highband speech signal S30a in which bursts that occur in highband speech signal S30 but are absent from lowband speech signal S20 have been suppressed.
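The behavior of suppressor C200 can be sketched as follows. This is only an illustration under stated assumptions: the burst detector shown here (a ratio of short-term to long-term energy) and the names `burst_indicator`, `suppress_highband_bursts`, `ratio`, and `attenuation` are hypothetical and are not taken from this description, and S20 and S30 are assumed to have the same length and sampling rate. Only the overall structure (two detectors, a control signal formed from the difference of their outputs, and a gain applied to the highband signal) follows the text.

```python
import numpy as np

def burst_indicator(x, short_len=16, long_len=160, ratio=4.0):
    """Hypothetical detector: 1.0 where short-term energy exceeds
    `ratio` times the long-term energy, else 0.0."""
    energy = x ** 2
    short = np.convolve(energy, np.ones(short_len) / short_len, mode="same")
    long_ = np.convolve(energy, np.ones(long_len) / long_len, mode="same")
    return (short > ratio * (long_ + 1e-12)).astype(float)

def suppress_highband_bursts(s20, s30, attenuation=0.1):
    """Attenuate S30 where a burst is detected in S30 but not in S20."""
    low_burst = burst_indicator(s20)     # first burst detector (lowband)
    high_burst = burst_indicator(s30)    # second burst detector (highband)
    # Control signal from the difference of the detector outputs:
    # 1.0 only where the highband has a burst and the lowband does not.
    control = np.clip(high_burst - low_burst, 0.0, 1.0)
    gain = 1.0 - (1.0 - attenuation) * control
    return s30 * gain, control

# Example: steady tones in both bands, with a short spike only in the highband.
n = np.arange(1600)
s20 = 0.1 * np.sin(2 * np.pi * 200 * n / 8000)
s30 = 0.1 * np.sin(2 * np.pi * 1500 * n / 8000)
s30[800:816] += 2.0
s30a, control = suppress_highband_bursts(s20, s30)
```

If the same spike is also added to `s20`, the two detector outputs cancel at the center of the spike and the highband signal is left unattenuated there, which is the selectivity the arrangement of FIG. 3 is meant to provide.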

FIG. 4 shows a block diagram of the arrangement shown in FIG. 3 that also includes a filter bank B120. Filter bank B120 is configured to combine lowband speech signal S20 and processed highband speech signal S30a to produce a processed wideband speech signal S10a. Because the highband bursts have been suppressed, the quality of processed wideband speech signal S10a may be improved over that of wideband speech signal S10.

Filter bank A110 is configured to filter an input signal according to a split-band scheme to produce a low-frequency subband and a high-frequency subband. Depending on the design criteria for the particular application, the output subbands may have equal or unequal bandwidths and may be overlapping or nonoverlapping. A configuration of filter bank A110 that produces more than two subbands is also possible. For example, such a filter bank may be configured to produce a very-low-band signal that includes components in a frequency range below that of narrowband signal S20 (such as the range of 50-300 Hz). In such a case, wideband speech encoder A100 (see the description of FIG. 8 below) may be implemented to encode this very-low-band signal separately, and multiplexer A130 (see the description of FIG. 11 below) may be configured to include the encoded very-low-band signal in multiplexed signal S70 (for example, as a separable portion).

FIG. 5a shows a block diagram of an implementation A112 of filter bank A110 that is configured to produce two subband signals having reduced sampling rates. Filter bank A110 is arranged to receive a wideband speech signal S10 having a high-frequency (or highband) portion and a low-frequency (or lowband) portion. Filter bank A112 includes a lowband processing path configured to receive wideband speech signal S10 and to produce lowband speech signal S20, and a highband processing path configured to receive wideband speech signal S10 and to produce highband speech signal S30. Lowpass filter 110 filters wideband speech signal S10 to pass a selected low-frequency subband, and highpass filter 130 filters wideband speech signal S10 to pass a selected high-frequency subband. Because both subband signals have narrower bandwidths than wideband speech signal S10, their sampling rates can be reduced to some extent without loss of information. Downsampler 120 reduces the sampling rate of the lowpass signal according to a desired decimation factor (for example, by removing samples of the signal and/or replacing samples with average values), and downsampler 140 likewise reduces the sampling rate of the highpass signal according to another desired decimation factor.
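A minimal sketch of the two processing paths of FIG. 5a follows, assuming a 16 kHz input, a 4 kHz band split, and a decimation factor of two in each path. The Hamming-windowed sinc filters and the function names are illustrative assumptions, not the filter designs of this description.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc lowpass (illustrative design only)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz                       # cutoff in cycles per sample
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return h / h.sum()                           # unity gain at 0 Hz

def highpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Highpass formed as (unit impulse - lowpass); num_taps must be odd."""
    h = -lowpass_fir(cutoff_hz, fs_hz, num_taps)
    h[(num_taps - 1) // 2] += 1.0
    return h

def analysis_filter_bank(s10, fs_hz=16000, split_hz=4000, factor=2):
    """Return (S20, S30): filtered subbands, each downsampled by `factor`."""
    low = np.convolve(s10, lowpass_fir(split_hz, fs_hz), mode="same")    # filter 110
    high = np.convolve(s10, highpass_fir(split_hz, fs_hz), mode="same")  # filter 130
    # Downsamplers 120 and 140: keep every `factor`-th sample. In the
    # highband path this folds the 4-8 kHz content down into 0-4 kHz
    # with its spectrum reversed.
    return low[::factor], high[::factor]

n = np.arange(1600)
s10 = np.sin(2 * np.pi * 1000 * n / 16000)   # 1 kHz tone: lowband content only
s20, s30 = analysis_filter_bank(s10)
```

With this input, nearly all of the energy appears in `s20` and very little in `s30`; a 6 kHz tone gives the opposite result.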

FIG. 5b shows a block diagram of a corresponding implementation B122 of filter bank B120. Upsampler 150 increases the sampling rate of lowband speech signal S20 (for example, by zero-stuffing and/or by duplicating samples), and lowpass filter 160 filters the upsampled signal to pass only a lowband portion (for example, to prevent aliasing). Likewise, upsampler 170 increases the sampling rate of processed highband signal S30a, and highpass filter 180 filters the upsampled signal to pass only a highband portion. The two passband signals are then summed to form wideband speech signal S10a. In some implementations of an apparatus that includes filter bank B120, filter bank B120 is configured to produce a weighted sum of the two passband signals according to one or more weights received and/or calculated by the apparatus. A configuration of filter bank B120 that combines more than two passband signals is also contemplated.
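The synthesis path of FIG. 5b can be sketched in the same spirit. The filter design, the function names, and the weights `w_low` and `w_high` are assumptions made for illustration; the sketch assumes both subbands arrive at 8 kHz and are combined at 16 kHz.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc lowpass (illustrative design only)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return h / h.sum()

def upsample_zero_stuff(x, factor):
    """Insert (factor - 1) zeros after every sample."""
    y = np.zeros(len(x) * factor)
    y[::factor] = x
    return y

def synthesis_filter_bank(s20, s30a, fs_out_hz=16000, split_hz=4000,
                          factor=2, w_low=1.0, w_high=1.0):
    """Upsample both subbands, filter each, and return their weighted sum."""
    lp = lowpass_fir(split_hz, fs_out_hz)              # filter 160
    hp = -lp
    hp[(len(lp) - 1) // 2] += 1.0                      # filter 180
    # Multiplying by `factor` restores the level lost to zero-stuffing.
    low = factor * np.convolve(upsample_zero_stuff(s20, factor), lp, mode="same")
    high = factor * np.convolve(upsample_zero_stuff(s30a, factor), hp, mode="same")
    return w_low * low + w_high * high

m = np.arange(800)
s20 = np.sin(2 * np.pi * 1000 * m / 8000)    # 1 kHz tone sampled at 8 kHz
s30a = np.zeros(800)                         # silent highband for this example
s10a = synthesis_filter_bank(s20, s30a)
```

Away from the edges of the block, `s10a` closely follows a 1 kHz tone sampled at 16 kHz, because lowpass filter 160 removes the image near 7 kHz that zero-stuffing creates.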

Each of filters 110, 130, 160, and 180 may be implemented as a finite impulse response (FIR) filter or as an infinite impulse response (IIR) filter. The frequency responses of filters 110 and 130 may have symmetric or dissimilarly shaped transition regions between stopband and passband. Likewise, the frequency responses of filters 160 and 180 may have symmetric or dissimilarly shaped transition regions between stopband and passband. It may be desirable, but is not strictly necessary, for lowpass filter 110 to have the same response as lowpass filter 160 and for highpass filter 130 to have the same response as highpass filter 180. In one example, the two filter pairs 110, 130 and 160, 180 are quadrature mirror filter (QMF) banks, with filter pair 110, 130 having the same coefficients as filter pair 160, 180.

In a typical example, lowpass filter 110 has a passband that includes the limited PSTN range of 300-3400 Hz (for example, the band from 0 to 4 kHz). FIGS. 6a and 6b show the relative bandwidths of wideband speech signal S10, lowband speech signal S20, and highband speech signal S30 in two different implementation examples. In both of these particular examples, wideband speech signal S10 has a sampling rate of 16 kHz (representing frequency components within the range of 0 to 8 kHz), and lowband signal S20 has a sampling rate of 8 kHz (representing frequency components within the range of 0 to 4 kHz).

In the example of FIG. 6a, there is no significant overlap between the two subbands. A highband signal S30 as shown in this example may be obtained using a highpass filter 130 with a passband of 4-8 kHz. In such a case, it may be desirable to reduce the sampling rate to 8 kHz by downsampling the filtered signal by a factor of two. Such an operation, which may be expected to significantly reduce the computational complexity of further processing operations on the signal, moves the passband energy down to the range of 0 to 4 kHz without loss of information.

In the alternative example of FIG. 6b, the upper and lower subbands have an appreciable overlap, such that the region of 3.5 to 4 kHz is described by both subband signals. A highband signal S30 as in this example may be obtained using a highpass filter 130 with a passband of 3.5-7 kHz. In such a case, it may be desirable to reduce the sampling rate to 7 kHz by downsampling the filtered signal by a factor of 16/7. Such an operation, which may be expected to significantly reduce the computational complexity of further processing operations on the signal, moves the passband energy down to the range of 0 to 3.5 kHz without loss of information.
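The decimation factors in the examples of FIGS. 6a and 6b follow directly from the input and output sampling rates. The short sketch below works them out; the function name `resampling_plan` is an assumption made for illustration, and realizing a rational factor by interpolating and then decimating is a standard technique rather than a detail stated at this point in the description.

```python
from fractions import Fraction

def resampling_plan(fs_in_hz, fs_out_hz):
    """Return (decimation factor, interpolate-by, decimate-by) for a rate change."""
    factor = Fraction(fs_in_hz, fs_out_hz)   # overall downsampling factor
    return factor, factor.denominator, factor.numerator

# FIG. 6a: 16 kHz -> 8 kHz, an integer factor of 2 (keep every second sample).
factor_6a, up_6a, down_6a = resampling_plan(16000, 8000)

# FIG. 6b: 16 kHz -> 7 kHz, a factor of 16/7, which a rational resampler
# can realize by interpolating by 7 and then decimating by 16.
factor_6b, up_6b, down_6b = resampling_plan(16000, 7000)

# The new Nyquist frequency matches the width of the 3.5-7 kHz passband.
assert Fraction(7000, 2) == 7000 - 3500
```

In both cases the output sampling rate is twice the width of the highband passband (4 kHz and 3.5 kHz, respectively), which is why the passband can be moved down to baseband without loss of information.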

In a typical handset for telephonic communication, one or more of the transducers (that is, the microphone and the earpiece or loudspeaker) lacks an appreciable response over the frequency range of 7-8 kHz. In the example of FIG. 6b, the portion of wideband speech signal S10 between 7 and 8 kHz is not included in the encoded signal. Other particular examples of highpass filter 130 have passbands of 3.5-7.5 kHz and 3.5-8 kHz.

In some implementations, providing an overlap between subbands as in the example of FIG. 6b allows the use of a lowpass and/or a highpass filter having a smooth rolloff over the overlapped region. Such filters are typically less computationally complex and/or introduce less delay than filters with sharper or "brick-wall" responses. Filters having sharp transition regions tend to have higher sidelobes (which may cause aliasing) than filters of similar order that have smooth rolloffs. Filters having sharp transition regions may also have long impulse responses, which may cause ringing artifacts. For filter bank implementations having one or more IIR filters, allowing a smooth rolloff over the overlapped region may enable the use of one or more filters whose poles are farther from the unit circle, which may be important for ensuring a stable fixed-point implementation.

Overlapping of subbands allows a smooth blending of lowband and highband that may lead to fewer audible artifacts, reduced aliasing, and/or a less noticeable transition from one band to the other. Moreover, in applications where lowband and highband speech signals S20, S30 are subsequently encoded by different speech encoders, the coding efficiency of the lowband speech encoder (for example, a waveform coder) may drop with increasing frequency. For example, the coding quality of the lowband speech encoder may be reduced at low bit rates, especially in the presence of background noise. In such cases, providing an overlap of the subbands may increase the quality of reproduced frequency components in the overlapped region.

Moreover, overlapping of subbands allows a smooth blending of lowband and highband that may lead to fewer audible artifacts, reduced aliasing, and/or a less noticeable transition from one band to the other. Such a feature may be especially desirable for an implementation in which lowband speech encoder A120 and highband speech encoder A200, as discussed below, operate according to different coding methodologies. For example, different coding techniques may produce signals that sound quite different. A coder that encodes a spectral envelope in the form of codebook indices may produce a signal having a different sound than a coder that encodes the amplitude spectrum instead. A time-domain coder (for example, a pulse-code-modulation or PCM coder) may produce a signal having a different sound than a frequency-domain coder. A coder that encodes a signal with a representation of the spectral envelope and the corresponding residual signal may produce a signal having a different sound than a coder that encodes a signal with only a representation of the spectral envelope. A coder that encodes a signal as a representation of its waveform may produce an output having a different sound than that from a sinusoidal coder. In such cases, using filters having sharp transition regions to define nonoverlapping subbands may lead to an abrupt and perceptually noticeable transition between the subbands in the synthesized wideband signal.

Although QMF filter banks having complementary overlapping frequency responses are often used in subband techniques, such filters are unsuitable for at least some of the wideband coding implementations described herein. A QMF filter bank at the encoder is configured to create a significant degree of aliasing that is canceled in the corresponding QMF filter bank at the decoder. Such an arrangement may not be appropriate for an application in which the signal incurs a significant amount of distortion between the filter banks, as the distortion may reduce the effectiveness of the alias cancellation property. For example, applications described herein include coding implementations configured to operate at very low bit rates. As a consequence of the very low bit rate, the decoded signal is likely to appear significantly distorted as compared to the original signal, such that use of QMF filter banks may lead to uncanceled aliasing. Applications that use QMF filter banks typically have higher bit rates (for example, over 12 kbps for AMR and over 64 kbps for G.722).

Additionally, a coder may be configured to produce a synthesized signal that is perceptually similar to the original signal but that actually differs significantly from it. For example, a coder that derives the highband excitation from the narrowband residual as described herein may produce such a signal, as the actual highband residual may be completely absent from the decoded signal. Use of QMF filter banks in such applications may lead to a significant degree of distortion caused by uncanceled aliasing.

The amount of distortion caused by QMF aliasing may be reduced if the affected subband is narrow, as the effect of the aliasing is limited to a bandwidth equal to the width of the subband. For examples as described herein in which each subband includes about half of the wideband bandwidth, however, distortion caused by uncanceled aliasing could affect a significant part of the signal. The quality of the signal may also be affected by the location of the frequency band over which the uncanceled aliasing occurs. For example, distortion created near the center of a wideband speech signal (for example, between 3 and 4 kHz) may be much more objectionable than distortion that occurs near an edge of the signal (for example, around 6 kHz).

While the responses of the filters of a QMF filter bank are strictly related to one another, the lowband and highband paths of filter banks A110 and B120 may be configured to have spectra that are completely unrelated apart from the overlapping of the two subbands. We define the overlap of the two subbands as the distance from the point at which the frequency response of the highband filter drops to -20 dB to the point at which the frequency response of the lowband filter drops to -20 dB. In various examples of filter bank A110 and/or B120, this overlap ranges from around 200 Hz to around 1 kHz. The range of about 400 to about 600 Hz may represent a desirable tradeoff between coding efficiency and perceptual smoothness. In one particular example as mentioned above, the overlap is around 500 Hz.
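The overlap defined above can be measured numerically from a pair of filter responses. The sketch below does so for an assumed pair of Hamming-windowed sinc filters (a lowpass with a 4 kHz cutoff and a highpass with a 3.5 kHz cutoff at a 16 kHz sampling rate); these designs and the function names are hypothetical, chosen only to show the measurement.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc lowpass (illustrative design only)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return h / h.sum()

def highpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Highpass formed as (unit impulse - lowpass); num_taps must be odd."""
    h = -lowpass_fir(cutoff_hz, fs_hz, num_taps)
    h[(num_taps - 1) // 2] += 1.0
    return h

def minus_20db_edge_hz(h, fs_hz, band, n_fft=8192):
    """Frequency at which the response of filter h reaches -20 dB.

    For the lowband filter this is the lowest frequency below -20 dB;
    for the highband filter it is the highest such frequency.
    """
    mag = np.abs(np.fft.rfft(h, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs_hz)
    below = freqs[mag < 0.1]                 # amplitude 0.1 is -20 dB
    return below.min() if band == "low" else below.max()

fs_hz = 16000
low_edge = minus_20db_edge_hz(lowpass_fir(4000, fs_hz), fs_hz, "low")
high_edge = minus_20db_edge_hz(highpass_fir(3500, fs_hz), fs_hz, "high")
overlap_hz = low_edge - high_edge
```

For this assumed pair the measured overlap is somewhat larger than the 500 Hz gap between the two nominal cutoffs, because each response reaches -20 dB only partway through its transition band.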

It may be desirable to implement filter bank A112 and/or B122 to perform operations as illustrated in FIGS. 6a and 6b in several stages. For example, FIG. 6c shows a block diagram of an implementation A114 of filter bank A112 that performs a functional equivalent of the highpass filtering and downsampling operations using a series of interpolation, resampling, decimation, and other operations. Such an implementation may be easier to design and/or may allow reuse of functional blocks of logic and/or code. For example, as shown in FIG. 6c, the same functional block may be used to perform the operations of decimation to 14 kHz and decimation to 7 kHz. The spectral reversal operation may be implemented by multiplying the signal with the function e^(jnπ) or the sequence (-1)^n, whose values alternate between +1 and -1. The spectral shaping operation may be implemented as a lowpass filter configured to shape the signal to obtain a desired overall filter response.
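The spectral reversal operation is easy to check numerically. In the sketch below, which assumes an 8 kHz sampling rate and a 1 kHz test tone purely for illustration, multiplying by the sequence (-1)^n moves a component at frequency f to fs/2 - f.

```python
import numpy as np

def spectral_reverse(x):
    """Multiply x[n] by (-1)^n, mirroring the spectrum about fs/4."""
    n = np.arange(len(x))
    signs = 1 - 2 * (n % 2)          # +1, -1, +1, -1, ...
    return x * signs

fs_hz = 8000                          # assumed sampling rate
n = np.arange(800)
tone = np.cos(2 * np.pi * 1000 * n / fs_hz)     # 1 kHz tone
reversed_tone = spectral_reverse(tone)

# Locate the strongest frequency component after reversal.
spectrum = np.abs(np.fft.rfft(reversed_tone))
freqs = np.fft.rfftfreq(len(n), d=1.0 / fs_hz)
peak_hz = freqs[np.argmax(spectrum)]            # 3000.0, i.e. fs/2 - 1000
```

Applying `spectral_reverse` a second time restores the original signal, since (-1)^n multiplied by itself is 1 for every n.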

Note that, as a consequence of the spectral inversion operation, the spectrum of high-band signal S30 is inverted. Subsequent operations in the encoder and the corresponding decoder may be configured accordingly. For example, it may be desirable to generate a corresponding excitation signal that also has a spectrally inverted form.

FIG. 6d shows a block diagram of an implementation B124 of filter bank B122 that performs a functional equivalent of the upsampling and high-pass filtering operations using a series of interpolation, resampling, decimation, and other operations. Filter bank B124 includes a spectral inversion operation in the high band that reverses a similar operation performed, for example, in a filter bank of the encoder such as filter bank A114. In this particular example, filter bank B124 also includes notch filters in the low band and the high band that attenuate a component of the signal at 7100 Hz, although such filters are optional and need not be included. The patent application "SYSTEMS, METHODS, AND APPARATUS FOR SPEECH SIGNAL FILTERING" filed herewith (U.S. Patent Application Publication No. 2007/0088558) includes additional description and figures relating to the responses of elements of particular implementations of filter banks A110 and B120, and this material is hereby incorporated by reference.

As noted above, high-band burst suppression may improve the efficiency of encoding high-band speech signal S30. FIG. 7 shows a block diagram of an arrangement in which processed high-band speech signal S30a (as produced by high-band burst suppressor C200) is encoded by high-band speech encoder A200 to produce encoded high-band speech signal S30b.

One approach to wideband speech coding involves scaling a narrowband speech coding technique (for example, one configured to encode the range of 0-4 kHz) to cover the wideband spectrum. For example, a speech signal may be sampled at a higher rate to include components at high frequencies, and a narrowband coding technique may be reconfigured to use more filter coefficients to represent this wideband signal. FIG. 8 shows a block diagram of an example in which wideband speech encoder A100 is arranged to encode processed wideband speech signal S10a to produce encoded wideband speech signal S10b.

Narrowband coding techniques such as CELP (codebook excited linear prediction) are computationally intensive, however, and a wideband CELP coder may consume too many processing cycles to be practical for many mobile and other embedded applications. Encoding the entire spectrum of a wideband signal to a desired quality using such a technique may also lead to an unacceptably large increase in bandwidth. Moreover, such an encoded signal would have to be transcoded before even its narrowband portion could be transmitted into and/or decoded by a system that supports only narrowband coding. FIG. 9 shows a block diagram of a wideband speech encoder A102 that includes separate low-band and high-band speech encoders A120 and A220, respectively.

It may be desirable to implement wideband speech coding such that at least the narrowband portion of the encoded signal may be sent through a narrowband channel (such as a PSTN channel) without transcoding or other significant modification. Efficiency of the wideband coding extension may also be desirable, for example, to avoid a significant reduction in the number of users that can be served in applications such as wireless cellular telephony and broadcasting over wired and wireless channels.

One approach to wideband speech coding involves extrapolating the high-band spectral envelope from the encoded narrowband spectral envelope. Although such an approach may be implemented without any increase in bandwidth and without any need for transcoding, the coarse spectral envelope or formant structure of the high-band portion of a speech signal generally cannot be predicted accurately from the spectral envelope of the narrowband portion.

FIG. 10 shows a block diagram of a wideband speech encoder A104 that uses another approach, encoding the high-band speech signal according to information from the low-band speech signal. In this example, the high-band excitation signal is derived from encoded low-band excitation signal S50. Encoder A104 may be configured to encode a gain envelope based on a signal that is itself based on the high-band excitation signal, for example according to one or more such embodiments described in the patent application published as WO2006/107837 and entitled "METHODS AND APPARATUS FOR ENCODING AND DECODING AN HIGHBAND PORTION OF A SPEECH SIGNAL," the description of which is hereby incorporated by reference. One particular example of wideband speech encoder A104 is configured to encode wideband speech signal S10 at a rate of about 8.55 kbps (kilobits per second), with about 7.55 kbps used for low-band filter parameters S40 and encoded low-band excitation signal S50, and about 1 kbps used for encoded high-band speech signal S60.

It may be desirable to combine the encoded low-band and high-band signals into a single bitstream. For example, it may be desirable to multiplex the encoded signals together for transmission (e.g., over a wired, optical, or wireless transmission channel) or for storage as an encoded wideband speech signal. FIG. 11 shows a block diagram of an arrangement that includes wideband speech encoder A104 and a multiplexer A130 configured to combine low-band filter parameters S40, encoded low-band excitation signal S50, and encoded high-band speech signal S30b into a multiplexed signal S70.

It may be desirable for multiplexer A130 to be configured to embed the encoded low-band signal (including low-band filter parameters S40 and encoded low-band excitation signal S50) as a separable substream of multiplexed signal S70, such that the encoded low-band signal may be recovered and decoded independently of another portion of multiplexed signal S70, such as a high-band and/or very-low-band signal. For example, multiplexed signal S70 may be arranged such that the encoded low-band signal may be recovered by stripping away encoded high-band speech signal S30b. One potential advantage of such a feature is that it avoids the need to transcode the encoded wideband signal before passing it to a system that supports decoding of the low-band signal but not decoding of the high-band portion.
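One way to picture a separable substream is a frame in which the low-band fields come first and the high-band payload is simply appended. The sketch below is purely hypothetical: the two-byte length header, the field order, and all function names are assumptions made for illustration and are not the format of multiplexed signal S70.

```python
def mux_frame(lowband_params: bytes, lowband_excitation: bytes, highband: bytes) -> bytes:
    """Pack one frame: [len(params), len(excitation)] + params + excitation + highband.

    Hypothetical layout; each low-band field is assumed to be shorter than 256 bytes.
    """
    header = bytes([len(lowband_params), len(lowband_excitation)])
    return header + lowband_params + lowband_excitation + highband

def strip_highband(frame: bytes) -> bytes:
    """Drop the trailing high-band payload without touching the low-band fields."""
    n_params, n_excitation = frame[0], frame[1]
    return frame[: 2 + n_params + n_excitation]

def demux_lowband(frame: bytes):
    """Recover the low-band fields; works on full and on stripped frames alike."""
    n_params, n_excitation = frame[0], frame[1]
    params = frame[2 : 2 + n_params]
    excitation = frame[2 + n_params : 2 + n_params + n_excitation]
    return params, excitation

full = mux_frame(b"\x01\x02", b"\x03\x04\x05", b"\xff\xee")
narrowband_only = strip_highband(full)
```

Because the low-band fields are located by the header alone, a narrowband-only system can decode either `full` or `narrowband_only` without any transcoding step.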

An apparatus that includes a low-band, high-band, and/or wideband speech encoder as described herein may also include circuitry configured to transmit the encoded signal into a transmission channel such as a wired, optical, or wireless channel. Such an apparatus may also be configured to perform one or more channel encoding operations on the signal, such as error correction encoding (e.g., rate-compatible convolutional encoding) and/or error detection encoding (e.g., cyclic redundancy encoding), and/or one or more layers of network protocol encoding (e.g., Ethernet, TCP/IP, cdma2000).

Any or all of the low-band, high-band, and wideband speech encoders described herein may be implemented according to a source-filter model that encodes the input speech signal as (A) a set of parameters that describe a filter and (B) an excitation signal that drives the described filter to produce a synthesized reproduction of the input speech signal. For example, the spectral envelope of a speech signal is characterized by a number of peaks that represent resonances of the vocal tract and are called formants. Most speech coders encode at least this coarse spectral structure as a set of parameters such as filter coefficients.

In one example of a basic source-filter arrangement, an analysis module calculates a set of parameters that characterize a filter corresponding to the speech sound over a period of time (typically 20 milliseconds). A whitening filter (also called an analysis or prediction error filter) configured according to those filter parameters removes the spectral envelope to spectrally flatten the signal. The resulting whitened signal (also called a residual) has less energy, and thus less variance, than the original speech signal and is easier to encode. Errors resulting from coding of the residual signal may also be spread more evenly over the spectrum. The filter parameters and residual are typically quantized for efficient transmission over the channel. At the decoder, a synthesis filter configured according to the filter parameters is excited by the residual to produce a synthesized version of the original speech sound. The synthesis filter is typically configured to have a transfer function that is the inverse of the transfer function of the whitening filter.

The analysis module may be implemented as a linear prediction coding (LPC) analysis module that encodes the spectral envelope of the speech signal as a set of linear prediction (LP) coefficients (e.g., coefficients of an all-pole filter 1/A(z)). The analysis module typically processes the input signal as a series of non-overlapping frames, with a new set of coefficients being calculated for each frame. The frame period is generally a period over which the signal may be expected to be locally stationary; one common example is 20 milliseconds (equivalent to 160 samples at a sampling rate of 8 kHz). One example of a low-band LPC analysis module is configured to calculate a set of ten LP filter coefficients to characterize the formant structure of each 20-millisecond frame of low-band speech signal S20, and one example of a high-band LPC analysis module is configured to calculate a set of six (alternatively, eight) LP filter coefficients to characterize the formant structure of each 20-millisecond frame of high-band speech signal S30. It is also possible to implement the analysis module to process the input signal as a series of overlapping frames.

The analysis module may be configured to analyze the samples of each frame directly, or the samples may first be weighted according to a windowing function (for example, a Hamming window). The analysis may also be performed over a window that is larger than the frame, such as a 30-millisecond window. This window may be symmetric (e.g., 5-20-5, such that it includes the 5 milliseconds immediately before and after the 20-millisecond frame) or asymmetric (e.g., 10-20, such that it includes the last 10 milliseconds of the preceding frame). An LPC analysis module is typically configured to calculate the LP filter coefficients using a Levinson-Durbin recursion or the Leroux-Gueguen algorithm. In another implementation, the analysis module may be configured to calculate a set of cepstral coefficients for each frame instead of a set of LP filter coefficients.
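The analysis, whitening, and synthesis steps described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the analysis module of any particular encoder: it applies no window, uses the autocorrelation method with a textbook Levinson-Durbin recursion, and runs on a synthetic 160-sample frame. All function and variable names are hypothetical.

```python
import random

def autocorrelation(x, order):
    """r[k] = sum over n of x[n] * x[n-k], for k = 0..order."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Return coefficients of A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient for stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error energy
    return a, err

def whiten(x, a):
    """FIR analysis (prediction error) filter A(z): e[n] = sum_j a[j] * x[n-j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

def synthesize(e, a):
    """All-pole synthesis filter 1/A(z), the inverse of the whitening filter."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0))
    return y

# Synthetic frame: 160 samples (20 ms at 8 kHz) of a strongly correlated signal.
rng = random.Random(0)
frame, state = [], 0.0
for _ in range(160):
    state = 0.9 * state + rng.uniform(-1.0, 1.0)
    frame.append(state)

ORDER = 10  # ten coefficients, as in the low-band example in the text
a, _ = levinson_durbin(autocorrelation(frame, ORDER), ORDER)
residual = whiten(frame, a)
rebuilt = synthesize(residual, a)
```

In this sketch the residual carries less energy than the frame, and passing the unquantized residual through the synthesis filter reproduces the frame, which illustrates the inverse relationship between the two transfer functions.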

The output rate of a speech encoder may be reduced significantly, with relatively little effect on reproduction quality, by quantizing the filter parameters. Linear prediction filter coefficients are difficult to quantize efficiently and are usually mapped by the speech encoder into another representation, such as line spectral pairs (LSPs) or line spectral frequencies (LSFs), for quantization and/or entropy encoding. Other one-to-one representations of LP filter coefficients include partial autocorrelation (parcor) coefficients; log-area-ratio values; and immittance spectral pairs (ISPs) and immittance spectral frequencies (ISFs), which are used in the GSM (Global System for Mobile Communications) AMR-WB (Adaptive Multirate-Wideband) codec. Typically the transform between a set of LP filter coefficients and a corresponding set of LSFs is reversible, but embodiments also include implementations of a speech encoder in which the transform is not reversible without error.

A speech encoder is typically configured to quantize the set of narrowband LSFs (or other coefficient representation) and to output the result of this quantization as the filter parameters. Quantization is typically performed using a vector quantizer that encodes the input vector as an index to a corresponding vector entry in a table or codebook. Such a quantizer may also be configured to perform classified vector quantization. For example, such a quantizer may be configured to select one of a set of codebooks based on information that has already been coded within the same frame (e.g., in the low-band channel and/or in the high-band channel). Such a technique typically provides increased coding efficiency at the expense of additional codebook storage.
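The index-lookup idea behind vector quantization, and the codebook selection step of classified vector quantization, can be sketched as follows. The codebook contents, the squared-error distance, and the two class labels are hypothetical values chosen only for illustration.

```python
def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector` (squared error)."""
    def distance(entry):
        return sum((v - c) ** 2 for v, c in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))

def dequantize(index, codebook):
    """The decoder holds the same codebook and simply looks the entry up."""
    return codebook[index]

# Hypothetical two-dimensional codebooks, one per class. In classified vector
# quantization the class comes from information already coded in the same frame.
CODEBOOKS = {
    "voiced":   [[0.10, 0.20], [0.50, 0.60], [0.90, 1.00]],
    "unvoiced": [[0.05, 0.80], [0.40, 0.40], [0.70, 0.10]],
}

def classified_quantize(vector, frame_class):
    """Pick the codebook for this class, then quantize within it."""
    return quantize(vector, CODEBOOKS[frame_class])

index = classified_quantize([0.48, 0.63], "voiced")
decoded = dequantize(index, CODEBOOKS["voiced"])
```

Only `index` needs to be transmitted, which is where the rate reduction comes from; the extra codebooks are the storage cost mentioned in the text.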

A speech encoder may also be configured to generate a residual signal by passing the speech signal through a whitening filter (also called an analysis or prediction error filter) that is configured according to the set of filter coefficients. The whitening filter is typically implemented as an FIR filter, although IIR implementations may also be used. This residual signal will typically contain perceptually important information of the speech frame, such as long-term structure relating to pitch, that is not represented in the filter parameters. This residual signal is also typically quantized for output. For example, low-band speech encoder A122 may be configured to calculate a quantized representation of the residual signal for output as encoded low-band excitation signal S50. Such quantization is typically performed using a vector quantizer that encodes the input vector as an index to a corresponding vector entry in a table or codebook and that may be configured to perform classified vector quantization as described above.

Alternatively, such a quantizer may be configured to send one or more parameters from which the vector may be generated dynamically at the decoder, rather than retrieved from storage, as in a sparse codebook method. Such a method is used in coding schemes such as algebraic CELP (codebook excited linear prediction) and in codecs such as the 3GPP2 (Third Generation Partnership Project 2) EVRC (Enhanced Variable Rate Codec).

Some implementations of low-band speech encoder A120 are configured to calculate encoded low-band excitation signal S50 by identifying the one among a set of codebook vectors that best matches the residual signal. It is noted, however, that low-band speech encoder A120 may also be implemented to calculate a quantized representation of the residual signal without actually generating the residual signal. For example, low-band speech encoder A120 may be configured to use a number of codebook vectors to generate corresponding synthesized signals (e.g., according to a current set of filter parameters), and to select the codebook vector associated with the generated signal that best matches original low-band speech signal S20 in a perceptually weighted domain.

It may be desirable to implement low-band speech encoder A120 or A122 as an analysis-by-synthesis speech encoder. Codebook excited linear prediction (CELP) coding is one common family of analysis-by-synthesis coding, and implementations of such coders may perform waveform encoding of the residual, including such operations as selection of entries from fixed and adaptive codebooks, error minimization operations, and/or perceptual weighting operations. Other implementations of analysis-by-synthesis coding include mixed excitation linear prediction (MELP), algebraic CELP (ACELP), relaxation CELP (RCELP), regular pulse excitation (RPE), multi-pulse CELP (MPE), and vector-sum excited linear prediction (VSELP) coding. Related coding methods include multi-band excitation (MBE) and prototype waveform interpolation (PWI) coding. Examples of standardized analysis-by-synthesis speech codecs include the ETSI (European Telecommunications Standards Institute) GSM full rate codec (GSM 06.10), which uses residual excited linear prediction (RELP); the GSM enhanced full rate codec (ETSI-GSM 06.60); the ITU (International Telecommunication Union) standard 11.8 kb/s G.729 Annex E coder; the IS-641 codec for IS (Interim Standard)-136 (a time-division multiple access scheme); the GSM adaptive multirate (GSM-AMR) codecs; and the 4GV™ (Fourth-Generation Vocoder™) codec (QUALCOMM Incorporated, San Diego, CA). Existing implementations of RCELP coders include the Enhanced Variable Rate Codec (EVRC), as described in Telecommunications Industry Association (TIA) IS-127, and the Third Generation Partnership Project 2 (3GPP2) Selectable Mode Vocoder (SMV). The various low-band, high-band, and wideband encoders described herein may be implemented according to any of these technologies, or according to any other speech coding technology (whether known or to be developed) that represents a speech signal as (A) a set of parameters that describe a filter and (B) a residual signal that provides at least part of an excitation used to drive the described filter to reproduce the speech signal.

FIG. 12 shows a block diagram of an implementation C202 of high-band burst suppressor C200 that includes two implementations C10-1 and C10-2 of burst detector C10. Burst detector C10-1 is configured to produce a low-band burst indication signal SB10 that indicates the presence of a burst in low-band speech signal S20. Burst detector C10-2 is configured to produce a high-band burst indication signal SB20 that indicates the presence of a burst in high-band speech signal S30. Burst detectors C10-1 and C10-2 may be identical or may be instances of different implementations of burst detector C10. High-band burst suppressor C202 also includes an attenuation control signal generator C20 configured to generate an attenuation control signal SB70 according to a relation between low-band burst indication signal SB10 and high-band burst indication signal SB20, and a gain control element C150 (e.g., a multiplier or amplifier) configured to apply attenuation control signal SB70 to high-band speech signal S30 to produce processed high-band speech signal S30a.
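The data flow of this arrangement can be illustrated with a deliberately simplified sketch. It assumes that the two burst indication signals are per-sample values between 0 and 1, that the "relation" is their difference (as the abstract suggests), and that the gain is switched between 1.0 and a fixed attenuation. The threshold, the attenuation factor, and the function names are hypothetical and are not taken from the description of generator C20.

```python
def attenuation_control(low_burst, high_burst, threshold=0.5, attenuation=0.1):
    """Return a per-sample gain: attenuate where a burst is indicated in the
    high band but not in the low band, and pass the signal unchanged elsewhere."""
    gains = []
    for lb, hb in zip(low_burst, high_burst):
        gains.append(attenuation if (hb - lb) > threshold else 1.0)
    return gains

def apply_gain(signal, gains):
    """Gain control element modeled as a sample-by-sample multiplier."""
    return [s * g for s, g in zip(signal, gains)]

# Toy indication signals: sample 1 has a high-band-only burst, while sample 2
# has a burst in both bands and is therefore left alone.
low_indication = [0.0, 0.0, 1.0, 0.0]
high_indication = [0.0, 1.0, 1.0, 0.0]
highband = [1.0, 2.0, 3.0, 4.0]

gains = attenuation_control(low_indication, high_indication)
processed = apply_gain(highband, gains)
```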

In the particular examples described herein, it may be assumed that high-band burst suppressor C202 processes high-band speech signal S30 in 20-millisecond frames, and that low-band speech signal S20 and high-band speech signal S30 are both sampled at 8 kHz. These particular values are examples only, however, and not limitations; other values may be used according to particular design choices and/or as noted herein.

Burst detector C10 is configured to calculate forward and backward smoothed envelopes of the speech signal and to indicate the presence of a burst according to a time relation between an edge in the forward smoothed envelope and an edge in the backward smoothed envelope. Burst suppressor C202 includes two instances of burst detector C10, each arranged to receive a respective one of speech signals S20 and S30 and to output a corresponding burst indication signal SB10 or SB20.

FIG. 13 shows a block diagram of an implementation C12 of burst detector C10 that is arranged to receive one of speech signals S20 and S30 and to output the corresponding burst indication signal SB10 or SB20. Burst detector C12 is configured to calculate each of the forward and backward smoothed envelopes in two stages. In the first stage, a calculator C30 is configured to convert the speech signal to a constant-polarity signal. In one example, calculator C30 is configured to calculate the constant-polarity signal as the square of each sample of the current frame of the corresponding speech signal. Such a signal may be smoothed to obtain an energy envelope. In another example, calculator C30 is configured to calculate the absolute value of each incoming sample. Such a signal may be smoothed to obtain an amplitude envelope. Other implementations of calculator C30 may be configured to calculate the constant-polarity signal according to another function, such as clipping.

In the second stage, a forward smoother C40-1 is configured to smooth the constant-polarity signal in the forward time direction to produce a forward smoothed envelope, and a backward smoother C40-2 is configured to smooth the constant-polarity signal in the backward time direction to produce a backward smoothed envelope. The forward smoothed envelope indicates a difference in the level of the corresponding speech signal over time in the forward direction, and the backward smoothed envelope indicates a difference in the level of the corresponding speech signal over time in the backward direction.

In one example, forward smoother C40-1 is implemented as a first-order infinite impulse response (IIR) filter configured to smooth the constant-polarity signal according to an expression such as the following:

S_f(n) = αS_f(n-1) + (1-α)P(n),

and backward smoother C40-2 is implemented as a first-order IIR filter configured to smooth the constant-polarity signal according to an expression such as the following:

S_b(n) = αS_b(n+1) + (1-α)P(n),

where n is a time index, P(n) is the constant-polarity signal, S_f(n) is the forward smoothed envelope, S_b(n) is the backward smoothed envelope, and α is a decay factor having a value between 0 (no smoothing) and 1. It may be noted that, due in part to operations such as calculation of the backward smoothed envelope, a delay of at least one frame may be incurred in processed high-band speech signal S30a. Such a delay is relatively unimportant perceptually, however, and is not uncommon even in real-time speech processing operations.
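The two first-order recursions above can be sketched directly. The sketch assumes zero initial state at each end of the frame, that is, S_f(-1) = 0 and S_b(N) = 0, which the text does not specify; the function names are illustrative.

```python
def constant_polarity(frame, mode="energy"):
    """First stage: square each sample (energy) or take its absolute value (amplitude)."""
    if mode == "energy":
        return [s * s for s in frame]
    return [abs(s) for s in frame]

def forward_smooth(p, alpha, initial=0.0):
    """S_f(n) = alpha * S_f(n-1) + (1 - alpha) * P(n), run from first sample to last."""
    out, prev = [], initial
    for value in p:
        prev = alpha * prev + (1.0 - alpha) * value
        out.append(prev)
    return out

def backward_smooth(p, alpha, initial=0.0):
    """S_b(n) = alpha * S_b(n+1) + (1 - alpha) * P(n), run from last sample to first."""
    return forward_smooth(p[::-1], alpha, initial)[::-1]

# A single impulse shows the asymmetry: the forward envelope decays after the
# impulse, and the backward envelope decays before it.
p = constant_polarity([0.0, 0.0, 1.0, 0.0, 0.0])
forward = forward_smooth(p, alpha=0.5)
backward = backward_smooth(p, alpha=0.5)
```

The backward pass needs samples that lie later in time than the one being computed, which is the source of the frame delay noted in the text.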

It may be desirable to select a value for α such that the decay time of the smoother is similar to the expected duration of a high-band burst (e.g., about 5 milliseconds). Typically forward smoother C40-1 and backward smoother C40-2 are configured to perform complementary versions of the same smoothing operation and to use the same value of α, but in some implementations the two smoothers may be configured to perform different operations and/or to use different values. Other recursive or non-recursive smoothing functions may also be used, including finite impulse response (FIR) or IIR filters of higher order.
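As a worked example of choosing α, the sketch below assumes that "decay time" means the time for the smoother's impulse response to fall to 1/e of its peak. The text does not define the term, so this interpretation, like the function name, is an assumption. Under it, α^N = 1/e for a decay time of N samples, so α = exp(-1/N).

```python
import math

def alpha_for_decay_time(decay_seconds, sample_rate_hz):
    """Decay factor whose impulse response falls to 1/e after `decay_seconds`.

    Assumes the 1/e definition of decay time; other definitions give other values.
    """
    decay_samples = decay_seconds * sample_rate_hz
    return math.exp(-1.0 / decay_samples)

# 5 milliseconds at an 8 kHz sampling rate is 40 samples.
alpha = alpha_for_decay_time(0.005, 8000)
```

With these numbers α comes out slightly above 0.975; a shorter expected burst duration would give a smaller α and therefore less smoothing.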

In other implementations of burst detector C12, one or both of forward smoother C40-1 and backward smoother C40-2 are configured to perform an adaptive smoothing operation. For example, forward smoother C40-1 may be configured to perform an adaptive smoothing operation according to an expression such as the following:

in which smoothing is reduced, or as in this case disabled, at strong leading edges of the constant-polarity signal. In this or another implementation of burst detector C12, backward smoother C40-2 may be configured to perform an adaptive smoothing operation according to an expression such as the following:

in which smoothing is reduced, or as in this case disabled, at strong trailing edges of the constant-polarity signal. Such adaptive smoothing may help to define the beginnings of burst events in the forward smoothed envelope and the ends of burst events in the backward smoothed envelope.
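The two adaptive expressions are not reproduced in this text. The sketch below shows one form that is consistent with the description, and it should be read as an assumption rather than as the original expressions: the envelope follows P(n) directly whenever P(n) is at least as large as the neighboring envelope value, so smoothing is disabled at an edge, and the first-order recursion is used otherwise. Zero initial state and the function names are also assumptions.

```python
def adaptive_forward_smooth(p, alpha, initial=0.0):
    """Assumed form: S_f(n) = P(n) if P(n) >= S_f(n-1),
    else alpha * S_f(n-1) + (1 - alpha) * P(n)."""
    out, prev = [], initial
    for value in p:
        if value >= prev:
            prev = value                                   # leading edge: no smoothing
        else:
            prev = alpha * prev + (1.0 - alpha) * value    # decay: smooth as usual
        out.append(prev)
    return out

def adaptive_backward_smooth(p, alpha, initial=0.0):
    """Same rule applied in reverse time, so the trailing edge is left sharp."""
    return adaptive_forward_smooth(p[::-1], alpha, initial)[::-1]

p = [0.0, 0.0, 4.0, 0.0, 0.0]
forward = adaptive_forward_smooth(p, alpha=0.5)
backward = adaptive_backward_smooth(p, alpha=0.5)
```

Compared with the non-adaptive recursion, the forward envelope here jumps to the full level at the start of the event and the backward envelope drops from the full level at its end, which makes both edges easier to locate.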

Burst detector C12 includes an instance of a region indicator C50 (initial region indicator C50-1) that is configured to indicate the beginning of a high-level event (e.g., a burst) in the forward smoothed envelope. Burst detector C12 also includes an instance of region indicator C50 (terminal region indicator C50-2) that is configured to indicate the end of a high-level event (e.g., a burst) in the backward smoothed envelope.

FIG. 14a shows a block diagram of an implementation C52-1 of initial region indicator C50-1 that includes a delay element C70-1 and an adder. Delay element C70-1 is configured to apply a delay having a positive magnitude, such that a delayed version of the forward smoothed envelope is subtracted from the envelope itself. In another example, the current sample or the delayed sample may be weighted according to a desired weighting factor.

FIG. 14b shows a block diagram of an implementation C52-2 of terminal region indicator C50-2 that includes a delay element C70-2 and an adder. Delay element C70-2 is configured to apply a delay having a negative magnitude, such that an advanced version of the backward smoothed envelope is subtracted from the envelope itself. In another example, the current sample or the advanced sample may be weighted according to a desired weighting factor.

Various delay values may be used in different implementations of region indicator C52, and delay values having different magnitudes may be used in initial region indicator C52-1 and terminal region indicator C52-2. The magnitude of the delay may be selected according to a desired width of the detected region. For example, small delay values may be used to detect a narrow edge region. To obtain strong edge detection, it may be desirable to use a delay having a magnitude similar to the expected edge width (for example, about 3 or 5 samples).
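The delay-and-subtract structure of the two region indicators can be sketched as follows. The sketch treats samples outside the current frame as zero, which is a simplifying assumption (in practice the processing may reach into neighboring frames), and it applies no weighting factor. The function names and the toy envelopes are illustrative.

```python
def initial_region(forward_env, delay):
    """Forward envelope minus its own delayed version: large where the level rises."""
    return [forward_env[n] - (forward_env[n - delay] if n - delay >= 0 else 0.0)
            for n in range(len(forward_env))]

def terminal_region(backward_env, advance):
    """Backward envelope minus its own advanced version: large where the level is
    about to fall. `advance` is the magnitude of the negative delay."""
    size = len(backward_env)
    return [backward_env[n] - (backward_env[n + advance] if n + advance < size else 0.0)
            for n in range(size)]

# Toy envelopes for a short event centered on sample 2.
forward_env = [0.0, 0.0, 4.0, 2.0, 1.0]
backward_env = [1.0, 2.0, 4.0, 0.0, 0.0]

start_signal = initial_region(forward_env, delay=2)
end_signal = terminal_region(backward_env, advance=2)
```

In this toy case both difference signals peak at sample 2, so the indicated start and end of the event fall close together in time, as would be expected of a short burst.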

Alternatively, region indicator C50 may be configured to indicate a wider region that extends beyond the corresponding edge. For example, it may be desirable for initial region indicator C50-1 to indicate an initial region of an event that extends in the forward direction for some time after the leading edge. Likewise, it may be desirable for terminal region indicator C50-2 to indicate a terminal region of an event that extends in the backward direction for some time before the trailing edge. In such a case, it may be desirable to use delay values having larger magnitudes, such as a magnitude similar to the expected length of a burst. In one such example, a delay of about 4 milliseconds is used.

Depending on the magnitude and direction of the delay, processing by region indicator C50 may extend beyond the boundaries of the current frame of the speech signal. For example, processing by initial region indicator C50-1 may extend into the preceding frame, and processing by termination region indicator C50-2 may extend into the following frame.

A burst is distinguished from other high-level events that may occur in the speech signal by an initial region (as indicated in initial region indication signal SB50) that coincides in time with a termination region (as indicated in termination region indication signal SB60). For example, a burst may be indicated when the time distance between the initial and termination regions is not greater than (alternatively, is less than) a predetermined coincidence interval, such as the expected duration of a burst. Coincidence detector C60 is configured to indicate detection of a burst according to a coincidence in time of initial and termination regions in region indication signals SB50 and SB60. For an implementation in which signals SB50 and SB60 indicate regions that extend from the respective leading and trailing edges, for example, coincidence detector C60 may be configured to indicate an overlap in time of the extended regions.

FIG. 15 shows a block diagram of an implementation C62 of coincidence detector C60 that includes a first instance C80-1 of clipper C80 configured to clip initial region indication signal SB50, a second instance C80-2 of clipper C80 configured to clip termination region indication signal SB60, and a mean calculator C90 configured to output a corresponding burst indication signal according to a mean of the clipped signals. Clipper C80 is configured to clip values of the input signal according to an expression such as the following:

$\text{output} = \max(\text{input}, 0).$

Alternatively, clipper C80 may be configured to threshold values of the input signal according to an expression such as the following:

where threshold $T_L$ has a value greater than zero. Typically, instances C80-1 and C80-2 of clipper C80 will use the same threshold, although it is also possible for the two instances to use different thresholds.

Mean calculator C90 is configured to output a corresponding burst indication signal SB10, SB20 according to a mean of the clipped signals. The burst indication signal indicates the location in time and the strength of bursts in the input signal and has a value equal to or greater than zero. The geometric mean may give better results than the arithmetic mean, especially for distinguishing bursts that have well-defined initial and termination regions from other events that have only a strong initial region or only a strong termination region. For example, the arithmetic mean of an event with only one strong edge may still be high, whereas the geometric mean of an event that lacks one of the edges will be low or zero. However, the geometric mean is typically more computationally intensive than the arithmetic mean. In one example, an instance of mean calculator C90 arranged to process lowband results uses the arithmetic mean, and an instance of mean calculator C90 arranged to process highband results uses the more conservative geometric mean.

Other implementations of mean calculator C90 may be configured to use a different kind of mean, such as the harmonic mean. In a further implementation of coincidence detector C62, one or both of initial region indication signal SB50 and termination region indication signal SB60 is weighted relative to the other, before or after clipping.
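The difference between the two means can be seen in a small sketch of the clip-then-average structure. The function names and the toy region signals are assumptions made for illustration; only the clipping rule output = max(input, 0) and the choice between arithmetic and geometric means come from the description.

```python
import math


def clip(signal):
    """Clip each sample according to output = max(input, 0)."""
    return [max(x, 0.0) for x in signal]


def arithmetic_mean(a, b):
    """Sample-by-sample arithmetic mean of two equal-length signals."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def geometric_mean(a, b):
    """Sample-by-sample geometric mean; inputs must be non-negative."""
    return [math.sqrt(x * y) for x, y in zip(a, b)]


# Hypothetical region indication signals. Sample 1 has both an initial and a
# termination indication; sample 2 has only a strong initial indication.
initial_clipped = clip([0.0, 4.0, 4.0, -4.0])
terminal_clipped = clip([-4.0, 4.0, 0.0, 0.0])

arith = arithmetic_mean(initial_clipped, terminal_clipped)
geom = geometric_mean(initial_clipped, terminal_clipped)
```

At sample 2, where only one edge is present, the arithmetic mean is still 2.0 while the geometric mean is 0.0, which is the more conservative behavior described above.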

Other implementations of coincidence detector C60 are configured to detect bursts by measuring the time distance between leading and trailing edges. For example, one such implementation is configured to identify a burst as a region in which a leading edge in initial region indication signal SB50 and a trailing edge in termination region indication signal SB60 are separated by no more than a predetermined width. The predetermined width is based on the expected duration of a highband burst; in one example, a width of about 4 milliseconds is used.
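One way to picture the edge-distance test is sketched below. The representation of edges as sorted lists of sample indices, the pairing rule (first trailing edge within range), and the function name are assumptions for the example; the description only states that the separation must not exceed a predetermined width.

```python
def bursts_by_edge_distance(leading_edges, trailing_edges, max_width):
    """Pair each leading edge with the first trailing edge that follows it
    within `max_width` samples, and return the (start, end) index pairs.

    Both inputs are assumed to be lists of sample indices in ascending order.
    """
    bursts = []
    for start in leading_edges:
        for end in trailing_edges:
            if 0 <= end - start <= max_width:
                bursts.append((start, end))
                break
    return bursts


# At an assumed envelope rate of 1 kHz, a 4 ms width corresponds to 4 samples.
bursts = bursts_by_edge_distance([10, 40], [13, 60], max_width=4)
```

Here the event starting at sample 10 ends 3 samples later and is reported as a burst, while the event starting at sample 40 lasts 20 samples and is not.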

A further implementation of coincidence detector C60 is configured to extend each leading edge in initial region indication signal SB50 in the forward direction by a desired time period (e.g., based on the expected duration of a highband burst), and to extend each trailing edge in termination region indication signal SB60 in the backward direction by a desired time period (e.g., based on the expected duration of a highband burst). Such an implementation may be configured to produce the corresponding burst indication signal SB10, SB20 as the logical AND of the two extended signals or, alternatively, to produce it to indicate the relative strength of the burst across the area where the regions overlap (e.g., by calculating a mean of the extended signals). Such an implementation may be configured to extend only those edges that exceed a threshold. In one example, the edges are extended by a time period of about 4 milliseconds.
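The extend-and-AND variant can be sketched with boolean edge flags. The helper name `extend`, the boolean representation, and the 4-sample extension (4 ms at an assumed 1 kHz envelope rate) are illustrative assumptions.

```python
def extend(flags, length, direction):
    """Hold every True flag for `length` further samples.

    direction=+1 extends forward in time; direction=-1 extends backward.
    """
    out = list(flags)
    for n, flag in enumerate(flags):
        if flag:
            for k in range(1, length + 1):
                m = n + direction * k
                if 0 <= m < len(out):
                    out[m] = True
    return out


# One leading edge at sample 2 and one trailing edge at sample 4.
leading = [False, False, True, False, False, False, False, False]
trailing = [False, False, False, False, True, False, False, False]

extended_leading = extend(leading, 4, +1)
extended_trailing = extend(trailing, 4, -1)
burst = [a and b for a, b in zip(extended_leading, extended_trailing)]
```

The logical AND is true only over samples 2 through 4, where the forward-extended leading edge and the backward-extended trailing edge overlap.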

Attenuation control signal generator C20 is configured to generate attenuation control signal SB70 according to a relation between lowband burst indication signal SB10 and highband burst indication signal SB20. For example, attenuation control signal generator C20 may be configured to generate attenuation control signal SB70 according to an arithmetic relation between burst indication signals SB10 and SB20, such as a difference.

FIG. 16 shows a block diagram of an implementation C22 of attenuation control signal generator C20 that is configured to combine lowband burst indication signal SB10 and highband burst indication signal SB20 by subtracting the former from the latter. The resulting difference signal indicates where bursts exist in the highband that do not occur (or are weaker) in the lowband. In a further implementation, one or both of lowband burst indication signal SB10 and highband burst indication signal SB20 is weighted relative to the other.

Attenuation control signal calculator C100 outputs attenuation control signal SB70 according to the value of the difference signal. For example, attenuation control signal calculator C100 may be configured to indicate an attenuation that varies according to the degree to which the difference signal exceeds a threshold.

It may be desirable for attenuation control signal generator C20 to be configured to perform operations on logarithmically scaled values. For example, it may be desirable to attenuate highband speech signal S30 according to a ratio between the levels of the burst indication signals (e.g., according to a value in decibels, or dB), and such a ratio is easily calculated as the difference of logarithmically scaled values. Logarithmic scaling warps the signal along the magnitude axis without otherwise changing its shape. FIG. 17 shows an implementation C14 of burst detector C12 that includes instances C130-1, C130-2 of a logarithm calculator C130 configured to logarithmically scale (e.g., to base 10) the smoothed envelope in each of the forward and backward processing paths.
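The reason a difference suffices once values are logarithmically scaled can be checked numerically. The 10·log10 scaling used here assumes a power-like envelope; the description specifies only a base-10 logarithm, so the factor and the function name are assumptions.

```python
import math


def to_db(level, factor=10.0):
    """Base-10 logarithmic scaling; `factor` of 10 assumes a power-like level."""
    return factor * math.log10(level)


high_level, low_level = 200.0, 20.0

# The difference of the scaled values equals the scaled ratio of the levels.
difference_db = to_db(high_level) - to_db(low_level)
ratio_db = to_db(high_level / low_level)
```

Both quantities come out at 10 dB for a level ratio of ten, so a subtractor operating on log-scaled burst indication signals effectively computes their ratio.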

In one example, attenuation control signal calculator C100 is configured to calculate values of attenuation control signal SB70 according to the following formula:

where $D_{dB}$ denotes the difference between highband burst indication signal SB20 and lowband burst indication signal SB10, $T_{dB}$ denotes a threshold value, and $A_{dB}$ is the corresponding value of attenuation control signal SB70. In one particular example, threshold $T_{dB}$ has a value of 8 dB.

In another implementation, attenuation control signal calculator C100 is configured to indicate a linear attenuation according to the degree to which the difference signal exceeds a threshold (e.g., 3 dB or 4 dB). In this example, attenuation control signal SB70 indicates no attenuation until the difference signal exceeds the threshold. When the difference signal exceeds the threshold, attenuation control signal SB70 indicates an attenuation value that is linearly proportional to the amount by which the threshold is currently exceeded.
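The linear variant lends itself to a short sketch. The 3 dB default threshold is one of the example values given above; the `slope` parameter (the constant of proportionality) and the function name are assumptions, since the description does not give a value for the proportionality.

```python
def linear_attenuation_db(difference_db, threshold_db=3.0, slope=1.0):
    """Return an attenuation in dB that is zero up to the threshold and
    grows linearly with the amount by which the threshold is exceeded.

    `slope` (dB of attenuation per dB of excess) is a hypothetical parameter.
    """
    excess = difference_db - threshold_db
    return slope * excess if excess > 0.0 else 0.0


no_attenuation = linear_attenuation_db(2.0)   # below threshold
some_attenuation = linear_attenuation_db(8.0)  # 5 dB above a 3 dB threshold
```

A difference of 2 dB produces no attenuation, while a difference of 8 dB exceeds the 3 dB threshold by 5 dB and, with a slope of 1, produces 5 dB of attenuation.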

Highband burst suppressor C202 includes a gain control element C150, such as a multiplier or amplifier, that is configured to attenuate highband speech signal S30 according to the current value of attenuation control signal SB70 to produce processed highband speech signal S30a. Typically, attenuation control signal SB70 indicates a value of no attenuation (e.g., a gain of 1.0 or 0 dB) unless a highband burst has been detected at the current location of highband speech signal S30, in which case a typical attenuation value is a gain of 0.3, or a gain reduction of about 10 dB.
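A multiplier-style gain control element can be sketched in a few lines. The function names and the sample values are assumptions; the relation between a linear gain of 0.3 and roughly 10 dB follows from the usual 20·log10 amplitude convention.

```python
import math


def db_to_gain(attenuation_db):
    """Convert an attenuation in dB into a linear amplitude gain factor."""
    return 10.0 ** (-attenuation_db / 20.0)


def apply_gain(signal, gains):
    """Multiply each sample by the gain indicated for that sample."""
    return [x * g for x, g in zip(signal, gains)]


# A linear gain of 0.3 corresponds to about -10.46 dB.
gain_db = 20.0 * math.log10(0.3)

# Hypothetical samples: the burst is detected at the last two positions only.
processed = apply_gain([1.0, 2.0, -2.0], [1.0, 0.3, 0.3])
```

Samples outside the detected burst pass through with unity gain, and samples inside it are scaled by 0.3.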

An alternative implementation of attenuation control signal generator C22 may be configured to combine lowband burst indication signal SB10 and highband burst indication signal SB20 according to a logical relation. In one such example, the burst indication signals are combined by computing the logical AND of highband burst indication signal SB20 and the logical inverse of lowband burst indication signal SB10. In this case, each of the burst indication signals may first be thresholded to obtain a binary-valued signal, and attenuation control signal calculator C100 may be configured to indicate a corresponding one of two attenuation states (e.g., one state indicating no attenuation) according to the state of the combined signal.
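The logical combination reduces to "burst in the highband AND NOT burst in the lowband". The sketch below assumes a single shared threshold and uses 0.3 as the attenuating gain (the typical value mentioned for the gain control element); both choices, and the function name, are illustrative.

```python
def two_state_gains(high_band, low_band, threshold, burst_gain=0.3):
    """Threshold both burst indication signals to binary values, then select
    one of two gain states per sample: `burst_gain` where the highband shows
    a burst that the lowband does not, and 1.0 (no attenuation) elsewhere.
    """
    gains = []
    for hb, lb in zip(high_band, low_band):
        suppress = (hb > threshold) and not (lb > threshold)
        gains.append(burst_gain if suppress else 1.0)
    return gains


# Hypothetical burst indication values for four sample positions.
gains = two_state_gains([0, 5, 5, 0], [0, 0, 5, 5], threshold=1)
```

Only the second position, where the highband indicates a burst and the lowband does not, is attenuated.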

It may be desirable to shape the spectrum of one or both of speech signals S20 and S30, before the envelope calculation is performed, in order to flatten the spectrum and/or to emphasize or attenuate one or more particular frequency regions. Lowband speech signal S20, for example, may tend to have more energy at low frequencies, and it may be desirable to reduce this energy. It may also be desirable to reduce high-frequency components of lowband speech signal S20 so that burst detection is based primarily on the middle frequencies. Spectral shaping is an optional operation that may improve the performance of burst suppressor C200.

FIG. 18 shows a block diagram of an implementation C16 of burst detector C14 that includes a shaping filter C110. In one example, filter C110 is configured to filter lowband speech signal S20 according to a passband transfer function such as the following:

$F_{LB}(z) = \frac{1 + 0.96 z^{-1} + 0.96 z^{-2} + z^{-3}}{1 - 0.5 z^{-1}},$

which attenuates very low and very high frequencies.

It may be desirable to attenuate low frequencies of highband speech signal S30 and/or to boost higher frequencies. In one example, filter C110 is configured to filter highband speech signal S30 according to a highpass transfer function such as the following:

$F_{HB}(z) = \frac{0.5 + z^{-1} + 0.5 z^{-2}}{1 + 0.5 z^{-1} + 0.3 z^{-2}},$

which attenuates frequencies around 4 kHz.
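The behavior of the highband shaping filter near 4 kHz can be checked by evaluating its transfer function on the unit circle. The sketch below takes the numerator coefficients as 0.5, 1, 0.5 and the denominator coefficients as 1, 0.5, 0.3, and assumes an 8 kHz sampling rate so that the normalized frequency π corresponds to 4 kHz; the function name is hypothetical.

```python
import cmath
import math


def magnitude_response(b, a, omega):
    """Magnitude of B(z)/A(z) at z = exp(j*omega), where b and a hold the
    coefficients of z^0, z^-1, z^-2, ... in numerator and denominator."""
    z_inv = cmath.exp(-1j * omega)
    numerator = sum(c * z_inv ** k for k, c in enumerate(b))
    denominator = sum(c * z_inv ** k for k, c in enumerate(a))
    return abs(numerator / denominator)


HB_NUM = [0.5, 1.0, 0.5]
HB_DEN = [1.0, 0.5, 0.3]

at_dc = magnitude_response(HB_NUM, HB_DEN, 0.0)
at_nyquist = magnitude_response(HB_NUM, HB_DEN, math.pi)  # 4 kHz at 8 kHz sampling
```

The numerator has a double zero at z = -1, so the response vanishes at the Nyquist frequency, which matches the stated attenuation around 4 kHz under the assumed sampling rate.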

As a practical matter, it may not be necessary to perform at least some of the burst detection operations at the full sampling rate of the corresponding speech signal S20, S30. FIG. 19 shows a block diagram of an implementation C18 of burst detector C16 that includes an instance C120-1 of downsampler C120 configured to downsample the smoothed envelope in the forward processing path and an instance C120-2 of downsampler C120 configured to downsample the smoothed envelope in the backward processing path. In one example, each instance of downsampler C120 is configured to downsample the envelope by a factor of eight. For the particular example of a 20-millisecond frame sampled at 8 kHz (160 samples), such a downsampler reduces the envelope to a 1 kHz sampling rate, or 20 samples per frame. Downsampling may significantly reduce the computational complexity of the highband burst suppression operation without significantly affecting performance.
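The frame arithmetic above can be reproduced with a simple decimation. Keeping every eighth sample without further filtering is an assumption that relies on the envelope having already been smoothed; the function name and the ramp used as a stand-in envelope are illustrative.

```python
def downsample(envelope, factor=8):
    """Keep every `factor`-th sample of an already smoothed envelope."""
    return envelope[::factor]


SAMPLE_RATE_HZ = 8000
FRAME_MS = 20
frame_length = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 160 samples per frame

frame_envelope = [float(n) for n in range(frame_length)]  # stand-in envelope
reduced = downsample(frame_envelope)
reduced_rate_hz = SAMPLE_RATE_HZ // 8               # 1000 Hz
```

A 160-sample frame becomes 20 samples, so every later stage of the detector processes one eighth as many values per frame.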

It may be desirable for the attenuation control signal applied by gain control element C150 to have the same sampling rate as highband speech signal S30. FIG. 20 shows a block diagram of an implementation C24 of attenuation control signal generator C22 that may be used in conjunction with a downsampling version of burst detector C10. Attenuation control signal generator C24 includes an upsampler C140 configured to upsample attenuation control signal SB70 to obtain a signal SB70a having a sampling rate equal to that of highband speech signal S30.

In one example, upsampler C140 is configured to perform the upsampling by zeroth-order interpolation of attenuation control signal SB70. In another example, upsampler C140 is configured to perform the upsampling by otherwise interpolating between the values of attenuation control signal SB70 (e.g., by passing attenuation control signal SB70 through an FIR filter) to obtain less abrupt transitions. In a further example, upsampler C140 is configured to perform the upsampling using a windowed sinc function.
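Zeroth-order interpolation simply holds each value for the duration of the upsampling factor. The sketch below assumes a factor of eight, matching the downsampling example, and a hypothetical function name.

```python
def upsample_zero_order(values, factor=8):
    """Repeat each value `factor` times (zeroth-order hold)."""
    return [v for v in values for _ in range(factor)]


# Twenty control values per frame become 160, one per highband sample.
control = [1.0] * 19 + [0.3]
control_full_rate = upsample_zero_order(control)
```

The resulting signal changes in steps; the filtered and windowed alternatives mentioned above trade this simplicity for smoother transitions between gain values.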

In some cases, such as in a battery-powered device (e.g., a cellular telephone), highband burst suppressor C200 may be configured to be selectively disabled. For example, it may be desirable to disable an operation such as highband burst suppression in a power-saving mode of the device.

As mentioned above, embodiments described herein include implementations that may be used to perform embedded coding, supporting compatibility with narrowband systems and avoiding a need for transcoding. Support for highband coding may also serve to differentiate, on a cost basis, between chips, chipsets, devices, and/or networks having wideband support with backward compatibility and those having narrowband support only. Support for highband coding as described herein may also be used in conjunction with a technique for supporting lowband coding, and a system, method, or apparatus according to such an embodiment may support coding of frequency components from, for example, about 50 or 100 Hz up to about 7 or 8 kHz.

As mentioned above, adding highband support to a speech coder may improve intelligibility, especially regarding differentiation of fricatives. Although such differentiation can usually be derived by a human listener from the particular context, highband support may serve as an enabling feature in speech recognition and other machine interpretation applications, such as systems for automated voice menu navigation and/or automatic call processing. Highband burst suppression may increase accuracy in a machine interpretation application, and it is contemplated that an implementation of highband burst suppressor C200 may be used in one or more such applications, with or without speech encoding.

An apparatus according to an embodiment may be embedded into a portable device for wireless communications, such as a cellular telephone or personal digital assistant (PDA). Alternatively, such an apparatus may be included in another communications device, such as a VoIP handset, a personal computer configured to support VoIP communications, or a network device configured to route telephonic or VoIP communications. For example, an apparatus according to an embodiment may be implemented in a chip or chipset for a communications device. Depending on the particular application, such a device may also include such features as analog-to-digital and/or digital-to-analog conversion of a speech signal, circuitry for performing amplification and/or other signal processing operations on a speech signal, and/or radio-frequency circuitry for transmission and/or reception of the coded speech signal.

It is explicitly contemplated and disclosed that embodiments may include and/or be used with any one or more of the other features disclosed in the patent applications cited herein, namely the U.S. provisional patent applications whose benefit this application claims. Such features include generation of a highband excitation signal from a lowband excitation signal, which may include other features such as anti-sparseness filtering, harmonic extension using a nonlinear function, mixing of a modulated noise signal with a spectrally extended signal, and/or adaptive whitening. Such features include time-warping a highband speech signal according to a regularization performed in a lowband encoder. Such features include encoding of a gain envelope according to a relation between an original speech signal and a synthesized speech signal. Such features include use of overlapping filter banks to obtain lowband and highband speech signals from a wideband speech signal. Such features include shifting of highband signal S30 and/or a highband excitation signal according to a regularization or other shift of lowband excitation signal S50. Such features include fixed or adaptive smoothing of coefficient representations such as highband LSFs. Such features include fixed or adaptive shaping of noise associated with quantization of coefficient representations such as LSFs. Such features also include fixed or adaptive smoothing of a gain envelope, and adaptive attenuation of a gain envelope.

The foregoing presentation of the described embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments are possible, and the generic principles presented herein may be applied to other embodiments as well. For example, an embodiment may be implemented in part or in whole as a hard-wired circuit, as a circuit configuration fabricated into an application-specific integrated circuit, or as a firmware program loaded into non-volatile storage or a software program loaded from or into a data storage medium as machine-readable code, such code being instructions executable by an array of logic elements such as a microprocessor or other digital signal processing unit. The data storage medium may be an array of storage elements such as semiconductor memory (which may include without limitation dynamic or static RAM (random-access memory), ROM (read-only memory), and/or flash RAM), or ferroelectric, magnetoresistive, ovonic, polymeric, or phase-change memory; or a disk medium such as a magnetic or optical disk. The term "software" should be understood to include source code, assembly language code, machine code, binary code, firmware, macrocode, microcode, any one or more sets or sequences of instructions executable by an array of logic elements, and any combination of such examples.

The various elements of implementations of highband speech encoder A200; wideband speech encoders A100, A102, and A104; and highband burst suppressor C200, and of arrangements that include one or more such apparatus, may be implemented as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset, although other arrangements without such limitation are also contemplated. One or more elements of such an apparatus may be implemented in whole or in part as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements (e.g., transistors, gates) such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs (field-programmable gate arrays), ASSPs (application-specific standard products), and ASICs (application-specific integrated circuits). It is also possible for one or more such elements to have structure in common (e.g., a processor used to execute portions of code corresponding to different elements at different times, a set of instructions executed to perform tasks corresponding to different elements at different times, or an arrangement of electronic and/or optical devices performing operations for different elements at different times). Moreover, it is possible for one or more such elements to be used to perform tasks or execute other sets of instructions that are not directly related to an operation of the apparatus, such as a task relating to another operation of a device or system in which the apparatus is embedded.

Embodiments also include additional methods of speech processing, speech encoding, and highband burst suppression as are expressly disclosed herein, for example, by descriptions of structural embodiments configured to perform such methods. Each of these methods may also be tangibly embodied (for example, in one or more of the data storage media listed above) as one or more sets of instructions readable and/or executable by a machine that includes an array of logic elements (e.g., a processor, microprocessor, microcontroller, or other finite state machine). Thus, the present invention is not intended to be limited to the embodiments shown above but rather is to be accorded the widest scope consistent with the principles and novel features disclosed in any fashion herein.

Claims

1. A signal processing method, said method comprising:

calculating a first burst indication signal that indicates whether a burst is detected in a low-frequency portion of an audio speech signal;

calculating a second burst indication signal that indicates whether a burst is detected in a high-frequency portion of the audio speech signal;

generating an attenuation control signal according to an arithmetic relation or a logical relation between the first burst indication signal and the second burst indication signal; and

applying the attenuation control signal to the high-frequency portion of the audio speech signal to produce a processed high-frequency signal portion.

2. The signal processing method according to claim 1, wherein at least one of said calculating a first burst indication signal and said calculating a second burst indication signal comprises:

producing a forward smoothed envelope of the corresponding portion of the speech signal, the forward smoothed envelope being smoothed in a positive time direction;

indicating an initial region of a burst in the forward smoothed envelope;

producing a backward smoothed envelope of the corresponding portion of the speech signal, the backward smoothed envelope being smoothed in a negative time direction; and

indicating a termination region of a burst in the backward smoothed envelope.

3. The signal processing method according to claim 2, wherein at least one of said calculating a first burst indication signal and said calculating a second burst indication signal comprises detecting an overlap in time of the initial region and the termination region.

4. The signal processing method according to claim 2, wherein at least one of said calculating a first burst indication signal and said calculating a second burst indication signal comprises indicating a burst according to an overlap in time of the initial region and the termination region.

5. The method according to claim 2, wherein at least one of said calculating a first burst indication signal and said calculating a second burst indication signal comprises calculating the corresponding burst indication signal according to a mean of two signals, the two signals being (A) a signal based on the indication of the initial region and (B) a signal based on the indication of the termination region.

6. The method according to claim 1, wherein at least one of the first burst indication signal and the second burst indication signal indicates a level of a detected burst on a logarithmic scale.

7. The method according to claim 1, wherein said generating an attenuation control signal comprises generating the attenuation control signal according to a difference between the first burst indication signal and the second burst indication signal.

8. The method according to claim 1, wherein said generating an attenuation control signal comprises generating the attenuation control signal according to a degree to which a level of the second burst indication signal exceeds a level of the first burst indication signal.

9. The method according to claim 1, wherein said applying the attenuation control signal to the high-frequency portion of the audio speech signal comprises at least one of the following: (A) multiplying the high-frequency portion by the attenuation control signal and (B) amplifying the high-frequency portion according to the attenuation control signal.

10. The method according to claim 1, said method comprising processing the audio speech signal to obtain the low-frequency portion and the high-frequency portion.

11. The method according to claim 1, said method comprising encoding a signal based on the processed high-frequency signal portion into at least a plurality of linear prediction filter coefficients.

12. The method according to claim 11, said method comprising encoding the low-frequency portion into at least a second plurality of linear prediction filter coefficients and an encoded excitation signal,

wherein said encoding a signal based on the processed high-frequency signal portion comprises encoding, according to a signal based on the encoded excitation signal, a gain envelope of the signal based on the processed high-frequency signal portion.

13. The method according to claim 11, said method comprising encoding the low-frequency portion into at least a second plurality of linear prediction filter coefficients and an encoded excitation signal, and

generating a highband excitation signal based on the encoded excitation signal,

wherein said encoding a signal based on the processed high-frequency signal portion comprises encoding, according to a signal based on the highband excitation signal, a gain envelope of the signal based on the processed high-frequency signal portion.

14. An apparatus comprising a highband burst suppressor, said highband burst suppressor comprising:

a first burst detector configured to output a first burst indication signal, said first burst indication signal indicating whether a burst is detected in a low-frequency portion of an audio speech signal;

a second burst detector configured to output a second burst indication signal, said second burst indication signal indicating whether a burst is detected in a high-frequency portion of said audio speech signal;

an attenuation control signal generator configured to generate an attenuation control signal according to an arithmetic relation or a logical relation between said first burst indication signal and said second burst indication signal; and

a gain control element configured to apply said attenuation control signal to said high-frequency portion of said audio speech signal.

15. The apparatus according to claim 14, wherein at least one of said first burst detector and said second burst detector comprises:

a forward smoother configured to produce a forward smoothed envelope of the corresponding portion of said speech signal, said forward smoothed envelope being smoothed in a positive time direction;

a first region indicator configured to indicate an initial region of a burst in said forward smoothed envelope;

a backward smoother configured to produce a backward smoothed envelope of the corresponding portion of said speech signal, said backward smoothed envelope being smoothed in a negative time direction; and

a second region indicator configured to indicate a terminal region of a burst in said backward smoothed envelope.

16. The apparatus according to claim 15, wherein at least one of said first burst detector and said second burst detector comprises a coincidence detector, said coincidence detector being configured to detect a coincidence in time of said initial region and said terminal region.

17. The apparatus according to claim 15, wherein at least one of said first burst detector and said second burst detector comprises a coincidence detector, said coincidence detector being configured to indicate a burst according to an overlap in time of said initial region and said terminal region.

18. The apparatus according to claim 15, wherein at least one of said first burst detector and said second burst detector comprises a coincidence detector, said coincidence detector being configured to output the corresponding burst indication signal according to a mean of two signals, said two signals being (A) a signal based on the indication of said initial region and (B) a signal based on the indication of said terminal region.
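
By way of illustration only, the burst detector structure of claims 15 to 17 may be sketched as follows. The sketch assumes a per-frame envelope in decibels, a one-pole smoother, and an edge test on the smoothed envelope; the names `smooth`, `edge_region`, and `detect_burst` and the parameters `alpha`, `threshold`, and `hold` are hypothetical, and the claims do not prescribe any of these choices.

```python
def smooth(env, alpha):
    """One-pole smoothing of an envelope in the order the samples are given."""
    out, state = [], env[0]
    for x in env:
        state = alpha * state + (1.0 - alpha) * x
        out.append(state)
    return out


def edge_region(smoothed, threshold, hold):
    """Mark `hold` frames starting at each rise greater than `threshold`."""
    flags = [False] * len(smoothed)
    for i in range(1, len(smoothed)):
        if smoothed[i] - smoothed[i - 1] > threshold:
            for k in range(i, min(i + hold, len(smoothed))):
                flags[k] = True
    return flags


def detect_burst(env_db, alpha=0.5, threshold=3.0, hold=4):
    """Flag frames where an initial region and a terminal region overlap."""
    # Forward pass: smooth in positive time and mark the onset of a rise.
    initial = edge_region(smooth(env_db, alpha), threshold, hold)
    # Backward pass: smooth the time-reversed envelope, mark its rises
    # (falls in forward time), then restore the original frame order.
    reversed_env = env_db[::-1]
    terminal = edge_region(smooth(reversed_env, alpha), threshold, hold)[::-1]
    # Coincidence detection: both regions indicated in the same frame.
    return [a and b for a, b in zip(initial, terminal)]
```

Under these assumptions a short two-frame pulse is flagged because its onset and its end fall within `hold` frames of each other, whereas a sustained step in level is not flagged because no terminal region is found. The logical AND used here corresponds to the overlap of claim 17; a mean of two level-valued indications, as in claim 18, would be an alternative combination.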

19. The apparatus according to claim 14, wherein at least one of said first burst indication signal and said second burst indication signal indicates a level of a detected burst on a logarithmic scale.

20. The apparatus according to claim 14, wherein said attenuation control signal generator is configured to generate said attenuation control signal according to a difference between said first burst indication signal and said second burst indication signal.

21. The apparatus according to claim 14, wherein said attenuation control signal generator is configured to generate said attenuation control signal according to a degree to which a level of said second burst indication signal exceeds a level of said first burst indication signal.

22. The apparatus according to claim 14, wherein said gain control element comprises at least one of a multiplier and an amplifier.

23. The apparatus according to claim 14, said apparatus comprising a filter bank configured to process said speech signal to obtain said low-frequency portion and said high-frequency portion.

24. The apparatus according to claim 14, said apparatus comprising a highband speech encoder configured to encode a signal based on the output of said gain control element into at least a plurality of linear prediction filter coefficients.
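
By way of illustration only, one well-known way of obtaining linear prediction filter coefficients from a frame of samples is the autocorrelation method with the Levinson-Durbin recursion, sketched below. The claims do not state how the coefficients are computed, so the choice of this technique, the function names `autocorr` and `levinson_durbin`, and the sign convention (prediction of x[n] as a weighted sum of past samples) are assumptions.

```python
def autocorr(x, order):
    """Autocorrelation values r[0..order] of a frame x."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] with x[n] ~ sum a[k] * x[n-k].

    Returns (coefficients, residual prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= (1.0 - k_i * k_i)
    return a[1:], err
```

For a frame that decays geometrically by a factor of 0.5 per sample, a second-order analysis under these assumptions yields a first coefficient close to 0.5 and a second coefficient close to zero.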

25. The apparatus according to claim 24, said apparatus comprising a lowband speech encoder configured to encode said low-frequency portion into at least a second plurality of linear prediction filter coefficients and an encoded excitation signal,

wherein said highband speech encoder is configured to encode a gain envelope of the signal based on the output of said gain control element according to a signal based on said encoded excitation signal.

26. The apparatus according to claim 25, wherein said highband speech encoder is configured to generate a highband excitation signal based on said encoded excitation signal, and

wherein said highband speech encoder is configured to encode a gain envelope of the signal based on the output of said gain control element according to a signal based on said highband excitation signal.

27. The apparatus according to claim 14, said apparatus comprising a cellular telephone.

28. A signal processing apparatus, comprising:

means for calculating a first burst indication signal, said first burst indication signal indicating whether a burst is detected in a low-frequency portion of an audio speech signal;

means for calculating a second burst indication signal, said second burst indication signal indicating whether a burst is detected in a high-frequency portion of said audio speech signal;

means for generating an attenuation control signal according to an arithmetic relation or a logical relation between said first burst indication signal and said second burst indication signal; and

means for applying said attenuation control signal to said high-frequency portion of said audio speech signal to produce a processed high-frequency portion.

29. The apparatus according to claim 28, wherein at least one of said means for calculating a first burst indication signal and said means for calculating a second burst indication signal comprises:

means for producing a forward smoothed envelope of the corresponding portion of said speech signal, said forward smoothed envelope being smoothed in a positive time direction;

means for indicating an initial region of a burst in said forward smoothed envelope;

means for producing a backward smoothed envelope of the corresponding portion of said speech signal, said backward smoothed envelope being smoothed in a negative time direction; and

means for indicating a terminal region of a burst in said backward smoothed envelope.