
CN116997960A - Multi-band ducking of audio signals

Info

Publication number: CN116997960A
Application number: CN202280021662.XA
Authority: CN
Inventors: R. Tyagi; H. Purnhagen
Original assignee: Dolby International AB; Dolby Laboratories Licensing Corp
Current assignee: Dolby International AB; Dolby Laboratories Licensing Corp
Priority date: 2021-04-06
Filing date: 2022-04-01
Publication date: 2023-11-03

Abstract

A method for multi-band ducking of audio signals is provided. In some embodiments, the method involves receiving an input audio signal at a decoder, wherein the input audio signal is a downmix audio signal. In some embodiments, the method involves separating the input audio signal into a first set of frequency bands. In some embodiments, the method involves determining a set of ducking gains, each ducking gain corresponding to a frequency band of the first set of frequency bands. In some embodiments, the method involves generating at least one wideband decorrelated audio signal, wherein the ducking gains of the set of ducking gains are applied to at least one of: 1) a second set of frequency bands prior to generating the at least one wideband decorrelated audio signal; or 2) a third set of frequency bands into which the at least one wideband decorrelated audio signal is separated.

Description

Multi-band ducking of audio signals

Cross-Reference to Related Applications

This application claims priority to U.S. Provisional Application No. 63/268,991, filed March 8, 2022, and U.S. Provisional Application No. 63/171,219, filed April 6, 2021, each of which is incorporated herein by reference in its entirety.

The present disclosure relates to systems, methods, and media for multi-band ducking of audio signals.

Background

Ducking of audio signals may be performed, for example, to attenuate various types of signals, such as transients. However, ducking as conventionally performed can produce various artifacts, such as ringing artifacts and undesired artifacts when a spatial scene is rendered.

Notation and Nomenclature

Throughout this disclosure, including in the claims, the terms "speaker", "loudspeaker" and "audio reproduction transducer" are used synonymously to denote any sound-emitting transducer or set of transducers. A typical set of headphones includes two speakers. A speaker may be implemented to include multiple transducers (e.g., a woofer and a tweeter), which may be driven by a single common speaker feed or by multiple speaker feeds. In some examples, the speaker feed(s) may undergo different processing in different circuit branches coupled to the different transducers.

Throughout this disclosure, including in the claims, the expression performing an operation "on" a signal or data (e.g., filtering, scaling, transforming, or applying gain to the signal or data) is used in a broad sense to denote performing the operation directly on the signal or data, or on a processed version of the signal or data. For example, the operation may be performed on a version of the signal that has undergone preliminary filtering or pre-processing before the operation is performed.

Throughout this disclosure, including in the claims, the expression "system" is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements a decoder may be referred to as a decoder system, and a system including such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X-M inputs are received from an external source) may also be referred to as a decoder system.

Throughout this disclosure, including in the claims, the term "processor" is used in a broad sense to denote a system or device that is programmable or otherwise configurable (e.g., with software or firmware) to perform operations on data (which may include audio, video, or other image data). Examples of processors include a field-programmable gate array (or other configurable integrated circuit or chipset), a digital signal processor programmed and/or otherwise configured to perform pipelined processing on audio or other sound data, a programmable general-purpose processor or computer, and a programmable microprocessor chip or chipset.

Summary

At least some aspects of the present disclosure may be implemented via methods. Some methods may involve receiving an input audio signal at a decoder, wherein the input audio signal is a downmix audio signal. Some methods may involve separating the input audio signal into a first set of frequency bands. Some methods may involve determining a set of ducking gains, each ducking gain of the set corresponding to a frequency band of the first set of frequency bands. Some methods may involve generating at least one wideband decorrelated audio signal, wherein the at least one wideband decorrelated audio signal is usable to upmix the downmix audio signal, and wherein ducking gains of the set of ducking gains are applied to at least one of: 1) a second set of frequency bands prior to generating the at least one wideband decorrelated audio signal; or 2) a third set of frequency bands into which the at least one wideband decorrelated audio signal is separated.

In some examples, the set of ducking gains includes a set of input ducking gains, and the method further involves applying the input ducking gains of the set of input ducking gains to the second set of frequency bands prior to generating the at least one wideband decorrelated audio signal. In some examples, the ducked signals associated with the frequency bands of the second set of frequency bands are aggregated to generate a wideband ducked signal, which is provided to a decorrelator configured to generate the at least one wideband decorrelated audio signal.

In some examples, the first set of frequency bands and the second set of frequency bands are two instances of the same set of frequency bands.

In some examples, the set of ducking gains includes a set of output ducking gains, and some methods may further involve: applying the output ducking gains of the set of output ducking gains to the third set of frequency bands to generate at least one set of ducked decorrelated audio signals, each ducked decorrelated audio signal of the at least one set corresponding to a frequency band of the third set of frequency bands; and aggregating the ducked decorrelated audio signals of the at least one set to generate at least one wideband ducked decorrelated audio signal usable to upmix the downmix audio signal.

In some examples, determining the set of ducking gains includes: determining one or more initial ducking gains; and modifying at least one of the one or more initial ducking gains to generate the set of ducking gains, wherein the at least one initial ducking gain is modified by performing update and/or release control.

In some examples, for a frequency band of the first set of frequency bands, the corresponding ducking gain is determined based on a ratio that includes the outputs of two envelope trackers, the two envelope trackers corresponding to a slow envelope tracker and a fast envelope tracker. In some examples, the slow envelope tracker includes an absolute value computation block and a first low-pass filter, and the fast envelope tracker includes the absolute value computation block and a second low-pass filter, the first low-pass filter and the second low-pass filter having different time constants. In some examples, some methods may further involve applying a high-pass filter to at least one frequency band of the first set of frequency bands, wherein the output of the high-pass filter is provided to at least one of the two envelope trackers. In some examples, the high-pass filter is applied to two or more frequency bands of the first set of frequency bands, and the high-pass filter applied to a first frequency band of the two or more frequency bands has a different cutoff frequency than the high-pass filter applied to a second frequency band of the two or more frequency bands. In some examples, the time constant of the first low-pass filter of the slow envelope tracker is longer than the time constant of the second low-pass filter of the fast envelope tracker, and the ratio includes a ratio of the output of the slow envelope tracker to the output of the fast envelope tracker. In some examples, the time constant of the first low-pass filter of the slow envelope tracker is longer than the time constant of the second low-pass filter of the fast envelope tracker, and the ratio includes a ratio of the output of the fast envelope tracker to the output of the slow envelope tracker. In some examples, the ratio includes a constant specific to a frequency band of the first set of frequency bands, the constant being selected to control at least one of: 1) the amount of ducking gain applied to each frequency band of the second set of frequency bands; or 2) the amount of ducking gain applied to each frequency band of the third set of frequency bands.

In some examples, separating the input audio signal into the first set of frequency bands includes providing the input audio signal to a filter bank. In some examples, the filter bank is implemented as an infinite impulse response (IIR) filter bank or a finite impulse response (FIR) filter bank.

In some examples, the first set of frequency bands, the second set of frequency bands, and/or the third set of frequency bands include three frequency bands.

In some examples, the first set of frequency bands is the same as the third set of frequency bands.

In some examples, the at least one wideband decorrelated signal includes two or more wideband decorrelated signals.

In some examples, some methods further involve upmixing the downmix audio signal using the at least one wideband decorrelated signal and metadata received at the decoder to generate a reconstructed audio signal. In some examples, some methods further involve rendering the reconstructed audio signal to generate a rendered audio signal. In some examples, some methods further involve presenting the rendered audio signal using one or more of: a loudspeaker or headphones.

Some or all of the operations, functions, and/or methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, and the like. Accordingly, some innovative aspects of the subject matter described in this disclosure may be implemented via one or more non-transitory media having software stored thereon.

At least some aspects of the present disclosure may be implemented via an apparatus. For example, one or more devices may be capable of performing, at least in part, the methods disclosed herein. In some implementations, an apparatus is, or includes, an audio processing system having an interface system and a control system. The control system may include one or more general-purpose single-chip or multi-chip processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic, discrete hardware components, or combinations thereof.

Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Note that the relative dimensions of the following figures may not be drawn to scale.

Brief Description of the Drawings

Figure 1 is a block diagram of an example multi-channel codec, in accordance with some embodiments.

Figure 2 is a block diagram of a portion of a decoder that includes a decorrelator and instances of a ducker for implementing multi-band ducking, in accordance with some embodiments.

Figure 3 is a block diagram of an instance of a ducker that may be used to implement multi-band ducking, in accordance with some embodiments.

Figure 4 is a graph of the frequency response of an example filter bank that may be used to implement multi-band ducking, in accordance with some embodiments.

Figure 5 is a flowchart of an example process that may be performed by a decoder to carry out multi-band ducking, in accordance with some embodiments.

Figure 6 illustrates example use cases for an Immersive Voice and Audio Services (IVAS) system, in accordance with some embodiments.

Figure 7 is a block diagram illustrating an example of components of an apparatus capable of implementing various aspects of the present disclosure.

Like reference numerals and designations in the various drawings indicate like elements.

Detailed Description

Decorrelators are commonly used in decoder devices that utilize multi-channel audio codecs (e.g., stereo audio codecs, parametric stereo, AC-4, etc.). In particular, at an encoder, an N-channel input may be downmixed to M channels, where N > M. The M downmix channels and side information are encoded into a bitstream and transmitted to a decoder. The decoder may then decode the M channels and the side information, and use the side information to upmix, or reconstruct, the N channels. In particular, a decorrelator of the decoder device may generate N-M decorrelated signals. The decoder may then use the M downmix channels, the N-M decorrelated signals, and the side information to obtain an approximate reconstruction of the original N channels. In other words, by generating an approximate reconstruction of the original N channels, the decoder can reconstruct the original spatial audio scene.

For example, in the case of stereo audio, where N corresponds to two channels and M corresponds to one downmix channel, the decorrelator may generate one decorrelated signal. The decoder may then use the one decorrelated signal, the one downmix channel, and the side information to reconstruct a representation of the original two audio signals. As another example, where N is four channels (e.g., the W, X, Y, and Z channels of a first-order Ambisonics (FOA) signal) and M is one downmix channel, the decorrelator may generate three decorrelated signals. The decoder may use the three decorrelated signals to reconstruct the original spatial audio scene.

In general, a decorrelator may be used to transform an input audio signal into one or more uncorrelated output signals, which can yield a controllable sense of width, spaciousness, or diffuseness while leaving other perceptual attributes unchanged. Accordingly, decorrelators can be useful for reconstructing audio signals that have spatial components. Figure 1 illustrates a particular example of a codec that uses a decorrelator in the decoder to reconstruct an encoded audio signal.

Figure 1 is a block diagram of an Immersive Voice and Audio Services (IVAS) codec 150 for encoding and decoding IVAS bitstreams, according to an embodiment. The IVAS codec 150 includes an encoder and a far-end decoder. The IVAS encoder includes a spatial analysis and downmix unit 152, a quantization and entropy coding unit 153, a core encoding unit 156, and a mode/bitrate control unit 157. The IVAS decoder includes a quantization and entropy decoding unit 154, a core decoding unit 158, a spatial synthesis/rendering unit 159, and a decorrelator unit 161.

The spatial analysis and downmix unit 152 receives an N-channel input audio signal 151 representing an audio scene. The input audio signal 151 includes, but is not limited to: mono signals, stereo signals, binaural signals, spatial audio signals (e.g., multi-channel spatial audio objects), FOA, higher-order Ambisonics (HOA), and any other audio data. The spatial analysis and downmix unit 152 downmixes the N-channel input audio signal 151 to a specified number of downmix channels (M). In this example, M <= N. The spatial analysis and downmix unit 152 also generates side information (e.g., spatial metadata) that the far-end IVAS decoder can use to synthesize the N-channel input audio signal 151 from the M downmix channels, the spatial metadata, and the decorrelated signals generated at the decoder. In some embodiments, the spatial analysis and downmix unit 152 implements complex advanced coupling (CACPL) for analyzing/downmixing stereo/FOA audio signals and/or a spatial reconstructor (SPAR) for analyzing/downmixing FOA audio signals. In other embodiments, the spatial analysis and downmix unit 152 implements other formats.

The M channels are coded by one or more instances of a core codec included in the core encoding unit 156. The side information (e.g., spatial metadata (MD)) is quantized and coded by the quantization and entropy coding unit 153. The coded bits are then packed together into IVAS bitstream(s) and sent to the IVAS decoder. In an embodiment, the underlying core codec can be any suitable mono, stereo, or multi-channel codec that can be used to generate coded bitstreams.

In some embodiments, the core codec is an EVS codec. The EVS encoding unit 156 complies with 3GPP TS 26.445 and provides a wide range of functionality, such as enhanced quality and coding efficiency for narrowband (EVS-NB) and wideband (EVS-WB) speech services, enhanced quality using super-wideband (EVS-SWB) speech, enhanced quality for mixed content and music in conversational applications, robustness to packet loss and delay jitter, and backward compatibility with the AMR-WB codec.

At the decoder, the M channels are decoded by a corresponding one or more instances of the core codec included in the core decoding unit 158, and the side information is decoded by the quantization and entropy decoding unit 154. A primary downmix channel, such as the W channel of an FOA signal format, is fed to the decorrelator unit 161, which generates N-M decorrelated channels. The M downmix channels, the N-M decorrelated channels, and the side information are fed to the spatial synthesis/rendering unit 159, which uses these inputs to synthesize, or regenerate, the original N-channel input audio signal, which may be presented by an audio device 160. In an embodiment, the M channels are decoded by a mono codec other than EVS. In other embodiments, the M channels are decoded by a combination of one or more multi-channel core coding units and one or more mono core coding units.

An example implementation of encoding an FOA input audio signal with a one-channel downmix is given below. For the 1-channel passive downmix configuration, only the W channel, the P (p₁, p₂, p₃) parameters, and the P_d (d₁, d₂, d₃) parameters are coded and sent to the decoder. P corresponds to prediction coefficients indicating how much of the side channels (Y, X, and Z) can be predicted from the W channel. The P_d parameters indicate the residual energy in the Y, X, and Z channels after the predicted components have been removed.

In the passive downmix coding scheme, the side channels Y, X, and Z are predicted at the decoder from the transmitted downmix W channel using the three prediction parameters P. The missing energy in the side channels is filled in by adding scaled versions of the decorrelated downmix D(W), using the decorrelation parameters P_d. For passive downmix, the reconstruction of the FOA input may be determined by:

$$U_{pas} = pW + P_d D(W),$$

where p = [1 p₁ p₂ p₃]^T and P_d = [0 d₁ d₂ d₃]^T, and D(W) denotes the output of the decorrelator block when the W channel is provided as its input. U_pas is the reconstructed FOA output at the decoder. Note that, assuming a perfect decorrelator and no quantization of the prediction and decorrelator parameters, this scheme achieves perfect reconstruction in terms of the input covariance matrix.
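The passive-downmix reconstruction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the codec's implementation: `toy_decorrelator` is a hypothetical stand-in (a plain delay) for the decorrelator block D(·), and the function names and the channel order [W, Y, X, Z] of the returned rows are choices made here for illustration.

```python
from typing import List, Sequence


def toy_decorrelator(w: Sequence[float], delay: int = 3) -> List[float]:
    """Hypothetical stand-in for D(W): a plain delay, NOT a real decorrelator."""
    delayed = [0.0] * delay + list(w)
    return delayed[: len(w)]


def reconstruct_foa_passive(
    w: Sequence[float],
    pred: Sequence[float],    # (p1, p2, p3): prediction coefficients for Y, X, Z
    decorr: Sequence[float],  # (d1, d2, d3): decorrelation parameters for Y, X, Z
) -> List[List[float]]:
    """Return rows [W, Y, X, Z] computed as U_pas = p*W + P_d*D(W)."""
    p = [1.0, *pred]        # p   = [1 p1 p2 p3]^T
    p_d = [0.0, *decorr]    # P_d = [0 d1 d2 d3]^T
    d_w = toy_decorrelator(w)
    return [
        [p[ch] * w[n] + p_d[ch] * d_w[n] for n in range(len(w))]
        for ch in range(4)
    ]
```

Because the first entries of p and P_d are 1 and 0, the first output row is the transmitted W channel unchanged; each side channel is a predicted part (a scaled copy of W) plus an energy-filling part (a scaled copy of the decorrelated W).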

In an example encoder implementation, the prediction coefficient for the Y channel may be determined by:

In the equation given above, R_YW is the covariance of the W and Y channels, and R_WW is the variance of the W channel.

Predictions for the other side channels may be determined similarly (p₂ for the X channel and p₃ for the Z channel).

The residual side channels may be determined by:

$$\begin{aligned}
Y' &= Y - p_1 W \\
X' &= X - p_2 W \\
Z' &= Z - p_3 W
\end{aligned}$$

In an example implementation, the decorrelation parameter d₁ for the Y channel is determined by:

Here, R_Y′Y′ is the variance of the residual channel Y′, and R_WW is the variance of the W channel. The decorrelation parameters for the other residual side channels may be determined similarly (d₂ for the X′ channel and d₃ for the Z′ channel).
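As a small illustration of the quantities named above, the following Python sketch forms a residual side channel and estimates the two variances R_Y′Y′ and R_WW. It is a sketch only: the prediction coefficient is taken as a given input, the signals are assumed to be zero-mean so that variance is estimated as a mean of squares, and the helper names are hypothetical. The expression that combines the two variances into d₁ is not reproduced in this text, so the sketch stops at the variances.

```python
from typing import List, Sequence


def residual(side: Sequence[float], w: Sequence[float], p: float) -> List[float]:
    """Residual side channel, e.g. Y' = Y - p1 * W, computed sample by sample."""
    return [s - p * x for s, x in zip(side, w)]


def variance_zero_mean(x: Sequence[float]) -> float:
    """Mean of squares; equals the variance only under the zero-mean assumption."""
    return sum(v * v for v in x) / len(x)


# Toy data: Y contains 0.5 * W plus a component that W cannot predict.
w = [1.0, -1.0, 1.0, -1.0]
y = [0.5, -0.5, 1.5, -1.5]
y_res = residual(y, w, p=0.5)          # assumed p1 = 0.5
r_yy_res = variance_zero_mean(y_res)   # R_Y'Y'
r_ww = variance_zero_mean(w)           # R_WW
```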

One potential problem with decorrelators is that transients in the input audio signal may be smeared over time in the output channels. For example, transients such as percussive hits or other types of transients may be smeared over time across the multiple channels generated by the decorrelator, which can add undesired reverberation in frames that contain transients. Another problem is that the decorrelated signal generated by the decorrelator may still carry considerable energy even when the input signal has an abrupt offset. It should be noted that, as used herein, the term "offset" generally refers to the end or cessation of a dominant element or component of an audio signal. In other words, where the input signal to the decorrelator includes an abrupt stop, or offset, the decorrelated signal may include considerable energy that smears the offset. This in turn can produce artifacts in the reconstructed signal generated from the decorrelated signal.

Ducking may be used to duck, or attenuate, transients before the input audio signal is provided to the decorrelator. For example, ducking a transient before the decorrelated signal(s) are generated can prevent the transient from being smeared over time in the generated decorrelated signal(s). Similarly, where there is an offset in the input audio signal, ducking may be performed on the output of the decorrelator to attenuate the decorrelated signal(s). However, ducking has conventionally been performed on a broadband basis. In other words, all frequency bands of the audio signal are ducked with the same gain. This can produce artifacts and degrade audio quality. For example, when a transient is present, applying a ducking gain to the input audio signal in a broadband manner ducks the high-frequency content, which may be desirable because of the transient. However, applying the ducking gain in a broadband manner may also duck lower-frequency content, such as bass, which can reduce overall audio quality and/or introduce distortion in the overall audio content. To avoid applying the same ducking across all frequency bands, some conventional techniques apply ducking in the band domain when a multi-band decorrelator is used. However, because a decorrelator is computationally complex to implement, implementing multiple instances of the decorrelator (each operating on a different frequency band) can greatly increase computational complexity, leading to excessive use of computing resources.

Described herein are techniques for applying ducking gains on a per-band basis. In particular, ducking gains are determined and applied band by band. This can allow, for example, ducking gains to be applied differently to low-frequency content than to high-frequency content. In some implementations, a ducking gain may be an input ducking gain that is applied to the input audio signal before the input audio signal is provided to the decorrelator. Input ducking gains may be used to duck transients before they are provided to the decorrelator, thereby preventing the transients from "entering" the decorrelator. In some implementations, a ducking gain may additionally or alternatively be an output ducking gain that is applied to the decorrelated signal generated by the decorrelator. Output ducking gains may be used to duck the lingering signal in the generated decorrelated signal(s) that corresponds to an offset in the input signal, thereby restoring the offset of the input signal in the decorrelated signal(s). It should be noted that although the ducking gains may be determined and applied on a per-band basis, decorrelation may be performed on a wideband basis. Because a decorrelator can be computationally expensive to implement, applying ducking per band while performing decorrelation on a wideband basis improves computational efficiency, since only one instance of the decorrelator is implemented, and improves overall audio quality, since the ducking gains are applied selectively according to the frequency of the audio content.
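The overall signal flow described above (per-band input ducking, one wideband decorrelator, per-band output ducking) can be sketched as follows. This is a minimal sketch under stated assumptions rather than the disclosed implementation: `split_three_bands` is a hypothetical low/mid/high split built from two one-pole low-pass filters whose bands sum back to the input, the smoothing coefficients 0.9 and 0.5 are arbitrary, the per-band gain tracks are supplied by the caller, and the decorrelator is passed in as a function.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def one_pole_lowpass(x: Sequence[float], coeff: float) -> Signal:
    """y[n] = coeff * y[n-1] + (1 - coeff) * x[n]."""
    y, state = [], 0.0
    for v in x:
        state = coeff * state + (1.0 - coeff) * v
        y.append(state)
    return y


def split_three_bands(x: Sequence[float]) -> List[Signal]:
    """Hypothetical low/mid/high split; the three bands sum back to x."""
    lo = one_pole_lowpass(x, 0.9)
    lo_mid = one_pole_lowpass(x, 0.5)
    mid = [a - b for a, b in zip(lo_mid, lo)]
    hi = [a - b for a, b in zip(x, lo_mid)]
    return [lo, mid, hi]


def apply_and_sum(bands: List[Signal], gains: List[Signal]) -> Signal:
    """Apply one gain track per band, then aggregate into a wideband signal."""
    n_samples = len(bands[0])
    return [sum(g[n] * b[n] for b, g in zip(bands, gains)) for n in range(n_samples)]


def duck_and_decorrelate(
    x: Sequence[float],
    in_gains: List[Signal],    # one input-ducking gain track per band
    out_gains: List[Signal],   # one output-ducking gain track per band
    decorrelator: Callable[[Signal], Signal],
) -> Signal:
    """Per-band ducking around a single wideband decorrelator instance."""
    ducked_in = apply_and_sum(split_three_bands(x), in_gains)
    decorrelated = decorrelator(ducked_in)  # runs once, on the wideband signal
    return apply_and_sum(split_three_bands(decorrelated), out_gains)
```

The point of the structure is that the expensive block, the decorrelator, is called once per signal regardless of how many bands are ducked; only the comparatively cheap band splitting and gain multiplication are repeated per band.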

Figure 2 illustrates a block diagram of an example system that may be used by a decoder to implement multi-band ducking, in accordance with some embodiments. It should be noted that the various blocks of the system shown in Figure 2 may be implemented using one or more control systems of a device, such as the control system shown in Figure 7 and described below in connection with that figure. As shown in Figure 2, an input audio signal, or a frame of an input audio signal, is provided to first filter bank 202 (depicted in Figure 2 as "Filter Bank A"). In some implementations, first filter bank 202 may divide the input audio signal into any suitable number of frequency bands, such as two, three, eight, ten, or 16 frequency bands. In the example shown in Figure 2, first filter bank 202 divides the input audio signal into three frequency bands, which may correspond to low, mid, and high frequencies, respectively. Example frequency ranges for an implementation with three frequency bands are shown in Figure 4 and described below.

Each frequency band may be provided to an instance of a ducker block. For example, because first filter bank 202 divides the input audio signal into three frequency bands, three ducker blocks are illustrated in Figure 2, depicted as ducker 204a, ducker 204b, and ducker 204c. Each ducker block may generate an input ducking gain and/or an output ducking gain. In some implementations, a ducking gain may be determined based on the ratio of the outputs of two envelope trackers, each having a different time constant. An envelope tracker may be implemented using an absolute value (rectifier) block and a low-pass filter. For example, the input ducking gain may be determined based on the ratio of the output of a low-pass filter with a long time constant to the output of a low-pass filter with a short time constant. In other words, the input ducking gain may be determined based on the ratio of the slow envelope to the fast envelope. Conversely, the output ducking gain may be determined based on the ratio of the output of the low-pass filter with the short time constant to the output of the low-pass filter with the long time constant. In other words, the output ducking gain may be determined based on the ratio of the fast envelope to the slow envelope. Examples of long time constants include 60 milliseconds, 70 milliseconds, 80 milliseconds, and 90 milliseconds. Examples of short time constants include 3 milliseconds, 4 milliseconds, 5 milliseconds, and 10 milliseconds. It should be noted that each ducker block instance may take as input the output of first filter bank 202 corresponding to a particular frequency band, and may generate ducking gains applicable to that particular frequency band. A more detailed example of a ducker block is shown in Figure 3 and described below in connection with that figure.

The input audio signal may be provided to delay block 206. The delayed version of the input audio signal may be provided to second filter bank 208 (depicted in Figure 2 as "Filter Bank B"). Delay block 206 may be used to delay the input audio signal by an amount that time-aligns the input audio signal, after it has been divided into frequency bands by second filter bank 208, with the timing of the input audio signal for which ducker blocks 204a, 204b, and 204c determined the ducking gains. It should be noted that delay block 206 may also be implemented in connection with a wideband ducker implementation (e.g., one in which filter banks 202 and 208 are not implemented). Example delays that may be applied by delay block 206 include 1.5 milliseconds, 2 milliseconds, and 2.5 milliseconds. In some implementations, the delay applied by delay block 206 may be the delay that would be used in a wideband ducker system, modified based at least in part on the delay introduced by first filter bank 202 and/or the delay introduced by second filter bank 208.

The input ducking gains determined by ducker blocks 204a, 204b, and 204c may be applied, on a per-band basis, to the frequency bands of the delayed version of the input audio signal. For example, a first input ducking gain corresponding to a first frequency band may be determined based on the first frequency band of first filter bank 202. Continuing with this example, the first input ducking gain may then be applied to the corresponding first frequency band of second filter bank 208. As a more specific example, the input ducking gains may be applied by multiplying each input ducking gain by the corresponding band signal via gain application blocks 209a, 209b, and 209c. It should be noted that, in some implementations, first filter bank 202 and second filter bank 208 may be different instances of the same filter bank (e.g., filter banks with the same number of frequency bands, the same frequency response, the same filter type, etc.). Conversely, in some implementations, first filter bank 202 and second filter bank 208 may differ in any one or more characteristics, such as the number of frequency bands, the cutoff frequencies of the various frequency bands, or the type of filter used. It should be noted that applying the input ducking gains can duck, or attenuate, transients in the input audio signal. As will be described in more detail below in connection with Figures 3 and 5, the input ducking applied to higher frequency bands may be stronger than the input ducking applied to lower frequency bands, such that high-frequency signals are ducked or attenuated more strongly than low-frequency signals.

A wideband ducked signal may be generated after the input ducking gains have been applied. For example, after the input ducking gains have been applied on a per-band basis to the frequency bands of second filter bank 208, the frequency bands may be combined, for example by summation, to generate a wideband signal. As a more specific example, the frequency bands may be summed or aggregated via aggregation block 209d. The wideband signal may then be provided to decorrelator 210. Decorrelator 210 may generate one or more decorrelated signals. In some implementations, the number of decorrelated signals generated by decorrelator 210 may depend on the number of signals to be parametrically reconstructed by the decoder, as described above in connection with Figure 1. For example, where the reconstructed audio signal is a stereo signal, decorrelator 210 may generate one decorrelated signal, which may be used to upmix the downmix signal to regenerate the original two signals. As another example, where the reconstructed audio signal includes four channels and there is one downmix signal, decorrelator 210 may generate three decorrelated signals, which may be used to reconstruct the three signals that were parametrically encoded by the encoder.
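The per-band gain application and summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `apply_band_gains_and_sum` and the list-of-lists data layout are assumptions made here for clarity.

```python
def apply_band_gains_and_sum(bands, gains):
    """Multiply each band signal by its per-sample gain and sum the bands.

    bands: list of band signals (one list of samples per band).
    gains: list of gain tracks with the same shape as `bands`.
    Returns the recombined wideband signal as a list of samples.
    (Hypothetical helper; names and layout are illustrative only.)
    """
    if len(bands) != len(gains):
        raise ValueError("one gain track is required per band")
    num_samples = len(bands[0])
    wideband = [0.0] * num_samples
    for band, gain in zip(bands, gains):
        if len(band) != num_samples or len(gain) != num_samples:
            raise ValueError("all bands and gain tracks must have equal length")
        for i in range(num_samples):
            # Gain application (blocks 209a-209c) followed by aggregation (209d).
            wideband[i] += band[i] * gain[i]
    return wideband
```

With all gains equal to 1, the function simply sums the bands, which matches the case in which a reconstructing filter bank leaves the signal unmodified.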

The one or more decorrelated signals may be provided to third filter bank 212 (depicted in Figure 2 as "Filter Bank C"). Third filter bank 212 may divide each of the one or more decorrelated signals into multiple frequency bands, e.g., two, three, eight, or 16 frequency bands. In some embodiments, third filter bank 212 may be another instance of first filter bank 202 and/or second filter bank 208. Conversely, in some implementations, third filter bank 212 may differ from first filter bank 202 and/or second filter bank 208 in any characteristic, such as the cutoff frequencies of the various frequency bands or the type of filter used. It should be noted that, in some implementations, third filter bank 212 may be replicated for each decorrelated signal generated by decorrelator 210.

The output ducking gains, each determined based on a frequency band of first filter bank 202 and generated by ducker blocks 204a, 204b, and 204c, may be delayed by corresponding delay blocks 214a, 214b, and 214c. Delay blocks 214a, 214b, and 214c may be used to delay the output ducking gains so that they are time-aligned with the frequency bands of third filter bank 212. In some embodiments, the delay applied by each of delay blocks 214a, 214b, and 214c may be based at least in part on the delay introduced by third filter bank 212. The delayed output ducking gains may then be applied, on a per-band basis, to each of the one or more decorrelated signals. For example, the output ducking gains may be applied by multiplying each output ducking gain by the corresponding band signal via gain application blocks 213a, 213b, and 213c. It should be noted that the output ducking gains can be used to duck, or attenuate, the decorrelated signal(s) at offsets in the input audio signal. An example of an offset is a sudden stop of the input audio signal.

After the output ducking gains have been applied on a per-band basis, a wideband version of each decorrelated signal may be generated. For example, the ducked frequency bands may be combined (e.g., summed) to generate a ducked wideband decorrelated signal. As a more specific example, the ducked frequency bands may be summed or aggregated via aggregation block 213d. The decoder may use the ducked wideband decorrelated signal to upmix the downmix signal and generate a reconstructed audio signal.

It should be noted that first filter bank 202, second filter bank 208, and/or third filter bank 212 may be implemented in any suitable manner. For example, a filter bank may be implemented as an infinite impulse response (IIR) filter bank. As another example, a filter bank may be implemented as a finite impulse response (FIR) filter bank. Each filter bank implementation may have advantages and disadvantages. For example, some filter bank implementations may have longer delays than others. As described above, various delay blocks may be implemented to account for the delay introduced by the filter banks, for example, to ensure that signals are time-aligned before ducking gains are applied. It should be noted that a filter bank may achieve and/or approximate "exact reconstruction," in which the sum of the unmodified frequency bands is substantially the same as the input signal of the filter bank or a delayed version thereof.

As described above, in some implementations, the input ducking gain and the output ducking gain may be determined by providing a particular frequency band of the input audio signal to two envelope trackers and determining a ratio of the outputs of the two trackers. In some embodiments, each envelope tracker may be associated with a corresponding low-pass filter. In some embodiments, the two low-pass filters may have two different time constants, one much longer than the other. Examples of shorter time constants are 3 milliseconds, 4 milliseconds, 5 milliseconds, and 10 milliseconds. Examples of longer time constants are 60 milliseconds, 70 milliseconds, 80 milliseconds, and 100 milliseconds. Each low-pass filter effectively performs envelope tracking on the particular frequency band of the input audio signal provided as its input, with one low-pass filter performing slow envelope tracking and the other performing fast envelope tracking. Each low-pass filter may be characterized by numerator filter coefficients b and denominator filter coefficients a, where b = [1-c] and a = [1, -c]. Here, c may be determined from the time constant of the filter as c = exp(-1/(t_c*sampling_rate)), where t_c denotes the time constant of the filter in seconds. Taking the cutoff frequency at -3 dB, a low-pass filter with a time constant of 5 milliseconds may have a cutoff frequency of about 32.2 Hz, and a filter with a time constant of 80 milliseconds may have a cutoff frequency of about 2.2 Hz. In some embodiments, the input ducking gain for a particular frequency band may be determined based on the ratio of the output of the low-pass filter with the longer time constant to the output of the low-pass filter with the shorter time constant. In other words, the input ducking gain may correspond to the ratio of the slow envelope to the fast envelope. Conversely, the output ducking gain for a particular frequency band may be determined based on the ratio of the output of the low-pass filter with the shorter time constant to the output of the low-pass filter with the longer time constant. In other words, the output ducking gain may correspond to the ratio of the fast envelope to the slow envelope.
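The envelope trackers described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the default time constants of 5 ms and 80 ms (taken from the example values in the text), and the small offset `eps` used to keep the ratios finite are choices made here, not details confirmed for any particular implementation.

```python
import math

def smoothing_coeff(time_constant_s, sample_rate_hz):
    """c = exp(-1 / (t_c * sampling_rate)), as given in the text."""
    return math.exp(-1.0 / (time_constant_s * sample_rate_hz))

def track_envelope(band, time_constant_s, sample_rate_hz, eps=1e-9):
    """Rectify the band signal, then apply the one-pole low-pass filter
    with b = [1 - c] and a = [1, -c]:  y[n] = (1 - c) * |x[n]| + c * y[n-1].
    `eps` is an assumed small offset that avoids a zero envelope."""
    c = smoothing_coeff(time_constant_s, sample_rate_hz)
    y = 0.0
    envelope = []
    for x in band:
        y = (1.0 - c) * (abs(x) + eps) + c * y
        envelope.append(y)
    return envelope

def envelope_ratios(band, sample_rate_hz, fast_tc_s=0.005, slow_tc_s=0.080):
    """Return (slow/fast, fast/slow) ratios per sample. The first ratio
    drives input ducking, the second drives output ducking."""
    fast = track_envelope(band, fast_tc_s, sample_rate_hz)
    slow = track_envelope(band, slow_tc_s, sample_rate_hz)
    slow_over_fast = [s / f for s, f in zip(slow, fast)]
    fast_over_slow = [f / s for s, f in zip(slow, fast)]
    return slow_over_fast, fast_over_slow
```

Shortly after an onset the fast envelope rises ahead of the slow one, so slow/fast drops below 1. Some time after an offset the fast envelope has decayed further than the slow one, so fast/slow drops below 1.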

In some implementations, a high-pass filter may be applied before the particular frequency band of the input audio signal is provided to the two envelope trackers. The high-pass filter may be used to flatten the spectrum and/or to avoid bias in the presence of low-frequency rumble. In some implementations, the cutoff frequency of the high-pass filter may depend on the frequency band of the input audio signal to which the high-pass filter is applied. For example, a lower cutoff frequency may be used for lower frequency bands than for higher frequency bands. In one example, a cutoff frequency of 3 kHz may be used for a higher frequency band, while a cutoff frequency of 1 kHz may be used for a lower frequency band. Examples of cutoff frequencies for the high-pass filter include 1 kHz, 2 kHz, 3 kHz, and 5 kHz. In some implementations, the high-pass filter may be omitted for some frequency bands.

Figure 3 shows a schematic diagram of an example ducker instance, in accordance with some embodiments. It should be noted that the various blocks of the example ducker instance shown in Figure 3 may be implemented by one or more control systems of a device, such as the control system shown in Figure 7 and described below in connection with that figure. The ducker may take a particular frequency band of the input audio signal as input, and may generate as output an input ducking gain and/or an output ducking gain applicable to that frequency band. For example, the frequency band may be a frequency band of first filter bank 202, as shown in Figure 2 and described above in connection with that figure. It should be noted that the example ducker instance shown in Figure 3 may essentially be replicated for each frequency band of the first filter bank.

As illustrated, the frequency band of the input audio signal may optionally be high-pass filtered using high-pass filter 302. In some implementations, the cutoff frequency of high-pass filter 302 may depend, at least in part, on the frequency band of the input audio signal processed by the ducker instance. For example, a higher cutoff frequency may be used for a higher frequency band, and vice versa. Examples of cutoff frequencies for the high-pass filter include 1 kHz, 2 kHz, 3 kHz, and 5 kHz.

The frequency band of the input audio signal (or, if used, the high-pass filtered version of the frequency band) may be provided to fast envelope tracker 305 and slow envelope tracker 307. Each envelope tracker may include absolute value calculation block 304, which is configured to generate the absolute value of the signal. It should be noted that, in some implementations, a relatively small value, depicted as "ε" in Figure 3, may be added to the absolute value of the signal. As described below, this can prevent divide-by-zero errors when the input ducking gain and/or output ducking gain are determined. As illustrated in Figure 3, fast envelope tracker 305 includes first low-pass filter 306, and slow envelope tracker 307 includes second low-pass filter 308. As illustrated in Figure 3, first low-pass filter 306 may have a shorter time constant than second low-pass filter 308. Examples of shorter time constants include 3 milliseconds, 4 milliseconds, 5 milliseconds, and 10 milliseconds. Examples of longer time constants include 60 milliseconds, 70 milliseconds, 80 milliseconds, 90 milliseconds, and 100 milliseconds.

The output of first low-pass filter 306 (depicted as "f" in Figure 3, denoting the fast envelope) and the output of second low-pass filter 308 (depicted as "s" in Figure 3, denoting the slow envelope) are provided to output ducking gain determination block 310. Similarly, the output of first low-pass filter 306 and the output of second low-pass filter 308 are provided to input ducking gain determination block 312. The output ducking gain may be determined based at least in part on the ratio of the fast envelope to the slow envelope. In particular, as illustrated in Figure 3, if the output of first low-pass filter 306 is denoted f (i.e., the fast envelope) and the output of second low-pass filter 308 is denoted s (i.e., the slow envelope), the initial set of output ducking gains may be determined by:

The initial set of input ducking gains may be determined by:

It should be noted that const, which denotes a multiplicative constant, may be the same for the output ducking gain and the input ducking gain, or may be different for the output ducking gain than for the input ducking gain. Example values of const include 1, 1.05, 1.1, 1.15, and 1.2. It should also be noted that the constants c₁ and c₂ may be different for each frequency band. In particular, the values of c₁ and c₂ may represent the amount of input ducking and output ducking, respectively, to be applied for a frequency band. In other words, c₁ and c₂ may serve as band-dependent corrections to the ducking gains. For example, it may be advantageous to have no ducking in the lowest frequency band. Accordingly, c₁ and c₂ may be 1 for the lowest frequency band. As another example, a relatively large amount of ducking may be applied to the highest frequency band. Accordingly, c₁ and c₂ may be 0 for the highest frequency band, such that the input ducking gain and the output ducking gain are determined from the ratio of the envelope tracker outputs without any band-dependent correction to that ratio. It should be noted that, for a particular frequency band, c₁ and c₂ may be the same as each other or different from each other. In some implementations, c₁ and c₂ may be any suitable values in the range of 0 to 1, inclusive.
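The roles of const and of the band-dependent constants can be illustrated with a small Python sketch. The closed form used here, min(1, c + (1 - c) * const * ratio), is an assumption chosen only because it reproduces the behavior described above (c = 1 gives no ducking, c = 0 gives the uncorrected, clipped ratio). It should not be read as the formula of the source, and the function and parameter names are hypothetical.

```python
def initial_duck_gain(ratio, band_const, mult_const=1.0):
    """Assumed form of an initial ducking gain for one sample of one band.

    ratio:      slow/fast envelope ratio for input ducking, or
                fast/slow envelope ratio for output ducking.
    band_const: band-dependent constant (c1 or c2 in the text), in [0, 1].
    mult_const: multiplicative constant (const in the text).
    NOTE: this expression is an illustrative assumption, not the
    equation given in the source.
    """
    if not 0.0 <= band_const <= 1.0:
        raise ValueError("band_const must lie in [0, 1]")
    # Blend between "no ducking" (gain 1) and the scaled envelope ratio,
    # and never amplify: the gain is capped at 1.
    return min(1.0, band_const + (1.0 - band_const) * mult_const * ratio)
```

Under this assumed form, a lowest band with band_const = 1 always receives a gain of 1, while a highest band with band_const = 0 receives the scaled ratio capped at 1; intermediate values duck proportionally less.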

The initial set of output ducking gains may be provided to output ducking gain update block 313 to determine output ducking gains 314. Similarly, the initial set of input ducking gains may be provided to input ducking gain update block 315 to determine input ducking gains 316. In some implementations, output ducking gain update block 313 and input ducking gain update block 315 may be configured to perform smoothing and/or ducking release control, to avoid undesirable abrupt changes in the applied ducking gains. For example, where the input audio signal includes a transient, there may be an abrupt change in the input ducking gain, as determined by input ducking gain determination block 312, in order to duck the transient. Continuing with this example, input ducking gain update block 315 may then modify the initial set of input ducking gains determined after the transient, such that the modified input ducking gains transition smoothly after the abrupt change caused by the transient.

Example implementations of blocks 313 and 315 are described below. Given initial values of the input ducking gains, denoted in_duck_gains_init, and initial values of the output ducking gains, denoted out_duck_gains_init, the actual input ducking gains (denoted in_duck_gains_act) and the actual output ducking gains (denoted out_duck_gains_act) may be determined by the following pseudocode:

For each sample s:

    in_duck_state = (in_duck_state - 1) * in_duck_c + 1
    if (in_duck_gains_init(s) < in_duck_state)
        in_duck_state = in_duck_gains_init(s)
    in_duck_gains_act(s) = in_duck_state

In the above, in_duck_state represents the gain state carried from one time frame to the next. The initial value of in_duck_state may be set between 0 and 1. In the pseudocode example given above, in_duck_c represents a release constant, which controls how quickly the ducking gain is released. In other words, in_duck_c can be used to control the transition of the ducking gain from a low value to a high value. In the technique described above, the input ducking gain is released according to the release constant, and is then updated whenever the new ducking gain sample is less than the released value.

A similar approach can be used for the output ducking gains, as shown in the pseudocode sample given below.

For each sample s:

    out_duck_state = (out_duck_state - 1) * out_duck_c + 1
    if (out_duck_gains_init(s) < out_duck_state)
        out_duck_state = out_duck_gains_init(s)
    out_duck_gains_act(s) = out_duck_state

In the pseudocode example given above, out_duck_state represents the gain state carried from one time frame to the next. The initial value of out_duck_state may be set between 0 and 1. In the example given above, out_duck_c is a release constant, which controls how quickly the ducking gain is released. In other words, out_duck_c can be used to control the transition of the ducking gain from a low value to a high value. In the example given above, the output ducking gain may be released according to the release constant, and may then be updated whenever the new ducking gain sample is less than the released value.
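Because the input and output update loops share the same structure, both can be expressed with one routine. The following Python sketch mirrors the pseudocode above; the function name `smooth_duck_gains`, the choice to return the final state so that it can be carried into the next frame, and the release constant used in the usage note are illustrative assumptions.

```python
def smooth_duck_gains(init_gains, release_c, state=1.0):
    """Apply release smoothing to a sequence of initial ducking gains.

    init_gains: initial gains (in_duck_gains_init or out_duck_gains_init).
    release_c:  release constant (in_duck_c or out_duck_c); values closer
                to 1 give a slower release.
    state:      gain state carried over from the previous frame.
    Returns (actual_gains, final_state).
    """
    actual_gains = []
    for gain in init_gains:
        # Release: move the state toward 1 (no ducking).
        state = (state - 1.0) * release_c + 1.0
        # Attack: follow the new gain immediately if it asks for more ducking.
        if gain < state:
            state = gain
        actual_gains.append(state)
    return actual_gains, state
```

For example, with release_c = 0.5 and initial gains [1.0, 0.2, 1.0, 1.0], the gain drops to 0.2 at once and then recovers gradually (approximately 0.6, then 0.8) rather than jumping back to 1.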

As described above, the decoder may implement various filter banks to divide an audio signal into multiple band-limited signals based on the frequency bands of the filter bank. For example, a filter bank may divide the input audio signal into multiple frequency bands in order to determine input ducking gains and/or output ducking gains on a per-band basis. As another example, a filter bank may divide the input audio signal into multiple frequency bands in order to apply the input ducking gains on a per-band basis. As yet another example, a filter bank may divide a wideband decorrelated signal, to which input ducking gains may already have been applied, into multiple frequency bands before output ducking gains are applied on a per-band basis. As described above, where multiple filter banks are implemented, the filter banks may be multiple instances of the same filter bank, or may vary in one or more characteristics (such as the number of frequency bands, the frequency response, or the type of filter used). A filter bank may divide a signal into any suitable number of frequency bands, such as two, three, five, eight, or 16. In one example, a filter bank divides the signal into three frequency bands, corresponding to low, mid, and high frequencies. Example types of filters that may be used include infinite impulse response (IIR) filters and finite impulse response (FIR) filters. Each type of filter may be associated with a different complexity, which allows an implementation to trade off filtering characteristics against computational complexity.

Figure 4 shows the frequency responses of the frequency bands of an example filter bank that may be used, in accordance with some embodiments. The example shown in Figure 4 uses three zero-delay first-order IIR filters. The three filters correspond to low band 402, mid band 404, and high band 406. In the example shown in Figure 4, the cutoff frequency of low band 402 is 200 Hz, and the cutoff frequency of high band 406 is 2 kHz. Mid band 404 is derived from low band 402 and high band 406, for example, to obtain perfect reconstruction of a signal passed through the filter bank. Note that, where the ducking gains are determined to be 1 or close to 1, perfect reconstruction allows the signal to remain effectively unmodified. It should be noted that the example shown in Figure 4 is merely illustrative, and a filter bank implemented by the decoder may differ from the filter bank illustrated in Figure 4 in the number of frequency bands, the cutoff frequency of each frequency band, the type of filter used, complexity, delay, and so on.
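A three-band split with perfect reconstruction can be sketched in Python as follows. This is a minimal illustration under assumptions: the source does not give filter coefficients, so the one-pole low-pass design, the coefficient formula exp(-2π·fc/fs), and the function names are choices made here. Only the 200 Hz and 2 kHz cutoffs and the idea of deriving the mid band from the other two come from the text.

```python
import math

def one_pole_lowpass(x, cutoff_hz, sample_rate_hz):
    """First-order IIR low-pass: y[n] = (1 - c) * x[n] + c * y[n-1].
    The coefficient formula is an assumed, commonly used approximation."""
    c = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    y = 0.0
    out = []
    for v in x:
        y = (1.0 - c) * v + c * y
        out.append(y)
    return out

def split_three_bands(x, sample_rate_hz, low_cut_hz=200.0, high_cut_hz=2000.0):
    """Split x into (low, mid, high) such that low + mid + high == x."""
    low = one_pole_lowpass(x, low_cut_hz, sample_rate_hz)
    below_high_cut = one_pole_lowpass(x, high_cut_hz, sample_rate_hz)
    # High band: complement of a low-pass at the upper cutoff.
    high = [v - b for v, b in zip(x, below_high_cut)]
    # Mid band: whatever remains, which guarantees perfect reconstruction.
    mid = [v - l - h for v, l, h in zip(x, low, high)]
    return low, mid, high
```

Because the mid band is defined as the remainder, summing the three unmodified bands returns the input up to floating-point rounding, which is the property that leaves the signal effectively untouched when all ducking gains are 1.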

Figure 5 is a flowchart of an example process 500 for applying ducking gains on a per-band basis, in accordance with some embodiments. In some implementations, the blocks of process 500 may be implemented using a control system of a decoder device. Such a control system is shown in Figure 7 and described below in connection with that figure. In some embodiments, the blocks of process 500 may be performed in an order other than that shown in Figure 5. In some implementations, two or more blocks of process 500 may be performed substantially in parallel. In some implementations, one or more blocks of process 500 may be omitted.

Process 500 may begin at 502 by receiving an input audio signal or a frame of an input audio signal. In some implementations, the input audio signal may be received by a receiver device of the decoder, such as an antenna. In some embodiments, the input audio signal may be received at the decoder from an encoder device that transmitted the input audio signal. It should be noted that, in some implementations, the received input audio signal may be a downmix audio signal that was downmixed by the encoder before being transmitted to the decoder. In some such implementations, the decoder may additionally receive metadata or side information that can be used to upmix the downmix signal, for example, to generate a reconstructed audio signal, as described above in connection with Figure 1.

At 504, process 500 may divide the input audio signal into multiple frequency bands. For example, in some implementations, process 500 may provide the input audio signal to a first filter bank that splits the input audio signal into corresponding frequency bands. Any suitable number of frequency bands may be used, such as two, three, five, eight, 16, etc. In one example, the input audio signal may be divided into three frequency bands corresponding to a low band, a mid band, and a high band, similar to the example shown in Figure 4 and described above in connection with that figure.

At 506, process 500 may determine input ducking gains and/or output ducking gains corresponding to the multiple frequency bands. For example, as shown in Figure 3 and described above in connection with that figure, process 500 may apply two envelope trackers to each frequency band, the first envelope tracker corresponding to fast envelope tracking and the second envelope tracker corresponding to slow envelope tracking. As part of the envelope tracking, process 500 may apply two low-pass filters to each frequency band after an absolute value calculation (e.g., rectification), the first low-pass filter having a relatively short time constant and the second low-pass filter having a longer time constant. The first low-pass filter may generate an output representing the fast envelope tracking, generally referred to herein as f, and the second low-pass filter may generate an output representing the slow envelope tracking, generally referred to herein as s. As shown in Figure 3 and described above in connection with that figure, the input ducking gain can be determined by:

The output ducking gain can be determined by:

As shown in the equations above, the input ducking gain and the output ducking gain can be determined based on a ratio of the outputs of the two envelope trackers, where the ratio is modified based on constants selected for each frequency band (denoted c₁ and c₂ in the equations above). For example, the input ducking gain may generally be determined based on the ratio of the slow envelope tracking to the fast envelope tracking, where the amount by which each is weighted in the ratio is modified by the constant c₁. Similarly, the output ducking gain may generally be determined based on the ratio of the fast envelope tracking to the slow envelope tracking, where the amount by which each is weighted in the ratio is modified by the constant c₂. As described above, the input ducking gain and/or the output ducking gain may subsequently be modified, for example, using an input ducking gain update block and/or an output ducking gain update block, as described above in connection with Figure 3.
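The fast and slow envelope tracking described above (rectification followed by low-pass filtering with two different time constants) can be sketched as follows. The one-pole smoother, the 1 ms and 50 ms time constants, the 48 kHz sample rate, and the function name `envelope` are assumptions made for illustration only; the text does not specify them.

```python
import math

def envelope(x, time_constant_s, fs):
    """Rectify the signal, then smooth it with a one-pole low-pass filter."""
    a = 1.0 - math.exp(-1.0 / (time_constant_s * fs))
    out, state = [], 0.0
    for sample in x:
        state += a * (abs(sample) - state)  # abs() is the rectification step
        out.append(state)
    return out

fs = 48000  # assumed sample rate
# Silence followed by a sudden burst: a crude stand-in for a transient onset.
band = [0.0] * 480 + [1.0] * 480
f = envelope(band, 0.001, fs)  # fast tracker, assumed 1 ms time constant
s = envelope(band, 0.050, fs)  # slow tracker, assumed 50 ms time constant
onset = 480 + 48               # about 1 ms after the burst starts
slow_to_fast = s[onset] / f[onset]
```

Shortly after the onset the fast envelope has risen well above the slow one, so the slow-to-fast ratio drops below 1, which is the condition under which an input ducking gain based on that ratio attenuates the transient.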

It should be noted that, in some implementations, process 500 may obtain or determine the values of c₁ and c₂ for a particular frequency band before determining the input ducking gain and/or the output ducking gain for that band. In some embodiments, the values of c₁ and c₂ may be fixed for a particular frequency band. For example, in some embodiments, c₁ and c₂ may be fixed at 1 for the lowest frequency band, so that the lowest band is not ducked. Continuing the example, in some embodiments, c₁ and c₂ may be set to 0 for the highest frequency band, so that the input ducking gain is determined from the unadjusted ratio of the slow envelope tracking to the fast envelope tracking, and the output ducking gain is determined from the unadjusted ratio of the fast envelope tracking to the slow envelope tracking.
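The two limiting cases just described can be checked with a small sketch. The gain equations themselves are not reproduced in this text, so the expressions below are a hypothetical form chosen only because it matches the stated behavior: a constant of 1 yields a gain of 1 (no ducking), and a constant of 0 yields the plain envelope ratio. The clamp to 1, the small `eps` guard, and the function names are likewise assumptions, not the formulas of the disclosure.

```python
def input_ducking_gain(f, s, c1, eps=1e-12):
    """Hypothetical form: slow-to-fast ratio, with c1 blending toward 1."""
    ratio = (s + c1 * f + eps) / (f + c1 * s + eps)
    return min(1.0, ratio)  # assumed clamp so the gain only attenuates

def output_ducking_gain(f, s, c2, eps=1e-12):
    """Hypothetical form: fast-to-slow ratio, with c2 blending toward 1."""
    ratio = (f + c2 * s + eps) / (s + c2 * f + eps)
    return min(1.0, ratio)  # assumed clamp so the gain only attenuates

# Onset: fast envelope above slow envelope.
g_in_low = input_ducking_gain(f=0.8, s=0.2, c1=1.0)   # lowest band, c1 = 1
g_in_high = input_ducking_gain(f=0.8, s=0.2, c1=0.0)  # highest band, c1 = 0
# Offset: fast envelope has already decayed below the slow envelope.
g_out_high = output_ducking_gain(f=0.2, s=0.8, c2=0.0)
```

Under this assumed form, intermediate values of the constants give intermediate amounts of ducking, which is one way a per-band constant could control how strongly each band reacts.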

Additionally, it should be noted that, for particular ones of the multiple frequency bands, a high-pass filter may be applied before the input signal is provided to the fast envelope tracker and the slow envelope tracker, as shown in Figure 3 and described above in connection with that figure. The high-pass filter can be used to flatten the spectrum and/or to avoid bias in the presence of low-frequency rumble. In some implementations, the high-pass filter may be applied to only a subset of the multiple frequency bands. In some embodiments, the cutoff frequency of the high-pass filter may differ between frequency bands. As described above in connection with Figure 3, example cutoff frequencies include 1.5 kHz, 2 kHz, 2.5 kHz, 3 kHz, 3.5 kHz, 4 kHz, etc.
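The effect of high-pass filtering ahead of the envelope trackers can be sketched as below. The first-order filter structure, the 40 Hz test tone standing in for rumble, the 48 kHz sample rate, and the name `one_pole_highpass` are assumptions for illustration; only the 2 kHz cutoff is taken from the example values in the text.

```python
import math

def one_pole_highpass(x, cutoff_hz, fs):
    """First-order high-pass formed as the input minus a one-pole low-pass."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, state = [], 0.0
    for v in x:
        state += a * (v - state)  # running low-pass estimate
        out.append(v - state)     # what remains is the high-pass output
    return out

fs = 48000  # assumed sample rate
rumble = [math.sin(2 * math.pi * 40 * n / fs) for n in range(4800)]
click = [1.0 if n == 2400 else 0.0 for n in range(4800)]

rumble_only = one_pole_highpass(rumble, 2000.0, fs)
filtered = one_pole_highpass([r + c for r, c in zip(rumble, click)], 2000.0, fs)

rumble_peak = max(abs(v) for v in rumble_only)
click_peak = abs(filtered[2400])
```

The full-scale rumble is reduced to a small residual while the click passes largely intact, so envelopes computed on the filtered signal respond to the transient rather than to the low-frequency content.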

At 508, process 500 may apply the input ducking gains to the multiple frequency bands. As shown in Figure 2 and described above in connection with that figure, in some embodiments, process 500 may apply the input ducking gains by first delaying the input audio signal by an amount determined at least in part by the delay introduced by the first filter bank used in connection with block 504, and then applying a second filter bank to the delayed input audio signal to split it into multiple frequency bands. The input ducking gains may then be applied to the frequency bands of the delayed input audio signal, for example, by multiplying the signal of a particular frequency band by the corresponding input ducking gain or gains for that band. It should be noted that, in some implementations, there may be multiple time-varying input ducking gains for a particular frequency band, such that each sample of the band-limited audio signal in the time domain is ducked by a corresponding sample of the input ducking gain. In some embodiments, the second filter bank may be a second instance of the first filter bank. In other words, in some implementations, the filter bank used to determine the ducking gains may have the same characteristics as the filter bank used to generate the frequency bands of the input audio signal to which the input ducking gains are applied. Conversely, in some implementations, the first filter bank may differ from the second filter bank in one or more characteristics (e.g., frequency response, number of frequency bands, type of filters used, etc.).

At 510, process 500 may aggregate the signals of the multiple frequency bands to generate a first ducked version of the input audio signal. For example, in some embodiments, process 500 may sum the multiple frequency bands. In some implementations, process 500 may generate a time-domain version of the aggregated signal to generate the first ducked version of the input audio signal.
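Blocks 508 and 510 amount to a delay, a per-sample multiplication in each band, and a sum across bands. The sketch below shows only those operations on precomputed band signals; the helper names `delay` and `apply_band_gains` and the toy values are hypothetical, and the band-splitting step is omitted.

```python
def delay(x, n):
    """Delay x by n samples, zero-padding the start and keeping the length."""
    return ([0.0] * n + list(x))[:len(x)]

def apply_band_gains(bands, gains):
    """Scale each band sample by its time-aligned gain, then sum the bands."""
    ducked = [[g * v for g, v in zip(band_gain, band)]
              for band_gain, band in zip(gains, bands)]
    return [sum(samples) for samples in zip(*ducked)]

# Toy example with two bands of four samples each.
bands = [[1.0, 1.0, 1.0, 1.0],   # low band, left untouched
         [0.5, 0.5, 0.5, 0.5]]   # high band, ducked at samples 1 and 2
gains = [[1.0, 1.0, 1.0, 1.0],
         [1.0, 0.5, 0.25, 1.0]]
ducked_wideband = apply_band_gains(bands, gains)
delayed = delay([1.0, 2.0, 3.0, 4.0], 2)
```

Because the gains are time-varying sequences rather than single numbers, each band can be attenuated for just the few samples around a transient while the other bands pass unchanged.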

At 512, process 500 may generate a decorrelated signal by providing the first ducked version of the input audio signal to a decorrelator. In some implementations, one or more decorrelated signals may be generated. In some embodiments, the number of decorrelated signals generated by the decorrelator may depend on the number of signals to be parametrically reconstructed from the metadata or side information, as shown in Figures 1 and 2 and described above in connection with those figures.

At 514, process 500 may split the decorrelated signal into multiple frequency bands. In some implementations, a filter bank may be used to split each decorrelated signal, as shown in Figures 2 and 4 and described above in connection with those figures. In some embodiments, the filter bank may be the same as the filter bank used in connection with blocks 504 and/or 508. Conversely, in some embodiments, the filter bank may have one or more characteristics that differ from those of the filter bank used in connection with blocks 504 and/or 508.

At 516, process 500 may apply the output ducking gains, which were determined at block 506, to the multiple frequency bands of the decorrelated signal. In some implementations, the output ducking gains may be applied to a particular frequency band by multiplying the signal of that band by the corresponding output ducking gain or gains. It should be noted that, in some implementations, there may be multiple time-varying output ducking gains for a particular frequency band, such that each sample of the band-limited decorrelated audio signal in the time domain is ducked by a corresponding sample of the output ducking gain. In some implementations, the output ducking gains may be applied to each decorrelated signal separately.

At 518, process 500 may generate a wideband version of the ducked decorrelated signal. For example, for a particular decorrelated signal, process 500 may sum the signals of the multiple frequency bands after the output ducking gains have been applied. Continuing the example, process 500 may generate a time-domain representation of the summed or aggregated signal to generate the ducked decorrelated signal.

It should be noted that, although process 500 describes applying both input ducking gains and output ducking gains, in some implementations either the input ducking gains or the output ducking gains may be applied without the other. For example, input ducking gains may be applied to duck transients in particular frequency bands before the signal is provided to the decorrelator. Continuing the example, output ducking gains may not be applied to the one or more decorrelated signals, for example, where no offsets are present. As another example, output ducking gains may be applied to duck offset portions of one or more decorrelated signals generated by the decorrelator, without input ducking gains having first been applied to the signal provided to the decorrelator. As a more specific example, input ducking gains may not be applied where the input audio signal does not include a particular type of signal, such as a transient.
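The overall flow of blocks 508 through 518, with either ducking stage optional, can be sketched as follows. Everything here is a simplified stand-in: the two-band split, the plain delay used in place of a real decorrelator, and all function names are hypothetical and chosen only to keep the example short.

```python
def split_two_bands(x, a=0.2):
    """Toy two-band split: one-pole low-pass plus its complement."""
    low, state = [], 0.0
    for v in x:
        state += a * (v - state)
        low.append(state)
    high = [v - l for v, l in zip(x, low)]
    return [low, high]

def duck(x, gains):
    """Split x into bands, scale each band by per-sample gains, and sum."""
    bands = split_two_bands(x)
    return [sum(g[n] * b[n] for g, b in zip(gains, bands))
            for n in range(len(x))]

def placeholder_decorrelator(x, lag=3):
    """Stand-in only: a plain delay, not an actual decorrelator."""
    return ([0.0] * lag + list(x))[:len(x)]

def process(x, in_gains=None, out_gains=None):
    """Optionally duck before and/or after the decorrelator."""
    pre = duck(x, in_gains) if in_gains is not None else list(x)
    d = placeholder_decorrelator(pre)
    return duck(d, out_gains) if out_gains is not None else d

x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
unity = [[1.0] * 6, [1.0] * 6]
mute = [[0.0] * 6, [0.0] * 6]

no_ducking = process(x)
unity_ducking = process(x, in_gains=unity, out_gains=unity)
input_only = process(x, in_gains=mute)
```

Passing `None` for one set of gains skips that stage entirely, and unit gains on both stages reproduce the undecked path up to rounding, mirroring the perfect-reconstruction behavior noted for the filter bank.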

Additionally, it should be noted that the decoder may use each ducked decorrelated signal to upmix the downmixed input audio signal. For example, as shown in Figure 1 and described above in connection with that figure, the ducked decorrelated signal may be provided to a spatial reconstruction codec, which takes the ducked decorrelated signal(s) and the side information or metadata provided by the encoder and upmixes the downmixed input audio signal. In some implementations, the upmixed audio signal may then be rendered, for example, to create a spatial perception when the rendered audio signal is presented. In some implementations, the decoder device may cause the rendered audio signal to be presented, for example, through one or more loudspeakers, headphones, etc.

Figure 6 illustrates example use cases of an IVAS system 600 according to an embodiment. In some embodiments, various devices communicate through a call server 602 that is configured to receive audio signals from, for example, a public switched telephone network (PSTN) or a public land mobile network (PLMN) device, illustrated by PSTN/Other PLMN 604. The use cases support legacy devices 606 that render and capture audio in mono only, including but not limited to devices that support Enhanced Voice Services (EVS), Adaptive Multi-Rate Wideband (AMR-WB), and Adaptive Multi-Rate Narrowband (AMR-NB). The use cases also support user equipment (UE) 608 and/or 614 that captures and renders stereo audio signals, and UE 610 that captures a mono signal and binaurally renders it as a multichannel signal. The use cases also support immersive and stereo signals captured and rendered by video conference room systems 616 and/or 618, respectively. The use cases further support stereo capture and immersive rendering of stereo audio signals for a home theater system 620, and a computer 612 for mono capture and immersive rendering of audio signals for virtual reality (VR) gear 622 and immersive content ingest 624.

Figure 7 is a block diagram illustrating an example of components of an apparatus capable of implementing various aspects of the present disclosure. As with the other figures provided herein, the types and numbers of elements shown in Figure 7 are provided as examples only. Other implementations may include more, fewer, and/or different types and numbers of elements. According to some examples, apparatus 700 may be configured to perform at least some of the methods disclosed herein. In some implementations, apparatus 700 may be or may include a television, one or more components of an audio system, a mobile device (such as a cellular telephone), a laptop computer, a tablet device, a smart speaker, or another type of device.

According to some alternative implementations, apparatus 700 may be or may include a server. In some such examples, apparatus 700 may be or may include an encoder. Accordingly, in some cases apparatus 700 may be a device configured for use within an audio environment, such as a home audio environment, whereas in other cases apparatus 700 may be a device configured for use in "the cloud", e.g., a server.

In this example, apparatus 700 includes an interface system 705 and a control system 710. In some implementations, interface system 705 may be configured to communicate with one or more other devices of an audio environment. In some examples, the audio environment may be a home audio environment. In other examples, the audio environment may be another type of environment, such as an office environment, an automobile environment, a train environment, a street or sidewalk environment, a park environment, etc. In some implementations, interface system 705 may be configured to exchange control information and associated data with audio devices of the audio environment. In some examples, the control information and associated data may relate to one or more software applications being executed by apparatus 700.

In some implementations, interface system 705 may be configured to receive, or to provide, a content stream. The content stream may include audio data. The audio data may include, but is not limited to, audio signals. In some cases, the audio data may include spatial data, such as channel data and/or spatial metadata. In some examples, the content stream may include video data and audio data corresponding to the video data.

Interface system 705 may include one or more network interfaces and/or one or more external device interfaces (such as one or more Universal Serial Bus (USB) interfaces). According to some implementations, interface system 705 may include one or more wireless interfaces. Interface system 705 may include one or more devices for implementing a user interface, such as one or more microphones, one or more speakers, a display system, a touch sensor system, and/or a gesture sensor system. In some examples, interface system 705 may include one or more interfaces between control system 710 and a memory system, such as the optional memory system 715 shown in Figure 7. However, in some cases control system 710 may include a memory system. In some implementations, interface system 705 may be configured to receive input from one or more microphones in an environment.

Control system 710 may include, for example, a general-purpose single-chip or multi-chip processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, and/or discrete hardware components.

In some implementations, control system 710 may reside in more than one device. For example, in some implementations, a portion of control system 710 may reside in a device within one of the environments depicted herein, and another portion of control system 710 may reside in a device outside the environment, such as a server, a mobile device (e.g., a smartphone or tablet computer), etc. In other examples, a portion of control system 710 may reside in a device within an environment, and another portion of control system 710 may reside in one or more other devices of that environment. For example, a portion of control system 710 may reside in a device that implements a cloud-based service, such as a server, and another portion of control system 710 may reside in another device that implements the cloud-based service, such as another server, a memory device, etc. In some examples, interface system 705 may also reside in more than one device.

In some implementations, control system 710 may be configured to perform, at least in part, the methods disclosed herein. According to some examples, control system 710 may be configured to implement methods of splitting an audio signal into multiple frequency bands, determining input ducking gains and/or output ducking gains on a per-band basis, applying input ducking gains to each band, applying a decorrelator to a wideband audio signal, applying output ducking gains to the decorrelated audio signal on a per-band basis, and so on.

Some or all of the methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc. The one or more non-transitory media may, for example, reside in the optional memory system 715 shown in Figure 7 and/or in control system 710. Accordingly, various innovative aspects of the subject matter described in this disclosure can be implemented in one or more non-transitory media having software stored thereon. The software may include, for example, instructions for splitting an audio signal into multiple frequency bands, determining input ducking gains and/or output ducking gains on a per-band basis, applying input ducking gains to each band, applying a decorrelator to a wideband audio signal, applying output ducking gains to the decorrelated audio signal on a per-band basis, and so on. The software may be executed, for example, by one or more components of a control system such as control system 710 of Figure 7.

In some examples, apparatus 700 may include the optional microphone system 720 shown in Figure 7. The optional microphone system 720 may include one or more microphones. In some implementations, one or more of the microphones may be part of, or associated with, another device, such as a speaker of a speaker system, a smart audio device, etc. In some examples, apparatus 700 may not include a microphone system 720. However, in some such implementations, apparatus 700 may nonetheless be configured to receive, via interface system 705, microphone data for one or more microphones in an audio environment. In some such implementations, a cloud-based implementation of apparatus 700 may be configured to receive, via interface system 705, microphone data, or a noise metric corresponding at least in part to the microphone data, from one or more microphones in the audio environment.

According to some implementations, apparatus 700 may include the optional loudspeaker system 725 shown in Figure 7. The optional loudspeaker system 725 may include one or more loudspeakers, which may also be referred to herein as "speakers" or, more generally, as "audio reproduction transducers." In some examples (e.g., cloud-based implementations), apparatus 700 may not include a loudspeaker system 725. In some implementations, apparatus 700 may include headphones. The headphones may be connected or coupled to apparatus 700 via a headphone jack or via a wireless connection (e.g., Bluetooth).

Some aspects of the present disclosure include a system or device configured (e.g., programmed) to perform one or more examples of the disclosed methods, and a tangible computer-readable medium (e.g., a disk) that stores code for implementing one or more examples of the disclosed methods or steps thereof. For example, some disclosed systems may be or include a programmable general-purpose processor, digital signal processor, or microprocessor that is programmed with software or firmware and/or otherwise configured to perform any of a variety of operations on data, including an embodiment of the disclosed methods or steps thereof. Such a general-purpose processor may be or include a computer system including an input device, a memory, and a processing subsystem that is programmed (and/or otherwise configured) to perform one or more examples of the disclosed methods (or steps thereof) in response to data asserted to it.

Some embodiments may be implemented as a configurable (e.g., programmable) digital signal processor (DSP) that is configured (e.g., programmed and otherwise configured) to perform required processing on audio signal(s), including performance of one or more examples of the disclosed methods. Alternatively, embodiments of the disclosed systems (or elements thereof) may be implemented as a general-purpose processor (e.g., a personal computer (PC) or other computer system or microprocessor, which may include an input device and a memory) that is programmed with software or firmware and/or otherwise configured to perform any of a variety of operations, including one or more examples of the disclosed methods. Alternatively, elements of some embodiments of the system are implemented as a general-purpose processor or DSP configured (e.g., programmed) to perform one or more examples of the disclosed methods, and the system also includes other elements. The other elements may include one or more loudspeakers and/or one or more microphones. A general-purpose processor configured to perform one or more examples of the disclosed methods may be coupled to an input device, such as a mouse and/or a keyboard. The general-purpose processor may also be coupled to a memory, a display device, etc.

Another aspect of the present disclosure is a computer-readable medium (such as a disk or other tangible storage medium) that stores code for performing (e.g., code executable to perform) one or more examples of the disclosed methods or steps thereof.

While specific embodiments of the present disclosure and applications of the disclosure have been described herein, it will be apparent to those of ordinary skill in the art that many variations on the embodiments and applications described herein are possible without departing from the scope of the disclosure described and claimed herein. It should be understood that, while certain forms of the disclosure have been shown and described, the disclosure is not limited to the specific embodiments described and shown or to the specific methods described.

Claims

1. A method for processing audio signals, the method comprising:

receiving an input audio signal at a decoder, wherein the input audio signal is a downmix audio signal;

dividing the input audio signal into a first set of frequency bands;

determining a set of ducking gains, the ducking gains of the set of ducking gains corresponding to frequency bands of the first set of frequency bands; and

generating at least one wideband decorrelated audio signal, wherein the at least one wideband decorrelated audio signal is usable to upmix the downmix audio signal, and wherein ducking gains of the set of ducking gains are applied to at least one of: 1) a second set of frequency bands prior to generating the at least one wideband decorrelated audio signal; or 2) a third set of frequency bands into which the at least one wideband decorrelated audio signal is separated.

2. The method of claim 1, wherein the set of ducking gains includes a set of input ducking gains, and the method further comprises applying input ducking gains of the set of input ducking gains to the second set of frequency bands prior to generating the at least one wideband decorrelated audio signal.

3. The method of claim 2, wherein ducked signals associated with frequency bands of the second set of frequency bands are aggregated to generate a wideband ducked signal, the wideband ducked signal being provided to a decorrelator, the decorrelator being configured to generate the at least one wideband decorrelated audio signal.

4. The method of any one of claims 1 to 3, wherein the first set of frequency bands and the second set of frequency bands are two instances of the same set of frequency bands.

5. The method of any one of claims 1 to 4, wherein the set of ducking gains includes a set of output ducking gains, and the method further comprises:

applying output ducking gains of the set of output ducking gains to the third set of frequency bands to generate at least one set of ducked decorrelated audio signals, each ducked decorrelated audio signal of the at least one set corresponding to a frequency band in the third set of frequency bands; and

aggregating the ducked decorrelated audio signals in the at least one set of ducked decorrelated audio signals to generate at least one wideband ducked decorrelated audio signal, the at least one wideband ducked decorrelated audio signal being usable to upmix the downmix audio signal.

6. The method of any one of claims 1 to 5, wherein determining the set of ducking gains includes:

determining one or more initial ducking gains; and

modifying at least one of the one or more initial ducking gains to generate the set of ducking gains, wherein the at least one of the one or more initial ducking gains is modified by performing update and/or release control.

7. The method of any one of claims 1 to 6, wherein, for a frequency band in the first set of frequency bands, a corresponding ducking gain is determined based on a ratio comprising the outputs of two envelope trackers, the two envelope trackers corresponding to a slow envelope tracker and a fast envelope tracker.

8. The method of claim 7, wherein the slow envelope tracker includes an absolute value calculation block and a first low pass filter, and wherein the fast envelope tracker includes the absolute value calculation block and a second low-pass filter, the first low-pass filter and the second low-pass filter having different time constants.

9. The method of claim 7, further comprising applying a high-pass filter to at least one frequency band of the first set of frequency bands, wherein an output of the high-pass filter is provided to at least one of the two envelope trackers.

10. The method of claim 9, wherein the high-pass filter is applied to two or more frequency bands of the first set of frequency bands, and wherein the high-pass filter applied to a first one of the two or more frequency bands has a different cutoff frequency than the high-pass filter applied to a second one of the two or more frequency bands.

11. The method of any one of claims 7 to 10, wherein the first low-pass filter of the slow envelope tracker has a longer time constant than the second low-pass filter of the fast envelope tracker, and wherein the ratio includes the ratio of the output of the slow envelope tracker to the output of the fast envelope tracker.

12. The method of any one of claims 7 to 11, wherein the first low-pass filter of the slow envelope tracker has a longer time constant than the second low-pass filter of the fast envelope tracker, and wherein the ratio includes the ratio of the output of the fast envelope tracker to the output of the slow envelope tracker.

13. The method of any one of claims 7 to 12, wherein the ratio includes a constant specific to a frequency band in the first set of frequency bands, the constant being selected to control at least one of: 1) the amount of ducking gain applied to each of the second set of frequency bands; or 2) the amount of ducking gain applied to each of the third set of frequency bands.

14. The method of any one of claims 1 to 13, wherein dividing the input audio signal into the first set of frequency bands includes providing the input audio signal to a filter bank.

15. The method of claim 14, wherein the filter bank is implemented as an infinite impulse response (IIR) filter bank or a finite impulse response (FIR) filter bank.

16. The method of any one of claims 1 to 15, wherein the first set of frequency bands, the second set of frequency bands and/or the third set of frequency bands comprise three frequency bands.

17. The method of any one of claims 1 to 16, wherein the first set of frequency bands and the third set of frequency bands are the same.

18. The method of any one of claims 1 to 17, wherein the at least one wideband decorrelated audio signal includes two or more wideband decorrelated audio signals.

19. The method of any one of claims 1 to 18, further comprising upmixing the downmix audio signal, using the at least one wideband decorrelated audio signal and metadata received at the decoder, to generate a reconstructed audio signal.

20. The method of claim 19, further comprising rendering the reconstructed audio signal to generate a rendered audio signal.

21. The method of claim 20, further comprising presenting the rendered audio signal using one or more of: a loudspeaker or headphones.

22. An apparatus configured for carrying out the method of any one of claims 1 to 21.

23. One or more non-transitory media having stored thereon software comprising instructions for controlling one or more devices to perform the method of any one of claims 1 to 21.
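
By way of non-limiting illustration only, the input-side ducking path recited in claims 2, 3, 7, 8, 11, 13 and 16 may be sketched as follows. The sketch is not the claimed implementation. The one-pole filters, the toy three-band split, the crossover frequencies, the time constants, the clamping of the gain to 1, and every function and parameter name (one_pole_lowpass, split_three_bands, ducking_gains, duck_input, c, eps) are assumptions introduced here for illustration and do not appear in the disclosure.

```python
import math

def one_pole_lowpass(x, tau_s, fs):
    """One-pole IIR low-pass filter with time constant tau_s in seconds."""
    a = math.exp(-1.0 / (tau_s * fs))
    y, state = [], 0.0
    for v in x:
        state = a * state + (1.0 - a) * v
        y.append(state)
    return y

def split_three_bands(x, fs, f_lo=500.0, f_hi=4000.0):
    """Toy complementary IIR filter bank (assumed, not from the disclosure).

    The three bands sum back to the input sample by sample.
    """
    lp_lo = one_pole_lowpass(x, 1.0 / (2.0 * math.pi * f_lo), fs)
    lp_hi = one_pole_lowpass(x, 1.0 / (2.0 * math.pi * f_hi), fs)
    low = lp_lo
    mid = [h - l for h, l in zip(lp_hi, lp_lo)]
    high = [v - h for v, h in zip(x, lp_hi)]
    return [low, mid, high]

def ducking_gains(band, fs, c=1.0, tau_slow=0.080, tau_fast=0.005, eps=1e-9):
    """Per-sample ducking gain from the ratio of slow to fast envelopes.

    Both trackers share one absolute value block and differ only in the
    time constant of their low-pass filter. The band-specific constant c,
    the eps guard and the clamp to 1.0 are illustrative assumptions.
    """
    rectified = [abs(v) for v in band]
    slow = one_pole_lowpass(rectified, tau_slow, fs)
    fast = one_pole_lowpass(rectified, tau_fast, fs)
    return [min(1.0, (c * s + eps) / (f + eps)) for s, f in zip(slow, fast)]

def duck_input(x, fs, constants=(1.0, 1.0, 1.0)):
    """Apply per-band ducking gains, then aggregate into one wideband signal."""
    out = [0.0] * len(x)
    for band, c in zip(split_three_bands(x, fs), constants):
        gains = ducking_gains(band, fs, c)
        for n, (g, b) in enumerate(zip(gains, band)):
            out[n] += g * b
    return out

if __name__ == "__main__":
    fs = 48000
    x = [0.0] * 1000 + [1.0] * 1000  # silence followed by an abrupt onset
    ducked = duck_input(x, fs)
    print(len(ducked), ducked[1000] < x[1000])
```

In this sketch the fast envelope rises more quickly than the slow envelope at an onset, so the slow-to-fast ratio falls below 1 and the band is attenuated before the aggregated signal would be passed to a decorrelator. Using the inverse ratio, as in claim 12, or applying the gains to bands of the decorrelator output, as in claim 5, would follow the same pattern.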