HK40124189A - Methods, apparatus and systems for performing perceptually motivated gain control - Google Patents
- Publication number: HK40124189A (also indexed as HK62025112713.5A)
- Authority: HK (Hong Kong)
- Prior art keywords: gain, frame, audio signal, conversion function, encoder
Description
Cross-Reference to Related Applications
This application claims priority to U.S. Provisional Application No. 63/378678, filed October 6, 2022, and U.S. Provisional Application No. 63/503533, filed May 22, 2023, the entire contents of each of which are incorporated herein by reference.
Technical Field
This disclosure relates to systems, methods, and media for adaptive gain control in audio environments.
Background
Gain control can be used, for example, to attenuate a signal into the range expected by an audio codec. To improve the perceptual quality of audio signals to which gain control is applied at the encoder and inverse gain control is applied at the decoder, gain transition functions have been proposed that smoothly transition between the different gains applied to consecutive frames. However, this approach can lead to audible artifacts if there are drastic gain changes between consecutive frames. Furthermore, in some cases the change between the gains determined for consecutive frames is too large and/or too abrupt for a smooth transition function to be applied. In such cases, a hard transition can be used to ensure that the signal remains within the expected range; for example, a single bit can be used to signal that a hard transition is used between the gains of consecutive frames. However, such a hard transition can itself produce audible artifacts in the decoded and reproduced audio signal that are worse than those caused by the original overload condition. There is therefore a need for gain transition functions that improve the perceptual quality of encoding/decoding systems while reducing the number of bits required for encoding.
Notation and Nomenclature
Throughout this disclosure, including in the claims, the terms "speaker," "loudspeaker," and "audio reproduction transducer" are used synonymously to denote any sound-emitting transducer or set of transducers. A typical set of headphones includes two speakers. A speaker can be implemented to include multiple transducers, such as a woofer and a tweeter, which can be driven by a single common speaker feed or by multiple speaker feeds. In some examples, the speaker feed(s) may undergo different processing in different circuit branches coupled to the different transducers.
Throughout this disclosure, including in the claims, the expression performing an operation "on" a signal or data (e.g., filtering, scaling, transforming, or applying gain to the signal or data) is used in a broad sense to denote performing the operation directly on the signal or data, or on a processed version of the signal or data. For example, the operation may be performed on a version of the signal that has undergone preliminary filtering or preprocessing prior to performance of the operation.
Throughout this disclosure, including in the claims, the expression "system" is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements a decoder may be referred to as a decoder system, and a system including such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X-M inputs are received from an external source) may also be referred to as a decoder system.
Throughout this disclosure, including in the claims, the term "processor" is used in a broad sense to denote a system or device that is programmable or otherwise configurable (e.g., with software or firmware) to perform operations on data, which may include audio, video, or other image data. Examples of processors include field-programmable gate arrays (or other configurable integrated circuits or chipsets), digital signal processors programmed and/or otherwise configured to perform pipelined processing on audio or other sound data, programmable general-purpose processors or computers, and programmable microprocessor chips or chipsets.
Summary
In view of the above, this disclosure provides methods, apparatus, programs, and computer-readable storage media for improving automatic gain control, having the features of the respective independent claims.
According to one aspect of this disclosure, a method of performing gain control on an audio signal is provided. The audio signal may be a Higher Order Ambisonics (HOA) audio signal. In the method, a downmixed audio signal of an audio signal to be encoded may be obtained. Obtaining the downmixed audio signal may include receiving it or, alternatively, determining it from the audio signal to be encoded. Furthermore, it may be determined that an overload condition has occurred for a frame of the downmixed audio signal. The overload condition may be a condition in which the frame of the downmixed audio signal exceeds a predefined signal range, such as the signal range expected by the encoder. The encoder may be a core encoder. In response to determining that the overload condition has occurred, a gain transition function for the frame may be determined. The gain transition function may be based at least on a gain transition step size. The gain transition function may be applied to the frame to generate a gain-adjusted frame of the downmixed audio signal. The gain-adjusted frame may be an attenuated frame or an amplified frame. The gain-adjusted frame and information indicating the gain transition function may be provided to the encoder for encoding.
Limiting the gain transition function to the gain transition step size yields a smooth, less abrupt transition between consecutive gains. The gain transition step size may not be sufficient to attenuate all samples of a frame into the signal range required by the core encoder. However, artifacts caused by small overshoots are less noticeable than those caused by very abrupt increases or decreases of the gain parameter. Therefore, allowing certain sample values to exceed the desired signal range can yield an improved audio experience when the signal is decoded, rendered, and played back.
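The step-limited idea can be illustrated with a minimal sketch. It assumes (none of this is taken from the source) that the gain is tracked in dB, that the allowed range is [-limit, limit], and that the gain is never raised above 0 dB. Each frame moves the gain toward the value that would remove the overload by at most one step and reports the residual peak, which may still overshoot slightly. The function name is hypothetical.

```python
import math

def step_limited_gain_db(prev_gain_db, frame, db_step, limit=1.0):
    """Move the frame gain (dB) toward the value that removes an overload,
    changing it by at most db_step per frame. Returns (gain_db, residual_peak)."""
    peak = max((abs(x) for x in frame), default=0.0)
    if peak == 0.0:
        required_db = 0.0
    else:
        # Gain that would bring the peak exactly to `limit`; capped at 0 dB (no boost).
        required_db = min(0.0, 20.0 * math.log10(limit / peak))
    # Limit the change relative to the previous frame to +/- db_step.
    delta = max(-db_step, min(db_step, required_db - prev_gain_db))
    gain_db = prev_gain_db + delta
    # Peak after applying the gain; may still exceed `limit` (a tolerated overshoot).
    residual_peak = peak * 10.0 ** (gain_db / 20.0)
    return gain_db, residual_peak

frame = [0.5, -4.0, 1.0]
g1, r1 = step_limited_gain_db(0.0, frame, 6.0)  # first frame: one 6 dB step
g2, r2 = step_limited_gain_db(g1, frame, 6.0)   # next frame: another step
```

In this example a peak of 4.0 needs roughly 12 dB of attenuation. With a 6 dB step the first frame is left with a visible overshoot, and the overshoot shrinks over the following frame instead of being removed by one abrupt gain jump.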
In some embodiments, the gain-adjusted frame may be encoded together with the information indicating the gain transition function.
In some embodiments, the downmixed audio signal may be a spatially encoded downmix signal.
In some embodiments, the frame of the downmixed audio signal may be a current frame, and the gain transition function may further be based on a previous gain transition function applied to the frame preceding the current frame.
In some embodiments, the gain transition function may further depend on a smoothing function based on the gain transition step size.
In some embodiments, the gain transition function may include a transient portion and a steady-state portion. The transient portion may correspond to a transition from the gain associated with the previous frame to that gain adjusted by the gain transition step size.
In some embodiments, depending on the gain adjustment target of the current frame, adjusting the gain associated with the previous frame by the gain transition step size may be either an attenuation or an amplification of that gain by the gain transition step size.
In some embodiments, the length of the transient portion may be limited by the delay introduced by the codec used by the encoder and decoder.
As a result, the gain control introduces essentially zero additional delay.
In some embodiments, the length of the transient portion may be equal to or less than the number of samples used by the encoder for encoding operations.
In some embodiments, the gain transition function can be defined as
where DBSTEP is the gain transition step size, l is the sample index, j is the frame index, p() is the smoothing function, l_end denotes the rightmost index for which p() is defined, and L is the number of samples in a frame.
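The sketch below shows one plausible shape for a gain transition function that uses the quantities listed above. It is not the equation referred to in the text. It assumes gains expressed in dB, a raised-cosine p(l) that reaches exactly 1 at l = l_end, and a direction of -1, 0, or +1 that selects attenuation, hold, or amplification by one DBSTEP relative to the previous frame's gain. The names `raised_cosine` and `gain_transition` are hypothetical.

```python
import math

def raised_cosine(l, l_end):
    """Hypothetical smoothing function p(l): rises from near 0 to exactly 1 at l = l_end."""
    return 0.5 * (1.0 - math.cos(math.pi * (l + 1) / (l_end + 1)))

def gain_transition(prev_gain_db, direction, db_step, frame_len, l_end, p=raised_cosine):
    """Per-sample linear gains for one frame of L = frame_len samples.

    direction: -1 attenuates, +1 amplifies, 0 holds (one db_step relative to prev_gain_db).
    Samples 0..l_end form the transient part; the remaining samples form the steady-state part.
    """
    if not 0 <= l_end < frame_len:
        raise ValueError("the transient part must fit inside the frame")
    gains = []
    for l in range(frame_len):
        weight = p(l, l_end) if l <= l_end else 1.0  # transient, then steady state
        gain_db = prev_gain_db + direction * db_step * weight
        gains.append(10.0 ** (gain_db / 20.0))
    return gains

frame = [0.9, -1.8, 2.4, -2.2, 1.9, -1.1, 0.7, 0.3]
gains = gain_transition(prev_gain_db=0.0, direction=-1, db_step=6.0,
                        frame_len=len(frame), l_end=3)
adjusted = [x * g for x, g in zip(frame, gains)]
```

Keeping l_end no larger than the codec's lookahead matches the zero-additional-delay property described above. The raised cosine is only one of many possible smoothing functions.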
In some embodiments, the gain transition step size may be a predefined value, or it may be determined from a set of predefined values of increasing magnitude. The predefined value or set of predefined values may be determined based on perceptual-quality listening tests or objective quality measurements. The perceptual-quality listening test may be a MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) test. The perceptual-quality listening tests may be part of the tuning process for the automatic gain control at the encoder and decoder.
In some embodiments, the method may further include determining the amount of overload caused by the frame of the downmixed audio signal. The gain transition step size may then be selected, depending on the amount of overload, from a set of predefined values of increasing magnitude.
In this way, the gain transition step size can be adapted to the rate of change required between consecutive frames.
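A minimal sketch of selecting the step size from the overload amount follows. The step sizes in `STEP_SIZES_DB` are invented placeholders. The source says only that such values are predefined, ordered by increasing magnitude, and may be tuned with listening tests. The sketch measures the overload in dB above the allowed peak and picks the smallest step that covers it, falling back to the largest step.

```python
import math

# Hypothetical candidate step sizes in dB, in increasing order (placeholders, not tuned values).
STEP_SIZES_DB = (1.5, 3.0, 6.0, 12.0)

def overload_db(frame, limit=1.0):
    """Amount (dB) by which the frame peak exceeds `limit`; 0 if there is no overload."""
    peak = max((abs(x) for x in frame), default=0.0)
    return max(0.0, 20.0 * math.log10(peak / limit)) if peak > 0 else 0.0

def select_step_db(frame, steps=STEP_SIZES_DB, limit=1.0):
    """Smallest predefined step that covers the overload, else the largest step."""
    amount = overload_db(frame, limit)
    for step in steps:
        if step >= amount:
            return step
    return steps[-1]
```

Because the largest step is still bounded, a very large overload is removed over several frames rather than by one hard transition. This is the trade-off between small overshoots and abrupt gain changes described above.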
In some embodiments, applying the gain transition function to the frame to generate the gain-adjusted frame of the downmixed signal may include applying the gain transition function to samples of the downmixed audio signal, where the total number of samples corresponds to one frame of the downmixed audio signal.
In some embodiments, encoding the gain-adjusted frame together with the information indicating the gain transition function may include determining a coding scheme based on the gain transition function. In some cases, the coding scheme may be determined based on the gain transition step size. In some cases, the coding scheme may be determined based on whether the overload condition has been eliminated. The coding scheme may be either Modified Discrete Cosine Transform (MDCT) coding or Algebraic Code-Excited Linear Prediction (ACELP).
In this way, the coding scheme can be optimized for the particular audio signal and the required gain transition step size.
According to another aspect, a method of performing gain control on an audio signal is provided. In the method, an encoded frame of the audio signal may be received by a decoder. The encoded frame may be decoded to obtain a frame of a downmixed audio signal and information indicating the gain control applied by the encoder. An inverse gain transition function to be applied to the frame of the downmixed audio signal may be determined based at least in part on the information indicating the gain control applied by the encoder, which may include a gain transition step size. The inverse gain transition function may then be applied to the frame of the downmixed audio signal.
In some embodiments, the method may further include upmixing the downmixed audio signal to generate an upmixed audio signal, which may be suitable for rendering.
In some embodiments, the method may further include rendering the upmixed signal to produce rendered audio data.
In some embodiments, the method may further include playing back the rendered audio data using one or more loudspeakers or headphones.
In some embodiments, the inverse gain transition function may be determined by inverting the gain transition function applied by the encoder.
In some embodiments, the inverse gain transition function may include a transient portion and a steady-state portion.
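The inverse step can be sketched as follows. The sketch assumes that the decoder rebuilds exactly the same per-sample gain curve from the transmitted information (previous gain, direction, step size, transient length) and divides by it. The raised-cosine curve and all function names are hypothetical. In a real codec the core coding is lossy, so inverting the gain restores the level, not the exact samples.

```python
import math

def gain_curve(prev_gain_db, direction, db_step, frame_len, l_end):
    """Hypothetical per-sample gain curve: raised-cosine transient, then steady state."""
    out = []
    for l in range(frame_len):
        w = 0.5 * (1.0 - math.cos(math.pi * (l + 1) / (l_end + 1))) if l <= l_end else 1.0
        out.append(10.0 ** ((prev_gain_db + direction * db_step * w) / 20.0))
    return out

def apply_gain(frame, gains):
    """Encoder side: scale each sample by its gain."""
    return [x * g for x, g in zip(frame, gains)]

def apply_inverse_gain(frame, gains):
    """Decoder side: the inverse gain transition divides by the same curve."""
    return [x / g for x, g in zip(frame, gains)]

frame = [1.5, -2.0, 0.8, 1.9, -0.4, 0.2]
gains = gain_curve(0.0, -1, 6.0, len(frame), l_end=2)
restored = apply_inverse_gain(apply_gain(frame, gains), gains)
```

Like the forward function, the inverse has a transient part (samples 0..l_end) and a steady-state part. The decoder only needs the same parameters the encoder used.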
Some or all of the operations, functions, and/or methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc. Accordingly, some innovative aspects of the subject matter described in this disclosure can be implemented via one or more non-transitory media having software stored thereon.
At least some aspects of this disclosure may be implemented via an apparatus. For example, one or more devices may be capable of performing, at least in part, the methods disclosed herein. In some implementations, the apparatus is, or includes, an audio processing system having an interface system and a control system. The control system may include one or more general-purpose single- or multi-chip processors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic, discrete hardware components, or combinations thereof.
Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Note that the relative dimensions of the following figures may not be drawn to scale.
Brief Description of the Drawings
Figure 1 is a schematic block diagram of a prior-art system for providing gain control of audio signals.
Figures 2A and 2B are schematic block diagrams of a system for implementing adaptive gain control, according to some embodiments.
Figures 3A and 3B show examples of a gain transition function that may be implemented by an encoder and an inverse gain transition function that may be implemented by a decoder, respectively, according to some embodiments.
Figure 4 is a flowchart of an example process that may be performed by an encoder to implement adaptive gain control, according to some embodiments.
Figure 5 is a flowchart of an example process that may be performed by a decoder to implement adaptive gain control, according to some embodiments.
Figure 6 illustrates example use cases of an Immersive Voice and Audio Services (IVAS) system, according to some embodiments.
Figure 7 is a block diagram illustrating examples of components of an apparatus capable of implementing various aspects of this disclosure.
Figures 8A and 8B illustrate an example embodiment of an audio codec that uses perceptually motivated gain control of a downmix signal, in which the gain transition step size is uniform.
Figures 9A and 9B illustrate an example embodiment of an audio codec that uses perceptually motivated gain control of a downmix signal, in which the gain transition step size is non-uniform.
Like reference numbers and designations in the various drawings indicate like elements.
Detailed Description
Some coding techniques for scene-based audio, stereo audio, multichannel audio, and/or object audio rely on encoding a multi-component signal after a downmix operation. Downmixing allows a reduced number of audio components to be encoded with a waveform-preserving coding scheme, while the remaining components are coded parametrically. At the receiver side, the remaining components can be reconstructed using the parametric metadata. Because only a subset of the components is waveform-coded, and the parametric metadata associated with the parametrically coded components can be encoded at a low bit rate, such coding techniques can be relatively bit-rate efficient while still providing high-quality audio.
One potential problem is that the downmix channels determined by the spatial encoder may contain signals whose levels are unsuitable for subsequent processing by the core codec that builds the audio bitstream. For example, in some cases a downmix signal may have an excessively high level that overloads the core codec even though none of the component signals of the original input signal is overloaded. This can cause severe distortion, such as clipping in the reconstructed signal after decoding and rendering, and a considerable loss of quality in the final rendered signal. One possible solution is to attenuate the input signal to avoid overloading the core codec. However, this can increase granular noise, because the quantizer used to encode the signal may then not operate in its optimal range.
Figure 1 shows a schematic block diagram of a conventional system 100 for performing gain control on encoded Higher Order Ambisonics (HOA) signals. The system shown in Figure 1 can be used to encode and decode MPEG-H signals. MPEG-H is a group of international standards developed by the ISO/IEC Moving Picture Experts Group (MPEG). MPEG-H consists of several parts, including Part 3, MPEG-H 3D Audio.
At encoder 102, the input HOA signal is processed at block 104. This processing may include, for example, a decomposition in which downmix channels are generated. For a given frame, the downmix channels may include a set of signals bounded by [-max, max]. Because core encoder 108 may encode signals in the range [-1, 1), samples of downmix-channel signals that fall outside the range of core encoder 108 may cause an overload. To avoid the overload, gain control 106 adjusts the gain of the frame so that the associated signals lie within the range of core encoder 108 (e.g., within [-1, 1)). Core encoder 108 can be regarded as the codec that generates the encoded bitstream. Side information generated by decomposition/processing block 104 (which may include, e.g., metadata associated with the parametrically coded channels) can be encoded in the bitstream together with the signals output by core encoder 108.
Decoder 112 receives the encoded bitstream. Decoder 112 can extract the side information, and core decoder 116 can extract the downmix signals. Inverse gain control block 120 can then reverse the gain applied by the encoder; for example, it can amplify a signal that was attenuated by gain control 106 of encoder 102. The HOA signal can then be reconstructed by HOA reconstruction block 122. Optionally, the HOA signal can be rendered and/or played back by rendering/playback block 124. Rendering/playback block 124 can include, for example, various algorithms for rendering the reconstructed HOA output as rendered audio data. For example, rendering the reconstructed HOA output can involve distributing one or more signals of the HOA output to multiple speakers to achieve a particular perceptual impression. Optionally, rendering/playback block 124 can include one or more loudspeakers, headphones, etc., for presenting the rendered audio data.
Gain control 106 can implement gain control using the following technique. Gain control 106 can first determine an upper limit for the signal values in a frame. For example, for an MPEG-H audio signal, this upper limit can be expressed as a product specified in the MPEG-H standard. Given the upper limit, a minimum required attenuation can be determined that ensures that the scaled signal samples are confined to the interval [-1, 1), i.e., that the scaled samples are within the range of core encoder 108. This minimum attenuation corresponds to a gain factor of $2^{e_{\min}}$, where $e_{\min}$ is negative by definition. In some embodiments, amplification can be limited by a maximum amplification factor $2^{e_{\max}}$, where $e_{\max}$ is a non-negative integer. Thus, to perform both attenuation and amplification, a gain factor of $2^{e}$ can be defined, where the gain parameter $e$ takes values in the range $[e_{\min}, e_{\max}]$. The minimum number of bits required to represent the gain parameter $e$ is therefore $\beta_e = \lceil \log_2(|e_{\min}| + e_{\max} + 1) \rceil$.
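The exponent arithmetic can be shown with a short sketch. It assumes the policy of choosing the largest integer exponent e in [e_min, e_max] for which 2^e times the frame's upper limit stays below 1 (least attenuation, or most amplification, that keeps samples in [-1, 1)). The upper limit is taken as an input, because the MPEG-H product that defines it is not reproduced here. Function names are hypothetical.

```python
import math

def gain_exponent(upper_limit, e_min, e_max):
    """Largest integer e in [e_min, e_max] with 2**e * upper_limit < 1, so that
    scaled samples stay in [-1, 1). Clamped to the signalable range."""
    if upper_limit <= 0:
        return e_max
    # 2**e * M < 1  <=>  e < -log2(M); the largest such integer is ceil(-log2(M)) - 1.
    e = math.ceil(-math.log2(upper_limit)) - 1
    return max(e_min, min(e_max, e))

def gain_bits(e_min, e_max):
    """beta_e = ceil(log2(|e_min| + e_max + 1)) bits to code e in [e_min, e_max]."""
    return math.ceil(math.log2(abs(e_min) + e_max + 1))
```

For example, an upper limit of 3.0 gives e = -2 (a factor of 0.25, so the peak becomes 0.75), and a range of [-4, 2] needs 3 bits because it contains 7 distinct values. The strict inequality reflects the half-open interval [-1, 1): a peak scaled to exactly 1.0 would still be out of range.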
The gain factor $g_n(j)$ for a particular channel $n$ and frame $j$ can be determined by applying a delay of one frame, corresponding to one HOA block, and using the following recursion:

$$g_n(j-1) = g_n(j-2) \cdot 2^{e_n(j-1)}$$

In the above formula, $g_n(j-2)$ denotes the gain factor applied to frame $j-2$, and $2^{e_n(j-1)}$ denotes the gain factor adjustment required to compute the gain factor $g_n(j-1)$ of frame $j-1$.
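The recursion can be sketched in a few lines. The sketch assumes that each frame's gain factor is the previous factor multiplied by a power-of-two adjustment 2^e, consistent with the 2^e gain factors described above. The starting factor of 1.0 and the function name are hypothetical. Because every adjustment is a power of two, the resulting factor is simply 2 raised to the running sum of the exponents.

```python
def gain_factor_sequence(exponents, g_init=1.0):
    """Apply g(j-1) = g(j-2) * 2**e(j-1) recursively over a list of per-frame exponents."""
    gains = []
    g = g_init
    for e in exponents:
        g *= 2.0 ** e  # multiply by the gain factor adjustment 2**e
        gains.append(g)
    return gains

factors = gain_factor_sequence([-1, -1, 1, 0])  # attenuate, attenuate, recover, hold
```

Multiplying by exact powers of two keeps the factors exactly representable in binary floating point. This is one practical reason to restrict gains to 2^e.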
Disclosed herein are techniques for providing adaptive gain control. Specifically, as described herein, gain parameters can be determined without introducing additional delay, because they can be determined from the lookahead samples that are already generated for use by the codec. The codec can be used by a perceptual encoder. Determination of the gain transition function is illustrated and described below in conjunction with Figures 2-5.
Figures 2A and 2B show schematic block diagrams of an encoder 202 and a decoder 212, respectively, for performing low-latency adaptive gain control according to example embodiments. At encoder 202, the input HOA signal (or first-order Ambisonics (FOA) signal) is processed by spatial analysis block 204. For an N-channel HOA input, spatial analysis block 204 can generate and output a set of M downmix channels 204A, where the number of downmix channels satisfies 1 ≤ M ≤ N. Spatial analysis block 204 can also generate and output spatial side information 204B for reversing the downmix operation.
For example, for an FOA input, the downmix channels may include a main downmix channel W', which can be generated by mixing the omnidirectional input signal W with the directional input signals X, Y, and Z using various mixing gains, and up to three residual channels X', Y', and Z', each corresponding to the component of the X, Y, or Z signal that cannot be predicted from the main downmix signal. In one example, spatial analysis block 204 uses the spatial reconstruction (SPAR) technique. SPAR is further described in D. McGrath, S. Bruhn, H. Purnhagen, M. Eckert, J. Torres, S. Brown, and D. Darcy, "Immersive Audio Coding for Virtual Reality Using a Metadata-assisted Extension of the 3GPP EVS Codec," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 730-734, which is incorporated herein by reference in its entirety. In other examples, spatial analysis block 204 can use any other suitable linear predictive codec or energy compaction transform, such as the Karhunen-Loève transform (KLT). Core encoder 208 can be regarded as the codec that generates the encoded audio bitstream 208A. In some implementations, core encoder 208 and core decoder 216 introduce some lookahead samples, which adaptive gain control 206 can use to determine the gain parameters, so that no extra delay is added to the overall coding process (zero additional latency).
The signals associated with the M downmix channels 204A can then be analyzed by adaptive gain control 206. Adaptive gain control 206 can determine whether the signal associated with any of the M downmix channels 204A exceeds the audio amplitude range expected by core encoder 208 and would therefore overload core encoder 208. In some embodiments, when adaptive gain control 206 determines that no gain is to be applied (for example, in response to determining that none of the signals of the M downmix channels 204A exceeds the expected range of core encoder 208), adaptive gain control 206 can set a flag indicating that gain control is not applied. The flag can be signaled by setting its value, for example the value of a single bit. Alternatively, when no gain is to be applied, adaptive gain control 206 can leave the flag unset, thereby saving a bit (e.g., the bit associated with the flag). For example, in some implementations, if the spatial metadata bitstream and/or the core encoder bitstream (which may be a perceptual encoder bitstream) is self-terminating, the presence of the gain control flag can be determined by checking whether any unread bits, i.e., bits remaining in the bitstream, are present. In the absence of an overload condition, adaptive gain control 206 can output the M downmix channels 206A, which can then be passed to core encoder 208 for encoding in bitstream 208A.
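The following is a minimal sketch of how a decoder might detect the optional flag in a self-terminating bitstream. It assumes, hypothetically, that the decoder knows how many valid bits the encoder wrote, that the payload before the flag parses itself to completion, that an absent flag means no gain control, and that a present flag's value indicates whether gain control is applied. `BitReader` and `read_gain_control_flag` are illustrative names, not part of any codec API.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes, num_bits: int):
        self.data = data
        self.num_bits = num_bits  # number of valid bits written by the encoder
        self.pos = 0

    def read(self, n: int) -> int:
        if self.pos + n > self.num_bits:
            raise EOFError("not enough bits")
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def remaining(self) -> int:
        return self.num_bits - self.pos

def read_gain_control_flag(reader: BitReader) -> bool:
    # After the self-terminating payload has been parsed, a leftover bit is taken
    # as the gain-control flag; no leftover bit is treated as "no gain control".
    if reader.remaining() == 0:
        return False
    return reader.read(1) == 1

reader = BitReader(bytes([0b10110011]), num_bits=8)
payload = reader.read(7)                  # self-terminating payload (7 bits here)
gain_control = read_gain_control_flag(reader)
```

Omitting the flag costs the decoder nothing extra, because it only has to compare its read position with the number of valid bits after parsing.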
Conversely, when adaptive gain control 206 determines that gain is to be applied, it can determine gain parameters and apply the gain(s) to the M downmix channels according to those parameters. The gain-adjusted M downmix channels 206A can then be passed to core encoder 208 for encoding in the bitstream. Adaptive gain control 206 can also output side information 206B about the gain control, which can include information about the flag. Side information encoder 210 can encode spatial side information 204B together with the gain parameters in side information 206B as metadata 210A for transmission in the bitstream. Decoder 212 can then extract and use this metadata to upmix the downmix channels and reverse the gain adjustment; for example, metadata 210A can later be used to reconstruct a representation of the original audio input that was downmixed by spatial analysis block 204. Side information encoder 210 can additionally provide side information 208B to core encoder 208, which can use it to select between coding techniques. Encoded bitstream 208A and encoded metadata 210A can be multiplexed to form the final bitstream output by encoder 202.
In some implementations, adaptive gain control 206 can determine a gain transition function that transitions between the gain parameter e(j-1) associated with the previous frame (e.g., frame j-1) and the gain parameter e(j) of the current frame. Adaptive gain control 206 can apply the gain transition function on a frame-by-frame basis, where each frame can be a frame of one of the M downmix channels 204A. In some implementations, the gain transition function can move the gain parameter smoothly, across the samples of frame j, from the gain parameter value of frame j-1 (e.g., e(j-1)) to the gain parameter of the current frame (e.g., e(j)). The gain transition function can therefore include two parts: 1) a transient part, across whose samples the gain parameter transitions from the gain parameter of the previous frame to that of the current frame; and 2) a steady-state part, whose samples have the gain parameter value of the current frame.
In some embodiments, when the gain applied to the current frame is smaller than the gain applied to the previous frame, the transient part can be said to have a "fading" transient type, because the amount of attenuation increases over the samples of the current frame. This case can be expressed as e(j) > e(j-1). In some embodiments, when the gain applied to the current frame is greater than the gain applied to the previous frame, the transient part can be said to have an "inverse fading" or "non-fading" transient type, because the amount of attenuation decreases over the samples of the current frame. This case can be expressed as e(j) < e(j-1). In some embodiments, when the gain applied to the current frame is the same as the gain applied to the previous frame, the transient part can be said to have a "holding" transient type, in which the transient part is not actually transient but has the same value as the steady-state part. This case can be expressed as e(j) = e(j-1).
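The three transient types can be expressed as a small classification function. This is a minimal sketch that follows the sign convention stated above (a larger gain parameter e means more attenuation); the function name `transient_type` is hypothetical.

```python
def transient_type(e_prev: float, e_cur: float) -> str:
    """Classify the transient part of frame j from e(j-1) and e(j).

    Convention from the text: a larger gain parameter e means more
    attenuation, i.e. a smaller applied gain.
    """
    if e_cur > e_prev:
        return "fading"  # attenuation increases across frame j
    if e_cur < e_prev:
        return "inverse fading"  # attenuation decreases across frame j
    return "holding"  # transient part equals the steady-state value
```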
In some embodiments, the gain transition function depends on a gain transition step size, which limits how much the gain can change from the previous frame to the current frame. This is motivated by the observation that smaller, smoother gain/attenuation changes, even if they allow overload to occur during the transition, are perceptually better than larger changes, particularly when the signal is further processed by a lossy core encoder that requires its input to lie within the predefined value range mentioned above. Predefining the parameters of the gain transition function in this way makes it possible to evaluate the impact of the parameters on objective or perceptual quality. Perceptual quality can be measured with established listening tests, such as MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor). Such listening tests can be part of the process of tuning the automatic gain control at the encoder and decoder. Specifically, parameters such as the gain transition step size can be tuned for a particular audio scene and codec until the best perceived audio quality is achieved; the encoding/decoding system then uses the tuned parameters.
In an example implementation, the processed output 206A of automatic gain control 206 is further encoded by a lossy core codec based on algebraic code-excited linear prediction (ACELP) coding, which does not aim at waveform reconstruction. It has been observed that applying larger gain steps to the ACELP input and output causes audible glitches in the reconstructed signal and degrades the overall performance of the codec.
In some implementations, when an overload is detected in the current frame, automatic gain control 206 can also determine the amount of attenuation required to bring that frame within the expected range of core encoder 208. If the required attenuation differs greatly between consecutive frames, applying a transition function that achieves the range [-1, 1) required by core encoder 208 may cause audible artifacts when the audio signal is rendered at the decoder. Rather than applying a transition function that keeps every frame within, or at the limit of, the required range, the transition function can be restricted to a specific gain transition step size. Thus, regardless of how much attenuation is needed to reach the expected range of core encoder 208, the transition function can change the attenuation of a single frame only by an amount equal to the gain transition step size, i.e., ±DBSTEP dB. For example, if the attenuation of the previous frame is -10 dB, the attenuation applied to the first sample of the current frame will be -10 dB, and the attenuation applied to the last sample of the current frame will be -10 dB ± DBSTEP. More precisely, if the amount of overload does not change from the previous frame to the current frame, the gain transition will be a constant value, e.g., -10 dB. If the attenuation needs to change, the gain transition function moves from the previous frame's attenuation to that attenuation ±DBSTEP dB at the last sample of the current frame.
In some implementations, DBSTEP can be selected such that the attenuation applied by automatic gain control 206 may be insufficient to keep the frame within the expected signal range of core encoder 208. For example, DBSTEP can be a fixed value. When a sharp change in attenuation would be required, allowing the frame to lie outside the range [-1, 1) avoids strong attenuation differences between consecutive frames. The attenuation of the frame therefore changes by a fixed amount relative to the previous frame, instead of the frame being forced into the range [-1, 1) by a transition function or a static gain change. Using a transition function with a specific gain transition step size can improve perceived audio quality, because the distortion caused by the frame lying outside the range [-1, 1) is less noticeable than the distortion caused by sharp attenuation differences between consecutive frames. Furthermore, the exception flag used to switch between smooth transitions and static gain changes can be avoided, saving 1 bit at core encoder 208.
In some implementations, DBSTEP can be a single value, e.g., -1 dB. Alternatively, DBSTEP can be selected from a set of fixed values of increasing magnitude, e.g., -1 dB, -3 dB and -6 dB. In this case, the value of DBSTEP can be chosen based on the amount of overload the frame would cause without attenuation.
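One possible way to pick DBSTEP from the example set {-1, -3, -6} dB is sketched below. The selection rule is an assumption, since the text only says the choice depends on the overload the unattenuated frame would cause: here the overload is measured as the peak level in dB relative to full scale, and the smallest-magnitude step that would bring the peak strictly below 1.0 is chosen, falling back to the largest step otherwise (in which case the frame may remain overloaded, as the text permits). The name `choose_step_db` is hypothetical.

```python
import math
from typing import Sequence

STEP_SET_DB = (-1.0, -3.0, -6.0)  # example set of step sizes from the text


def choose_step_db(frame: Sequence[float], steps: Sequence[float] = STEP_SET_DB) -> float:
    """Pick a step (negative dB) for one frame under a hypothetical rule."""
    if all(-1.0 <= s < 1.0 for s in frame):
        return 0.0  # no overload, no attenuation step needed
    peak = max(abs(s) for s in frame)
    overload_db = 20.0 * math.log10(peak)  # >= 0 dB when overloaded
    for step in sorted(steps, reverse=True):  # -1, -3, -6: smallest magnitude first
        if -step > overload_db:  # strict: the peak must end up below 1.0
            return step
    return min(steps)  # most negative step; the frame may remain overloaded
```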
In some implementations, the automatic gain control is configured to accept a specified set of target gain values G_T, which can be represented as a table of numbers (e.g., integers) indicating the multiple of the DBS attenuation provided at each step. This is motivated by the fact that smaller changes provide a perceptual benefit, while some signals may require a higher level of possible attenuation. Specifying such non-uniform absolute steps allows a wider attenuation range to be covered while retaining the benefit of smaller steps in many situations. For example, the set G_T = {0, 1, 3, 6} with DBS = -2 dB yields the absolute target gains {DBS·G_T} = {0 dB, -2 dB, -6 dB, -12 dB}, with consecutive steps DBSTEP of {-2 dB, -4 dB, -6 dB}.
One or more such integer tables can be specified, and information about which table was selected at the encoding side can be signaled to the decoding side. Whereas uniform step sizes produce a single uniform gain transition shape, non-uniform step sizes produce non-uniform gain transition shapes (level-dependent transition functions).
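The arithmetic of the G_T example can be checked with a short sketch: `target_gains_db` multiplies the table by DBS, and `step_sizes_db` takes the differences between neighbouring entries. `next_target_index`, which moves at most one table entry per frame toward the attenuation a frame needs, is a hypothetical illustration of how such a table might be traversed, not a rule stated in the text.

```python
from typing import List, Sequence


def target_gains_db(g_t: Sequence[int], dbs: int) -> List[int]:
    """Absolute target gains DBS * G_T."""
    return [dbs * g for g in g_t]


def step_sizes_db(g_t: Sequence[int], dbs: int) -> List[int]:
    """Consecutive steps between neighbouring target gains."""
    gains = target_gains_db(g_t, dbs)
    return [b - a for a, b in zip(gains, gains[1:])]


def next_target_index(idx: int, required_db: float, gains_db: Sequence[float]) -> int:
    """Hypothetical traversal: move at most one table entry per frame."""
    if idx + 1 < len(gains_db) and required_db < gains_db[idx]:
        return idx + 1  # more attenuation needed than the current entry gives
    if idx > 0 and required_db >= gains_db[idx - 1]:
        return idx - 1  # the previous (weaker) entry is already enough
    return idx
```

Starting from entry 0 with frames that need -5 dB, the sketch moves to -2 dB and then to -6 dB, so the step taken grows with the attenuation level, which is the level-dependent behaviour described above.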
In some implementations, when DBSTEP is insufficient to attenuate the current frame into the range [-1, 1), DBSTEP can be applied to the frames following the current frame until the range [-1, 1) is reached.
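The step-limited behaviour described above (a change of at most ±DBSTEP dB per frame, continued over the following frames until the required attenuation is reached) can be sketched as follows. The function names are hypothetical, and attenuations are expressed as negative dB values, as in the -10 dB example.

```python
from typing import List, Sequence


def limit_step(prev_db: float, required_db: float, step_db: float) -> float:
    """Move from prev_db toward required_db by at most |step_db| dB."""
    step = abs(step_db)
    delta = max(-step, min(step, required_db - prev_db))  # clamp the per-frame change
    return prev_db + delta


def attenuation_track(required_db: Sequence[float], step_db: float,
                      start_db: float = 0.0) -> List[float]:
    """End-of-frame attenuation for each frame when changes are step-limited."""
    track: List[float] = []
    current = start_db
    for req in required_db:
        current = limit_step(current, req, step_db)
        track.append(current)
    return track
```

With a step of -3 dB, a sequence of frames that suddenly needs -10 dB reaches it only after four frames (-3, -6, -9, -10 dB); the intermediate frames remain partly overloaded, which is the trade-off the text accepts for smoother transitions.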
In some implementations, the output level and attenuation information from automatic gain control 206 can be used in decision-making in other systems, such as core encoder 208. Although the relaxed requirement can provide a perceptual benefit, it can affect core encoder 208 by introducing gain changes, or by not meeting the strict range requirement and allowing an overload condition to persist. Information such as whether the gain control meets the requirement, whether gain is applied, and how much gain is applied can be output and passed to the core encoder. This allows better decisions, such as choosing a coding method that handles gain changes or out-of-range samples better. For example, when large gain/attenuation steps are applied, core encoder 208 can use a waveform coding technique, such as MDCT-based coding, instead of the predictive ACELP coding technique.
In some embodiments, the transient part of the gain transition function can be determined using a prototype shape for the transient part, where the prototype shape is scaled based on the difference between the gain parameter of the current frame and that of the previous frame, e.g., based on e(j) - e(j-1). A gain transition function using such a prototype function p can be expressed as:
where l_end denotes the rightmost index for which p is defined, and L denotes the number of samples in a frame. For example, the prototype shape of the transient-part gain can be defined as:
where L here is the number of samples in the frame for which p is defined; for example, L can be l_end + 1.
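A prototype-based gain transition function can be sketched under explicit assumptions. This sketch does not reproduce the prototype defined above. It assumes a gain factor of 2^(-e), which is consistent with Figure 3A, where a unit change in e corresponds to about 6 dB. It also assumes a hypothetical raised-cosine prototype with p(0) = 0 and p(l_end) = 1, and it uses a 960-sample frame with l_end = 383 (about 384 transient samples), which corresponds to a 20 ms frame at an assumed 48 kHz sampling rate.

```python
import math
from typing import List


def prototype(l: int, l_end: int) -> float:
    """Hypothetical raised-cosine prototype p: p(0) = 0, p(l_end) = 1."""
    return 0.5 * (1.0 - math.cos(math.pi * l / l_end))


def gain_transition(e_prev: float, e_cur: float, frame_len: int, l_end: int) -> List[float]:
    """Per-sample gain factors for frame j (assumption: gain = 2**(-e))."""
    gains: List[float] = []
    for l in range(frame_len):
        if l <= l_end:
            # transient part: prototype scaled by e(j) - e(j-1)
            e = e_prev + (e_cur - e_prev) * prototype(l, l_end)
        else:
            e = e_cur  # steady-state part
        gains.append(2.0 ** (-e))
    return gains
```

For e(j-1) = 0 and e(j) = 1 the curve starts at 1.0 (0 dB) and settles at 0.5 (about -6 dB), the first of the "fading" shapes in Figure 3A; equal parameters give a flat "holding" curve.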
Figure 3A shows examples of gain transition functions, each having a transient part of the "fading" type. In the example of Figure 3A, the transient part of each gain transition function starts at sample 0, which can correspond to the beginning of the current frame, with a gain of 0 dB, where 0 dB is the gain parameter of the previous frame (e.g., frame j-1). The transient part of each function changes into the steady-state part over approximately 384 samples. For the three gain transition functions shown, the steady-state part corresponds to different gain transition step sizes in frame j, with the attenuation increasing by 6 dB, 12 dB and 18 dB, respectively, relative to the gain of the previous frame. In other words, exp = -[e(j) - e(j-1)] = -1, -2 and -3, respectively, for the three gain transition functions. The transient part has the same length (e.g., approximately 384 samples) for each gain transition function in Figure 3A. Note that the length of the steady-state part can correspond to an offset related to the latency introduced by the codec, e.g., 12 ms in the example of Figure 3A. Correspondingly, the length of the transient part can be related to the complement of that offset: in the example of Figure 3A, it is the frame length (e.g., 20 ms) minus the codec latency (e.g., 12 ms). Note that the codec latency can be the overall algorithmic encoder latency, excluding the frame-size latency.
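The Figure 3A numbers can be reproduced with simple arithmetic. Assuming a 48 kHz sampling rate (an assumption, chosen because it makes the 8 ms transient equal to the approximately 384 samples shown), the transient length is (20 - 12) ms × 48 samples/ms = 384 samples. Reading exp as a power-of-two exponent, 20·log10(2^exp) gives about -6.02, -12.04 and -18.06 dB for exp = -1, -2, -3, which matches the 6/12/18 dB steps quoted. Both helper names are hypothetical.

```python
import math


def transient_samples(frame_ms: float, codec_delay_ms: float, fs_hz: int) -> int:
    """Transient length = (frame length - codec delay), converted to samples."""
    return round((frame_ms - codec_delay_ms) * fs_hz / 1000)


def exponent_to_db(exp: int) -> float:
    """Level in dB of the gain factor 2**exp."""
    return 20.0 * math.log10(2.0 ** exp)
```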
Likewise, a gain transition function whose transient part has the "inverse fading" or "non-fading" type can be represented as the mirror image of a gain transition function shown in Figure 3A, flipped across a horizontal line, for example the x-axis.
Referring back to Figure 2B, decoder 212 can receive the encoded audio bitstream 208A and the metadata bitstream 210A as inputs and can reconstruct the HOA signal, for example for rendering, or render it directly to the desired output format. In some embodiments, core decoder 216 receives the encoded audio bitstream 208A. Additionally, core decoder 216 can receive information 214A extracted from metadata bitstream 210A by side information decoder 214. Core decoder 216 can decode the encoded audio bitstream 208A based on information 214A, or without any knowledge of the side information, and output M gain-adjusted downmix channels 216A to inverse gain control 220. Side information decoder 214 further extracts the gain parameters and spatial side information and sends this information 214B to inverse gain control 220 and to spatial synthesis/rendering/playback block 222. Inverse gain control 220 can then obtain the gain parameters applied by encoder 202 from information 214B. For example, in some implementations, inverse gain control 220 can retrieve from information 214B an indication of the gain transition step size DBSTEP applied by encoder 202 and/or an arithmetic factor related to DBSTEP. Furthermore, inverse gain control 220 can retrieve the shape of the transition function, i.e., the shape of the prototype function p (also called the smoothing function), for example from memory. Inverse gain control 220 can then use the obtained gain parameters to reverse the gain applied by encoder 202 and output M downmix channels 220A. For example, in some implementations, inverse gain control 220 can construct an inverse gain transition function that transitions from the gain parameter of the previous frame to the gain parameter of the current frame. In some implementations, the inverse gain transition function can be the gain transition function applied by encoder 202, mirrored across a central vertical line and vertically adjusted. For example, the vertical line can be the y-axis.
Turning to Figure 3B, examples are shown, according to some implementations, of inverse gain transition functions to be applied by the decoder in response to the gain transition functions applied by the encoder shown in Figure 3A. As shown, each inverse gain transition function has a steady-state part and a transient part. As shown in Figures 3A and 3B, the durations of the steady-state and transient parts of the inverse gain transition function can correspond to (e.g., be equal to) the durations of the corresponding steady-state and transient parts of the gain transition function. Each inverse gain transition function shown in Figure 3B starts at 0 dB, corresponding to the inverse gain applied to the previous frame j-1, and transitions to -DBSTEP for the current frame. When the gain applied by the encoder corresponds to attenuation, represented by a gain below 0 dB as in the gain transition functions of Figure 3A, the inverse gain applied by the decoder corresponds to amplification with a gain above 0 dB, as in the inverse gain transition functions of Figure 3B. Conversely, when the gain applied by the encoder corresponds to amplification (e.g., a gain above 0 dB), the inverse gain applied by the decoder corresponds to attenuation (e.g., a gain below 0 dB).
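The relationship between the encoder gain and the decoder's inverse gain can be sketched in the dB domain: the inverse gain is the negated encoder gain, so applying both in sequence restores the samples exactly when no lossy coding takes place between them. The linear transient used here is a hypothetical stand-in for the prototype p, and all function names are illustrative.

```python
from typing import List


def encoder_gain_db(prev_db: float, cur_db: float, frame_len: int, l_end: int) -> List[float]:
    """Hypothetical linear transient from prev_db (sample 0) to cur_db (sample l_end), then hold."""
    return [prev_db + (cur_db - prev_db) * min(l / l_end, 1.0) for l in range(frame_len)]


def inverse_gain_db(gain_db: List[float]) -> List[float]:
    """Decoder-side inverse: negate each per-sample gain in dB."""
    return [-g for g in gain_db]


def db_to_lin(db: float) -> float:
    return 10.0 ** (db / 20.0)


def apply_db(samples: List[float], gain_db: List[float]) -> List[float]:
    """Scale each sample by its per-sample gain given in dB."""
    return [s * db_to_lin(g) for s, g in zip(samples, gain_db)]
```

In practice the core codec sits between the two gain stages, so the restoration is only approximate; the sketch shows the ideal, codec-free case.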
Referring back to Figure 2B, after inverse gain has been applied, the M downmix channels 220A are provided to spatial synthesis/rendering/playback block 222, which can reconstruct the HOA signal using information 214B. For example, if spatial analysis block 204 uses SPAR for spatial encoding, block 222 can use SPAR to reconstruct one or more channels encoded using metadata 210A. The reconstructed HOA output can then be rendered directly or provided to another entity for rendering. Block 222 can include various algorithms, e.g., for rendering the reconstructed HOA output into rendered audio data. For example, rendering the reconstructed HOA output can involve distributing one or more signals of the HOA output to multiple speakers to achieve a particular perceptual impression. Optionally, block 222 can include audio playback devices for presenting the rendered audio data, such as one or more loudspeakers, headphones, etc.
Figure 4 illustrates an example of a process 400, according to some implementations, for determining gain parameters and applying gain to downmix signals based on the determined gain parameters. In some implementations, the blocks of process 400 can be performed by an encoder device. In some implementations, the blocks of process 400 can be performed in an order other than that shown in Figure 4. In some implementations, two or more blocks of process 400 can be performed substantially in parallel. In some implementations, one or more blocks of process 400 can be omitted.
At 402, process 400 can obtain one or more downmix audio signals associated with a frame of the audio signal to be encoded. For example, in some implementations, process 400 can use any suitable spatial coding technique to determine a set of downmix channels. Examples of spatial coding techniques include SPAR, linear prediction techniques, etc. The set of downmix channels can include anywhere from 1 to N channels, where N is the number of input channels, e.g., N = 4 for an FOA signal. The downmix signals can include the audio signals of the downmix channels corresponding to a particular frame of the audio signal.
At 404, process 400 can determine whether an overload condition exists for a codec such as the Enhanced Voice Services (EVS) codec and/or any other suitable codec. For example, process 400 can determine that an overload condition exists in response to determining that the signal of a frame of the one or more downmix audio signals exceeds a predetermined range (e.g., [-1, 1)) and/or any other suitable range.
If it is determined at 404 that no overload condition exists ("No" at 404), process 400 can proceed to 412 and encode the downmix signals. For example, in some implementations, process 400 can generate a bitstream that encodes the downmix signals together with side information, such as metadata, which the decoder can use to upmix the downmix signals, e.g., to reconstruct an FOA or HOA output.
Conversely, if it is determined at 404 that an overload condition exists ("Yes" at 404), process 400 can proceed to 406 and determine a gain transition function for the frame that avoids the overload condition or, if the change in the overload condition from one frame to the next is greater than the gain transition step size, at least reduces the overload. At 406, the gain transition function can be based on the gain transition step size and on the shape of a smoothing function. Furthermore, as described above in connection with Figure 2, the gain transition function can have a transient part and a steady-state part, where the steady-state part corresponds to the gain factor of the current frame, and the transient part corresponds to a sequence of intermediate gain factors for a subset of the samples of the current frame, transitioning from the gain factor at the end of the previous frame to that gain factor ±DBSTEP.
When the gain parameter of the previous frame corresponds to less attenuation than the gain parameter of the current frame, the transient part can be said to have a "fading" transient type. Conversely, when the gain parameter of the previous frame corresponds to more attenuation than that of the current frame, the transient part can be said to have an "inverse fading" or "non-fading" transient type. When the gain parameters of the previous and current frames are the same, the transient part can be said to have a "holding" transient type; in that case, the value of the gain transition function during the transient part can be the same as its value during the steady-state part. As described above in connection with Figure 2, the duration of the transient part of the gain transition function can correspond to the delay duration used by the codec.
At 408, process 400 can apply the gain transition function to the downmix signal associated with the frame. For example, in some implementations, process 400 can scale the samples of the downmix signal by the gain factors indicated by the gain transition function. As a more specific example, in some implementations, the first sample of the current frame can be scaled by a gain factor corresponding to the gain parameter of the previous frame, the last sample of the current frame can be scaled by a gain factor corresponding to the previous frame's gain parameter ±DBSTEP, and intermediate samples can be scaled by gain factors corresponding to the gain parameters of the transient or steady-state part of the gain transition function.
In some implementations, the gain transition function can be applied only to the downmix signals of those downmix channels for which an overload condition was detected at block 404. For example, if an overload condition is detected for the Y' and X' channels, a separate gain transition function can be determined for each of the Y' and X' channels and applied to that channel's signal, while no gain transition function is applied to the W' and Z' channels. In this case, an indication of the channels to which a gain transition function was applied, along with the corresponding gain parameters for each channel, can be encoded, e.g., at block 412. Alternatively, in some implementations, if an overload condition exists for only one downmix channel, the corresponding gain transition function can be applied to all downmix channels. Because the gain transition function is then applied to all channels, no indication of the gain-adjusted channels needs to be sent, which can improve bitrate efficiency.
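The two channel-selection strategies above can be contrasted in a short sketch. Channel names follow the W', X', Y', Z' example; the functions, and the idea of returning the sorted list of gain-adjusted channel names as the indication to be encoded, are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Channels = Dict[str, List[float]]


def apply_per_channel(channels: Channels,
                      curves: Dict[str, List[float]]) -> Tuple[Channels, List[str]]:
    """Apply each overloaded channel's own gain curve; leave other channels untouched.

    Returns the processed channels and the sorted names of gain-adjusted
    channels, which would have to be signalled to the decoder.
    """
    out: Channels = {}
    for name, samples in channels.items():
        curve = curves.get(name)
        if curve is None:
            out[name] = list(samples)
        else:
            out[name] = [s * g for s, g in zip(samples, curve)]
    return out, sorted(curves)


def apply_shared(channels: Channels, curve: List[float]) -> Channels:
    """Apply one gain curve to every downmix channel; no channel list is needed."""
    return {name: [s * g for s, g in zip(samples, curve)] for name, samples in channels.items()}
```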
At 410, process 400 can provide the attenuated signal and information indicating the gain transition function to the encoder for encoding. The information indicating the gain transition function can be the gain transition step size and/or an arithmetic factor related to the gain transition step size. Additionally, the shape of the smoothing function can be provided to the encoder for encoding.
At 412, process 400 can encode the downmix signals and, if gain was applied, information indicating the gain parameter(s) for the frame. If gain was applied, the encoded downmix signal can be the downmix signal after the gain transition function was applied at block 408. The downmix signals and any information indicating the gain parameters can be encoded by a codec (e.g., the EVS codec), together with any side information (e.g., metadata) that the decoder can use to reconstruct or upmix the downmix signals, to produce an encoded bitstream. The encoded bitstream can then be stored together with the metadata and/or transmitted to a receiving device capable of reversing the encoder's processing steps.
It should be noted that in some implementations, process 400 can encode the gain parameters as a set of bits. In some implementations, the gain transition function can indicate a prototype/smoothing function associated with the transient part of the gain transition function.
When adaptive gain control is enabled per channel, so that a unique gain transition function is applied to each downmix channel associated with a signal that triggers an overload condition, x bits can be used for each gain-controlled channel, plus a one-bit indicator per channel indicating whether gain parameters have been encoded. In this case, the total number of bits used to send gain control information is N_dmx + x·N_agc, where N_dmx is the number of downmix channels (one bit per downmix channel indicates whether gain control is enabled) and N_agc is the number of channels with gain control enabled. Note that if gain control is not enabled for a particular frame, N_dmx bits (one per downmix channel) can be used to indicate this. Note also that when there is only one downmix channel, e.g., when only the W channel is waveform-coded, the total number of bits used to send gain control information is x·N_agc. For example, with a single downmix channel, 0 bits are used if gain control is not enabled for that channel (N_agc = 0), and x bits are used if it is enabled (N_agc = 1).
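The bit budget above, and one possible bit layout that matches it, can be sketched as follows. The layout (per channel: a 1-bit enable flag followed, when set, by an x-bit gain index; no flag bits when there is a single downmix channel) is an assumption chosen to agree with the N_dmx + x·N_agc and x·N_agc counts in the text, not a format taken from the source. For the single-channel case it relies on the decoder inferring presence from remaining bits, as described earlier for self-terminating bitstreams.

```python
from typing import List, Optional, Sequence


def agc_bit_count(n_dmx: int, x: int, n_agc: int) -> int:
    """Bits for gain-control side information, per the counts in the text."""
    if n_dmx == 1:
        return x * n_agc  # no per-channel enable flag for a single downmix channel
    return n_dmx + x * n_agc


def pack_agc(params: Sequence[Optional[int]], x: int) -> List[int]:
    """Hypothetical layout: per channel, a 1-bit flag, then an x-bit index if enabled.

    params has one entry per downmix channel: None if gain control is off,
    otherwise an integer gain index in [0, 2**x).
    """
    bits: List[int] = []
    single = len(params) == 1
    for p in params:
        if not single:
            bits.append(0 if p is None else 1)
        if p is not None:
            if not 0 <= p < 2 ** x:
                raise ValueError("gain index does not fit in x bits")
            bits.extend((p >> (x - 1 - k)) & 1 for k in range(x))  # MSB first
    return bits
```

For four downmix channels with x = 3 and gain control on two of them, the layout uses 4 + 3·2 = 10 bits.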
When a single gain transition function, associated with the downmix channel that triggers the overload condition, is applied to all downmix channels, fewer bits may be used to send gain control information. For example, x bits may be used to send a single gain parameter for the current frame.
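The two signalling options above can be compared with a small bit-count sketch. It is an illustration under stated assumptions, not the codec's bitstream definition: x (bits per gain parameter) is left as a free parameter, the function name is hypothetical, and because the text does not say how an inactive frame is signalled when one shared function is used, the sketch assumes the single x-bit parameter is always sent in that mode.

```python
def gain_control_bits(n_dmx: int, n_agc: int, x: int, per_channel: bool) -> int:
    """Gain-control side-information bits for one frame (illustrative sketch).

    n_dmx: number of downmix channels (N_dmx)
    n_agc: number of downmix channels with gain control enabled (N_agc)
    x: bits per encoded gain parameter (free parameter, not fixed by the text)
    per_channel: True for channel-specific gain control,
                 False for one gain transition function shared by all channels
    """
    if not 0 <= n_agc <= n_dmx:
        raise ValueError("n_agc must lie in [0, n_dmx]")
    if not per_channel:
        # One shared gain parameter per frame; assumed to be sent every frame.
        return x
    if n_dmx == 1:
        # Single downmix channel (e.g., only W waveform-coded): x * N_agc bits.
        return x * n_agc
    # One enable flag per downmix channel plus x bits per enabled channel.
    return n_dmx + x * n_agc


# Four downmix channels, two of them overloaded, 3-bit gain parameters.
print(gain_control_bits(4, 2, 3, per_channel=True))   # 10
print(gain_control_bits(4, 2, 3, per_channel=False))  # 3
```

With these example values, channel-specific signalling costs 10 bits per frame against 3 bits for the shared function, which is the bitrate saving the text attributes to the shared approach.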
Figure 5 illustrates an example of a process 500, according to some implementations, for obtaining the gain parameters used by the encoder and applying an inverse gain transition function based on the obtained gain parameters. In some implementations, the blocks of process 500 may be performed by a decoder device. In some implementations, the blocks of process 500 may be performed in an order other than that shown in Figure 5. In some implementations, two or more blocks of process 500 may be performed substantially in parallel. In some implementations, one or more blocks of process 500 may be omitted.
Process 500 may begin at 502 by receiving an encoded frame of an audio signal. The received frame (e.g., the current frame) is generally referred to herein as frame j. The received frame may immediately follow a previously received frame, or it may not.
At 504, process 500 may decode the encoded frame of the audio signal to obtain a downmix signal and, if the encoder applied gain control, information indicating the gain control applied to the current frame. This information may be the gain transition step size applied by the encoder. Additionally, the information may indicate the shape of the smoothing function of the gain transition function applied by the encoder. If the encoder applied gain control on a per-channel basis, process 500 may also identify the downmix channels to which gain control was applied.
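As a hedged illustration of the per-channel case, the sketch below reads gain control information from a bit sequence. The bit layout is an assumption made for illustration only (N_dmx one-bit enable flags first, then one x-bit gain parameter per enabled channel in channel order), the class and function names are hypothetical, and the single-downmix-channel case, which the text signals without a flag, is not modelled.

```python
from typing import Dict, List


class BitReader:
    """Minimal MSB-first reader over a list of 0/1 ints (illustrative only)."""

    def __init__(self, bits: List[int]):
        self.bits = bits
        self.pos = 0

    def read(self, n: int) -> int:
        if self.pos + n > len(self.bits):
            raise EOFError("not enough bits left")
        value = 0
        for b in self.bits[self.pos:self.pos + n]:
            value = (value << 1) | b
        self.pos += n
        return value


def parse_gain_control(reader: BitReader, n_dmx: int, x: int) -> Dict[int, int]:
    """Return {downmix channel index: x-bit gain parameter} for enabled channels.

    Assumed (hypothetical) layout: n_dmx enable flags, then one x-bit gain
    parameter per enabled channel, in channel order.
    """
    enabled = [reader.read(1) for _ in range(n_dmx)]
    return {ch: reader.read(x) for ch, flag in enumerate(enabled) if flag}


# Flags 0110 (channels 1 and 2 enabled), then gain parameters 101 and 011.
reader = BitReader([0, 1, 1, 0, 1, 0, 1, 0, 1, 1])
print(parse_gain_control(reader, n_dmx=4, x=3))  # {1: 5, 2: 3}
```

The parsed values would then select the gain transition step size (and, if signalled, the smoothing shape) used at block 506.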
At 506, process 500 may determine an inverse gain transition function based on the gain transition step size. In some implementations, process 500 may also determine the inverse gain transition function based on the shape of the smoothing function. The inverse gain transition function may be computed from the gain transition function, or it may be selected from a plurality of predefined inverse gain transition functions.
In some implementations, process 500 may determine the inverse gain transition function as the inverse of the gain transition function applied at the encoder. For example, the inverse gain transition function may correspond to the gain transition function mirrored about a horizontal line and adjusted accordingly; the mirroring and adjustment may be along the x-axis. An example of such an inverse gain transition function is shown and described above in conjunction with Figure 3B. In some implementations, the inverse gain transition function may have a steady-state portion corresponding to the gain applied to the previous frame, followed by a transient portion that is the inverse of the transient portion of the gain transition function applied at the encoder. For example, if the gain applied to the current frame corresponds to more attenuation than that of the previous frame, the transient portion of the inverse gain transition function may transition from a smaller amplification to a larger amplification. Conversely, if the gain applied to the current frame corresponds to less attenuation than that of the previous frame, the transient portion may transition from a larger amplification to a smaller amplification.
The duration of the transient portion may depend on the delay introduced by the codec: it may be the frame length (e.g., 20 milliseconds) minus the codec delay (e.g., 12 milliseconds), i.e., 8 milliseconds in this example. When the delay introduced by the codec is longer than the frame length, the inverse gain transition may be applied with a delay of one frame. In some cases, process 500 (e.g., the decoder) may obtain the delay from the gain control bits. The inverse gain transition function may also be used to attenuate signals that were amplified by the encoder's gain control.
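A minimal sketch of how the steady-state and transient portions line up across the codec delay, working in dB so that mirroring the encoder curve is a sign flip. Assumptions, all labelled as such: the raised-cosine p() is a stand-in for whatever smoothing function is actually signalled; the encoder transient is taken to last exactly frame length minus codec delay samples; the previous frame is assumed to have been in steady state; and the function names are hypothetical.

```python
import math
from typing import List


def smooth_ramp(n: int) -> List[float]:
    """Assumed smoothing function p(): raised cosine rising from near 0 to exactly 1."""
    return [0.5 * (1.0 - math.cos(math.pi * (l + 1) / n)) for l in range(n)]


def encoder_gain_db(g_prev_db: float, step_db: float,
                    frame_len: int, n_trans: int) -> List[float]:
    """Encoder gain (dB) for frame j: transient of n_trans samples, then steady state."""
    p = smooth_ramp(n_trans)
    return [g_prev_db + step_db * (p[l] if l < n_trans else 1.0)
            for l in range(frame_len)]


def decoder_inverse_gain_db(g_prev_db: float, step_db: float,
                            frame_len: int, codec_delay: int) -> List[float]:
    """Inverse gain (dB) for decoded frame j, which lags the encoder by codec_delay.

    Steady-state part: undo the previous frame's gain for codec_delay samples.
    Transient part: mirror the encoder transient over frame_len - codec_delay samples.
    """
    if codec_delay >= frame_len:
        # The text applies the inverse one frame later in this case; not modelled here.
        raise ValueError("codec delay must be shorter than the frame length")
    n_trans = frame_len - codec_delay
    return ([-g_prev_db] * codec_delay
            + [-(g_prev_db + step_db * v) for v in smooth_ramp(n_trans)])


# 20 ms frames and a 12 ms codec delay at 48 kHz -> 8 ms (384-sample) transient.
L, D = 960, 576
inv = decoder_inverse_gain_db(-1.0, -2.0, L, D)
print(len(inv), L - D)  # 960 384
```

Delaying the encoder gain curve by D samples and adding the inverse curve gives 0 dB for every sample of the decoded frame, which is the property block 508 relies on.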
At 508, process 500 may apply the inverse gain transition function to the downmix signal to reverse the gain applied by the encoder. For example, applying the inverse gain transition function may amplify a downmix signal that the encoder attenuated, reversing the attenuation. As another example, it may attenuate a downmix signal that the encoder amplified, reversing the amplification. The output of step 508 may then be M downmix channels having the same gain as the M downmix channels after step 402 of process 400.
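Ignoring coding error and assuming the decoder gains are already time-aligned with the decoded samples, block 508 amounts to a per-sample multiplication by the mirrored gain. The sketch below (hypothetical helper names, gains in dB) shows that applying the negated encoder gains recovers the pre-gain-control samples.

```python
from typing import List, Sequence


def db_to_lin(g_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (g_db / 20.0)


def apply_gain_db(samples: Sequence[float], gains_db: Sequence[float]) -> List[float]:
    """Scale each sample by its own gain, given in dB."""
    if len(samples) != len(gains_db):
        raise ValueError("exactly one gain per sample is required")
    return [x * db_to_lin(g) for x, g in zip(samples, gains_db)]


# Encoder attenuates an overloaded snippet; decoder applies the mirrored (negated) gains.
x = [1.5, -1.2, 0.9, 1.1]
enc_gains = [0.0, -1.0, -3.0, -6.0]
y = apply_gain_db(x, enc_gains)                # attenuated, as passed to the codec
z = apply_gain_db(y, [-g for g in enc_gains])  # inverse gain at the decoder
print([round(v, 9) for v in z])                # [1.5, -1.2, 0.9, 1.1]
```

In a real decoder, z would also contain the codec's coding error; the inverse gain only undoes the deterministic gain stage.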
At 510, process 500 may upmix the downmix signal. The upmixing may be performed by a spatial decoder. In some examples, the spatial decoder may utilize SPAR techniques. The upmixed signal may correspond to a reconstructed FOA or HOA audio signal. In some implementations, process 500 may upmix the signal using side information (e.g., metadata) encoded in the bitstream, where the side information may be used to reconstruct the parametrically encoded signal. In some implementations, block 510 may be optional, e.g., when the downmix signal can be rendered directly.
In some implementations, at 512, process 500 may render the upmixed signal to generate rendered audio data. In some implementations, process 500 may use any suitable rendering algorithm to render the FOA or HOA audio signal, e.g., to render scene-based audio data. In some implementations, the rendered audio data may be stored in any suitable format, e.g., for future presentation or playback. In some implementations, block 512 is optional and may be omitted.
In some implementations, at 514, process 500 may cause the rendered audio data to be played back. For example, in some implementations, the rendered audio data may be presented via one or more loudspeakers and/or headphones. In some implementations, multiple loudspeakers may be used, and they may be positioned at any suitable locations or orientations relative to each other in three dimensions. In some implementations, block 514 is optional and may be omitted.
As described above in conjunction with Figure 4, a set of gain control bits may be used to encode gain control information, e.g., information indicating the gain parameters. In some implementations, a different gain transition function may be determined for each downmix channel in which an overload condition is detected. In such implementations, gain control bits are needed to indicate whether gain control is applied to each downmix channel, and the gain transition function parameters are encoded for each downmix channel to which gain control is applied, as described above in conjunction with Figure 4. Alternatively, in some implementations, a single gain transition function, determined based on one downmix channel in which an overload condition exists, may be applied to all downmix channels. Such implementations need fewer gain control bits, because no separate per-channel flag is required to indicate whether gain control has been applied to each downmix channel, resulting in more bitrate-efficient encoding.
The more bitrate-efficient approach of applying the same gain transition function to all downmix channels, including those without an overload condition, may degrade perceptual quality, e.g., by attenuating signals that do not overload the codec. In contrast, more targeted gain control, in which gain control is applied to each downmix channel individually, may require more bits to send the gain control information. However, sending targeted (e.g., channel-specific) gain control information may require reallocating bits that would otherwise be used for waveform coding of the downmix channels, which in some cases may degrade perceptual quality. There may therefore be a case-dependent trade-off between applying the same gain transition function to all downmix channels and applying channel-specific gain control. Whether gain control is applied across all downmix channels or per channel, the bits carrying gain control information may be taken from the bits otherwise used for waveform coding of the downmix channels and/or from the bits otherwise used for coding side information (e.g., metadata) for reconstructing FOA or HOA signals from the downmix channels, thereby reducing the number of bits available for coding the downmix channels or the side information.
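To make the trade-off concrete, the sketch below carves gain-control bits out of a fixed per-frame budget. All numbers are illustrative assumptions rather than values from the text: a constant 24.4 kbps bitrate with 20 ms frames, a hypothetical 60-bit side-information budget, and a take_from_side_info knob standing in for whatever allocation policy an encoder actually uses. The function names are hypothetical.

```python
from typing import Tuple


def frame_bits(bitrate_bps: int, frame_ms: int = 20) -> int:
    """Bits available in one frame at a constant bitrate."""
    return bitrate_bps * frame_ms // 1000


def reallocate_for_gain_control(waveform_bits: int, side_info_bits: int,
                                gain_bits: int,
                                take_from_side_info: int = 0) -> Tuple[int, int]:
    """Return (waveform_bits, side_info_bits) left after reserving gain_bits.

    take_from_side_info bits are taken from the side-information budget and the
    remainder from the waveform-coding budget (hypothetical allocation policy).
    """
    if not 0 <= take_from_side_info <= gain_bits:
        raise ValueError("take_from_side_info must lie in [0, gain_bits]")
    wave = waveform_bits - (gain_bits - take_from_side_info)
    side = side_info_bits - take_from_side_info
    if wave < 0 or side < 0:
        raise ValueError("gain-control bits exceed the available budget")
    return wave, side


total = frame_bits(24_400)  # 488 bits per 20 ms frame
side = 60                   # hypothetical side-information budget
wave = total - side         # 428 bits left for waveform coding
print(reallocate_for_gain_control(wave, side, gain_bits=10))  # (418, 60)
print(reallocate_for_gain_control(wave, side, gain_bits=3))   # (425, 60)
```

Here 10 bits stands for channel-specific signalling (four downmix channels, two enabled, x = 3) and 3 bits for a single shared gain parameter; every bit spent on gain control is a bit the waveform or side-information coder no longer has.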
Figure 6 illustrates an example use case of an IVAS system 600 according to one embodiment. In some embodiments, various devices communicate via a call server 602, which is configured to receive audio signals from, for example, a public switched telephone network (PSTN) or a public land mobile network (PLMN) device, as shown by PSTN/other PLMN 604. The use case supports legacy devices 606 that render and capture audio in mono only, including but not limited to devices supporting Enhanced Voice Services (EVS), Adaptive Multi-Rate Wideband (AMR-WB) and Adaptive Multi-Rate Narrowband (AMR-NB). The use case also supports user equipment (UE) 608 and/or 614 that captures and renders stereo audio signals, and UE 610 that captures mono signals and renders them as multichannel signals. The use case also supports immersive and stereo signals captured and rendered by video conferencing room systems 616 and/or 618, respectively. The use case further supports stereo capture and immersive rendering of stereo audio signals for home theater system 620, and a computer 612 providing mono capture and immersive rendering of audio signals for a virtual reality (VR) device 622 and immersive content ingestion 624.
Figure 7 is a block diagram showing examples of components of an apparatus capable of implementing various aspects of this disclosure. As with other figures provided herein, the types and numbers of elements shown in Figure 7 are provided merely by way of example. Other implementations may include more, fewer and/or different types and numbers of elements. According to some examples, apparatus 700 may be configured to perform at least some of the methods disclosed herein. In some implementations, apparatus 700 may be, or may include, a television, one or more components of an audio system, a mobile device (such as a cellular telephone), a laptop computer, a tablet device, a smart speaker or another type of device.
According to some alternative implementations, apparatus 700 may be, or may include, a server. In some such examples, apparatus 700 may be, or may include, an encoder. Accordingly, in some instances apparatus 700 may be a device configured for use within an audio environment, such as a home audio environment, whereas in other instances apparatus 700 may be a device configured for use in “the cloud,” e.g., a server.
In this example, apparatus 700 includes an interface system 705 and a control system 710. In some implementations, the interface system 705 may be configured for communication with one or more other devices of an audio environment. In some examples, the audio environment may be a home audio environment. In other examples, the audio environment may be another type of environment, such as an office environment, an automobile environment, a train environment, a street or sidewalk environment, a park environment, etc. In some implementations, the interface system 705 may be configured for exchanging control information and associated data with audio devices of the audio environment. In some examples, the control information and associated data may pertain to one or more software applications that apparatus 700 is executing.
In some implementations, the interface system 705 may be configured for receiving, or for providing, a content stream. The content stream may include audio data. The audio data may include, but is not limited to, audio signals. In some instances, the audio data may include spatial data, such as channel data and/or spatial metadata. In some examples, the content stream may include video data and audio data corresponding to the video data.
The interface system 705 may include one or more network interfaces and/or one or more external device interfaces, such as one or more universal serial bus (USB) interfaces. According to some implementations, the interface system 705 may include one or more wireless interfaces. The interface system 705 may include one or more devices for implementing a user interface, such as one or more microphones, one or more speakers, a display system, a touch sensor system and/or a gesture sensor system. In some examples, the interface system 705 may include one or more interfaces between the control system 710 and a memory system, such as the optional memory system 715 shown in Figure 7. However, in some instances the control system 710 may include a memory system. In some implementations, the interface system 705 may be configured to receive input from one or more microphones in an environment.
The control system 710 may, for example, include a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, and/or discrete hardware components.
In some implementations, the control system 710 may reside in more than one device. For example, in some implementations a portion of the control system 710 may reside in a device within one of the environments depicted herein and another portion of the control system 710 may reside in a device outside the environment, such as a server, a mobile device (e.g., a smartphone or a tablet computer), etc. In other examples, a portion of the control system 710 may reside in a device within an environment and another portion of the control system 710 may reside in one or more other devices of that environment. For example, a portion of the control system 710 may reside in a device that is implementing a cloud-based service, such as a server, and another portion of the control system 710 may reside in another device that is implementing the cloud-based service, such as another server, a memory device, etc. The interface system 705 also may, in some examples, reside in more than one device.
In some implementations, the control system 710 may be configured to perform, at least in part, the methods disclosed herein. According to some examples, the control system 710 may be configured to implement methods of determining gain parameters, applying a gain transition function, determining an inverse gain transition function, applying the inverse gain transition function, allocating bits for gain control in a bitstream, etc.
Some or all of the methods described herein may be performed by one or more devices according to instructions (e.g., software) stored on one or more non-transitory media. Such non-transitory media may include memory devices such as those described herein, including but not limited to random access memory (RAM) devices, read-only memory (ROM) devices, etc. The one or more non-transitory media may, for example, reside in the optional memory system 715 shown in Figure 7 and/or in the control system 710. Accordingly, various innovative aspects of the subject matter described in this disclosure can be implemented in one or more non-transitory media having software stored thereon. The software may, for example, include instructions for determining gain parameters, applying a gain transition function, determining an inverse gain transition function, applying the inverse gain transition function, allocating bits for gain control in a bitstream, etc. The software may, for example, be executable by one or more components of a control system such as the control system 710 of Figure 7.
In some examples, apparatus 700 may include the optional microphone system 720 shown in Figure 7. The optional microphone system 720 may include one or more microphones. In some implementations, one or more of the microphones may be part of, or associated with, another device, such as a speaker of a speaker system, a smart audio device, etc. In some examples, apparatus 700 may not include a microphone system 720. However, in some such implementations apparatus 700 may nonetheless be configured to receive microphone data for one or more microphones in an audio environment via the interface system 705. In some such implementations, a cloud-based implementation of apparatus 700 may be configured to receive, via the interface system 705, microphone data, or a noise metric corresponding at least in part to the microphone data, from one or more microphones in an audio environment.
According to some implementations, apparatus 700 may include the optional loudspeaker system 725 shown in Figure 7. The optional loudspeaker system 725 may include one or more loudspeakers, which also may be referred to herein as “speakers” or, more generally, as “audio reproduction transducers.” In some examples (e.g., cloud-based implementations), apparatus 700 may not include a loudspeaker system 725. In some implementations, apparatus 700 may include headphones. Headphones may be connected or coupled to apparatus 700 via a headphone jack or via a wireless connection (e.g., Bluetooth).
Figures 8A and 8B illustrate an example implementation of perceptually motivated gain control with uniform gain control at the encoder side, using DBSTEP = -1 dB. In this particular example, a frame consists of 1024 samples. Sample amplitudes are shown with dashed lines, and the gain applied to each sample is shown with a solid line. As shown in Figure 8A, once a frame overloads the encoder (amplitude greater than 0 dB), the gain function transitions from no attenuation (0 dB) to an attenuation of 0 dB + DBSTEP = -1 dB. A further attenuation of DBSTEP is introduced when the input audio signal exceeds 1 dB.
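The frame-level behaviour described for Figure 8A can be sketched as follows. This is a hedged simplification, not the patented algorithm: it assumes a full-scale range of ±1.0 (so 0 dB marks the overload threshold), allows at most one DBSTEP transition per frame, and never releases (steps back up), which the description of Figure 8A does not cover. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def frame_peak_db(frame: Sequence[float], full_scale: float = 1.0) -> float:
    """Peak level of a frame in dB relative to full scale; > 0 dB means overload."""
    peak = max(abs(s) for s in frame)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak / full_scale)


def uniform_agc(frame_peaks_db: Sequence[float], step_db: float = -1.0,
                threshold_db: float = 0.0) -> List[float]:
    """Per-frame gain (dB) for uniform gain control with a fixed step DBSTEP.

    If a frame would still overload under the previous frame's gain, the gain
    for this frame transitions by one further DBSTEP.
    """
    gains: List[float] = []
    g = 0.0  # start with no attenuation
    for peak in frame_peaks_db:
        if peak + g > threshold_db:
            g += step_db
        gains.append(g)
    return gains


print(uniform_agc([-3.0, 0.5, 0.5, 1.5, 1.5, 0.2]))
# [0.0, -1.0, -1.0, -2.0, -2.0, -2.0]
```

The first overload (0.5 dB) moves the gain to -1 dB; the gain only steps again once the input exceeds 1 dB, matching the behaviour described for Figure 8A.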
Figure 8B shows the resulting attenuated downmix audio signal. In this particular example, the magnitude of DBSTEP is large enough that every sample is attenuated below the desired threshold (0 dB).
Figures 9A and 9B illustrate an example of “non-uniform” gain control at the encoder side with DBS = -1 dB and GT = {0, 1, 3, 6}, which yields a set of attenuation values {DBS·GT} = {0, -1, -3, -6} dB, i.e., a DBSTEP set of {-1, -2, -3} dB. As in Figures 8A and 8B, sample amplitudes are shown with dashed lines and the gain function with a solid line. Using the DBSTEP set, automatic gain control can react to an overload at the encoder by attenuating the signal by increasing amounts from frame to frame. As shown in Figure 9B, the gain transition step size is not large enough to bring all samples below the desired threshold (0 dB). This may cause distortion when the audio signal is rendered at the decoder, but distortion caused by overload at the encoder is less noticeable than distortion caused by very sudden gain changes.
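The non-uniform schedule can be reproduced directly from DBS and GT. In the sketch below, the first two functions compute the attenuation levels {DBS·GT} and the resulting DBSTEP set; the third is a hedged frame-level model, assuming peaks in dB relative to full scale, one level change per frame and no release, that moves one level further whenever a frame would still overload. All function names are hypothetical.

```python
from typing import List, Sequence


def attenuation_levels(dbs: float, gt: Sequence[int]) -> List[float]:
    """Attenuation levels {DBS * GT} in dB."""
    return [dbs * t for t in gt]


def step_sizes(levels: Sequence[float]) -> List[float]:
    """DBSTEP between consecutive attenuation levels."""
    return [b - a for a, b in zip(levels, levels[1:])]


def nonuniform_agc(frame_peaks_db: Sequence[float], levels: Sequence[float],
                   threshold_db: float = 0.0) -> List[float]:
    """Per-frame gain (dB): advance one level per frame while the frame still overloads."""
    idx = 0
    gains: List[float] = []
    for peak in frame_peaks_db:
        if peak + levels[idx] > threshold_db and idx + 1 < len(levels):
            idx += 1
        gains.append(levels[idx])
    return gains


levels = attenuation_levels(-1.0, [0, 1, 3, 6])
print(levels)                                   # [-0.0, -1.0, -3.0, -6.0]
print(step_sizes(levels))                       # [-1.0, -2.0, -3.0]
print(nonuniform_agc([2.0, 2.0, 2.0], levels))  # [-1.0, -3.0, -3.0]
```

With a sustained 2 dB peak, the first overloaded frame only reaches -1 dB and so still exceeds the threshold by 1 dB, which is the residual overload shown in Figure 9B; the next frame's larger step brings it under.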
Some aspects of this disclosure include a system or device configured (e.g., programmed) to perform one or more examples of the disclosed methods, and a tangible computer-readable medium (e.g., a disc) that stores code for implementing one or more examples of the disclosed methods or steps thereof. For example, some disclosed systems can be, or include, a programmable general purpose processor, digital signal processor, or microprocessor, programmed with software or firmware and/or otherwise configured to perform any of a variety of operations on data, including an embodiment of a disclosed method or steps thereof. Such a general purpose processor may be, or include, a computer system including an input device, a memory, and a processing subsystem that is programmed (and/or otherwise configured) to perform one or more examples of the disclosed methods (or steps thereof) in response to data asserted thereto.
Some embodiments may be implemented as a configurable (e.g., programmable) digital signal processor that is configured (e.g., programmed and otherwise configured) to perform required processing on audio signal(s), including performance of one or more examples of the disclosed methods. Alternatively, embodiments of the disclosed systems (or elements thereof) may be implemented as a general purpose processor, e.g., a personal computer (PC) or other computer system or microprocessor, which may include an input device and a memory, and which is programmed with software or firmware and/or otherwise configured to perform any of a variety of operations, including one or more examples of the disclosed methods. Alternatively, elements of some embodiments of the disclosed system are implemented as a general purpose processor or DSP configured (e.g., programmed) to perform one or more examples of the disclosed methods, and the system also includes other elements. The other elements may include one or more loudspeakers and/or one or more microphones. A general purpose processor configured to perform one or more examples of the disclosed methods may be coupled to an input device, such as a mouse and/or a keyboard. The general purpose processor may also be coupled to a memory, a display device, etc.
Another aspect of this disclosure is a computer-readable medium, such as a disc or other tangible storage medium, that stores code for performing (e.g., code executable to perform) one or more examples of the disclosed methods or steps thereof.
While specific embodiments of this disclosure and applications of this disclosure have been described herein, it will be apparent to those of ordinary skill in the art that many variations of the embodiments and applications described herein are possible without departing from the scope of the disclosure described and claimed herein. It should be understood that while certain forms of this disclosure have been shown and described, this disclosure is not to be limited to the specific embodiments described and shown or to the specific methods described.
Various aspects and implementations of this disclosure may also be appreciated from the following enumerated example embodiments (EEEs), which are not claims.
EEE1. A method of performing gain control on an audio signal, the method comprising:
obtaining a downmix audio signal of an audio signal to be encoded;
determining that an overload condition has occurred for a frame of the downmix audio signal;
in response to determining that the overload condition has occurred, determining a gain transition function for the frame, wherein the gain transition function is based at least on a gain transition step size;
applying the gain transition function to the frame to generate a gain-adjusted frame of the downmix audio signal; and
providing the gain-adjusted frame and information indicative of the gain transition function for encoding by an encoder.
EEE2. The method of EEE1, further comprising:
encoding the gain-adjusted frame together with the information indicative of the gain transition function.
EEE3. The method of any preceding EEE, wherein obtaining the downmix audio signal of the audio signal to be encoded comprises:
receiving the downmix audio signal; or
determining the downmix audio signal from the audio signal to be encoded.
EEE4. The method of any preceding EEE, wherein the audio signal is a higher-order Ambisonics (HOA) audio signal.
EEE5. The method of any preceding EEE, wherein the downmix audio signal is a spatially encoded downmix signal.
EEE6. The method of any preceding EEE, wherein the overload condition is a condition in which the frame of the downmix audio signal exceeds a predefined signal range.
EEE7. The method of EEE6, wherein the predefined signal range is a signal range expected by the encoder.
EEE8. The method of any preceding EEE, wherein the frame of the downmix audio signal is a current frame, and wherein the gain transition function is further based on a previous gain transition function applied to a frame preceding the current frame.
EEE9. The method of any preceding EEE, wherein the gain transition function further depends on a smoothing function based on the gain transition step size.
EEE10. The method of EEE8, wherein the gain transition function comprises a transient portion and a steady-state portion, and wherein the transient portion corresponds to a transition from a gain associated with the previous frame to the gain associated with the previous frame adjusted by the gain transition step size.
EEE11. The method of EEE10, wherein, depending on a gain adjustment target for the current frame, the gain associated with the previous frame adjusted by the gain transition step size is the gain associated with the previous frame either attenuated or amplified by the gain transition step size.
EEE12. The method of EEE10 or EEE11, wherein a length of the transient portion is limited by a delay introduced by a codec used by the encoder.
EEE13. The method of EEE12, wherein the length of the transient portion is equal to or less than a number of samples used by the encoder for encoding operations.
EEE14. The method of any one of EEE10 to EEE13, wherein the length of the transient portion is greater than one sample.
EEE15. The method of any preceding EEE, wherein the gain conversion function is defined as
where DBSTEP is the gain conversion step size, l is the sample index, j is the frame index, p() is the smoothing function, l_end is the rightmost index for which p() is defined, and L is the number of samples in a frame.
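The formula of EEE15 is not reproduced in this text. As an illustration only, the following sketch builds a per-sample gain curve with the structure described in EEE10 to EEE14: an instantaneous portion over samples 0 to l_end that moves from the previous frame's gain to that gain attenuated or amplified by the step DBSTEP, followed by a steady-state portion held at the new gain for the rest of the L-sample frame. Expressing the gain in dB and using a raised-cosine ramp as the smoothing function p() are assumptions, not the patent's definitions; the function names are hypothetical.

```python
import math
from typing import Callable, List


def raised_cosine(l: int, l_end: int) -> float:
    """Hypothetical smoothing function p(l): rises from 0 at l = 0 to 1 at l = l_end."""
    return 0.5 * (1.0 - math.cos(math.pi * l / l_end))


def gain_curve_db(prev_gain_db: float, db_step: float, direction: int,
                  frame_len: int, l_end: int,
                  p: Callable[[int, int], float] = raised_cosine) -> List[float]:
    """Per-sample gain in dB for one frame of frame_len (L) samples.

    direction = -1 attenuates and +1 amplifies the previous frame's gain by db_step.
    Samples 0..l_end form the instantaneous portion; the rest is steady state.
    """
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 (attenuate) or +1 (amplify)")
    if not 1 <= l_end < frame_len:
        raise ValueError("need 1 <= l_end < frame_len")
    target_db = prev_gain_db + direction * db_step
    curve = []
    for l in range(frame_len):
        if l <= l_end:
            curve.append(prev_gain_db + direction * db_step * p(l, l_end))
        else:
            curve.append(target_db)
    return curve


def db_to_linear(g_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (g_db / 20.0)
```

Because the curve depends only on the previous gain, the step, the direction and p(), a decoder holding the same values could rebuild it exactly.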
EEE16. The method of any preceding EEE, wherein the gain conversion step size is a predefined value.
EEE17. The method of any preceding EEE, wherein the gain conversion step size is determined from a set of predefined values of increasing size.
EEE18. The method of EEE17, wherein the method further comprises:
determining an amount of overload caused by the frame of the downmixed audio signal; and
determining the gain conversion step size from the set of predefined values of increasing size based on the amount of overload.
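EEE17 and EEE18 select the step size from an ordered set of predefined values according to the amount of overload. A minimal sketch, assuming a hypothetical table of dB steps and the rule "smallest step that covers the overload, otherwise the largest step"; neither the values nor the rule come from the source.

```python
import math
from typing import List, Sequence

# Hypothetical set of gain conversion step sizes in dB, ordered by increasing size.
STEP_TABLE_DB = (1.5, 3.0, 6.0, 12.0)


def overload_db(frame: List[List[float]], limit: float = 1.0) -> float:
    """Amount by which the frame peak exceeds the limit, in dB (0.0 if none)."""
    peak = max((abs(s) for ch in frame for s in ch), default=0.0)
    return 20.0 * math.log10(peak / limit) if peak > limit else 0.0


def select_step_db(overload: float, table: Sequence[float] = STEP_TABLE_DB) -> float:
    """Pick the smallest step that covers the overload, else the largest step."""
    if overload <= 0.0:
        return 0.0  # no overload: no gain change needed
    for step in table:  # table is sorted in increasing order
        if step >= overload:
            return step
    return table[-1]
```

When even the largest step cannot cover the overload, the frame stays partly overloaded, which is the situation EEE24 refers to when it conditions the coding scheme on whether the overload can be eliminated.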
EEE19. The method of any preceding EEE, wherein the gain conversion step size is determined based on perceptual quality listening tests or objective quality measurements.
EEE20. The method of EEE19, wherein the perceptual quality listening test is a MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA).
EEE21. The method of any preceding EEE, wherein applying the gain conversion function to the frame to generate a gain-adjusted frame of the downmixed signal comprises:
applying the gain conversion function to samples of the downmixed audio signal, wherein the total number of the samples corresponds to the frame of the downmixed audio signal.
EEE22. The method of EEE2 or any one of EEE3 to EEE21, wherein encoding the gain-adjusted frame together with information indicating the gain conversion function comprises:
determining a coding scheme based on the gain conversion function.
EEE23. The method of EEE22, wherein determining a coding scheme based on the gain conversion function comprises:
determining the coding scheme based on the gain conversion step size.
EEE24. The method of EEE22, wherein determining a coding scheme based on the gain conversion function comprises:
determining the coding scheme based on whether the gain conversion function can eliminate the overload condition.
EEE25. The method of any one of EEE22 to EEE24, wherein the coding scheme is one of modified discrete cosine transform (MDCT) coding or algebraic code-excited linear prediction (ACELP).
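EEE22 to EEE25 state only that the coding scheme (MDCT or ACELP) is chosen based on the gain conversion function, its step size, or whether it removes the overload; they do not give the decision rule. The sketch below merely shows where those inputs would enter such a decision. The threshold `large_step_db` and the direction of the mapping are placeholders invented for illustration, not the patent's policy.

```python
from enum import Enum


class CodingScheme(Enum):
    MDCT = "MDCT"
    ACELP = "ACELP"


def overload_removed(peak: float, step_db: float, limit: float = 1.0) -> bool:
    """True if attenuating the peak by step_db brings it within [-limit, limit] (EEE24 input)."""
    return peak * 10.0 ** (-step_db / 20.0) <= limit


def choose_scheme(step_db: float, removed: bool, large_step_db: float = 6.0) -> CodingScheme:
    """Placeholder policy combining the step size (EEE23) and overload removal (EEE24)."""
    if not removed or step_db >= large_step_db:
        return CodingScheme.MDCT
    return CodingScheme.ACELP
```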
EEE26. The method of any preceding EEE, wherein the gain-adjusted frame is an attenuated frame or an amplified frame.
EEE27. A method of performing gain control on an audio signal, the method comprising:
receiving, at a decoder, encoded frames of an audio signal;
decoding the encoded frames of the audio signal to obtain frames of a downmixed audio signal and information indicating the gain control applied by an encoder;
determining an inverse gain conversion function to be applied to a frame of the downmixed audio signal based at least in part on the information indicating the gain control applied by the encoder, wherein the information indicating the gain control applied by the encoder includes a gain conversion step size; and
applying the inverse gain conversion function to the frame of the downmixed audio signal.
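On the decoder side (EEE27, EEE32), inverting a gain curve amounts to negating it in dB, i.e. multiplying each sample by the reciprocal of the encoder's linear gain. The sketch below assumes the decoder already knows the previous gain, the step size, the direction and l_end, and that both sides use the same hypothetical raised-cosine smoothing; these are assumptions for illustration, and the helper names are hypothetical.

```python
import math
from typing import List

Frame = List[List[float]]  # channels x samples


def gain_curve_db(prev_gain_db: float, db_step: float, direction: int,
                  frame_len: int, l_end: int) -> List[float]:
    """Hypothetical raised-cosine transition up to l_end, then steady state."""
    if l_end < 1:
        raise ValueError("the instantaneous portion must span more than one sample")
    curve = []
    for l in range(frame_len):
        p = 0.5 * (1.0 - math.cos(math.pi * min(l, l_end) / l_end))
        curve.append(prev_gain_db + direction * db_step * p)
    return curve


def apply_gain(frame: Frame, curve_db: List[float], inverse: bool = False) -> Frame:
    """Multiply each sample by its gain; inverse=True applies the reciprocal gain."""
    sign = -1.0 if inverse else 1.0
    return [[s * 10.0 ** (sign * g / 20.0) for s, g in zip(ch, curve_db)]
            for ch in frame]
```

Applying the curve at the encoder and the inverse at the decoder restores the original samples up to floating-point rounding, ignoring any coding error introduced in between.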
EEE28. The method of EEE27, wherein the method further comprises:
upmixing the downmixed audio signal to generate an upmixed audio signal, wherein the upmixed audio signal is suitable for rendering.
EEE29. The method of EEE28, further comprising rendering the upmixed signal to produce rendered audio data.
EEE30. The method of EEE29, further comprising playing back the rendered audio data using one or more of loudspeakers or headphones.
EEE31. The method of any one of EEE27 to EEE30, wherein the information indicating the gain control applied by the encoder further includes information indicating a smoothing function.
EEE32. The method of any one of EEE27 to EEE31, wherein the inverse gain conversion function is determined by inverting the gain conversion function applied by the encoder.
EEE33. The method of any one of EEE27 to EEE32, wherein the inverse gain conversion function comprises an instantaneous portion and a steady-state portion.
EEE34. The method of EEE33, wherein the length of the instantaneous portion is limited by the delay introduced by the codec used by the decoder.
EEE35. An apparatus configured to implement the method of any one of EEE1 to EEE34.
EEE36. A program comprising instructions that, when executed by a processing device, cause the processing device to perform the method of any one of EEE1 to EEE34.
EEE37. A storage medium storing the program of EEE36.
EEE38. A method of performing gain control on an audio signal, the method comprising:
receiving, by an automatic gain control system, a spatially encoded downmixed audio signal;
determining that an overload condition occurs for one or more frames of the received signal;
in response to the overload condition, generating an attenuated signal by applying a gain function to the received signal to attenuate the overload, wherein the gain function depends on (1) an attenuation level parameter, (2) a gain function shape that specifies a respective attenuation level for each of the one or more frames, or (3) a combination of the attenuation level parameter and the gain function shape; and
providing the attenuated signal and a representation of the attenuation level parameter to a core encoder for encoding.
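The flow of EEE38 (receive the downmix, detect an overload, attenuate, and hand the attenuated signal plus a representation of the attenuation level to the core encoder) can be sketched per frame as below. The level table, the choice of the smallest sufficient level, and sending the table index as the "representation" are assumptions for illustration; for brevity, one constant gain is applied per frame rather than a shaped gain function.

```python
from typing import List, Tuple

Frame = List[List[float]]  # channels x samples

# Hypothetical attenuation levels in dB; index 0 means "no attenuation".
LEVELS_DB = (0.0, 3.0, 6.0, 12.0)


def agc_frame(frame: Frame, limit: float = 1.0) -> Tuple[Frame, int]:
    """Attenuate one frame and return it with the index of the level used."""
    peak = max(abs(s) for ch in frame for s in ch)
    index = len(LEVELS_DB) - 1  # fall back to the strongest level
    for i, level in enumerate(LEVELS_DB):
        if peak * 10.0 ** (-level / 20.0) <= limit:
            index = i
            break
    gain = 10.0 ** (-LEVELS_DB[index] / 20.0)
    return [[s * gain for s in ch] for ch in frame], index


def run_agc(frames: List[Frame], limit: float = 1.0) -> List[Tuple[Frame, int]]:
    """Pairs that would be handed to the core encoder: (attenuated frame, level index)."""
    return [agc_frame(f, limit) for f in frames]
```

A frame whose overload exceeds the strongest level is still attenuated by that level but remains partly out of range.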
EEE39. The method of EEE38, wherein the attenuation level parameter comprises a table of numbers, each number corresponding to a respective attenuation level to be applied successively to the one or more frames.
EEE40. The method of EEE39, wherein each number has the same value, indicating that each attenuation step attenuates the signal by the same amount.
EEE41. The method of EEE39, wherein the values of the numbers increase, indicating that each attenuation step attenuates the signal by a greater amount than the previous step.
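To contrast EEE40 and EEE41, the sketch below accumulates two hypothetical attenuation tables over successive steps and counts how many steps each needs to reach a target attenuation. The table values are invented for illustration.

```python
from itertools import accumulate
from typing import List, Optional, Sequence

# Hypothetical attenuation-level tables in dB, one entry per successive step.
CONSTANT_STEPS_DB = (3.0, 3.0, 3.0, 3.0)      # EEE40: every step is the same
INCREASING_STEPS_DB = (1.5, 3.0, 6.0, 12.0)   # EEE41: each step exceeds the last


def cumulative_attenuation_db(table: Sequence[float]) -> List[float]:
    """Total attenuation reached after each successive step."""
    return list(accumulate(table))


def steps_to_reach(table: Sequence[float], target_db: float) -> Optional[int]:
    """Number of successive steps needed to reach target_db, or None if never."""
    for n, total in enumerate(accumulate(table), start=1):
        if total >= target_db:
            return n
    return None
```

With these values the increasing table reaches 10 dB after three steps and the constant table after four, at the cost of larger individual gain changes in the later steps.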
EEE42. The method of any one of EEE38 to EEE41, comprising directing the core encoder to encode the audio signal using different coding schemes based on the attenuation level parameter.
EEE43. The method of any one of EEE38 to EEE42, comprising changing the gain function shape based on different values of the attenuation level parameter.
EEE44. An apparatus configured to implement the method of any one of EEE38 to EEE43.
EEE45. One or more non-transitory media having software stored thereon, the software including instructions for controlling one or more devices to perform the method of any one of EEE38 to EEE43.
Claims (37)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US63/378,678 | 2022-10-06 | ||
| US63/503,533 | 2023-05-22 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK40124189A | 2025-11-07 |