CN113299313B

CN113299313B - Audio processing method, device and electronic equipment

Info

Publication number: CN113299313B
Application number: CN202110121348.6A
Authority: CN
Inventors: 张勇
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2021-01-28
Filing date: 2021-01-28
Publication date: 2024-03-26
Anticipated expiration: 2041-01-28
Also published as: WO2022161475A1; CN113299313A

Abstract

This application discloses an audio processing method, an audio processing device and electronic equipment, belonging to the field of signal processing, which can solve the problem of poor playback of wideband/full-band non-speech signals. The method includes: performing resolution enhancement processing on a first audio signal to obtain a second audio signal; performing low-pass filtering on the second audio signal to obtain a processed second audio signal; performing signal processing on the processed second audio signal to obtain Y first sub-band signals with the same bandwidth; generating M high-frequency sub-band signals based on the low-frequency sub-band signals among the Y first sub-band signals; performing spectrum adjustment on the M high-frequency sub-band signals based on high-frequency feature information of the first audio signal to obtain M target high-frequency sub-band signals; and synthesizing the M target high-frequency sub-band signals to obtain a target audio signal, where Y and M are positive integers. The embodiments of this application apply to audio processing scenarios.

Description

Audio processing method, device and electronic equipment

Technical Field

The present application belongs to the field of signal processing, and specifically relates to an audio processing method, an audio processing device and electronic equipment.

Background Art

With the advancement of electronic technology and the continuous improvement of device performance, high-definition televisions, headphones, speakers and mobile phones can already play high-definition audio, and the demand for high-fidelity, highly expressive high-definition audio has become more urgent.

Generally, audio signals include speech signals and non-speech signals (e.g., music signals). In the related art, an electronic device can use a speech signal generation model to extend a narrowband speech signal into a wideband speech signal, reducing the loss of sound information and improving the fidelity of the speech signal.

However, the spectral characteristics of non-speech signals differ from those of speech signals, and the speech signal generation model in the electronic device is built from the spectral characteristics of speech, so it can only process audio signals whose spectral characteristics match those of speech. The model is therefore not applicable to non-speech signals (such as music signals and sounds produced in nature). As a result, the electronic device cannot process such non-speech signals, and their playback quality is poor.

Summary of the Invention

The purpose of the embodiments of the present application is to provide an audio processing method that can solve the problem of poor playback of wideband/full-band non-speech signals.

To solve the above technical problem, the present application is implemented as follows:

In a first aspect, an embodiment of the present application provides an audio processing method, the method comprising: performing resolution enhancement processing on a first audio signal to obtain a second audio signal; performing low-pass filtering on the second audio signal to obtain a processed second audio signal; performing signal processing on the processed second audio signal to obtain Y first sub-band signals with the same bandwidth; generating M high-frequency sub-band signals according to the low-frequency sub-band signals among the Y first sub-band signals; performing spectrum adjustment on the M high-frequency sub-band signals based on high-frequency feature information of the first audio signal to obtain M target high-frequency sub-band signals; and synthesizing the M target high-frequency sub-band signals to obtain a target audio signal; wherein Y and M are positive integers.

In a second aspect, an embodiment of the present application provides an audio processing device, the device comprising a processing module, a generation module and a synthesis module, wherein:

The processing module is used to perform resolution enhancement processing on a first audio signal to obtain a second audio signal; the processing module is also used to perform low-pass filtering on the second audio signal to obtain a processed second audio signal; the processing module is also used to perform signal processing on the processed second audio signal to obtain Y first sub-band signals with the same bandwidth; the generation module is used to generate M high-frequency sub-band signals according to the low-frequency sub-band signals among the Y first sub-band signals obtained by the processing module; the processing module is also used to perform spectrum adjustment on the M high-frequency sub-band signals generated by the generation module, based on high-frequency feature information of the first audio signal, to obtain M target high-frequency sub-band signals; the synthesis module is used to synthesize the M target high-frequency sub-band signals obtained by the processing module to obtain a target audio signal; wherein Y and M are positive integers.

In a third aspect, an embodiment of the present application provides an electronic device comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method described in the first aspect.

In a fourth aspect, an embodiment of the present application provides a readable storage medium on which a program or instructions are stored, wherein the program or instructions, when executed by a processor, implement the steps of the method described in the first aspect.

In a fifth aspect, an embodiment of the present application provides a chip comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run a program or instructions to implement the method described in the first aspect.

In a sixth aspect, an embodiment of the present application provides a computer program product, which is stored in a non-volatile storage medium and is executed by at least one processor to implement the method described in the first aspect.

In the embodiments of the present application, the electronic device can perform resolution enhancement processing on a low-resolution first audio signal (such as a wideband/full-band non-speech signal) to obtain a high-resolution second audio signal, and low-pass filter the second audio signal to remove its high-frequency components. The processed second audio signal is then subjected to signal processing to obtain Y first sub-band signals with the same bandwidth, and M high-frequency sub-band signals are generated from the low-frequency sub-band signals among the Y first sub-band signals. Finally, based on the high-frequency spectrum information of the low-resolution first audio signal, the M high-frequency sub-band signals are spectrally adjusted to obtain M target high-frequency sub-band signals, which are synthesized to obtain a first audio signal whose high-frequency harmonic characteristics are well reconstructed. In this way, a high-definition audio signal with high fidelity and high expressiveness is obtained, improving the playback of non-speech signals.

Brief Description of the Drawings

FIG. 1 is a flow chart of an audio processing method provided in an embodiment of the present application;

FIG. 2 is a first schematic diagram of the waveform of an audio signal provided in an embodiment of the present application;

FIG. 3 is a second schematic diagram of the waveform of an audio signal provided in an embodiment of the present application;

FIG. 4 is a schematic diagram of spectrum replication/flipping provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of a neural network topology provided in an embodiment of the present application;

FIG. 6 shows the amplitude-frequency response curves of a low-pass prototype filter and a PQMF analysis filter bank provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of a PQMF sub-band analysis/synthesis filter bank provided in an embodiment of the present application;

FIG. 8 is a block diagram of a high-definition audio generation system provided in an embodiment of the present application;

FIG. 9 is a first structural diagram of an audio processing device provided in an embodiment of the present application;

FIG. 10 is a second structural diagram of an audio processing device provided in an embodiment of the present application;

FIG. 11 is a schematic diagram of the hardware structure of an electronic device provided in an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.

The terms "first", "second", and so on in the specification and claims of this application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that the data used in this way are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here. Objects distinguished by "first", "second", and so on are usually of one type, and their number is not limited; for example, there may be one first object or several. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the objects before and after it.

The audio processing method provided in the embodiments of the present application is described in detail below through specific embodiments and their application scenarios, with reference to the drawings.

An embodiment of the present application provides an audio processing method, which can be applied to an audio processing device. FIG. 1 shows a flow chart of the audio processing method. As shown in FIG. 1, the method may include the following steps 101 to 106:

Step 101: Perform resolution enhancement processing on a first audio signal to obtain a second audio signal.

In the embodiments of the present application, the first audio signal includes at least one of the following: wideband audio (16 kHz sampling), ultra-wideband audio (32 kHz sampling) and full-band audio (44.1 kHz or 48 kHz sampling).

In the embodiments of the present application, the resolution of the first audio signal is lower than the resolution of the second audio signal.

It should be noted that the resolution of an audio signal is determined by its sampling rate and bit depth. For two audio signals with the same bit depth, the one with the higher sampling rate has the higher resolution, so the resolution of an audio signal can be raised by raising its sampling rate. That is, the sampling rate of the first audio signal is lower than that of the second audio signal. For example, the sampling rate of the second audio signal may be 96 kHz.

In the embodiments of the present application, the first audio signal is usually wideband/ultra-wideband/full-band audio, whose playback quality is relatively poor, so it needs to be converted to high-definition audio. However, producing high-definition audio sources places high demands on the software and hardware environment. The present application can therefore raise the sampling rate of the first audio signal to that of high-definition audio without changing the sampling rate or encoding format of the digital audio source and without increasing the network transmission bandwidth, converting wideband/ultra-wideband/full-band audio into high-definition audio (96 kHz sampling).

Generally, upsampling and downsampling both resample a digital signal. The resampling rate is compared with the rate at which the digital signal was originally obtained (for example, by sampling an analog signal): if the resampling rate is higher, the operation is upsampling; otherwise, it is downsampling.

It can be understood that the resolution enhancement processing amounts to upsampling the first audio signal. That is, step 101 may include the following step 101a:

Step 101a: Upsample the first audio signal by a factor of L to obtain a second audio signal at a predetermined sampling rate, where L is greater than 0.

Exemplarily, assuming the first audio signal is a full-band audio signal with a sampling rate of 48 kHz, upsampling (i.e., resampling) it by a factor of 2 converts its sampling rate (48 kHz) to that of high-definition audio (96 kHz).

Example 1: the implementation of step 101a is illustrated by generating 96 kHz high-definition audio from 48 kHz full-band audio (i.e., the first audio signal). The time-domain waveform of the full-band audio signal is shown in (a) of FIG. 2, and its spectrogram is shown in (b) of FIG. 2. For example, assuming the full-band audio has a sampling rate of 48 kHz and an effective bandwidth of 24 kHz, the audio processing device upsamples the full-band audio input by a factor of 2 to obtain a signal sampled at 96 kHz (i.e., the second audio signal).
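The 2x upsampling in Example 1 can be sketched as follows. The text does not say which upsampling method is used, so zero insertion is an assumption here, and the function name `upsample_zero_insert` and the 1 kHz test tone are illustrative only. Zero insertion leaves spectral images above the original band, which is consistent with the image components that step 102 later removes.

```python
import numpy as np

def upsample_zero_insert(x: np.ndarray, factor: int) -> np.ndarray:
    """Raise the sampling rate by `factor` by inserting factor-1 zeros after each sample.

    Assumption: zero insertion is one common way to upsample; the source
    does not specify the method.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    y = np.zeros(len(x) * factor, dtype=float)
    y[::factor] = x  # original samples land on every `factor`-th position
    return y

fs_in = 48_000                                           # full-band input rate
x = np.sin(2 * np.pi * 1_000 * np.arange(480) / fs_in)   # 10 ms of a 1 kHz tone
y = upsample_zero_insert(x, 2)                           # second audio signal
fs_out = fs_in * 2                                       # 96 kHz
```

The original samples are untouched; only the sample grid becomes twice as dense, so the signal still carries content only up to 24 kHz plus its mirror image.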

It should be noted that, since upsampling raises the sampling rate and with it the bandwidth the signal can represent, the bandwidth of the second audio signal is greater than that of the first audio signal.

Step 102: Perform low-pass filtering on the second audio signal to obtain a processed second audio signal.

In the embodiments of the present application, the audio processing device can use a low-pass filter to remove the high-frequency components (i.e., the high-frequency signal) of the second audio signal and retain only its low-frequency components (i.e., the low-frequency signal). It should be noted that low-pass filtering can be understood simply as follows: a frequency point (the cutoff frequency) is set, and signal components above that frequency cannot pass; in the frequency domain, all components above the cutoff frequency are set to 0.

Example 2: continuing Example 1, the processing of the second audio signal is illustrated. After obtaining the 96 kHz signal, the audio processing device can pass it through a low-pass filter with a cutoff frequency of 24 kHz to remove the image frequency components that upsampling introduces in the high-frequency part. The waveform and spectrogram of the audio signal after upsampling and low-pass filtering are shown in (a) of FIG. 3 and (b) of FIG. 3 respectively.
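A minimal sketch of the 24 kHz low-pass stage at a 96 kHz sampling rate. The filter type is not given in the text, so a Hamming-windowed sinc FIR filter with 101 taps is assumed; `lowpass_fir`, `rms` and the two test tones are illustrative names and values, not part of the source.

```python
import numpy as np

def lowpass_fir(cutoff_hz: float, fs_hz: float, num_taps: int = 101) -> np.ndarray:
    """Hamming-windowed sinc low-pass filter (assumed design, not from the source)."""
    fc = cutoff_hz / fs_hz                       # cutoff in cycles per sample
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(num_taps)
    return h / h.sum()                           # normalize to unity gain at DC

def rms(v: np.ndarray) -> float:
    return float(np.sqrt(np.mean(v ** 2)))

fs = 96_000
h = lowpass_fir(24_000, fs)
t = np.arange(4096) / fs
low = np.sin(2 * np.pi * 5_000 * t)              # inside the 24 kHz passband
high = np.sin(2 * np.pi * 40_000 * t)            # in the image region above 24 kHz
low_out = np.convolve(low, h, mode="same")
high_out = np.convolve(high, h, mode="same")

# Compare levels away from the edges, where the convolution has settled.
pass_ratio = rms(low_out[200:-200]) / rms(low[200:-200])
stop_ratio = rms(high_out[200:-200]) / rms(high[200:-200])
```

In this sketch the 5 kHz tone passes almost unchanged while the 40 kHz tone is strongly attenuated, which is the behaviour Example 2 relies on.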

It should be noted that the bandwidth of the first audio signal is the same as the bandwidth of the processed second audio signal. For example, referring to FIG. 3, the sampling rate of the processed audio signal is 96 kHz, while its effective bandwidth remains 24 kHz.

It should be noted that the bandwidth of an audio signal is defined as the frequency range the signal occupies. According to the Nyquist theorem, the sampling frequency (i.e., sampling rate) must be at least twice the signal bandwidth, so a signal sampled at a given rate can carry a bandwidth of at most half that rate. Assuming the sampling rate of the first audio signal is 48 kHz, its bandwidth is 48 kHz / 2, i.e., 24 kHz.

Step 103: Perform signal processing on the processed second audio signal to obtain Y first sub-band signals with the same bandwidth.

Here, Y is a positive integer.

In the embodiments of the present application, the Y first sub-band signals include high-frequency sub-band signals and low-frequency sub-band signals.

Optionally, in the embodiments of the present application, the signal processing of the processed second audio signal may consist of filtering and downsampling.

Exemplarily, the signal processing includes PQMF sub-band filtering and downsampling. Specifically, the audio processing device can first use a PQMF sub-band filter bank to split the processed second audio signal into Y sub-band signals of equal bandwidth, and then downsample each sub-band signal to obtain the Y first sub-band signals.

It should be noted that PQMF sub-band analysis applies a time-frequency transform to the original signal in order to obtain multiple sub-band signals that reflect the correlation between high and low frequencies, have good harmonic characteristics and are easy to analyze. At the analysis end, the PQMF analysis filter bank splits the input time-domain signal into multiple sub-band signals of equal bandwidth, and each sub-band signal is then downsampled. At the synthesis end, each sub-band signal is first upsampled, and the PQMF synthesis filter bank then converts the upsampled sub-band signals back into a time-domain signal.
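The analysis end described above (filter into equal-bandwidth sub-bands, then downsample) can be sketched with a cosine-modulated filter bank. This is only an illustration under stated assumptions: the prototype filter here is a plain Hamming-windowed sinc rather than an optimized PQMF prototype, so it is not claimed to give near-perfect reconstruction, and `prototype_lowpass`, `analysis_filters`, `pqmf_analysis`, the choice of 8 bands and the 15 kHz test tone are all hypothetical.

```python
import numpy as np

def prototype_lowpass(num_bands: int, taps_per_band: int = 16) -> np.ndarray:
    """Low-pass prototype with cutoff pi/(2*num_bands) rad/sample (assumed design)."""
    num_taps = taps_per_band * num_bands
    n = np.arange(num_taps) - (num_taps - 1) / 2
    p = np.sinc(n / (2 * num_bands)) / (2 * num_bands)
    return p * np.hamming(num_taps)

def analysis_filters(num_bands: int, taps_per_band: int = 16) -> np.ndarray:
    """Cosine-modulate the prototype so that filter k is centred on sub-band k."""
    p = prototype_lowpass(num_bands, taps_per_band)
    n = np.arange(len(p)) - (len(p) - 1) / 2
    k = np.arange(num_bands)[:, None]
    phase = ((-1) ** k) * np.pi / 4
    return 2 * p * np.cos((2 * k + 1) * np.pi / (2 * num_bands) * n + phase)

def pqmf_analysis(x: np.ndarray, num_bands: int, taps_per_band: int = 16) -> np.ndarray:
    """Filter x into num_bands equal-width sub-bands, then downsample each by num_bands."""
    h = analysis_filters(num_bands, taps_per_band)
    filtered = np.stack([np.convolve(x, hk)[: len(x)] for hk in h])
    return filtered[:, ::num_bands]

fs = 96_000
num_bands = 8                                    # each sub-band is 6 kHz wide
t = np.arange(4096) / fs
x = np.sin(2 * np.pi * 15_000 * t)               # tone inside sub-band 2 (12-18 kHz)
subbands = pqmf_analysis(x, num_bands)
energy = (subbands ** 2).sum(axis=1)
loudest_band = int(np.argmax(energy))
```

With these values the tone's energy lands in sub-band index 2, and each sub-band signal has one eighth as many samples as the input.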

It should be noted that the Y first sub-band signals are divided into high-frequency and low-frequency sub-band signals according to the frequency ranges of the high-frequency and low-frequency components of the processed second audio signal. That is, a first sub-band signal whose frequencies fall within the range of the low-frequency components is a low-frequency sub-band signal, and a first sub-band signal whose frequencies fall within the range of the high-frequency components is a high-frequency sub-band signal.
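As a small worked sketch of this division, assume the 96 kHz signal of Example 2 is split into Y = 8 sub-bands and that the boundary between low and high components is the 24 kHz effective bandwidth. Both Y = 8 and the helper name `split_low_high` are assumptions for illustration.

```python
def split_low_high(num_bands: int, fs_hz: float, low_band_limit_hz: float):
    """Return (low, high) sub-band indices for equal-width bands spanning 0..fs/2."""
    band_width = (fs_hz / 2) / num_bands
    # A band counts as low-frequency if its upper edge does not exceed the limit.
    low = [k for k in range(num_bands) if (k + 1) * band_width <= low_band_limit_hz]
    high = [k for k in range(num_bands) if k not in low]
    return low, high

low_idx, high_idx = split_low_high(num_bands=8, fs_hz=96_000, low_band_limit_hz=24_000)
```

Here each sub-band is 6 kHz wide, so indices 0 to 3 cover 0 to 24 kHz and indices 4 to 7 cover the empty 24 to 48 kHz region that step 104 fills.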

Step 104: Generate M high-frequency sub-band signals according to the low-frequency sub-band signals among the Y first sub-band signals.

In the embodiments of the present application, the audio processing device can generate one or more high-frequency sub-band signals from one low-frequency sub-band signal; that is, each low-frequency sub-band signal among the Y first sub-band signals corresponds to one or more high-frequency sub-band signals, and Y is less than or equal to M.

In the embodiments of the present application, a high-frequency generator can be used to generate the spectra of the high-frequency sub-band signals from the spectra of the low-frequency sub-band signals among the Y first sub-band signals, thereby generating the high-frequency sub-band signals.

Exemplarily, the audio processing device may generate the M high-frequency sub-band signals using any one of the four methods shown in Table 1.

Table 1 High-frequency sub-band spectrum generation methods

As Table 1 shows, methods 1 and 2 use spectrum replication, while methods 3 and 4 use spectrum flipping. The difference between spectrum replication and spectrum flipping is illustrated in FIG. 4.
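The general difference between replication and flipping can be sketched at the level of sub-band ordering. This is not a reproduction of the four methods in Table 1; it assumes, for illustration only, that replication copies the low sub-bands upward in their original order and that flipping copies them in mirrored order. The function names `replicate` and `flip` and the toy data are hypothetical.

```python
import numpy as np

def replicate(low: np.ndarray, num_high: int) -> np.ndarray:
    """Fill num_high high sub-bands by repeating the low sub-bands in order."""
    idx = np.arange(num_high) % len(low)
    return low[idx]

def flip(low: np.ndarray, num_high: int) -> np.ndarray:
    """Fill num_high high sub-bands by repeating the low sub-bands in mirrored order."""
    idx = len(low) - 1 - (np.arange(num_high) % len(low))
    return low[idx]

# Three low sub-bands with two samples each; the values tag the band of origin.
low = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
high_replicated = replicate(low, 4)   # band order 1, 2, 3, 1
high_flipped = flip(low, 4)           # band order 3, 2, 1, 3
```

With flipping, the highest low sub-band sits directly above the band edge, so the spectrum is mirrored around the boundary instead of restarting from the lowest band.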

Step 105: Based on the high-frequency feature information of the first audio signal, perform spectrum adjustment on the M high-frequency sub-band signals to obtain M target high-frequency sub-band signals.

In the embodiments of the present application, the high-frequency feature information may be the signal gains of the M high-frequency sub-band signals.

In the embodiments of the present application, the audio processing device can use an envelope adjuster to adjust the amplitudes of the M high-frequency sub-band signals, obtaining M reconstructed high-frequency sub-band signals (i.e., the M target high-frequency sub-band signals).
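A minimal sketch of the envelope adjustment, assuming the adjuster simply scales each high-frequency sub-band signal by one gain per sub-band. The text only says that amplitudes are adjusted, so per-band scalar multiplication, the name `adjust_envelope` and the example gains are assumptions.

```python
import numpy as np

def adjust_envelope(high_bands: np.ndarray, gains) -> np.ndarray:
    """Scale each of the M high-frequency sub-band signals by its own gain.

    high_bands has shape (M, T); gains has length M (one gain per sub-band).
    """
    gains = np.asarray(gains, dtype=float)
    if gains.shape != (high_bands.shape[0],):
        raise ValueError("need exactly one gain per high-frequency sub-band")
    return high_bands * gains[:, None]

high_bands = np.ones((3, 4))                     # M = 3 sub-bands, 4 samples each
target_bands = adjust_envelope(high_bands, [0.5, 0.25, 0.1])
```

A real system would likely update the gains frame by frame; a single gain per sub-band keeps the sketch short.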

Step 106: Synthesize the M target high-frequency sub-band signals to obtain a target audio signal.

Here, M is a positive integer.

In the embodiments of the present application, the audio processing device can use a PQMF synthesis filter bank to synthesize the M target high-frequency sub-band signals into the target audio signal.

In the audio processing method provided in the embodiments of the present application, the electronic device can perform resolution enhancement processing on a low-resolution first audio signal (such as a wideband/full-band non-speech signal) to obtain a high-resolution second audio signal, and low-pass filter the second audio signal to remove its high-frequency components. The processed second audio signal is then subjected to signal processing to obtain Y first sub-band signals with the same bandwidth, and M high-frequency sub-band signals are generated from the low-frequency sub-band signals among the Y first sub-band signals. Finally, based on the high-frequency spectrum information of the low-resolution first audio signal, the M high-frequency sub-band signals are spectrally adjusted to obtain M target high-frequency sub-band signals, which are synthesized to obtain a first audio signal whose high-frequency harmonic characteristics are well reconstructed. In this way, a high-definition audio signal with high fidelity and high expressiveness is obtained, improving the playback of non-speech signals.

Optionally, in the embodiments of the present application, since the high-frequency and low-frequency sub-band signals of an audio signal are correlated, the corresponding high-frequency sub-band signals can be generated from the low-frequency sub-band signals of the processed second audio signal.

Exemplarily, step 104 may include the following step 104a:

Step 104a: Perform spectrum replication on all the low-frequency sub-band signals among the Y sub-band signals to generate M high-frequency sub-band signals.

Exemplarily, the audio processing device may use the spectrum replication methods in Table 1 to generate the spectra of the M high-frequency sub-band signals. For example, the audio processing device may copy the upper half of the spectrum of the low-frequency sub-band signals multiple times to produce the spectra of the M high-frequency sub-band signals, thereby generating the M high-frequency sub-band signals.

In this way, the audio processing device can obtain the high-frequency components of the processed second audio signal from its low-frequency components, thereby obtaining a preliminary spectrum of the processed second audio signal.

Optionally, in the embodiments of the present application, the audio processing device can exploit the strong correlation between the low-frequency features of an audio signal and its high-frequency spectral envelope: it extracts the low-frequency features of the original audio signal and predicts the high-frequency features of the audio signal from them.

Exemplarily, before step 105, the audio processing method provided in the embodiments of the present application further includes the following steps A1 and A2:

Step A1: Perform feature extraction on the first audio signal to obtain low-frequency feature information of the first audio signal.

Step A2: Input the low-frequency feature information into a preset neural network model to predict the high-frequency feature information of the first audio signal.

Exemplarily, the low-frequency feature information includes at least one of the following: the normalized autocorrelation coefficient (x_acf), the gradient index (x_gi) and the sub-band spectral flatness (x_sfm) of the first audio signal.

It should be noted that the low-frequency feature information can be regarded as feature parameters of the first audio signal. The choice of feature parameters should follow three principles:

(1) The low-frequency feature parameters correlate strongly with the high-frequency spectral envelope;

(2) The feature components are largely independent of one another;

(3) The feature components are easy to compute.

Based on these principles, the embodiments of the present application use the three feature parameters above to describe the audio characteristics from both the time-domain and the frequency-domain perspective. In practical applications, other feasible feature parameters may also be chosen; the embodiments of the present application place no limitation on this.

对于上述三个频特征信息(即，特征参数)的进一步详细说明见下文。For further detailed description of the above three frequency characteristic information (ie, characteristic parameters), please see below.

Exemplarily, the preset neural network may be a DNN (deep neural network). It should be noted that a DNN is a multi-layer feed-forward network with one-way propagation, which can efficiently abstract and model complex data. The topology of the DNN is shown in FIG. 5; its layers fall into three categories: input layer, hidden layers, and output layer. Usually, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. Neurons are fully connected between layers, whereas neurons within the same layer are not connected.

Exemplarily, the DNN is used to establish a nonlinear mapping from the low-frequency features of the first audio signal to the high-frequency spectral envelope of the first audio signal.

Exemplarily, the input of the DNN is the low-frequency feature information of the first audio signal, including the normalized autocorrelation coefficient, the gradient index, and the sub-band spectral flatness, and the output of the DNN is the signal gain (denoted G) of the high-frequency sub-band signals of the first audio signal.

In this way, the audio processing device can predict the high-frequency feature information of the first audio signal from the low-frequency feature information of the first audio signal, and then use the high-frequency feature information to adjust the spectrum (i.e., the spectral envelope) of the processed second audio signal.

Optionally, in the embodiments of the present application, the audio processing device may divide the processed second audio signal into frames and then perform audio signal processing frame by frame, so as to reduce the influence of the non-stationary, time-varying nature of the signal as a whole.

Exemplarily, step 103 above may include the following steps 103a and 103b:

Step 103a: Divide the processed second audio signal into frames to obtain X audio signal frames.

Step 103b: Perform filtering and down-sampling on each audio signal frame in turn to obtain N first sub-band signals corresponding to each audio signal frame.

The Y first sub-band signals include the N first sub-band signals corresponding to each audio signal frame.

Exemplarily, each audio signal frame includes a first predetermined number of sample points. For example, each signal frame may be preset to include 2048 sample points.

Exemplarily, X, the number of audio signal frames, is determined by the sampling rate of the second audio signal and the number of sample points in each audio signal frame.

Exemplarily, the audio processing device may number the X audio signal frames it obtains, so that each audio signal frame corresponds to a sequence number. For example, assuming that the processed second audio signal includes l audio signal frames, these l audio signal frames may be numbered from 1 to l.

For example, take generating 96 kHz-sampled high-definition audio from 48 kHz-sampled full-band audio (i.e., the first audio signal). With reference to Examples 1 and 2 above, after the first audio signal is up-sampled and low-pass filtered to obtain the processed second audio signal, the sampling rate of the processed second audio signal is 96 kHz, and it can be divided, at the 96 kHz sampling rate, into 46 audio signal frames (i.e., X audio signal frames) of 2048 sample points each; that is, each second of audio yields 46 complete frames, since 96000 / 2048 ≈ 46.9.
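The frame count in this example can be checked with a short sketch. It assumes non-overlapping frames and that an incomplete trailing frame is discarded; the function name `split_into_frames` is hypothetical and not part of the embodiment.

```python
import numpy as np

FRAME_LEN = 2048  # sample points per frame, as in the example above


def split_into_frames(signal, frame_len=FRAME_LEN):
    """Split a 1-D signal into consecutive non-overlapping frames.

    Assumption: an incomplete trailing frame is dropped.
    """
    num_frames = len(signal) // frame_len
    return signal[:num_frames * frame_len].reshape(num_frames, frame_len)


one_second = np.zeros(96_000)          # one second of audio at 96 kHz
frames = split_into_frames(one_second)
print(frames.shape)                    # (46, 2048)
```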

Exemplarily, the audio processing device may perform the filtering and down-sampling on each audio signal frame in turn, following the time order of the X audio signal frames.

Exemplarily, each of the N first sub-band signals has an index, and each index corresponds to one first sub-band signal.

Exemplarily, the N first sub-band signals include P low-frequency sub-band signals and Q high-frequency sub-band signals, where P and Q are positive integers.

Exemplarily, the number of sub-band signals corresponding to each audio signal frame (i.e., N) is preset; specifically, it is determined by the parameters set for the PQMF sub-band filter bank. For example, if the number of sub-bands of the PQMF sub-band filter bank is set to 64, processing each audio signal frame with the PQMF sub-band filter bank yields 64 sub-band signals for each audio signal frame.

Exemplarily, for step 103b above, the audio processing device may first apply PQMF filtering to each audio signal frame to obtain N sub-band signals for that frame, and then down-sample the N sub-band signals to obtain the N first sub-band signals. Further, the down-sampling may be down-sampling by a factor of N.

Exemplarily, each of the N first sub-band signals includes a second predetermined number of sample points. Further, the second predetermined number is determined by the down-sampling factor.

Exemplarily, the second predetermined number of sample points in each first sub-band signal are arranged in time order within the frequency range of that first sub-band signal.

Example 3: take generating 96 kHz-sampled high-definition audio from 48 kHz-sampled full-band audio (i.e., the first audio signal). After the processed second audio signal is divided into frames, and assuming each signal frame includes 2048 sample points, filtering with the PQMF analysis filter bank yields 64 sub-band signals; each sub-band signal is then down-sampled by a factor of 64, yielding 64 first sub-band signals of 32 sample points each. Sub-band signals 0 to 31 are the low-frequency sub-band signals, and sub-band signals 32 to 63 are the high-frequency sub-band signals.

It should be noted that the N first sub-band signals corresponding to each audio signal frame belong to N different frequency ranges (i.e., frequency bands) of the second audio signal. For example, assuming each audio signal frame corresponds to 64 first sub-band signals, the second audio signal is divided by frequency into 64 frequency ranges, and each first sub-band signal belongs to one of these 64 frequency ranges. The N first sub-band signals obtained in this way reflect the frequency characteristics of the signal and have good harmonic characteristics.

For ease of description, the output of the PQMF analysis filter bank, that is, the N first sub-band signals, is denoted x_l[k][n], where k is the sub-band index with 0 ≤ k ≤ 63, n is the time index of the samples within each sub-band with 0 ≤ n ≤ 31, and l is the sequence number of the current audio signal frame.

It should be noted that, for each of the X audio signal frames, the sub-band signals (i.e., the first sub-band signals) output by the PQMF analysis filter bank form a matrix x[k][n], where k is the index of the transformed sub-band (the index of the first sub-band signal) and n is the index of the time-domain sample points of the transformed sub-band (i.e., of the first sub-band signal). x[k][n] has resolution in both time and frequency: it carries the frequency distribution characteristics of the frequency domain as well as the waveform characteristics of the time domain.

For ease of understanding, the expressions of the PQMF analysis filter bank and synthesis filter bank are described below.

Exemplarily, the mathematical expressions of the PQMF analysis filter bank and synthesis filter bank used in the embodiments of the present application are as follows:

Analysis filter:

Synthesis filter:

In equations (1) and (2), N is the number of first sub-band signals; p(n) is the low-pass prototype filter, whose normalized cutoff frequency is π/(2N) and whose length is M, where M = LN and L is any positive integer; k = 0, 1, …, N-1 is the sub-band index; and n is the index of the sub-band time-domain sample points after the transform.
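For orientation, a cosine-modulated PQMF bank using the symbols defined above is commonly written in the following form. The factor 2 and the phase term θ_k follow the usual textbook convention and are an assumption here; they are not necessarily the exact expressions used in this embodiment.

```latex
h_k(n) = 2\,p(n)\cos\!\left[\frac{\pi}{N}\left(k+\tfrac{1}{2}\right)\left(n-\frac{M-1}{2}\right)+\theta_k\right]

f_k(n) = 2\,p(n)\cos\!\left[\frac{\pi}{N}\left(k+\tfrac{1}{2}\right)\left(n-\frac{M-1}{2}\right)-\theta_k\right]

\theta_k = (-1)^k\,\frac{\pi}{4}, \qquad k = 0,1,\dots,N-1, \qquad n = 0,1,\dots,M-1
```

In this form each h_k(n) is the prototype p(n) shifted to the centre frequency (k + 1/2)π/N, which is why all N sub-bands have the same bandwidth π/N.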

Exemplarily, the number of sub-bands of the PQMF sub-band filter bank may be set to N = 64, the length of the low-pass prototype filter p(n) may be set to M = 768 (that is, L = 12), and the filter stopband attenuation is designed to be -90 dB.

(a) in FIG. 6 shows the magnitude-frequency response of the low-pass prototype filter p(n), and (b) in FIG. 6 shows the magnitude-frequency response of the PQMF analysis filter bank.

FIG. 7 is a schematic diagram of the PQMF sub-band analysis/synthesis filter bank. In FIG. 7, H_k(z) is the Z-transform of h_k(n), and F_k(z) is the Z-transform of F_k(n).

It should be noted that the analysis filter bank is used to split the input time-domain signal into N sub-band signals, and the synthesis filter bank is used to combine the N sub-band signals into one time-domain signal.
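The analysis stage can be illustrated with a minimal sketch for N = 64 and M = 768. The prototype used here is a plain Kaiser-windowed sinc with cutoff π/(2N), chosen only for illustration; it is an assumption, not the prototype design of the embodiment, and it is not tuned for near-perfect reconstruction. The function names and the phase term (-1)^k·π/4 are likewise assumptions.

```python
import numpy as np

N_BANDS = 64      # number of sub-bands N
PROTO_LEN = 768   # prototype length M = L * N with L = 12


def prototype_filter(num_bands=N_BANDS, length=PROTO_LEN, beta=9.0):
    """Kaiser-windowed sinc low-pass with cutoff pi/(2N) (illustrative stand-in)."""
    n = np.arange(length) - (length - 1) / 2
    return np.sinc(n / (2 * num_bands)) / (2 * num_bands) * np.kaiser(length, beta)


def pqmf_analysis(frame, num_bands=N_BANDS, length=PROTO_LEN):
    """Return x[k][n]: num_bands sub-band signals, each decimated by num_bands."""
    p = prototype_filter(num_bands, length)
    n = np.arange(length) - (length - 1) / 2
    subbands = []
    for k in range(num_bands):
        phase = (-1) ** k * np.pi / 4
        # Cosine modulation shifts the prototype to centre frequency (k + 1/2) * pi / N.
        h_k = 2 * p * np.cos(np.pi / num_bands * (k + 0.5) * n + phase)
        filtered = np.convolve(frame, h_k, mode="same")
        subbands.append(filtered[::num_bands])   # N-fold down-sampling
    return np.array(subbands)


fs = 96_000
t = np.arange(2048) / fs
tone = np.sin(2 * np.pi * 7875.0 * t)   # 7875 Hz is the centre of sub-band 10 (750 Hz per band)
x = pqmf_analysis(tone)
strongest_band = int(np.argmax((x ** 2).sum(axis=1)))
```

For a 2048-sample frame, `x` has 64 rows of 32 samples each, matching the x[k][n] layout described above, and the test tone concentrates its energy in the sub-band whose frequency range contains it.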

Further optionally, in combination with step 103b above, step 104a above may include the following step 104a1:

Step 104a1: Generate at least one high-frequency sub-band signal from the low-frequency sub-band signals among the N first sub-band signals of each audio signal frame.

Exemplarily, the same number of high-frequency sub-band signals is ultimately generated for every audio signal frame.

Example 4: in combination with Example 3 above, after obtaining the 64 first sub-band signals corresponding to each audio signal frame, the audio processing device may, when performing replication, select the 16 low-frequency sub-band signals with sub-band indices 15 to 30 (i.e., the low-frequency source sub-band sequence numbers in Table 2) and copy their spectral coefficients twice to generate the spectral coefficients of 32 high-frequency sub-bands (i.e., the high-frequency target sub-band sequence numbers in Table 2). The correspondence used in band replication is shown in Table 2.

Low-frequency source sub-band sequence number | High-frequency target sub-band sequence numbers
15 | 32, 48
16 | 33, 49
17 | 34, 50
18 | 35, 51
19 | 36, 52
20 | 37, 53
21 | 38, 54
22 | 39, 55
23 | 40, 56
24 | 41, 57
25 | 42, 58
26 | 43, 59
27 | 44, 60
28 | 45, 61
29 | 46, 62
30 | 47, 63

Table 2: Correspondence between low-frequency source sub-bands and high-frequency target sub-bands in band replication

It should be noted that the "low-frequency source sub-band sequence number" in Table 2 is the sequence number of a low-frequency sub-band signal, and the "high-frequency target sub-band sequence number" is the sequence number of a high-frequency sub-band signal.
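The mapping in Table 2 can be sketched as follows. The array layout (64 sub-bands by 32 spectral coefficients per frame) follows the examples above; the function name `replicate_high_bands` is hypothetical.

```python
import numpy as np

SRC_START, SRC_COUNT = 15, 16   # low-frequency source sub-bands 15..30
TARGET_STARTS = (32, 48)        # each copy fills 16 consecutive target sub-bands


def replicate_high_bands(X):
    """X has shape (64, 32). Return a copy whose sub-bands 32..63 are filled per Table 2."""
    Y = X.copy()
    src = X[SRC_START:SRC_START + SRC_COUNT]
    for start in TARGET_STARTS:
        Y[start:start + SRC_COUNT] = src   # source k maps to k + 17 and k + 33
    return Y


X = np.arange(64 * 32, dtype=float).reshape(64, 32)
Y = replicate_high_bands(X)
```

The low-frequency sub-bands 0 to 31 are left untouched; only the 32 high-frequency rows are overwritten with the two copies.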

Further optionally, in the embodiments of the present application, step A1 above includes the following step B1:

Step B1: Perform feature extraction on the P low-frequency sub-band signals among the N first sub-band signals of each audio signal frame to obtain low-frequency feature information of each audio signal frame.

Exemplarily, the audio processing device may calculate the normalized autocorrelation coefficient and the gradient index of the first audio signal from the number of samples of the first audio signal and the order of the autocorrelation function.

The definitions of the low-frequency feature information are described in detail below:

(1) The normalized autocorrelation coefficient describes the correlation of the signal in the time domain. Let x(n) be the input audio signal, N the number of samples in each frame, and m the order of the autocorrelation function (m = 1, 2, …, M, where M is the maximum autocorrelation order). The normalized autocorrelation coefficient is then calculated as follows:

(2) The gradient index is used to distinguish the harmonic and noise characteristics of the audio signal. It is defined as the sum of the gradient magnitudes of the audio signal at each change of direction, that is:

where the variable ψ(n) is the indicator function of the direction of signal change:

where sign(x) is the sign function, defined as:

and E is the total energy of the input signal of the current frame:

(3) The sub-band spectral flatness is used to distinguish the tonal and noise characteristics of the audio signal within a sub-band. It is defined as the ratio of the geometric mean to the arithmetic mean of all spectral coefficients (MDCT spectral coefficients) in each low-frequency PQMF sub-band. With this definition, the smaller the sub-band spectral flatness, the more tonal components the sub-band spectrum exhibits; conversely, the closer it is to 1, the more noise components the sub-band spectrum exhibits.
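The three features can be sketched as follows, using the forms in which they commonly appear in the bandwidth-extension literature. These forms are assumptions and may differ in detail from the equations of this embodiment; in particular, the flatness is computed on squared coefficients, sign(0) is taken as +1, and the split of the feature vector into 31 autocorrelation lags, 1 gradient index and 32 flatness values is one choice that totals 64 dimensions. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def normalized_autocorr(x, max_lag):
    """r(m) = sum_n x(n) x(n - m) / sum_n x(n)^2 for m = 1..max_lag."""
    energy = np.dot(x, x) + EPS
    return np.array([np.dot(x[m:], x[:-m]) / energy for m in range(1, max_lag + 1)])


def gradient_index(x):
    """Sum of gradient magnitudes at each change of direction, divided by sqrt(E)."""
    d = np.diff(x)
    psi = np.where(d >= 0, 1.0, -1.0)      # direction of change, sign(0) taken as +1
    flips = 0.5 * np.abs(np.diff(psi))     # 1 where the direction changes, else 0
    energy = np.dot(x, x) + EPS
    return float(np.sum(flips * np.abs(d[1:])) / np.sqrt(energy))


def spectral_flatness(coeffs):
    """Geometric mean over arithmetic mean of the squared spectral coefficients."""
    p = coeffs ** 2 + EPS
    return float(np.exp(np.mean(np.log(p))) / np.mean(p))


def low_band_features(frame, low_band_spectra, max_lag=31):
    """Concatenate the features: max_lag lags + 1 gradient index + one flatness per sub-band."""
    return np.concatenate([
        normalized_autocorr(frame, max_lag),
        [gradient_index(frame)],
        [spectral_flatness(c) for c in low_band_spectra],
    ])
```

With a time-domain frame and the spectral coefficients of 32 low-frequency sub-bands as input, `low_band_features` returns a 64-dimensional vector. A flat spectrum gives a flatness close to 1, whereas a spectrum dominated by a single coefficient gives a flatness close to 0.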

The low-frequency feature information is further described below with specific examples, based on the definitions above.

Exemplarily, the audio processing device may obtain the spectral coefficients of each of the P low-frequency sub-band signals to calculate the sub-band spectral flatness of each low-frequency sub-band signal.

Exemplarily, the low-frequency feature information of the first audio signal may be one 64-dimensional feature vector for each audio signal frame.

Example 5: take generating 96 kHz-sampled high-definition audio from 48 kHz-sampled full-band audio (i.e., the first audio signal). Assuming each signal frame corresponds to 64 sub-band signals (first sub-band signals), the audio processing device may obtain the spectral coefficients of sub-band signals 0 to 31 and calculate the sub-band spectral flatness of each of them.

It should be noted that, during feature extraction, the maximum autocorrelation order of the normalized autocorrelation coefficient may be set to M = 31. The feature dimensions used in the embodiments of the present application are shown in Table 3.

Table 3: Feature names and dimensions

Further optionally, in the embodiments of the present application, in combination with step B1 above, step A2 above includes the following step B2:

Step B2: Input the low-frequency feature information of each audio signal frame into the preset neural network model to predict the high-frequency feature information of each audio signal frame.

Exemplarily, the high-frequency feature information of each audio signal frame may be the signal gains of the H high-frequency sub-band signals.

For example, assuming that the k-th of the M high-frequency sub-band signals is generated from the j-th of the low-frequency sub-band signals, the sub-band gain G[k] of the k-th high-frequency sub-band is defined as:

In equation (9), En_k is the total energy of the spectral coefficients of the k-th high-frequency sub-band, and En_j is the total energy of the MDCT spectral coefficients of the j-th low-frequency PQMF sub-band.

It should be noted that an audio signal is time-ordered, "serialized" data in which preceding and following signals are correlated. To make full use of this contextual correlation, the DNN (i.e., the DNN model) uses frame splicing to take into account the influence of context on the current frame. Specifically, given the feature parameter vector extracted from the current frame, frame splicing selects m frames before and m frames after the current frame and combines them with it into a superframe feature vector that serves as the input of the DNN model, expressed as follows:

Exemplarily, to make full use of the contextual correlation of the audio signal (i.e., the correlation between multiple consecutive audio signal frames), the audio processing device may, after obtaining the low-frequency feature information of each audio signal frame, adopt a frame-splicing strategy and feed the features of multiple audio signal frames into the DNN. For example, frame splicing selects 3 frames before and 3 frames after the current frame; together with the features of the current frame, these 7 frame feature vectors form a superframe feature vector that serves as the input of the DNN model, with dimension 64 × 7 = 448, that is:

Example 6: take generating 96 kHz-sampled high-definition audio from 48 kHz-sampled full-band audio (i.e., the first audio signal). Assume each audio signal frame corresponds to 64 sub-band signals (first sub-band signals), of which sub-bands 32 to 63 are high-frequency sub-band signals. After each audio signal frame is processed by the DNN, the output signal gain of the high-frequency sub-band signals is a 32-dimensional feature vector, expressed mathematically as follows:

Exemplarily, the hyperparameter settings of the DNN are shown in Table 4.

Table 4: Hyperparameters of the DNN model
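Frame splicing and the forward pass of a fully connected network can be sketched as follows. Only the input dimension (64 × 7 = 448) and the output dimension (32) come from the text; the hidden layer sizes, the ReLU activation, the random weights and the clamping of the context window at the first and last frames are all hypothetical choices made for illustration.

```python
import numpy as np

FEAT_DIM, CONTEXT, OUT_DIM = 64, 3, 32


def splice_frames(features, l, context=CONTEXT):
    """Concatenate the feature vectors of frames l-context .. l+context into one super-frame.

    Assumption: indices outside the signal are clamped to the first or last frame.
    """
    idx = np.clip(np.arange(l - context, l + context + 1), 0, len(features) - 1)
    return features[idx].reshape(-1)


def mlp_forward(v, weights, biases):
    """Feed-forward pass: ReLU on the hidden layers, linear output layer."""
    for W, b in zip(weights[:-1], biases[:-1]):
        v = np.maximum(W @ v + b, 0.0)
    return weights[-1] @ v + biases[-1]


rng = np.random.default_rng(0)
sizes = [FEAT_DIM * (2 * CONTEXT + 1), 256, 256, OUT_DIM]   # hidden sizes are hypothetical
weights = [rng.standard_normal((o, i)) * 0.01 for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]

features = rng.standard_normal((46, FEAT_DIM))   # one 64-dimensional vector per frame
superframe = splice_frames(features, l=0)        # shape (448,)
gains = mlp_forward(superframe, weights, biases) # shape (32,), one gain per high-frequency sub-band
```

In a trained model the weights would of course be learned from pairs of low-frequency features and reference high-frequency gains; the random values here only demonstrate the data flow and the dimensions.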

Further optionally, in the embodiments of the present application, step 105 above includes the following step 105a:

Step 105a: Perform spectrum adjustment on the H high-frequency sub-band signals in each audio signal frame according to the high-frequency feature information of that audio signal frame, to obtain H target high-frequency sub-band signals.

The M target high-frequency sub-band signals include the H target high-frequency sub-band signals of each audio signal frame.

Exemplarily, consider the k-th of the H high-frequency sub-band signals obtained by the high-frequency generator, together with its total energy. Let the k-th high-frequency sub-band gain obtained by the envelope predictor be G[k], and let the k-th reconstructed high-frequency sub-band signal (i.e., the target high-frequency sub-band signal) obtained by the envelope adjuster be X[k][m]; then:

where N is the frame length of one frame of MDCT coefficients of a PQMF sub-band, and k_l and k_h are the start index and end index of the high-frequency PQMF sub-bands, respectively.

Example 7: take generating 96 kHz-sampled high-definition audio from 48 kHz-sampled full-band audio (i.e., the first audio signal). Assume each audio signal frame corresponds to 64 sub-band signals (first sub-band signals), of which sub-bands 32 to 63 are high-frequency sub-band signals. Consider the k-th high-frequency sub-band signal obtained by the high-frequency generator, together with its total energy; let the k-th high-frequency sub-band gain obtained by the envelope predictor be G[k], and let the k-th reconstructed high-frequency sub-band signal obtained by the envelope adjuster be X[k][m]; then:
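One possible form of this adjustment is sketched below. It assumes that G[k] is an energy ratio (target energy of high-frequency sub-band k over the energy of the replicated sub-band), so that the spectral coefficients are scaled by the square root of G[k]. This is an assumption for illustration only and may differ from the exact equation of the embodiment; the function name `adjust_envelope` is hypothetical.

```python
import numpy as np

K_L, K_H = 32, 63   # start and end index of the high-frequency sub-bands


def adjust_envelope(X_rep, G, k_l=K_L, k_h=K_H):
    """Scale each replicated high-frequency sub-band so its energy changes by the factor G[k].

    X_rep: array of shape (64, 32) holding replicated spectral coefficients.
    G:     array of k_h - k_l + 1 energy gains (assumed to be energy ratios).
    """
    X = X_rep.copy()
    X[k_l:k_h + 1] *= np.sqrt(G)[:, None]   # amplitude scale = sqrt(energy gain)
    return X


X_rep = np.ones((64, 32))
G = np.full(32, 4.0)
X = adjust_envelope(X_rep, G)
```

With an energy gain of 4, every coefficient in sub-bands 32 to 63 is doubled, while the low-frequency sub-bands are left unchanged.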

Further optionally, in the embodiments of the present application, after the processed second audio signal is divided into frames, the audio signal at the boundary between two adjacent frames may show a large amplitude difference, which makes the audio signal discontinuous and in turn produces noise. To eliminate this noise, denoising may be performed on the X audio signal frames.

Exemplarily, after step 103a above, the signal processing method provided in the embodiments of the present application further includes the following step C1:

Step C1: Perform signal processing on the N first sub-band signals in two adjacent audio signal frames among the X audio signal frames to obtain N processed first sub-band signals.

Exemplarily, the processed first sub-band signals include the low-frequency sub-band signals in each audio signal frame.

Exemplarily, the signal processing may include an MDCT. Further, when the MDCT is performed, the two first sub-band signals occupying the same frequency band in two adjacent audio signal frames may be obtained in turn, and then windowed and transformed by the MDCT to obtain one first sub-band signal with MDCT spectral coefficients (i.e., a spectrum).

For convenience of the following description, the two first sub-band signals occupying the same frequency band in two adjacent audio signal frames are referred to as two related sub-band signals.

Further, each sub-band signal includes N sample points. When the MDCT is performed, the N sample points of the input sequence (i.e., x(n)) of the previous audio signal frame and the N sample points of the input sequence of the current audio signal frame are combined into 2N sample points; this 2N-point signal is windowed, and the windowed signal is then transformed by the MDCT to obtain N MDCT spectral coefficients.

The expression of the MDCT is as follows:

Exemplarily, a sine window is chosen as the window function for windowing the signal; it is defined as:
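For reference, the MDCT of a windowed 2N-point block and the sine window are commonly written as follows. The index convention and the absence of a scale factor in the forward transform are the usual textbook choices and are an assumption here, not necessarily the exact form used in this embodiment.

```latex
X(m) = \sum_{n=0}^{2N-1} w(n)\,x(n)\,
       \cos\!\left[\frac{\pi}{N}\left(n+\frac{1}{2}+\frac{N}{2}\right)\left(m+\frac{1}{2}\right)\right],
       \qquad m = 0,1,\dots,N-1

w(n) = \sin\!\left[\frac{\pi}{2N}\left(n+\frac{1}{2}\right)\right],
       \qquad n = 0,1,\dots,2N-1
```

The sine window satisfies w(n)^2 + w(n+N)^2 = 1 for 0 ≤ n ≤ N-1, which is the condition under which the time-domain aliasing of adjacent blocks cancels after the inverse transform and overlap-add.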

Example 8: take generating 96 kHz-sampled high-definition audio from 48 kHz-sampled full-band audio (i.e., the first audio signal). Assume each audio signal frame corresponds to 64 sub-band signals (first sub-band signals), each of which includes 32 sample points. After the two related sub-band signals are windowed and transformed by the MDCT, 32 MDCT spectral coefficients are obtained for each sub-band signal, denoted X_l[k][m], where k is the sub-band index with 0 ≤ k ≤ 63, m is the MDCT spectral index with 0 ≤ m ≤ 31, and l is the audio signal frame number.

Further optionally, in the embodiments of the present application, in combination with step 103a above, step 106 above includes the following steps 106a and 106b:

Step 106a: Synthesize the H target high-frequency sub-band signals in each audio signal frame to obtain a fourth audio signal corresponding to each audio signal frame.

Step 106b: Synthesize the fourth audio signals corresponding to the audio signal frames to obtain the target audio signal.

Exemplarily, the audio processing device may synthesize the H target high-frequency sub-band signals in each audio signal frame through up-sampling and filtering to obtain the fourth audio signal corresponding to each audio signal frame.

Further, when synthesizing the H target high-frequency sub-band signals in each audio signal frame, the audio processing device first up-samples each sub-band signal by a factor of N, and then converts the up-sampled sub-band signals into a time-domain signal through the PQMF synthesis filter bank.

The mathematical expression of the PQMF synthesis filter bank used in the embodiments of the present application has been described above and is not repeated here.

Further optionally, in the embodiments of the present application, when the MDCT has been performed on the N first sub-band signals, the audio processing device may perform the inverse MDCT (i.e., IMDCT) on the H high-frequency sub-band signals after spectrum adjustment, so as to restore the sub-band signals in each audio signal frame.

In combination with step 103a and step C1 above, after the spectrum adjustment of the H high-frequency sub-band signals in each audio signal frame in step 105a above, the audio signal processing method provided in the embodiments of the present application further includes the following step D1:

Step D1: Perform the IMDCT on the H high-frequency sub-band signals after spectrum adjustment to obtain a sub-band reconstructed signal corresponding to each high-frequency sub-band signal.

The H target high-frequency sub-band signals include the sub-band reconstructed signals.

Exemplarily, when performing the IMDCT on the processed first sub-band signals, the audio processing device performs the IMDCT and an overlap-add operation on the MDCT spectral coefficients of each sub-band to obtain the N sub-band reconstructed signals x′_l[k][n] of the current l-th frame, where k is the sub-band index with 0 ≤ k ≤ 63, n is the time index of the samples within each sub-band with 0 ≤ n ≤ 31, and l is the audio signal frame number.

The expression of the IMDCT is as follows:

where w(n) is the window function. An overlap-add operation is performed on the output signal of the IMDCT to obtain the sub-band reconstructed signal x′_l(n) of the current frame l, that is:
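The MDCT, IMDCT and overlap-add chain can be illustrated with a minimal round-trip sketch. It uses the common textbook transform pair with a sine window applied at both analysis and synthesis and a 2/N scale factor on the inverse; these conventions are assumptions and may differ from the scaling used in the embodiment. The function names are hypothetical.

```python
import numpy as np


def sine_window(N):
    """Sine window of length 2N; satisfies w(n)^2 + w(n + N)^2 = 1."""
    n = np.arange(2 * N)
    return np.sin(np.pi * (n + 0.5) / (2 * N))


def mdct(block, w):
    """MDCT of a windowed 2N-sample block, giving N spectral coefficients."""
    N = len(block) // 2
    n = np.arange(2 * N)
    m = np.arange(N)
    basis = np.cos(np.pi / N * np.outer(m + 0.5, n + 0.5 + N / 2))   # shape (N, 2N)
    return basis @ (w * block)


def imdct(X, w):
    """Windowed IMDCT giving 2N time-aliased samples (2/N scaling assumed)."""
    N = len(X)
    n = np.arange(2 * N)
    m = np.arange(N)
    basis = np.cos(np.pi / N * np.outer(n + 0.5 + N / 2, m + 0.5))   # shape (2N, N)
    return (2.0 / N) * w * (basis @ X)


N = 32                                  # samples per sub-band per frame
w = sine_window(N)
rng = np.random.default_rng(0)
x = rng.standard_normal(4 * N)          # four consecutive frames of one sub-band

# Each block spans the previous frame and the current frame (hop size N).
y = [imdct(mdct(x[i * N:(i + 2) * N], w), w) for i in range(3)]

# Overlap-add: second half of the previous output plus first half of the current output.
reconstructed = np.concatenate([y[0][N:] + y[1][:N], y[1][N:] + y[2][:N]])
```

Here `reconstructed` equals `x[N:3 * N]` up to rounding error, because the aliasing terms of adjacent blocks cancel in the overlap-add. In the method described above, the spectrum adjustment would be applied to the MDCT coefficients between `mdct` and `imdct`.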

It should be noted that the overall flow of the audio processing method provided in this embodiment of the present application is shown in FIG. 8.

It should be noted that the audio processing method provided in this embodiment of the present application may be executed by an audio processing device, or by a control module in the audio processing device that is used to execute the audio processing method. In this embodiment, the audio processing device is described by taking the case in which the audio processing device executes the audio processing method as an example.

An embodiment of the present application provides an audio processing device. As shown in FIG. 9, the device includes a processing module 801, a generating module 802 and a synthesizing module 803, wherein:

The processing module 801 is configured to perform resolution enhancement processing on a first audio signal to obtain a second audio signal. The processing module 801 is further configured to perform low-pass filtering on the second audio signal to obtain a processed second audio signal, and to perform filtering and down-sampling on the processed second audio signal to obtain Y first sub-band signals with the same bandwidth. The generating module 802 is configured to generate M high-frequency sub-band signals from the low-frequency sub-band signals among the Y first sub-band signals obtained by the processing module 801. The processing module 801 is further configured to perform spectrum adjustment on the M high-frequency sub-band signals generated by the generating module 802, based on high-frequency feature information of the first audio signal, to obtain M target high-frequency sub-band signals. The synthesizing module 803 is configured to synthesize the M target high-frequency sub-band signals obtained by the processing module 801 to obtain a target audio signal. Y and M are positive integers.

In the audio processing device provided in this embodiment of the present application, the electronic device can perform resolution enhancement processing on a low-resolution first audio signal (for example, a broadband/full-band non-speech signal) to obtain a high-resolution second audio signal, and perform low-pass filtering on the second audio signal to filter out its high-frequency components. The processed second audio signal is then subjected to signal processing to obtain Y first sub-band signals with the same bandwidth, and M high-frequency sub-band signals are generated from the low-frequency sub-band signals among the Y first sub-band signals. Finally, based on the high-frequency spectrum information of the low-resolution first audio signal, spectrum adjustment is performed on the M high-frequency sub-band signals to obtain M target high-frequency sub-band signals, which are synthesized to obtain a first audio signal whose high-frequency harmonic characteristics are well reconstructed. In this way, a high-definition audio signal with high fidelity and high expressiveness can be obtained, which improves the playback effect of non-speech signals.

Optionally, in this embodiment of the present application, the generating module 802 is specifically configured to perform spectrum replication on all low-frequency sub-band signals among the Y sub-band signals to generate M high-frequency sub-band signals, where one low-frequency sub-band signal corresponds to at least one high-frequency sub-band signal, and Y is less than or equal to M.
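Spectrum replication of this kind can be sketched in Python as follows. The sketch assumes a simple cyclic mapping in which the high-frequency sub-bands are filled by repeatedly copying the low-frequency sub-bands in order; the function name `replicate_subbands` and the cyclic mapping are illustrative assumptions, and the present application does not specify which low-frequency sub-band is copied to which high-frequency sub-band.

```python
def replicate_subbands(low_bands, num_high):
    """Generate num_high high-frequency sub-bands by cyclically copying the low-frequency ones.

    low_bands: list of sub-band signals, each a list of samples or coefficients.
    num_high:  number M of high-frequency sub-bands to generate.
    Requiring num_high >= len(low_bands) ensures that every low-frequency
    sub-band is the source of at least one high-frequency sub-band.
    """
    if not low_bands:
        raise ValueError("at least one low-frequency sub-band is required")
    if num_high < len(low_bands):
        raise ValueError("num_high must be at least the number of low-frequency sub-bands")
    # Copy each source band so later gain adjustment does not modify the low band.
    return [list(low_bands[i % len(low_bands)]) for i in range(num_high)]
```

For example, two low-frequency sub-bands and `num_high = 5` yield five high-frequency sub-bands whose contents alternate between the two sources; the copies would then be scaled during spectrum adjustment.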

Optionally, in this embodiment of the present application, the audio processing device further includes an extraction module 804 and a prediction module 805.

The extraction module 804 is configured to perform feature extraction on the first audio signal to obtain low-frequency feature information of the first audio signal. The prediction module 805 is configured to input the low-frequency feature information extracted by the extraction module 804 into a preset neural network model to predict the high-frequency feature information of the first audio signal.
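As an illustration of feature extraction, the following Python sketch computes two of the quantities that claim 1 names as determining the signal gain: a normalized autocorrelation coefficient and a spectral flatness measure. The definitions used here are the common textbook ones, and the function names, the default lag of 1 and the `floor` parameter are assumptions made for illustration; the present application does not give these formulas in this section, and the gradient index is not covered.

```python
import math


def normalized_autocorrelation(samples, lag=1):
    """Autocorrelation at the given lag divided by the signal energy (lag-0 value)."""
    energy = sum(s * s for s in samples)
    if energy == 0.0:
        return 0.0  # silent frame: define the coefficient as zero
    return sum(samples[n] * samples[n - lag] for n in range(lag, len(samples))) / energy


def spectral_flatness(power_values, floor=1e-12):
    """Ratio of geometric mean to arithmetic mean of the power values of one sub-band.

    Values near 1 indicate a noise-like (flat) spectrum; values near 0 indicate
    a tonal spectrum. `floor` avoids taking the logarithm of zero.
    """
    powers = [max(p, floor) for p in power_values]
    log_mean = sum(math.log(p) for p in powers) / len(powers)
    return math.exp(log_mean) / (sum(powers) / len(powers))
```

Feature values of this kind could form the input vector of the preset neural network model, whose architecture is not detailed in this section.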

Optionally, in this embodiment of the present application, the processing module 801 is specifically configured to up-sample the first audio signal by a factor of L to obtain a second audio signal with a predetermined sampling rate, where the first audio signal and the second audio signal have the same bandwidth.
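A minimal Python sketch of L-times up-sampling is given here, assuming the zero-insertion approach in which L - 1 zeros are placed after each input sample. The function name `upsample_zero_stuff` is hypothetical, and the present application does not state which up-sampling method is used.

```python
def upsample_zero_stuff(samples, factor):
    """Raise the sampling rate by an integer factor by inserting zeros between samples."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = [0.0] * (len(samples) * factor)
    # Original samples land on every factor-th position; the rest stay zero.
    out[::factor] = [float(s) for s in samples]
    return out
```

Zero insertion leaves the original band unchanged but creates spectral images above it, so it is normally followed by a low-pass filter, which matches the low-pass filtering applied to the second audio signal in this embodiment. A practical interpolator would usually also scale the filtered output by the factor L to restore the original amplitude.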

Optionally, in this embodiment of the present application, the processing module 801 is further configured to divide the low-frequency component of the second audio signal into frames to obtain X audio signal frames, each audio signal frame including a predetermined number of sample points. The processing module 801 is specifically configured to perform filtering and down-sampling on each audio signal frame in turn to obtain N first sub-band signals corresponding to each audio signal frame. The Y first sub-band signals include the N first sub-band signals corresponding to each audio signal frame.
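The framing step can be sketched in Python as follows. The sketch assumes non-overlapping frames and zero-padding of the final frame when the signal length is not a multiple of the frame length; both choices, and the function name `split_into_frames`, are assumptions made for illustration and are not stated in the present application.

```python
def split_into_frames(samples, frame_length):
    """Split a signal into consecutive frames of frame_length samples each.

    The last frame is zero-padded if the signal does not fill it completely
    (an assumption of this sketch).
    """
    if frame_length < 1:
        raise ValueError("frame_length must be a positive integer")
    frames = []
    for start in range(0, len(samples), frame_length):
        frame = list(samples[start:start + frame_length])
        frame.extend([0] * (frame_length - len(frame)))  # pad the final frame
        frames.append(frame)
    return frames
```

Each resulting frame would then be passed through the analysis filter bank and down-sampled to obtain its N first sub-band signals.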

Optionally, in this embodiment of the present application, the processing module 801 is specifically configured to perform signal processing on the N first sub-band signals of a first audio signal frame and the N first sub-band signals of a second audio signal frame to obtain N processed first sub-band signals, where the first audio signal frame and the second audio signal frame are adjacent audio signal frames among the X audio signal frames.

The audio processing device in this embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, a tablet computer, a laptop computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and the non-mobile electronic device may be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, or a self-service machine. This is not specifically limited in this embodiment of the present application.

The audio processing device in this embodiment of the present application may be a device having an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in this embodiment of the present application.

The audio processing device provided in this embodiment of the present application can implement each process implemented by the method embodiments of FIG. 1 to FIG. 8. To avoid repetition, details are not described here again.

Optionally, as shown in FIG. 10, an embodiment of the present application further provides an electronic device 900, including a processor 901, a memory 902, and a program or instruction stored in the memory 902 and executable on the processor 901. When the program or instruction is executed by the processor 901, each process of the above audio processing method embodiment is implemented and the same technical effect can be achieved. To avoid repetition, details are not described here again.

It should be noted that the electronic device in this embodiment of the present application includes the mobile electronic devices and non-mobile electronic devices described above.

FIG. 11 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.

The electronic device 100 includes, but is not limited to, components such as a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.

Those skilled in the art will understand that the electronic device 100 may further include a power supply (such as a battery) that supplies power to each component. The power supply may be logically connected to the processor 110 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The structure of the electronic device shown in FIG. 11 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or use a different arrangement of components, and details are not described here.

The processor 110 is configured to perform resolution enhancement processing on a first audio signal to obtain a second audio signal. The processor 110 is further configured to perform low-pass filtering on the second audio signal to obtain a processed second audio signal, perform filtering and down-sampling on the processed second audio signal to obtain Y first sub-band signals with the same bandwidth, and generate M high-frequency sub-band signals from the low-frequency sub-band signals among the Y first sub-band signals. The processor 110 is further configured to perform spectrum adjustment on the generated M high-frequency sub-band signals, based on high-frequency feature information of the first audio signal, to obtain M target high-frequency sub-band signals, and to synthesize the M target high-frequency sub-band signals to obtain a target audio signal. Y and M are positive integers.

In the electronic device provided in this embodiment of the present application, the electronic device can perform resolution enhancement processing on a low-resolution first audio signal (for example, a broadband/full-band non-speech signal) to obtain a high-resolution second audio signal, and perform low-pass filtering on the second audio signal to filter out its high-frequency components. The processed second audio signal is then subjected to signal processing to obtain Y first sub-band signals with the same bandwidth, and M high-frequency sub-band signals are generated from the low-frequency sub-band signals among the Y first sub-band signals. Finally, based on the high-frequency spectrum information of the low-resolution first audio signal, spectrum adjustment is performed on the M high-frequency sub-band signals to obtain M target high-frequency sub-band signals, which are synthesized to obtain a first audio signal whose high-frequency harmonic characteristics are well reconstructed. In this way, a high-definition audio signal with high fidelity and high expressiveness can be obtained, which improves the playback effect of non-speech signals.

Optionally, in this embodiment of the present application, the processor 110 is specifically configured to perform spectrum replication on all low-frequency sub-band signals among the Y sub-band signals to generate M high-frequency sub-band signals, where one low-frequency sub-band signal corresponds to at least one high-frequency sub-band signal, and Y is less than or equal to M.

Optionally, in this embodiment of the present application, the processor 110 is further configured to perform feature extraction on the first audio signal to obtain low-frequency feature information of the first audio signal, and to input the low-frequency feature information into a preset neural network model to predict the high-frequency feature information of the first audio signal.

Optionally, in this embodiment of the present application, the processor 110 is specifically configured to up-sample the first audio signal by a factor of L to obtain a second audio signal with a predetermined sampling rate, where the first audio signal and the second audio signal have the same bandwidth.

Optionally, in this embodiment of the present application, the processor 110 is further configured to divide the low-frequency component of the second audio signal into frames to obtain X audio signal frames, each audio signal frame including a predetermined number of sample points, and to perform filtering and down-sampling on each audio signal frame in turn to obtain N first sub-band signals corresponding to each audio signal frame. The Y first sub-band signals include the N first sub-band signals corresponding to each audio signal frame.

Optionally, in this embodiment of the present application, the processor 110 is specifically configured to perform signal processing on the N first sub-band signals of a first audio signal frame and the N first sub-band signals of a second audio signal frame to obtain N processed first sub-band signals, where the first audio signal frame and the second audio signal frame are adjacent audio signal frames among the X audio signal frames.

It should be understood that, in this embodiment of the present application, the input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processing unit 1041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071, also called a touch screen, may include two parts: a touch detection device and a touch controller. The other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick; details are not described here. The memory 109 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 110.

An embodiment of the present application further provides a readable storage medium storing a program or instruction. When the program or instruction is executed by a processor, each process of the above audio processing method embodiment is implemented and the same technical effect can be achieved. To avoid repetition, details are not described here again.

The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or instruction to implement each process of the above audio processing method embodiment and achieve the same technical effect. To avoid repetition, details are not described here again.

It should be understood that the chip mentioned in this embodiment of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.

An embodiment of the present application provides a computer program product. The program product is stored in a non-volatile storage medium and is executed by at least one processor to implement the method described in the first aspect.

It should be noted that, in this document, the terms "comprise", "include" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "includes a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element. In addition, the scope of the methods and devices in the embodiments of the present application is not limited to performing functions in the order shown or discussed; functions may also be performed in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Features described with reference to certain examples may also be combined in other examples.

From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.

The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific implementations described, which are merely illustrative and not restrictive. Guided by the present application, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims

1. A method of audio processing, the method comprising:

performing resolution enhancement processing on the first audio signal to obtain a second audio signal;

performing low-pass filtering processing on the second audio signal to obtain a processed second audio signal;

filtering the processed second audio signal to obtain Y subband signals with the same bandwidth, and performing downsampling on the Y subband signals with the same bandwidth to obtain Y first subband signals with the same bandwidth;

generating, by spectrum copying or spectrum inversion, M high-frequency subband signals from the low-frequency subband signals in the Y first subband signals, wherein each low-frequency subband signal in the Y first subband signals corresponds to one or more high-frequency subband signals;

performing spectrum adjustment on the M high-frequency subband signals based on the high-frequency characteristic information of the first audio signal to obtain M target high-frequency subband signals, wherein the high-frequency characteristic information is the signal gain of the M high-frequency subband signals, and the signal gain is determined by the normalized autocorrelation coefficient, gradient index and subband spectrum flatness of the first audio signal;

synthesizing the M target high-frequency subband signals to obtain a target audio signal;

wherein the first audio signal comprises at least one of: broadband audio, ultra-broadband audio, and full-band audio; and Y and M are positive integers.

2. The method according to claim 1, wherein generating M high-frequency subband signals from the low-frequency subband signals in the Y first subband signals by spectrum copying or spectrum inversion comprises:

performing spectrum copying on all low-frequency subband signals in the Y first subband signals to generate M high-frequency subband signals, wherein one low-frequency subband signal corresponds to at least one high-frequency subband signal, and Y is smaller than or equal to M.

3. The method of claim 1, wherein before performing spectrum adjustment on the M high-frequency subband signals based on the high-frequency characteristic information of the first audio signal to obtain M target high-frequency subband signals, the method further comprises:

extracting features of the first audio signal to obtain low-frequency characteristic information of the first audio signal; and

inputting the low-frequency characteristic information into a preset neural network model to predict the high-frequency characteristic information of the first audio signal.

4. The method of claim 1, wherein performing resolution enhancement processing on the first audio signal to obtain a second audio signal comprises:

performing L-times up-sampling on the first audio signal to obtain a second audio signal with a preset sampling rate, wherein the bandwidth of the first audio signal is the same as that of the second audio signal.

5. The method of claim 1, wherein filtering the processed second audio signal to obtain Y subband signals with the same bandwidth, and downsampling the Y subband signals with the same bandwidth to obtain Y first subband signals with the same bandwidth, comprises:

framing the low-frequency component of the second audio signal to obtain X audio signal frames, wherein each audio signal frame comprises a preset number of sample points;

sequentially performing filtering and downsampling on each audio signal frame to obtain N first sub-band signals corresponding to each audio signal frame;

wherein the Y first subband signals include the N first sub-band signals corresponding to each audio signal frame.

6. The method of claim 5, wherein after framing the low frequency component of the second audio signal to obtain X audio signal frames, the method further comprises:

performing signal processing on N first sub-band signals of the first audio signal frame and N first sub-band signals in the second audio signal frame to obtain N processed first sub-band signals; wherein the first audio signal frame and the second audio signal frame are adjacent ones of the X audio signal frames.

7. An audio processing apparatus, the apparatus comprising a processing module, a generating module and a synthesizing module, wherein:

the processing module is used for carrying out resolution improvement processing on the first audio signal to obtain a second audio signal;

the processing module is further used for performing low-pass filtering processing on the second audio signal to obtain a processed second audio signal;

the processing module is further configured to perform filtering processing on the processed second audio signal to obtain Y subband signals with the same bandwidth, and perform downsampling processing on the Y subband signals with the same bandwidth to obtain Y first subband signals with the same bandwidth;

the generating module is configured to generate, by spectrum copying or spectrum inversion, M high-frequency subband signals from the low-frequency subband signals in the Y first subband signals obtained by the processing module, where each low-frequency subband signal in the Y first subband signals corresponds to one or more high-frequency subband signals;

the processing module is further configured to perform spectrum adjustment on the M high-frequency subband signals generated by the generating module based on the high-frequency characteristic information of the first audio signal, to obtain M target high-frequency subband signals, where the high-frequency characteristic information is signal gains of the M high-frequency subband signals, and the signal gains are determined by a normalized autocorrelation coefficient, a gradient index and subband spectrum flatness of the first audio signal;

the synthesizing module is used for synthesizing the M target high-frequency subband signals obtained by the processing module to obtain target audio signals;

8. The apparatus of claim 7, wherein:

the generating module is specifically configured to perform spectrum duplication on all low-frequency subband signals in the Y first subband signals to generate M high-frequency subband signals, where one low-frequency subband signal corresponds to at least one high-frequency subband signal, and Y is smaller than or equal to M.

9. The apparatus of claim 7, wherein the audio processing apparatus further comprises an extraction module and a prediction module;

the extraction module is used for extracting the characteristics of the first audio signal to obtain low-frequency characteristic information of the first audio signal;

the prediction module is used for inputting the low-frequency characteristic information extracted by the extraction module into a preset neural network model to predict the high-frequency characteristic information of the first audio signal.

10. The apparatus of claim 7, wherein:

the processing module is specifically configured to perform L times up-sampling on the first audio signal to obtain a second audio signal with a predetermined sampling rate, where the bandwidth of the first audio signal is the same as that of the second audio signal.

11. The apparatus of claim 7, wherein:

the processing module is further configured to frame the low-frequency component of the second audio signal to obtain X audio signal frames, where each audio signal frame includes a predetermined number of sample points;

the processing module is specifically configured to sequentially perform filtering and downsampling processing on each audio signal frame to obtain N first subband signals corresponding to each audio signal frame;

12. The apparatus of claim 11, wherein:

the processing module is specifically configured to perform signal processing on N first subband signals of the first audio signal frame and N first subband signals in the second audio signal frame, so as to obtain N processed first subband signals; wherein the first audio signal frame and the second audio signal frame are adjacent ones of the X audio signal frames.

13. An electronic device comprising a processor, a memory, and a program or instruction stored on the memory and executable on the processor, wherein the program or instruction, when executed by the processor, implements the steps of the audio processing method according to any one of claims 1-6.

14. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the audio processing method according to any of claims 1-6.