
CN114333861B - Audio processing method, device, storage medium, equipment and product - Google Patents

Audio processing method, device, storage medium, equipment and product

Info

Publication number
CN114333861B
Authority
CN
China
Prior art keywords
frequency signal
frequency
spectrum
audio
low
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111371005.1A
Other languages
Chinese (zh)
Other versions
CN114333861A (en
Inventor
梁俊斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111371005.1A
Publication of CN114333861A
Application granted
Publication of CN114333861B
Active
Anticipated expiration

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present application discloses an audio processing method, device, storage medium, equipment and product, which relates to the field of Internet technology. The present application can be applied to the fields of map vehicle networking, blockchain, artificial intelligence, etc. The method includes: receiving the spectrum characteristic parameters of the high-frequency signal and the low-frequency coded data of the low-frequency signal, wherein the high-frequency signal and the low-frequency signal belong to the target audio signal; decoding the low-frequency coded data to generate a decoded low-frequency signal; performing a prediction network matching process based on the spectrum characteristic parameters to obtain an audio prediction network matched with the spectrum characteristic parameters; performing an audio prediction process based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal; and generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal. The present application can reduce the transmission bandwidth of audio data and ensure the audio playback effect.

Description

Audio processing method, device, storage medium, equipment and product
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an audio processing method, an audio processing device, a storage medium, equipment and a product.
Background
Audio processing mainly consists of audio encoding and decoding: an acquisition end collects a sound signal, encodes and compresses the resulting audio signal, and sends the compressed data to a receiving end, which decodes the data and plays back the sound.
In the related art, the acquisition end may lower the bit rate in some way before transmitting the audio signal to the receiving end in order to reduce the transmission bandwidth. With current approaches, however, the bit rate can be lowered only to a limited extent, so the bandwidth saving is small. Moreover, the measures taken to lower the bit rate often leave the receiving end unable to control how the audio output signal is generated, so the playback quality is poor.
Disclosure of Invention
The embodiment of the application provides an audio processing scheme which can effectively reduce the transmission bandwidth of audio data and ensure the audio playing effect.
In order to solve the technical problems, the embodiment of the application provides the following technical scheme:
According to one embodiment of the application, an audio processing method comprises: receiving spectral feature parameters of a high-frequency signal and low-frequency encoded data of a low-frequency signal, the high-frequency signal and the low-frequency signal belonging to a target audio signal; decoding the low-frequency encoded data to generate a decoded low-frequency signal; performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters; performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal; and generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
According to one embodiment of the application, an audio processing device comprises a receiving module, a decoding module, a matching module, a prediction module and an output module. The receiving module is used for receiving spectral feature parameters of a high-frequency signal and low-frequency encoded data of a low-frequency signal, the high-frequency signal and the low-frequency signal belonging to a target audio signal. The decoding module is used for decoding the low-frequency encoded data to generate a decoded low-frequency signal. The matching module is used for performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters. The prediction module is used for performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal. The output module is used for generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
In some embodiments of the present application, the spectral feature parameters include a spectral envelope type, the matching module includes an information obtaining unit configured to obtain network information of at least one preset audio prediction network, where each network information corresponds to a preset spectral envelope type, a network matching unit configured to determine network information corresponding to a preset spectral envelope type matched by the spectral envelope type, to obtain target network information, and a network determining unit configured to determine a preset audio prediction network corresponding to the target network information as the audio prediction network matched by the spectral feature parameters.
In some embodiments of the application, the prediction module comprises an extraction processing unit, an information prediction unit and a signal generation unit, wherein the extraction processing unit is used for carrying out frequency spectrum characteristic extraction processing on the decoded low-frequency signal to obtain low-frequency spectrum information, the information prediction unit is used for carrying out audio prediction processing on the basis of the low-frequency spectrum information by adopting the audio prediction network to obtain predicted frequency spectrum information, and the signal generation unit is used for generating a predicted high-frequency signal corresponding to the high-frequency signal on the basis of the predicted frequency spectrum information.
In some embodiments of the present application, the extraction processing unit is configured to perform modified discrete cosine transform (MDCT) processing on the decoded low-frequency signal to obtain the low-frequency spectrum information, and the signal generating unit is configured to perform inverse modified discrete cosine transform (IMDCT) processing on the predicted spectrum information to generate a predicted high-frequency signal corresponding to the high-frequency signal.
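As an illustration of the transform pair named above, the following is a minimal sketch of a windowed MDCT and its inverse with 50% overlap-add. The sine window, the frame size, the zero padding and all function names are assumptions made for this example; the source does not specify them. The sketch only shows that the forward and inverse transforms reconstruct a signal; it does not include the prediction network that would sit between them.

```python
import math

def sine_window(n_half):
    """Sine window of length 2*n_half; it satisfies w[n]^2 + w[n+n_half]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_half)) for n in range(2 * n_half)]

def mdct(frame):
    """Forward MDCT: 2N time samples -> N spectral coefficients."""
    n_half = len(frame) // 2
    return [
        sum(frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.
    The 2/N scale is the one that pairs with windowing at both analysis and synthesis."""
    n_half = len(coeffs)
    return [
        2.0 / n_half * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half))
        for n in range(2 * n_half)
    ]

def mdct_round_trip(signal, n_half):
    """Analyse and resynthesise a signal whose length is a multiple of n_half."""
    window = sine_window(n_half)
    padded = [0.0] * n_half + list(signal) + [0.0] * n_half
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        frame = [padded[start + n] * window[n] for n in range(2 * n_half)]
        spectrum = mdct(frame)          # low-frequency spectrum information would be read here
        rebuilt = imdct(spectrum)       # a predicted spectrum would be inverted here
        for n in range(2 * n_half):
            output[start + n] += rebuilt[n] * window[n]   # overlap-add cancels the aliasing
    return output[n_half:n_half + len(signal)]
```

In a real decoder the spectrum passed to `imdct` would be the predicted spectrum information rather than the spectrum just computed; the round trip here only checks the transform pair.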
In some embodiments of the present application, the output module is configured to perform quadrature mirror filter (QMF) synthesis filtering processing on the predicted high-frequency signal and the decoded low-frequency signal to generate the audio output signal.
According to one embodiment of the application, an audio processing method comprises: performing decomposition processing on a target audio signal to generate a high-frequency signal and a low-frequency signal; performing feature extraction processing on the high-frequency signal to obtain spectral feature parameters corresponding to the high-frequency signal; performing audio encoding processing on the low-frequency signal to generate low-frequency encoded data corresponding to the low-frequency signal; and sending the spectral feature parameters and the low-frequency encoded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral feature parameters and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data.
According to one embodiment of the application, an audio processing device comprises a decomposition module, an extraction module, an encoding module and a transmission module. The decomposition module is used for performing decomposition processing on a target audio signal to generate a high-frequency signal and a low-frequency signal. The extraction module is used for performing feature extraction processing on the high-frequency signal to obtain spectral feature parameters corresponding to the high-frequency signal. The encoding module is used for performing audio encoding processing on the low-frequency signal to generate low-frequency encoded data corresponding to the low-frequency signal. The transmission module is used for sending the spectral feature parameters and the low-frequency encoded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral feature parameters and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data.
In some embodiments of the application, the extraction module comprises a frequency domain conversion unit, a power spectrum value calculation unit and a frequency spectrum characteristic parameter acquisition unit, wherein the frequency domain conversion unit is used for carrying out frequency domain conversion processing on the high-frequency signal to obtain a frequency domain signal, the power spectrum value calculation unit is used for calculating the power spectrum value of each frequency point in the frequency domain signal, and the frequency spectrum characteristic parameter acquisition unit is used for carrying out characteristic extraction processing based on the power spectrum value of each frequency point to obtain a frequency spectrum characteristic parameter describing the frequency spectrum distribution characteristic of the high-frequency signal.
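The frequency domain conversion and per-frequency-point power spectrum calculation described above can be sketched as follows. The source does not name the transform, so a plain discrete Fourier transform is assumed here, and `power_spectrum` is a hypothetical helper name.

```python
import cmath
import math

def power_spectrum(samples):
    """Return |X[k]|^2 for bins k = 0..N/2 of a real-valued frame.
    Uses a naive O(N^2) DFT, which is an assumption made for illustration."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

# A cosine with exactly two cycles in a 16-sample frame puts its energy in bin 2.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
values = power_spectrum(frame)
peak_bin = max(range(len(values)), key=values.__getitem__)
```

A production implementation would normally use an FFT; the naive transform keeps the sketch free of dependencies.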
In some embodiments of the present application, the spectrum characteristic parameter obtaining unit includes an element calculating subunit, configured to calculate an average value of power spectrum values of the frequency points, determine a maximum power spectrum value of the power spectrum values of the frequency points, a difference processing subunit, configured to perform a difference processing on the maximum power spectrum value and the average value to obtain a first difference value, and a spectrum characteristic parameter determining subunit, configured to determine a spectrum characteristic parameter corresponding to the high frequency signal according to the first difference value.
In some embodiments of the present application, the spectral feature parameter includes a spectral envelope type, and the spectral feature parameter determining subunit is configured to determine that the spectral envelope type corresponding to the high-frequency signal is a first type if the first difference value is smaller than a first predetermined threshold value and the maximum power spectrum value is smaller than a second predetermined threshold value, and determine that the spectral envelope type corresponding to the high-frequency signal is a second type if the first difference value is smaller than the first predetermined threshold value and the maximum power spectrum value is larger than the second predetermined threshold value.
In some embodiments of the present application, the spectral feature parameter includes a spectral envelope type, the spectral feature parameter determining subunit is configured to normalize a power spectrum value of each frequency point to obtain a normalized value corresponding to each frequency point if the first difference value is greater than a first predetermined threshold value, obtain at least one preset target value, each target value corresponds to a preset spectral envelope type, calculate a mean square error value between the normalized value corresponding to each frequency point and each target value, and determine a preset spectral envelope type corresponding to a target value corresponding to the smallest mean square error value as the spectral envelope type of the high-frequency signal.
In some embodiments of the present application, the spectrum characteristic parameter determining subunit is configured to perform a difference processing on the power spectrum value of each frequency point and the average value to obtain a second difference value corresponding to each frequency point, calculate a square value of the second difference value corresponding to each frequency point, calculate an average value of the square values to obtain a normalized score, and divide the second difference value corresponding to each frequency point by the normalized score to obtain a normalized value corresponding to each frequency point.
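The decision procedure in the preceding paragraphs (first difference, two thresholds, normalization, and mean square error against preset targets) can be sketched as one function. Several details are assumptions made for this example: the labels "first" and "second", the treatment of values exactly equal to a threshold, and the representation of each preset target value as a per-frequency-point list. The threshold values and templates in the usage below are invented for illustration.

```python
def classify_envelope(power, first_threshold, second_threshold, templates):
    """Pick a spectral envelope type from per-frequency-point power spectrum values.

    templates maps a preset envelope type name to its target values
    (assumed here to be one value per frequency point).
    """
    mean_power = sum(power) / len(power)
    max_power = max(power)
    first_difference = max_power - mean_power

    if first_difference < first_threshold:
        # Nearly flat spectrum: the type depends only on the overall level.
        # Equality with the second threshold is not covered by the source;
        # this sketch assigns it to the second type.
        return "first" if max_power < second_threshold else "second"

    # Otherwise normalize as described: subtract the mean, then divide by the
    # mean of the squared differences (the "normalized score").
    second_differences = [p - mean_power for p in power]
    normalized_score = sum(d * d for d in second_differences) / len(power)
    if normalized_score == 0:
        raise ValueError("cannot normalize a perfectly flat spectrum")
    normalized = [d / normalized_score for d in second_differences]

    def mean_square_error(target):
        return sum((a - b) ** 2 for a, b in zip(normalized, target)) / len(normalized)

    # The preset type whose target values give the smallest error wins.
    return min(templates, key=lambda name: mean_square_error(templates[name]))
```

For example, with hypothetical thresholds of 1.0 and 5.0 and two made-up templates, a spectrum whose energy sits in the last frequency point is matched to the rising template:

```python
templates = {"rising": [-0.2, -0.2, -0.2, 0.5], "falling": [0.5, -0.2, -0.2, -0.2]}
envelope_type = classify_envelope([0.0, 0.0, 0.0, 8.0], 1.0, 5.0, templates)
```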
In some embodiments of the present application, the decomposition module is configured to perform quadrature mirror filter (QMF) analysis filtering processing on the target audio signal to generate the high-frequency signal and the low-frequency signal.
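To make the band split concrete, here is a minimal sketch of a two-channel quadrature mirror filter bank using the shortest possible filter pair (the two-tap Haar filters). The filter choice and the function names are assumptions made for this example; a practical codec would use much longer filters for better band separation, and the source does not state which filters are used.

```python
import math

def qmf_analysis(signal):
    """Split an even-length signal into half-rate low and high bands."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two half-rate bands into the full-rate signal."""
    scale = math.sqrt(2.0)
    output = []
    for low_sample, high_sample in zip(low, high):
        output.append((low_sample + high_sample) / scale)
        output.append((low_sample - high_sample) / scale)
    return output
```

With this filter pair the synthesis stage reconstructs the input exactly when it is given the original bands. In the scheme described here, the high band fed to synthesis would instead be the predicted high-frequency signal.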
According to another embodiment of the application, a computer readable storage medium has stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform the method according to the embodiment of the application.
According to another embodiment of the application, an electronic device comprises a memory storing a computer program and a processor reading the computer program stored in the memory to perform the method according to the embodiment of the application.
According to another embodiment of the application, a computer program product or computer program includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to cause the computer device to perform the methods provided in the various alternative implementations described in the embodiments of the present application.
In the embodiment of the application, spectral feature parameters of a high-frequency signal and low-frequency encoded data of a low-frequency signal are received, the high-frequency signal and the low-frequency signal being generated by decomposing a target audio signal; the low-frequency encoded data is decoded to generate a decoded low-frequency signal; prediction network matching processing is performed based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters; audio prediction processing is performed based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal; and an audio output signal corresponding to the target audio signal is generated according to the predicted high-frequency signal and the decoded low-frequency signal.
In this way, the spectral distribution characteristics of the high-frequency signal in the target audio signal can be described by spectral feature parameters of very small data size, so only the spectral feature parameters and the low-frequency encoded data of the low-frequency signal need to be transmitted, which effectively reduces the transmission bandwidth. At the same time, the matched audio prediction network is selected based on the spectral feature parameters to restore the high-frequency signal as a predicted high-frequency signal. Because the general spectral distribution characteristics are conveyed even with this very small amount of data, the error between the predicted high-frequency signal and the original high-frequency signal is controllable, and so is the generation of the audio output signal. As a result, the overall coding bit rate of the audio processing is effectively reduced while the high-frequency signal is restored well, so the transmission bandwidth of the audio data is reduced and the audio playback quality is maintained.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 shows a schematic diagram of a system to which embodiments of the application may be applied.
Fig. 2 shows a flow chart of an audio processing method according to an embodiment of the application.
Fig. 3 shows a schematic diagram of spectral envelope types according to an embodiment of the application.
Fig. 4 shows a flow chart of an audio processing method according to another embodiment of the application.
Fig. 5 shows a flow chart of an audio processing procedure in a scenario.
Fig. 6 shows another flow chart of an audio processing procedure in one scenario.
Fig. 7 shows a block diagram of an audio processing device according to an embodiment of the application.
Fig. 8 shows a block diagram of an audio processing device according to another embodiment of the application.
Fig. 9 shows a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
In the description that follows, specific embodiments of the application are described with reference to steps and symbolic representations of operations performed by one or more computers, unless otherwise indicated. These steps and operations are therefore at times referred to as being computer-executed; they include the manipulation, by the computer's processing unit, of electronic signals that represent data in a structured form. This manipulation transforms the data or maintains them at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data are maintained are physical locations of the memory that have particular properties defined by the data format. However, although the principles of the application are described in the foregoing terms, this is not meant to be limiting, and those skilled in the art will appreciate that various of the steps and operations described below may also be implemented in hardware.
Fig. 1 shows a schematic diagram of a system 100 in which embodiments of the application may be applied. As shown in fig. 1, the system 100 may include a terminal 101, a terminal 102, and a server 103.
The terminals 101 and 102 may be any device, including but not limited to cell phones, computers, intelligent voice interaction devices, smart home appliances, vehicle terminals, VR/AR devices, smart watches, and the like.
The server 103 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms, and the like. In one implementation of the present example, the server 103 is a cloud server.
In some examples, the terminal 101, the terminal 102, and the server 103 may be nodes in a blockchain network, which may improve the security of audio processing.
In one embodiment of the present example, the terminal 101 may receive spectral feature parameters of a high-frequency signal and low-frequency encoded data of a low-frequency signal, where the high-frequency signal and the low-frequency signal belong to a target audio signal; decode the low-frequency encoded data to generate a decoded low-frequency signal; perform prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters; perform audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal; and generate an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
In one example, the spectral feature parameters of the high-frequency signal and the low-frequency encoded data of the low-frequency signal may be sent directly by the terminal 102, acting as the acquisition end, to the terminal 101, acting as the receiving end. In another example, they may be sent by the terminal 102 to the terminal 101 through the server 103. In yet another example, they may be sent by a processing unit A, acting as the acquisition end in the terminal 101, to a processing unit B, acting as the receiving end in the same terminal 101.
In one embodiment of the present example, the terminal 102 may perform decomposition processing on a target audio signal to generate a high-frequency signal and a low-frequency signal; perform feature extraction processing on the high-frequency signal to obtain spectral feature parameters corresponding to the high-frequency signal; perform audio encoding processing on the low-frequency signal to generate low-frequency encoded data corresponding to the low-frequency signal; and send the spectral feature parameters and the low-frequency encoded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral feature parameters and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data.
In one example, the receiving end may be the terminal 101, and the terminal 102 may send the spectral feature parameters and the low-frequency encoded data to the terminal 101 either directly or through the server 103. In another example, the receiving end may be a processing unit C in the terminal 102, and another processing unit D in the terminal 102, acting as the acquisition end, may send the spectral feature parameters and the low-frequency encoded data to processing unit C.
Fig. 2 schematically shows a flow chart of an audio processing method according to an embodiment of the application. The audio processing method may be executed by any receiving end. The receiving end decodes the received audio-related data to generate an audio output signal, which is used to play sound. The receiving end may be, for example, the terminal 101 or the terminal 102 shown in fig. 1.
As shown in fig. 2, the audio processing method may include steps S210 to S250.
Step S210: receive spectral feature parameters of a high-frequency signal and low-frequency encoded data of a low-frequency signal, where the high-frequency signal and the low-frequency signal belong to a target audio signal.
Step S220: decode the low-frequency encoded data to generate a decoded low-frequency signal.
Step S230: perform prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters.
Step S240: perform audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal.
Step S250: generate an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
In this way, the spectral distribution characteristics of the high-frequency signal in the target audio signal can be described by spectral feature parameters of very small data size, so only the spectral feature parameters and the low-frequency encoded data of the low-frequency signal need to be transmitted, which effectively reduces the transmission bandwidth. At the same time, the matched audio prediction network is selected based on the spectral feature parameters to restore the high-frequency signal as a predicted high-frequency signal. Because the general spectral distribution characteristics are conveyed even with this very small amount of data, the error between the predicted high-frequency signal and the original high-frequency signal is controllable, and so is the generation of the audio output signal. As a result, the overall coding bit rate of the audio processing is effectively reduced while the high-frequency signal is restored well, so the transmission bandwidth of the audio data is reduced and the audio playback quality is maintained.
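The order of steps S210 to S250 can be sketched as a single function whose components are passed in. Everything here is a hypothetical stand-in: the function name, its parameters, and the assumption that the spectral feature parameter is a key into a table of prediction networks. The decoder, the networks and the synthesis stage are supplied by the caller, so the sketch shows only how the steps connect.

```python
def process_received_frame(spectral_parameter, low_frequency_data,
                           decode_low, networks, synthesize):
    """Run steps S220 to S250 on one received frame (S210 is the call itself).

    decode_low  -- callable: low-frequency encoded data -> decoded low-frequency signal
    networks    -- mapping: spectral feature parameter -> prediction callable
    synthesize  -- callable: (decoded low band, predicted high band) -> output signal
    """
    decoded_low = decode_low(low_frequency_data)        # S220: decode the low band
    predict_high = networks[spectral_parameter]         # S230: match the prediction network
    predicted_high = predict_high(decoded_low)          # S240: predict the high band
    return synthesize(decoded_low, predicted_high)      # S250: build the output signal
```

A toy invocation with trivial stand-ins (not real codecs or networks) looks like this:

```python
output = process_received_frame(
    1,
    bytes([10, 20]),
    decode_low=lambda data: [b / 10 for b in data],
    networks={1: lambda low: [0.5 * v for v in low]},
    synthesize=lambda low, high: [a + b for a, b in zip(low, high)],
)
```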
The specific procedure of each step performed when audio processing is performed in the embodiment shown in fig. 2 is described below.
In step S210, spectral feature parameters of a high-frequency signal and low-frequency encoded data of a low-frequency signal are received, where the high-frequency signal and the low-frequency signal are generated by decomposing a target audio signal.
The target audio signal may be a digital sound signal generated by the acquisition end through analog-to-digital conversion of the acquired sound signal, the target audio signal may be decomposed into a high-frequency signal and a low-frequency signal, the high-frequency signal may be a part of the target audio signal above a predetermined frequency, and the low-frequency signal may be a part of the target audio signal below the predetermined frequency.
The acquisition end can extract the spectral distribution characteristics of the high-frequency signal and generate spectral feature parameters describing them. A spectral feature parameter may be a number, an identifier, or the like, and its data size can be kept very small. In one example, a spectral distribution characteristic is denoted by the value "1", which needs only 3 bits to encode. The low-frequency signal may be encoded using a conventional speech encoder (for example CELP, SILK, AAC or the like) to generate the low-frequency encoded data.
After generating the spectral feature parameters of the high-frequency signal and the low-frequency encoded data of the low-frequency signal, the acquisition end can send them to the receiving end. For transmission, the acquisition end can combine the spectral feature parameters and the low-frequency encoded data into a single encoded bitstream, and this bitstream can be extremely small.
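One way to picture such a combined bitstream is a one-byte header that carries the 3-bit spectral feature parameter, followed by the low-frequency encoded data. This layout is purely hypothetical: the source does not define a frame format, and the header byte, the unused five bits and the function names are all assumptions made for this example.

```python
def pack_frame(spectral_parameter, low_frequency_data):
    """Pack a 3-bit parameter into the top bits of a header byte, then append the payload.
    The lower 5 header bits are left as zero (reserved) in this hypothetical layout."""
    if not 0 <= spectral_parameter < 8:
        raise ValueError("spectral parameter must fit in 3 bits")
    return bytes([spectral_parameter << 5]) + bytes(low_frequency_data)

def unpack_frame(frame):
    """Recover the 3-bit parameter and the low-frequency payload from a packed frame."""
    if not frame:
        raise ValueError("empty frame")
    return frame[0] >> 5, frame[1:]

packed = pack_frame(1, b"\x12\x34")
parameter, payload = unpack_frame(packed)
```

The point of the sketch is the size relationship: the high band costs three bits per frame in this layout, while the payload size is set entirely by the low-frequency encoder.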
In step S220, the low-frequency encoded data is subjected to decoding processing to generate a decoded low-frequency signal.
The receiving end can decode the low-frequency encoded data with a conventional speech decoder to generate the decoded low-frequency signal, that is, the low-frequency signal recovered by decoding.
In step S230, a prediction network matching process is performed based on the spectral feature parameters, so as to obtain an audio prediction network with the matched spectral feature parameters.
The audio prediction network, i.e., the prediction network used to predict the high-frequency signal, may be a deep learning network. Performing prediction network matching processing based on the spectral feature parameters means determining the audio prediction network that matches the spectral feature parameters; it can be understood that each spectral feature parameter may correspond to one audio prediction network.
The audio prediction network is used to predict the high-frequency signal according to the low-frequency encoded data corresponding to the low-frequency signal, thereby realizing a mapping of the audio from low frequency to high frequency. Because this low-to-high mapping is diverse, the spectral feature parameters of the high-frequency signal are added to select the corresponding audio prediction network, which limits the diversity of the samples each network has to cover. That is, a plurality of preset audio prediction networks are trained, each on its own training samples. A training sample may comprise an input sample, namely signal feature information (such as spectral information) of the low-frequency signal, and an expected output, namely signal feature information (such as spectral information) of the high-frequency signal. The audio prediction network matching the spectral feature parameters can then be determined from the plurality of preset audio prediction networks.
Each network corresponds to its own training samples and is trained independently. For example, preset audio prediction network A may be trained on the training samples corresponding to spectral feature parameter 1, and preset audio prediction network B may be trained on the training samples corresponding to spectral feature parameter 2. In this case, if the received spectral feature parameter is 1, the matching audio prediction network is preset audio prediction network A.
In one embodiment, the spectral feature parameters comprise a spectral envelope type. Step S230, performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matching the spectral feature parameters, comprises: obtaining network information of at least one preset audio prediction network, each piece of network information corresponding to one preset spectral envelope type; determining the network information corresponding to the preset spectral envelope type that matches the spectral envelope type, to obtain target network information; and determining the preset audio prediction network corresponding to the target network information as the audio prediction network matching the spectral feature parameters.
The spectral feature parameter is information describing a spectral distribution feature. In this embodiment, the spectral feature parameter is a spectral envelope type and the spectral distribution feature is a spectral envelope feature. The spectral envelope type is information describing the spectral envelope feature, and the spectral envelope feature represents the trend of the spectral values across the signal spectrum; that is, each trend of spectral values corresponds to one spectral envelope type.
The network information may be an identifier of a preset audio prediction network, i.e., a pre-trained audio prediction network. Each piece of network information corresponds to one preset spectral envelope type; that is, each preset audio prediction network corresponds to one preset spectral envelope type.
In this embodiment, each network corresponds to its own training samples and is trained independently. For example, preset audio prediction network A may be trained on the training samples corresponding to spectral envelope type 1, and preset audio prediction network B may be trained on the training samples corresponding to spectral envelope type 2. In this case, if the received spectral envelope type is 1, the matching audio prediction network is preset audio prediction network A.
The matching preset spectral envelope type is determined first; for example, if the received spectral envelope type is 1, the matching preset spectral envelope type is 1. If the network information corresponding to preset spectral envelope type 1 is X, the target network information is X, and the audio prediction network matching the spectral feature parameters is the preset audio prediction network indicated by X.
Matching the prediction network by spectral envelope type can effectively improve the accuracy of the high-frequency signal predicted by the audio prediction network.
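The matching step can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the identifiers `net_0` to `net_7`, the two lookup tables, and the stub predictors (simple gain functions standing in for pre-trained networks) do not come from the application.

```python
from typing import Callable, Dict, List

Spectrum = List[float]
Predictor = Callable[[Spectrum], Spectrum]


def make_stub_predictor(gain: float) -> Predictor:
    """Stand-in for a pre-trained network: scales the input spectrum (assumption)."""
    def predict(low_spectrum: Spectrum) -> Spectrum:
        return [gain * value for value in low_spectrum]
    return predict


# Hypothetical table: preset spectral envelope type -> network information (an identifier).
NETWORK_INFO_BY_TYPE: Dict[int, str] = {t: f"net_{t}" for t in range(8)}

# Hypothetical table: network information -> preset audio prediction network.
NETWORKS: Dict[str, Predictor] = {
    f"net_{t}": make_stub_predictor(float(t + 1)) for t in range(8)
}


def match_prediction_network(envelope_type: int) -> Predictor:
    """Return the preset network whose preset envelope type equals the received one."""
    if envelope_type not in NETWORK_INFO_BY_TYPE:
        raise ValueError(f"no preset network for envelope type {envelope_type}")
    target_info = NETWORK_INFO_BY_TYPE[envelope_type]  # target network information
    return NETWORKS[target_info]
```

In this sketch a received type of 1 resolves to the identifier `net_1` and then to the network registered under it, mirroring the "type 1, network information X, network indicated by X" chain in the text.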
In one embodiment, referring to fig. 3, the preset spectral envelope types include 8 types: type "0" is a low-energy flat type, type "1" is a high-energy flat type, type "2" is an energy convex type, type "3" is an energy concave type, type "4" is an energy gradually rising type, type "5" is an energy gradually decreasing type, type "6" is a step type with high energy before and low energy after, and type "7" is a step type with low energy before and high energy after. The applicant found that, with this division of types, performing prediction network matching by spectral envelope type can further improve the prediction accuracy of the audio prediction network.
It will be appreciated that in other embodiments the spectral feature parameters may include other feature information describing features of the spectrum, such as the proportion of power spectrum values exceeding a predetermined threshold. In that case, step S230, performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matching the spectral feature parameters, may comprise: obtaining network information of at least one preset audio prediction network, each piece of network information corresponding to one preset spectral envelope type; determining the network information corresponding to the preset spectral envelope type that the other feature information matches, to obtain target network information; and determining the preset audio prediction network corresponding to the target network information as the audio prediction network matching the spectral feature parameters.
In step S240, audio prediction processing is performed based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal.
When prediction is performed with the audio prediction network matched by the spectral feature parameters, the approximate spectral distribution feature is described with a very small amount of data, so the error between the predicted high-frequency signal and the original high-frequency signal is controllable. Compared with using a single unified prediction network, the matched audio prediction network predicts the high-frequency signal more accurately, avoids the predicted high-frequency signal regressing toward an averaged, middle-of-the-road result, and improves the accuracy of the predicted high-frequency signal.
Signal feature information (e.g., spectral information) of the decoded low-frequency signal may be input to the audio prediction network, which predicts and outputs signal feature information (e.g., spectral information) of the high-frequency signal; the output signal feature information is then used to restore the high-frequency signal, generating the predicted high-frequency signal.
In one embodiment, step S240, performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal, comprises: performing spectral feature extraction processing on the decoded low-frequency signal to obtain low-frequency spectrum information; performing audio prediction processing based on the low-frequency spectrum information using the audio prediction network to obtain predicted spectrum information; and generating the predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information.
The low-frequency spectrum information is input into the audio prediction network, which performs audio prediction processing and outputs predicted spectrum information; performing audio restoration based on the predicted spectrum information yields the predicted high-frequency signal corresponding to the high-frequency signal.
In one embodiment, performing spectral feature extraction processing on the decoded low-frequency signal to obtain low-frequency spectrum information comprises performing a modified discrete cosine transform on the decoded low-frequency signal to obtain the low-frequency spectrum information; and generating the predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information comprises performing an inverse modified discrete cosine transform on the predicted spectrum information to generate the predicted high-frequency signal.
In this embodiment, the decoded low-frequency signal may be processed by a modified discrete cosine transform (MDCT, Modified Discrete Cosine Transform) unit to obtain the low-frequency spectrum information. The predicted spectrum information output by the audio prediction network may then be processed by an inverse modified discrete cosine transform (IMDCT, Inverse Modified Discrete Cosine Transform) unit to generate the predicted high-frequency signal. The applicant has found that this manner of prediction can further improve the accuracy of the predicted audio signal.
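The MDCT, prediction, IMDCT chain can be sketched as follows. This is only an illustration under stated assumptions: the frame size, the 50% overlap with overlap-add, the absence of an analysis/synthesis window, and the `predictor` callable (any function mapping a low-band spectrum to a predicted spectrum of the same length) are choices made here and are not specified in the application.

```python
import math
from typing import Callable, List


def mdct(frame: List[float]) -> List[float]:
    """MDCT of a 2N-sample frame, returning N coefficients (no window, for brevity)."""
    n_coef = len(frame) // 2
    return [
        sum(
            frame[n] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n in range(2 * n_coef)
        )
        for k in range(n_coef)
    ]


def imdct(coefs: List[float]) -> List[float]:
    """IMDCT of N coefficients, returning 2N time-aliased samples (1/N scaling)."""
    n_coef = len(coefs)
    return [
        sum(
            coefs[k] * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for k in range(n_coef)
        ) / n_coef
        for n in range(2 * n_coef)
    ]


def predict_high_band(
    decoded_low: List[float],
    predictor: Callable[[List[float]], List[float]],
    n_coef: int = 4,
) -> List[float]:
    """Frame the decoded low band with 50% overlap, predict per frame, overlap-add."""
    out = [0.0] * len(decoded_low)
    for start in range(0, len(decoded_low) - 2 * n_coef + 1, n_coef):
        low_spectrum = mdct(decoded_low[start:start + 2 * n_coef])
        predicted_spectrum = predictor(low_spectrum)  # hypothetical network call
        for n, value in enumerate(imdct(predicted_spectrum)):
            out[start + n] += value  # overlap-add cancels the time-domain aliasing
    return out
```

With an identity predictor, the overlap-added output reproduces every input sample that is covered by two frames, which is a convenient sanity check of the transform pair before a real network is plugged in.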
In step S250, an audio output signal corresponding to the target audio signal is generated according to the predicted high frequency signal and the decoded low frequency signal.
The predicted high-frequency signal is the high-frequency signal obtained by prediction, and the decoded low-frequency signal is the restored original low-frequency signal; synthesizing the two generates the audio output signal corresponding to the original target audio signal.
In one embodiment, step S250, generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal, comprises performing quadrature mirror synthesis filtering on the predicted high-frequency signal and the decoded low-frequency signal to generate the audio output signal. The quadrature mirror synthesis filtering may be performed by a quadrature mirror filter (QMF, Quadrature Mirror Filter) to generate a full-band audio output signal corresponding to the target audio signal.
It will be appreciated that in other embodiments, the audio output signal may be generated by subjecting the predicted high-frequency signal and the decoded low-frequency signal to synthesis filtering with other existing synthesis filters.
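As a rough illustration of synthesis filtering, the sketch below recombines two half-rate sub-band signals with the shortest possible quadrature mirror pair (2-tap Haar filters). The filter choice and the function name `qmf_synthesis` are assumptions made for brevity; the application does not specify the filter design, and a practical codec would use much longer filters.

```python
import math
from typing import List


def qmf_synthesis(low: List[float], high: List[float]) -> List[float]:
    """Merge half-rate low and high sub-bands into one full-rate signal (Haar QMF pair)."""
    if len(low) != len(high):
        raise ValueError("sub-bands must have the same length")
    scale = 1.0 / math.sqrt(2.0)
    full_band: List[float] = []
    for low_sample, high_sample in zip(low, high):
        full_band.append(scale * (low_sample + high_sample))  # even output sample
        full_band.append(scale * (low_sample - high_sample))  # odd output sample
    return full_band
```

Here `low` would be the decoded low-frequency signal and `high` the predicted high-frequency signal, both at half the output sampling rate.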
Fig. 4 schematically shows a flow chart of an audio processing method according to an embodiment of the application. The main body of execution of the audio processing method may be any terminal, for example, the terminal 101 or the terminal 102 shown in fig. 1.
As shown in fig. 4, the audio processing method may include steps S310 to S340.
Step S310, decomposing the target audio signal to generate a high-frequency signal and a low-frequency signal; step S320, performing feature extraction processing on the high-frequency signal to obtain spectral feature parameters corresponding to the high-frequency signal; step S330, performing audio encoding processing on the low-frequency signal to generate low-frequency encoded data corresponding to the low-frequency signal; and step S340, transmitting the spectral feature parameters and the low-frequency encoded data to a receiving end, so that the receiving end determines an audio prediction network matching the spectral feature parameters and generates an audio output signal based on the audio prediction network and the decoded low-frequency signal obtained by decoding the low-frequency encoded data.
In this way, for the target audio signal, the spectral distribution feature of its high-frequency signal can be described by spectral feature parameters of very small data size, so only the spectral feature parameters and the low-frequency encoded data of the low-frequency signal need to be transmitted, which effectively reduces the transmission bandwidth. Meanwhile, the receiving end selects the matched audio prediction network based on the spectral feature parameters to restore the high-frequency signal and generate the predicted high-frequency signal. Because the approximate spectral distribution feature can be described with a very small amount of data, the error between the predicted high-frequency signal and the original high-frequency signal is controllable, and so is the generation of the audio output signal. As a result, the overall coding rate of the audio processing is effectively reduced while the ability to restore the high-frequency signal remains strong, so the transmission bandwidth of the audio data is reduced and the audio playing effect is ensured.
The specific procedure of each step performed during audio processing in the embodiment shown in fig. 4 is described below.
In step S310, the target audio signal is decomposed to generate a high-frequency signal and a low-frequency signal.
The target audio signal may be a digital sound signal generated by the acquisition end through analog-to-digital conversion of the acquired sound signal. The target audio signal may be decomposed into a high-frequency signal and a low-frequency signal: the high-frequency signal may be the part of the target audio signal above a predetermined frequency, and the low-frequency signal may be the part below the predetermined frequency.
The target audio signal may be decomposed into the high-frequency signal and the low-frequency signal by, for example, a band-pass filter (BPF) bank or a quadrature mirror filter (QMF, Quadrature Mirror Filter) bank.
In one embodiment, step S310, decomposing the target audio signal to generate a high-frequency signal and a low-frequency signal, comprises performing quadrature mirror decomposition filtering on the target audio signal to generate the high-frequency signal and the low-frequency signal. The quadrature mirror decomposition filtering may be performed by a quadrature mirror filter (QMF, Quadrature Mirror Filter) bank.
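A minimal sketch of a two-band decomposition is given below, assuming the shortest quadrature mirror pair (2-tap Haar filters) followed by downsampling by two. The filter choice and the name `qmf_analysis` are illustrative assumptions; the application does not state which filter coefficients are used.

```python
import math
from typing import List, Tuple


def qmf_analysis(signal: List[float]) -> Tuple[List[float], List[float]]:
    """Split a full-rate signal into half-rate low and high sub-bands (Haar QMF pair)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Low band: scaled pairwise sums (slowly varying content).
    low = [scale * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    # High band: scaled pairwise differences (rapidly varying content).
    high = [scale * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return low, high
```

A constant input ends up entirely in the low band, while an input that alternates in sign from sample to sample ends up entirely in the high band.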
In step S320, feature extraction processing is performed on the high-frequency signal to obtain a spectral feature parameter corresponding to the high-frequency signal.
The acquisition end can extract the spectrum distribution characteristics of the high-frequency signal and generate spectrum characteristic parameters describing the spectrum distribution characteristics according to the extracted spectrum distribution characteristics. The spectral distribution characteristics may include, among other things, spectral envelope characteristics, proportions of power spectrum values in the spectrum exceeding a predetermined threshold, and the like. In one embodiment of the present example, the spectral distribution feature is a spectral envelope feature, and the spectral feature parameter is a spectral envelope type.
The spectral feature parameter may be a number, an identifier, or the like, and its data size can be kept very small. For example, when a spectral distribution feature is described by the value "1", only 3 bits are needed to carry it.
In one embodiment, step S320, performing feature extraction processing on the high-frequency signal to obtain spectral feature parameters corresponding to the high-frequency signal, comprises: performing frequency domain conversion processing on the high-frequency signal to obtain a frequency-domain signal; calculating the power spectrum value of each frequency point in the frequency-domain signal; and performing feature extraction processing based on the power spectrum values of the frequency points to obtain spectral feature parameters describing the spectral distribution feature of the high-frequency signal.
The high-frequency signal is a time-domain signal. A frequency-domain signal can be obtained from it through frequency domain conversion processing (such as a Fourier transform), and the power spectrum value of each frequency point can be extracted from the spectrum (i.e., the spectrogram) of the frequency-domain signal. Feature extraction based on the power spectrum values of the frequency points can accurately analyze the spectral distribution feature of the high-frequency signal and thereby generate spectral feature parameters describing it. In one example, the power spectrum value of each frequency point may be the logarithm of the power spectrum at that frequency point.
In one embodiment, performing feature extraction processing based on the power spectrum values of the frequency points to obtain spectral feature parameters describing the spectral distribution feature of the high-frequency signal comprises: calculating the average of the power spectrum values of the frequency points; determining the maximum power spectrum value among the power spectrum values of the frequency points; subtracting the average from the maximum power spectrum value to obtain a first difference; and determining the spectral feature parameters corresponding to the high-frequency signal according to the first difference.
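The power spectrum computation can be sketched with a direct discrete Fourier transform. The decibel scaling (`10 * log10`), the small `floor` that guards against taking the logarithm of zero, and the function name are assumptions added for illustration; the text only says that a logarithmic value of the power spectrum may be used.

```python
import cmath
import math
from typing import List


def log_power_spectrum(high_band: List[float], floor: float = 1e-12) -> List[float]:
    """Log power (dB) at each frequency point from 0 up to the Nyquist bin."""
    n = len(high_band)
    values: List[float] = []
    for k in range(n // 2 + 1):
        # Direct DFT coefficient for frequency point k.
        coef = sum(
            high_band[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power = abs(coef) ** 2
        values.append(10.0 * math.log10(max(power, floor)))
    return values
```

For real frames of practical length an FFT routine would replace the direct sum; the direct form is used here only to keep the sketch dependency-free.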
For example, let the power spectrum value of each frequency point be x(i), i ∈ [N1, N2], where i is the index of the frequency point. The average xavg is the mean of all x(i), the maximum power spectrum value xmax is the largest of all x(i), and the first difference is obtained by subtracting xavg from xmax. In this way, the spectral distribution feature, in particular the spectral envelope feature, can be accurately reflected by the magnitude of the first difference.
In some other embodiments, when feature extraction processing is performed based on the power spectrum values of the frequency points to obtain spectral feature parameters describing the spectral distribution feature of the high-frequency signal, the proportion of power spectrum values in the spectrum that exceed a predetermined threshold may be calculated, and the corresponding spectral feature parameters determined according to that proportion.
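Written out, the quantities in this example are as follows; the symbol $d_1$ for the first difference is a label introduced here for illustration and does not appear in the original text.

$$x_{avg} = \frac{1}{N2 - N1 + 1}\sum_{i=N1}^{N2} x(i), \qquad x_{max} = \max_{i \in [N1, N2]} x(i), \qquad d_1 = x_{max} - x_{avg}$$

As a purely illustrative case, for four frequency points with power spectrum values 1, 2, 3 and 6, $x_{avg} = 3$, $x_{max} = 6$ and $d_1 = 3$. A small $d_1$ indicates that no frequency point stands far above the average, which is what one would expect of a flat envelope.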
In one embodiment, the spectrum characteristic parameter includes a spectrum envelope type, and the determining the spectrum characteristic parameter corresponding to the high-frequency signal according to the first difference value includes determining that the spectrum envelope type corresponding to the high-frequency signal is a first type if the first difference value is smaller than a first predetermined threshold value and the maximum power spectrum value is smaller than a second predetermined threshold value, and determining that the spectrum envelope type corresponding to the high-frequency signal is a second type if the first difference value is smaller than the first predetermined threshold value and the maximum power spectrum value is larger than the second predetermined threshold value. The first type and the second type describe corresponding spectral envelope features, respectively.
In one implementation of this embodiment, referring to fig. 3, the preset spectral envelope types include type "0", a low-energy flat type, and type "1", a high-energy flat type. If the first difference is smaller than the first predetermined threshold C1 and the maximum power spectrum value is smaller than the second predetermined threshold C2, the spectral envelope type corresponding to the high-frequency signal may be determined to be the first type, i.e., the low-energy flat type "0". If the first difference is smaller than the first predetermined threshold C1 and the maximum power spectrum value is larger than the second predetermined threshold C2, the spectral envelope type corresponding to the high-frequency signal may be determined to be the second type, i.e., the high-energy flat type "1".
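The two-threshold decision for the flat types can be sketched as below. The function name, the threshold values used in the usage note, and the handling of the boundary cases are assumptions: the text does not say what happens when the first difference equals C1 or the maximum equals C2, so this sketch treats the former as "not flat" and the latter as type 1.

```python
from typing import List, Optional


def classify_flat_envelope(
    power_values: List[float], c1: float, c2: float
) -> Optional[int]:
    """Return 0 (low-energy flat), 1 (high-energy flat), or None if not flat."""
    x_avg = sum(power_values) / len(power_values)
    x_max = max(power_values)
    first_difference = x_max - x_avg
    if first_difference >= c1:
        return None  # not flat; handled by a separate matching step
    # Boundary x_max == c2 is mapped to type 1 here (assumption).
    return 0 if x_max < c2 else 1
```

With the illustrative thresholds `c1=1.0` and `c2=5.0`, the values `[1.0, 1.2, 0.9, 1.1]` yield type 0 and `[9.0, 9.2, 8.9, 9.1]` yield type 1, while a spectrum with one dominant point such as `[0.0, 0.0, 10.0, 0.0]` is reported as not flat.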
In one embodiment, the spectral feature parameters comprise a spectral envelope type, and determining the spectral feature parameters corresponding to the high-frequency signal according to the first difference comprises: if the first difference is larger than the first predetermined threshold, normalizing the power spectrum values of the frequency points to obtain a normalized value corresponding to each frequency point; obtaining at least one preset target value, each target value corresponding to one preset spectral envelope type; calculating a mean square error value between the normalized values corresponding to the frequency points and each target value; and determining the preset spectral envelope type corresponding to the target value with the smallest mean square error value as the spectral envelope type of the high-frequency signal.
In one implementation of this embodiment, referring to fig. 3, the preset spectral envelope types include type "2", an energy convex type; type "3", an energy concave type; type "4", an energy gradually rising type; type "5", an energy gradually decreasing type; type "6", a step type with high energy before and low energy after; and type "7", a step type with low energy before and high energy after.
Each target value corresponds to one preset spectral envelope type and is a preset value. In one example, the target value may be z(i), i ∈ [N1, N2], where i is the index of the frequency point and z(i) is less than 1. Taking type "2" as an example, if N2-N1+1 equals 9, the target value z(i) corresponding to type "2" may be set to 000111000.
A mean square error value is calculated between the normalized values of the frequency points and each target value, and the preset spectral envelope type closest to the spectral distribution feature of the high-frequency signal can be accurately determined from the smallest mean square error value. For example, if the mean square error between the target value of type "2" and the normalized values of the frequency points is smaller than the mean square errors for types "3" to "7", the energy convex type "2" is determined as the spectral envelope type of the high-frequency signal.
In one embodiment, normalizing the power spectrum values of the frequency points to obtain a normalized value corresponding to each frequency point comprises: subtracting the average value from the power spectrum value of each frequency point to obtain a second difference corresponding to each frequency point; calculating the square of each second difference; calculating the average of the squares to obtain a normalized score; and dividing the second difference of each frequency point by the normalized score to obtain the normalized value corresponding to that frequency point.
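Selecting the envelope type by smallest mean square error can be sketched as follows. Only the type "2" target (000111000) comes from the text; the targets shown for types 3, 4 and 5 are hypothetical shapes chosen to resemble the described envelopes, and types 6 and 7 are omitted for brevity.

```python
from typing import Dict, List

# Type 2 target is from the text; the others are hypothetical illustrations.
TARGETS: Dict[int, List[float]] = {
    2: [0, 0, 0, 1, 1, 1, 0, 0, 0],             # energy convex (from the text)
    3: [1, 1, 1, 0, 0, 0, 1, 1, 1],             # energy concave (assumed shape)
    4: [i / 8 for i in range(9)],               # gradually rising (assumed shape)
    5: [1 - i / 8 for i in range(9)],           # gradually decreasing (assumed shape)
}


def match_envelope_type(
    normalized: List[float], targets: Dict[int, List[float]] = TARGETS
) -> int:
    """Return the preset type whose target has the smallest MSE to the input."""
    best_type, best_mse = -1, float("inf")
    for envelope_type, target in targets.items():
        if len(target) != len(normalized):
            raise ValueError("target and input must cover the same frequency points")
        mse = sum((y - z) ** 2 for y, z in zip(normalized, target)) / len(target)
        if mse < best_mse:
            best_type, best_mse = envelope_type, mse
    return best_type
```

An input with a bump in the middle, such as `[0, 0, 0, 0.9, 1.0, 0.8, 0, 0, 0]`, is matched to type 2 in this sketch, and a steadily increasing input is matched to type 4.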
In this embodiment, the formulas $y(i) = \frac{x(i) - x_{avg}}{std}$ and $std = \frac{1}{N2 - N1 + 1}\sum_{i=N1}^{N2} \left(x(i) - x_{avg}\right)^2$ may specifically be used for the normalization processing to obtain the normalized value y(i) corresponding to each frequency point i, where N2-N1+1 is the total number of frequency points, N1 to N2 is the range of frequency point indices, $x_{avg}$ is the average value, x(i) is the power spectrum value of frequency point i, $x(i) - x_{avg}$ is the second difference corresponding to each frequency point, and std is the normalized score (the average of the squares).
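The normalization steps described in words above (second difference, mean of squares, division) can be sketched as follows. The function name and the error raised for a perfectly flat input are assumptions; the normalized score is computed exactly as the text describes it, i.e., as the mean of the squared second differences, without a square root.

```python
from typing import List


def normalize_power_values(power_values: List[float]) -> List[float]:
    """Normalize power spectrum values by the mean of squared deviations."""
    x_avg = sum(power_values) / len(power_values)
    second_differences = [x - x_avg for x in power_values]
    # "Normalized score": average of the squared second differences, per the text.
    std = sum(d * d for d in second_differences) / len(second_differences)
    if std == 0.0:
        raise ValueError("all power spectrum values are equal; nothing to normalize")
    return [d / std for d in second_differences]
```

For the two-point input `[0.0, 4.0]`, the average is 2, the second differences are -2 and 2, the normalized score is 4, and the normalized values are -0.5 and 0.5.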
In step S330, the low-frequency signal is subjected to audio encoding processing, and low-frequency encoded data corresponding to the low-frequency signal is generated.
The low frequency signal may be encoded using a conventional speech encoder (which may be CELP, SILK, AAC or the like) to generate low frequency encoded data.
In step S340, the spectral feature parameters and the low-frequency encoded data are transmitted to the receiving end, so that the receiving end determines the audio prediction network matching the spectral feature parameters and generates an audio output signal based on the audio prediction network and the decoded low-frequency signal obtained by decoding the low-frequency encoded data.
When the spectral feature parameters and the low-frequency encoded data are sent to the receiving end, they can be combined into a single encoded code stream, which can be extremely small, and sent to the receiving end. The receiving end may determine the audio prediction network matching the spectral feature parameters based on the steps in the embodiment shown in fig. 2, and generate the audio output signal based on the audio prediction network and the decoded low-frequency signal obtained by decoding the low-frequency encoded data.
The method described in the above embodiments is described in further detail below with reference to an application scenario example. The terms used in this scenario have the same meaning as in the foregoing embodiments, and reference may be made to the descriptions there. In this application scenario, the foregoing embodiments are applied to audio processing, and the audio processing flow is shown in fig. 5 and fig. 6.
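One possible way to combine the two items into a code stream is sketched below. The layout is entirely hypothetical: a single header byte whose low 3 bits carry the spectral envelope type (the upper 5 bits are left unused), followed by the low-frequency encoded data unchanged. The application does not define a bitstream format.

```python
from typing import Tuple


def pack_code_stream(envelope_type: int, low_band_payload: bytes) -> bytes:
    """Prefix the low-band payload with a header byte holding the 3-bit type."""
    if not 0 <= envelope_type <= 7:
        raise ValueError("envelope type must fit in 3 bits (0..7)")
    header = envelope_type & 0b111  # upper 5 bits unused in this sketch
    return bytes([header]) + low_band_payload


def parse_code_stream(stream: bytes) -> Tuple[int, bytes]:
    """Recover the envelope type and the low-band payload from the stream."""
    if not stream:
        raise ValueError("empty code stream")
    return stream[0] & 0b111, stream[1:]
```

In this layout the side information for the high band costs one byte per packet regardless of the payload size, and `parse_code_stream(pack_code_stream(t, data))` returns `(t, data)` for any type from 0 to 7.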
First, referring to fig. 5, the encoding process in the audio processing process is performed at the acquisition end, and the process may include steps S410 to S450.
In step S410, a target audio signal is input, specifically, the acquisition end may acquire a target audio signal generated by analog-to-digital conversion of a sound signal.
In step S420, QMF decomposition is performed. Specifically, the target audio signal is decomposed to generate a high-frequency signal and a low-frequency signal, which comprises performing quadrature mirror decomposition filtering on the target audio signal. The target audio signal may be subjected to quadrature mirror decomposition filtering by a quadrature mirror filter (QMF, Quadrature Mirror Filter) bank to generate the high-frequency signal and the low-frequency signal.
In step S430, spectral feature extraction is performed. Specifically, feature extraction processing is performed on the high-frequency signal to obtain the spectral feature parameters corresponding to the high-frequency signal.
Performing feature extraction processing on the high-frequency signal to obtain the spectral feature parameters corresponding to the high-frequency signal comprises: performing frequency domain conversion processing on the high-frequency signal to obtain a frequency-domain signal; calculating the power spectrum value of each frequency point in the frequency-domain signal; and performing feature extraction processing based on the power spectrum values of the frequency points to obtain spectral feature parameters describing the spectral distribution feature of the high-frequency signal.
Performing feature extraction processing based on the power spectrum values of the frequency points comprises: calculating the average of the power spectrum values of the frequency points; determining the maximum power spectrum value among the power spectrum values of the frequency points; subtracting the average from the maximum power spectrum value to obtain a first difference; and determining the spectral feature parameters corresponding to the high-frequency signal according to the first difference.
The spectral feature parameters comprise a spectral envelope type. Referring to fig. 3, the preset spectral envelope types include type "0", a low-energy flat type; type "1", a high-energy flat type; type "2", an energy convex type; type "3", an energy concave type; type "4", an energy gradually rising type; type "5", an energy gradually decreasing type; type "6", a step type with high energy before and low energy after; and type "7", a step type with low energy before and high energy after.
Type "0" is the low-energy flat type and type "1" is the high-energy flat type. Determining the spectral feature parameters corresponding to the high-frequency signal according to the first difference comprises: if the first difference is smaller than a first predetermined threshold and the maximum power spectrum value is smaller than a second predetermined threshold, determining that the spectral envelope type corresponding to the high-frequency signal is a first type; and if the first difference is smaller than the first predetermined threshold and the maximum power spectrum value is larger than the second predetermined threshold, determining that the spectral envelope type corresponding to the high-frequency signal is a second type. That is, if the first difference is smaller than the first predetermined threshold C1 and the maximum power spectrum value is smaller than the second predetermined threshold C2, the spectral envelope type may be determined to be the first type, i.e., the low-energy flat type "0"; if the first difference is smaller than C1 and the maximum power spectrum value is larger than C2, the spectral envelope type may be determined to be the second type, i.e., the high-energy flat type "1".
For types "2" to "7", determining the spectral feature parameters corresponding to the high-frequency signal according to the first difference comprises: if the first difference is larger than the first predetermined threshold, normalizing the power spectrum values of the frequency points to obtain a normalized value corresponding to each frequency point; obtaining at least one preset target value, each target value corresponding to one preset spectral envelope type; calculating a mean square error value between the normalized values corresponding to the frequency points and each target value; and determining the preset spectral envelope type corresponding to the target value with the smallest mean square error value as the spectral envelope type of the high-frequency signal.
Each target value corresponds to one preset spectral envelope type and is a preset value. The target value may be z(i), i ∈ [N1, N2], where i is the index of the frequency point and z(i) is less than 1. Taking type "2" as an example, if N2-N1+1 equals 9, the target value z(i) corresponding to type "2" may be set to 000111000. A mean square error value is calculated between the normalized values of the frequency points and each target value, and the preset spectral envelope type closest to the spectral distribution feature of the high-frequency signal can be accurately determined from the smallest mean square error value. For example, if the mean square error between the target value of type "2" and the normalized values of the frequency points is smaller than the mean square errors for types "3" to "7", the energy convex type "2" is determined as the spectral envelope type of the high-frequency signal.
Normalizing the power spectrum values of the frequency points to obtain a normalized value corresponding to each frequency point comprises: subtracting the average value from the power spectrum value of each frequency point to obtain a second difference corresponding to each frequency point; calculating the square of each second difference; calculating the average of the squares to obtain a normalized score; and dividing the second difference of each frequency point by the normalized score to obtain the normalized value corresponding to that frequency point. In this embodiment, the formulas $y(i) = \frac{x(i) - x_{avg}}{std}$ and $std = \frac{1}{N2 - N1 + 1}\sum_{i=N1}^{N2} \left(x(i) - x_{avg}\right)^2$ may specifically be used for the normalization processing to obtain the normalized value y(i) corresponding to each frequency point i, where N2-N1+1 is the total number of frequency points, N1 to N2 is the range of frequency point indices, $x_{avg}$ is the average value, x(i) is the power spectrum value of frequency point i, $x(i) - x_{avg}$ is the second difference corresponding to each frequency point, and std is the normalized score (the average of the squares).
In step S440, the low frequency speech is encoded, specifically, the low frequency signal is subjected to audio encoding processing, and low frequency encoded data corresponding to the low frequency signal is generated. The low frequency signal may be encoded using a conventional speech encoder (which may be CELP, SILK, AAC or the like) to generate low frequency encoded data.
In step S450, the data is output, and the spectral characteristic parameter and the low-frequency encoded data are transmitted to the receiving end. The spectrum characteristic parameters and the low-frequency coded data can be packaged together to form a coded code stream to be sent to a receiving end.
Further, referring to fig. 6, the decoding process in the audio processing is performed at the receiving end, and the process may include steps S510 to S560.
In step S510, a code stream is input, specifically, a code stream sent by the acquisition end is received, where the code stream includes a spectral feature parameter of a high-frequency signal and low-frequency encoded data of a low-frequency signal. Namely, the frequency spectrum characteristic parameters of the high-frequency signal and the low-frequency coded data of the low-frequency signal are received, and the high-frequency signal and the low-frequency signal are generated by decomposing the target audio signal.
In step S520, the code stream is parsed, specifically, the received code stream is parsed to obtain the spectral characteristic parameters of the high frequency signal and the low frequency encoded data of the low frequency signal in the code stream.
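As a minimal sketch of the packaging and parsing of the code stream, assuming a byte layout that the description does not specify: a one-byte spectrum envelope type, a two-byte payload length, and then the low-frequency encoded data. The layout, the `HEADER` name and both function names are hypothetical.

```python
import struct

# Hypothetical header: 1-byte envelope type, 2-byte big-endian payload length.
HEADER = struct.Struct(">BH")

def pack_code_stream(envelope_type: int, low_freq_payload: bytes) -> bytes:
    """Acquisition end: bundle the spectral feature parameter with the encoded low band."""
    return HEADER.pack(envelope_type, len(low_freq_payload)) + low_freq_payload

def parse_code_stream(stream: bytes):
    """Receiving end: recover the envelope type and the low-frequency encoded data."""
    envelope_type, length = HEADER.unpack_from(stream, 0)
    payload = stream[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated code stream")
    return envelope_type, payload
```

With such a layout the high band costs only the header bytes per packet, which is the bandwidth saving the description relies on.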
In step S530, the low frequency speech is decoded, specifically, the low frequency encoded data is decoded to generate a decoded low frequency signal. The receiving end can decode the low-frequency encoded data with a traditional voice decoder to generate the decoded low-frequency signal.
In step S540, network matching, specifically, performing prediction network matching processing based on the spectrum feature parameters, to obtain an audio prediction network with the spectrum feature parameters matched.
The frequency spectrum characteristic parameters comprise a frequency spectrum envelope type. Performing prediction network matching processing based on the frequency spectrum characteristic parameters to obtain the audio prediction network matched with the frequency spectrum characteristic parameters comprises: obtaining network information of at least one preset audio prediction network, each network information corresponding to one preset frequency spectrum envelope type; determining the network information corresponding to the preset frequency spectrum envelope type matched with the frequency spectrum envelope type, to obtain target network information; and determining the preset audio prediction network corresponding to the target network information as the audio prediction network matched with the frequency spectrum characteristic parameters.
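The matching in step S540 amounts to a lookup keyed by the spectrum envelope type. A minimal sketch follows, in which the `NetworkInfo` fields, the integer type identifiers and the model file names are all hypothetical placeholders rather than anything stated in the description.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NetworkInfo:
    envelope_type: int   # preset spectrum envelope type this network was trained for
    model_path: str      # hypothetical location of the trained network

# Hypothetical registry of preset audio prediction networks.
PRESET_NETWORKS = [
    NetworkInfo(0, "predict_type0.bin"),
    NetworkInfo(1, "predict_type1.bin"),
    NetworkInfo(2, "predict_type2.bin"),
]

def match_prediction_network(envelope_type: int, presets=PRESET_NETWORKS) -> NetworkInfo:
    """Return the target network information for the received envelope type."""
    for info in presets:
        if info.envelope_type == envelope_type:
            return info
    raise KeyError(f"no preset network for envelope type {envelope_type}")
```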
In step S550, a prediction process, specifically, an audio prediction process is performed based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal.
The step S550 of performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal comprises the step S551 of performing spectral feature extraction processing on the decoded low-frequency signal to obtain low-frequency spectral information, the step S552 of performing audio prediction processing based on the low-frequency spectral information by adopting the audio prediction network to obtain predicted spectral information, and the step S553 of generating a predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectral information.
Performing spectral feature extraction processing on the decoded low-frequency signal to obtain low-frequency spectrum information comprises performing modified discrete cosine transform processing on the decoded low-frequency signal to obtain the low-frequency spectrum information. Generating a predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information comprises performing inverse modified discrete cosine transform processing on the predicted spectrum information to generate the predicted high-frequency signal corresponding to the high-frequency signal.
The decoded low frequency signal may be subjected to modified discrete cosine transform (MDCT, Modified Discrete Cosine Transform) processing to obtain the low frequency spectrum information. Then, the predicted spectrum information predicted by the audio prediction network may be subjected to inverse modified discrete cosine transform (IMDCT, Inverse Modified Discrete Cosine Transform) processing to generate the predicted high frequency signal.
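For reference, an MDCT/IMDCT pair can be sketched directly from the transform definition. The sketch below is illustrative only: the sine window, the frame length and the 50% overlap-add loop are assumptions, and the audio prediction network that would sit between the two transforms is omitted, so the code simply checks that the pair reconstructs its input.

```python
import numpy as np

def mdct_basis(n_half):
    """N x 2N cosine basis: cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))."""
    n = np.arange(2 * n_half)
    k = np.arange(n_half)
    return np.cos(np.pi / n_half * np.outer(k + 0.5, n + 0.5 + n_half / 2))

def sine_window(n_half):
    """Sine window; satisfies w[n]^2 + w[n+N]^2 = 1 (assumed window choice)."""
    n = np.arange(2 * n_half)
    return np.sin(np.pi * (n + 0.5) / (2 * n_half))

def mdct(frame):
    """2N windowed time samples -> N spectral coefficients."""
    n_half = len(frame) // 2
    return mdct_basis(n_half) @ (sine_window(n_half) * frame)

def imdct(coeffs):
    """N coefficients -> 2N windowed, time-aliased samples for overlap-add."""
    n_half = len(coeffs)
    return (2.0 / n_half) * sine_window(n_half) * (mdct_basis(n_half).T @ coeffs)

def analysis_synthesis(signal, n_half):
    """MDCT then IMDCT with 50% overlap-add; len(signal) must be a multiple of n_half."""
    padded = np.concatenate([np.zeros(n_half), signal, np.zeros(n_half)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - 2 * n_half + 1, n_half):
        frame = padded[start:start + 2 * n_half]
        out[start:start + 2 * n_half] += imdct(mdct(frame))
    return out[n_half:n_half + len(signal)]
```

In the described pipeline, `mdct` would be applied to the decoded low-frequency signal and `imdct` to the spectrum predicted by the network; a production implementation would normally use an FFT-based MDCT rather than the explicit basis matrix used here.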
In step S560, QMF synthesis, specifically, generating an audio output signal corresponding to the target audio signal from the predicted high frequency signal and the decoded low frequency signal. Generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal comprises performing quadrature mirror synthesis filtering processing on the predicted high-frequency signal and the decoded low-frequency signal to generate the audio output signal. The quadrature mirror synthesis filtering processing can be performed on the predicted high frequency signal and the decoded low frequency signal through a quadrature mirror filter (QMF, Quadrature Mirror Filter) to generate a full-band audio output signal corresponding to the target audio signal.
In this way, the acquisition end can describe the spectrum distribution characteristics of the high-frequency signal in the target audio signal through spectrum characteristic parameters of very small data size, so that only the spectrum characteristic parameters and the low-frequency encoded data of the low-frequency signal need to be transmitted, which effectively reduces the transmission bandwidth. Meanwhile, the receiving end selects the matched audio prediction network based on the spectrum characteristic parameters to restore the high-frequency signal and generate the predicted high-frequency signal. Because the general spectrum distribution characteristics are described with a very small data size, the error between the predicted high-frequency signal and the original high-frequency signal is controllable, and so is the generation of the audio output signal. The overall coding rate of the audio processing is thereby effectively reduced while the capability of restoring the high-frequency signal remains strong, so that the transmission bandwidth of the audio data is reduced and the audio playing effect is ensured.
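The two-band split and recombination can be illustrated with the shortest filter pair that satisfies the quadrature mirror relation, the two-tap Haar pair. This choice is an assumption made for brevity; the description does not state which QMF filters are used, and practical codecs typically use much longer filters.

```python
import numpy as np

def qmf_analysis(x):
    """Split an even-length signal into half-rate low and high bands (Haar QMF pair)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)    # low-pass filter followed by decimation by 2
    high = (even - odd) / np.sqrt(2)   # mirrored high-pass filter, decimation by 2
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two half-rate bands into a full-band signal."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    out = np.empty(2 * len(low))
    out[0::2] = (low + high) / np.sqrt(2)
    out[1::2] = (low - high) / np.sqrt(2)
    return out
```

In the described receiving end, `low` would be the decoded low-frequency signal and `high` the predicted high-frequency signal, so the output matches the target audio signal only as closely as the prediction matches the original high band.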
In order to facilitate better implementation of the audio processing method provided by the embodiment of the application, the embodiment of the application also provides an audio processing device based on the audio processing method. Where the meaning of the terms is the same as in the above-described audio processing method, specific implementation details may be referred to in the description of the method embodiments. Fig. 7 shows a block diagram of an audio processing device according to an embodiment of the application. Fig. 8 shows a block diagram of an audio processing device according to another embodiment of the application.
As shown in fig. 7, the audio processing apparatus 600 may include a receiving module 610, a decoding module 620, a matching module 630, a predicting module 640, and an output module 650, where the audio processing apparatus 600 may be applied to a device corresponding to a receiving end of audio.
The receiving module 610 may be configured to receive spectral feature parameters of a high-frequency signal and low-frequency encoded data of a low-frequency signal, where the high-frequency signal and the low-frequency signal belong to a target audio signal, the decoding module 620 may be configured to perform decoding processing on the low-frequency encoded data to generate a decoded low-frequency signal, the matching module 630 may be configured to perform prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network with the matched spectral feature parameters, the prediction module 640 may be configured to perform audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal, and the output module 650 may be configured to generate an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
In some embodiments of the present application, the spectral feature parameters include a spectral envelope type, the matching module 630 includes an information obtaining unit configured to obtain network information of at least one preset audio prediction network, where each network information corresponds to a preset spectral envelope type, a network matching unit configured to determine network information corresponding to a preset spectral envelope type matched by the spectral envelope type, to obtain target network information, and a network determining unit configured to determine a preset audio prediction network corresponding to the target network information as the audio prediction network matched by the spectral feature parameters.
In some embodiments of the present application, the prediction module 640 includes an extraction processing unit configured to perform spectral feature extraction processing on the decoded low-frequency signal to obtain low-frequency spectrum information, an information prediction unit configured to perform audio prediction processing based on the low-frequency spectrum information by using the audio prediction network to obtain predicted spectrum information, and a signal generation unit configured to generate a predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information.
In some embodiments of the present application, the extraction processing unit is configured to perform modified discrete cosine transform processing on the decoded low-frequency signal to obtain the low-frequency spectrum information, and the signal generating unit is configured to perform modified discrete cosine inverse transform processing on the predicted spectrum information to generate a predicted high-frequency signal corresponding to the high-frequency signal.
In some embodiments of the present application, the output module 650 is configured to perform quadrature mirror synthesis filtering on the predicted high frequency signal and the decoded low frequency signal to generate the audio output signal.
In this way, based on the audio processing apparatus 600, for the target audio signal, the spectrum distribution characteristics of the high-frequency signal therein may be described by spectrum characteristic parameters of very small data size, so that only the spectrum characteristic parameters and the low-frequency encoded data of the low-frequency signal need to be received, and the transmission bandwidth is effectively reduced. At the same time, the high-frequency signal is restored by selecting the matched audio prediction network based on the spectrum characteristic parameters, so as to generate the predicted high-frequency signal.
As shown in fig. 8, the audio processing apparatus 700 may include a decomposition module 710, an extraction module 720, an encoding module 730, and a delivery module 740, where the audio processing apparatus 700 may be applied to a device corresponding to an audio capturing end.
The decomposition module 710 may be configured to decompose a target audio signal to generate a high-frequency signal and a low-frequency signal, the extraction module 720 may be configured to perform feature extraction processing on the high-frequency signal to obtain a spectral feature parameter corresponding to the high-frequency signal, the encoding module 730 may be configured to perform audio encoding processing on the low-frequency signal to generate low-frequency encoded data corresponding to the low-frequency signal, and the delivery module 740 may be configured to send the spectral feature parameter and the low-frequency encoded data to a receiving end, so that the receiving end determines an audio prediction network that matches the spectral feature parameter, and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data.
In some embodiments of the present application, the extraction module 720 includes a frequency domain conversion unit configured to perform a frequency domain conversion process on the high-frequency signal to obtain a frequency domain signal, a power spectrum value calculation unit configured to calculate a power spectrum value of each frequency point in the frequency domain signal, and a spectrum feature parameter acquisition unit configured to perform a feature extraction process based on the power spectrum value of each frequency point to obtain a spectrum feature parameter describing a spectrum distribution feature of the high-frequency signal.
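A minimal sketch of the frequency domain conversion and power spectrum calculation is given below. It assumes a real FFT as the frequency domain conversion and a linear (not decibel) power value per frequency point; the description does not fix either choice, and the function name is hypothetical.

```python
import numpy as np

def power_spectrum(frame):
    """Power spectrum value |X(i)|^2 for each frequency point of one high-band frame."""
    spectrum = np.fft.rfft(np.asarray(frame, dtype=float))
    return np.abs(spectrum) ** 2

# Hypothetical 32-sample frame holding a pure tone at frequency point 4.
n = np.arange(32)
tone = np.sin(2 * np.pi * 4 * n / 32)
tone_power = power_spectrum(tone)
```

The resulting array is what the average value, the maximum power spectrum value and the first difference value would then be computed from.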
In some embodiments of the present application, the spectrum characteristic parameter obtaining unit includes an element calculating subunit, configured to calculate an average value of power spectrum values of the frequency points, determine a maximum power spectrum value of the power spectrum values of the frequency points, a difference processing subunit, configured to perform a difference processing on the maximum power spectrum value and the average value to obtain a first difference value, and a spectrum characteristic parameter determining subunit, configured to determine a spectrum characteristic parameter corresponding to the high frequency signal according to the first difference value.
In some embodiments of the present application, the spectral feature parameter includes a spectral envelope type, and the spectral feature parameter determining subunit is configured to determine that the spectral envelope type corresponding to the high-frequency signal is a first type if the first difference value is smaller than a first predetermined threshold value and the maximum power spectrum value is smaller than a second predetermined threshold value, and determine that the spectral envelope type corresponding to the high-frequency signal is a second type if the first difference value is smaller than the first predetermined threshold value and the maximum power spectrum value is larger than the second predetermined threshold value.
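The threshold decision for the first and second types might be sketched as follows. The return labels and the threshold arguments are placeholders; the description does not say what happens when a value equals a threshold, so the sketch returns `None` in those cases, as well as when the first difference value is not below the first threshold and normalization with target-value matching would apply instead.

```python
import numpy as np

def classify_flat_envelope(power, diff_threshold, level_threshold):
    """Return "first" or "second" for a flat spectrum, else None."""
    power = np.asarray(power, dtype=float)
    max_power = power.max()                   # maximum power spectrum value
    first_diff = max_power - power.mean()     # first difference value
    if not first_diff < diff_threshold:
        return None                           # not flat: handled by target-value matching
    if max_power < level_threshold:
        return "first"
    if max_power > level_threshold:
        return "second"
    return None                               # equality is not specified in the text
```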
In some embodiments of the present application, the spectral feature parameter includes a spectral envelope type, the spectral feature parameter determining subunit is configured to normalize a power spectrum value of each frequency point to obtain a normalized value corresponding to each frequency point if the first difference value is greater than a first predetermined threshold value, obtain at least one preset target value, each target value corresponds to a preset spectral envelope type, calculate a mean square error value between the normalized value corresponding to each frequency point and each target value, and determine a preset spectral envelope type corresponding to a target value corresponding to the smallest mean square error value as the spectral envelope type of the high-frequency signal.
In some embodiments of the present application, the spectrum characteristic parameter determining subunit is configured to perform a difference processing on the power spectrum value of each frequency point and the average value to obtain a second difference value corresponding to each frequency point, calculate a square value of the second difference value corresponding to each frequency point, calculate an average value of the square values to obtain a normalized score, and divide the second difference value corresponding to each frequency point by the normalized score to obtain a normalized value corresponding to each frequency point.
In some embodiments of the present application, the decomposing module 710 is configured to perform quadrature mirror image decomposition filtering processing on the target audio signal to generate the high frequency signal and the low frequency signal.
In this way, based on the audio processing apparatus 700, for the target audio signal, the spectrum distribution characteristics of the high-frequency signal therein can be described by spectrum characteristic parameters of very small data size, so that only the spectrum characteristic parameters and the low-frequency encoded data of the low-frequency signal need to be transmitted, and the transmission bandwidth is effectively reduced. Meanwhile, the receiving end selects the matched audio prediction network based on the spectrum characteristic parameters to restore the high-frequency signal and generate the predicted high-frequency signal.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
In addition, the embodiment of the present application further provides an electronic device, which may be a terminal or a server, as shown in fig. 9, which shows a schematic structural diagram of the electronic device according to the embodiment of the present application, specifically:
The electronic device may include a processor 801 having one or more processing cores, a memory 802 comprising one or more computer-readable storage media, a power supply 803, an input unit 804, and other components. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 9 does not limit the electronic device, which may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components. Wherein:
The processor 801 is the control center of the electronic device, connects the various parts of the entire electronic device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 802 and calling data stored in the memory 802, thereby controlling the electronic device as a whole. Optionally, the processor 801 may include one or more processing cores, and preferably the processor 801 may integrate an application processor that primarily handles the operating system, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 801.
The memory 802 may be used to store software programs and modules, and the processor 801 executes various functional applications and data processing by executing the software programs and modules stored in the memory 802. The memory 802 may mainly include a storage program area that may store an operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), etc., and a storage data area that may store data created according to the use of the computer device, etc. In addition, memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 802 may also include a memory controller to provide the processor 801 with access to the memory 802.
The electronic device further comprises a power supply 803 for powering the various components, preferably the power supply 803 can be logically coupled to the processor 801 via a power management system such that functions such as managing charging, discharging, and power consumption are performed by the power management system. The power supply 803 may also include one or more of any components, such as a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input unit 804, which input unit 804 may be used for receiving input digital or character information and for generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 801 in the electronic device loads executable files corresponding to the processes of one or more computer programs into the memory 802 according to the following instructions, and the processor 801 executes the computer programs stored in the memory 802, so as to implement the functions in the foregoing embodiments of the present application.
The processor 801 may perform, for example, receiving spectral feature parameters of a high-frequency signal and low-frequency encoded data of a low-frequency signal, where the high-frequency signal and the low-frequency signal belong to a target audio signal, performing decoding processing on the low-frequency encoded data to generate a decoded low-frequency signal, performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network with the matched spectral feature parameters, performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal, and generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
The processor 801 may also perform the steps of performing a decomposition process on a target audio signal to generate a high-frequency signal and a low-frequency signal, performing a feature extraction process on the high-frequency signal to obtain a spectral feature parameter corresponding to the high-frequency signal, performing an audio encoding process on the low-frequency signal to generate low-frequency encoded data corresponding to the low-frequency signal, and transmitting the spectral feature parameter and the low-frequency encoded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral feature parameter, and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data.
It will be appreciated by those of ordinary skill in the art that all or part of the steps of the various methods of the above embodiments may be performed by a computer program, or by a computer program controlling related hardware, and the computer program may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present application also provide a computer readable storage medium having stored therein a computer program that can be loaded by a processor to perform the steps of any of the methods provided by the embodiments of the present application.
The computer readable storage medium may include, among others, read-only memory (ROM), random access memory (RAM), magnetic or optical disks, and the like.
Since the computer program stored in the computer readable storage medium may execute the steps of any one of the methods provided in the embodiments of the present application, the beneficial effects that can be achieved by the methods provided in the embodiments of the present application may be achieved, which are detailed in the previous embodiments and are not described herein.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions to cause the computer device to perform the methods provided in the various alternative implementations of the application described above.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It will be understood that the application is not limited to the embodiments which have been described above and shown in the drawings, but that various modifications and changes can be made without departing from the scope thereof.

Claims (16)

1.一种音频处理方法,其特征在于,包括:1. An audio processing method, comprising: 接收高频信号的频谱特征参数及低频信号的低频编码数据,所述高频信号及所述低频信号属于目标音频信号,所述频谱特征参数为描述所述高频信号的频谱分布特征的信息,并包括频谱包络类型;Receiving spectrum characteristic parameters of a high-frequency signal and low-frequency coding data of a low-frequency signal, wherein the high-frequency signal and the low-frequency signal belong to a target audio signal, and the spectrum characteristic parameters are information describing spectrum distribution characteristics of the high-frequency signal and include a spectrum envelope type; 对所述低频编码数据进行解码处理,生成解码低频信号;Decoding the low-frequency encoded data to generate a decoded low-frequency signal; 基于所述频谱特征参数进行预测网络匹配处理,得到所述频谱特征参数匹配的音频预测网络,所述频谱特征参数匹配的音频预测网络是基于所述频谱特征参数对应的训练样本训练的深度学习网络;Performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters, wherein the audio prediction network matched with the spectral feature parameters is a deep learning network trained based on training samples corresponding to the spectral feature parameters; 基于所述音频预测网络与所述解码低频信号进行音频预测处理,以生成所述高频信号对应的预测高频信号;Performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal; 根据所述预测高频信号与所述解码低频信号,生成所述目标音频信号对应的音频输出信号;generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal; 其中,所述基于所述频谱特征参数进行预测网络匹配处理,得到所述频谱特征参数匹配的音频预测网络,包括:The step of performing prediction network matching processing based on the spectrum feature parameters to obtain an audio prediction network matching the spectrum feature parameters includes: 获取至少一个预设音频预测网络的网络信息,每个所述网络信息对应一种预设频谱包络类型;Acquire network information of at least one preset audio prediction network, each of the network information corresponds to a preset spectrum envelope 
type; 确定所述频谱包络类型匹配的预设频谱包络类型所对应网络信息,得到目标网络信息;Determine network information corresponding to a preset spectrum envelope type that matches the spectrum envelope type, and obtain target network information; 将所述目标网络信息对应的预设音频预测网络,确定为所述频谱特征参数匹配的音频预测网络。The preset audio prediction network corresponding to the target network information is determined as the audio prediction network that matches the frequency spectrum feature parameters. 2.根据权利要求1所述的方法,其特征在于,所述基于所述音频预测网络与所述解码低频信号进行音频预测处理,以生成所述高频信号对应的预测高频信号,包括:2. The method according to claim 1, characterized in that the audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal comprises: 对所述解码低频信号进行频谱特征提取处理,得到低频频谱信息;Performing spectrum feature extraction processing on the decoded low-frequency signal to obtain low-frequency spectrum information; 采用所述音频预测网络,基于所述低频频谱信息进行音频预测处理,得到预测频谱信息;Using the audio prediction network, performing audio prediction processing based on the low-frequency spectrum information to obtain predicted spectrum information; 基于所述预测频谱信息生成所述高频信号对应的预测高频信号。A predicted high-frequency signal corresponding to the high-frequency signal is generated based on the predicted spectrum information. 3.根据权利要求2所述的方法,其特征在于,所述对所述解码低频信号进行频谱特征提取处理,得到低频频谱信息,包括:3. 
The method according to claim 2, characterized in that the step of performing spectrum feature extraction processing on the decoded low-frequency signal to obtain low-frequency spectrum information comprises: 对所述解码低频信号进行改进离散余弦变换处理,得到所述低频频谱信息;Performing improved discrete cosine transform processing on the decoded low-frequency signal to obtain the low-frequency spectrum information; 所述基于所述预测频谱信息生成所述高频信号对应的预测高频信号,包括:The generating a predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information includes: 对所述预测频谱信息进行改进离散余弦反变换处理,生成所述高频信号对应的预测高频信号。The predicted frequency spectrum information is processed by an improved inverse discrete cosine transform to generate a predicted high-frequency signal corresponding to the high-frequency signal. 4.根据权利要求1至3任一项所述的方法,其特征在于,所述根据所述预测高频信号与所述解码低频信号,生成所述目标音频信号对应的音频输出信号,包括:4. The method according to any one of claims 1 to 3, characterized in that generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal comprises: 对所述预测高频信号与所述解码低频信号进行正交镜像合成滤波处理,生成所述音频输出信号。The predicted high-frequency signal and the decoded low-frequency signal are subjected to orthogonal mirror synthesis filtering to generate the audio output signal. 5.一种音频处理方法,其特征在于,包括:5. 
An audio processing method, comprising: 将目标音频信号进行分解处理,生成高频信号与低频信号;Decompose the target audio signal to generate high-frequency signal and low-frequency signal; 对所述高频信号进行特征提取处理,以获得所述高频信号对应的频谱特征参数,所述频谱特征参数为描述所述高频信号的频谱分布特征的信息,并包括频谱包络类型;Performing feature extraction processing on the high-frequency signal to obtain a spectrum feature parameter corresponding to the high-frequency signal, wherein the spectrum feature parameter is information describing a spectrum distribution feature of the high-frequency signal and includes a spectrum envelope type; 对所述低频信号进行音频编码处理,生成所述低频信号对应的低频编码数据;Performing audio coding processing on the low-frequency signal to generate low-frequency coding data corresponding to the low-frequency signal; 将所述频谱特征参数与所述低频编码数据发送至接收端,以使得所述接收端确定所述频谱特征参数匹配的音频预测网络,并基于所述音频预测网络与所述低频编码数据所解码得到的解码低频信号生成音频输出信号,其中,所述频谱特征参数匹配的音频预测网络是基于所述频谱特征参数对应的训练样本训练的深度学习网络;The spectral feature parameters and the low-frequency coded data are sent to a receiving end, so that the receiving end determines an audio prediction network that matches the spectral feature parameters, and generates an audio output signal based on a decoded low-frequency signal obtained by decoding the audio prediction network and the low-frequency coded data, wherein the audio prediction network that matches the spectral feature parameters is a deep learning network trained based on training samples corresponding to the spectral feature parameters; 其中,所述确定所述频谱特征参数匹配的音频预测网络,包括:Wherein, the determining of the audio prediction network matching the spectrum feature parameters comprises: 获取至少一个预设音频预测网络的网络信息,每个所述网络信息对应一种预设频谱包络类型;Acquire network information of at least one preset audio prediction network, each of the network information corresponds to a preset spectrum envelope type; 确定所述频谱包络类型匹配的预设频谱包络类型所对应网络信息,得到目标网络信息;Determine network information corresponding to a preset spectrum envelope type that matches the spectrum envelope type, and obtain target network information; 
determining the preset audio prediction network corresponding to the target network information as the audio prediction network matching the spectrum feature parameter.
6. The method according to claim 5, wherein performing feature extraction processing on the high-frequency signal to obtain the spectrum feature parameter corresponding to the high-frequency signal comprises:
performing frequency-domain conversion processing on the high-frequency signal to obtain a frequency-domain signal;
calculating a power spectrum value of each frequency point in the frequency-domain signal;
performing feature extraction processing based on the power spectrum values of the frequency points to obtain the spectrum feature parameter describing the spectrum distribution characteristics of the high-frequency signal.
7. The method according to claim 6, wherein performing feature extraction processing based on the power spectrum values of the frequency points to obtain the spectrum feature parameter describing the spectrum distribution characteristics of the high-frequency signal comprises:
calculating an average of the power spectrum values of the frequency points, and determining a maximum power spectrum value among the power spectrum values of the frequency points;
taking the difference between the maximum power spectrum value and the average to obtain a first difference;
determining the spectrum feature parameter corresponding to the high-frequency signal according to the first difference.
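By way of non-limiting illustration, the steps recited in claims 6 and 7 can be sketched as follows. The claims do not fix the frequency-domain transform or the scale of the power spectrum, so the naive DFT, the linear power scale, the sign convention (maximum minus average) and the names `power_spectrum` and `first_difference` are assumptions made only for this sketch.

```python
import cmath
import math


def power_spectrum(frame):
    """Return |X[k]|^2 for bins 0..N/2 of one frame, using a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def first_difference(power):
    """Return (max - average, max, average) of the power spectrum values."""
    mean_power = sum(power) / len(power)
    max_power = max(power)
    return max_power - mean_power, max_power, mean_power


# A pure tone at bin 4 of a 32-sample frame: one dominant bin, so the
# maximum lies far above the average and the first difference is large.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
power = power_spectrum(frame)
diff, max_power, mean_power = first_difference(power)
```

A spectrally flat frame would instead yield a first difference close to zero, which is the case that claim 8 handles with thresholds.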
8. The method according to claim 7, wherein determining the spectrum feature parameter corresponding to the high-frequency signal according to the first difference comprises:
if the first difference is less than a first predetermined threshold and the maximum power spectrum value is less than a second predetermined threshold, determining that the spectrum envelope type corresponding to the high-frequency signal is a first type;
if the first difference is less than the first predetermined threshold and the maximum power spectrum value is greater than the second predetermined threshold, determining that the spectrum envelope type corresponding to the high-frequency signal is a second type.
9. The method according to claim 7, wherein determining the spectrum feature parameter corresponding to the high-frequency signal according to the first difference comprises:
if the first difference is greater than a first predetermined threshold, normalizing the power spectrum values of the frequency points to obtain a normalized value corresponding to each frequency point;
acquiring at least one preset target value, each target value corresponding to one preset spectrum envelope type;
calculating a mean square error between the normalized values corresponding to the frequency points and each target value;
determining the preset spectrum envelope type corresponding to the target value with the smallest mean square error as the spectrum envelope type of the high-frequency signal.
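As a non-limiting sketch, the decision logic of claims 8 and 9 might be combined as shown below. Several points are assumptions rather than claimed features: each target value is treated as a per-frequency-point template vector, the normalization is the mean-centred form recited in claim 10, the handling of values exactly equal to a threshold is arbitrary, and the labels `type_1`, `type_2`, `rising` and `falling` are hypothetical.

```python
from typing import Dict, List


def normalize(power: List[float]) -> List[float]:
    """Mean-centre each value and divide by the mean of the squared deviations."""
    mean_power = sum(power) / len(power)
    second_diffs = [p - mean_power for p in power]
    score = sum(d * d for d in second_diffs) / len(second_diffs)
    return [d / score for d in second_diffs]


def classify_envelope(power: List[float], diff_threshold: float,
                      peak_threshold: float,
                      templates: Dict[str, List[float]]) -> str:
    """Return an envelope type label; diff_threshold is assumed positive."""
    mean_power = sum(power) / len(power)
    max_power = max(power)
    first_diff = max_power - mean_power
    if first_diff < diff_threshold:
        # Flat spectrum: the peak level alone separates the two types.
        # Equality with peak_threshold is assigned to type_2 (assumption).
        return "type_1" if max_power < peak_threshold else "type_2"
    normalized = normalize(power)

    def mse(target: List[float]) -> float:
        return sum((n - t) ** 2 for n, t in zip(normalized, target)) / len(normalized)

    # Pick the preset type whose template has the smallest mean square error.
    return min(templates, key=lambda name: mse(templates[name]))


templates = {"rising": [-0.3, -0.2, -0.1, 0.6], "falling": [0.6, -0.1, -0.2, -0.3]}
quiet = classify_envelope([1.0, 1.0, 1.0, 1.0], 0.5, 10.0, templates)
loud = classify_envelope([20.0, 20.0, 20.0, 20.0], 0.5, 10.0, templates)
shaped = classify_envelope([0.0, 1.0, 2.0, 7.0], 0.5, 10.0, templates)
```

Because only the resulting type label is sent to the receiving end, the side information for the high band stays small regardless of how many frequency points are analysed.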
10. The method according to claim 9, wherein normalizing the power spectrum values of the frequency points to obtain the normalized value corresponding to each frequency point comprises:
taking the difference between the power spectrum value of each frequency point and the average to obtain a second difference corresponding to each frequency point;
calculating the square of the second difference corresponding to each frequency point, and calculating the average of the squares to obtain a normalization score;
dividing the second difference corresponding to each frequency point by the normalization score to obtain the normalized value corresponding to each frequency point.
11. The method according to any one of claims 5 to 10, wherein decomposing the target audio signal to generate the high-frequency signal and the low-frequency signal comprises:
performing quadrature mirror analysis filtering on the target audio signal to generate the high-frequency signal and the low-frequency signal.
12.
An audio processing device, comprising:
a receiving module, configured to receive a spectrum feature parameter of a high-frequency signal and low-frequency encoded data of a low-frequency signal, wherein the high-frequency signal and the low-frequency signal belong to a target audio signal, and the spectrum feature parameter is information describing the spectrum distribution characteristics of the high-frequency signal and includes a spectrum envelope type;
a decoding module, configured to decode the low-frequency encoded data to generate a decoded low-frequency signal;
a matching module, configured to perform prediction network matching processing based on the spectrum feature parameter to obtain an audio prediction network matching the spectrum feature parameter, wherein the audio prediction network matching the spectrum feature parameter is a deep learning network trained on training samples corresponding to the spectrum feature parameter;
a prediction module, configured to perform audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal;
an output module, configured to generate an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal;
wherein performing prediction network matching processing based on the spectrum feature parameter to obtain the audio prediction network matching the spectrum feature parameter comprises:
acquiring network information of at least one preset audio prediction network, each piece of network information corresponding to one preset spectrum envelope type;
determining the network information corresponding to the preset spectrum envelope type that matches the spectrum envelope type, to obtain target network information;
determining the preset audio prediction network corresponding to the target network information as the audio prediction network matching the spectrum feature parameter.
13. An audio processing device, comprising:
a decomposition module, configured to decompose a target audio signal to generate a high-frequency signal and a low-frequency signal;
an extraction module, configured to perform feature extraction processing on the high-frequency signal to obtain a spectrum feature parameter corresponding to the high-frequency signal, wherein the spectrum feature parameter is information describing the spectrum distribution characteristics of the high-frequency signal and includes a spectrum envelope type;
an encoding module, configured to perform audio encoding processing on the low-frequency signal to generate low-frequency encoded data corresponding to the low-frequency signal;
a delivery module, configured to send the spectrum feature parameter and the low-frequency encoded data to a receiving end, so that the receiving end determines an audio prediction network matching the spectrum feature parameter and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data, wherein the audio prediction network matching the spectrum feature parameter is a deep learning network trained on training samples corresponding to the spectrum feature parameter;
wherein determining the audio prediction network matching the spectrum feature parameter comprises:
acquiring network information of at least one preset audio prediction network, each piece of network information corresponding to one preset spectrum envelope type;
determining the network information corresponding to the preset spectrum envelope type that matches the spectrum envelope type, to obtain target network information;
determining the preset audio prediction network corresponding to the target network information as the audio prediction network matching the spectrum feature parameter.
14. A computer-readable storage medium storing a computer program which, when executed by a processor of a computer, causes the computer to perform the method according to any one of claims 1 to 4 and 5 to 11.
15. An electronic device, comprising: a memory storing a computer program; and a processor configured to read the computer program stored in the memory to perform the method according to any one of claims 1 to 4 and 5 to 11.
16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 4 and 5 to 11.
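For illustration only, the prediction network matching recited in claims 5, 12 and 13 amounts to a lookup from the received spectrum envelope type to a preset network. In the sketch below the trained deep learning networks are replaced by trivial stub functions that scale the low-band spectrum. The names `PRESET_NETWORKS`, `make_stub_predictor` and `match_prediction_network`, the type labels, and the gains are all hypothetical and stand in for whatever network information an actual embodiment stores.

```python
from typing import Callable, Dict, List

Predictor = Callable[[List[float]], List[float]]


def make_stub_predictor(gain: float) -> Predictor:
    """Stand-in for a trained network: scales the low-band spectrum by a gain."""
    def predict(low_band_spectrum: List[float]) -> List[float]:
        return [gain * value for value in low_band_spectrum]
    return predict


# One preset network per preset spectrum envelope type (hypothetical labels).
PRESET_NETWORKS: Dict[str, Predictor] = {
    "type_1": make_stub_predictor(0.5),
    "type_2": make_stub_predictor(0.25),
}


def match_prediction_network(envelope_type: str,
                             presets: Dict[str, Predictor]) -> Predictor:
    """Return the preset network whose envelope type matches the received one."""
    if envelope_type not in presets:
        raise ValueError(f"no preset network for envelope type {envelope_type!r}")
    return presets[envelope_type]


# Receiving end: select the network from the received type, then predict
# the high-band spectrum from the decoded low-band spectrum.
network = match_prediction_network("type_1", PRESET_NETWORKS)
predicted_high_band = network([1.0, 2.0, 4.0])
```

Keeping one network per envelope type lets each network be trained only on samples of that type, which is the training arrangement the claims describe.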
CN202111371005.1A 2021-11-18 2021-11-18 Audio processing method, device, storage medium, equipment and product Active CN114333861B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111371005.1A CN114333861B (en) 2021-11-18 2021-11-18 Audio processing method, device, storage medium, equipment and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111371005.1A CN114333861B (en) 2021-11-18 2021-11-18 Audio processing method, device, storage medium, equipment and product

Publications (2)

Publication Number Publication Date
CN114333861A CN114333861A (en) 2022-04-12
CN114333861B CN114333861B (en) 2025-07-11

Family

ID=81045765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111371005.1A Active CN114333861B (en) 2021-11-18 2021-11-18 Audio processing method, device, storage medium, equipment and product

Country Status (1)

Country Link
CN (1) CN114333861B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550732B (en) * 2022-04-15 2022-07-08 腾讯科技(深圳)有限公司 Coding and decoding method and related device for high-frequency audio signal

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112767954A (en) * 2020-06-24 2021-05-07 腾讯科技(深圳)有限公司 Audio encoding and decoding method, device, medium and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971693B (en) * 2013-01-29 2017-02-22 华为技术有限公司 High-band signal prediction method, encoding/decoding device
EP3182412B1 (en) * 2014-08-15 2023-06-07 Samsung Electronics Co., Ltd. Sound quality improving method and device, sound decoding method and device, and multimedia device employing same
CN105070293B (en) * 2015-08-31 2018-08-21 武汉大学 Audio bandwidth expansion coding-decoding method based on deep neural network and device
US10008218B2 (en) * 2016-08-03 2018-06-26 Dolby Laboratories Licensing Corporation Blind bandwidth extension using K-means and a support vector machine
CN107945811B (en) * 2017-10-23 2021-06-01 北京大学 A Generative Adversarial Network Training Method and Audio Coding and Decoding Method for Band Expansion
CN110556122B (en) * 2019-09-18 2024-01-19 腾讯科技(深圳)有限公司 Frequency band extension method, device, electronic equipment and computer-readable storage medium

Also Published As

Publication number Publication date
CN114333861A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
US11308978B2 (en) Systems and methods for energy efficient and low power distributed automatic speech recognition on wearable devices
CN101896964A (en) Systems, methods and devices for contextual descriptor transfer
CN114550732B (en) Coding and decoding method and related device for high-frequency audio signal
US9449605B2 (en) Inactive sound signal parameter estimation method and comfort noise generation method and system
BRPI0812029B1 (en) method of recovering hidden data, telecommunication device, data hiding device, data hiding method and set-top box
CN116741193B (en) Training method and device for voice enhancement network, storage medium and computer equipment
CN114338623A (en) Audio processing method, device, equipment, medium and computer program product
CN110111811A (en) Audio signal detection method, device and storage medium
CN114333861B (en) Audio processing method, device, storage medium, equipment and product
CN117594059A (en) Audio repair methods, devices, storage media and electronic equipment
CN112767955A (en) Audio encoding method and device, storage medium and electronic equipment
CN114329042B (en) Data processing method, device, equipment, storage medium and computer program product
CN115130569A (en) Audio processing method, apparatus and computer equipment, storage medium, program product
HK40070975A (en) Audio processing method, device, storage medium, equipment and product
HK40070975B (en) Audio processing method, device, storage medium, equipment and product
CN118230728A (en) Method, device, storage medium, and speech recognition system for processing audio data
CN113571081B (en) Speech enhancement method, device, equipment and storage medium
CN112133279B (en) Vehicle-mounted information broadcasting method and device and terminal equipment
CN113259063B (en) Data processing method, data processing device, computer equipment and computer readable storage medium
CN109273003A (en) Voice control method and system for driving recorder
CN112908346B (en) Packet loss recovery method and device, electronic device, and computer-readable storage medium
CN109473116A (en) Speech coding method, speech decoding method and device
CN120375836A (en) Speech compression coding method, decoding method and related equipment based on satellite channel
HK40070387B (en) Method for encoding and decoding high-frequency audio signal, and related apparatus
HK40070387A (en) Method for encoding and decoding high-frequency audio signal, and related apparatus

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40070975

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant