HK40070975A - Audio processing method, device, storage medium, equipment and product - Google Patents
Publication number: HK40070975A (HK42022060733.7A)
Authority: Hong Kong (HK)
Description
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an audio processing method, apparatus, storage medium, device, and product.
Background
Audio processing usually centers on audio encoding and decoding: an acquisition end collects a sound signal, encodes and compresses the resulting audio signal, and sends it to a receiving end, which decodes it and plays the sound.
At present, in the related art, the acquisition end reduces the code rate of the audio signal in some manner before transmitting it to the receiving end, so as to reduce the transmission bandwidth. However, the code rate reduction achievable in this manner is limited, so the transmission bandwidth is reduced only slightly; moreover, as a consequence of the reduced code rate, the receiving end generates an uncontrollable audio output signal, which results in a poor audio playing effect.
Disclosure of Invention
The embodiment of the application provides an audio processing scheme, which can effectively reduce the transmission bandwidth of audio data and ensure the audio playing effect.
In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:
according to an embodiment of the present application, an audio processing method includes: receiving spectral characteristic parameters of a high-frequency signal and low-frequency coded data of a low-frequency signal, wherein the high-frequency signal and the low-frequency signal belong to a target audio signal; decoding the low-frequency coded data to generate a decoded low-frequency signal; performing prediction network matching processing based on the frequency spectrum characteristic parameters to obtain an audio prediction network matched with the frequency spectrum characteristic parameters; performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal; and generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
According to one embodiment of the present application, an audio processing apparatus includes: the receiving module is used for receiving the frequency spectrum characteristic parameters of the high-frequency signals and the low-frequency coded data of the low-frequency signals, wherein the high-frequency signals and the low-frequency signals belong to target audio signals; the decoding module is used for decoding the low-frequency coded data to generate a decoded low-frequency signal; the matching module is used for performing prediction network matching processing based on the spectrum characteristic parameters to obtain an audio prediction network matched with the spectrum characteristic parameters; the prediction module is used for carrying out audio prediction processing on the basis of the audio prediction network and the decoded low-frequency signal so as to generate a predicted high-frequency signal corresponding to the high-frequency signal; and the output module is used for generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
In some embodiments of the present application, the spectral feature parameters include a spectral envelope type; the matching module comprises: the information acquisition unit is used for acquiring network information of at least one preset audio prediction network, and each network information corresponds to a preset spectrum envelope type; the network matching unit is used for determining network information corresponding to a preset spectrum envelope type matched with the spectrum envelope type to obtain target network information; and the network determining unit is used for determining a preset audio prediction network corresponding to the target network information as the audio prediction network matched with the spectral characteristic parameters.
In some embodiments of the present application, the prediction module comprises: the extraction processing unit is used for extracting the frequency spectrum characteristics of the decoded low-frequency signal to obtain low-frequency spectrum information; the information prediction unit is used for performing audio prediction processing on the basis of the low-frequency spectrum information by adopting the audio prediction network to obtain predicted spectrum information; and a signal generating unit for generating a predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information.
In some embodiments of the present application, the extraction processing unit is configured to: perform modified discrete cosine transform (MDCT) processing on the decoded low-frequency signal to obtain the low-frequency spectrum information; and the signal generation unit is configured to: perform inverse modified discrete cosine transform (IMDCT) processing on the predicted spectrum information to generate a predicted high-frequency signal corresponding to the high-frequency signal.
In some embodiments of the present application, the output module is configured to: perform quadrature mirror filter (QMF) synthesis filtering processing on the predicted high-frequency signal and the decoded low-frequency signal to generate the audio output signal.
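The transform pair named in this embodiment can be illustrated with a minimal sketch. It assumes the textbook MDCT definition, a sine window, and 50% overlap-add; the function names (`mdct`, `imdct`, `sine_window`, `roundtrip`) and the frame size are illustrative and are not taken from the application, which does not specify window shape or frame length.

```python
import math

def mdct(frame):
    """Map a windowed frame of 2N samples to N MDCT coefficients (direct O(N^2) form)."""
    n_coeffs = len(frame) // 2
    return [
        sum(frame[n] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]

def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples (2/N scaling for windowed use)."""
    n_coeffs = len(coeffs)
    return [
        (2.0 / n_coeffs) * sum(
            coeffs[k] * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
            for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]

def sine_window(n_coeffs):
    """Sine window; satisfies w[n]^2 + w[n+N]^2 = 1, which lets the aliasing cancel."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coeffs)) for n in range(2 * n_coeffs)]

def roundtrip(signal, n_coeffs):
    """Window, MDCT, IMDCT, window again, and overlap-add with hop size N."""
    window = sine_window(n_coeffs)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_coeffs + 1, n_coeffs):
        frame = [signal[start + n] * window[n] for n in range(2 * n_coeffs)]
        restored = imdct(mdct(frame))
        for n in range(2 * n_coeffs):
            out[start + n] += restored[n] * window[n]
    return out
```

Only samples covered by two overlapping frames are reconstructed exactly, so the first and last N samples of `roundtrip` output are incomplete. In the described scheme the coefficients fed to the inverse transform would be the predicted spectrum information rather than the unmodified forward transform output.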
According to an embodiment of the present application, an audio processing method includes: decomposing the target audio signal to generate a high-frequency signal and a low-frequency signal; carrying out feature extraction processing on the high-frequency signal to obtain a frequency spectrum feature parameter corresponding to the high-frequency signal; carrying out audio coding processing on the low-frequency signal to generate low-frequency coded data corresponding to the low-frequency signal; and sending the spectral characteristic parameters and the low-frequency coded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral characteristic parameters and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency coded data.
According to one embodiment of the present application, an audio processing apparatus includes: the decomposition module is used for decomposing the target audio signal to generate a high-frequency signal and a low-frequency signal; the extraction module is used for carrying out feature extraction processing on the high-frequency signal so as to obtain a frequency spectrum feature parameter corresponding to the high-frequency signal; the coding module is used for carrying out audio coding processing on the low-frequency signal to generate low-frequency coded data corresponding to the low-frequency signal; and the transmission module is used for transmitting the spectral characteristic parameters and the low-frequency coded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral characteristic parameters and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency coded data.
In some embodiments of the present application, the extraction module comprises: the frequency domain conversion unit is used for carrying out frequency domain conversion processing on the high-frequency signal to obtain a frequency domain signal; the power spectrum value calculating unit is used for calculating the power spectrum value of each frequency point in the frequency domain signal; and the frequency spectrum characteristic parameter acquisition unit is used for performing characteristic extraction processing on the basis of the power spectrum value of each frequency point to acquire frequency spectrum characteristic parameters describing the frequency spectrum distribution characteristics of the high-frequency signals.
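As a rough illustration of the frequency domain conversion and per-frequency-point power spectrum calculation, the following sketch assumes a plain discrete Fourier transform and a squared-magnitude power value. The application does not name the transform or any scaling, so both choices, and the function name `power_spectrum`, are assumptions made for the example.

```python
import cmath
import math

def power_spectrum(frame):
    """Return |X[k]|^2 for bins k = 0..N/2 of a real frame, using a naive O(N^2) DFT."""
    n_samples = len(frame)
    powers = []
    for k in range(n_samples // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, x in enumerate(frame)
        )
        powers.append(abs(coeff) ** 2)
    return powers
```

A production implementation would normally use an FFT and might express the values in decibels; the naive form is used here only to keep the example dependency-free.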
In some embodiments of the present application, the spectrum characteristic parameter obtaining unit includes: the element calculating subunit is used for calculating the average value of the power spectrum values of the frequency points and determining the maximum power spectrum value in the power spectrum values of the frequency points; the difference processing subunit is configured to perform difference processing on the maximum power spectrum value and the average value to obtain a first difference value; and the spectral characteristic parameter determining subunit is configured to determine a spectral characteristic parameter corresponding to the high-frequency signal according to the first difference.
In some embodiments of the present application, the spectral feature parameters include a spectral envelope type; the spectral feature parameter determining subunit is configured to: if the first difference value is smaller than a first preset threshold value and the maximum power spectrum value is smaller than a second preset threshold value, determining that the spectrum envelope type corresponding to the high-frequency signal is a first type; and if the first difference value is smaller than a first preset threshold value and the maximum power spectrum value is larger than a second preset threshold value, determining that the spectrum envelope type corresponding to the high-frequency signal is a second type.
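The threshold test of this embodiment can be sketched as follows. The function name, the string labels "first" and "second", and the handling of values exactly equal to a threshold are assumptions; the application only defines the strictly-smaller and strictly-larger cases and gives no threshold values.

```python
def classify_flat_envelope(power_values, first_threshold, second_threshold):
    """Return "first" or "second" for a flat spectrum, or None if the spectrum is not flat.

    The first difference is the maximum power spectrum value minus the average value.
    """
    mean_power = sum(power_values) / len(power_values)
    max_power = max(power_values)
    first_difference = max_power - mean_power
    if first_difference >= first_threshold:
        # Not flat: a different branch (for example template matching) must decide.
        return None
    # Assumption: a maximum exactly equal to the second threshold counts as "second".
    return "first" if max_power < second_threshold else "second"
```

For example, `classify_flat_envelope([1.0, 1.1, 0.9, 1.0], 0.5, 5.0)` yields `"first"`, because the spectrum is both flat and low in energy.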
In some embodiments of the present application, the spectral feature parameters include a spectral envelope type; the spectral feature parameter determining subunit is configured to: if the first difference value is larger than a first preset threshold value, normalizing the power spectrum value of each frequency point to obtain a normalization value corresponding to each frequency point; acquiring at least one preset target value, wherein each target value corresponds to a preset spectrum envelope type; calculating a mean square error value of the normalization value corresponding to each frequency point and each target value; and determining the preset spectrum envelope type corresponding to the target value corresponding to the minimum mean square error value as the spectrum envelope type of the high-frequency signal.
In some embodiments of the present application, the spectral feature parameter determining subunit is configured to: performing difference processing on the power spectrum value of each frequency point and the average value respectively to obtain a second difference value corresponding to each frequency point; calculating a square value of a second difference value corresponding to each frequency point, and calculating an average value of the square values to obtain a normalized score; and dividing the second difference value corresponding to each frequency point by the normalized score to obtain a normalized value corresponding to each frequency point.
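The normalization and minimum mean square error selection can be sketched as below. The sketch follows the text literally: the second differences are divided by the mean of their squares. It assumes each preset target value is a vector with one entry per frequency point; the template contents and the function names are hypothetical.

```python
def normalize_power(power_values):
    """Normalize power spectrum values as described: (p - mean) / mean((p - mean)^2)."""
    mean_power = sum(power_values) / len(power_values)
    second_differences = [p - mean_power for p in power_values]
    score = sum(d * d for d in second_differences) / len(second_differences)
    if score == 0:
        raise ValueError("flat spectrum: use the threshold-based branch instead")
    return [d / score for d in second_differences]

def match_envelope_type(normalized, templates):
    """Return the key of the template with the smallest mean square error.

    `templates` maps a preset spectral envelope type to its target vector
    (hypothetical data; the application does not list the target values).
    """
    def mse(target):
        return sum((v - t) ** 2 for v, t in zip(normalized, target)) / len(normalized)
    return min(templates, key=lambda envelope_type: mse(templates[envelope_type]))
```

With `normalize_power([1.0, 2.0, 3.0, 4.0])` giving `[-1.2, -0.4, 0.4, 1.2]`, a rising template is selected over a falling one.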
In some embodiments of the present application, the decomposition module is configured to: perform quadrature mirror filter (QMF) analysis filtering processing on the target audio signal to generate the high-frequency signal and the low-frequency signal.
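A two-band split of this kind, together with the matching synthesis step at the receiving end, can be illustrated with the shortest possible filter pair. The sketch assumes the two-tap Haar filters, which satisfy the quadrature mirror condition and reconstruct perfectly; practical codecs use much longer filters for better band separation, and the application does not state which filters are used. The function names are illustrative.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)

def qmf_analysis(signal):
    """Split a signal into critically sampled low and high bands (Haar filter pair)."""
    if len(signal) % 2:
        raise ValueError("expected an even number of samples")
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) * _SCALE for a, b in pairs]
    high = [(a - b) * _SCALE for a, b in pairs]
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two bands into a full-band signal (inverse of qmf_analysis)."""
    out = []
    for low_value, high_value in zip(low, high):
        out.append((low_value + high_value) * _SCALE)
        out.append((low_value - high_value) * _SCALE)
    return out
```

Each band carries half as many samples as the input, so the split itself adds no data; the saving in the described scheme comes from replacing the high band with a compact spectral feature parameter.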
According to another embodiment of the present application, a computer-readable storage medium has stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform the method of an embodiment of the present application.
According to another embodiment of the present application, an electronic device includes: a memory storing a computer program; and the processor reads the computer program stored in the memory to execute the method in the embodiment of the application.
According to another embodiment of the present application, a computer program product or computer program comprises computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described in the embodiments of this application.
In the embodiment of the application, frequency spectrum characteristic parameters of a high-frequency signal and low-frequency coded data of a low-frequency signal are received, and the high-frequency signal and the low-frequency signal are generated by decomposing a target audio signal; decoding the low-frequency coded data to generate a decoded low-frequency signal; performing prediction network matching processing based on the frequency spectrum characteristic parameters to obtain an audio prediction network matched with the frequency spectrum characteristic parameters; performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal; and generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
In this way, for a target audio signal, the spectral distribution characteristics of the high-frequency signal in the target audio signal can be described by spectral characteristic parameters with a small data size, so that only the spectral characteristic parameters and the low-frequency coded data of the low-frequency signal need to be transmitted, which effectively reduces the transmission bandwidth. Meanwhile, a matched audio prediction network is selected based on the spectral characteristic parameters to restore the high-frequency signal and generate a predicted high-frequency signal.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 shows a schematic diagram of a system to which embodiments of the present application may be applied.
Fig. 2 shows a flow diagram of an audio processing method according to an embodiment of the application.
Fig. 3 shows a schematic diagram of spectral envelope types according to an embodiment of the present application.
Fig. 4 shows a flow diagram of an audio processing method according to another embodiment of the present application.
Fig. 5 shows a flow chart of an audio processing procedure in one scenario.
Fig. 6 shows another flow chart of the audio processing procedure in one scenario.
Fig. 7 shows a block diagram of an audio processing device according to an embodiment of the application.
Fig. 8 shows a block diagram of an audio processing device according to another embodiment of the present application.
FIG. 9 shows a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the description that follows, specific embodiments of the present application are described with reference to steps and symbolic representations of operations performed by one or more computers, unless otherwise indicated. Accordingly, these steps and operations, which are at times referred to as being computer-executed, include the manipulation by a processing unit of the computer of electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the memory system of the computer, which reconfigures or otherwise alters the operation of the computer in a manner well known to those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the format of the data. However, although the principles of the present application are described in the foregoing context, this is not meant to be limiting, as those of ordinary skill in the art will appreciate that various steps and operations described below may also be implemented in hardware.
Fig. 1 shows a schematic diagram of a system 100 to which embodiments of the present application may be applied. As shown in fig. 1, the system 100 may include a terminal 101, a terminal 102, and a server 103.
The terminal 101 and the terminal 102 may be any devices, including, but not limited to, a mobile phone, a computer, an intelligent voice interaction device, an intelligent household appliance, a vehicle-mounted terminal, a VR/AR device, an intelligent watch, and the like.
The server 103 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, middleware service, a domain name service, a security service, a CDN, a big data and artificial intelligence platform, and the like. In one embodiment of this example, the server 103 is a cloud server.
In some examples, the terminal 101, the terminal 102, and the server 103 may be nodes in a blockchain network, which may improve the security of audio processing.
In one embodiment of this example, the terminal 101 may: receiving spectral characteristic parameters of a high-frequency signal and low-frequency coded data of a low-frequency signal, wherein the high-frequency signal and the low-frequency signal belong to a target audio signal; decoding the low-frequency coded data to generate a decoded low-frequency signal; performing prediction network matching processing based on the frequency spectrum characteristic parameters to obtain an audio prediction network matched with the frequency spectrum characteristic parameters; performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal; and generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
In one example, the spectral characteristic parameter of the high-frequency signal and the low-frequency coded data of the low-frequency signal may be sent directly by the terminal 102, acting as the acquisition end, to the terminal 101, acting as the receiving end; in one example, they may be transmitted by the terminal 102 to the terminal 101 through the server 103; in one example, they may be sent by a processing unit A serving as the acquisition end in the terminal 101 to a processing unit B serving as the receiving end in the terminal 101.
In one implementation of this example, the terminal 102 may: decompose the target audio signal to generate a high-frequency signal and a low-frequency signal; carry out feature extraction processing on the high-frequency signal to obtain a frequency spectrum feature parameter corresponding to the high-frequency signal; carry out audio coding processing on the low-frequency signal to generate low-frequency coded data corresponding to the low-frequency signal; and send the spectral characteristic parameters and the low-frequency coded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral characteristic parameters and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency coded data.
In one example, the receiving end may be the terminal 101, and the terminal 102 may send the spectral characteristic parameter and the low-frequency coded data to the terminal 101 directly or through the server 103; in one example, the receiving end may be a processing unit C in the terminal 102, and a processing unit D serving as the acquisition end in the terminal 102 may send the spectral characteristic parameter and the low-frequency coded data to the processing unit C.
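The acquisition-end processing attributed to the terminal 102 can be summarized in a short sketch. The three stages are passed in as callables because the application leaves their concrete form open; `EncodedFrame`, `encode_frame`, and the parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Samples = Sequence[float]

@dataclass(frozen=True)
class EncodedFrame:
    """What the acquisition end sends: a compact feature plus coded low-band data."""
    spectral_feature: int
    low_band_payload: bytes

def encode_frame(
    target_audio: Samples,
    decompose: Callable[[Samples], Tuple[Samples, Samples]],
    extract_feature: Callable[[Samples], int],
    encode_low: Callable[[Samples], bytes],
) -> EncodedFrame:
    """Decompose, describe the high band by a feature, and code only the low band."""
    low_band, high_band = decompose(target_audio)
    return EncodedFrame(
        spectral_feature=extract_feature(high_band),
        low_band_payload=encode_low(low_band),
    )
```

Note that the high-band samples never leave the acquisition end; only the feature derived from them is transmitted.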
Fig. 2 schematically shows a flow chart of an audio processing method according to an embodiment of the application. The method may be executed by any receiving end, which decodes the received audio-related data to generate an audio output signal used to play sound; the receiving end is, for example, the terminal 101 or the terminal 102 shown in fig. 1.
As shown in fig. 2, the audio processing method may include steps S210 to S250.
Step S210: receiving spectral characteristic parameters of a high-frequency signal and low-frequency coded data of a low-frequency signal, wherein the high-frequency signal and the low-frequency signal belong to a target audio signal.
Step S220: decoding the low-frequency coded data to generate a decoded low-frequency signal.
Step S230: performing prediction network matching processing based on the spectral characteristic parameters to obtain an audio prediction network matched with the spectral characteristic parameters.
Step S240: performing audio prediction processing on the decoded low-frequency signal based on the audio prediction network to generate a predicted high-frequency signal corresponding to the high-frequency signal.
Step S250: generating an audio output signal corresponding to the target audio signal based on the predicted high-frequency signal and the decoded low-frequency signal.
In this way, for a target audio signal, the spectral distribution characteristics of the high-frequency signal in the target audio signal can be described by spectral characteristic parameters with a small data size, so that only the spectral characteristic parameters and the low-frequency coded data of the low-frequency signal need to be transmitted, which effectively reduces the transmission bandwidth. Meanwhile, a matched audio prediction network is selected based on the spectral characteristic parameters to restore the high-frequency signal and generate a predicted high-frequency signal.
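Steps S210 to S250 can be read as a simple pipeline at the receiving end. The sketch below is only an outline under the assumption that each stage is supplied as a callable and that the preset networks are held in a mapping keyed by the spectral characteristic parameter; `decode_frame` and all parameter names are hypothetical.

```python
def decode_frame(spectral_feature, low_band_payload, decode_low, networks, synthesize):
    """Outline of steps S210 to S250 for one received frame.

    `spectral_feature` and `low_band_payload` are the received data (S210).
    """
    decoded_low = decode_low(low_band_payload)        # S220: decode the low band
    network = networks[spectral_feature]              # S230: match a prediction network
    predicted_high = network(decoded_low)             # S240: predict the high band
    return synthesize(decoded_low, predicted_high)    # S250: build the output signal
```

Because the network is chosen before prediction, a frame whose spectral characteristic parameter has no preset network fails at S230 with a `KeyError` rather than producing an unconstrained prediction.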
The following describes a specific procedure of each step performed when audio processing is performed in the embodiment shown in fig. 2.
In step S210, spectral feature parameters of a high frequency signal and low frequency encoded data of a low frequency signal are received, and the high frequency signal and the low frequency signal are generated by decomposing a target audio signal.
The target audio signal may be a digital audio signal generated by the acquisition end through analog-to-digital conversion of the collected sound signal. The target audio signal may be decomposed into a high-frequency signal and a low-frequency signal: the high-frequency signal may be the part of the target audio signal above a predetermined frequency, and the low-frequency signal may be the part of the target audio signal below the predetermined frequency.
The acquisition end can extract the spectral distribution characteristics of the high-frequency signal and generate, from the extracted characteristics, spectral characteristic parameters that describe them. A spectral characteristic parameter may be a number or an identifier, and its data size can be kept very small; for example, one spectral distribution characteristic may be described by the value "1", which occupies only 3 bits. The low-frequency signal can be encoded by a conventional speech encoder (for example a CELP, SILK, or AAC encoder) to generate the low-frequency coded data.
After the acquisition end generates the spectral characteristic parameters of the high-frequency signal and the low-frequency coded data of the low-frequency signal, it can send them to the receiving end. When transmitting, the acquisition end can combine the spectral characteristic parameters and the low-frequency coded data into a coded stream and transmit the coded stream to the receiving end; the code rate of the coded stream can be extremely low.
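One way to combine the two items into a single coded stream is shown below, purely as an illustration. The layout, a one-byte header whose low 3 bits carry the spectral characteristic parameter followed by the low-frequency coded data, is an assumption; the application does not define a stream format, and `pack_frame` and `unpack_frame` are hypothetical names.

```python
def pack_frame(spectral_feature: int, low_band_payload: bytes) -> bytes:
    """Prefix the low-band payload with a header byte holding a 3-bit feature value."""
    if not 0 <= spectral_feature < 8:
        raise ValueError("feature must fit in 3 bits")
    return bytes([spectral_feature]) + low_band_payload

def unpack_frame(frame: bytes):
    """Split a packed frame back into (spectral_feature, low_band_payload)."""
    if not frame:
        raise ValueError("empty frame")
    return frame[0] & 0x07, frame[1:]
```

A whole header byte is used only for simplicity; a real bitstream could pack the 3 bits alongside other side information.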
In step S220, the low frequency encoded data is decoded to generate a decoded low frequency signal.
The receiving end can decode the low-frequency coded data through a conventional speech decoder to generate the decoded low-frequency signal, that is, the low-frequency signal recovered by decoding.
In step S230, a prediction network matching process is performed based on the spectral feature parameters, so as to obtain an audio prediction network matched with the spectral feature parameters.
The audio prediction network is a prediction network for predicting high-frequency signals, and may be a deep learning network. Performing prediction network matching processing based on the spectral feature parameters means determining the audio prediction network that matches the spectral feature parameters; it can be understood that each spectral feature parameter may correspond to one audio prediction network.
The audio prediction network predicts, from the decoded low-frequency signal, a predicted high-frequency signal corresponding to the high-frequency signal, thereby mapping the audio from low frequency to high frequency. This mapping is not unique: the same low-frequency content can correspond to many different high-frequency contents. Using the spectral feature parameters of the high-frequency signal to select the corresponding audio prediction network limits this diversity. That is, a plurality of preset audio prediction networks are trained, each independently on its own training samples (a training sample may include an input sample, namely signal characteristic information such as spectral information of the low-frequency signal, and the expected output, namely signal characteristic information such as spectral information of the high-frequency signal), and the audio prediction network matched with the spectral feature parameters is then determined from the plurality of preset audio prediction networks.
For example, preset audio prediction network A may be trained based on the training samples corresponding to spectral feature parameter 1, and preset audio prediction network B may be trained based on the training samples corresponding to spectral feature parameter 2. In this case, if the received spectral feature parameter is 1, the audio prediction network matched with the received spectral feature parameter is preset audio prediction network A.
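Training each preset network on its own samples implies partitioning the training data by spectral feature parameter first. A minimal sketch of that partitioning follows, assuming each sample is a triple of (spectral feature parameter, low-frequency features, high-frequency features); the tuple layout and the function name are hypothetical, and the training procedure itself is outside the scope of the example.

```python
from collections import defaultdict

def group_training_samples(samples):
    """Partition (feature, low_features, high_features) triples by feature value.

    Each resulting group would be used to train one preset audio prediction network.
    """
    grouped = defaultdict(list)
    for spectral_feature, low_features, high_features in samples:
        grouped[spectral_feature].append((low_features, high_features))
    return dict(grouped)
```

Each group then contains only input and expected-output pairs whose high band shares one spectral distribution characteristic, which is what narrows the low-to-high mapping each network has to learn.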
In one embodiment, the spectral feature parameters include a spectral envelope type; step S230, performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters, including: acquiring network information of at least one preset audio prediction network, wherein each network information corresponds to a preset spectrum envelope type; determining network information corresponding to a preset spectrum envelope type matched with the spectrum envelope type to obtain target network information; and determining a preset audio prediction network corresponding to the target network information as an audio prediction network matched with the spectral characteristic parameters.
The spectral characteristic parameter is information describing spectral distribution characteristics. In this embodiment, the spectral characteristic parameter is a spectral envelope type and the spectral distribution characteristics are spectral envelope characteristics: the spectral envelope type is information describing the spectral envelope characteristics, which represent the trend of the spectral values across the signal spectrum. That is, each trend of spectral value change corresponds to one spectral envelope type.
A preset spectral envelope type is a spectral envelope type defined in advance. The network information may be an identifier of a preset audio prediction network, and a preset audio prediction network is a pre-trained audio prediction network. Each piece of network information corresponds to one preset spectral envelope type, that is, each preset audio prediction network corresponds to one preset spectral envelope type.
In this embodiment, each network has its own training samples and is trained independently. For example, preset audio prediction network A may be trained based on the training samples corresponding to spectral envelope type 1, and preset audio prediction network B may be trained based on the training samples corresponding to spectral envelope type 2. In this case, if the received spectral envelope type is 1, the audio prediction network matched with the received spectral envelope type is preset audio prediction network A.
The preset spectral envelope type matched with the received spectral envelope type is determined first; for example, if the received spectral envelope type is 1, the matched preset spectral envelope type is 1. If the network information corresponding to the matched preset spectral envelope type 1 is X, the target network information is X, and the audio prediction network matched with the spectral characteristic parameters is the preset audio prediction network indicated by X.
Performing prediction network matching by spectral envelope type can effectively improve the accuracy with which the audio prediction network predicts the high-frequency signal.
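The two-stage lookup described here (spectral envelope type to network information, then network information to network) can be sketched with two mappings. The mapping contents and the function name are hypothetical; in practice the second mapping would hold loaded model objects rather than strings.

```python
def match_prediction_network(envelope_type, type_to_network_info, networks_by_info):
    """Resolve a received spectral envelope type to its preset audio prediction network."""
    if envelope_type not in type_to_network_info:
        raise KeyError(f"no preset spectral envelope type matches {envelope_type!r}")
    target_network_info = type_to_network_info[envelope_type]
    return networks_by_info[target_network_info]
```

With `{1: "X", 2: "Y"}` as the type table and `{"X": "network A", "Y": "network B"}` as the network table, a received type of 1 resolves to network A, matching the example in the text.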
In one embodiment, referring to fig. 3, the default spectral envelope types include 8 types: the '0' type is a low-energy flat-laying type; the '1' type is a high-energy flat-laying type; the '2' type is an energy convex type; the 3 type is an energy concave type; the '4' type is an energy gradual rising type; a "5" energy taper type; the 6 type is a step type with high energy front and low energy back; the "7" pattern is a step pattern with low energy and high energy. The applicant finds that in the mode of the type of division, the prediction accuracy of the audio prediction network can be further improved by the mode of carrying out prediction network matching according to the type of the spectrum envelope.
It is understood that in other embodiments, the spectral feature parameter may include other feature information, such as information describing the proportion of power spectral values in the spectrum that exceed a predetermined threshold. In that case, step S230, performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters, may include: acquiring network information of at least one preset audio prediction network, wherein each network information corresponds to a preset spectrum envelope type; determining network information corresponding to the preset spectrum envelope type matched with the other feature information to obtain target network information; and determining a preset audio prediction network corresponding to the target network information as an audio prediction network matched with the spectral feature parameters.
In step S240, an audio prediction process is performed on the basis of the audio prediction network and the decoded low frequency signal to generate a predicted high frequency signal corresponding to the high frequency signal.
When the audio prediction network based on the spectral feature parameter matching is used for prediction, as the general spectral distribution feature can be described through a small amount of data, the error of the predicted high-frequency signal and the original high-frequency signal is controllable, and the matched audio prediction network can accurately predict the high-frequency signal.
Wherein, the signal characteristic information (such as spectrum information) of the decoded low-frequency signal can be input into the audio prediction network, the audio prediction network can predict the signal characteristic information (such as spectrum information) of the output high-frequency signal, and the output signal characteristic can be used for restoring the high-frequency signal to generate the predicted audio signal.
In one embodiment, the step S240 of performing an audio prediction process based on the audio prediction network and the decoded low frequency signal to generate a predicted high frequency signal corresponding to the high frequency signal includes: performing spectrum feature extraction processing on the decoded low-frequency signal to obtain low-frequency spectrum information; performing audio prediction processing based on the low-frequency spectrum information by adopting the audio prediction network to obtain predicted spectrum information; and generating a predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information.
The low-frequency spectrum information is input into an audio prediction network, the audio prediction network carries out audio prediction processing, so that predicted spectrum information is output, and audio restoration is carried out based on the predicted spectrum information, so that a predicted high-frequency signal corresponding to the high-frequency signal can be obtained.
In an embodiment, the performing a spectral feature extraction process on the decoded low-frequency signal to obtain low-frequency spectral information includes: performing modified discrete cosine transform processing on the decoded low-frequency signal to obtain the low-frequency spectrum information; the generating a predicted high frequency signal corresponding to the high frequency signal based on the predicted spectrum information includes: performing inverse modified discrete cosine transform processing on the predicted spectrum information to generate a predicted high-frequency signal corresponding to the high-frequency signal.
In this embodiment, a Modified Discrete Cosine Transform (MDCT) may be applied to the decoded low-frequency signal to obtain the low-frequency spectrum information. An Inverse Modified Discrete Cosine Transform (IMDCT) may then be applied to the predicted spectrum information output by the audio prediction network, thereby generating the predicted high-frequency signal. The applicant has found that predicting in this way can further improve the accuracy of the predicted audio signal.
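As an illustration of the transform pair only, the following minimal sketch runs an MDCT and IMDCT round trip with 50% overlap-add. The sine window, the frame size, the 2/N scaling convention and all function names are assumptions made for this example and are not specified in the application; the audio prediction network itself is not modelled and would operate on the coefficients between the two transforms.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 for length = 2N."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame):
    """Map 2N time samples to N MDCT coefficients (direct O(N^2) form)."""
    n = len(frame) // 2
    return [
        sum(frame[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs):
    """Map N MDCT coefficients back to 2N time-aliased samples (2/N scaling)."""
    n = len(coeffs)
    return [
        2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                      for k in range(n))
        for t in range(2 * n)
    ]


def mdct_roundtrip(signal, n):
    """Analyse and resynthesise `signal` (length a multiple of n, n even)."""
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        frame = [padded[start + t] * window[t] for t in range(2 * n)]
        spectrum = mdct(frame)
        # A prediction network would map low-band spectra to high-band spectra here.
        rebuilt = imdct(spectrum)
        for t in range(2 * n):
            out[start + t] += rebuilt[t] * window[t]  # windowed overlap-add
    return out[n:n + len(signal)]
```

Because adjacent frames overlap by half and the window satisfies the Princen-Bradley condition, the time-domain aliasing introduced by each IMDCT cancels in the overlap-add, so the round trip returns the input samples up to floating-point error.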
In step S250, an audio output signal corresponding to the target audio signal is generated based on the predicted high frequency signal and the decoded low frequency signal.
The predicted high-frequency signal is the high-frequency signal obtained by prediction, and the decoded low-frequency signal is the reconstruction of the original low-frequency signal; the two are synthesized to generate an audio output signal corresponding to the original target audio signal.
In one embodiment, the step S250 of generating an audio output signal corresponding to the target audio signal according to the predicted high frequency signal and the decoded low frequency signal includes: performing quadrature mirror synthesis filtering processing on the predicted high-frequency signal and the decoded low-frequency signal to generate the audio output signal. The predicted high-frequency signal and the decoded low-frequency signal may be subjected to quadrature mirror synthesis filtering by a Quadrature Mirror Filter (QMF) bank to generate a full-band audio output signal corresponding to the target audio signal.
It is understood that in other embodiments, the predicted high frequency signal and the decoded low frequency signal may be subjected to synthesis filtering processing by other existing synthesis filters to generate an audio output signal.
Fig. 4 schematically shows a flow chart of an audio processing method according to an embodiment of the application. The execution subject of the audio processing method may be any terminal, such as the terminal 101 or the terminal 102 shown in fig. 1.
As shown in fig. 4, the audio processing method may include steps S310 to S340.
Step S310, decomposing the target audio signal to generate a high-frequency signal and a low-frequency signal; step S320, performing feature extraction processing on the high-frequency signal to obtain a spectral feature parameter corresponding to the high-frequency signal; step S330, performing audio coding processing on the low-frequency signal to generate low-frequency coded data corresponding to the low-frequency signal; step S340, sending the spectral feature parameters and the low-frequency encoded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral feature parameters, and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data.
In this way, for a target audio signal, the spectral distribution characteristics of a high-frequency signal in the target audio signal can be described through the spectral characteristic parameters with extremely small data size, only the spectral characteristic parameters and the low-frequency coded data of the low-frequency signal need to be transmitted during transmission, the transmission bandwidth is effectively reduced, meanwhile, a matched audio prediction network is selected based on the spectral characteristic parameters to restore the high-frequency signal, and a predicted high-frequency signal is generated.
The following describes the specific procedure of each step performed during audio processing in the embodiment shown in fig. 4.
In step S310, the target audio signal is subjected to decomposition processing to generate a high-frequency signal and a low-frequency signal.
The target audio signal may be a digital audio signal generated by the acquisition terminal through analog-to-digital conversion of the acquired sound signal, the target audio signal may be decomposed into a high frequency signal and a low frequency signal, the high frequency signal may be a part of the target audio signal higher than a predetermined frequency, and the low frequency signal may be a part of the target audio signal lower than the predetermined frequency.
The target audio signal may be decomposed by a band-pass Filter (BPF) set or a Quadrature Mirror Filter (QMF) set according to a corresponding manner to generate a high-frequency signal and a low-frequency signal.
In one embodiment, the step S310 of decomposing the target audio signal to generate a high frequency signal and a low frequency signal includes: performing quadrature mirror decomposition filtering processing on the target audio signal to generate the high-frequency signal and the low-frequency signal. The target audio signal may be subjected to quadrature mirror decomposition filtering by a Quadrature Mirror Filter (QMF) bank, so as to generate the high-frequency signal and the low-frequency signal.
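To make the decomposition and the matching synthesis concrete, the following minimal sketch uses the two-tap Haar filter pair, which is the simplest quadrature mirror pair. The choice of Haar filters and the function names are assumptions for illustration; the application does not specify the QMF coefficients, and a practical codec would use longer filters.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # Haar filter gain


def qmf_analysis(signal):
    """Split an even-length signal into half-rate low and high bands."""
    low = [_S * (signal[i] + signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    high = [_S * (signal[i] - signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    return low, high


def qmf_synthesis(low, high):
    """Recombine the two half-rate bands into a full-band signal."""
    out = []
    for l, h in zip(low, high):
        out.append(_S * (l + h))  # even-indexed sample
        out.append(_S * (l - h))  # odd-indexed sample
    return out
```

In this sketch the receiving end would call `qmf_synthesis` with the decoded low band and the predicted high band; when both bands are the unmodified analysis outputs, the original samples are recovered.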
In step S320, feature extraction processing is performed on the high-frequency signal to obtain a spectral feature parameter corresponding to the high-frequency signal.
The acquisition end can generate spectrum characteristic parameters for describing the spectrum distribution characteristics according to the extracted spectrum distribution characteristics by extracting the spectrum distribution characteristics of the high-frequency signals. The spectral distribution characteristics may include, for example, spectral envelope characteristics and a proportion of power spectral values in the spectrum that exceed a predetermined threshold. In one embodiment of the present example, the spectral distribution characteristic is a spectral envelope characteristic, and the spectral characteristic parameter is a spectral envelope type.
The spectral characteristic parameter may be a number or an identifier, and its data size can be kept very small. For example, when a spectral distribution characteristic is denoted by "1", only 3 bits are needed to represent it.
In one embodiment, step S320, performing feature extraction processing on the high-frequency signal to obtain a spectral feature parameter corresponding to the high-frequency signal, includes: carrying out frequency domain conversion processing on the high-frequency signal to obtain a frequency domain signal; calculating the power spectrum value of each frequency point in the frequency domain signal; and performing characteristic extraction processing based on the power spectrum value of each frequency point to obtain a spectrum characteristic parameter for describing the spectrum distribution characteristic of the high-frequency signal.
The high-frequency signal is a time domain signal, a frequency domain signal of a frequency domain can be obtained through frequency domain conversion processing (such as fourier transform processing), and a power spectrum value of each frequency point in the frequency domain signal can be extracted based on a frequency spectrum (namely a spectrogram) of the frequency domain signal. The frequency spectrum distribution characteristics of the high-frequency signals can be accurately analyzed by extracting the characteristics based on the power spectrum values of the frequency points, and then frequency spectrum characteristic parameters for describing the frequency spectrum distribution characteristics are generated. In one example, the power spectrum value of each frequency point may be a logarithmic value of the power spectrum of each frequency point.
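The frequency-domain conversion and power spectrum calculation can be sketched as follows. This is a minimal illustration assuming a direct discrete Fourier transform over one frame and a base-10 logarithm; the function name, the `eps` guard and the choice of logarithm base are assumptions, since the application only states that a logarithmic value may be used.

```python
import cmath
import math


def log_power_spectrum(frame, eps=1e-12):
    """Return log10 power values for frequency points 0 .. len(frame) // 2."""
    n = len(frame)
    values = []
    for k in range(n // 2 + 1):
        # Direct DFT of the real-valued frame at frequency point k.
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        values.append(math.log10(abs(acc) ** 2 + eps))  # eps avoids log(0)
    return values
```

For a frame containing a single sinusoid, the largest value appears at the frequency point matching the sinusoid, which is the kind of structure the subsequent feature extraction relies on.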
In one embodiment, the performing feature extraction processing based on the power spectrum value of each frequency point to obtain a spectrum feature parameter describing a spectrum distribution feature of the high-frequency signal includes: calculating the average value of the power spectrum values of the frequency points, and determining the maximum power spectrum value in the power spectrum values of the frequency points; performing difference processing on the maximum power spectrum value and the average value to obtain a first difference value; and determining the spectral characteristic parameter corresponding to the high-frequency signal according to the first difference.
For example, the power spectrum value of each frequency point is x (i), i ∈ [ N1, N2], i is the number of the frequency point, the average value xavg of the power spectrum values of each frequency point is the average value of all x (i), the maximum power spectrum value xmax in the power spectrum values of each frequency point is the largest one of all x (i), and the first difference value is obtained by subtracting the average value xavg from the maximum power spectrum value xmax. Therefore, the spectral distribution characteristic can be accurately reflected based on the size of the first difference, and particularly, the spectral envelope characteristic can be accurately reflected.
In some other manners, when feature extraction processing is performed based on the power spectrum value of each frequency point to obtain a spectrum feature parameter describing a spectrum distribution feature of the high-frequency signal, the proportion of power spectrum values in the spectrum that exceed a predetermined threshold may be calculated, and the corresponding spectrum feature parameter may be determined according to that proportion.
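A minimal sketch of this alternative feature, assuming the proportion is simply the fraction of frequency points whose power spectrum value is strictly greater than the threshold (the function name and the strict comparison are assumptions):

```python
def proportion_above(power_values, threshold):
    """Fraction of frequency points whose power spectrum value exceeds threshold."""
    return sum(1 for v in power_values if v > threshold) / len(power_values)
```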
In one embodiment, the spectral feature parameters include a spectral envelope type; the determining the spectral characteristic parameter corresponding to the high-frequency signal according to the first difference value includes: if the first difference value is smaller than a first preset threshold value and the maximum power spectrum value is smaller than a second preset threshold value, determining that the spectrum envelope type corresponding to the high-frequency signal is a first type; and if the first difference value is smaller than a first preset threshold value and the maximum power spectrum value is larger than a second preset threshold value, determining that the spectrum envelope type corresponding to the high-frequency signal is a second type. The first type and the second type each describe a corresponding spectral envelope characteristic.
In an implementation manner under this embodiment, referring to fig. 3, the preset spectral envelope types include: type "0", a low-energy flat type; and type "1", a high-energy flat type. If the first difference is smaller than the first predetermined threshold C1 and the maximum power spectrum value is smaller than the second predetermined threshold C2, it may be determined that the spectrum envelope type corresponding to the high-frequency signal is the first type, namely the low-energy flat type "0". If the first difference is smaller than the first predetermined threshold C1 and the maximum power spectrum value is larger than the second predetermined threshold C2, it may be determined that the spectrum envelope type corresponding to the high-frequency signal is the second type, namely the high-energy flat type "1".
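The two threshold tests can be sketched as follows. This is a minimal illustration in which the threshold values passed as `c1` and `c2` are chosen by the caller; the application gives no numeric values for C1 and C2 and does not state what happens when a value equals a threshold, so the sketch returns `None` in every case not covered by the two rules.

```python
def classify_flat_envelope(power_values, c1, c2):
    """Return 0 (low-energy flat), 1 (high-energy flat), or None if neither rule applies."""
    xavg = sum(power_values) / len(power_values)  # average power spectrum value
    xmax = max(power_values)                      # maximum power spectrum value
    first_difference = xmax - xavg
    if first_difference < c1:
        if xmax < c2:
            return 0
        if xmax > c2:
            return 1
    return None  # not flat, or a boundary case the text leaves unspecified
```

A return value of `None` corresponds to the case in which the first difference is not below C1, where the envelope type is instead chosen by comparison with preset target values.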
In one embodiment, the spectral feature parameters include a spectral envelope type; the determining the spectral characteristic parameter corresponding to the high-frequency signal according to the first difference value includes: if the first difference value is larger than a first preset threshold value, normalizing the power spectrum value of each frequency point to obtain a normalization value corresponding to each frequency point; acquiring at least one preset target value, wherein each target value corresponds to a preset spectrum envelope type; calculating a mean square error value of the normalization value corresponding to each frequency point and each target value; and determining the preset spectrum envelope type corresponding to the target value corresponding to the minimum mean square error value as the spectrum envelope type of the high-frequency signal.
In an implementation manner under this embodiment, referring to fig. 3, the preset spectral envelope types include: type "2", an energy convex type; type "3", an energy concave type; type "4", a gradually rising energy type; type "5", a gradually falling energy type; type "6", a step type with high energy at the front and low energy at the back; and type "7", a step type with low energy at the front and high energy at the back.
Each target value corresponds to a preset spectrum envelope type, and the target value is a preset value. In one example, the target value may be z(i), i ∈ [N1, N2], where i is the sequence number of the frequency point and z(i) is smaller than 1; for example, if N2 − N1 + 1 is equal to 9, the target value z(i) corresponding to type "2" may be set to 000111000.
And calculating a mean square error value of the normalization value corresponding to each frequency point and each target value, and accurately determining the preset spectrum envelope type with the closest spectrum distribution characteristic of the high-frequency signal based on the minimum mean square error value. For example, if the mean square error value of the target value corresponding to the type "2" and the normalized value corresponding to each frequency point is the smallest compared to the types "3" to "7", it can be determined that the energy convex type of the type "2" is the spectrum envelope type of the high-frequency signal.
In one embodiment, the normalizing the power spectrum value of each frequency point to obtain a normalized value corresponding to each frequency point includes: performing difference processing on the power spectrum value of each frequency point and the average value respectively to obtain a second difference value corresponding to each frequency point; calculating a square value of a second difference value corresponding to each frequency point, and calculating an average value of the square values to obtain a normalized score; and dividing the second difference value corresponding to each frequency point by the normalized score to obtain a normalized value corresponding to each frequency point.
In this embodiment, the normalization processing may specifically be carried out based on the formulas $y(i) = \frac{x(i) - \mathrm{xavg}}{\mathrm{std}}$ and $\mathrm{std} = \frac{1}{N2 - N1 + 1} \sum_{i=N1}^{N2} \left(x(i) - \mathrm{xavg}\right)^2$ to obtain a normalization value y(i) corresponding to each frequency point i, wherein N2 − N1 + 1 is the total number of frequency points, N1 to N2 is the range of frequency point numbers, xavg is the average value, x(i) is the power spectrum value of each frequency point i, x(i) − xavg is the second difference value corresponding to each frequency point, and std is the normalized score (the average value of the square values).
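The normalization and the minimum mean square error selection can be sketched together. This is a minimal illustration that follows the wording above, in which the normalized score is the average of the squared second differences; the target for type "2" is the 000111000 example from the text, while the target listed for type "3" and all function names are hypothetical.

```python
def normalize(power_values):
    """Subtract the average, then divide by the mean of the squared differences."""
    n = len(power_values)
    xavg = sum(power_values) / n
    diffs = [x - xavg for x in power_values]  # second difference per frequency point
    std = sum(d * d for d in diffs) / n       # "normalized score" as described in the text
    return [d / std for d in diffs]


# Target values per preset envelope type, for 9 frequency points.
TARGETS = {
    2: [0, 0, 0, 1, 1, 1, 0, 0, 0],  # energy convex type, example from the text
    3: [1, 1, 1, 0, 0, 0, 1, 1, 1],  # energy concave type, hypothetical target
}


def match_envelope_type(power_values, targets=TARGETS):
    """Return the envelope type whose target has the smallest mean square error."""
    y = normalize(power_values)

    def mse(z):
        return sum((a - b) ** 2 for a, b in zip(y, z)) / len(y)

    return min(targets, key=lambda t: mse(targets[t]))
```

The sketch assumes the power spectrum values are not all equal, which holds on this branch because it is only taken when the first difference exceeds the first preset threshold.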
In step S330, the low frequency signal is subjected to audio encoding processing, and low frequency encoded data corresponding to the low frequency signal is generated.
The low frequency signal can be encoded by a traditional speech encoder (which can be a CELP, SILK, AAC, etc. encoder) to generate low frequency encoded data.
In step S340, the spectral feature parameters and the low-frequency encoded data are transmitted to the receiving end, so that the receiving end determines an audio prediction network matched with the spectral feature parameters, and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data.
When the spectral feature parameters and the low-frequency encoded data are sent to the receiving end, they may be combined into an encoded code stream that is sent to the receiving end, and the bit rate of this code stream can be extremely low. The receiving end may determine the audio prediction network matched with the spectral feature parameters based on the steps in the embodiment shown in fig. 2, and generate an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data.
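One possible way to combine the two parts into a code stream is sketched below. The layout, a single header byte whose lowest 3 bits carry the spectral envelope type followed by the low-frequency encoded data, is purely hypothetical; the application does not define the code stream format, and the function names are illustrative.

```python
def pack_code_stream(envelope_type, low_freq_payload):
    """Prefix the low-frequency encoded data with a header byte holding the type."""
    if not 0 <= envelope_type <= 7:
        raise ValueError("envelope type must fit in 3 bits")
    return bytes([envelope_type]) + bytes(low_freq_payload)


def parse_code_stream(stream):
    """Split a code stream back into (envelope type, low-frequency encoded data)."""
    if len(stream) < 1:
        raise ValueError("empty code stream")
    return stream[0] & 0x07, bytes(stream[1:])
```

Under this assumed layout the side information for the high band costs a single byte per packet, which illustrates why the overall bit rate can stay close to that of the low-frequency encoded data alone.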
Based on the method described in the above embodiments, a further detailed description is given below with reference to an example application scenario. The terms used here have the same meanings as in the foregoing embodiments, to which reference may be made for details. Fig. 5 and fig. 6 show the audio processing flow in this application scenario, in which the foregoing embodiments of the present application are applied to audio processing.
First, referring to fig. 5, an encoding process in an audio processing process, which may include steps S410 to S450, is performed at the acquisition side.
In step S410, a target audio signal is input: specifically, the acquisition end may acquire a target audio signal generated by analog-to-digital conversion of the sound signal.
In step S420, QMF decomposition: specifically, the target audio signal is decomposed to generate a high-frequency signal and a low-frequency signal, which includes performing quadrature mirror decomposition filtering processing on the target audio signal to generate the high-frequency signal and the low-frequency signal. The target audio signal may be subjected to quadrature mirror decomposition filtering by a Quadrature Mirror Filter (QMF) bank, so as to generate the high-frequency signal and the low-frequency signal.
In step S430, spectral feature extraction: specifically, feature extraction processing is performed on the high-frequency signal to obtain a spectrum feature parameter corresponding to the high-frequency signal.
The method for extracting the characteristics of the high-frequency signal to obtain the spectrum characteristic parameters corresponding to the high-frequency signal comprises the following steps: carrying out frequency domain conversion processing on the high-frequency signal to obtain a frequency domain signal; calculating the power spectrum value of each frequency point in the frequency domain signal; and performing characteristic extraction processing based on the power spectrum value of each frequency point to obtain a spectrum characteristic parameter for describing the spectrum distribution characteristic of the high-frequency signal.
Performing feature extraction processing based on the power spectrum value of each frequency point to obtain a spectrum feature parameter describing the spectrum distribution feature of the high-frequency signal, including: calculating the average value of the power spectrum values of the frequency points, and determining the maximum power spectrum value in the power spectrum values of the frequency points; performing difference processing on the maximum power spectrum value and the average value to obtain a first difference value; and determining the spectral characteristic parameter corresponding to the high-frequency signal according to the first difference.
The spectral feature parameters comprise a spectral envelope type; specifically, referring to fig. 3, the preset spectral envelope types include: type "0", a low-energy flat type; type "1", a high-energy flat type; type "2", an energy convex type; type "3", an energy concave type; type "4", a gradually rising energy type; type "5", a gradually falling energy type; type "6", a step type with high energy at the front and low energy at the back; and type "7", a step type with low energy at the front and high energy at the back.
Type "0" (low-energy flat type) and type "1" (high-energy flat type) are determined as follows. The determining the spectral characteristic parameter corresponding to the high-frequency signal according to the first difference value includes: if the first difference value is smaller than a first preset threshold value and the maximum power spectrum value is smaller than a second preset threshold value, determining that the spectrum envelope type corresponding to the high-frequency signal is a first type; and if the first difference value is smaller than the first preset threshold value and the maximum power spectrum value is larger than the second preset threshold value, determining that the spectrum envelope type corresponding to the high-frequency signal is a second type. That is, if the first difference is smaller than the first predetermined threshold C1 and the maximum power spectrum value is smaller than the second predetermined threshold C2, it may be determined that the spectrum envelope type corresponding to the high-frequency signal is the first type, namely the low-energy flat type "0". If the first difference is smaller than the first predetermined threshold C1 and the maximum power spectrum value is larger than the second predetermined threshold C2, it may be determined that the spectrum envelope type corresponding to the high-frequency signal is the second type, namely the high-energy flat type "1".
Types "2" to "7" are determined as follows. The determining the spectral characteristic parameter corresponding to the high-frequency signal according to the first difference value includes: if the first difference value is larger than a first preset threshold value, normalizing the power spectrum value of each frequency point to obtain a normalization value corresponding to each frequency point; acquiring at least one preset target value, wherein each target value corresponds to a preset spectrum envelope type; calculating a mean square error value of the normalization value corresponding to each frequency point and each target value; and determining the preset spectrum envelope type corresponding to the target value corresponding to the minimum mean square error value as the spectrum envelope type of the high-frequency signal.
Each target value corresponds to a preset spectrum envelope type, and the target value is a preset value. The target value may be z (i), i ∈ [ N1, N2], i is the sequence number of the frequency bin, z (i) is smaller than 1, for example, if N2-N1+1 is equal to 9, the target value z (i) corresponding to type "2" may be set to 000111000. And calculating a mean square error value of the normalization value corresponding to each frequency point and each target value, and accurately determining the preset spectrum envelope type with the closest spectrum distribution characteristic of the high-frequency signal based on the minimum mean square error value. For example, if the mean square error value of the target value corresponding to the type "2" and the normalized value corresponding to each frequency point is the smallest compared to the types "3" to "7", it can be determined that the energy convex type of the type "2" is the spectrum envelope type of the high-frequency signal.
The normalizing the power spectrum value of each frequency point to obtain a normalized value corresponding to each frequency point comprises: performing difference processing on the power spectrum value of each frequency point and the average value respectively to obtain a second difference value corresponding to each frequency point; calculating a square value of the second difference value corresponding to each frequency point, and calculating an average value of the square values to obtain a normalized score; and dividing the second difference value corresponding to each frequency point by the normalized score to obtain a normalized value corresponding to each frequency point. In this embodiment, the normalization processing may specifically be carried out based on the formulas $y(i) = \frac{x(i) - \mathrm{xavg}}{\mathrm{std}}$ and $\mathrm{std} = \frac{1}{N2 - N1 + 1} \sum_{i=N1}^{N2} \left(x(i) - \mathrm{xavg}\right)^2$ to obtain a normalization value y(i) corresponding to each frequency point i, wherein N2 − N1 + 1 is the total number of frequency points, N1 to N2 is the range of frequency point numbers, xavg is the average value, x(i) is the power spectrum value of each frequency point i, x(i) − xavg is the second difference value corresponding to each frequency point, and std is the normalized score (the average value of the square values).
In step S440, low frequency speech coding: specifically, the low-frequency signal is subjected to audio coding processing, and low-frequency coded data corresponding to the low-frequency signal is generated. The low frequency signal can be encoded by a traditional speech encoder (which can be a CELP, SILK, AAC, etc. encoder) to generate low frequency encoded data.
In step S450, data output: and transmitting the frequency spectrum characteristic parameters and the low-frequency coded data to a receiving end. The frequency spectrum characteristic parameters and the low-frequency coded data can be packaged together to form a coded code stream which is sent to a receiving end.
Further, referring to fig. 6, the decoding process in the audio processing process is performed at the receiving end, and the process may include steps S510 to S560.
In step S510, a code stream is input: specifically, a code stream sent by an acquisition end is received, wherein the code stream comprises a frequency spectrum characteristic parameter of a high-frequency signal and low-frequency encoding data of a low-frequency signal. Namely, receiving the spectral characteristic parameters of the high-frequency signal and the low-frequency coded data of the low-frequency signal, wherein the high-frequency signal and the low-frequency signal are generated by decomposing the target audio signal.
In step S520, code stream parsing: specifically, the received code stream is analyzed, and the spectrum characteristic parameters of the high-frequency signals in the code stream and the low-frequency encoding data of the low-frequency signals are analyzed.
In step S530, low frequency speech decoding: specifically, the low frequency encoded data is decoded to generate a decoded low frequency signal. The receiving end can decode the low-frequency encoded data through a conventional speech decoder to generate the decoded low-frequency signal.
In step S540, network matching: specifically, prediction network matching processing is performed based on the spectral feature parameters, so as to obtain an audio prediction network matched with the spectral feature parameters.
The spectral feature parameters comprise a spectral envelope type; performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters, including: acquiring network information of at least one preset audio prediction network, wherein each network information corresponds to a preset spectrum envelope type; determining network information corresponding to a preset spectrum envelope type matched with the spectrum envelope type to obtain target network information; and determining a preset audio prediction network corresponding to the target network information as an audio prediction network matched with the spectral characteristic parameters.
In step S550, the prediction processing: specifically, audio prediction processing is performed on the basis of the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal.
In step S550, performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal includes: step S551, extracting the spectral features of the decoded low-frequency signal to obtain low-frequency spectrum information; step S552, performing audio prediction processing based on the low-frequency spectrum information by using the audio prediction network to obtain predicted spectrum information; and step S553, generating a predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information.
Extracting the spectral features of the decoded low-frequency signal to obtain the low-frequency spectrum information includes: performing modified discrete cosine transform processing on the decoded low-frequency signal to obtain the low-frequency spectrum information. Generating the predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information includes: performing inverse modified discrete cosine transform processing on the predicted spectrum information to generate the predicted high-frequency signal corresponding to the high-frequency signal.
The low-frequency spectrum information may be obtained by performing Modified Discrete Cosine Transform (MDCT) processing on the decoded low-frequency signal. Then, the predicted spectrum information output by the audio prediction network may be subjected to Inverse Modified Discrete Cosine Transform (IMDCT) processing, thereby generating the predicted high-frequency signal.
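For illustration, the MDCT/IMDCT pair can be sketched as follows. The description does not state a frame length, window, or scaling; the sine window, the hop of half a frame, and the 2/N inverse scaling used here are assumptions chosen so that overlap-adding consecutive frames cancels the time-domain aliasing. The direct O(N^2) sums favor readability over speed.

```python
import math
from typing import List, Sequence


def sine_window(n_coeffs: int) -> List[float]:
    """Window of length 2N satisfying w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coeffs)) for n in range(2 * n_coeffs)]


def mdct(frame: Sequence[float]) -> List[float]:
    """Map 2N (already windowed) samples to N MDCT coefficients."""
    n_coeffs = len(frame) // 2
    shift = 0.5 + n_coeffs / 2
    return [sum(frame[n] * math.cos(math.pi / n_coeffs * (n + shift) * (k + 0.5))
                for n in range(2 * n_coeffs))
            for k in range(n_coeffs)]


def imdct(coeffs: Sequence[float]) -> List[float]:
    """Map N coefficients back to 2N time-aliased samples."""
    n_coeffs = len(coeffs)
    shift = 0.5 + n_coeffs / 2
    return [(2.0 / n_coeffs) * sum(coeffs[k] * math.cos(math.pi / n_coeffs * (n + shift) * (k + 0.5))
                                   for k in range(n_coeffs))
            for n in range(2 * n_coeffs)]


def analyze(signal: Sequence[float], n_coeffs: int) -> List[List[float]]:
    """Split into 50%-overlapping windowed frames and transform each one."""
    window = sine_window(n_coeffs)
    tail = (-len(signal)) % n_coeffs
    # Zero-pad so that every input sample is covered by exactly two frames.
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * (tail + n_coeffs)
    frames = []
    for start in range(0, len(padded) - n_coeffs, n_coeffs):
        block = padded[start:start + 2 * n_coeffs]
        frames.append(mdct([w * x for w, x in zip(window, block)]))
    return frames


def synthesize(frames: Sequence[Sequence[float]], n_coeffs: int, length: int) -> List[float]:
    """Inverse-transform, window again, and overlap-add the frames."""
    window = sine_window(n_coeffs)
    out = [0.0] * ((len(frames) + 1) * n_coeffs)
    for i, coeffs in enumerate(frames):
        for n, (w, y) in enumerate(zip(window, imdct(coeffs))):
            out[i * n_coeffs + n] += w * y
    return out[n_coeffs:n_coeffs + length]


signal = [math.sin(0.3 * n) + 0.1 * n for n in range(10)]
restored = synthesize(analyze(signal, 4), 4, len(signal))
```

In the scheme described above, `analyze` would be applied to the decoded low-frequency signal and `synthesize` to the spectrum predicted by the network; the round trip on a single signal here only checks that the transform pair is consistent.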
In step S560, QMF synthesis: specifically, an audio output signal corresponding to the target audio signal is generated according to the predicted high-frequency signal and the decoded low-frequency signal. Generating the audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal includes: performing quadrature mirror synthesis filtering processing on the predicted high-frequency signal and the decoded low-frequency signal to generate the audio output signal. That is, the predicted high-frequency signal and the decoded low-frequency signal may be passed through the synthesis stage of a Quadrature Mirror Filter (QMF) bank to generate a full-band audio output signal corresponding to the target audio signal.
In this way, at least the following can be achieved: for a target audio signal, the acquisition end can describe the spectral distribution characteristics of the high-frequency signal with spectral feature parameters of small data size, so that only the spectral feature parameters and the low-frequency encoded data of the low-frequency signal need to be transmitted, which effectively reduces the transmission bandwidth. Meanwhile, the receiving end selects a matched audio prediction network based on the spectral feature parameters to restore the high-frequency signal and generate a predicted high-frequency signal.
In order to better implement the audio processing method provided by the embodiments of the present application, an embodiment of the present application further provides an audio processing apparatus based on the audio processing method. The terms used have the same meanings as in the audio processing method, and implementation details can be found in the description of the method embodiments. Fig. 7 shows a block diagram of an audio processing apparatus according to an embodiment of the application. Fig. 8 shows a block diagram of an audio processing apparatus according to another embodiment of the present application.
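As a small illustration of the analysis/synthesis relationship, the sketch below uses the two-tap Haar pair, the shortest QMF pair with perfect reconstruction, written in polyphase form. This filter choice is an assumption made for brevity; the description does not specify the filters, and practical codecs typically use much longer prototype filters.

```python
import math
from typing import List, Sequence, Tuple

_SCALE = 1.0 / math.sqrt(2.0)


def qmf_analysis(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split a signal into low and high bands, each decimated by two."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    low = [(signal[i] + signal[i + 1]) * _SCALE for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * _SCALE for i in range(0, len(signal), 2)]
    return low, high


def qmf_synthesis(low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Recombine the two half-rate bands into one full-band signal."""
    if len(low) != len(high):
        raise ValueError("bands must have equal length")
    out: List[float] = []
    for l, h in zip(low, high):
        out.append((l + h) * _SCALE)
        out.append((l - h) * _SCALE)
    return out


full_band = [1.0, 2.0, 3.0, 5.0]
low_band, high_band = qmf_analysis(full_band)
rebuilt = qmf_synthesis(low_band, high_band)
```

At the receiving end, `low_band` would be the decoded low-frequency signal and `high_band` the predicted high-frequency signal, so the output matches the original only to the extent that the prediction is accurate.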
As shown in fig. 7, the audio processing apparatus 600 may include a receiving module 610, a decoding module 620, a matching module 630, a predicting module 640, and an output module 650, and the audio processing apparatus 600 may be applied to a device corresponding to a receiving end of audio.
The receiving module 610 may be configured to receive a spectral feature parameter of a high-frequency signal and low-frequency encoded data of a low-frequency signal, where the high-frequency signal and the low-frequency signal belong to a target audio signal; the decoding module 620 may be configured to perform decoding processing on the low-frequency encoded data to generate a decoded low-frequency signal; the matching module 630 may be configured to perform prediction network matching processing based on the spectral feature parameters, so as to obtain an audio prediction network matched with the spectral feature parameters; the prediction module 640 may be configured to perform an audio prediction process on the decoded low-frequency signal based on the audio prediction network to generate a predicted high-frequency signal corresponding to the high-frequency signal; the output module 650 may be configured to generate an audio output signal corresponding to the target audio signal according to the predicted high frequency signal and the decoded low frequency signal.
In some embodiments of the present application, the spectral feature parameters include a spectral envelope type; the matching module 630 includes: the information acquisition unit is used for acquiring network information of at least one preset audio prediction network, and each network information corresponds to a preset spectrum envelope type; the network matching unit is used for determining network information corresponding to a preset spectrum envelope type matched with the spectrum envelope type to obtain target network information; and the network determining unit is used for determining a preset audio prediction network corresponding to the target network information as the audio prediction network matched with the spectral characteristic parameters.
In some embodiments of the present application, the prediction module 640 includes: the extraction processing unit is used for extracting the frequency spectrum characteristics of the decoded low-frequency signal to obtain low-frequency spectrum information; the information prediction unit is used for performing audio prediction processing on the basis of the low-frequency spectrum information by adopting the audio prediction network to obtain predicted spectrum information; and a signal generating unit for generating a predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information.
In some embodiments of the present application, the extraction processing unit is configured to: perform modified discrete cosine transform processing on the decoded low-frequency signal to obtain the low-frequency spectrum information; the signal generation unit is configured to: perform inverse modified discrete cosine transform processing on the predicted spectrum information to generate a predicted high-frequency signal corresponding to the high-frequency signal.
In some embodiments of the present application, the output module 650 is configured to: perform quadrature mirror synthesis filtering processing on the predicted high-frequency signal and the decoded low-frequency signal to generate the audio output signal.
In this way, based on the audio processing apparatus 600, for a target audio signal, the spectral distribution characteristics of the high-frequency signal can be described by spectral feature parameters with a very small data size, so that only the spectral feature parameters and the low-frequency encoded data of the low-frequency signal need to be received, which effectively reduces the transmission bandwidth. Meanwhile, a matched audio prediction network is selected based on the spectral feature parameters to restore the high-frequency signal and generate a predicted high-frequency signal.
As shown in fig. 8, the audio processing apparatus 700 may include a decomposition module 710, an extraction module 720, an encoding module 730, and a transmission module 740, and the audio processing apparatus 700 may be applied to a device corresponding to an audio acquisition end.
The decomposition module 710 may be configured to decompose the target audio signal to generate a high-frequency signal and a low-frequency signal; the extraction module 720 may be configured to perform feature extraction processing on the high-frequency signal to obtain spectral feature parameters corresponding to the high-frequency signal; the encoding module 730 may be configured to perform audio encoding processing on the low-frequency signal to generate low-frequency encoded data corresponding to the low-frequency signal; the transmission module 740 may be configured to send the spectral feature parameters and the low-frequency encoded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral feature parameters, and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data.
In some embodiments of the present application, the extraction module 720 includes: the frequency domain conversion unit is used for carrying out frequency domain conversion processing on the high-frequency signal to obtain a frequency domain signal; the power spectrum value calculating unit is used for calculating the power spectrum value of each frequency point in the frequency domain signal; and the frequency spectrum characteristic parameter acquisition unit is used for performing characteristic extraction processing on the basis of the power spectrum value of each frequency point to acquire frequency spectrum characteristic parameters describing the frequency spectrum distribution characteristics of the high-frequency signals.
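A minimal sketch of the first two units follows, assuming that the frequency domain conversion is a plain discrete Fourier transform and that the power spectrum value of a frequency point is the squared magnitude of its coefficient. Both are assumptions; this passage does not name the transform or say whether a logarithmic scale is used.

```python
import cmath
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Squared DFT magnitude for the non-negative frequency points of a real frame."""
    size = len(frame)
    points = size // 2 + 1  # a real input has a conjugate-symmetric spectrum
    spectrum = []
    for k in range(points):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / size)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# One period of a cosine over four samples puts all its energy in point 1.
example_power = power_spectrum([1.0, 0.0, -1.0, 0.0])
```

The direct sum costs O(n^2) per frame; a production implementation would normally use a fast Fourier transform instead.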
In some embodiments of the present application, the spectrum characteristic parameter obtaining unit includes: the element calculating subunit is used for calculating the average value of the power spectrum values of the frequency points and determining the maximum power spectrum value in the power spectrum values of the frequency points; the difference processing subunit is configured to perform difference processing on the maximum power spectrum value and the average value to obtain a first difference value; and the spectral characteristic parameter determining subunit is configured to determine a spectral characteristic parameter corresponding to the high-frequency signal according to the first difference.
In some embodiments of the present application, the spectral feature parameters include a spectral envelope type; the spectral feature parameter determining subunit is configured to: if the first difference value is smaller than a first preset threshold value and the maximum power spectrum value is smaller than a second preset threshold value, determining that the spectrum envelope type corresponding to the high-frequency signal is a first type; and if the first difference value is smaller than a first preset threshold value and the maximum power spectrum value is larger than a second preset threshold value, determining that the spectrum envelope type corresponding to the high-frequency signal is a second type.
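The decision for a flat spectrum can be sketched as below. The threshold values, the string labels, and the function name are hypothetical. The text does not say what happens when a compared value equals its threshold, so this sketch returns `None` in those cases as well as when the first difference exceeds the first threshold.

```python
from typing import Optional, Sequence


def classify_flat_envelope(power: Sequence[float],
                           first_threshold: float,
                           second_threshold: float) -> Optional[str]:
    """Return "first" or "second" for a flat spectrum, else None."""
    average = sum(power) / len(power)
    peak = max(power)
    first_difference = peak - average
    if first_difference >= first_threshold:
        return None  # not flat (or exactly at the threshold): handled elsewhere
    if peak < second_threshold:
        return "first"   # flat and weak
    if peak > second_threshold:
        return "second"  # flat and strong
    return None          # peak equal to the threshold is left unspecified


quiet = classify_flat_envelope([1.0, 1.2, 0.9, 1.1], 0.5, 2.0)
loud = classify_flat_envelope([10.0, 10.2, 9.9, 10.1], 0.5, 2.0)
peaky = classify_flat_envelope([0.0, 10.0, 0.0, 0.0], 0.5, 2.0)
```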
In some embodiments of the present application, the spectral feature parameters include a spectral envelope type; the spectral feature parameter determining subunit is configured to: if the first difference value is larger than a first preset threshold value, normalizing the power spectrum value of each frequency point to obtain a normalization value corresponding to each frequency point; acquiring at least one preset target value, wherein each target value corresponds to a preset spectrum envelope type; calculating a mean square error value of the normalization value corresponding to each frequency point and each target value; and determining the preset spectrum envelope type corresponding to the target value corresponding to the minimum mean square error value as the spectrum envelope type of the high-frequency signal.
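The selection by minimum mean square error can be sketched as follows. This sketch assumes that each preset "target value" is a template holding one value per frequency point, which is one possible reading of the text; the template contents and the integer type labels are hypothetical.

```python
from typing import Dict, Sequence


def mean_square_error(values: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between two equally long sequences."""
    if len(values) != len(target):
        raise ValueError("sequences must have the same length")
    return sum((v - t) ** 2 for v, t in zip(values, target)) / len(values)


def match_envelope_type(normalized: Sequence[float],
                        targets: Dict[int, Sequence[float]]) -> int:
    """Return the preset type whose template is closest to the normalized values."""
    return min(targets, key=lambda kind: mean_square_error(normalized, targets[kind]))


# Hypothetical templates: type 2 falls with frequency, type 3 rises.
templates = {2: [1.0, 0.0, -1.0], 3: [-1.0, 0.0, 1.0]}
chosen = match_envelope_type([0.9, 0.1, -1.0], templates)
```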
In some embodiments of the present application, the spectral feature parameter determining subunit is configured to: performing difference processing on the power spectrum value of each frequency point and the average value respectively to obtain a second difference value corresponding to each frequency point; calculating a square value of a second difference value corresponding to each frequency point, and calculating an average value of the square values to obtain a normalized score; and dividing the second difference value corresponding to each frequency point by the normalized score to obtain a normalized value corresponding to each frequency point.
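The normalization can be sketched literally as stated. Note that, read this way, the divisor is the mean of the squared second differences (the variance) and not its square root; the sketch follows the text as written. The error raised for a perfectly flat spectrum is an added safeguard, not something the description specifies.

```python
from typing import List, Sequence


def normalize_power(power: Sequence[float]) -> List[float]:
    """Subtract the average, then divide by the mean of the squared differences."""
    average = sum(power) / len(power)
    second_differences = [p - average for p in power]
    normalized_score = sum(d * d for d in second_differences) / len(power)
    if normalized_score == 0.0:
        raise ValueError("flat spectrum: normalized score is zero")
    return [d / normalized_score for d in second_differences]


normalized_example = normalize_power([0.0, 4.0])
```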
In some embodiments of the present application, the decomposition module 710 is configured to: perform quadrature mirror decomposition filtering processing on the target audio signal to generate the high-frequency signal and the low-frequency signal.
In this way, based on the audio processing apparatus 700, for a target audio signal, the spectral distribution characteristics of the high-frequency signal can be described by spectral feature parameters with a very small data size, so that only the spectral feature parameters and the low-frequency encoded data of the low-frequency signal need to be transmitted, which effectively reduces the transmission bandwidth. Meanwhile, the receiving end can select a matched audio prediction network based on the spectral feature parameters to restore the high-frequency signal and generate a predicted high-frequency signal.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
In addition, an embodiment of the present application further provides an electronic device, which may be a terminal or a server. Fig. 9 shows a schematic structural diagram of the electronic device according to the embodiment of the present application. Specifically:
the electronic device may include components such as a processor 801 of one or more processing cores, memory 802 of one or more computer-readable storage media, a power supply 803, and an input unit 804. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 9 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 801 is the control center of the electronic device. It connects the various parts of the entire electronic device using various interfaces and lines, and performs the functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 802 and calling data stored in the memory 802, thereby monitoring the electronic device as a whole. Optionally, the processor 801 may include one or more processing cores; preferably, the processor 801 may integrate an application processor, which mainly handles the operating system, user interface, application programs, and the like, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may also not be integrated into the processor 801.
The memory 802 may be used to store software programs and modules, and the processor 801 executes various functional applications and data processing by running the software programs and modules stored in the memory 802. The memory 802 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to use of the electronic device, and the like. Further, the memory 802 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 802 may also include a memory controller to provide the processor 801 with access to the memory 802.
The electronic device further comprises a power supply 803 for supplying power to each component, and preferably, the power supply 803 can be logically connected with the processor 801 through a power management system, so that functions of charging, discharging, power consumption management and the like can be managed through the power management system. The power supply 803 may also include one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and any like components.
The electronic device may further include an input unit 804, and the input unit 804 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 801 in the electronic device loads an executable file corresponding to one or more processes of the computer program into the memory 802 according to the following instructions, and the processor 801 executes the computer program stored in the memory 802, thereby implementing various functions in the foregoing embodiments of the present application.
For example, the processor 801 may perform: receiving spectral feature parameters of a high-frequency signal and low-frequency encoded data of a low-frequency signal, wherein the high-frequency signal and the low-frequency signal belong to a target audio signal; decoding the low-frequency encoded data to generate a decoded low-frequency signal; performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters; performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal; and generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
As another example, the processor 801 may perform: decomposing the target audio signal to generate a high-frequency signal and a low-frequency signal; performing feature extraction processing on the high-frequency signal to obtain spectral feature parameters corresponding to the high-frequency signal; performing audio encoding processing on the low-frequency signal to generate low-frequency encoded data corresponding to the low-frequency signal; and sending the spectral feature parameters and the low-frequency encoded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral feature parameters, and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency encoded data.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by a computer program, which may be stored in a computer-readable storage medium and loaded and executed by a processor, or by related hardware controlled by the computer program.
To this end, the present application further provides a computer-readable storage medium, in which a computer program is stored, where the computer program can be loaded by a processor to execute the steps in any one of the methods provided by the present application.
The computer-readable storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the computer program stored in the computer-readable storage medium can execute the steps in any method provided in the embodiments of the present application, the beneficial effects that can be achieved by the method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the method provided in the various alternative implementations of the above embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the embodiments that have been described above and shown in the drawings, but that various modifications and changes can be made without departing from the scope thereof.
Claims (17)
1. An audio processing method, comprising:
receiving spectral characteristic parameters of a high-frequency signal and low-frequency coded data of a low-frequency signal, wherein the high-frequency signal and the low-frequency signal belong to a target audio signal;
decoding the low-frequency coded data to generate a decoded low-frequency signal;
performing prediction network matching processing based on the frequency spectrum characteristic parameters to obtain an audio prediction network matched with the frequency spectrum characteristic parameters;
performing audio prediction processing based on the audio prediction network and the decoded low-frequency signal to generate a predicted high-frequency signal corresponding to the high-frequency signal;
and generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
2. The method of claim 1, wherein the spectral feature parameters comprise a spectral envelope type;
the performing prediction network matching processing based on the spectral feature parameters to obtain an audio prediction network matched with the spectral feature parameters comprises:
acquiring network information of at least one preset audio prediction network, wherein each network information corresponds to a preset spectrum envelope type;
determining network information corresponding to a preset spectrum envelope type matched with the spectrum envelope type to obtain target network information;
and determining a preset audio prediction network corresponding to the target network information as an audio prediction network matched with the spectral characteristic parameters.
3. The method according to claim 1, wherein the performing an audio prediction process with the decoded low-frequency signal based on the audio prediction network to generate a predicted high-frequency signal corresponding to the high-frequency signal comprises:
performing spectrum feature extraction processing on the decoded low-frequency signal to obtain low-frequency spectrum information;
performing audio prediction processing based on the low-frequency spectrum information by adopting the audio prediction network to obtain predicted spectrum information;
and generating a predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information.
4. The method according to claim 3, wherein said performing a spectral feature extraction process on the decoded low-frequency signal to obtain low-frequency spectral information comprises:
performing modified discrete cosine transform processing on the decoded low-frequency signal to obtain the low-frequency spectrum information;
the generating a predicted high-frequency signal corresponding to the high-frequency signal based on the predicted spectrum information comprises:
performing inverse modified discrete cosine transform processing on the predicted spectrum information to generate a predicted high-frequency signal corresponding to the high-frequency signal.
5. The method according to any one of claims 1 to 4, wherein generating an audio output signal corresponding to the target audio signal based on the predicted high frequency signal and the decoded low frequency signal comprises:
performing quadrature mirror synthesis filtering processing on the predicted high-frequency signal and the decoded low-frequency signal to generate the audio output signal.
6. An audio processing method, comprising:
decomposing the target audio signal to generate a high-frequency signal and a low-frequency signal;
carrying out feature extraction processing on the high-frequency signal to obtain a frequency spectrum feature parameter corresponding to the high-frequency signal;
carrying out audio coding processing on the low-frequency signal to generate low-frequency coded data corresponding to the low-frequency signal;
and sending the spectral characteristic parameters and the low-frequency coded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral characteristic parameters, and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency coded data.
7. The method according to claim 6, wherein the performing feature extraction processing on the high-frequency signal to obtain a spectral feature parameter corresponding to the high-frequency signal comprises:
carrying out frequency domain conversion processing on the high-frequency signal to obtain a frequency domain signal;
calculating the power spectrum value of each frequency point in the frequency domain signal;
and performing characteristic extraction processing based on the power spectrum value of each frequency point to obtain a spectrum characteristic parameter for describing the spectrum distribution characteristic of the high-frequency signal.
8. The method according to claim 7, wherein the performing feature extraction processing based on the power spectrum value of each frequency point to obtain a spectrum feature parameter describing a spectrum distribution feature of the high-frequency signal comprises:
calculating the average value of the power spectrum values of the frequency points, and determining the maximum power spectrum value in the power spectrum values of the frequency points;
performing difference processing on the maximum power spectrum value and the average value to obtain a first difference value;
and determining the spectral characteristic parameter corresponding to the high-frequency signal according to the first difference.
9. The method of claim 8, wherein the spectral feature parameters comprise a spectral envelope type;
the determining the spectral characteristic parameter corresponding to the high-frequency signal according to the first difference value includes:
if the first difference value is smaller than a first preset threshold value and the maximum power spectrum value is smaller than a second preset threshold value, determining that the spectrum envelope type corresponding to the high-frequency signal is a first type;
and if the first difference value is smaller than a first preset threshold value and the maximum power spectrum value is larger than a second preset threshold value, determining that the spectrum envelope type corresponding to the high-frequency signal is a second type.
10. The method of claim 8, wherein the spectral feature parameters comprise a spectral envelope type;
the determining the spectral characteristic parameter corresponding to the high-frequency signal according to the first difference value includes:
if the first difference value is larger than a first preset threshold value, normalizing the power spectrum value of each frequency point to obtain a normalization value corresponding to each frequency point;
acquiring at least one preset target value, wherein each target value corresponds to a preset spectrum envelope type;
calculating a mean square error value of the normalization value corresponding to each frequency point and each target value;
and determining the preset spectrum envelope type corresponding to the target value corresponding to the minimum mean square error value as the spectrum envelope type of the high-frequency signal.
11. The method according to claim 10, wherein the normalizing the power spectrum value of each frequency point to obtain a normalized value corresponding to each frequency point comprises:
performing difference processing on the power spectrum value of each frequency point and the average value respectively to obtain a second difference value corresponding to each frequency point;
calculating a square value of a second difference value corresponding to each frequency point, and calculating an average value of the square values to obtain a normalized score;
and dividing the second difference value corresponding to each frequency point by the normalized score to obtain a normalized value corresponding to each frequency point.
12. The method according to any one of claims 6 to 11, wherein the decomposing the target audio signal to generate the high frequency signal and the low frequency signal comprises:
performing quadrature mirror decomposition filtering processing on the target audio signal to generate the high-frequency signal and the low-frequency signal.
13. An audio processing apparatus, comprising:
the receiving module is used for receiving the frequency spectrum characteristic parameters of the high-frequency signals and the low-frequency coded data of the low-frequency signals, wherein the high-frequency signals and the low-frequency signals belong to target audio signals;
the decoding module is used for decoding the low-frequency coded data to generate a decoded low-frequency signal;
the matching module is used for performing prediction network matching processing based on the spectrum characteristic parameters to obtain an audio prediction network matched with the spectrum characteristic parameters;
the prediction module is used for carrying out audio prediction processing on the basis of the audio prediction network and the decoded low-frequency signal so as to generate a predicted high-frequency signal corresponding to the high-frequency signal;
and the output module is used for generating an audio output signal corresponding to the target audio signal according to the predicted high-frequency signal and the decoded low-frequency signal.
14. An audio processing apparatus, comprising:
a decomposition module, configured to decompose a target audio signal to generate a high-frequency signal and a low-frequency signal;
an extraction module, configured to perform feature extraction processing on the high-frequency signal to obtain spectral characteristic parameters corresponding to the high-frequency signal;
a coding module, configured to perform audio coding processing on the low-frequency signal to generate low-frequency coded data corresponding to the low-frequency signal;
and a transmission module, configured to transmit the spectral characteristic parameters and the low-frequency coded data to a receiving end, so that the receiving end determines an audio prediction network matched with the spectral characteristic parameters and generates an audio output signal based on the audio prediction network and a decoded low-frequency signal obtained by decoding the low-frequency coded data.
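The data flow between the apparatus of claim 14 (sending side) and the apparatus of claim 13 (receiving side) can be sketched as below. Everything concrete here is a hypothetical stand-in: the spectral feature is reduced to a single mean-power number, the encoder and decoder are identity copies, and the "audio prediction networks" are plain functions keyed by a reference feature value. The sketch only shows which data crosses the link and where matching happens.

```python
def sender(low, high):
    """Sending side: extract a feature from the high band and encode the low band."""
    # Toy spectral characteristic parameter: mean power of the high band (assumption).
    feature = sum(h * h for h in high) / len(high)
    encoded_low = list(low)  # stand-in for a real audio encoder
    return feature, encoded_low


def match_network(feature, networks):
    """Pick the predictor whose reference feature is closest to the received one."""
    key = min(networks, key=lambda ref: abs(ref - feature))
    return networks[key]


def receiver(feature, encoded_low, networks):
    """Receiving side: decode the low band, match a predictor, predict the high band."""
    decoded_low = list(encoded_low)  # stand-in for a real audio decoder
    predict = match_network(feature, networks)
    predicted_high = predict(decoded_low)
    return decoded_low, predicted_high


# Hypothetical predictor table: reference feature value -> prediction function.
NETWORKS = {
    0.0: lambda low: [0.0 for _ in low],
    1.0: lambda low: [0.5 * v for v in low],
}
```

Only the feature and the encoded low band are transmitted; the high band itself never leaves the sender, which is where the bandwidth saving described in the application comes from.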
15. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor of a computer, causes the computer to perform the method of any one of claims 1 to 5 and 6 to 12.
16. An electronic device, comprising: a memory storing a computer program; and a processor that reads the computer program stored in the memory to perform the method of any one of claims 1 to 5 and 6 to 12.
17. A computer program product, characterized in that the computer program product comprises a computer program which, when being executed by a processor, carries out the method of any one of claims 1 to 5 and 6 to 12.
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK40070975A (en) | 2022-11-04 |
| HK40070975B (en) | 2025-09-19 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN103065631B (en) | A kind of method of speech recognition, device | |
| CN103971680B (en) | A kind of method, apparatus of speech recognition | |
| BRPI0812029B1 (en) | method of recovering hidden data, telecommunication device, data hiding device, data hiding method and upper set box | |
| CN116741193B (en) | Training method and device for voice enhancement network, storage medium and computer equipment | |
| US9449605B2 (en) | Inactive sound signal parameter estimation method and comfort noise generation method and system | |
| CN114338623A (en) | Audio processing method, device, equipment, medium and computer program product | |
| CN114550732A (en) | Coding and decoding method and related device for high-frequency audio signal | |
| CN117672254A (en) | Voice conversion method, device, computer equipment and storage medium | |
| CA2956019C (en) | Method for estimating noise in an audio signal, noise estimator, audio encoder, audio decoder, and system for transmitting audio signals | |
| CN114333861B (en) | Audio processing method, device, storage medium, equipment and product | |
| CN112767955A (en) | Audio encoding method and device, storage medium and electronic equipment | |
| RU2633097C2 (en) | Methods and devices for signal coding and decoding | |
| HK40070975A (en) | Audio processing method, device, storage medium, equipment and product | |
| CN112669857B (en) | Voice processing method, device and equipment | |
| CN117334204A (en) | Signal processing methods, devices, computer equipment, storage media and program products | |
| CN101740030A (en) | Method and device for transmitting and receiving speech signals | |
| HK40070975B (en) | Audio processing method, device, storage medium, equipment and product | |
| CN117789701A (en) | Data transmission method, model training method, device, chip and terminal | |
| CN119721071B (en) | Voice translation method, system and related device | |
| CN109273003A (en) | Voice control method and system for driving recorder | |
| HK40070387A (en) | Method for encoding and decoding high-frequency audio signal, and related apparatus | |
| HK40070387B (en) | Method for encoding and decoding high-frequency audio signal, and related apparatus | |
| CN113259063B (en) | Data processing method, data processing device, computer equipment and computer readable storage medium | |
| HK40069959A (en) | Audio processing method, device, equipment and medium | |
| CN120375836A (en) | Speech compression coding method, decoding method and related equipment based on satellite channel |