CN112992161A

CN112992161A - Audio encoding method, audio decoding method, apparatus, medium, and electronic device

Info

Publication number: CN112992161A
Application number: CN202110386567.7A
Authority: CN
Inventors: 程赫楠
Original assignee: Beijing Century TAL Education Technology Co Ltd
Current assignee: Beijing Century TAL Education Technology Co Ltd
Priority date: 2021-04-12
Filing date: 2021-04-12
Publication date: 2021-06-18

Abstract

The present invention relates to an audio encoding method, an audio decoding method, an apparatus, a medium, and an electronic device. The method includes: receiving an audio signal and separating it into a first audio signal and a second audio signal, where the frequency of the first audio signal is higher than that of the second audio signal; encoding the first audio signal in a first audio coding mode to obtain a first code stream; encoding the second audio signal in a second audio coding mode to obtain a second code stream; and transmitting the first code stream and the second code stream to a decoder over a network, so that the decoder recovers the audio signal by decoding the two code streams. By splitting the transmitted code stream from one into two, the embodiments of the invention increase fault tolerance under poor network conditions, reduce the probability of losing data packets in transit, and improve audio transmission quality, so that the decoded audio quality is better; they also make audio transmission more stable in a packet-loss environment.

Description

Audio encoding method, audio decoding method, apparatus, medium, and electronic device

Technical Field

The disclosed embodiments relate to the field of computer technologies, and in particular, to an audio encoding method, an audio decoding method, an audio encoding apparatus, an audio decoding apparatus, a computer-readable storage medium for implementing the audio encoding method and the audio decoding method, and an electronic device.

Background

In audio technology, an encoding end such as an encoder encodes an analog audio signal into a digital code stream, and a decoding end such as a decoder performs the corresponding decoding on the digital code stream transmitted from the encoding end, restoring it to the analog audio signal.

In the related art, some audio encoding techniques, such as the Opus encoder, encode an input audio signal into a single digital code stream. When network conditions are poor, audio transmission quality suffers and some data packets may be lost in transit; the decoder never receives the lost packets, so the corresponding audio information is lost and the decoded audio is degraded, for example by discontinuities.

Disclosure of Invention

In order to solve the technical problems described above or at least partially solve the technical problems, the present disclosure provides an audio encoding method, an audio decoding method, an audio encoding apparatus, an audio decoding apparatus, a computer-readable storage medium implementing the audio encoding method and the audio decoding method, and an electronic device.

In a first aspect, an embodiment of the present disclosure provides an audio encoding method, including:

receiving an audio signal, and separating the audio signal into a first audio signal and a second audio signal, wherein the frequency of the first audio signal is greater than that of the second audio signal;

encoding the first audio signal in a first audio coding mode to obtain a first code stream;

encoding the second audio signal in a second audio coding mode to obtain a second code stream, wherein the second audio coding mode is different from the first audio coding mode;

and transmitting the first code stream and the second code stream to a decoder over a network, so that the decoder recovers the audio signal by decoding the first code stream and the second code stream.

In some embodiments of the present disclosure, encoding the first audio signal in the first audio coding mode to obtain the first code stream includes:

encoding the first audio signal in the first audio coding mode to obtain a first coded signal;

and performing interval coding on the first coded signal to obtain the first code stream.

In some embodiments of the present disclosure, encoding the second audio signal in the second audio coding mode to obtain the second code stream includes:

encoding the second audio signal in the second audio coding mode to obtain a second coded signal;

and performing interval coding on the second coded signal to obtain the second code stream.

In some embodiments of the present disclosure, the first audio coding mode is encoding with a CELT encoder; and/or the second audio coding mode is encoding with a SILK encoder.

In a second aspect, an embodiment of the present disclosure provides an audio decoding method, including:

receiving a first code stream and a second code stream transmitted by an encoder, wherein the first code stream is obtained by encoding a first audio signal in a first audio coding mode, the second code stream is obtained by encoding a second audio signal in a second audio coding mode, the first audio signal and the second audio signal are obtained by separating an audio signal, and the frequency of the first audio signal is higher than that of the second audio signal;

decoding the first code stream in a first audio decoding mode to obtain the first audio signal;

decoding the second code stream in a second audio decoding mode to obtain the second audio signal;

and mixing the first audio signal and the second audio signal to recover the decoded audio signal.

In some embodiments of the present disclosure, the first code stream is obtained by performing interval coding on a first coded signal, where the first coded signal is obtained by encoding the first audio signal in the first audio coding mode;

decoding the first code stream in the first audio decoding mode to obtain the first audio signal includes:

performing interval decoding on the first code stream to obtain the first coded signal;

and decoding the first coded signal in the first audio decoding mode to obtain the first audio signal.

In some embodiments of the present disclosure, the second code stream is obtained by performing interval coding on a second coded signal, where the second coded signal is obtained by encoding the second audio signal in the second audio coding mode;

decoding the second code stream in the second audio decoding mode to obtain the second audio signal includes:

performing interval decoding on the second code stream to obtain the second coded signal;

and decoding the second coded signal in the second audio decoding mode to obtain the second audio signal.

In some embodiments of the present disclosure, the first audio decoding mode is decoding with a CELT decoder; and/or the second audio decoding mode is decoding with a SILK decoder.

In a third aspect, an embodiment of the present disclosure provides an audio encoding apparatus, including:

a signal separation module, configured to receive an audio signal and separate it into a first audio signal and a second audio signal, wherein the frequency of the first audio signal is higher than that of the second audio signal;

a first coding module, configured to encode the first audio signal in a first audio coding mode to obtain a first code stream;

a second coding module, configured to encode the second audio signal in a second audio coding mode to obtain a second code stream, wherein the second audio coding mode is different from the first audio coding mode;

and a signal sending module, configured to transmit the first code stream and the second code stream to a decoder over a network, so that the decoder recovers the audio signal by decoding the first code stream and the second code stream.

In a fourth aspect, an embodiment of the present disclosure provides an audio decoding apparatus, including:

a code stream receiving module, configured to receive a first code stream and a second code stream transmitted by an encoder, wherein the first code stream is obtained by encoding a first audio signal in a first audio coding mode, the second code stream is obtained by encoding a second audio signal in a second audio coding mode, the first audio signal and the second audio signal are obtained by separating an audio signal, and the frequency of the first audio signal is higher than that of the second audio signal;

a first decoding module, configured to decode the first code stream in a first audio decoding mode to obtain a first decoded signal;

a second decoding module, configured to decode the second code stream in a second audio decoding mode to obtain a second decoded signal;

and an audio recovery module, configured to mix the first decoded signal and the second decoded signal to recover a decoded audio signal.

In a fifth aspect, the disclosed embodiments provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the audio encoding method or the audio decoding method according to any one of the above embodiments.

In a sixth aspect, an embodiment of the present disclosure provides an electronic device, including:

a processor; and

a memory for storing a computer program;

wherein the processor is configured to perform the steps of the audio encoding method or the audio decoding method of any of the above embodiments via execution of the computer program.

Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:

The audio encoding method, apparatus, medium, and electronic device in the embodiments of the present disclosure receive an audio signal and separate it into a first audio signal and a second audio signal, where the frequency of the first audio signal is higher than that of the second audio signal; encode the first audio signal in a first audio coding mode to obtain a first code stream; encode the second audio signal in a second audio coding mode, different from the first, to obtain a second code stream; and transmit both code streams to a decoder over a network, so that the decoder recovers the audio signal by decoding them. In this way, the original audio signal is divided into two parts that are encoded separately, the two resulting code streams are transmitted over the network, and the decoder decodes both code streams to recover the original audio signal. Because the transmitted code stream is split from one into two, fault tolerance under poor network conditions increases, the probability of losing data packets in transit is reduced, and audio transmission quality improves, so the decoded audio quality is better. Audio transmission is also more stable in a packet-loss environment.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.

FIG. 1 is a flow chart of an audio encoding method according to an embodiment of the disclosure;

FIG. 2 is a flowchart of an audio encoding method according to another embodiment of the disclosure;

FIG. 3 is a flowchart of an audio encoding method according to another embodiment of the disclosure;

FIG. 4 is a flowchart of an audio decoding method according to an embodiment of the disclosure;

FIG. 5 is a diagram illustrating audio quality test results according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of an audio encoding apparatus according to an embodiment of the disclosure;

FIG. 7 is a schematic diagram of an audio decoding apparatus according to an embodiment of the disclosure;

FIG. 8 is a schematic diagram of an electronic device implementing an audio encoding method or an audio decoding method according to an embodiment of the present disclosure.

Detailed Description

In order that the above objects, features, and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure are further described below. It should be noted that, where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.

It should be understood that, hereinafter, "at least one" means one or more and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that only A is present, only B is present, or both A and B are present, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or a similar expression refers to any combination of the listed items, including any single item or any combination of plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be singular or plural.

Fig. 1 is a flowchart of an audio encoding method according to an embodiment of the present disclosure, where the audio encoding method may be implemented on an encoding side based on an encoder such as an Opus audio encoder, and specifically includes the following steps:

Step S101: receive an audio signal and separate it into a first audio signal and a second audio signal, where the frequency of the first audio signal is higher than that of the second audio signal.

Specifically, the Opus audio encoder in this embodiment may split the sampled audio signal in the frequency domain into a high-frequency signal and a low-frequency signal, where the first audio signal is the high-frequency signal and the second audio signal is the low-frequency signal. For example, if the audio signal is human speech with frequencies in the range 0-8 kHz, the Nyquist sampling theorem requires a sampling rate of at least 16 kHz, so 16 kHz is used; the low-frequency portion then covers 0-4 kHz and the high-frequency portion 4-8 kHz. The low-frequency portion typically carries the basic information and most of the energy of the speech signal, while the high-frequency portion carries the high-frequency information, which typically corresponds to the details of the speech signal.
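As an illustration only, the two-band split of step S101 can be sketched with the simplest perfect-reconstruction filter bank, a Haar analysis stage that halves the sample rate of each band. The function name `split_bands` and the choice of Haar filters are assumptions made for this sketch; they are not the filter bank used by the Opus encoder, and a practical codec would use sharper filters because the Haar bands overlap heavily.

```python
from typing import List, Tuple


def split_bands(x: List[float]) -> Tuple[List[float], List[float]]:
    """Two-band Haar analysis (illustrative assumption, not the Opus filter bank).

    Returns (low_band, high_band), each at half the input sample rate. For a
    16 kHz input the low band roughly covers 0-4 kHz (the "second audio signal")
    and the high band roughly 4-8 kHz (the "first audio signal").
    """
    if len(x) % 2:
        raise ValueError("split_bands expects an even number of samples")
    # Pairwise average keeps the slowly varying (low-frequency) content.
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    # Pairwise half-difference keeps the rapidly varying (high-frequency) content.
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high
```

With a 16 kHz input, each returned band runs at 8 kHz, so a 320-sample frame yields two 160-sample bands. A constant input lands entirely in the low band, while a signal alternating at the Nyquist frequency lands entirely in the high band.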

Step S102: encode the first audio signal in a first audio coding mode to obtain a first code stream.

For example, the first audio coding mode may be, but is not limited to, encoding with a CELT encoder. CELT (Constrained Energy Lapped Transform) is an ultra-low-delay audio codec suited to real-time, high-quality speech transmission. The specific encoding method and process of the CELT encoder are known in the prior art and are not described here.

Specifically, the Opus audio encoder encodes the first audio signal, i.e., the high-frequency portion of the audio signal, in the first audio coding mode, for example with a CELT encoder, to obtain a first code stream BS1, i.e., the enhancement-layer code stream.

Step S103: encode the second audio signal in a second audio coding mode to obtain a second code stream, where the second audio coding mode is different from the first audio coding mode.

Illustratively, where the first audio coding mode is encoding with a CELT encoder, the second audio coding mode, which differs from it, may be encoding with a SILK encoder. SILK is a wideband speech codec that Skype made available to third-party developers and hardware manufacturers; it scales flexibly with audio bandwidth, network bandwidth, and algorithmic complexity. Its specific encoding method and process are known in the prior art and are not described again here.

Specifically, the Opus audio encoder encodes the second audio signal, i.e., the low-frequency portion of the audio signal, in the second audio coding mode, for example with a SILK encoder, to obtain a second code stream BS2.

Step S104: transmit the first code stream and the second code stream to a decoder over a network, so that the decoder recovers the audio signal by decoding the first code stream and the second code stream.

Specifically, the network may be the Internet. After encoding the original audio signal into the first code stream BS1 and the second code stream BS2, the Opus audio encoder may transmit BS1 and BS2 to the decoder over the network, and the decoder may then recover the original audio signal by decoding BS1 and BS2.

The audio encoding method in this embodiment divides the original audio signal into a high-frequency signal and a low-frequency signal, encodes each separately, and transmits the two resulting code streams to the decoder over the network; the decoder then decodes both code streams to recover the original audio signal. Because the transmitted code stream is split from one into two, fault tolerance under poor network conditions increases, the probability of losing data packets in transit is reduced, and audio transmission quality improves, so the decoded audio quality is better. Audio transmission is also more stable in a packet-loss environment.

Optionally, in some embodiments of the present disclosure, referring to FIG. 2, encoding the first audio signal in the first audio coding mode to obtain the first code stream in step S102 may specifically include the following steps:

Step S201: encode the first audio signal in the first audio coding mode to obtain a first coded signal.

Illustratively, the Opus audio encoder encodes the first audio signal, i.e., the high-frequency portion of the audio signal, with, for example, a CELT encoder to obtain the first coded signal.

Step S202: perform interval coding on the first coded signal to obtain the first code stream.

Illustratively, interval coding here refers to range coding, performed by a range encoder. Range coding is a form of arithmetic coding used for data compression; like arithmetic coding, it can achieve higher compression than classical Huffman coding because it is not restricted to a whole number of bits per symbol. For the specific implementation of interval coding, reference may be made to the prior art, and details are not described here.
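The interval-narrowing idea behind range coding can be shown compactly. The range coder in Opus works on fixed-precision integers with renormalization; the sketch below instead uses exact rational arithmetic from Python's `fractions` module, so it illustrates arithmetic coding in its simplest form rather than the Opus implementation. The static symbol model and the function names `build_model`, `encode`, and `decode` are assumptions for illustration.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Model = Dict[str, Tuple[Fraction, Fraction]]


def build_model(freqs: Dict[str, int]) -> Model:
    """Assign each symbol a sub-interval [lo, hi) of [0, 1) proportional to its count."""
    total = sum(freqs.values())
    model: Model = {}
    cum = 0
    for sym in sorted(freqs):
        model[sym] = (Fraction(cum, total), Fraction(cum + freqs[sym], total))
        cum += freqs[sym]
    return model


def encode(symbols: List[str], model: Model) -> Fraction:
    """Narrow [low, low + width) once per symbol; return a value inside the final interval."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        lo, hi = model[s]
        low += width * lo
        width *= hi - lo
    return low + width / 2  # the midpoint lies strictly inside the final interval


def decode(value: Fraction, model: Model, n: int) -> List[str]:
    """Replay the narrowing: at each step, find the sub-interval that contains value."""
    out: List[str] = []
    low, width = Fraction(0), Fraction(1)
    for _ in range(n):
        pos = (value - low) / width  # position of value within the current interval
        for sym, (lo, hi) in model.items():
            if lo <= pos < hi:
                out.append(sym)
                low += width * lo
                width *= hi - lo
                break
    return out
```

For example, with `m = build_model({"a": 3, "b": 1})`, encoding `list("aaba")` gives `Fraction(243, 512)`, and `decode` with `n = 4` returns the original symbols. The decoder must know the symbol count (or the model must include an end-of-stream symbol); frequent symbols shrink the interval less and therefore cost fewer bits.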

Specifically, in this embodiment, interval coding is applied once more to the first coded signal to obtain the first code stream BS1, achieving a higher compression ratio, which improves reliability when the code stream is subsequently transmitted over the network, reduces the packet loss rate, and further improves the quality of the decoded audio.

Optionally, in some embodiments of the present disclosure, encoding the second audio signal in the second audio coding mode to obtain the second code stream in step S103 may specifically include the following steps:

Step S301: encode the second audio signal in the second audio coding mode to obtain a second coded signal.

Illustratively, the Opus audio encoder encodes the second audio signal, i.e., the low-frequency portion of the audio signal, with, for example, a SILK encoder to obtain the second coded signal.

Step S302: perform interval coding on the second coded signal to obtain the second code stream.

Specifically, the Opus audio encoder performs interval coding on the second coded signal to obtain the second code stream BS2, achieving a higher compression ratio, which improves reliability when the code stream is subsequently transmitted over the network, reduces the packet loss rate, and further improves the quality of the decoded audio.

In a second aspect, an embodiment of the present disclosure provides an audio decoding method, which may be applied to a decoding end corresponding to the encoding end, and implemented based on a decoder, such as an Opus audio decoder, and specifically includes the following steps:

Step S401: receive a first code stream and a second code stream transmitted by an encoder, where the first code stream is obtained by encoding a first audio signal in a first audio coding mode, the second code stream is obtained by encoding a second audio signal in a second audio coding mode, the first audio signal and the second audio signal are obtained by separating an audio signal, and the frequency of the first audio signal is higher than that of the second audio signal.

It can be understood that, with respect to the process of determining the first code stream and the second code stream by the encoder, reference may be made to the description in the foregoing embodiment, and details are not described here again.

Step S402: decode the first code stream in a first audio decoding mode to obtain the first audio signal.

For example, since the encoder, such as the Opus audio encoder, encodes the first audio signal (the high-frequency portion of the audio signal) with, for example, a CELT encoder, decoding may use a corresponding CELT decoder; specifically, the first code stream BS1 may be decoded to obtain the first audio signal, i.e., the high-frequency portion.

Step S403: decode the second code stream in a second audio decoding mode to obtain the second audio signal.

Illustratively, the Opus audio encoder encodes the second audio signal, i.e., the low-frequency portion of the audio signal, with, for example, a SILK encoder. Therefore, during decoding, a corresponding SILK decoder may be used; specifically, the second code stream BS2 may be decoded to obtain the second audio signal, i.e., the low-frequency portion.

Step S404: mix the first audio signal and the second audio signal to recover the decoded audio signal.

In particular, the decoder, such as an Opus audio decoder, mixes the first audio signal, i.e. the high frequency signal portion, with the second audio signal, i.e. the low frequency signal portion, restoring the decoded audio signal.
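If the two bands had been produced by a two-band Haar analysis (an assumption made only for illustration; the source does not specify the filter bank), the mixing of step S404 would be the matching synthesis stage: each pair of output samples is rebuilt from the sum and difference of one low-band and one high-band sample. The name `mix_bands` is hypothetical.

```python
from typing import List


def mix_bands(low: List[float], high: List[float]) -> List[float]:
    """Two-band Haar synthesis (illustrative assumption): rebuild the full-rate signal.

    Inverts the analysis low = (x0 + x1) / 2, high = (x0 - x1) / 2,
    so x0 = low + high and x1 = low - high.
    """
    if len(low) != len(high):
        raise ValueError("low and high bands must have the same length")
    out: List[float] = []
    for lo, hi in zip(low, high):
        out.extend((lo + hi, lo - hi))
    return out
```

The output has twice as many samples as each band, restoring the original sample rate; with unquantized bands the reconstruction is exact, and coding error in either band shows up only in the corresponding part of the spectrum.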

In the audio decoding method of this embodiment, the decoder receives the first code stream and the second code stream output by the encoder, which the encoder obtains by dividing the original audio signal into a high-frequency signal and a low-frequency signal and encoding each separately; the decoder then decodes both code streams to recover the original audio signal. Because the transmitted code stream is split from one into two, fault tolerance under poor network conditions increases, the probability of losing data packets in transit is reduced, and audio transmission quality improves, so the decoded audio quality is better. Audio transmission is also more stable in a packet-loss environment.

Optionally, in some embodiments of the present disclosure, the first code stream BS1 is obtained by performing interval coding on a first coded signal, and the first coded signal is obtained by encoding the first audio signal in the first audio coding mode; for example, the first audio signal, i.e., the high-frequency portion of the audio signal, is encoded with a CELT encoder and the result is then interval-coded once more. Correspondingly, at the decoding end, decoding the first code stream in the first audio decoding mode to obtain the first audio signal in step S402 may specifically include: performing interval decoding on the first code stream to obtain the first coded signal; and decoding the first coded signal in the first audio decoding mode to obtain the first audio signal.

It can be understood that, because interval coding is the last step of encoding, decoding must first perform interval decoding and then decode with, for example, a CELT decoder to obtain the first audio signal, i.e., the high-frequency portion.

Optionally, in some embodiments of the present disclosure, the second code stream BS2 is obtained by performing interval coding on a second coded signal, and the second coded signal is obtained by encoding the second audio signal in the second audio coding mode; for example, the second audio signal, i.e., the low-frequency portion of the audio signal, is encoded with a SILK encoder and the result is then interval-coded once more. Correspondingly, decoding the second code stream in the second audio decoding mode to obtain the second audio signal in step S403 may specifically include: performing interval decoding on the second code stream to obtain the second coded signal; and decoding the second coded signal in the second audio decoding mode to obtain the second audio signal.

It can be understood that, because interval coding is the last step of encoding, decoding must first perform interval decoding and then decode with, for example, a SILK decoder to obtain the second audio signal, i.e., the low-frequency portion.

Finally, the decoded low-frequency signal portion, i.e., the base layer signal, and the high-frequency signal portion, i.e., the enhancement layer signal, may be mixed to obtain the decoded and restored audio signal.

Aspects of the embodiments of the present disclosure are described below with reference to a specific example. The audio encoding and decoding method of this example may include the following steps:

Step 1): the input audio is sampled, and the sampled audio signal is split in the frequency domain into a low-frequency part (base layer) and a high-frequency part (enhancement layer).

Step 2): the low-frequency part is first encoded with a SILK encoder, and the result is then interval-coded once more with a range encoder, yielding the base-layer code stream BS2.

Step 3): the high-frequency part is first encoded with a CELT encoder, and the result is then interval-coded once more with a range encoder, yielding the enhancement-layer code stream BS1.

Step 4): the encoded base-layer code stream BS2 and enhancement-layer code stream BS1 are each transmitted to the decoding end over the network.

Step 5): the decoding end feeds each received code stream into the decoder. Each code stream is first interval-decoded by a range decoder, and the decoder then determines whether it belongs to the base layer or the enhancement layer: base-layer data is decoded by the SILK decoder, and enhancement-layer data by the CELT decoder. Finally, the decoded base-layer signal and enhancement-layer signal are mixed to obtain the decoded, restored audio signal.
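Step 5) requires the decoding end to tell base-layer data from enhancement-layer data, but the source does not say how. One hypothetical option is a small header carrying a layer ID and a frame sequence number. In the sketch below, the 3-byte header layout, the layer IDs, and the names `packetize` and `route` are all assumptions, and each entry in `decoders` stands in for "interval decoding followed by SILK or CELT decoding".

```python
import struct
from collections import defaultdict
from typing import Callable, Dict, List

BASE, ENHANCEMENT = 0, 1        # hypothetical layer IDs
HEADER = struct.Struct("!BH")   # hypothetical header: layer ID (1 byte) + frame number (2 bytes), big-endian

Decoder = Callable[[bytes], List[float]]


def packetize(layer: int, seq: int, payload: bytes) -> bytes:
    """Prefix one layer's range-coded frame with the hypothetical 3-byte header."""
    return HEADER.pack(layer, seq & 0xFFFF) + payload


def route(packets: List[bytes], decoders: Dict[int, Decoder]) -> Dict[int, Dict[int, List[float]]]:
    """Read each header, pass the payload to its layer's decoder, and group output by frame."""
    frames: Dict[int, Dict[int, List[float]]] = defaultdict(dict)
    for pkt in packets:
        layer, seq = HEADER.unpack_from(pkt)
        frames[seq][layer] = decoders[layer](pkt[HEADER.size:])
    return dict(frames)
```

Because each layer travels in its own packet, a frame whose entry lacks one of the two layer IDs is exactly a lost-layer case, which is handled by the replacement rules that follow.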

Specifically, because network data packets may be lost, the decoding end handles the received code stream signals in one of the following four cases:

a) Both the base-layer signal and the enhancement-layer signal are received: the received base-layer and enhancement-layer signals are decoded and mixed to obtain the decoded audio signal.

b) The base-layer signal is received but the enhancement-layer signal is missing: the enhancement-layer signal of the current frame is replaced by that of an adjacent frame (the previous or next frame), with the next frame taking priority, and the base-layer and enhancement-layer signals are then decoded and mixed to obtain the decoded audio signal. If the enhancement-layer signals of the adjacent frames are also lost, the enhancement-layer signal of the current frame is replaced by Gaussian white noise.

c) The base-layer signal is missing but the enhancement-layer signal is received: the base-layer signal of the current frame is replaced by that of an adjacent frame (the previous or next frame), and the base-layer and enhancement-layer signals are then decoded and mixed to obtain the decoded audio signal. If the base-layer signals of the adjacent frames are also lost, the base-layer signal of the current frame is replaced by Gaussian white noise.

d) Both the base-layer signal and the enhancement-layer signal are lost: both signals of the current frame are replaced by those of an adjacent frame (the previous or next frame), with the next frame taking priority, and they are then decoded and mixed to obtain the decoded audio signal. If the base-layer signals of the adjacent frames are also lost, the base-layer signal of the current frame is replaced by Gaussian white noise; likewise, if the enhancement-layer signals of the adjacent frames are lost, the enhancement-layer signal of the current frame is replaced by Gaussian white noise.
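The replacement rules in cases a) to d) reduce to one per-layer rule, sketched below. For simplicity the sketch works on decoded per-frame sample blocks rather than on code streams, applies the next-then-previous frame order to both layers (case c) names both neighbours without stating an order), and uses an arbitrary noise level; the function names and the `noise_std` default are illustrative assumptions. Preferring the next frame implies that the decoder buffers at least one frame ahead, which adds one frame of latency.

```python
import random
from typing import List, Optional, Tuple

Frame = List[float]


def pick_layer(layer: List[Optional[Frame]], t: int, frame_len: int,
               rng: random.Random, noise_std: float = 0.01) -> Frame:
    """Return frame t of one layer, substituting a neighbour or noise if it was lost (None)."""
    cur = layer[t]
    if cur is not None:
        return cur                            # received: use as is
    for k in (t + 1, t - 1):                  # next frame first, then previous frame
        if 0 <= k < len(layer):
            neighbour = layer[k]
            if neighbour is not None:
                return list(neighbour)        # copy the adjacent frame
    # Both neighbours are lost as well: fall back to Gaussian white noise.
    return [rng.gauss(0.0, noise_std) for _ in range(frame_len)]


def conceal_frame(base: List[Optional[Frame]], enh: List[Optional[Frame]], t: int,
                  frame_len: int, rng: random.Random) -> Tuple[Frame, Frame]:
    """Apply the per-layer rule to both layers; together this covers cases a) to d)."""
    return (pick_layer(base, t, frame_len, rng),
            pick_layer(enh, t, frame_len, rng))
```

The two returned blocks are then decoded and mixed as in case a). Keeping the rule per layer means a frame that lost only one layer still uses the real data of the other layer.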

When the decoder receives only the base layer signal, the basic information of the speech signal can be recovered. When it receives only the enhancement layer signal, part of the detail information of the audio signal can be recovered. When it receives both, the complete audio signal information can be recovered. Compared with the original Opus codec, the scheme of this embodiment transmits two code streams instead of one, which increases fault tolerance under poor network conditions and reduces the probability that all of the data for a frame is lost in transit, thereby improving audio transmission quality and the quality of the decoded audio. To reduce the impact of packet loss, the prior art obtains more stable and continuous data transmission by introducing redundancy, typically through forward error correction (FEC) coding. However, when packets are lost consecutively, FEC can recover only the last lost frame of the burst, so the decoded audio quality is poor. The coded-data redundancy introduced by encoding twice in this embodiment is negligible compared with that of the FEC coding used in the existing Opus codec, while sub-band coding and transmission improve the stability of audio transmission in a packet loss environment and thus the decoded audio quality.
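The limitation of forward error correction under consecutive losses, and the contrast with two sub-band streams, can be illustrated with a small Python model. It assumes FEC in the style of Opus in-band FEC, where packet n carries low-bitrate redundancy for frame n-1, and it only tracks whether any data for a frame arrives, not how good that data is; `fec_usable` and `two_stream_usable` are hypothetical helpers.

```python
from typing import List, Sequence


def fec_usable(received: Sequence[bool]) -> List[bool]:
    """Single stream with in-band FEC: packet n also carries redundancy for frame n-1.

    A lost frame is recoverable only if the packet immediately after it arrived,
    so in a burst of consecutive losses only the last lost frame is recovered.
    """
    n = len(received)
    return [received[i] or (i + 1 < n and received[i + 1]) for i in range(n)]


def two_stream_usable(base_rx: Sequence[bool], enh_rx: Sequence[bool]) -> List[bool]:
    """Two sub-band streams: a frame keeps some information if either packet arrived."""
    return [b or e for b, e in zip(base_rx, enh_rx)]


burst = [True, False, False, False, True]
print(fec_usable(burst))  # [True, False, False, True, True]
```

In the burst above, frames 1 and 2 stay unrecoverable under FEC, whereas with two streams a frame is only left without any data when both of its packets are lost.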

The scheme of the embodiments of the present disclosure implements scalable coding by encoding the audio signal in sub-bands on the basis of the Opus codec. Tests show that it improves the stability of Opus audio transmission in a packet loss environment; the test details are as follows.

The test data were taken from the THCHS-30 Chinese speech corpus published by the Center for Speech and Language Technologies of Tsinghua University; 200 utterances covering both male and female speakers were selected. Speech quality was assessed with an implementation of the PESQ algorithm described in ITU-T Recommendation P.862, which gives an objective estimate of subjective speech quality.

The test input was a single-channel audio sequence with a sampling rate of 16 kHz, encoded at a bit rate of 25 kbps, and tests were performed under different packet loss rates. The results are shown in fig. 5. The opus-2 curve corresponds to the case where only high-frequency packets are lost; in this case, the audio quality remains generally stable across packet loss rates. The opus-1 curve corresponds to the scheme of this embodiment, in which the single code stream packet is replaced by two code stream packets, when both high-frequency and low-frequency packets may be lost; its decoded audio quality is somewhat higher than that of the existing single-packet scheme under packet loss, represented by the opus curve. This verifies that the scheme of this embodiment improves the stability of Opus audio transmission in a packet loss environment and thus improves the decoded audio quality.
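The three test conditions behind the curves in fig. 5 could be reproduced with per-frame loss masks like the ones below. This is a hypothetical harness: the function name `loss_masks`, the seeded pseudo-random losses, and the assumption that low- and high-frequency packets are lost independently in the opus-1 condition are not stated in the source. The resulting masks would drive the decoder, and each degraded output would then be scored against the reference with a P.862 implementation, which is not shown here.

```python
import random
from typing import List, Tuple


def loss_masks(num_frames: int, loss_rate: float, condition: str,
               seed: int = 0) -> Tuple[List[bool], List[bool]]:
    """Return (low_band_received, high_band_received) flags per frame.

    condition:
      "opus"   - single stream: one packet per frame, so both bands are lost together
      "opus-1" - two streams, each packet lost independently (assumption)
      "opus-2" - two streams, only high-frequency packets are lost
    """
    rng = random.Random(seed)  # seeded so every condition is repeatable
    low: List[bool] = []
    high: List[bool] = []
    for _ in range(num_frames):
        if condition == "opus":
            ok = rng.random() >= loss_rate
            low.append(ok)
            high.append(ok)
        elif condition == "opus-1":
            low.append(rng.random() >= loss_rate)
            high.append(rng.random() >= loss_rate)
        elif condition == "opus-2":
            low.append(True)
            high.append(rng.random() >= loss_rate)
        else:
            raise ValueError(f"unknown condition: {condition}")
    return low, high
```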

It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc. Additionally, it will also be readily appreciated that the steps may be performed synchronously or asynchronously, e.g., among multiple modules/processes/threads.

Based on the same inventive concept, fig. 6 illustrates an audio encoding apparatus according to an embodiment of the present disclosure, including:

a signal separation module 601, configured to receive an audio signal and separate the audio signal into a first audio signal and a second audio signal, where a frequency of the first audio signal is greater than a frequency of the second audio signal;

a first encoding module 602, configured to encode the first audio signal in a first audio encoding manner to obtain a first code stream;

a second encoding module 603, configured to encode the second audio signal in a second audio encoding manner to obtain a second code stream; wherein the second audio coding mode is different from the first audio coding mode;

a signal sending module 604, configured to transmit the first code stream and the second code stream to a decoder through a network, so that the decoder recovers an audio signal based on the decoding processing of the first code stream and the second code stream.
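The module structure of fig. 6 can be sketched as a small Python pipeline. The band-split filter is an assumption (a complementary moving-average split, chosen because it mixes back exactly), and `make_toy_encoder` is a hypothetical placeholder for the CELT and SILK encoders, which are not reproduced here; only the data flow between modules 601 to 604 follows the text.

```python
import struct
from typing import Callable, List, Sequence, Tuple


def split_bands(signal: Sequence[float], taps: int = 4) -> Tuple[List[float], List[float]]:
    """Module 601 (assumed filter): complementary two-band split.

    low = causal moving average (a crude low-pass), high = signal - low, so
    high[n] + low[n] reproduces signal[n] up to floating-point rounding.
    Returns (first_signal, second_signal) = (higher band, lower band).
    """
    low: List[float] = []
    for n in range(len(signal)):
        window = signal[max(0, n - taps + 1): n + 1]
        low.append(sum(window) / len(window))
    high = [x - lo for x, lo in zip(signal, low)]
    return high, low


def make_toy_encoder(scale: float) -> Callable[[Sequence[float]], bytes]:
    """Hypothetical stand-in for a CELT/SILK encoder: scale, clip to int16, pack."""
    def encode(samples: Sequence[float]) -> bytes:
        q = [max(-32768, min(32767, round(s * scale))) for s in samples]
        return struct.pack(f"<{len(q)}h", *q)
    return encode


def encode_audio(signal: Sequence[float],
                 first_encode: Callable[[Sequence[float]], bytes],
                 second_encode: Callable[[Sequence[float]], bytes],
                 send: Callable[[str, bytes], None]) -> None:
    first, second = split_bands(signal)  # module 601: separate high / low band
    stream1 = first_encode(first)        # module 602: first audio encoding manner
    stream2 = second_encode(second)      # module 603: second audio encoding manner
    send("first", stream1)               # module 604: two separate code streams
    send("second", stream2)
```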

The audio encoding device provided by this embodiment of the disclosure divides an original audio signal into a first audio signal and a second audio signal, encodes each of them separately, and transmits the two resulting code streams to a decoder through a network; the decoder decodes the two code streams to recover the original audio signal. By transmitting two code streams instead of one, the scheme of this embodiment increases fault tolerance under poor network conditions, reduces the probability of losing data packets during transmission, and improves audio transmission quality and, in turn, the quality of the decoded audio.

Optionally, in some embodiments of the present disclosure, the first encoding module encoding the first audio signal in the first audio encoding manner to obtain the first code stream includes: encoding the first audio signal in the first audio encoding manner to obtain a first encoded signal; and performing interval coding (range coding) on the first encoded signal to obtain the first code stream.

Optionally, in some embodiments of the present disclosure, the second encoding module encoding the second audio signal in the second audio encoding manner to obtain the second code stream includes: encoding the second audio signal in the second audio encoding manner to obtain a second encoded signal; and performing interval coding (range coding) on the second encoded signal to obtain the second code stream.
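Interval coding narrows an interval once per symbol so that a single number identifies the whole symbol sequence. The sketch below uses exact rational arithmetic for clarity and illustrates the technique only; it is not the Opus range coder, which, as specified in RFC 6716, uses finite-precision integer arithmetic with renormalisation. The symbol frequencies, the integer symbols standing in for the encoded signal, and the assumption that the sequence length is conveyed separately are all hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple

Model = Dict[int, Tuple[Fraction, Fraction]]


def build_model(freqs: Dict[int, int]) -> Model:
    """Give each symbol a sub-interval [low, high) of [0, 1) proportional to its frequency."""
    total = sum(freqs.values())
    model: Model = {}
    cum = 0
    for sym in sorted(freqs):
        model[sym] = (Fraction(cum, total), Fraction(cum + freqs[sym], total))
        cum += freqs[sym]
    return model


def interval_encode(symbols: Sequence[int], model: Model) -> Fraction:
    """Narrow [0, 1) once per symbol; the final lower bound identifies the sequence."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        s_low, s_high = model[s]
        low += width * s_low
        width *= s_high - s_low
    return low


def interval_decode(code: Fraction, model: Model, length: int) -> List[int]:
    """Repeat the same narrowing, each time picking the sub-interval containing `code`."""
    out: List[int] = []
    low, width = Fraction(0), Fraction(1)
    for _ in range(length):
        target = (code - low) / width
        for s, (s_low, s_high) in model.items():
            if s_low <= target < s_high:
                out.append(s)
                low += width * s_low
                width *= s_high - s_low
                break
    return out
```

Frequent symbols get wide sub-intervals and therefore shrink the interval less, which is why they cost fewer bits; the decoder reverses the process by replaying the same narrowing steps.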

Optionally, in some embodiments of the present disclosure, the first audio encoding manner may be, but is not limited to, encoding with a CELT Encoder.

Optionally, in some embodiments of the present disclosure, the second audio encoding manner may be, but is not limited to, encoding with a SILK Encoder.

Based on the same inventive concept, fig. 7 illustrates an audio decoding apparatus according to an embodiment of the present disclosure, including:

a code stream receiving module 701, configured to receive a first code stream and a second code stream transmitted by an encoder, where the first code stream is obtained by encoding a first audio signal in a first audio encoding manner, the second code stream is obtained by encoding a second audio signal in a second audio encoding manner, the first audio signal and the second audio signal are obtained by separating an audio signal, and the frequency of the first audio signal is greater than that of the second audio signal;

a first decoding module 702, configured to decode the first code stream by using a first audio decoding manner to obtain a first decoded signal;

a second decoding module 703, configured to decode the second code stream by using a second audio decoding manner to obtain a second decoded signal;

an audio restoring module 704, configured to mix the first decoded signal with the second decoded signal to restore the decoded audio signal.
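The decoder side of fig. 7 can be sketched the same way. Mixing is assumed to be sample-wise addition of the two decoded band signals, which is consistent with a complementary band split but is not specified in the text, and `make_toy_decoder` is a hypothetical placeholder for the CELT and SILK decoders.

```python
import struct
from typing import Callable, List, Sequence


def make_toy_decoder(scale: float) -> Callable[[bytes], List[float]]:
    """Hypothetical stand-in for a CELT/SILK decoder: unpack int16 and rescale."""
    def decode(payload: bytes) -> List[float]:
        count = len(payload) // 2
        return [v / scale for v in struct.unpack(f"<{count}h", payload[:2 * count])]
    return decode


def mix(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Module 704 (assumed): sample-wise sum, zero-padding the shorter signal."""
    n = max(len(first), len(second))
    a = list(first) + [0.0] * (n - len(first))
    b = list(second) + [0.0] * (n - len(second))
    return [x + y for x, y in zip(a, b)]


def decode_audio(stream1: bytes, stream2: bytes,
                 first_decode: Callable[[bytes], List[float]],
                 second_decode: Callable[[bytes], List[float]]) -> List[float]:
    first = first_decode(stream1)    # module 702: first audio decoding manner
    second = second_decode(stream2)  # module 703: second audio decoding manner
    return mix(first, second)        # module 704: restore the audio signal
```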

The audio decoding device in this embodiment receives the first code stream and the second code stream output by an encoder, which the encoder obtains by dividing an original audio signal into a high-frequency signal and a low-frequency signal and encoding each separately; the decoder decodes the two code streams to recover the original audio signal. By transmitting two code streams instead of one, the scheme of this embodiment increases fault tolerance under poor network conditions, reduces the probability of losing data packets during transmission, and improves audio transmission quality and, in turn, the quality of the decoded audio.

Optionally, in some embodiments of the present disclosure, the first code stream is obtained by performing interval coding on a first encoded signal, where the first encoded signal is obtained by encoding the first audio signal in the first audio encoding manner. Correspondingly, the first decoding module decoding the first code stream in the first audio decoding manner to obtain the first decoded signal may specifically include: performing interval decoding on the first code stream to obtain the first encoded signal; and decoding the first encoded signal in the first audio decoding manner to obtain the first decoded signal.

Optionally, in some embodiments of the present disclosure, the second code stream is obtained by performing interval coding on a second encoded signal, where the second encoded signal is obtained by encoding the second audio signal in the second audio encoding manner. Correspondingly, the second decoding module decoding the second code stream in the second audio decoding manner to obtain the second decoded signal includes: performing interval decoding on the second code stream to obtain the second encoded signal; and decoding the second encoded signal in the second audio decoding manner to obtain the second decoded signal.

Optionally, in some embodiments of the present disclosure, the first audio decoding manner may be, but is not limited to, decoding with a CELT Decoder. In some embodiments of the present disclosure, the second audio decoding manner may be, but is not limited to, decoding with a SILK Decoder.

The specific manner in which the modules of the above apparatus embodiments perform operations, and the corresponding technical effects, have been described in detail in the corresponding method embodiments and will not be repeated here.

It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units. The components shown as modules or units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present disclosure. Those of ordinary skill in the art can understand and implement this without inventive effort.

The embodiments of the present disclosure also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the audio encoding method or the audio decoding method according to any one of the above embodiments.

By way of example, and not limitation, such readable storage media can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer-readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

The embodiments of the present disclosure also provide an electronic device. As shown in fig. 8, the electronic device includes a processor 801 and a memory 802, the memory 802 being used to store a computer program, and the processor being configured to perform the steps of the audio encoding method or the audio decoding method of any of the above embodiments by executing the computer program.

The various aspects, implementations, or features of the described embodiments can be used alone or in any combination. Aspects of the described embodiments may be implemented by software, hardware, or a combination of software and hardware. The described embodiments may also be embodied by a computer-readable medium having computer-readable code stored thereon, the computer-readable code comprising instructions executable by at least one computing device. The computer readable medium can be associated with any data storage device that can store data which can be read by a computer system. Exemplary computer readable media can include read-only memory, random-access memory, CD-ROMs, HDDs, DVDs, magnetic tape, and optical data storage devices, among others. The computer readable medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

The above description of the technology may refer to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration embodiments in which the described embodiments may be practiced. These embodiments, while described in sufficient detail to enable those skilled in the art to practice them, are non-limiting; other embodiments may be utilized and changes may be made without departing from the scope of the described embodiments. For example, the order of operations described in a flowchart is non-limiting, and thus the order of two or more operations illustrated in and described in accordance with the flowchart may be altered in accordance with several embodiments. As another example, in several embodiments, one or more operations illustrated in and described with respect to the flowcharts are optional or may be eliminated. Additionally, certain steps or functions may be added to the disclosed embodiments, or two or more steps may be permuted in order. All such variations are considered to be encompassed by the disclosed embodiments and the claims.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An audio encoding method, comprising:

2. The method of claim 1, wherein encoding the first audio signal in the first audio coding mode to obtain the first code stream comprises:

3. The method according to claim 1 or 2, wherein encoding the second audio signal in the second audio coding mode to obtain the second code stream comprises:

4. The method of claim 3, wherein the first audio coding mode uses a CELT Encoder for encoding; and/or the second audio coding mode uses a SILK Encoder for encoding.

5. An audio decoding method, comprising:

6. The method according to claim 5, wherein the first code stream is obtained by interval coding a first encoded signal, and the first encoded signal is obtained by encoding the first audio signal by the first audio coding method;

7. The method according to claim 5 or 6, wherein the second code stream is obtained by interval coding a second encoded signal, and the second encoded signal is obtained by encoding the second audio signal by using the second audio coding mode; the method further comprises the following steps:

8. The method of claim 7, wherein the first audio decoding mode uses a CELT Decoder for decoding; and/or the second audio decoding mode uses a SILK Decoder for decoding.

9. An audio encoding apparatus, comprising:

10. An audio decoding apparatus, comprising:

11. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the steps of the audio encoding method of any one of claims 1 to 4 or to carry out the steps of the audio decoding method of any one of claims 5 to 8.

12. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the steps of the audio encoding method of any one of claims 1 to 4 or to implement the steps of the audio decoding method of any one of claims 5 to 8 via execution of the executable instructions.