CN101276587A - Audio encoding apparatus and method thereof, audio decoding device and method thereof - Google Patents
- Publication number: CN101276587A
- Authority: CN (China)
- Legal status: Granted
Landscapes
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
Abstract
The invention discloses a monaural sound encoding device in which an analysis subband filter bank module decomposes a monaural sound signal into a low-frequency subband domain signal comprising at least two subbands and a high-frequency subband domain signal comprising at least two subbands. The low-frequency subband domain signal undergoes prediction analysis in a low-frequency subband domain time-varying prediction analysis module, time-frequency transformation in a low-frequency subband domain time-frequency transformation module, and quantization coding in a low-frequency subband domain waveform coding module, yielding low-frequency subband domain waveform coded data. High-frequency subband domain parameter coded data are extracted from the high-frequency subband domain signal by parameter coding in a high-frequency subband domain parameter coding module. The low-frequency subband domain waveform coded data and the high-frequency subband domain parameter coded data are multiplexed by a bitstream multiplexing module to output a sound code stream. The invention also discloses a monaural sound decoding device, monaural sound encoding and decoding methods, and stereo encoding/decoding devices and methods, all of which can realize high-quality coding of wideband sound at a low code rate.
Description
Technical Field
The present invention relates to a sound encoding and decoding technology, and more particularly, to a monaural sound encoding device and method, a monaural sound decoding device and method, a stereo sound encoding device and method, and a stereo sound decoding device and method.
Background
At present, wideband high-quality compression coding mainly relies on waveform coding. Waveform coding techniques fall roughly into three types: predictive coding, subband coding, and transform coding. Predictive coding transmits the prediction error of the sound signal instead of its actual sample values, so the signal can be represented with fewer coding bits. Subband coding divides the sound signal into several contiguous subbands, so that the signal in each subband can be coded with a separate scheme; for example, subband signals of different intensities can use different quantization steps, minimizing quantization error. Transform coding maps the time-domain signal into a transform domain, such as the frequency domain, where the signal energy is more compactly distributed, improving coding efficiency. These techniques can also be combined with one another, yielding the current subband predictive coding, subband transform coding, and predictive transform coding schemes.
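The predictive-coding idea above can be sketched in a few lines: a first-order predictor turns correlated samples into a small-amplitude error signal that needs fewer bits. The coefficient `a` is an illustrative choice, not a value from any standard.

```python
def predictive_encode(samples, a=0.9):
    """Return the prediction residual e[n] = x[n] - a*x[n-1]."""
    residual = []
    prev = 0.0
    for x in samples:
        residual.append(x - a * prev)
        prev = x
    return residual

def predictive_decode(residual, a=0.9):
    """Reconstruct x[n] = e[n] + a*x[n-1] from the residual."""
    samples = []
    prev = 0.0
    for e in residual:
        x = e + a * prev
        samples.append(x)
        prev = x
    return samples
```

For a slowly varying (highly correlated) signal, the residual has far less total amplitude than the signal itself, which is exactly why it quantizes more cheaply.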
The apt-X100 system adopted by Digital Theater Systems (DTS) is representative of subband predictive coding. apt-X100 first uses a two-stage tree of quadrature mirror filter banks (QMF) to split the audio signal into 4 subband signals, applies time-domain linear prediction to each subband to obtain subband excitation signals with the subband redundancy removed, and then applies adaptive quantization coding to obtain the subband-predictive-coded audio signal.
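The two-stage QMF tree described above can be illustrated with the simplest possible quadrature mirror pair, the 2-tap Haar filters; a real apt-X100 filter bank uses much longer filters, so this is only a structural sketch.

```python
def haar_analysis(x):
    """Split a signal into low/high half-bands, each critically sampled,
    using the 2-tap Haar pair -- the simplest quadrature mirror filters."""
    low  = [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]
    return low, high

def haar_synthesis(low, high):
    """Perfectly reconstruct the signal from the two subbands."""
    x = []
    for l, h in zip(low, high):
        x.append(l + h)
        x.append(l - h)
    return x

def tree_analysis4(x):
    """Two-stage tree: split, then split each half again -> 4 subbands."""
    low, high = haar_analysis(x)
    ll, lh = haar_analysis(low)
    hl, hh = haar_analysis(high)
    return ll, lh, hl, hh
```

Each stage halves the sampling rate, so the 4 output subbands together hold exactly as many samples as the input (critical sampling).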
Dolby Laboratories' AC-3 and the Advanced Audio Coding (AAC) standard of the Moving Picture Experts Group (MPEG) are representative of transform coding. Both AC-3 and AAC first apply a Modified Discrete Cosine Transform (MDCT) or Modified Discrete Sine Transform (MDST) to the audio signal, allocate bits to the transform coefficients according to a psychoacoustic model, and then perform quantization coding using scalar quantization with Huffman coding, obtaining the transform-coded audio signal.
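The MDCT at the heart of AC-3 and AAC maps a 2N-sample windowed frame to N coefficients; with a window satisfying the Princen-Bradley condition, overlap-adding windowed IMDCT outputs cancels the time-domain aliasing. The direct (unoptimized) sketch below uses the 2/N IMDCT normalization, which is one of several conventions in the literature.

```python
import math

def mdct(frame):
    """MDCT of a 2N-sample frame -> N coefficients."""
    N = len(frame) // 2
    return [sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """IMDCT of N coefficients -> 2N samples (still containing
    time-domain aliasing, cancelled later by overlap-add)."""
    N = len(coeffs)
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]

def sine_window(N):
    """Sine window for 2N samples; satisfies w[n]^2 + w[n+N]^2 = 1
    (the Princen-Bradley condition needed for aliasing cancellation)."""
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]
```

Windowing is applied both before the MDCT and after the IMDCT; frames hop by N samples, so every interior sample is covered by two frames whose aliasing terms cancel exactly.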
MP3, standardized by MPEG, is representative of subband transform coding. MPEG Layer 1 and Layer 2 use a pseudo-quadrature mirror filter bank (PQMF) to divide the audio signal into 32 subband signals, which are scalar-quantized and entropy-coded; MPEG Layer 3, i.e. MP3, additionally applies an MDCT on top of the 32-band PQMF filtering to remove remaining signal redundancy, followed by scalar quantization and entropy coding, obtaining the subband-transform-coded audio signal.
MPEG TwinVQ is representative of predictive transform coding and likewise achieves high coding efficiency.
Practice shows that subband predictive coding, transform coding, subband transform coding, and predictive transform coding can compress a single-channel audio signal sampled at 48 kHz with 16 bits down to a code rate of about 64 kb/s, effectively improving coding efficiency, which makes them well suited to high-quality sound coding applications. To compress the bit rate further, however, parametric coding techniques better suited to low-rate voice coding applications must be employed.
Currently, low-rate coding applications typically use a "waveform + parameter" coding scheme built on waveform coding. Combining waveform and parametric coding can effectively improve coding efficiency. In particular, the Enhanced AAC Plus (EAAC+) codec and the Extended Adaptive Multi-Rate Wideband (AMR-WB+) codec adopted by the 3rd Generation Partnership Project (3GPP) are the voice coding techniques that have proven most suitable for low-bandwidth applications such as mobile communication. The encoder and decoder structures of the EAAC+ scheme are shown in Fig. 1 and Fig. 2, respectively; those of the AMR-WB+ scheme are shown in Fig. 3 and Fig. 4.
The EAAC+ scheme adds a Spectral Band Replication (SBR) coding technique on top of the waveform-coding-based MPEG AAC technique; SBR can effectively adjust the spectral structure of the high-frequency replica excitation signal obtained by copying the low-frequency excitation signal. Because its basic model uses an efficient waveform coding technique, EAAC+ is better suited than AMR-WB+ to coding music signals, but its decoding complexity is higher than that of AMR-WB+. Moreover, because it lacks an efficient parametric module conforming to the "prediction + excitation" speech production mechanism, its speech coding quality at low code rates is inferior to that of AMR-WB+.
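The spectral adjustment that SBR performs can be caricatured as follows: copy low-band spectral coefficients upward, then rescale each copied band so its energy matches a transmitted envelope value. All names and the per-band energy parameterization here are illustrative, not the SBR bitstream syntax.

```python
def replicate_high_band(low_spectrum, target_band_energies, band_size):
    """Copy the low-band spectrum upward and adjust each copied band's
    gain so its energy matches the transmitted envelope parameter."""
    high = []
    for b, target in enumerate(target_band_energies):
        band = low_spectrum[b * band_size:(b + 1) * band_size]
        energy = sum(c * c for c in band)
        gain = (target / energy) ** 0.5 if energy > 0 else 0.0
        high.extend(c * gain for c in band)
    return high
```

The fine spectral structure of the high band is "borrowed" from the low band; only the coarse envelope (a few energies per frame) needs to be transmitted, which is what makes the technique cheap in bits.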
The parametric stereo coding technique in the EAAC+ scheme achieves a high compression ratio and good coded sound quality, but it is implemented on the subband signal spectrum. The current parametric stereo techniques in EAAC+ offer no stereo coding solution based on a subband excitation spectrum.
In the AMR-WB+ scheme, the input signal is decomposed into two subbands by a simple filtering method. The first subband is the Low Frequency (LF) signal, critically sampled at Fs/2, where Fs is the sampling frequency of the signal produced by the resampling module of the AMR-WB+ encoder; in stereo coding, this low-frequency signal is further decomposed into a one-subband low-frequency signal and a one-subband intermediate-frequency signal. The second subband is the High Frequency (HF) signal, also critically sampled at Fs/2. The HF signal is coded with a simple bandwidth extension (BWE) technique implemented in the time domain. Its decoding complexity is low, but the spectral structure of the high-frequency replica excitation signal copied from the low-frequency excitation signal cannot be effectively adjusted, so the coded sound quality is low, and some music signals in particular suffer large coding distortion.
The stereo coding technique in the AMR-WB+ scheme is implemented by time-domain filtering; its resolution is lower than that of EAAC+, and the quality of the decoded music is correspondingly lower than EAAC+.
Therefore, in low-rate digital sound coding applications, the existing EAAC+ and AMR-WB+ technologies each have shortcomings in coding quality and complexity. A more reasonable sound encoding and decoding technology is needed that achieves high-quality sound coding under the constraints of a lower code rate and lower implementation complexity, solving the problems of the above technologies.
Disclosure of Invention
In view of the above, it is a primary object of the present invention to provide a monaural sound encoding apparatus that can achieve high-quality encoding of wideband sound at a low code rate.
A second main object of the present invention is to provide a method for encoding a mono audio signal, which can achieve high-quality encoding of wideband audio at a low bit rate.
A third main object of the present invention is to provide a monaural sound decoding apparatus capable of realizing high-quality decoding of wideband sound at a low code rate.
A fourth main object of the present invention is to provide a monaural sound decoding method that can achieve high-quality decoding of wideband sound at a low code rate.
A fifth main object of the present invention is to provide a stereo encoding apparatus capable of high-quality encoding of wideband sound at a low code rate.
A sixth main object of the present invention is to provide a stereo encoding method capable of realizing high-quality encoding of wideband sound at a low code rate.
A seventh main object of the present invention is to provide a stereo decoding apparatus capable of high-quality decoding of wideband sound at a low code rate.
An eighth main object of the present invention is to provide a stereo decoding method capable of achieving high-quality decoding of wideband sound at a low code rate.
According to a first aspect of the above object, the present invention provides a monaural sound encoding apparatus comprising:
an analysis subband filterbank module for performing a subband decomposition of the mono sound signal into a low frequency subband domain signal comprising at least 2 subbands and a high frequency subband domain signal comprising at least 2 subbands;
the low-frequency sub-band domain time-varying prediction analysis module is used for performing prediction analysis on the low-frequency sub-band domain signal to acquire a low-frequency sub-band domain excitation signal;
the low-frequency sub-band domain time-frequency transformation module is used for performing time-frequency transformation on the low-frequency sub-band domain excitation signal to acquire a low-frequency sub-band domain excitation spectrum;
the low-frequency subband domain waveform coding module is used for carrying out quantization coding on the low-frequency subband domain excitation spectrum to obtain low-frequency subband domain waveform coding data;
the high-frequency subband domain parameter coding module is used for calculating high-frequency subband domain parameters for recovering the high-frequency subband domain signals from the low-frequency subband domain excitation spectrum according to the low-frequency subband domain excitation spectrum and the high-frequency subband domain signals, and obtaining high-frequency subband domain parameter coding data after carrying out quantization coding on the high-frequency subband domain parameters; or calculating a high-frequency sub-band domain parameter for recovering the high-frequency sub-band domain signal from the low-frequency sub-band domain excitation signal according to the low-frequency sub-band domain excitation signal and the high-frequency sub-band domain signal, and performing quantization coding on the high-frequency sub-band domain parameter to obtain high-frequency sub-band domain parameter coded data;
and the bit stream multiplexing module is used for multiplexing the low-frequency subband domain waveform coded data and the high-frequency subband domain parameter coded data so as to output a sound coded code stream.
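To make the module chain above concrete, here is a toy end-to-end sketch of such an encoder with drastic simplifications: a 2-band Haar split stands in for the multi-band analysis filter bank, a fixed first-order predictor for the time-varying prediction analysis, and a single energy value for the high-frequency subband domain parameters; the time-frequency transform step is omitted and the excitation is scalar-quantized directly. Nothing here is the patent's actual implementation.

```python
def encode_mono(x, a=0.9, step=0.05):
    # 1. Analysis filter bank (simplest 2-band stand-in for the multi-band split).
    low  = [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]
    # 2. Time-varying prediction analysis on the low band -> excitation signal.
    excitation = [low[0]] + [low[i] - a * low[i - 1] for i in range(1, len(low))]
    # 3. Waveform coding: scalar quantization of the excitation
    #    (the time-frequency transform is omitted in this toy sketch).
    waveform_data = [round(e / step) for e in excitation]
    # 4. Parameter coding of the high band: one energy parameter per frame.
    param_data = [round(sum(h * h for h in high), 4)]
    # 5. Bitstream multiplexing, represented here as a dict.
    return {"low": waveform_data, "high": param_data}
```

The point of the structure is visible even in the toy: the low band carries full waveform information (quantized excitation), while the high band is reduced to a handful of parameters.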
The monaural sound encoding apparatus further comprises a low-frequency subband domain signal type analysis module, which performs signal type analysis on a frame of the low-frequency subband domain signal and outputs a low-frequency subband domain signal type analysis result; if the low-frequency subband domain signal is a slowly varying signal, the signal type is output; if it is a fast-changing signal, the fast-changing point position is additionally acquired, and the signal type and the fast-changing point position are output;
the low-frequency subband domain time-varying predictive analysis module is further used for dividing the low-frequency subband domain signal into one or more subframes for predictive analysis according to the analysis result of the low-frequency subband domain signal type;
the low-frequency sub-band domain time-frequency transformation module is further used for dividing the low-frequency sub-band domain excitation signal into one or more subframes for time-frequency transformation according to the analysis result of the low-frequency sub-band domain signal type;
the bit stream multiplexing module is further configured to multiplex the low-frequency subband domain signal type analysis result.
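The signal type analysis described above (slowly varying vs. fast-changing, with a fast-changing point position) can be sketched as a sub-block energy-ratio detector; the block size and threshold here are illustrative choices, not values from the patent.

```python
def classify_frame(frame, block=4, ratio=8.0):
    """Label a frame 'slow' unless some sub-block's energy jumps sharply
    relative to the previous sub-block; then label it 'fast' and report
    the sample position where the jump (the fast-changing point) occurs."""
    energies = [sum(s * s for s in frame[i:i + block])
                for i in range(0, len(frame), block)]
    for b in range(1, len(energies)):
        if energies[b] > ratio * max(energies[b - 1], 1e-12):
            return "fast", b * block
    return "slow", None
```

A fast-changing frame would then be split into shorter subframes for prediction analysis and time-frequency transformation, while a slowly varying frame is processed whole.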
According to a second aspect of the above object, the present invention provides a mono sound encoding method, comprising:
A. performing sub-band decomposition on the single-channel sound signal to obtain a low-frequency sub-band domain signal comprising at least 2 sub-bands and a high-frequency sub-band domain signal comprising at least 2 sub-bands;
B. performing predictive analysis and time-frequency transformation on the low-frequency sub-band domain signal to obtain a low-frequency sub-band domain excitation spectrum; carrying out quantization coding on the low-frequency subband domain excitation spectrum to obtain low-frequency subband domain waveform coding data;
C. calculating high-frequency sub-band domain parameters for recovering the high-frequency sub-band domain signals from the low-frequency sub-band domain excitation spectrum according to the high-frequency sub-band domain signals and the low-frequency sub-band domain excitation spectrum, and obtaining high-frequency sub-band domain parameter coding data after carrying out quantization coding on the high-frequency sub-band domain parameters; or, according to the high-frequency sub-band domain signal and the low-frequency sub-band domain excitation signal obtained through the predictive analysis, calculating a high-frequency sub-band domain parameter used for recovering the high-frequency sub-band domain signal from the low-frequency sub-band domain excitation signal, and obtaining high-frequency sub-band domain parameter coding data after carrying out quantization coding on the high-frequency sub-band domain parameter;
D. multiplexing the low-frequency sub-band domain waveform coded data and the high-frequency sub-band domain parameter coded data, and outputting a sound coded code stream.
The mono sound encoding method further comprises the steps of: performing signal type analysis on a frame of low-frequency subband domain signals, and determining a low-frequency subband domain signal type analysis result; if the low-frequency sub-band domain signal is a slowly varying signal, taking the signal type as a low-frequency sub-band domain signal type analysis result; if the signal is a fast-changing signal, further acquiring a fast-changing point position, and taking the signal type and the fast-changing point position as a signal type analysis result;
step B, the performing predictive analysis and time-frequency transformation on the low-frequency sub-band domain signal comprises: dividing a frame of low-frequency subband domain signals into one or more than one sub-frames for predictive analysis according to the type analysis result of the low-frequency subband domain signals; performing predictive analysis on the low-frequency sub-band domain signals according to the sub-frames to obtain low-frequency sub-band domain excitation signals of the sub-frames, and then combining the sub-frames according to a dividing sequence to generate a frame of low-frequency sub-band domain excitation signals;
dividing the frame of low-frequency sub-band domain excitation signal into one or more sub-frames for time-frequency transformation according to the type analysis result of the low-frequency sub-band domain signal; performing time-frequency transformation on the low-frequency sub-band domain excitation signal according to the sub-frame to obtain a low-frequency sub-band domain excitation spectrum of the sub-frame;
step D further comprises multiplexing the low frequency subband domain signal type analysis results.
According to a third aspect of the above object, the present invention provides a monaural sound decoding apparatus comprising:
the bit stream demultiplexing module is used for demultiplexing the sound coding code stream to obtain low-frequency subband domain waveform coding data comprising at least 2 subbands and high-frequency subband domain parameter coding data comprising at least 2 subbands;
the low-frequency subband domain waveform decoding module is used for carrying out inverse quantization decoding on the low-frequency subband domain waveform coded data so as to obtain a low-frequency subband domain excitation spectrum;
the low-frequency sub-band domain frequency-time transformation module is used for carrying out frequency-time transformation on the low-frequency sub-band domain excitation spectrum so as to obtain a low-frequency sub-band domain excitation signal;
the low-frequency sub-band domain time-varying prediction synthesis module is used for performing prediction synthesis on the low-frequency sub-band domain excitation signal to acquire a low-frequency sub-band domain signal;
the high-frequency subband domain parameter decoding module is used for carrying out inverse quantization decoding on the high-frequency subband domain parameter coded data to obtain high-frequency subband domain parameters and recovering a high-frequency subband domain signal from the low-frequency subband domain excitation spectrum according to the high-frequency subband domain parameters; or restoring a high-frequency subband domain signal from the low-frequency subband domain excitation signal according to the high-frequency subband domain parameter;
and the synthesis subband filter bank module is used for carrying out subband synthesis on the low-frequency subband domain signal and the high-frequency subband domain signal so as to obtain a decoded single-channel sound signal.
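Mirroring the encoder, here is a toy sketch of the decoding chain just described: inverse quantization, prediction synthesis, parameter-based high-band recovery, and a 2-band synthesis stand-in for the synthesis filter bank. Every simplification (2 bands, fixed predictor, flat high-band recovery) is illustrative only.

```python
def decode_mono(data, num_high, a=0.9, step=0.05):
    # 1. Inverse quantization of the low-band waveform coded data.
    excitation = [q * step for q in data["low"]]
    # 2. Prediction synthesis: low[i] = e[i] + a * low[i-1].
    low = []
    prev = 0.0
    for e in excitation:
        prev = e + a * prev
        low.append(prev)
    # 3. High band recovered from its parameters: here, flat samples
    #    whose total energy matches the single coded energy value.
    h = (data["high"][0] / num_high) ** 0.5
    high = [h] * num_high
    # 4. Synthesis filter bank (inverse of the 2-band split).
    out = []
    for l, hh in zip(low, high):
        out.extend([l + hh, l - hh])
    return out
```

Note the asymmetry that defines the scheme: the low band is reconstructed sample-accurately from waveform data, while the high band is only perceptually reconstructed from parameters.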
The bit stream demultiplexing module is further used for acquiring a low-frequency subband domain signal type analysis result for recovering single-channel sound from the demultiplexed sound coding code stream;
the low-frequency sub-band domain frequency-time transformation module is further used for dividing the received low-frequency sub-band domain excitation spectrum into one or more than one sub-frames for frequency-time transformation according to the low-frequency sub-band domain signal type analysis result;
and the low-frequency subband domain time-varying prediction synthesis module is further used for dividing the received low-frequency subband domain excitation signal into one or more than one sub-frames for prediction synthesis according to the analysis result of the low-frequency subband domain signal type.
According to a fourth aspect of the above object, the present invention provides a monaural sound decoding method, comprising:
A. demultiplexing the sound coding code stream to obtain low-frequency sub-band domain waveform coding data comprising at least 2 sub-bands and high-frequency sub-band domain parameter coding data comprising at least 2 sub-bands;
B. performing inverse quantization decoding on the low-frequency subband domain waveform coded data to acquire a low-frequency subband domain excitation spectrum; performing frequency-time transformation and prediction synthesis on the low-frequency sub-band domain excitation spectrum to obtain a low-frequency sub-band domain signal;
C. performing inverse quantization decoding on the high-frequency subband domain parameter coded data to acquire high-frequency subband domain parameters, and recovering a high-frequency subband domain signal from the low-frequency subband domain excitation spectrum according to the high-frequency subband domain parameters; or restoring a high-frequency subband domain signal from a low-frequency subband domain excitation signal obtained through the frequency-time transformation according to the high-frequency subband domain parameter;
D. and performing subband synthesis on the low-frequency subband domain signal and the high-frequency subband domain signal, and outputting a decoded single-channel sound signal.
The step A further comprises the step of obtaining a low-frequency subband domain signal type analysis result for recovering the single-channel sound from the demultiplexed sound coding code stream;
b, the frequency-time transformation and prediction synthesis of the low-frequency sub-band domain excitation spectrum comprises the following steps: dividing a frame of low-frequency subband domain excitation spectrum into one or more subframes for frequency-time conversion according to the analysis result of the low-frequency subband domain signal type; performing frequency-time conversion on the low-frequency sub-band domain excitation spectrum according to the sub-frames to obtain low-frequency sub-band domain excitation signals of the sub-frames, and then combining the sub-frames according to a dividing sequence to generate a frame of low-frequency sub-band domain excitation signals;
dividing the frame of low-frequency subband domain excitation signal into one or more than one sub-frames for predictive synthesis according to the analysis result of the low-frequency subband domain signal type; and performing prediction synthesis on the low-frequency sub-band domain signals according to the sub-frames to obtain the low-frequency sub-band domain signals of the sub-frames.
According to a fifth aspect of the above object, the present invention provides a stereo encoding apparatus comprising:
the analysis subband filter bank module is used for performing subband decomposition on the left channel and the right channel of the stereo signal respectively to decompose the stereo signal into a low-frequency subband domain signal of the left channel and the right channel comprising at least 2 subbands and a high-frequency subband domain signal of the left channel and the right channel comprising at least 2 subbands;
the low-frequency sub-band domain time-varying prediction analysis module is used for respectively carrying out prediction analysis on the low-frequency sub-band domain signals of the left channel and the right channel so as to obtain low-frequency sub-band domain excitation signals of the left channel and the right channel;
the low-frequency sub-band domain time-frequency transformation module is used for respectively carrying out time-frequency transformation on the left and right sound channel low-frequency sub-band domain excitation signals so as to obtain low-frequency sub-band domain excitation spectrums of the left and right sound channels;
the low-frequency sub-band domain stereo coding module is used for carrying out stereo coding on the low-frequency sub-band domain excitation spectrums of the left and right sound channels so as to obtain low-frequency sub-band domain stereo coding data;
a high-frequency subband domain parameter coding module, configured to calculate, according to the low-frequency subband domain excitation spectrums of the left and right channels and the high-frequency subband domain signals of the left and right channels, high-frequency subband domain parameters of the left and right channels, which are used to recover the high-frequency subband domain signals of the left and right channels from the low-frequency subband domain excitation spectrums of the left and right channels, and perform quantization coding on the high-frequency subband domain parameters of the left and right channels, respectively, so as to obtain high-frequency subband domain parameter coded data of the left and right channels; or, respectively according to the low-frequency subband domain excitation signals of the left and right channels and the high-frequency subband domain signals of the left and right channels, calculating high-frequency subband domain parameters of the left and right channels for recovering the high-frequency subband domain signals of the left and right channels from the low-frequency subband domain excitation signals of the left and right channels, and respectively performing quantization coding on the high-frequency subband domain parameters of the left and right channels to obtain high-frequency subband domain parameter coding data of the left and right channels;
and the bit stream multiplexing module is used for multiplexing the low-frequency sub-band domain stereo coded data and the high-frequency sub-band domain parameter coded data of the left and right sound channels so as to output a stereo sound coded code stream.
The stereo encoding apparatus further comprises a low-frequency subband domain sum-signal type analysis module, which computes a frame of the low-frequency subband domain sum signal from a frame of the left- and right-channel low-frequency subband domain signals, performs signal type analysis on the sum signal, and outputs a low-frequency subband domain sum-signal type analysis result; if the low-frequency subband domain sum signal is a slowly varying signal, the signal type is output; if it is a fast-changing signal, the fast-changing point position is additionally acquired, and the signal type and the fast-changing point position are output;
the low-frequency subband domain time-varying prediction analysis module is further configured to divide a received frame of the left- and right-channel low-frequency subband domain signals into one or more subframes for prediction analysis according to the low-frequency subband domain sum-signal type analysis result;
the low-frequency subband domain time-frequency transformation module is further configured to divide the received left- and right-channel low-frequency subband domain excitation signals into one or more subframes for time-frequency transformation according to the low-frequency subband domain sum-signal type analysis result;
the bit stream multiplexing module is further configured to multiplex the low-frequency subband domain sum-signal type analysis result.
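A sum-signal-based stereo stage can be sketched with the classic mid/side (sum/difference) transform: the sum (mid) signal is what a shared signal-type analysis would operate on, and the mid/side pair replaces left/right for coding. This is a generic illustration of sum/difference stereo, not the patent's specific stereo coding modes.

```python
def mid_side_encode(left, right):
    """Form the sum (mid) and difference (side) of the two channels."""
    mid  = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def mid_side_decode(mid, side):
    """Invert the sum/difference transform back to left/right."""
    left  = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

For highly correlated channels the side signal is near zero, so most bits can go to the mid signal; that energy compaction is the basic motivation for coding a sum signal rather than the raw channel pair.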
When performing stereo coding, the low-frequency subband domain stereo coding module is further configured to select among more than one selectable stereo coding modes, perform coding using the selected mode, and output coding mode selection information to the bit stream multiplexing module.
According to a sixth aspect of the above object, the present invention provides a stereo encoding method comprising:
A. performing subband decomposition on a left channel and a right channel of the stereo signal respectively to decompose the left channel and the right channel into a low-frequency subband domain signal of the left channel and the right channel comprising at least 2 subbands and a high-frequency subband domain signal of the left channel and the right channel comprising at least 2 subbands;
B. respectively carrying out predictive analysis and time-frequency transformation on the low-frequency sub-band domain signals of the left and right sound channels to obtain low-frequency sub-band domain excitation spectrums of the left and right sound channels; stereo coding is carried out on the low-frequency sub-band domain excitation spectrums of the left and right sound channels to obtain low-frequency sub-band domain stereo coding data;
C. calculating left and right channel high-frequency subband domain parameters for recovering the left and right channel high-frequency subband domain signals from the left and right channel low-frequency subband domain excitation spectrums according to the left and right channel high-frequency subband domain signals and the left and right channel low-frequency subband domain excitation spectrums, and respectively carrying out quantization coding on the left and right channel high-frequency subband domain parameters to obtain left and right channel high-frequency subband domain parameter coding data; or, respectively calculating left and right channel high frequency subband domain parameters for recovering the left and right channel high frequency subband domain signals from the left and right channel low frequency subband domain excitation signals according to the left and right channel high frequency subband domain signals and the left and right channel low frequency subband domain excitation signals obtained through the predictive analysis, and respectively performing quantization coding on the left and right channel high frequency subband domain parameters to obtain left and right channel high frequency subband domain parameter coding data;
D. and multiplexing the low-frequency sub-band domain stereo coded data and the high-frequency sub-band domain parameter coded data of the left and right sound channels to output stereo sound coded code streams.
The stereo encoding method further comprises the steps of: b, calculating a frame of low-frequency subband domain sum signal from the frame of left and right channel low-frequency subband domain signals obtained by subband decomposition in step A, carrying out signal type analysis on the low-frequency subband domain sum signal, and determining the sum-signal type analysis result; if the low-frequency subband domain sum signal is a slowly varying signal, taking the signal type as the sum-signal type analysis result; if it is a fast-varying signal, further acquiring the fast-change point position, and taking the signal type and the fast-change point position as the low-frequency subband domain sum-signal type analysis result;
in step B, the step of performing predictive analysis and time-frequency transformation on the left/right channel low-frequency subband domain signals comprises: dividing a frame of left/right channel low-frequency subband domain signals into one or more subframes for predictive analysis according to the low-frequency subband domain sum-signal type analysis result; performing predictive analysis on the left/right channel low-frequency subband domain signals subframe by subframe to obtain the left/right channel low-frequency subband domain excitation signals of the subframes, and then combining the subframes in the division order to generate a frame of left/right channel low-frequency subband domain excitation signals;
dividing the left/right channel low-frequency subband domain excitation signal into one or more subframes for time-frequency transformation according to the low-frequency subband domain sum-signal type analysis result; performing time-frequency transformation on the left/right channel low-frequency subband domain excitation signal subframe by subframe to obtain the left/right channel low-frequency subband domain excitation spectrum of the subframes;
step D further comprises multiplexing the low-frequency subband domain sum-signal type analysis result.
When stereo coding is performed, the stereo coding of the low-frequency subband domain excitation spectrums of the left and right channels in step B includes:
b2, dividing the low-frequency sub-band excitation spectrum of the left channel and the right channel into a plurality of sub-bands, and selecting a stereo coding mode for each sub-band to carry out stereo coding;
step D further comprises multiplexing the coding mode selection information.
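As an illustration of the per-subband mode selection described in step b2, the following sketch chooses between sum/difference (M/S) coding and independent left/right coding for each subband of the excitation spectrum. The energy threshold and the equal-width band split are illustrative assumptions, not values specified by the patent:

```python
import numpy as np

def ms_encode_subbands(left_spec, right_spec, n_subbands):
    """Per-subband stereo mode selection: sum/difference (M/S) vs. independent L/R.

    left_spec, right_spec: one frame of low-frequency subband-domain
    excitation spectrum per channel (1-D arrays of equal length).
    Returns the per-subband coded pairs plus a mode flag for each subband
    (the flag is the coding mode selection information for the bitstream).
    """
    bands = np.array_split(np.arange(len(left_spec)), n_subbands)
    coded, modes = [], []
    for idx in bands:
        l, r = left_spec[idx], right_spec[idx]
        m, s = (l + r) / 2.0, (l - r) / 2.0      # sum and difference spectra
        # Prefer M/S when the difference channel carries little energy,
        # i.e. when M/S needs fewer bits than coding L and R independently.
        if np.sum(s ** 2) < 0.25 * np.sum(l ** 2 + r ** 2):
            coded.append((m, s))
            modes.append("MS")
        else:
            coded.append((l, r))
            modes.append("LR")
    return coded, modes
```

The decoder inverts each M/S band with L = M + S and R = M - S, selecting the inverse mode from the multiplexed selection information.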
According to a seventh aspect of the above object, the present invention provides a stereo decoding apparatus comprising:
the bit stream demultiplexing module is used for demultiplexing the stereo sound coded stream to obtain low-frequency subband domain stereo coded data comprising at least 2 subbands and high-frequency subband domain parameter coded data of the left and right channels comprising at least 2 subbands;
the low-frequency sub-band domain stereo decoding module is used for carrying out stereo decoding on the low-frequency sub-band domain stereo coded data so as to obtain low-frequency sub-band domain excitation spectrums of a left channel and a right channel;
the low-frequency subband domain frequency-time transformation module is used for respectively carrying out frequency-time transformation on the low-frequency subband domain excitation spectrums of the left and right channels so as to obtain the low-frequency subband domain excitation signals of the left and right channels;
the low-frequency sub-band domain time-varying prediction synthesis module is used for respectively carrying out prediction synthesis on the low-frequency sub-band domain excitation signals of the left sound channel and the right sound channel so as to obtain low-frequency sub-band domain signals of the left sound channel and the right sound channel;
a high-frequency subband domain parameter decoding module, configured to perform inverse quantization decoding on the high-frequency subband domain parameter encoded data of the left and right channels to obtain high-frequency subband domain parameters of the left and right channels, and recover high-frequency subband domain signals of the left and right channels from low-frequency subband domain excitation spectrums of the left and right channels according to the high-frequency subband domain parameters of the left and right channels, respectively; or restoring high-frequency subband domain signals of the left channel and the right channel from the low-frequency subband domain excitation signals of the left channel and the right channel according to the high-frequency subband domain parameters of the left channel and the right channel respectively;
and the synthesis subband filter bank module is used for carrying out subband synthesis on the low-frequency subband domain signals of the left and right channels and the high-frequency subband domain signals of the left and right channels so as to obtain the decoded stereo signals of the left and right channels.
The bit stream demultiplexing module is further used for acquiring, from the demultiplexed stereo sound coded stream, the low-frequency subband domain sum-signal type analysis result used for recovering the stereo signal;
the low-frequency subband domain frequency-time conversion module is further used for dividing the received low-frequency subband domain excitation spectrums of the left and right channels into one or more subframes for frequency-time conversion according to the low-frequency subband domain sum-signal type analysis result;
and the low-frequency subband domain time-varying prediction synthesis module is further used for dividing the received low-frequency subband domain excitation signals of the left and right channels into one or more subframes for prediction synthesis according to the low-frequency subband domain sum-signal type analysis result.
The bit stream demultiplexing module is further used for acquiring the coding mode selection information for stereo decoding from the demultiplexed stereo sound coded stream;
when performing stereo decoding, the low-frequency subband stereo decoding module is further configured to perform stereo decoding using the stereo decoding mode corresponding to the coding mode selection information.
According to an eighth aspect of the above object, the present invention provides a stereo decoding method comprising:
A. demultiplexing the stereo sound coded stream to obtain low-frequency subband domain stereo coded data comprising at least 2 subbands and high-frequency subband domain parameter coded data of the left and right channels comprising at least 2 subbands;
B. Performing stereo decoding on the low-frequency sub-band domain stereo coded data obtained by demultiplexing to obtain low-frequency sub-band domain excitation spectrums of the left and right sound channels; respectively carrying out frequency-time transformation and prediction synthesis on the low-frequency subband domain excitation spectrums of the left and right sound channels to obtain low-frequency subband domain signals of the left and right sound channels;
C. carrying out inverse quantization decoding on the high-frequency subband domain parameter coded data of the left and right channels obtained by demultiplexing to obtain high-frequency subband domain parameters of the left and right channels, and respectively restoring high-frequency subband domain signals of the left and right channels from low-frequency subband domain excitation spectrums of the left and right channels according to the high-frequency subband domain parameters of the left and right channels; or restoring the high-frequency subband domain signals of the left and right channels from the low-frequency subband domain excitation signals of the left and right channels obtained through frequency-time conversion according to the high-frequency subband domain parameters of the left and right channels respectively;
D. and performing subband synthesis on the low-frequency subband domain signals of the left and right channels and the high-frequency subband domain signals of the left and right channels to obtain decoded stereo signals of the left and right channels.
Step A further comprises obtaining, from the demultiplexed sound coded stream, the low-frequency subband domain sum-signal type analysis result for recovering the stereo sound;
in step B, the frequency-time transformation and prediction synthesis of the left/right channel low-frequency subband domain excitation spectrum comprise the following steps: dividing a frame of the left/right channel low-frequency subband domain excitation spectrum into one or more subframes for frequency-time conversion according to the low-frequency subband domain sum-signal type analysis result; performing frequency-time conversion on the left/right channel low-frequency subband domain excitation spectrum subframe by subframe to obtain the left/right channel low-frequency subband domain excitation signals of the subframes, and then combining the subframes in the division order to generate a frame of left/right channel low-frequency subband domain excitation signals;
dividing the left/right channel low-frequency subband domain excitation signal into one or more subframes for prediction synthesis according to the low-frequency subband domain sum-signal type analysis result; performing prediction synthesis on the left/right channel low-frequency subband domain excitation signals subframe by subframe to obtain the left/right channel low-frequency subband domain signals of the subframes.
When stereo decoding is performed, the step a further includes obtaining coding mode selection information for stereo decoding from the demultiplexed sound coding stream;
in step B, stereo decoding is performed as follows: the low-frequency subband domain stereo coded data of each subband k is decoded using the stereo decoding mode corresponding to the coding mode selection information.
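A decoder-side counterpart of per-subband sum/difference stereo coding can be sketched as follows. It assumes each subband arrives as a pair of coefficient arrays plus a hypothetical "MS"/"LR" mode flag recovered from the coding mode selection information (the flag names are illustrative, not from the patent):

```python
import numpy as np

def ms_decode_subbands(coded, modes):
    """Rebuild the left/right low-frequency subband-domain excitation
    spectra from per-subband (M, S) or (L, R) pairs, according to the
    per-subband coding-mode flags."""
    left_parts, right_parts = [], []
    for (a, b), mode in zip(coded, modes):
        if mode == "MS":                # a = (L+R)/2, b = (L-R)/2
            left_parts.append(a + b)
            right_parts.append(a - b)
        else:                           # independently coded L and R
            left_parts.append(a)
            right_parts.append(b)
    return np.concatenate(left_parts), np.concatenate(right_parts)
```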
Compared with the prior art, the single-channel sound coding and decoding scheme provided by the invention first performs subband decomposition on the sound signal, decomposing it into a low-frequency subband domain signal comprising at least 2 subbands and a high-frequency subband domain signal comprising at least 2 subbands, and then codes the low-frequency and high-frequency subband domain signals with different processing methods. When waveform-coding the low-frequency subband domain signal, an effective "prediction + excitation" parameter model is applied, which improves the speech coding quality under a low bit-rate constraint; the low-frequency subband domain excitation signal is then transformed into a low-frequency subband domain excitation spectrum, where the distribution of the signal in the transform space is more concentrated, so that the same sound signal can be represented at a lower bit rate; finally, the subband domain excitation spectrum is quantized, coded, and output. Thus, when waveform-coding the low-frequency sound signal, a combination of high-efficiency subband coding, predictive coding, and transform coding is used. The high-frequency subband domain signal is processed with a high-efficiency parameter coding mode; in the course of parameter coding, effective spectral-structure adjustment and time-domain gain adjustment are carried out, which improves coding efficiency and reduces the distortion of the decoded sound.
The invention also provides a stereo coding and decoding scheme, which not only has the advantages of the single sound channel coding and decoding scheme based on the principle of the invention, but also provides a plurality of parameter stereo coding methods based on sub-band excitation spectrums, and is suitable for stereo coding at extremely low code rate.
Drawings
FIG. 1 is a block diagram of an EAAC+ encoder of the prior art;
FIG. 2 is a block diagram of an EAAC+ decoder of the prior art;
FIG. 3 is a block diagram of an AMR-WB+ encoder of the prior art;
FIG. 4 is a block diagram of an AMR-WB+ decoder of the prior art;
FIG. 5 is a block diagram of a mono audio encoding apparatus according to a preferred embodiment of the present invention;
FIG. 6 is a block diagram of the low frequency subband domain time varying prediction analysis module shown in FIG. 5;
FIG. 7 is a block diagram of the high frequency subband domain parameter encoding module shown in FIG. 5;
FIG. 8 is a flowchart of an encoding method of a mono audio encoding device according to the present invention;
FIG. 9a is a time-frequency plan view of a slowly varying signal after time-frequency transformation;
FIG. 9b is a time-frequency plane diagram of a fast-varying signal after time-frequency transformation;
FIG. 10 is a block diagram of a mono audio decoding apparatus according to a preferred embodiment of the present invention;
FIG. 11 is a block diagram of the low frequency subband domain time varying prediction synthesis module shown in FIG. 10;
FIG. 12 is a block diagram illustrating the structure of a high frequency subband domain parameter decoding module shown in FIG. 10;
FIG. 13 is a flowchart of a decoding method of a mono audio decoding device according to the present invention;
FIG. 14 is a block diagram showing a stereo encoding apparatus according to a preferred embodiment of the present invention;
FIG. 15 is a flow chart of an encoding method based on the stereo encoding apparatus of the present invention;
FIG. 16 is a model diagram of the sum and difference stereo coding modes of the present invention;
FIG. 17 is a block diagram of a parametric stereo coding mode according to the present invention;
FIG. 18 is a block diagram of a model of a parametric error stereo coding mode according to the present invention;
FIG. 19 is a block diagram showing a stereo decoding apparatus according to a preferred embodiment of the present invention;
fig. 20 is a flowchart of a decoding method of a stereo audio decoding apparatus according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the following embodiments and the accompanying drawings.
Fig. 5 is a block diagram showing a configuration of a monaural sound encoding apparatus according to a preferred embodiment of the present invention. As shown in fig. 5, the mono sound encoding apparatus according to the preferred embodiment of the present invention includes an analysis subband filterbank module 501, a low frequency subband-domain signal type analysis module 502, a low frequency subband-domain time-varying prediction analysis module 503, a low frequency subband-domain time-frequency transform module 504, a low frequency subband-domain waveform encoding module 505, a high frequency subband-domain parameter encoding module 506, and a bitstream multiplexing module 507.
First, the connection relationship and function of each module in fig. 5 are described in detail, wherein:
The analysis subband filter bank module 501 is used for performing subband decomposition on the input monaural sound signal M so that each subband signal corresponds to a specific frequency range, generating the subband domain signals M1~Mk1+k2 of the respective frequency ranges; the low-frequency subband domain signals M1~Mk1 are then input in units of frames to the low-frequency subband domain signal type analysis module 502 and the low-frequency subband domain time-varying prediction analysis module 503, and the high-frequency subband domain signals Mk1+1~Mk1+k2 are input in units of frames to the high-frequency subband domain parameter coding module 506, where k1 is greater than or equal to 2 and k2 is greater than or equal to 2.
The analysis subband filter bank module 501 may perform the subband decomposition using a Quadrature Mirror Filter bank (QMF), a Pseudo-Quadrature Mirror Filter bank (PQMF), a Cosine Modulated Filter bank (CMF), a Discrete Wavelet Transform, or the like.
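Module 501 is specified only as some QMF/PQMF/CMF/wavelet bank. As a minimal stand-in, the sketch below uses a two-band Haar analysis filterbank, applied twice to obtain two low-frequency and two high-frequency subbands; a real codec would use a longer prototype filter and many more bands, so this is illustrative only:

```python
import numpy as np

def haar_analysis_2band(x):
    """Two-band analysis filterbank (Haar prototype): splits x into a
    low-frequency and a high-frequency subband, each downsampled by 2."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, 0.0)            # zero-pad to an even length
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)      # lowpass branch
    high = (even - odd) / np.sqrt(2)     # highpass branch
    return low, high

def split_into_4_subbands(x):
    """Apply the 2-band split twice: two low-frequency subbands (for the
    waveform path) and two high-frequency subbands (for the parameter path),
    matching the patent's minimum of k1 = k2 = 2."""
    low, high = haar_analysis_2band(x)
    ll, lh = haar_analysis_2band(low)
    hl, hh = haar_analysis_2band(high)
    return [ll, lh, hl, hh]
```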
The low-frequency subband domain signal type analysis module 502 is configured to perform signal type analysis on each frame of low-frequency subband domain signal received from the analysis subband filter bank module 501 and determine whether the frame is a slowly varying signal or a fast-varying signal. If it is a slowly varying signal, the signal type is output directly, for example as an identifier indicating that the frame type is slowly varying; if it is a fast-varying signal, the fast-change point position is further calculated, and the corresponding signal type and fast-change point position are output. The signal type analysis result is output to the low-frequency subband domain time-varying prediction analysis module 503 for subframe division control and to the low-frequency subband domain time-frequency transform module 504 for controlling the order of the time-frequency transform; it can also be output to the bit stream multiplexing module 507 as Side Information of the sound coded stream. The sound coded stream is the data transmitted from the encoding apparatus to the decoding apparatus, and includes the sound coded data and the side information. The side information occupies part of the coded bit rate and is usually control information or parameter coding information used to recover the sound signal at the decoding end. The low-frequency subband domain signal type analysis module 502 may determine the signal type using the perceptual entropy of the signal, by calculating the energy of the signal frame, and so on. In practice, a mono sound encoding device according to the principles of the present invention may not include this module.
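The patent leaves the detection method open (perceptual entropy, frame energy, etc.). The toy classifier below takes the frame-energy route: it compares short-time energies of consecutive subframes and reports a fast-change position when the energy ratio jumps. The subframe count and the ratio threshold are assumed tuning constants, not values from the patent:

```python
import numpy as np

def classify_frame(frame, n_sub=8, ratio_threshold=8.0):
    """Toy signal-type analysis: a frame whose short-time energy jumps
    sharply between consecutive subframes is labelled fast-varying, and
    the jump position (subframe index) is reported as the fast-change
    point; otherwise the frame is slowly varying."""
    sub = np.array_split(np.asarray(frame, dtype=float), n_sub)
    energy = np.array([np.sum(s ** 2) + 1e-12 for s in sub])
    ratios = energy[1:] / energy[:-1]          # energy jump between neighbours
    peak = int(np.argmax(ratios))
    if ratios[peak] > ratio_threshold:
        return "fast", peak + 1                # subframe just after the jump
    return "slow", None
```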
The low-frequency subband domain time-varying prediction analysis module 503 is configured to receive the low-frequency subband domain signal from the analysis subband filter bank module 501, divide the received signal into subframes according to the low-frequency subband domain signal type analysis result received from the low-frequency subband domain signal type analysis module 502, and perform prediction analysis, that is, linear prediction filtering, subframe by subframe to obtain the low-frequency subband domain excitation signal (also called the low-frequency subband domain residual signal), which is output in units of frames to the low-frequency subband domain time-frequency transform module 504. When linear prediction filtering is adopted, the low-frequency subband domain time-varying prediction analysis module 503 performs linear prediction analysis on the low-frequency subband domain signal to obtain a set of prediction coefficients, converts the prediction coefficients into a set of Line Spectral Frequencies (LSFs), and performs vector quantization on the LSF parameters to obtain an LSF vector quantization index used to construct the linear prediction synthesis filter at the decoding end; the LSF vector quantization index is therefore output to the bit stream multiplexing module 507 as side information.
The low-frequency subband domain time-frequency transform module 504 is configured to receive the low-frequency subband domain excitation signal from the low-frequency subband domain time-varying prediction analysis module 503 and transform it from the time domain to the frequency domain, using transforms of different length orders according to the low-frequency subband domain signal type analysis result received from the low-frequency subband domain signal type analysis module 502, so as to obtain the frequency-domain representation of the low-frequency subband domain excitation signal, that is, the low-frequency subband domain excitation spectrum. The excitation spectrum obtained by the time-frequency transform is output to the low-frequency subband domain waveform coding module 505 and the high-frequency subband domain parameter coding module 506. If the mono sound encoding apparatus according to the present principles does not include the low-frequency subband domain signal type analysis module 502, the order is not controlled during the time-frequency transform.
And a low-frequency subband domain waveform coding module 505, configured to receive the low-frequency subband domain excitation spectrum from the low-frequency subband domain time-frequency transform module 504, perform quantization coding on the low-frequency subband domain excitation spectrum to obtain low-frequency subband domain waveform coded data, and output the low-frequency subband domain waveform coded data to the bit stream multiplexing module 507 as sound coded data of a sound coded code stream.
The high-frequency subband domain parameter coding module 506 is configured to receive the low-frequency subband domain excitation spectrum from the low-frequency subband domain time-frequency transform module 504 and the high-frequency subband domain signal from the analysis subband filter bank module 501, and to extract the high-frequency subband domain parameters from the low-frequency subband domain excitation spectrum and the high-frequency subband domain signal; these parameters are used at the decoding end to recover the high-frequency subband domain signal from the low-frequency subband domain excitation spectrum. The high-frequency subband domain parameter coding module 506 then performs quantization coding on the extracted parameters and outputs the resulting high-frequency subband domain parameter coded data to the bit stream multiplexing module 507 as side information.
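The parameter extraction of module 506 can be sketched with a per-band energy-ratio parameterization: the encoder measures how much each high-frequency band must be scaled relative to the low-frequency excitation spectrum it will be regenerated from, and the decoder applies those gains. The gain-only parameterization is an illustrative assumption; the patent's actual parameters also include spectral-structure adjustment and time-domain gain adjustment data:

```python
import numpy as np

def extract_hf_gains(hf_spec, lf_excitation_spec, n_bands=4):
    """Per-band RMS gains between the high-frequency content and the
    low-frequency excitation spectrum (encoder side, hypothetical)."""
    hf = np.array_split(np.asarray(hf_spec, dtype=float), n_bands)
    lf = np.array_split(np.asarray(lf_excitation_spec, dtype=float), n_bands)
    return np.array([np.sqrt(np.sum(h ** 2) / (np.sum(l ** 2) + 1e-12))
                     for h, l in zip(hf, lf)])

def regenerate_hf(lf_excitation_spec, gains):
    """Decoder-side counterpart: scale copies of the low-frequency
    excitation spectrum by the transmitted per-band gains."""
    lf = np.array_split(np.asarray(lf_excitation_spec, dtype=float), len(gains))
    return np.concatenate([g * l for g, l in zip(gains, lf)])
```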
And a bit stream multiplexing module 507, configured to multiplex the audio encoded data and the side information received from the low-frequency subband domain signal type analyzing module 502, the low-frequency subband domain time-varying prediction analyzing module 503, the low-frequency subband domain waveform encoding module 505, and the high-frequency subband domain parameter encoding module 506 to form an audio encoded code stream.
Since the low-frequency subband domain signal and the high-frequency subband domain signal each include at least 2 subbands, the low-frequency subband domain signal type analysis module 502 performs signal type analysis on each subband of the low-frequency subband domain signal; the low-frequency subband domain time-varying prediction analysis module 503 performs prediction analysis on each subband of the low-frequency subband domain signal; the low-frequency subband domain time-frequency transform module 504 performs time-frequency transform on each subband of the low-frequency subband domain excitation signal; the low-frequency subband domain waveform coding module 505 performs quantization coding on each subband of the low-frequency subband domain excitation spectrum and outputs the result to the bit stream multiplexing module 507 in code stream form; and the high-frequency subband domain parameter coding module 506 obtains the high-frequency subband domain parameters of each subband and outputs them to the bit stream multiplexing module 507 in code stream form. The detailed description below does not treat each subband separately.
The following describes the low-frequency subband time-varying prediction analysis module 503, the low-frequency subband time-frequency transform module 504, the low-frequency subband waveform coding module 505, and the high-frequency subband parameter coding module 506 of the above-mentioned monaural sound coding apparatus in detail.
Fig. 6 is a block diagram of the low-frequency subband domain time-varying prediction analysis module 503 in fig. 5. The module is composed of a subframe divider 600, a linear prediction analyzer 601, a converter 602, a vector quantizer 603, an inverse converter 604, and a linear prediction filter 605. Specifically, the subframe divider 600 performs subframe division on the input low-frequency subband domain signal y(n) according to the received low-frequency subband domain signal type analysis result, dividing one frame of y(n) into one or more subframes for linear prediction analysis. The linear prediction analyzer 601 then performs linear prediction analysis on y(n) in units of subframes to obtain a set of prediction coefficients ai. The converter 602 converts this set of ai into a set of Line Spectral Frequencies (LSFs), which are sent to the vector quantizer 603 for vector quantization to obtain the LSF vector quantization index; a set of quantized line spectral frequencies is obtained from the LSF vector quantization index, from which the inverse converter 604 computes a set of quantized prediction coefficients. Finally, the subframe signals produced by the subframe divider 600 are filtered by the linear prediction filter 605 constructed from the quantized prediction coefficients, yielding the low-frequency subband domain excitation signal xe(n), which is output in units of frames. If a frame was divided into several subframes, the subframes are combined into one frame in the division order before output.
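The analysis path of module 503 (linear prediction followed by inverse filtering to obtain the excitation) can be sketched with textbook autocorrelation LPC via the Levinson-Durbin recursion. The LSF conversion and vector quantization stages (converter 602, quantizer 603, inverse converter 604) are omitted here, so the filter uses unquantized coefficients:

```python
import numpy as np

def lpc_residual(y, order=8):
    """Autocorrelation LPC via Levinson-Durbin, then inverse filtering by
    the analysis filter A(z) to obtain the subband-domain excitation
    (residual) signal x_e(n)."""
    y = np.asarray(y, dtype=float)
    # Autocorrelation sequence r[0..order]
    r = np.array([np.dot(y[:len(y) - k], y[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / err        # reflection coefficient
        a[:i + 1] = a[:i + 1] + k * a[:i + 1][::-1]
        err *= (1.0 - k * k)
    # Excitation = y filtered by A(z) (prediction error)
    excitation = np.convolve(a, y)[:len(y)]
    return a, excitation
```

For a signal that is well modelled by the predictor, the excitation has far less energy than the input, which is what makes the subsequent transform coding efficient.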
Vector quantization is adopted in the prediction analysis process, and the LSF vector quantization index is used at the decoding end as the information for constructing the linear prediction synthesis filter; therefore, at the encoding end, the LSF vector quantization index is output as side information to the bitstream multiplexing module 507. Each subframe used for linear prediction analysis corresponds to its own LSF vector quantization index.
When the subframes used for linear prediction analysis are divided, if the signal type is a slowly varying signal, the frame is treated as a single subframe, that is, no subframe division is performed; if the signal type is a fast-varying signal, the frame of low-frequency subband domain signal is divided into subframes according to the fast-change point position, which may specifically be: the 256-sample blocks before the fast-change point are grouped into one subframe, the 256-sample block containing the fast-change point forms one subframe, and the 256-sample blocks after the fast-change point are grouped into one subframe. If the mono sound encoding apparatus according to the present principles does not include the low-frequency subband domain signal type analysis module 502, the low-frequency subband domain time-varying prediction analysis module 503 does not include the subframe divider 600.
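The 256-sample division rule above can be expressed directly. Here transient_pos is the fast-change point sample index within the frame, and the helper returns (start, end) sample ranges:

```python
def divide_subframes(frame_len, signal_type, transient_pos, block=256):
    """Subframe boundaries per the described scheme: a slowly varying frame
    is one subframe; a fast-varying frame is cut at 256-sample block
    granularity into (blocks before the transient | the block containing
    the transient | blocks after it)."""
    if signal_type == "slow":
        return [(0, frame_len)]
    b = (transient_pos // block) * block          # start of the transient block
    bounds = [(0, b), (b, b + block), (b + block, frame_len)]
    return [(s, e) for s, e in bounds if e > s]   # drop empty segments
```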
The low-frequency subband domain time-frequency transform module 504 may use the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), the Modified Discrete Cosine Transform (MDCT), or other transform methods for the time-frequency transform. For the transform itself, the subframes of the low-frequency subband domain excitation signal are re-divided according to the low-frequency subband domain signal type analysis result: one frame of low-frequency subband domain excitation signal is divided into one or more subframes used for time-frequency transform, and the excitation signal is transformed subframe by subframe.
When the subframes used for the transform are divided, if the signal type is a slowly varying signal, the frame is kept as a single subframe and a transform with a longer order is selected; if the signal type is a fast-varying signal, the frame is divided into subframes according to the fast-change point position, or into subframes of equal length, and a transform with a shorter order is selected. When subframes are divided according to the fast-change point position, the division method is the same as that used by the low-frequency subband domain time-varying prediction analysis module 503 to divide the subframes for linear prediction analysis.
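A sketch of module 504's order selection using a direct (unwindowed) MDCT: slowly varying frames get one long transform, fast-varying frames several short ones, so quantization error stays localized in time around transients. The block lengths are illustrative, and the windowing/overlap-add machinery of a practical MDCT codec is omitted:

```python
import numpy as np

def mdct(x):
    """Direct MDCT of a 2N-sample block to N spectral coefficients."""
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) @ x

def excitation_spectrum(excitation, signal_type, long_len=2048, short_len=256):
    """Transform-order selection: one long transform for a slowly varying
    frame, a sequence of short transforms for a fast-varying frame."""
    n = long_len if signal_type == "slow" else short_len
    blocks = [excitation[i:i + n] for i in range(0, len(excitation), n)]
    return [mdct(b) for b in blocks if len(b) == n]
```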
The low-frequency subband domain waveform coding module 505 may quantize and code the low-frequency subband domain excitation spectrum using a scalar quantization plus Huffman coding scheme as in MPEG AAC, or using a vector quantization scheme. When the vector quantization scheme is adopted, the vector codeword indexes generated in the vector quantization process are sent to the bit stream multiplexing module 507 as side information (not shown in fig. 5); the decoding end uses these indexes to decode the low-frequency subband domain waveform coded data.
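The vector-quantization option for module 505 can be sketched as a nearest-codeword search over fixed-size spectral vectors. The 4-dimensional split and the codebook itself are assumptions for illustration (a real codebook would be trained on excitation-spectrum data):

```python
import numpy as np

def vq_encode(spectrum, codebook, dim=4):
    """Split the excitation spectrum into dim-sized vectors and emit, for
    each, the index of the nearest codebook entry (the codeword indexes
    that become side information)."""
    vecs = np.asarray(spectrum, dtype=float).reshape(-1, dim)
    # squared Euclidean distance from every vector to every codeword
    d = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def vq_decode(indices, codebook):
    """Decoder lookup: codeword indexes -> reconstructed spectrum."""
    return codebook[indices].ravel()
```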
Fig. 7 is a block diagram illustrating the structure of the high-frequency subband-domain parameter encoding module 506 shown in fig. 5, wherein the high-frequency subband-domain parameter encoding module 506 includes a signal type analyzer 701, a time-varying prediction analyzer 702, a spectral parameter encoder 703, a time-varying prediction synthesizer 704, and a time-domain adaptive gain adjustment parameter extractor 705.
The signal type analyzer 701 performs signal type analysis on each frame of the high-frequency subband domain signal received from the subband filter bank module 501, outputs the analysis result to the time-varying prediction analyzer 702, the time-varying prediction synthesizer 704, and the time-domain adaptive gain adjustment parameter extractor 705, and also outputs it to the bitstream multiplexing module 507 as side information. The analysis method is the same in principle as that used by the low-frequency subband domain signal type analysis module 502. In practice, the high-frequency subband domain parameter encoding module according to the present invention may omit the signal type analyzer 701.
The time-varying prediction analyzer 702 receives the high-frequency subband domain signal output by the analysis subband filter bank module 501 and divides it into one or more sub-frames according to the high-frequency subband domain signal type analysis result from the signal type analyzer 701; each sub-frame is a group of high-frequency subband domain signal samples and is the unit of prediction analysis. If the signal type is a slowly varying signal, the frame forms a single sub-frame (equivalently, no sub-frame division is performed); if it is a fast-changing signal, the frame is divided into more than one sub-frame according to the position of the fast-changing point. Linear prediction filtering is then performed on each sub-frame to obtain the high-frequency subband domain excitation signal (also called the high-frequency subband domain residual signal), which is output to the spectral parameter encoder 703. The time-varying prediction analyzer 702 may adopt the structure of the low-frequency subband domain time-varying prediction analysis module 503 shown in fig. 6: from the input high-frequency subband domain signal it performs sub-frame division, linear prediction analysis, conversion of the prediction coefficients to line spectrum frequencies, vector quantization, and inverse conversion of the line spectrum frequencies to prediction coefficients; it finally generates a linear prediction filter, filters the high-frequency subband domain signal sub-frame by sub-frame with this filter, and outputs the resulting high-frequency subband domain excitation signal in frame units.
If sub-frames were divided, the processed sub-frames are recombined into one frame for output according to the division order. The high-frequency LSF vector quantization index generated during vector quantization is output to the time-varying prediction synthesizer 704 and, as side information, to the bitstream multiplexing module 507 (not shown in fig. 7). To distinguish it from the LSF vector quantization index generated when processing the low-frequency subband domain signal, the index obtained by linear prediction analysis of the high-frequency subband domain signal is referred to as the high-frequency LSF vector quantization index.
The spectral parameter encoder 703 receives a frame of the high-frequency subband domain excitation signal from the time-varying prediction analyzer 702 and the low-frequency subband domain excitation spectrum from the low-frequency subband domain time-frequency transform module 504. It performs a Discrete Fourier Transform (DFT) on the part of the high-frequency subband domain excitation signal corresponding to the low-frequency subband domain excitation spectrum to obtain the high-frequency subband domain excitation spectrum, and performs spectral adjustment, comprising tonality adjustment and gain adjustment, on the low-frequency subband domain excitation spectrum according to the obtained high-frequency subband domain excitation spectrum. Finally, the spectral adjustment parameters are quantized, encoded, and output to the bitstream multiplexing module 507, while the spectrally adjusted low-frequency subband domain excitation spectrum is transformed by an Inverse Discrete Fourier Transform (IDFT) into a spectrally adjusted low-frequency subband domain excitation signal, which is output to the time-varying prediction synthesizer 704. Besides the DFT, other time-frequency transforms can be applied to the high-frequency subband domain excitation signal to obtain the high-frequency subband domain excitation spectrum; the inverse transform then uses the frequency-time transform corresponding to the chosen time-frequency transform.
The spectral parameter encoder 703 includes a DFT transformer 703a, a spectral tonality adjustment and parameter extractor 703b, a spectral gain adjustment and parameter extractor 703c, and an IDFT transformer 703d. The DFT transformer 703a divides a frame of the high-frequency subband domain excitation signal received from the time-varying prediction analyzer 702 into one or more sub-frames for the DFT; the dividing method is the same as that used to divide the low-frequency subband domain excitation signal according to the low-frequency subband domain signal type analysis result from the low-frequency subband domain signal type analysis module 502 (not shown in fig. 5). The DFT is performed on the high-frequency subband domain excitation signal sub-frame by sub-frame to obtain each sub-frame's spectrum, i.e., the high-frequency subband domain excitation spectrum, which is output to the spectral tonality adjustment and parameter extractor 703b and the spectral gain adjustment and parameter extractor 703c;
a spectrum tonality adjustment and parameter extractor 703b, configured to perform spectrum tonality adjustment on the low-frequency subband domain excitation spectrum received from the low-frequency subband domain time-frequency transform module 504 according to the high-frequency subband domain excitation spectrum of the subframe output by the DFT transformer 703a, output an obtained parameter of the spectrum tonality adjustment to the bitstream multiplexing module 507 as side information after being subjected to quantization coding, and output the low-frequency subband domain excitation spectrum after the spectrum tonality adjustment to the spectrum gain adjustment and parameter extractor 703 c;
a spectrum gain adjustment and parameter extractor 703c, configured to perform spectrum gain adjustment on the low-frequency subband domain excitation spectrum after the corresponding spectrum tonality adjustment from the spectrum tonality adjustment and parameter extractor 703b according to the high-frequency subband domain excitation spectrum of the subframe from the DFT converter 703a, output the parameters of the spectrum gain adjustment as side information to the bit stream multiplexing module 507 after quantization coding, and output the low-frequency subband domain excitation spectrum after the spectrum gain adjustment to the IDFT converter 703 d;
an IDFT converter 703d, configured to perform IDFT conversion on the low-frequency subband domain excitation spectrum after the spectral gain adjustment from the spectral gain adjustment and parameter extractor 703c, to obtain a low-frequency subband domain excitation signal after the spectral adjustment, where the spectral adjustment includes spectral tonality adjustment and spectral gain adjustment; the spectrally modified low frequency subband excitation signals for each subframe are then recombined into a full frame of spectrally modified low frequency subband excitation signals, which are output to the time-varying prediction synthesizer 704.
The time-varying prediction synthesizer 704 re-divides one frame of the spectrally adjusted low-frequency subband domain excitation signal from the spectral parameter encoder 703 into one or more sub-frames according to the high-frequency subband domain signal type analysis result from the signal type analyzer 701; each sub-frame is a group of spectrally adjusted low-frequency subband domain excitation signal samples and is the unit of synthesis filtering. A linear prediction synthesis filter is then obtained from the high-frequency LSF vector quantization index from the time-varying prediction analyzer 702, and synthesis filtering is performed on the spectrally adjusted low-frequency subband domain excitation signal of each sub-frame to obtain a reconstructed high-frequency subband domain signal, which is output in frame units to the time-domain adaptive gain adjustment parameter extractor 705;
The time-domain adaptive gain adjustment parameter extractor 705 divides each frame of the reconstructed high-frequency subband domain signal from the time-varying prediction synthesizer 704 into one or more sub-frames according to the high-frequency subband domain signal type analysis result from the signal type analyzer 701; each sub-frame is a group of reconstructed high-frequency subband domain signal samples and is the unit of time-domain gain adjustment. It also receives the high-frequency subband domain signal from the analysis subband filter bank module 501, compares the time-domain gain of the reconstructed high-frequency subband domain signal of each sub-frame with that of the corresponding sub-frame of the original high-frequency subband domain signal to obtain the time-domain gain adjustment parameters, and outputs them to the bitstream multiplexing module 507 after quantization coding. Here, the time-domain gain refers to the average signal energy within a sub-frame.
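A minimal sketch of this gain comparison, assuming (as the text states) that the "time-domain gain" is the average signal energy within a sub-frame and that the parameter is the ratio of original to reconstructed gain; the function names and the ratio form are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: per-sub-frame time-domain gain comparison between
# the original and the reconstructed high-frequency subband signals.
def subframe_gain(x):
    x = np.asarray(x, dtype=float)
    return float(np.mean(x * x))          # average energy in the sub-frame

def gain_adjust_params(orig, recon, sub_len, eps=1e-12):
    """One gain-adjustment ratio per sub-frame (ratio form is an assumption)."""
    n = len(orig) // sub_len
    return [subframe_gain(orig[i*sub_len:(i+1)*sub_len]) /
            (subframe_gain(recon[i*sub_len:(i+1)*sub_len]) + eps)
            for i in range(n)]
```

The decoder would scale each reconstructed sub-frame by the (dequantized) parameter to match the original energy envelope.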
Therefore, the high frequency subband domain parameters extracted by the high frequency subband domain parameter encoding module 506 include a spectrum adjustment parameter, a time domain gain adjustment parameter, a high frequency LSF vector quantization index, and a high frequency subband domain signal type analysis result. These high frequency subband domain parameters are quantized and encoded to generate high frequency subband domain parameter encoded data, which is output to the bitstream multiplexing module 507 as side information.
The bit stream multiplexing module 507 multiplexes the audio coded data and the side information received from the above modules to form an audio coded stream. The sound encoding data is low-frequency subband waveform encoding data output by the low-frequency subband waveform encoding module 505. The side information includes the low frequency subband domain signal type analysis result output by the low frequency subband domain signal type analysis module 502, the LSF vector quantization index output by the low frequency subband domain time varying prediction analysis module 503, and the high frequency subband domain parameter encoding data output by the high frequency subband domain parameter encoding module 506. The high-frequency sub-band domain parameter coding data comprises a spectrum adjustment parameter, a time domain gain adjustment parameter, a high-frequency LSF vector quantization index and a high-frequency sub-band domain signal type analysis result of quantization coding. When the low-frequency sub-band domain waveform coding module adopts vector quantization coding, the side information also comprises a vector code word index.
The high-frequency subband domain parameter encoding module 506 of the preferred embodiment extracts, from the high-frequency subband domain signal and the low-frequency subband domain excitation spectrum, the high-frequency subband domain parameters used to restore the high-frequency subband domain signal from the low-frequency subband domain excitation spectrum. Alternatively, the high-frequency subband domain parameter encoding module 506 may receive the low-frequency subband domain excitation signal generated by the low-frequency subband domain time-varying prediction analysis module 503 instead of the low-frequency subband domain excitation spectrum. In that case, the module performs no spectral adjustment, i.e., it does not include the spectral parameter encoder 703; it extracts, directly from the low-frequency subband domain excitation signal and the high-frequency subband domain signal, the high-frequency subband domain parameters used to restore the high-frequency subband domain signal from the high-frequency subband domain excitation signal, and these parameters do not include the spectral adjustment parameters.
Fig. 8 is a flowchart of an encoding method of a monaural sound encoding apparatus according to the present invention. As shown in fig. 8, the method comprises the steps of:
step 11: carrying out sub-band decomposition on the input single-sound-channel signals to enable each sub-band signal to correspond to a specific frequency range; the single-channel sound signal after subband decomposition can be simply divided into a low-frequency subband domain signal and a high-frequency subband domain signal according to the frequency range. Wherein the low frequency subband domain signal comprises subband domain signals of at least 2 subbands; the high frequency subband-domain signal also comprises subband-domain signals of at least 2 subbands.
Step 12: performing signal type analysis on the low-frequency sub-band domain signal, and if the low-frequency sub-band domain signal is a slowly varying type signal, directly determining the signal type as a low-frequency sub-band domain signal type analysis result; if the signal is a fast-changing type signal, the position of the fast-changing point is continuously calculated, and finally the signal type and the fast-changing point position are determined as the analysis result of the low-frequency sub-band domain signal type.
Step 13: and performing molecular frame processing on the low-frequency sub-band domain signal according to the type analysis result of the low-frequency sub-band domain signal, and performing predictive analysis, namely linear predictive filtering, according to the sub-frame to obtain a low-frequency sub-band domain excitation signal.
Step 14: and performing time-frequency transformation on the low-frequency sub-band domain excitation signal by adopting different length orders according to the type analysis result of the low-frequency sub-band domain signal to obtain a low-frequency sub-band domain excitation spectrum.
Step 15: and carrying out quantization coding on the low-frequency subband domain excitation spectrum to obtain low-frequency subband domain waveform coded data.
Step 16: and extracting high-frequency subband domain parameters for recovering the high-frequency subband domain signals from the low-frequency subband domain excitation spectrum according to the low-frequency subband domain excitation spectrum and the high-frequency subband domain signals, and carrying out quantitative coding on the high-frequency subband domain parameters to obtain high-frequency subband domain parameter coded data.
And step 17: and multiplexing the low-frequency sub-band domain waveform coded data and the side information to obtain a sound coded code stream. The side information comprises a low-frequency subband domain signal type analysis result, high-frequency subband domain parameter coding data and LSF vector quantization indexes generated in the linear prediction filtering process.
Since the low-frequency subband domain signal and the high-frequency subband domain signal each comprise two or more subbands, step 12 above performs signal type analysis on each subband of the low-frequency subband domain signal; step 13 performs prediction analysis on each subband of the low-frequency subband domain signal; step 14 performs the time-frequency transform on each subband of the low-frequency subband domain excitation signal; step 15 performs quantization coding on each subband of the low-frequency subband domain excitation spectrum; and step 16 calculates the high-frequency subband domain parameters for each subband. For brevity, the detailed description below does not treat each subband separately.
The steps in fig. 8 are explained in detail below.
The subband filter bank of step 11 may use techniques such as QMF, PQMF, CMF, and the discrete wavelet transform. The subband decomposition process is described below taking PQMF as an example.
The impulse response of the analysis filter in PQMF is:
where p_0(n) is the prototype filter, M is the number of subbands, N is the order of the prototype filter, M ≥ 4, 0 ≤ k ≤ M−1, 0 ≤ n ≤ 2KM−1, and N = 2KM.
The steps of subband decomposition with PQMF are as follows:
Step 11a: shift the data in the PQMF buffer, moving M samples (equal to the number of subbands) out of the buffer, i.e.:
x[n] = x[n−M], n = N−1 … M
where x[n] is a pulse code modulation (PCM) sample of the input sound signal;
step 11 b: m newly added samples are input into a buffer area to obtain a vector to be filtered
u[n]=x[n]×c[n],n=0~N-1
wherein c [ n ] is a window function;
Step 11c: calculate the subband data y_k(n):
where 0 ≤ k ≤ M−1 and 0 ≤ n ≤ 2KM−1.
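The three steps above can be sketched as follows. Since the prototype filter p_0(n), the window c[n], and the exact cosine modulation appear only as figures in the source, a Hann window and an MPEG-style cosine modulation are used here purely as illustrative assumptions.

```python
import numpy as np

# Sketch of steps 11a-11c under stated assumptions (window and modulation
# are placeholders; the source gives them only as figures).
M, K = 4, 8                 # number of subbands and prototype factor
N = 2 * K * M               # buffer length N = 2KM
buf = np.zeros(N)           # x[n]: newest samples kept at the low indices
c = np.hanning(N)           # placeholder window function c[n] (assumption)

def pqmf_analyze(new_samples):
    """Consume M new PCM samples and emit one sample per subband."""
    global buf
    buf = np.roll(buf, M)               # step 11a: x[n] = x[n-M], n = N-1..M
    buf[:M] = new_samples[::-1]         # step 11b: insert M new samples...
    u = buf * c                         # ...and window: u[n] = x[n] * c[n]
    n = np.arange(N)                    # step 11c: cosine-modulate to y_k
    return np.array([np.sum(u * np.cos((2 * k + 1) * (n - M / 2)
                                       * np.pi / (2 * M)))
                     for k in range(M)])
```

Each call consumes M input samples and produces one output sample per subband, i.e., the filter bank critically decimates by M.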
There are many methods for determining the type of the low-frequency subband domain signal in step 12, for example judging the type by signal perceptual entropy or by calculating the sub-frame energies of the signal. Here, the sub-frame energy method is adopted; the specific process is as follows:
step 12 a: high-pass filtering a frame of low-frequency subband domain signal y (n), and filtering out low-frequency parts, such as frequencies below 500 Hz;
Step 12b: divide the high-pass filtered signal into a number of sub-frames y_i(n); for ease of calculation, one frame is generally divided into an integer number of sub-frames, for example, with a 2048-point frame, each sub-frame may be 256 points;
Step 12c: calculate the energy E_i of each sub-frame y_i(n), where i is the sub-frame index, and then compute the energy ratio of each sub-frame to the previous one. When any ratio exceeds a threshold Te, the frame is judged to be a fast-changing signal; if the ratios of all sub-frames to their previous sub-frames are below Te, the frame is judged to be a slowly varying signal. For a fast-changing signal, continue with step 12d; otherwise skip step 12d and determine the slowly-varying signal type as the low-frequency subband signal type analysis result. The threshold Te can be obtained by methods known in signal processing, for example by averaging the energy ratios over encoded material and multiplying by a constant;
step 12 d: and for the fast-changing signals, judging the subframe with the maximum energy as the position of the fast-changing point. And determining the fast-changing signal type and the position of the fast-changing point as the analysis result of the low-frequency sub-band signal type.
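Steps 12a through 12d can be sketched as below. The first-difference high-pass filter and the threshold value Te are illustrative assumptions; the text leaves the filter design and the exact Te open.

```python
import numpy as np

# Sketch of steps 12a-12d: slowly- vs. fast-varying decision from the
# ratio of consecutive sub-frame energies of a high-pass-filtered frame.
def classify_frame(y, sub_len=256, Te=4.0):
    y = np.asarray(y, dtype=float)
    hp = np.diff(y, prepend=y[0])            # step 12a: crude high-pass
    usable = len(hp) // sub_len * sub_len
    subs = hp[:usable].reshape(-1, sub_len)  # step 12b: split into sub-frames
    E = np.sum(subs * subs, axis=1) + 1e-12  # step 12c: sub-frame energies
    if np.all(E[1:] / E[:-1] <= Te):
        return ("slow", None)                # all ratios below threshold Te
    return ("fast", int(np.argmax(E)))       # step 12d: peak-energy sub-frame
```

For a fast-changing frame the function returns the index of the highest-energy sub-frame as the fast-changing point position.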
In step 13, for one frame of each low-frequency subband domain signal: if the frame type is a fast-changing signal, divide the frame into sub-frames according to the fast-changing point position; specifically, group the 256-sample blocks before the fast-changing point into one sub-frame, the 256-sample block containing the fast-changing point into one sub-frame, and the 256-sample blocks after the fast-changing point into one sub-frame. Otherwise, keep the frame as a single sub-frame (i.e., perform no sub-frame division). Then perform linear prediction filtering on the frame sub-frame by sub-frame to obtain the low-frequency subband domain excitation signals on each low-frequency subband of the frame and the LSF vector quantization index of each sub-frame.
The p-order linear prediction analysis of the N-point low-frequency subband domain signal y(n) of one sub-frame comprises the following steps:
Step 13a: calculate the autocorrelation coefficients r(k) of the low-frequency subband domain signal y(n) of the current sub-frame:

r(k) = Σ_{n=k}^{N} y(n)·y(n−k),  k ∈ [0, p];
Step 13b: using the obtained autocorrelation coefficients r(k), obtain a set of prediction coefficients a_i by recursively executing the Levinson-Durbin algorithm, and form from the prediction coefficients a_i the linear prediction filter

A(z) = 1 − Σ_{i=1}^{p} a_i·z^{−i};
Step 13 c: by applying two polynomials Seeking root, will aiConversion to a set of line-spectrum pairs LSPiAnd is paired with LSPiObtaining line spectrum frequency LSFi;
Step 13 d: carrying out vector quantization on the line spectrum frequency to obtain the quantized line spectrum frequencyAnd converted into quantized line spectrum pairThe LSF vector quantization index is output to the bitstream multiplexing module 507 as side information for generating a linear prediction synthesis filter at the decoding apparatus side;
Step 13e: from the quantized line spectrum pairs, determine the quantized linear prediction filter coefficients â_i by calculating f1(z) and f2(z), and form the quantized filter

Â(z) = 1 − Σ_{i=1}^{p} â_i·z^{−i};
Step 13 f: calculating the predicted low-frequency sub-band domain excitation signal by the quantized filter
<math>
<mrow>
<mi>e</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>=</mo>
<mi>y</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>)</mo>
</mrow>
<mo>-</mo>
<munderover>
<mi>Σ</mi>
<mrow>
<mi>i</mi>
<mo>=</mo>
<mn>1</mn>
</mrow>
<mi>P</mi>
</munderover>
<msub>
<mover>
<mi>a</mi>
<mo>^</mo>
</mover>
<mi>i</mi>
</msub>
<mi>y</mi>
<mrow>
<mo>(</mo>
<mi>n</mi>
<mo>-</mo>
<mi>i</mi>
<mo>)</mo>
</mrow>
<mo>.</mo>
</mrow>
</math>
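Steps 13a, 13b, and 13f can be sketched as follows (the LSP/LSF conversion and vector quantization of steps 13c to 13e are omitted; the function names are illustrative). The sign convention matches A(z) = 1 − Σ a_i·z^{−i} above.

```python
import numpy as np

# Sketch of steps 13a (autocorrelation), 13b (Levinson-Durbin recursion)
# and 13f (residual filtering e(n) = y(n) - sum_i a_i * y(n-i)).
def lpc(y, p):
    y = np.asarray(y, dtype=float)
    r = np.array([np.dot(y[k:], y[:len(y) - k]) for k in range(p + 1)])
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):              # Levinson-Durbin recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i + 1] += k * a[i - 1::-1][-i:]
        err *= (1.0 - k * k)
    return -a[1:]                          # a_i of A(z) = 1 - sum a_i z^-i

def residual(y, a):
    """Excitation e(n) = y(n) - sum_i a_i * y(n-i)."""
    y = np.asarray(y, dtype=float)
    e = y.copy()
    for i in range(1, len(a) + 1):
        e[i:] -= a[i - 1] * y[:-i]
    return e
```

For a first-order autoregressive input the recursion recovers the generating coefficient and the residual collapses to the original impulse excitation.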
In step 14, there are many methods for performing time-frequency transformation on the low-frequency subband domain excitation signal, such as Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), Modified Discrete Cosine Transform (MDCT), and so on. Preferably, the time-frequency transformation process is described by taking Discrete Fourier Transform (DFT) and Modified Discrete Cosine Transform (MDCT) as examples.
When the Discrete Fourier Transform (DFT) is used for the time-frequency transformation, the current frame is first divided into sub-frames according to the signal type analysis result, and the low-frequency subband domain excitation signal of M+N samples, denoted x_e(n), is selected from the start position of the current sub-frame, where M is the data length of the current sub-frame and N is the overlap length with the next sub-frame. The lengths M and N are determined by the signal type of the current frame. When the signal type is a slowly varying signal, the frame forms one sub-frame and a longer order is selected for M and N; assuming a frame length of 2048 in this embodiment, M = 2048 and N = 256 (N may be M/8). When the signal type is a fast-changing signal, the sub-frames may be divided according to the position of the fast-changing point, with M equal to the sub-frame length and N = M/8; or the frame may be divided into several sub-frames of equal length, selecting a shorter order for M and N. In this embodiment a frame is divided into 8 equal-length sub-frames, with M = 256 and N = 32. Then the low-frequency subband domain excitation signal of the M+N samples is windowed to obtain the windowed signal x_w(n) = w(n)·x_e(n), where w(n) is a window function. Various window functions may be used in an implementation, for example a cosine window, i.e.
where N_0 is the overlap length of the current frame, determined by the signal type of the previous frame. The windowed signal is then DFT-transformed to obtain M+N low-frequency subband domain excitation spectrum coefficients:
X(k) = Σ_{n=0}^{M+N−1} x_w(n)·e^{−j(2π/(M+N))kn},  k ∈ [0, M+N−1]
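The DFT branch above can be sketched as below. A Hann window stands in for the cosine window whose formula appears only as a figure in the source, so the window choice is an assumption.

```python
import numpy as np

# Sketch of the DFT branch of step 14: window M+N excitation samples and
# compute X(k) = sum_n x_w(n) e^{-j(2*pi/(M+N)) k n} via an FFT.
def excitation_spectrum_dft(x_e, M, N):
    seg = np.asarray(x_e[:M + N], dtype=float)
    x_w = np.hanning(M + N) * seg      # x_w(n) = w(n) * x_e(n)
    return np.fft.fft(x_w)             # X(k), k in [0, M+N-1]
```

The output has M+N complex coefficients; bin 0 equals the sum of the windowed samples.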
for the case of performing time-frequency transformation by using Modified Discrete Cosine Transform (MDCT), firstly, selecting low-frequency sub-band domain excitation signals of M samples of a previous sub-frame and M samples of a current sub-frame, then performing windowing on the low-frequency sub-band domain excitation signals of 2M samples of the two sub-frames, and then performing MDCT on the windowed signals to obtain M low-frequency sub-band domain excitation spectral coefficients.
The impulse response of the MDCT analysis filter is:
the MDCT transforms to:
X(k) = Σ_{n=0}^{2M−1} x_e(n)·h_k(n),
where 0 ≤ k ≤ M−1, w(n) is the window function, x_e(n) is the input low-frequency subband domain excitation signal of the MDCT transform, and X(k) is the low-frequency subband domain excitation spectrum output by the MDCT transform.
To satisfy the condition of complete reconstruction of the signal, the window function w (n) of the MDCT transform must satisfy the following two conditions:
w(2M−1−n) = w(n) and w^2(n) + w^2(n+M) = 1.
In practice, a Sine window may be selected as the window function. Of course, the above-described limitation of the window function can also be modified by using a biorthogonal transform with specific analysis filters and synthesis filters.
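As a quick numerical check, the sine window of length 2M indeed satisfies both perfect-reconstruction conditions quoted above:

```python
import numpy as np

# Check w(2M-1-n) = w(n) and w^2(n) + w^2(n+M) = 1 for the sine window.
M = 256
n = np.arange(2 * M)
w = np.sin(np.pi * (n + 0.5) / (2 * M))   # sine window of length 2M

symmetric = bool(np.allclose(w[2 * M - 1 - n], w))
power_complementary = bool(np.allclose(w[:M] ** 2 + w[M:] ** 2, 1.0))
```

Both checks pass because sin(π − x) = sin(x) gives the symmetry and sin²(x) + cos²(x) = 1 gives the power-complementarity.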
Thus, for frame data transformed by the MDCT, different time-frequency tilings are obtained depending on the signal type. For example, if the transform order of the current frame is 2048 when it is a slowly varying signal and 256 when it is a fast-changing signal, the time-frequency planes are as shown in fig. 9: fig. 9a is the time-frequency plane of a slowly varying signal, and fig. 9b is that of a fast-changing signal.
In step 15, the waveform quantization coding of the low-frequency subband domain excitation spectrum may use a quantization scheme similar to the scalar quantization plus Huffman coding of MPEG AAC, or a vector quantization scheme. For fixed-rate coding, a vector quantizer is a reasonable choice, and the preferred embodiment employs an 8-dimensional vector quantization scheme: the vector quantization module performs 8-dimensional vector quantization on the low-frequency subband domain excitation spectrum, and the result is output to the bitstream multiplexing module 507.
The vector quantization proceeds as follows: first, the frequency-domain coefficients in the low-frequency subband domain are grouped into a number of 8-dimensional vectors; then, for each vector to be quantized, the codeword at minimum distance is found in the codebook by full search according to a perceptual distance measure, yielding that codeword's vector codeword index. The vector codeword index is added to the sound coding bitstream for decoding the low-frequency subband domain waveform encoded data at the decoding end. The perceptual distance may be a Euclidean distance measure.
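A sketch of the 8-dimensional full-search quantizer with a plain (unweighted) Euclidean distance; the random codebook is purely illustrative, since a real codec would use a trained codebook and a perceptually weighted distance.

```python
import numpy as np

# Full-search vector quantization: one codeword index per 8-dim vector.
def vq_full_search(coeffs, codebook):
    dim = codebook.shape[1]
    vecs = np.asarray(coeffs, dtype=float).reshape(-1, dim)
    # Squared Euclidean distance of every vector to every codeword.
    d = np.sum((vecs[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
    return np.argmin(d, axis=1)

rng = np.random.default_rng(0)
toy_codebook = rng.standard_normal((64, 8))     # toy 64-entry codebook
indices = vq_full_search(rng.standard_normal(32 * 8), toy_codebook)
```

Quantizing the codewords themselves returns each one's own index, which is a handy sanity check on the search.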
In step 16, the high frequency subband domain parametric coding is a method for extracting the high frequency subband domain parameters used for restoring the high frequency subband domain signal according to the low frequency subband domain excitation spectrum and the high frequency subband domain signal. The method for encoding the high-frequency subband domain parameters comprises the following steps:
step 16-1: performing signal type analysis on the high-frequency sub-band domain signal obtained in the step 11 to determine a high-frequency sub-band domain signal type analysis result;
The high-frequency subband domain signal type analysis comprises: judging whether the high-frequency subband domain signal is a slowly varying signal or a fast-changing signal; if it is a fast-changing signal, the position of the fast-changing point is further analyzed, and the high-frequency subband domain signal type analysis result is determined. The judging and analysis method is the same as the signal type analysis of the low-frequency subband domain signal; see the implementation of step 12.
Step 16-2: dividing the high-frequency subband domain signal into one or more subframes for predictive analysis according to the high-frequency subband domain signal type analysis result, and performing linear predictive filtering subframe by subframe to obtain the high-frequency LSF vector quantization index of the high-frequency subband domain signal and the high-frequency subband domain excitation signal. The method is the same as that used to calculate the LSF vector quantization index and the excitation signal of the low-frequency subband domain signal; see the specific implementation of step 13.
Step 16-3: performing a time-frequency transformation, whose order varies with the signal type, on the high-frequency subband domain excitation signal to obtain the high-frequency subband domain excitation spectrum; for example, a DFT may be employed. The order of the time-frequency transformation is the same as that performed on the low-frequency subband domain excitation signal in step 14, so that the high-frequency subband domain excitation spectrum obtained in this step corresponds to the low-frequency subband domain excitation spectrum obtained in step 14. See step 14 for the specific implementation.
Step 16-4: performing the spectral tonality adjustment part of the spectrum adjustment on the low-frequency subband domain excitation spectrum according to the high-frequency subband domain excitation spectrum, and obtaining the spectral tonality adjustment parameters and the low-frequency subband domain excitation spectrum after spectral tonality adjustment.
Wherein the spectral tonality adjustment comprises the following steps:
Step 16-4a: dividing the high-frequency subband domain excitation spectrum and the low-frequency subband domain excitation spectrum into a plurality of frequency bands, calculating for each band the tonality T_ref of the high-frequency subband domain excitation spectrum and the tonality T_est of the low-frequency subband domain excitation spectrum of the corresponding band, and comparing T_est with T_ref: if T_est < T_ref − T_0, the adjustment type is chord-adding adjustment, and step 16-4b is executed; if T_ref − T_0 ≤ T_est ≤ T_ref + T_1, the adjustment type is set to no adjustment, and the process goes directly to the spectral gain adjustment of step 16-5; if T_est > T_ref + T_1, the adjustment type is noise-adding adjustment, and step 16-4c is executed; wherein T_0 and T_1 are preset constants;
Step 16-4b: setting the adjustment type as chord-adding adjustment, and taking the adjustment type and the chording energy

ΔE_T = E_est · (T_ref − T_est) / (1 + T_est)

as the spectral tonality adjustment parameters; performing chord-adding adjustment on the low-frequency subband domain excitation spectrum according to ΔE_T, and going to the spectral gain adjustment of step 16-5;
Step 16-4c: setting the adjustment type as noise-adding adjustment, and taking the adjustment type and the noise-adding factor

p̂ = (T_ref − T_est) / (T_ref · (1 + T_est))

as the spectral tonality adjustment parameters; using the noise energy

ΔE_N = E_est · (T_est − T_ref) / (T_ref · (1 + T_est))

to perform noise-adding adjustment on the low-frequency subband domain excitation spectrum, and going to the spectral gain adjustment of step 16-5.
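The band-wise decision of step 16-4a and the energy terms of steps 16-4b/16-4c can be sketched as follows (a minimal illustration; the default values for the preset constants T0 and T1 are assumptions):

```python
def tonality_adjustment(T_est, T_ref, E_est, T0=0.1, T1=0.1):
    """Sketch of the step 16-4a decision plus the step 16-4b/16-4c energy
    formulas. T0 and T1 are preset constants; 0.1 defaults are assumed."""
    if T_est < T_ref - T0:
        # Low band is less tonal than the high-band reference: add chords.
        delta_E_T = E_est * (T_ref - T_est) / (1.0 + T_est)
        return ("chord-adding", delta_E_T)
    if T_est > T_ref + T1:
        # Low band is more tonal than the reference: add noise.
        delta_E_N = E_est * (T_est - T_ref) / (T_ref * (1.0 + T_est))
        return ("noise-adding", delta_E_N)
    return ("none", 0.0)
```

For example, with T_est = 0.2, T_ref = 0.5 and E_est = 1.0 the function selects chord-adding with ΔE_T = 0.25.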
Step 16-5: and performing spectrum gain adjustment in spectrum adjustment on the low-frequency sub-band domain excitation spectrum after the spectrum tone adjustment according to the high-frequency sub-band domain excitation spectrum, and obtaining a spectrum gain adjustment parameter and the low-frequency sub-band domain excitation spectrum after the spectrum gain adjustment. The method comprises the following steps:
step 16-5 a: dividing the high-frequency sub-band domain excitation spectrum and the low-frequency sub-band domain excitation spectrum after spectral tonality adjustment into a plurality of sub-bands, and respectively calculating the energy of the high-frequency sub-band domain excitation spectrum of each sub-band and the low-frequency sub-band domain excitation spectrum after spectral tonality adjustment;
step 16-5 b: for any frequency band, calculating the ratio of the energy of the high-frequency subband domain excitation spectrum of the band to that of the tonality-adjusted low-frequency subband domain excitation spectrum, and taking the square root of the energy ratio as the spectral gain adjustment parameter of the band;
step 16-5 c: multiplying each spectral line in any sub-frequency band of the low-frequency sub-band domain excitation spectrum after the spectral tonality adjustment by a spectral gain adjustment parameter corresponding to the sub-frequency band to obtain a low-frequency sub-band domain excitation spectrum after the spectral gain adjustment; and sets of spectral gain adjustment parameters for the respective sub-bands as spectral gain adjustment parameters.
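The gain extraction and application of steps 16-5a to 16-5c can be sketched as follows (band boundaries and signal lengths are illustrative assumptions):

```python
import numpy as np

def spectral_gain_adjust(low_spec, high_spec, band_edges):
    """Steps 16-5a..16-5c sketch: per-band energies, square-root-of-ratio
    gains, and scaling of every spectral line in each band."""
    adjusted = low_spec.astype(float)
    gains = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e_high = np.sum(high_spec[lo:hi] ** 2)  # high-band excitation energy
        e_low = np.sum(adjusted[lo:hi] ** 2)    # tonality-adjusted low-band energy
        g = np.sqrt(e_high / e_low) if e_low > 0 else 1.0
        adjusted[lo:hi] *= g                    # scale every line in the band
        gains.append(float(g))
    return adjusted, gains
```

The per-band gains collected here are the spectral gain adjustment parameters transmitted to the decoder.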
Step 16-6: and performing frequency-time transformation on the low-frequency sub-band domain excitation spectrum after spectrum adjustment to obtain an adjusted low-frequency sub-band domain excitation signal. Wherein the spectral adjustment includes a spectral tonality adjustment and a spectral gain adjustment. Corresponding to the time-frequency transformation of step 16-3, the IDFT transformation is adopted in this step.
Step 16-7: and performing comprehensive filtering on the adjusted low-frequency sub-band domain excitation signal according to the type analysis result of the high-frequency sub-band domain signal to obtain a reconstructed high-frequency sub-band domain signal. The method specifically comprises the following steps:
step 16-7 a: dividing the adjusted low-frequency subband domain excitation signal of one frame into one or more subframes for predictive synthesis according to the analysis result of the high-frequency subband domain signal type, wherein the dividing method is the same as the principle of the subframe dividing method in the step 16-2;
step 16-7 b: for each subframe, performing inverse vector quantization by using a group of high-frequency LSF vector quantization indexes in the subframe obtained in the step 16-2 to obtain a line spectrum frequency of vector quantization, converting the line spectrum frequency into a linear prediction filter coefficient after vector quantization, and forming a linear prediction comprehensive filter by the obtained coefficients;
step 16-7 c: and according to the sub-frame, carrying out comprehensive filtering on the adjusted low-frequency sub-band domain excitation signal through a linear prediction comprehensive filter corresponding to the sub-frame to obtain a frame of reconstructed high-frequency sub-band domain signal.
Step 16-8: and extracting a time domain gain adjustment parameter according to the analysis result of the type of the high-frequency sub-band domain signal, the time domain gain of the reconstructed high-frequency sub-band domain signal and the time domain gain of the high-frequency sub-band domain signal obtained in the step 11. The specific method comprises the following steps: dividing the obtained reconstructed high-frequency sub-band domain signal of each frame into one or more sub-frames according to the type analysis result of the high-frequency sub-band domain signal; and calculating the ratio of the energy average value of the reconstructed high-frequency sub-band domain signal of each subframe to the energy average value of the high-frequency sub-band domain signal corresponding to the subframe, and taking the square root of the ratio as a time domain gain adjustment parameter for adjusting the subframe.
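A minimal sketch of this per-subframe gain extraction (fixed-length subframes are an assumption; the patent divides subframes according to the signal type analysis result):

```python
import numpy as np

def time_domain_gains(recon, orig, subframe_len):
    """Step 16-8 sketch: for each subframe, the square root of the ratio
    of the mean energies of the reconstructed and original high-frequency
    subband domain signals."""
    gains = []
    for s in range(0, len(recon), subframe_len):
        e_rec = np.mean(recon[s:s + subframe_len] ** 2)
        e_org = np.mean(orig[s:s + subframe_len] ** 2)
        gains.append(float(np.sqrt(e_rec / e_org)) if e_org > 0 else 1.0)
    return gains
```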
Step 16-9: performing quantization coding on the high-frequency subband domain parameters, including the high-frequency subband domain signal type analysis result obtained in step 16-1, the high-frequency LSF vector quantization index obtained in step 16-2, the spectrum adjustment parameters obtained in steps 16-4 and 16-5, and the time domain gain adjustment parameters obtained in step 16-8, and then outputting them.
In the preferred embodiment, if step 16 extracts the high frequency subband domain parameters directly according to the low frequency subband domain excitation signal and the high frequency subband domain signal obtained in step 13, the step of spectrum adjustment is not included in the extraction process, and therefore the spectrum adjustment parameters are not included in the high frequency subband domain parameters.
The following describes a mono audio decoding apparatus and method according to a preferred embodiment of the present invention, and since the decoding process is the inverse of the encoding process, only the decoding process will be described briefly. Fig. 10 is a block diagram showing the configuration of a monaural sound decoding apparatus according to a preferred embodiment of the present invention. The sound decoding apparatus of the preferred embodiment of the present invention includes: a bitstream demultiplexing module 1001, a low frequency subband-domain waveform decoding module 1002, a low frequency subband-domain frequency-time transform module 1003, a low frequency subband-domain time-varying prediction synthesis module 1004, a high frequency subband-domain parameter decoding module 1005, and a synthesis subband filterbank module 1006.
Next, the connection relationship and functions of the respective modules shown in fig. 10 are described in general.
A bit stream demultiplexing module 1001 configured to demultiplex a received sound coding stream to obtain sound coding data and side information of a corresponding data frame, and output the corresponding sound coding data to the low-frequency subband waveform decoding module 1002; corresponding side information is output to the low frequency subband-domain frequency-time transform module 1003, the low frequency subband-domain time-varying prediction synthesis module 1004, and the high frequency subband-domain parameter decoding module 1005.
Wherein, the data output to the low frequency subband domain waveform decoding module 1002 is low frequency subband domain waveform encoded data; the side information output to the low frequency subband domain frequency-time transform module 1003 is a low frequency subband domain signal type analysis result; the side information output to the low frequency subband domain time varying prediction synthesis module 1004 includes a low frequency subband domain signal type analysis result and an LSF vector quantization index; the side information output to the high frequency subband domain parameter decoding module 1005 is high frequency subband domain parameter encoded data, and includes: the high-frequency sub-band domain signal type analysis result of the quantization coding, the high-frequency LSF vector quantization index, the spectrum adjustment parameter and the time domain gain adjustment parameter.
Since the bit stream demultiplexing module 1001 receives the sound coding code stream sent by the monaural sound coding apparatus shown in fig. 5, the low-frequency subband domain waveform coding data obtained by demultiplexing the sound coding code stream includes coding data of at least 2 subbands; the high frequency subband domain encoded data comprises encoded data for at least 2 subbands; the low frequency subband-domain signal type analysis result and the LSF vector quantization index also include corresponding data for at least 2 subbands, respectively.
A low frequency subband waveform decoding module 1002, configured to perform inverse quantization decoding on the low frequency subband waveform encoded data received from the bitstream demultiplexing module 1001, and output the obtained inverse quantized low frequency subband excitation spectrum to the low frequency subband frequency-time transform module 1003 and the high frequency subband parameter decoding module 1005. Here, the inverse quantization decoding method is an inverse process of applying quantization coding in the low frequency subband domain waveform coding module 505 at the encoding end.
A low-frequency subband frequency-time transform module 1003, configured to perform frequency-time transform on the low-frequency subband excitation spectrum received from the low-frequency subband waveform decoding module 1002, where the frequency-time transform obtains the low-frequency subband excitation signal by adopting transforms of different length orders according to the low-frequency subband signal type analysis result in the side information output by the bitstream demultiplexing module 1001, and outputs the low-frequency subband excitation signal to the low-frequency subband time-varying prediction synthesizing module 1004 in units of frames. The frequency-time Transform method is an Inverse process of the time-frequency Transform in the encoding-side low-frequency sub-band domain time-frequency Transform module 504, and includes Inverse Discrete Fourier Transform (IDFT), Inverse Discrete Cosine Transform (IDCT), Inverse Modified Discrete Cosine Transform (IMDCT), and the like.
The low-frequency subband domain time-varying prediction synthesis module 1004 performs sub-frame processing on a frame of low-frequency subband domain excitation signal received from the low-frequency subband domain frequency-time transform module 1003 according to a low-frequency subband domain signal type analysis result in the side information output by the bit stream demultiplexing module 1001, and then performs prediction synthesis according to a sub-frame to obtain a sub-frame low-frequency subband domain signal. If the frame signal is a fast-changing signal, dividing the low-frequency sub-band domain excitation signal of the frame into more than one sub-frame according to the position of a fast-changing point, performing prediction synthesis on the low-frequency sub-band domain excitation signal of the corresponding sub-frame according to the LSF vector quantization index of each sub-frame in the side information to obtain a low-frequency sub-band domain signal of each sub-frame, and finally combining the low-frequency sub-band domain signals of each sub-frame into the low-frequency sub-band domain signal of the frame; if the signal is a slowly-varying signal, dividing into a sub-frame, which is equivalent to the condition that the sub-frame is not divided, so that the low-frequency sub-band domain excitation signal of the frame is subjected to prediction synthesis according to the LSF vector quantization index of the frame to obtain the low-frequency sub-band domain signal of the frame.
A high frequency subband domain parameter decoding module 1005, configured to perform inverse quantization on the high frequency subband domain parameter encoded data received from the bitstream demultiplexing module 1001 to obtain high frequency subband domain parameters; the high frequency subband domain signal is restored according to the low frequency subband domain excitation spectrum and the high frequency subband domain parameters output by the low frequency subband domain waveform decoding module 1002. When the high-frequency sub-band domain signal is restored according to the high-frequency sub-band domain parameters, the high-frequency sub-band domain signal of one sub-band is restored according to the low-frequency sub-band domain excitation spectrum of one sub-band; the high-frequency subband domain signals of a plurality of subbands can be restored by adopting the low-frequency subband domain excitation spectrum of the same subband, and only for the high-frequency subband domain signals of different subbands restored by adopting the low-frequency subband domain excitation spectrum of the same subband, the high-frequency LSF vector quantization indexes for restoring the signals are different.
A synthesis subband filterbank module 1006, configured to perform subband synthesis on the low-frequency subband signal output by the low-frequency subband time-varying prediction synthesis module 1004 and the high-frequency subband signal output by the high-frequency subband parameter decoding module 1005 to obtain a decoded mono sound signal.
Corresponding to the encoding end, the low frequency subband domain waveform decoding module 1002 performs inverse quantization decoding on each subband of the low frequency subband domain waveform encoded data; the low-frequency subband domain frequency-time transformation module 1003 performs frequency-time transformation on each subband of the low-frequency subband domain excitation spectrum; the low-frequency subband domain time-varying prediction synthesis module 1004 performs prediction synthesis on each subband of the low-frequency subband domain excitation signal; the high-frequency subband domain parameter decoding module 1005 obtains the high-frequency subband domain signal of the corresponding subband according to the high-frequency subband domain parameter encoded data of each subband. For brevity, the detailed description below does not repeat this processing for each subband.
The following describes the details of the low-frequency subband time-varying prediction and synthesis module 1004 and the high-frequency subband parameter decoding module 1005 of the monaural speech decoding apparatus.
Fig. 11 is a block diagram of the low-frequency subband domain time-varying prediction synthesis module 1004 shown in fig. 10, which is composed of a de-vector quantizer 1101, a converter 1102 and a linear prediction synthesizer 1103. The de-vector quantizer 1101 receives the LSF vector quantization index of each subframe in the side information, decodes the quantized line spectral frequencies from the LSF vector quantization index, and outputs them to the converter 1102. The converter 1102 converts the received quantized line spectral frequencies into a set of quantized linear prediction filter coefficients, which are input to the linear prediction synthesizer 1103. The linear prediction synthesizer 1103 forms a linear prediction synthesis filter from the received coefficients, divides the low-frequency subband domain excitation signal received from the low-frequency subband domain frequency-time transform module 1003 into subframes according to the low-frequency subband domain signal type analysis result in the side information, and performs synthesis filtering on the low-frequency subband domain excitation signal subframe by subframe with the linear prediction synthesis filter to obtain the restored low-frequency subband domain signal. Predictive synthesis is the inverse of the predictive analysis in the encoding-side low-frequency subband domain time-varying predictive analysis module 503.
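As an illustration of the synthesis filtering performed by the linear prediction synthesizer 1103, the following sketch applies an all-pole filter 1/A(z) to an excitation signal (the coefficient sign convention A(z) = 1 + a[1]z⁻¹ + … + a[p]z⁻ᵖ is an assumption):

```python
import numpy as np

def lpc_synthesis(excitation, a):
    """All-pole synthesis filtering 1/A(z), a pure-Python sketch of the
    linear prediction synthesizer. a[0] is 1; a[1..p] are the quantized
    linear prediction coefficients."""
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]  # feedback of past synthesized samples
        out[n] = acc
    return out
```

With a = [1.0, −0.5], a unit impulse excitation yields the decaying response 1, 0.5, 0.25, 0.125, …, i.e. y[n] = x[n] + 0.5·y[n−1].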
Fig. 12 is a block diagram of a high frequency subband-domain parameter decoding module 1005 shown in fig. 10, which includes a spectral parameter decoder 1201, an adaptive time-domain gain decoder 1202, and a time-varying prediction synthesizer 1203.
The spectral parameter decoder 1201 first obtains the low-frequency subband domain excitation spectrum from the low-frequency subband domain waveform decoding module 1002, correspondingly adjusts the low-frequency subband domain excitation spectrum according to the spectrum adjustment parameters (including the tonality adjustment parameter and the gain adjustment parameter) obtained from the bitstream demultiplexing module 1001, then performs frequency-time transformation such as IDFT on the adjusted low-frequency subband domain excitation spectrum, and obtains the adjusted low-frequency subband domain excitation signal, and outputs the adjusted low-frequency subband domain excitation signal to the adaptive time domain gain decoder 1202.
The spectral parameter decoder 1201 includes a spectral tonality adjuster 1201a, a spectral gain adjuster 1201b, and an IDFT transformer 1201 c. The spectral tonality adjuster 1201a is configured to receive spectral tonality adjustment parameters in the side information from the bitstream demultiplexing module 1001, and perform spectral tonality adjustment on the low-frequency subband domain excitation spectrum obtained from the low-frequency subband domain waveform decoding module 1002 according to the spectral tonality adjustment parameters; a spectrum gain adjuster 1201b, configured to receive a spectrum gain adjustment parameter in the side information from the bitstream demultiplexing module 1001, and perform spectrum gain adjustment on the low-frequency subband domain excitation spectrum after spectrum tonality adjustment according to the spectrum gain adjustment parameter; the IDFT transformer 1201c performs IDFT transformation on the low-frequency subband domain excitation spectrum after spectral gain adjustment to obtain a low-frequency subband domain excitation signal, and outputs the low-frequency subband domain excitation signal to the adaptive time domain gain decoder 1202.
The adaptive time domain gain decoder 1202 divides the adjusted low-frequency subband domain excitation signal from the spectral parameter decoder 1201 into one or more subframes according to the analysis result of the high-frequency subband domain signal type in the side information output by the bit stream demultiplexing module 1001; the high-frequency sub-band domain signal type analysis result comprises a signal type and a position of a quick change point when a quick change signal occurs, and the sub-frame division method is the same as that in the time domain adaptive gain adjustment parameter extractor 705 at the encoding end; then, the adaptive time-domain gain decoder 1202 performs time-domain gain adjustment on the low-frequency subband domain excitation signal of each subframe according to the time-domain gain adjustment parameter of the corresponding subframe obtained from the bitstream demultiplexing module 1001, and outputs the obtained time-domain gain-adjusted low-frequency subband domain excitation signal to the time-varying prediction synthesizer 1203.
The time domain gain adjusting method comprises the following steps: first, quantized time-domain gain adjustment parameters of a subframe to be subjected to time-domain gain adjustment obtained from the bitstream demultiplexing module 1001 are dequantized, and excitation signals in the subframe are all multiplied by the dequantized adjustment parameters. The obtained result is the low-frequency sub-band domain excitation signal after the time domain gain adjustment.
A time-varying prediction synthesizer 1203, which combines the low-frequency subband domain excitation signals of one or several subframes from the adaptive time-domain gain decoder 1202 into a frame of low-frequency subband domain excitation signals according to the time sequence thereof, and divides the frame of low-frequency subband domain excitation signals into subframes again according to the analysis result of the high-frequency subband domain signal type in the output side information from the bit stream demultiplexing module 1001, wherein the method for dividing the subframes is the same as that in the time-varying prediction analyzer 702 at the encoding end; then, a linear prediction synthesis filter of each sub-frame is formed according to the high-frequency LSF vector quantization index obtained from the bit stream demultiplexing module 1001, and a low-frequency sub-band domain excitation signal of the corresponding sub-frame is subjected to synthesis filtering to obtain a reconstructed high-frequency sub-band domain signal of the corresponding sub-frame. The method of the synthesis filtering is the same as the time-varying prediction synthesizer 704 at the encoding end; and outputs the obtained reconstructed high frequency subband domain signal of the corresponding subframe as a decoded high frequency subband domain signal to the synthesis subband filter bank module 1006.
Corresponding to the encoding end, when the high-frequency subband domain parameter does not include the spectrum adjustment parameter, the high-frequency subband domain parameter decoding module 1005 may also obtain the low-frequency subband domain excitation signal from the low-frequency subband domain frequency-time transform module 1003, so the high-frequency subband domain parameter decoding module 1005 does not perform spectrum adjustment, that is, the high-frequency subband domain parameter decoding module 1005 does not include the spectrum parameter decoder 1201, and performs synthesis filtering on the low-frequency subband domain excitation signal after the time domain gain adjustment only according to the linear prediction synthesis filter formed by the high-frequency LSF vector quantization index, thereby obtaining the reconstructed high-frequency subband domain signal, and realizing the recovery of the high-frequency subband domain signal.
Fig. 13 is a flowchart of a decoding method of the monaural sound decoding apparatus according to the present invention. As shown in fig. 13, the method includes the steps of:
step 21: and demultiplexing the sound coding code stream to obtain low-frequency sub-band domain waveform coding data and all side information used for decoding.
Step 22: and decoding and inversely quantizing the low-frequency subband domain waveform coded data to obtain an inversely quantized low-frequency subband domain excitation spectrum.
Step 23: and performing frequency-time transformation on the inversely quantized low-frequency subband domain excitation spectrum to obtain a low-frequency subband domain excitation signal.
Step 24: and obtaining a linear prediction synthesis filter according to the LSF vector quantization index read from the side information, performing synthesis filtering on the low-frequency sub-band domain excitation signal, and recovering the low-frequency sub-band domain signal.
Step 25: and restoring the high-frequency subband domain signal according to the low-frequency subband domain excitation spectrum obtained in the step 22 and the high-frequency subband domain parameter coding data in the side information.
Step 26: and combining the low-frequency sub-band domain signal obtained in the step 24 and the high-frequency sub-band domain signal obtained in the step 25, and performing sub-band synthesis by using a comprehensive sub-band filter bank to recover the single-channel sound signal.
Corresponding to the encoding end, the low frequency subband domain waveform encoded data obtained in step 21 includes waveform encoded data of at least two subbands, and the side information includes high frequency parameter encoded data of at least two subbands. Step 22 performs inverse quantization decoding on each subband of the low-frequency subband domain waveform encoded data; step 23 performs frequency-time transformation on each subband of the low-frequency subband domain excitation spectrum; step 24 performs prediction synthesis on each subband of the low-frequency subband domain excitation signal; and step 25 obtains the high-frequency subband domain signal of the corresponding subband according to the high-frequency subband domain parameter encoded data of each subband. For brevity, the detailed description below does not repeat this processing for each subband.
The respective steps in fig. 13 are explained in detail below.
In step 22, the low frequency subband domain waveform decoding method corresponds to the low frequency subband domain waveform encoding method. If the specific implementation of the low-frequency subband domain signal coding part is the vector quantization method, the corresponding low-frequency subband domain inverse quantization needs to obtain a vector codeword index from the code stream, and find a corresponding vector in a fixed codebook according to the codeword index. And sequentially combining the low-frequency subband domain waveform vectors into the low-frequency subband domain excitation spectrum after inverse quantization.
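The codebook lookup described above can be sketched as follows (the codebook itself is an assumption of the example):

```python
import numpy as np

def inverse_vq(indices, codebook):
    """Step 22 sketch: look up each vector codeword index in the fixed
    codebook and concatenate the 8-D vectors into the dequantized
    low-frequency subband domain excitation spectrum."""
    return np.concatenate([codebook[i] for i in indices])

# Toy usage with a 2-codeword, 8-dimensional codebook.
cb = np.arange(16.0).reshape(2, 8)
spectrum = inverse_vq([1, 0], cb)  # codeword 1 followed by codeword 0
```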
In step 23, corresponding to the specific implementation method of the low-frequency subband domain time-frequency transform of the mono audio coding part of the present invention, the frequency-time transform process is described by taking Inverse Discrete Fourier Transform (IDFT) and Inverse Modified Discrete Cosine Transform (IMDCT) as examples.
For an Inverse Discrete Fourier Transform (IDFT), the frequency-time transform process includes three steps: IDFT transformation, time domain windowing processing and time domain superposition operation.
In this embodiment, the low-frequency subband domain excitation spectrum is first divided into one or more subframes for frequency-time conversion according to the low-frequency subband domain signal type analysis result in the side information, and then the low-frequency subband domain excitation spectrum is subjected to frequency-time conversion according to the subframes. The partitioning method is the same as that of the coding-end low-frequency sub-band domain time-frequency transform module 504, which adopts DFT transform.
Secondly, performing IDFT on the inversely quantized low-frequency subband domain excitation spectrum to obtain the transformed time-domain signal x_{i,n}. The expression of the IDFT transform is:
x_{i,n} = (1/(N+M)) · Σ_{k=0}^{N+M−1} X(k) · e^{j(2π/(M+N))kn},
wherein M represents the number of samples of the current subframe, N represents the number of samples overlapped with the next subframe, and the lengths of M and N are determined by the signal type of the current frame and are consistent with the values in the encoder low-frequency subband domain time-frequency transform module 504; n represents the sample index, 0 ≤ n < N + M; i represents the frame index; k represents the spectral line index.
Then, windowing is applied to the IDFT-transformed time-domain signal, and the last N points of the windowed data are retained as the data to be overlapped with the next subframe. The windowing function corresponds to that of the encoding end, for example a cosine window,
wherein N_0 is the overlap length of the current subframe, determined by the signal type of the previous frame. Finally, overlap-add is performed on the first M points of the windowed time-domain signal: the stored last N_0 points of time-domain data of the previous subframe are superposed with the first N_0 points of the current subframe, and the remaining M − N_0 points are unchanged. The resulting data is the frequency-time transformed time-domain signal of the current subframe, i.e., the low-frequency subband domain excitation signal.
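The three IDFT-based steps (inverse transform, windowing, overlap-add) can be sketched for one subframe as follows (the rectangular window in the usage example and the buffer handling are illustrative assumptions):

```python
import numpy as np

def idft_overlap_add(spectrum, M, N, prev_tail, window):
    """One-subframe sketch of the frequency-time transform: inverse DFT
    (np.fft.ifft already includes the 1/(M+N) factor), windowing, and
    overlap-add of the first N0 points with the saved tail of the
    previous subframe."""
    x = np.fft.ifft(spectrum).real      # excitation is real-valued
    xw = x * window                     # window of length M + N
    N0 = len(prev_tail)
    out = xw[:M].copy()
    out[:N0] += prev_tail               # overlap-add with previous subframe
    next_tail = xw[M:]                  # last N points, kept for next subframe
    return out, next_tail

# Toy usage: M = N = 4, rectangular window, no previous overlap.
x_in = np.arange(8.0)
spec = np.fft.fft(x_in)
out, tail = idft_overlap_add(spec, 4, 4, np.zeros(4), np.ones(8))
```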
For the Inverse Modified Discrete Cosine Transform (IMDCT), the frequency-time transform process includes three steps: IMDCT transformation, time domain windowing processing and time domain superposition operation.
In this embodiment, the low-frequency subband domain excitation spectrum is first divided into one or more subframes for frequency-time transformation according to the analysis result of the low-frequency subband domain signal type in the side information. The partitioning method is the same as that of the encoding-end low-frequency sub-band domain time-frequency transform module 504, in which MDCT is adopted for time-frequency transform.
Secondly, the IMDCT is applied to the low-frequency subband domain excitation spectrum subframe by subframe to obtain the transformed time domain signal x_{i,n}. The expression of the IMDCT is:
x_{i,n} = (2/N) · Σ_{k=0}^{N/2−1} spec[i][k] · cos((2π/N)(n+n₀)(k+1/2)),
wherein n is the sample index, with 0 ≤ n < N; N is the number of time domain samples, here 2048; n₀ = (N/2+1)/2; i is the frame index; and k is the spectral-line index.
Next, the time domain signal obtained by the IMDCT is windowed in the time domain. To satisfy the perfect reconstruction condition, the window function w(n) must satisfy two conditions: w(2M−1−n) = w(n) and w²(n) + w²(n+M) = 1.
Typical window functions are the sine window, the Kaiser-Bessel derived (KBD) window, etc. These restrictions on the window function can also be relaxed by using a biorthogonal transform with a specific pair of analysis and synthesis filters.
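The two conditions above can be checked numerically. The sine window w(n) = sin(π/(2M)·(n+1/2)), one of the typical choices named here (its closed form is a standard one, not given in this excerpt), satisfies both:

```python
import numpy as np

M = 1024
n = np.arange(2 * M)
# Sine window over 2M points: w(n) = sin(pi/(2M) * (n + 1/2)).
w = np.sin(np.pi / (2 * M) * (n + 0.5))

# Symmetry condition: w(2M-1-n) = w(n).
assert np.allclose(w[2 * M - 1 - n], w)
# Power-complementarity condition: w^2(n) + w^2(n+M) = 1.
assert np.allclose(w[:M] ** 2 + w[M:] ** 2, 1.0)
```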
Finally, the windowed time domain signals are overlap-added to obtain the time domain audio signal, as follows: the first N/2 samples of the signal obtained after windowing are overlap-added with the last N/2 samples of the previous subframe's signal to obtain N/2 output time domain audio samples, i.e. timeSam_{i,n} = preSam_{i,n} + preSam_{i−1,n+N/2}, where i is the frame index and n the sample index, with 0 ≤ n < N/2.
Thereby obtaining a low frequency subband domain excitation signal.
In step 24, the predictive synthesis of the low-frequency subband domain excitation signal is the inverse of the predictive analysis of the low-frequency subband domain signal at the encoding side; its function is to synthesize the low-frequency subband domain excitation signal through linear prediction to obtain a synthesized low-frequency subband domain signal.
When performing predictive synthesis, a frame of the low-frequency subband domain excitation signal is first divided into one or more subframes according to the analysis result of the low-frequency subband domain signal type in the side information; the dividing method is the same as the subframe division performed by the encoding-side low-frequency subband domain time-varying prediction analysis module 503. The specific process of performing predictive synthesis on the low-frequency subband domain excitation signal x_e(n) of one subframe comprises the following steps:
step 24a: decoding the LSF vector quantization index read from the side information into quantized line spectral frequencies, and converting the line spectral frequencies into line spectrum pairs;
step 24b: from the quantized line spectrum pairs, computing f_1(z) and f_2(z) to determine the quantized prediction coefficients â_i, and forming the quantized linear prediction synthesis filter
1/Â(z) = 1 / (1 − Σ_{i=1}^{p} â_i z^{−i})
wherein p is the prediction order, the same as at the encoding end;
step 24c: passing the low-frequency subband domain excitation signal x_e(n) through the linear prediction synthesis filter to obtain the synthesized low-frequency subband domain signal y(n):
y(n) = x_e(n) + Σ_{i=1}^{p} â_i y(n−i).
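Step 24c is the standard all-pole recursion; a minimal sketch follows, in which the function name and state handling are illustrative:

```python
import numpy as np

def lpc_synthesis(x_e, a_hat, state=None):
    """All-pole synthesis filter 1/A(z):
    y(n) = x_e(n) + sum_{i=1..p} a_hat[i-1] * y(n-i).
    `state` holds the last p outputs of the previous subframe,
    most recent sample last."""
    p = len(a_hat)
    hist = list(state) if state is not None else [0.0] * p
    y = []
    for x in x_e:
        yn = x + sum(a_hat[i] * hist[-1 - i] for i in range(p))
        y.append(yn)
        hist.append(yn)
    return np.array(y), hist[-p:]
```

For example, with p = 1 and â_1 = 0.5, a unit impulse excitation yields the decaying response 1, 0.5, 0.25, …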
In step 25, the method for decoding the high-frequency subband domain parameters comprises the following steps:
step 25-1: performing inverse quantization decoding on the high-frequency subband domain parameter encoded data in the side information to obtain the high-frequency subband domain parameters, reading the spectral tonality adjustment parameters from the high-frequency subband domain parameters, and performing spectral tonality adjustment on the low-frequency subband domain excitation spectrum obtained in step 22 according to those parameters. The spectral tonality adjustment parameters comprise an adjustment type and an adjustment amount P̂; the tonality adjustment comprises the following steps:
step 25-1a: dividing the low-frequency subband domain excitation spectrum into one or more sub-bands, and calculating the energy E_est of each sub-band;
step 25-1b: judging the adjustment type of each sub-band. If the adjustment type is no adjustment, the band is left unprocessed; if the adjustment type is tone addition, a tone is added at the middle of the band, with tone energy
ΔE_T = E_est · P̂,
the phase of the added tone being continuous with that of the previous frame; if the adjustment type is noise adjustment, random noise is added within the band, with noise energy
ΔE_N = E_est · P̂;
The spectral tonality adjustment is complete once all bands have been processed.
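A sketch of the per-band tonality adjustment: the "tone"/"noise" labels, the centre-bin placement of the added tone, and the energy bookkeeping are illustrative readings of the steps above, and phase continuity of the added tone with the previous frame is not modelled.

```python
import numpy as np

def adjust_tonality(band, adj_type, p_hat, rng=None):
    """Apply one band's tonality adjustment: 'none' leaves the band alone,
    'tone' adds a spectral line of energy E_est * p_hat at the band centre,
    'noise' adds random noise scaled to energy E_est * p_hat."""
    band = np.asarray(band, dtype=float).copy()
    if adj_type == "none":
        return band
    e_est = np.sum(band ** 2)            # band energy E_est
    delta_e = e_est * p_hat              # energy of the added component
    if adj_type == "tone":
        band[len(band) // 2] += np.sqrt(delta_e)
    elif adj_type == "noise":
        noise = (rng or np.random.default_rng()).standard_normal(len(band))
        band += noise * np.sqrt(delta_e / np.sum(noise ** 2))
    return band
```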
Step 25-2, reading spectrum gain adjustment parameters from the high-frequency sub-band domain parameters, and performing spectrum gain adjustment on the low-frequency sub-band domain excitation spectrum after spectrum tonality adjustment, including:
step 25-2 a: dividing the low-frequency sub-band domain excitation spectrum into one or more sub-bands;
step 25-2 b: and for any sub-band, multiplying each spectral line in the sub-band by the spectral gain adjustment parameter corresponding to the sub-band to obtain the low-frequency sub-band domain excitation spectrum after the spectral gain adjustment of the sub-band, and combining all the sub-bands to obtain the low-frequency sub-band domain excitation spectrum after the spectral gain adjustment.
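Steps 25-2a and 25-2b reduce to a per-subband multiply; in this sketch the band-edge layout is an assumed representation. Step 25-4's time domain gain adjustment is the same operation applied per subframe to the excitation signal.

```python
import numpy as np

def apply_spectral_gains(spectrum, band_edges, gains):
    """Multiply every spectral line in each sub-band by that sub-band's
    spectral gain adjustment parameter, then return the recombined spectrum."""
    out = np.asarray(spectrum, dtype=float).copy()
    for (lo, hi), g in zip(band_edges, gains):
        out[lo:hi] *= g
    return out
```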
Step 25-3: and performing IDFT on the adjusted low-frequency sub-band domain excitation spectrum to obtain a low-frequency sub-band domain excitation signal, wherein the transformation method is the same as the low-frequency sub-band domain frequency-time transformation method, and the specific implementation mode is shown in step 23.
Step 25-4: and reading time domain gain adjustment parameters from the high-frequency sub-band domain parameters, performing time domain gain adjustment on the obtained low-frequency sub-band domain excitation signal according to the corresponding time domain gain adjustment parameters, and outputting the obtained adjusted low-frequency sub-band domain excitation signal. The specific method comprises the following steps: dividing a low-frequency sub-band domain excitation signal into one or more sub-frames; and reading time domain gain adjustment parameters, and multiplying the low-frequency sub-band domain excitation signals of each subframe by the corresponding time domain gain adjustment parameters to obtain the low-frequency sub-band domain excitation signals after time domain gain adjustment.
step 25-5: performing synthesis filtering on the time-domain-gain-adjusted low-frequency subband domain excitation signal to obtain the reconstructed high-frequency subband domain signal, specifically comprising:
step 25-5 a: reading the analysis result of the high-frequency sub-band domain signal type, and dividing the low-frequency sub-band domain excitation signal of one frame into one or more than one sub-frames for prediction synthesis according to the read analysis result of the high-frequency sub-band domain signal type;
step 25-5 b: for each subframe, reading a high-frequency LSF vector quantization index to form a linear prediction synthesis filter of each subframe;
step 25-5c: performing synthesis filtering on the low-frequency subband domain excitation signal through the linear prediction synthesis filter to obtain the reconstructed high-frequency subband domain signal, i.e., recovering the high-frequency subband domain signal, thereby realizing band-extension decoding; the subsequent processing steps then continue.
In step 26, the subband synthesis process is described taking the PQMF as an example, corresponding to the embodiment that uses the PQMF for subband decomposition in the analysis subband filter bank of the monaural audio coding part of the present invention.
The impulse response of the synthesis filter in PQMF is:
wherein p₀(n) is the prototype filter, M is the number of subbands, N is the order of the prototype filter, 0 ≤ k ≤ M−1, 0 ≤ n ≤ 2KM−1, and N = 2KM.
The steps of synthesizing the time domain waveform signal by using the PQMF are as follows:
step 26a: shifting the data in the PQMF buffer, i.e.:
z[t]=z[t-M],t=N-1~M
step 26 b: calculate M new coefficients, i.e.:
wherein t is M-1 to 0.
Step 26 c: generating a weight vector, namely:
u[Mn+k]=z[2Mn+k]
u[Mn+64+k]=z[2Mn+192+k]
wherein k = 0 to M−1, and n = 0 to K.
Step 26 d: windowing the weighted vector data, i.e.:
w[n] = u[n]·c[n], where n = 0 to N−1
Step 26 e: computing PCM samples for monophonic sounds
Step 26 f: and sequentially outputting the PCM samples to obtain a recovered single-channel sound signal.
In this preferred embodiment, when restoring the high-frequency subband domain signal in step 25, if the high-frequency subband domain parameters contain no spectral adjustment parameters, step 25 may restore the high-frequency subband domain signal from the low-frequency subband domain excitation signal obtained in step 23 according to the high-frequency subband domain parameters, without any spectral adjustment during restoration.
The following describes a stereo encoding apparatus and method according to preferred embodiments of the present invention.
Fig. 14 is a block diagram showing a configuration of a stereo encoding apparatus according to a preferred embodiment of the present invention, the stereo encoding apparatus according to the preferred embodiment of the present invention includes: an analysis subband filterbank module 1401, a low frequency subband-domain and signal type analysis module 1402, a low frequency subband-domain time-varying prediction analysis module 1403, a low frequency subband-domain time-frequency transform module 1404, a low frequency subband-domain stereo encoding module 1405, a high frequency subband-domain parameter encoding module 1406, and a bitstream multiplexing module 1407.
The connection relationship and function of each module in fig. 14 are specifically described below, where:
An analysis subband filter bank module 1401 is used to perform subband decomposition on the left and right channels (L, R) of the input stereo signal to generate subband domain signals (L₁, R₁) ~ (L_{k1+k2}, R_{k1+k2}), each corresponding to a specific frequency range, wherein k1 ≥ 2 and k2 ≥ 2. The low-frequency subband domain signals (L₁, R₁) ~ (L_{k1}, R_{k1}) of the two channels are then input, in units of frames, to the low-frequency subband domain and signal type analysis module 1402 and the low-frequency subband domain time-varying prediction analysis module 1403, while the high-frequency subband domain signals (L_{k1+1}, R_{k1+1}) ~ (L_{k1+k2}, R_{k1+k2}) of the two channels are input, in units of frames, to the high-frequency subband domain parameter coding module 1406.
A low-frequency subband domain and signal type analysis module 1402 is used to receive the low-frequency subband domain signals of the two channels from the analysis subband filter bank module 1401, calculate a low-frequency subband domain sum signal from them, and perform signal type analysis on the sum signal to determine whether the sum signal of the frame is a slowly varying or rapidly varying signal. If it is a slowly varying signal, the signal type is output directly, e.g. an identifier indicating that the sum signal type of the frame is slowly varying; if it is a rapidly varying signal, the position of the rapid-change point is further calculated, and the corresponding signal type and rapid-change point position are output. The low-frequency subband domain sum signal type analysis result is output to the low-frequency subband domain time-varying prediction analysis module 1403 for subframe division control, to the low-frequency subband domain time-frequency transform module 1404 for controlling the order of the time-frequency transform, and to the bitstream multiplexing module 1407 as side information of the sound encoded code stream. In practice, a stereo encoding apparatus according to the principles of the present invention may omit this module.
A low-frequency subband domain time-varying prediction analysis module 1403, configured to receive the low-frequency subband domain signals in the two channels from the analysis subband filter bank module 1401, perform sub-frame processing on the received low-frequency subband domain signals in the two channels according to the low-frequency subband domain and signal type analysis result output by the low-frequency subband domain and signal type analysis module 1402, perform prediction analysis, that is, linear prediction filtering, on the low-frequency subband domain signals in the two channels according to sub-frames to obtain low-frequency subband domain excitation signals in the two channels, and output the obtained low-frequency subband domain excitation signals in the two channels to the low-frequency subband domain time-frequency transform module 1404. And when outputting, combining the sub-frames into a frame according to the dividing sequence and outputting.
When the sub-frames are divided, if the sum signal type is a slowly varying signal, only one sub-frame is divided in one frame, namely the sub-frames are not divided; and if the sum signal type is a fast-changing signal, dividing the left/right channel low-frequency sub-band domain signal of the frame into different sub-frames according to the position of the fast-changing point. The specific method for dividing different subframes refers to a specific method for performing subframe division processing on a single-channel sound signal by the low-frequency subband domain time-varying prediction analysis module 503.
The low-frequency subband domain time-varying prediction analysis module 1403 may perform linear prediction analysis on the low-frequency subband domain signals subframe by subframe using the same structure as in the monaural sound encoding apparatus of the present invention. The LSF vector quantization indices of the two channels obtained in the linear prediction analysis process are sent to the bitstream multiplexing module 1407 as side information.
A low-frequency subband domain time-frequency transform module 1404 is used to receive the low-frequency subband domain excitation signals of the two channels from the low-frequency subband domain time-varying prediction analysis module 1403 and, according to the low-frequency subband domain sum signal type analysis result output by module 1402, transform them from the time domain to the frequency domain using transforms of different length orders, obtaining the frequency domain representations of the low-frequency subband domain excitation signals of the two channels, i.e., the low-frequency subband domain excitation spectra of the two channels. The excitation spectra obtained by the time-frequency transform are then output to the low-frequency subband domain stereo encoding module 1405 and the high-frequency subband domain parameter coding module 1406. If the stereo encoding apparatus according to the principles of the present invention does not include the low-frequency subband domain and signal type analysis module 1402, the order is not controlled during the time-frequency transform.
Specific time-frequency transformation methods include Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT), Modified Discrete Cosine Transform (MDCT), and the like. And in the specific time-frequency transformation, the low-frequency sub-band domain and the signal type analysis result are used as the order control of the time-frequency transformation. The fast-changing signal is subjected to time-frequency transformation by taking a subframe as a unit, and the transformation with a shorter order is selected; the slowly-varying signal is time-frequency transformed by using frame as unit, and longer order transformation is selected. And outputting the low-frequency subband domain excitation spectrums of the two sound channels obtained by time-frequency transformation to a low-frequency subband domain stereo coding module 1405.
The low-frequency subband domain stereo encoding module 1405 is configured to receive the low-frequency subband domain excitation spectrum in the two channels from the low-frequency subband domain time-frequency transform module 1404, divide the excitation spectrum into a plurality of subbands, perform stereo encoding on each subband in a stereo encoding mode, to obtain low-frequency subband domain stereo encoded data, and output the low-frequency subband domain stereo encoded data as sound encoded data in a sound encoded code stream to the bit stream multiplexing module 1407. The stereo coding modes include a sum and difference stereo coding mode, a parametric stereo coding mode and a parametric error stereo coding mode. When stereo coding is performed, one of the three coding modes is selected for each sub-band to perform stereo coding. Wherein the coding mode selection information is also output as side information to the bitstream multiplexing module 1407.
A high-frequency subband domain parameter coding module 1406, configured to receive low-frequency subband domain excitation spectrums of two channels from the low-frequency subband domain time-frequency transform module 1404, receive high-frequency subband domain signals of two channels from the analysis subband filter bank 1401, extract high-frequency subband domain parameters of the two channels according to the low-frequency subband domain excitation spectrums of the two channels and the high-frequency subband domain signals of the two channels, where the high-frequency subband domain parameters are used to recover the high-frequency subband domain signals of the two channels from the low-frequency subband domain excitation spectrums of the two channels, and then, after performing quantization coding on the extracted high-frequency subband domain parameters, the high-frequency subband domain parameter coding module 1406 obtains high-frequency subband domain parameter coding data, and outputs the high-frequency subband domain parameter coding data to the bit stream multiplexing module 1407 as side information.
A bit stream multiplexing module 1407, configured to multiplex the audio encoded data and the side information received from the low-frequency sub-band domain and signal type analysis module 1402, the low-frequency sub-band domain time-varying prediction analysis module 1403, the low-frequency sub-band domain stereo encoding module 1405, and the high-frequency sub-band domain parameter encoding module 1406, so as to form a stereo audio encoded code stream.
In this embodiment, the low-frequency subband domain time-varying prediction analysis module 1403, the low-frequency subband domain time-frequency transform module 1404, and the high-frequency subband domain parameter coding module 1406 need to process the left and right channels of the stereo respectively, and the processing method is the same as that of the module with the same name in the monaural sound coding apparatus. Therefore, each of the three modules realizes processing of stereo sound by combining the modules of the same name in the two monaural sound encoding apparatuses.
It can thus be seen that the difference between the monaural sound encoding apparatus and this preferred embodiment of the present invention is that the monaural apparatus uses the low-frequency subband domain waveform coding module 505 when generating the sound encoded data of the sound encoded code stream, whereas the stereo encoding apparatus uses the low-frequency subband domain stereo encoding module 1405, which divides the low-frequency subband domain excitation spectrum into subbands and performs stereo coding on each subband.
Of course, like the high-frequency subband domain parameter coding module 506 in the monaural sound encoding apparatus, the high-frequency subband domain parameter coding module 1406 of this preferred embodiment may also receive the low-frequency subband domain excitation signals of the two channels generated by the low-frequency subband domain time-varying prediction analysis module 1403, instead of the low-frequency subband domain excitation spectra of the two channels. In this case, the high-frequency subband domain parameters of the two channels are extracted directly from the low-frequency subband domain excitation signals of the two channels and the high-frequency subband domain signals of the two channels; these parameters are used to recover the two channels' high-frequency subband domain signals from the two channels' low-frequency subband domain excitation signals, and do not include spectral adjustment parameters.
Fig. 15 is a flow chart of an encoding method based on the stereo encoding apparatus of the present invention. As shown in fig. 15, the method includes the steps of:
step 31: performing subband decomposition on the left and right channels of the input stereo signal respectively, so that each subband domain signal in the two channels corresponds to a specific frequency range. The decomposition yields the low-frequency subband domain signals of the left and right channels and the high-frequency subband domain signals of the left and right channels, where the low-frequency subband domain signals of the left/right channel comprise the subband domain signals of at least 2 subbands, and the high-frequency subband domain signals of the left/right channel likewise comprise the subband domain signals of at least 2 subbands.
Step 32: calculating a low-frequency sub-band domain sum signal from the low-frequency sub-band domain signals in the two sound channels, carrying out signal type analysis on the low-frequency sub-band domain sum signal, and if the low-frequency sub-band domain sum signal is a slowly-varying type signal, directly determining the signal type as a low-frequency sub-band domain sum signal type analysis result; if the signal is a fast-changing type signal, the position of the fast-changing point is continuously calculated, and finally the signal type and the position of the fast-changing point are determined as the analysis result of the low-frequency sub-band domain and the signal type.
Step 33: performing subframe division on the low-frequency subband domain signals in the two channels according to the low-frequency subband domain sum signal type analysis result, and performing prediction analysis, i.e. linear prediction filtering, subframe by subframe to obtain the low-frequency subband domain excitation signals in the two channels.
Step 34: and performing time-frequency transformation on the low-frequency sub-band domain excitation signals in the two sound channels by adopting different length orders according to the analysis result of the low-frequency sub-band domain and the signal type to obtain low-frequency sub-band domain excitation spectrums in the two sound channels.
Step 35: and dividing the low-frequency subband domain excitation spectrums in the two sound channels into a plurality of subbands, and performing stereo coding on each subband to obtain low-frequency subband domain stereo coding data.
Step 36: and according to the low-frequency subband domain excitation spectrums in the two sound channels and the high-frequency subband domain signals in the two sound channels, extracting high-frequency subband domain parameters for recovering the high-frequency subband domain signals in the two sound channels from the low-frequency subband domain excitation spectrums in the two sound channels, and carrying out quantization coding on the high-frequency subband domain parameters to obtain high-frequency subband domain parameter coding data.
Step 37: multiplexing the low-frequency subband domain stereo encoded data and the side information to obtain the sound encoded code stream. The side information comprises the low-frequency subband domain sum signal type analysis result, the high-frequency subband domain parameter encoded data of the left and right channels, and the LSF vector quantization indices of the two channels generated during the linear prediction filtering of the low-frequency subband domain signals.
The subband decomposition method in step 31, the signal type determination method in step 32, the prediction analysis processing in step 33, the time-frequency transform method in step 34, and the high-frequency subband domain parameter encoding method in step 36 are all described in the embodiment of the encoding method of the mono encoding apparatus of the present invention, and the same method is adopted in the embodiment of the encoding method of the stereo encoding apparatus of the present invention, and therefore, the description is omitted.
The low-frequency subband domain stereo encoding in step 35 first divides the low-frequency subband domain excitation spectra in the two channels into a plurality of subbands, the same subband division being applied to both channels. Then, for each subband, one of three coding modes, namely the sum-difference stereo coding mode, the parametric stereo coding mode, and the parametric error stereo coding mode, is selected to encode the low-frequency subband domain excitation spectra of the two channels within that subband. Two implementation methods for coding mode selection are given below:
Coding mode selection implementation method 1: encoding and decoding the low-frequency subband domain excitation spectra in the two channels with each of the three coding modes using the same number of bits, calculating the error between the decoded low-frequency subband domain excitation spectra of the two channels and the excitation spectra before encoding, and selecting the coding mode with the smallest error as the stereo coding mode. The coding mode selection information is output as side information to the bitstream multiplexing module 1407;
Coding mode selection implementation method 2: for lower-frequency subbands below a certain frequency in the low-frequency subband domain, for example subbands below 1 kHz, encoding and decoding are performed with both the sum-difference stereo coding mode and the parametric stereo coding mode, the error between the restored low-frequency subband domain excitation spectra of the two channels and the excitation spectra before encoding is calculated, the coding mode with the smaller error is selected, and the coding mode selection information is output as side information to the bitstream multiplexing module 1407. For higher-frequency subbands above that value, for example subbands above 1 kHz, the parametric stereo coding mode is used; in this case, the selection information of the parametric stereo coding mode may or may not be output to the bitstream multiplexing module 1407.
Of course, a fixed stereo coding mode may be used in practical applications, in which case the coding mode selection information does not need to be output as side information to the bitstream multiplexing module 1407.
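Selection method 1 above can be sketched as an exhaustive trial over candidate modes; the `codecs` mapping of encode/decode pairs is a hypothetical interface standing in for the three modes described below.

```python
import numpy as np

def select_mode(L, R, codecs):
    """Encode and decode the two channels' excitation spectra with every
    candidate mode (equal bit budget assumed) and return the name of the
    mode with the smallest squared reconstruction error."""
    best_mode, best_err = None, np.inf
    for name, (encode, decode) in codecs.items():
        L_rec, R_rec = decode(*encode(L, R))
        err = np.sum((L - L_rec) ** 2) + np.sum((R - R_rec) ** 2)
        if err < best_err:
            best_mode, best_err = name, err
    return best_mode
```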
The following describes in detail the implementation of each of the three stereo coding modes.
Fig. 16 is a model diagram of the sum-difference stereo coding mode. In this mode, a sum excitation spectrum and a difference excitation spectrum within a subband are calculated from the low-frequency subband domain excitation spectra of the two channels within that subband. The specific implementation is as follows:
From the left- and right-channel excitation spectra within the subband, the corresponding sum excitation spectrum and difference excitation spectrum are calculated; after waveform quantization coding, the results are output as low-frequency subband domain stereo encoded data to the bitstream multiplexing module 1407. The sum and difference excitation spectra are calculated as follows:
wherein the waveform quantization coding of the sum and difference excitation spectra may use the quantization and coding method of the low-frequency subband domain waveform coding module 505 of the monaural sound encoding apparatus.
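A sketch of the sum-difference mode before quantization; since the excerpt's formula images are not reproduced, the conventional mid/side definitions S = (L+R)/2 and D = (L−R)/2 are assumed here.

```python
import numpy as np

def sum_diff_encode(L, R):
    """Sum and difference excitation spectra (assumed conventional
    definitions: S = (L+R)/2, D = (L-R)/2)."""
    return (L + R) / 2.0, (L - R) / 2.0

def sum_diff_decode(S, D):
    """Invert the transform: L = S + D, R = S - D."""
    return S + D, S - D
```

The transform is exactly invertible, so any loss comes only from the subsequent waveform quantization of S and D.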
Fig. 17 is a model diagram of the parametric stereo coding mode. The parametric stereo coding mode calculates a mono excitation spectrum within sub-band k from the low-frequency subband domain excitation spectra of the two channels in that sub-band, together with parameters for recovering the two channels' low-frequency subband domain excitation spectra in sub-band k from the sub-band mono excitation spectrum. Two specific implementations of parametric stereo coding are listed below.
The parametric stereo coding implementation method 1 comprises the following steps:
Step 35-1a: within sub-band k, for one of the channels, e.g. the right-channel excitation spectrum R, calculate a weighting parameter g_r(k) for that channel and obtain the scaled excitation spectrum R′ of the channel:
R′ = (1 / g_r(k)) · R,
so that the scaled R′ and the left-channel excitation spectrum L are equal in energy. g_r(k) can be calculated by the following formula:

g_r(k) = sqrt(E_R(k) / E_L(k)),

where E_R(k) and E_L(k) are the energies of the right channel and the left channel in sub-band k, respectively.
Step 35-1b: for each frequency point i within sub-band k, calculate the weighted sum excitation spectrum M and the weighted difference excitation spectrum S of that frequency point. Since, after scaling, the energy ratio of the left and right channels at each frequency point in sub-band k is statistically approximately the same, L and R′ are approximately equal in energy, so the weighted sum excitation spectrum M and the weighted difference excitation spectrum S are approximately orthogonal. The calculation formulas are:

M[i, k] = (L[i, k] + R′[i, k]) / 2, S[i, k] = (L[i, k] − R′[i, k]) / 2
Step 35-1c: generate an orthogonal excitation spectrum D that is equal in magnitude to, and orthogonal to, the weighted sum excitation spectrum M. From the orthogonal excitation spectrum D and the weighted difference excitation spectrum S, calculate the weighting parameter g_d(k) of the orthogonal excitation spectrum, such that the orthogonal excitation spectrum scaled by g_d(k), D′ = g_d(k) · D, is equal in energy to S. g_d(k) can be calculated by the following formula:

g_d(k) = sqrt(E_S(k) / E_D(k)),

where E_S(k) and E_D(k) are respectively the energies of the weighted difference excitation spectrum S and the orthogonal excitation spectrum D within sub-band k.
Step 35-1d: quantization-code the weighted sum excitation spectrum M and the parameters g_r(k) and g_d(k), respectively, and output them to the bitstream multiplexing module 1407. The quantization-coded M is the low-frequency subband domain stereo encoded data; the quantization-coded g_r(k) and g_d(k) are side information.
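Steps 35-1a through 35-1d can be sketched as follows for one sub-band. The closed forms g_r(k) = sqrt(E_R/E_L) and g_d(k) = sqrt(E_S/E_D) follow from the equal-energy conditions stated above; taking the orthogonal spectrum as a 90° rotation of M (multiplication by j) matches the decoder-side definition D[i,k] = −y_m[i,k] + j·x_m[i,k]. Function and variable names are illustrative.

```python
import numpy as np

def parametric_encode(L, R):
    """Parametric stereo encoding of one sub-band (implementation 1).

    L, R: complex low-frequency subband domain excitation spectra.
    Returns the weighted sum spectrum M and the side parameters g_r, g_d.
    """
    # Step 35-1a: scale the right channel so its energy matches the left
    g_r = np.sqrt(np.sum(np.abs(R) ** 2) / np.sum(np.abs(L) ** 2))
    R_s = R / g_r
    # Step 35-1b: weighted sum and weighted difference spectra
    M = (L + R_s) / 2.0
    S = (L - R_s) / 2.0
    # Step 35-1c: orthogonal spectrum, equal in magnitude to M and
    # rotated 90 degrees: D[i] = -Im(M[i]) + j*Re(M[i])
    D = 1j * M
    g_d = np.sqrt(np.sum(np.abs(S) ** 2) / np.sum(np.abs(D) ** 2))
    # Step 35-1d: M plus g_r, g_d are what gets quantization-coded
    return M, g_r, g_d
```

When the two channels are fully correlated (R a scaled copy of L), S vanishes and g_d is zero, so only M and the channel weight g_r carry information.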
In parametric stereo coding implementation method 2, relative to implementation method 1, the parameters g_r(k) and g_d(k) and the weighted sum excitation spectrum M′ are obtained according to a minimum-error principle. The method comprises the following steps:
Step 35-2a: for sub-band k, calculate the first parameter g_d(k) according to the following formula:
Wherein,
wherein x_l and y_l are respectively the real part and the imaginary part of the left-channel low-frequency subband domain excitation spectrum, and x_r and y_r are respectively the real part and the imaginary part of the right-channel low-frequency subband domain excitation spectrum;
Step 35-2b: for sub-band k, calculate the second parameter g_r(k) according to the following formula:
Wherein,
Step 35-2c: for each frequency point i within sub-band k, calculate the weighted sum excitation spectrum M′ according to the following formula:
wherein x_m and y_m are respectively the real part and the imaginary part of the weighted sum excitation spectrum, and
M′[i, k] = x_m[i, k] + j · y_m[i, k];
g(k) is an importance factor for parametric stereo coding in sub-band k, reflecting the distribution of the parametric stereo coding error between the left and right channels, and may be chosen according to signal characteristics; for example, g(k) may be set to the energy ratio of the left channel to the right channel in sub-band k, i.e. E_L(k)/E_R(k).
Step 35-2d: quantization-code the weighted sum excitation spectrum M′ and the parameters g_r(k) and g_d(k), respectively, and output them to the bitstream multiplexing module 1407. The quantization-coded M′ is the low-frequency subband domain stereo encoded data; the quantization-coded g_r(k) and g_d(k) are side information.
Fig. 18 is a model diagram of the parametric error stereo coding mode. The parametric error stereo coding mode calculates a mono excitation spectrum and an error spectrum within a sub-band from the low-frequency subband domain excitation spectra of the two channels in that sub-band, together with parameters for recovering the two channels' low-frequency subband domain excitation spectra in the sub-band from the mono excitation spectrum and the error spectrum.
Compared with the calculation model of the parametric stereo coding mode, if the coding precision needs to be improved, the parametric error stereo coding mode additionally calculates the error of the excitation spectrum, namely the error spectrum E, and also performs waveform quantization encoding on the error spectrum E. The implementation method of the parametric error stereo coding mode comprises the following steps:
Step 35-3a: within sub-band k, for one of the channels, e.g. the right-channel excitation spectrum R, calculate a weighting parameter g_r(k) for that channel and obtain the scaled excitation spectrum R′ of the channel. Since the energy ratio of the left and right channels at each frequency point i in the parameter extraction band is statistically approximately the same, L and R′ are approximately equal in energy, so the weighted sum excitation spectrum M and the weighted difference excitation spectrum S are approximately orthogonal. Here g_r(k) is calculated in the same way as in step 35-1a.
Step 35-3b: for each frequency point i in the sub-band, calculate the weighted sum excitation spectrum M and the weighted difference excitation spectrum S of that frequency point.
Step 35-3c: generate an orthogonal excitation spectrum D that is equal in magnitude to, and orthogonal to, the weighted sum excitation spectrum M.
Step 35-3d: calculate the weighting parameter g_d(k) from the orthogonal excitation spectrum D and the weighted difference excitation spectrum S, and obtain the orthogonal excitation spectrum D′ scaled by g_d(k). Here g_d(k) is calculated in the same way as in step 35-1c.
Step 35-3e: the error excitation spectrum E is obtained by calculating the difference between the weighted difference excitation spectrum S and the scaled orthogonal excitation spectrum D′, namely
E = S − D′.
Step 35-3f: quantization-code the weighted sum excitation spectrum M, the error excitation spectrum E, and the parameters g_r(k) and g_d(k), respectively, and output them to the bitstream multiplexing module 1407. The quantization-coded M and E are the low-frequency subband domain stereo encoded data; the quantization-coded g_r(k) and g_d(k) are side information.
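Steps 35-3a through 35-3f extend implementation 1 by also transmitting the error spectrum E = S − g_d(k)·D. A sketch under the same assumptions as the earlier parametric-coding sketch (closed-form g_r, g_d; orthogonal spectrum as a 90° rotation; illustrative names):

```python
import numpy as np

def parametric_error_encode(L, R):
    """Parametric error stereo encoding of one sub-band (steps 35-3a..f)."""
    # Step 35-3a: channel weighting and scaling of the right channel
    g_r = np.sqrt(np.sum(np.abs(R) ** 2) / np.sum(np.abs(L) ** 2))
    R_s = R / g_r
    # Step 35-3b: weighted sum and weighted difference spectra
    M = (L + R_s) / 2.0
    S = (L - R_s) / 2.0
    # Steps 35-3c/d: orthogonal spectrum and its weighting parameter
    D = 1j * M
    g_d = np.sqrt(np.sum(np.abs(S) ** 2) / np.sum(np.abs(D) ** 2))
    # Step 35-3e: error spectrum between S and the scaled orthogonal spectrum
    E = S - g_d * D
    # Step 35-3f: all four quantities are quantization-coded and transmitted
    return M, E, g_r, g_d
```

Because E carries the full residual of the difference spectrum, the decoder can reconstruct both channels exactly before quantization, unlike the purely parametric mode.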
In the preferred embodiment, if step 36 extracts the high frequency subband domain parameters directly according to the low frequency subband domain excitation signals of the two channels and the high frequency subband domain signals of the two channels obtained in step 33, the step of spectrum adjustment is not included in the extraction process, and therefore the spectrum adjustment parameters are not included in the high frequency subband domain parameters of the two channels.
The following describes a stereo decoding apparatus and method according to preferred embodiments of the present invention.
Fig. 19 is a block diagram showing a configuration of a stereo decoding apparatus according to a preferred embodiment of the present invention. As shown in fig. 19, the stereo decoding apparatus according to the preferred embodiment of the present invention includes: a bitstream demultiplexing module 1901, a low frequency subband domain stereo decoding module 1902, a low frequency subband domain frequency-to-time transform module 1903, a low frequency subband domain time varying prediction synthesis module 1904, a high frequency subband domain parameter decoding module 1905, and a synthesis subband filterbank module 1906.
The connection relationship and functions of the respective modules shown in fig. 19 will be specifically described, wherein,
a bitstream demultiplexing module 1901, configured to demultiplex a received sound coding stream to obtain sound coding data and side information of a corresponding data frame, and output the corresponding sound coding data to the low-frequency subband domain stereo decoding module 1902; the corresponding side information is output to the low frequency subband domain frequency-time transform module 1903, the low frequency subband domain time-varying prediction synthesis module 1904, and the high frequency subband domain parameter decoding module 1905.
The sound encoding data output to the low-frequency subband domain stereo decoding module 1902 is the low-frequency subband domain stereo encoded data; the side information output to the low-frequency subband domain frequency-time transform module 1903 is the low-frequency subband domain signal type analysis result; the side information output to the low-frequency subband domain time-varying prediction synthesis module 1904 includes the low-frequency subband domain signal type analysis result and the low-frequency subband domain LSF vector quantization indexes; the side information output to the high-frequency subband domain parameter decoding module 1905 is the high-frequency subband domain parameter encoded data, which includes the high-frequency subband domain signal type analysis result, the high-frequency LSF vector quantization indexes, the spectrum adjustment parameters, and the time domain gain adjustment parameters. When the low-frequency subband domain stereo encoding module 1405 at the encoding end outputs coding mode selection information, that selection information is also output as side information to the low-frequency subband domain stereo decoding module 1902 (not shown in Fig. 19).
The low-frequency subband domain stereo decoding module 1902 is configured to perform stereo decoding on the low-frequency subband domain stereo encoded data according to the encoding mode selection information in the side information output by the bitstream demultiplexing module 1901, obtain low-frequency subband domain excitation spectra in the two channels, and send the low-frequency subband domain excitation spectra to the low-frequency subband domain frequency-time transform module 1903 and the high-frequency subband domain parameter decoding module 1905.
A low-frequency subband domain frequency-time transform module 1903, configured to perform frequency-time transforms on the low-frequency subband domain excitation spectra in the two channels received from the low-frequency subband domain stereo decoding module 1902, using transforms of different length orders according to the low-frequency subband domain signal type analysis result in the side information output by the bitstream demultiplexing module 1901, to obtain the low-frequency subband domain excitation signals in the two channels, which are sent frame by frame to the low-frequency subband domain time-varying prediction synthesis module 1904. The frequency-time transform method is the inverse of the time-frequency transform in the encoding-side low-frequency subband domain time-frequency transform module 1404, and includes the inverse discrete Fourier transform (IDFT), inverse discrete cosine transform (IDCT), inverse modified discrete cosine transform (IMDCT), and the like.
A low-frequency subband domain time-varying prediction synthesis module 1904, configured to divide a frame of low-frequency subband domain excitation signal received from the low-frequency subband domain frequency-time transform module 1903 into one or more subframes for prediction synthesis according to the low-frequency subband domain signal type analysis result in the side information output by the bitstream demultiplexing module 1901, perform synthesis filtering on each subframe to obtain the low-frequency subband domain signal of each subframe, and assemble the subframes in division order into one frame of low-frequency subband domain signal. The division method is the same as the subframe division method of the low-frequency subband domain time-varying prediction analysis module 1403 of the stereo encoding apparatus.
A high-frequency subband domain parameter decoding module 1905, configured to recover the high-frequency subband domain signals in the two channels according to the low-frequency subband domain excitation spectra in the two channels received from the low-frequency subband domain stereo decoding module 1902 and the high-frequency subband domain parameter encoded data of the two channels in the side information output by the bitstream demultiplexing module 1901.
A synthesis subband filterbank module 1906, configured to perform subband synthesis on the low-frequency subband domain signals in the two channels output by the low-frequency subband domain time-varying prediction synthesis module 1904 and the high-frequency subband domain signals in the two channels output by the high-frequency subband domain parameter decoding module 1905 to obtain a decoded stereo signal.
In this embodiment, the low-frequency subband domain frequency-time transform module 1903, the low-frequency subband domain time-varying prediction synthesis module 1904, and the high-frequency subband domain parameter decoding module 1905 respectively adopt two sets of modules with the same name of the monaural sound decoding device to respectively process the left and right channel signals.
Corresponding to the stereo decoding apparatus, when the high-frequency subband domain parameters do not include the spectrum adjustment parameters, the high-frequency subband domain parameter decoding module 1905 may also obtain the low-frequency subband domain excitation signals of the two channels from the low-frequency subband domain frequency-time transform module 1903, so that the high-frequency subband domain parameter decoding module 1905 does not perform spectrum adjustment, and performs synthesis filtering on the low-frequency subband domain excitation signals of the two channels, which are only subjected to the time domain gain adjustment, according to respective linear predictive synthesis filters formed by the high-frequency LSF vector quantization indexes of the two channels, thereby obtaining reconstructed high-frequency subband domain signals of the two channels, and achieving restoration of the high-frequency subband domain signals of the two channels.
Fig. 20 is a flowchart of a decoding method of a stereo audio decoding apparatus according to the present invention. As shown in fig. 20, the method includes the steps of:
Step 41: demultiplex the sound coding stream to obtain the low-frequency subband domain stereo encoded data and all side information used for decoding.
Step 42: perform stereo decoding on the low-frequency subband domain stereo encoded data according to the stereo coding mode selection information in the side information to obtain the low-frequency subband domain excitation spectra in the two channels.
Step 43: according to the low-frequency subband domain signal type analysis result in the side information, perform frequency-time transforms of different orders on the low-frequency subband domain excitation spectra in the two channels to obtain the low-frequency subband domain excitation signals in the two channels.
Step 44: obtain the low-frequency subband domain linear prediction synthesis filters according to the LSF vector quantization indexes of the two channels in the side information, divide the low-frequency subband domain excitation signals in the two channels into subframes according to the low-frequency subband domain signal type analysis result, and then perform synthesis filtering on the excitation signals subframe by subframe with the obtained linear prediction synthesis filters to obtain the decoded low-frequency subband domain signals in the two channels.
Step 45: restore the high-frequency subband domain signals in the two channels according to the low-frequency subband domain excitation spectra in the two channels and the high-frequency subband domain parameter encoded data in the side information, obtaining the decoded high-frequency subband domain signals in the two channels.
Step 46: combine the low-frequency subband domain signals and the high-frequency subband domain signals in the two channels and perform subband synthesis with the synthesis subband filterbank.
The frequency-time transform method in step 43, the predictive synthesis process in step 44, the high-frequency subband domain parameter decoding method in step 45, and the subband synthesis method in step 46 have been described in the embodiment of the decoding method of the mono decoding apparatus of the present invention; the same methods are adopted in the embodiment of the decoding method of the stereo decoding apparatus and are therefore not described again.
The sum and difference stereo decoding mode restores the low frequency subband domain excitation spectra in the two channels in a subband by the sum excitation spectrum and the difference excitation spectrum of the low frequency subband domain in the subband. The specific implementation method comprises the following steps:
The low-frequency subband domain stereo decoding module 1902 performs inverse quantization decoding on the low-frequency subband domain stereo encoded data received from the bitstream demultiplexing module 1901 to obtain the sum excitation spectrum M and the difference excitation spectrum S of the low-frequency subband domain, and then restores the low-frequency subband domain excitation spectra of the left and right channels using the following formulas:

L = M + S, R = M − S.
The parametric stereo decoding mode recovers the left- and right-channel low-frequency subband domain excitation spectra within a sub-band from the weighted sum excitation spectrum M received by the low-frequency subband domain stereo decoding module 1902 and the corresponding parameters g_r(k) and g_d(k) in the side information. It corresponds to implementation method 1 and implementation method 2 of the parametric stereo coding method in the encoding section; the decoding process is the same for both implementations and comprises the following steps:
Step 42-1a: the low-frequency subband domain stereo decoding module 1902 performs inverse quantization decoding on the low-frequency subband domain stereo encoded data and the corresponding parameters received from the bitstream demultiplexing module 1901 to obtain the weighted sum excitation spectrum M and the parameters g_r(k) and g_d(k).
Step 42-1b: generate an orthogonal excitation spectrum D that is equal in magnitude to, and orthogonal to, the weighted sum excitation spectrum M, wherein
D[i, k] = −y_m[i, k] + j · x_m[i, k];
Step 42-1c: scale the orthogonal excitation spectrum D according to the obtained parameter g_d(k) to obtain the scaled orthogonal excitation spectrum D′ = g_d(k) · D.
Step 42-1d: obtain the excitation spectra of the left and right channels from the weighted sum excitation spectrum M and the scaled orthogonal excitation spectrum D′, where the excitation spectrum of one channel (the right channel) is still in scaled form. The calculation formulas are:

L = M + D′, R′ = M − D′.
Step 42-1e: rescale the scaled channel back to its original magnitude using the parameter g_r(k) obtained from the side information, giving R = g_r(k) · R′.
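Steps 42-1a through 42-1e can be sketched as follows (illustrative names). The reconstruction is approximate because the true weighted difference spectrum is replaced by the scaled orthogonal spectrum g_d(k)·D.

```python
import numpy as np

def parametric_decode(M, g_r, g_d):
    """Parametric stereo decoding of one sub-band (steps 42-1a..e)."""
    D = 1j * M          # step 42-1b: D[i] = -y_m[i] + j*x_m[i]
    D_s = g_d * D       # step 42-1c: scaled orthogonal spectrum
    L = M + D_s         # step 42-1d: left channel
    R_s = M - D_s       #            right channel, still scaled
    R = g_r * R_s       # step 42-1e: undo the encoder-side scaling
    return L, R
```

Note that L + R/g_r = 2M always holds, i.e. the decoded channels are consistent with the transmitted sum spectrum regardless of g_d.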
The parametric error stereo decoding mode recovers the left- and right-channel excitation spectra of a sub-band from the weighted sum excitation spectrum M and the error excitation spectrum E obtained by the low-frequency subband domain stereo decoding module 1902 and the corresponding parameters g_r(k) and g_d(k) in the side information. The specific implementation method comprises the following steps:
Step 42-2a: the low-frequency subband domain stereo decoding module 1902 performs inverse quantization decoding on the low-frequency subband domain stereo encoded data and the corresponding parameters received from the bitstream demultiplexing module 1901 to obtain the weighted sum excitation spectrum M, the error excitation spectrum E, and the parameters g_r(k) and g_d(k).
step 42-2 b: generating and weighting sum excitation spectrumOrthogonal excitation spectrum with equal amplitude and verticality
Step 42-2c: scale the orthogonal excitation spectrum D according to the obtained parameter g_d(k) to obtain the scaled orthogonal excitation spectrum D′.
Step 42-2d: add the scaled orthogonal excitation spectrum D′ and the error excitation spectrum E to obtain the recovered weighted difference excitation spectrum S = D′ + E.
Step 42-2e: obtain the excitation spectra of the left and right channels from the weighted sum excitation spectrum M and the weighted difference excitation spectrum S as L = M + S and R′ = M − S, where the excitation spectrum of one channel (the right channel) is still in scaled form and is rescaled by g_r(k) as in step 42-1e.
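Steps 42-2a through 42-2e in the same sketch form (illustrative names). With the transmitted error spectrum, the reconstruction is exact before quantization.

```python
import numpy as np

def parametric_error_decode(M, E, g_r, g_d):
    """Parametric error stereo decoding of one sub-band (steps 42-2a..e)."""
    D = 1j * M          # step 42-2b: orthogonal spectrum
    D_s = g_d * D       # step 42-2c: scaled orthogonal spectrum
    S = D_s + E         # step 42-2d: recovered weighted difference spectrum
    L = M + S           # step 42-2e: left channel
    R = g_r * (M - S)   #             right channel, rescaled
    return L, R
```

Paired with an encoder that computes E = S − g_d·D as in step 35-3e, this decoder returns the original left and right excitation spectra exactly.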
In the preferred embodiment, in the step 45, when restoring the high frequency subband domain signals of the two channels, if there is no spectrum adjustment parameter in the high frequency subband domain parameters of the two channels, the step 45 may restore the high frequency subband domain signals of the two channels from the low frequency subband domain excitation signals of the two channels obtained in the step 43 according to the high frequency subband domain parameters of the two channels, and the restoring process does not include the step of spectrum adjustment.
It can be seen from the above that, the monaural sound coding and decoding scheme provided by the invention adopts a combination form of high-efficiency subband coding, predictive coding and transform coding when performing waveform coding on low-frequency subband signals, thereby improving the coding efficiency; the high-efficiency parameter coding mode is adopted when the high-frequency subband domain signals are processed, and effective frequency spectrum structure adjustment and time domain gain adjustment are carried out in the process of carrying out parameter coding on the high-frequency subband domain excitation signals, so that the coding efficiency is improved, and the distortion of the decoded sound is reduced.
In addition, the stereo coding and decoding scheme of the invention not only has the advantages of the single sound channel coding and decoding scheme based on the principle of the invention, but also provides a plurality of parameter stereo coding methods based on the sub-band excitation spectrum, can reduce the code rate, and is suitable for stereo coding under extremely low code rate.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (31)
1. A mono sound encoding apparatus, comprising:
an analysis subband filterbank module for performing a subband decomposition of the mono sound signal into a low frequency subband domain signal comprising at least 2 subbands and a high frequency subband domain signal comprising at least 2 subbands;
the low-frequency sub-band domain time-varying prediction analysis module is used for performing prediction analysis on the low-frequency sub-band domain signal to acquire a low-frequency sub-band domain excitation signal;
the low-frequency sub-band domain time-frequency transformation module is used for performing time-frequency transformation on the low-frequency sub-band domain excitation signal to acquire a low-frequency sub-band domain excitation spectrum;
the low-frequency subband domain waveform coding module is used for carrying out quantization coding on the low-frequency subband domain excitation spectrum to obtain low-frequency subband domain waveform coding data;
the high-frequency sub-band domain parameter coding module is used for calculating high-frequency sub-band domain parameters for recovering the high-frequency sub-band domain signals from the low-frequency sub-band domain excitation spectrum according to the low-frequency sub-band domain excitation spectrum and the high-frequency sub-band domain signals, and obtaining high-frequency sub-band domain parameter coding data after carrying out quantization coding on the high-frequency sub-band domain parameters; or, according to the low-frequency sub-band domain excitation signal and the high-frequency sub-band domain signal, calculating a high-frequency sub-band domain parameter for recovering the high-frequency sub-band domain signal from the low-frequency sub-band domain excitation signal, and after carrying out quantization coding on the high-frequency sub-band domain parameter, acquiring high-frequency sub-band domain parameter coded data;
and the bit stream multiplexing module is used for multiplexing the low-frequency subband domain waveform coded data and the high-frequency subband domain parameter coded data so as to output a sound coded code stream.
2. The encoding apparatus as claimed in claim 1, wherein the encoding apparatus further comprises a low frequency subband signal type analyzing module for performing signal type analysis on a frame of the low frequency subband signal and outputting a low frequency subband signal type analysis result; outputting a signal type if the low frequency sub-band domain signal is a slowly varying signal; if the signal is a fast-changing signal, further acquiring a fast-changing point position, and outputting a signal type and the fast-changing point position;
the low-frequency subband domain time-varying predictive analysis module is further used for dividing the low-frequency subband domain signal into one or more subframes for predictive analysis according to the analysis result of the low-frequency subband domain signal type;
the low-frequency sub-band domain time-frequency transformation module is further used for dividing the low-frequency sub-band domain excitation signal into one or more subframes for time-frequency transformation according to the analysis result of the low-frequency sub-band domain signal type;
the bit stream multiplexing module is further configured to multiplex the low-frequency subband domain signal type analysis result.
3. The encoding apparatus according to claim 1, wherein when the high frequency subband-domain parameter encoding module calculates the high frequency subband-domain parameters from the low frequency subband-domain excitation spectrum and the high frequency subband-domain signal, the high frequency subband-domain parameter encoding module includes:
the time-varying predictive analyzer is used for carrying out linear predictive analysis on the received high-frequency sub-band domain signal to obtain a high-frequency LSF vector quantization index and a high-frequency sub-band domain excitation signal of the linear predictive filter; encoding the high-frequency LSF vector quantization index and outputting the encoded high-frequency LSF vector quantization index to the bit stream multiplexing module;
the spectrum parameter encoder is used for performing time-frequency transformation on the high-frequency sub-band domain excitation signal to obtain a high-frequency sub-band domain excitation spectrum, performing spectrum adjustment on the received low-frequency sub-band domain excitation spectrum according to the high-frequency sub-band domain excitation spectrum, performing quantization coding on the extracted spectrum adjustment parameters, outputting the extracted spectrum adjustment parameters to the bit stream multiplexing module, and performing frequency-time transformation on the low-frequency sub-band domain excitation spectrum after spectrum adjustment to obtain a low-frequency sub-band domain excitation signal after spectrum adjustment;
the time-varying prediction synthesizer is used for performing comprehensive filtering on the spectrum-adjusted low-frequency sub-band domain excitation signal by adopting a linear prediction synthesis filter obtained according to the high-frequency LSF vector quantization index to obtain a reconstructed high-frequency sub-band domain signal;
and the time domain adaptive gain adjustment parameter extraction module is used for comparing the high-frequency sub-band domain signal with the reconstructed high-frequency sub-band domain signal to obtain a time domain gain adjustment parameter, and outputting the time domain gain adjustment parameter to the bit stream multiplexing module after quantization coding.
4. A method for encoding a monaural sound, the method comprising:
A. performing sub-band decomposition on the single-channel sound signal to obtain a low-frequency sub-band domain signal comprising at least 2 sub-bands and a high-frequency sub-band domain signal comprising at least 2 sub-bands;
B. performing predictive analysis and time-frequency transformation on the low-frequency sub-band domain signal to obtain a low-frequency sub-band domain excitation spectrum; carrying out quantization coding on the low-frequency subband domain excitation spectrum to obtain low-frequency subband domain waveform coding data;
C. calculating high-frequency sub-band domain parameters for recovering the high-frequency sub-band domain signals from the low-frequency sub-band domain excitation spectrum according to the high-frequency sub-band domain signals and the low-frequency sub-band domain excitation spectrum, and obtaining high-frequency sub-band domain parameter coding data after carrying out quantization coding on the high-frequency sub-band domain parameters; or, according to the high-frequency sub-band domain signal and the low-frequency sub-band domain excitation signal obtained through the predictive analysis, calculating a high-frequency sub-band domain parameter used for recovering the high-frequency sub-band domain signal from the low-frequency sub-band domain excitation signal, and obtaining high-frequency sub-band domain parameter coding data after carrying out quantization coding on the high-frequency sub-band domain parameter;
D. multiplexing the low-frequency sub-band domain waveform coded data and the high-frequency sub-band domain parameter coded data, and outputting a sound coded code stream.
5. The encoding method of claim 4, wherein the method further comprises the steps of: performing signal type analysis on a frame of low-frequency subband domain signals, and determining a low-frequency subband domain signal type analysis result; if the low-frequency sub-band domain signal is a slowly varying signal, taking the signal type as a low-frequency sub-band domain signal type analysis result; if the signal is a fast-changing signal, further acquiring a fast-changing point position, and taking the signal type and the fast-changing point position as a signal type analysis result;
step B, the performing predictive analysis and time-frequency transformation on the low-frequency sub-band domain signal comprises: dividing a frame of low-frequency subband domain signals into one or more than one sub-frames for predictive analysis according to the type analysis result of the low-frequency subband domain signals; performing predictive analysis on the low-frequency sub-band domain signals according to the sub-frames to obtain low-frequency sub-band domain excitation signals of the sub-frames, and then combining the sub-frames according to a dividing sequence to generate a frame of low-frequency sub-band domain excitation signals;
dividing the frame of low-frequency sub-band domain excitation signal into one or more sub-frames for time-frequency transformation according to the type analysis result of the low-frequency sub-band domain signal; performing time-frequency transformation on the low-frequency sub-band domain excitation signal according to the sub-frame to obtain a low-frequency sub-band domain excitation spectrum of the sub-frame;
step D further comprises multiplexing the low frequency subband domain signal type analysis results.
6. The encoding method according to claim 4, wherein when said step C calculates the high frequency subband-domain parameters from the high frequency subband-domain signal and the low frequency subband-domain excitation spectrum, said step C comprises:
c1, obtaining a high-frequency sub-band domain excitation signal according to the high-frequency sub-band domain signal, and performing time-frequency transformation on the high-frequency sub-band domain excitation signal to obtain a high-frequency sub-band domain excitation spectrum; performing spectrum adjustment on the low-frequency sub-band domain excitation spectrum according to the high-frequency sub-band domain excitation spectrum to obtain a spectrum adjustment parameter and a spectrum-adjusted low-frequency sub-band domain excitation spectrum; carrying out frequency-time conversion on the low-frequency sub-band domain excitation spectrum after spectrum adjustment to obtain a low-frequency sub-band domain excitation signal after spectrum adjustment;
C2, performing synthesis filtering on the spectrum-adjusted low-frequency sub-band domain excitation signal to obtain a reconstructed high-frequency sub-band domain signal, and extracting a time domain gain adjustment parameter according to the time domain gain of the reconstructed high-frequency sub-band domain signal and the time domain gain of the high-frequency sub-band domain signal; and obtaining the high-frequency sub-band domain parameters, which include the spectrum adjustment parameter obtained in step C1 and the time domain gain adjustment parameter.
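As an illustration of the time domain gain extraction described in step C2, the sketch below compares per-segment RMS energies of the original and reconstructed high-frequency signals; the segment count and the use of RMS as the gain measure are assumptions for illustration, not taken from the claim:

```python
import numpy as np

def time_domain_gain_parameters(hf_signal, hf_reconstructed, n_segments=4):
    """Hypothetical sketch of step C2: per-segment gain ratio between the
    original high-frequency subband domain signal and its reconstruction."""
    gains = []
    for orig, recon in zip(np.array_split(hf_signal, n_segments),
                           np.array_split(hf_reconstructed, n_segments)):
        rms_orig = np.sqrt(np.mean(orig ** 2))
        rms_recon = np.sqrt(np.mean(recon ** 2))
        # guard against silent segments in the reconstruction
        gains.append(rms_orig / max(rms_recon, 1e-12))
    return np.array(gains)
```

In a real coder these gains would then be quantization-coded and multiplexed into the bit stream alongside the spectrum adjustment parameters.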
7. A mono sound decoding apparatus, comprising:
the bit stream demultiplexing module is used for demultiplexing the sound coding code stream to obtain low-frequency subband domain waveform coding data comprising at least 2 subbands and high-frequency subband domain parameter coding data comprising at least 2 subbands;
the low-frequency subband domain waveform decoding module is used for carrying out inverse quantization decoding on the low-frequency subband domain waveform coded data so as to obtain a low-frequency subband domain excitation spectrum;
the low-frequency sub-band domain frequency-time transformation module is used for carrying out frequency-time transformation on the low-frequency sub-band domain excitation spectrum so as to obtain a low-frequency sub-band domain excitation signal;
the low-frequency sub-band domain time-varying prediction synthesis module is used for performing prediction synthesis on the low-frequency sub-band domain excitation signal to acquire a low-frequency sub-band domain signal;
the high-frequency subband domain parameter decoding module is used for carrying out inverse quantization decoding on the high-frequency subband domain parameter coded data to obtain high-frequency subband domain parameters and recovering a high-frequency subband domain signal from the low-frequency subband domain excitation spectrum according to the high-frequency subband domain parameters; or restoring a high-frequency subband domain signal from the low-frequency subband domain excitation signal according to the high-frequency subband domain parameter;
and the synthesis subband filter bank module is used for carrying out subband synthesis on the low-frequency subband domain signal and the high-frequency subband domain signal so as to obtain a decoded single-channel sound signal.
8. The decoding apparatus as claimed in claim 7, wherein the bitstream demultiplexing module is further configured to obtain a low frequency subband domain signal type analysis result for restoring a monaural sound from the demultiplexed sound coding stream;
the low-frequency sub-band domain frequency-time transformation module is further used for dividing the received low-frequency sub-band domain excitation spectrum into one or more than one sub-frames for frequency-time transformation according to the low-frequency sub-band domain signal type analysis result;
and the low-frequency subband domain time-varying prediction synthesis module is further used for dividing the received low-frequency subband domain excitation signal into one or more than one sub-frames for prediction synthesis according to the analysis result of the low-frequency subband domain signal type.
9. The decoding apparatus of claim 7, wherein when the high frequency subband-domain parameter decoding module recovers a high frequency subband-domain signal from a low frequency subband-domain excitation spectrum according to high frequency subband-domain parameters, the high frequency subband-domain parameters include a high frequency LSF vector quantization index, a spectrum adjustment parameter, and a time-domain gain adjustment parameter; the high frequency subband domain parameter decoding module includes:
the spectrum parameter decoder is used for performing spectrum adjustment on the low-frequency sub-band domain excitation spectrum according to the spectrum adjustment parameters, and performing frequency-time transformation on the low-frequency sub-band domain excitation spectrum after spectrum adjustment to obtain a low-frequency sub-band domain excitation signal after spectrum adjustment;
the self-adaptive time domain gain decoder is used for carrying out time domain gain adjustment on the low-frequency sub-band domain excitation signal according to the time domain gain adjustment parameter to obtain a low-frequency sub-band domain excitation signal after the time domain gain adjustment;
and the time-varying prediction synthesizer is used for obtaining a linear prediction synthesis filter according to the high-frequency LSF vector quantization index, performing synthesis filtering on the low-frequency sub-band domain excitation signal after the time domain gain adjustment to obtain a reconstructed high-frequency sub-band domain signal, and outputting the reconstructed high-frequency sub-band domain signal to the synthesis sub-band filter bank module.
10. A method for decoding monophonic sounds, the method comprising:
A. demultiplexing the sound coding code stream to obtain low-frequency sub-band domain waveform coding data comprising at least 2 sub-bands and high-frequency sub-band domain parameter coding data comprising at least 2 sub-bands;
B. performing inverse quantization decoding on the low-frequency subband domain waveform coded data to acquire a low-frequency subband domain excitation spectrum; performing frequency-time transformation and prediction synthesis on the low-frequency sub-band domain excitation spectrum to obtain a low-frequency sub-band domain signal;
C. performing inverse quantization decoding on the high-frequency subband domain parameter coded data to acquire high-frequency subband domain parameters, and recovering a high-frequency subband domain signal from the low-frequency subband domain excitation spectrum according to the high-frequency subband domain parameters; or restoring a high-frequency subband domain signal from a low-frequency subband domain excitation signal obtained through the frequency-time transformation according to the high-frequency subband domain parameter;
D. performing subband synthesis on the low-frequency subband domain signal and the high-frequency subband domain signal, and outputting a decoded single-channel sound signal.
11. The decoding method according to claim 10, wherein the step a further comprises obtaining a low frequency subband-domain signal type analysis result for restoring a mono sound from the demultiplexed sound encoding code stream;
b, the frequency-time transformation and prediction synthesis of the low-frequency sub-band domain excitation spectrum comprises the following steps: dividing a frame of low-frequency subband domain excitation spectrum into one or more subframes for frequency-time conversion according to the analysis result of the low-frequency subband domain signal type; performing frequency-time conversion on the low-frequency sub-band domain excitation spectrum according to the sub-frames to obtain low-frequency sub-band domain excitation signals of the sub-frames, and then combining the sub-frames according to a dividing sequence to generate a frame of low-frequency sub-band domain excitation signals;
dividing the frame of low-frequency subband domain excitation signal into one or more than one sub-frames for predictive synthesis according to the analysis result of the low-frequency subband domain signal type; and performing prediction synthesis on the low-frequency sub-band domain signals according to the sub-frames to obtain the low-frequency sub-band domain signals of the sub-frames.
12. The decoding method according to claim 10, wherein when said step C restores the high frequency subband-domain signal from the low frequency subband-domain excitation spectrum according to the high frequency subband-domain parameters, the high frequency subband-domain parameters include a high frequency LSF vector quantization index, a spectrum adjustment parameter, and a time-domain gain adjustment parameter; the step C comprises the following steps:
c1, performing spectrum adjustment on the low-frequency sub-band domain excitation spectrum according to the spectrum adjustment parameters; performing frequency-time transformation on the adjusted low-frequency sub-band domain excitation spectrum to obtain a low-frequency sub-band domain excitation signal;
c2, performing time domain gain adjustment on the low-frequency sub-band domain excitation signal to obtain an adjusted low-frequency sub-band domain excitation signal;
and C3, performing synthesis filtering on the adjusted low-frequency sub-band domain excitation signal by using a linear prediction synthesis filter obtained according to the high-frequency LSF vector quantization index, to obtain a reconstructed high-frequency sub-band domain signal.
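A minimal sketch of steps C2 and C3, assuming the decoded parameters are a list of per-segment gains and direct-form LPC denominator coefficients a_1..a_p (the LSF-to-LPC conversion implied by the claim is omitted here):

```python
import numpy as np

def reconstruct_hf(excitation, gains, lpc):
    """Hypothetical sketch of steps C2-C3: per-segment time domain gain
    adjustment followed by 1/A(z) linear prediction synthesis filtering,
    with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and lpc = [a_1, ..., a_p]."""
    # C2: time domain gain adjustment, one gain per segment
    segments = np.array_split(excitation, len(gains))
    adjusted = np.concatenate([g * s for g, s in zip(gains, segments)])
    # C3: all-pole synthesis filter y[n] = x[n] - sum_j a_j * y[n-j]
    out = np.zeros(len(adjusted))
    for n in range(len(adjusted)):
        acc = adjusted[n]
        for j in range(1, len(lpc) + 1):
            if n - j >= 0:
                acc -= lpc[j - 1] * out[n - j]
        out[n] = acc
    return out
```

The inner loop is a direct transcription of the all-pole recursion; a production decoder would use a vectorized or filter-library implementation instead.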
13. A stereo encoding apparatus, characterized in that the encoding apparatus comprises:
the analysis subband filter bank module is used for performing subband decomposition on the left channel and the right channel of the stereo signal respectively to decompose the stereo signal into a low-frequency subband domain signal of the left channel and the right channel comprising at least 2 subbands and a high-frequency subband domain signal of the left channel and the right channel comprising at least 2 subbands;
the low-frequency sub-band domain time-varying prediction analysis module is used for respectively carrying out prediction analysis on the low-frequency sub-band domain signals of the left channel and the right channel so as to obtain low-frequency sub-band domain excitation signals of the left channel and the right channel;
the low-frequency sub-band domain time-frequency transformation module is used for respectively carrying out time-frequency transformation on the left and right sound channel low-frequency sub-band domain excitation signals so as to obtain low-frequency sub-band domain excitation spectrums of the left and right sound channels;
the low-frequency sub-band domain stereo coding module is used for carrying out stereo coding on the low-frequency sub-band domain excitation spectrums of the left and right sound channels so as to obtain low-frequency sub-band domain stereo coding data;
a high-frequency subband domain parameter coding module, configured to calculate, according to the low-frequency subband domain excitation spectrums of the left and right channels and the high-frequency subband domain signals of the left and right channels, high-frequency subband domain parameters of the left and right channels, which are used to recover the high-frequency subband domain signals of the left and right channels from the low-frequency subband domain excitation spectrums of the left and right channels, and perform quantization coding on the high-frequency subband domain parameters of the left and right channels, respectively, so as to obtain high-frequency subband domain parameter coded data of the left and right channels; or, respectively according to the low-frequency subband domain excitation signals of the left and right channels and the high-frequency subband domain signals of the left and right channels, calculating high-frequency subband domain parameters of the left and right channels for recovering the high-frequency subband domain signals of the left and right channels from the low-frequency subband domain excitation signals of the left and right channels, and respectively performing quantization coding on the high-frequency subband domain parameters of the left and right channels to obtain high-frequency subband domain parameter coding data of the left and right channels;
and the bit stream multiplexing module is used for multiplexing the low-frequency sub-band domain stereo coded data and the high-frequency sub-band domain parameter coded data of the left and right sound channels so as to output a stereo sound coded code stream.
14. The encoding apparatus as claimed in claim 13, wherein the encoding apparatus further comprises a low-frequency subband domain sum signal type analysis module, which is used for calculating a frame of low-frequency subband domain sum signal from the low-frequency subband domain signals of the left and right channels of the frame, performing signal type analysis on the low-frequency subband domain sum signal, and outputting a low-frequency subband domain sum signal type analysis result; if the low-frequency subband domain sum signal is a slowly varying signal, outputting the signal type; if it is a fast-changing signal, further acquiring the fast-change point position, and outputting the signal type and the fast-change point position;
the low-frequency subband domain time-varying prediction analysis module is further used for dividing a received frame of low-frequency subband domain signals of the left and right channels into one or more subframes for prediction analysis according to the low-frequency subband domain sum signal type analysis result;
the low-frequency subband domain time-frequency transformation module is further used for dividing the received left and right channel low-frequency subband domain excitation signals into one or more subframes for time-frequency transformation according to the low-frequency subband domain sum signal type analysis result;
and the bit stream multiplexing module is further used for multiplexing the low-frequency subband domain sum signal type analysis result.
15. The encoding apparatus as claimed in claim 13, wherein the high frequency subband-domain parameter encoding module comprises two identical high frequency subband-domain parameter encoding sub-modules for extracting the high frequency subband-domain parameters of the left and right channels, respectively; when the high frequency subband domain parameter encoding sub-module calculates the high frequency subband domain parameters of the left/right channel according to the low frequency subband domain excitation spectrum of the left/right channel and the high frequency subband domain signals of the left/right channel, the high frequency subband domain parameter encoding sub-module includes:
the time-varying prediction analyzer is used for carrying out linear prediction analysis on the received high-frequency sub-band domain signals of the left/right channels to obtain high-frequency LSF vector quantization indexes and high-frequency sub-band domain excitation signals of the linear prediction filter of the left/right channels; encoding the high-frequency LSF vector quantization index of the left/right channel and outputting the encoded high-frequency LSF vector quantization index to the bit stream multiplexing module;
the spectrum parameter encoder is used for performing time-frequency transformation on the high-frequency sub-band domain excitation signal of the left/right channel to obtain a high-frequency sub-band domain excitation spectrum of the left/right channel, performing spectrum adjustment on the received low-frequency sub-band domain excitation spectrum of the left/right channel according to the high-frequency sub-band domain excitation spectrum of the left/right channel, performing quantization coding on the extracted spectrum adjustment parameter of the left/right channel and outputting the quantized coded spectrum adjustment parameter to the bit stream multiplexing module, and performing frequency-time transformation on the low-frequency sub-band domain excitation spectrum of the left/right channel after spectrum adjustment to obtain a low-frequency sub-band domain excitation signal of the left/right channel after spectrum adjustment;
the time-varying prediction synthesizer is used for adopting a linear prediction synthesis filter of the left/right sound channel obtained according to the high-frequency LSF vector quantization index to carry out synthesis filtering on the low-frequency sub-band domain excitation signal of the left/right sound channel after the spectrum adjustment so as to obtain a reconstructed high-frequency sub-band domain signal of the left/right sound channel;
and the time domain adaptive gain adjustment parameter extractor is used for comparing the high-frequency sub-band domain signals of the left/right channels with the reconstructed high-frequency sub-band domain signals of the left/right channels to obtain time domain gain adjustment parameters of the left/right channels, and outputting the time domain gain adjustment parameters to the bit stream multiplexing module after quantization coding.
16. The encoding apparatus of claim 13, wherein the low-frequency subband domain stereo encoding module is further configured to select from among more than one selectable stereo encoding modes, encode using the selected stereo encoding mode, and output encoding mode selection information to the bit stream multiplexing module.
17. A stereo encoding method, characterized in that the method comprises:
A. performing subband decomposition on a left channel and a right channel of the stereo signal respectively to decompose the left channel and the right channel into a low-frequency subband domain signal of the left channel and the right channel comprising at least 2 subbands and a high-frequency subband domain signal of the left channel and the right channel comprising at least 2 subbands;
B. respectively carrying out predictive analysis and time-frequency transformation on the low-frequency sub-band domain signals of the left and right sound channels to obtain low-frequency sub-band domain excitation spectrums of the left and right sound channels; stereo coding is carried out on the low-frequency sub-band domain excitation spectrums of the left and right sound channels to obtain low-frequency sub-band domain stereo coding data;
C. calculating left and right channel high-frequency subband domain parameters for recovering the left and right channel high-frequency subband domain signals from the left and right channel low-frequency subband domain excitation spectrums according to the left and right channel high-frequency subband domain signals and the left and right channel low-frequency subband domain excitation spectrums, and respectively carrying out quantization coding on the left and right channel high-frequency subband domain parameters to obtain left and right channel high-frequency subband domain parameter coding data; or, respectively calculating left and right channel high frequency subband domain parameters for recovering the left and right channel high frequency subband domain signals from the left and right channel low frequency subband domain excitation signals according to the left and right channel high frequency subband domain signals and the left and right channel low frequency subband domain excitation signals obtained through the predictive analysis, and respectively performing quantization coding on the left and right channel high frequency subband domain parameters to obtain left and right channel high frequency subband domain parameter coding data;
D. multiplexing the low-frequency sub-band domain stereo coded data and the high-frequency sub-band domain parameter coded data of the left and right channels, and outputting a stereo sound coded code stream.
18. The encoding method of claim 17, wherein the method further comprises: calculating a frame of low-frequency subband domain sum signal from the low-frequency subband domain signals of the left and right channels of the frame obtained in step A, performing signal type analysis on the low-frequency subband domain sum signal, and determining a low-frequency subband domain sum signal type analysis result; if the low-frequency subband domain sum signal is a slowly varying signal, taking the signal type as the low-frequency subband domain sum signal type analysis result; if it is a fast-changing signal, further acquiring the fast-change point position, and taking the signal type and the fast-change point position as the low-frequency subband domain sum signal type analysis result;
step B, the performing predictive analysis and time-frequency transformation on the low-frequency subband domain signals of the left/right channel comprises: dividing a frame of left/right channel low-frequency subband domain signals into one or more subframes for predictive analysis according to the low-frequency subband domain sum signal type analysis result; performing predictive analysis on the left/right channel low-frequency subband domain signals by subframe to obtain left/right channel low-frequency subband domain excitation signals of the subframes, and then combining the subframes in the dividing order to generate a frame of left/right channel low-frequency subband domain excitation signals;
dividing the frame of left/right channel low-frequency subband domain excitation signal into one or more subframes for time-frequency transformation according to the low-frequency subband domain sum signal type analysis result; performing time-frequency transformation on the left/right channel low-frequency subband domain excitation signal by subframe to obtain a left/right channel low-frequency subband domain excitation spectrum of the subframes;
step D further comprises multiplexing the low-frequency subband domain sum signal type analysis result.
19. The encoding method according to claim 17, wherein when said step C calculates the high frequency subband domain parameters of the left/right channel from the high frequency subband domain signals of the left/right channel and the low frequency subband domain excitation spectra of the left/right channel, said step C comprises:
c1, obtaining a high-frequency sub-band domain excitation signal of the left/right channel according to the high-frequency sub-band domain signal of the left/right channel, and performing time-frequency transformation on the high-frequency sub-band domain excitation signal of the left/right channel to obtain a high-frequency sub-band domain excitation spectrum of the left/right channel; performing spectrum adjustment on the low-frequency sub-band domain excitation spectrum of the left/right channel according to the high-frequency sub-band domain excitation spectrum of the left/right channel to obtain a spectrum adjustment parameter of the left/right channel and the low-frequency sub-band domain excitation spectrum of the left/right channel after the spectrum adjustment; performing frequency-time conversion on the low-frequency sub-band domain excitation spectrum of the left/right channel after spectrum adjustment to obtain a low-frequency sub-band domain excitation signal of the left/right channel after spectrum adjustment;
C2, performing synthesis filtering on the spectrum-adjusted low-frequency sub-band domain excitation signal of the left/right channel to obtain a reconstructed high-frequency sub-band domain signal of the left/right channel, and extracting a time domain gain adjustment parameter of the left/right channel according to the time domain gain of the reconstructed high-frequency sub-band domain signal of the left/right channel and the time domain gain of the high-frequency sub-band domain signal of the left/right channel; and obtaining the high-frequency sub-band domain parameters of the left/right channel, which include the spectrum adjustment parameter of the left/right channel obtained in step C1 and the time domain gain adjustment parameter of the left/right channel.
20. The encoding method of claim 17, wherein the stereo encoding of the low frequency subband domain excitation spectra of the left and right channels of step B comprises:
b2, dividing the low-frequency sub-band excitation spectrum of the left channel and the right channel into a plurality of sub-bands, and selecting a stereo coding mode for each sub-band to carry out stereo coding;
step D further comprises multiplexing the coding mode selection information.
21. The encoding method of claim 20, wherein the step B2 of selecting a stereo coding mode for the subband k comprises:
b21, respectively adopting more than one selectable stereo coding modes to carry out stereo coding and decoding on the low-frequency subband domain excitation spectrums of the left and right channels of the subband k, calculating the error between the low-frequency subband domain excitation spectrums of the left and right channels of the decoded and restored subband k and the low-frequency subband domain excitation spectrums of the left and right channels of the subband k before coding, and selecting the coding mode with the minimum error as the stereo coding mode of the subband k by comparison; or
B22, in the low-frequency subband domain, adopting a fixed stereo coding mode for subbands with frequencies higher than a determined value, and performing step B21 to select a stereo coding mode for subbands with frequencies below the determined value.
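The closed-loop selection of step B21 can be sketched as follows; the `codecs` mapping of mode names to encode/decode callables is a hypothetical interface, not part of the claim:

```python
import numpy as np

def select_stereo_mode(left_spec, right_spec, codecs):
    """Hypothetical sketch of step B21: encode and decode one subband's
    left/right excitation spectra with each candidate stereo mode and keep
    the mode whose decoded spectra have the smallest squared error."""
    best_mode, best_err = None, np.inf
    for mode, (encode, decode) in codecs.items():
        dec_l, dec_r = decode(encode(left_spec, right_spec))
        err = (np.sum(np.abs(left_spec - dec_l) ** 2)
               + np.sum(np.abs(right_spec - dec_r) ** 2))
        if err < best_err:
            best_mode, best_err = mode, err
    return best_mode
```

Step B22 would simply bypass this search above a chosen crossover frequency and use one fixed mode there.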
22. The encoding method of claim 21, wherein when the stereo encoding mode is a parametric stereo encoding mode, the performing stereo encoding comprises:
B31, in the subband k, calculating a weighting parameter g_r(k) for the first channel low-frequency subband domain excitation spectrum of the left and right channels, such that after scaling by g_r(k) the energy of the first channel low-frequency subband domain excitation spectrum equals that of the second channel low-frequency subband domain excitation spectrum;
B32, calculating a weighted sum excitation spectrum and a weighted difference excitation spectrum from the second channel low-frequency subband domain excitation spectrum and the scaled first channel low-frequency subband domain excitation spectrum;
B33, calculating an orthogonal excitation spectrum that is equal in amplitude to, and perpendicular to, the weighted sum excitation spectrum;
B34, calculating a weighting parameter g_d(k) from the orthogonal excitation spectrum and the weighted difference excitation spectrum, such that the energy of the weighted difference excitation spectrum equals that of the orthogonal excitation spectrum scaled by g_d(k);
B35, performing quantization coding on the weighted sum excitation spectrum, g_r(k), and g_d(k) respectively, using the quantization-coded weighted sum excitation spectrum as the low-frequency subband domain stereo coded data, and the quantization-coded g_r(k) and g_d(k) as parameters for stereo decoding;
step D further comprises multiplexing the weighting parameters g_r(k) and g_d(k);
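Steps B31 through B34 can be sketched as below for one subband of complex-valued excitation spectra. The 0.5 weighting of the sum/difference spectra and the 90-degree phase rotation used to form the orthogonal spectrum are one plausible reading of the claim, not a verbatim reproduction of the patented method:

```python
import numpy as np

def parametric_stereo_params(spec1, spec2):
    """Hypothetical sketch of steps B31-B34 for one subband k, with
    spec1/spec2 the first/second channel complex excitation spectra."""
    eps = 1e-12
    # B31: scale the first channel so its energy matches the second channel
    g_r = np.sqrt((np.sum(np.abs(spec2) ** 2) + eps)
                  / (np.sum(np.abs(spec1) ** 2) + eps))
    scaled1 = g_r * spec1
    # B32: weighted sum and weighted difference excitation spectra
    s = 0.5 * (spec2 + scaled1)
    d = 0.5 * (spec2 - scaled1)
    # B33: equal-amplitude spectrum orthogonal to the weighted sum
    # (a 90-degree phase rotation keeps the per-bin magnitude)
    orth = 1j * s
    # B34: g_d matches the difference energy to the scaled orthogonal energy
    g_d = np.sqrt((np.sum(np.abs(d) ** 2) + eps)
                  / (np.sum(np.abs(orth) ** 2) + eps))
    return g_r, g_d, s
```

Per step B35, only the weighted sum spectrum would be waveform-coded, with g_r(k) and g_d(k) quantized as side parameters for the decoder.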
or, when the stereo encoding mode is the parametric stereo encoding mode, the performing stereo encoding includes:
B41, in the subband k, calculating a weighting parameter g_d(k) according to the following formula:
wherein a(k) = Σ_{i ∈ band(k)} ( x_r[i,k]·y_l[i,k] − x_l[i,k]·y_r[i,k] );
x_l and y_l are respectively the real part and the imaginary part of the left channel low-frequency subband domain excitation spectrum; x_r and y_r are respectively the real part and the imaginary part of the right channel low-frequency subband domain excitation spectrum;
B42, calculating a weighting parameter g_r(k) according to the following formula:
wherein c(k) = Σ_{i ∈ band(k)} ( x_l[i,k]·x_l[i,k] + y_l[i,k]·y_l[i,k] );
g(k) is an importance factor for allocating the parametric stereo coding error between the left and right channels;
B43, calculating a weighted sum excitation spectrum according to the following formula,
wherein x_m and y_m represent the real part and the imaginary part of the weighted sum excitation spectrum, respectively;
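The intermediate quantities of steps B41 and B42 can be written out directly from the claim's summations; since the closed-form expressions for g_d(k), g_r(k), and the weighted sum spectrum were not reproduced in this text, only a(k) and c(k) are shown:

```python
import numpy as np

def correlation_terms(spec_l, spec_r):
    """Sketch of the subband-k summations in steps B41-B42: a(k) is the
    left/right cross term and c(k) the left-channel energy, with spec_l
    and spec_r the complex excitation spectra restricted to band(k)."""
    xl, yl = spec_l.real, spec_l.imag
    xr, yr = spec_r.real, spec_r.imag
    a = np.sum(xr * yl - xl * yr)   # a(k) = sum( x_r*y_l - x_l*y_r )
    c = np.sum(xl * xl + yl * yl)   # c(k) = sum( x_l^2 + y_l^2 )
    return a, c
```

Note that a(k) is proportional to the imaginary part of the cross-spectrum between the two channels, and c(k) to the left channel's subband energy.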
step D further comprises multiplexing the weighting parameters g_r(k) and g_d(k);
when the stereo encoding mode is a parametric error stereo encoding mode, the performing stereo encoding includes:
b51, calculating a weighting parameter g for the first channel low-frequency sub-band excitation spectrum in the left and right channels in the sub-band kr(k) Making the second channel low-frequency subband domain excitation spectrum sum adopt gr(k) After the scaling, the excitation spectrum energy of the low-frequency sub-band domain of the first sound channel is equal;
B52, calculating a weighted sum excitation spectrum and a weighted difference excitation spectrum from the second channel low-frequency subband domain excitation spectrum and the scaled first channel low-frequency subband domain excitation spectrum;
B53, calculating an orthogonal excitation spectrum that has the same amplitude as, and is perpendicular to, the weighted sum excitation spectrum;
B54, calculating a weighting parameter g_d(k) from the orthogonal excitation spectrum and the weighted difference excitation spectrum, such that the energy of the weighted difference excitation spectrum equals that of the orthogonal excitation spectrum scaled by g_d(k);
B55, calculating the error excitation spectrum between the weighted difference excitation spectrum and the scaled orthogonal excitation spectrum;
B56, performing quantization coding on the weighted sum excitation spectrum, the error excitation spectrum, g_r(k) and g_d(k) respectively, taking the quantization-coded weighted sum excitation spectrum and error excitation spectrum as the low-frequency subband domain stereo coded data, and the quantization-coded g_r(k) and g_d(k) as parameters for stereo decoding;
step D further comprises multiplexing said weighting parameters g_r(k) and g_d(k).
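Steps B51-B56 can be illustrated on complex-valued excitation spectra (real part plus j times imaginary part). This is a sketch under stated assumptions: the sum/difference halving and the use of a 90-degree complex rotation for the same-amplitude perpendicular spectrum are interpretations rather than the claim's wording, and the quantization coding of B56 is omitted.

```python
import numpy as np

def encode_param_error(ch1, ch2):
    """Sketch of steps B51-B56 for one subband."""
    # B51: g_r scales the first channel to the energy of the second.
    gr = np.sqrt(np.sum(np.abs(ch2) ** 2) / np.sum(np.abs(ch1) ** 2))
    s1 = gr * ch1
    # B52: weighted sum and weighted difference excitation spectra
    # (the halving is an assumed normalisation).
    w_sum = (ch2 + s1) / 2
    w_diff = (ch2 - s1) / 2
    # B53: a spectrum with the same amplitude as, and perpendicular
    # to, the weighted sum: multiplying by 1j rotates each bin 90 degrees.
    ortho = 1j * w_sum
    # B54: g_d scales the orthogonal spectrum to the energy of w_diff.
    gd = np.sqrt(np.sum(np.abs(w_diff) ** 2) / np.sum(np.abs(ortho) ** 2))
    # B55: error between the weighted difference and the scaled
    # orthogonal excitation spectrum.
    err = w_diff - gd * ortho
    # B56 would quantization-code w_sum, err, gr and gd.
    return w_sum, err, gr, gd

# Hypothetical two-bin complex excitation spectra.
ch1 = np.array([1 + 0j, 0 + 1j])
ch2 = np.array([2 + 0j, 0 + 0j])
w_sum, err, gr, gd = encode_param_error(ch1, ch2)
```

Under these conventions the decoder can recover both channels exactly from w_sum, err, g_r and g_d, which is the point of transmitting the error spectrum in this mode.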
23. A stereo decoding apparatus, characterized in that the decoding apparatus comprises:
the bit stream demultiplexing module is used for demultiplexing the stereo sound coding stream to obtain low-frequency subband domain stereo coding data comprising at least 2 subbands and high-frequency subband domain parameter coding data comprising left and right channels of at least 2 subbands;
the low-frequency sub-band domain stereo decoding module is used for carrying out stereo decoding on the low-frequency sub-band domain stereo coded data so as to obtain low-frequency sub-band domain excitation spectrums of a left channel and a right channel;
the low-frequency sub-band domain frequency-time transformation module is used for respectively carrying out frequency-time transformation on the low-frequency sub-band domain excitation spectrums of the left and right channels so as to obtain low-frequency sub-band domain excitation signals of the left and right channels;
the low-frequency sub-band domain time-varying prediction synthesis module is used for respectively carrying out prediction synthesis on the low-frequency sub-band domain excitation signals of the left sound channel and the right sound channel so as to obtain low-frequency sub-band domain signals of the left sound channel and the right sound channel;
a high-frequency subband domain parameter decoding module, configured to perform inverse quantization decoding on the high-frequency subband domain parameter encoded data of the left and right channels to obtain high-frequency subband domain parameters of the left and right channels, and recover high-frequency subband domain signals of the left and right channels from low-frequency subband domain excitation spectrums of the left and right channels according to the high-frequency subband domain parameters of the left and right channels, respectively; or restoring high-frequency subband domain signals of the left channel and the right channel from the low-frequency subband domain excitation signals of the left channel and the right channel according to the high-frequency subband domain parameters of the left channel and the right channel respectively;
and the synthesis subband filter bank module is used for carrying out subband synthesis on the low-frequency subband domain signals of the left and right channels and the high-frequency subband domain signals of the left and right channels so as to obtain the decoded stereo signals of the left and right channels.
24. The decoding apparatus as claimed in claim 23, wherein the bitstream demultiplexing module is further configured to obtain, from the demultiplexed stereo sound coding stream, a low-frequency subband domain and signal type analysis result for restoring the stereo sound;
the low-frequency sub-band domain frequency-time conversion module is further used for dividing the received low-frequency sub-band domain excitation spectrums of the left and right sound channels into one or more than one sub-frames for frequency-time conversion according to the low-frequency sub-band domain and the signal type analysis result;
and the low-frequency subband domain time-varying prediction synthesis module is further used for dividing the received low-frequency subband domain excitation signals of the left and right channels into one or more than one sub-frames for prediction synthesis according to the low-frequency subband domain and the signal type analysis result.
25. The decoding apparatus as claimed in claim 23, wherein the high-frequency subband domain parameter decoding module includes two identical high-frequency subband domain parameter decoding sub-modules for restoring the high-frequency subband domain signals of the left and right channels, respectively; when the high-frequency subband domain parameter decoding module recovers high-frequency subband domain signals from the low-frequency subband domain excitation spectrums of the left and right channels according to the high-frequency subband domain parameters of the left and right channels, the high-frequency subband domain parameters of the left/right channel comprise a high-frequency LSF vector quantization index, spectrum adjustment parameters and time domain gain adjustment parameters; the high-frequency subband domain parameter decoding sub-module comprises:
the spectrum parameter decoder is used for performing spectrum adjustment on the low-frequency sub-band domain excitation spectrum of the left/right channel according to the spectrum adjustment parameter of the left/right channel, performing frequency-time transformation on the low-frequency sub-band domain excitation spectrum of the left/right channel after spectrum adjustment, and obtaining a low-frequency sub-band domain excitation signal of the left/right channel after spectrum adjustment;
the adaptive time domain gain decoder is used for carrying out time domain gain adjustment on the low-frequency sub-band domain excitation signals of the left/right sound channel according to the time domain gain adjustment parameters of the left/right sound channel to obtain the low-frequency sub-band domain excitation signals of the left/right sound channel after the time domain gain adjustment;
and the time-varying prediction synthesizer is used for obtaining a linear prediction synthesis filter of the left/right channel according to the high-frequency LSF vector quantization index of the left/right channel, performing synthesis filtering on the time-domain-gain-adjusted low-frequency sub-band domain excitation signal of the left/right channel to obtain a reconstructed high-frequency sub-band domain signal of the left/right channel, and outputting the reconstructed high-frequency sub-band domain signal to the synthesis sub-band filter bank module.
26. The decoding apparatus as claimed in claim 23, wherein the bitstream demultiplexing module is further configured to obtain coding mode selection information for stereo decoding from the demultiplexed stereo sound coding stream;
the low-frequency subband domain stereo decoding module is further configured to perform stereo decoding using a stereo decoding mode corresponding to the encoding mode selection information.
27. A stereo decoding method, characterized in that the method comprises:
A. demultiplexing a stereo sound coding code stream to obtain low-frequency sub-band domain stereo coding data comprising at least 2 sub-bands and high-frequency sub-band domain parameter coding data comprising left and right channels of at least 2 sub-bands;
B. performing stereo decoding on the low-frequency sub-band domain stereo coded data to obtain low-frequency sub-band domain excitation spectrums of a left sound channel and a right sound channel; respectively carrying out frequency-time transformation and prediction synthesis on the low-frequency subband domain excitation spectrums of the left and right sound channels to obtain low-frequency subband domain signals of the left and right sound channels;
C. carrying out inverse quantization decoding on the high-frequency subband domain parameter coded data of the left and right channels to obtain high-frequency subband domain parameters of the left and right channels, and respectively restoring high-frequency subband domain signals of the left and right channels from low-frequency subband domain excitation spectrums of the left and right channels according to the high-frequency subband domain parameters of the left and right channels; or restoring the high-frequency subband domain signals of the left and right channels from the low-frequency subband domain excitation signals of the left and right channels obtained through frequency-time conversion according to the high-frequency subband domain parameters of the left and right channels respectively;
D. and performing subband synthesis on the low-frequency subband domain signals of the left and right channels and the high-frequency subband domain signals of the left and right channels to obtain decoded stereo signals of the left and right channels.
28. The decoding method of claim 27, wherein the step a further comprises obtaining a low frequency subband domain and a signal type analysis result for restoring stereo sound from the demultiplexed sound encoding code stream;
b, the frequency-time transformation and the prediction synthesis of the low-frequency subband domain excitation spectrum of the left/right sound channel comprise the following steps: dividing a frame of the left/right channel low-frequency subband domain excitation spectrum into one or more subframes for frequency-time conversion according to the low-frequency subband domain and the signal type analysis result; performing frequency-time conversion on the left/right channel low-frequency sub-band domain excitation spectrum according to the sub-frames to obtain left/right channel low-frequency sub-band domain excitation signals of the sub-frames, and then combining the sub-frames according to the dividing sequence to generate a frame of left/right channel low-frequency sub-band domain excitation signals;
dividing the left/right channel low-frequency subband domain excitation signal into one or more than one sub-frames for prediction synthesis according to the low-frequency subband domain and signal type analysis result; and carrying out prediction synthesis on the left/right channel low-frequency sub-band domain signals according to the sub-frames to obtain the left/right channel low-frequency sub-band domain signals of the sub-frames.
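The per-subframe frequency-time transform described above can be sketched as follows. The inverse real FFT stands in for the codec's unspecified transform, and equal-length subframes are an assumption; the number of subframes would come from the signal type analysis result (e.g. more subframes for transients).

```python
import numpy as np

def subband_freq_to_time(exc_spec, n_subframes):
    """Split a frame's excitation spectrum into n_subframes equal
    parts, transform each part independently, and concatenate the
    resulting subframe excitation signals in the original order."""
    parts = np.split(exc_spec, n_subframes)
    return np.concatenate([np.fft.irfft(p) for p in parts])

# A length-5 rfft spectrum of 8 samples serves as one subframe.
spec = np.fft.rfft(np.arange(8.0))
frame = subband_freq_to_time(spec, n_subframes=1)           # stationary case
frame2 = subband_freq_to_time(np.concatenate([spec, spec]),
                              n_subframes=2)                # transient case
```

With one subframe the transform simply inverts the whole frame; with two, each half is inverted independently and the halves are rejoined in dividing order.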
29. The decoding method of claim 27, wherein when the step C restores the high frequency subband-domain signal of the left/right channel from the low frequency subband-domain excitation spectrum of the left/right channel according to the high frequency subband-domain parameters of the left/right channel, the high frequency subband-domain parameters of the left/right channel include a high frequency LSF vector quantization index, a spectrum adjustment parameter, and a time-domain gain adjustment parameter; the step C comprises the following steps:
c1, performing spectrum adjustment on the low-frequency sub-band domain excitation spectrum of the left/right channel according to the spectrum adjustment parameters of the left/right channel; performing frequency-time conversion on the adjusted low-frequency sub-band domain excitation spectrum of the left/right sound channel to obtain a low-frequency sub-band domain excitation signal of the left/right sound channel;
c2, performing time domain gain adjustment on the low-frequency sub-band domain excitation signal of the left/right channel according to the time domain gain adjustment parameter of the left/right channel to obtain an adjusted low-frequency sub-band domain excitation signal of the left/right channel;
and C3, performing synthesis filtering on the adjusted low-frequency sub-band domain excitation signals of the left/right channels by using a linear prediction synthesis filter obtained according to the high-frequency LSF vector quantization index of the left/right channels, to obtain the reconstructed high-frequency sub-band domain signals of the left/right channels.
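Steps C1-C3 can be sketched with assumed parameter shapes: one spectral gain per bin, one time-domain gain per sample, an inverse real FFT in place of the codec's frequency-time transform, and a direct all-pole recursion for the linear prediction synthesis filter 1/A(z).

```python
import numpy as np

def reconstruct_high_band(low_exc_spec, spec_gains, time_gains, lpc_a):
    # C1: spectral adjustment, then frequency-time transformation.
    adjusted = low_exc_spec * spec_gains
    exc = np.fft.irfft(adjusted)
    # C2: time domain gain adjustment.
    exc = exc * time_gains
    # C3: linear prediction synthesis filtering with the all-pole
    # recursion y[n] = x[n] - sum_k a[k] * y[n-k].
    y = np.zeros_like(exc)
    for n in range(len(exc)):
        acc = exc[n]
        for k, ak in enumerate(lpc_a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y[n] = acc
    return y

# Hypothetical 3-bin spectrum of a 4-sample impulse, unit gains,
# and a first-order synthesis filter A(z) = 1 + 0.5 z^-1.
spec = np.fft.rfft(np.array([1.0, 0.0, 0.0, 0.0]))
y = reconstruct_high_band(spec, np.ones(3), np.ones(4), np.array([0.5]))
```

In a real decoder the filter coefficients would be derived from the dequantized high-frequency LSF vector rather than supplied directly.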
30. The decoding method of claim 27, wherein the step a further comprises obtaining coding mode selection information for stereo decoding from the demultiplexed sound encoded code stream;
step B, performing stereo decoding as follows: and decoding the low-frequency subband domain stereo coded data of each subband k by adopting the stereo decoding mode corresponding to the coding mode selection information.
31. The decoding method of claim 30, wherein when the stereo decoding mode is a parametric stereo decoding mode, the step A further comprises obtaining, from the demultiplexed sound coding code stream, quantization-coded weighting parameters g_r(k) and g_d(k) for stereo decoding, and the low-frequency sub-band domain stereo coded data is a quantization-coded weighted sum excitation spectrum; the performing stereo decoding includes:
B11, performing inverse quantization decoding on the quantization-coded weighted sum excitation spectrum and the quantization-coded weighting parameters g_r(k) and g_d(k), to obtain the weighted sum excitation spectrum and the weighting parameters g_r(k) and g_d(k);
B12, calculating an orthogonal excitation spectrum that has the same amplitude as, and is perpendicular to, the weighted sum excitation spectrum;
B13, obtaining a scaled orthogonal excitation spectrum according to g_d(k);
B14, calculating the second channel low-frequency sub-band domain excitation spectrum and the scaled first channel low-frequency sub-band domain excitation spectrum of the left and right channels according to the weighted sum excitation spectrum and the scaled orthogonal excitation spectrum;
B15, restoring the first channel low-frequency sub-band domain excitation spectrum according to g_r(k);
when the stereo decoding mode is a parameter error stereo decoding mode, the step A further comprises obtaining, from the demultiplexed sound coding code stream, quantization-coded weighting parameters g_r(k) and g_d(k) for stereo decoding, and the low-frequency sub-band domain stereo coded data is a quantization-coded weighted sum excitation spectrum and a quantization-coded error excitation spectrum; the performing stereo decoding includes:
B21, performing inverse quantization decoding on the quantization-coded weighted sum excitation spectrum, the quantization-coded error excitation spectrum and the quantization-coded weighting parameters g_r(k) and g_d(k), to obtain the weighted sum excitation spectrum, the error excitation spectrum and the weighting parameters g_r(k) and g_d(k);
B22, calculating an orthogonal excitation spectrum that has the same amplitude as, and is perpendicular to, the weighted sum excitation spectrum;
B23, obtaining a scaled orthogonal excitation spectrum according to g_d(k);
B24, adding the scaled orthogonal excitation spectrum and the error excitation spectrum to obtain a restored weighted difference excitation spectrum;
B25, calculating the second channel low-frequency sub-band domain excitation spectrum and the scaled first channel low-frequency sub-band domain excitation spectrum of the left and right channels according to the weighted sum excitation spectrum and the weighted difference excitation spectrum;
B26, restoring the first channel low-frequency sub-band domain excitation spectrum according to g_r(k).
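Steps B21-B26 can be illustrated as the inverse of the parameter-error encoding. The sum/difference convention (sum = (ch2 + scaled ch1)/2, difference = (ch2 - scaled ch1)/2) and the 90-degree rotation by 1j used for the same-amplitude orthogonal spectrum are assumptions, and the inverse quantization of B21 is omitted.

```python
import numpy as np

def decode_param_error(w_sum, err, gr, gd):
    """Sketch of steps B22-B26 for one subband."""
    # B22-B23: same-amplitude perpendicular spectrum, scaled by g_d.
    ortho = gd * (1j * w_sum)
    # B24: restored weighted difference = scaled orthogonal + error.
    w_diff = ortho + err
    # B25: second channel and scaled first channel from sum/difference.
    ch2 = w_sum + w_diff
    s1 = w_sum - w_diff
    # B26: undo the g_r scaling to restore the first channel.
    ch1 = s1 / gr
    return ch1, ch2

# Hypothetical decoded quantities for a single bin.
w_sum = np.array([1 + 0j])
err = np.array([0 + 0j])
ch1, ch2 = decode_param_error(w_sum, err, gr=1.0, gd=0.5)
```

With a zero error spectrum this reduces to the parametric mode of steps B11-B15, where the weighted difference is approximated entirely by the scaled orthogonal spectrum.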
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2007100888785A CN101276587B (en) | 2007-03-27 | 2007-03-27 | Audio encoding apparatus and method thereof, audio decoding device and method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101276587A true CN101276587A (en) | 2008-10-01 |
CN101276587B CN101276587B (en) | 2012-02-01 |
Family
ID=39995948
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2007100888785A Active CN101276587B (en) | 2007-03-27 | 2007-03-27 | Audio encoding apparatus and method thereof, audio decoding device and method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101276587B (en) |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101436407B (en) * | 2008-12-22 | 2011-08-24 | 西安电子科技大学 | Audio codec method |
CN102177545B (en) * | 2009-04-09 | 2013-03-27 | 弗兰霍菲尔运输应用研究公司 | Apparatus and method for generating a synthesis audio signal and for encoding an audio signal |
CN102177545A (en) * | 2009-04-09 | 2011-09-07 | 弗兰霍菲尔运输应用研究公司 | Apparatus and method for generating a synthesis audio signal and for encoding an audio signal |
CN102006483B (en) * | 2009-09-03 | 2013-05-01 | 中兴通讯股份有限公司 | Video coding and decoding method and device |
CN103119648A (en) * | 2010-09-22 | 2013-05-22 | 杜比实验室特许公司 | Efficient implementation of phase-shift filtering for decorrelation and other applications in audio coding systems |
CN103119648B (en) * | 2010-09-22 | 2015-06-17 | 杜比实验室特许公司 | Efficient implementation of phase shift filtering for decorrelation and other applications in an audio coding system |
CN102737636A (en) * | 2011-04-13 | 2012-10-17 | 华为技术有限公司 | Audio coding method and device thereof |
CN102737636B (en) * | 2011-04-13 | 2014-06-04 | 华为技术有限公司 | Audio coding method and device thereof |
WO2012139401A1 (en) * | 2011-04-13 | 2012-10-18 | 华为技术有限公司 | Audio coding method and device |
US11665482B2 (en) | 2011-12-23 | 2023-05-30 | Shenzhen Shokz Co., Ltd. | Bone conduction speaker and compound vibration device thereof |
CN103854650A (en) * | 2012-11-30 | 2014-06-11 | 中兴通讯股份有限公司 | Stereo audio coding method and device |
CN104078048B (en) * | 2013-03-29 | 2017-05-03 | 北京天籁传音数字技术有限公司 | Acoustic decoding device and method thereof |
CN104078048A (en) * | 2013-03-29 | 2014-10-01 | 北京天籁传音数字技术有限公司 | Acoustic decoding device and method thereof |
CN104103276A (en) * | 2013-04-12 | 2014-10-15 | 北京天籁传音数字技术有限公司 | Sound coding device, sound decoding device, sound coding method and sound decoding method |
CN104103276B (en) * | 2013-04-12 | 2017-04-12 | 北京天籁传音数字技术有限公司 | Sound coding device, sound decoding device, sound coding method and sound decoding method |
CN105164749A (en) * | 2013-04-30 | 2015-12-16 | 杜比实验室特许公司 | Hybrid encoding of multichannel audio |
CN105164749B (en) * | 2013-04-30 | 2019-02-12 | 杜比实验室特许公司 | Hybrid encoding of multi-channel audio |
CN107134280A (en) * | 2013-09-12 | 2017-09-05 | 杜比国际公司 | The coding of multichannel audio content |
US11776552B2 (en) | 2013-09-12 | 2023-10-03 | Dolby International Ab | Methods and apparatus for decoding encoded audio signal(s) |
CN107134280B (en) * | 2013-09-12 | 2020-10-23 | 杜比国际公司 | Encoding of multi-channel audio content |
US11410665B2 (en) | 2013-09-12 | 2022-08-09 | Dolby International Ab | Methods and apparatus for decoding encoded audio signal(s) |
CN103714822A (en) * | 2013-12-27 | 2014-04-09 | 广州华多网络科技有限公司 | Sub-band coding and decoding method and device based on SILK coder decoder |
CN107221334A (en) * | 2016-11-01 | 2017-09-29 | 武汉大学深圳研究院 | The method and expanding unit of a kind of audio bandwidth expansion |
CN113990332A (en) * | 2018-01-26 | 2022-01-28 | 杜比国际公司 | Retrospective compatible integration of high frequency reconstruction techniques for audio signals |
CN109036457A (en) * | 2018-09-10 | 2018-12-18 | 广州酷狗计算机科技有限公司 | Restore the method and apparatus of audio signal |
CN109036457B (en) * | 2018-09-10 | 2021-10-08 | 广州酷狗计算机科技有限公司 | Method and apparatus for restoring audio signal |
WO2020051786A1 (en) * | 2018-09-12 | 2020-03-19 | Shenzhen Voxtech Co., Ltd. | Signal processing device having multiple acoustic-electric transducers |
US11373671B2 (en) | 2018-09-12 | 2022-06-28 | Shenzhen Shokz Co., Ltd. | Signal processing device having multiple acoustic-electric transducers |
US11875815B2 (en) | 2018-09-12 | 2024-01-16 | Shenzhen Shokz Co., Ltd. | Signal processing device having multiple acoustic-electric transducers |
CN113646836A (en) * | 2019-03-27 | 2021-11-12 | 诺基亚技术有限公司 | Sound field dependent rendering |
US12058511B2 (en) | 2019-03-27 | 2024-08-06 | Nokia Technologies Oy | Sound field related rendering |
CN110941415A (en) * | 2019-11-08 | 2020-03-31 | 北京达佳互联信息技术有限公司 | An audio file processing method, device, electronic device and storage medium |
CN110941415B (en) * | 2019-11-08 | 2023-11-28 | 北京达佳互联信息技术有限公司 | Audio file processing method and device, electronic equipment and storage medium |
WO2021179788A1 (en) * | 2020-03-11 | 2021-09-16 | 腾讯科技(深圳)有限公司 | Speech signal encoding and decoding methods, apparatuses and electronic device, and storage medium |
CN111862994A (en) * | 2020-05-30 | 2020-10-30 | 北京声连网信息科技有限公司 | A method and device for decoding a sound wave signal |
WO2022012554A1 (en) * | 2020-07-17 | 2022-01-20 | 华为技术有限公司 | Multi-channel audio signal encoding method and apparatus |
WO2022012628A1 (en) * | 2020-07-17 | 2022-01-20 | 华为技术有限公司 | Multi-channel audio signal encoding/decoding method and device |
CN112365897A (en) * | 2020-11-26 | 2021-02-12 | 北京百瑞互联技术有限公司 | Method, device and medium for self-adaptively adjusting interframe transmission code rate of LC3 encoder |
CN112365897B (en) * | 2020-11-26 | 2024-07-09 | 北京百瑞互联技术股份有限公司 | Method, device and medium for adaptively adjusting inter-frame transmission code rate of LC3 encoder |
Also Published As
Publication number | Publication date |
---|---|
CN101276587B (en) | 2012-02-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101276587B (en) | Audio encoding apparatus and method thereof, audio decoding device and method thereof | |
JP7483792B2 (en) | Decoding device and method for decoding an encoded audio signal | |
JP4950210B2 (en) | Audio compression | |
KR101589942B1 (en) | Cross product enhanced harmonic transposition | |
JP6980871B2 (en) | Signal coding method and its device, and signal decoding method and its device | |
CN101086845B (en) | Sound coding device and method and sound decoding device and method | |
CN103366749B (en) | A kind of sound codec devices and methods therefor | |
CN103366750B (en) | A kind of sound codec devices and methods therefor | |
WO2005096274A1 (en) | An enhanced audio encoding/decoding device and method | |
JP5629319B2 (en) | Apparatus and method for efficiently encoding quantization parameter of spectral coefficient coding | |
CN103366751B (en) | A kind of sound codec devices and methods therefor | |
WO2009125588A1 (en) | Encoding device and encoding method | |
JP2024529883A (en) | Parametric Audio Coding per Integration Band | |
RU2409874C9 (en) | Audio signal compression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20081001 Assignee: Pan Xingde Assignor: Beijing Tianlai Chuanyin Digital Technology Co., Ltd. Contract record no.: 2013990000772 Denomination of invention: Audio encoding apparatus and method thereof, audio decoding device and method thereof Granted publication date: 20120201 License type: Common License Record date: 20131119 |
LICC | Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model |