HK1172139B - Frequency band enlarging apparatus and method, encoding apparatus and method, decoding apparatus and method - Google Patents
- Publication number
- HK1172139B (HK12112699.5A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- frequency
- power
- band
- signal
- subband
Description
Technical Field
The present invention relates to a band extending apparatus and method, an encoding apparatus and method, a decoding apparatus and method, and a program, and particularly relates to a band extending apparatus and method, an encoding apparatus and method, a decoding apparatus and method, and a program in which a music signal can be played with higher sound quality due to extension of a frequency band.
Background
In recent years, music distribution services that distribute music data over a network or the like have become increasingly widespread. With such a music distribution service, encoded data obtained by encoding a music signal is distributed as music data. Encoding methods that keep the file size of the encoded data small and lower the bit rate, so as to shorten download time, have become mainstream for music signals.
Such music signal encoding methods are broadly divided into: an encoding method such as MP3 (MPEG (moving picture experts group) audio layer 3) (international standard ISO/IEC 11172-3); and encoding methods such as HE-AAC (high efficiency MPEG4 AAC) (international standard ISO/IEC 14496-3).
In the encoding method represented by MP3, music signal components of a high frequency band (hereinafter referred to as "high frequency") of about 15kHz or higher, which are difficult for the human ear to detect, are deleted, and signal components of the remaining low frequency band (hereinafter referred to as "low frequency") are encoded. This encoding method will be referred to as a high-frequency deletion encoding method hereinafter. With this high-frequency deletion encoding method, the file size of the encoded data can be kept small. However, high-frequency sounds can be detected by a person, although only slightly, so if sound is generated and output from a decoded music signal obtained by decoding the encoded data, sound quality deteriorates: the realism of the original sound is lost or the sound becomes dull.
In contrast, in the encoding method represented by HE-AAC, feature information is extracted from the high-frequency signal component, and the feature information is encoded together with the low-frequency signal component. Such an encoding method will be hereinafter referred to as a high-frequency feature encoding method. In the high-frequency feature encoding method, only feature information of the high-frequency signal component is encoded as information relating to the high-frequency signal component, whereby encoding efficiency can be improved while suppressing deterioration of sound quality.
In decoding encoded data that has been encoded using a high-frequency feature encoding method, the low-frequency signal component and the feature information are decoded, and a high-frequency signal component is generated from the decoded low-frequency signal component and feature information. Such a technique of extending the frequency band of the low-frequency signal component by generating the high-frequency signal component from the low-frequency signal component will be referred to as a band extension technique hereinafter.
One application example of the band extension technique is post-processing after decoding data encoded using the above-described high-frequency deletion encoding method. In this post-processing, the band of the low-frequency signal component is extended by generating the high-frequency signal component lost in encoding from the low-frequency signal component after decoding (see PTL 1). Note that the method for band extension in PTL1 will be referred to as a PTL1 band extension method hereinafter.
With the PTL1 band extension method, the device uses the low-frequency signal component after decoding as an input signal, estimates a high-frequency power spectrum (hereinafter referred to as a high-frequency envelope where applicable) from the power spectrum of the input signal, and generates, from the low-frequency signal component, a high-frequency signal component having that high-frequency envelope.
Fig. 1 shows an example of the low-frequency power spectrum after decoding, which serves as the input signal, and the estimated high-frequency envelope.
In fig. 1, the vertical axis represents power on a logarithmic scale, and the horizontal axis represents frequency.
The device determines the band on the low-frequency side of the high-frequency signal component (hereinafter referred to as the "extension start band") according to the type of encoding format of the input signal and information such as the sampling rate and bit rate (hereinafter referred to as "side information"). Next, the device divides the input signal, which is the low-frequency signal component, into a plurality of subband signals. For each of the subband signals after division, that is, the subband signals on the side lower in frequency than the extension start band (hereinafter referred to simply as the "low-frequency side"), the device finds the average of the power over each group in the time direction (hereinafter referred to as "group power"). As shown in fig. 1, the device takes as a start point the point whose power is the average of the group powers of the plurality of subband signals on the low-frequency side and whose frequency is the frequency at the lower edge of the extension start band. The device estimates a straight line with a predetermined slope passing through the start point as the frequency envelope on the side higher in frequency than the extension start band (hereinafter referred to simply as the "high-frequency side"). Note that the position of the start point in the power direction may be adjusted by the user. The device generates each of a plurality of subband signals on the high-frequency side from the plurality of subband signals on the low-frequency side so that they follow the estimated high-frequency-side frequency envelope. The device adds together the generated subband signals on the high-frequency side to form the high-frequency signal component, then adds the low-frequency signal component and outputs the result. The music signal after band extension thereby becomes closer to the original music signal, so a music signal with higher sound quality can be played.
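The straight-line estimate described above can be illustrated with a minimal Python sketch. The function name, the use of decibels, and the choice of one subband index as the frequency unit are assumptions made for illustration only; they are not taken from PTL 1.

```python
def ptl1_envelope(low_group_powers_db, start_band, num_high_bands, slope_db_per_band):
    """Straight-line high-frequency envelope in the log-power domain (illustrative)."""
    # Start-point power: average of the group powers of the low-frequency-side subbands.
    start_power_db = sum(low_group_powers_db) / len(low_group_powers_db)
    # A line with a fixed, predetermined slope through the start point,
    # which sits at the lower edge of the extension start band.
    return [
        (start_band + k, start_power_db + slope_db_per_band * k)
        for k in range(num_high_bands)
    ]


# Hypothetical values: three low-side group powers and a -3 dB per subband slope.
envelope = ptl1_envelope([-20.0, -24.0, -28.0], start_band=3,
                         num_high_bands=4, slope_db_per_band=-3.0)
```

Because the slope is a fixed parameter, the shape of the estimated envelope does not depend on the input signal; only the start-point power does.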
The PTL1 band extension method described above has the following advantage: it can extend the band of a music signal obtained by decoding encoded data, whichever of the various high-frequency deletion encoding methods and bit rates was used to encode it.
Reference list
Patent document
PTL 1: Japanese Unexamined Patent Application Publication No. 2008-139844
Disclosure of Invention
Technical problem
However, the PTL1 band extension method leaves room for improvement in the following respect: the estimated high-frequency-side frequency envelope is a straight line with a predetermined slope, i.e., the shape of the frequency envelope is fixed.
That is, the power spectrum of a music signal can have various shapes, and depending on the type of music signal, there are cases where the high-frequency-side frequency envelope estimated using the PTL1 band extension method deviates considerably from the original.
Fig. 2 shows an example of the original power spectrum of an attack-type music signal, that is, a music signal accompanied by an abrupt temporal change, for example, when a drum is struck loudly once.
Note that fig. 2 also shows the low-frequency-side signal component of the attack-type music signal, which serves as the input signal, together with the high-frequency-side frequency envelope estimated from the input signal in accordance with the PTL1 band extension method.
As shown in fig. 2, the original high-frequency side power spectrum of the attack-type music signal is approximately flat.
Conversely, the estimated high-frequency-side frequency envelope has a predetermined negative slope, and even if it is adjusted to a power closer to the original power spectrum at the start point, the difference from the original power spectrum increases as the frequency increases.
Thus, with the PTL1 band extension method, the estimated high-frequency-side frequency envelope cannot reproduce the original high-frequency-side frequency envelope with high accuracy. Therefore, if sound is generated and output from the music signal after band extension, the clarity of the sound will be lost from the listener's point of view compared to the original sound.
In addition, with a high-frequency feature encoding method such as the above-described HE-AAC, the high-frequency-side frequency envelope is used as the feature information of the high-frequency signal component to be encoded, but the decoding side is required to reproduce the original high-frequency-side frequency envelope with high accuracy.
The present invention has been made in consideration of such a situation and enables a music signal to be played with higher sound quality through extension of the frequency band.
Solution to the problem
The band extending apparatus according to the first aspect of the present invention comprises: signal dividing means configured to divide an input signal into a plurality of subband signals; feature amount calculation means configured to calculate a feature amount representing a feature of the input signal using the input signal and at least one of the plurality of subband signals divided by the signal division means; a high-frequency subband power estimating means configured to calculate an estimated value of a high-frequency subband power, which is a power of a subband signal having a band higher than that of the input signal, based on the feature amount calculated by the feature amount calculating means; and high-frequency signal component generating means configured to generate a high-frequency signal component based on the plurality of subband signals divided by the signal dividing means and the estimated value of the high-frequency subband power calculated by the high-frequency subband power estimating means; thereby expanding the frequency band of the input signal using the high-frequency signal component generated by the high-frequency signal component generating means.
The feature amount calculation means may calculate, as the feature amount, low-frequency subband powers, which are the powers of the plurality of subband signals.
The feature amount calculation means may calculate, as the feature amount, a temporal variation of the low-frequency subband powers, which are the powers of the plurality of subband signals.
The feature amount calculation means may calculate a difference between a maximum power and a minimum power of the input signal in a predetermined frequency band as the feature amount.
The feature amount calculation means may calculate a temporal variation of a difference between a maximum value and a minimum value of power of the input signal in a predetermined frequency band as the feature amount.
The feature amount calculation means may calculate a slope of power of the input signal in a predetermined frequency band as the feature amount.
The feature amount calculation means may calculate a temporal change in a slope of power of the input signal in a predetermined frequency band as the feature amount.
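The feature amounts listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: powers are expressed in decibels, the slope is taken as a least-squares fit of power against subband index, and a temporal variation is taken as the difference between two consecutive frames. None of these choices, nor the function names, are specified in this section.

```python
import math


def subband_power_db(samples):
    """Mean-square power of one subband frame, in dB (floor avoids log of zero)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, 1e-12))


def power_range_db(powers_db):
    """Difference between the maximum and minimum power in a band."""
    return max(powers_db) - min(powers_db)


def power_slope_db(powers_db):
    """Least-squares slope of power versus subband index (needs >= 2 values)."""
    n = len(powers_db)
    mean_i = (n - 1) / 2.0
    mean_p = sum(powers_db) / n
    num = sum((i - mean_i) * (p - mean_p) for i, p in enumerate(powers_db))
    den = sum((i - mean_i) ** 2 for i in range(n))
    return num / den


def temporal_variation(current, previous):
    """Frame-to-frame change of any scalar feature."""
    return current - previous


# Hypothetical low-frequency subband powers for two consecutive frames.
previous_powers = [-10.0, -14.0, -18.0]
current_powers = [-8.0, -14.0, -20.0]
range_now = power_range_db(current_powers)
slope_now = power_slope_db(current_powers)
slope_change = temporal_variation(slope_now, power_slope_db(previous_powers))
```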
The high-frequency subband power estimating means may calculate the estimated value of the high-frequency subband power based on the feature amount and a coefficient for each high-frequency subband obtained in advance by learning.
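As a rough illustration of estimating from feature amounts and learned coefficients, the sketch below forms one linear combination per high-frequency subband. The linear form, the intercept term, the function name, and all numeric values are assumptions for illustration; this paragraph states only that the estimate is based on the feature amount and a per-subband coefficient obtained by learning.

```python
def estimate_high_band_powers(features, coefficients, intercepts):
    """One estimated power per high-frequency subband (illustrative linear model)."""
    return [
        sum(a * f for a, f in zip(row, features)) + b
        for row, b in zip(coefficients, intercepts)
    ]


# Hypothetical feature amounts (two low-frequency subband powers, in dB)
# and hypothetical learned coefficients for two high-frequency subbands.
features = [-20.0, -30.0]
coefficients = [[0.5, 0.25], [0.25, 0.5]]
intercepts = [-5.0, -8.0]
estimated = estimate_high_band_powers(features, coefficients, intercepts)
```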
The coefficient for each high-frequency subband may be generated as follows: residual vectors of the high-frequency signal component are calculated using coefficients for each high-frequency subband obtained by regression analysis using a plurality of teaching signals; the residual vectors are clustered; and regression analysis is then performed, for each cluster obtained by the clustering, using the teaching signals belonging to that cluster.
The residual vectors may be normalized using the variance of each component across a plurality of the residual vectors, and the normalized vectors are then clustered.
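The normalization and clustering of residual vectors might look like the following sketch. It assumes that normalization means dividing each component by its standard deviation and that clustering is plain k-means with a deterministic start; the text names neither, so both are swapped-in illustrative choices. The subsequent per-cluster regression is not shown.

```python
def normalize_by_variance(vectors):
    """Scale each component by its standard deviation across all vectors."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    variances = [sum((v[d] - means[d]) ** 2 for v in vectors) / n for d in range(dim)]
    # A zero variance would divide by zero, so fall back to a scale of 1.0.
    scales = [(var ** 0.5) or 1.0 for var in variances]
    return [[v[d] / scales[d] for d in range(dim)] for v in vectors]


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical residual vectors whose second component has a much larger spread.
residuals = [[1.0, 10.0], [-1.0, -10.0], [1.2, 12.0], [-1.2, -12.0]]
normalized = normalize_by_variance(residuals)
labels, centroids = kmeans(normalized, k=2)
```

Without the normalization step, the component with the largest spread would dominate the distance used for clustering.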
The high-frequency subband power estimating means may calculate the estimated value of the high-frequency subband power based on the feature amount, the coefficient for each of the high-frequency subbands, and a constant. The constant is calculated from the centroid vector of each new cluster, where the new clusters are obtained by calculating the residual vectors again using the coefficient for each high-frequency subband obtained by regression analysis using the teaching signals belonging to the cluster, and clustering those residual vectors into a plurality of new clusters.
The high-frequency subband power estimating means may record, in an associated manner, the coefficients for each of the high-frequency subbands and pointers for determining those coefficients, and may further record a plurality of sets of the pointers and the constants; some of the plurality of sets may include pointers having the same value.
The high-frequency signal component generating means may generate the high-frequency signal component from low-frequency subband powers, which are the powers of the plurality of subband signals, and the estimated value of the high-frequency subband power.
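One way to picture how a low-frequency subband power and an estimated high-frequency subband power could drive signal generation is a simple gain computation. The sketch assumes that both powers are in decibels and that a low-frequency subband signal is scaled in amplitude to reach the estimated power; how that signal is moved to the high-frequency band is not covered by this paragraph and is omitted. The function names are hypothetical.

```python
def gain_from_powers(low_power_db, estimated_high_power_db):
    """Amplitude gain that changes a signal's power from the low value to the estimate."""
    return 10.0 ** ((estimated_high_power_db - low_power_db) / 20.0)


def scale_subband(samples, gain):
    """Apply the gain to every sample of a subband signal."""
    return [gain * x for x in samples]


# Hypothetical powers: the estimate is 20 dB below the source subband.
gain = gain_from_powers(-20.0, -40.0)
shaped = scale_subband([0.5, -0.5], gain)
```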
A frequency band extending method according to a first aspect of the present invention includes: a signal dividing step arranged to divide an input signal into a plurality of subband signals; a feature amount calculating step configured to calculate a feature amount representing a feature of the input signal using the input signal and at least one of the plurality of subband signals divided by the processing in the signal dividing step; a high-frequency subband power estimating step configured to calculate an estimated value of a high-frequency subband power, which is a power of a subband signal having a band higher than that of the input signal, based on the feature amount calculated by the processing in the feature amount calculating step; and a high-frequency signal component generating step configured to generate a high-frequency signal component based on the plurality of subband signals divided by the processing in the signal dividing step and the estimated value of the high-frequency subband power calculated by the processing in the high-frequency subband power estimating step; thereby expanding the frequency band of the input signal using the high-frequency signal component generated by the processing in the high-frequency signal component generating step.
A program according to the first aspect of the present invention includes: a signal dividing step arranged to divide an input signal into a plurality of subband signals; a feature amount calculating step configured to calculate a feature amount representing a feature of the input signal using the input signal and at least one of the plurality of subband signals divided by the processing in the signal dividing step; a high-frequency subband power estimating step configured to calculate an estimated value of a high-frequency subband power, which is a power of a subband signal having a band higher than that of the input signal, based on the feature amount calculated by the processing in the feature amount calculating step; and a high-frequency signal component generating step configured to generate a high-frequency signal component based on the plurality of subband signals divided by the processing in the signal dividing step and the estimated value of the high-frequency subband power calculated by the processing in the high-frequency subband power estimating step; thereby causing a computer to execute processing for expanding the frequency band of the input signal using the high-frequency signal component generated by the processing in the high-frequency signal component generating step.
With regard to the first aspect of the present invention, an input signal is divided into a plurality of subband signals; a feature amount representing a feature of the input signal is calculated using the input signal and at least one of the plurality of divided subband signals; an estimated value of a high-frequency subband power, which is a power of a subband signal having a band higher than that of the input signal, is calculated based on the calculated feature amount; a high-frequency signal component is generated based on the plurality of divided subband signals and the calculated estimated value of the high-frequency subband power; and the frequency band of the input signal is expanded using the generated high-frequency signal component.
The encoding device according to the second aspect of the present invention includes: subband dividing means configured to divide an input signal into a plurality of subbands and generate a low-frequency subband signal composed of a plurality of subbands on a low-frequency side and a high-frequency subband signal composed of a plurality of subbands on a high-frequency side; feature amount calculation means configured to calculate a feature amount representing a feature of the input signal using at least one of the input signal and the low frequency subband signal generated by the subband dividing means; pseudo high frequency sub-band power calculation means configured to calculate a pseudo high frequency sub-band power as a pseudo power of the high frequency sub-band signal based on the feature amount calculated by the feature amount calculation means; pseudo high frequency sub-band power difference calculation means configured to calculate a high frequency sub-band power as a power of the high frequency sub-band signal from the high frequency sub-band signal generated by the subband dividing means, and calculate a pseudo high frequency sub-band power difference which is a difference with respect to the pseudo high frequency sub-band power calculated by the pseudo high frequency sub-band power calculation means; high frequency encoding means configured to encode the pseudo high frequency sub-band power difference calculated by the pseudo high frequency sub-band power difference calculation means to generate high frequency encoded data; low frequency encoding means configured to encode a low frequency signal, which is the low-frequency component of the input signal, to generate low frequency encoded data; and multiplexing means configured to multiplex the low frequency encoded data generated by the low frequency encoding means and the high frequency encoded data generated by the high frequency encoding means to obtain an output code string.
The encoding device may further include low frequency decoding means configured to decode the low frequency encoded data generated by the low frequency encoding means to generate a low frequency signal; the subband dividing means generates the low frequency subband signal from the low frequency signal generated by the low frequency decoding means.
The high frequency encoding means may calculate a similarity between the pseudo high frequency sub-band power difference and a representative vector or representative value in a predetermined plurality of pseudo high frequency sub-band power difference spaces to generate an index corresponding to the representative vector or representative value whose similarity is the maximum value as the high frequency encoded data.
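Selecting the index of the most similar representative vector is, in effect, a vector quantization step. The sketch below assumes that "similarity" is measured by squared Euclidean distance, so the most similar entry is the nearest one; the text does not fix the measure, and the codebook values are hypothetical.

```python
def quantize_difference(difference, codebook):
    """Return the index of the representative vector nearest to `difference`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda i: squared_distance(difference, codebook[i]))


# Hypothetical codebook of representative power-difference vectors (two subbands).
codebook = [[0.0, 0.0], [3.0, -3.0], [-3.0, 3.0]]
index = quantize_difference([2.5, -2.0], codebook)
```

Only the index is then written to the high frequency encoded data, so the cost per frame depends on the codebook size rather than on the number of subbands.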
The pseudo high frequency sub-band power difference calculation means may calculate, for each of a plurality of coefficients for calculating the pseudo high frequency sub-band power, an evaluation value based on the pseudo high frequency sub-band power and the high frequency sub-band power of each sub-band; the high-frequency encoding means generates, as the high-frequency encoded data, an index indicating the coefficient whose evaluation value is the highest.
The pseudo high frequency sub-band power difference calculation means may calculate the evaluation value based on at least any one of a sum of squares of the pseudo high frequency sub-band power difference for each sub-band, a maximum value of an absolute value of the pseudo high frequency sub-band power difference for the sub-bands, or an average value of the pseudo high frequency sub-band power difference for each sub-band.
The pseudo high frequency sub-band power difference calculation means may calculate the evaluation value based on the pseudo high frequency sub-band power difference of different frames.
The pseudo high frequency sub-band power difference calculation means may calculate the evaluation value using the pseudo high frequency sub-band power difference multiplied by a weight, which is a weight for each sub-band such that the further a sub-band is toward the low frequency side, the larger the weight of the sub-band is.
The pseudo high frequency sub-band power difference calculation means may calculate the evaluation value using the pseudo high frequency sub-band power difference multiplied by a weight, which is a weight for each sub-band such that the higher the high frequency sub-band power of a sub-band is, the higher the weight of the sub-band is.
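The evaluation of candidate coefficients can be sketched as follows. The sketch computes the three quantities named above from the per-subband power differences, with optional per-subband weights, and picks the candidate with the smallest sum of squares. Treating the smallest error as the highest evaluation is an assumption, as are the function names and all numeric values.

```python
def residual_metrics(true_powers, pseudo_powers, weights=None):
    """Sum of squares, maximum absolute value, and mean of weighted differences."""
    weights = weights or [1.0] * len(true_powers)
    diffs = [w * (t - p) for w, t, p in zip(weights, true_powers, pseudo_powers)]
    return {
        "sum_squares": sum(d * d for d in diffs),
        "max_abs": max(abs(d) for d in diffs),
        "mean": sum(diffs) / len(diffs),
    }


def select_coefficient_index(true_powers, candidate_pseudo_powers, weights=None):
    """Index of the candidate whose weighted sum of squared differences is smallest."""
    scores = [
        residual_metrics(true_powers, pseudo, weights)["sum_squares"]
        for pseudo in candidate_pseudo_powers
    ]
    return scores.index(min(scores))


# Hypothetical high-frequency subband powers and the pseudo powers produced
# by two candidate coefficient sets.
true_powers = [-30.0, -36.0]
candidates = [[-28.0, -30.0], [-31.0, -35.0]]
best = select_coefficient_index(true_powers, candidates)
metrics = residual_metrics(true_powers, candidates[best])
# Weighting the lower subband more heavily, as one of the variants above suggests.
best_weighted = select_coefficient_index(true_powers, candidates, weights=[2.0, 1.0])
```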
The encoding method according to the second aspect of the present invention comprises: a subband dividing step configured to divide an input signal into a plurality of subbands and generate a low-frequency subband signal composed of a plurality of subbands on a low-frequency side and a high-frequency subband signal composed of a plurality of subbands on a high-frequency side; a feature amount calculating step configured to calculate a feature amount representing a feature of the input signal using at least one of the input signal and the low frequency subband signal generated by the processing in the subband dividing step; a pseudo high frequency sub-band power calculating step configured to calculate a pseudo high frequency sub-band power as a pseudo power of the high frequency sub-band signal based on the feature amount calculated by the processing in the feature amount calculating step; a pseudo high frequency sub-band power difference calculation step configured to calculate a high frequency sub-band power as a power of the high frequency sub-band signal from the high frequency sub-band signal generated by the processing in the subband dividing step, and calculate a pseudo high frequency sub-band power difference which is a difference with respect to the pseudo high frequency sub-band power calculated by the processing in the pseudo high frequency sub-band power calculating step; a high-frequency encoding step configured to encode the pseudo high-frequency subband power difference calculated by the processing in the pseudo high-frequency subband power difference calculation step to generate high-frequency encoded data; a low frequency encoding step configured to encode a low frequency signal, which is the low-frequency component of the input signal, to generate low frequency encoded data; and a multiplexing step configured to multiplex the low frequency encoded data generated by the processing in the low frequency encoding step and the high frequency encoded data generated by the processing in the high frequency encoding step to obtain an output code string.
A program according to the second aspect of the present invention causes a computer to execute processing including: a subband dividing step configured to divide an input signal into a plurality of subbands and generate a low-frequency subband signal composed of a plurality of subbands on a low-frequency side and a high-frequency subband signal composed of a plurality of subbands on a high-frequency side; a feature amount calculating step configured to calculate a feature amount representing a feature of the input signal using at least one of the input signal and the low-frequency subband signal generated by the processing in the subband dividing step; a pseudo high frequency subband power calculating step configured to calculate a pseudo high frequency subband power as a pseudo power of the high frequency subband signal based on the feature amount calculated by the processing in the feature amount calculating step; a pseudo high frequency sub-band power difference calculation step configured to calculate a high frequency sub-band power as a power of the high frequency sub-band signal from the high frequency sub-band signal generated by the processing in the subband dividing step, and calculate a pseudo high frequency sub-band power difference which is a difference with respect to the pseudo high frequency sub-band power calculated by the processing in the pseudo high frequency subband power calculating step; a high-frequency encoding step configured to encode the pseudo high-frequency subband power difference calculated by the processing in the pseudo high-frequency subband power difference calculation step to generate high-frequency encoded data; a low frequency encoding step configured to encode a low frequency signal, which is the low-frequency component of the input signal, to generate low frequency encoded data; and a multiplexing step configured to multiplex the low frequency encoded data generated by the processing in the low frequency encoding step and the high frequency encoded data generated by the processing in the high frequency encoding step to obtain an output code string.
With regard to the second aspect of the present invention, an input signal is divided into a plurality of subbands; a low frequency subband signal composed of a plurality of subbands on a low frequency side and a high frequency subband signal composed of a plurality of subbands on a high frequency side are generated; a feature amount representing a feature of the input signal is calculated using at least one of the input signal and the generated low frequency subband signal; a pseudo high frequency subband power is calculated as a pseudo power of the high frequency subband signal based on the calculated feature amount; a high-frequency subband power is calculated as a power of the high-frequency subband signal from the generated high-frequency subband signal; a pseudo high frequency sub-band power difference, which is a difference with respect to the calculated pseudo high frequency sub-band power, is calculated; the calculated pseudo high frequency sub-band power difference is encoded to generate high frequency encoded data; a low frequency signal, which is the low-frequency component of the input signal, is encoded to generate low frequency encoded data; and the generated low frequency encoded data and the generated high frequency encoded data are multiplexed to obtain an output code string.
A decoding device according to a third aspect of the present invention comprises: demultiplexing means configured to demultiplex input encoded data into at least low frequency encoded data and an index; low frequency decoding means configured to decode the low frequency encoded data to generate a low frequency signal; subband dividing means configured to divide a band of the low frequency signal into a plurality of low frequency subbands to generate a low frequency subband signal for each of the low frequency subbands; and generating means configured to generate a high frequency signal based on the index and the low frequency subband signal.
The index may be obtained based on the input signal before encoding and the high frequency signal estimated from the input signal at a device that encodes the input signal and outputs the encoded data.
The index may not be encoded.
The index may be information indicating an estimation coefficient for generating the high frequency signal.
The generating means may generate the high frequency signal based on the estimation coefficient indicated by the index among the plurality of estimation coefficients.
The generating means may comprise: feature amount calculation means configured to calculate a feature amount representing a feature of the encoded data using at least one of the low frequency signal and the low frequency subband signal; high-frequency subband power calculating means configured to calculate, with respect to each of a plurality of high-frequency subbands constituting a band of the high-frequency signal, a high-frequency subband power of a high-frequency subband signal of the high-frequency subband by calculation using the feature amount and the estimation coefficient; and high frequency signal generating means configured to generate the high frequency signal based on the high frequency subband power and the low frequency subband signal.
The high-frequency subband power calculating means may calculate the high-frequency subband power of the high-frequency subband by linearly combining a plurality of the feature amounts by using the estimation coefficient prepared for each of the high-frequency subbands.
The feature amount calculation means may calculate a low frequency subband power of the low frequency subband signal of each of the low frequency subbands as the feature amount.
The index may be information indicating, among a plurality of the estimation coefficients, the estimation coefficient that yields the high-frequency subband power closest to the high-frequency subband power obtained from the high-frequency signal of the input signal before encoding, as determined by comparing the high-frequency subband power obtained from the high-frequency signal of the input signal before encoding with the high-frequency subband power generated based on each estimation coefficient.
The index may be information indicating the estimation coefficient for which the sum of squares of the differences, obtained for each of the high-frequency subbands, between the high-frequency subband power obtained from the high-frequency signal of the input signal before encoding and the high-frequency subband power generated based on the estimation coefficient becomes minimum.
The encoded data may further include difference information representing a difference between the high frequency subband power obtained from the high frequency signal of the input signal before encoding and the high frequency subband power generated based on the estimation coefficient.
The difference information may be encoded.
The high-frequency subband power calculating means may add the difference indicated by the difference information included in the encoded data to the high-frequency subband power obtained by the calculation using the feature amount and the estimation coefficient; the high frequency signal generating means then generates the high frequency signal based on the low frequency subband signal and the high frequency subband power to which the difference has been added.
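On the decoding side, the index lookup, the linear combination of feature amounts, and the optional difference correction can be sketched together. The table layout (one list of per-subband coefficient rows per index), the function name, and all values are hypothetical and serve only to illustrate the flow described above.

```python
def decode_high_band_powers(features, coefficient_table, index, differences=None):
    """Estimate high-frequency subband powers, then add transmitted differences if any."""
    rows = coefficient_table[index]  # coefficient set selected by the index
    powers = [sum(a * f for a, f in zip(row, features)) for row in rows]
    if differences is not None:
        powers = [p + d for p, d in zip(powers, differences)]
    return powers


# Hypothetical feature amounts (low-frequency subband powers, in dB) and a
# hypothetical table holding two coefficient sets for two high-frequency subbands.
features = [-20.0, -30.0]
coefficient_table = [
    [[0.5, 0.5], [0.25, 0.75]],
    [[1.0, 0.0], [0.0, 1.0]],
]
plain = decode_high_band_powers(features, coefficient_table, index=0)
corrected = decode_high_band_powers(features, coefficient_table, index=1,
                                    differences=[-2.0, 1.5])
```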
The estimation coefficient may be obtained by regression analysis using the least squares method, in which the feature amount is the explanatory variable and the high-frequency subband power is the explained variable.
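A least squares fit of this kind can be sketched in plain Python by solving the normal equations. The intercept term, the helper names, and the training values are assumptions for illustration; the sketch fits a single high-frequency subband and would be repeated per subband.

```python
def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting (assumes a non-singular matrix)."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_least_squares(feature_rows, targets):
    """Coefficients minimizing squared error; the last entry is the intercept."""
    design = [list(row) + [1.0] for row in feature_rows]
    dim = len(design[0])
    # Normal equations: (X^T X) w = X^T y.
    gram = [[sum(r[i] * r[j] for r in design) for j in range(dim)] for i in range(dim)]
    moment = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(dim)]
    return solve_linear_system(gram, moment)


# Hypothetical training data generated from power = 2*f0 + 3*f1 + 1.
feature_rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
targets = [1.0, 3.0, 4.0, 6.0]
fitted = fit_least_squares(feature_rows, targets)
```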
The decoding device may further include coefficient output means, wherein the index is information indicating a difference vector composed of the difference for each of the high-frequency subbands, each element of the difference vector being the difference between the high frequency subband power obtained from the high frequency signal of the input signal before encoding and the high frequency subband power generated based on the estimation coefficient; the coefficient output means is configured to obtain the distance, in a feature space of the difference, between the difference vector indicated by the index and a representative vector or representative value that has as elements the differences of the high-frequency subbands obtained in advance for each of the estimation coefficients, and to supply, among the plurality of the estimation coefficients, the estimation coefficient of the representative vector or representative value whose distance is the shortest to the high-frequency subband power calculating means.
The index may be information indicating an estimation coefficient, of a plurality of the estimation coefficients, that obtains the high frequency signal closest to the high frequency signal of the input signal before encoding as a result of comparison between the high frequency signal of the input signal before encoding and the high frequency signal generated based on the estimation coefficient.
The estimation coefficients may be obtained by regression analysis.
The generating means may generate the high frequency signal based on information obtained by decoding the encoded index.
The index may have been entropy coded.
The decoding method or program according to the third aspect includes: a demultiplexing step arranged to demultiplex the input encoded data into at least low frequency encoded data and an index; a low frequency decoding step arranged to decode the low frequency encoded data to generate a low frequency signal; a subband dividing step configured to divide a frequency band of the low frequency signal into a plurality of low frequency subbands to generate a low frequency subband signal of each of the low frequency subbands; and a generating step arranged to generate the high frequency signal based on the index and the low frequency subband signal.
With regard to the third aspect of the present invention, input encoded data is demultiplexed into at least low frequency encoded data and an index; decoding the low frequency encoded data to generate a low frequency signal; dividing the frequency band of the low-frequency signal into a plurality of low-frequency sub-bands to generate a low-frequency sub-band signal of each low-frequency sub-band; and generating the high frequency signal based on the index and the low frequency subband signal.
The decoding device according to the fourth aspect of the present invention comprises: demultiplexing means configured to demultiplex input encoded data into low frequency encoded data and an index for obtaining an estimation coefficient used to generate a high frequency signal; a low frequency decoding device configured to decode the low frequency encoded data to generate a low frequency signal; a subband dividing means configured to divide a band of the low frequency signal into a plurality of low frequency subbands to generate a low frequency subband signal for each of the low frequency subbands; feature amount calculation means configured to calculate a feature amount representing a feature of the encoded data using at least one of the low frequency signal and the low frequency subband signal; a high-frequency subband power calculating means configured to calculate a high-frequency subband power of a high-frequency subband signal of the high-frequency subband by multiplying the feature amount by an estimation coefficient determined by the index of a plurality of the estimation coefficients prepared in advance with respect to each of a plurality of high-frequency subbands constituting a band of the high-frequency signal and obtaining a sum of the feature amounts multiplied by the estimation coefficient; and a high frequency signal generating device configured to generate the high frequency signal using the high frequency subband power and the low frequency subband signal.
The feature amount calculation means may calculate a low frequency subband power of the low frequency subband signal of each of the low frequency subbands as the feature amount.
The index may be information for obtaining, of the plurality of estimation coefficients, the estimation coefficient for which a sum of squares of differences obtained for each of the high-frequency subbands becomes minimum, the difference being a difference between the high-frequency subband power obtained from a true value of the high-frequency signal and the high-frequency subband power generated using the estimation coefficient.
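The selection criterion described above can be illustrated with a minimal sketch. The function name, the list-based data layout, and the numeric values are assumptions made for illustration only; they do not appear in the specification.

```python
def select_coefficient_index(true_power, estimated_power_by_index):
    """Return the index whose estimated high-frequency subband powers have
    the smallest sum of squared differences from the true powers.

    true_power: one power value per high-frequency subband.
    estimated_power_by_index: one list of estimated powers per candidate
    set of estimation coefficients (hypothetical layout).
    """
    def sum_of_squared_differences(estimate):
        return sum((t - e) ** 2 for t, e in zip(true_power, estimate))

    return min(
        range(len(estimated_power_by_index)),
        key=lambda k: sum_of_squared_differences(estimated_power_by_index[k]),
    )


# Illustrative values only: three candidate coefficient sets, three subbands.
true_power = [-20.0, -24.0, -30.0]
candidates = [
    [-10.0, -15.0, -20.0],
    [-21.0, -23.0, -31.0],
    [-25.0, -30.0, -36.0],
]
best_index = select_coefficient_index(true_power, candidates)
```

In this example the second candidate is selected, because its estimate deviates from the true value by only one unit in each subband.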
The index may further include difference information representing a difference between the high-frequency subband power obtained from the true value and the high-frequency subband power generated using the estimation coefficient; the high-frequency subband power calculating means further adds the difference represented by the difference information included in the index to the high-frequency subband power obtained by obtaining a sum of the feature amounts multiplied by the estimation coefficient; and wherein the high frequency signal generating means generates the high frequency signal using the low frequency subband signal and the high frequency subband power to which the difference has been added by the high frequency subband power calculating means.
The index may be information indicating the estimation coefficient.
The index may be information obtained by entropy-encoding information indicating the estimation coefficient; the high-frequency subband power calculating means calculates the high-frequency subband power using an estimation coefficient indicated by information obtained by decoding the index.
The plurality of estimation coefficients may be obtained in advance by regression analysis using a least square method, with the feature quantity as an explanatory variable and the high-frequency subband power as an explained variable.
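As a hedged illustration of such a regression, the following sketch fits the coefficients for one high-frequency subband by solving the normal equations of the least squares problem. The helper names, the appended constant term, and the training values are assumptions; this passage states only that the feature quantity is the explanatory variable and the high-frequency subband power is the explained variable.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_estimation_coefficients(features, target_power):
    """Least-squares fit of target_power ~ sum_k A[k] * feature[k] + B.

    features: one list of feature amounts per training frame.
    target_power: true high-frequency subband power per training frame.
    The constant term B is an assumption added for this sketch.
    """
    rows = [list(f) + [1.0] for f in features]
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, target_power)) for i in range(n)]
    return solve_linear_system(xtx, xty)


# Illustrative training data generated from power = 0.5*f0 - 0.25*f1 + 3.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
target_power = [3.5, 2.75, 3.25, 3.75]
coefficients = fit_estimation_coefficients(features, target_power)
```

Because the illustrative data lie exactly on a plane, the fit recovers the generating weights up to rounding error. With real training data, the fit would be repeated once for each high-frequency subband.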
The decoding apparatus may further include coefficient output means, wherein the index is information indicating a difference vector composed of the difference for each of the high-frequency subbands, the difference vector having as an element a difference between the high frequency subband power obtained from a true value of the high frequency signal and the high frequency subband power generated using the estimation coefficient; the coefficient output means is configured to obtain a distance between a representative vector or representative value in a feature space of the difference and the difference vector indicated by the index, the representative vector or representative value having as an element the difference of the high-frequency subbands obtained in advance for each of the estimation coefficients, and the coefficient output means is further configured to supply, of the plurality of estimation coefficients, the estimation coefficient of the representative vector or the representative value whose distance is the shortest to the high-frequency subband power calculating means.
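The operation of the coefficient output means can be sketched as a nearest-neighbour search. Euclidean distance is assumed here, since the text says only "distance"; the function name and the values are likewise illustrative.

```python
import math


def nearest_coefficient_set(difference_vector, representative_vectors):
    """Return the position of the representative vector closest to the
    difference vector, assuming Euclidean distance.

    representative_vectors[k] is the representative vector obtained in
    advance for the k-th set of estimation coefficients (hypothetical layout).
    """
    def distance(representative):
        return math.sqrt(
            sum((d - r) ** 2 for d, r in zip(difference_vector, representative))
        )

    return min(
        range(len(representative_vectors)),
        key=lambda k: distance(representative_vectors[k]),
    )


# Illustrative values only: one difference per high-frequency subband.
difference_vector = [1.0, -2.0, 0.5]
representative_vectors = [
    [0.0, 0.0, 0.0],
    [1.0, -1.5, 0.5],
    [-3.0, 2.0, 1.0],
]
chosen = nearest_coefficient_set(difference_vector, representative_vectors)
```

The estimation coefficients associated with the chosen position would then be supplied to the high-frequency subband power calculating means.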
A decoding method or program according to a fourth aspect of the present invention includes: a demultiplexing step configured to demultiplex input encoded data into low frequency encoded data and an index for obtaining an estimation coefficient used to generate a high frequency signal; a low frequency decoding step arranged to decode the low frequency encoded data to generate a low frequency signal; a subband dividing step configured to divide a frequency band of the low frequency signal into a plurality of low frequency subbands to generate a low frequency subband signal of each low frequency subband; a feature amount calculating step configured to calculate a feature amount representing a feature of the encoded data using at least one of the low frequency signal and the low frequency subband signal; a high-frequency subband power calculating step configured to calculate a high-frequency subband power of a high-frequency subband signal of the high-frequency subband by multiplying the feature amount by an estimation coefficient determined by the index of a plurality of the estimation coefficients prepared in advance with respect to each of a plurality of high-frequency subbands constituting a band of the high-frequency signal and obtaining a sum of the feature amounts multiplied by the estimation coefficient; and a high frequency signal generating step configured to generate the high frequency signal using the high frequency subband power and the low frequency subband signal.
With regard to the fourth aspect of the present invention, input encoded data is demultiplexed into low frequency encoded data and an index for obtaining an estimation coefficient used to generate a high frequency signal; decoding the low frequency encoded data to generate a low frequency signal; dividing the frequency band of the low-frequency signal into a plurality of low-frequency sub-bands to generate a low-frequency sub-band signal of each low-frequency sub-band; calculating a feature quantity representing a feature of the encoded data using at least one of the low frequency signal and the low frequency subband signal; calculating a high-frequency subband power of a high-frequency subband signal of the high-frequency subband by multiplying the feature amount by an estimation coefficient determined by the index of a plurality of the estimation coefficients prepared in advance with respect to each of a plurality of high-frequency subbands constituting a band of the high-frequency signal and obtaining a sum of the feature amounts multiplied by the estimation coefficient; and generating the high frequency signal using the high frequency subband power and the low frequency subband signal.
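The high-frequency subband power calculation of the fourth aspect amounts to a weighted sum of the feature amounts for each high-frequency subband. The following is a minimal sketch under assumed names and values; the optional `difference` argument corresponds to the variant in which difference information is added to the estimate.

```python
def estimate_high_band_power(features, coefficient_table, index, difference=None):
    """Estimate the power of each high-frequency subband.

    features: feature amounts (e.g. low-frequency subband powers).
    coefficient_table[index]: one row of weights per high-frequency subband
    (hypothetical layout for the coefficients prepared in advance).
    difference: optional per-subband correction taken from the encoded data.
    """
    rows = coefficient_table[index]
    powers = [sum(a * f for a, f in zip(row, features)) for row in rows]
    if difference is not None:
        powers = [p + d for p, d in zip(powers, difference)]
    return powers


# Illustrative values only: four feature amounts, two high-frequency subbands.
features = [-10.0, -12.0, -15.0, -20.0]
coefficient_table = {
    0: [[0.5, 0.25, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0]],
    1: [[0.25, 0.25, 0.25, 0.25], [0.0, 0.0, 0.5, 0.5]],
}
estimated = estimate_high_band_power(features, coefficient_table, index=0)
corrected = estimate_high_band_power(
    features, coefficient_table, index=0, difference=[1.0, -0.5]
)
```

Here `estimated` is `[-8.0, -13.5]`, and `corrected` is `[-7.0, -14.0]` once the differences are added.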
Advantageous Effects of the Invention
According to the first to fourth aspects of the present invention, a music signal can be played with higher sound quality due to the expansion of the frequency band.
Drawings
Fig. 1 is a diagram showing an example of a decoded low-frequency power spectrum and an estimated high-frequency envelope, which are used as input signals.
Fig. 2 is a diagram showing an example of an original power spectrum of an attack-type music signal accompanied by a rapid temporal change.
Fig. 3 is a block diagram showing a functional configuration example of the band expanding device according to the first embodiment of the present invention.
Fig. 4 is a flowchart describing an example of a band extending process by the band extending apparatus in fig. 3.
Fig. 5 is a diagram showing the power spectrum of a signal input into the band expanding device in fig. 3 and the location of band-pass filtering on the frequency axis.
Fig. 6 is a diagram showing an example of frequency characteristics of a musical piece and an estimated high-frequency power spectrum.
Fig. 7 is a diagram showing an example of a power spectrum of a signal input into the band expanding device of fig. 3.
Fig. 8 is a graph illustrating an example of a power spectrum of the input signal of fig. 7 after homomorphic filtering.
Fig. 9 is a block diagram showing a functional configuration example of a coefficient learning device that performs learning of coefficients used in the high-frequency signal generating circuit of the band extending device of fig. 3.
Fig. 10 is a flowchart describing an example of the coefficient learning process of the coefficient learning apparatus in fig. 9.
Fig. 11 is a block diagram showing a functional configuration example of an encoding apparatus according to the second embodiment of the present invention.
Fig. 12 is a flowchart describing an example of the encoding process of the encoding apparatus in fig. 11.
Fig. 13 is a block diagram showing a functional configuration example of a decoding apparatus according to the second embodiment of the present invention.
Fig. 14 is a flowchart describing an example of a decoding process by the decoding apparatus in fig. 13.
Fig. 15 is a block diagram showing a functional configuration example of a coefficient learning device that performs learning of a representative vector used in the high-frequency encoding circuit of the encoding device in fig. 11 and a decoded high-frequency subband power estimating coefficient used in the high-frequency decoding circuit of the decoding device in fig. 13.
Fig. 16 is a flowchart describing an example of coefficient learning processing by the coefficient learning apparatus in fig. 15.
Fig. 17 is a diagram showing an example of a code string output by the encoding apparatus in fig. 11.
Fig. 18 is a block diagram showing a functional configuration example of an encoding device.
Fig. 19 is a flowchart describing the encoding process.
Fig. 20 is a block diagram showing a functional configuration example of a decoding apparatus.
Fig. 21 is a flowchart describing the decoding process.
Fig. 22 is a flowchart describing the encoding process.
Fig. 23 is a flowchart describing the decoding process.
Fig. 24 is a flowchart describing the encoding process.
Fig. 25 is a flowchart describing the encoding process.
Fig. 26 is a flowchart describing the encoding process.
Fig. 27 is a flowchart describing the encoding process.
Fig. 28 is a block diagram showing a configuration example of the coefficient learning apparatus.
Fig. 29 is a flowchart describing the coefficient learning process.
Fig. 30 is a block diagram showing a configuration example of computer hardware to which the process of the present invention is applied by program execution.
Detailed Description
There is provided according to an exemplary embodiment of the present disclosure a band extending apparatus including: signal dividing means configured to divide an input signal into a plurality of subband signals; feature amount calculation means configured to calculate a feature amount representing a feature of the input signal using the input signal and at least one of the plurality of subband signals divided by the signal division means; a high-frequency subband power estimating means configured to calculate an estimated value of a high-frequency subband power that is a power of a subband signal having a band higher than that of the input signal, based on the feature amount calculated by the feature amount calculating means; and high-frequency signal component generating means configured to generate high-frequency signal components based on the plurality of subband signals divided by the signal dividing means and the estimated value of the high-frequency subband power calculated by the high-frequency subband power estimating means; wherein the frequency band of the input signal is extended using the high-frequency signal component generated by the high-frequency signal component generating means.
There is provided a band extending method according to an exemplary embodiment of the present disclosure, including: a signal dividing step arranged to divide an input signal into a plurality of subband signals; a feature amount calculating step configured to calculate a feature amount representing a feature of the input signal using the input signal and at least one of the plurality of subband signals divided by the processing in the signal dividing step; a high-frequency subband power estimating step configured to calculate an estimated value of a high-frequency subband power, which is a power of a subband signal having a band higher than that of the input signal, based on the feature amount calculated by the processing in the feature amount calculating step; and a high-frequency signal component generating step configured to generate a high-frequency signal component based on the plurality of subband signals divided by the processing in the signal dividing step and the estimated value of the high-frequency subband power calculated by the processing in the high-frequency subband power estimating step; wherein the frequency band of the input signal is extended using the high-frequency signal component generated by the processing in the high-frequency signal component generating step.
There is provided an encoding apparatus according to an exemplary embodiment of the present disclosure, including: a subband dividing means configured to divide an input signal into a plurality of subbands and generate a low-frequency subband signal composed of a plurality of subbands on a low-frequency side and a high-frequency subband signal composed of a plurality of subbands on a high-frequency side; feature amount calculation means configured to calculate a feature amount representing a feature of the input signal using at least one of the input signal and the low frequency subband signal generated by the subband dividing means; pseudo high frequency sub-band power calculation means configured to calculate a pseudo high frequency sub-band power as a pseudo power of the high frequency sub-band signal based on the feature amount calculated by the feature amount calculation means; pseudo high frequency sub-band power difference calculation means configured to calculate a high frequency sub-band power as a power of the high frequency sub-band signal from the high frequency sub-band signal generated by the sub-band division means, and calculate a pseudo high frequency sub-band power difference which is a difference with respect to the pseudo high frequency sub-band power calculated by the pseudo high frequency sub-band power calculation means; a high frequency encoding means configured to encode the pseudo high frequency sub-band power difference calculated by the pseudo high frequency sub-band power difference calculating means to generate high frequency encoded data; a low frequency encoding device configured to encode a low frequency signal that is a low frequency signal of the input signal to generate low frequency encoded data; and multiplexing means configured to multiplex the low frequency encoded data generated by the low frequency encoding means and the high frequency encoded data generated by the high frequency encoding means to obtain an output code string.
There is provided in accordance with an exemplary embodiment of the present disclosure an encoding method including: a subband dividing step configured to divide an input signal into a plurality of subbands and generate a low-frequency subband signal composed of a plurality of subbands on a low-frequency side and a high-frequency subband signal composed of a plurality of subbands on a high-frequency side; a feature amount calculating step configured to calculate a feature amount representing a feature of the input signal using at least one of the input signal and the low frequency subband signal generated by the processing in the subband dividing step; a pseudo high frequency sub-band power calculating step configured to calculate a pseudo high frequency sub-band power as a pseudo power of the high frequency sub-band signal based on the feature amount calculated by the processing in the feature amount calculating step; a pseudo high frequency sub-band power difference calculation step configured to calculate a high frequency sub-band power as a power of the high frequency sub-band signal from the high frequency sub-band signal generated by the processing in the sub-band division step, and calculate a pseudo high frequency sub-band power difference which is a difference with respect to the pseudo high frequency sub-band power calculated by the processing in the pseudo high frequency sub-band power calculation step; a high-frequency encoding step configured to encode the pseudo high-frequency subband power difference calculated in the processing in the pseudo high-frequency subband power difference calculating step to generate high-frequency encoded data; a low frequency encoding step configured to encode a low frequency signal that is a low frequency signal of the input signal to generate low frequency encoded data; and a multiplexing step configured to multiplex the low frequency encoded data generated by the processing in the low frequency encoding step and the high frequency encoded data generated by the processing in the high frequency encoding step to obtain an output code string.
There is provided a decoding apparatus according to an exemplary embodiment of the present disclosure, including: demultiplexing means configured to demultiplex input encoded data into at least low frequency encoded data and an index; a low frequency decoding device configured to decode the low frequency encoded data to generate a low frequency signal; a subband dividing device configured to divide a band of the low frequency signal into a plurality of low frequency subbands to generate a low frequency subband signal for each of the low frequency subbands; and generating means configured to generate the high frequency signal based on the index and the low frequency subband signal.
There is provided a decoding method according to an exemplary embodiment of the present disclosure, including: a demultiplexing step arranged to demultiplex the input encoded data into at least low frequency encoded data and an index; a low frequency decoding step arranged to decode the low frequency encoded data to generate a low frequency signal; a subband dividing step configured to divide a frequency band of the low frequency signal into a plurality of low frequency subbands to generate a low frequency subband signal of each of the low frequency subbands; and a generating step arranged to generate the high frequency signal based on the index and the low frequency subband signal.
There is provided a decoding apparatus according to an exemplary embodiment of the present disclosure, including: demultiplexing means configured to demultiplex input encoded data into low frequency encoded data and an index for obtaining an estimation coefficient used to generate a high frequency signal; a low frequency decoding device configured to decode the low frequency encoded data to generate a low frequency signal; a subband dividing means configured to divide a band of the low frequency signal into a plurality of low frequency subbands to generate a low frequency subband signal for each of the low frequency subbands; feature amount calculation means configured to calculate a feature amount representing a feature of the encoded data using at least one of the low frequency signal and the low frequency subband signal; a high-frequency subband power calculating means configured to calculate a high-frequency subband power of a high-frequency subband signal of the high-frequency subband by multiplying the feature amount by an estimation coefficient determined by the index of a plurality of the estimation coefficients prepared in advance with respect to each of a plurality of high-frequency subbands constituting a band of the high-frequency signal and obtaining a sum of the feature amounts multiplied by the estimation coefficient; and a high frequency signal generating device configured to generate the high frequency signal using the high frequency subband power and the low frequency subband signal.
There is provided a decoding method according to an exemplary embodiment of the present disclosure, including: a demultiplexing step configured to demultiplex input encoded data into low frequency encoded data and an index for obtaining an estimation coefficient used to generate a high frequency signal; a low frequency decoding step arranged to decode the low frequency encoded data to generate a low frequency signal; a subband dividing step configured to divide a frequency band of the low frequency signal into a plurality of low frequency subbands to generate a low frequency subband signal of each low frequency subband; a feature amount calculating step configured to calculate a feature amount representing a feature of the encoded data using at least one of the low frequency signal and the low frequency subband signal; a high-frequency subband power calculating step configured to calculate a high-frequency subband power of a high-frequency subband signal of the high-frequency subband by multiplying the feature amount by an estimation coefficient determined by the index of a plurality of the estimation coefficients prepared in advance with respect to each of a plurality of high-frequency subbands constituting a band of the high-frequency signal and obtaining a sum of the feature amounts multiplied by the estimation coefficient; and a high frequency signal generating step configured to generate the high frequency signal using the high frequency subband power and the low frequency subband signal.
Embodiments of the present invention will be described with reference to the accompanying drawings. Note that the description will be given in the following order.
1. First embodiment (case of applying the present invention to a band expanding device)
2. Second embodiment (case of applying the present invention to an encoding apparatus and a decoding apparatus)
3. Third embodiment (case of including coefficient index in high frequency encoded data)
4. Fourth embodiment (case of including coefficient index and pseudo high frequency subband power difference in high frequency encoded data)
5. Fifth embodiment (case of selecting coefficient index using estimated value)
6. Sixth embodiment (case of sharing a part of coefficients)
<1. First embodiment>
According to the first embodiment, a process of expanding a frequency band (hereinafter referred to as a band expansion process) is performed on a decoded low-frequency signal component obtained by decoding encoded data encoded using a high-frequency erasure coding method.
[ example of functional configuration of band expansion device ]
Fig. 3 shows a functional configuration example of a band extending apparatus to which the present invention is applied.
The band expanding device 10 takes the decoded low-frequency signal component as its input signal, performs band expansion processing on that input signal, and outputs the resulting band-expanded signal as an output signal.
The band expanding device 10 includes a low-pass filter 11, a delay circuit 12, a band-pass filter 13, a feature amount calculating circuit 14, a high-frequency subband power estimating circuit 15, a high-frequency signal generating circuit 16, a high-pass filter 17, and a signal adding unit 18.
The low-pass filter 11 filters the input signal at a predetermined cutoff frequency, and supplies a low-frequency signal component (i.e., a signal component of a low frequency) as a filtered signal to the delay circuit 12.
So that the low-frequency signal component from the low-pass filter 11 and a high-frequency signal component described later are synchronized when they are added, the delay circuit 12 delays the low-frequency signal component by a certain delay time and then supplies it to the signal adding unit 18.
The band pass filter 13 includes band pass filters 13-1 to 13-N, each having a different pass band. The band pass filter 13-i (1 ≤ i ≤ N) passes a signal of a predetermined pass band of the input signal as one of the plurality of subband signals, and supplies it to the feature amount calculating circuit 14 and the high frequency signal generating circuit 16.
The feature amount calculation circuit 14 calculates one or more feature amounts using at least one of the plurality of subband signals from the band pass filter 13 and the input signal, and supplies the feature amounts to the high frequency subband power estimation circuit 15. Here, a feature amount is information representing a signal feature of the input signal.
The high-frequency subband power estimating circuit 15 calculates an estimated value of the high-frequency subband power (i.e., the power of the high-frequency subband signal) of each high-frequency subband based on one or more feature amounts from the feature amount calculating circuit 14, and supplies them to the high-frequency signal generating circuit 16.
The high-frequency signal generation circuit 16 generates high-frequency signal components (i.e., signal components of the high frequency band) based on the plurality of subband signals from the band-pass filter 13 and the estimated values of the plurality of high-frequency subband powers from the high-frequency subband power estimation circuit 15, and supplies them to the high-pass filter 17.
The high-pass filter 17 filters the high-frequency signal component from the high-frequency signal generating circuit 16 at a cutoff frequency corresponding to the cutoff frequency in the low-pass filter 11, and supplies it to the signal adding unit 18.
The signal adding unit 18 adds the low-frequency signal component from the delay circuit 12 and the high-frequency signal component from the high-pass filter 17, and outputs it as an output signal.
Note that according to the configuration in fig. 3, the subband signal is obtained using the band pass filter 13, but the configuration is not limited to this, and for example, a band division filter such as that disclosed in PTL1 may be used.
In addition, similarly, according to the configuration in fig. 3, the subband signals are synthesized using the signal adding unit 18, but the configuration is not limited to this, and for example, a band synthesizing filter such as disclosed in PTL1 may be used.
[ band expansion processing of band expansion device ]
Next, a band extending process using the band extending apparatus in fig. 3 will be described with reference to a flowchart in fig. 4.
In step S1, the low-pass filter 11 filters the input signal at a predetermined cutoff frequency, and supplies a low-frequency signal component as a filtered signal to the delay circuit 12.
The low-pass filter 11 may set an arbitrary frequency as the cutoff frequency, but in the present embodiment, in which a predetermined frequency band is used as an extension start frequency band described later, the cutoff frequency is set to the frequency corresponding to the lower end of the extension start frequency band. Therefore, the low-pass filter 11 supplies a low-frequency signal component (a signal component of a frequency band lower than the extension start frequency band) as the filtered signal to the delay circuit 12.
In addition, the low-pass filter 11 may also set an optimum frequency as a cutoff frequency in accordance with an encoding parameter of the input signal, such as a high-frequency erasure encoding method, a bit rate, and the like. For example, side information used by the band extension method in PTL1 may be used as the encoding parameter.
In step S2, the delay circuit 12 delays the low-frequency signal component from the low-pass filter 11 by a certain delay time and supplies it to the signal adding unit 18.
In step S3, the band pass filter 13 (band pass filters 13-1 to 13-N) divides the input signal into a plurality of subband signals, and supplies each of the divided plurality of subband signals to the feature amount calculating circuit 14 and the high frequency signal generating circuit 16. Note that details of the process of dividing the input signal using the band-pass filter 13 will be described later.
In step S4, the feature amount calculation circuit 14 calculates one or more feature amounts using at least one of the input signal and the plurality of subband signals from the band pass filter 13, and supplies the feature amounts to the high frequency subband power estimation circuit 15. Note that details of the process of calculating the feature amount using the feature amount calculation circuit 14 will be described later.
In step S5, the high-frequency subband power estimating circuit 15 calculates estimated values of a plurality of high-frequency subband powers based on one or more feature amounts from the feature amount calculating circuit 14, and supplies the estimated values to the high-frequency signal generating circuit 16. Note that details of a process of calculating an estimated value of the high-frequency subband power using the high-frequency subband power estimating circuit 15 will be described later.
In step S6, the high-frequency signal generation circuit 16 generates high-frequency signal components based on the plurality of subband signals from the band-pass filter 13 and the estimated values of the plurality of high-frequency subband powers from the high-frequency subband power estimation circuit 15, and supplies the high-frequency signal components to the high-pass filter 17. The high-frequency signal component is here a signal component of a frequency band higher than the extension start frequency band. Note that details of the process of generating the high-frequency signal component using the high-frequency signal generating circuit 16 will be described later.
In step S7, the high-pass filter 17 filters the high-frequency signal component from the high-frequency signal generating circuit 16, thereby removing noise, such as aliasing components folded back into the low frequency range, that is included in the high-frequency signal component, and supplies the filtered high-frequency signal component to the signal adding unit 18.
In step S8, the signal addition unit 18 adds the low-frequency signal component from the delay circuit 12 and the high-frequency signal component from the high-pass filter 17, and outputs the added signal as an output signal.
According to the above process, the frequency band of the low-frequency signal component obtained by decoding can be extended.
Next, details of the processing of each of steps S3 to S6 in the flowchart of fig. 4 will be described.
[ details of processing of band-pass filter ]
First, details of the processing of the band-pass filter 13 in step S3 of the flowchart of fig. 4 will be described.
Note that, for the sake of explanation, the number N of the band pass filters 13 is hereinafter assumed to be N = 4.
For example, one of 16 subbands obtained by dividing the Nyquist frequency of the input signal into 16 equal parts is set as the extension start band, and, among the 16 subbands, the four subbands lower in frequency than the extension start band are set as the pass bands of the band pass filters 13-1 to 13-4, respectively.
Fig. 5 shows the position of each pass band of the band pass filters 13-1 to 13-4 on the frequency axis.
As shown in fig. 5, let sb denote the index of the first subband counted from the high-frequency side among the frequency bands (subbands) lower than the extension start band, sb-1 the index of the second subband, and sb-(I-1) the index of the I-th subband. The band pass filters 13-1 to 13-4 are then assigned, as their respective pass bands, the subbands with indices sb to sb-3 among the subbands lower than the extension start band.
Note that, in the present embodiment, the pass bands of the band pass filters 13-1 to 13-4 are described as four predetermined subbands among 16 subbands obtained by dividing the Nyquist frequency of the input signal into 16 equal parts; however, the arrangement is not limited to this, and the pass bands may be four predetermined subbands among 256 subbands obtained by dividing the Nyquist frequency of the input signal into 256 equal parts. In addition, the bandwidths of the band pass filters 13-1 to 13-4 may differ from each other.
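As an illustration of the subband layout described above, the following Python sketch lists the pass bands assigned to the band pass filters 13-1 to 13-4. The function names, the 0-based subband numbering (index 0 being the lowest subband), and the sampling rate used in the note below are assumptions made for this example, not values taken from the description.

```python
def subband_edges(sample_rate_hz, num_subbands=16):
    """Return (low_hz, high_hz) for equal-width subbands covering 0..Nyquist."""
    width = (sample_rate_hz / 2.0) / num_subbands
    return [(i * width, (i + 1) * width) for i in range(num_subbands)]


def low_band_passbands(sample_rate_hz, sb, num_subbands=16, num_filters=4):
    """Pass bands for filters 13-1..13-N, i.e. subbands sb, sb-1, ..., sb-(N-1).

    Assumption: subband indices are 0-based, with 0 the lowest subband.
    """
    if not (num_filters - 1 <= sb < num_subbands):
        raise ValueError("sb must leave room for num_filters subbands at or below it")
    edges = subband_edges(sample_rate_hz, num_subbands)
    return [edges[sb - i] for i in range(num_filters)]
```

With an assumed 32 kHz sampling rate and sb = 7, `low_band_passbands(32000, 7)` gives four 1 kHz wide bands, from 7 to 8 kHz for filter 13-1 down to 4 to 5 kHz for filter 13-4.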
[ details of processing of the characteristic amount calculating circuit ]
Next, details of the processing of the feature amount calculation circuit 14 in step S4 of the flowchart of fig. 4 will be described.
The feature amount calculation circuit 14 uses at least one of the input signal and the plurality of subband signals from the band pass filter 13 to calculate one or more feature amounts used by the high frequency subband power estimation circuit 15 to calculate a high frequency subband power estimated value.
More specifically, the characteristic amount calculation circuit 14 calculates the power of the subband signal of each subband (subband power (hereinafter, also referred to as low-frequency subband power)) as a characteristic amount from the four subband signals from the band pass filter 13, and supplies them to the high-frequency subband power estimation circuit 15.
That is, the feature amount calculation circuit 14 finds, from the four subband signals x(ib, n) supplied from the band pass filter 13, the low-frequency subband power power(ib, J) in a certain predetermined time frame J using the following expression (1). Here, ib denotes a subband index, and n denotes a discrete time index. Note that the number of samples in one frame is FSIZE, and the power is expressed in decibels.
[Expression 1]
\[ \mathrm{power}(ib, J) = 10 \log_{10} \left\{ \frac{1}{\mathrm{FSIZE}} \sum_{n = J \cdot \mathrm{FSIZE}}^{(J+1)\mathrm{FSIZE} - 1} x(ib, n)^{2} \right\} \qquad (sb-3 \le ib \le sb) \quad \cdots (1) \]
Thus, the low-frequency subband power power(ib, J) found by the feature amount calculation circuit 14 is supplied as a feature amount to the high-frequency subband power estimation circuit 15.
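The frame-wise power calculation can be sketched in Python as follows. This is a minimal illustration that assumes the frame power is the mean of the squared samples converted to decibels; the function name and the small `floor` value that guards against taking the logarithm of zero are assumptions of this example.

```python
import math


def subband_power_db(x, frame_index, frame_size, floor=1e-12):
    """Power in dB of one subband signal x over frame J = frame_index.

    Uses samples n = J*FSIZE .. (J+1)*FSIZE-1. The floor is an assumption
    added here so that a silent frame does not produce log10(0).
    """
    start = frame_index * frame_size
    frame = x[start:start + frame_size]
    if len(frame) != frame_size:
        raise ValueError("frame extends past the end of the signal")
    mean_square = sum(s * s for s in frame) / frame_size
    return 10.0 * math.log10(max(mean_square, floor))
```

Calling this once per subband signal (indices sb-3 to sb) for the same frame yields the four feature amounts passed to the estimation stage.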
[ details of processing using a high-frequency subband Power estimating Circuit ]
Next, details of the processing using the high-frequency subband power estimating circuit 15 in step S5 of the flowchart of fig. 4 will be described.
The high-frequency subband power estimating circuit 15 calculates an estimated value of subband power (high-frequency subband power) of a band (frequency expansion band) to be expanded above the subband (expansion start band) having the index sb +1 based on the four subband powers supplied from the characteristic amount calculating circuit 14.
That is, if it is assumed that the subband index of the highest band of the frequency extension band is eb, the high-frequency subband power estimating circuit 15 estimates the subband power of (eb-sb) subbands in which the indices are sb +1 to eb.
The estimated value power_est(ib, J) of the subband power of the frequency extension band with index ib is expressed, for example, by the following expression (2), using the four subband powers power(ib, J) supplied from the feature amount calculation circuit 14.
[Expression 2]
\[ \mathrm{power}_{est}(ib, J) = \left( \sum_{kb = sb-3}^{sb} A_{ib}(kb)\, \mathrm{power}(kb, J) \right) + B_{ib} \qquad (J \cdot \mathrm{FSIZE} \le n \le (J+1)\mathrm{FSIZE} - 1,\ sb+1 \le ib \le eb) \quad \cdots (2) \]
Here, in expression (2), the coefficients A_ib(kb) and B_ib take different values for each subband ib. The coefficients A_ib(kb) and B_ib are set appropriately so that good values are obtained for various input signals. In addition, when the subband sb is changed, the coefficients A_ib(kb) and B_ib are also changed to optimum values. Note that how the coefficients A_ib(kb) and B_ib are derived will be described later.
In expression (2), the high-frequency subband power estimated value is calculated in a linear combination using the power of each of the plurality of subband signals from the band pass filter 13, but the arrangement is not limited to this, and for example, the calculation may be performed using a linear combination of a plurality of low-frequency subband powers for several frames before and after the time frame J, or may be performed using a nonlinear function.
Thus, the high-frequency subband power estimated value calculated using the high-frequency subband power estimating circuit 15 is supplied to the high-frequency signal generating circuit 16.
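The linear-combination estimate of expression (2) amounts to a weighted sum plus an offset for each extension-band subband. A minimal Python sketch follows; the function name and the coefficient values used in the note are invented for illustration, since the learned coefficients are not given in this description.

```python
def estimate_high_band_power(low_powers_db, coeffs_a, coeff_b):
    """Estimate one high-frequency subband power as sum(A(kb) * power(kb)) + B.

    low_powers_db: the low-frequency subband powers for indices sb-3..sb.
    coeffs_a, coeff_b: per-subband coefficients (hypothetical values here).
    """
    if len(low_powers_db) != len(coeffs_a):
        raise ValueError("one coefficient is needed per low-frequency subband")
    return sum(a * p for a, p in zip(coeffs_a, low_powers_db)) + coeff_b
```

For example, with low-band powers of -20, -22, -25 and -30 dB, equal weights of 0.25 and an offset of -3, the estimate is -27.25 dB. A separate coefficient set would be applied for each index ib from sb+1 to eb.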
[ details of processing by the high-frequency signal generating circuit ]
Next, details of the processing of the high-frequency signal generating circuit 16 in step S6 of the flowchart of fig. 4 will be described.
The high-frequency signal generation circuit 16 calculates the low-frequency subband power power(ib, J) of each subband from the plurality of subband signals supplied from the band pass filter 13, based on the above expression (1). The high-frequency signal generation circuit 16 then uses the calculated low-frequency subband powers power(ib, J) and the high-frequency subband power estimated value power_est(ib, J), calculated by the high-frequency subband power estimation circuit 15 based on the above expression (2), to find the gain amount G(ib, J) according to the following expression (3).
[Expression 3]
\[ G(ib, J) = 10^{\{ \mathrm{power}_{est}(ib, J) - \mathrm{power}(sb_{map}(ib), J) \} / 20} \qquad (J \cdot \mathrm{FSIZE} \le n \le (J+1)\mathrm{FSIZE} - 1,\ sb+1 \le ib \le eb) \quad \cdots (3) \]
Here, in expression (3), sb_map(ib) denotes the index of the mapping-source subband when subband ib is taken as the mapping-destination subband, and is expressed by the following expression (4).
[Expression 4]
\[ sb_{map}(ib) = ib - 4\, \mathrm{INT}\left( \frac{ib - sb - 1}{4} + 1 \right) \qquad (sb+1 \le ib \le eb) \quad \cdots (4) \]
Note that in expression (4), INT(a) is a function that truncates the fractional part of the value a.
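The subband mapping can be illustrated with a short Python sketch. It assumes that the four low-frequency subbands sb-3 to sb are reused cyclically, so that sb+1 to sb+4 map back to sb-3 to sb, sb+5 to sb+8 map to the same four again, and so on; this is consistent with the four-band frequency shift described for expression (6). The function name and the `num_low` parameter are assumptions of this example.

```python
def sb_map(ib, sb, num_low=4):
    """Index of the low-frequency source subband for extension subband ib.

    Assumption: the num_low subbands just below the extension start band
    are repeated cyclically across the extension band. Integer floor
    division plays the role of INT() for the non-negative values used here.
    """
    if ib <= sb:
        raise ValueError("ib must lie in the frequency extension band (ib > sb)")
    return ib - num_low * ((ib - sb - 1) // num_low + 1)
```

With sb = 7, the extension subbands 8 to 15 map to the source subbands 4, 5, 6, 7, 4, 5, 6, 7.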
Next, the high-frequency signal generation circuit 16 calculates a gain-adjusted subband signal x2(ib, n) by multiplying the output of the band pass filter 13 by the gain amount G(ib, J) found by expression (3), using the following expression (5).
[Expression 5]
\[ x2(ib, n) = G(ib, J)\, x(sb_{map}(ib), n) \qquad (J \cdot \mathrm{FSIZE} \le n \le (J+1)\mathrm{FSIZE} - 1,\ sb+1 \le ib \le eb) \quad \cdots (5) \]
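The gain adjustment can be sketched in Python as follows. This example assumes that the gain is an amplitude factor chosen so that the scaled source subband takes on the estimated power; because the powers are in decibels of mean-square amplitude, the power difference is divided by 20. The function names are hypothetical.

```python
def gain_from_powers(power_est_db, power_src_db):
    """Amplitude gain that moves a subband from power_src_db to power_est_db.

    Assumption: both powers are 10*log10(mean square), so an amplitude
    factor g changes the power by 20*log10(g) dB.
    """
    return 10.0 ** ((power_est_db - power_src_db) / 20.0)


def apply_gain(frame, g):
    """Scale every sample of one frame of the source subband signal by g."""
    return [g * s for s in frame]
```

For instance, moving a source subband at -20 dB to an estimated -40 dB gives a gain of 0.1, and equal powers give a gain of 1.0.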
In addition, the high-frequency signal generation circuit 16 calculates, from the gain-adjusted subband signal x2(ib, n), a cosine-modulated gain-adjusted subband signal x3(ib, n) by performing cosine modulation from a frequency corresponding to the lower end frequency of the subband with index sb-3 to a frequency corresponding to the upper end frequency of the subband with index sb, using the following expression (6).
[Expression 6]
\[ x3(ib, n) = x2(ib, n) * 2\cos(n) * \{ 4(ib+1)\pi / 32 \} \qquad (sb+1 \le ib \le eb) \quad \cdots (6) \]
Note that in expression (6), π denotes the circular constant. Expression (6) indicates that the gain-adjusted subband signal x2(ib, n) is frequency-shifted toward the high-frequency side by an amount corresponding to four bands.
The high-frequency signal generation circuit 16 then calculates a high-frequency signal component x_high(n) from the gain-adjusted subband signals x3(ib, n) shifted toward the high-frequency side, using the following expression (7).
[Expression 7]
\[ x_{high}(n) = \sum_{ib = sb+1}^{eb} x3(ib, n) \quad \cdots (7) \]
Thus, the high-frequency signal generation circuit 16 generates a high-frequency signal component based on the four low-frequency subband powers calculated from the four subband signals from the band pass filter 13 and based on the high-frequency subband power estimated value from the high-frequency subband power estimation circuit 15, and supplies the high-frequency signal component to the high-pass filter 17.
According to the above processing, with respect to an input signal obtained after decoding encoded data encoded by the high-frequency deletion encoding method, by using low-frequency subband power calculated from a plurality of subband signals as a feature amount and calculating a high-frequency subband power estimated value based on the feature amount and a coefficient set appropriately, and generating a high-frequency signal component appropriately from the low-frequency subband power and the high-frequency subband power estimated value, it is possible to estimate frequency extension band subband power with high accuracy, and thus it is possible to play back a music signal with high sound quality.
The description has been given above for the following examples: the feature amount calculation circuit 14 calculates only the low-frequency subband power calculated from the plurality of subband signals as the feature amount, but in this case, depending on the type of the input signal, it may not be possible to estimate the subband power of the frequency extended band with high accuracy.
Thus, the feature amount calculation circuit 14 calculates a feature amount having a strong correlation with the form of the frequency-extended band sub-band power (the shape of the high-frequency power spectrum), whereby the frequency-extended band sub-band power can be estimated with high accuracy at the high-frequency sub-band power estimation circuit 15.
[ other examples of feature quantity calculated by the feature quantity calculation circuit ]
Fig. 6 shows, for a certain input signal, an example of the frequency characteristic of a vocal section, in which the vocal occupies a large part of the signal, together with the high-frequency power spectrum obtained by estimating the high-frequency subband power using only the low-frequency subband power as the feature amount.
As shown in fig. 6, in the frequency characteristic of a vocal section, the estimated high-frequency power spectrum is often positioned higher than that of the original signal. Unnaturalness in a human singing voice is easily perceived by the human ear, and therefore the high-frequency subband power needs to be estimated particularly accurately in vocal sections.
In addition, as shown in fig. 6, in the frequency characteristic of a vocal section, a large depression is often seen between 4.9 kHz and 11.025 kHz.
Now, an example will be described below in which the degree of concavity between 4.9 kHz and 11.025 kHz in the frequency domain is used as a feature quantity for estimating the high-frequency subband power in a vocal section. Note that the feature quantity indicating the degree of concavity is hereinafter referred to as the dip.
Next, a calculation example of the dip (J) in the time frame J will be described.
First, a 2048-point FFT (fast Fourier transform) is performed on the signal contained in a 2048-sample segment of the input signal spanning several frames before and after frame J, including frame J itself, and coefficients on the frequency axis are calculated. The power spectrum is obtained by converting the absolute value of each calculated coefficient to decibels.
Fig. 7 shows an example of a power spectrum obtained according to the above description. Now, in order to remove the minute components of the power spectrum, a homomorphic filtering process is performed to remove components of a frequency of, for example, 1.3kHz or less. According to the homomorphic filtering process, the respective dimensions of the power spectrum can be regarded as a time series, and the filtering process is performed by applying a low-pass filter, thereby smoothing the fine components of the spectral peaks.
Fig. 8 shows an example of a power spectrum of an input signal after homomorphic filtering. In the homomorphic-filtered power spectrum of fig. 8, the difference between the minimum value and the maximum value of the power spectrum within the range corresponding to 4.9 kHz to 11.025 kHz is set as the dip, dip(J).
Thus, a feature quantity having a strong correlation with the subband power of the frequency extension band is calculated. Note that the calculation of the dip dip(J) is not limited to the above example, and another method may be used.
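The final step of the dip calculation, taking the difference between the maximum and minimum of the smoothed spectrum within the 4.9 kHz to 11.025 kHz range, can be sketched in Python as follows. The sketch assumes that the power spectrum in decibels has already been computed and smoothed; the FFT and the smoothing filter are deliberately omitted. The function name and argument layout are assumptions of this example.

```python
def dip_db(power_spectrum_db, sample_rate_hz, fft_size=2048,
           f_lo_hz=4900.0, f_hi_hz=11025.0):
    """Max-minus-min of a (pre-smoothed) dB power spectrum within a band.

    power_spectrum_db[k] is assumed to hold the value for frequency
    k * sample_rate_hz / fft_size.
    """
    bin_hz = sample_rate_hz / fft_size
    in_band = [p for k, p in enumerate(power_spectrum_db)
               if f_lo_hz <= k * bin_hz <= f_hi_hz]
    if not in_band:
        raise ValueError("no spectrum bins fall inside the requested band")
    return max(in_band) - min(in_band)
```

A deep valley inside the band produces a large dip value, while a flat spectrum in that range produces a value near zero.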
Next, another example of calculating a feature quantity having a strong correlation with the sub-band power of the frequency extension band will be described.
[ still another example of feature quantity calculated using a feature quantity calculation circuit ]
For the frequency characteristics of an attack section (i.e., a section including an attack-type music signal), the high-frequency side power spectrum is generally approximately flat in a certain input signal, as described with reference to fig. 2. In the method of calculating only low-frequency subband power as the feature amount, the frequency expansion band subband power is estimated, but the feature amount representing the temporal variation unique to the input signal including the attack section is not used, and therefore, it is difficult to estimate, with high accuracy, the approximately flat frequency expansion band subband power seen in the attack section, for example.
Thus, in the following, an example will be described in which the low-frequency subband power temporal variation is used as a feature quantity used in estimating the high-frequency subband power of an attack segment.
The temporal variation power_d(J) of the low-frequency subband power in a certain time frame J is found using, for example, the following expression (8).
[Expression 8]
\[ \mathrm{power}_{d}(J) = \frac{ \sum_{ib = sb-3}^{sb} \mathrm{power}(ib, J) }{ \sum_{ib = sb-3}^{sb} \mathrm{power}(ib, J-1) } \quad \cdots (8) \]
According to expression (8), the temporal variation power_d(J) of the low-frequency subband power represents the ratio of the sum of the four low-frequency subband powers in time frame J to the sum of the four low-frequency subband powers in time frame (J-1), i.e., the frame immediately preceding time frame J. The larger this value, the larger the temporal variation of the power between frames, that is, the stronger the attack character of the signal included in time frame J is considered to be.
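The frame-to-frame ratio described above can be sketched in Python as follows. The function name is hypothetical, and the example values in the note are invented and chosen to be positive so that the ratio is easy to interpret.

```python
def power_temporal_change(powers_curr, powers_prev):
    """Ratio of summed low-frequency subband powers: frame J over frame J-1."""
    denom = sum(powers_prev)
    if denom == 0:
        raise ValueError("sum of previous-frame powers is zero")
    return sum(powers_curr) / denom
```

For example, if the four powers sum to 223 in frame J and to 111.5 in frame J-1, the feature value is 2.0; identical frames give 1.0.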
Also, comparing the statistical average power spectrum shown in fig. 1 with the power spectrum in the attack section (attack-type music signal) shown in fig. 2, the power spectrum in the attack section rises toward the right side in the middle frequency. Such frequency signatures are typically shown in attack fragments.
Now, an example of applying the slope of the intermediate frequency as a feature quantity used in estimating the high-frequency subband power of the attack section will be described below.
The slope slope(J) in the intermediate frequency of a certain time frame J is obtained using, for example, the following expression (9).
[Expression 9]
\[ \mathrm{slope}(J) = \frac{ \sum_{ib = sb-3}^{sb} \{ w(ib)\, \mathrm{power}(ib, J) \} }{ \sum_{ib = sb-3}^{sb} \mathrm{power}(ib, J) } \quad \cdots (9) \]
In expression (9), the coefficient w(ib) is a weighting coefficient adjusted so as to weight the higher-frequency subband powers more heavily. According to expression (9), slope(J) represents the ratio between the sum of the four low-frequency subband powers weighted toward the high frequencies and the sum of the four low-frequency subband powers. For example, in the case where the four low-frequency subband powers are powers corresponding to mid-frequency subbands, slope(J) takes a larger value when the mid-frequency power spectrum rises to the right, and a smaller value when it falls to the right.
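The weighted ratio described above can be sketched in Python as follows. The function name and the weights 1 to 4 used in the note are assumptions; the description only states that the weights favor the higher-frequency subbands.

```python
def slope(powers, weights):
    """Weighted sum of subband powers divided by their plain sum.

    powers and weights are ordered from subband sb-3 (lowest) to sb (highest);
    weights are assumed to increase with frequency.
    """
    if len(powers) != len(weights):
        raise ValueError("one weight is needed per subband power")
    total = sum(powers)
    if total == 0:
        raise ValueError("sum of subband powers is zero")
    return sum(w * p for w, p in zip(weights, powers)) / total
```

With weights 1, 2, 3, 4, a spectrum rising to the right such as 10, 20, 30, 40 gives 3.0, a flat spectrum gives 2.5, and a spectrum falling to the right such as 40, 30, 20, 10 gives 2.0, matching the behavior described for slope(J).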
In addition, the slope of the intermediate frequency often changes greatly before and after an attack section, so the temporal variation slope_d(J) of the slope, expressed by the following expression (10), may also be set as a feature quantity used in estimating the high-frequency subband power of the attack section.
[Expression 10]
\[ \mathrm{slope}_{d}(J) = \mathrm{slope}(J) / \mathrm{slope}(J-1) \qquad (J \cdot \mathrm{FSIZE} \le n \le (J+1)\mathrm{FSIZE} - 1) \quad \cdots (10) \]
Similarly, the temporal variation dip_d(J) of the above-described dip dip(J), expressed by the following expression (11), may also be set as a feature quantity used in estimating the high-frequency subband power of the attack section.
[Expression 11]
\[ \mathrm{dip}_{d}(J) = \mathrm{dip}(J) - \mathrm{dip}(J-1) \qquad (J \cdot \mathrm{FSIZE} \le n \le (J+1)\mathrm{FSIZE} - 1) \quad \cdots (11) \]
According to the above method, the feature quantities having a strong correlation with the frequency extension band sub-band power are calculated, and therefore, by using these, estimation of the frequency extension band sub-band power using the high frequency sub-band power estimation circuit 15 can be performed with higher accuracy.
The above describes an example of calculating a feature quantity having a strong correlation with the frequency extension band sub-band power, and the following describes an example of estimating the high-frequency sub-band power using the feature quantity thus calculated.
[ details of processing using a high-frequency subband Power estimating Circuit ]
Now, an example of estimating the high-frequency subband power using the dip described with reference to fig. 8 and the low-frequency subband power as the feature amount will be described.
That is, in step S4 in the flowchart of fig. 4, the feature amount calculation circuit 14 calculates low-frequency subband power and a dip as feature amounts for each subband from the four subband signals from the band pass filters 13, and supplies these to the high-frequency subband power estimation circuit 15.
In step S5, the high-frequency subband power estimating circuit 15 calculates an estimated value of the high-frequency subband power based on the dip and the four low-frequency subband powers from the feature amount calculating circuit 14.
Now, regarding the subband powers and dips, since the ranges (scales) of values that can be taken are different, the high-frequency subband power estimating circuit 15 performs a transform of the dip value as shown below, for example.
The high-frequency subband power estimating circuit 15 calculates, in advance and for a large number of input signals, the subband power of the highest band among the four low-frequency subband powers and the dip value, and finds the average value and the standard deviation of each. Here, the average of the subband power is denoted by power_ave, the standard deviation of the subband power by power_std, the average of the dip by dip_ave, and the standard deviation of the dip by dip_std.
The high-frequency subband power estimating circuit 15 uses these values to transform the dip value dip(J) as in the following expression (12), and obtains the transformed dip dip_s(J).
[Expression 12]
\[ \mathrm{dip}_{s}(J) = \frac{ \mathrm{dip}(J) - \mathrm{dip}_{ave} }{ \mathrm{dip}_{std} }\, \mathrm{power}_{std} + \mathrm{power}_{ave} \quad \cdots (12) \]
By performing the transform shown in expression (12), the high-frequency subband power estimating circuit 15 can transform the dip value dip(J) into a variable dip_s(J) whose statistical mean and variance are equivalent to those of the low-frequency subband power, so that the range of values the dip can take is approximately the same as the range of values the subband power can take.
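The rescaling of the dip described above can be sketched in Python as follows. This assumes the usual standardize-then-rescale form, in which the dip is first normalized by its own mean and standard deviation and then mapped onto the mean and standard deviation of the subband power. The function name and the statistics used in the note are invented for illustration.

```python
def normalize_dip(dip, dip_ave, dip_std, power_ave, power_std):
    """Map a dip value onto the statistical scale of the subband power.

    The four statistics are assumed to have been measured in advance
    over a large number of input signals.
    """
    if dip_std <= 0:
        raise ValueError("dip_std must be positive")
    return (dip - dip_ave) / dip_std * power_std + power_ave
```

For example, with a dip mean of 15 and standard deviation of 5, and a power mean of -40 and standard deviation of 10, a dip of 20 (one standard deviation above its mean) maps to -30, and a dip of 15 maps to -40.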
Using a linear combination of the four low-frequency subband powers power(ib, J) from the feature amount calculation circuit 14 and the dip dip_s(J) shown in expression (12), the estimated value power_est(ib, J) of the subband power with index ib in the frequency extension band is expressed, for example, by the following expression (13).
[Expression 13]
\[ \mathrm{power}_{est}(ib, J) = \left( \sum_{kb = sb-3}^{sb} C_{ib}(kb)\, \mathrm{power}(kb, J) \right) + D_{ib}\, \mathrm{dip}_{s}(J) + E_{ib} \qquad (J \cdot \mathrm{FSIZE} \le n \le (J+1)\mathrm{FSIZE} - 1,\ sb+1 \le ib \le eb) \quad \cdots (13) \]
Here, in expression (13), the coefficients C_ib(kb), D_ib, and E_ib take different values for each subband ib. The coefficients C_ib(kb), D_ib, and E_ib are set appropriately so that good values are obtained for various input signals. In addition, when the subband sb is changed, the coefficients C_ib(kb), D_ib, and E_ib are also changed to optimum values. Note that how the coefficients C_ib(kb), D_ib, and E_ib are derived will be described later.
In expression (13), the high-frequency subband power estimated value is calculated using a linear combination, but not limited to this, and may be calculated using a linear combination of a plurality of feature quantities of several frames before and after the time frame J, or may be calculated using, for example, a nonlinear function.
According to the above processing, the dip value specific to vocal sections is used as a feature amount in estimating the high-frequency subband power, so that the accuracy of the high-frequency subband power estimation in vocal sections is improved compared with the case where only the low-frequency subband power is used as the feature amount. This reduces the unnaturalness, easily perceived by the human ear, that arises when the high-frequency power spectrum is estimated to be larger than that of the original signal by the method using only the low-frequency subband power as the feature amount, and the music signal can therefore be played with higher sound quality.
Now, regarding the dip calculated as the feature amount using the above-described method (the degree of concavity in the frequency characteristic of a vocal section), in the case where the number of subband divisions is 16, the frequency resolution is low, and therefore this degree of concavity cannot be expressed using only the low-frequency subband power.
Here, the frequency resolution can be improved, and the degree of concavity can be expressed using only the low-frequency subband power, by increasing the number of subband divisions (for example, 16-fold, to 256 divisions), the number of band divisions performed by the band pass filter 13 (for example, 16-fold, to 64), and the number of low-frequency subband powers calculated by the feature amount calculation circuit 14 (for example, 16-fold, to 64).
Thus, it is conceivable that the high-frequency subband power may be estimated using only the low-frequency subband power with approximately the same accuracy as the estimation of the high-frequency subband power using the above-described dip as the feature amount.
However, by increasing the number of subband divisions, the number of band divisions, and the number of low frequency subband powers, the amount of computation increases. If we consider that the high-frequency subband power can be estimated with similar accuracy for each method, the method of estimating the high-frequency subband power without increasing the number of subband divisions and using the dip as the feature amount is more efficient from the viewpoint of the amount of calculation.
The above description has given the method of estimating the high-frequency subband power using the dip and the low-frequency subband power, but the feature amount used in the estimation of the high-frequency subband power is not limited to this combination, and one or more of the above-described feature amounts (low-frequency subband power, dip, low-frequency subband power temporal change, slope, temporal change of slope, and temporal change of dip) may be used. Thus, the accuracy of estimating the high-frequency subband power can be further improved.
In addition, as described above, by using as a feature amount a parameter specific to a section of the input signal in which the high-frequency subband power is difficult to estimate, the estimation accuracy for that section can be improved. For example, the low-frequency subband power temporal change, the slope, the temporal change in the slope, and the temporal change in the dip are parameters specific to the attack section, and by using these parameters as the feature quantities, the estimation accuracy of the high-frequency subband power in the attack section can be improved.
Note that, in the case where the estimation of the high-frequency subband power is performed using the feature quantities other than the low-frequency subband power and the dip (i.e., using the low-frequency subband power temporal change, the slope, the temporal change of the slope, and the temporal change of the dip), the high-frequency subband power can be estimated using the same method as described above.
Note that each calculation method of the feature amount shown here is not limited to the above-described method, but other methods may be used.
[ method of obtaining the coefficients C_ib(kb), D_ib, and E_ib ]
Next, a method of obtaining the coefficients C_ib(kb), D_ib, and E_ib in the above expression (13) will be described.
As a method of obtaining the coefficients C_ib(kb), D_ib, and E_ib, the following method is used: learning is performed in advance using a teaching signal having a wide band (hereinafter referred to as a broadband teaching signal) so that the coefficients C_ib(kb), D_ib, and E_ib are good values for various input signals when estimating the frequency extension band subband power, and the coefficients are determined based on the learning result.
When learning the coefficients C_ib(kb), D_ib, and E_ib, a coefficient learning device is used in which band pass filters having pass band widths similar to those of the band pass filters 13-1 to 13-4 described with reference to fig. 5 are disposed at frequencies higher than the extension start band. The coefficient learning device performs learning when the broadband teaching signal is input.
[ functional configuration example of coefficient learning device ]
Fig. 9 shows a functional configuration example of a coefficient learning device that performs learning of the coefficients C_ib(kb), D_ib, and E_ib.
It is favorable that the signal component of the broadband teaching signal input into the coefficient learning device 20 in fig. 9 that is lower in frequency than the extension start band be a signal encoded in the same format as the encoding format used for the band-limited input signal that is input into the band extending device 10 in fig. 3.
The coefficient learning device 20 includes a band-pass filter 21, a high-frequency subband power calculating circuit 22, a feature amount calculating circuit 23, and a coefficient estimating circuit 24.
The band pass filter 21 includes band pass filters 21-1 to 21-(K+N), each having a different pass band. The band pass filter 21-i (1 ≤ i ≤ K+N) passes the signal of a predetermined pass band of the input signal and supplies it, as one of a plurality of subband signals, to the high-frequency subband power calculating circuit 22 or the feature amount calculating circuit 23. Note that, of the band pass filters 21-1 to 21-(K+N), the band pass filters 21-1 to 21-K pass signals having frequencies higher than the extension start band.
The high-frequency subband power calculating circuit 22 calculates the high-frequency subband power of each subband for each specific time frame for the plurality of high-frequency subband signals from the band pass filter 21, and supplies them to the coefficient estimating circuit 24.
The feature amount calculation circuit 23 calculates the same feature amount as the feature amount calculated by the feature amount calculation circuit 14 of the band extending apparatus 10 in fig. 3 for each of the same specific time frames as the high frequency sub-band power is calculated by the high frequency sub-band power calculation circuit 22. That is, the characteristic amount calculation circuit 23 calculates one or more characteristic amounts using at least one of the broadband teaching signal and the plurality of subband signals from the band pass filter 21, and supplies the characteristic amounts to the coefficient estimation circuit 24.
The coefficient estimation circuit 24 estimates, for each specific time frame, a coefficient used by the high frequency subband power estimation circuit 15 of the band extending apparatus 10 in fig. 3 based on the high frequency subband power from the high frequency subband power calculation circuit 22 and the feature amount from the feature amount calculation circuit 23.
[ coefficient learning processing of coefficient learning apparatus ]
Next, the coefficient learning process of the coefficient learning apparatus in fig. 9 will be described with reference to the flowchart in fig. 10.
In step S11, the band-pass filter 21 divides the input signal (broadband teaching signal) into (K + N) sub-band signals. The band pass filters 21-1 to 21-K supply a plurality of subband signals having a frequency higher than the extension start band to the high frequency subband power calculating circuit 22. In addition, the band pass filters 21- (K + 1) to 21- (K + N) supply a plurality of subband signals having frequencies lower than the extension start band to the characteristic amount calculating circuit 23.
In step S12, the high frequency subband power calculating circuit 22 calculates the high frequency subband power (ib, J) of each subband for each specific time frame for the plurality of high frequency subband signals from the band pass filters 21 (band pass filters 21-1 to 21-K). The high-frequency subband power (ib, J) is found using expression (1) above. The high-frequency subband power calculating circuit 22 supplies the calculated high-frequency subband power to the coefficient estimating circuit 24.
In step S13, the characteristic amount calculation circuit 23 calculates the characteristic amount for each time frame that is the same as the specific time frame for which the high frequency sub-band power calculation circuit 22 calculates the high frequency sub-band power.
Note that, in the feature amount calculation circuit 14 of the band extending apparatus 10 in fig. 3, a description is given below of calculating four low-frequency subband powers and dips assuming that the four low-frequency subband powers and dips are calculated as feature amounts, and similarly to the feature amount calculation circuit 23 of the coefficient learning apparatus 20.
That is, the feature amount calculation circuit 23 calculates four low-frequency subband powers using four subband signals from the band pass filters 21 (band pass filters 21- (K + 1) to 21- (K + 4)) each having the same frequency band as the four subband signals input into the feature amount calculation circuit 14 of the band expanding device 10. In addition, the characteristic amount calculating circuit 23 calculates the dip from the broadband teaching signal, and calculates the dip dips (j) based on the above expression (12). The feature amount calculation circuit 23 supplies the calculated four low frequency subband powers and dip dips (j) as feature amounts to the coefficient estimation circuit 24.
In step S14, the coefficient estimation circuit 24 estimates the coefficients C_ib(kb), D_ib, and E_ib based on the (eb-sb) high-frequency subband powers and the feature amounts (four low-frequency subband powers and the dip dip_s(J)) supplied for the same time frame from the high-frequency subband power calculation circuit 22 and the feature amount calculation circuit 23. For example, for a certain high-frequency subband, the coefficient estimation circuit 24 sets the five feature amounts (four low-frequency subband powers and the dip dip_s(J)) as explanatory variables, sets the high-frequency subband power power(ib, J) as the explained variable, and performs regression analysis using the least squares method, thereby determining the coefficients C_ib(kb), D_ib, and E_ib in expression (13).
Note that, needless to say, the method of estimating the coefficients C_ib(kb), D_ib, and E_ib is not limited to the above-described method, and various general parameter identification methods may be used.
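As a rough illustration of the regression in step S14, the following Python sketch fits the coefficients for one high-frequency subband by ordinary least squares. It assumes that expression (13) is a linear combination of the four low-frequency subband powers and the dip plus a constant term. The function names `solve_linear_system` and `fit_subband_coefficients` and the synthetic data are hypothetical and are not taken from this description.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so that row operations apply to both sides.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_subband_coefficients(features, targets):
    """Least-squares fit of target ~ sum(C[k] * low_power[k]) + D * dip + E.

    features: one row per time frame J, [4 low-frequency subband powers, dip]
    targets:  high-frequency subband power of one subband ib for the same frames
    Returns (C, D, E).
    """
    rows = [list(f) + [1.0] for f in features]  # trailing 1.0 models E
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    w = solve_linear_system(xtx, xty)
    return w[:4], w[4], w[5]


# Synthetic, noise-free data generated from known (made-up) coefficients.
rng = random.Random(0)
true_c, true_d, true_e = [0.5, -0.2, 0.1, 0.3], -0.7, 2.0
features = [[rng.uniform(-30.0, 0.0) for _ in range(4)] + [rng.uniform(0.0, 10.0)]
            for _ in range(50)]
targets = [sum(c * p for c, p in zip(true_c, f[:4])) + true_d * f[4] + true_e
           for f in features]
c, d, e = fit_subband_coefficients(features, targets)
```

In this sketch the fit is repeated once per high-frequency subband ib, each time with the same explanatory variables and a different explained variable.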
According to the above-described processing, learning of coefficients for estimating high-frequency subband powers is performed in advance using a broadband teaching signal, whereby a good output result can be obtained for various input signals input in the band extending apparatus 10, and therefore, music signals can be played with higher sound quality.
Note that the coefficients A_ib(kb) and B_ib in the above expression (2) may also be obtained using the above coefficient learning method.
The above-described coefficient learning process is premised on the high-frequency subband power estimating circuit 15 of the band extending apparatus 10 calculating each estimated value of the high-frequency subband power as a linear combination of the four low-frequency subband powers and the dip. However, the high-frequency subband power estimating method in the high-frequency subband power estimating circuit 15 is not limited to this example. For example, the feature amount calculating circuit 14 may calculate one or more feature amounts other than the dip (low-frequency subband power temporal change, slope temporal change, and dip temporal change) to calculate the high-frequency subband power, a linear combination of a plurality of feature amounts from a plurality of frames before and after the time frame J may be used, or a nonlinear function may be used. That is, in the coefficient learning process, the coefficient estimation circuit 24 need only be able to calculate (learn) the coefficients under conditions similar to those under which the high-frequency subband power estimating circuit 15 of the band extending apparatus 10 calculates the high-frequency subband power, namely similar feature amounts, time frames, and function.
< 2> second embodiment
With the second embodiment, the encoding process and the decoding process using the high-frequency characteristic encoding method are performed using the encoding apparatus and the decoding apparatus.
[ example of functional configuration of encoding apparatus ]
Fig. 11 shows a functional configuration example of an encoding device to which the present invention is applied.
The encoding apparatus 30 includes a low-pass filter 31, a low-frequency encoding circuit 32, a sub-band dividing circuit 33, a feature amount calculating circuit 34, a pseudo high-frequency sub-band power calculating circuit 35, a pseudo high-frequency sub-band power difference calculating circuit 36, a high-frequency encoding circuit 37, a multiplexing circuit 38, and a low-frequency decoding circuit 39.
The low-pass filter 31 filters the input signal at a predetermined cutoff frequency, and supplies a signal having a frequency lower than the cutoff frequency (hereinafter referred to as a low-frequency signal) as a filtered signal to the low-frequency encoding circuit 32, the subband dividing circuit 33, and the feature amount calculating circuit 34.
The low frequency encoding circuit 32 encodes the low frequency signal from the low pass filter 31, and supplies the low frequency encoded data obtained as a result to the multiplexing circuit 38 and the low frequency decoding circuit 39.
The sub-band division circuit 33 equally divides the input signal and the low frequency signal from the low-pass filter 31 into a plurality of sub-band signals having a predetermined bandwidth, and supplies these sub-band signals to the feature amount calculation circuit 34 or the pseudo high frequency sub-band power difference calculation circuit 36. More specifically, the sub-band division circuit 33 supplies the plurality of sub-band signals obtained by taking the low frequency signal as input (hereinafter referred to as low frequency sub-band signals) to the feature amount calculation circuit 34. In addition, of the plurality of sub-band signals obtained by taking the input signal as input, the sub-band division circuit 33 supplies the sub-band signals having a frequency higher than the cutoff frequency set by the low-pass filter 31 (hereinafter referred to as high-frequency sub-band signals) to the pseudo high frequency sub-band power difference calculation circuit 36.
The feature amount calculation circuit 34 calculates one or more feature amounts using at least one of the low frequency signal from the low-pass filter 31 and any of the plurality of low frequency sub-band signals from the sub-band division circuit 33, and supplies the feature amounts to the pseudo high frequency sub-band power calculation circuit 35.
The pseudo high frequency sub-band power calculation circuit 35 generates pseudo high frequency sub-band power based on one or more feature amounts from the feature amount calculation circuit 34, and supplies the pseudo high frequency sub-band power to the pseudo high frequency sub-band power difference calculation circuit 36.
The pseudo high frequency sub-band power difference calculation circuit 36 calculates a pseudo high frequency sub-band power difference described later based on the high frequency sub-band signal from the sub-band division circuit 33 and the pseudo high frequency sub-band power from the pseudo high frequency sub-band power calculation circuit 35, and supplies the pseudo high frequency sub-band power difference to the high frequency encoding circuit 37.
The high frequency encoding circuit 37 encodes the pseudo high frequency sub-band power difference from the pseudo high frequency sub-band power difference calculation circuit 36, and supplies the high frequency encoded data obtained as a result to the multiplexing circuit 38.
The multiplexing circuit 38 multiplexes the low frequency encoded data from the low frequency encoding circuit 32 and the high frequency encoded data from the high frequency encoding circuit 37, and outputs the multiplexed data as an output code string.
The low frequency decoding circuit 39 optionally decodes the low frequency encoded data from the low frequency encoding circuit 32, and supplies the decoded data obtained as a result to the subband dividing circuit 33 and the feature amount calculating circuit 34.
[ encoding processing of encoding apparatus ]
Next, an encoding process using the encoding apparatus 30 in fig. 11 will be described with reference to the flowchart in fig. 12.
In step S111, the low-pass filter 31 filters the input signal at a predetermined cutoff frequency, and supplies a low-frequency signal as a filtered signal to the low-frequency encoding circuit 32, the subband dividing circuit 33, and the feature amount calculating circuit 34.
In step S112, the low frequency encoding circuit 32 encodes the low frequency signal from the low pass filter 31, and supplies the low frequency encoded data obtained as a result to the multiplexing circuit 38.
Note that, for the encoding of the low-frequency signal in step S112, it is sufficient to select an appropriate encoding format in accordance with the circuit scale and the required encoding efficiency, and the present invention does not depend on the encoding format.
In step S113, the subband dividing circuit 33 equally divides the input signal and the low frequency signal into a plurality of subband signals having a predetermined bandwidth. The subband dividing circuit 33 supplies a low-frequency subband signal obtained from the low-frequency signal as an input to the feature amount calculating circuit 34. In addition, of the plurality of sub-band signals obtained from the input signal as an input, the sub-band division circuit 33 supplies a high-frequency sub-band signal having a frequency band higher than the band-limited frequency set by the low-pass filter 31 to the pseudo high-frequency sub-band power difference calculation circuit 36.
In step S114, the feature amount calculation circuit 34 calculates one or more feature amounts using at least one of the low frequency signal from the low-pass filter 31 and any of the plurality of low frequency sub-band signals from the sub-band division circuit 33, and supplies the feature amounts to the pseudo high frequency sub-band power calculation circuit 35. Note that the feature amount calculating circuit 34 in fig. 11 basically has the same configuration and function as the feature amount calculating circuit 14 in fig. 3, and the processing in step S114 is basically the same as the processing in step S4 in the flowchart in fig. 4, so detailed description thereof will be omitted.
In step S115, the pseudo high frequency sub-band power calculation circuit 35 generates pseudo high frequency sub-band power based on the one or more feature amounts from the feature amount calculation circuit 34, and supplies the pseudo high frequency sub-band power to the pseudo high frequency sub-band power difference calculation circuit 36. Note that the pseudo high frequency subband power calculating circuit 35 in fig. 11 basically has the same configuration and function as the high frequency subband power estimating circuit 15 in fig. 3, and the processing in step S115 is basically the same as the processing in step S5 of the flowchart in fig. 4, so detailed description will be omitted.
In step S116, the pseudo high frequency sub-band power difference calculation circuit 36 calculates a pseudo high frequency sub-band power difference based on the high frequency sub-band signal from the sub-band division circuit 33 and the pseudo high frequency sub-band power from the pseudo high frequency sub-band power calculation circuit 35, and supplies the pseudo high frequency sub-band power difference to the high frequency encoding circuit 37.
More specifically, the pseudo high-frequency subband power difference calculating circuit 36 calculates the (high-frequency) subband power power(ib, J) of the high-frequency subband signal from the subband dividing circuit 33 within a certain time frame J. Note that, according to the present embodiment, the subbands of the low frequency subband signal and the subbands of the high frequency subband signal are all identified using the index ib. The calculation method of the subband power may be a method similar to that of the first embodiment, that is, the method using expression (1) may be applied.
Then, the pseudo high frequency subband power difference calculating circuit 36 obtains the difference power_diff(ib, J) (pseudo high frequency sub-band power difference) between the high frequency subband power power(ib, J) in the time frame J and the pseudo high frequency subband power power_lh(ib, J) from the pseudo high frequency subband power calculating circuit 35. The pseudo high frequency sub-band power difference power_diff(ib, J) is found using the following expression (14).
[ expression 14]
$$
\mathrm{power}_{diff}(ib, J) = \mathrm{power}(ib, J) - \mathrm{power}_{lh}(ib, J)
\qquad (J \cdot FSIZE \le n \le (J+1) \cdot FSIZE - 1,\; sb+1 \le ib \le eb)
$$
…(14)
In expression (14), the index sb +1 represents the minimum frequency subband index in the high frequency subband signal. In addition, the index eb denotes the maximum frequency subband index in the high frequency subband signal.
Thus, the pseudo high frequency sub-band power difference calculated using the pseudo high frequency sub-band power difference calculation circuit 36 is supplied to the high frequency encoding circuit 37.
In step S117, the high frequency encoding circuit 37 encodes the pseudo high frequency subband power difference from the pseudo high frequency subband power difference calculating circuit 36, and supplies the high frequency encoded data obtained as a result to the multiplexing circuit 38.
More specifically, the high frequency encoding circuit 37 determines to which of a plurality of preset clusters in the feature space of the pseudo high frequency sub-band power difference the vectorized pseudo high frequency sub-band power difference from the pseudo high frequency sub-band power difference calculation circuit 36 (hereinafter referred to as pseudo high frequency sub-band power difference vector) belongs. Here, the pseudo high frequency subband power difference vector in a certain time frame J is an (eb-sb)-dimensional vector whose elements are the values of the pseudo high frequency subband power difference power_diff(ib, J) for each index ib. The feature space for the pseudo high frequency subband power difference is likewise an (eb-sb)-dimensional space.
In the feature space for pseudo high frequency sub-band power difference, the high frequency encoding circuit 37 measures the distance between each representative vector of a plurality of preset clusters and the pseudo high frequency sub-band power difference vector, and finds an index for the cluster having the shortest distance (hereinafter referred to as pseudo high frequency sub-band power difference ID) and supplies it to the multiplexing circuit 38 as high frequency encoded data.
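The difference calculation of expression (14) and the cluster search of step S117 can be sketched in Python as follows. This is only an illustration under stated assumptions: the distance is taken to be the squared Euclidean distance (the description does not name a specific metric), cluster indices start at 0, and the function names `power_difference` and `nearest_cluster_id` are hypothetical.

```python
def power_difference(actual, pseudo):
    """Element-wise power(ib, J) - power_lh(ib, J) over the high-frequency subbands."""
    return [a - p for a, p in zip(actual, pseudo)]


def nearest_cluster_id(diff_vector, representatives):
    """Index of the representative vector closest to diff_vector.

    Assumption: squared Euclidean distance; 0-based index.
    """
    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    return min(range(len(representatives)),
               key=lambda i: sq_dist(diff_vector, representatives[i]))
```

For one time frame, the value returned by `nearest_cluster_id` plays the role of the pseudo high frequency sub-band power difference ID that is passed on as high frequency encoded data.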
In step S118, the multiplexing circuit 38 multiplexes the low frequency encoded data output from the low frequency encoding circuit 32 and the high frequency encoded data output from the high frequency encoding circuit 37, and outputs an output code string.
Now, as for an encoding apparatus for the high frequency characteristic encoding method, a technique is disclosed in japanese unexamined patent application publication No.2007-17908 in which a pseudo high frequency subband signal is generated from a low frequency subband signal, a pseudo high frequency subband signal power and a high frequency subband signal power are compared for each subband, a power gain of each subband is calculated to match the pseudo high frequency subband signal power and the high frequency subband signal power, and the power gain is included in a code string as high frequency characteristic information.
In contrast, according to the above-described processing, only the pseudo high frequency subband power difference ID has to be included in the output code string as information for estimating the high frequency subband power at the time of decoding. That is, in the case where the number of preset clusters is 64, for example, only 6 bits of information have to be added to the code string per time frame as information for decoding the high frequency signal in the decoding apparatus. Compared with the method disclosed in Japanese Unexamined Patent Application Publication No. 2007-17908, the amount of information included in the code string can be reduced, the encoding efficiency can be improved, and thus a music signal can be played with higher sound quality.
In addition, in the above-described processing, if there is leeway in the amount of calculation, the low frequency decoding circuit 39 may input the low frequency signal obtained by decoding the low frequency encoded data from the low frequency encoding circuit 32 to the subband dividing circuit 33 and the feature amount calculating circuit 34. In the decoding process by the decoding apparatus, a feature amount is calculated from the low frequency signal obtained by decoding the low frequency encoded data, and the high frequency subband power is estimated based on that feature amount. Therefore, if the encoding process likewise includes in the code string a pseudo high-frequency subband power difference ID calculated from a feature amount obtained from the decoded low-frequency signal, the high-frequency subband power can be estimated with higher accuracy in the decoding process by the decoding apparatus. Therefore, the music signal can be played with a higher sound quality.
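The 6-bit figure follows from signalling one of 64 cluster IDs with a fixed-length code. A minimal Python sketch of that arithmetic is given below; the fixed-length coding and the function name `id_bits_per_frame` are assumptions made for illustration.

```python
def id_bits_per_frame(num_clusters):
    """Bits needed to signal one cluster ID with a fixed-length binary code."""
    if num_clusters < 1:
        raise ValueError("num_clusters must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floating point.
    return (num_clusters - 1).bit_length()
```

With 64 clusters this gives 6 bits per time frame, and with 65 clusters it would give 7.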
[ example of functional configuration of decoding apparatus ]
Next, a functional configuration example of a decoding device corresponding to the encoding device 30 in fig. 11 will be described with reference to fig. 13.
The decoding device 40 includes a demultiplexing circuit 41, a low-frequency decoding circuit 42, a subband dividing circuit 43, a feature amount calculating circuit 44, a high-frequency decoding circuit 45, a decoded high-frequency subband power calculating circuit 46, a decoded high-frequency signal generating circuit 47, and a synthesizing circuit 48.
The demultiplexing circuit 41 demultiplexes the input code string into high frequency encoded data and low frequency encoded data, and supplies the low frequency encoded data to the low frequency decoding circuit 42 and the high frequency encoded data to the high frequency decoding circuit 45.
The low frequency decoding circuit 42 performs decoding of the low frequency encoded data from the demultiplexing circuit 41. The low frequency decoding circuit 42 supplies a low frequency signal obtained as a result of decoding (referred to herein as a decoded low frequency signal) to the subband dividing circuit 43, the feature amount calculating circuit 44, and the synthesizing circuit 48.
The subband dividing circuit 43 equally divides the decoded low frequency signal from the low frequency decoding circuit 42 into a plurality of subband signals having a predetermined bandwidth, and supplies the obtained subband signal (decoded low frequency subband signal) to the characteristic amount calculating circuit 44 and the decoded high frequency signal generating circuit 47.
The feature amount calculation circuit 44 calculates one or more feature amounts using at least one of the decoded low frequency signal from the low frequency decoding circuit 42 and any of the plurality of decoded low frequency subband signals from the subband dividing circuit 43, and supplies the feature amounts to the decoded high frequency subband power calculation circuit 46.
The high frequency decoding circuit 45 performs decoding of the high frequency encoded data from the demultiplexing circuit 41 and, using the pseudo high frequency subband power difference ID obtained as a result, supplies the coefficient for estimating high frequency subband power (hereinafter referred to as decoded high frequency subband power estimating coefficient) prepared in advance for each ID (index) to the decoded high frequency subband power calculating circuit 46.
The decoded high-frequency subband power calculating circuit 46 calculates a decoded high-frequency subband power based on the one or more feature amounts from the feature amount calculating circuit 44 and the decoded high-frequency subband power estimating coefficient from the high-frequency decoding circuit 45, and supplies the decoded high-frequency subband power to the decoded high-frequency signal generating circuit 47.
The decoded high-frequency signal generation circuit 47 generates a decoded high-frequency signal based on the decoded low-frequency subband signal from the subband dividing circuit 43 and the decoded high-frequency subband power from the decoded high-frequency subband power calculation circuit 46, and supplies the decoded high-frequency signal to the synthesis circuit 48.
The synthesis circuit 48 synthesizes the decoded low frequency signal from the low frequency decoding circuit 42 and the decoded high frequency signal from the decoded high frequency signal generation circuit 47, and outputs the synthesized signal as an output signal.
[ decoding processing of decoding apparatus ]
Next, a decoding process using the decoding apparatus in fig. 13 will be described with reference to the flowchart in fig. 14.
In step S131, the demultiplexing circuit 41 demultiplexes the input code string into high frequency encoded data and low frequency encoded data, supplies the low frequency encoded data to the low frequency decoding circuit 42, and supplies the high frequency encoded data to the high frequency decoding circuit 45.
In step S132, the low frequency decoding circuit 42 performs decoding of the low frequency encoded data from the demultiplexing circuit 41, and supplies the decoded low frequency signal obtained as a result to the subband dividing circuit 43, the feature amount calculating circuit 44, and the synthesizing circuit 48.
In step S133, the subband dividing circuit 43 equally divides the decoded low frequency signal from the low frequency decoding circuit 42 into a plurality of subband signals having a predetermined bandwidth, and supplies the obtained decoded low frequency subband signal to the characteristic amount calculating circuit 44 and the decoded high frequency signal generating circuit 47.
In step S134, the feature amount calculation circuit 44 calculates one or more feature amounts from at least one of the decoded low frequency signal from the low frequency decoding circuit 42 and any of the plurality of decoded low frequency subband signals from the subband dividing circuit 43, and supplies the feature amounts to the decoded high frequency subband power calculation circuit 46. Note that the feature amount calculating circuit 44 in fig. 13 basically has the same configuration and function as the feature amount calculating circuit 14 in fig. 3, and the processing in step S134 is basically the same as the processing in step S4 of the flowchart in fig. 4, and therefore, detailed description thereof will be omitted.
In step S135, the high frequency decoding circuit 45 performs decoding of the high frequency encoded data from the demultiplexing circuit 41 and, using the pseudo high frequency subband power difference ID obtained as a result, supplies the decoded high frequency subband power estimating coefficient prepared in advance for each ID (index) to the decoded high frequency subband power calculating circuit 46.
In step S136, the decoded high-frequency subband power calculating circuit 46 calculates the decoded high-frequency subband power based on the one or more feature amounts from the feature amount calculating circuit 44 and the decoded high-frequency subband power estimating coefficient from the high-frequency decoding circuit 45. Note that the decoded high-frequency subband power calculating circuit 46 in fig. 13 has substantially the same configuration and function as the high-frequency subband power estimating circuit 15 in fig. 3, and the processing in step S136 is substantially the same as the processing in step S5 of the flowchart in fig. 4, and therefore, detailed description thereof will be omitted.
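Steps S135 and S136 can be pictured as a table lookup followed by a per-subband estimate. The Python sketch below assumes a linear estimate (weights times feature amounts plus a constant), in line with the first embodiment. The table layout, the 0-based ID, and the function name `decode_high_band_power` are hypothetical and are chosen only for illustration.

```python
def decode_high_band_power(cluster_id, features, coefficient_table):
    """Estimate the decoded high-frequency subband powers for one time frame.

    coefficient_table[cluster_id] holds one (weights, bias) pair per
    high-frequency subband; weights has one entry per feature amount.
    """
    return [sum(w * f for w, f in zip(weights, features)) + bias
            for weights, bias in coefficient_table[cluster_id]]
```

The same table would have to be held by the decoding apparatus in advance, since only the ID travels in the code string.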
In step S137, the decoded high-frequency signal generation circuit 47 outputs a decoded high-frequency signal based on the decoded low-frequency subband signal from the subband dividing circuit 43 and the decoded high-frequency subband power from the decoded high-frequency subband power calculation circuit 46. Note that the decoded high-frequency signal generating circuit 47 in fig. 13 has substantially the same configuration and function as the high-frequency signal generating circuit 16 in fig. 3, and the processing in step S137 is substantially the same as the processing in step S6 of the flowchart in fig. 4, and thus detailed description thereof will be omitted.
In step S138, the synthesis circuit 48 synthesizes the decoded low frequency signal from the low frequency decoding circuit 42 and the decoded high frequency signal from the decoded high frequency signal generation circuit 47, and outputs the result as an output signal.
According to the above-described processing, by using, at the time of decoding, a high-frequency subband power estimating coefficient corresponding to a feature of a difference between the pseudo high-frequency subband power and the actual high-frequency subband power calculated in advance at the time of encoding, it is possible to improve the accuracy of estimating the high-frequency subband power at the time of decoding, and thus it is possible to play back a music signal with higher sound quality.
In addition, according to the above-described processing, the only information included in the code string for generating the high frequency signal is the pseudo high frequency subband power difference ID, which is a small amount of data, and therefore the decoding processing can be performed efficiently.
The above description has been made for the encoding process and the decoding process to which the present invention is applied, but a representative vector of each of a plurality of clusters in the feature space of the pseudo high-frequency subband power difference preset for the high-frequency encoding circuit 37 of the encoding apparatus 30 in fig. 11 and a calculation method of the decoded high-frequency subband power estimation coefficient output by the high-frequency decoding circuit 45 of the decoding apparatus 40 in fig. 13 will be described below.
[ calculation method of representative vectors of a plurality of clusters in the feature space of pseudo high-frequency subband power differences, and of the decoded high-frequency subband power estimation coefficients corresponding to each cluster ]
To find the representative vectors of the plurality of clusters and the decoded high frequency subband power estimating coefficients for each cluster, coefficients must be prepared that can accurately estimate the high frequency subband power at the time of decoding in accordance with the pseudo high frequency subband power difference vector calculated at the time of encoding. Therefore, a technique is applied in which learning is performed in advance using a broadband teaching signal and these are determined based on the learning result.
[ functional configuration example of coefficient learning device ]
Fig. 15 shows a functional configuration example of a coefficient learning device that performs learning of representative vectors of a plurality of clusters and decoded high-frequency subband power estimating coefficients for each cluster.
In the broadband teaching signal input to the coefficient learning device 50 in fig. 15, the signal components below the cutoff frequency set by the low-pass filter 31 of the encoding apparatus 30 are advantageously a decoded low-frequency signal, that is, the signal obtained when an input signal of the encoding apparatus 30 passes through the low-pass filter 31, is encoded by the low-frequency encoding circuit 32, and is further decoded by the low-frequency decoding circuit 42 of the decoding apparatus 40.
The coefficient learning device 50 includes a low-pass filter 51, a sub-band division circuit 52, a feature amount calculation circuit 53, a pseudo high-frequency sub-band power calculation circuit 54, a pseudo high-frequency sub-band power difference calculation circuit 55, a pseudo high-frequency sub-band power difference clustering circuit 56, and a coefficient estimation circuit 57.
Note that each of the low-pass filter 51, the sub-band dividing circuit 52, the feature amount calculating circuit 53, and the pseudo high-frequency sub-band power calculating circuit 54 of the coefficient learning apparatus 50 in fig. 15 has substantially the same configuration and function as the corresponding low-pass filter 31, sub-band dividing circuit 33, feature amount calculating circuit 34, and pseudo high-frequency sub-band power calculating circuit 35 in the encoding apparatus 30 in fig. 11, and therefore, description thereof will be appropriately omitted.
Note that the pseudo high frequency sub-band power difference calculation circuit 55 also has a configuration and a function similar to those of the pseudo high frequency sub-band power difference calculation circuit 36 in fig. 11, except that the calculated pseudo high frequency sub-band power difference is supplied to the pseudo high frequency sub-band power difference clustering circuit 56, and the high frequency sub-band power calculated in the course of calculating the pseudo high frequency sub-band power difference is supplied to the coefficient estimation circuit 57.
The pseudo high-frequency sub-band power difference clustering circuit 56 clusters the pseudo high-frequency sub-band power difference vectors obtained from the pseudo high-frequency sub-band power difference of the pseudo high-frequency sub-band power difference calculation circuit 55, and calculates a representative vector for each cluster.
The coefficient estimation circuit 57 calculates a high-frequency subband power estimation coefficient for each cluster that has been clustered using the pseudo high-frequency subband power difference clustering circuit 56, based on the high-frequency subband power from the pseudo high-frequency subband power difference calculating circuit 55 and one or more feature amounts from the feature amount calculating circuit 53.
[ coefficient learning processing of coefficient learning apparatus ]
Next, a coefficient learning process using the coefficient learning apparatus 50 in fig. 15 will be described with reference to a flowchart in fig. 16.
Note that the processing in steps S151 to S155 in the flowchart in fig. 16 is similar to the processing in steps S111 and S113 to S116 in the flowchart in fig. 12 except that the signal input into the coefficient learning device 50 is a broadband teaching signal, and thus the description thereof will be omitted.
In step S156, the pseudo high frequency sub-band power difference clustering circuit 56 clusters a large number of pseudo high frequency sub-band power difference vectors (for a large number of time frames) obtained from the pseudo high frequency sub-band power differences from the pseudo high frequency sub-band power difference calculating circuit 55 into, for example, 64 clusters, and calculates a representative vector for each cluster. K-means clustering, for example, may be used as the clustering method. The pseudo high-frequency subband power difference clustering circuit 56 sets the centroid vector of each cluster obtained as a result of performing k-means clustering as the representative vector of that cluster. Note that the clustering method and the number of clusters are not limited to those described above, and other methods may be used.
In addition, the pseudo high frequency subband power difference clustering circuit 56 measures the distance between the pseudo high frequency subband power difference vector obtained from the pseudo high frequency subband power difference calculating circuit 55 in the time frame J and each of the 64 representative vectors, and determines the index CID(J) of the cluster to which the representative vector having the shortest distance belongs. Note that the index CID(J) takes an integer value from 1 to the number of clusters (64 in this example). The pseudo high frequency sub-band power difference clustering circuit 56 thus outputs the representative vectors and supplies the index CID(J) to the coefficient estimation circuit 57.
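The clustering of step S156 and the subsequent index assignment can be sketched in plain Python as follows. This is a minimal k-means under stated assumptions, not the circuit's actual procedure: initial centroids are drawn at random from the data, the distance is the squared Euclidean distance, the iteration count is fixed, and the names `k_means` and `cluster_index` are hypothetical.

```python
import random


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def nearest(vec, centroids):
    """0-based position of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))


def k_means(vectors, k, iterations=20, seed=0):
    """Return k centroid (representative) vectors for the given vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the previous centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def cluster_index(vec, centroids):
    """1-based cluster index, matching the range 1..number of clusters."""
    return nearest(vec, centroids) + 1
```

In the learning device, `k_means` would be run once over the difference vectors of all time frames with k = 64, and `cluster_index` would then be evaluated for each time frame J.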
In step S157, from the many combinations of feature amounts and (eb-sb) high-frequency subband powers supplied for the same time frame from the feature amount calculation circuit 53 and the pseudo high-frequency subband power difference calculation circuit 55, the coefficient estimation circuit 57 calculates the decoded high-frequency subband power estimation coefficients for each cluster, using each group of combinations having the same index CID(J) (belonging to the same cluster). Note that the method for calculating the coefficients using the coefficient estimation circuit 57 is similar to that of the coefficient estimation circuit 24 of the coefficient learning device 20 in fig. 9, but it goes without saying that other methods may be used.
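The grouping in step S157 can be sketched as follows. To keep the Python example short, it is deliberately simplified to a single feature amount and a single high-frequency subband, so that each cluster's fit reduces to a closed-form straight-line regression; the actual processing uses several feature amounts and (eb-sb) subbands. The function names `fit_line` and `fit_per_cluster` are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Closed-form least squares for y ~ a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_per_cluster(cluster_ids, features, powers):
    """Fit one (a, b) pair per cluster from frames sharing the same cluster index.

    cluster_ids[J], features[J], and powers[J] all refer to the same time frame J.
    """
    groups = defaultdict(lambda: ([], []))
    for cid, x, y in zip(cluster_ids, features, powers):
        groups[cid][0].append(x)
        groups[cid][1].append(y)
    return {cid: fit_line(xs, ys) for cid, (xs, ys) in groups.items()}
```

Each cluster thus receives its own coefficient set, which corresponds to the coefficients that the decoding side looks up by ID.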
According to the above-described processing, the learning is performed in advance using the broadband teaching signal for the representative vector of each of the plurality of clusters in the feature space of the pseudo high-frequency subband power difference preset in the high-frequency encoding circuit 37 of the encoding apparatus 30 in fig. 11 and for the decoded high-frequency subband power estimation coefficient output in the high-frequency decoding circuit 45 of the decoding apparatus 40 in fig. 13, whereby good output results can be obtained with respect to various input signals input into the encoding apparatus 30 and various input code strings input into the decoding apparatus 40, and therefore, music signals can be played with higher sound quality.
In addition, the coefficient data for calculating the high frequency subband power in the pseudo high frequency subband power calculating circuit 35 of the encoding apparatus 30 and the decoded high frequency subband power calculating circuit 46 of the decoding apparatus 40 may be processed as follows for signal encoding and decoding. That is, by using coefficient data which differs according to the type of input signal, the coefficient thereof can be recorded at the beginning of the code string.
For example, by modifying the coefficient data in accordance with a signal for speech or jazz music or the like, the encoding efficiency can be improved.
Fig. 17 shows the code string obtained in this way.
The code string A in fig. 17 is a code string obtained by encoding speech, and the coefficient data α optimal for speech is recorded in the header.
In contrast, the code string B in fig. 17 is a code string obtained by encoding jazz music, and the coefficient data β optimal for jazz music is recorded in the header.
Such a plurality of types of coefficient data may be prepared by learning similar types of music signals in advance, and the encoding apparatus 30 may select coefficient data using genre information (e.g., genre information recorded in the header of an input signal). Alternatively, the coefficient data may be selected by performing waveform analysis of the signal to determine the genre. That is, such a genre analysis method for a signal is not limited to a specific method.
In addition, if the calculation time allows, the above-described learning device may be built into the encoding device 30, processing may be performed using coefficients dedicated to the signal being encoded, and those coefficients may be recorded in the header, as shown last in the code string C of fig. 17.
Next, the advantages of using this method will be described.
There are multiple locations in an input signal where the form of the high frequency sub-band power is similar. By using such features that many input signals have and by learning coefficients for estimating the high-frequency subband power separately for each input signal, it is made possible to reduce redundancy caused by the presence of similar positions of the high-frequency subband power. In addition, high-frequency subband power estimation can be performed with higher accuracy than learning of a coefficient that statistically estimates high-frequency subband power using a plurality of signals.
In addition, as shown above, the following arrangement may be made: coefficient data learned from an input signal at the time of encoding is inserted once every several frames.
<3. third embodiment >
[ example of functional configuration of encoding apparatus ]
Note that, according to the above description, the pseudo high frequency subband power difference ID is output as high frequency encoded data from the encoding apparatus 30 to the decoding apparatus 40, but a coefficient index for obtaining a decoded high frequency subband power estimating coefficient may be set as the high frequency encoded data.
In such a case, the encoding device 30 is configured, for example, as shown in fig. 18. Note that in fig. 18, portions corresponding to the case in fig. 11 have the same reference numerals, and thus description thereof will be appropriately omitted.
The encoding apparatus 30 in fig. 18 is different from the encoding apparatus 30 in fig. 11 in that: the low frequency decoding circuit 39 is not provided, but other design points are the same.
With the encoding apparatus 30 in fig. 18, the feature amount calculation circuit 34 calculates the low frequency subband power as a feature amount using the low frequency subband signal supplied from the subband dividing circuit 33, and supplies it to the pseudo high frequency subband power calculation circuit 35.
Further, the plurality of decoded high-frequency subband power estimating coefficients obtained in advance by regression analysis and the coefficient index for identifying such decoded high-frequency subband power estimating coefficients are correlated and recorded in the pseudo high-frequency subband power calculating circuit 35.
Specifically, sets of the coefficient A_ib(kb) and the coefficient B_ib for each subband, which are used in the calculation of expression (2) above, are prepared in advance as the decoded high frequency subband power estimation coefficients. For example, the coefficient A_ib(kb) and the coefficient B_ib are found in advance by regression analysis using the least square method, in which the low-frequency subband power is used as an explanatory variable and the high-frequency subband power is used as an explained variable. In the regression analysis, an input signal composed of a low-frequency subband signal and a high-frequency subband signal is used as a broadband teaching signal.
For each recorded decoded high frequency sub-band power estimation coefficient, the pseudo high frequency sub-band power calculation circuit 35 calculates the pseudo high frequency sub-band power of each high frequency side sub-band using that coefficient and the feature amount from the feature amount calculation circuit 34, and supplies the pseudo high frequency sub-band power to the pseudo high frequency sub-band power difference calculation circuit 36.
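A least-squares regression of this kind can be sketched as follows for a single high-frequency subband. This is a minimal illustration, not the learning procedure of the coefficient learning device itself: the function names, the toy data with two low-frequency subbands (the text uses the four subbands sb-3 to sb), and the choice of solving the normal equations by Gaussian elimination are all assumptions.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting.

    Assumes the matrix is square and non-singular.
    """
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_estimation_coefficients(low_frames, high_targets):
    """Fit high = sum_kb A[kb] * low[kb] + B by least squares.

    low_frames: one list of low-frequency subband powers per frame
                (explanatory variables).
    high_targets: the high-frequency subband power of the same frames
                  (explained variable).
    Returns (A, B).
    """
    rows = [list(frame) + [1.0] for frame in low_frames]  # trailing 1.0 models B
    n = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, high_targets)) for i in range(n)]
    w = solve_linear_system(xtx, xty)
    return w[:-1], w[-1]


# Toy teaching data generated from high = 0.5 * low0 - 0.25 * low1 + 3.
low_frames = [[1.0, 2.0], [2.0, 0.0], [0.0, 1.0], [3.0, 3.0], [4.0, 1.0]]
high_targets = [3.0, 4.0, 2.75, 3.75, 4.75]
coeff_a, coeff_b = fit_estimation_coefficients(low_frames, high_targets)
```

In the arrangement described in the text, one such fit would be made per high-frequency subband ib, and the resulting sets would be stored under a coefficient index.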
The pseudo high frequency sub-band power difference calculation circuit 36 compares the high frequency sub-band power obtained from the high frequency sub-band signal supplied from the sub-band division circuit 33 with the pseudo high frequency sub-band power from the pseudo high frequency sub-band power calculation circuit 35.
As a result of this comparison across the plurality of decoded high-frequency subband power estimating coefficients, the pseudo high-frequency subband power difference calculating circuit 36 supplies to the high-frequency encoding circuit 37 the coefficient index of the decoded high-frequency subband power estimating coefficient that yields the pseudo high-frequency subband power closest to the high-frequency subband power. In other words, the coefficient index selected is that of the decoded high-frequency subband power estimating coefficient with which the high-frequency signal of the input signal is reproduced most closely at the time of decoding (i.e., with which the decoded high-frequency signal closest to the true value is obtained).
[ encoding processing of encoding apparatus ]
Next, the encoding process performed by the encoding apparatus 30 of fig. 18 will be described with reference to the flowchart of fig. 19. Note that the processing in steps S181 to S183 is similar to steps S111 to S113 of fig. 12, and therefore, detailed description thereof will be omitted.
In step S184, the feature amount calculation circuit 34 calculates a feature amount using the low frequency subband signal from the subband dividing circuit 33, and supplies the feature amount to the pseudo high frequency subband power calculation circuit 35.
Specifically, the feature amount calculating circuit 34 performs the calculation in expression (1) described above and calculates, as the feature amount, the low-frequency subband power (ib, J) of the frame J (where 0 ≦ J) for each subband ib (where sb-3 ≦ ib ≦ sb) on the low-frequency side. That is, the low frequency subband power (ib, J) is calculated by taking the logarithm of the root mean square of the sample values of the samples of the low frequency subband signal constituting the frame J.
In step S185, the pseudo high frequency sub-band power calculating circuit 35 calculates the pseudo high frequency sub-band power based on the feature amount supplied from the feature amount calculating circuit 34, and supplies the pseudo high frequency sub-band power to the pseudo high frequency sub-band power difference calculating circuit 36.
For example, the pseudo high frequency sub-band power calculating circuit 35 performs the calculation in expression (2) described above, using the coefficient A_ib(kb) and the coefficient B_ib recorded in advance as the decoded high frequency subband power estimation coefficient and the low frequency subband power (kb, J) (where sb-3 ≦ kb ≦ sb), and calculates the pseudo high frequency subband power power_est(ib, J).
That is, the low frequency subband power (kb, J) of each low-frequency side subband provided as a feature amount is multiplied by the coefficient A_ib(kb) for that subband, and the coefficient B_ib is further added to the sum of the low frequency subband powers multiplied by the coefficients, giving the pseudo high frequency subband power power_est(ib, J). The pseudo high frequency subband power is calculated for each high frequency side subband with indices sb+1 through eb.
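The weighted sum just described can be sketched as follows. The function name, the data layout (one pair of A and B values per high-frequency subband), and the numeric values are illustrative assumptions and are not taken from the specification.

```python
def pseudo_high_band_powers(low_powers, coefficient_set):
    """Compute power_est(ib, J) for every high-frequency subband of one frame.

    low_powers: low-frequency subband powers for kb = sb-3 .. sb.
    coefficient_set: one (A_ib, B_ib) pair per high-frequency subband
                     ib = sb+1 .. eb, where A_ib is a list indexed like
                     low_powers (hypothetical layout).
    """
    return [
        sum(a * p for a, p in zip(coeff_a, low_powers)) + coeff_b
        for coeff_a, coeff_b in coefficient_set
    ]


# Toy values: four low-frequency subbands, two high-frequency subbands.
low_powers = [-20.0, -24.0, -30.0, -32.0]
coefficient_set = [
    ([0.5, 0.25, 0.0, 0.25], -4.0),
    ([0.25, 0.25, 0.25, 0.25], -10.0),
]
estimated = pseudo_high_band_powers(low_powers, coefficient_set)
```

With K coefficient sets recorded in advance, the encoder would call such a function once per coefficient index.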
In addition, the pseudo high frequency sub-band power calculation circuit 35 performs calculation of pseudo high frequency sub-band power for each decoded high frequency sub-band power estimation coefficient recorded in advance. For example, assume that the coefficient indexes are 1 to K (where 2 ≦ K), and K decoded high-frequency subband power estimating coefficients are prepared in advance. In this case, a pseudo high frequency subband power for each subband is calculated for each of the K decoded high frequency subband power estimation coefficients.
In step S186, the pseudo high frequency sub-band power difference calculation circuit 36 calculates a pseudo high frequency sub-band power difference based on the high frequency sub-band signal from the sub-band division circuit 33 and the pseudo high frequency sub-band power from the pseudo high frequency sub-band power calculation circuit 35.
Specifically, the pseudo high frequency subband power difference calculating circuit 36 performs calculation similar to the calculation in the above expression (1) for the high frequency subband signal from the subband dividing circuit 33, and calculates the high frequency subband power (ib, J) in the frame J. Note that, according to the present embodiment, the subbands of the low frequency subband signal and the subbands of the high frequency subband signal are all identified using the index ib.
Next, the pseudo high frequency subband power difference calculating circuit 36 performs calculation similar to the calculation in the above expression (14), and finds the difference between the high frequency subband power (ib, J) and the pseudo high frequency subband power power_est(ib, J) in the frame J. Thus, for each decoded high-frequency subband power estimating coefficient, a pseudo high-frequency subband power difference power_diff(ib, J) is obtained for each high-frequency side subband having indices sb+1 to eb.
In step S187, the pseudo high frequency sub-band power difference calculation circuit 36 calculates the following expression (15) for each decoded high frequency sub-band power estimation coefficient, and calculates the square sum of the pseudo high frequency sub-band power differences.
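[ expression 15]
$$E(J, id) = \sum_{ib=sb+1}^{eb} \left\{ \mathrm{power}_{diff}(ib, J, id) \right\}^{2} \qquad (15)$$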
Note that in expression (15), the difference square sum E (J, id) is the square sum of the pseudo high frequency subband power differences of frame J found for the decoded high frequency subband power estimating coefficient whose coefficient index is id. In addition, in expression (15), power_diff(ib, J, id) represents the pseudo high frequency subband power difference power_diff(ib, J) of frame J for the subband with index ib, found for the decoded high frequency subband power estimating coefficient whose coefficient index is id. The difference square sum E (J, id) is calculated for each of the K decoded high frequency subband power estimation coefficients.
The difference square sum E (J, id) thus obtained shows the similarity between the high frequency subband power calculated from the actual high frequency signal and the pseudo high frequency subband power calculated using the decoded high frequency subband power estimating coefficient whose coefficient index is id.
That is, the difference square sum represents the error of the estimated value with respect to the true value of the high-frequency subband power. Therefore, the smaller the difference square sum E (J, id), the closer the decoded high-frequency signal obtained by calculation using the decoded high-frequency subband power estimating coefficient is to the actual high-frequency signal. In other words, the decoded high-frequency subband power estimating coefficient having the smallest difference square sum E (J, id) can be said to be the estimating coefficient best suited to the band extending process performed at the time of decoding the output code string.
Thus, the pseudo high-frequency subband power difference calculating circuit 36 selects the difference square sum E (J, id) whose value is the smallest among the K difference square sums E (J, id), and supplies a coefficient index representing the decoded high-frequency subband power estimating coefficient corresponding to that difference square sum to the high-frequency encoding circuit 37.
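The selection in steps S186 and S187 can be sketched as follows, assuming the pseudo high-frequency subband powers have already been computed for every coefficient index. The function name, the dictionary layout, and the toy values are hypothetical; the text does not say how ties are broken, and this sketch simply keeps the first minimum.

```python
def select_coefficient_index(high_powers, pseudo_powers_by_id):
    """Return (best_id, sums) where sums[id] is the difference square sum E(J, id).

    high_powers: true high-frequency subband powers for ib = sb+1 .. eb.
    pseudo_powers_by_id: coefficient index -> pseudo high-frequency subband
                         powers for the same subbands (hypothetical layout).
    """
    sums = {}
    for coeff_id, pseudo_powers in pseudo_powers_by_id.items():
        # Squaring makes the result independent of the sign convention of power_diff.
        sums[coeff_id] = sum(
            (high - pseudo) ** 2
            for high, pseudo in zip(high_powers, pseudo_powers)
        )
    best_id = min(sums, key=sums.get)  # first minimum wins on ties (assumption)
    return best_id, sums


# Toy frame with three high-frequency subbands and K = 3 coefficient sets.
high_powers = [-30.0, -35.0, -40.0]
pseudo_powers_by_id = {
    1: [-28.0, -35.0, -43.0],
    2: [-30.0, -36.0, -40.0],
    3: [-25.0, -30.0, -35.0],
}
best_id, sums = select_coefficient_index(high_powers, pseudo_powers_by_id)
```

Only `best_id` would then be passed on for encoding as high-frequency encoded data.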
In step S188, the high frequency encoding circuit 37 encodes the coefficient index supplied from the pseudo high frequency subband power difference calculating circuit 36, and supplies the high frequency encoded data obtained as a result to the multiplexing circuit 38.
For example, in step S188, entropy encoding or the like is performed on the coefficient index. Thus, the information amount of the high frequency encoded data output to the decoding apparatus 40 can be compressed. Note that the high-frequency encoded data may be any type of information as long as the information can obtain the best decoded high-frequency subband power estimating coefficient, and, for example, the coefficient index may be used as the high-frequency encoded data without change.
In step S189, the multiplexing circuit 38 multiplexes the low frequency encoded data supplied from the low frequency encoding circuit 32 and the high frequency encoded data supplied from the high frequency encoding circuit 37, and outputs the output code string obtained as a result, and ends the encoding process.
Thus, by outputting the high frequency encoded data obtained by encoding the coefficient index together with the low frequency encoded data as an output code string, the decoding apparatus 40 receiving the input of the output code string can obtain the decoded high frequency subband power estimating coefficient optimal for the band extending process. Thus, a signal having a higher sound quality can be obtained.
[ example of functional configuration of decoding apparatus ]
In addition, the decoding device 40 that receives, as an input code string, the output code string output from the encoding device 30 in fig. 18 and decodes it is configured as shown in fig. 20, for example. Note that in fig. 20, portions corresponding to the case in fig. 13 have the same reference numerals, and a description thereof will be omitted.
The decoding apparatus 40 in fig. 20 is the same as the decoding apparatus 40 in fig. 13 in that it is composed of the demultiplexing circuit 41 through the synthesizing circuit 48, but differs from the decoding apparatus 40 in fig. 13 in that the decoded low-frequency signal from the low-frequency decoding circuit 42 is not supplied to the feature amount calculating circuit 44.
At the decoding device 40 in fig. 20, the high frequency decoding circuit 45 records in advance the same decoded high frequency subband power estimating coefficients as those recorded by the pseudo high frequency subband power calculating circuit 35 in fig. 18. That is, the sets of the coefficient A_ib(kb) and the coefficient B_ib found in advance by regression analysis, which serve as the decoded high-frequency subband power estimating coefficients, are recorded in association with the coefficient indices.
The high-frequency decoding circuit 45 decodes the high-frequency encoded data supplied from the demultiplexing circuit 41, and supplies the decoded high-frequency subband power estimating coefficient indicated by the coefficient index obtained as a result to the decoded high-frequency subband power calculating circuit 46.
[ decoding processing of decoding apparatus ]
Next, a decoding process performed using the decoding apparatus 40 in fig. 20 will be described with reference to the flowchart in fig. 21.
The decoding process is started when the output code string output from the encoding apparatus 30 is supplied to the decoding apparatus 40 as an input code string. Note that the processing in steps S211 to S213 is similar to that in steps S131 to S133 in fig. 14, and thus detailed description thereof will be omitted.
In step S214, the feature amount calculation circuit 44 calculates a feature amount using the decoded low frequency subband signal from the subband dividing circuit 43, and supplies the feature amount to the decoded high frequency subband power calculation circuit 46. Specifically, the feature amount calculation circuit 44 performs the calculation in expression (1) above, and calculates the low-frequency subband power (ib, J) of the frame J (where 0 ≦ J) for each low-frequency side subband ib as a feature amount.
In step S215, the high-frequency decoding circuit 45 performs decoding of the high-frequency encoded data supplied from the demultiplexing circuit 41, and supplies the decoded high-frequency subband power estimating coefficient shown by the coefficient index obtained as a result to the decoded high-frequency subband power calculating circuit 46. That is, of the plurality of decoded high-frequency subband power estimating coefficients recorded in advance in the high-frequency decoding circuit 45, the decoded high-frequency subband power estimating coefficient indicated by the coefficient index obtained by decoding is output.
In step S216, the decoded high-frequency subband power calculating circuit 46 calculates the decoded high-frequency subband power based on the feature amount supplied from the feature amount calculating circuit 44 and the decoded high-frequency subband power estimating coefficient supplied from the high-frequency decoding circuit 45, and supplies the decoded high-frequency subband power to the decoded high-frequency signal generating circuit 47.
That is, the decoded high-frequency subband power calculating circuit 46 performs the calculation in the above expression (2), using the coefficients A_ib(kb) and B_ib serving as the decoded high-frequency subband power estimating coefficient and the low-frequency subband power (kb, J) (where sb-3 ≦ kb ≦ sb) serving as the feature amount, and calculates the decoded high-frequency subband power. Thus, the decoded high-frequency subband power of each high-frequency side subband having indices sb+1 through eb is obtained.
In step S217, the decoded high-frequency signal generation circuit 47 generates a decoded high-frequency signal based on the decoded low-frequency subband signal supplied from the subband dividing circuit 43 and the decoded high-frequency subband power supplied from the decoded high-frequency subband power calculation circuit 46.
Specifically, the decoded high-frequency signal generation circuit 47 uses the decoded low-frequency subband signal to perform the calculation in expression (1) above, and calculates the low-frequency subband power of each low-frequency side subband. The decoded high-frequency signal generation circuit 47 then performs the calculation in expression (3) above using the obtained low-frequency subband power and decoded high-frequency subband power, and calculates the gain amount G (ib, J) for each high-frequency-side subband.
In addition, the decoded high-frequency signal generation circuit 47 uses the gain amount G (ib, J) and the decoded low-frequency subband signal to perform the calculations in the above-described expression (5) and expression (6), and generates a high-frequency subband signal x3 (ib, n) for each high-frequency-side subband.
That is, the decoded high-frequency signal generation circuit 47 subjects the decoded low-frequency subband signal x (ib, n) to amplitude adjustment in accordance with the ratio between the low-frequency subband power and the decoded high-frequency subband power, and further subjects the resulting decoded low-frequency subband signal x2 (ib, n) to frequency modulation. Thereby, the signal of the low-frequency side subband frequency component is converted into the frequency component signal of the high-frequency side subband, and the high-frequency subband signal x3 (ib, n) is obtained.
The processing by which the high frequency subband signal for each subband is obtained is described in detail below.
Assume that four sub-bands arranged consecutively in the frequency domain are referred to as a band block, and that the frequency band is divided such that the four sub-bands with indices sb-3 to sb on the low frequency side constitute one band block (hereinafter referred to in particular as the low frequency block). In this case, for example, the band composed of the subbands with indices sb+1 to sb+4 on the high frequency side is regarded as one band block. Note that, hereinafter, a band block on the high frequency side (i.e., one composed of subbands having an index of sb+1 or more) is referred to in particular as a high frequency block.
Let us now look at one of the subbands that make up the high-frequency block and generate a high-frequency subband signal for that subband (hereinafter referred to as the subband of interest). First, the decoded high-frequency signal generation circuit 47 identifies the sub-band of the low-frequency block that is in the same positional relationship with the position of the sub-band of interest in the high-frequency block.
For example, if the index of the sub-band of interest is sb +1, the sub-band of interest is a frequency band having the lowest frequency of the high frequency block, and thus, the sub-band of the low frequency block, which is in the same positional relationship with the sub-band of interest, becomes a sub-band whose index is sb-3.
Thus, when a sub-band of the low frequency block that is in the same positional relationship with the sub-band of interest is identified, the low frequency sub-band power and decoded low frequency sub-band signal for that sub-band and the decoded high frequency sub-band power for the sub-band of interest are used to generate a high frequency sub-band signal for the sub-band of interest.
That is, the decoded high frequency subband power and low frequency subband power are substituted in expression (3), and the gain amount according to the ratio of the powers thereof is calculated. The calculated gain amount is multiplied by the decoded low frequency subband signal, and further, the decoded low frequency subband signal multiplied by the gain amount is subjected to frequency modulation using the calculation in expression (6) to become a high frequency subband signal of the subband of interest.
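The positional mapping between a subband of interest and the low frequency block can be sketched as follows. The function name and the modulo formula are assumptions: the text gives only the example sb+1 to sb-3, and the sketch assumes that every further high frequency block of four subbands maps onto the same low frequency block in the same way.

```python
def matching_low_subband(ib, sb, block_size=4):
    """Map a high-frequency subband index ib (ib > sb) to the subband of the
    low frequency block (indices sb-3 .. sb) at the same position in its block."""
    if ib <= sb:
        raise ValueError("ib must be a high-frequency subband index (ib > sb)")
    # Position of the subband of interest inside its own high frequency block.
    position = (ib - (sb + 1)) % block_size
    # Lowest index of the low frequency block is sb - block_size + 1 (sb-3 for 4).
    return (sb - block_size + 1) + position


# Toy example with sb = 7: the low frequency block covers subbands 4 .. 7.
mapping = {ib: matching_low_subband(ib, sb=7) for ib in range(8, 16)}
```

For the subband of interest sb+1 this returns sb-3, which agrees with the example given in the text.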
With the above processing, a high-frequency subband signal of each high-frequency side subband is obtained. Subsequently, the decoded high-frequency signal generation circuit 47 also performs the calculation in expression (7) above, sums up the obtained respective high-frequency subband signals, and generates a decoded high-frequency signal. The decoded high-frequency signal generation circuit 47 supplies the obtained decoded high-frequency signal to the synthesis circuit 48, and the process proceeds from step S217 to step S218.
In step S218, the synthesis circuit 48 synthesizes the decoded low frequency signal from the low frequency decoding circuit 42 and the decoded high frequency signal from the decoded high frequency signal generation circuit 47, and outputs the synthesized signal as an output signal. Subsequently, the decoding process ends.
As described above, according to the decoding device 40, the coefficient index is obtained from the high-frequency encoded data obtained by demultiplexing the input code string, and the decoded high-frequency subband power is calculated using the decoded high-frequency subband power estimating coefficient shown by the coefficient index, whereby the estimation accuracy of the high-frequency subband power can be improved. Thus, the music signal can be played with a higher sound quality.
<4. fourth embodiment >
[ encoding processing of encoding apparatus ]
In the description above, only the coefficient index is included in the high frequency encoded data, but other information may be included as well.
For example, if coefficient indices are included in the high-frequency encoded data, the decoded high-frequency subband power estimating coefficient that obtains the decoded high-frequency subband power closest to the high-frequency subband power of the true high-frequency signal can be known on the decoding apparatus 40 side.
However, a difference having substantially the same value as the pseudo high frequency sub-band power difference power_diff(ib, J) calculated by the pseudo high frequency sub-band power difference calculating circuit 36 occurs between the true high-frequency subband power (true value) and the decoded high-frequency subband power (estimated value) on the decoding apparatus 40 side.
Now, if not only the coefficient index but also the pseudo high frequency subband power difference for each subband are contained in the high frequency encoded data, the total error of the decoded high frequency subband power with respect to the actual high frequency subband power can be known at the decoding apparatus 40 side. Thus, the error can be used to further improve the estimation accuracy of the high frequency subband power.
Next, an encoding process and a decoding process in the case where the pseudo high frequency subband power difference is included in the high frequency encoded data will be described with reference to flowcharts in fig. 22 and 23.
First, an encoding process performed using the encoding apparatus 30 in fig. 18 will be described with reference to a flowchart in fig. 22. Note that the processing in step S241 to step S246 is similar to the processing in step S181 to step S186 in fig. 19, and thus detailed description thereof will be omitted.
In step S247, the pseudo high frequency subband power difference calculating circuit 36 performs the calculation in the above expression (15), and calculates the difference square sum E (J, id) of each decoded high frequency subband power estimating coefficient.
The pseudo high-frequency subband power difference calculating circuit 36 selects the difference square sum which is the minimum value among the difference square sums E (J, id), and supplies a coefficient index showing a decoded high-frequency subband power estimating coefficient corresponding to the difference square sum to the high-frequency encoding circuit 37.
In addition, the pseudo high frequency subband power difference calculating circuit 36 supplies to the high frequency encoding circuit 37 the pseudo high frequency subband power difference power_diff(ib, J) of each subband, obtained for the decoded high frequency subband power estimating coefficient corresponding to the selected difference square sum.
In step S248, the high frequency encoding circuit 37 encodes the coefficient index and the pseudo high frequency subband power difference supplied from the pseudo high frequency subband power difference calculating circuit 36, and supplies the high frequency encoded data obtained as a result to the multiplexing circuit 38.
Thus, the pseudo high-frequency subband power difference (i.e., the estimation error of the high-frequency subband power) of each high-frequency side subband with indices sb+1 to eb is supplied to the decoding apparatus 40 as high-frequency encoded data.
When the high-frequency encoded data has been obtained, subsequently, the processing in step S249 is executed, and the encoding processing is ended, but the processing in step S249 is similar to the processing in step S189 of fig. 19, and thus detailed description thereof will be omitted.
As described above, when the pseudo high frequency subband power difference is included in the high frequency encoded data, the estimation accuracy of the high frequency subband power can be further improved at the decoding apparatus 40, and a music signal with higher sound quality can be obtained.
[ decoding processing of decoding apparatus ]
Next, the execution of the decoding process using the decoding apparatus 40 in fig. 20 will be described with reference to the flowchart in fig. 23. Note that the processing in steps S271 to S274 is similar to the processing in steps S211 to S214 of fig. 21, and thus detailed description thereof will be omitted.
In step S275, the high frequency decoding circuit 45 performs decoding of the high frequency encoded data supplied from the demultiplexing circuit 41. The high frequency decoding circuit 45 then supplies the decoded high frequency subband power estimating coefficient represented by the coefficient index obtained by decoding and the pseudo high frequency subband power difference for each subband obtained by decoding to the decoded high frequency subband power calculating circuit 46.
In step S276, the decoded high-frequency subband power calculating circuit 46 calculates the decoded high-frequency subband power based on the feature amount supplied from the feature amount calculating circuit 44 and the decoded high-frequency subband power estimating coefficient supplied from the high-frequency decoding circuit 45. Note that in step S276, processing similar to that in step S216 of fig. 21 is executed.
In step S277, the decoded high frequency subband power calculating circuit 46 adds the pseudo high frequency subband power difference supplied from the high frequency decoding circuit 45 to the decoded high frequency subband power, sets it as the final decoded high frequency subband power, and supplies it to the decoded high frequency signal generating circuit 47. That is, the decoded high frequency subband power for each calculated subband is added with the pseudo high frequency subband power difference for the same subband.
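The correction in step S277 can be sketched as follows. The names and values are illustrative, and two assumptions are made: the pseudo high-frequency subband power difference is taken as the true power minus the estimated power (expression (14) is not reproduced in this passage), and the difference reaches the decoder without quantization loss.

```python
def final_decoded_powers(decoded_powers, power_diffs):
    """Add the transmitted power difference of each subband to the decoded
    high-frequency subband power of the same subband."""
    return [power + diff for power, diff in zip(decoded_powers, power_diffs)]


# Encoder side (toy values): true powers and the estimate from the chosen coefficients.
true_powers = [-30.0, -36.0]
estimated_powers = [-28.0, -37.0]
# Assumed sign convention: difference = true value - estimated value.
power_diffs = [t - e for t, e in zip(true_powers, estimated_powers)]

# Decoder side: the same estimate is reproduced from the coefficient index,
# then corrected with the transmitted differences.
corrected = final_decoded_powers(estimated_powers, power_diffs)
```

Under these assumptions the corrected values coincide with the true powers; in practice the encoded difference and any difference between the estimates of the two devices would leave a residual error.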
Subsequently, the processing in step S278 and step S279 is performed, and the decoding processing is ended, but the processing here is the same as the processing in step S217 and step S218 in fig. 21, and thus description thereof will be omitted.
As described above, the decoding device 40 obtains the coefficient index and the pseudo high frequency subband power difference from the high frequency encoded data obtained by demultiplexing the input code string. The decoding apparatus 40 then calculates the decoded high frequency subband power using the decoded high frequency subband power estimation coefficient represented by the coefficient index and the pseudo high frequency subband power difference. Thus, the estimation accuracy of the high-frequency subband power can be improved, and the music signal can be played with higher sound quality.
Note that a difference in the estimated value of the high-frequency subband power occurring between the encoding apparatus 30 and the decoding apparatus 40, that is, a difference in the pseudo high-frequency subband power and the decoded high-frequency subband power (hereinafter referred to as an inter-apparatus estimated difference) may be considered.
In such a case, for example, the pseudo high frequency subband power difference serving as the high frequency encoded data may be corrected using the inter-device estimation difference, or the inter-device estimation difference may be included in the high frequency encoded data, and the pseudo high frequency subband power difference may be corrected by the inter-device estimation difference on the decoding device 40 side. In addition, it is possible to record the inter-device estimation difference on the decoding device 40 side in advance, wherein the decoding device 40 adds the inter-device estimation difference to the pseudo high frequency subband power difference and performs the correction. Thereby, a decoded high frequency signal closer to the actual high frequency signal can be obtained.
<5. fifth embodiment >
Note that the encoding device 30 in fig. 18 has been described such that the pseudo high frequency subband power difference calculating circuit 36 selects the optimum coefficient index from among a plurality of coefficient indices using the difference square sum E (J, id) as an index, but the coefficient index may be selected using an index other than the difference square sum.
For example, the coefficient index may be selected using, as an index, an evaluation value that takes into account the mean square value, maximum value, average value, and so forth of the residuals between the high-frequency subband powers and the pseudo high-frequency subband powers. In such a case, the encoding device 30 in fig. 18 executes the encoding process shown in the flowchart in fig. 24.
The encoding process using the encoding device 30 will be described below with reference to a flowchart in fig. 24. Note that the processing in step S301 to step S305 is similar to the processing in step S181 to step S185 in fig. 19, and thus detailed description thereof will be omitted. When the processing in step S301 to step S305 has been performed, the pseudo high frequency subband power for each subband is calculated for each of the K decoded high frequency subband power estimating coefficients.
In step S306, the pseudo high frequency subband power difference calculating circuit 36 calculates an evaluation value Res (id, J) for each of the K decoded high frequency subband power estimating coefficients using the current frame J subjected to the processing.
Specifically, the pseudo high frequency subband power difference calculating circuit 36 performs a calculation similar to that in the above expression (1) using the high frequency subband signal of each subband supplied from the subband dividing circuit 33, and calculates the high frequency subband power power(ib, J) in the frame J. Note that, according to the present embodiment, the subbands of the low frequency subband signal and the subbands of the high frequency subband signal are all identified using the index ib.
When the high-frequency subband power power(ib, J) has been obtained, the pseudo high frequency subband power difference calculating circuit 36 calculates the following expression (16), and calculates the residual mean square value Res_std(id, J).
[ expression 16]
$$\mathrm{Res}_{std}(id,J)=\sum_{ib=sb+1}^{eb}\left\{\mathrm{power}(ib,J)-\mathrm{power}_{est}(ib,id,J)\right\}^{2} \qquad \ldots(16)$$
That is, for each subband on the high frequency side with indices sb+1 to eb, the difference between the high frequency subband power power(ib, J) of the frame J and the pseudo high frequency subband power power_est(ib, id, J) is found, and the sum of the squares of these differences becomes the residual mean square value Res_std(id, J). Note that the pseudo high frequency subband power power_est(ib, id, J) denotes the pseudo high frequency subband power of frame J for the subband with index ib, found for the decoded high frequency subband power estimating coefficient with coefficient index id.
Next, the pseudo high frequency subband power difference calculating circuit 36 calculates the following expression (17), and calculates the residual maximum value Res_max(id, J).
[ expression 17]
$$\mathrm{Res}_{max}(id,J)=\max_{ib}\left\{\left|\mathrm{power}(ib,J)-\mathrm{power}_{est}(ib,id,J)\right|\right\} \qquad \ldots(17)$$
Note that in expression (17), max_ib{|power(ib, J) - power_est(ib, id, J)|} denotes the maximum of the absolute values of the differences between the high frequency subband power power(ib, J) and the pseudo high frequency subband power power_est(ib, id, J) over the subbands with indices sb+1 to eb. Thus, the maximum of the absolute values of the differences between the high frequency subband power power(ib, J) and the pseudo high frequency subband power power_est(ib, id, J) in frame J becomes the residual maximum value Res_max(id, J).
In addition, the pseudo high frequency subband power difference calculating circuit 36 calculates the following expression (18), and calculates the residual average value Res_ave(id, J).
[ expression 18]
$$\mathrm{Res}_{ave}(id,J)=\left|\frac{1}{eb-sb}\sum_{ib=sb+1}^{eb}\left\{\mathrm{power}(ib,J)-\mathrm{power}_{est}(ib,id,J)\right\}\right| \qquad \ldots(18)$$
That is, for each subband on the high frequency side with indices sb+1 to eb, the difference between the high frequency subband power power(ib, J) and the pseudo high frequency subband power power_est(ib, id, J) of the frame J is found, and the differences are summed. The absolute value of the value obtained by dividing the sum of the differences by the number of subbands on the high frequency side (eb-sb) becomes the residual average value Res_ave(id, J). The residual average value Res_ave(id, J) here represents the magnitude of the average of the estimation errors of the respective subbands, with the sign of each error taken into account.
In addition, when the residual mean square value Res_std(id, J), residual maximum value Res_max(id, J), and residual average value Res_ave(id, J) have been obtained, the pseudo high frequency subband power difference calculating circuit 36 calculates the following expression (19), thereby calculating the final evaluation value Res (id, J).
[ expression 19]
$$\mathrm{Res}(id,J)=\mathrm{Res}_{std}(id,J)+W_{max}\times\mathrm{Res}_{max}(id,J)+W_{ave}\times\mathrm{Res}_{ave}(id,J) \qquad \ldots(19)$$
That is, the residual mean square value Res_std(id, J), residual maximum value Res_max(id, J), and residual average value Res_ave(id, J) are added with weights, and the result becomes the final evaluation value Res (id, J). Note that in expression (19), W_max and W_ave are preset weights, and may be, for example, W_max = 0.5 and W_ave = 0.5.
The pseudo high frequency subband power difference calculating circuit 36 performs the above-described processing and calculates an evaluation value Res (id, J) for each of the K decoded high frequency subband power estimating coefficients, that is, for each of the K coefficient indices id.
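The calculation of step S306 for a single coefficient index can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit's actual implementation: the function name and the use of plain Python lists (one entry per high frequency subband sb+1 to eb) are hypothetical, and the default weights are the example values 0.5 given for expression (19).

```python
def evaluation_value(power, power_est, w_max=0.5, w_ave=0.5):
    """Return Res(id, J) for one coefficient index id in frame J.

    power     -- actual high frequency subband powers power(ib, J)
    power_est -- pseudo high frequency subband powers power_est(ib, id, J)
    Both lists cover the subbands sb+1 .. eb, so len(power) == eb - sb.
    """
    diffs = [p - e for p, e in zip(power, power_est)]
    res_std = sum(d * d for d in diffs)      # expression (16): sum of squared differences
    res_max = max(abs(d) for d in diffs)     # expression (17): largest absolute difference
    res_ave = abs(sum(diffs) / len(diffs))   # expression (18): |signed mean difference|
    return res_std + w_max * res_max + w_ave * res_ave  # expression (19)


# Example with three high frequency subbands (made-up values).
res = evaluation_value([10.0, 8.0, 6.0], [9.0, 8.0, 8.0])
```

In this example the differences are 1, 0, and -2, so the three terms are 5, 2, and 1/3 before weighting. Because the average keeps the sign of each difference, errors of opposite sign partly cancel in the third term, while the first two terms do not allow such cancellation.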
In step S307, the pseudo high frequency sub-band power difference calculation circuit 36 selects the coefficient index id based on the evaluation value Res (id, J) of each found coefficient index id.
The evaluation value Res (id, J) obtained using the above processing represents the similarity between the high frequency subband power calculated from the actual high frequency signal and the pseudo high frequency subband power calculated using the decoded high frequency subband power estimating coefficient in which the coefficient index is id. That is, this shows the magnitude of the high frequency component estimation error.
Therefore, the smaller the evaluation value Res (id, J), the closer the decoded high frequency signal obtained by the calculation using the decoded high frequency subband power estimating coefficient will be to the actual high frequency signal. Thus, the pseudo high-frequency subband power difference calculating circuit 36 selects the evaluation value which is the smallest among the K evaluation values Res (id, J), and supplies the coefficient index indicating the decoded high-frequency subband power estimating coefficient corresponding to the evaluation value to the high-frequency encoding circuit 37.
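The selection in step S307 amounts to taking the index of the smallest of the K evaluation values. A minimal sketch, assuming zero-based coefficient indices and that the first index wins on a tie (neither is specified in the description; the function name is hypothetical):

```python
def select_coefficient_index(evaluation_values):
    """Return the coefficient index id whose evaluation value is smallest.

    evaluation_values[id] holds Res(id, J) for id = 0 .. K-1.
    Zero-based indices and first-wins tie breaking are assumptions.
    """
    if not evaluation_values:
        raise ValueError("at least one evaluation value is required")
    return min(range(len(evaluation_values)), key=lambda i: evaluation_values[i])


# Example with K = 4 candidate coefficient sets (made-up values).
best_id = select_coefficient_index([6.2, 3.1, 4.7, 3.1])
```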
When the coefficient index is output to the high-frequency encoding circuit 37, the processing in step S308 and step S309 is subsequently performed, and the encoding processing is ended, but the processing is similar to the processing in step S188 and step S189 of fig. 19, and thus detailed description thereof will be omitted.
As described above, the encoding device 30 uses the evaluation value Res (id, J) calculated from the residual mean square value Res_std(id, J), residual maximum value Res_max(id, J), and residual average value Res_ave(id, J) to select the coefficient index of the optimum decoded high frequency subband power estimating coefficient.
By using the evaluation value Res (id, J), the estimation accuracy of the high-frequency subband power can be evaluated using more evaluation scales than the case of using the sum of squared differences, and thus, a more accurate decoded high-frequency subband power estimation coefficient can be selected. Thus, with the decoding device 40 that receives the input of the output code string, it is possible to obtain the decoded high-frequency subband power estimating coefficient that is optimal for the band extending process, and it is possible to obtain a signal with higher sound quality.
< modification 1>
In addition, when the above-described encoding process is performed for each frame of the input signal, a different coefficient index may be selected for each successive frame in a constant region, that is, a region having little temporal variation in the high-frequency subband power of each high-frequency-side subband of the input signal.
That is, for successive frames constituting a constant region of the input signal, the high-frequency subband power of each frame has approximately the same value, and therefore the same coefficient index should be selected continuously for these frames. However, within such a run of consecutive frames, the coefficient index selected may change from frame to frame, and as a result the high-frequency component of the audio played back on the decoding apparatus 40 side may cease to be constant. The played audio may then sound unnatural to the listener.
Accordingly, when the coefficient index is selected with the encoding apparatus 30, the estimation result of the high-frequency component of the temporally previous frame may also be taken into account. In such a case, the encoding device 30 in fig. 18 executes the encoding process shown in the flowchart in fig. 25.
Next, an encoding process using the encoding device 30 will be described with reference to a flowchart in fig. 25. Note that the processing in steps S331 to S336 is similar to that in steps S301 to S306 of fig. 24, and thus detailed description thereof will be omitted.
In step S337, the pseudo high frequency sub-band power difference calculation circuit 36 calculates an evaluation value ResP (id, J) using the past frame and the current frame.
Specifically, the pseudo high frequency subband power difference calculating circuit 36 records the pseudo high frequency subband power for each subband obtained using the decoded high frequency subband power estimating coefficient of the coefficient index finally selected for frame (J-1) (i.e., the frame one frame temporally prior to frame J to be processed). Here, the finally selected coefficient index is the coefficient index that is encoded by the high frequency encoding circuit 37 and output to the decoding device 40.
Hereinafter, the coefficient index id selected in the frame (J-1) will be referred to as id_selected(J-1). In addition, the description will continue with the pseudo high frequency subband power of the subband with index ib (where sb+1 ≤ ib ≤ eb), obtained using the decoded high frequency subband power estimating coefficient of the coefficient index id_selected(J-1), referred to as power_est(ib, id_selected(J-1), J-1).
The pseudo high-frequency subband power difference calculating circuit 36 first calculates the following expression (20) to calculate an estimated residual mean square value ResP_std(id, J).
[ expression 20]
$$\mathrm{ResP}_{std}(id,J)=\sum_{ib=sb+1}^{eb}\left\{\mathrm{power}_{est}(ib,id_{selected}(J-1),J-1)-\mathrm{power}_{est}(ib,id,J)\right\}^{2} \qquad \ldots(20)$$
That is, for each subband on the high frequency side with indices sb+1 to eb, the difference between the pseudo high frequency subband power power_est(ib, id_selected(J-1), J-1) of frame (J-1) and the pseudo high frequency subband power power_est(ib, id, J) of frame J is found. The sum of the squares of these differences then becomes the estimated residual mean square value ResP_std(id, J). Note that the pseudo high frequency subband power power_est(ib, id, J) denotes the pseudo high frequency subband power of frame J for the subband with index ib, found for the decoded high frequency subband power estimating coefficient with coefficient index id.
The estimated residual mean square value ResP_std(id, J) is the sum of the squared differences of the pseudo high frequency subband power between temporally successive frames, so the smaller the estimated residual mean square value ResP_std(id, J), the smaller the temporal variation in the high-frequency component estimation value.
Next, the pseudo high frequency subband power difference calculating circuit 36 calculates the following expression (21), thereby calculating an estimated residual maximum value ResP_max(id, J).
[ expression 21]
$$\mathrm{ResP}_{max}(id,J)=\max_{ib}\left\{\left|\mathrm{power}_{est}(ib,id_{selected}(J-1),J-1)-\mathrm{power}_{est}(ib,id,J)\right|\right\} \qquad \ldots(21)$$
Note that in expression (21), max_ib{|power_est(ib, id_selected(J-1), J-1) - power_est(ib, id, J)|} denotes the maximum of the absolute values of the differences between the pseudo high frequency subband power power_est(ib, id_selected(J-1), J-1) and the pseudo high frequency subband power power_est(ib, id, J) over the subbands with indices sb+1 to eb. Therefore, the maximum of the absolute values of the differences in pseudo high-frequency subband power between temporally successive frames becomes the estimated residual maximum value ResP_max(id, J).
The smaller the estimated residual maximum value ResP_max(id, J), the closer the estimation results of the high frequency component are between consecutive frames.
When the estimated residual maximum value ResP_max(id, J) has been obtained, the pseudo high frequency subband power difference calculating circuit 36 next calculates the following expression (22) to calculate an estimated residual average value ResP_ave(id, J).
[ expression 22]
$$\mathrm{ResP}_{ave}(id,J)=\left|\frac{1}{eb-sb}\sum_{ib=sb+1}^{eb}\left\{\mathrm{power}_{est}(ib,id_{selected}(J-1),J-1)-\mathrm{power}_{est}(ib,id,J)\right\}\right| \qquad \ldots(22)$$
That is, for each subband on the high frequency side with indices sb+1 to eb, the difference between the pseudo high frequency subband power power_est(ib, id_selected(J-1), J-1) of frame (J-1) and the pseudo high frequency subband power power_est(ib, id, J) of frame J is found. The absolute value of the value obtained by dividing the sum of the differences of the respective subbands by the number of subbands on the high frequency side (eb-sb) becomes the estimated residual average value ResP_ave(id, J). The estimated residual average value ResP_ave(id, J) here denotes the magnitude of the average of the inter-frame differences in the estimated values of the subbands, with the sign of each difference taken into account.
In addition, when the estimated residual mean square value ResP_std(id, J), estimated residual maximum value ResP_max(id, J), and estimated residual average value ResP_ave(id, J) have been obtained, the pseudo high frequency subband power difference calculating circuit 36 calculates the following expression (23), thereby calculating an evaluation value ResP(id, J).
[ expression 23]
$$\mathrm{ResP}(id,J)=\mathrm{ResP}_{std}(id,J)+W_{max}\times\mathrm{ResP}_{max}(id,J)+W_{ave}\times\mathrm{ResP}_{ave}(id,J) \qquad \ldots(23)$$
That is, the estimated residual mean square value ResP_std(id, J), estimated residual maximum value ResP_max(id, J), and estimated residual average value ResP_ave(id, J) are added with weights, and the result becomes the evaluation value ResP(id, J). Note that in expression (23), W_max and W_ave are preset weights, and may be, for example, W_max = 0.5 and W_ave = 0.5.
Thus, when the evaluation value ResP (id, J) using the past frame and the current frame has been calculated, the process proceeds from step S337 to step S338.
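The calculation of step S337 has the same shape as the within-frame evaluation value, except that both inputs are pseudo (estimated) powers: one from the coefficient index finally selected for frame (J-1), the other from a candidate index for frame J. A minimal sketch, assuming plain Python lists with one entry per high frequency subband and the example weights 0.5 (the function name and data layout are hypothetical):

```python
def inter_frame_evaluation(prev_est, cur_est, w_max=0.5, w_ave=0.5):
    """Return ResP(id, J) for one candidate coefficient index id.

    prev_est -- power_est(ib, id_selected(J-1), J-1), recorded for frame J-1
    cur_est  -- power_est(ib, id, J), computed for the candidate id in frame J
    Both lists cover the subbands sb+1 .. eb.
    """
    diffs = [p - c for p, c in zip(prev_est, cur_est)]
    resp_std = sum(d * d for d in diffs)     # expression (20)
    resp_max = max(abs(d) for d in diffs)    # expression (21)
    resp_ave = abs(sum(diffs) / len(diffs))  # expression (22)
    return resp_std + w_max * resp_max + w_ave * resp_ave  # expression (23)


# Example with two high frequency subbands (made-up values).
resp = inter_frame_evaluation([4.0, 4.0], [1.0, 4.0])
```

A candidate that reproduces the previous frame's estimate exactly yields ResP(id, J) = 0, so this term favors candidates that keep the estimated high frequency component steady from frame to frame.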
In step S338, the pseudo high frequency subband power difference calculating circuit 36 calculates the following expression (24), and calculates the final evaluation value Res_all(id, J).
[ expression 24]
$$\mathrm{Res}_{all}(id,J)=\mathrm{Res}(id,J)+W_{p}(J)\times\mathrm{ResP}(id,J) \qquad \ldots(24)$$
That is, the obtained evaluation value Res (id, J) and the evaluation value ResP (id, J) are added with a weight. Note that in expression (24), W_p(J) is a weight defined by, for example, the following expression (25).
[ expression 25]
In addition, power_r(J) in expression (25) is a value defined by the following expression (26).
[ expression 26]
Here, power_r(J) represents the average of the differences between the high frequency subband powers of frame (J-1) and frame J. In addition, according to expression (25), when power_r(J) is a value within a predetermined range near 0, W_p(J) becomes a value closer to 1 the smaller power_r(J) is, and when power_r(J) is a value greater than the predetermined range, W_p(J) becomes 0.
Now, when power_r(J) is a value within the predetermined range near 0, the average of the differences in high frequency subband power between successive frames is small to a certain degree. In other words, the temporal variation of the high frequency component of the input signal is small, and thus the current frame of the input signal is in a constant region.
The more stable the high frequency component of the input signal, the closer the weight W_p(J) is to 1; conversely, the less stable the high-frequency component, the closer the weight is to 0. Therefore, in the evaluation value Res_all(id, J) shown in expression (24), the smaller the temporal change of the high-frequency component of the input signal, the greater the contribution of the evaluation value ResP (id, J), which uses the result of comparison with the high-frequency component estimation result of the immediately preceding frame as an evaluation scale.
Consequently, in a constant region of the input signal, a decoded high-frequency subband power estimating coefficient that yields an estimation result close to that of the high-frequency component of the immediately preceding frame is selected, and audio can be played with more natural, higher sound quality on the decoding apparatus 40 side. Conversely, in a non-constant region of the input signal, the term for the evaluation value ResP (id, J) in the evaluation value Res_all(id, J) becomes 0, and a decoded high-frequency signal closer to the actual high-frequency signal is obtained.
The pseudo high-frequency subband power difference calculating circuit 36 performs the above processing, and calculates the evaluation value Res_all(id, J) for each of the K decoded high-frequency subband power estimating coefficients.
In step S339, the pseudo high frequency subband power difference calculating circuit 36 selects a coefficient index id based on the evaluation value Res_all(id, J) obtained for each decoded high frequency subband power estimating coefficient.
The evaluation value Res_all(id, J) obtained by the above-described processing is a linear combination of the evaluation value Res (id, J) and the evaluation value ResP (id, J) using the weight. As described above, the smaller the evaluation value Res (id, J), the closer the obtained decoded high-frequency signal is to the actual high-frequency signal. In addition, the smaller the evaluation value ResP (id, J), the closer the obtained decoded high-frequency signal is to the decoded high-frequency signal of the immediately preceding frame.
Therefore, the smaller the evaluation value Res_all(id, J), the more accurate the decoded high frequency signal that can be obtained. Thus, the pseudo high frequency subband power difference calculating circuit 36 selects the smallest of the K evaluation values Res_all(id, J), and supplies a coefficient index representing the decoded high frequency subband power estimating coefficient corresponding to that evaluation value to the high frequency encoding circuit 37.
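Steps S338 and S339 can be sketched together as follows. Because the weight of expression (25) is not reproduced in this text, the sketch simply takes W_p(J) as an input between 0 and 1; the function name, the list layout, and zero-based indices are likewise assumptions made for illustration.

```python
def select_with_history(res, resp, w_p):
    """Combine Res(id, J) and ResP(id, J) as in expression (24) and pick the
    coefficient index with the smallest combined value.

    res[id], resp[id] -- evaluation values for candidate id = 0 .. K-1
    w_p               -- weight W_p(J) for the current frame, supplied by the
                         caller (its defining expression is not modeled here)
    Returns (selected index, list of Res_all values).
    """
    res_all = [r + w_p * rp for r, rp in zip(res, resp)]
    best = min(range(len(res_all)), key=lambda i: res_all[i])
    return best, res_all


# Two candidates (made-up values): id 0 fits the current frame slightly
# better, id 1 stays much closer to the previous frame's estimate.
res, resp = [1.0, 1.2], [5.0, 0.5]
transient_choice, _ = select_with_history(res, resp, w_p=0.0)
steady_choice, steady_totals = select_with_history(res, resp, w_p=1.0)
```

With w_p = 0 (non-constant region) the choice depends on Res(id, J) alone, whereas with w_p = 1 (constant region) the candidate that is consistent with the previous frame is chosen even though its Res(id, J) is slightly larger.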
When the coefficient index has been selected, subsequently, the processing in step S340 and step S341 is performed, and the encoding processing is ended, but the processing here is similar to the processing in step S308 to step S309 in fig. 24, and thus detailed description thereof will be omitted.
As shown above, the encoding device 30 uses the evaluation value Res_all(id, J), obtained by linearly combining the evaluation value Res (id, J) and the evaluation value ResP (id, J), to select the coefficient index of the optimum decoded high frequency subband power estimating coefficient.
By using the evaluation value Res_all(id, J), similarly to the case of using the evaluation value Res (id, J), a more accurate decoded high frequency subband power estimating coefficient can be selected using more evaluation scales. Further, by using the evaluation value Res_all(id, J), temporal variation of the high-frequency component of the played signal in a constant region can be suppressed on the decoding apparatus 40 side, and a signal with higher sound quality can be obtained.
< modification 2>
Now, with regard to the band extension processing, if audio with higher sound quality is to be obtained, subbands further toward the low frequency side are more important from the viewpoint of hearing. That is, among the respective subbands on the high frequency side, the higher the estimation accuracy of the subbands close to the low frequency side, the higher the sound quality of the played audio.
Accordingly, when the evaluation value is calculated for each decoded high-frequency subband power estimating coefficient, weight may be placed on the subbands further toward the low frequency side. In such a case, the encoding device 30 in fig. 18 executes the encoding process shown in the flowchart of fig. 26.
The encoding process by the encoding device 30 will be described below with reference to a flowchart in fig. 26. Note that the processes in step S371 to step S375 are similar to those in step S331 to step S335 in fig. 25, and thus detailed description thereof will be omitted.
In step S376, the pseudo high-frequency subband power difference calculating circuit 36 calculates an evaluation value ResW_band(id, J) for each of the K decoded high-frequency subband power estimating coefficients using the current frame J to be processed.
Specifically, the pseudo high frequency subband power difference calculating circuit 36 performs calculation similar to that in the above expression (1) using the high frequency subband signals of the respective subbands supplied from the subband dividing circuit 33, thereby calculating the high frequency subband power (ib, J) in the frame J.
When the high-frequency subband power power(ib, J) has been obtained, the pseudo high-frequency subband power difference calculating circuit 36 calculates the following expression (27), and calculates the residual mean square value Res_stdW_band(id, J).
[ expression 27]
$$\mathrm{Res}_{std}W_{band}(id,J)=\sum_{ib=sb+1}^{eb}\left\{W_{band}(ib)\times\left\{\mathrm{power}(ib,J)-\mathrm{power}_{est}(ib,id,J)\right\}\right\}^{2} \qquad \ldots(27)$$
That is, for each of the high-frequency-side subbands with indices sb+1 to eb, the difference between the high-frequency subband power power(ib, J) and the pseudo high-frequency subband power power_est(ib, id, J) of the frame J is found, and the difference is multiplied by the weight W_band(ib) of the subband. The sum of the squares of the differences multiplied by the weight W_band(ib) becomes the residual mean square value Res_stdW_band(id, J).
Here, the weight W_band(ib) (where sb+1 ≤ ib ≤ eb) is defined, for example, by the following expression (28). The closer the subband is to the low frequency side, the larger the value of the weight W_band(ib) becomes.
[ expression 28]
Next, the pseudo high frequency subband power difference calculating circuit 36 calculates a residual maximum value Res_maxW_band(id, J). Specifically, the difference between the high-frequency subband power power(ib, J) and the pseudo high-frequency subband power power_est(ib, id, J) of each subband with indices sb+1 to eb is multiplied by the weight W_band(ib), and the maximum of the absolute values of the resulting values becomes the residual maximum value Res_maxW_band(id, J).
In addition, the pseudo high frequency subband power difference calculating circuit 36 calculates the residual average value Res_aveW_band(id, J).
Specifically, for each subband with indices sb+1 to eb, the difference between the high frequency subband power power(ib, J) and the pseudo high frequency subband power power_est(ib, id, J) is found and multiplied by the weight W_band(ib), and the sum of the differences multiplied by the weight W_band(ib) is found. The absolute value of the value obtained by dividing the sum of the differences by the number of subbands on the high-frequency side (eb-sb) is the residual average value Res_aveW_band(id, J).
In addition, the pseudo high frequency subband power difference calculating circuit 36 calculates an evaluation value ResW_band(id, J). That is, the sum of the residual mean square value Res_stdW_band(id, J), the residual maximum value Res_maxW_band(id, J) multiplied by the weight W_max, and the residual average value Res_aveW_band(id, J) multiplied by the weight W_ave is the evaluation value ResW_band(id, J).
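The band-weighted evaluation value of step S376 can be sketched as follows. The per-subband weights are passed in by the caller because expression (28) is not reproduced in this text; the function name, the list layout, and the example weights (larger toward the low frequency side) are assumptions for illustration only.

```python
def band_weighted_evaluation(power, power_est, w_band, w_max=0.5, w_ave=0.5):
    """Return ResW_band(id, J) for one coefficient index id in frame J.

    power, power_est -- actual and pseudo high frequency subband powers
    w_band           -- weight W_band(ib) for each subband sb+1 .. eb,
                        supplied by the caller (hypothetical values)
    """
    weighted = [w * (p - e) for w, p, e in zip(w_band, power, power_est)]
    res_std = sum(d * d for d in weighted)         # expression (27)
    res_max = max(abs(d) for d in weighted)        # weighted counterpart of (17)
    res_ave = abs(sum(weighted) / len(weighted))   # divided by eb - sb, as in the text
    return res_std + w_max * res_max + w_ave * res_ave


# Two subbands; the lower one (first entry) carries twice the weight.
resw = band_weighted_evaluation([10.0, 8.0], [8.0, 7.0], w_band=[2.0, 1.0])
```

Note that, following the description, the average is taken over the number of subbands (eb-sb) rather than over the sum of the weights.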
In step S377, the pseudo high frequency subband power difference calculating circuit 36 calculates an evaluation value ResPW_band(id, J) using the past frame and the current frame.
Specifically, the pseudo high frequency subband power difference calculating circuit 36 records the pseudo high frequency subband power of each subband obtained using the decoded high frequency subband power estimating coefficient of the finally selected coefficient index for the frame (J-1) one frame temporally preceding the frame J to be processed.
The pseudo high frequency subband power difference calculating circuit 36 first calculates the estimated residual mean square value ResP_stdW_band(id, J). That is, for each subband on the high frequency side with indices sb+1 to eb, the difference between the pseudo high frequency subband power power_est(ib, id_selected(J-1), J-1) and the pseudo high frequency subband power power_est(ib, id, J) is found and multiplied by the weight W_band(ib). The sum of the squares of the differences multiplied by the weight W_band(ib) is the estimated residual mean square value ResP_stdW_band(id, J).
Next, the pseudo high frequency subband power difference calculating circuit 36 calculates an estimated residual maximum value ResP_maxW_band(id, J). Specifically, the difference between the pseudo high frequency subband power power_est(ib, id_selected(J-1), J-1) and the pseudo high frequency subband power power_est(ib, id, J) of each subband with indices sb+1 to eb is multiplied by the weight W_band(ib), and the maximum of the absolute values of the resulting values is taken as the estimated residual maximum value ResP_maxW_band(id, J).
Next, the pseudo high frequency subband power difference calculating circuit 36 calculates an estimated residual average value ResP_aveW_band(id, J). Specifically, for each subband with indices sb+1 to eb, the difference between the pseudo high frequency subband power power_est(ib, id_selected(J-1), J-1) and the pseudo high frequency subband power power_est(ib, id, J) is found and multiplied by the weight W_band(ib). The absolute value of the value obtained by dividing the sum of the differences multiplied by the weight W_band(ib) by the number of subbands on the high-frequency side (eb-sb) is the estimated residual average value ResP_aveW_band(id, J).
In addition, the pseudo high frequency subband power difference calculating circuit 36 finds the sum of the estimated residual mean square value ResP_stdW_band(id, J), the estimated residual maximum value ResP_maxW_band(id, J) multiplied by the weight W_max, and the estimated residual average value ResP_aveW_band(id, J) multiplied by the weight W_ave, and takes this sum as the evaluation value ResPW_band(id, J).
In step S378, the pseudo high frequency subband power difference calculating circuit 36 adds the evaluation value ResW_band(id, J) and the evaluation value ResPW_band(id, J) multiplied by the weight W_p(J) of expression (25), and calculates the final evaluation value Res_allW_band(id, J). The evaluation value Res_allW_band(id, J) is calculated here for each of the K decoded high frequency subband power estimating coefficients.
Subsequently, the processing in steps S379 to S381 is executed, and the encoding processing is ended, but the processing here is similar to the processing in steps S339 to S341 in fig. 25, and thus detailed description thereof will be omitted. Note that in step S379, of the K coefficient indices, the one having the smallest evaluation value Res_allW_band(id, J) is selected.
Thus, each subband is weighted so that more weight is placed on subbands further toward the low frequency side, whereby audio with higher sound quality can be obtained on the decoding apparatus 40 side.
Note that, in the above description, the decoded high frequency subband power estimating coefficient is selected based on the evaluation value Res_allW_band(id, J), but it may instead be selected based on the evaluation value ResW_band(id, J).
< modification 3>
In addition, human hearing has a characteristic of better sensing a frequency band when the amplitude (power) of the frequency band is large, and thus an evaluation value may be calculated for each decoded high-frequency subband power estimating coefficient so that a weight is placed on a subband having a large power.
In such a case, the encoding device 30 in fig. 18 executes the encoding process shown in the flowchart in fig. 27. The encoding process using the encoding device 30 will be described below with reference to a flowchart in fig. 27. Note that the processing in steps S401 to S405 is similar to the processing in steps S331 to S335 in fig. 25, and thus detailed description thereof will be omitted.
In step S406, the pseudo high frequency subband power difference calculating circuit 36 calculates an evaluation value ResW_power(id, J) for each of the K decoded high frequency subband power estimating coefficients using the current frame J to be processed.
Specifically, the pseudo high frequency subband power difference calculating circuit 36 performs calculation similar to the above expression (1) using the high frequency subband signal of each subband supplied from the subband dividing circuit 33, thereby calculating the high frequency subband power (ib, J) in the frame J.
When the high-frequency subband power power(ib, J) has been obtained, the pseudo high-frequency subband power difference calculating circuit 36 calculates the following expression (29), thereby calculating the residual mean square value Res_stdW_power(id, J).
[ expression 29]
$$\mathrm{Res}_{std}W_{power}(id,J)=\sum_{ib=sb+1}^{eb}\left\{W_{power}(\mathrm{power}(ib,J))\times\left\{\mathrm{power}(ib,J)-\mathrm{power}_{est}(ib,id,J)\right\}\right\}^{2} \qquad \ldots(29)$$
That is, for each subband on the high frequency side with indices sb+1 to eb, the difference between the high frequency subband power power(ib, J) and the pseudo high frequency subband power power_est(ib, id, J) is found, and the difference is multiplied by the weight W_power(power(ib, J)) of the subband. The sum of the squares of the differences multiplied by the weight W_power(power(ib, J)) is the residual mean square value Res_stdW_power(id, J).
Here, the weight W_power(power(ib, J)) (where sb+1 ≤ ib ≤ eb) is defined by, for example, the following expression (30). The value of the weight W_power(power(ib, J)) increases as the high frequency subband power power(ib, J) of its subband increases.
[ expression 30]
Next, the pseudo high frequency subband power difference calculating circuit 36 calculates a residual maximum value Res_maxW_power(id, J). Specifically, the difference between the high frequency subband power power(ib, J) and the pseudo high frequency subband power power_est(ib, id, J) of each subband with indices sb+1 to eb is multiplied by the weight W_power(power(ib, J)), and the maximum of the absolute values of the resulting values is the residual maximum value Res_maxW_power(id, J).
In addition, the pseudo high frequency subband power difference calculating circuit 36 calculates the residual average value Res_aveW_power(id, J).
Specifically, for each subband with indices sb+1 to eb, the difference between the high frequency subband power power(ib, J) and the pseudo high frequency subband power power_est(ib, id, J) is found and multiplied by the weight W_power(power(ib, J)), and the sum of the differences multiplied by the weight W_power(power(ib, J)) is found. The absolute value of the value obtained by dividing the sum of the differences by the number of subbands on the high frequency side (eb-sb) is the residual average value Res_aveW_power(id, J).
In addition, the pseudo high frequency subband power difference calculating circuit 36 calculates an evaluation value ResW_power(id, J). That is, the sum of the residual mean square value Res_stdW_power(id, J), the residual maximum value Res_maxW_power(id, J) multiplied by the weight W_max, and the residual average value Res_aveW_power(id, J) multiplied by the weight W_ave is the evaluation value ResW_power(id, J).
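The power-weighted evaluation value of step S406 differs from the band-weighted one only in that the weight depends on the actual subband power rather than on the subband index. A minimal sketch follows; since expression (30) is not reproduced in this text, the weight is passed in as a function, and the linear weight used in the example is purely hypothetical, chosen only because it grows with power. The function names and list layout are also assumptions.

```python
def power_weighted_evaluation(power, power_est, weight_fn, w_max=0.5, w_ave=0.5):
    """Return ResW_power(id, J) for one coefficient index id in frame J.

    power, power_est -- actual and pseudo high frequency subband powers
    weight_fn        -- callable giving W_power(power(ib, J)); it should
                        increase with its argument (supplied by the caller)
    """
    weighted = [weight_fn(p) * (p - e) for p, e in zip(power, power_est)]
    res_std = sum(d * d for d in weighted)         # expression (29)
    res_max = max(abs(d) for d in weighted)        # weighted maximum
    res_ave = abs(sum(weighted) / len(weighted))   # divided by eb - sb
    return res_std + w_max * res_max + w_ave * res_ave


def example_weight(p):
    """Hypothetical increasing weight, NOT the patent's expression (30)."""
    return 1.0 + p / 20.0


# Two subbands (made-up values): the louder subband gets weight 2, the other 1.
resw_power = power_weighted_evaluation([20.0, 0.0], [18.0, 1.0], example_weight)
```

Here the estimation error in the louder subband dominates the result, which reflects the stated aim of placing weight on subbands with large power.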
In step S407, the pseudo high frequency sub-band power difference calculation circuit 36 calculates an evaluation value ResPW using the past frame and the current framepower(id,J)。
Specifically, the pseudo high frequency subband power difference calculating circuit 36 records the pseudo high frequency subband power of each subband obtained using the decoded high frequency subband power estimating coefficient of the finally selected coefficient index for a frame (J-1) one frame ahead in time of the frame J to be processed.
The pseudo high frequency sub-band power difference calculation circuit 36 first calculates the estimated residual mean square value ResP_{std}W_{power}(id, J). That is, for each subband on the high frequency side with indices sb+1 to eb, the difference between the pseudo high frequency subband power power_{est}(ib, id_{selected}(J-1), J-1) and the pseudo high frequency subband power power_{est}(ib, id, J) is found and multiplied by the weight W_{power}(power(ib, J)). The sum of squares of these weighted differences is the estimated residual mean square value ResP_{std}W_{power}(id, J).
Next, the pseudo high frequency sub-band power difference calculation circuit 36 calculates the estimated residual maximum ResP_{max}W_{power}(id, J). Specifically, for each subband with indices sb+1 to eb, the difference between the pseudo high frequency subband power power_{est}(ib, id_{selected}(J-1), J-1) and the pseudo high frequency subband power power_{est}(ib, id, J) is multiplied by the weight W_{power}(power(ib, J)), and the maximum of the absolute values of these weighted differences is the estimated residual maximum ResP_{max}W_{power}(id, J).
Next, the pseudo high frequency sub-band power difference calculation circuit 36 calculates an estimated residual average ResP_{ave}W_{power}(id, J). Specifically, for each subband with indices sb+1 to eb, the difference between the pseudo high frequency subband power power_{est}(ib, id_{selected}(J-1), J-1) and the pseudo high frequency subband power power_{est}(ib, id, J) is found and multiplied by the weight W_{power}(power(ib, J)). The absolute value of the value obtained by dividing the sum of these weighted differences by the number of subbands on the high-frequency side (eb-sb) is the estimated residual average ResP_{ave}W_{power}(id, J).
In addition, the pseudo high frequency sub-band power difference calculation circuit 36 obtains the sum of the estimated residual mean square value ResP_{std}W_{power}(id, J), the estimated residual maximum ResP_{max}W_{power}(id, J) multiplied by the weight W_{max}, and the estimated residual average ResP_{ave}W_{power}(id, J) multiplied by the weight W_{ave}, and takes this sum as the evaluation value ResPW_{power}(id, J).
In step S408, the pseudo high frequency sub-band power difference calculation circuit 36 adds the evaluation value ResW_{power}(id, J) and the evaluation value ResPW_{power}(id, J) multiplied by the weight W_{p}(J) in expression (25), and calculates a final evaluation value Res_{all}W_{power}(id, J). The evaluation value Res_{all}W_{power}(id, J) is calculated for each of the K decoded high frequency subband power estimating coefficients.
Subsequently, the processing in step S409 to step S411 is performed and the encoding processing ends; this processing is similar to the processing in step S339 to step S341 in fig. 25, and thus detailed description thereof will be omitted. Note that in step S409, the coefficient index having the smallest evaluation value Res_{all}W_{power}(id, J) among the K coefficient indexes is selected.
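A minimal sketch of steps S408 and S409 follows, assuming the K current-frame evaluation values and the K past-frame evaluation values have already been computed. The function name `select_coefficient_index` and its arguments are hypothetical, and the frame-dependent weight from expression (25) is simply passed in as a number because that expression is not reproduced in this part of the description.

```python
from typing import Sequence


def select_coefficient_index(
    res_w: Sequence[float],   # current-frame evaluation value per coefficient index
    res_pw: Sequence[float],  # past/current-frame evaluation value per coefficient index
    w_p: float,               # frame weight, taken as given (assumption of this sketch)
) -> int:
    """Return the coefficient index with the smallest final evaluation value."""
    res_all = [r + w_p * rp for r, rp in zip(res_w, res_pw)]
    # Index of the minimum; the first index wins on ties.
    return min(range(len(res_all)), key=res_all.__getitem__)


# Index 1 is best on the current frame alone, but changes most from the previous
# frame; with w_p = 0.5 the final values are [3.5, 4.0, 2.5], so index 2 is chosen.
print(select_coefficient_index([3.0, 2.0, 2.5], [1.0, 4.0, 0.0], 0.5))
```

Setting `w_p` to zero reduces the selection to the current-frame evaluation value only, which corresponds to the alternative selection noted in the description.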
Thus, by weighting each subband so that greater weight is placed on subbands having larger power, audio having higher sound quality can be obtained on the decoding apparatus 40 side.
Note that, in the above description, the selection of the decoded high frequency subband power estimating coefficient is performed based on the evaluation value Res_{all}W_{power}(id, J), but the decoded high frequency subband power estimating coefficient may instead be selected based on the evaluation value ResW_{power}(id, J).
<6. sixth embodiment >
[ configuration of coefficient learning apparatus ]
Now, a set of the coefficients A_{ib}(kb) and the coefficients B_{ib} serving as a decoded high frequency subband power estimating coefficient is associated with a coefficient index and recorded in the decoding apparatus 40 in fig. 20. For example, when decoded high-frequency subband power estimating coefficients for 128 coefficient indexes are recorded in the decoding apparatus 40, a large area is required as the recording area of the memory or the like in which these decoded high-frequency subband power estimating coefficients are recorded.
Thus, by making part of several decoded high-frequency subband power estimating coefficients shared coefficients, the recording area required for recording the decoded high-frequency subband power estimating coefficients can be made smaller. In such a case, the coefficient learning device that finds the decoded high-frequency subband power estimating coefficients by learning is configured, for example, as shown in fig. 28.
The coefficient learning device 81 includes a subband dividing circuit 91, a high-frequency subband power calculating circuit 92, a feature amount calculating circuit 93, and a coefficient estimating circuit 94.
Multiple pieces of music data or the like for learning are supplied to the coefficient learning device 81 as wideband teaching signals. A wideband teaching signal is a signal that includes multiple high frequency subband components and multiple low frequency subband components.
The subband dividing circuit 91 is composed of a band pass filter or the like, divides the supplied broadband teaching signal into a plurality of subband signals, and supplies these to the high-frequency subband power calculating circuit 92 and the characteristic amount calculating circuit 93. Specifically, the high-frequency subband signal of each subband on the high-frequency side with indices sb +1 to eb is supplied to the high-frequency subband power calculating circuit 92, and the low-frequency subband signal of each subband on the low-frequency side with indices sb-3 to sb is supplied to the characteristic amount calculating circuit 93.
The high-frequency subband power calculating circuit 92 calculates high-frequency subband powers of the respective high-frequency subband signals supplied from the subband dividing circuit 91, and supplies the high-frequency subband powers to the coefficient estimating circuit 94. The feature amount calculation circuit 93 calculates low-frequency subband powers as feature amounts based on the respective low-frequency subband signals supplied from the subband dividing circuit 91, and supplies them to the coefficient estimation circuit 94.
The coefficient estimation circuit 94 performs regression analysis by using the high-frequency subband power from the high-frequency subband power calculation circuit 92 and the feature amount from the feature amount calculation circuit 93 to generate a decoded high-frequency subband power estimation coefficient, and outputs it to the decoding apparatus 40.
[ description of coefficient learning processing ]
Next, the coefficient learning process performed by the coefficient learning apparatus 81 will be described with reference to a flowchart in fig. 29.
In step S431, the subband dividing circuit 91 divides each of the supplied plurality of broadband teaching signals into a plurality of subband signals. The subband dividing circuit 91 supplies the high-frequency subband signals of the subbands indexed to sb +1 to eb to the high-frequency subband power calculating circuit 92, and supplies the low-frequency subband signals of the subbands indexed to sb-3 to sb to the characteristic amount calculating circuit 93.
In step S432, the high-frequency subband power calculating circuit 92 performs calculation similar to that in the above-described expression (1), and calculates the high-frequency subband power of each high-frequency subband signal supplied from the subband dividing circuit 91 and supplies it to the coefficient estimating circuit 94.
In step S433, the feature amount calculating circuit 93 performs calculation similar to that in the above-described expression (1), and calculates low-frequency subband power as a feature amount for each low-frequency subband signal supplied from the subband dividing circuit 91, and supplies it to the coefficient estimating circuit 94.
Thus, the high frequency subband power and the low frequency subband power are provided to coefficient estimation circuit 94 for each frame of the plurality of wideband teaching signals.
In step S434, the coefficient estimation circuit 94 performs regression analysis using the least squares method, and calculates the coefficient A_{ib}(kb) and the coefficient B_{ib} of each high-frequency-side subband ib (where sb+1 ≦ ib ≦ eb).
Note that, with respect to the regression analysis, the low-frequency subband power supplied from the feature amount calculating circuit 93 is an explanatory variable, and the high-frequency subband power supplied from the high-frequency subband power calculating circuit 92 is an explained variable. In addition, regression analysis is performed using the low-frequency subband power and the high-frequency subband power of all frames constituting the entirety of the broadband teaching signal supplied to the coefficient learning device 81.
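The regression in step S434 can be sketched as an ordinary least-squares fit with NumPy, as below. This is only an illustration under stated assumptions: the function name `fit_subband_coefficients`, the array layout (one row per frame), and the use of `numpy.linalg.lstsq` are choices made for this sketch, not details given in the description.

```python
import numpy as np


def fit_subband_coefficients(low_power: np.ndarray, high_power: np.ndarray):
    """Least-squares fit of high_power ~= low_power @ A.T + B.

    low_power:  (n_frames, n_low)  explanatory variables (low-frequency subband powers)
    high_power: (n_frames, n_high) explained variables (high-frequency subband powers)
    Returns A with shape (n_high, n_low) and B with shape (n_high,).
    """
    n_frames = low_power.shape[0]
    # Append a column of ones so the constant term B is fitted together with A.
    design = np.hstack([low_power, np.ones((n_frames, 1))])
    solution, *_ = np.linalg.lstsq(design, high_power, rcond=None)
    A = solution[:-1].T  # one row of linear terms per high-frequency subband
    B = solution[-1]     # one constant term per high-frequency subband
    return A, B


# Synthetic check: noiseless data generated from known coefficients is recovered.
rng = np.random.default_rng(0)
low = rng.normal(size=(50, 4))
A_true = np.array([[1.0, 0.5, -0.2, 0.1], [0.3, -0.4, 0.8, 0.0]])
B_true = np.array([2.0, -1.0])
A_fit, B_fit = fit_subband_coefficients(low, low @ A_true.T + B_true)
print(np.allclose(A_fit, A_true), np.allclose(B_fit, B_true))
```

Fitting all high-frequency subbands in one call is equivalent to running a separate regression per subband ib, because each column of `high_power` is solved independently.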
In step S435, the coefficient estimation circuit 94 uses the coefficient A_{ib}(kb) and the coefficient B_{ib} found for each subband ib to find a residual vector for each frame of the wideband teaching signals.
For example, for each subband ib (where sb+1 ≦ ib ≦ eb) of the frame J, the coefficient estimation circuit 94 obtains a residual by subtracting, from the high frequency subband power power(ib, J), the sum of the coefficient B_{ib} and the total of the low frequency subband powers power(kb, J) (where sb-3 ≦ kb ≦ sb) each multiplied by the coefficient A_{ib}(kb). The vector consisting of the residuals of the subbands ib of the frame J is the residual vector.
Note that the residual vector is calculated for all frames constituting all of the broadband teaching signals supplied to the coefficient learning device 81.
In step S436, the coefficient estimation circuit 94 normalizes the residual vectors obtained for the respective frames. For example, the coefficient estimation circuit 94 normalizes the residual vectors by finding the dispersion values of the residuals of the sub-bands ib of the residual vectors of all frames and dividing the residuals of the sub-bands ib of the respective residual vectors by the square root of the dispersion value of each sub-band.
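Steps S435 and S436 can be illustrated together with the following NumPy sketch. The function name `normalized_residuals` and the array layout are assumptions of this sketch, the dispersion value is taken to be the population variance over all frames, and a subband whose residual is constant (zero variance) is not handled.

```python
import numpy as np


def normalized_residuals(low_power, high_power, A, B):
    """Residual vector per frame, scaled to unit variance in every subband.

    low_power: (n_frames, n_low), high_power: (n_frames, n_high),
    A: (n_high, n_low), B: (n_high,).
    Returns the normalized residuals and the per-subband scale that was divided out.
    """
    # Residual = actual high-frequency power minus its linear estimate.
    residual = high_power - (low_power @ A.T + B)
    # Square root of the dispersion (variance) of each subband over all frames.
    scale = np.sqrt(residual.var(axis=0))
    return residual / scale, scale


rng = np.random.default_rng(1)
low = rng.normal(size=(200, 4))
A = rng.normal(size=(3, 4))
B = rng.normal(size=3)
# Residual spread differs strongly between the three subbands before normalization.
high = low @ A.T + B + rng.normal(size=(200, 3)) * np.array([0.1, 1.0, 5.0])
norm, scale = normalized_residuals(low, high, A, B)
print(norm.var(axis=0))  # each entry is approximately 1
```

Returning `scale` alongside the normalized vectors keeps the values needed to undo the normalization later.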
In step S437, the coefficient estimation circuit 94 clusters the normalized residual vectors of all frames by k-means or the like.
For example, the average frequency envelope of all frames obtained when the high frequency subband power is estimated using the coefficient A_{ib}(kb) and the coefficient B_{ib} is referred to as the average frequency envelope SA. In addition, a predetermined frequency envelope having power greater than that of the average frequency envelope SA is referred to as the frequency envelope SH, and a predetermined frequency envelope having power lower than that of the average frequency envelope SA is referred to as the frequency envelope SL.
At this time, clustering of the residual vectors is performed such that the residual vectors for which frequency envelopes close to the average frequency envelope SA, the frequency envelope SH, and the frequency envelope SL are obtained belong to the cluster CA, the cluster CH, and the cluster CL, respectively. In other words, clustering is performed such that the residual vector of each frame belongs to one of the cluster CA, the cluster CH, or the cluster CL.
In band extension processing that estimates high-frequency components based on the correlation between low-frequency components and high-frequency components, when the residual vector is calculated using the coefficient A_{ib}(kb) and the coefficient B_{ib} obtained by regression analysis, the residual characteristically becomes larger the further the subband is toward the high frequency side. Therefore, if the residual vectors are clustered without modification, the processing places greater weight on the subbands further toward the high frequency side.
In contrast, with the coefficient learning device 81, the residual vectors are normalized using the dispersion value of the residual of each subband, so that the dispersion of the residual of each subband becomes apparently equal and clustering can be performed with the subbands weighted equally.
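The clustering of normalized residual vectors can be sketched with a plain k-means (Lloyd) iteration, shown below with NumPy. The description does not specify how the clusters are initialized or when iteration stops, so in this sketch the initial centroids are supplied by the caller and iteration stops when the centroids no longer move; the names `nearest_centroid` and `kmeans` are hypothetical.

```python
import numpy as np


def nearest_centroid(vectors: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest centroid (Euclidean distance) for every vector."""
    dist = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)


def kmeans(vectors: np.ndarray, init_centroids: np.ndarray, n_iter: int = 100):
    """Cluster vectors around len(init_centroids) centroids; returns (labels, centroids)."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        labels = nearest_centroid(vectors, centroids)
        updated = centroids.copy()
        for c in range(len(centroids)):
            members = vectors[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster becomes empty
                updated[c] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return nearest_centroid(vectors, centroids), centroids


# Three well-separated groups of two-dimensional vectors, four vectors each.
offsets = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [-0.5, -0.5]])
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
vectors = np.vstack([c + offsets for c in centers])
labels, centroids = kmeans(vectors, init_centroids=vectors[[0, 4, 8]])
print(labels.tolist())
```

Because the Euclidean distance treats every vector component alike, normalizing each subband to equal variance beforehand is what makes the subbands contribute equally to the cluster assignment.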
In step S438, the coefficient estimation circuit 94 selects one of the cluster CA, the cluster CH, or the cluster CL as a cluster to be processed.
In step S439, the coefficient estimation circuit 94 calculates the coefficient A_{ib}(kb) and the coefficient B_{ib} of each subband ib (where sb+1 ≦ ib ≦ eb) by regression analysis using the frames of the residual vectors belonging to the cluster selected as the cluster to be processed.
That is, if a frame whose residual vector belongs to the cluster to be processed is referred to as a frame to be processed, regression analysis using the least squares method is performed with the low-frequency subband powers and the high-frequency subband powers of all frames to be processed as the explanatory variables and the explained variables, respectively. Thus, the coefficient A_{ib}(kb) and the coefficient B_{ib} are obtained for each subband ib.
In step S440, the coefficient estimation circuit 94 finds a residual vector for every frame to be processed, using the coefficient A_{ib}(kb) and the coefficient B_{ib} obtained in the processing in step S439. Note that in step S440, processing similar to that in step S435 is performed, and the residual vectors of the respective frames to be processed are found.
In step S441, the coefficient estimation circuit 94 normalizes the residual vectors of the respective frames to be processed obtained in the process in step S440 by performing the process similar to that in step S436. That is, the residual is divided by the square root of the dispersion value, and normalization of the residual vector is performed for each subband.
In step S442, the coefficient estimation circuit 94 clusters the normalized residual vectors of all the frames to be processed by k-means or the like. The number of clusters here is determined as follows. For example, in the case where the coefficient learning device 81 generates decoded high-frequency subband power estimating coefficients for 128 coefficient indexes, the number of frames to be processed is multiplied by 128, and the result divided by the number of all frames is the number of clusters. Here, the number of all frames is the total number of frames of all the wideband teaching signals supplied to the coefficient learning device 81.
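As a worked example of this rule, the following sketch computes the cluster count for each of three groups of frames. The function name `clusters_for_group` is hypothetical, and rounding to the nearest integer with a minimum of one cluster is an assumption of the sketch, since the description does not say how a fractional result is handled.

```python
def clusters_for_group(n_frames_in_group: int, n_all_frames: int,
                       n_coeff_indexes: int = 128) -> int:
    """Number of clusters for one group: frames in group * 128 / all frames."""
    # Rounding and the lower bound of 1 are assumptions of this sketch.
    return max(1, round(n_frames_in_group * n_coeff_indexes / n_all_frames))


# 10000 teaching frames split 5000 / 3000 / 2000 between three groups.
counts = [clusters_for_group(n, 10000) for n in (5000, 3000, 2000)]
print(counts, sum(counts))
```

Each group thus receives a share of the coefficient indexes proportional to its share of the frames. With other splits the rounded counts may not add up to exactly 128, so a practical implementation would need some adjustment rule.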
In step S443, the coefficient estimation circuit 94 finds the center-of-gravity vector of each cluster obtained by the processing in step S442.
For example, the cluster obtained by clustering in step S442 corresponds to a coefficient index, and at the coefficient learning device 81, a coefficient index is assigned to each cluster, and a decoded high-frequency subband power estimating coefficient for each coefficient index is found.
Specifically, it is assumed that the cluster CA is selected as the cluster to be processed in step S438, and that F clusters are obtained by the clustering in step S442. Now, if attention is paid to one cluster CF among the F clusters, the coefficient A_{ib}(kb) found for the cluster CA in step S439 is set as the coefficient A_{ib}(kb) that is the linear correlation term of the decoded high-frequency subband power estimating coefficient of the coefficient index of the cluster CF. In addition, the sum of the vector obtained by subjecting the center-of-gravity vector of the cluster CF found in step S443 to the inverse process (inverse normalization) of the normalization performed in step S441 and the coefficient B_{ib} found in step S439 is set as the coefficient B_{ib} that is the constant term of the decoded high frequency subband power estimating coefficient. Here, in the case where, for example, the normalization performed in step S441 divides the residual by the square root of the dispersion value of each subband, the inverse normalization is a process of multiplying each element of the center-of-gravity vector of the cluster CF by the same value as at the time of normalization (the square root of the dispersion value of each subband).
That is, the set of the coefficient A_{ib}(kb) obtained in step S439 and the coefficient B_{ib} found as described above becomes the decoded high frequency subband power estimating coefficient of the coefficient index of the cluster CF. Therefore, each of the F clusters obtained by clustering shares the coefficient A_{ib}(kb) found for the cluster CA as the linear correlation term of its decoded high frequency subband power estimating coefficient.
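The derivation of one constant term per sub-cluster can be sketched as follows: each normalized center-of-gravity vector is scaled back by the per-subband square root of the dispersion value and added to the constant term from the regression of step S439. The function name `constant_terms` and the list-based layout are assumptions of this sketch.

```python
from typing import List, Sequence


def constant_terms(
    b_regression: Sequence[float],              # constant term per subband from step S439
    centroids_norm: Sequence[Sequence[float]],  # normalized center-of-gravity vectors
    scale: Sequence[float],                     # sqrt of dispersion per subband (S441)
) -> List[List[float]]:
    """Return one constant-term vector per sub-cluster."""
    return [
        # Inverse normalization (multiply by scale), then add the regression constant.
        [b + g * s for b, g, s in zip(b_regression, centroid, scale)]
        for centroid in centroids_norm
    ]


# Two subbands, two sub-clusters.
print(constant_terms([1.0, 2.0], [[0.5, -1.0], [0.0, 2.0]], [2.0, 0.5]))
```

Only the constant terms differ between the sub-clusters; the linear terms from the regression are reused unchanged for all of them.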
In step S444, the coefficient learning apparatus 81 determines whether all of the cluster CA, the cluster CH, and the cluster CL have been processed as the cluster to be processed. In step S444, in the case where it has been determined that not all clusters have been processed yet, the process returns to step S438, and the above-described process is repeated. That is, the next cluster is selected as the cluster to be processed, and the decoded high frequency subband power estimating coefficient is calculated.
In contrast, in the case where it has been determined in step S444 that all the clusters have been processed, the predetermined number of decoded high-frequency subband power estimating coefficients to be found have been obtained, and the processing proceeds to step S445.
In step S445, the coefficient estimation circuit 94 outputs the found coefficient index and the decoded high-frequency subband power estimation coefficient to the decoding apparatus 40 and causes it to be recorded, and ends the coefficient learning process.
For example, among the decoded high-frequency subband power estimating coefficients output to the decoding apparatus 40, several have the same coefficient A_{ib}(kb) as the linear correlation term. Thus, the coefficient learning device 81 associates each shared coefficient A_{ib}(kb) with a linear correlation term index (pointer), which is information identifying that coefficient A_{ib}(kb), and associates each coefficient index with the corresponding linear correlation term index and the coefficient B_{ib} that is the constant term.
The coefficient learning device 81 supplies the associated linear correlation term indexes (pointers) and coefficients A_{ib}(kb), and the associated coefficient indexes, linear correlation term indexes (pointers), and coefficients B_{ib}, to the decoding apparatus 40, where they are recorded in a memory within the high frequency decoding circuit 45 of the decoding apparatus 40. Thus, when a plurality of decoded high-frequency subband power estimating coefficients are recorded, storing only the linear correlation term index (pointer) for the shared linear correlation term in the recording area of each decoded high-frequency subband power estimating coefficient allows the recording area to be kept considerably small.
In this case, the linear correlation term index and the coefficient A_{ib}(kb) are associated and recorded in the memory in the high-frequency decoding circuit 45, whereby the linear correlation term index and the coefficient B_{ib} can be obtained from the coefficient index, and further the coefficient A_{ib}(kb) can be obtained from the linear correlation term index.
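The two-level lookup can be illustrated with a pair of dictionaries, as in the sketch below. The names `lookup` and `estimate_high_power`, the dictionary layout, and the numeric values are all hypothetical. The estimate itself is assumed to be the linear form implied by the residual definition used in learning: a weighted sum of low-frequency subband powers plus a constant term.

```python
from typing import Dict, List, Sequence, Tuple

# Linear correlation term index (pointer) -> linear terms, one row per high subband.
LinearTerms = Dict[int, List[List[float]]]
# Coefficient index -> (pointer, constant term per high subband).
CoefficientTable = Dict[int, Tuple[int, List[float]]]


def lookup(coeff_index: int, table: CoefficientTable, linear_terms: LinearTerms):
    """Resolve a coefficient index to its (shared) linear terms and its constant terms."""
    pointer, b = table[coeff_index]
    return linear_terms[pointer], b


def estimate_high_power(low_power: Sequence[float], a, b) -> List[float]:
    """Linear estimate per high subband: sum over kb of a[ib][kb] * low_power[kb] + b[ib]."""
    return [sum(a_kb * p for a_kb, p in zip(row, low_power)) + b_ib
            for row, b_ib in zip(a, b)]


# Two coefficient indexes share one set of linear terms (pointer 0) and differ
# only in their constant terms, so the linear terms are stored once.
linear_terms: LinearTerms = {0: [[1.0, 0.0], [0.5, 0.5]]}
table: CoefficientTable = {0: (0, [0.0, 1.0]), 1: (0, [2.0, -1.0])}
a, b = lookup(1, table, linear_terms)
print(estimate_high_power([4.0, 2.0], a, b))
```

With 128 coefficient indexes and about three shared sets of linear terms, each table entry holds only a pointer and a constant-term vector instead of its own copy of the linear terms.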
Note that, as a result of analysis by the present applicant, it has been found that even if the linear correlation terms of a plurality of decoded high-frequency subband power estimating coefficients are consolidated into about three shared patterns, there is almost no perceptible deterioration in the sound quality of audio subjected to band extension processing. Therefore, according to the coefficient learning device 81, the recording area required for recording the decoded high-frequency subband power estimating coefficients can be made smaller without degrading the sound quality of the audio after the band extension processing.
As described above, the coefficient learning device 81 generates and outputs the decoded high-frequency subband power estimating coefficient for each coefficient index from the supplied broadband teaching signal.
Note that the coefficient learning process of fig. 29 has been described as normalizing the residual vectors, but the normalization of the residual vectors may be omitted in one or both of step S436 and step S441.
In addition, the following arrangement may be made: normalization of the residual vector is performed, but sharing of the linear correlation terms of the decoded high frequency subband power estimation coefficients is not performed. In such a case, after the normalization process of step S436, the normalized residual vectors are clustered into the same number of clusters as the number of decoded high-frequency subband power estimating coefficients to be found. Regression analysis is performed for each cluster using frames of residual vectors belonging to the respective clusters, and decoded high-frequency subband power estimation coefficients are generated for the respective clusters.
The series of processes described above may be performed using hardware or may be performed using software. In the case where the series of processes is executed using software, a program constituting the software is installed from a program recording medium into a computer built into dedicated hardware, or into, for example, a general-purpose personal computer that can execute various functions by having various types of programs installed.
Fig. 30 is a block diagram showing a configuration example of hardware of a computer that executes the above-described series of processing using a program.
In the computer, a CPU101, a ROM (read only memory) 102, and a RAM (random access memory) 103 are connected to each other via a bus 104.
An input/output interface 105 is also connected to the bus 104. The input/output interface 105 is connected with: an input unit 106 including a keyboard, a mouse, a microphone, and the like; an output unit 107 including a display, a speaker, and the like; a storage unit 108 including a hard disk, a nonvolatile memory, or the like; a communication unit 109 including a network interface and the like; and a drive 110 for driving a removable medium 111, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory.
With the computer configured as above, for example, the CPU101 loads a program stored in the storage unit 108 into the RAM103 through the input/output interface 105 and the bus 104, and executes the program, thereby executing the series of processes described above.
The program executed by the computer (CPU 101) is provided recorded on the removable medium 111, which is a package medium such as a magnetic disk (including a floppy disk), an optical disk (CD-ROM (compact disk-read only memory), DVD (digital versatile disk), etc.), a magneto-optical disk, or a semiconductor memory, or is provided via a wired or wireless transmission medium such as a local area network, the internet, or digital satellite broadcasting.
By mounting the removable medium 111 on the drive 110, the program can be installed into the storage unit 108 through the input/output interface 105. In addition, the program may be received by the communication unit 109 through a wired or wireless transmission medium and installed in the storage unit 108. Further, the program may be installed in advance in the ROM102 or the storage unit 108.
Note that the program executed by the computer may be a program that executes processing in a time-series manner in the order described in this specification, or may be a program in which processing is executed in parallel, or a program that is executed at a required timing such as when called, or the like.
Note that the embodiment of the present invention is not limited to the above-described embodiment, and various modifications can be made within the essential scope of the present invention.
List of reference numerals
10 band extension device
11 low-pass filter
12 delay circuit
13, 13-1 to 13-N band-pass filter
14 characteristic quantity calculating circuit
15 high-frequency sub-band power estimation circuit
16 high frequency signal generating circuit
17 high-pass filter
18 signal adding unit
20 coefficient learning apparatus
21, 21-1 to 21-(K+N) band-pass filter
22 high-frequency sub-band power calculating circuit
23 characteristic quantity calculating circuit
24 coefficient estimation circuit
30 coding device
31 low pass filter
32 low frequency coding circuit
33 subband dividing circuit
34 characteristic quantity calculating circuit
35 pseudo high frequency sub-band power calculating circuit
36 pseudo high frequency sub-band power difference calculating circuit
37 high frequency coding circuit
38 multiplexing circuit
40 decoding device
41 multiplexing/demultiplexing circuit
42 low frequency decoding circuit
43 subband dividing circuit
44 characteristic quantity calculating circuit
45 high frequency decoding circuit
46 decoding high-frequency sub-band power calculating circuit
47 decoding high frequency signal generating circuit
48 synthetic circuit
50 coefficient learning device
51 low-pass filter
52 sub-band dividing circuit
53 characteristic amount calculating circuit
54 pseudo high frequency sub-band power calculating circuit
55 pseudo high frequency sub-band power difference calculating circuit
56 pseudo high frequency sub-band power difference clustering circuit
57 coefficient estimation circuit
101 CPU
102 ROM
103 RAM
104 bus
105 input/output interface
106 input unit
107 output unit
108 storage unit
109 communication unit
110 drive
111 removable media
Claims (12)
1. A band extending apparatus comprising:
signal dividing means configured to divide an input signal into a plurality of subband signals;
feature amount calculation means configured to calculate a feature amount representing a feature of the input signal using the input signal and at least one of the plurality of subband signals divided by the signal division means;
a high-frequency subband power estimating means configured to calculate an estimated value of a high-frequency subband power that is a power of a subband signal having a band higher than that of the input signal, based on the feature amount calculated by the feature amount calculating means; and
a high-frequency signal component generating means configured to generate a high-frequency signal component based on the plurality of subband signals divided by the signal dividing means and the estimated value of the high-frequency subband power calculated by the high-frequency subband power estimating means;
wherein the frequency band of the input signal is extended using the high-frequency signal component generated by the high-frequency signal component generating means;
wherein the high-frequency subband power estimating means calculates an estimated value of the high-frequency subband power based on the feature amount and a coefficient of each high-frequency subband obtained in advance by learning;
wherein the coefficient of each high-frequency subband is generated by performing clustering on a residual vector of the high-frequency signal component calculated using the coefficient of each high-frequency subband obtained by regression analysis using a plurality of teaching signals and performing regression analysis using the teaching signal belonging to each cluster obtained by the clustering.
2. The band extending apparatus according to claim 1, wherein the feature amount calculating means calculates a low-frequency subband power as the power of the plurality of subband signals as the feature amount.
3. The band extending apparatus according to claim 1, wherein the feature amount calculating means calculates a temporal variation of low-frequency subband power as the power of the plurality of subband signals as the feature amount.
4. The band extending apparatus according to claim 1, wherein the feature amount calculating means calculates a difference between a maximum power and a minimum power in a predetermined band of the input signal as the feature amount.
5. The band extending apparatus according to claim 1, wherein the feature amount calculating means calculates a temporal variation of a difference between a maximum value and a minimum value of power in a predetermined frequency band of the input signal as the feature amount.
6. The band extending apparatus according to claim 1, wherein the feature amount calculating means calculates a slope of power in a predetermined frequency band of the input signal as the feature amount.
7. The band extending apparatus according to claim 1, wherein the feature amount calculating means calculates a temporal change in a slope of power in a predetermined frequency band of the input signal as the feature amount.
8. The band extending apparatus according to claim 1, wherein said residual vectors are normalized using a dispersion value of each component of a plurality of said residual vectors, and said normalized vectors are clustered.
9. The band extending apparatus according to claim 1, wherein the high-frequency subband power estimating means calculates the estimated value of the high-frequency subband power based on the feature quantity, a coefficient and a constant for each of the high-frequency subbands;
wherein the constant is calculated from a center-of-gravity vector of a new cluster obtained by further calculating the residual vector using a coefficient of each high-frequency subband obtained by regression analysis using the teaching signal belonging to the cluster and performing clustering of the residual vector into the new clusters.
10. The band extending apparatus according to claim 9, wherein said high-frequency subband power estimating means records in an associated manner a coefficient for each of said high-frequency subbands and a pointer for determining the coefficient for said each high-frequency subband, and further records a plurality of sets of said pointer and said constant, some of said plurality of sets including pointers having the same value.
11. The band extending apparatus according to claim 1, wherein said high-frequency signal component generating means generates said high-frequency signal component from a low-frequency subband power that is a power of said plurality of subband signals and an estimated value of said high-frequency subband power.
12. A method of band extension, comprising:
a signal dividing step arranged to divide an input signal into a plurality of subband signals;
a feature amount calculating step configured to calculate a feature amount representing a feature of the input signal using the input signal and at least one of the plurality of subband signals divided by the processing in the signal dividing step;
a high-frequency subband power estimating step configured to calculate an estimated value of a high-frequency subband power, which is a power of a subband signal having a band higher than that of the input signal, based on the feature amount calculated by the processing in the feature amount calculating step; and
a high-frequency signal component generating step configured to generate a high-frequency signal component based on the plurality of subband signals divided by the processing in the signal dividing step and the estimated value of the high-frequency subband power calculated by the processing in the high-frequency subband power estimating step;
wherein the frequency band of the input signal is expanded using the high-frequency signal component generated by the processing in the high-frequency signal component generating step;
wherein the estimated value of the high-frequency subband power is calculated based on the feature amount and a coefficient of each high-frequency subband obtained in advance by learning;
wherein the coefficient of each high-frequency subband is generated by performing clustering on a residual vector of the high-frequency signal component calculated using the coefficient of each high-frequency subband obtained by regression analysis using a plurality of teaching signals and performing regression analysis using the teaching signal belonging to each cluster obtained by the clustering.
Applications Claiming Priority (7)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2009-233814 | 2009-10-07 | ||
| JP2009233814 | 2009-10-07 | ||
| JP2010-092689 | 2010-04-13 | ||
| JP2010092689 | 2010-04-13 | ||
| JP2010162259A JP5754899B2 (en) | 2009-10-07 | 2010-07-16 | Decoding apparatus and method, and program |
| JP2010-162259 | 2010-07-16 | ||
| PCT/JP2010/066882 WO2011043227A1 (en) | 2009-10-07 | 2010-09-29 | Frequency band enlarging apparatus and method, encoding apparatus and method, decoding apparatus and method, and program |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1172139A1 (en) | 2013-04-12 |
| HK1172139B (en) | 2015-03-20 |