CN100339886C

CN100339886C - Encoder capable of detecting transient position of sound signal and encoding method

Info

Publication number: CN100339886C
Application number: CNB031103685A
Authority: CN
Inventors: 徐建华
Original assignee: MediaTek Inc
Current assignee: MediaTek Inc
Priority date: 2003-04-10
Filing date: 2003-04-10
Publication date: 2007-09-26
Anticipated expiration: 2023-04-10
Also published as: CN1536559A

Abstract

An encoder includes a polyphase filter bank, a transient detector, and an encoding processing unit. The encoder first performs a subband encoding step to generate a plurality of subband samples from an input signal, wherein each subband sample comprises a plurality of frequency subbands. A selection step is then performed in which a plurality of subband samples are selected as reference sampling data, and the block length of the window data is determined according to the energy sum of the frequency subbands of the reference sampling data within a preset frequency range. Finally, a transform coding step is carried out, in which the frequency subbands are transformed by a preset transformation algorithm, using the window data determined in the selection step, to generate an output signal.

Description

Encoder capable of detecting transient position of sound signal and encoding method

Technical field

The invention provides an encoder, and in particular an encoder capable of detecting the transient position of a sound signal. The encoder of the invention can further determine the block length of the window data used in frequency-domain encoding.

Background

Many current encoders use special coding algorithms based on the characteristics of the human auditory system and can compress digital audio data by a factor of ten or more, for example MP3, AAC, WWA and Dolby Digital. These encoders use techniques such as perceptual coding, frequency-domain coding, window switching and dynamic bit allocation to remove unnecessary content from the original audio data.

Perceptual coding compresses by eliminating audio data that the human auditory system generally cannot perceive. In general, humans can hear frequencies between about 20 Hz and 20 kHz, and sounds outside that range are generally imperceptible. In addition, the human auditory system exhibits auditory masking under certain conditions and then cannot distinguish quantization noise. For example, when a sound with a particularly prominent volume or timbre occurs, faint sounds near it are harder to perceive, so not every detail of the sound needs to be encoded.

Frequency-domain coding is an effective way to eliminate unnecessary data. It transforms strongly correlated time-domain data into the frequency domain, where the elements are almost uncorrelated, so that unnecessary content can be removed. It is generally divided into transform coding and subband coding. Transform coding has higher spectral resolution, whereas subband coding has lower resolution but higher efficiency, so the two can be combined into a hybrid filter with different resolutions at different frequencies. However, frequency-domain coding exhibits a notable artifact called pre-echo. For example, if a very loud sound suddenly follows a period of silence, the quantization error may increase. This occurs in both transform coding and subband coding, and causes a pre-echo of the sound after the data is converted back to the time domain.

One way to eliminate pre-echo is to confine the error to a shorter time span, separating the rest of the sound from the pre-echo so that the pre-echo falls within the masked region. Confining the error to a shorter time span requires smaller blocks for the frequency-domain transform. This method is called window switching: larger blocks are used for frequency-domain coding when the signal is stable, and smaller blocks are used when the signal has a large transient. The drawback of window switching is that more bits are needed to represent the same data, because more information is required as the amount of encoded data increases.

Whether an encoder achieves good coding quality depends heavily on how bits are allocated among the subbands or coefficients. To allocate bits effectively, the input signal must be analyzed continuously and, using a model built on knowledge of the human auditory system, more bits are allocated to the regions where human hearing is most sensitive, while few or no bits are allocated to regions where the ear is insensitive. Because the signal changes constantly and the human auditory system responds to it differently under different conditions, the allocation must adapt; this technique is dynamic bit allocation. A good bit allocation scheme requires an accurate psychoacoustic model.

Please refer to FIG. 1, which is a schematic diagram of known MPEG layer-3 audio encoding. First, a pulse code modulation (PCM) input signal 10 is divided by a polyphase filter bank 12 into 32 frequency subbands of equal width. The polyphase filter bank 12 offers a simple way to analyze the signal in frequency and time, but equal-width frequency subbands do not accurately reflect the characteristics of human hearing. In addition, adjacent frequency subbands overlap considerably, so the output of the polyphase filter bank 12 is compensated using a modified discrete cosine transform (MDCT) 14. The MDCT 14 further subdivides the frequency subbands to obtain better spectral resolution, and can cancel some of the overlap produced by the polyphase filter bank 12. The MDCT 14 supports window blocks of two lengths, a long block of eighteen samples and a short block of six samples. Because consecutive transform windows overlap by fifty percent, the window lengths are thirty-six and twelve, respectively.

When the audio signal is stable, long blocks give higher frequency resolution and better compression, while short blocks give better time resolution. Because the time resolution of a long block is low, if a transient occurs within the block being processed, the quantization noise spreads across the whole block. The low-energy part of the signal, whose own masking effect is weak, then cannot mask the quantization noise, and the result is distortion such as pre-echo. To avoid pre-echo, known MPEG audio encoding uses a psychoacoustic model 16 to detect the transient position of the audio signal, so that the MDCT 14 is performed with short blocks. After the input signal 10 has been converted to the frequency domain by frequency-domain coding, a quantization procedure 18 quantizes the data according to the psychoacoustic model 16, and a packing procedure 20 then packs the data and outputs it as an output signal 22 in the form of a bitstream.
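The block lengths described above can be illustrated with a minimal Python sketch. It assumes a sine window as the weighting values and a direct, unoptimized MDCT formula; neither is specified in the text, and the function names are hypothetical. An 18-sample long block uses a 36-sample window and a 6-sample short block uses a 12-sample window, reflecting the fifty percent overlap.

```python
import math

LONG_BLOCK = 18   # output coefficients per long block
SHORT_BLOCK = 6   # output coefficients per short block


def sine_window(block_len):
    """Weights for a window spanning 2 * block_len samples (50% overlap).

    The sine shape is an assumption made for this sketch.
    """
    size = 2 * block_len
    return [math.sin(math.pi / size * (n + 0.5)) for n in range(size)]


def mdct(samples, block_len):
    """Direct O(N^2) MDCT: 2 * block_len weighted inputs -> block_len outputs."""
    size = 2 * block_len
    if len(samples) != size:
        raise ValueError("expected %d samples" % size)
    window = sine_window(block_len)
    weighted = [s * w for s, w in zip(samples, window)]
    shift = 0.5 + block_len / 2
    return [
        sum(
            weighted[n] * math.cos(math.pi / block_len * (n + shift) * (k + 0.5))
            for n in range(size)
        )
        for k in range(block_len)
    ]


long_coeffs = mdct([1.0] * 36, LONG_BLOCK)    # 36 inputs -> 18 coefficients
short_coeffs = mdct([1.0] * 12, SHORT_BLOCK)  # 12 inputs -> 6 coefficients
```

The short block trades frequency resolution (6 coefficients instead of 18) for a window three times shorter in time, which is what limits how far quantization noise can spread ahead of a transient.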

As described above, window switching is a common technique for avoiding pre-echo in frequency-domain coding, which makes the mechanism for detecting the transient position of the audio signal important. Known MPEG audio encoding uses the psychoacoustic model 16 to detect the transient position. Although this is accurate, the psychoacoustic model 16 is quite complex and costly, and using a high-cost psychoacoustic model 16 merely because window switching needs the transient position is quite uneconomical.

Summary of the invention

The main object of the present invention is therefore to provide an encoder capable of detecting the transient position of a sound signal. The invention also provides an encoder and an encoding method capable of determining the block length of the window data used in frequency-domain encoding, so as to solve the above problems.

The present invention provides an encoder for encoding an input signal into an output signal. The encoder comprises: a polyphase filter bank for generating a plurality of subband samples from the input signal, different subband samples corresponding to the input signal waveform in different time periods, and each subband sample containing a plurality of frequency subbands; a transient detector, connected to the polyphase filter bank, for determining the block length of window data that contains a plurality of weighting values; and an encoding processing unit, connected to the polyphase filter bank and the transient detector, for multiplying the plurality of frequency subbands by the weighting values in the window data to produce a weighted result, and then generating the output signal from the weighted result using a preset transformation algorithm. The transient detector comprises: a subband selector for selecting a plurality of subband samples as reference sampling data; an energy calculator, connected to the subband selector, for calculating the energy sum of the frequency subbands in the reference sampling data; a partitioner, connected between the subband selector and the energy calculator, for dividing the reference sampling data into several groups of sub-sampling data, each group containing at least one subband sample; and a comparator, connected to the energy calculator, for comparing the output value of the energy calculator with a first threshold and outputting, according to the comparison result, a signal indicating the block length of the window data.

The present invention further provides an encoding method for encoding an input signal into an output signal. The encoding method comprises: performing a subband encoding step to generate a plurality of subband samples from the input signal, different subband samples corresponding to the input signal waveform in different time periods, and each subband sample containing a plurality of frequency subbands; performing a selection step to provide window data corresponding to a preset block length, the window data containing a plurality of weighting values, wherein the selection step includes selecting, from the plurality of subband samples, a plurality of subband samples as reference sampling data, and determining the block length of the window data according to the energy sum of the frequency subbands of the reference sampling data within a preset frequency range; and performing a transform encoding step of multiplying the plurality of frequency subbands by the weighting values of the window data determined in the selection step to produce a weighted result, and generating the output signal from the weighted result using a preset transformation algorithm.

Compared with the known art, the present invention provides an encoder and an encoding method that determine the block length of the window data used for the modified discrete cosine transform. Whether a transient occurs in the sound signal data is judged from the energy contained in the frequency subbands of the subband samples that the encoder already produces during encoding, which costs far less than the known use of a psychoacoustic model and is therefore economical.

Description of drawings

FIG. 1 is a schematic diagram of known MPEG layer-3 audio encoding.

FIG. 2 is a schematic diagram of an encoder according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of the subband samples in this embodiment.

FIG. 4 is a flowchart of the method by which the encoder detects the transient position of a sound signal in an embodiment of the present invention.

Description of reference symbols

10 Input signal
12 Polyphase filter bank
14 Modified discrete cosine transform
16 Psychoacoustic model
18 Quantization procedure
20 Packing procedure
22 Output signal
30 Encoder of the present invention
32 Transient detector
34 Encoding processing unit
36 Subband selector
38 Energy calculator
40 Partitioner
42 Comparator
50 Reference sampling data

Detailed description

Please refer to FIG. 2, which is a schematic diagram of an encoder 30 according to an embodiment of the present invention. The encoder 30 encodes the pulse code modulated input signal 10 into the bitstream output signal 22. The encoder 30 includes the polyphase filter bank 12, a transient detector 32 and an encoding processing unit 34. The polyphase filter bank 12 generates a plurality of subband samples from the input signal 10; different subband samples correspond to the waveform of the input signal 10 in different time periods, and each subband sample contains a plurality of frequency subbands. The encoding processing unit 34 performs the modified discrete cosine transform on the frequency subbands. The transient detector 32 is connected between the polyphase filter bank 12 and the encoding processing unit 34, and determines the block length of the window data that the encoding processing unit 34 uses for the modified discrete cosine transform. The transient detector 32 includes a subband selector 36, an energy calculator 38, a partitioner 40 and a comparator 42.

The subband selector 36 selects, within a preset frequency range, part of the plurality of subband samples as reference sampling data. The energy calculator 38 then calculates the energy contained in the reference sampling data and passes this value to the comparator 42 for comparison with a threshold. If the total energy of the reference sampling data exceeds the threshold, meaning a transient may be present in the reference sampling data, the partitioner 40 divides the reference sampling data into several groups of equal-width sub-sampling data, each group containing at least one subband sample. The energy calculator 38 then calculates the energy difference, over the frequency subbands within the preset frequency range, between two adjacent groups of sub-sampling data, and sends this difference to the comparator 42 for comparison with a predetermined threshold. If the energy difference is greater than the predetermined threshold, the encoding processing unit 34 is directed to use short-block window data for the modified discrete cosine transform. Otherwise the process repeats until the partitioner 40 has tried all possible groupings of sub-sampling data. If the energy difference between adjacent groups is still below the predetermined threshold at that point, the encoding processing unit 34 is directed to use long-block window data for the modified discrete cosine transform.
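As an illustration of the roles of the subband selector 36 and the energy calculator 38, the following Python sketch restricts each subband sample to a preset range of frequency subbands and sums the squared values. The function names, the list-of-lists data layout, the band indices and the dB reference (unit energy giving 0 dB) are assumptions made for this sketch, not details taken from the patent.

```python
import math


def select_reference(subband_samples, cutoff_band, limit_band):
    """Keep frequency subbands cutoff_band..limit_band (inclusive) of each sample.

    subband_samples is a list of subband samples, each a list of per-subband values.
    """
    return [sample[cutoff_band:limit_band + 1] for sample in subband_samples]


def energy_db(rows, floor=1e-12):
    """Sum of squared values over all rows, in dB; floor avoids log10(0)."""
    total = sum(value * value for row in rows for value in row)
    return 10.0 * math.log10(max(total, floor))


# Eighteen subband samples of thirty-two frequency subbands each (dummy values).
samples = [[0.1] * 32 for _ in range(18)]
reference = select_reference(samples, cutoff_band=6, limit_band=20)
total_energy_db = energy_db(reference)
```

The same `energy_db` helper can be applied to any slice of the reference data, which is how the energies of adjacent groups of sub-sampling data would be obtained for comparison.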

Please refer to FIG. 3, which is a schematic diagram of the subband samples in this embodiment. The polyphase filter bank 12 outputs eighteen subband samples in a time period t1, and each subband sample contains thirty-two frequency subbands. The encoding processing unit 34 performs the modified discrete cosine transform on each frequency subband over the overlapped period, that is, over thirty-six subband samples. The transient detector 32 detects where a transient of the sound signal occurs in order to decide which window block the encoding processing unit 34 should use for the transform.

The preset frequency range usually means the frequencies between a cutoff subband and a coding-limit subband, and the subband selector 36 selects the frequency subbands within this range as the reference sampling data 50. The cutoff subband may be the first subband or a higher-frequency subband, chosen from experience or experiment. In this embodiment, the frequency of the cutoff subband is about 4 kHz. The coding-limit subband is determined by the coding rules. Because bit rate and bandwidth are limited, the encoder 30 must discard the information in some high-frequency subbands, and the data of the discarded frequency subbands is no longer considered. If no information is discarded, the last subband is the coding-limit subband.

Once the reference sampling data 50 has been selected, the energy calculator 38 calculates the energy it contains, and the comparator 42 decides whether to continue examining the reference sampling data 50. The partitioner 40 can divide the reference sampling data 50 into several groups of equal-width sub-sampling data, the energy calculator 38 then calculates the energy difference between two adjacent groups, and the comparator 42 determines the block length of the window data. For example, the energy calculator 38 first calculates the total energy of all frequency subbands in the reference sampling data 50 selected by the subband selector 36. If the total energy is greater than -60 dB, a transient may be present in the reference sampling data, and the partitioner 40 divides the subband samples in the reference sampling data 50 into six groups of equal-width sub-sampling data. The energy calculator 38 calculates the energy difference between adjacent groups and passes it to the comparator 42. If the energy difference between two groups is not greater than 20 dB, no transient has actually occurred between them, and the partitioner 40 re-divides the subband samples in the reference sampling data 50 into three groups of equal-width sub-sampling data. The energy calculator 38 again calculates the energy difference between adjacent groups, and the comparator 42 determines whether it is greater than 12 dB. If it is, the data contains a transient and a short-block window should be used; if not, a long-block window is used.
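The figures in this example can be checked with a little arithmetic. The Python sketch below is illustrative only: the 44.1 kHz sampling rate is an assumption (the text gives no sampling rate), the subband index is zero-based, and the helper names are hypothetical. With 32 equal-width subbands, it locates the subband containing 4 kHz and shows how eighteen subband samples split into six or three equal groups.

```python
SAMPLE_RATE_HZ = 44_100  # assumption: the text does not state a sampling rate
NUM_SUBBANDS = 32


def subband_index(freq_hz, sample_rate_hz=SAMPLE_RATE_HZ, num_subbands=NUM_SUBBANDS):
    """Zero-based index of the equal-width subband that contains freq_hz."""
    band_width_hz = sample_rate_hz / 2 / num_subbands  # about 689 Hz at 44.1 kHz
    return int(freq_hz // band_width_hz)


def split_equal(items, num_groups):
    """Split items into num_groups consecutive groups of equal size."""
    size, remainder = divmod(len(items), num_groups)
    if remainder:
        raise ValueError("items cannot be split into equal groups")
    return [items[i * size:(i + 1) * size] for i in range(num_groups)]


cutoff_band = subband_index(4000)        # subband containing about 4 kHz
sample_ids = list(range(18))             # eighteen subband samples per period
six_groups = split_equal(sample_ids, 6)  # three subband samples per group
three_groups = split_equal(sample_ids, 3)  # six subband samples per group
```

Under these assumptions each subband is about 689 Hz wide, so 4 kHz falls in the sixth subband (index 5). The first pass compares groups of three subband samples against 20 dB, and the second pass compares groups of six subband samples against 12 dB.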

Please refer to FIG. 4, which is a flowchart of the method by which the encoder 30 detects the transient position of a sound signal in an embodiment of the present invention. The encoding method of this embodiment first performs a subband encoding step, generating a plurality of subband samples from the input signal 10; different subband samples correspond to the waveform of the input signal 10 in different time periods, and each subband sample contains a plurality of frequency subbands. A selection step then determines the block length of the window data to be used in the next step, the window data containing a plurality of weighting values. In the selection step, a plurality of subband samples are selected from the subband samples as reference sampling data, and the block length of the window data is determined according to the energy sum of the frequency subbands of the reference sampling data within the preset frequency range. Finally, a transform encoding step multiplies the frequency subbands by the weighting values of the window data determined in the selection step to produce a weighted result, and generates the output signal from the weighted result using the modified discrete cosine transform. The detailed steps for detecting the transient position of the sound signal are as follows:

Step 110: Start detecting the transient position of the sound signal.

Step 120: Determine whether the total energy of the frequency subbands selected as the reference sampling data is greater than a predetermined threshold. If so, go to step 130; if not, go to step 170.

Step 130: Divide the reference sampling data into several groups of equal-width sub-sampling data, each group containing one or more subband samples, calculate the energy of all frequency subbands within the preset frequency range for each group, and go to step 140.

Step 140: Determine whether the energy difference between two adjacent groups of sub-sampling data is greater than a predetermined threshold. If so, go to step 160; if not, go to step 150.

Step 150: Determine whether the reference sampling data can still be divided into a different grouping of sub-sampling data. If so, return to step 130; if not, go to step 170.

Step 160: The reference sampling data contains a transient position; output a signal selecting short-block window data and go to step 180.

Step 170: The reference sampling data contains no transient position; output a signal selecting long-block window data and go to step 180.

Step 180: Output the decision and end detection of the transient position of the sound signal.
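Steps 110 to 180 can be sketched in Python as follows. The thresholds (-60 dB, then 20 dB for six groups and 12 dB for three groups) follow the example given in the embodiment. The function names, the dB reference, and the use of the absolute difference between adjacent groups are assumptions made for illustration.

```python
import math


def band_energy_db(rows, floor=1e-12):
    """Energy of all values in rows, in dB (assumed reference: unit energy = 0 dB)."""
    total = sum(value * value for row in rows for value in row)
    return 10.0 * math.log10(max(total, floor))


def choose_block(reference, total_threshold_db=-60.0,
                 schedule=((6, 20.0), (3, 12.0))):
    """Return "short" if a transient is detected in reference, else "long".

    reference: list of subband samples already limited to the preset range.
    schedule: (number of groups, difference threshold in dB) for each pass.
    """
    # Step 120: too little total energy means no transient is assumed.
    if band_energy_db(reference) <= total_threshold_db:
        return "long"  # step 170
    for num_groups, diff_threshold_db in schedule:
        # Step 130: split into equal groups and compute each group's energy.
        size = len(reference) // num_groups
        energies = [
            band_energy_db(reference[i * size:(i + 1) * size])
            for i in range(num_groups)
        ]
        # Step 140: compare adjacent groups (absolute difference is an assumption).
        if any(abs(b - a) > diff_threshold_db
               for a, b in zip(energies, energies[1:])):
            return "short"  # step 160
        # Step 150: otherwise try the next grouping, if any remains.
    return "long"  # step 170


steady = [[0.1] * 4 for _ in range(18)]
attack = [[0.001] * 4 for _ in range(9)] + [[0.5] * 4 for _ in range(9)]
steady_decision = choose_block(steady)  # constant level
attack_decision = choose_block(attack)  # sudden jump halfway through
```

Here `steady` has identical energy in every group, so no pass exceeds its threshold, while `attack` jumps by well over 20 dB between the third and fourth groups of the first pass.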

The above are only preferred embodiments of the present invention. All similar changes and improvements made according to the claims of the present invention shall fall within the scope of this patent.

Claims

1. An encoding method for encoding an input signal into an output signal, the method comprising:

performing a subband encoding step to generate a plurality of subband samples according to the input signal, wherein different subband samples correspond to input signal waveforms of different periods, and each subband sample includes a plurality of frequency subbands;

performing a selection step to provide window data corresponding to a preset block length, the window data including a plurality of weighting values;

wherein the selection step includes:

selecting, from the plurality of subband samples, a plurality of frequency subbands according to a preset frequency range as reference sampling data, and comparing the energy sum of the frequency subbands of the reference sampling data within the preset frequency range with a first threshold to determine the block length of the window data; and

performing a transform encoding step, multiplying the plurality of frequency subbands in the reference sampling data by the plurality of weighting values of the window data determined in the selection step to generate a weighted result, and using a preset transformation algorithm to generate the output signal according to the weighted result.

2. The encoding method as claimed in claim 1, wherein, when performing the selection step, if the energy sum of the frequency subbands in the reference sampling data is greater than the first threshold, a comparison step is additionally performed, which includes:

dividing the reference sampling data into several groups of sub-sampling data, each group of sub-sampling data comprising at least one frequency subband;

calculating the difference in energy of the frequency subbands between two adjacent groups of sub-sampling data and, if the difference is greater than a second threshold, using window data with a short block length in the transform encoding step; and

if the difference in energy of the frequency subbands between the two adjacent groups of sub-sampling data is less than or equal to the second threshold, performing another comparison step in which the frequency subbands contained in the sub-sampling data differ from those of the sub-sampling data in the previous comparison step.

3. The encoding method as claimed in claim 2, wherein if the energy sum of the frequency subbands in the reference sampling data is less than the first threshold, window data with a long block length is used in the transform encoding step.

4. The encoding method as claimed in claim 1, wherein the input signal is a pulse code modulated signal.

5. The encoding method as claimed in claim 1, wherein the output signal is an encoded bit stream.

6. The encoding method as claimed in claim 1, wherein the preset transformation algorithm is the modified discrete cosine transform.

7. An encoder for encoding an input signal into an output signal, comprising:

a polyphase filter bank for generating a plurality of subband samples according to the input signal, wherein different subband samples correspond to input signal waveforms of different periods, and each subband sample includes a plurality of frequency subbands;

a transient detector, connected to the polyphase filter bank, for determining the block length of window data, the window data containing a plurality of weighting values, the transient detector including: a subband selector for selecting, according to a preset frequency range, a plurality of frequency subbands from the plurality of subband samples as reference sampling data; an energy calculator, connected to the subband selector, for calculating the energy sum of the frequency subbands in the reference sampling data; a partitioner, connected between the subband selector and the energy calculator, for dividing the reference sampling data into several groups of sub-sampling data, each group of sub-sampling data including at least one frequency subband; and a comparator, connected to the energy calculator, for comparing an output value of the energy calculator with a first threshold and outputting a signal representing the block length of the window data according to the comparison result; and

an encoding processing unit, connected to the polyphase filter bank and the transient detector, for multiplying the plurality of frequency subbands in the reference sampling data by the plurality of weighting values in the window data to generate a weighted result, and then using a preset transformation algorithm to generate the output signal according to the weighted result.

8. The encoder as claimed in claim 7, wherein the energy calculator calculates the difference in energy of the frequency subbands between two adjacent groups of sub-sampling data, and sends the result to the comparator for comparison with a second threshold.

9. The encoder as claimed in claim 8, wherein the partitioner further divides the reference sampling data into several groups of sub-sampling data according to the comparison result of the comparator, the frequency subbands contained in each group of sub-sampling data being different from those contained in the previous sub-sampling data.

10. The encoder as claimed in claim 7, wherein the input signal is a pulse code modulated signal.

11. The encoder of claim 7, wherein the output signal is an encoded bit stream.

12. The encoder as claimed in claim 7, wherein the preset transformation algorithm is the Modified Discrete Cosine Transform.

13. A method of detecting the transient state of a sound signal during sound signal encoding, the method comprising:

(a) generating a plurality of sub-band samples according to the sound signal, wherein different sub-band samples correspond to sound signal waveforms of different time periods, and each sub-band sample includes a plurality of frequency sub-bands;

(b) selecting, from the plurality of sub-band samples, a plurality of frequency sub-bands as reference sampling data according to a preset frequency range, and calculating the energy sum of the frequency sub-bands within the preset frequency range according to the reference sampling data;

(c) if the energy sum of the frequency sub-bands in the reference sampling data is greater than a first critical value, dividing the reference sampling data into a plurality of groups of sub-sampling data, each group of sub-sampling data including at least one frequency sub-band, and performing step (d);

otherwise, if the energy sum of the frequency sub-bands in the reference sampling data is not greater than the first critical value, judging that there is no sound signal transient in the reference sampling data, and ending the processing; and

(d) calculating the energy difference between the frequency sub-bands in two adjacent groups of sub-sampling data, and judging the transient position in the sound signal according to the difference.
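Steps (b) to (d) can be sketched in a few lines. This is an illustrative reading under stated assumptions, not the claimed implementation: the sub-band samples are assumed to be stored as a 2-D array with one row per time slot and one column per frequency sub-band, the groups of sub-sampling data are assumed to be formed by splitting along the time axis, and the names `detect_transient`, `band_slice`, and `num_groups` are hypothetical.

```python
import numpy as np

def detect_transient(subband_samples, band_slice, first_threshold, num_groups=3):
    """Return (boundary_index, energy_difference), or None if no transient.

    subband_samples: array of shape (time_slots, sub_bands)  [assumed layout]
    band_slice:      slice of sub-band indices covering the preset frequency range
    """
    # (b) reference sampling data and its energy sum
    reference = subband_samples[:, band_slice]
    energy_sum = float(np.sum(reference ** 2))

    # (c) below or at the first critical value: no transient, processing ends
    if energy_sum <= first_threshold:
        return None
    groups = np.array_split(reference, num_groups, axis=0)

    # (d) energy difference between adjacent groups; the largest jump is
    # taken here as the transient position (an assumption of this sketch)
    energies = np.array([np.sum(g ** 2) for g in groups])
    diffs = np.abs(np.diff(energies))
    boundary = int(np.argmax(diffs))
    return boundary, float(diffs[boundary])
```

For example, with 36 time slots split into three groups of 12, a burst that begins at slot 24 produces its largest energy difference at boundary index 1, that is, between the second and third groups.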

14. The method as claimed in claim 13, wherein, when step (d) is performed and the transient state of the sound signal is judged according to the difference, if the difference is greater than a second critical value, the sound signal waveform corresponding to the two groups of sub-sampling data is judged to be a transient waveform, and if the difference is less than the second critical value, the reference sampling data is divided into a plurality of groups of sub-sampling data different from those of step (c), and step (d) is performed again.
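The repeated division described in claim 14 can be sketched as a loop over candidate partitions. The sequence of group counts, the choice of the largest adjacent difference, and the name `locate_transient` are assumptions made for illustration; the claim itself only requires that each new division differ from the previous one.

```python
import numpy as np

def locate_transient(reference, second_threshold, group_counts=(2, 3, 4, 6)):
    """Try successive partitions until an adjacent-group energy difference
    exceeds the second critical value.

    reference: array of shape (time_slots, selected_sub_bands)  [assumed layout]
    Returns (group_count, start_row_of_later_group), or None if no partition
    produces a difference above the threshold.
    """
    for count in group_counts:
        groups = np.array_split(reference, count, axis=0)
        energies = np.array([np.sum(g ** 2) for g in groups])
        diffs = np.abs(np.diff(energies))
        idx = int(np.argmax(diffs))
        if diffs[idx] > second_threshold:
            # First time slot of the later of the two adjacent groups
            start_row = sum(len(g) for g in groups[: idx + 1])
            return count, start_row
    return None
```

A burst that straddles the midpoint of the data contributes equal energy to both halves, so a two-group split shows no difference; re-dividing into three groups isolates the burst and exposes the jump.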

15. A transient detector arranged in a sound signal encoder, for detecting whether a sound signal input to the encoder contains a transient, wherein the sound signal encoder includes a polyphase filter bank for generating a plurality of sub-band samples according to the input signal, different sub-band samples correspond to input signal waveforms of different time periods, and each sub-band sample includes a plurality of frequency sub-bands, the transient detector being connected to the polyphase filter bank and including:

a sub-band selector, configured to select a plurality of frequency sub-bands from the plurality of sub-band samples according to a preset frequency range as reference sampling data;

an energy calculator, connected to the sub-band selector, for calculating the energy sum of the frequency sub-bands in the reference sampling data;

a partitioner, connected between the sub-band selector and the energy calculator, for dividing the reference sampling data into a plurality of groups of sub-sampling data, each group of sub-sampling data comprising at least one frequency sub-band; and

a comparator, connected to the energy calculator, for comparing the output value of the energy calculator with a first critical value, and judging whether the sound signal input to the encoder contains a transient according to the comparison result.
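How the sub-band selector might map a preset frequency range onto sub-band indices can be sketched as below. The sketch assumes a polyphase filter bank with 32 uniform sub-bands spanning 0 Hz to half the sampling rate; the function name, its parameters, and the default values are hypothetical and do not come from the claims.

```python
import math

def select_subbands(freq_lo_hz, freq_hi_hz, sample_rate_hz=48000, num_subbands=32):
    """Return a slice of sub-band indices covering [freq_lo_hz, freq_hi_hz).

    Assumes uniform sub-bands, each sample_rate_hz / 2 / num_subbands wide.
    """
    band_width = sample_rate_hz / 2 / num_subbands
    first = int(freq_lo_hz // band_width)
    stop = min(math.ceil(freq_hi_hz / band_width), num_subbands)
    return slice(first, stop)
```

At a 48 kHz sampling rate each of the 32 sub-bands is 750 Hz wide, so a preset range of 0 to 6000 Hz selects sub-bands 0 through 7.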

16. The transient detector as claimed in claim 15, wherein the energy calculator calculates the energy difference between the frequency sub-bands in two adjacent groups of sub-sampling data, and then transmits the result to the comparator for comparison with a second critical value.

17. The transient detector as claimed in claim 16, wherein the partitioner further divides the reference sampling data into a plurality of groups of sub-sampling data according to the comparison result of the comparator, and the frequency sub-bands contained in each group of sub-sampling data are different from those contained in the previous sub-sampling data.

18. The transient detector as claimed in claim 15, wherein the sound signal is a pulse code modulated signal.