CN102117613B

CN102117613B - Method and equipment for processing digital audio in variable speed

Info

Publication number: CN102117613B
Application number: CN 200910202164
Authority: CN
Inventors: 吴晟; 林福辉; 张本好; 董树景; 李昙; 徐晶明
Original assignee: Spreadtrum Communications Shanghai Co Ltd
Current assignee: Spreadtrum Communications Shanghai Co Ltd
Priority date: 2009-12-31
Filing date: 2009-12-31
Publication date: 2012-12-12
Anticipated expiration: 2029-12-31
Also published as: CN102117613A

Abstract

The invention relates to audio signal processing technology and discloses a method and a device for variable-speed processing of digital audio. In the invention, a pair of perfect-reconstruction window functions W_L and W_R, with amplitude-attenuating and amplitude-increasing characteristics respectively, is applied to the original digital audio at different delays to obtain a pair of windowed data sets, and the audio waveform is reconstructed from the windowed data to obtain the variable-speed audio. Because neither detection of the pitch period and of audio correlation nor time-frequency conversion is required, the amount of computation is extremely small. In addition, the playing time of the content is lengthened or shortened by compressing the audio signal's own waveform or by introducing additional portions of it, without otherwise altering the audio waveform, so the original sound quality is better preserved.

Description

Digital audio variable-speed processing method and device

Technical Field

The present invention relates to audio signal processing technology, and in particular to variable-speed audio processing.

Background

Playback speed adjustment of recorded digital audio is in wide demand in multimedia applications. For example, slowing down speech playback helps people with hearing or comprehension impairments, as well as foreign-language beginners, to understand what they hear, while speeding it up saves the listener time when extracting information from a recording. In addition, adjusting the playback speed of music changes its tempo and can produce distinctive effects. For the soundtrack of a video, speed adjustment lets the audience hear synchronized, undistorted audio while the video is played faster or slower.

However, directly adjusting the playback speed of audio without any processing changes its pitch and timbre, because the frequency components of the sound are shifted linearly. For example, when the speed is reduced, the sound becomes deep, and a voice sounds like the nasal murmur of someone in deep sleep; when the speed is increased, the sound becomes shrill, and a voice sounds like a child speaking quickly. Therefore, to ensure that only the speed of the audio changes, with no change in pitch and no obvious distortion, the digital audio must be processed. At present, variable-speed audio processing is mostly performed with algorithms based on the overlap-add technique or with algorithms based on time-frequency transformation and spectrum processing. See also U.S. Patent No. 5952596 for variable-speed audio processing techniques.
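The linear shift of frequency components under unprocessed speed change can be made explicit with a single sinusoid. The symbols x, f, and t below are introduced only for this illustration.

```latex
x(t) = \sin(2\pi f t)
\quad\Longrightarrow\quad
x(rt) = \sin\bigl(2\pi (r f)\, t\bigr)
```

Playing the signal at rate r therefore moves every component from frequency f to rf: r = 2 raises the pitch by one octave, and r = 0.5 lowers it by one octave.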

However, the inventors of the present invention have found that algorithms based on the overlap-add technique must determine the delay of the overlap window by detecting waveform similarity. Such methods can only process speech with an obvious pitch period, and they use cross-correlation detection in the time domain or the frequency domain to obtain the delay of similar waveforms as the delay of the overlap window, so the computational cost is high and the resulting sound quality is mediocre. Algorithms based on time-frequency transformation and spectrum processing can process general audio, including speech and music. They resample the original digital audio to change its sampling rate, convert the resampled audio to the frequency domain to obtain its spectrum, apply a frequency shift to the spectrum, and transform the processed spectrum back to the time domain. These algorithms are generally implemented with a perfect-reconstruction short-time Fourier transform, which must process a long stretch of audio at a time to obtain high sound quality. Although this approach yields better sound quality, its computation and storage requirements are large, and the constraints on computing capacity and power consumption make it almost impossible to implement on handheld and mobile devices.

Disclosure of Invention

The invention aims to provide a digital audio speed change processing method and a device thereof, which can realize speed change processing of general digital audio with lower calculation amount and obtain higher processing sound quality.

In order to solve the above technical problem, an embodiment of the present invention provides a digital audio speed change processing method, including the following steps:

A. filling audio signal data to be subjected to variable-speed audio processing into a buffer until the filled length of the buffer reaches a data processing length L_p;

B. windowing the audio signal data to be processed in the buffer as follows to obtain an output signal x_out:

if the variable-speed processing is speed-up processing, aligning the audio signal data of length L_p in the buffer with a window function W_L of length L_W at their left ends and multiplying them point by point by W_L to obtain x_L, aligning the audio signal data of length L_p in the buffer with a window function W_R of length L_W at their right ends and multiplying them point by point by W_R to obtain x_R, and adding x_L and x_R to obtain L_W points of the output signal x_out;

if the variable-speed processing is slow-down processing, aligning the audio signal data of length L_p in the buffer with the window function W_L of length L_W at their right ends and multiplying them point by point by W_L to obtain x_L, aligning the audio signal data of length L_p in the buffer with the window function W_R of length L_W at their left ends and multiplying them point by point by W_R to obtain x_R, and adding x_L and x_R to obtain L_W points of the output signal x_out;

C. moving the L_D signal points whose windowing has been completed out of the buffer, and continuing to fill audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length L_p;

repeating step B and step C until the variable-speed processing of all the audio signal data is completed;

wherein W_L is a window function with an amplitude-attenuating characteristic, W_R is a window function with an amplitude-increasing characteristic, W_L and W_R each have L_W points, and the sum of their corresponding points is equal to or approximately equal to 1.

An embodiment of the present invention further provides a digital audio speed-changing processing device, including:

a filling module, configured to fill audio signal data to be subjected to variable-speed audio processing into a buffer until the filled length of the buffer reaches a data processing length L_p;

a windowing module, configured to window the audio signal data to be processed in the buffer to obtain an output signal x_out; when the variable-speed processing is speed-up processing, the windowing module aligns the audio signal data of length L_p in the buffer with a window function W_L of length L_W at their left ends and multiplies them point by point by W_L to obtain x_L, aligns the audio signal data of length L_p in the buffer with a window function W_R of length L_W at their right ends and multiplies them point by point by W_R to obtain x_R, and adds x_L and x_R to obtain L_W points of the output signal x_out; when the variable-speed processing is slow-down processing, the windowing module aligns the audio signal data of length L_p in the buffer with the window function W_L of length L_W at their right ends and multiplies them point by point by W_L to obtain x_L, aligns the audio signal data of length L_p in the buffer with the window function W_R of length L_W at their left ends and multiplies them point by point by W_R to obtain x_R, and adds x_L and x_R to obtain L_W points of the output signal x_out;

a shifting module, configured to move the L_D signal points whose windowing has been completed out of the buffer and to instruct the filling module to continue filling audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length L_p;

wherein the processing of the windowing module is triggered when the filled length of the buffer reaches the data processing length L_p, and the processing of the shifting module is triggered when the windowing module has obtained L_W points of the output signal x_out, until the variable-speed processing of all the audio signal data is completed.

Compared with the prior art, the main differences and effects of the embodiments of the present invention are as follows:

A pair of perfect-reconstruction window functions W_L and W_R, with amplitude-attenuating and amplitude-increasing characteristics respectively, is applied to the original digital audio at different delays to obtain a pair of windowed data sets, and the audio waveform is reconstructed from the windowed data to obtain the variable-speed audio. The amount of computation is extremely low because neither the pitch period nor the correlation of the audio needs to be detected and no time-frequency transformation is needed. In addition, the playing time of the content is lengthened or shortened by compressing the audio signal's own waveform or by introducing additional portions of it, without otherwise altering the audio waveform, so the original sound quality is better preserved.

Further, when the audio signal data to be processed in the buffer is windowed, W_L and W_R are either the initial reconstruction window functions, whose corresponding points sum to 1, or reconstruction window functions with different weight distributions selected according to the echo type of the audio signal data. The reconstruction window functions with different weight distributions are either generated independently or obtained by transforming the initial reconstruction window. Audio compression (speed-up) smoothly disperses the time-compressed audio information over the shortened processed audio data, whereas audio expansion (slow-down) obtains longer audio data by smoothly overlapping introduced past and future audio information (future meaning later in time than the reference data). The overlapping process introduces or spreads higher-energy signal into portions that originally had lower energy, producing post-echo (an echo occurring after the signal) and pre-echo (an echo occurring before the signal). An appropriate reconstruction window function can therefore be selected according to the echo type when windowing, to further safeguard the audio quality after the speed change.

Further, the echo type of the audio signal data is determined by comparing the block energy or the block absolute value of the audio signal data against a preset threshold. If the past signal is larger than the present signal, post-echo is likely to occur; if the past signal is smaller than the present signal, pre-echo is likely to occur. Using the block energy (or block absolute value) of the audio signal as the basis for judging the echo type therefore makes the judgment reliable.

Further, the initial reconstruction windows W_L and W_R are as follows:

W_L(k)＝1-W_R(k)，k＝1，2，…，L_W

Experiments show that when the initial reconstruction windows W_L and W_R are designed as above, 4 pairs of reconstruction windows with different weight distributions can be obtained from them by decimation, allowing the audio to be processed more flexibly.

Further, L_W is preset, and the values of L_D and L_p are obtained from L_W and the playback rate r. The relationship among the three lengths L_W, L_D, and L_p is fixed, so determining one of them yields the other two. Because the design of the reconstruction window is directly related to L_W, fixing L_W and letting L_D and L_p vary with r makes the subsequent windowing operation simpler.

Drawings

FIG. 1 is a flow chart of a digital audio shift processing method according to a first embodiment of the present invention;

FIG. 2 is a schematic diagram of buffer filling according to a first embodiment of the present invention;

FIG. 3 is a schematic diagram of the window shapes of the initial reconstruction windows W_L and W_R according to the first embodiment of the present invention;

FIG. 4 is a diagram illustrating the output of a combined signal of a windowing process when the playback rate r > 1 according to the first embodiment of the present invention;

FIG. 5 is a diagram illustrating the output of a combined signal of the windowing process when the playback rate r < 1 according to the first embodiment of the present invention;

FIG. 6 is a diagram illustrating buffer shifting according to the first embodiment of the present invention;

FIG. 7 is a diagram illustrating waveform effects of a voice test according to a first embodiment of the present invention;

FIG. 8 is a diagram illustrating the waveform effect of a music test according to the first embodiment of the present invention;

FIG. 9 is a diagram illustrating a voice original spectrum of a voice test according to a first embodiment of the present invention;

FIG. 10 is a diagram illustrating a spectrum of a speech test with time compression of 0.5 times according to a first embodiment of the present invention;

FIG. 11 is a schematic diagram of a spectrum diagram of a voice test with a time spread of 2 times according to a first embodiment of the present invention;

FIG. 12 is a diagram illustrating a music original spectrum of a music test according to a first embodiment of the present invention;

FIG. 13 is a graph illustrating a time-compressed 0.5-fold spectrum of a music test according to a first embodiment of the present invention;

FIG. 14 is a schematic diagram of a spectrogram of a music test with a time spread of 2 times according to the first embodiment of the present invention;

FIG. 15 is a diagram illustrating a reconstruction window for obtaining different weight distributions by decimation according to a second embodiment of the present invention;

FIG. 16 is a schematic diagram of the reconstruction windows W_L1 and W_R1 according to the second embodiment of the present invention;

FIG. 17 is a schematic diagram of the reconstruction windows W_L2 and W_R2 according to the second embodiment of the present invention;

FIG. 18 is a schematic diagram of the reconstruction windows W_L3 and W_R3 according to the second embodiment of the present invention;

FIG. 19 is a schematic diagram of the reconstruction windows W_L4 and W_R4 according to the second embodiment of the present invention;

FIG. 20 is a flowchart of a digital audio variable-speed processing method according to the second embodiment of the present invention;

FIG. 21 is a diagram illustrating the signal blocks corresponding to the echo judgment parameters according to the second embodiment of the present invention;

FIG. 22 is a schematic diagram of pre-echo prevention when W_L and W_R are replaced by W_L1 and W_R1 during audio expansion according to the second embodiment of the present invention;

FIG. 23 is a schematic diagram of post-echo prevention when W_L and W_R are replaced by W_L2 and W_R2 during audio expansion according to the second embodiment of the present invention;

FIG. 24 is a schematic diagram of pre-echo prevention when W_L and W_R are replaced by W_L3 and W_R3 during audio expansion according to the second embodiment of the present invention;

FIG. 25 is a schematic diagram of post-echo prevention when W_L and W_R are replaced by W_L4 and W_R4 during audio expansion according to the second embodiment of the present invention;

FIG. 26 is a schematic diagram of post-echo prevention when W_L and W_R are replaced by W_L1 and W_R1 during audio compression according to the second embodiment of the present invention;

FIG. 27 is a schematic diagram of pre-echo prevention when W_L and W_R are replaced by W_L2 and W_R2 during audio compression according to the second embodiment of the present invention;

FIG. 28 is a schematic diagram of post-echo prevention when W_L and W_R are replaced by W_L3 and W_R3 during audio compression according to the second embodiment of the present invention;

FIG. 29 is a schematic diagram of pre-echo prevention when W_L and W_R are replaced by W_L4 and W_R4 during audio compression according to the second embodiment of the present invention;

FIG. 30 is a schematic structural diagram of a digital audio variable-speed processing device according to a third embodiment of the present invention.

Detailed Description

In the following description, numerous technical details are set forth in order to provide a better understanding of the present application. However, it will be understood by those skilled in the art that the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments.

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The first embodiment of the invention relates to a digital audio speed-changing processing method, and the specific flow is shown in fig. 1.

In step 110, the audio signal data to be subjected to variable-speed processing is filled into the buffer until the filled length of the buffer reaches the data processing length L_p.

Specifically, the values of the reconstruction window length L_W, the update data length L_D, and the data processing length L_p are set in advance. In the present embodiment, the playback rate r is a linear playback-rate adjustment ratio: if r < 1 the sound is expanded (slowed down), and if r > 1 the sound is compressed (sped up). For example, if r is 0.5, the playback time is 2 times the original playback time.

With the sampling frequency of the digital audio unchanged, a segment of sound of length L (in sampling points) that is processed for playback at rate r has its length changed to L/r. Conversely, for a fixed output length L_W (the reconstruction window length defined in this embodiment), the input sound length (the update data length defined in this embodiment) is L_D＝r×L_W. In audio compression, only the input update data is processed, so the data processing length is L_p＝L_D. In audio expansion, the output is L_W-L_D points longer than the input; this part is supplied, through overlapping, by 2(L_W-L_D) points of information before and after the input data, so the data processing length L_p must reach 2(L_W-L_D)+L_D＝2L_W-L_D. That is, for a playback rate r, the relationship among L_W, L_D, and L_p is:

L_D＝r×L_W

L_p＝L_D (r＞1)，L_p＝2L_W-L_D (r＜1)
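The length relationship above can be sketched as a small helper. The function name frame_lengths and the rounding of r×L_W to a whole number of samples are assumptions made for this illustration, not part of the embodiment.

```python
def frame_lengths(window_len, rate):
    """Return (L_D, L_p) for a fixed window length L_W and playback rate r."""
    if window_len <= 0 or rate <= 0:
        raise ValueError("window_len and rate must be positive")
    # L_D = r * L_W; rounding to whole samples is an assumption of this sketch.
    update_len = int(round(rate * window_len))
    if rate >= 1:
        # Compression: only the update data is processed, so L_p = L_D.
        process_len = update_len
    else:
        # Expansion: L_p = 2 * L_W - L_D.
        process_len = 2 * window_len - update_len
    return update_len, process_len
```

For example, with L_W = 1024 the call frame_lengths(1024, 0.5) gives L_D = 512 and L_p = 1536, which matches L_p = (2-r)L_W.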

In this step, the audio signal data is appended to the end of the buffer sample by sample or block by block until the filled length of the buffer reaches L_p, as shown in FIG. 2. For convenience, the buffer is expressed as an array x(k) in this embodiment.

Then, in step 120, the audio signal data to be processed in the buffer is windowed to obtain the output signal x_out。

Specifically, the reconstruction window functions W_L and W_R are predetermined. W_L and W_R each have L_W points; W_L has an amplitude-attenuating characteristic and W_R has an amplitude-increasing characteristic. W_L and W_R satisfy the perfect reconstruction condition, i.e., their corresponding points sum to 1, expressed as W_L(k)+W_R(k)＝1，k＝1，2，…，L_W. In this embodiment, a verified and practical pair W_L and W_R is used:

W_L(k)＝1-W_R(k)，k＝1，2，…，L_W

When L_W＝N, the window shapes of W_L and W_R are as shown in FIG. 3. Of course, in practical applications W_L and W_R may be designed in other forms, provided that the perfect reconstruction condition is satisfied exactly or approximately.
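As a minimal sketch of the perfect reconstruction condition, the following builds a complementary window pair. The half-cosine ramp chosen for W_R and the name make_window_pair are illustrative assumptions, not the window formula of this embodiment; only the relation W_L(k) = 1 - W_R(k) is taken from the text.

```python
import math


def make_window_pair(window_len):
    """Return (w_l, w_r): a decaying and a rising window whose points sum to 1."""
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    # Assumed shape: a half-cosine ramp rising from near 0 to near 1.
    w_r = [0.5 - 0.5 * math.cos(math.pi * (k + 0.5) / window_len)
           for k in range(window_len)]
    # W_L(k) = 1 - W_R(k), so the pair meets the perfect reconstruction condition.
    w_l = [1.0 - v for v in w_r]
    return w_l, w_r
```

Any other rising ramp could be substituted for the cosine term, as long as W_L is derived from it in the same way.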

The specific windowing processing mode in this step is as follows:

if the playback rate r > 1, i.e. sound compression (speed up), the windowed data is

x_L(k)＝x(k)W_L(k)，k＝1，2，…，L_W

x_R(k)＝x(k+L_D-L_W)W_R(k)，k＝1，2，…，L_W

The output data is output one frame of L_W points at a time:

x_out(k)＝x_L(k)+x_R(k)，k＝1，2，…，L_W

As shown in FIG. 4, x_L is the data to be processed, of length L_p＝L_D＝r×L_W, multiplied point by point by the window function W_L after left-end alignment, and x_R is the data to be processed multiplied point by point by the window function W_R after right-end alignment. After x_L and x_R are added, the audio data length is reduced from L_p to L_W, so the audio is compressed. Under the smoothing effect of the perfect-reconstruction window functions W_L and W_R, the L_p-L_W points that are compressed away are fused into the processed audio, so the amount of information is not reduced, and the perfect-reconstruction property of the window functions also ensures the smoothness of the processed audio.
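The compression case for a single frame can be sketched as follows, using 0-based indexing in place of the 1-based k of the formulas. The function name compress_frame and its argument order are assumptions for illustration; the windows are passed in, so any pair meeting the perfect reconstruction condition can be used.

```python
def compress_frame(x, w_l, w_r, update_len):
    """Blend L_D input samples into L_W output samples (playback rate r > 1)."""
    window_len = len(w_l)
    if len(w_r) != window_len:
        raise ValueError("w_l and w_r must have the same length")
    if update_len < window_len or len(x) < update_len:
        raise ValueError("need L_W <= L_D <= len(x)")
    # Right-end alignment: W_R is applied starting at offset L_D - L_W.
    offset = update_len - window_len
    # x_out(k) = x(k) * W_L(k) + x(k + L_D - L_W) * W_R(k)
    return [x[k] * w_l[k] + x[k + offset] * w_r[k] for k in range(window_len)]
```

With x = [4, 8, 12, 16], w_l = [0.75, 0.25], w_r = [0.25, 0.75], and L_D = 4, the output is [6.0, 14.0]: the frame starts close to the first input sample and ends close to the last one.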

If the playback rate r < 1, i.e. the sound is expanded (slowed down), the windowed data is

x_L(k)＝x(k+L_W-L_D)W_L(k)，k＝1，2，…，L_W

x_R(k)＝x(k)W_R(k)，k＝1，2，…，L_W

The output data is still output one frame of L_W points at a time:

x_out(k)＝x_L(k)+x_R(k)，k＝1，2，…，L_W

As shown in FIG. 5, x_L is the data to be processed, of length L_p＝(2-r)L_W (r＜1), multiplied point by point by the window function W_L after right-end alignment, and x_R is the data to be processed multiplied point by point by the window function W_R after left-end alignment. After x_L and x_R are added, the audio data length increases from L_D to L_W, so the audio is lengthened. The added information comes from the first (1-r)L_W points and the last (1-r)L_W points of the length-L_p signal to be processed, which are, relative to the middle L_D points, past information and future information respectively. Through windowing and superposition, these first and last (1-r)L_W points are introduced into the middle L_D points. The introduced audio information is fused into the processed audio under the smoothing effect of the perfect-reconstruction window functions W_L and W_R, and the perfect-reconstruction property of the window functions also ensures the smoothness of the processed audio.
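The expansion case for a single frame can be sketched in the same way, again with 0-based indexing. The function name expand_frame and its argument order are assumptions for illustration. W_R is applied left-aligned and W_L right-aligned, as in the description of FIG. 5.

```python
def expand_frame(x, w_l, w_r, update_len):
    """Stretch L_D input samples into L_W output samples (playback rate r < 1)."""
    window_len = len(w_l)
    if len(w_r) != window_len:
        raise ValueError("w_l and w_r must have the same length")
    if not 0 < update_len <= window_len:
        raise ValueError("need 0 < L_D <= L_W")
    process_len = 2 * window_len - update_len  # L_p = 2 * L_W - L_D
    if len(x) < process_len:
        raise ValueError("x must hold at least L_p samples")
    # Right-end alignment: W_L is applied starting at offset L_W - L_D.
    offset = window_len - update_len
    # x_out(k) = x(k + L_W - L_D) * W_L(k) + x(k) * W_R(k)
    return [x[k + offset] * w_l[k] + x[k] * w_r[k] for k in range(window_len)]
```

With x = [2, 4, 6, 8, 10, 12], L_W = 4, L_D = 2, w_l = [0.875, 0.625, 0.375, 0.125], and w_r = [0.125, 0.375, 0.625, 0.875], the output is [5.5, 6.5, 7.5, 8.5]: four output samples cover a span that advances the input by only two samples per frame.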

It is worth mentioning that the relationship among the three lengths L_W, L_D, and L_p is fixed, so determining one of them yields the other two. The design of the reconstruction window is directly related to L_W, so in this embodiment L_W is fixed and L_D and L_p vary with r, which makes subsequent windowing operations simpler. It is understood that in practical applications L_D (or L_p) may instead be determined first, and the values of L_W and L_p (or L_D) then obtained from L_D (or L_p) and the playback rate r.

Next, in step 130, the L_D signal points whose windowing has been completed are moved out of the buffer, and audio signal data to be processed continues to be filled at the end of the buffer until the filled length of the buffer reaches the data processing length L_p, as shown in FIG. 6. That is, all data in x(k) starting from position L_D+1 is shifted as a whole to the beginning:

x(k)＝x(k+L_D)，k＝1，2，…

after the step 130 is completed, the process returns to the step 120 until the audio speed change process of all the audio signal data is completed.
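Steps 110 to 130 can be combined into one loop, sketched below under stated assumptions: the half-cosine ramp for W_R, the name change_speed, the default window length of 256, and the rounding of L_D are all illustrative choices rather than details of this embodiment. Instead of physically shifting a buffer, the sketch advances a start index by L_D per frame, which has the same effect; trailing samples that do not fill a whole frame are dropped.

```python
import math


def change_speed(samples, rate, window_len=256):
    """Return samples replayed at `rate` (r > 1 faster, r < 1 slower)."""
    if rate <= 0 or window_len <= 0:
        raise ValueError("rate and window_len must be positive")
    # Assumed window shape: half-cosine ramp for W_R, with W_L = 1 - W_R.
    w_r = [0.5 - 0.5 * math.cos(math.pi * (k + 0.5) / window_len)
           for k in range(window_len)]
    w_l = [1.0 - v for v in w_r]
    update_len = max(1, int(round(rate * window_len)))  # L_D = r * L_W
    if update_len >= window_len:
        # Compression: W_L left-aligned, W_R right-aligned, L_p = L_D.
        process_len = update_len
        off_l, off_r = 0, update_len - window_len
    else:
        # Expansion: W_L right-aligned, W_R left-aligned, L_p = 2 * L_W - L_D.
        process_len = 2 * window_len - update_len
        off_l, off_r = window_len - update_len, 0
    out = []
    start = 0  # start of the current buffer contents within `samples`
    while start + process_len <= len(samples):
        for k in range(window_len):
            out.append(samples[start + k + off_l] * w_l[k]
                       + samples[start + k + off_r] * w_r[k])
        start += update_len  # step 130: discard L_D processed points
    return out
```

Each output sample costs two multiplications and one addition, so the work grows linearly with the number of samples. For a 4096-sample input and L_W = 256, rate 2.0 yields 2048 output samples and rate 0.5 yields 7680, slightly fewer than 8192 because the final partial frame is not processed.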

In the present embodiment, a pair of perfect-reconstruction window functions W_L and W_R, with amplitude-attenuating and amplitude-increasing characteristics respectively, is applied to the original digital audio at different delays to obtain a pair of windowed data sets, and the audio waveform is reconstructed from the windowed data to obtain the variable-speed audio. In audio compression (speed-up), overlapping a pair of windowed audio signals smoothly disperses the time-compressed audio information over the shortened processed audio data. In audio expansion (slow-down), longer audio data is obtained by introducing past and future audio information (future meaning later in time than the reference data) and overlapping it smoothly. Because neither the pitch period nor the correlation of the audio needs to be detected and no time-frequency transformation is needed, the processing flow uses only shifting, windowing, and overlapping. Its theoretical computational complexity is only O(N), lower than the O(N²) of algorithms using waveform similarity detection and the O(N log₂ N) of algorithms using time-frequency transformation and spectrum processing, so the amount of computation is extremely low. In addition, the playing time of the content is lengthened or shortened by compressing the audio signal's own waveform or by introducing additional portions of it, without otherwise altering the audio waveform, so the original sound quality is better preserved. Actual processing shows that this embodiment offers a wide speed-change range, accurate time control, and high sound quality, and that it can process general audio (including speech and music).

For example, a segment of English speech and a segment of the violin concerto "Liang Zhu" were processed in a test. The playback rates were r＝2 and r＝0.5, i.e., speeding up to 2 times and slowing down to 0.5 times the original speed. At these rates a conventional algorithm seriously degrades the audio quality, whereas after processing with the method of this embodiment the audio quality remains high, a considerable relative improvement. As shown in FIG. 7 and FIG. 8, the waveform envelope of the audio keeps the same shape, and the spectrum of the audio keeps the same voiceprint (as shown in FIGS. 9-11 and FIGS. 12-14). Actual listening shows that, apart from the speed, the pitch of the audio is unchanged before and after processing.

The second embodiment of the present invention relates to a digital audio variable-speed processing method. The second embodiment improves on the first, and the main improvement is as follows. In the first embodiment, the W_L and W_R used when windowing the audio signal data to be processed in the buffer are the initial reconstruction window functions, whose corresponding points sum to 1, and W_L and W_R are fixed. In the present embodiment, when the audio signal data to be processed in the buffer is windowed, a reconstruction window function is first selected according to the echo type of the audio signal data, and the windowing is then performed with the selected reconstruction window function, as shown in FIG. 20.

Specifically, in audio processing, reconstruction windows with different weight distributions may be used depending on the waveform. The way a reconstruction window is obtained is not unique, but the window must satisfy, exactly or approximately, the perfect reconstruction condition, i.e., the corresponding points of W_L and W_R sum to approximately 1:

W_L(k)+W_R(k)≈1，k＝1，2，…，L_W

Reconstruction windows with different weight distributions can each be generated and stored separately, or only one initial reconstruction window can be generated and stored, with the other reconstruction windows obtained from it by transformation. The transformation is as follows: the initial reconstruction window is decimated by an integer ratio (keeping 1 point in every 2, 1 in every 3, 1 in every 4, and so on) to obtain the gradually varying part of the transformed window, and the constant parts at the two ends are filled with 0 or 1 respectively to extend the window to the original length, as shown in FIG. 15.
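The decimation transform can be sketched as follows. The function name shifted_window_pair and the parameter zeros_before, which sets how far the transition band sits from the left end, are hypothetical; how the embodiment positions the transition band for each of its window pairs is not assumed here.

```python
def shifted_window_pair(w_r, factor, zeros_before):
    """Decimate the rising window w_r and pad it back to its original length."""
    if factor < 1 or zeros_before < 0:
        raise ValueError("factor must be >= 1 and zeros_before >= 0")
    # Keep 1 point in every `factor`: this is the gradually varying part.
    ramp = list(w_r[::factor])
    ones_after = len(w_r) - len(ramp) - zeros_before
    if ones_after < 0:
        raise ValueError("zeros_before leaves no room for the ramp")
    # Constant parts: 0 before the ramp and 1 after it for the rising window.
    new_r = [0.0] * zeros_before + ramp + [1.0] * ones_after
    # The decaying window stays complementary, so the pair still sums to 1.
    new_l = [1.0 - v for v in new_r]
    return new_l, new_r
```

With w_r = [0.125, 0.375, 0.625, 0.875] and factor 2, zeros_before = 0 gives a rising window [0.125, 0.625, 1.0, 1.0] whose transition band sits at the left, while zeros_before = 2 gives [0.0, 0.0, 0.125, 0.625] with the transition band at the right.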

In the present embodiment, W_L and W_R of the first embodiment are used as the initial reconstruction window functions. Decimating the initial reconstruction window yields 4 pairs of reconstruction window functions with different weight distributions, which allow the audio to be processed more flexibly. Among them, W_L1, W_R1 (window shape shown in FIG. 16) are:

W_L2，W_R2(the window type is shown in FIG. 17) is:

W_L3，W_R3(the window type is shown in FIG. 18) is:

W_L4，W_R4(the window type is shown in FIG. 19) is:

in this embodiment, the echo type of the audio signal data is obtained according to the judgment result of the block energy or the block absolute value of the audio signal data and the preset threshold.

Specifically, judgment parameters engLa, engLb, engRa, and engRb are defined; the signal blocks corresponding to them (the hatched regions) are shown in FIG. 21. As can be seen, engLa, engLb, engRa, and engRb are arranged in time from oldest to newest. When energy calculation is adopted, the judgment parameters engLa, engLb, engRa, and engRb are obtained by the following formulas:

when absolute value calculation is adopted, the judgment parameters engLa, engLb, engRa and engRb are obtained by the following formula:
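As a rough illustration of block energy and block absolute-value measures, and not the formulas of this embodiment (whose block boundaries are defined by FIG. 21), the sketch below assumes the four blocks are equal, consecutive quarters of the data being judged. The function name block_measures and the use_energy flag are hypothetical.

```python
def block_measures(x, use_energy=True):
    """Return (engLa, engLb, engRa, engRb) for four equal consecutive blocks."""
    quarter = len(x) // 4
    if quarter == 0:
        raise ValueError("x must contain at least 4 samples")
    # Energy sums squared samples; the cheaper alternative sums magnitudes.
    measure = (lambda v: v * v) if use_energy else abs
    # Blocks run from oldest (La) to newest (Rb); the equal split is assumed.
    return tuple(sum(measure(v) for v in x[i * quarter:(i + 1) * quarter])
                 for i in range(4))
```

The absolute-value form avoids multiplications, which may matter on low-power devices, at the cost of weighting large samples less heavily than the energy form does.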

An echo judgment threshold echoRate is set (it must be greater than 1 and is generally greater than 2). The echo type is determined from the judgment parameters engLa, engLb, engRa, and engRb and the echo judgment threshold, and the reconstruction window to use is selected accordingly, which can be implemented by the following code:

    echoControl = 0;  /* initial value: W_L and W_R are left unchanged */
    if (outLen > frmLen)  /* audio expansion (slow-down) decision */
    {
        if (engRa > (engLa + engLb) * echoRate)
            echoControl = 3;  /* replace W_L, W_R with W_L3, W_R3 to prevent pre-echo */
        else if (engRb > (engLa + engLb) * echoRate)
            echoControl = 1;  /* replace W_L, W_R with W_L1, W_R1 to prevent pre-echo */
        if (engLb > (engRa + engRb) * echoRate)
            echoControl = 4;  /* replace W_L, W_R with W_L4, W_R4 to prevent post-echo */
        else if (engLa > (engRa + engRb) * echoRate)
            echoControl = 2;  /* replace W_L, W_R with W_L2, W_R2 to prevent post-echo */
    }
    else  /* audio compression (speed-up) decision */
    {
        if (engRb > (engLa + engLb))
        {
            if (engRa > (engLa + engLb))
                echoControl = 4;  /* replace W_L, W_R with W_L4, W_R4 to prevent pre-echo */
            else
                echoControl = 2;  /* replace W_L, W_R with W_L2, W_R2 to prevent pre-echo */
        }
        if (engLa > (engRa + engRb))
        {
            if (engLb > (engRa + engRb))
                echoControl = 3;  /* replace W_L, W_R with W_L3, W_R3 to prevent post-echo */
            else
                echoControl = 1;  /* replace W_L, W_R with W_L1, W_R1 to prevent post-echo */
        }
    }
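The decision logic above can be exercised end to end with a minimal Python sketch. The formulas for engLa, engLb, engRa and engRb and the block boundaries of FIG. 21 are not reproduced in this text, so the sketch assumes four equal-length consecutive blocks ordered from old to new. The function names `block_measures` and `select_echo_control` and the default `echo_rate=2.0` are illustrative choices, not taken from the patent.

```python
def block_measures(samples, use_energy=True):
    """Return (engLa, engLb, engRa, engRb) for four blocks, old to new.

    Assumption: four equal-length consecutive blocks; the exact block
    boundaries of FIG. 21 are not available in this text.
    """
    n = len(samples) // 4
    if n == 0:
        raise ValueError("need at least 4 samples")
    measure = (lambda v: v * v) if use_energy else abs
    return tuple(
        sum(measure(v) for v in samples[i * n:(i + 1) * n]) for i in range(4)
    )


def select_echo_control(eng, out_len, frm_len, echo_rate=2.0):
    """Mirror the decision pseudocode: 0 keeps W_L/W_R, 1..4 pick W_L1..W_L4."""
    eng_la, eng_lb, eng_ra, eng_rb = eng
    left, right = eng_la + eng_lb, eng_ra + eng_rb
    echo_control = 0
    if out_len > frm_len:  # expansion (slow-down)
        if eng_ra > left * echo_rate:
            echo_control = 3  # pre-echo
        elif eng_rb > left * echo_rate:
            echo_control = 1  # pre-echo
        if eng_lb > right * echo_rate:
            echo_control = 4  # post-echo
        elif eng_la > right * echo_rate:
            echo_control = 2  # post-echo
    else:  # compression (speed-up); the source applies no echoRate factor here
        if eng_rb > left:
            echo_control = 4 if eng_ra > left else 2  # pre-echo
        if eng_la > right:
            echo_control = 3 if eng_lb > right else 1  # post-echo
    return echo_control
```

For a buffer that is quiet in its older half and loud in its newer half, the sketch selects a pre-echo window in both directions, while a buffer with evenly distributed energy keeps the initial windows.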

It has been experimentally confirmed that speech intelligibility can be further improved by adaptively selecting the window function according to the echo type, as shown in FIGS. 22 to 29.

As will be appreciated by those skilled in the art, audio compression (speeding up) smoothly spreads the time-compressed audio information over the shortened output data, whereas audio extension (slowing down) obtains longer audio data by smoothly overlapping past and future audio information (future meaning temporally newer than the reference data). This overlapping introduces or spreads higher-energy signal into portions that originally had lower energy, producing both post-echo (an echo after the signal onset, also rendered as over-echo) and pre-echo (an echo before the signal onset). Therefore, by judging the echo type and switching the window function accordingly, the dispersion of signal energy can be reduced, and pre-echo and post-echo mitigated, for audio signals whose energy distribution is extremely uneven.

In the present embodiment, echo judgment is based on the block energy (or block absolute value) of the audio signal. If the past signal is larger than the present signal, post-echo (over-echo) is likely to occur. In that case, in audio compression the transition band of the window function needs to be shifted to the left, i.e. window functions such as W_L1, W_R1 or W_L3, W_R3 are used to suppress the larger (past) signal at the left end and reduce the dispersion (see the examples in FIGS. 26 and 28, respectively); in audio extension the transition band needs to be shifted to the right, i.e. window functions such as W_L2, W_R2 or W_L4, W_R4 are used to suppress the larger (past) signal at the left end and reduce the dispersion (FIGS. 23 and 25, respectively). If the past signal is smaller than the present signal, pre-echo is likely to occur. In that case, in audio compression the transition band needs to be shifted to the right, i.e. window functions such as W_L2, W_R2 or W_L4, W_R4 are used to suppress the larger (future) signal at the right end and reduce the dispersion (FIGS. 27 and 29, respectively); in audio extension the transition band needs to be shifted to the left, i.e. window functions such as W_L1, W_R1 or W_L3, W_R3 are used to suppress the larger (future) signal at the right end and reduce the dispersion (FIGS. 22 and 24, respectively). The transition bands of W_L1, W_R1 and W_L3, W_R3 are shifted to the left by different amounts, and those of W_L2, W_R2 and W_L4, W_R4 are shifted to the right by different amounts, so the subdivided block energies (or block absolute values) are needed to choose between them.

It can readily be seen that, in the present embodiment, selecting an appropriate reconstruction window function according to the echo type further safeguards the audio quality after the speed change. Moreover, because post-echo is likely when the past signal is larger than the present signal, and pre-echo is likely when the past signal is smaller than the present signal, using the block energy (or block absolute value) of the audio signal as the basis for judging the echo type effectively ensures an accurate judgment.

The method embodiments of the present invention may be implemented in software, hardware, firmware, and so on. Regardless of whether the invention is implemented in software, hardware, or firmware, the instruction code may be stored in any type of computer-accessible memory (e.g., permanent or modifiable, volatile or non-volatile, solid-state or non-solid-state, fixed or removable media). The memory may be, for example, Programmable Array Logic (PAL), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic disk, an optical disk, a Digital Versatile Disk (DVD), or the like.

A third embodiment of the present invention relates to a digital audio variable-speed processing apparatus. As shown in FIG. 30, the digital audio variable-speed processing apparatus includes:

A filling module for filling the audio signal data to be subjected to audio variable-speed processing into the buffer until the filled length of the buffer reaches the data processing length L_p.

A windowing processing module for windowing the audio signal data to be processed in the buffer to obtain the output signal x_out. When the audio variable-speed processing is speed-up processing, the windowing processing module aligns the window function W_L of length L_W with the left end of the audio signal data of length L_p in the buffer and multiplies them point by point to obtain x_L, aligns the window function W_R of length L_W with the right end of the audio signal data of length L_p in the buffer and multiplies them point by point to obtain x_R, and adds the obtained x_L and x_R to obtain L_W output signals x_out. When the audio variable-speed processing is slow-down processing, the windowing processing module aligns the window function W_L of length L_W with the right end of the audio signal data of length L_p in the buffer and multiplies them point by point to obtain x_L, aligns the window function W_R of length L_W with the left end of the audio signal data of length L_p in the buffer and multiplies them point by point to obtain x_R, and adds the obtained x_L and x_R to obtain L_W output signals x_out.

A shift module for shifting L_D windowed signals out of the buffer and instructing the filling module to continue filling audio signal data to be processed at the tail of the buffer until the filled length of the buffer again reaches the data processing length L_p.

When the filled length of the buffer reaches the data processing length L_p, the windowing processing module is triggered. When the windowing processing module has obtained the L_W output signals x_out, the shift module is triggered, and this cycle repeats until the audio variable-speed processing of all the audio signal data is completed.

Here, W_L is a window function with an amplitude-attenuation characteristic and W_R is a window function with an amplitude-increase characteristic; W_L and W_R each have L_W points, and the sum of corresponding points is equal to or approximately equal to 1. L_W is a preset value, and the values of L_D and L_p are obtained from L_W and the playback rate r.
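The interplay of the filling, windowing and shift modules can be sketched in Python. The sketch uses 0-based lists and takes L_D and L_p as explicit arguments, because the formulas that derive them from L_W and r are not reproduced in this text; the function name `time_scale` and its parameter names are illustrative, and input left over after the last full frame is simply dropped.

```python
def time_scale(samples, w_l, w_r, l_d, l_p, speed_up):
    """Fill / window / shift loop; returns the concatenated output frames.

    w_l, w_r : complementary windows of length L_W (w_l decays, w_r rises)
    l_d      : number of samples shifted out of the buffer per frame (L_D)
    l_p      : data processing length of the buffer (L_p)
    """
    l_w = len(w_l)
    if len(w_r) != l_w or l_p < l_w or not 0 < l_d <= l_p:
        raise ValueError("inconsistent window or buffer lengths")
    out, buf, pos = [], [], 0
    while True:
        # Filling module: top the buffer up to L_p samples.
        need = l_p - len(buf)
        buf.extend(samples[pos:pos + need])
        pos += need
        if len(buf) < l_p:
            break  # not enough input left for another full frame
        # Windowing module: offsets of the two windows inside the buffer.
        off = l_p - l_w  # start index of a right-aligned window
        a, b = (0, off) if speed_up else (off, 0)  # a: W_L, b: W_R
        out.extend(
            buf[a + k] * w_l[k] + buf[b + k] * w_r[k] for k in range(l_w)
        )
        # Shift module: drop the L_D oldest samples.
        del buf[:l_d]
    return out
```

For instance, with L_W = 4 and L_D = L_p = 8 (illustrative values chosen so that each frame starts where the previous one ended), every pass consumes 8 input samples and emits 4, roughly halving the duration. A constant input stays constant because W_L and W_R sum to 1 at every point.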

In the present embodiment, the window functions W_L and W_R used for windowing are an initial reconstruction window function whose corresponding points sum to 1. The initial reconstruction windows W_L and W_R are as follows:

W_L(k) = 1 - W_R(k), k = 1, 2, …, L_W
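A pair of windows satisfying this relation can be generated as in the following Python sketch. The expression for W_R(k) is not reproduced in this text, so the raised-cosine rise used here is purely an assumed shape, and `initial_windows` is an illustrative name; only the complement W_L(k) = 1 - W_R(k) comes from the description. The lists are 0-based, so index 0 corresponds to k = 1.

```python
import math


def initial_windows(l_w):
    """Return (w_l, w_r) with w_l[k] + w_r[k] == 1 at every point."""
    if l_w < 2:
        raise ValueError("l_w must be at least 2")
    # Assumed shape: raised-cosine rise from 0 to 1 (not the patent's formula).
    w_r = [0.5 - 0.5 * math.cos(math.pi * k / (l_w - 1)) for k in range(l_w)]
    # Complement taken from the description: W_L(k) = 1 - W_R(k).
    w_l = [1.0 - v for v in w_r]
    return w_l, w_r
```

Any monotonically rising W_R between 0 and 1 would serve equally well in this sketch, since the reconstruction property depends only on the two windows summing to 1.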

It should be understood that the first embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the first embodiment. The technical details described in the first embodiment remain valid in the present embodiment and, to avoid repetition, are not described again here. Correspondingly, the technical details described in the present embodiment also apply to the first embodiment.

A fourth embodiment of the present invention relates to a digital audio variable-speed processing apparatus. The fourth embodiment improves on the third embodiment, the main improvement being as follows: in the third embodiment, the window functions W_L and W_R used for windowing are an initial reconstruction window function whose corresponding points sum to 1, and W_L and W_R are fixed; in the present embodiment, the window functions W_L and W_R used for windowing are reconstruction window functions with different weight distributions, selected according to the echo type of the audio signal data. That is, the digital audio variable-speed processing apparatus of this embodiment further includes a window function selection module, configured to obtain the echo type of the audio signal data by comparing the block energy or block absolute value of the audio signal data against a preset threshold, and to output the obtained echo type to the windowing processing module, which uses it to select among the reconstruction window functions with different weight distributions.

The reconstruction window functions with different weight distributions are either generated independently or obtained by transforming the initial reconstruction window. When they are obtained by transforming the initial reconstruction window, the transformation is as follows:

The initial reconstruction window is decimated by an integer ratio to obtain the gradually varying part (the transition band) of the transformed window, and the constant parts at both ends are padded with 0 or 1, respectively, until the original length of the initial reconstruction window is reached.
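The decimate-and-pad transformation can be illustrated with a short Python sketch. The function name `transform_window` and the `lead` parameter (the number of padding points placed before the transition band) are hypothetical, and the sketch does not claim to reproduce the specific windows W_L1 to W_L4, whose formulas are not available in this text.

```python
def transform_window(w, factor, lead):
    """Decimate window w by an integer factor and pad back to len(w).

    factor : integer decimation ratio (keeps every factor-th point)
    lead   : hypothetical parameter, padding points before the transition
             band; 0 pushes the band fully left, larger values move it right
    """
    if factor < 1 or lead < 0:
        raise ValueError("factor must be >= 1 and lead >= 0")
    band = list(w[::factor])  # shortened transition band
    tail = len(w) - lead - len(band)
    if tail < 0:
        raise ValueError("lead leaves no room for the transition band")
    # A rising window is padded with 0 before and 1 after the band;
    # a decaying window with 1 before and 0 after.
    rising = w[-1] > w[0]
    head_val, tail_val = (0.0, 1.0) if rising else (1.0, 0.0)
    return [head_val] * lead + band + [tail_val] * tail
```

Applying the same `factor` and `lead` to both windows of a complementary pair keeps the sum of corresponding points at 1, because the decimated bands remain complementary and the padding values 0 and 1 pair up.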

It should be understood that the second embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the second embodiment. The technical details described in the second embodiment remain valid in the present embodiment and, to avoid repetition, are not described again here. Correspondingly, the technical details described in the present embodiment also apply to the second embodiment.

It should be noted that each unit mentioned in the apparatus embodiments of the present invention is a logical unit. Physically, a logical unit may be one physical unit, part of a physical unit, or a combination of several physical units; the physical implementation of these logical units is not what matters most, and it is the combination of the functions they implement that solves the technical problem addressed by the present invention. Furthermore, in order to highlight the innovative part of the invention, the above apparatus embodiments do not introduce units that are less closely related to solving that technical problem; this does not mean that no other units exist in the above apparatus embodiments.

While the invention has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims

1. A digital audio variable-speed processing method, characterized by comprising the following steps:

if the audio variable-speed processing is speed-up processing, the window function W_L of length L_W is aligned with the left end of the audio signal data of length L_p in the buffer and the two are multiplied point by point to obtain x_L, the window function W_R of length L_W is aligned with the right end of the audio signal data of length L_p in the buffer and the two are multiplied point by point to obtain x_R, and the obtained x_L and x_R are added to obtain L_W of said output signals x_out;

wherein the W_L is a window function with an amplitude-attenuation characteristic, the W_R is a window function with an amplitude-increase characteristic, W_L and W_R each have L_W points, and the sum of corresponding points is equal to or approximately equal to 1; said L_W is a preset value, and the values of said L_D and L_p are obtained from L_W and the playback rate r.

2. The method of claim 1, wherein the window functions W_L and W_R used for windowing the audio signal data to be processed in the buffer are an initial reconstruction window function whose corresponding points sum to 1; or, the W_L and W_R are reconstruction window functions with different weight distributions selected according to the echo type of the audio signal data; and the reconstruction window functions with different weight distributions are either generated independently or obtained by transforming the initial reconstruction window.

3. The method of claim 2, wherein the echo type of the audio signal data is obtained according to a judgment result of a block energy or a block absolute value of the audio signal data and a preset threshold.

4. The digital audio frequency variable speed processing method according to claim 2, wherein the reconstruction window functions with different weight distributions obtained by transforming the initial reconstruction window are obtained as follows:

decimating the initial reconstruction window by an integer ratio to obtain the gradually varying part of the transformed window, and padding the constant parts at both ends with 0 or 1, respectively, until the original length of the initial reconstruction window is reached.

5. The digital audio variable-speed processing method according to claim 2, wherein the initial reconstruction windows W_L and W_R are as follows:

W_L(k) = 1 - W_R(k), k = 1, 2, …, L_W.

6. A digital audio variable-speed processing apparatus, comprising:

a filling module for filling the audio signal data to be subjected to audio variable-speed processing into the buffer until the filled length of the buffer reaches the data processing length L_p;

a shift module for shifting L_D windowed signals out of the buffer and instructing the filling module to continue filling audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length L_p;

wherein the W_L is a window function with an amplitude-attenuation characteristic, the W_R is a window function with an amplitude-increase characteristic, W_L and W_R each have L_W points, and the sum of corresponding points is equal to or approximately equal to 1; said L_W is a preset value, and the values of said L_D and L_p are obtained from L_W and the playback rate r.

7. The apparatus according to claim 6, wherein the window functions W_L and W_R used for windowing are an initial reconstruction window function whose corresponding points sum to 1; or,

the window functions W_L and W_R used for windowing are reconstruction window functions with different weight distributions selected according to the echo type of the audio signal data; and the reconstruction window functions with different weight distributions are either generated independently or obtained by transforming the initial reconstruction window.

8. The digital audio variable speed processing device according to claim 7, further comprising: a window function selection module for obtaining the echo type of the audio signal data by comparing the block energy or block absolute value of the audio signal data against a preset threshold, and for outputting the obtained echo type to the windowing processing module.

9. The apparatus according to claim 7, wherein the reconstruction window functions with different weight distributions are obtained by transforming the initial reconstruction window as follows:

10. The digital audio variable speed processing device according to claim 7, wherein the initial reconstruction windows W_L and W_R are as follows:

W_L(k) = 1 - W_R(k), k = 1, 2, …, L_W.