Disclosure of Invention
The invention aims to provide a digital audio speed change processing method and a device thereof, which can perform speed change processing on general digital audio with a low amount of calculation while achieving high processed sound quality.
In order to solve the above technical problem, an embodiment of the present invention provides a digital audio speed change processing method, including the following steps:
A. filling audio signal data to be subjected to audio speed change processing into a buffer until the filled length of the buffer reaches a data processing length L_p;
B. windowing the audio signal data to be processed in the buffer in the following manner to obtain an output signal x_out:
if the audio speed change processing is speed-up processing, aligning the audio signal data of length L_p in the buffer with a window function W_L of length L_W at the left end and multiplying them point by point to obtain x_L; aligning the audio signal data of length L_p in the buffer with a window function W_R of length L_W at the right end and multiplying them point by point to obtain x_R; and adding the obtained x_L and x_R to obtain L_W output signals x_out;
if the audio speed change processing is slow-down processing, aligning the audio signal data of length L_p in the buffer with the window function W_L of length L_W at the right end and multiplying them point by point to obtain x_L; aligning the audio signal data of length L_p in the buffer with the window function W_R of length L_W at the left end and multiplying them point by point to obtain x_R; and adding the obtained x_L and x_R to obtain L_W output signals x_out;
C. moving the L_D signals for which windowing has been completed out of the buffer, and continuing to fill audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length L_p;
repeatedly executing step B and step C until the audio speed change processing of all the audio signal data is completed;
wherein W_L is a window function having an amplitude attenuation characteristic, W_R is a window function having an amplitude increasing characteristic, W_L and W_R each have L_W points, and the sum of their corresponding points is equal to 1 or approximately equal to 1.
An embodiment of the present invention further provides a digital audio speed-changing processing device, including:
a filling module for filling the audio signal data to be subjected to audio speed change processing into the buffer until the filled length of the buffer reaches the data processing length L_p;
a windowing processing module for windowing the audio signal data to be processed in the buffer to obtain an output signal x_out. When the audio speed change processing is speed-up processing, the windowing processing module aligns the audio signal data of length L_p in the buffer with a window function W_L of length L_W at the left end and multiplies them point by point to obtain x_L, aligns the audio signal data of length L_p in the buffer with a window function W_R of length L_W at the right end and multiplies them point by point to obtain x_R, and adds the obtained x_L and x_R to obtain L_W output signals x_out. When the audio speed change processing is slow-down processing, the windowing processing module aligns the audio signal data of length L_p in the buffer with the window function W_L of length L_W at the right end and multiplies them point by point to obtain x_L, aligns the audio signal data of length L_p in the buffer with the window function W_R of length L_W at the left end and multiplies them point by point to obtain x_R, and adds the obtained x_L and x_R to obtain L_W output signals x_out;
a shifting module for moving the L_D signals for which windowing has been completed out of the buffer, and instructing the filling module to continue filling audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length L_p;
when the filled length of the buffer reaches the data processing length L_p, the processing of the windowing processing module is triggered; when the windowing processing module obtains L_W output signals x_out, the processing of the shifting module is triggered, until the audio speed change processing of all the audio signal data is completed;
wherein W_L is a window function having an amplitude attenuation characteristic, W_R is a window function having an amplitude increasing characteristic, W_L and W_R each have L_W points, and the sum of their corresponding points is equal to 1 or approximately equal to 1.
Compared with the prior art, the main differences and effects of the embodiments of the invention are as follows:
A pair of perfect reconstruction window functions W_L and W_R, having amplitude attenuation and amplitude increasing characteristics respectively, is applied to the original digital audio at different time delays to obtain a pair of windowed data, and the audio waveform is reconstructed from the windowed data to obtain the speed-changed audio. The amount of calculation is extremely low because neither the pitch period nor the correlation of the audio needs to be detected and no time-frequency transformation is required. In addition, the playing time of the content is shortened or lengthened by compressing the waveform of the audio signal or by introducing adjacent waveform into it, and the audio waveform itself is not altered, so that more of the original sound quality is maintained.
Further, when windowing the audio signal data to be processed in the buffer, W_L and W_R are either an initial reconstruction window function whose corresponding points add up to 1, or reconstruction window functions with different weight distributions selected according to the echo type of the audio signal data. The reconstruction window functions with different weight distributions are either generated separately and independently, or obtained by transforming the initial reconstruction window. Audio compression (speed-up) smoothly disperses the time-compressed audio information over the shortened processed audio data, whereas audio expansion (slow-down) obtains longer audio data by smoothly overlapping introduced past and future (temporally newer than the reference data) audio information. The overlapping process introduces or spreads signals of larger energy into portions that originally had smaller energy, resulting in post-echo (an echo occurring after the signal) and pre-echo (an echo occurring before the signal). Therefore, when windowing is performed, an appropriate reconstruction window function can be selected according to the echo type, so as to further ensure the audio quality after the speed change.
Further, the echo type of the audio signal data is obtained by comparing the block energy or the block absolute value of the audio signal data against a preset threshold. If the past signal is larger than the present signal, post-echo is likely to occur; if the past signal is smaller than the present signal, pre-echo is likely to occur. Using the block energy (or block absolute value) of the audio signal as the basis for judging the echo type therefore effectively ensures the accuracy of the judgment result.
Further, the initial reconstruction windows W_L and W_R are as follows:
W_L(k) = 1 - W_R(k), k = 1, 2, …, L_W
Experiments show that when the initial reconstruction windows W_L and W_R are designed in this way, 4 pairs of reconstruction windows with different weight distributions can be obtained through decimation, so that the audio can be processed more flexibly.
Further, L_W is preset, and the values of L_D and L_p are obtained from L_W and the playback rate r. Because the relationship among the three lengths L_W, L_D and L_p is fixed, determining one of them yields the other two. The design of the reconstruction window is directly related to L_W, so adopting a fixed length L_W and letting L_D and L_p vary with r makes the subsequent windowing operation simpler and more convenient.
Drawings
FIG. 1 is a flow chart of a digital audio speed change processing method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of buffer filling according to the first embodiment of the present invention;
FIG. 3 is a schematic diagram of the window shapes of the initial reconstruction windows W_L and W_R according to the first embodiment of the present invention;
FIG. 4 is a diagram illustrating the output of a combined signal of a windowing process when the playback rate r > 1 according to the first embodiment of the present invention;
FIG. 5 is a diagram illustrating the output of a combined signal of the windowing process when the playback rate r < 1 according to the first embodiment of the present invention;
FIG. 6 is a diagram illustrating buffer shifting according to the first embodiment of the present invention;
FIG. 7 is a diagram illustrating waveform effects of a voice test according to a first embodiment of the present invention;
FIG. 8 is a diagram illustrating the waveform effect of a music test according to the first embodiment of the present invention;
FIG. 9 is a diagram illustrating a voice original spectrum of a voice test according to a first embodiment of the present invention;
FIG. 10 is a diagram illustrating a spectrum of a speech test with time compression of 0.5 times according to a first embodiment of the present invention;
FIG. 11 is a schematic diagram of a spectrum diagram of a voice test with a time spread of 2 times according to a first embodiment of the present invention;
FIG. 12 is a diagram illustrating a music original spectrum of a music test according to a first embodiment of the present invention;
FIG. 13 is a graph illustrating a time-compressed 0.5-fold spectrum of a music test according to a first embodiment of the present invention;
FIG. 14 is a schematic diagram of a spectrogram of a music test with a time spread of 2 times according to the first embodiment of the present invention;
FIG. 15 is a diagram illustrating a reconstruction window for obtaining different weight distributions by decimation according to a second embodiment of the present invention;
FIG. 16 is a schematic diagram of the reconstruction windows W_L1 and W_R1 according to the second embodiment of the present invention;
FIG. 17 is a schematic diagram of the reconstruction windows W_L2 and W_R2 according to the second embodiment of the present invention;
FIG. 18 is a schematic diagram of the reconstruction windows W_L3 and W_R3 according to the second embodiment of the present invention;
FIG. 19 is a schematic diagram of the reconstruction windows W_L4 and W_R4 according to the second embodiment of the present invention;
FIG. 20 is a flow chart of a digital audio speed change processing method according to the second embodiment of the present invention;
FIG. 21 is a diagram illustrating signal blocks corresponding to echo determination parameters according to the second embodiment of the present invention;
FIG. 22 is a schematic diagram of pre-echo prevention by replacing W_L and W_R with W_L1 and W_R1 during audio expansion according to the second embodiment of the present invention;
FIG. 23 is a schematic diagram of post-echo prevention by replacing W_L and W_R with W_L2 and W_R2 during audio expansion according to the second embodiment of the present invention;
FIG. 24 is a schematic diagram of pre-echo prevention by replacing W_L and W_R with W_L3 and W_R3 during audio expansion according to the second embodiment of the present invention;
FIG. 25 is a schematic diagram of post-echo prevention by replacing W_L and W_R with W_L4 and W_R4 during audio expansion according to the second embodiment of the present invention;
FIG. 26 is a schematic diagram of post-echo prevention by replacing W_L and W_R with W_L1 and W_R1 during audio compression according to the second embodiment of the present invention;
FIG. 27 is a schematic diagram of pre-echo prevention by replacing W_L and W_R with W_L2 and W_R2 during audio compression according to the second embodiment of the present invention;
FIG. 28 is a schematic diagram of post-echo prevention by replacing W_L and W_R with W_L3 and W_R3 during audio compression according to the second embodiment of the present invention;
FIG. 29 is a schematic diagram of pre-echo prevention by replacing W_L and W_R with W_L4 and W_R4 during audio compression according to the second embodiment of the present invention;
FIG. 30 is a schematic structural diagram of a digital audio speed change processing device according to a third embodiment of the present invention.
Detailed Description
In the following description, numerous technical details are set forth in order to provide a better understanding of the present application. However, it will be understood by those skilled in the art that the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The first embodiment of the invention relates to a digital audio speed-changing processing method, and the specific flow is shown in fig. 1.
In step 110, the audio signal data to be subjected to audio speed change processing is filled into the buffer until the filled length of the buffer reaches the data processing length L_p.
Specifically, the values of the reconstruction window length L_W, the update data length L_D and the data processing length L_p are set in advance. In the present embodiment, the playback rate r is a linear playback rate adjustment ratio: if r < 1, the sound is expanded (i.e., slowed down), and if r > 1, the sound is compressed (i.e., sped up). For example, if r is 0.5, the playback time is 2 times the original playback time.
When the sampling frequency of the digital audio is unchanged, a section of sound of length L (number of sampling points) that is processed so as to play at the playback rate r has its length changed to L/r. Conversely, if the output length (i.e., the reconstruction window length defined in the present embodiment) L_W is fixed, the input sound length (i.e., the update data length defined in the present embodiment) is L_D = r × L_W. In audio compression, only the input update data is processed, so the data processing length is L_p = L_D. In audio expansion, the output audio is longer than the input by L_W - L_D points; this additional part is provided, through overlapping, by 2(L_W - L_D) points of information before and after the input data, so the data processing length L_p must reach 2(L_W - L_D) + L_D = 2L_W - L_D. That is, for a given playback rate r, the relationship among L_W, L_D and L_p is:
L_D = r × L_W
L_p = L_D when r > 1 (compression), and L_p = 2L_W - L_D = (2 - r)L_W when r < 1 (expansion)
In this step, the audio signal data is filled sample by sample or block by block at the tail of the buffer until the filled length of the buffer reaches L_p, as shown in FIG. 2. For convenience, the buffer is expressed as an array x(k) in this embodiment.
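As a minimal sketch of the length relationship described above, the following Python function derives L_D and L_p from a preset L_W and a playback rate r. The function name and the rounding of r × L_W to a whole number of samples are assumptions made for illustration; the embodiment does not state how a non-integer product is handled.

```python
def frame_lengths(rate, window_len):
    """Return (L_D, L_p) for playback rate `rate` and fixed output length L_W."""
    if rate <= 0:
        raise ValueError("playback rate must be positive")
    update_len = round(rate * window_len)          # L_D = r * L_W (rounding assumed)
    if rate >= 1:
        process_len = update_len                   # compression: L_p = L_D
    else:
        process_len = 2 * window_len - update_len  # expansion: L_p = 2*L_W - L_D
    return update_len, process_len


print(frame_lengths(2.0, 1024))   # (2048, 2048)
print(frame_lengths(0.5, 1024))   # (512, 1536)
```

With L_W = 1024, doubling the speed consumes 2048 input samples per output frame, while halving the speed consumes 512 new samples per frame but keeps 1536 samples in the buffer so that past and future points are available for overlapping.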
Then, in step 120, the audio signal data to be processed in the buffer is windowed to obtain the output signal x_out.
Specifically, the reconstruction window functions W_L and W_R are determined in advance. W_L and W_R each have L_W points; W_L has an amplitude attenuation characteristic and W_R has an amplitude increasing characteristic. W_L and W_R satisfy the perfect reconstruction condition, i.e., the sum of their corresponding points is equal to 1, expressed as W_L(k) + W_R(k) = 1, k = 1, 2, …, L_W. In this embodiment, a verified and more practical pair W_L and W_R is used:
W_L(k) = 1 - W_R(k), k = 1, 2, …, L_W
When L_W = N, the window shapes of W_L and W_R are as shown in FIG. 3. Of course, in practical applications W_L and W_R may be designed in other forms, as long as the perfect reconstruction condition is satisfied or approximately satisfied.
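The perfect reconstruction condition can be illustrated with a short Python sketch. The raised-cosine ramp used for W_R below is only an assumed example of a rising window; it is not taken from this embodiment, whose own window formula is not reproduced here. Any rising W_R with W_L = 1 - W_R would satisfy the same condition.

```python
import math


def reconstruction_windows(window_len):
    """Return (w_left, w_right): a decaying and a rising window that sum to 1."""
    # Assumed shape: raised-cosine ramp sampled at half-sample offsets.
    w_right = [0.5 * (1.0 - math.cos(math.pi * (k + 0.5) / window_len))
               for k in range(window_len)]
    w_left = [1.0 - w for w in w_right]            # W_L(k) = 1 - W_R(k)
    return w_left, w_right


w_left, w_right = reconstruction_windows(8)
# Corresponding points add up to 1 (within floating-point tolerance).
print(all(abs(a + b - 1.0) < 1e-12 for a, b in zip(w_left, w_right)))
```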
The specific windowing processing mode in this step is as follows:
If the playback rate r > 1, i.e., sound compression (speed-up), the windowed data are:
x_L(k) = x(k) W_L(k), k = 1, 2, …, L_W
x_R(k) = x(k + L_D - L_W) W_R(k), k = 1, 2, …, L_W
The output data is output one frame of L_W points at a time:
x_out(k) = x_L(k) + x_R(k), k = 1, 2, …, L_W
As shown in FIG. 4, x_L is in fact the data to be processed, of length L_p = L_D = r × L_W, multiplied point by point by the window function W_L after left-end alignment, and x_R is the same data multiplied point by point by the window function W_R after right-end alignment. After x_L and x_R are added, the audio data length is reduced from L_p to L_W, i.e., the audio is compressed. Under the smoothing effect of the perfect reconstruction window functions W_L and W_R, the compressed L_p - L_W points are blended into the processed audio, so that the amount of information is not reduced, and the perfect reconstruction characteristic of the window functions also ensures the smoothness of the processed audio.
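The compression case can be sketched in Python as follows. The function name, the 0-based indexing and the linear windows in the usage lines are assumptions chosen so that the arithmetic is easy to check by hand.

```python
def compress_frame(buf, w_left, w_right, update_len):
    """x_out(k) = x(k) W_L(k) + x(k + L_D - L_W) W_R(k), with 0-based k."""
    window_len = len(w_left)                       # L_W
    offset = update_len - window_len               # L_D - L_W, non-negative when r >= 1
    if offset < 0 or len(buf) < update_len:
        raise ValueError("buffer must hold L_p = L_D >= L_W samples")
    return [buf[k] * w_left[k] + buf[k + offset] * w_right[k]
            for k in range(window_len)]


w_right = [0.125, 0.375, 0.625, 0.875]             # assumed linear rising window
w_left = [1.0 - w for w in w_right]                # decaying counterpart
ramp = [float(i) for i in range(8)]                # L_p = L_D = 8, L_W = 4 (r = 2)
print(compress_frame(ramp, w_left, w_right, 8))    # [0.5, 2.5, 4.5, 6.5]
```

An input ramp with a slope of 1 per sample comes out with a slope of 2 per sample, which is the expected behaviour at r = 2.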
If the playback rate r < 1, i.e., sound expansion (slow-down), the windowed data are:
x_L(k) = x(k + L_W - L_D) W_L(k), k = 1, 2, …, L_W
x_R(k) = x(k) W_R(k), k = 1, 2, …, L_W
The output data is still output one frame of L_W points at a time:
x_out(k) = x_L(k) + x_R(k), k = 1, 2, …, L_W
As shown in FIG. 5, x_L is in fact the data to be processed, of length L_p = (2 - r)L_W (r < 1), multiplied point by point by the window function W_L after right-end alignment, and x_R is the same data multiplied point by point by the window function W_R after left-end alignment. After x_L and x_R are added, the audio data length is increased from L_D to L_W, i.e., the audio is lengthened. The added information comes from the first (1 - r)L_W points and the last (1 - r)L_W points of the signal to be processed of length L_p = (2 - r)L_W, which are respectively past information and future information relative to the middle L_D points. Through windowing and superposition, these first and last (1 - r)L_W points are introduced into the middle portion to form the L_W output points. Under the smoothing effect of the perfect reconstruction window functions W_L and W_R, the introduced audio information is blended into the processed audio, and the perfect reconstruction characteristic of the window functions also ensures the smoothness of the processed audio.
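The expansion case can be sketched in the same way. The function name, the 0-based indexing and the linear windows are again assumptions for illustration; the sum computed is x_out(k) = x(k) W_R(k) + x(k + L_W - L_D) W_L(k).

```python
def expand_frame(buf, w_left, w_right, update_len):
    """Overlap a left-aligned W_R copy and a right-aligned W_L copy of buf."""
    window_len = len(w_left)                       # L_W
    offset = window_len - update_len               # L_W - L_D, non-negative when r <= 1
    if offset < 0 or len(buf) < 2 * window_len - update_len:
        raise ValueError("buffer must hold L_p = 2*L_W - L_D samples")
    return [buf[k] * w_right[k] + buf[k + offset] * w_left[k]
            for k in range(window_len)]


w_right = [0.125, 0.375, 0.625, 0.875]             # assumed linear rising window
w_left = [1.0 - w for w in w_right]                # decaying counterpart
ramp = [float(i) for i in range(6)]                # L_W = 4, L_D = 2, L_p = 6 (r = 0.5)
print(expand_frame(ramp, w_left, w_right, 2))      # [1.75, 2.25, 2.75, 3.25]
```

Here the input ramp with a slope of 1 per sample comes out with a slope of 0.5 per sample, centred on the middle L_D points of the buffer.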
It is worth mentioning that the relationship among the three lengths L_W, L_D and L_p is fixed, so determining one of them yields the other two. The design of the reconstruction window is directly related to L_W, so in this embodiment a fixed length L_W is used and L_D and L_p vary with r, which makes the subsequent windowing operation simpler. In addition, it is understood that in practical applications L_D (or L_p) may be determined first, and the values of L_W and L_p (or L_D) may then be obtained from L_D (or L_p) and the playback rate r.
Next, in step 130, the L_D signals for which windowing has been completed are moved out of the buffer, and audio signal data to be processed continues to be filled at the tail of the buffer until the filled length of the buffer reaches the data processing length L_p, as shown in FIG. 6. That is, all data in x(k) starting from position L_D + 1 is shifted as a whole to the beginning:
x(k) = x(k + L_D), k = 1, 2, …
After step 130 is completed, the process returns to step 120, until the audio speed change processing of all the audio signal data is completed.
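Steps 110 to 130 can be combined into one loop, as in the following Python sketch. The function name, the rounding of L_D, the linear windows and the decision to drop a trailing partial frame are all assumptions of this sketch and are not specified by the embodiment.

```python
def time_scale(samples, rate, w_left, w_right):
    """Fill, window, add and shift until the input is exhausted."""
    window_len = len(w_left)                     # L_W, fixed output frame length
    update_len = round(rate * window_len)        # L_D (rounding is an assumption)
    if update_len < 1:
        raise ValueError("rate too small for this window length")
    speed_up = rate >= 1
    process_len = update_len if speed_up else 2 * window_len - update_len  # L_p
    offset = abs(update_len - window_len)        # delay of the second windowed copy
    buf, out, pos = [], [], 0
    while True:
        need = process_len - len(buf)            # step 110: fill the buffer tail
        buf.extend(samples[pos:pos + need])
        pos += need
        if len(buf) < process_len:
            break                                # input exhausted; partial frame dropped
        for k in range(window_len):              # step 120: window and add
            if speed_up:
                out.append(buf[k] * w_left[k] + buf[k + offset] * w_right[k])
            else:
                out.append(buf[k] * w_right[k] + buf[k + offset] * w_left[k])
        del buf[:update_len]                     # step 130: shift out L_D samples
    return out


w_right = [0.125, 0.375, 0.625, 0.875]           # assumed linear rising window
w_left = [1.0 - w for w in w_right]
tone = [1.0] * 64
print(len(time_scale(tone, 2.0, w_left, w_right)))   # 32
print(len(time_scale(tone, 0.5, w_left, w_right)))   # 120
```

Each pass performs a fixed number of multiplications and additions per output sample, which matches the linear complexity attributed to the method. At r = 0.5 the output is slightly shorter than 2 × 64 samples because the first frame needs L_p samples before any output is produced and the incomplete last frame is discarded in this sketch.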
In the present embodiment, a pair of perfect reconstruction window functions W_L and W_R, having amplitude attenuation and amplitude increasing characteristics respectively, is applied to the original digital audio at different time delays to obtain a pair of windowed data, and the audio waveform is reconstructed from the windowed data to obtain the speed-changed audio. In audio compression (speed-up), the time-compressed audio information is smoothly spread over the shortened processed audio data through the overlap of a pair of windowed audio signals, whereas in audio expansion (slow-down), longer audio data is obtained by smoothly overlapping introduced past and future (temporally newer than the reference data) audio information. Because neither the pitch period nor the correlation of the audio needs to be detected and no time-frequency transformation is required, the audio processing flow uses only shifting, windowing and overlapping. The theoretical computational complexity is only O(N), which is lower than the O(N^2) of algorithms using waveform similarity detection and the O(N log_2 N) of algorithms based on time-frequency transformation and spectrum processing, so the amount of calculation is extremely low. In addition, the playing time of the content is shortened or lengthened by compressing the waveform of the audio signal or by introducing adjacent waveform into it, and the audio waveform itself is not altered, so that more of the original sound quality is maintained. Actual processing shows that this embodiment offers a wide speed change range, accurate time control and high processed sound quality, and can process general audio (including voice and music).
For example, in a test, speed change processing was performed on a segment of English speech and on a segment of the violin concerto "Liang Zhu". The playback rates r were 2 and 0.5, corresponding to speeding up to 2 times and slowing down to 0.5 times the original speed. At such rates, conventional algorithms seriously degrade the audio quality, whereas audio processed by the method of this embodiment retains a high degree of fidelity, a considerable relative improvement. As shown in FIG. 7 and FIG. 8, the waveform envelope of the audio keeps the same shape, and the spectrum of the audio also keeps the same voiceprint (as shown in FIGS. 9 to 11 and FIGS. 12 to 14). Actual listening shows that, apart from the playback speed, the tone of the audio is unchanged before and after processing.
The second embodiment of the present invention relates to a digital audio speed change processing method. The second embodiment is an improvement on the first embodiment, and the main improvement is as follows: in the first embodiment, the W_L and W_R used when windowing the audio signal data to be processed in the buffer are an initial reconstruction window function whose corresponding points add up to 1, and W_L and W_R are fixed. In the present embodiment, when windowing the audio signal data to be processed in the buffer, different reconstruction window functions are first selected according to the echo type of the audio signal data, and windowing is then performed using the selected reconstruction window functions, as shown in FIG. 20.
Specifically, in audio processing, reconstruction windows with different weight distributions may be used depending on the waveform. The method of obtaining the reconstruction windows is not unique, but they must satisfy or approximately satisfy the perfect reconstruction condition, i.e., the sum of the corresponding points of W_L and W_R is approximately 1:
W_L(k) + W_R(k) ≈ 1, k = 1, 2, …, L_W
The reconstruction windows with different weight distributions can be generated and stored separately, or only one initial reconstruction window can be generated and stored, with the reconstruction windows of other weight distributions obtained from the initial reconstruction window through transformation. The transformation method is as follows: the original reconstruction window is decimated by an integer ratio, e.g., keeping 1 point out of every 2, 3 or 4 points, to obtain the gradually varying part of the transformed window, and the constant parts at the two ends are filled with 0 or 1 respectively so that the window is extended to the original length, as shown in FIG. 15.
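The decimation transform described above can be sketched in Python as follows. The function name, the decimation phase (starting at the first point) and the `lead_zeros` parameter that positions the shortened ramp are assumptions; the decimation ratios and positions that actually yield the four window pairs of this embodiment are defined by its own formulas and figures, which are not reproduced here.

```python
def decimated_windows(w_right, factor, lead_zeros):
    """Shorten the rising ramp by `factor`, then pad with 0s before and 1s after."""
    ramp = w_right[::factor]                       # keep 1 point out of every `factor`
    pad = len(w_right) - len(ramp)                 # constant points needed to refill
    if not 0 <= lead_zeros <= pad:
        raise ValueError("lead_zeros must lie between 0 and the padding length")
    new_right = [0.0] * lead_zeros + ramp + [1.0] * (pad - lead_zeros)
    new_left = [1.0 - w for w in new_right]        # keeps W_L + W_R = 1
    return new_left, new_right


base_right = [(k + 0.5) / 8 for k in range(8)]     # assumed linear initial window
_, early = decimated_windows(base_right, 2, 0)     # transition band moved left
_, late = decimated_windows(base_right, 2, 4)      # transition band moved right
print(early)   # [0.0625, 0.3125, 0.5625, 0.8125, 1.0, 1.0, 1.0, 1.0]
print(late)    # [0.0, 0.0, 0.0, 0.0, 0.0625, 0.3125, 0.5625, 0.8125]
```

Because W_L is always derived as 1 - W_R, every transformed pair still satisfies the perfect reconstruction condition exactly.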
In the present embodiment, W_L and W_R of the first embodiment are used as the initial reconstruction window functions, and 4 pairs of reconstruction window functions with different weight distributions can be obtained by decimating the initial reconstruction window, so that the audio can be processed more flexibly. W_L1 and W_R1 (window shapes shown in FIG. 16) are:
W_L2 and W_R2 (window shapes shown in FIG. 17) are:
W_L3 and W_R3 (window shapes shown in FIG. 18) are:
W_L4 and W_R4 (window shapes shown in FIG. 19) are:
In this embodiment, the echo type of the audio signal data is obtained by comparing the block energy or the block absolute value of the audio signal data against a preset threshold.
Specifically, the determination parameters engLa, engLb, engRa and engRb are designed, and the signal blocks (hatched regions) corresponding to them are shown in FIG. 21. It can be seen that engLa, engLb, engRa and engRb are arranged from old to new in time. When energy calculation is adopted, the determination parameters engLa, engLb, engRa and engRb are obtained by the following formulas:
When absolute value calculation is adopted, the determination parameters engLa, engLb, engRa and engRb are obtained by the following formulas:
An echo determination threshold echoRate (which must be greater than 1 and is generally greater than 2) is set. The echo type is obtained from the determination parameters engLa, engLb, engRa and engRb and the echo determination threshold, and the reconstruction window to be used is selected accordingly, which can be realized by the following code:
echoControl = 0;  // initial value: keep W_L and W_R unchanged
if (outLen > frmLen)  // sound expansion judgment process
{
    if (engRa > (engLa + engLb) * echoRate)
        echoControl = 3;  // replace W_L and W_R with W_L3 and W_R3 to prevent pre-echo
    else if (engRb > (engLa + engLb) * echoRate)
        echoControl = 1;  // replace W_L and W_R with W_L1 and W_R1 to prevent pre-echo
    if (engLb > (engRa + engRb) * echoRate)
        echoControl = 4;  // replace W_L and W_R with W_L4 and W_R4 to prevent post-echo
    else if (engLa > (engRa + engRb) * echoRate)
        echoControl = 2;  // replace W_L and W_R with W_L2 and W_R2 to prevent post-echo
}
else  // sound compression judgment process
{
    if (engRb > (engLa + engLb))
    {
        if (engRa > (engLa + engLb))
            echoControl = 4;  // replace W_L and W_R with W_L4 and W_R4 to prevent pre-echo
        else
            echoControl = 2;  // replace W_L and W_R with W_L2 and W_R2 to prevent pre-echo
    }
    if (engLa > (engRa + engRb))
    {
        if (engLb > (engRa + engRb))
            echoControl = 3;  // replace W_L and W_R with W_L3 and W_R3 to prevent post-echo
        else
            echoControl = 1;  // replace W_L and W_R with W_L1 and W_R1 to prevent post-echo
    }
}
It has been experimentally confirmed that the speech intelligibility can be further improved by selecting an adaptive window function according to the echo type, as shown in fig. 22-29.
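To make the selection logic easier to experiment with, the following Python sketch mirrors the branch structure of the code above. The function name, the argument order and the threshold value 2.5 used in the usage lines are assumptions for illustration; the returned number is the index of the window pair, with 0 meaning that the initial W_L and W_R are kept.

```python
def select_echo_control(eng_la, eng_lb, eng_ra, eng_rb, expanding, echo_rate):
    """Return 0 to keep W_L/W_R, or 1..4 to switch to the pair W_Ln/W_Rn."""
    echo_control = 0
    left = eng_la + eng_lb                  # measure of the two older blocks
    right = eng_ra + eng_rb                 # measure of the two newer blocks
    if expanding:                           # sound expansion judgment
        if eng_ra > left * echo_rate:
            echo_control = 3                # pre-echo prevention
        elif eng_rb > left * echo_rate:
            echo_control = 1                # pre-echo prevention
        if eng_lb > right * echo_rate:
            echo_control = 4                # post-echo prevention
        elif eng_la > right * echo_rate:
            echo_control = 2                # post-echo prevention
    else:                                   # sound compression judgment
        if eng_rb > left:
            echo_control = 4 if eng_ra > left else 2   # pre-echo prevention
        if eng_la > right:
            echo_control = 3 if eng_lb > right else 1  # post-echo prevention
    return echo_control


print(select_echo_control(1, 1, 10, 1, True, 2.5))    # 3: loud newer block, expansion
print(select_echo_control(10, 1, 1, 1, True, 2.5))    # 2: loud oldest block, expansion
print(select_echo_control(1, 1, 1, 10, False, 2.5))   # 2: loud newest block, compression
print(select_echo_control(1, 1, 1, 1, True, 2.5))     # 0: even energy, keep W_L and W_R
```

As in the code above, the compression branch compares the block measures directly, without multiplying by the threshold.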
As will be appreciated by those skilled in the art, audio compression (speed-up) smoothly spreads the time-compressed audio information over the shortened processed audio data, whereas audio expansion (slow-down) obtains longer audio data by smoothly overlapping introduced past and future (temporally newer than the reference data) audio information. This overlapping process introduces or spreads signals of larger energy into portions that originally had smaller energy, resulting in both post-echo (an echo occurring after the signal) and pre-echo (an echo occurring before the signal). Therefore, by judging the echo type and switching the window function accordingly, the degree of dispersion of the signal energy can be reduced, and pre-echo and post-echo can be alleviated, when the audio signal has an extremely uneven energy distribution.
In the present embodiment, the echo determination is based on the block energy (or block absolute value) of the audio signal. If the past signal is larger than the present signal, post-echo is likely to occur. In this case, in audio compression the transition band of the window function needs to be shifted left, i.e., window functions such as W_L1, W_R1 or W_L3, W_R3 are used to suppress the larger (past) signal at the left end and reduce its dispersion (examples correspond to FIG. 26 and FIG. 28, respectively), while in audio expansion the transition band of the window function needs to be shifted right, i.e., window functions such as W_L2, W_R2 or W_L4, W_R4 are used to suppress the larger (past) signal at the left end and reduce its dispersion (examples correspond to FIG. 23 and FIG. 25, respectively). If the past signal is smaller than the present signal, pre-echo is likely to occur. In this case, in audio compression the transition band of the window function needs to be shifted right, i.e., window functions such as W_L2, W_R2 or W_L4, W_R4 are used to suppress the larger (future) signal at the right end and reduce its dispersion (examples correspond to FIG. 27 and FIG. 29, respectively), while in audio expansion the transition band of the window function needs to be shifted left, i.e., window functions such as W_L1, W_R1 or W_L3, W_R3 are used to suppress the larger (future) signal at the right end and reduce its dispersion (examples correspond to FIG. 22 and FIG. 24, respectively). The transition bands of W_L1, W_R1 and W_L3, W_R3 are shifted left to different degrees, and those of W_L2, W_R2 and W_L4, W_R4 are shifted right to different degrees; the degree to use is discriminated by the subdivided block energies (or block absolute values).
It is easy to see that, in the present embodiment, the audio quality after the speed change is further ensured by selecting an appropriate reconstruction window function according to the echo type. Moreover, if the past signal is larger than the present signal, post-echo is likely to occur, and if the past signal is smaller than the present signal, pre-echo is likely to occur. Using the block energy (or block absolute value) of the audio signal as the basis for judging the echo type therefore effectively ensures the accuracy of the judgment result.
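The two block measures mentioned above can be sketched in Python as follows. The helper names and, in particular, the split of the buffer into four equal consecutive blocks are assumptions made only for illustration; the actual block boundaries of engLa, engLb, engRa and engRb are those defined in FIG. 21.

```python
def block_energy(block):
    """Sum of squared samples in a block."""
    return sum(s * s for s in block)


def block_abs_sum(block):
    """Sum of absolute sample values in a block (cheaper than energy)."""
    return sum(abs(s) for s in block)


def judge_parameters(buf, measure=block_energy):
    """Return four block measures, oldest block first (equal split assumed)."""
    quarter = len(buf) // 4
    return tuple(measure(buf[i * quarter:(i + 1) * quarter]) for i in range(4))


buf = [1, -1, 2, -2, 0, 0, 3, 3]
print(judge_parameters(buf))                  # (2, 8, 0, 18)
print(judge_parameters(buf, block_abs_sum))   # (2, 4, 0, 6)
```

Both measures rank the blocks in the same order here; the absolute-value form avoids the multiplications, which may matter on low-cost processors.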
The method embodiments of the present invention may be implemented in software, hardware, firmware, etc. Whether the present invention is implemented as software, hardware, or firmware, the instruction code may be stored in any type of computer-accessible memory (e.g., permanent or modifiable, volatile or non-volatile, solid or non-solid, fixed or removable media, etc.). Also, the Memory may be, for example, Programmable Array Logic (PAL), Random Access Memory (RAM), Programmable Read Only Memory (PROM), Read-Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic disk, an optical disk, a Digital Versatile Disk (DVD), or the like.
A third embodiment of the present invention relates to a digital audio shift processing apparatus. As shown in fig. 30, the digital audio shift processing device includes:
A filling module, configured to fill the audio signal data to be subjected to audio speed change processing into the buffer until the filled length of the buffer reaches the data processing length L_p.
A windowing processing module, configured to window the audio signal data to be processed in the buffer to obtain an output signal x_out. When the audio speed change processing is speed-up processing, the windowing processing module aligns the left end of the audio signal data of length L_p in the buffer with the left end of the window function W_L of length L_W and multiplies them point by point to obtain x_L, aligns the right end of the audio signal data of length L_p in the buffer with the right end of the window function W_R of length L_W and multiplies them point by point to obtain x_R, and adds x_L and x_R to obtain L_W output signals x_out. When the audio speed change processing is slow-down processing, the windowing processing module aligns the right end of the audio signal data of length L_p in the buffer with the right end of the window function W_L of length L_W and multiplies them point by point to obtain x_L, aligns the left end of the audio signal data of length L_p in the buffer with the left end of the window function W_R of length L_W and multiplies them point by point to obtain x_R, and adds x_L and x_R to obtain L_W output signals x_out.
A shifting module, configured to move the L_D signals that have completed windowing out of the buffer and to instruct the filling module to continue filling audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length L_p.
When the filled length of the buffer reaches the data processing length L_p, the processing of the windowing processing module is triggered. When the windowing processing module has obtained L_W output signals x_out, the processing of the shifting module is triggered. This continues until the audio speed change processing of all the audio signal data is completed.
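The triggering relationship among the three modules can be sketched as a simple loop. The following Python sketch is illustrative only: the function name `run_speed_change`, the callback `window_fn`, and the decision to discard a final, partially filled buffer are assumptions that are not taken from the embodiment, and L_p and L_D are passed in as parameters rather than derived.

```python
from typing import Callable, Iterable, List


def run_speed_change(
    samples: Iterable[float],
    l_p: int,
    l_d: int,
    window_fn: Callable[[List[float]], List[float]],
) -> List[float]:
    """Fill -> window -> shift loop driven by the buffer fill level."""
    # Guard for this sketch only: the shift cannot remove more than the buffer holds.
    if not (0 < l_d <= l_p):
        raise ValueError("expected 0 < l_d <= l_p")
    buffer: List[float] = []
    output: List[float] = []
    for sample in samples:
        buffer.append(sample)                  # filling module
        if len(buffer) == l_p:                 # filled length reached L_p
            output.extend(window_fn(buffer))   # windowing module produces x_out
            del buffer[:l_d]                   # shifting module moves L_D samples out
    # Assumption: a final buffer that never reaches L_p is discarded.
    return output
```

Here `window_fn` stands in for the windowing processing module and is expected to return L_W output samples for each full buffer.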
In this apparatus, W_L is a window function having an amplitude attenuation characteristic, and W_R is a window function having an amplitude increasing characteristic. W_L and W_R each have L_W data points, and the sum of their corresponding points is equal to 1 or approximately equal to 1. L_W is a preset value, and the values of L_D and L_p are obtained from L_W and the playback rate r.
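The windowing performed on one full buffer can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name `window_block` and its argument names are hypothetical, and the buffer and both windows are supplied by the caller.

```python
from typing import List, Sequence


def window_block(
    buf: Sequence[float],
    w_l: Sequence[float],
    w_r: Sequence[float],
    speed_up: bool,
) -> List[float]:
    """Return L_W output samples x_out for one buffer of length L_p."""
    l_p, l_w = len(buf), len(w_l)
    if len(w_r) != l_w or l_w > l_p:
        raise ValueError("windows must share length L_W, with L_W <= L_p")
    head = buf[:l_w]           # segment aligned with the left end of the buffer
    tail = buf[l_p - l_w:]     # segment aligned with the right end of the buffer
    if speed_up:
        seg_l, seg_r = head, tail   # W_L left-aligned, W_R right-aligned
    else:
        seg_l, seg_r = tail, head   # W_L right-aligned, W_R left-aligned
    # x_out = x_L + x_R, computed point by point
    return [a * wl + b * wr for a, wl, b, wr in zip(seg_l, w_l, seg_r, w_r)]
```

Because the corresponding points of W_L and W_R sum to 1, a constant input yields the same constant at the output in both modes, which is a convenient sanity check.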
In the present embodiment, the window functions W_L and W_R used for the windowing processing are initial reconstruction window functions whose corresponding points sum to 1. The initial reconstruction windows W_L and W_R are as follows:
W_L(k) = 1 - W_R(k), k = 1, 2, …, L_W
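As an illustration of the complementary relationship, the sketch below builds one possible window pair. The raised-cosine shape chosen for W_R and the name `make_initial_windows` are assumptions made only for this example; the relation W_L(k) = 1 - W_R(k) is the only part taken from the text above.

```python
import math
from typing import List, Tuple


def make_initial_windows(l_w: int) -> Tuple[List[float], List[float]]:
    """Return (W_L, W_R), each with L_W points, summing to 1 point by point."""
    if l_w < 1:
        raise ValueError("L_W must be positive")
    # Assumed shape: a raised-cosine ramp rising from near 0 to near 1.
    w_r = [0.5 - 0.5 * math.cos(math.pi * (k - 0.5) / l_w)
           for k in range(1, l_w + 1)]
    # Complementary window, as in W_L(k) = 1 - W_R(k).
    w_l = [1.0 - v for v in w_r]
    return w_l, w_r
```

Any other monotonically increasing W_R could be substituted, since W_L is always derived from it.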
It is to be understood that the first embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment remain valid in the present embodiment and are not repeated here. Accordingly, the related technical details mentioned in the present embodiment can also be applied to the first embodiment.
A fourth embodiment of the present invention relates to a digital audio speed change processing apparatus. The fourth embodiment is an improvement on the third embodiment, and the main improvement is as follows. In the third embodiment, the window functions W_L and W_R used for the windowing processing are initial reconstruction window functions whose corresponding points sum to 1, and W_L and W_R are fixed. In the present embodiment, the window functions W_L and W_R used for the windowing processing are reconstruction window functions with different weight distributions, selected according to the echo type of the audio signal data. That is, the digital audio speed change processing apparatus of the present embodiment further includes a window function selection module, configured to obtain the echo type of the audio signal data by comparing a block energy or a block absolute value of the audio signal data with a preset threshold, and to output the obtained echo type to the windowing processing module, which uses it to select among the reconstruction window functions with different weight distributions.
The reconstruction window functions with different weight distributions may each be generated independently, or may be obtained by transforming the initial reconstruction window. In the latter case, the transformation is as follows:
the initial reconstruction window is decimated by an integer factor to obtain the gradually changing part of the transformed window, and the constant parts at the two ends are filled with 0 or 1, respectively, until the original length of the initial reconstruction window is reached.
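The transformation described above can be sketched for the rising window W_R as follows. The function name `reshape_window` and the parameter `lead_zeros`, which positions the transition inside the window, are hypothetical; the text does not state how the padding is split between the two ends.

```python
from typing import List, Sequence


def reshape_window(w_r: Sequence[float], factor: int, lead_zeros: int) -> List[float]:
    """Decimate a rising window by an integer factor and pad to the original length."""
    if factor < 1 or lead_zeros < 0:
        raise ValueError("factor must be >= 1 and lead_zeros >= 0")
    l_w = len(w_r)
    ramp = list(w_r[factor - 1::factor])        # keep every factor-th point
    trail_ones = l_w - lead_zeros - len(ramp)   # remaining length is filled with 1
    if trail_ones < 0:
        raise ValueError("lead_zeros leaves no room for the decimated ramp")
    # Constant part of 0 before the ramp, constant part of 1 after it.
    return [0.0] * lead_zeros + ramp + [1.0] * trail_ones
```

The matching decaying window can then be taken as 1 minus each point of the result, so that the pair still sums to 1 at every point.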
It is to be understood that the second embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the second embodiment. The related technical details mentioned in the second embodiment remain valid in the present embodiment and are not repeated here. Accordingly, the related technical details mentioned in the present embodiment can also be applied to the second embodiment.
It should be noted that each unit mentioned in the apparatus embodiments of the present invention is a logical unit. Physically, one logical unit may be one physical unit, may be a part of one physical unit, or may be implemented by a combination of multiple physical units. The physical implementation of these logical units is not in itself the most important point; rather, the combination of the functions implemented by these logical units is the key to solving the technical problem addressed by the present invention. Furthermore, in order to highlight the innovative part of the present invention, the above apparatus embodiments do not introduce units that are less closely related to solving the technical problem addressed by the present invention, which does not mean that no other units exist in the above apparatus embodiments.
While the invention has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.