
CN102117613B - Method and equipment for processing digital audio in variable speed - Google Patents

Method and equipment for processing digital audio in variable speed

Info

Publication number
CN102117613B
Authority
CN
China
Prior art keywords
length
audio
processing
audio signal
signal data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 200910202164
Other languages
Chinese (zh)
Other versions
CN102117613A (en
Inventor
吴晟
林福辉
张本好
董树景
李昙
徐晶明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Application filed by Spreadtrum Communications Shanghai Co Ltd
Priority to CN 200910202164
Publication of CN102117613A
Application granted
Publication of CN102117613B

Landscapes

  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to audio signal processing technology and discloses a method and equipment for variable-speed processing of digital audio. A pair of perfect-reconstruction window functions $W_L$ and $W_R$, with amplitude-attenuating and amplitude-increasing characteristics respectively, is applied to the original digital audio at different delays to obtain a pair of windowed data segments; the audio waveform is then reconstructed from the windowed data to obtain the variable-speed audio. Because neither detection of the pitch period or of audio correlation nor time-frequency transformation is required, the amount of calculation is extremely small. In addition, the playing time of the content is lengthened or shortened by compressing or introducing portions of the audio signal's own waveform, without otherwise altering the waveform, so the original sound quality is better preserved.

Description

Digital audio variable-speed processing method and equipment thereof
Technical Field
The present invention relates to audio signal processing technology, and in particular to variable-speed audio processing.
Background
Playback speed adjustment of recorded digital audio is in wide demand in multimedia applications. For example, slowing down speech playback helps people with hearing or comprehension difficulties, as well as foreign-language beginners, understand what they hear; speeding it up helps listeners save time when getting information from a recording. In addition, adjusting the playback speed of music changes its rhythm and can produce distinctive effects; and for the soundtrack of a video, adjusting the audio playback speed lets the audience hear synchronized, undistorted sound while the video is played faster or slower.
However, directly changing the playback speed without any processing shifts the pitch and timbre, because every frequency component in the sound is scaled linearly. For example, when playback is slowed down the sound becomes deep, and a voice sounds like a nasal murmur uttered in deep sleep; when playback is sped up the sound becomes sharp, and a voice sounds like a child speaking quickly. Therefore, to change only the speed while leaving the pitch unchanged and introducing no obvious distortion, the digital audio must be processed. At present, variable-speed audio processing mostly uses algorithms based on the overlap-add technique, or algorithms based on time-frequency transformation and spectral processing. See also U.S. Patent No. 5952596 for variable-speed audio processing techniques.
However, the inventors of the present invention found that algorithms based on the overlap-add technique must determine the delay of the overlap window by detecting waveform similarity. Such methods can only process speech with an obvious pitch period, and they use cross-correlation detection in the time domain or the frequency domain to find the delay of similar waveforms, which is then used as the delay of the overlap window; the computational cost is therefore high and the resulting sound quality is mediocre. Algorithms based on time-frequency transformation and spectral processing can handle general audio, including speech and music. They resample the original digital audio to change its sampling rate, transform the resampled audio to the frequency domain to obtain its spectrum, apply a frequency shift to the spectrum, and transform the processed spectrum back to the time domain. These algorithms are generally implemented with a perfect-reconstruction short-time Fourier transform, which must process a relatively long segment of audio at a time to achieve high sound quality. Although this approach gives better sound quality, its computation and storage requirements are large, so it can hardly be implemented on handheld and mobile devices, which are constrained in computing power and power consumption.
Disclosure of Invention
The invention aims to provide a digital audio speed change processing method and a device thereof, which can realize speed change processing of general digital audio with lower calculation amount and obtain higher processing sound quality.
In order to solve the above technical problem, an embodiment of the present invention provides a digital audio variable-speed processing method, including the following steps:
A. Filling audio signal data to be subjected to variable-speed processing into a buffer until the filled length of the buffer reaches the data processing length $L_p$;
B. Windowing the audio signal data to be processed in the buffer in the following manner to obtain an output signal $x_{out}$:
If the variable-speed processing is speed-up processing, the audio signal data of length $L_p$ in the buffer is left-aligned with the window function $W_L$ of length $L_W$ and multiplied point by point by $W_L$ to obtain $x_L$; the same data is right-aligned with the window function $W_R$ of length $L_W$ and multiplied point by point by $W_R$ to obtain $x_R$; $x_L$ and $x_R$ are added to obtain $L_W$ samples of the output signal $x_{out}$;
If the variable-speed processing is slow-down processing, the audio signal data of length $L_p$ in the buffer is right-aligned with the window function $W_L$ of length $L_W$ and multiplied point by point by $W_L$ to obtain $x_L$; the same data is left-aligned with the window function $W_R$ of length $L_W$ and multiplied point by point by $W_R$ to obtain $x_R$; $x_L$ and $x_R$ are added to obtain $L_W$ samples of the output signal $x_{out}$;
C. Moving the $L_D$ samples that have completed windowing out of the buffer, and continuing to fill audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length $L_p$;
Repeating step B and step C until the variable-speed processing of all the audio signal data is completed;
wherein $W_L$ is a window function with an amplitude-attenuating characteristic, $W_R$ is a window function with an amplitude-increasing characteristic, $W_L$ and $W_R$ each have $L_W$ points, and the sum of their corresponding points is equal to 1 or approximately 1.
An embodiment of the present invention further provides a digital audio variable-speed processing device, including:
a filling module for filling audio signal data to be subjected to variable-speed processing into a buffer until the filled length of the buffer reaches the data processing length $L_p$;
a windowing module for windowing the audio signal data to be processed in the buffer to obtain an output signal $x_{out}$. When the variable-speed processing is speed-up processing, the windowing module left-aligns the audio signal data of length $L_p$ in the buffer with the window function $W_L$ of length $L_W$ and multiplies it point by point by $W_L$ to obtain $x_L$, right-aligns the same data with the window function $W_R$ of length $L_W$ and multiplies it point by point by $W_R$ to obtain $x_R$, and adds $x_L$ and $x_R$ to obtain $L_W$ samples of the output signal $x_{out}$. When the variable-speed processing is slow-down processing, the windowing module right-aligns the audio signal data of length $L_p$ in the buffer with the window function $W_L$ of length $L_W$ and multiplies it point by point by $W_L$ to obtain $x_L$, left-aligns the same data with the window function $W_R$ of length $L_W$ and multiplies it point by point by $W_R$ to obtain $x_R$, and adds $x_L$ and $x_R$ to obtain $L_W$ samples of the output signal $x_{out}$;
a shifting module for moving the $L_D$ samples that have completed windowing out of the buffer and instructing the filling module to continue filling audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length $L_p$.
When the filled length of the buffer reaches the data processing length $L_p$, the windowing module is triggered; when the windowing module has obtained $L_W$ samples of the output signal $x_{out}$, the shifting module is triggered; this continues until the variable-speed processing of all the audio signal data is completed;
wherein $W_L$ is a window function with an amplitude-attenuating characteristic, $W_R$ is a window function with an amplitude-increasing characteristic, $W_L$ and $W_R$ each have $L_W$ points, and the sum of their corresponding points is equal to 1 or approximately 1.
Compared with the prior art, the main differences and effects of the embodiments of the invention are as follows:
A pair of perfect-reconstruction window functions $W_L$ and $W_R$, with amplitude-attenuating and amplitude-increasing characteristics respectively, is applied to the original digital audio at different delays to obtain a pair of windowed data segments, and the audio waveform is reconstructed from the windowed data to obtain the variable-speed audio. The amount of calculation is extremely low, because neither the pitch period nor the correlation of the audio needs to be detected and no time-frequency transformation is needed. In addition, the playing time of the content is lengthened or shortened by compressing or introducing portions of the audio signal's own waveform, without otherwise changing the waveform, so more of the original sound quality is preserved.
Further, when the audio signal data to be processed in the buffer is windowed, $W_L$ and $W_R$ are either the initial reconstruction window functions, whose corresponding points sum to 1, or reconstruction window functions with different weight distributions selected according to the echo type of the audio signal data. The reconstruction window functions with different weight distributions are either generated independently or obtained by transforming the initial reconstruction window. Audio compression (speed-up) smoothly spreads the time-compressed audio information over the shortened processed audio data, whereas audio extension (slow-down) obtains longer audio data by smoothly overlapping in past and future audio information (future meaning later in time than the reference data). This overlapping introduces or spreads higher-energy signal into parts that originally had lower energy, producing post-echo (an echo after the signal occurs) and pre-echo (an echo before the signal occurs). An appropriate reconstruction window function can therefore be selected according to the echo type when windowing, to further protect the audio quality after the speed change.
Further, the echo type of the audio signal data is determined by comparing the block energy, or the block absolute value, of the audio signal data against a preset threshold. If the past signal is larger than the present signal, post-echo is likely to occur; if the past signal is smaller than the present signal, pre-echo is likely to occur. Using the block energy (or block absolute value) of the audio signal as the basis for the decision therefore makes the echo-type decision reliable.
Further, the initial reconstruction windows $W_L$ and $W_R$ are as follows:
$$W_L(k) = 1 - W_R(k), \quad k = 1, 2, \ldots, L_W$$
Experiments show that when the initial reconstruction windows $W_L$ and $W_R$ are designed in this way, 4 pairs of reconstruction windows with different weight distributions can be obtained from them by decimation, which allows the audio to be processed more flexibly.
Further, $L_W$ is preset, and the values of $L_D$ and $L_p$ are obtained from $L_W$ and the playback rate $r$. The relationship among the three lengths $L_W$, $L_D$ and $L_p$ is fixed, so determining one of them yields the other two. Because the design of the reconstruction window depends directly on $L_W$, fixing $L_W$ and letting $L_D$ and $L_p$ vary with $r$ makes the subsequent windowing operation simpler.
Drawings
FIG. 1 is a flowchart of a digital audio variable-speed processing method according to a first embodiment of the present invention;
FIG. 2 is a schematic diagram of buffer filling according to the first embodiment of the present invention;
FIG. 3 is a schematic diagram of the window shapes of the initial reconstruction windows $W_L$ and $W_R$ according to the first embodiment of the present invention;
FIG. 4 is a schematic diagram of the combined output signal of the windowing process when the playback rate $r > 1$ according to the first embodiment of the present invention;
FIG. 5 is a schematic diagram of the combined output signal of the windowing process when the playback rate $r < 1$ according to the first embodiment of the present invention;
FIG. 6 is a schematic diagram of buffer shifting according to the first embodiment of the present invention;
FIG. 7 is a schematic diagram of the waveform results of a speech test according to the first embodiment of the present invention;
FIG. 8 is a schematic diagram of the waveform results of a music test according to the first embodiment of the present invention;
FIG. 9 is a spectrogram of the original speech in the speech test according to the first embodiment of the present invention;
FIG. 10 is a spectrogram of the speech test after time compression to 0.5 times according to the first embodiment of the present invention;
FIG. 11 is a spectrogram of the speech test after time extension to 2 times according to the first embodiment of the present invention;
FIG. 12 is a spectrogram of the original music in the music test according to the first embodiment of the present invention;
FIG. 13 is a spectrogram of the music test after time compression to 0.5 times according to the first embodiment of the present invention;
FIG. 14 is a spectrogram of the music test after time extension to 2 times according to the first embodiment of the present invention;
FIG. 15 is a schematic diagram of obtaining reconstruction windows with different weight distributions by decimation according to a second embodiment of the present invention;
FIG. 16 is a schematic diagram of the reconstruction windows $W_{L1}$ and $W_{R1}$ according to the second embodiment of the present invention;
FIG. 17 is a schematic diagram of the reconstruction windows $W_{L2}$ and $W_{R2}$ according to the second embodiment of the present invention;
FIG. 18 is a schematic diagram of the reconstruction windows $W_{L3}$ and $W_{R3}$ according to the second embodiment of the present invention;
FIG. 19 is a schematic diagram of the reconstruction windows $W_{L4}$ and $W_{R4}$ according to the second embodiment of the present invention;
FIG. 20 is a flowchart of a digital audio variable-speed processing method according to the second embodiment of the present invention;
FIG. 21 is a schematic diagram of the signal blocks corresponding to the echo decision parameters according to the second embodiment of the present invention;
FIG. 22 is a schematic diagram of preventing pre-echo by replacing $W_L$ and $W_R$ with $W_{L1}$ and $W_{R1}$ during audio extension according to the second embodiment of the present invention;
FIG. 23 is a schematic diagram of preventing post-echo by replacing $W_L$ and $W_R$ with $W_{L2}$ and $W_{R2}$ during audio extension according to the second embodiment of the present invention;
FIG. 24 is a schematic diagram of preventing pre-echo by replacing $W_L$ and $W_R$ with $W_{L3}$ and $W_{R3}$ during audio extension according to the second embodiment of the present invention;
FIG. 25 is a schematic diagram of preventing post-echo by replacing $W_L$ and $W_R$ with $W_{L4}$ and $W_{R4}$ during audio extension according to the second embodiment of the present invention;
FIG. 26 is a schematic diagram of preventing post-echo by replacing $W_L$ and $W_R$ with $W_{L1}$ and $W_{R1}$ during audio compression according to the second embodiment of the present invention;
FIG. 27 is a schematic diagram of preventing pre-echo by replacing $W_L$ and $W_R$ with $W_{L2}$ and $W_{R2}$ during audio compression according to the second embodiment of the present invention;
FIG. 28 is a schematic diagram of preventing post-echo by replacing $W_L$ and $W_R$ with $W_{L3}$ and $W_{R3}$ during audio compression according to the second embodiment of the present invention;
FIG. 29 is a schematic diagram of preventing pre-echo by replacing $W_L$ and $W_R$ with $W_{L4}$ and $W_{R4}$ during audio compression according to the second embodiment of the present invention;
FIG. 30 is a schematic structural diagram of a digital audio variable-speed processing apparatus according to a third embodiment of the present invention.
Detailed Description
In the following description, numerous technical details are set forth in order to provide a better understanding of the present application. However, it will be understood by those skilled in the art that the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The first embodiment of the invention relates to a digital audio speed-changing processing method, and the specific flow is shown in fig. 1.
In step 110, the audio signal data to be subjected to variable-speed processing is filled into the buffer until the filled length of the buffer reaches the data processing length $L_p$.
Specifically, the values of the reconstruction window length $L_W$, the update data length $L_D$ and the data processing length $L_p$ are set in advance. In the present embodiment, the playback rate $r$ is a linear playback-rate adjustment ratio: if $r < 1$ the sound is extended (slowed down), and if $r > 1$ the sound is compressed (sped up). For example, if $r = 0.5$, the playback time is 2 times the original playback time.
With the sampling frequency of the digital audio unchanged, a segment of sound of length $L$ (in samples) that is processed to play at rate $r$ has length $L/r$. Conversely, for a fixed output length $L_W$ (the reconstruction window length defined in this embodiment), the input sound length (the update data length defined in this embodiment) is $L_D = r \times L_W$. For audio compression, only the input update data needs to be processed, so the data processing length is $L_p = L_D$. For audio extension, the output is longer than the input by $L_W - L_D$ samples; this added part is supplied, through overlapping, by $2(L_W - L_D)$ samples of information before and after the input data, so the data processing length must reach $2(L_W - L_D) + L_D = 2L_W - L_D$. That is, given the playback rate $r$, the relationship among $L_W$, $L_D$ and $L_p$ is:
$$L_D = r \times L_W$$
$$L_p = \begin{cases} L_D, & r > 1 \\ 2L_W - L_D, & r < 1 \end{cases}$$
In this step, the audio signal data is filled sample by sample or block by block at the tail of the buffer until the filled length of the buffer reaches $L_p$, as shown in FIG. 2. For convenience, the buffer is expressed as an array $x(k)$ in this embodiment.
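The length relationships above can be checked with a few lines of Python. This is a minimal sketch, not the patent's implementation: the function name `frame_lengths` is hypothetical, and rounding $r \times L_W$ to the nearest integer is an assumption, since the text does not say how non-integer lengths are handled.

```python
def frame_lengths(l_w, r):
    """Return (L_D, L_p) for a fixed output frame length L_W and playback rate r."""
    if l_w <= 0 or r <= 0:
        raise ValueError("l_w and r must be positive")
    l_d = round(r * l_w)        # update length L_D = r * L_W (rounding is an assumption)
    if r >= 1:                  # compression: only the new data is processed
        l_p = l_d
    else:                       # extension: past and future context is also needed
        l_p = 2 * l_w - l_d
    return l_d, l_p


print(frame_lengths(1024, 2.0))   # (2048, 2048)
print(frame_lengths(1024, 0.5))   # (512, 1536)
```

At $r = 1$ both branches give $L_D = L_p = L_W$, so the choice of `>=` in the sketch does not matter there.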
Then, in step 120, the audio signal data to be processed in the buffer is windowed to obtain the output signal $x_{out}$.
Specifically, the reconstruction window functions $W_L$ and $W_R$ are determined in advance. $W_L$ and $W_R$ each have $L_W$ points; $W_L$ has an amplitude-attenuating characteristic and $W_R$ has an amplitude-increasing characteristic. $W_L$ and $W_R$ satisfy the perfect reconstruction condition, that is, their corresponding points sum to 1: $W_L(k) + W_R(k) = 1$, $k = 1, 2, \ldots, L_W$. This embodiment uses a verified, practical pair $W_L$ and $W_R$:
$$W_L(k) = 1 - W_R(k), \quad k = 1, 2, \ldots, L_W$$
When $L_W = N$, the window shapes of $W_L$ and $W_R$ are as shown in FIG. 3. Of course, in practical applications $W_L$ and $W_R$ may take other forms, as long as the perfect reconstruction condition is satisfied exactly or approximately.
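For illustration, one window pair that meets the perfect reconstruction condition can be built as below. The raised-cosine rise used for $W_R$ is an assumption chosen for this sketch; the embodiment's own $W_R$ formula is not reproduced in this text, and `make_windows` is a hypothetical helper name.

```python
import math


def make_windows(l_w):
    """Build (W_L, W_R) with W_L(k) + W_R(k) = 1 for every k."""
    # Assumed shape: a raised-cosine rise from near 0 to near 1.
    w_r = [0.5 * (1.0 - math.cos(math.pi * (k + 0.5) / l_w)) for k in range(l_w)]
    # W_L is defined from W_R exactly as in the text: W_L(k) = 1 - W_R(k).
    w_l = [1.0 - v for v in w_r]
    return w_l, w_r


w_l, w_r = make_windows(8)
# Perfect reconstruction: corresponding points sum to 1 (up to rounding).
print(all(abs(a + b - 1.0) < 1e-12 for a, b in zip(w_l, w_r)))
```

Any other monotonically rising $W_R$ could be substituted, since $W_L$ is always derived as its complement.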
The specific windowing in this step is as follows:
If the playback rate $r > 1$, that is, sound compression (speed-up), the windowed data are
$$x_L(k) = x(k)\,W_L(k), \quad k = 1, 2, \ldots, L_W$$
$$x_R(k) = x(k + L_D - L_W)\,W_R(k), \quad k = 1, 2, \ldots, L_W$$
The output data is output one frame of $L_W$ points at a time:
$$x_{out}(k) = x_L(k) + x_R(k), \quad k = 1, 2, \ldots, L_W$$
As shown in FIG. 4, $x_L$ is the data to be processed, of length $L_p = L_D = r \times L_W$, left-aligned with the window function $W_L$ and multiplied by it point by point, and $x_R$ is the same data right-aligned with the window function $W_R$ and multiplied by it point by point. After $x_L$ and $x_R$ are added, the audio data length is reduced from $L_p$ to $L_W$, so the audio is compressed. Under the smoothing effect of the perfect-reconstruction window functions $W_L$ and $W_R$, the $L_p - L_W$ compressed points are blended into the processed audio, so the amount of information is not reduced, and the perfect reconstruction property of the window functions keeps the processed audio stationary.
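The speed-up windowing of one frame can be sketched as follows, using 0-based indices. The function name `speed_up_frame` and the linear cross-fade windows in the demo are assumptions made for this example; only the alignment rule comes from the text.

```python
def speed_up_frame(x, w_l, w_r):
    """Compress a buffer of L_p = L_D samples into L_W output samples."""
    l_w = len(w_l)
    off = len(x) - l_w                 # L_D - L_W: shift that right-aligns W_R
    if off < 0:
        raise ValueError("buffer must hold at least L_W samples")
    # x_L uses the left-aligned W_L, x_R uses the right-aligned W_R.
    return [x[k] * w_l[k] + x[k + off] * w_r[k] for k in range(l_w)]


l_w = 4
w_r = [(k + 0.5) / l_w for k in range(l_w)]   # assumed linear rise
w_l = [1.0 - v for v in w_r]                  # complement, so W_L + W_R = 1
frame = speed_up_frame([1.0] * 8, w_l, w_r)   # r = 2: 8 samples in, 4 out
print(len(frame))                              # 4
```

With a constant input the output stays at the same constant, which is a direct consequence of the perfect reconstruction condition.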
If the playback rate $r < 1$, that is, sound extension (slow-down), the windowed data are
$$x_L(k) = x(k + L_W - L_D)\,W_L(k), \quad k = 1, 2, \ldots, L_W$$
$$x_R(k) = x(k)\,W_R(k), \quad k = 1, 2, \ldots, L_W$$
The output data is still output one frame of $L_W$ points at a time:
$$x_{out}(k) = x_L(k) + x_R(k), \quad k = 1, 2, \ldots, L_W$$
As shown in FIG. 5, $x_L$ is the data to be processed, of length $L_p = (2 - r)L_W$ ($r < 1$), right-aligned with the window function $W_L$ and multiplied by it point by point, and $x_R$ is the same data left-aligned with the window function $W_R$ and multiplied by it point by point. After $x_L$ and $x_R$ are added, the audio data length increases from $L_D$ to $L_W$, so the audio is lengthened. The added information comes from the first $(1 - r)L_W$ points and the last $(1 - r)L_W$ points of the length-$L_p$ signal to be processed, which are past information and future information, respectively, relative to the middle $L_D$ points. Through windowing and superposition, these points are blended into the $L_W$ output points. Under the smoothing effect of the perfect-reconstruction window functions $W_L$ and $W_R$, the introduced audio information is fused into the processed audio, and the perfect reconstruction property of the window functions keeps the processed audio stationary.
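The slow-down case differs from the speed-up case only in which window is right-aligned. Below is a minimal 0-based sketch; `slow_down_frame` is a hypothetical name and the linear windows in the demo are an assumed shape.

```python
def slow_down_frame(x, w_l, w_r):
    """Stretch a buffer of L_p = 2*L_W - L_D samples into L_W output samples."""
    l_w = len(w_l)
    off = len(x) - l_w                 # L_W - L_D: shift that right-aligns W_L
    if off < 0:
        raise ValueError("buffer must hold at least L_W samples")
    # x_L uses the right-aligned W_L, x_R uses the left-aligned W_R.
    return [x[k + off] * w_l[k] + x[k] * w_r[k] for k in range(l_w)]


l_w = 4
w_r = [(k + 0.5) / l_w for k in range(l_w)]   # assumed linear rise
w_l = [1.0 - v for v in w_r]                  # complement, so W_L + W_R = 1
# r = 0.5: L_D = 2, L_p = 2*4 - 2 = 6 samples in, 4 samples out.
frame = slow_down_frame([1.0] * 6, w_l, w_r)
print(len(frame))                              # 4
```

The frame begins dominated by the later samples (weighted by $W_L$) and ends dominated by the earlier ones (weighted by $W_R$), which is what lets consecutive frames join continuously after the buffer advances by $L_D$.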
It is worth mentioning that the relationship among the three lengths $L_W$, $L_D$ and $L_p$ is fixed, so determining one of them yields the other two. The design of the reconstruction window depends directly on $L_W$, so this embodiment fixes $L_W$ and lets $L_D$ and $L_p$ vary with $r$, which makes the subsequent windowing operation simpler. It will be understood that in practical applications $L_D$ (or $L_p$) may instead be determined first, and the values of $L_W$ and $L_p$ (or $L_D$) then obtained from $L_D$ (or $L_p$) and the playback rate $r$.
Next, in step 130, the $L_D$ samples that have completed windowing are moved out of the buffer, and audio signal data to be processed continues to be filled at the tail of the buffer until the filled length of the buffer reaches the data processing length $L_p$, as shown in FIG. 6. That is, all data in $x(k)$ from index $L_D + 1$ onward is shifted as a whole to the beginning:
$$x(k) = x(k + L_D), \quad k = 1, 2, \ldots$$
After step 130 is completed, the process returns to step 120 until the variable-speed processing of all the audio signal data is completed.
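Steps 110 to 130 can be put together in one short loop. The sketch below is illustrative only: `change_speed` and `ramp_windows` are hypothetical names, the linear window shape and the rounding of $L_D$ are assumptions, and the unfinished tail of the input is simply dropped to keep the example short.

```python
def ramp_windows(l_w):
    """Assumed linear cross-fade pair: w_l decays, w_r rises, w_l + w_r = 1."""
    w_r = [(k + 0.5) / l_w for k in range(l_w)]
    return [1.0 - v for v in w_r], w_r


def change_speed(samples, r, l_w=256):
    """Return the samples replayed at rate r (r > 1 faster, r < 1 slower)."""
    l_d = round(r * l_w)                       # L_D = r * L_W
    if l_d < 1:
        raise ValueError("r * l_w must be at least 1")
    l_p = l_d if r >= 1 else 2 * l_w - l_d     # data processing length L_p
    off = l_p - l_w                            # shift of the right-aligned window
    w_l, w_r = ramp_windows(l_w)
    buf, out, pos = [], [], 0
    while True:
        # Steps 110/130: top the buffer up to L_p samples.
        need = l_p - len(buf)
        buf.extend(samples[pos:pos + need])
        pos += need
        if len(buf) < l_p:
            break                              # simplification: drop the unfinished tail
        # Step 120: window both alignments and add them.
        if r >= 1:   # speed-up: W_L left-aligned, W_R right-aligned
            frame = [buf[k] * w_l[k] + buf[k + off] * w_r[k] for k in range(l_w)]
        else:        # slow-down: W_L right-aligned, W_R left-aligned
            frame = [buf[k + off] * w_l[k] + buf[k] * w_r[k] for k in range(l_w)]
        out.extend(frame)
        # Step 130: move the L_D processed samples out of the buffer.
        del buf[:l_d]
    return out


print(len(change_speed([1.0] * 5120, 2.0)))   # 2560
print(len(change_speed([1.0] * 5120, 0.5)))   # 9728
```

The slowed-down output is slightly shorter than twice the input because the last partial buffer is discarded; a fuller implementation would pad or flush the tail.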
In the present embodiment, a pair of perfect-reconstruction window functions $W_L$ and $W_R$, with amplitude-attenuating and amplitude-increasing characteristics respectively, is applied to the original digital audio at different delays to obtain a pair of windowed data segments, and the audio waveform is reconstructed from the windowed data to obtain the variable-speed audio. In audio compression (speed-up), the time-compressed audio information is smoothly spread over the shortened processed audio data by overlapping a pair of windowed audio signals. In audio extension (slow-down), longer audio data is obtained by smoothly overlapping in past and future audio information (future meaning later in time than the reference data). Because neither the pitch period nor the correlation of the audio needs to be detected and no time-frequency transformation is needed, the processing uses only shifting, windowing and overlapping. Its theoretical computational complexity is only $O(N)$, lower than the $O(N^2)$ of algorithms using waveform similarity detection and the $O(N \log_2 N)$ of algorithms based on time-frequency transformation and spectral processing, so the amount of calculation is extremely low. In addition, the playing time of the content is lengthened or shortened by compressing or introducing portions of the audio signal's own waveform, without otherwise changing the waveform, so more of the original sound quality is preserved. Actual processing shows that this embodiment supports a wide range of speed changes with accurate time control and high sound quality, and that it can process general audio (including speech and music).
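As a rough check on the linear-complexity statement, the windowing cost can be counted directly. This is a back-of-the-envelope sketch: $N$ denotes the number of input samples and the per-frame operation count is an assumption based on one multiplication per window point and one addition per output point.

```latex
\text{frames} \approx \frac{N}{L_D} = \frac{N}{r\,L_W},
\qquad
\text{operations per frame} = \underbrace{2L_W}_{\text{multiplications}} + \underbrace{L_W}_{\text{additions}} = 3L_W
\\[4pt]
\text{total} \approx \frac{N}{r\,L_W} \cdot 3L_W = \frac{3N}{r} = O(N)
```

The buffer shift copies at most $L_p - L_D$ samples per frame, which is also proportional to $N$ in total, so it does not change the order of growth.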
For example, a segment of English speech and a segment of the violin concerto "Liang Zhu" were processed in a test. The playback rate $r$ was set to 2 and 0.5, that is, sped up to 2 times and slowed down to 0.5 times the original speed. With conventional algorithms the audio quality degrades severely at these rates, whereas after processing with the method of this embodiment the audio quality remains high, a considerable relative improvement. As shown in FIG. 7 and FIG. 8, the waveform envelope of the audio keeps the same shape, and the spectrum of the audio keeps the same voiceprint (as shown in FIGS. 9-11 and FIGS. 12-14). Listening tests show that, apart from the speed, the tone of the audio is unchanged by the processing.
The second embodiment of the present invention relates to a digital audio variable-speed processing method. The second embodiment improves on the first, and the main improvement is as follows. In the first embodiment, the $W_L$ and $W_R$ used when windowing the audio signal data to be processed in the buffer are the initial reconstruction window functions, whose corresponding points sum to 1, and they are fixed. In the present embodiment, when the audio signal data to be processed in the buffer is windowed, different reconstruction window functions are selected according to the echo type of the audio signal data, and windowing is then performed with the selected reconstruction window functions, as shown in FIG. 20.
Specifically, in audio processing, reconstruction windows with different weight distributions may be used depending on the waveform. There is more than one way to obtain the reconstruction windows, but they must satisfy the perfect reconstruction condition exactly or approximately, that is, the corresponding points of $W_L$ and $W_R$ must sum approximately to 1:
$$W_L(k) + W_R(k) \approx 1, \quad k = 1, 2, \ldots, L_W$$
The reconstruction windows with different weight distributions can each be generated and stored separately, or only one initial reconstruction window can be generated and stored, with the reconstruction windows of other weight distributions obtained from it by transformation. The transformation is as follows: the initial reconstruction window is decimated by an integer ratio (for example keeping 1 sample in 2, 1 in 3, or 1 in 4) to obtain the gradually varying part of the transformed window, and the constant parts at the two ends are padded with 0 or 1 respectively so that the window is extended back to the original length, as shown in FIG. 15.
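The decimate-and-pad transformation can be sketched as below. The function name `decimate_window`, the `lead` parameter (how many zeros precede the transition) and the choice of which samples survive decimation are all assumptions made for illustration; the embodiment's actual four window pairs are defined by formulas that are not reproduced in this text.

```python
def decimate_window(w_r, factor, lead):
    """Derive a steeper (W_L, W_R) pair from an initial rising window w_r.

    factor: integer decimation ratio (keep 1 sample in `factor`).
    lead:   number of zeros placed before the transition (assumed parameter).
    """
    core = w_r[::factor]                       # gradually varying part, shortened
    tail = len(w_r) - lead - len(core)         # ones needed to restore the length
    if factor < 1 or lead < 0 or tail < 0:
        raise ValueError("transition does not fit in the original length")
    new_r = [0.0] * lead + core + [1.0] * tail
    new_l = [1.0 - v for v in new_r]           # keeps W_L + W_R = 1
    return new_l, new_r


initial_r = [(k + 0.5) / 8 for k in range(8)]  # assumed linear initial window
early_l, early_r = decimate_window(initial_r, 2, 0)   # transition at the start
late_l, late_r = decimate_window(initial_r, 2, 4)     # transition at the end
print(len(early_r), len(late_r))                       # 8 8
```

Moving the transition toward one end concentrates the cross-fade there, which is the property the echo-dependent window selection relies on.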
In the present embodiment, $W_L$ and $W_R$ of the first embodiment are used as the initial reconstruction window functions. Decimating the initial reconstruction window yields 4 pairs of reconstruction window functions with different weight distributions, which allow the audio to be processed more flexibly. The defining formulas of the four pairs are equation images in the source and are not reproduced in this text; the corresponding window shapes are:
$W_{L1}$, $W_{R1}$: window shape shown in FIG. 16;
$W_{L2}$, $W_{R2}$: window shape shown in FIG. 17;
$W_{L3}$, $W_{R3}$: window shape shown in FIG. 18;
$W_{L4}$, $W_{R4}$: window shape shown in FIG. 19.
in this embodiment, the echo type of the audio signal data is obtained according to the judgment result of the block energy or the block absolute value of the audio signal data and the preset threshold.
Specifically, the design determination parameters engLa, engLb, engRa, and engRb, and the signal blocks (scribe-and-fill regions) corresponding thereto are shown in fig. 21, and it can be seen that engLa, engLb, engRa, and engRb are arranged from old to new in time, and when energy calculation is adopted, the determination parameters engLa, engLb, engRa, and engRb are obtained by the following formulas:
Figure G2009102021641D00135
Figure G2009102021641D00136
When absolute values are used, the decision parameters engLa, engLb, engRa and engRb are obtained by the following formulas:
Figure G2009102021641D00137
Figure G2009102021641D00138
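As a rough illustration of the two kinds of block measure, the sketch below computes four decision parameters either as sums of squares (energy) or as sums of absolute values. The block boundaries are an assumption: the buffer is simply split into four equal, contiguous blocks from oldest to newest, whereas the patent defines the actual blocks in fig. 21 and in formulas that are not reproduced here. The function name `decision_parameters` is hypothetical.

```python
def decision_parameters(buf, use_energy=True):
    """Return (engLa, engLb, engRa, engRb), ordered from oldest to newest.

    ASSUMPTION: four equal contiguous blocks; the patent's block layout
    (fig. 21) may differ.
    """
    n = len(buf) // 4
    if n == 0:
        raise ValueError("buffer must hold at least four samples")
    blocks = [buf[i * n:(i + 1) * n] for i in range(4)]
    if use_energy:
        # Block energy: sum of squared samples.
        return tuple(sum(v * v for v in b) for b in blocks)
    # Block absolute value: sum of sample magnitudes.
    return tuple(sum(abs(v) for v in b) for b in blocks)


eng_la, eng_lb, eng_ra, eng_rb = decision_parameters(
    [1, -1, 2, -2, 0, 0, 3, -3])
```

The absolute-value form avoids multiplications, which may matter on low-cost fixed-point hardware; the energy form weights large samples more heavily.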
Based on the decision parameters engLa, engLb, engRa and engRb, an echo decision threshold echoRate is set (it must be greater than 1 and is generally greater than 2). The echo type is obtained from the decision parameters and the echo decision threshold, and the reconstruction window to be used is selected accordingly, which can be implemented by the following code:
echoControl = 0;                                /* default: keep W_L and W_R unchanged */
if (outLen > frmLen) {                          /* audio expansion decision */
    if (engRa > (engLa + engLb) * echoRate)
        echoControl = 3;                        /* replace W_L, W_R with W_L3, W_R3: prevent pre-echo */
    else if (engRb > (engLa + engLb) * echoRate)
        echoControl = 1;                        /* replace W_L, W_R with W_L1, W_R1: prevent pre-echo */
    if (engLb > (engRa + engRb) * echoRate)
        echoControl = 4;                        /* replace W_L, W_R with W_L4, W_R4: prevent post-echo */
    else if (engLa > (engRa + engRb) * echoRate)
        echoControl = 2;                        /* replace W_L, W_R with W_L2, W_R2: prevent post-echo */
} else {                                        /* audio compression decision */
    if (engRb > (engLa + engLb)) {
        if (engRa > (engLa + engLb))
            echoControl = 4;                    /* replace W_L, W_R with W_L4, W_R4: prevent pre-echo */
        else
            echoControl = 2;                    /* replace W_L, W_R with W_L2, W_R2: prevent pre-echo */
    }
    if (engLa > (engRa + engRb)) {
        if (engLb > (engRa + engRb))
            echoControl = 3;                    /* replace W_L, W_R with W_L3, W_R3: prevent post-echo */
        else
            echoControl = 1;                    /* replace W_L, W_R with W_L1, W_R1: prevent post-echo */
    }
}
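For readers who want to exercise the decision logic, the following is a direct, runnable transcription of the code above into Python, with a few example inputs. The function name `select_echo_control` and the sample values are illustrative assumptions; the branch structure, including the absence of echoRate in the compression branch and the fact that the post-echo test runs after (and can override) the pre-echo test, follows the listing as given.

```python
def select_echo_control(eng_la, eng_lb, eng_ra, eng_rb,
                        out_len, frm_len, echo_rate):
    """Return 0 to keep W_L/W_R, or 1..4 to select W_L1..W_L4 / W_R1..W_R4."""
    echo_control = 0
    past = eng_la + eng_lb       # older half of the analysed signal
    future = eng_ra + eng_rb     # newer half of the analysed signal
    if out_len > frm_len:        # audio expansion
        if eng_ra > past * echo_rate:
            echo_control = 3     # pre-echo
        elif eng_rb > past * echo_rate:
            echo_control = 1     # pre-echo
        if eng_lb > future * echo_rate:
            echo_control = 4     # post-echo
        elif eng_la > future * echo_rate:
            echo_control = 2     # post-echo
    else:                        # audio compression
        if eng_rb > past:
            echo_control = 4 if eng_ra > past else 2    # pre-echo
        if eng_la > future:
            echo_control = 3 if eng_lb > future else 1  # post-echo
    return echo_control


# Quiet past, loud newest block, expansion: a pre-echo window is chosen.
onset_expand = select_echo_control(1, 1, 1, 100, 200, 100, 2.0)
# Loud oldest block, quiet afterwards, compression: a post-echo window is chosen.
decay_compress = select_echo_control(100, 1, 1, 1, 50, 100, 2.0)
```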
It has been experimentally confirmed that the speech intelligibility can be further improved by selecting an adaptive window function according to the echo type, as shown in fig. 22-29.
As will be appreciated by those skilled in the art, audio compression (speed-up) smoothly spreads the time-compressed audio information over the shortened output data, whereas audio expansion (slow-down) obtains longer audio data by smoothly overlapping past and future audio information (future meaning temporally newer relative to the reference data). This overlapping carries higher-energy signal into portions that originally had lower energy, producing both post-echo (an echo after the signal onset) and pre-echo (an echo before the signal onset). Therefore, by judging the echo type and switching the window function accordingly, the dispersion of signal energy can be reduced and pre-echo and post-echo mitigated when the audio signal has a highly uneven energy distribution.
In the present embodiment, echo determination is based on the block energy (or block absolute value) of the audio signal. If the past signal is larger than the present signal, post-echo is likely to occur. In audio compression, the transition band of the window function then needs to be shifted left, i.e. a window function such as W_L1, W_R1 or W_L3, W_R3 is used to suppress the larger (past) signal at the left end and reduce dispersion (examples are shown in figs. 26 and 28, respectively); in audio expansion, the transition band needs to be shifted right, i.e. a window function such as W_L2, W_R2 or W_L4, W_R4 is used to suppress the larger (past) signal at the left end and reduce dispersion (examples are shown in figs. 23 and 25, respectively). If the past signal is smaller than the present signal, pre-echo is likely to occur. In audio compression, the transition band then needs to be shifted right, i.e. a window function such as W_L2, W_R2 or W_L4, W_R4 is used to suppress the larger (future) signal at the right end and reduce dispersion (examples are shown in figs. 27 and 29, respectively); in audio expansion, the transition band needs to be shifted left, i.e. a window function such as W_L1, W_R1 or W_L3, W_R3 is used to suppress the larger (future) signal at the right end and reduce dispersion (examples are shown in figs. 22 and 24, respectively). The transition bands of W_L1, W_R1 and W_L3, W_R3 are shifted left to different degrees, and those of W_L2, W_R2 and W_L4, W_R4 are shifted right to different degrees; which degree applies is decided from the subdivided block energies (or block absolute values).
It is easy to see that, in the present embodiment, selecting an appropriate reconstruction window function according to the echo type further safeguards the audio quality after the speed change. Moreover, because post-echo is likely when the past signal is larger than the present signal and pre-echo is likely when it is smaller, using the block energy (or block absolute value) of the audio signal as the basis for judging the echo type effectively ensures the accuracy of the judgment.
The method embodiments of the present invention may be implemented in software, hardware, firmware, etc. Whichever form is used, the instruction code may be stored in any type of computer-accessible memory (e.g., permanent or modifiable, volatile or non-volatile, solid-state or non-solid-state, fixed or removable media, etc.). The memory may be, for example, Programmable Array Logic (PAL), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic disk, an optical disk, a Digital Versatile Disk (DVD), or the like.
A third embodiment of the present invention relates to a digital audio variable-speed processing apparatus. As shown in fig. 30, the digital audio variable-speed processing apparatus includes:
A filling module for filling the audio signal data to be subjected to audio variable-speed processing into the buffer until the filled length of the buffer reaches the data processing length L_p.
A windowing processing module for windowing the audio signal data to be processed in the buffer to obtain an output signal x_out. When the audio variable-speed processing is speed-up processing, the windowing processing module left-aligns the audio signal data of length L_p in the buffer with the window function W_L of length L_W and multiplies point by point by W_L to obtain x_L, right-aligns the audio signal data of length L_p in the buffer with the window function W_R of length L_W and multiplies point by point by W_R to obtain x_R, and adds the obtained x_L and x_R to obtain L_W samples of the output signal x_out. When the audio variable-speed processing is slow-down processing, the windowing processing module right-aligns the audio signal data of length L_p in the buffer with the window function W_L of length L_W and multiplies point by point by W_L to obtain x_L, left-aligns the audio signal data of length L_p in the buffer with the window function W_R of length L_W and multiplies point by point by W_R to obtain x_R, and adds the obtained x_L and x_R to obtain L_W samples of the output signal x_out.
A shift module for shifting the L_D samples that have been windowed out of the buffer and instructing the filling module to continue filling audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length L_p.
When the filled length of the buffer reaches the data processing length L_p, the processing of the windowing processing module is triggered. When the windowing processing module has obtained L_W samples of the output signal x_out, the processing of the shift module is triggered, and this repeats until the audio variable-speed processing of all the audio signal data is completed.
Here, W_L is a window function with an amplitude-attenuating characteristic and W_R is a window function with an amplitude-increasing characteristic; W_L and W_R each have L_W points, and the sum of corresponding points equals 1 or approximately 1. L_W is a preset value, and the values of L_D and L_p are obtained from L_W and the playback rate r.
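The interplay of the filling, windowing and shift modules can be sketched as a single loop. This is a minimal illustration, not the patented implementation: the function name `variable_speed` is hypothetical, L_p and L_D are passed in as parameters rather than derived from L_W and r (the derivation belongs to the first embodiment and is not reproduced in this section), and the numeric values in the usage lines are arbitrary examples chosen only so that the arithmetic is easy to follow.

```python
def variable_speed(samples, w_l, w_r, l_p, l_d, speed_up):
    """Fill / window / shift loop over a list of samples.

    w_l: decaying window, w_r: rising window, both with L_W points.
    l_p: data processing length, l_d: samples shifted out per frame
    (both supplied by the caller in this sketch).
    """
    l_w = len(w_l)
    if len(w_r) != l_w or not (0 < l_d <= l_p) or l_p < l_w:
        raise ValueError("inconsistent window or buffer lengths")
    out, buf, pos = [], [], 0
    while True:
        # Filling module: top up the tail of the buffer to l_p samples.
        need = l_p - len(buf)
        buf.extend(samples[pos:pos + need])
        pos += need
        if len(buf) < l_p:
            break                        # not enough input for another frame
        left, right = buf[:l_w], buf[l_p - l_w:]
        # Windowing module: W_L is left-aligned for speed-up and
        # right-aligned for slow-down; W_R takes the opposite end.
        seg_l, seg_r = (left, right) if speed_up else (right, left)
        x_l = [s * w for s, w in zip(seg_l, w_l)]
        x_r = [s * w for s, w in zip(seg_r, w_r)]
        out.extend(a + b for a, b in zip(x_l, x_r))
        # Shift module: drop the l_d oldest samples.
        del buf[:l_d]
    return out


w_r = [0.25, 0.5, 0.75, 1.0]
w_l = [1.0 - v for v in w_r]
fast = variable_speed([float(i) for i in range(12)], w_l, w_r, 6, 6, True)
slow = variable_speed([1.0] * 24, w_l, w_r, 6, 2, False)
```

Because corresponding points of W_L and W_R sum to 1, a constant input stays constant at the output, which is a convenient sanity check for any implementation of this scheme.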
In the present embodiment, the window functions W_L and W_R used for windowing are initial reconstruction window functions whose corresponding points sum to 1. The initial reconstruction windows W_L and W_R are as follows:
Figure G2009102021641D00171
W_L(k) = 1 - W_R(k), k = 1, 2, …, L_W
It is to be understood that the first embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment remain valid in this embodiment and are not repeated here. Accordingly, the related technical details mentioned in the present embodiment can also be applied to the first embodiment.
A fourth embodiment of the present invention relates to a digital audio variable-speed processing apparatus. The fourth embodiment improves on the third embodiment, the main improvement being as follows: in the third embodiment, the window functions W_L and W_R used for windowing are initial reconstruction window functions whose corresponding points sum to 1, and W_L and W_R are fixed; in the present embodiment, the window functions W_L and W_R used for windowing are reconstruction window functions with different weight distributions selected according to the echo type of the audio signal data. That is, the digital audio variable-speed processing apparatus of this embodiment further includes a window function selection module, configured to obtain the echo type of the audio signal data by comparing the block energy or block absolute value of the audio signal data against a preset threshold, and to output the obtained echo type to the windowing processing module, where it is used to select among the reconstruction window functions with different weight distributions.
The reconstruction window functions with different weight distributions are either generated independently or obtained by transforming the initial reconstruction window. In the latter case they are obtained as follows:
The initial reconstruction window is decimated by an integer ratio to obtain the slowly varying part of the transformed window, and the constant parts at both ends are filled with 0 or 1 respectively until the original length of the initial reconstruction window is reached.
It is to be understood that the second embodiment is the method embodiment corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the second embodiment. The related technical details mentioned in the second embodiment remain valid in this embodiment and are not repeated here. Accordingly, the related technical details mentioned in the present embodiment can also be applied to the second embodiment.
It should be noted that each unit mentioned in the device embodiments of the present invention is a logical unit. Physically, a logical unit may be one physical unit, part of a physical unit, or a combination of multiple physical units; the physical implementation of these logical units is not what matters most, and it is the combination of the functions they implement that is key to solving the technical problem addressed by the present invention. Furthermore, to highlight the innovative part of the present invention, the above device embodiments do not introduce units that are less relevant to solving that technical problem; this does not mean that no other units exist in those embodiments.
While the invention has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention.

Claims (10)

1. A digital audio variable-speed processing method, characterized by comprising the following steps:
A. filling audio signal data to be subjected to audio variable-speed processing into a buffer until the filled length of the buffer reaches a data processing length L_p;
B. windowing the audio signal data to be processed in the buffer in the following manner to obtain an output signal x_out:
if the audio variable-speed processing is speed-up processing, left-aligning the audio signal data of length L_p in the buffer with a window function W_L of length L_W and multiplying point by point by W_L to obtain x_L, right-aligning the audio signal data of length L_p in the buffer with a window function W_R of length L_W and multiplying point by point by W_R to obtain x_R, and adding the obtained x_L and x_R to obtain L_W samples of said output signal x_out;
if the audio variable-speed processing is slow-down processing, right-aligning the audio signal data of length L_p in the buffer with the window function W_L of length L_W and multiplying point by point by W_L to obtain x_L, left-aligning the audio signal data of length L_p in the buffer with the window function W_R of length L_W and multiplying point by point by W_R to obtain x_R, and adding the obtained x_L and x_R to obtain L_W samples of said output signal x_out;
C. shifting the L_D samples that have been windowed out of the buffer, and continuing to fill audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length L_p;
repeating step B and step C until the audio variable-speed processing of all the audio signal data is completed;
wherein said W_L is a window function with an amplitude-attenuating characteristic, said W_R is a window function with an amplitude-increasing characteristic, W_L and W_R each have L_W points, and the sum of corresponding points equals 1 or approximately 1; said L_W is a preset value, and the values of said L_D and L_p are obtained from L_W and the playback rate r.
2. The digital audio variable-speed processing method according to claim 1, wherein the W_L and W_R used for windowing the audio signal data to be processed in the buffer are initial reconstruction window functions whose corresponding points sum to 1; or, said W_L and W_R are reconstruction window functions with different weight distributions selected according to the echo type of the audio signal data, the reconstruction window functions with different weight distributions being either generated independently or obtained by transforming the initial reconstruction window.
3. The digital audio variable-speed processing method according to claim 2, wherein the echo type of the audio signal data is obtained according to the result of judging the block energy or block absolute value of the audio signal data against a preset threshold.
4. The digital audio variable-speed processing method according to claim 2, wherein the reconstruction window functions with different weight distributions are obtained by transforming the initial reconstruction window as follows:
decimating the initial reconstruction window by an integer ratio to obtain the slowly varying part of the transformed window, and filling the constant parts at both ends with 0 or 1 respectively until the original length of the initial reconstruction window is reached.
5. The digital audio variable-speed processing method according to claim 2, wherein the initial reconstruction windows W_L and W_R are as follows:
Figure 2009102021641100001DEST_PATH_IMAGE002
W_L(k) = 1 - W_R(k), k = 1, 2, …, L_W
6. A digital audio variable-speed processing apparatus, comprising:
a filling module for filling audio signal data to be subjected to audio variable-speed processing into a buffer until the filled length of the buffer reaches a data processing length L_p;
a windowing processing module for windowing the audio signal data to be processed in the buffer to obtain an output signal x_out; when the audio variable-speed processing is speed-up processing, the windowing processing module left-aligns the audio signal data of length L_p in the buffer with a window function W_L of length L_W and multiplies point by point by W_L to obtain x_L, right-aligns the audio signal data of length L_p in the buffer with a window function W_R of length L_W and multiplies point by point by W_R to obtain x_R, and adds the obtained x_L and x_R to obtain L_W samples of said output signal x_out; when the audio variable-speed processing is slow-down processing, the windowing processing module right-aligns the audio signal data of length L_p in the buffer with the window function W_L of length L_W and multiplies point by point by W_L to obtain x_L, left-aligns the audio signal data of length L_p in the buffer with the window function W_R of length L_W and multiplies point by point by W_R to obtain x_R, and adds the obtained x_L and x_R to obtain L_W samples of said output signal x_out;
a shift module for shifting the L_D samples that have been windowed out of the buffer and instructing the filling module to continue filling audio signal data to be processed at the tail of the buffer until the filled length of the buffer reaches the data processing length L_p;
wherein, when the filled length of the buffer reaches the data processing length L_p, the processing of the windowing processing module is triggered; when the windowing processing module has obtained L_W samples of said output signal x_out, the processing of the shift module is triggered, until the audio variable-speed processing of all the audio signal data is completed;
wherein said W_L is a window function with an amplitude-attenuating characteristic, said W_R is a window function with an amplitude-increasing characteristic, W_L and W_R each have L_W points, and the sum of corresponding points equals 1 or approximately 1; said L_W is a preset value, and the values of said L_D and L_p are obtained from L_W and the playback rate r.
7. The digital audio variable-speed processing apparatus according to claim 6, wherein the window functions W_L and W_R used for windowing are initial reconstruction window functions whose corresponding points sum to 1; or,
the window functions W_L and W_R used for windowing are reconstruction window functions with different weight distributions selected according to the echo type of the audio signal data, the reconstruction window functions with different weight distributions being either generated independently or obtained by transforming the initial reconstruction window.
8. The digital audio variable-speed processing apparatus according to claim 7, further comprising: a window function selection module, configured to obtain the echo type of the audio signal data according to the result of judging the block energy or block absolute value of the audio signal data against a preset threshold, and to output the obtained echo type to the windowing processing module.
9. The digital audio variable-speed processing apparatus according to claim 7, wherein the reconstruction window functions with different weight distributions are obtained by transforming the initial reconstruction window as follows:
decimating the initial reconstruction window by an integer ratio to obtain the slowly varying part of the transformed window, and filling the constant parts at both ends with 0 or 1 respectively until the original length of the initial reconstruction window is reached.
10. The digital audio variable-speed processing apparatus according to claim 7, wherein the initial reconstruction windows W_L and W_R are as follows:
Figure 135250DEST_PATH_IMAGE002
W_L(k) = 1 - W_R(k), k = 1, 2, …, L_W
CN 200910202164 2009-12-31 2009-12-31 Method and equipment for processing digital audio in variable speed Active CN102117613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200910202164 CN102117613B (en) 2009-12-31 2009-12-31 Method and equipment for processing digital audio in variable speed

Publications (2)

Publication Number Publication Date
CN102117613A CN102117613A (en) 2011-07-06
CN102117613B true CN102117613B (en) 2012-12-12

Family

ID=44216345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200910202164 Active CN102117613B (en) 2009-12-31 2009-12-31 Method and equipment for processing digital audio in variable speed

Country Status (1)

Country Link
CN (1) CN102117613B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102419981B (en) * 2011-11-02 2013-04-03 展讯通信(上海)有限公司 Zooming method and device for time scale and frequency scale of audio signal
CN106469559B (en) * 2015-08-19 2020-10-16 中兴通讯股份有限公司 Voice data adjusting method and device
CN105208426B (en) * 2015-09-24 2018-07-06 福州瑞芯微电子股份有限公司 A kind of method and system of audio-visual synchronization speed change
CN110333722A (en) * 2019-07-11 2019-10-15 北京电影学院 A kind of robot trajectory generates and control method, apparatus and system
CN116088390A (en) * 2023-02-23 2023-05-09 展讯通信(上海)有限公司 Audio processing method, device and electronic equipment
CN118658450B (en) * 2024-08-20 2024-11-08 罗普特科技集团股份有限公司 AI voice rate adjustment method and system for AI platform

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5175769A (en) * 1991-07-23 1992-12-29 Rolm Systems Method for time-scale modification of signals
EP0608833A2 (en) * 1993-01-25 1994-08-03 Matsushita Electric Industrial Co., Ltd. Method of and apparatus for performing time-scale modification of speech signals
US5781885A (en) * 1993-09-09 1998-07-14 Sanyo Electric Co., Ltd. Compression/expansion method of time-scale of sound signal
CN1208490A (en) * 1996-11-11 1999-02-17 松下电器产业株式会社 Sound reproducing speed converter
CN1440549A (en) * 2000-07-26 2003-09-03 Ssi株式会社 Continuously Variable Time Scale Alteration Technology for Digital Audio Signals

Also Published As

Publication number Publication date
CN102117613A (en) 2011-07-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180416

Address after: The 300456 Tianjin FTA test area (Dongjiang Bonded Port) No. 6865 North Road, 1-1-1802-7 financial and trade center of Asia

Patentee after: Xinji Lease (Tianjin) Co.,Ltd.

Address before: 201203 Shanghai city Zuchongzhi road Pudong Zhangjiang hi tech park, Spreadtrum Center Building 1, Lane 2288

Patentee before: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20110706

Assignee: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Assignor: Xinji Lease (Tianjin) Co.,Ltd.

Contract record no.: 2018990000196

Denomination of invention: Method and equipment for processing digital audio in variable speed

Granted publication date: 20121212

License type: Exclusive License

Record date: 20180801

TR01 Transfer of patent right

Effective date of registration: 20221027

Address after: 201203 Shanghai city Zuchongzhi road Pudong New Area Zhangjiang hi tech park, Spreadtrum Center Building 1, Lane 2288

Patentee after: SPREADTRUM COMMUNICATIONS (SHANGHAI) Co.,Ltd.

Address before: 300456 1-1-1802-7, north area of financial and Trade Center, No. 6865, Asia Road, Tianjin pilot free trade zone (Dongjiang Bonded Port Area)

Patentee before: Xinji Lease (Tianjin) Co.,Ltd.