CN107622774B

CN107622774B - A music tempo spectrogram generation method based on matching pursuit

Info

Publication number: CN107622774B
Application number: CN201710675484.3A
Authority: CN
Inventors: 桂文明
Original assignee: Jinling Institute of Technology
Current assignee: Jinling Institute of Technology
Priority date: 2017-08-09
Filing date: 2017-08-09
Publication date: 2018-08-21
Anticipated expiration: 2037-08-09
Also published as: CN107622774A

Abstract

The invention provides a music tempo spectrogram (tempogram) generation method based on matching pursuit, in the field of content-based music information retrieval. The method comprises the following steps: input a music signal, generate a note onset detection function o(n) and divide it into frames; take the common range of musical tempi and convert it into a frequency set; for each frequency in the set, create a corresponding parent atom; shift each parent atom, generating a new atom at every shift; assemble all parent atoms and new atoms into a redundant dictionary; use the dictionary to perform matching pursuit on each frame of o(n), obtain the decomposition coefficient of each tempo, and finally generate the tempogram of the music. The tempogram generated by the invention has high resolution and strong sparsity. The tempo resolution, the shift granularity of the parent atoms and the number of matching pursuit iterations can be set flexibly according to one's own requirements, producing tempograms of different resolutions and sparsities.

Description

A music tempo spectrogram generation method based on matching pursuit

Technical field

The invention relates to the field of content-based music information retrieval, and in particular to a music tempo spectrogram generation method based on matching pursuit.

Background art

1. Concepts and application fields related to the invention

How fast music proceeds is its tempo. In modern music, tempo is usually measured in beats per minute (bpm). For example, a tempo marking of 120 quarter notes per minute means that each quarter note lasts 0.5 seconds; the larger the bpm value, the faster the tempo.

Tempo is closely related to the beat and rhythm of music and is one of its most important characteristics. In music information retrieval, tempo estimation means estimating how fast a piece proceeds from its content, starting from files that contain the audio waveform, such as mp3 or wav files. Tempo estimation is a challenging and important problem in its own right, and it also underpins research on beat perception, rhythm recognition, genre recognition and music structure analysis. For example, beat perception often requires estimating the tempo first and then inferring the beat type and beat structure from it; in genre recognition, rhythm and tempo serve as salient features for identifying a genre.

The tempo of music changes continually as the music proceeds. One cause is that the composition itself calls for tempo changes; such changes are usually few within a piece, and many pieces have none. Another cause is the imprecision of performers and singers; this kind of variation is unavoidable and is generally present throughout the music. Estimating the tempo therefore really means estimating the tempo at each point in time. Because of legato passages and rests, the tempo can be ambiguous; at the same time, the tempo itself fluctuates, so the tempo at each point in time can be regarded as a vector of several tempo components. The tempo of a piece at each point in time can be described by a tempogram, from which applications such as beat tracking, rhythm recognition and genre recognition can extract useful information.

2. Existing techniques and process for generating tempograms

Tempogram generation is generally divided into two stages. The first stage generates a note onset detection function; a note onset is the moment at which a note begins to be played or sung, and some literature, such as [1], calls this stage novelty curve generation. The second stage generates the tempogram itself.

The first stage consists mainly of signal transformation, feature extraction and onset detection function generation. Signal transformation converts the one-dimensional, high-rate waveform into a low-rate representation. The signal is usually divided into frames first, and each frame is then transformed, for example with the short-time Fourier transform (STFT) or the wavelet transform (WT). Feature extraction derives time-domain, frequency-domain or time-frequency features from this low-rate representation. Typical time-domain features include the amplitude envelope; frequency-domain features include spectral flux and frequency-domain energy; time-frequency features are mainly based on the wavelet transform or on Cohen's class of time-frequency distributions. Onset detection function generation then measures the change between consecutive frames using the per-frame features; note onsets generally occur where the positive change between consecutive frames increases suddenly. A typical onset detection function procedure is described in [1].
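As a rough illustration of the first stage, the following Python sketch computes a half-wave rectified spectral-flux onset detection function with NumPy. It is not the procedure of [1]; the function name `spectral_flux_odf`, the log-compression constant and the frame and hop sizes are assumptions chosen for illustration. With a 44.1 kHz signal, a hop of 1024 samples gives a frame rate of about 43 Hz, close to the f_s = 1/0.023 used later in this document.

```python
import numpy as np


def spectral_flux_odf(x, sr, frame_len=2048, hop=1024):
    """Half-wave rectified spectral flux of a mono signal (illustrative sketch).

    Returns the onset detection function o(n) and its sampling rate sr / hop.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Magnitude spectrum of each windowed frame (one row per frame).
    mags = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        mags[i] = np.abs(np.fft.rfft(frame))
    # Log-compress, then sum only the increases between consecutive frames.
    log_mags = np.log1p(100.0 * mags)
    flux = np.maximum(np.diff(log_mags, axis=0), 0.0).sum(axis=1)
    o = np.concatenate([[0.0], flux])  # o[i] measures the rise into frame i
    return o, sr / hop
```

Keeping only positive differences reflects the observation above that onsets show up as sudden increases between consecutive frames.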

The second stage extracts periodicity from the onset detection function produced in the first stage and forms the tempogram. The main methods currently used in this stage are the autocorrelation function (ACF) method and the Fourier transform (FT) method [1].

The ACF method windows the onset detection function, computes its autocorrelation, extracts the periodicity of the onsets as a function of lag, and converts each lag into a tempo to form the tempogram. It is computed as follows [1]:

$$A(t,l) = \frac{1}{2N+1-l}\sum_{n\in\mathbb{Z}} o(n)\,o(n+l)\,W(n-t) \qquad (1.1)$$

where t and n are discrete time, l = 1, ..., N is the lag, o(n) is the onset detection function, and W(n) is a rectangular window centred at 0 with support [-N, N]. If f_s is the sampling rate of o(n), lag l corresponds to a period of l/f_s seconds, a frequency of f_s/l Hz and a tempo of τ = 60·f_s/l bpm.
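Eq. (1.1) can be sketched in Python with NumPy as below. The function name `acf_tempogram`, the zero padding outside o(n) and the `hop` parameter are our assumptions; the code only illustrates the formula, evaluating each lag l = 1, ..., N and mapping it to τ = 60·f_s/l.

```python
import numpy as np


def acf_tempogram(o, fs, N, hop=1):
    """Windowed-autocorrelation tempogram following Eq. (1.1) (illustrative sketch).

    o   : onset detection function o(n)
    fs  : sampling rate f_s of o(n) in Hz
    N   : half-width of the rectangular window W, whose support is [-N, N]
    hop : spacing between analysis centres t (assumed; not part of Eq. (1.1))
    Returns (tempi_bpm, centres, A) with A[l - 1, k] = A(centres[k], l), l = 1..N.
    """
    o = np.asarray(o, dtype=float)
    # Treat o(n) as zero outside its support; padded[p] holds o(p - N).
    padded = np.concatenate([np.zeros(N), o, np.zeros(2 * N)])
    centres = np.arange(0, len(o), hop)
    lags = np.arange(1, N + 1)
    A = np.zeros((N, len(centres)))
    for k, t in enumerate(centres):
        seg = padded[t : t + 2 * N + 1]   # o(n) for n in [t - N, t + N], where W(n - t) = 1
        ext = padded[t : t + 3 * N + 1]   # long enough to supply o(n + l) for l <= N
        for i, l in enumerate(lags):
            A[i, k] = seg @ ext[l : l + 2 * N + 1] / (2 * N + 1 - l)
    tempi = 60.0 * fs / lags              # lag l -> tau = 60 * f_s / l
    return tempi, centres, A
```

Because τ = 60·f_s/l, the returned tempo axis is non-uniform: adjacent lags are close in tempo at large l and far apart at small l.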

The FT method applies a windowed Fourier transform to the onset detection function to obtain its frequency-domain characteristics, and converts frequency into tempo to form the tempogram. It is computed as follows:

$$F(t,\omega) = \sum_{n\in\mathbb{Z}} o(n)\,W(n-t)\,e^{-2\pi i\omega n} \qquad (1.2)$$

where t and n are discrete time, ω is frequency, o(n) is the onset detection function, and W(n) is a Hann window centred at 0 with support [-N, N]. There are currently two ways to choose ω. One is the discrete Fourier transform (DFT) method of [2], which discretizes ω > 0 into N frequency bins spaced f_s/N Hz apart. The other, similar to [1], sets ω = τ/60 Hz with τ ∈ [30, 480] bpm, the common tempo range, and uses Eq. (1.2) to compute the coefficient for each ω at each time point.
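A sketch of the second option, evaluating Eq. (1.2) directly at ω = τ/60 Hz, follows. Because Eq. (1.2) writes the exponent with the sample index n, we divide n by f_s so that ω can be given in Hz; that unit convention, the function name `fourier_tempogram` and the `hop` parameter are assumptions.

```python
import numpy as np


def fourier_tempogram(o, fs, N, tempi_bpm, hop=1):
    """Magnitude of Eq. (1.2) at omega = tau / 60 Hz (illustrative sketch).

    Uses a Hann window of support [-N, N] centred on each analysis time t.
    Returns F with F[b, k] = |F(centres[k], tempi_bpm[b] / 60)|.
    """
    o = np.asarray(o, dtype=float)
    omegas = np.asarray(tempi_bpm, dtype=float) / 60.0   # Hz
    win = np.hanning(2 * N + 1)                          # W(n - t) for n in [t - N, t + N]
    padded = np.concatenate([np.zeros(N), o, np.zeros(N)])
    centres = np.arange(0, len(o), hop)
    F = np.zeros((len(omegas), len(centres)))
    for k, t in enumerate(centres):
        n = np.arange(t - N, t + N + 1)                  # absolute sample indices
        seg = padded[t : t + 2 * N + 1] * win            # o(n) W(n - t)
        kernel = np.exp(-2j * np.pi * np.outer(omegas, n) / fs)
        F[:, k] = np.abs(kernel @ seg)
    return F
```

Sampling ω on a 1 bpm grid this way only picks points on the windowed spectrum; the width of its peaks is still set by the window length, which is the limitation discussed for [1] below.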

3. Shortcomings of the prior art

The contribution of the invention lies in the second stage of tempogram generation. To explain the shortcomings of the prior art, two concepts are introduced: tempo resolution and tempogram sparsity. By analogy with frequency resolution, tempo resolution is expressed as the spacing between two adjacent valid points along the tempo axis of the tempogram: the larger the spacing, the worse the tempo resolution. Tempogram sparsity refers to the number of non-zero elements among all tempogram coefficients: the fewer the non-zero elements, the sparser the tempogram and the better its discriminability.

1. The prior art cannot provide the tempo resolution required over the common tempo range

Tempo resolution varies inversely with the spacing between two adjacent valid points in the tempogram: the larger the spacing, the worse the resolution; the smaller the spacing, the better.

In the ACF method, the tempo difference between two adjacent lags l and l+1 is Δτ = 60·f_s/l − 60·f_s/(l+1) = 60·f_s/(l(l+1)). The tempo spacing therefore shrinks as the lag grows: the larger the lag, the finer the tempo resolution, so the resolution becomes coarser as the tempo increases. Following [1], take f_s = 1/0.023 ≈ 43.5 Hz. At l = 51 (τ ≈ 51.2 bpm), Δτ ≈ 0.98 bpm, and for every l < 51 the spacing exceeds 1 bpm. At l = 21 (τ ≈ 124.2 bpm), Δτ ≈ 5.6 bpm, which is already too coarse to distinguish tempi within the common range τ ∈ [30, 480], let alone for l < 21. Keeping the largest spacing (at l = 1, where Δτ = 30·f_s) below 1 bpm would require f_s < 1/30 Hz, i.e. a first-stage frame length of more than 30 seconds, and the onset positions would then be uncertain by 30 seconds or more, which is clearly infeasible. The tempo resolution of the ACF method is therefore not constant: above about 51 bpm its spacing exceeds 1 bpm, which does not meet the resolution needed over the common tempo range.
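The arithmetic above can be checked with a few lines of Python. `fs` defaults to 1/0.023 as in the text; the helper names are hypothetical.

```python
def acf_tempo(lag, fs=1 / 0.023):
    """Tempo in bpm represented by ACF lag l: tau = 60 * f_s / l."""
    return 60.0 * fs / lag


def acf_tempo_step(lag, fs=1 / 0.023):
    """Tempo gap in bpm between lags l and l + 1: 60 f_s / (l (l + 1))."""
    return 60.0 * fs / (lag * (lag + 1))


def smallest_lag_with_step(max_step_bpm, fs=1 / 0.023):
    """Smallest lag whose tempo gap does not exceed max_step_bpm."""
    lag = 1
    while acf_tempo_step(lag, fs) > max_step_bpm:
        lag += 1
    return lag


for l in (21, 51):
    print(l, round(acf_tempo(l), 1), round(acf_tempo_step(l), 2))
# 21 124.2 5.65
# 51 51.2 0.98
```

`smallest_lag_with_step(1.0)` returns 51, matching the statement that only lags of 51 and above (tempi below about 51 bpm) reach 1 bpm spacing.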

For the DFT method of [2], the tempo spacing is 60·f_s/N bpm. With f_s = 1/0.023 ≈ 43.5 Hz, a resolution of 1 bpm requires N ≥ 60·f_s ≈ 2610, i.e. a window about 60 seconds long. Since most music is currently under 300 seconds long, this window length is poorly matched to typical piece lengths. The other way to improve the tempo resolution is to reduce f_s, but that conflicts with the accuracy required of o(n) and is therefore infeasible.
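The same check for the DFT method, again with hypothetical helper names. With f_s = 1/0.023 exactly, the bound comes out as 2609 samples; the text's 2610 uses the rounded value 43.5.

```python
import math


def dft_tempo_step(window_len, fs=1 / 0.023):
    """Tempo spacing (bpm) between DFT bins for a window of window_len samples."""
    return 60.0 * fs / window_len


def window_for_step(max_step_bpm, fs=1 / 0.023):
    """Shortest window, as (samples, seconds), whose tempo spacing is <= max_step_bpm."""
    n = math.ceil(60.0 * fs / max_step_bpm)
    return n, n / fs


n, seconds = window_for_step(1.0)
print(n, round(seconds, 1))  # 2609 60.0
```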

The FT method of [1] windows the onset detection function and evaluates its discrete-time Fourier transform (DTFT) at the frequencies ω that correspond to common tempi. This merely samples the spectrum at discrete points; the actual frequency resolution is not improved.

In summary, the prior art cannot provide the tempo resolution required over the common tempo range, so parts of the generated tempogram are blurred.

2. Tempograms generated by the prior art are not sparse enough

The sparser a tempogram, the better its discriminability. Put another way, a sparse tempogram has its energy concentrated, which gives it good properties and makes it more effective in applications.

For the ACF method, above 51 bpm the coefficients at ordinary precision (for example 1 bpm) must be computed by interpolation, which inevitably reduces sparsity. For the FT method, spectral leakage and limited resolution likewise reduce the sparsity of the frequency-domain coefficients. The tempograms generated by the prior art are therefore not sparse enough, and their energy is poorly concentrated.

In summary, tempograms generated by the prior art are deficient in resolution and sparsity, whereas the invention can generate tempograms with higher resolution and better sparsity. The references cited in this patent are as follows:

1. P. Grosche, M. Müller, F. Kurth. Cyclic tempogram: a mid-level tempo representation for music signals [C]. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2010.

2. G. Peeters. Time variable tempo detection and beat marking [C]. In: ICMC, 2005.

3. MIREX. MIREX music test data set. http://www.music-ir.org/evaluation/MIREX/data/2006/tempo/tempo_train_2006.zip, 2017.

Summary of the invention

The object of the invention is to provide a tempogram generation method based on matching pursuit that overcomes the insufficient resolution and sparsity of tempograms generated by the prior art.

To solve the above technical problem, the invention adopts the following technical solution:

1. Input a music signal and generate a note onset detection function o(n);

2. Divide o(n) into frames to form a number of frame signals;

3. Take the common tempo range and convert the tempo set into a frequency set at a chosen tempo resolution;

4. For each frequency in the frequency set, create a corresponding parent atom;

5. Shift all parent atoms with a given granularity, generating one atom per shift, and combine the shifted atoms with the parent atom to form the atom set of the parent atom's frequency;

6. Assemble the atom sets of all frequencies in the frequency set into a redundant dictionary;

7. For each frame of o(n), perform matching pursuit with the redundant dictionary for a given number of iterations, generating a series of decomposition coefficients and corresponding atoms;

8. According to the relationship between atoms in the redundant dictionary and tempi, assign each decomposition coefficient of each frame of o(n) to the coefficient of a particular tempo;

9. Combine the tempo spectrum vectors of all frames to form the tempogram.
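Steps 8 and 9 are not spelled out in detail in this section, so the following is only a sketch under stated assumptions: each dictionary column is tagged with the index b of the tempo τ_b it was built from, and when several selected atoms share a tempo, their coefficients are summed. The helper names are hypothetical.

```python
import numpy as np


def frame_tempo_vector(coeffs, atom_ids, atom_tempo_index, n_tempi):
    """Step 8 (sketch): fold one frame's matching pursuit output onto the tempo axis.

    coeffs           : coefficients s_1..s_K found for this frame
    atom_ids         : dictionary column index of each selected atom g_1..g_K
    atom_tempo_index : for every dictionary column, the index b of its tempo tau_b
    Coefficients of atoms that share a tempo are summed (an assumption).
    """
    tempo_idx = np.asarray(atom_tempo_index)[np.asarray(atom_ids, dtype=int)]
    v = np.zeros(n_tempi)
    np.add.at(v, tempo_idx, coeffs)  # unbuffered add, so repeated indices accumulate
    return v


def assemble_tempogram(frame_vectors):
    """Step 9 (sketch): one column per frame, giving a B x (number of frames) array."""
    return np.column_stack(frame_vectors)
```

`np.add.at` is used instead of `v[idx] += coeffs` because the latter keeps only one contribution when an index repeats.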

Beneficial effects of the invention:

The invention generates the tempogram with a matching pursuit algorithm over a redundant dictionary. Its benefit lies in the good resolution and sparsity of the resulting tempogram.

The good resolution of the invention comes from the flexible choice of atoms in the redundant dictionary: atoms of higher resolution can be generated according to the required tempo resolution, so the tempogram resolution is higher. Figs. 2 to 4 show tempograms of one piece (train1.wav) from the test data set [3] of the Music Information Retrieval Evaluation eXchange (MIREX), generated with the autocorrelation function method (Fig. 2), the Fourier transform method (Fig. 3) and the matching pursuit method of the invention (Fig. 4). Adjacent points on the tempo axis are 1 bpm apart (571 points in total). In Fig. 2, the autocorrelation method has acceptable resolution at low tempi, but at high tempi the tempogram is blurred, the bands widen progressively and the resolution drops markedly. In Fig. 3, the Fourier transform method produces wide bands at both high and low tempi, and its resolution is clearly inferior to that of Fig. 4 (for comparability with the autocorrelation and Fourier transform methods, 571 iterations were used).

The excellent sparsity of the invention comes from the redundant dictionary containing atoms that closely resemble the original signal, while the matching pursuit algorithm ensures that these closely matching atoms receive relatively large decomposition coefficients and dissimilar atoms receive small or even zero coefficients. Comparing Figs. 2 to 4 shows that the coefficients in Fig. 4 are markedly sparser, with a much larger proportion of zero or near-zero coefficients.

Besides good resolution and sparsity, the tempogram generated by the invention is flexible to apply: the tempo resolution, the shift granularity of the parent atoms and the number of matching pursuit iterations are all adjustable. The tempo resolution is set when converting the common tempo range into the frequency set. The shift granularity is set when generating the atom sets: the smaller the granularity, the larger the atom set and the more accurate the tempogram. Figs. 8 to 10 show shift granularities of 50, 20 and 5 respectively, and the accuracy increases progressively. The number of iterations is set in the matching pursuit algorithm: the more iterations, the more coefficients are generated and the denser the tempogram, but the ordering of the coefficients by magnitude does not change. Figs. 5 to 7 show 20, 10 and 5 iterations respectively; the tempogram has fewer and fewer coefficients, while the larger coefficients remain unchanged.

Description of the drawings

To explain the technical solutions of the embodiments or of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of tempogram generation according to an embodiment of the invention;

Fig. 2 is a tempogram generated with the autocorrelation function method;

Fig. 3 is a tempogram generated with the Fourier transform method;

Fig. 4 is a tempogram generated with the matching-pursuit-based method of the invention (571 iterations, shift granularity 2);

Fig. 5 is a tempogram generated with the matching-pursuit-based method of the invention (20 iterations, shift granularity 2);

Fig. 6 is a tempogram generated with the matching-pursuit-based method of the invention (10 iterations, shift granularity 2);

Fig. 7 is a tempogram generated with the matching-pursuit-based method of the invention (5 iterations, shift granularity 2);

Fig. 8 is a tempogram generated with the matching-pursuit-based method of the invention (20 iterations, shift granularity 50);

Fig. 9 is a tempogram generated with the matching-pursuit-based method of the invention (20 iterations, shift granularity 20);

Fig. 10 is a tempogram generated with the matching-pursuit-based method of the invention (20 iterations, shift granularity 5).

Detailed description of the embodiments

The technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

Tempo resolution, by analogy with frequency resolution, is expressed as the spacing between two adjacent valid points along the tempo axis of the tempogram: the larger the spacing, the worse the tempo resolution. Tempogram sparsity refers to the number of non-zero elements among all tempogram coefficients: the fewer the non-zero elements, the sparser the tempogram and the better its discriminability. An embodiment of the invention provides a tempogram generation method based on matching pursuit which, as shown in Fig. 1, comprises the following steps:

1. Input the music signal and generate the note onset detection function o(n)

The input music signal is generally a file containing a waveform, such as a wav or mp3 file. The first stage of tempogram generation, which includes signal transformation, feature extraction and onset detection function generation, outputs an onset detection function o(n) of length N, i.e. a vector. This stage can be implemented following [1].

2. Divide o(n) into frames to form a number of frame signals

Divide o(n) into frames. Preferably, the frame length is 6 seconds (M points per frame) and the hop size is about 0.2 seconds, forming a detection function matrix X = X(m, n), m ∈ [1, ..., M], n ∈ [1, ..., N], with M rows and N columns.
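A possible framing routine in Python is sketched below. The 6 s frame and 0.2 s hop follow the preferred values above; centring frames on the hop positions and zero-padding at the edges are assumptions, since the text does not say how the ends of o(n) are handled. The function name is hypothetical.

```python
import numpy as np


def frame_odf(o, f_o, frame_sec=6.0, hop_sec=0.2):
    """Cut o(n) into overlapping frames; column i of X is frame i (sketch).

    Frames are centred on the hop positions and zero-padded at the edges.
    Returns (X, M, hop) where X has shape (M, number_of_frames).
    """
    o = np.asarray(o, dtype=float)
    M = int(round(frame_sec * f_o))                 # points per frame
    hop = max(1, int(round(hop_sec * f_o)))         # points per hop
    half = M // 2
    padded = np.concatenate([np.zeros(half), o, np.zeros(M - half)])
    starts = np.arange(0, len(o), hop)
    # padded[s : s + M] covers o(n) for n in [s - half, s - half + M - 1].
    X = np.stack([padded[s : s + M] for s in starts], axis=1)
    return X, M, hop
```

With f_o = 1/0.023 this gives M = 261 points per frame and a hop of 9 points.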

3. Take the common tempo range τ ∈ [30, 480], τ ∈ ℝ, and convert the tempo set into a frequency set according to the required tempo resolution

In the prior art, the user cannot choose the tempo resolution of the generated tempogram. In the invention, the tempo resolution can be chosen freely; a corresponding redundant dictionary is generated, and matching pursuit then yields the corresponding tempogram. This shows the flexibility of the invention in setting the tempo resolution to suit the application. The tempo resolution may be a positive integer such as 1, 2, ..., identical over the whole range; it may follow the values used by the autocorrelation function method or the Fourier transform method; or the range may be divided into sub-intervals with a different resolution in each, for example 0.25 bpm in the most common tempo range [80, 150] and 0.5 bpm elsewhere. For ease of comparison, the embodiment uses a resolution of 1 bpm over the whole range, so τ ∈ [30, 480], τ ∈ ℤ, is converted into the frequency set {f_b | f_b = τ/60, τ = 30, 31, ..., 480; b = 1, ..., B}, where b is the index of a frequency in the set and B is the largest index.
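The conversion from a tempo grid to the frequency set, including a piecewise resolution like the 0.25/0.5 bpm example, might look like this. The `segments` format and the function name are assumptions made for illustration.

```python
import numpy as np


def tempo_grid(segments):
    """Build the tempo set tau_b and frequency set f_b = tau_b / 60 (sketch).

    segments : list of (lo, hi, step) in bpm, each covering [lo, hi] inclusive;
               a tempo shared by two consecutive segments is kept once.
    """
    taus = []
    for lo, hi, step in segments:
        n = int(round((hi - lo) / step))
        taus.extend(lo + step * np.arange(n + 1))
    taus = np.unique(np.round(taus, 9))   # sort and merge shared end points
    return taus, taus / 60.0              # tau_b in bpm, f_b in Hz


taus, freqs = tempo_grid([(30, 480, 1)])  # the embodiment: 451 tempi, 0.5 Hz to 8 Hz
```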

4. Create a parent atom for each frequency in the frequency set

Specifically, for each frequency f_b in the frequency set obtained in step 3, create the cosine of that frequency as the corresponding parent atom α_b, whose length equals the frame length M of o(n): α_b = cos(2π f_b t), t = (0, ..., M−1)/f_o, where f_o is the sampling rate of o(n) and t denotes time.
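In code, a parent atom is simply a sampled cosine. This sketch follows the formula above directly and applies no normalisation, matching the text; the function name is hypothetical.

```python
import numpy as np


def parent_atom(f_b, M, f_o):
    """alpha_b = cos(2 pi f_b t) sampled at t = (0, ..., M - 1) / f_o (sketch)."""
    t = np.arange(M) / f_o
    return np.cos(2 * np.pi * f_b * t)
```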

5. Shift every parent atom to the right with a given granularity, generating one atom per shift; these atoms together with the parent atom form the atom set of the corresponding frequency

The support of the parent atom α_b is [0, M−1], and the shift granularity d = 1, 2, 3, ... is a positive integer. The parent atom α_b is shifted right by d·j samples (j = 1, 2, 3, ...); after the shift, the values on its left support [0, M−d·j−1] are filled with cos(−2π f_b t), t = (M−d·j, ..., 1)/f_o, so that each shift yields a new atom. Because the parent atom is periodic, the maximum shift is set to no more than one period. Each parent atom α_b, together with the atoms obtained by shifting it, forms the atom set d_b of that parent atom.
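The following sketch reads "shift right and fill the vacated left part with cos(−2π f_b t)" as delaying the cosine by s = d·j samples and continuing it backwards in time. Whichever way the vacated span is indexed, the result is the cosine cos(2π f_b (n − s)/f_o) with a different phase offset for each shift. This reading, the choice to stop strictly before a full period (a full-period shift would reproduce the parent atom), and the function name are assumptions.

```python
import numpy as np


def shifted_atoms(f_b, M, f_o, d):
    """Atom set d_b: the parent atom and its right shifts by d, 2d, ... samples (sketch).

    Returns (atoms, shifts) with one atom per column of `atoms`.
    """
    period = f_o / f_b                 # one period of the cosine, in samples
    n = np.arange(M)
    shifts = [0]                       # shift 0 is the parent atom itself
    j = 1
    while d * j < period:              # stay within one period
        shifts.append(d * j)
        j += 1
    # Delay by s samples; the vacated left samples take the cosine's
    # continuation to negative time, i.e. values of cos(-2 pi f_b t).
    atoms = np.stack([np.cos(2 * np.pi * f_b * (n - s) / f_o) for s in shifts], axis=1)
    return atoms, shifts
```

For f_b = 2 Hz and f_o = 40 Hz (a 20-sample period) with d = 5, this yields shifts 0, 5, 10 and 15; the half-period shift is the negated parent atom.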

The adjustable shift granularity in this step again shows the flexibility of the invention. The smaller the granularity, the larger the atom set and the more accurate the tempogram, but the longer it takes to compute the whole tempogram. Figs. 8 to 10 show shift granularities of 50, 20 and 5 respectively, and the tempogram accuracy increases progressively. When applying the invention, the shift granularity can be chosen according to the requirements on computation time and tempogram accuracy.

6. Assemble the atom sets of all frequencies in the frequency set from step 5 into a redundant dictionary

The atom sets d_b corresponding to all frequencies f_b in the frequency set are assembled into one redundant dictionary D.
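One possible way to assemble D rests on two assumptions. First, each atom is normalized to unit norm, because matching pursuit's residual update presumes unit-norm atoms and the text does not state this explicitly. Second, a parallel array records which tempo index each atom came from, since step 8 needs that mapping. The names `build_dictionary` and `tempo_idx` are hypothetical, and tempo indices here are 0-based rather than b = 1..B.

```python
import numpy as np


def build_dictionary(tempi_bpm, f_o, M, d):
    """Stack the atom sets d_b of all tempi into one redundant dictionary D.

    Returns D (one unit-norm atom per row) and tempo_idx, which gives for
    every row the 0-based index b of the tempo whose parent atom produced it.
    """
    m = np.arange(M)
    rows, tempo_idx = [], []
    for b, tau in enumerate(tempi_bpm):
        f_b = tau / 60.0                        # tempo in BPM -> frequency in Hz
        period = f_o / f_b                      # parent-atom period in samples
        s = 0
        while s < period:                       # shifts 0, d, 2d, ... below one period
            atom = np.cos(2 * np.pi * f_b * (m - s) / f_o)
            rows.append(atom / np.linalg.norm(atom))   # assumption: unit norm
            tempo_idx.append(b)
            s += d
    return np.vstack(rows), np.array(tempo_idx)


D, tempo_idx = build_dictionary([60, 120], f_o=200.0, M=1200, d=20)
print(D.shape)   # (15, 1200): 10 atoms at 60 BPM, 5 at 120 BPM
```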

7. For each frame of o(n), run matching pursuit with the redundant dictionary for a set number of iterations, producing a series of decomposition coefficients and corresponding atoms:

For each frame of o(n), that is, for each column X_i, i ∈ [1..N], of the detection function matrix, run the matching pursuit algorithm with the redundant dictionary D:

(1) Set the residual signal y_0 = X_i and n = 0, and start the loop;

(2) Compute the inner products <y_n, g_j> between the residual y_n and every atom g_j ∈ D. Select the atom g_k whose inner product has the largest absolute value as the atom matched in this iteration. Save the decomposition coefficient of the n-th iteration, s_n = |<y_n, g_k>|, and the corresponding atom g_n = g_k;

(3) Update the residual signal: y_(n+1) = y_n - <y_n, g_k> g_k;

(4) If the iteration count, or the ratio of residual energy to original-signal energy, meets the precision requirement, exit the loop. Otherwise set n = n + 1 and continue from step (2).
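Steps (1) to (4) can be sketched as follows for one frame. This hedged sketch assumes unit-norm atoms stored as rows of a NumPy array. As in standard matching pursuit, it subtracts the signed projection <y_n, g_k> g_k. Subtracting the magnitude instead would increase the residual whenever the inner product is negative: for example, y = (0, -3, 0, 1) and g_k = (0, 1, 0, 0) would give (0, -6, 0, 1). The function name and the `tol` parameter for the energy-ratio test are illustrative.

```python
import numpy as np


def matching_pursuit(x, D, K, tol=0.0):
    """Matching pursuit of one frame x over a dictionary D of unit-norm rows.

    Returns the coefficients s_n = |<y_n, g_k>| and the row indices k of the
    selected atoms, in the order they were chosen.
    """
    y = np.asarray(x, dtype=float).copy()      # step (1): residual y_0 = X_i
    energy0 = float(y @ y)
    coeffs, atoms = [], []
    for _ in range(K):                         # at most K iterations
        ip = D @ y                             # step (2): <y_n, g_j> for all atoms
        k = int(np.argmax(np.abs(ip)))         # largest absolute inner product
        coeffs.append(abs(ip[k]))              # s_n
        atoms.append(k)                        # g_n = g_k
        y = y - ip[k] * D[k]                   # step (3): remove the signed projection
        if energy0 == 0.0 or (y @ y) / energy0 <= tol:
            break                              # step (4): energy-ratio stop
    return np.array(coeffs), np.array(atoms, dtype=int)


s, g = matching_pursuit([0.0, -3.0, 0.0, 1.0], np.eye(4), K=10)
print(s, g)   # [3. 1.] [1 3]
```

With the identity dictionary, the residual is exhausted after two iterations, so the energy-ratio test ends the loop before K is reached.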

Preferably, the invention terminates the loop by iteration count, which can be set according to the requirements of the tempo spectrogram, e.g., K = 10, 20, and so on. When the loop terminates, s_n and g_n, n = 1, ..., K, are obtained.

Because the redundant dictionary is generated from the common tempo range, it supplies atoms that closely resemble the original signal. Matching pursuit ensures that these closely matching atoms receive relatively large decomposition coefficients, while dissimilar atoms receive small or even zero coefficients. The tempo spectrogram generated by the invention is therefore sparser. A comparison of Figures 2 to 4 shows that the coefficients in Figure 4 are markedly sparser, with a much larger proportion of zero or near-zero coefficients.

The number of matching pursuit iterations is adjustable. More iterations produce more coefficients and a denser spectrogram, although the values and order of the coefficients already produced do not change. Computation time, however, grows with the number of iterations. Figures 5 to 7 show 20, 10 and 5 iterations, respectively, and the spectrogram clearly contains fewer and fewer coefficients. In some applications a few large coefficients suffice, so the iteration count can be set low and computation stays cheap. Other applications need many coefficients to provide enough information, which only requires increasing the iteration count. This again reflects the flexibility of the invention, which the prior art lacks.
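Because matching pursuit is greedy, each iteration depends only on the residual left by the previous ones. Running fewer iterations therefore simply truncates the output of a longer run. The check below illustrates this with a random unit-norm dictionary standing in for the tempo dictionary, an assumption made only to keep the example short. The helper name `mp` is hypothetical.

```python
import numpy as np


def mp(x, D, K):
    """K iterations of matching pursuit; D has unit-norm rows."""
    y = np.asarray(x, dtype=float).copy()
    s, g = [], []
    for _ in range(K):
        ip = D @ y
        k = int(np.argmax(np.abs(ip)))
        s.append(abs(ip[k]))
        g.append(k)
        y = y - ip[k] * D[k]
    return np.array(s), np.array(g)


rng = np.random.default_rng(0)
D = rng.standard_normal((64, 32))
D /= np.linalg.norm(D, axis=1, keepdims=True)   # unit-norm atoms
x = rng.standard_normal(32)

s20, g20 = mp(x, D, 20)
s5, g5 = mp(x, D, 5)
# Stopping earlier only truncates the list: the first 5 results are identical.
print(np.array_equal(s5, s20[:5]), np.array_equal(g5, g20[:5]))   # True True
```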

8. Using the relationship between the atoms in the redundant dictionary and the tempi, assign each decomposition coefficient of each frame of o(n) to a particular tempo

For each frame, first create a tempo spectrum vector S_n, n = 1, ..., N, initialized to 0. Its components are indexed by the tempo index b, b = 1, ..., B, and the value of each component is the decomposition coefficient of that tempo. Then, for each decomposition coefficient s_n of the frame, find the tempo index b from the frequency of the corresponding atom g_n in the redundant dictionary, and take s_n as the decomposition coefficient of that tempo. If several atoms correspond to the same tempo index, their decomposition coefficients are summed, and the sum is taken as that tempo's decomposition coefficient.
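Step 8 amounts to a scatter-add from atoms to tempo bins. The sketch below assumes a `tempo_idx` array that maps each dictionary row to its 0-based tempo index. This is a hypothetical bookkeeping structure, not something the text prescribes, and the function name is likewise illustrative.

```python
import numpy as np


def tempo_vector(coeffs, atom_rows, tempo_idx, B):
    """Fold one frame's MP coefficients into a length-B tempo spectrum vector S_n.

    atom_rows[i] is the dictionary row chosen in iteration i, and
    tempo_idx[row] is the 0-based tempo index b of that row.
    """
    S_n = np.zeros(B)                                  # every tempo starts at 0
    np.add.at(S_n, tempo_idx[atom_rows], coeffs)       # sums repeated tempo indices
    return S_n


tempo_idx = np.array([0, 0, 1, 2, 2])      # 5 atoms spread over B = 3 tempi
S_n = tempo_vector(np.array([3.0, 1.0, 0.5]), np.array([1, 3, 0]), tempo_idx, B=3)
print(S_n.tolist())   # [3.5, 0.0, 1.0]
```

`np.add.at` is used here because plain fancy-index assignment such as `S_n[idx] += coeffs` applies only one update when an index repeats. Step 8, by contrast, requires the coefficients of atoms sharing a tempo to be summed.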

9. Merge the tempo spectrum vectors of all frames into the tempo spectrogram

The tempo spectrum vectors S_n of all frames are assembled column by column into the tempo spectrogram S = S(b, n), b = 1, ..., B, n = 1, ..., N.
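Steps 7 to 9 can be combined into one possible end-to-end sketch. It runs K matching-pursuit iterations on each column of X, accumulates each coefficient into its tempo row, and returns the columns S_n side by side as S(b, n). The inputs are hypothetical and chosen so that the expected result is easy to check: a toy dictionary with three tempi, f_o = 50 Hz and shift granularity 5, plus two synthetic frames. Atoms are normalized to unit norm as an assumption.

```python
import numpy as np


def tempo_spectrogram(X, D, tempo_idx, B, K):
    """S(b, n): one column per frame (column of X), one row per tempo index b.

    D holds unit-norm atoms as rows; tempo_idx maps each row to its tempo index.
    """
    N = X.shape[1]
    S = np.zeros((B, N))
    for n in range(N):
        y = X[:, n].astype(float)                  # step 7 (1): residual = frame
        for _ in range(K):                         # K matching-pursuit iterations
            ip = D @ y
            k = int(np.argmax(np.abs(ip)))
            S[tempo_idx[k], n] += abs(ip[k])       # step 8: accumulate per tempo
            y = y - ip[k] * D[k]
    return S                                       # step 9: columns are the S_n


# Toy setup (hypothetical values): o(n) at 50 Hz, 6-second frames, shift step 5.
f_o, M, d = 50.0, 300, 5
tempi = [60, 90, 120]
m = np.arange(M)
rows, idx = [], []
for b, tau in enumerate(tempi):
    f_b = tau / 60.0
    for s in np.arange(0, f_o / f_b, d):           # shifts below one period
        a = np.cos(2 * np.pi * f_b * (m - s) / f_o)
        rows.append(a / np.linalg.norm(a))
        idx.append(b)
D, tempo_idx = np.vstack(rows), np.array(idx)

# Frame 0: a pure 120 BPM pulse; frame 1: a 60 BPM pulse delayed by 10 samples.
X = np.column_stack([np.cos(2 * np.pi * 2.0 * m / f_o),
                     np.cos(2 * np.pi * 1.0 * (m - 10) / f_o)])
S = tempo_spectrogram(X, D, tempo_idx, B=len(tempi), K=3)
print(S.argmax(axis=0))   # [2 0]: the dominant tempo index of each frame
```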

The above is only an embodiment of the present invention and does not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of this description and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims

1. A music tempo spectrogram generation method based on matching pursuit, comprising the following steps:

S1. Input a music signal and generate a note onset detection function o(n);

S2. Divide o(n) into frames to form a number of frame signals:

o(n) is divided into frames. Preferably, the frame length is 6 seconds, giving M points per frame, and the hop is 0.2 seconds. This forms a detection function matrix X = X(m, n), m ∈ [1...M], n ∈ [1...N], with M rows and N columns;

S3. Take the common tempo interval τ ∈ [30, 480], τ ∈ R, and convert the tempo set into a frequency set according to the tempo resolution requirement:

The tempo resolution may take a positive integer value 1, 2, ..., identical in all subintervals. Alternatively, it may take values as in the autocorrelation-function method or the Fourier-transform method, or the interval may be divided into subintervals with a different tempo resolution in each. When the tempo resolution over the whole interval is 1, τ ∈ [30, 480], τ ∈ Z, where Z denotes the positive integers. The tempi are then converted into the frequency set {f_b | f_b = τ/60, τ = 30, 31, ..., 480, b = 1..B}, where b is the frequency index in the frequency set and B is the maximum index;

S4. For each frequency in the frequency set, create a corresponding parent atom:

For each frequency f_b in the frequency set obtained in step S3, create a cosine function of that frequency as the corresponding parent atom α_b. Its length equals the frame length M of o(n), and its form is α_b = cos(2πf_b t), t = (0, ..., M-1)/f_o, where f_o is the sampling rate of o(n) and t denotes time;

S5. Right-shift all parent atoms by a certain granularity, generating one atom at each step; the atoms generated by the shifts, together with the parent atom, form the atom set of the frequency corresponding to that parent atom:

The support of the parent atom α_b is [0, M-1], and the shift granularity d = 1, 2, 3, ... is a positive integer. The parent atom α_b is shifted right by d·j samples (j = 1, 2, 3, ...). After the shift, the vacated left part of the support, [0, d·j-1], is filled with cos(-2πf_b t), t = (d·j, ..., 1)/f_o, so each shift yields a new atom. Because the parent atom is periodic, the largest shift is limited to at most one period. The parent atom α_b and all atoms obtained by shifting it together form the atom set d_b corresponding to that parent atom;

S6. Assemble the atom sets corresponding to all frequencies in the frequency set from step S5 into a redundant dictionary:

The atom sets d_b corresponding to all frequencies f_b in the frequency set are assembled into one redundant dictionary D;

S7. For each frame of o(n), perform matching pursuit with the redundant dictionary for a certain number of iterations, generating a series of decomposition coefficients and corresponding atoms:

For each frame of o(n), i.e., for each column X_i, i ∈ [1..N], of the detection function matrix, run the matching pursuit algorithm with the redundant dictionary D:

(1) Set the residual signal y_0 = X_i and n = 0, and start the loop;

(2) Compute the inner products <y_n, g_j> of the residual y_n with all atoms g_j ∈ D of the redundant dictionary. Select the atom g_k whose inner product has the largest absolute value as the atom matched in this iteration, and save the decomposition coefficient of the n-th iteration, s_n = |<y_n, g_k>|, together with the corresponding atom g_n = g_k;

(3) Recompute the residual signal: y_(n+1) = y_n - <y_n, g_k> g_k;

(4) If the iteration count, or the energy ratio of the residual signal to the original signal, meets the precision requirement, exit the loop; otherwise set n = n + 1 and continue from step (2);

S8. According to the relationship between the atoms in the redundant dictionary and the tempi, assign each decomposition coefficient of each frame of o(n) to a certain tempo:

For each frame, first create a tempo spectrum vector S_n, n = 1..N, with initial value 0. Its components are indexed by the tempo index b, b = 1..B, and each component's value is the decomposition coefficient of the corresponding tempo. Then, for each decomposition coefficient s_n of the frame, find the corresponding tempo index b from the frequency of atom g_n in the redundant dictionary, and take s_n as the decomposition coefficient of that tempo. If several atoms correspond to the same tempo index, their decomposition coefficients are summed, and the sum is taken as the decomposition coefficient of that tempo;

S9. Merge the tempo spectrum vectors of all frames to form the tempo spectrogram:

The tempo spectrum vectors S_n of all frames are assembled column by column into the tempo spectrogram S = S(b, n), b = 1..B, n = 1..N.

2. The music tempo spectrogram generation method based on matching pursuit, characterized in that the condition for exiting the loop in step (4) of step S7 is the iteration count. The iteration count K is set according to the requirements of the tempo spectrogram, and the loop exits when the count reaches K. After the loop terminates, s_n and g_n, n = 1...K, are obtained.
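Steps S2 and S3 of claim 1, framing o(n) into the matrix X and converting the tempo interval into the frequency set, can be sketched with NumPy as below. This minimal sketch assumes the preferred 6-second frame and 0.2-second hop. It keeps only complete frames and covers only the case of a uniform integer tempo resolution. The function names and the toy 5 Hz onset signal are hypothetical.

```python
import numpy as np


def frame_onset_function(o, f_o, frame_sec=6.0, hop_sec=0.2):
    """Split o(n) into overlapping frames; column n of the result is frame n.

    Returns the M x N detection function matrix X (M points per frame).
    """
    o = np.asarray(o, dtype=float)
    M = int(round(frame_sec * f_o))            # points per frame
    H = int(round(hop_sec * f_o))              # points per hop
    if len(o) < M or H < 1:
        raise ValueError("o(n) must hold at least one full frame and H >= 1")
    N = 1 + (len(o) - M) // H                  # number of complete frames
    return np.column_stack([o[i * H:i * H + M] for i in range(N)])


def tempo_frequency_set(lo=30, hi=480, step=1):
    """Integer tempo grid tau (BPM) and the frequency set f_b = tau / 60 (Hz)."""
    tau = np.arange(lo, hi + 1, step)          # tau = 30, 31, ..., 480 for step 1
    return tau, tau / 60.0


X = frame_onset_function(np.arange(100.0), f_o=5.0)   # toy o(n): 20 s at 5 Hz
tau, f = tempo_frequency_set()
print(X.shape, len(f), f[0], f[-1])   # (30, 71) 451 0.5 8.0
```

With a tempo resolution of 1, the frequency set has B = 451 entries, running from 0.5 Hz (30 BPM) to 8 Hz (480 BPM).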