CN111785296A

CN111785296A - A music segmentation boundary recognition method based on repetitive melody

Info

Publication number: CN111785296A
Application number: CN202010459989.8A
Authority: CN
Inventors: 张克俊; 朱凯丽; 殷叶航; 叶雨晴; 伍文棋; 王昊阳
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2020-05-26
Filing date: 2020-05-26
Publication date: 2020-10-16
Anticipated expiration: 2040-05-26
Also published as: CN111785296B

Abstract

The invention relates to a music segmentation boundary recognition method based on repeated melodies, belonging to the technical field of audio signal processing. The method comprises the following steps: 1) extracting chroma features from the audio, zero-padding the head and tail of the resulting feature sequence, aggregating every N adjacent frames into a new frame vector, and forming a new frame feature vector sequence from all frame vectors; 2) calculating the Euclidean distance between each frame vector and the other frame vectors in the frame feature sequence to obtain a self-similarity matrix S; 3) based on the self-similarity matrix S, obtaining the set N_i of nearest-neighbor frames of the i-th frame vector, and from these sets a recursive graph R of the self-similarity matrix S; 4) applying time delay processing to the recursive graph R to obtain a time delay matrix L; 5) performing line segment regularization and denoising on L, and then applying inverse time delay processing to obtain a recursive graph R'; 6) detecting all line segments, clustering them, and processing the clusters in order, starting from the cluster with the most line segments, to obtain a set B of music segment boundary points. The method improves the recognition of repeated melodies in music and allows music to be segmented on a shorter time scale.

Description

A music segmentation boundary recognition method based on repetitive melody

Technical Field

The present invention relates to the technical field of audio signal processing, and in particular to a method for identifying music segment boundaries based on repeated melodies.

Background

Information is often organized in a structure or hierarchy to facilitate its dissemination and understanding. Humans are generally very good at perceiving such structures, sometimes even unconsciously, which lets us analyze and fully grasp the meaning of a given piece of information. In the era of big data, however, we increasingly need computers to support information processing. Automatically recovering the structure of information has therefore become a key task of today's content processing systems. Among the wide range of multimedia content, music is a typical example.

Important applications of music segmentation boundary recognition include in-work navigation in players, automatic generation of excerpts and mashups, identification of versions of the same work, and large-scale musicological research. With the spread and development of the Internet and digital entertainment products, music has become one of the most important kinds of digital media content.

Today, besides being an independent entertainment product, music also plays an important role as the soundtrack of film and television works. For music as an independent entertainment product, segmentation is a basic step of music analysis; when a whole category of works is analyzed, the sheer number of works underlines the importance of automatic segmentation. As a soundtrack, music is in practice more often used as excerpts than as a whole piece, and automatic segmentation can greatly improve the efficiency of excerpt extraction. Research on music segmentation boundary recognition therefore has broad prospects for market application.

In 2000, Foote was the first to use the self-similarity matrix in music segmentation research, in order to discover repeated melodies in music. A 2006 study by Bruderer et al. pointed out that certain cues, such as timbre changes, repetitions, and pauses, are highly correlated with human perception of musical structure. A 2010 study by Paulus et al. stated three principles for inferring musical structure: novelty, homogeneity, and repetition. The segmentation algorithm proposed by Serra et al. in 2014 took all of these principles into account and introduced a computation based on the recursive graph (recurrence plot), which greatly improved segmentation accuracy, raised the efficiency of automatic music segmentation, and advanced the development of automatic music segmentation algorithms.

However, the algorithms currently applied to music segmentation still have many shortcomings. Unsupervised methods segment at a coarse granularity and have difficulty extracting short segments from some music; they also make little use of music theory and rely too heavily on mathematical methods. Deep learning methods do not fully exploit the repetitive nature of segments, and they suffer from dependence on data, high training cost, and difficulty in incorporating music theory.

Summary of the Invention

The purpose of the present invention is to provide a music segmentation boundary recognition method based on repeated melodies that improves the recognition of repeated melodies in music and can segment music on a shorter time scale.

To achieve this purpose, the music segmentation boundary recognition method based on repeated melodies provided by the present invention comprises the following steps:

1) Extract chroma features from the audio to obtain a feature vector sequence of M frames in total; zero-pad the head and tail of the feature vector sequence, aggregate every N adjacent frames into a new frame vector, and let all frame vectors form a new frame feature vector sequence;

2) Calculate the Euclidean distance between each frame vector and every other frame vector in the frame feature sequence to obtain a self-similarity matrix S;

3) Based on the self-similarity matrix S, obtain the set N_i of nearest-neighbor frames of the i-th frame vector, i = 1, 2, ..., M, and from these sets obtain the recursive graph R of the self-similarity matrix S;

4) Apply time delay processing to the recursive graph R to obtain a time delay matrix L;

5) Perform line segment regularization and denoising on the time delay matrix L, then apply inverse time delay processing to obtain the regularized and denoised recursive graph R';

6) Based on the recursive graph R', detect all line segments, cluster them, and process the clusters in order, starting from the cluster with the most line segments, to obtain the set B of music segment boundary points.

In the above technical solution, the Pitch Class Profile feature of the music, also called the chroma feature, is extracted frame by frame in order to capture repeated passages. This feature folds the frequencies of a given range into 12 pitch classes and thus highlights the melody of the music.
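As an illustration of how frequencies fold into 12 pitch classes, the following minimal Python sketch maps a frequency to its pitch class. It assumes twelve-tone equal temperament with A4 = 440 Hz; the names `pitch_class` and `PITCH_CLASSES` are hypothetical, and the text itself does not specify how the chroma features are computed.

```python
import math

# Conventional pitch-class names, index 0 = C ... 11 = B.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Map a frequency to one of 12 pitch classes (assumes equal temperament)."""
    # Fractional MIDI note number; 69 is A4 by convention.
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    # Octave information is discarded by the modulo, which is the chroma idea.
    return int(round(midi)) % 12


if __name__ == "__main__":
    # 220 Hz, 440 Hz and 880 Hz are the same note (A) in three octaves.
    for f in (220.0, 440.0, 880.0, 261.63):
        print(f, PITCH_CLASSES[pitch_class(f)])
```

A chroma vector for one frame is then, roughly, the spectral energy summed per pitch class, which is why two passages with the same melody in different octaves produce similar chroma frames.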

Optionally, in one embodiment, in step 3), the k elements of the set N_i are the k frame vectors that are most similar to the i-th frame vector among all frame vectors, where k is 0.01 of the total number of frame vectors. For each point R_{i,j} of the recursive graph R, R_{i,j} is set to 1 if i belongs to N_j and j belongs to N_i, and to 0 otherwise; this yields the recursive graph R of the self-similarity matrix S.

Optionally, in one embodiment, in step 4), let L_{i,j} = R_{i,(i+j) mod (M-1)}, i = 1, 2, ..., M, j = 1, 2, ..., M, to obtain the time delay matrix L of the recursive graph R; that is, the main diagonal direction of the recursive graph R is converted into the horizontal direction.

Optionally, in one embodiment, step 5) includes:

5-1) Traverse the time delay matrix L, where an entry with value 1 is defined as a point; each time a point is found, determine all points connected to it by breadth-first search, two points being considered connected if their step distance is less than 3;

5-2) Count, for each ordinate, the number of connected points that share it. If the ordinate with the most points has more than 5 points, keep the points at that ordinate and set the other connected points to 0; otherwise set all of these points to 0;

5-3) Let R'_{i,(i+j) mod (M-1)} = L_{i,j}, i = 1, 2, ..., M, j = 1, 2, ..., M, to obtain the regularized and denoised recursive graph R'.

Optionally, in one embodiment, in step 6), the line segment clustering includes:

Traverse the recursive graph R' with the step distance set to 3.

Find all line segments in the graph and represent each in the standard form {x₁, x₂, y₁, y₂}, where x₁ and x₂ are the abscissas of the start and end points and y₁ and y₂ are the ordinates of the start and end points;

Take one line segment, traverse the other line segments, and cluster all line segments that correspond to the same melody as it; two line segments are judged to correspond to the same melody if the common length of their spans from x₁ to x₂ accounts for 80% or more of each span.

Optionally, in one embodiment, in step 6), after the line segments are clustered, the cluster with the most line segments is taken and all of its x₁ and x₂ values are averaged to obtain x̄₁ and x̄₂. Then, for each line segment in the cluster, y₁ and y₂ are corrected according to the differences between x₁ and x̄₁ and between x₂ and x̄₂, respectively, to obtain y'₁ and y'₂. The averaged abscissas and all y'₁ and y'₂ are then each treated as a time point x and processed as follows: check whether the set B of music segment boundary points contains a boundary point less than n frames away from the time point x, and if not, add the time point x to B.

Compared with the prior art, the present invention has the following advantages:

The present invention denoises the matrix using music theory and practical experience and fully accounts for the main causes of noise in music segmentation, so errors caused by noise are reduced more thoroughly and efficiently. The boundary point acquisition method based on line segment clustering gives priority to melody passages that repeat many times, and taking averages as boundary points further reduces error and improves generalization.

Description of the Drawings

Fig. 1 is a flowchart of the music segmentation boundary recognition method based on repeated melodies in an embodiment of the present invention;

Fig. 2 is a schematic diagram of the recursive graph R in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the time delay matrix L in an embodiment of the present invention;

Fig. 4 shows the recursive graph R' after regularization and denoising in an embodiment of the present invention.

Detailed Description

To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is further described below with reference to an embodiment and the accompanying drawings. The described embodiment is only one of the embodiments of the present invention, not all of them. All other embodiments obtained from the described embodiment by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Unless otherwise defined, technical or scientific terms used herein have the ordinary meaning understood by a person of ordinary skill in the field to which the present invention belongs. Words such as "comprise" or "include" mean that the element or step preceding the word covers the elements or steps listed after the word and their equivalents, without excluding other elements or steps.

Example

This embodiment applies the music segmentation boundary recognition method based on repeated melodies to build a segmentation algorithm based on the self-similarity matrix, achieving automatic identification of the segmentation points of musical structure. The method can replace manual annotation in generating music structure sequences and can be further applied to music analysis, automatic excerpt generation, and so on. Referring to Fig. 1, the specific procedure is as follows:

S100: extract chroma features from the audio to obtain a feature vector sequence of M frames in total; zero-pad the head and tail of the feature vector sequence, aggregate every N adjacent frames into a new frame vector, and let all frame vectors form a new frame feature vector sequence.

The feature sequence of the sample music is a sequence of 12-dimensional vectors with a length of 1344. Zero-padding the head and tail gives a sequence of length 1350, and aggregating every 7 adjacent frames gives a new frame feature sequence of 12x7-dimensional vectors whose length is still 1344.
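The padding and aggregation in S100 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: it uses a random matrix in place of real chroma features, and it assumes that the N - 1 zero frames are split evenly between head and tail (3 and 3 for N = 7), which the text does not state. The name `stack_frames` is hypothetical.

```python
import numpy as np


def stack_frames(chroma: np.ndarray, n: int = 7) -> np.ndarray:
    """Aggregate every n adjacent frames of an (M, 12) chroma sequence.

    Returns an (M, 12 * n) array, so the sequence length is preserved.
    """
    m, d = chroma.shape
    pad = n - 1                      # 6 zero frames in total for n = 7
    front = pad // 2                 # assumption: split evenly head/tail
    back = pad - front
    padded = np.vstack([np.zeros((front, d)), chroma, np.zeros((back, d))])
    # Window i covers padded[i : i + n]; flatten it into one long vector.
    return np.stack([padded[i:i + n].ravel() for i in range(m)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    chroma = rng.random((1344, 12))  # stand-in for real chroma features
    frames = stack_frames(chroma, n=7)
    print(frames.shape)              # 1344 frames, each of dimension 12 * 7
```

Stacking neighbouring frames makes each vector describe a short melodic context rather than a single instant, so isolated chance matches between single frames are less likely to look similar.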

S200: calculate the Euclidean distance between each frame vector and every other frame vector in the frame feature sequence to obtain a 1344x1344 self-similarity matrix S.
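A minimal NumPy sketch of S200 is given below. It computes all pairwise Euclidean distances at once using the identity ||a - b||² = ||a||² + ||b||² - 2a·b; the function name `self_similarity` is hypothetical. Note that S holds distances, so smaller values mean more similar frames.

```python
import numpy as np


def self_similarity(frames: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between rows of an (M, D) array."""
    sq = np.sum(frames ** 2, axis=1)                  # squared norms, shape (M,)
    d2 = sq[:, None] + sq[None, :] - 2.0 * frames @ frames.T
    np.maximum(d2, 0.0, out=d2)                       # clip tiny negative round-off
    return np.sqrt(d2)


if __name__ == "__main__":
    frames = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
    print(self_similarity(frames))
```

For the sample music the input would have shape (1344, 84) and the result shape (1344, 1344).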

S300: based on the self-similarity matrix S, obtain the set N_i of nearest-neighbor frames of the i-th frame, i = 1, 2, ..., M, and from these sets obtain the recursive graph R of the self-similarity matrix S; see Fig. 2.

The k elements of the set N_i are the k frames most similar to the i-th frame among all frames. For each point R_{i,j} of the recursive graph, R_{i,j} is set to 1 if i belongs to N_j and j belongs to N_i, and to 0 otherwise, which gives a 1344x1344 recursive graph R. The value of k is 0.01 of the total number of frames, which is 13 in this embodiment.
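The mutual nearest-neighbor rule of S300 can be sketched as below. The sketch assumes that a frame is not counted as its own neighbor, which the text does not specify, and the function name `recursive_graph` is hypothetical.

```python
import numpy as np


def recursive_graph(S: np.ndarray, k: int) -> np.ndarray:
    """Binary graph R with R[i, j] = 1 iff i is in N_j and j is in N_i."""
    m = S.shape[0]
    D = S.astype(float).copy()
    np.fill_diagonal(D, np.inf)            # assumption: exclude self-matches
    nn = np.argsort(D, axis=1)[:, :k]      # row i holds the indices in N_i
    is_nn = np.zeros((m, m), dtype=bool)
    is_nn[np.arange(m)[:, None], nn] = True  # is_nn[i, j]: j belongs to N_i
    return (is_nn & is_nn.T).astype(np.uint8)


if __name__ == "__main__":
    # Toy 1-D "frames"; distances are absolute differences.
    pos = np.array([0.0, 1.0, 3.0, 10.0, 11.5, 20.0])
    S = np.abs(pos[:, None] - pos[None, :])
    print(recursive_graph(S, k=1))
    # For the sample music: k = int(0.01 * 1344), i.e. 13.
```

Requiring the neighbor relation in both directions keeps R symmetric and suppresses one-sided matches, for example a quiet frame that is "closest" to many unrelated frames.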

S400: apply time delay processing to the recursive graph R to obtain the time delay matrix L; see Fig. 3.

First, let L_{i,j} = R_{i,(i+j) mod (M-1)} to obtain the time delay matrix L of the recursive graph R. This converts the main diagonal direction of R into the horizontal direction, which improves computational efficiency.
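The time delay transform and its inverse can be sketched as follows. The sketch deliberately departs from the formula as written in one respect: it uses 0-based indices and a modulus of M (rather than M - 1 with 1-based indices), so that for each row the column mapping is a bijection and the inverse is exact. This is an assumption made for the illustration, and the function names are hypothetical.

```python
import numpy as np


def time_delay(R: np.ndarray) -> np.ndarray:
    """L[i, j] = R[i, (i + j) mod M]: column j of L is the lag-j diagonal of R."""
    m = R.shape[0]
    i = np.arange(m)[:, None]
    j = np.arange(m)[None, :]
    return R[i, (i + j) % m]


def inverse_time_delay(L: np.ndarray) -> np.ndarray:
    """Undo time_delay: R'[i, (i + j) mod M] = L[i, j]."""
    m = L.shape[0]
    i = np.arange(m)[:, None]
    j = np.arange(m)[None, :]
    R = np.zeros_like(L)
    R[i, (i + j) % m] = L
    return R


if __name__ == "__main__":
    R = np.zeros((6, 6), dtype=np.uint8)
    for t in range(4):
        R[t, t + 2] = 1          # a diagonal stripe: frames t and t + 2 match
    L = time_delay(R)
    print(L[:, 2])               # the stripe now has a constant lag of 2
```

After the transform a repeated passage shows up as points that share one lag value, so the denoising step only has to look for runs at a single ordinate instead of tracing diagonals.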

S500: perform line segment regularization and denoising on the time delay matrix L, then apply inverse time delay processing to obtain the regularized and denoised recursive graph R'; see Fig. 4.

First, the time delay matrix L is traversed, where an entry with value 1 is defined as a point. Each time a point is found, all points connected to it are determined by breadth-first search; two points are considered connected if their step distance is less than 3. For these connected points, the number of points at each ordinate is counted. If the ordinate with the most points has more than 5 points, the points at that ordinate are kept and all other points are set to 0; otherwise all of these points are set to 0. For example, for the series of points {(1,1),(2,1),(3,1),(4,1),(5,1),(6,1),(2,2),(3,2),(4,2)}, the ordinate 1 has the most points, namely 6, so those points are kept, while the points at ordinate 2 are erased. Then, let R'_{i,(i+j) mod (M-1)} = L_{i,j} to obtain the regularized and denoised recursive graph R'.
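The regularization and denoising pass on L can be sketched as below. Two points are treated as connected when they differ by at most 2 in each coordinate, which is one possible reading of "step distance less than 3" and is an assumption of this sketch. The array is indexed as `L[i, j]` with i the abscissa (time) and j the ordinate (lag); the function name `regularize` is hypothetical.

```python
from collections import Counter, deque

import numpy as np


def regularize(L: np.ndarray, reach: int = 2, min_points: int = 6) -> np.ndarray:
    """Keep, per connected group, only the dominant ordinate if it is long enough."""
    m, n = L.shape
    out = np.zeros_like(L)
    seen = np.zeros(L.shape, dtype=bool)
    for i0, j0 in zip(*np.nonzero(L)):
        i0, j0 = int(i0), int(j0)
        if seen[i0, j0]:
            continue
        # Breadth-first search for every point connected to (i0, j0).
        seen[i0, j0] = True
        queue, group = deque([(i0, j0)]), []
        while queue:
            i, j = queue.popleft()
            group.append((i, j))
            for a in range(max(0, i - reach), min(m, i + reach + 1)):
                for b in range(max(0, j - reach), min(n, j + reach + 1)):
                    if L[a, b] and not seen[a, b]:
                        seen[a, b] = True
                        queue.append((a, b))
        # Ordinate with the most points in this group.
        lag, count = Counter(j for _, j in group).most_common(1)[0]
        if count >= min_points:          # "more than 5" points
            for i, j in group:
                if j == lag:
                    out[i, j] = 1
    return out


if __name__ == "__main__":
    L = np.zeros((10, 10), dtype=np.uint8)
    for i in range(1, 7):
        L[i, 1] = 1                      # (1,1) ... (6,1)
    for i in range(2, 5):
        L[i, 2] = 1                      # (2,2), (3,2), (4,2)
    print(np.argwhere(regularize(L)))    # only the ordinate-1 points remain
```

Applied to the example from the text, the six points at ordinate 1 survive and the three at ordinate 2 are erased; a group whose dominant ordinate has 5 or fewer points is removed entirely as noise.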

S600: based on the recursive graph R', detect all line segments, cluster them, and process the clusters in order, starting from the cluster with the most line segments, to obtain the set B of music segment boundary points.

First, all line segments in the recursive graph R' are found and put into a standard representation: the recursive graph R' is traversed with the step distance set to 3 until all line segments have been found. Each line segment is represented as {x₁, x₂, y₁, y₂}, where x₁ and x₂ are the abscissas of the start and end points and y₁ and y₂ are the ordinates of the start and end points. For example, the line segment {1,9,10,18} means that frames 10 to 18 are similar to frames 1 to 9. Next, all line segments whose spans from x₁ to x₂ share a common part of 80% or more of each span are put into the same cluster, for example {1,9,10,18}, {2,9,20,27}, and {2,9,31,38}. After clustering, x₁ is averaged and the corresponding y₁ values are marked; here the average of x₁ rounds to 2, and the three corresponding y₁ values become 11, 20, and 31. The boundary point set B is then checked for points within 20 frames of them (a threshold that depends on the desired segment duration); if no such point exists, they are added to B. This yields the segmentation result for the sample music.
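The clustering and boundary selection can be sketched as follows, starting from line segments that are assumed to have been detected already. Several details are assumptions of this sketch rather than statements of the text: span lengths are counted in frames inclusively, the average of x₁ is rounded to the nearest integer, each y₁ is shifted by the difference between the average and its own x₁, and only segment starts are used, as in the worked example. All function names are hypothetical.

```python
from statistics import mean

Segment = tuple[int, int, int, int]  # (x1, x2, y1, y2)


def same_melody(a: Segment, b: Segment, ratio: float = 0.8) -> bool:
    """True if the x-spans overlap by at least `ratio` of each span."""
    common = min(a[1], b[1]) - max(a[0], b[0]) + 1
    len_a, len_b = a[1] - a[0] + 1, b[1] - b[0] + 1
    return common > 0 and common >= ratio * len_a and common >= ratio * len_b


def cluster_segments(segments: list[Segment], ratio: float = 0.8) -> list[list[Segment]]:
    """Greedy clustering: take a seed, pull in every segment matching it."""
    clusters, remaining = [], list(segments)
    while remaining:
        seed = remaining.pop(0)
        cluster = [seed] + [s for s in remaining if same_melody(seed, s, ratio)]
        remaining = [s for s in remaining if not same_melody(seed, s, ratio)]
        clusters.append(cluster)
    return sorted(clusters, key=len, reverse=True)   # largest cluster first


def add_boundaries(B: list[int], cluster: list[Segment], min_gap: int = 20) -> list[int]:
    """Add the averaged start and corrected y1 values unless too close to B."""
    x1_mean = round(mean(s[0] for s in cluster))
    candidates = [x1_mean] + [s[2] + (x1_mean - s[0]) for s in cluster]
    for x in candidates:
        if all(abs(x - b) >= min_gap for b in B):
            B.append(x)
    return B


if __name__ == "__main__":
    segs = [(1, 9, 10, 18), (2, 9, 20, 27), (2, 9, 31, 38), (40, 60, 100, 120)]
    boundaries: list[int] = []
    for cluster in cluster_segments(segs):
        add_boundaries(boundaries, cluster, min_gap=5)
    print(sorted(boundaries))
```

With the three example segments the corrected starts are 11, 20, and 31, matching the worked example. The demo uses a gap of 5 frames so that every candidate is visible; with the embodiment's 20-frame threshold, candidates closer than 20 frames to an accepted boundary are skipped.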

Claims

1. A music segmentation boundary recognition method based on repeated melodies, characterized by comprising the following steps:

1) extracting chroma features from the audio to obtain a feature vector sequence of M frames in total; zero-padding the head and tail of the feature vector sequence, aggregating every N adjacent frames into a new frame vector, and forming a new frame feature vector sequence from all frame vectors;

2) calculating the Euclidean distance between each frame vector and every other frame vector in the frame feature sequence to obtain a self-similarity matrix S;

3) based on the self-similarity matrix S, obtaining the set N_i of nearest-neighbor frames of the i-th frame vector, i = 1, 2, ..., M, and from these sets obtaining the recursive graph R of the self-similarity matrix S;

4) applying time delay processing to the recursive graph R to obtain a time delay matrix L;

5) performing line segment regularization and denoising on the time delay matrix L, and then applying inverse time delay processing to obtain the regularized and denoised recursive graph R';

6) based on the recursive graph R', detecting all line segments, clustering them, and processing the clusters in order, starting from the cluster with the most line segments, to obtain the set B of music segment boundary points.

2. The music segmentation boundary recognition method based on repeated melodies according to claim 1, characterized in that, in step 3), the k elements of the set N_i are the k frame vectors most similar to the i-th frame vector among all frame vectors, and the value of k is 0.01 of the total number of frame vectors.

3. The music segmentation boundary recognition method based on repeated melodies according to claim 1, characterized in that, in step 3), for each point R_{i,j} of the recursive graph R, R_{i,j} is set to 1 if i belongs to N_j and j belongs to N_i, and to 0 otherwise, thereby obtaining the recursive graph R of the self-similarity matrix S.

4. The music segmentation boundary recognition method based on repeated melodies according to claim 1, characterized in that, in step 4), L_{i,j} = R_{i,(i+j) mod (M-1)}, i = 1, 2, ..., M, j = 1, 2, ..., M, is used to obtain the time delay matrix L of the recursive graph R, that is, the main diagonal direction of the recursive graph R is converted into the horizontal direction.

5. The music segmentation boundary recognition method based on repeated melodies according to claim 4, characterized in that step 5) comprises:

5-1) traversing the time delay matrix L, where an entry with value 1 is defined as a point; each time a point is found, determining all points connected to it by breadth-first search, two points being considered connected if their step distance is less than 3;

5-2) counting, for each ordinate, the number of connected points that share it; if the ordinate with the most points has more than 5 points, keeping the points at that ordinate and setting the other connected points to 0; otherwise setting all of these points to 0;

5-3) letting R'_{i,(i+j) mod (M-1)} = L_{i,j}, i = 1, 2, ..., M, j = 1, 2, ..., M, to obtain the regularized and denoised recursive graph R'.

6. The music segmentation boundary recognition method based on repeated melodies according to claim 1, characterized in that, in step 6), the line segment clustering comprises:

traversing the recursive graph R', finding all line segments in the graph, and representing each in the standard form {x₁, x₂, y₁, y₂}, where x₁ and x₂ are the abscissas of the start and end points and y₁ and y₂ are the ordinates of the start and end points;

taking one line segment, traversing the other line segments, and clustering all line segments that correspond to the same melody as it; two line segments are judged to correspond to the same melody if the common length of their spans from x₁ to x₂ accounts for 80% or more of each span.

7. The music segmentation boundary recognition method based on repeated melodies according to claim 6, characterized in that the step distance is set to 3 when traversing the recursive graph R'.

8. The music segmentation boundary recognition method based on repeated melodies according to claim 6, characterized in that, in step 6), after the line segments are clustered, the cluster with the most line segments is taken and all of its x₁ and x₂ values are averaged to obtain x̄₁ and x̄₂; then, for each line segment in the cluster, y₁ and y₂ are corrected according to the differences between x₁ and x̄₁ and between x₂ and x̄₂, respectively, to obtain y'₁ and y'₂; the averaged abscissas and all y'₁ and y'₂ are each treated as a time point x and processed as follows: checking whether the set B of music segment boundary points contains a boundary point less than n frames away from the time point x, and if not, adding the time point x to B.