
CN111785296A - A music segmentation boundary recognition method based on repetitive melody - Google Patents

A music segmentation boundary recognition method based on repetitive melody

Info

Publication number
CN111785296A
Authority
CN
China
Prior art keywords
frame
music
points
method based
line segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010459989.8A
Other languages
Chinese (zh)
Other versions
CN111785296B (en)
Inventor
张克俊
朱凯丽
殷叶航
叶雨晴
伍文棋
王昊阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-05-26
Filing date
2020-05-26
Publication date
2020-10-16
Application filed by Zhejiang University ZJU
Priority to CN202010459989.8A
Publication of CN111785296A
Application granted
Publication of CN111785296B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/061: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece, e.g. determination of the movement sequence of a musical work

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention relates to a music segmentation boundary recognition method based on repeated melody, belonging to the technical field of audio signal processing. The method comprises the following steps: 1) extracting chroma features from the audio, zero-padding the head and tail of the resulting feature sequence, and aggregating every N adjacent frames into a new frame vector, all frame vectors forming a new frame feature vector sequence; 2) calculating the Euclidean distance between each frame vector and every other frame vector in the frame feature sequence to obtain a self-similarity matrix S; 3) based on the self-similarity matrix S, obtaining the set $N_i$ of nearest-neighbor frames of the i-th frame vector, and from these sets obtaining the recurrence plot R of S; 4) applying time-delay processing to R to obtain a time-delay matrix L; 5) regularizing and denoising the line segments in L, then applying inverse time-delay processing to obtain a recurrence plot R'; 6) detecting all line segments in R', clustering them, and processing the clusters in order, starting from the cluster with the most line segments, to obtain a set B of music segmentation boundary points. The method improves the ability to recognize repeated melodies in music and allows music to be segmented on a shorter time scale.

Description

A music segmentation boundary recognition method based on repetitive melody

Technical Field

The present invention relates to the technical field of audio signal processing, and in particular to a music segmentation boundary recognition method based on repeated melody.

Background

Information is often organized in a structure or hierarchy to make it easier to disseminate or understand. Humans are generally good at perceiving such structure, sometimes even unconsciously, and use it to analyze and fully grasp the meaning of the information at hand. In the era of big data, however, we increasingly need support from computers in processing information. Automatically recovering the structure of information has therefore become a key task for today's content processing systems. Among the wide range of multimedia content, music is a typical example.

Important applications of music segmentation boundary recognition include navigation within a piece in music players, automatic generation of excerpts and mashups, identification of different versions of the same work, and large-scale musicological research. With the spread and development of the internet and digital entertainment products, music has become one of the most important kinds of digital media content.

Today, music is not only an independent entertainment product but also plays an important role as the soundtrack of film and television works. For music as an independent product, segmentation is an important basic step in music analysis; when a whole category of musical works is to be analyzed, the sheer number of works makes automatic segmentation all the more important. For music as a soundtrack, excerpts are used far more often than whole pieces in practice, and automatic segmentation can greatly improve the efficiency of extracting them. Research on music segmentation boundary recognition algorithms therefore has broad prospects for market application.

Foote first applied the self-similarity matrix to music segmentation in 2000, using it to discover repeated melodies in music. A 2006 study by Bruderer et al. pointed out that some cues, such as timbre changes, repetitions and pauses, are highly correlated with human perception of musical structure. A 2010 study by Paulus et al. stated that musical structure is inferred according to three principles: novelty, homogeneity and repetition. The music segmentation algorithm proposed by Serra et al. in 2014 took all of these principles into account and introduced the recurrence plot computation, which greatly improved segmentation accuracy, raised the efficiency of automatic music segmentation and advanced the development of automatic music segmentation algorithms.

However, the algorithms currently applied to music segmentation still have many shortcomings. Unsupervised methods segment at a coarse granularity and have difficulty obtaining the short segments of some music; they also make little use of music theory and rely too heavily on mathematical methods. Deep learning methods fail to fully consider the role of repetition in segmentation, depend on data, are costly to train, and are difficult to combine with music theory knowledge.

Summary of the Invention

The purpose of the present invention is to provide a music segmentation boundary recognition method based on repeated melody that improves the ability to recognize repeated melodies in music and can segment music on a shorter time scale.

To achieve this purpose, the music segmentation boundary recognition method based on repeated melody provided by the invention comprises the following steps:

1) extracting chroma features from the audio to obtain a feature vector sequence of M frames; zero-padding the head and tail of the feature vector sequence and aggregating every N adjacent frames into a new frame vector, all frame vectors forming a new frame feature vector sequence;

2) calculating the Euclidean distance between each frame vector and every other frame vector in the frame feature sequence to obtain a self-similarity matrix S;

3) based on the self-similarity matrix S, obtaining the set $N_i$ of nearest-neighbor frames of the i-th frame vector, $i = 1, 2, \dots, M$, and from these sets obtaining the recurrence plot R of the self-similarity matrix S;

4) applying time-delay processing to the recurrence plot R to obtain a time-delay matrix L;

5) regularizing and denoising the line segments in the time-delay matrix L, then applying inverse time-delay processing to obtain the regularized and denoised recurrence plot R';

6) based on the recurrence plot R', detecting all line segments and clustering them, then processing the clusters in order, starting from the cluster with the most line segments, to obtain the set B of music segmentation boundary points.

In the above technical solution, to capture the repeated passages of the music, the pitch class profile (PCP) feature, also called the chroma feature, is extracted frame by frame. This feature folds the frequencies of a given range into 12 pitch classes and thus highlights the melody of the music.
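As an illustration of how frequencies are folded into 12 pitch classes, the following minimal sketch maps spectral peaks onto a chroma vector. It assumes 12-tone equal temperament with A4 = 440 Hz, which the patent does not specify; the function names `pitch_class` and `chroma_from_peaks` are hypothetical and the patent does not say which chroma extractor is used.

```python
import math

def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to one of 12 pitch classes (0 = C, 9 = A, 11 = B)."""
    # MIDI note number under equal temperament, with A4 as MIDI note 69 (assumption).
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    return int(round(midi)) % 12

def chroma_from_peaks(peaks):
    """Fold (frequency, magnitude) pairs into a 12-bin chroma vector."""
    chroma = [0.0] * 12
    for freq_hz, magnitude in peaks:
        chroma[pitch_class(freq_hz)] += magnitude
    return chroma
```

Because octaves collapse onto the same bin, 440 Hz and 880 Hz both contribute to pitch class 9, which is why the feature follows melody rather than register.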

Optionally, in one embodiment, in step 3), the k elements of the set $N_i$ are the k frame vectors most similar to the i-th frame vector among all frame vectors, where k is 0.01 times the total number of frame vectors. For each point $R_{i,j}$ of the recurrence plot R, $R_{i,j} = 1$ if $i \in N_j$ and $j \in N_i$, and $R_{i,j} = 0$ otherwise; this yields the recurrence plot R of the self-similarity matrix S.

Optionally, in one embodiment, in step 4), let $L_{i,j} = R_{i,(i+j) \bmod (M-1)}$ for $i = 1, 2, \dots, M$ and $j = 1, 2, \dots, M$ to obtain the time-delay matrix L of the recurrence plot R; that is, the main-diagonal direction of R is converted into the horizontal direction.

Optionally, in one embodiment, step 5) comprises:

5-1) traversing the time-delay matrix L, where every entry equal to 1 is defined as a point; each time a point is found, determining all points connected to it by breadth-first search, two points being considered connected if their step distance is less than 3;

5-2) counting, among the connected points, the number of points at each ordinate; if the ordinate with the most points has more than 5 points, keeping the points at that ordinate and setting the other points to 0; otherwise setting all of these points to 0;

5-3) letting $R'_{i,(i+j) \bmod (M-1)} = L_{i,j}$ for $i = 1, 2, \dots, M$ and $j = 1, 2, \dots, M$ to obtain the regularized and denoised recurrence plot R'.

Optionally, in one embodiment, in step 6), the line segment clustering comprises:

traversing the recurrence plot R' with a step distance of 3;

finding all line segments in the plot and representing each one in the standardized form $\{x_1, x_2, y_1, y_2\}$, where $x_1$ and $x_2$ are the abscissas of the start and end points and $y_1$ and $y_2$ are the ordinates of the start and end points;

taking one line segment, traversing the other line segments, and clustering together all line segments that correspond to the same melody; two line segments are judged to correspond to the same melody when the common length of their $x_1$ to $x_2$ spans accounts for more than 80% of each span.

Optionally, in one embodiment, in step 6), after the line segments are clustered, the cluster with the most line segments is taken and all of its $x_1$ and $x_2$ values are averaged to obtain $\bar{x}_1$ and $\bar{x}_2$. Then, for each line segment in that cluster, $y_1$ and $y_2$ are corrected according to the differences between $x_1$ and $\bar{x}_1$ and between $x_2$ and $\bar{x}_2$, respectively, giving $y'_1$ and $y'_2$. The means $\bar{x}_1$ and $\bar{x}_2$ and all $y'_1$ and $y'_2$ are each treated as a time point x and processed as follows: check whether the set B of music segmentation boundary points already contains a boundary point less than n frames away from x; if not, add x to B.
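The averaging and correction in this step can be written out as below for a cluster of K line segments indexed by k. The bar notation, the index k and the cluster size K are introduced here for illustration. The sign of the correction is an assumption inferred from the worked example in the embodiment, where a segment with $x_1 = 1$ and $y_1 = 10$ in a cluster whose mean $x_1$ is 2 receives the corrected value 11.

```latex
\bar{x}_1 = \frac{1}{K}\sum_{k=1}^{K} x_1^{(k)}, \qquad
\bar{x}_2 = \frac{1}{K}\sum_{k=1}^{K} x_2^{(k)}, \qquad
y_1^{\prime (k)} = y_1^{(k)} + \bigl(\bar{x}_1 - x_1^{(k)}\bigr), \qquad
y_2^{\prime (k)} = y_2^{(k)} + \bigl(\bar{x}_2 - x_2^{(k)}\bigr)
```

Under this reading, each repetition is shifted by the same offset that aligns its reference passage with the cluster mean, so all repetitions of one melody are measured from a common starting point.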

Compared with the prior art, the present invention has the following advantages:

The invention denoises the matrix using music theory knowledge and practical experience, taking full account of the main causes of noise in music segmentation, so that errors caused by noise are reduced more thoroughly and efficiently. The boundary point acquisition method based on line segment clustering gives priority to melody passages that repeat many times, and taking the average as the boundary point further reduces error and improves generalization performance.

Brief Description of the Drawings

Fig. 1 is a flow chart of the music segmentation boundary recognition method based on repeated melody in an embodiment of the present invention;

Fig. 2 is a schematic diagram of the recurrence plot R in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the time-delay matrix L in an embodiment of the present invention;

Fig. 4 shows the recurrence plot R' after regularization and denoising in an embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions and advantages of the present invention clearer, the invention is further described below with reference to an embodiment and the accompanying drawings. The described embodiment is only one of the embodiments of the invention, not all of them. All other embodiments obtained from the described embodiment by a person of ordinary skill in the art without creative effort fall within the scope of protection of the invention.

Unless otherwise defined, the technical or scientific terms used herein have the ordinary meaning understood by a person of ordinary skill in the art to which the invention belongs. Words such as "comprise" or "include" mean that the element or step preceding the word covers the elements or steps listed after the word and their equivalents, without excluding other elements or steps.

Embodiment

This embodiment applies the music segmentation boundary recognition method based on repeated melody: it builds a music segmentation algorithm on a self-similarity matrix and automatically identifies the segmentation points of a musical structure. The method can replace manual annotation in generating music structure sequences and can further be applied to music analysis, automatic excerpt generation and the like. Referring to Fig. 1, the procedure is as follows:

S100: extract chroma features from the audio to obtain a feature vector sequence of M frames; zero-pad the head and tail of the sequence, aggregate every N adjacent frames into a new frame vector, and let all frame vectors form a new frame feature vector sequence.

The feature sequence of the sample music is a sequence of 12-dimensional vectors with a length of 1344. Zero-padding the head and tail gives a sequence of length 1350; aggregating every 7 adjacent frames into a new frame vector gives a sequence of (12×7)-dimensional vectors whose length is still 1344.
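The padding and aggregation in S100 can be sketched as follows. The patent does not state how the zero frames are divided between head and tail, so the even split used here is an assumption, and the function name `stack_frames` is hypothetical.

```python
def stack_frames(features, n):
    """Zero-pad a feature sequence and concatenate every n adjacent frames."""
    dim = len(features[0])
    pad = n - 1                              # total number of zero frames added
    head, tail = pad // 2, pad - pad // 2    # assumed split between head and tail
    zero = [0.0] * dim
    padded = [zero] * head + [list(f) for f in features] + [zero] * tail
    # One stacked vector per original frame: len(padded) - n + 1 == len(features).
    return [sum(padded[i:i + n], []) for i in range(len(features))]
```

With 1344 frames of 12 dimensions and n = 7, the padded sequence has 1350 frames and the output has 1344 vectors of 84 dimensions, matching the figures given for the sample music.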

S200: calculate the Euclidean distance between each frame vector and every other frame vector in the frame feature sequence to obtain a 1344×1344 self-similarity matrix S.
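A minimal sketch of S200 in plain Python is shown below; the function name `self_similarity` is hypothetical, and a practical implementation would likely use a vectorized numerical library instead of nested loops.

```python
import math

def self_similarity(frames):
    """Return the matrix S of pairwise Euclidean distances between frame vectors."""
    m = len(frames)
    s = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            d = math.dist(frames[i], frames[j])
            s[i][j] = s[j][i] = d   # Euclidean distance is symmetric
    return s
```

Note that S holds distances, so smaller entries mean more similar frames.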

S300: based on the self-similarity matrix S, obtain the set $N_i$ of nearest-neighbor frames of the i-th frame, $i = 1, 2, \dots, M$, and from these sets obtain the recurrence plot R of S; see Fig. 2.

The k elements of $N_i$ are the k frames most similar to the i-th frame among all frames. For each point $R_{i,j}$ of the recurrence plot, $R_{i,j} = 1$ if $i \in N_j$ and $j \in N_i$, and $R_{i,j} = 0$ otherwise, which gives a 1344×1344 recurrence plot R. The value of k is 0.01 times the total number of frames; this embodiment uses k = 13 (0.01 × 1344 ≈ 13).
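The mutual nearest-neighbor rule of S300 can be sketched as follows. The function name `recurrence_plot` is hypothetical, and two details are assumptions not stated in the patent: a frame is not counted as its own neighbor, and k is truncated to an integer with a minimum of 1.

```python
def recurrence_plot(s, ratio=0.01):
    """Binary recurrence plot R from a distance matrix S via mutual k-nearest neighbors."""
    m = len(s)
    k = max(1, int(m * ratio))          # e.g. 1344 frames give k = 13
    neighbours = []
    for i in range(m):
        # Assumption: a frame is excluded from its own neighbor set.
        order = sorted((j for j in range(m) if j != i), key=lambda j: s[i][j])
        neighbours.append(set(order[:k]))
    # R[i][j] is 1 only when i and j are each among the other's k nearest frames.
    return [[1 if (i in neighbours[j] and j in neighbours[i]) else 0
             for j in range(m)] for i in range(m)]
```

Requiring the relation in both directions makes R symmetric and suppresses frames that happen to be close to many others without being close in return.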

S400: apply time-delay processing to the recurrence plot R to obtain the time-delay matrix L; see Fig. 3.

Let $L_{i,j} = R_{i,(i+j) \bmod (M-1)}$ to obtain the time-delay matrix L of the recurrence plot R. This converts the main-diagonal direction of R into the horizontal direction, which improves computational efficiency.
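The time-delay transform and its inverse (the latter is used at the end of S500) can be sketched as below. This is a substituted variant, not the patent's exact formula: the patent states the mapping with 1-based indices and modulus M − 1, whereas the sketch uses 0-based indices and modulus M so that the mapping is a cyclic shift and can be inverted exactly. The function names are hypothetical.

```python
def time_lag(r):
    """Shear a square recurrence plot so that each diagonal stripe shares one lag index."""
    m = len(r)
    # 0-based cyclic variant (assumption): L[i][j] = R[i][(i + j) mod m]
    return [[r[i][(i + j) % m] for j in range(m)] for i in range(m)]

def inverse_time_lag(lag):
    """Undo time_lag: R'[i][(i + j) mod m] = L[i][j]."""
    m = len(lag)
    r = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            r[i][(i + j) % m] = lag[i][j]
    return r
```

After the transform, a repetition that appears in R as a stripe parallel to the main diagonal occupies a single lag index j in L, so it can be found by scanning along one axis.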

S500: regularize and denoise the line segments in the time-delay matrix L, then apply inverse time-delay processing to obtain the regularized and denoised recurrence plot R'; see Fig. 4.

First, traverse the time-delay matrix L, where every entry equal to 1 is defined as a point. Each time a point is found, determine all points connected to it by breadth-first search; two points are considered connected if their step distance is less than 3. Count the number of connected points at each ordinate. If the ordinate with the most points has more than 5 points, keep the points at that ordinate and set all other points to 0; otherwise set all of these points to 0. For example, for the series of points {(1,1),(2,1),(3,1),(4,1),(5,1),(6,1),(2,2),(3,2),(4,2)}, the ordinate 1 has the most points, 6 in total, so those points are kept, while the points with ordinate 2 are erased. Then let $R'_{i,(i+j) \bmod (M-1)} = L_{i,j}$ to obtain the regularized and denoised recurrence plot R'.
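The regularization and denoising rule can be sketched as follows, treating the first matrix index as the abscissa and the second as the ordinate, as in the point example above. The function name `denoise` is hypothetical, and "step distance less than 3" is interpreted here as at most 2 cells along each axis, which is an assumption.

```python
from collections import Counter, deque

def denoise(lag, max_step=2, min_count=6):
    """In each connected group of points, keep only the dominant ordinate if it is long enough."""
    m, n = len(lag), len(lag[0])
    out = [[0] * n for _ in range(m)]
    seen = set()
    for i in range(m):
        for j in range(n):
            if lag[i][j] != 1 or (i, j) in seen:
                continue
            # Breadth-first search over points closer than 3 steps (assumed per axis).
            group, queue = [], deque([(i, j)])
            seen.add((i, j))
            while queue:
                x, y = queue.popleft()
                group.append((x, y))
                for nx in range(max(0, x - max_step), min(m, x + max_step + 1)):
                    for ny in range(max(0, y - max_step), min(n, y + max_step + 1)):
                        if lag[nx][ny] == 1 and (nx, ny) not in seen:
                            seen.add((nx, ny))
                            queue.append((nx, ny))
            # Ordinate holding the most points in this group.
            best_y, count = Counter(y for _, y in group).most_common(1)[0]
            if count >= min_count:              # "more than 5" points
                for x, y in group:
                    if y == best_y:
                        out[x][y] = 1
    return out
```

Applied to the nine example points, the six points with ordinate 1 survive and the three with ordinate 2 are erased; a group whose dominant ordinate has five points or fewer disappears entirely.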

S600: based on the recurrence plot R', detect all line segments and cluster them, then process the clusters in order, starting from the cluster with the most line segments, to obtain the set B of music segmentation boundary points.

First, find all line segments in R' and put them in standardized form: traverse R' with a step distance of 3 to find every line segment, and represent each one as $\{x_1, x_2, y_1, y_2\}$, where $x_1$ and $x_2$ are the abscissas of the start and end points and $y_1$ and $y_2$ are the ordinates of the start and end points. For example, the line segment {1,9,10,18} means that frames 10 to 18 are similar to frames 1 to 9. Next, put into the same cluster all line segments whose $x_1$ to $x_2$ spans share a common part covering more than 80% of each span, for example {1,9,10,18}, {2,9,20,27} and {2,9,31,38}. After clustering, take the mean of $x_1$ and mark the correspondingly corrected $y_1$ values: here the mean of $x_1$ is 2 (the mean of 1, 2 and 2, rounded to a whole frame), and the three corresponding $y_1$ values become 11, 20 and 31. Check whether the boundary point set B already contains points within 20 frames of them (a threshold related to the desired segment duration); if not, add them to B. This yields the segmentation result for the sample music.
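The clustering and boundary selection of S600 can be sketched as follows, taking already detected segments as (x1, x2, y1, y2) tuples. All function names are hypothetical. Three details are assumptions: span length is measured as x2 − x1, means are rounded to whole frames, and each y value is shifted by the offset between its x value and the cluster mean, a rule inferred from the worked example rather than stated explicitly.

```python
def same_melody(a, b, ratio=0.8):
    """True if the x1..x2 spans of two segments share more than `ratio` of each span."""
    common = min(a[1], b[1]) - max(a[0], b[0])
    return common > ratio * (a[1] - a[0]) and common > ratio * (b[1] - b[0])

def cluster_segments(segments):
    """Greedy clustering: each unassigned segment seeds a cluster of segments matching it."""
    clusters, used = [], set()
    for i, seed in enumerate(segments):
        if i in used:
            continue
        members = [i] + [k for k in range(i + 1, len(segments))
                         if k not in used and same_melody(seed, segments[k])]
        used.update(members)
        clusters.append([segments[k] for k in members])
    return clusters

def boundaries(segments, min_gap=20):
    """Collect boundary frames, handling clusters from the most to the fewest segments."""
    found = []
    for cluster in sorted(cluster_segments(segments), key=len, reverse=True):
        mean_x1 = round(sum(s[0] for s in cluster) / len(cluster))
        mean_x2 = round(sum(s[1] for s in cluster) / len(cluster))
        candidates = [mean_x1, mean_x2]
        for x1, x2, y1, y2 in cluster:
            # Assumed correction: shift y by the offset that moves x onto the mean.
            candidates += [y1 + (mean_x1 - x1), y2 + (mean_x2 - x2)]
        for point in candidates:
            # Add the point only if no existing boundary is closer than min_gap frames.
            if all(abs(point - b) >= min_gap for b in found):
                found.append(point)
    return sorted(found)
```

For the three example segments, the corrected start ordinates are 11, 20 and 31, as in the text. With the 20-frame threshold most of these toy candidates are suppressed because they lie too close together; processing large clusters first means that frequently repeated melodies claim their boundaries before rarer ones.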

Claims (8)

1. A music segmentation boundary recognition method based on repeated melody, characterized by comprising the following steps:
1) extracting chroma features from the audio to obtain a feature vector sequence of M frames; zero-padding the head and tail of the feature vector sequence and aggregating every N adjacent frames into a new frame vector, all frame vectors forming a new frame feature vector sequence;
2) calculating the Euclidean distance between each frame vector and every other frame vector in the frame feature sequence to obtain a self-similarity matrix S;
3) based on the self-similarity matrix S, obtaining the set $N_i$ of nearest-neighbor frames of the i-th frame vector, $i = 1, 2, \dots, M$, and from these sets obtaining the recurrence plot R of the self-similarity matrix S;
4) applying time-delay processing to the recurrence plot R to obtain a time-delay matrix L;
5) regularizing and denoising the line segments in the time-delay matrix L, then applying inverse time-delay processing to obtain the regularized and denoised recurrence plot R';
6) based on the recurrence plot R', detecting all line segments and clustering them, then processing the clusters in order, starting from the cluster with the most line segments, to obtain the set B of music segmentation boundary points.

2. The music segmentation boundary recognition method based on repeated melody according to claim 1, characterized in that, in step 3), the k elements of the set $N_i$ are the k frame vectors most similar to the i-th frame vector among all frame vectors, and k is 0.01 times the total number of frame vectors.

3. The music segmentation boundary recognition method based on repeated melody according to claim 1, characterized in that, in step 3), for each point $R_{i,j}$ of the recurrence plot R, $R_{i,j} = 1$ if $i \in N_j$ and $j \in N_i$, and $R_{i,j} = 0$ otherwise, thereby obtaining the recurrence plot R of the self-similarity matrix S.

4. The music segmentation boundary recognition method based on repeated melody according to claim 1, characterized in that, in step 4), $L_{i,j} = R_{i,(i+j) \bmod (M-1)}$ for $i = 1, 2, \dots, M$ and $j = 1, 2, \dots, M$, giving the time-delay matrix L of the recurrence plot R, that is, the main-diagonal direction of the recurrence plot R is converted into the horizontal direction.

5. The music segmentation boundary recognition method based on repeated melody according to claim 4, characterized in that step 5) comprises:
5-1) traversing the time-delay matrix L, where every entry equal to 1 is defined as a point; each time a point is found, determining all points connected to it by breadth-first search, two points being considered connected if their step distance is less than 3;
5-2) counting, among the connected points, the number of points at each ordinate; if the ordinate with the most points has more than 5 points, keeping the points at that ordinate and setting the other points to 0; otherwise setting all of these points to 0;
5-3) letting $R'_{i,(i+j) \bmod (M-1)} = L_{i,j}$ for $i = 1, 2, \dots, M$ and $j = 1, 2, \dots, M$ to obtain the regularized and denoised recurrence plot R'.

6. The music segmentation boundary recognition method based on repeated melody according to claim 1, characterized in that, in step 6), the line segment clustering comprises:
traversing the recurrence plot R', finding all line segments in the plot, and representing each one in the standardized form $\{x_1, x_2, y_1, y_2\}$, where $x_1$ and $x_2$ are the abscissas of the start and end points and $y_1$ and $y_2$ are the ordinates of the start and end points;
taking one line segment, traversing the other line segments, and clustering together all line segments that correspond to the same melody; two line segments are judged to correspond to the same melody when the common length of their $x_1$ to $x_2$ spans accounts for more than 80% of each span.

7. The music segmentation boundary recognition method based on repeated melody according to claim 6, characterized in that the step distance is set to 3 when traversing the recurrence plot R'.

8. The music segmentation boundary recognition method based on repeated melody according to claim 6, characterized in that, in step 6), after the line segments are clustered, the cluster with the most line segments is taken and all of its $x_1$ and $x_2$ values are averaged to obtain $\bar{x}_1$ and $\bar{x}_2$; then, for each line segment in that cluster, $y_1$ and $y_2$ are corrected according to the differences between $x_1$ and $\bar{x}_1$ and between $x_2$ and $\bar{x}_2$, respectively, giving $y'_1$ and $y'_2$; the means $\bar{x}_1$ and $\bar{x}_2$ and all $y'_1$ and $y'_2$ are each treated as a time point x and processed as follows: checking whether the set B of music segmentation boundary points already contains a boundary point less than n frames away from x, and if not, adding x to B.

CN202010459989.8A 2020-05-26 2020-05-26 Music segmentation boundary identification method based on repeated melody Active CN111785296B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010459989.8A CN111785296B (en) 2020-05-26 2020-05-26 Music segmentation boundary identification method based on repeated melody

Publications (2)

Publication Number Publication Date
CN111785296A (en) 2020-10-16
CN111785296B (en) 2022-06-10

Family

ID=72753490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010459989.8A Active CN111785296B (en) 2020-05-26 2020-05-26 Music segmentation boundary identification method based on repeated melody

Country Status (1)

Country Link
CN (1) CN111785296B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6278972B1 (en) * 1999-01-04 2001-08-21 Qualcomm Incorporated System and method for segmentation and recognition of speech signals
WO2005010865A2 (en) * 2003-07-31 2005-02-03 The Registrar, Indian Institute Of Science Method of music information retrieval and classification using continuity information
US20060065106A1 (en) * 2004-09-28 2006-03-30 Pinxteren Markus V Apparatus and method for changing a segmentation of an audio piece
US20070291958A1 (en) * 2006-06-15 2007-12-20 Tristan Jehan Creating Music by Listening
CN103116646A (en) * 2013-02-26 2013-05-22 浙江大学 Cloud gene expression programming based music emotion recognition method
CN103854661A (en) * 2014-03-20 2014-06-11 北京百度网讯科技有限公司 Method and device for extracting music characteristics
US20140205103A1 (en) * 2011-08-19 2014-07-24 Dolby Laboratories Licensing Corporation Measuring content coherence and measuring similarity
US20170148424A1 (en) * 2015-11-23 2017-05-25 Adobe Systems Incorporated Intuitive music visualization using efficient structural segmentation
CN108665903A (en) * 2018-05-11 2018-10-16 复旦大学 An automatic detection method and system for audio signal similarity

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李伟等: "理解数字音乐――音乐信息检索技术综述", 《复旦学报(自然科学版)》 *
肖川等: "多版本音乐识别技术研究综述", 《小型微型计算机系统》 *

Also Published As

Publication number Publication date
CN111785296B (en) 2022-06-10

Similar Documents

Publication Publication Date Title
CN108921130B (en) Video key frame extraction method based on saliency region
CN112417157A (en) Emotion classification method of text attribute words based on deep learning network
CN111651636A (en) Video similar segment searching method and device
CN108427925B (en) A copy video detection method based on continuous copy frame sequence
CN101398854A (en) Video fragment searching method and system
CN101655859B (en) Method for fast removing redundancy key frames and device thereof
CN108932950A (en) A sound scene recognition method based on label augmentation and multi-spectrogram fusion
TW202217597A (en) Image incremental clustering method, electronic equipment, computer storage medium thereof
CN112100412B (en) Image retrieval method, device, computer equipment and storage medium
US8175392B2 (en) Time segment representative feature vector generation device
CN114399808B (en) A method, system, electronic device and storage medium for estimating face age
CN104715033A (en) Step type voice frequency retrieval method
CN113378946A (en) Robust multi-label feature selection method considering feature label dependency
CN115878842A (en) Video tag determination method and device, electronic equipment and readable storage medium
CN113704565B (en) Learning type space-time index method, device and medium based on global interval error
CN114357248A (en) Video retrieval method, computer storage medium, electronic device, and computer program product
CN111859057A (en) Data feature processing method and data feature processing device
WO2016110125A1 (en) Hash method for high dimension vector, and vector quantization method and device
CN111785296A (en) A music segmentation boundary recognition method based on repetitive melody
CN115879002B (en) Training sample generation method, model training method and device
CN115758191B (en) Knowledge service entity clustering number prediction method based on deep learning
CN102903104A (en) Subtractive clustering based rapid image segmentation method
CN116994043A (en) A small sample image recognition optimization method, device, equipment and storage medium
CN116248918A (en) Video shot segmentation method and device, electronic equipment and readable medium
CN114637889A (en) Video tag identification method, apparatus, electronic device and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant