KR20200118587A

KR20200118587A - Music recommendation system using intrinsic information of music

Info

Publication number: KR20200118587A
Application number: KR1020190040671A
Authority: KR
Inventors: 허유정; 전지희; 박종인; 최창일; 이창현
Original assignee: 성균관대학교산학협력단
Priority date: 2019-04-08
Filing date: 2019-04-08
Publication date: 2020-10-16

Abstract

The present invention relates to a music recommendation system. The music recommendation system according to the present invention includes: a data collection module (100) that collects music by representative singers of each genre from a remote server and, when both a digital single version and a live performance version of the same song exist among the collected music, removes the live performance version; a highlight extraction module (200) that extracts the highlight of each collected song; a feature extraction module (300) that extracts musical features from the extracted highlight; and a music classification module (400) that groups the collected songs into sets of similar songs based on the extracted features.

Description

Music recommendation system using intrinsic information of music {MUSIC RECOMMENDATION SYSTEM USING INTRINSIC INFORMATION OF MUSIC}

The present invention relates to a music recommendation system using intrinsic information of music, and more particularly to a music recommendation system that extracts the highlight portion of a piece of music, determines the similarity between pieces based on the characteristics of the melody data and beat data of the extracted highlight, and then classifies and recommends similar music.

People who once downloaded music files to MP3 players now listen through music streaming services such as Melon, Genie, Bugs, and Mnet, as smartphones have become widespread and data charges have steadily fallen. To attract users by offering a better experience, streaming providers compete to offer music recommendation services built on data analysis. In Melon's For U, for example, a user enters three preferred artists, five songs of interest, and gender; the service analyzes interest and similarity, then lists recommended songs along with artists similar to those selected. Bugs' Music 4U and Music PD list playlists that PDs have hand-picked for a given keyword.

Meanwhile, a music recommendation service may be built as a user-based recommendation system, an item-based recommendation system, or a hybrid system combining the two. Such recommendation systems are used strategically not only for music but also by retailers such as Amazon, Auction, and Coupang. In a recommendation system, music or products are recommended based on user preferences or on the relatedness of items; technically, the recommendation is driven by metadata. Metadata is indirect data that describes an item. For music, for example, the composer, lyricist, singer, genre, album, and release year can serve as metadata. However, because such metadata is assigned indirectly rather than derived from the content of the item, it can fall short of representing the item's content in some situations.

Because conventional music recommendation services perform user-based and item-based recommendation on the metadata of each piece of music, they cannot recommend unpopular music for which sufficient data has not been accumulated, and they cannot recommend based on the intrinsic data of the music itself. In other words, no conventional metadata expresses the intrinsic data of music.

Korean Patent Registration No. 10-0895009
Korean Patent Registration No. 10-0955523

An object of the present invention, devised to solve the above problems, is to provide a music recommendation system that improves the efficiency of similarity determination by extracting the highlight portion of a piece of music rather than using its entire length.

Another object of the present invention is to provide a music recommendation system that improves the accuracy of similarity determination by extracting the melody-side and beat-side features inherent in the highlight portion of a piece of music.

Another object of the present invention is to provide a music recommendation system that extracts MFCC, chroma, and tempo characteristics from the melody side and the beat side of a piece of music and uses them to improve the accuracy of similarity determination.

To achieve the above objects, a music recommendation system according to an embodiment of the present invention may include: a data collection module (100) that collects music by representative singers of each genre from a remote server and, when both a digital single version and a live performance version of the same song exist among the collected music, removes the live performance version; a highlight extraction module (200) that extracts the highlight of each collected song; a feature extraction module (300) that extracts musical features from the extracted highlight; and a music classification module (400) that groups the collected songs into sets of similar songs based on the extracted features.

Preferably, the highlight extraction module (200) includes a section division module (210) that uses the MSAF (Music Structure Analysis Framework) technique to divide a piece of music into intro, verse, chorus, bridge, and outro sections and attaches a label to each section to identify it. The highlight extraction module (200) may further include a highlight determination module (220) that uses the labels to extract the choruses and hooks (repeated phrases) in the music, determines which of them is repeated most often in the music, and selects the earliest occurrence of that chorus or hook as the highlight.

Preferably, when the chorus or hook selected as the highlight lies before the 1/4 point of the music, the highlight determination module (220) does not select it and instead selects, as the highlight, the next most frequently repeated chorus or hook that lies after the 1/4 point of the music.

Preferably, the feature extraction module (300) may include: a Mel-Filtering module (310) that separates and extracts the melody data and the beat data contained in the highlight; an MFCC extraction module (320) that applies a DCT (Discrete Cosine Transform) to the audio signal of the highlight to extract MFCCs (Mel-Frequency Cepstral Coefficients); a chroma extraction module (330) that applies STFT (Short-Time Fourier Transform), Constant Q Transform, and CENS (Chroma-Energy Normalized Statistics) processing to the audio signal of the highlight to extract chroma features; and a tempo calculation module (340) that calculates the tempo from the beat data separated and extracted from the highlight.

Preferably, the music classification module (400) may include: a K-means module (410) that groups similar music by the K-means technique based on the features extracted by the feature extraction module (300); a GMM module (420) that groups similar music by the GMM (Gaussian Mixture Model) technique; and a DBSCAN module (430) that groups similar music by the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) technique.

The present invention can provide a music recommendation system that improves the efficiency of similarity determination by extracting the highlight portion of a piece of music rather than using its entire length.

The present invention can provide a music recommendation system that improves the accuracy of similarity determination by extracting the melody-side and beat-side features inherent in the highlight portion of a piece of music.

The present invention can provide a music recommendation system that extracts MFCC, chroma, and tempo characteristics from the melody side and the beat side of a piece of music and uses them to improve the accuracy of similarity determination.

FIG. 1 is a block diagram showing the configuration of a music recommendation system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a highlight extraction module according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of a feature extraction module according to an embodiment of the present invention.
FIG. 4 is a block diagram showing the configuration of a music classification module according to an embodiment of the present invention.
FIG. 5 is a block diagram schematically illustrating the feature extraction process of the feature extraction module according to an embodiment of the present invention.
FIG. 6 shows graphs of the BIC value against the number of clusters used in the classification process of the music classification module according to an embodiment of the present invention, together with lists of the classified music.

Hereinafter, some embodiments of the present invention are described in detail with reference to the exemplary drawings. In assigning reference numerals to the components in each drawing, note that the same components are given the same numerals wherever possible, even when they appear in different drawings.

In describing the embodiments of the present invention, a detailed description of a related known configuration or function is omitted when it is judged that it would hinder understanding of the embodiments.

In addition, terms such as first, second, A, B, (a), and (b) may be used in describing the components of the embodiments of the present invention. These terms serve only to distinguish one component from another; they do not limit the nature, sequence, or order of the components.

The configuration and operation of a music recommendation system according to an embodiment of the present invention are described below with reference to FIGS. 1 to 6.

FIG. 1 is a block diagram showing the configuration of a music recommendation system (hereinafter, "the present system") according to an embodiment of the present invention.

Referring to FIG. 1, the present system includes a data collection module (100), a highlight extraction module (200), a feature extraction module (300), and/or a music classification module (400).

The data collection module (100) collects music by representative singers of each genre from a remote server (e.g., YouTube). Duplicate songs are removed from the collection, and when both a digital single version and a live performance version of the same song exist among the collected music, the live performance version is removed.

The highlight extraction module (200) extracts the highlight of each collected song. Measuring similarity over an entire piece of music is inefficient, so only the highlight portion is extracted and used for similarity determination. This module is described in detail below.

The feature extraction module (300) extracts musical features from the highlight extracted by the highlight extraction module (200). It extracts characteristics that were not previously available and analyzes the characteristics extracted by the new technique to determine the similarity of music. This module is described in detail below.

The music classification module (400) groups the collected songs into sets of similar songs based on the features extracted by the feature extraction module (300), and thereby makes the final determination of similarity among the collected songs. This module is described in detail below.

FIG. 2 is a block diagram showing the configuration of a highlight extraction module according to an embodiment of the present invention.

Referring to FIG. 2, the highlight extraction module (200) includes a section division module (210) and/or a highlight determination module (220).

The section division module (210) uses the MSAF (Music Structure Analysis Framework) technique to divide a piece of music into intro, verse, chorus, bridge, and outro sections, and attaches a label to each section to identify it.

More specifically, most music follows the structure intro, verse, chorus, verse, chorus, bridge, verse, chorus, outro. In this structure the chorus carries the most weight, and listeners perceive the chorus as the main part of the music. Music also contains hook (repeated phrase) sections; in a hook song, the hook accounts for more than 40% of the whole piece.

The section division module (210) divides the entire piece into sections using the MSAF (Music Structure Analysis Framework) package. Which of the sections produced by MSAF corresponds to a chorus or hook can be judged qualitatively. The section division module (210) may use “foote” to divide the sections and the “scluster” function to label each section.

The highlight extraction module (200) uses the labels to extract the choruses and hooks in the music, determines which of them is repeated most often in the music, and provisionally selects the earliest occurrence of that chorus or hook as the highlight. Then, if the provisionally selected chorus or hook lies before the 1/4 point of the music, the highlight determination module (220) does not select it and instead selects, as the highlight, the next most frequently repeated chorus or hook that lies after the 1/4 point of the music.

More specifically, the highlight extraction module (200) selects the highlight from among the chorus and hook sections in the labels (sections) divided and identified by the section division module (210). Because choruses and hooks are the most frequently repeated parts of a piece of music, the highlight extraction module (200) treats the section whose label is repeated most often in the piece as a chorus or hook and selects it as the highlight. Of its occurrences, the one that appears earliest in the music is selected. In this case, however, a verse that precedes the chorus or hook may be mistaken for the highlight. To prevent this, if the repeated label lies before the 1/4 point of the track, that label is not accepted as the highlight of the music; if the next most frequently repeated label lies after the 1/4 point, that section is selected as the highlight. If no highlight can be determined even by this process, the section that appears immediately after the 1/4 point of the music is arbitrarily selected as the highlight. According to another embodiment, the 1/4 point used to determine the highlight may be changed to the 1/8 point. In addition, when the portion selected as the highlight lasts 10 seconds or less, the highlight extraction module (200) does not select it and instead selects the next-ranked section as described above.
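
The selection rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `pick_highlight`, the `(start, end, label)` segment tuples, and the tie-breaking by earliest appearance are all hypothetical, and each label is represented by its earliest occurrence only.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical segment format: (start_sec, end_sec, label)
Segment = Tuple[float, float, int]


def pick_highlight(segments: List[Segment], track_len: float,
                   min_start_ratio: float = 0.25,
                   min_duration: float = 10.0) -> Optional[Segment]:
    """Pick a highlight segment from labeled sections."""
    ordered = sorted(segments)                      # by start time
    counts = Counter(label for _, _, label in ordered)
    cutoff = track_len * min_start_ratio            # the 1/4 (or 1/8) point

    # Earliest occurrence of every label.
    first_seen = {}
    for seg in ordered:
        first_seen.setdefault(seg[2], seg)

    # Most repeated label first; ties go to the label that appears earlier
    # (the tie-break is an assumption, the text does not specify one).
    ranked = sorted(counts, key=lambda lab: (-counts[lab], first_seen[lab][0]))

    for label in ranked:
        start, end, _ = first_seen[label]
        # Reject candidates before the cutoff or lasting 10 s or less.
        if start >= cutoff and (end - start) > min_duration:
            return first_seen[label]

    # Fallback: the first section that starts at or after the cutoff.
    for seg in ordered:
        if seg[0] >= cutoff:
            return seg
    return None
```

Passing `min_start_ratio=0.125` corresponds to the alternative embodiment that uses the 1/8 point.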

FIG. 3 is a block diagram showing the configuration of a feature extraction module according to an embodiment of the present invention. FIG. 5 is a block diagram schematically illustrating the feature extraction process of the feature extraction module according to an embodiment of the present invention.

Referring to FIGS. 3 and 5, the feature extraction module (300) includes a Mel-Filtering module (310), an MFCC extraction module (320), a chroma extraction module (330), and/or a tempo calculation module (340).

The Mel-Filtering module (310) separates and extracts the melody data and the beat data contained in the highlight. Specifically, the waveform data of a chorus or hook section has no musical meaning in itself; it is merely a sequence of sound amplitude values. To represent a song as data that carries musical meaning, auditory information and music-related features must be extracted. For this purpose, the Mel-Filtering module (310) may use Python's Librosa package. The Mel-Filtering module (310) samples the audio signal of the highlight section at 22.05 kHz and divides it into frames. It applies a Hann window to the audio signal of each frame and converts it into frequency-domain values through an STFT (Short-Time Fourier Transform). From the converted values, it separately extracts melody-related features and beat-related features. Here, the median-filtering harmonic-percussive separation algorithm provided by the Librosa package can be used to separate the melody of the music from the percussion sounds (beat). Furthermore, the Mel-Filtering module (310) applies a Mel filter bank to analyze the highlight section at finer resolution, and applies a log scale to match the units to the way humans perceive sound.
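
The idea behind median-filtering harmonic-percussive separation can be illustrated on a tiny magnitude spectrogram. The sketch below is a pure-Python illustration with hard (binary) masks, not Librosa's implementation; the function names, the `spec[f][t]` layout, and the filter width are assumptions. Sustained tones form horizontal lines in a spectrogram and survive a median filter along time, while drum hits form vertical lines and survive a median filter along frequency.

```python
from statistics import median


def _median_filter(values, width):
    """1-D median filter with a window that is truncated at the edges."""
    half = width // 2
    n = len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def hpss_masks(spec, width=3):
    """Split spec[f][t] (magnitudes) into harmonic and percussive parts."""
    n_freq, n_time = len(spec), len(spec[0])
    # Filtering each frequency row along time keeps sustained (harmonic) energy.
    harm = [_median_filter(row, width) for row in spec]
    # Filtering each time column along frequency keeps broadband (percussive) hits.
    cols = [_median_filter([spec[f][t] for f in range(n_freq)], width)
            for t in range(n_time)]
    perc = [[cols[t][f] for t in range(n_time)] for f in range(n_freq)]

    # Binary masking: each bin goes to whichever filtered estimate is larger.
    harmonic = [[spec[f][t] if harm[f][t] >= perc[f][t] else 0.0
                 for t in range(n_time)] for f in range(n_freq)]
    percussive = [[spec[f][t] if harm[f][t] < perc[f][t] else 0.0
                   for t in range(n_time)] for f in range(n_freq)]
    return harmonic, percussive
```

With a spectrogram containing one horizontal line and one vertical line, the horizontal line ends up in the harmonic part and the rest of the vertical line in the percussive part.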

The MFCC extraction module (320) applies a DCT (Discrete Cosine Transform) to the audio signal of the highlight to extract MFCCs (Mel-Frequency Cepstral Coefficients). For reference, because MFCC is based on a model of human vocalization, it makes feature extraction possible even for digital music in which the sounds of many instruments are mixed.
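
As a rough illustration of the DCT step, the sketch below computes cepstral coefficients for a single frame from its Mel filter bank energies by taking the logarithm and applying an unnormalized DCT-II. The function names, the 13-coefficient default, and the energy floor are assumptions made for illustration; the patent does not state these parameters.

```python
import math


def dct_ii(x):
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]


def mfcc_from_mel_energies(mel_energies, n_coeffs=13, floor=1e-10):
    """Log-compress one frame of Mel band energies, then decorrelate with a DCT."""
    log_mel = [math.log(max(e, floor)) for e in mel_energies]
    return dct_ii(log_mel)[:n_coeffs]
```

For a perfectly flat Mel spectrum, only the zeroth coefficient is non-zero, which reflects that the higher coefficients describe the shape of the spectral envelope rather than its overall level.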

The chroma extraction module (330) applies STFT (Short-Time Fourier Transform), Constant Q Transform, and CENS (Chroma-Energy Normalized Statistics) processing to the audio signal of the highlight to extract chroma features. Specifically, music is built on a 12-tone scale, and the chroma features, which are the frequency components corresponding to these 12 tones, are extracted to determine similarity. Here, the chroma extraction module (330) extracts the chroma features of the highlight using the STFT, Constant Q Transform, and/or CENS techniques.
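
The folding of frequency components onto the 12 tones can be sketched as below. This is a simplified illustration, not CENS itself: it assumes equal temperament with A4 = 440 Hz, takes a hypothetical list of `(frequency, magnitude)` spectral peaks as input, and uses a plain sum-to-one normalization.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to one of 12 pitch classes (0 = C, ..., 9 = A)."""
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi_note % 12


def chroma_vector(peaks):
    """Fold (freq_hz, magnitude) pairs into a 12-bin vector that sums to 1."""
    bins = [0.0] * 12
    for freq_hz, magnitude in peaks:
        bins[pitch_class(freq_hz)] += magnitude
    total = sum(bins)
    return [b / total for b in bins] if total > 0 else bins
```

Because octaves are folded together, 440 Hz and 880 Hz both contribute to the same bin (A), which is what makes chroma features describe harmony independently of register.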

The tempo calculation module (340) calculates the tempo from the beat data separated and extracted from the highlight. Specifically, the tempo calculation module (340) extracts beat information from the beat data (percussive data) separated by the Mel-Filtering module (310) and uses it to calculate the tempo.
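
The text does not specify how the tempo is derived from the beat information. One simple possibility, shown here purely as an assumption, is to convert the median interval between detected beat times into beats per minute; `tempo_bpm` and its input format are hypothetical.

```python
from statistics import median


def tempo_bpm(beat_times):
    """Estimate tempo (BPM) from beat timestamps given in seconds."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beats to estimate a tempo")
    intervals = [later - earlier
                 for earlier, later in zip(beat_times, beat_times[1:])]
    # The median is less sensitive to a missed or spurious beat than the mean.
    return 60.0 / median(intervals)
```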

The features extracted by the feature extraction module (300) as described above are all expressed as vectors of the same length so that distances can be computed. This is because similarity is determined from the Euclidean distance between the vectors in the space in which the features are represented. Furthermore, because highlight sections of different lengths contain different numbers of frames, the feature values computed for each frame are averaged, and the averages are used for similarity determination.
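
The averaging and distance steps can be sketched as follows. The function names are illustrative; the per-frame features are assumed to be given as a list of equal-length lists, one per frame.

```python
import math


def mean_pool(frames):
    """Average per-frame feature vectors into one fixed-length vector."""
    n_frames = len(frames)
    return [sum(column) / n_frames for column in zip(*frames)]


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Two highlights with different frame counts still yield vectors of the same length after pooling, so their distance is well defined.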

FIG. 4 is a block diagram showing the configuration of a music classification module according to an embodiment of the present invention. FIG. 6 shows graphs of the BIC value against the number of clusters used in the classification process of the music classification module according to an embodiment of the present invention, together with lists of the classified music.

Referring to FIGS. 4 and 6, the music classification module (400) includes a K-means module (410), a GMM module (420), and/or a DBSCAN module (430).

The K-means module (410) groups similar music by the K-means technique based on the features extracted by the feature extraction module (300). K-means is an analysis method that uses the distance or similarity between observations when no prior information about the population or categories is available; the analysis starts by fixing the number of clusters. The K-means module (410) defines the number of clusters K and selects the initial K cluster center points. Each observation is assigned to the cluster with the nearest center, and the new cluster centers are computed. The distance-based assignment is then repeated against the redefined centers, and the computation ends when the cluster boundaries no longer change, yielding the final clusters.
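
The iteration described above can be written compactly in plain Python. This is a minimal sketch for illustration: the function name, the random choice of initial centers from the data, and the `seed` and `max_iter` parameters are assumptions, and points are assumed to be tuples of numbers.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Return (assignments, centers) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)        # initial K center points
    assign = None
    for _ in range(max_iter):
        # Assign every observation to the cluster with the nearest center.
        new_assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_assign == assign:           # boundaries unchanged: stop
            break
        assign = new_assign
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return assign, centers
```

Because the initial centers are chosen at random, different seeds can produce different clusterings on real data, which is why the embodiment repeats the procedure several times.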

The graph at the upper left of FIG. 6 plots the BIC value against the number of clusters, in order to find the K value best suited to this embodiment. According to one embodiment, the K-means module (410) set the initial K to 3, 4, and 5, the points at which the BIC curve bends sharply, and classified the music by applying the K-means technique to data consisting of 611 songs and 302 features. As K was varied, the songs closest to each cluster center sometimes differed and sometimes overlapped. To verify the results, it was checked how often pairs of songs involved in plagiarism allegations were placed in the same cluster. Over 10 runs, setting the number of clusters to 3 grouped the most pairs together, on average 7.2 of the 13 pairs, so K=3 was chosen as the number of clusters.
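
The verification measure described above (how many suspect pairs land in the same cluster, averaged over repeated runs) can be sketched as follows. The function names and the song-to-cluster dictionaries are hypothetical; the data in any usage would come from the clustering step.

```python
def pairs_in_same_cluster(pairs, cluster_of):
    """Count the song pairs whose two members share a cluster."""
    return sum(1 for a, b in pairs if cluster_of[a] == cluster_of[b])


def mean_co_clustered(pairs, runs):
    """Average the co-clustered pair count over several clustering runs."""
    return sum(pairs_in_same_cluster(pairs, run) for run in runs) / len(runs)
```

Here `runs` is a list with one song-to-cluster mapping per run, so a value such as 7.2 would mean that 72 pair matches were observed in total across 10 runs.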

Referring to the lower part of FIG. 6, with the K-means technique set to K=3, the cluster centered on 'Eyes, Nose, Lips' (눈, 코, 입) by Taeyang (the first cluster) contained about 96 songs. The cluster centered on 'Piano Man' by Mamamoo (the second cluster) contained about 264 songs. The cluster centered on 'Mackerel' (고등어) by Norazo (the third cluster) contained about 256 songs. The first cluster consisted of songs with a relatively slow tempo and almost no electronic sounds. The second cluster mixed rock and K-pop, which shared strong beats or drum sounds and distorted guitar. The third cluster mixed electronic music and trot. Despite this mixture, its distances were relatively short and its density high compared with the other clusters. The lists at the bottom of FIG. 6 show the 10 songs closest to the center of each cluster.

The GMM module 420 classifies similar music by the Gaussian Mixture Model (GMM) technique. GMM is widely used for unsupervised learning in machine learning. A mixture model assumes that the overall distribution contains sub-distributions, that is, that the data were generated from several distributions, each with its own parameters. For example, when a data point is observed, the GMM module 420 calculates its probability under each of the three normal distributions shown on the left and then assigns the data point to the cluster (group) with the highest probability.
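
The assignment rule described above can be illustrated with a minimal one-dimensional sketch. It assumes the three normal distributions are already known; in practice their weights, means, and standard deviations would be estimated from the data (typically by expectation-maximization), which is not shown. All names and parameter values below are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def responsibilities(x, components):
    """Posterior probability of each (weight, mean, std) component given x."""
    weighted = [w * normal_pdf(x, m, s) for w, m, s in components]
    total = sum(weighted)
    return [v / total for v in weighted]


def assign(x, components):
    """Index of the component with the highest posterior probability."""
    probs = responsibilities(x, components)
    return max(range(len(probs)), key=probs.__getitem__)


# Three hypothetical, equally weighted normal distributions.
components = [(1 / 3, 0.0, 1.0), (1 / 3, 5.0, 1.0), (1 / 3, 10.0, 1.0)]

for x in (-1.0, 4.2, 9.0):
    print(x, assign(x, components))
```

Unlike the hard assignment in K-means, the intermediate `responsibilities` values express how strongly each distribution accounts for the data point before the most probable cluster is chosen.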

The graph at the upper right of FIG. 6 plots the BIC value against the number of clusters in order to find the K value best suited to this embodiment. The points at which the BIC curve bends sharply, namely 3, 4, and 5, were taken as initial K values, and clustering (grouping) was performed by applying the GMM technique. Because the initial centers in GMM are likewise chosen at random, the clustering was run 10 times to observe the trend. Across the 10 runs, the songs closest to the cluster centers were always the same. Furthermore, when the 10 runs were evaluated on the songs suspected of plagiarism, K=3 grouped the most pairs into the same cluster, an average of 6.3 of the 13 pairs, and was therefore chosen as the number of clusters. With K=3, the first cluster contained about 86 songs, the second about 293, and the third about 237. In addition, the top 10 songs of each cluster, ordered by shortest distance, matched the results of the K-means embodiment exactly.

The DBSCAN module 430 classifies similar music by the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) technique. Whereas the K-means technique clusters on the basis of distance, the DBSCAN module 430 clusters regions of high density, where points are tightly packed. Put simply, if n or more points lie within a radius r of a given point (a core), those points are recognized as one cluster. Repeating this operation sorts the points into core points, border points that are not cores but belong to a cluster, and noise points that belong to no cluster. Unlike the K-means technique, the number of clusters (K) does not have to be specified in advance.
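
For illustration only, the core, border, and noise distinction described above can be sketched in plain Python. The sketch assumes Euclidean distance and counts a point as its own neighbor when testing for n points within the radius r; both choices, along with the function name and the toy coordinates, are assumptions rather than details taken from the embodiment.

```python
import math

NOISE = -1


def dbscan(points, r, n):
    """Return (labels, core): a cluster label per point and the core indices."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= r]

    labels = [None] * len(points)
    core = set()
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < n:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        core.add(i)
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # reachable but not core: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= n:  # j is also a core point: keep expanding
                core.add(j)
                queue.extend(nb_j)
    return labels, core


# Hypothetical 2-D points: two dense groups, one border point, one outlier.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2.4, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels, core = dbscan(pts, r=1.5, n=4)
print(labels, sorted(core))
```

Here the number of clusters is not passed in; it emerges from the radius r and the count n.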

In this specification, the data collection module 100, the highlight extraction module 200, the feature extraction module 300, the music classification module 400, the section division module 210, the highlight determination module 220, the Mel-Filtering module 310, the MFCC extraction module 320, the chroma extraction module 330, the tempo calculation module 340, the K-means module 410, the GMM module 420, and/or the DBSCAN module 430 may be processors that execute sequences of operations stored in a memory. Alternatively, they may operate as software modules driven and controlled by a processor. Furthermore, the processor may be a hardware device.

The scope of protection of the present invention is not limited to the description and wording of the embodiments explicitly set out above. It is also noted once more that the scope of protection of the present invention cannot be limited by changes or substitutions that are obvious in the technical field to which the present invention pertains.

Data collection module 100, highlight extraction module 200, feature extraction module 300, music classification module 400, section division module 210, highlight determination module 220, Mel-Filtering module 310, MFCC extraction module 320, chroma extraction module 330, tempo calculation module 340, K-means module 410, GMM module 420, DBSCAN module 430

Claims

A music recommendation system using intrinsic information of music, comprising:
a data collection module 100 that collects music of representative singers for each genre from a remote server and, when both a digital single version and a live performance version of the same music exist among the collected music, removes the live performance version;
a highlight extraction module 200 that extracts highlights of the collected music;
a feature extraction module 300 that extracts musical features from the extracted highlights of the music; and
a music classification module 400 that classifies the collected music into groups of similar music on the basis of the extracted features.

The system according to claim 1,
wherein the highlight extraction module 200 includes a section division module 210 that divides the music into an intro, a verse, a chorus, a bridge, and an outro by using a Musical Structural Analysis Framework (MSAF) technique and identifies each section by attaching a label to it, and
a highlight determination module 220 that extracts the choruses and repeated phrases from the music using the labels, identifies, among the extracted choruses and repeated phrases, the chorus and repeated phrase repeated most often in the music, and determines as the highlight the section in which the identified chorus and repeated phrase first appears in the music.

The system according to claim 2,
wherein, when the chorus and repeated phrase determined as the highlight lies before the 1/4 point of the music, the highlight determination module 220 does not determine that chorus and repeated phrase as the highlight, and instead determines as the highlight the next most repeated chorus and repeated phrase that lies after the 1/4 point of the music.

The system according to claim 3,
wherein the feature extraction module 300 includes: a Mel-Filtering module 310 that separates and extracts melody data and beat data contained in the highlight of the music; an MFCC extraction module 320 that applies a Discrete Cosine Transform (DCT) to the audio signal corresponding to the highlight of the music to extract Mel-Frequency Cepstral Coefficients (MFCCs); a chroma extraction module 330 that extracts features by processing the audio signal corresponding to the highlight of the music with a Short-Time Fourier Transform (STFT), a Constant-Q Transform, and Chroma Energy Normalized Statistics (CENS); and a tempo calculation module 340 that calculates a tempo from the beat data separated and extracted from the highlight.

The system according to claim 4,
wherein the music classification module 400 includes: a K-means module 410 that classifies similar music by a K-means technique on the basis of the features extracted by the feature extraction module 300; a GMM module 420 that classifies similar music by a Gaussian Mixture Model (GMM) technique; and a DBSCAN module 430 that classifies similar music by a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique.