KR102771884B1

KR102771884B1 - Artificial intelligence-based user-customized music recommendation service device

Info

Publication number: KR102771884B1
Application number: KR1020220007166A
Authority: KR
Inventors: 정우주
Original assignee: 주식회사 인디제이
Priority date: 2022-01-18
Filing date: 2022-01-18
Publication date: 2025-02-25
Anticipated expiration: 2042-01-18
Also published as: KR20230111382A

Abstract

The present invention relates to an artificial intelligence (AI)-based user-customized music recommendation service device comprising: a music database construction unit that analyzes sound sources using AI, subdivides labeled genres based on BPM, instrument composition, and acoustic characteristics, classifies music emotions, and updates the labels to build a music database; a user situation analysis unit that collects data from a sensor of a user terminal or an external API, generates metadata related to the user's situation through semantic conversion of the collected data, and infers the user's situation; a user emotion analysis unit that infers the user's emotion by linking the user's situation to emotion; a preferred music prediction unit that separately predicts preferred music based on the user's situation and preferred music based on the user's emotion, and determines the music to recommend through mapping between the predicted preferred music; and a recommended music provision unit that provides a recommended music list to the user terminal, retrieves from the music database the music that the user selects from the list, and provides it to the user terminal.

Description

{ARTIFICIAL INTELLIGENCE-BASED USER-CUSTOMIZED MUSIC RECOMMENDATION SERVICE DEVICE}

The present invention relates to music recommendation service technology and, more specifically, to an artificial intelligence (AI)-based user-customized music recommendation service device that analyzes the user's situation and music emotions using AI and recommends customized music suited to that situation and emotion.

With the development of various smart devices, including smartphones, many people listen to music by streaming (real-time playback) on music platforms. A representative way of listening is the playlist. In a survey of actual music streaming usage (2019 Music Industry White Paper, Korea Creative Content Agency), 'listening to edited/saved playlists' was the most common response among respondents aged 15 to 19 (60%) and 20 to 24 (59.1%).

Music recommendation methods can be broadly divided into content-based recommendation and collaborative filtering-based recommendation. Content-based recommendation uses the intrinsic properties of music, such as the sound source including the melody and lyrics, to recommend music similar to what the user already listens to. Collaborative filtering-based recommendation analyzes users' music consumption patterns and introduces music preferred by other users with similar tastes.

However, unlike other content, a single piece of music is consumed in a short time, and consumption tendencies change several times a day depending on emotion and situation, so the existing recommendation methods are not well suited to music. In other words, content-based and collaborative filtering-based recommendation produce a 'filter bubble' that traps users as if inside a bubble and prevents them from enjoying new content and diverse culture.

Recently, prior art has been proposed that recommends differentiated music according to the user's situation. However, because it is difficult to identify the user's current situation accurately and to respond to changes in that situation, the accuracy of situation-appropriate recommendations is low and the user's needs are not met.

Korean Patent Registration No. 10-1943638 (2019.01.23)

One embodiment of the present invention seeks to provide an AI-based user-customized music recommendation service device that analyzes the user's situation and music emotions using artificial intelligence (AI) and recommends customized music suited to that situation and emotion.

One embodiment of the present invention seeks to provide an AI-based user-customized music recommendation service device that uses sensor data from a mobile device to infer the user's current situation through situation probability inference and recommends music suited to that situation.

One embodiment of the present invention seeks to provide an AI-based user-customized music recommendation service device that infers the user's emotion through pattern analysis of music emotions based on the user's music listening history and recommends music that matches that emotion.

Among the embodiments, the AI-based user-customized music recommendation service device includes: a music database construction unit that analyzes sound sources using artificial intelligence (AI), subdivides labeled genres based on BPM, instrument composition, and acoustic characteristics, classifies music emotions, and updates the labels to build a music database; a user situation analysis unit that collects data from a sensor of a user terminal or an external API, generates metadata related to the user's situation through semantic conversion of the collected data, and infers the user's situation; a user emotion analysis unit that infers the user's emotion by linking the user's situation to emotion; a preferred music prediction unit that separately predicts preferred music based on the user's situation and preferred music based on the user's emotion, and determines the music to recommend through mapping between the predicted preferred music; and a recommended music provision unit that provides a recommended music list to the user terminal, retrieves from the music database the music that the user selects from the list, and provides it to the user terminal.
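The text does not spell out how the "mapping between the predicted preferred music" is performed. The following minimal sketch shows one possible interpretation, assuming each predictor returns a score per track: only tracks proposed by both predictors are kept, and they are ranked by the product of the two scores. The function name `recommend`, the score dictionaries, and the product rule are all hypothetical illustrations, not the method defined by the patent.

```python
def recommend(context_scores, emotion_scores, top_n=3):
    """Rank tracks that both predictors proposed (hypothetical mapping rule)."""
    # Keep only tracks present in both the situation-based and emotion-based predictions.
    common = context_scores.keys() & emotion_scores.keys()
    # Rank by the product of the two scores; ties are broken by track id for determinism.
    ranked = sorted(common, key=lambda t: (-(context_scores[t] * emotion_scores[t]), t))
    return ranked[:top_n]


# Hypothetical scores from the situation-based and emotion-based predictors.
context_scores = {"track_a": 0.9, "track_b": 0.6, "track_c": 0.4}
emotion_scores = {"track_b": 0.8, "track_c": 0.9, "track_d": 0.7}
playlist = recommend(context_scores, emotion_scores)
```

Other combination rules (weighted sums, or a union with a fallback) would fit the same description equally well.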

The music database construction unit may perform the steps of: identifying the spectral pattern of the sound source, extracting the interval between the first extracted beat and the last extracted beat through beat extraction processing, and calculating the BPM through interpolation; inferring the composition of instruments used in the sound source through a deep learning instrument classification model; extracting acoustic characteristics of the sound source through a Python spectrum extraction library; and generating a dataset from the BPM, instrument composition, and acoustic characteristics of the sound source and subdividing the labeled music genres.

The instrument classification model may be a model trained to extract characteristic values from the spectrum of each instrument's signal using a Hann (Hanning) window function, find the features that distinguish instrument sounds using a Bayes classification algorithm, and mathematically determine the probability that each instrument is used.

The music database construction unit may classify music emotions either by extracting data on the tempo, dynamics, noise, amplitude change, and brightness of the sound source and fitting normal distributions based on the mean and standard deviation of each extracted item to obtain emotions as probability values, or by extracting features of the sound source through a deep learning two-dimensional CNN music emotion classification model and converting the extracted features into emotion vector values.

The user situation analysis unit may predefine metadata that serves as the criteria for judging the user's situation, including the user's preferred music, activity state, place, and time, and may collect the raw data needed to generate that metadata from a sensor of the user terminal or an external API.

The user situation analysis unit may analyze the user's situation by applying situation probability inference to the collected raw data using POI (Point of Interest) clustering, naive Bayes, and a hierarchical Bayesian network.
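As a rough illustration of the naive Bayes part of this situation probability inference, the sketch below trains a categorical naive Bayes model with Laplace smoothing on metadata (activity, place, time) and returns a probability for each situation. The situation labels, metadata values, and function names are hypothetical; POI clustering and the hierarchical Bayesian network are not covered here.

```python
from collections import Counter, defaultdict


def train(observations):
    """observations: list of (situation, {metadata_name: value}) pairs."""
    prior = Counter()
    counts = defaultdict(Counter)  # (situation, metadata_name) -> value counts
    values = defaultdict(set)      # metadata_name -> set of values seen in training
    for situation, meta in observations:
        prior[situation] += 1
        for name, value in meta.items():
            counts[(situation, name)][value] += 1
            values[name].add(value)
    return prior, counts, values


def infer(model, meta):
    """Return P(situation | metadata) under the naive independence assumption."""
    prior, counts, values = model
    total = sum(prior.values())
    scores = {}
    for situation, n in prior.items():
        p = n / total
        for name, value in meta.items():
            # Laplace (add-one) smoothing so unseen values do not zero out the product.
            p *= (counts[(situation, name)][value] + 1) / (n + len(values[name]))
        scores[situation] = p
    z = sum(scores.values())
    return {s: p / z for s, p in scores.items()}


# Hypothetical labelled history derived from sensor metadata.
history = [
    ("commuting", {"activity": "in_vehicle", "place": "road", "time": "morning"}),
    ("commuting", {"activity": "in_vehicle", "place": "road", "time": "evening"}),
    ("working", {"activity": "still", "place": "office", "time": "morning"}),
    ("working", {"activity": "still", "place": "office", "time": "afternoon"}),
    ("exercising", {"activity": "running", "place": "park", "time": "evening"}),
]
model = train(history)
probs = infer(model, {"activity": "in_vehicle", "place": "road", "time": "evening"})
situation = max(probs, key=probs.get)
```

With this toy history, the query is assigned to the "commuting" situation.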

The user emotion analysis unit may compute a Gaussian mixture distribution over the emotion feature vectors of the music in the user's recent listening history and determine the emotion with the largest probability value as the user's emotion.
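A minimal sketch of this step, assuming the mixture has already been fitted with one diagonal Gaussian component per emotion over a two-dimensional emotion feature vector: each recently played track is softly assigned to the components, the assignments are averaged over the listening history, and the emotion with the largest averaged probability is returned. The component parameters, the (valence, arousal) feature layout, and the emotion names are hypothetical.

```python
import math

# Hypothetical mixture: one diagonal Gaussian per emotion over (valence, arousal).
COMPONENTS = {
    "joy": {"weight": 0.4, "mean": (0.8, 0.7), "var": (0.02, 0.03)},
    "calm": {"weight": 0.3, "mean": (0.6, 0.2), "var": (0.03, 0.02)},
    "sorrow": {"weight": 0.3, "mean": (0.2, 0.3), "var": (0.03, 0.03)},
}


def _density(x, mean, var):
    """Diagonal-covariance Gaussian density at x."""
    d = 1.0
    for xi, mu, v in zip(x, mean, var):
        d *= math.exp(-((xi - mu) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
    return d


def infer_user_emotion(recent_vectors):
    """Average the per-track component responsibilities and pick the largest."""
    totals = dict.fromkeys(COMPONENTS, 0.0)
    for x in recent_vectors:
        weighted = {e: c["weight"] * _density(x, c["mean"], c["var"])
                    for e, c in COMPONENTS.items()}
        z = sum(weighted.values())
        for e, w in weighted.items():
            totals[e] += w / z
    probs = {e: t / len(recent_vectors) for e, t in totals.items()}
    return max(probs, key=probs.get), probs


# Emotion feature vectors of three recently played tracks (hypothetical values).
recent = [(0.25, 0.30), (0.15, 0.35), (0.75, 0.65)]
emotion, probs = infer_user_emotion(recent)
```

Two of the three tracks sit near the "sorrow" component, so that emotion receives the largest averaged probability in this example.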

The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, so the scope of rights of the disclosed technology should not be understood as being limited thereby.

An AI-based user-customized music recommendation service device according to one embodiment of the present invention can analyze the user's situation and music emotions using artificial intelligence (AI) and recommend customized music suited to that situation and emotion.

An AI-based user-customized music recommendation service device according to one embodiment of the present invention can use sensor data from a mobile device to infer the user's current situation through situation probability inference and recommend music suited to that situation.

An AI-based user-customized music recommendation service device according to one embodiment of the present invention can infer the user's emotion through pattern analysis of music emotions based on the user's music listening history and recommend music that matches that emotion.

FIG. 1 is a diagram illustrating a music recommendation service system according to the present invention.
FIG. 2 is a diagram illustrating the system configuration of the music recommendation service device of FIG. 1.
FIG. 3 is a diagram illustrating the functional configuration of the music recommendation service device of FIG. 1.
FIG. 4 is a flowchart illustrating an AI-based user-customized music recommendation service method according to the present invention.
FIG. 5 is a diagram illustrating one embodiment of a sound source characteristic-based emotion classification method according to the present invention.
FIG. 6 is a diagram illustrating another embodiment of a sound source characteristic-based emotion classification method according to the present invention.

The description of the present invention is merely an embodiment for structural or functional explanation, so the scope of rights of the present invention should not be construed as being limited by the embodiments described herein. That is, because the embodiments can be modified in various ways and can take various forms, the scope of rights of the present invention should be understood to include equivalents capable of realizing the technical idea. In addition, the objects or effects presented in the present invention do not mean that a specific embodiment must include all of them or only those effects, so the scope of rights of the present invention should not be understood as being limited thereby.

Meanwhile, the meanings of the terms used in this application should be understood as follows.

Terms such as "first" and "second" are intended to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be named a second component, and similarly a second component may be named a first component.

When a component is said to be "connected" to another component, it should be understood that it may be directly connected to that other component, but that other components may also exist in between. In contrast, when a component is said to be "directly connected" to another component, it should be understood that no other components exist in between. Other expressions that describe the relationship between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise, and terms such as "include" or "have" are intended to specify the presence of a stated feature, number, step, operation, component, part, or combination thereof, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The identifiers attached to each step (for example, a, b, c) are used for convenience of explanation and do not describe the order of the steps; unless the context clearly specifies a particular order, the steps may occur in an order different from that stated. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all kinds of recording devices that store data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. In addition, the computer-readable recording medium can be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted as having ideal or excessively formal meanings unless explicitly defined in this application.

FIG. 1 is a diagram illustrating a music recommendation service system according to the present invention.

Referring to FIG. 1, the music recommendation service system (100) may be implemented to include a user terminal (110), a music recommendation service device (130), and a database (150).

The user terminal (110) is a device capable of playing music and may correspond to a computing device that can use the AI-based user-customized music recommendation service. It may be implemented as a smartphone, a laptop, or a tablet PC, but is not necessarily limited thereto, and may also be implemented as various other mobile devices, including a connected car. Here, the user terminal (110) may include various sensors, such as a motion sensor, a position sensor, and an environment sensor, to provide the raw data needed to analyze the user's situation. The user terminal (110) may be connected to the music recommendation service device (130) through a network, and a plurality of user terminals (110) may be connected to the music recommendation service device (130) simultaneously.

The music recommendation service device (130) may be implemented as a server, corresponding to a computer or program, that collects the user's situation information from the various sensors of the user terminal (110) to understand the user's current situation (context awareness), infers emotion from the user's situation and listening history, and recommends music suited to that situation and emotion. The music recommendation service device (130) may be connected to the user terminal (110) through a wired network or a wireless network such as Bluetooth, WiFi, or LTE, and may transmit data to and receive data from the user terminal (110) through the network.

In one embodiment, the music recommendation service device (130) may operate in conjunction with the database (150) to store the data needed to provide the AI-based user-customized music recommendation service. Alternatively, unlike FIG. 1, the music recommendation service device (130) may be implemented with the database (150) included internally. The music recommendation service device (130) may also be implemented to include an API module for providing the music recommendation service. More specifically, the music recommendation service device (130) may provide external systems with an API interface for accessing information about user profiles, music profiles, and recommended music, and an external system may independently access the various data managed by the music recommendation service device (130) through that API interface.

The database (150) may correspond to a storage device that stores the various information needed during operation of the music recommendation service device (130). The database (150) may store information about the user's music listening history and information for situation probability inference using sensor data from the user terminal (110). It is not necessarily limited thereto, and may store information collected or processed in various forms while the music recommendation service device (130) provides the AI-based user-customized music recommendation service.

FIG. 2 is a diagram illustrating the system configuration of the music recommendation service device of FIG. 1.

Referring to FIG. 2, the music recommendation service device (130) may include a processor (210), a memory (230), a user input/output unit (250), and a network input/output unit (270).

The processor (210) may execute the procedures that process each step of the operation of the music recommendation service device (130), manage the memory (230) that is read or written throughout that process, and schedule synchronization between the volatile and non-volatile memory in the memory (230). The processor (210) may control the overall operation of the music recommendation service device (130) and is electrically connected to the memory (230), the user input/output unit (250), and the network input/output unit (270) to control the data flow among them. The processor (210) may be implemented as the CPU (Central Processing Unit) of the music recommendation service device (130).

The memory (230) may include an auxiliary storage device, implemented as non-volatile memory such as an SSD (Solid State Disk) or HDD (Hard Disk Drive), that stores all the data required by the music recommendation service device (130), and may include a main storage device implemented as volatile memory such as RAM (Random Access Memory).

The user input/output unit (250) may include an environment for receiving user input and an environment for outputting specific information to the user. For example, the user input/output unit (250) may include an input device with an adapter, such as a touch pad, touch screen, on-screen keyboard, or pointing device, and an output device with an adapter, such as a monitor or touch screen. In one embodiment, the user input/output unit (250) may correspond to a computing device accessed through a remote connection, in which case the music recommendation service device (130) may operate as an independent server.

The network input/output unit (270) provides a communication environment for connecting to the user terminal (110) through a network and may include, for example, an adapter for communication over a LAN (Local Area Network), MAN (Metropolitan Area Network), WAN (Wide Area Network), or VAN (Value Added Network).

FIG. 3 is a diagram illustrating the functional configuration of the music recommendation service device of FIG. 1.

Referring to FIG. 3, the music recommendation service device (130) may include a music database construction unit (310), a user situation analysis unit (330), a user emotion analysis unit (350), a preferred music prediction unit (370), a recommended music provision unit (390), and a control unit (not shown in FIG. 3).

The music database construction unit (310) may analyze sound sources using artificial intelligence (AI) to build a music database. Large domestic music providers currently tend to label tracks only with simple meta information, such as album name, track title, artist, and release date, and with a coarse genre, which limits music recommendation. The music database construction unit (310) may analyze sound source characteristics and, based on them, classify genre and emotion through a deep learning model to build the music database. In one embodiment, the music database construction unit (310) may analyze the BPM (Beats Per Minute), instrument composition, and other acoustic characteristics of a sound source and classify its genre through a deep learning model. Here, BPM is the unit that expresses tempo in music. Because an audio file can be converted into a spectrum, the music database construction unit (310) may identify the spectral pattern of the sound source, extract the interval between the first extracted beat and the last extracted beat through beat extraction processing, and automatically calculate the BPM through interpolation.
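The BPM calculation can be sketched as follows, assuming beat detection has already produced a list of beat timestamps in seconds. The text does not detail the interpolation step; here it is approximated by counting how many typical (median) beat intervals fit between the first and last beat, which compensates for beats the detector missed. The function name `estimate_bpm` and the median-based gap filling are illustrative assumptions.

```python
from statistics import median


def estimate_bpm(beat_times):
    """Estimate tempo in BPM from detected beat timestamps (seconds)."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beats")
    times = sorted(beat_times)
    span = times[-1] - times[0]  # interval between the first and last extracted beat
    if span <= 0:
        raise ValueError("beats must span a positive duration")
    intervals = [b - a for a, b in zip(times, times[1:])]
    typical = median(intervals)
    # Assumed interpolation: fill gaps left by missed detections by counting
    # how many typical intervals fit into the first-to-last span.
    n_intervals = max(1, round(span / typical))
    return 60.0 * n_intervals / span


# Beats every 0.5 s, with the beat at 2.0 s missed by the detector.
beats = [0.5, 1.0, 1.5, 2.5, 3.0, 3.5]
bpm = estimate_bpm(beats)
```

In this example the estimate is 120 BPM even though one beat is missing, because the span of 3.0 s holds six median intervals of 0.5 s.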

In addition, the music database construction unit (310) may infer the instrument composition of a sound source through a deep learning model. The instrument composition characterizes the mood of a sound source and can serve as an indicator for distinguishing detailed genres. Here, the deep learning instrument classification model is a model trained to extract characteristic values from the spectrum of each instrument's signal using a Hann (Hanning) window function, find the features that distinguish instrument sounds using a Bayes classification algorithm, and mathematically determine which instruments a sound source uses. Through this model, the music database construction unit (310) may obtain, for each instrument, a presence value (probability) expressed as a real number from 0 to 1 and thereby infer the instrument composition of the sound source.
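The combination of Hann-windowed spectral features and Bayes classification can be illustrated with the toy sketch below. It windows a frame, takes a plain DFT, summarizes the spectrum as normalized band energies, and feeds those to a Gaussian naive Bayes classifier with equal priors. Everything here is a simplifying assumption: the synthetic sine "instruments", the four-band features, and the fact that the two classes are treated as mutually exclusive, whereas the text describes an independent presence probability per instrument from a trained deep learning model.

```python
import cmath
import math


def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def band_features(frame, n_bands=4):
    """Hann-windowed magnitude spectrum summarised as normalised band energies."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    half = n // 2
    mags = [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(half)]
    size = half // n_bands
    energy = [sum(m * m for m in mags[b * size:(b + 1) * size]) for b in range(n_bands)]
    total = sum(energy) or 1.0
    return [e / total for e in energy]


def fit(rows_by_label, var_floor=1e-3):
    """Per-class mean and variance of each feature (Gaussian naive Bayes)."""
    model = {}
    for label, rows in rows_by_label.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + var_floor
            stats.append((mean, var))
        model[label] = stats
    return model


def presence_probabilities(model, feats):
    """Posterior over classes with equal priors, each value between 0 and 1."""
    log_like = {
        label: sum(-0.5 * math.log(2 * math.pi * var) - (f - mean) ** 2 / (2 * var)
                   for f, (mean, var) in zip(feats, stats))
        for label, stats in model.items()
    }
    top = max(log_like.values())
    weights = {label: math.exp(v - top) for label, v in log_like.items()}
    z = sum(weights.values())
    return {label: w / z for label, w in weights.items()}


def tone(bin_index, n=64):
    """Synthetic sine frame standing in for an instrument recording."""
    return [math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]


# Toy training set: low tones stand in for "bass", high tones for "flute".
training = {
    "bass": [band_features(tone(k)) for k in (2, 3, 4)],
    "flute": [band_features(tone(k)) for k in (26, 27, 28)],
}
model = fit(training)
probs = presence_probabilities(model, band_features(tone(5)))
```

For the low test tone, nearly all of the probability mass falls on the "bass" class.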

In addition, the music database construction unit (310) can extract acoustic features of a sound source, such as MFCC (Mel Frequency Cepstral Coefficient), Mel-spectrogram, centroid, and chromagram, using a Python spectrum extraction library. The music database construction unit (310) can assemble the BPM, instrument composition, and acoustic characteristics of the sound sources into a dataset and train a deep learning model on that dataset, using the existing genre classification data as labels, to classify music genres at a finer granularity than the existing labels.

In addition, the music database construction unit (310) can classify music emotions based on sound source characteristics. In one embodiment, it can extract data on the tempo, dynamics, noise, amplitude change, and brightness of the sound source, fit a normal distribution using the mean and standard deviation of each extracted quantity, and obtain each emotion as a probability. In another embodiment, it can extract features of the sound source with a two-dimensional CNN deep learning model and convert the extracted features into an emotion vector to classify the emotion the sound source expresses.

In addition, the music database construction unit (310) can analyze users' situations and emotions on SNS and apply them to music classification. By collecting and analyzing the situation, emotion, and style tag data published through news feeds, social data, and sites that provide public APIs, the unit can identify users' situations or emotions in relation to music. In one embodiment, the unit can collect social media mention data and label it through machine learning; perform morphological analysis on data that mentions a sound source or singer; assign scores according to sentence-keyword similarity, synonyms, and context; and sort and filter the data through Bayesian network learning to extract and analyze frequencies. Because the reliability of data collected from social media varies with the number of mentions and the influence of the site where it was posted, a social media mention collection data model can be constructed that takes into account how often the site is cited, the influence of the subscribers of the site (or post), and the number of subscribers. This can be defined as in Mathematical formula 1 below.

[Mathematical formula 1]

Here, the terms of the formula are, in order: the total number of mentions (citations) of a particular site; the number of link data items in which the sound source is mentioned; the traffic of the sites citing the particular site; and the traffic of the particular site.

Table 1 below shows an example of the social media mention collection data model.

[Table 1]

The user situation analysis unit (330) can analyze the user's situation by collecting data from a sensor of the user terminal (110) or an external API. In one embodiment, the unit can preset the metadata that serves as the basis for judging the user's situation and collect the data needed to generate that metadata. Here, the metadata may include user preference, activity state, place, and time. User preference is the type of music the user prefers. The activity state may include being in a vehicle such as a car (IN_VEHICLE), being on a bicycle (ON_BICYCLE), walking or running (ON_FOOT), running (RUNNING), standing still (STILL), a change in gravity when going up to a higher place or down to a lower place (TILTING), walking (WALKING), and unknown (UNKNOWN), and may also include state changes for a given activity such as start (Start), stop (Stop), and in progress (During). Places can be divided into user-centric important places and service-centric special places; may include the access frequency, time, location, and type of the place; and may include state changes such as leaving the place (Exiting), entering it (Entering), and already being in it (In). Time may include the time at which the user performs a specific action, the day of the week, every day, a specific time, or near a specific time, and may include entering a set time window (IN) as a state change.
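
One way to picture the preset metadata is as a small typed record. The sketch below is only illustrative: the enum members follow the activity states and state changes named in the description, while the class names, field names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Activity(Enum):
    IN_VEHICLE = "in a vehicle such as a car"
    ON_BICYCLE = "on a bicycle"
    ON_FOOT = "walking or running"
    RUNNING = "running"
    STILL = "standing still"
    TILTING = "gravity change (going up or down)"
    WALKING = "walking"
    UNKNOWN = "unknown"


class ActivityChange(Enum):
    START = "Start"
    STOP = "Stop"
    DURING = "During"


class PlaceChange(Enum):
    EXITING = "Exiting"
    ENTERING = "Entering"
    IN = "In"


@dataclass(frozen=True)
class SituationMetadata:
    preferred_music: str             # user preference: preferred type of music
    activity: Activity
    activity_change: ActivityChange
    place_type: str                  # hypothetical values such as "home" or "gym"
    place_change: PlaceChange
    weekday: str
    hour: int                        # 0 to 23


sample = SituationMetadata("indie rock", Activity.ON_FOOT, ActivityChange.DURING,
                           "gym", PlaceChange.ENTERING, "Wednesday", 15)
print(sample.activity.name, sample.place_change.value)
```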

The user situation analysis unit (330) can collect data from sensors of the user terminal (110) or an external API, focusing on the raw data from which the set metadata can be generated. The user terminal (110) may contain numerous built-in sensors, through which its motion, location, and surrounding environment can be measured. For example, the user terminal (110) may include an acceleration sensor, an illuminance sensor, a GPS sensor, a proximity sensor, a microphone, and an acoustic sensor. In one embodiment, the unit can collect only the data needed to generate the set metadata from among the sensors of the user terminal (110), and can collect data only when it has changed compared with previously collected data, thereby reducing communication cost and the volume of data to be processed.
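
The change-only collection policy can be sketched in a few lines. The class name `ChangeOnlyCollector`, the sensor names, and the sample readings are assumptions made for illustration; the description does not specify how changes are detected, so plain inequality with the last stored value is used here.

```python
_MISSING = object()   # sentinel meaning "no previous reading for this sensor"


class ChangeOnlyCollector:
    """Keeps the last value per sensor and reports only readings that changed."""

    def __init__(self, wanted_sensors):
        self.wanted = set(wanted_sensors)
        self.last_values = {}

    def collect(self, readings):
        changed = {}
        for name, value in readings.items():
            if name not in self.wanted:
                continue                          # not needed for the preset metadata
            if self.last_values.get(name, _MISSING) != value:
                changed[name] = value
                self.last_values[name] = value
        return changed


collector = ChangeOnlyCollector({"gps", "wifi_ssid"})
first = collector.collect({"gps": (37.50, 127.03), "wifi_ssid": "home", "light": 300})
second = collector.collect({"gps": (37.50, 127.03), "wifi_ssid": "office", "light": 320})
print(first)
print(second)
```

For noisy sensors such as GPS, a real system would likely compare against a tolerance rather than exact equality; that threshold is outside what the description states.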

The user situation analysis unit (330) can analyze the user's situation from the sensor data of the user terminal (110) through situation probability inference using POI (Point of Interest) clustering, naive Bayes, and a hierarchical Bayesian network. For example, by learning from GPS sensor information, WiFi (SSID), and Bluetooth sensor values, the unit can determine the user's main locations (home, office, etc.), the weather, the movement speed (exercising, driving, etc.), the season, and whether an exceptional situation exists, and thereby conclude that the user is currently on the way to exercise on a Wednesday afternoon.
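
The naive Bayes part of the situation probability inference can be illustrated as below. Everything numeric here is invented for the example: the situation names, priors, and likelihood tables are hypothetical and would in practice be learned from the user's sensor history. POI clustering and the hierarchical Bayesian network are not covered by this sketch.

```python
import math

# Hypothetical priors and per-situation likelihoods of discretized observations.
PRIORS = {"commuting": 0.5, "going_to_exercise": 0.3, "resting_at_home": 0.2}
LIKELIHOODS = {
    "commuting": {"place=gym_route": 0.1, "speed=running": 0.1, "time=weekday_pm": 0.4},
    "going_to_exercise": {"place=gym_route": 0.7, "speed=running": 0.6, "time=weekday_pm": 0.5},
    "resting_at_home": {"place=gym_route": 0.05, "speed=running": 0.05, "time=weekday_pm": 0.3},
}


def infer_situation(observations, unseen=1e-3):
    """Return the posterior probability of each situation under naive Bayes."""
    log_scores = {}
    for situation, prior in PRIORS.items():
        score = math.log(prior)
        for obs in observations:
            # Observations are treated as independent given the situation.
            score += math.log(LIKELIHOODS[situation].get(obs, unseen))
        log_scores[situation] = score
    top = max(log_scores.values())
    weights = {s: math.exp(v - top) for s, v in log_scores.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


posterior = infer_situation(["place=gym_route", "speed=running", "time=weekday_pm"])
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```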

The user emotion analysis unit (350) can infer the user's emotion by linking the user's situation to emotion. In one embodiment, the unit can infer the user's current emotional state by analyzing the user's activity state. Here, the unit can analyze the user's emotion using a pre-trained emotion analysis model that takes as input at least two of the activity state, place, and time generated by the user situation analysis unit (330).

The user emotion analysis unit (350) can also infer the user's emotion by analyzing the pattern of emotions in the music the user has recently listened to. In one embodiment, the unit can compute a Gaussian mixture density over the emotion feature vectors of the music the user listened to and determine the emotion of the GMM with the highest probability value as the user's emotion.
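
A minimal sketch of the GMM-based selection follows. It assumes one diagonal-covariance Gaussian mixture per emotion over a two-dimensional (arousal, valence) feature vector; the mixture parameters, emotion names, and listening history are all hypothetical. The emotion whose mixture gives the recent tracks the highest likelihood is chosen.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at point x."""
    return math.prod(math.exp(-(xi - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                     for xi, m, v in zip(x, mean, var))


def mixture_log_likelihood(vectors, components):
    """Sum of log mixture densities over all feature vectors."""
    total = 0.0
    for x in vectors:
        density = sum(w * gaussian_pdf(x, mean, var) for w, mean, var in components)
        total += math.log(density + 1e-300)      # guard against log(0)
    return total


# Hypothetical per-emotion mixtures: lists of (weight, mean, variance).
EMOTION_GMMS = {
    "happy": [(0.6, (0.6, 0.7), (0.05, 0.05)), (0.4, (0.3, 0.8), (0.05, 0.05))],
    "calm": [(1.0, (-0.5, 0.4), (0.05, 0.05))],
    "angry": [(1.0, (0.7, -0.6), (0.05, 0.05))],
}

# Assumed emotion feature vectors of recently listened tracks.
recent_tracks = [(0.5, 0.7), (0.4, 0.8), (0.6, 0.6)]
scores = {e: mixture_log_likelihood(recent_tracks, g) for e, g in EMOTION_GMMS.items()}
user_emotion = max(scores, key=scores.get)
print(user_emotion)
```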

The preferred music prediction unit (370) can separately predict preferred music based on the user's situation and preferred music based on the user's emotion, and determine the music to recommend by mapping between the two predictions. In one embodiment, the unit can select, from the music stored in the music database built by the music database construction unit (310), N pieces of music (where N is a natural number) in genres whose sound source characteristics and situation tags match the user's situation, and generate a first preferred music list. It can likewise select N pieces of music classified under a music emotion that matches the user's emotion and generate a second preferred music list. The unit can then determine the recommended music list by merging the first and second preferred music lists.
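
The description does not state how the two lists are merged, so the following sketch picks one plausible rule as an assumption: tracks that appear in both lists come first, then the remaining tracks are interleaved, with duplicates removed. The function name and track identifiers are hypothetical.

```python
from itertools import zip_longest


def merge_preferred_lists(situation_list, emotion_list, limit=None):
    """Merge the situation-based and emotion-based lists into one ranking."""
    emotion_set = set(emotion_list)
    merged = [t for t in situation_list if t in emotion_set]   # matches both criteria
    seen = set(merged)
    for pair in zip_longest(situation_list, emotion_list):     # interleave the rest
        for track in pair:
            if track is not None and track not in seen:
                merged.append(track)
                seen.add(track)
    return merged if limit is None else merged[:limit]


first_list = ["a", "b", "c"]      # first preferred music list (situation)
second_list = ["c", "d", "a"]     # second preferred music list (emotion)
print(merge_preferred_lists(first_list, second_list))
```

Placing the overlap first reflects the idea that a track matching both the situation and the emotion is the strongest candidate; other merge rules, such as weighted scoring, would fit the description equally well.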

The recommended music providing unit (390) can provide the recommended music list to the user terminal (110), retrieve from the music database the music the user selects from that list, and provide it to the user terminal (110). In one embodiment, if the user's situation or emotion changes while the user is listening to recommended music, the unit can update the recommended music list to match the change and provide it to the user terminal (110).

The control unit (not shown in FIG. 3) controls the overall operation of the music recommendation service device (130) and can manage the control flow or data flow among the music database construction unit (310), the user situation analysis unit (330), the user emotion analysis unit (350), the preferred music prediction unit (370), and the recommended music providing unit (390).

FIG. 4 is a flowchart illustrating the artificial intelligence-based user-customized music recommendation service method according to the present invention.

Referring to FIG. 4, the music recommendation service device (130) can build a music database by analyzing sound sources with artificial intelligence through the music database construction unit (310) (step S410). The music database construction unit (310) can classify sound sources into fine-grained genres and emotions with a deep learning model, and can label situation, emotion, and style tag data by analyzing sound source mention data collected from social media such as Facebook, Twitter, community comments, blogs, and news. This enables fine-grained music recommendations that fit the user's needs.

In addition, the music recommendation service device (130) can collect sensor data of the user terminal (110) through the user situation analysis unit (330) and analyze the user's current situation through situation probability inference (step S430). Through the user emotion analysis unit (350), the device can analyze the user's emotion from the user's current situation or from the pattern of emotions in recently listened music (step S450).

Through the preferred music prediction unit (370), the music recommendation service device (130) can predict a first preferred music list according to the user's situation and a second preferred music list according to the user's emotion, and then merge them to determine the recommended music list (step S470). Through the recommended music providing unit (390), the device provides the recommended music list tailored to the user's situation and emotion to the user terminal (110) (step S490).

FIG. 5 is a diagram illustrating one embodiment of the sound source characteristic-based emotion classification method according to the present invention.

Referring to FIG. 5, about 2,000 uniform data items can be prepared using 12 labels covering various emotions such as excited, happy, pleased, annoying, and angry. A Python library can then be used to extract the tempo, dynamics, noise, amplitude change, and brightness of each sound source in the dataset, and a normal distribution can be fitted using the mean and standard deviation of each extracted quantity. For a new sample sound source, the mean and standard deviation values of its tempo, dynamics, noise, amplitude change, and brightness are substituted into the previously obtained normal distribution curve for each quantity to derive probability values. Each probability is multiplied by the weights assigned to the X axis and Y axis to obtain X (Arousal) and Y (Valence) coordinates; a circle is then drawn centered on those coordinates, and the area it occupies in each region of a plane divided in advance into 12 regions gives the probability of each emotion.
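
The pipeline of FIG. 5 can be sketched roughly as follows. Several details are assumptions rather than facts from the description: the 12 regions are modeled as equal 30-degree wedges around the origin, the circle area per wedge is approximated by counting points on a regular grid, and the fitted means, standard deviations, axis weights, and radius are made-up numbers.

```python
import math


def normal_pdf(x, mean, std):
    """Value of the fitted normal distribution curve at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def to_arousal_valence(values, x_weights, y_weights):
    """Weighted sums giving the X (Arousal) and Y (Valence) coordinates."""
    x = sum(p * w for p, w in zip(values, x_weights))
    y = sum(p * w for p, w in zip(values, y_weights))
    return x, y


def sector_shares(cx, cy, radius, sectors=12, grid=101):
    """Approximate the share of the circle's area falling in each angular sector."""
    counts = [0] * sectors
    step = 2 * radius / (grid - 1)
    width = 2 * math.pi / sectors
    for i in range(grid):
        for j in range(grid):
            dx, dy = -radius + i * step, -radius + j * step
            if dx * dx + dy * dy > radius * radius:
                continue                              # grid point outside the circle
            angle = math.atan2(cy + dy, cx + dx) % (2 * math.pi)
            counts[min(int(angle / width), sectors - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical fitted curves for two features (tempo, brightness) of a new track.
values = [normal_pdf(128.0, 120.0, 30.0), normal_pdf(0.6, 0.5, 0.2)]
x, y = to_arousal_valence(values, x_weights=[40.0, 0.2], y_weights=[10.0, 0.3])
shares = sector_shares(x, y, radius=0.3)
print(max(range(12), key=shares.__getitem__))         # index of the dominant sector
```

Because the circle can straddle several regions, a track near a boundary receives non-zero probability for more than one emotion, which is the point of using an area rather than a single point.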

FIG. 6 is a diagram illustrating another embodiment of the sound source characteristic-based emotion classification method according to the present invention.

Referring to FIG. 6, a stereo audio file is converted to a mono file, the audio data is decomposed into frequency components through a Fourier transform, and a spectrogram conversion is performed at low resolution to extract two-dimensional data that most closely represents what the human ear hears. After preprocessing, the two-dimensional spectrogram data is fed to a music emotion classification model to extract features, and the extracted features are converted into an emotion vector to classify the emotion the sound source expresses. Here, the music emotion classification model is a two-dimensional CNN deep learning model.
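
The preprocessing steps before the CNN (stereo to mono, Fourier transform, spectrogram) can be sketched with the standard library alone. This is an illustration under assumptions: the frame size, hop size, and synthetic test signal are arbitrary, a naive DFT stands in for an FFT, and no mel scaling or CNN is included.

```python
import math


def to_mono(stereo):
    """Average the left and right channels of (left, right) sample pairs."""
    return [(left + right) / 2 for left, right in stereo]


def spectrogram(samples, frame=32, hop=16):
    """Return a 2-D list: one row of DFT magnitudes per (overlapping) frame."""
    rows = []
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = samples[start:start + frame]
        row = []
        for k in range(frame // 2 + 1):               # non-negative frequency bins
            re = sum(v * math.cos(2 * math.pi * k * t / frame) for t, v in enumerate(chunk))
            im = sum(v * math.sin(2 * math.pi * k * t / frame) for t, v in enumerate(chunk))
            row.append(math.hypot(re, im))
        rows.append(row)
    return rows


# Synthetic stereo signal: the same sine (4 cycles per 32 samples) on both channels.
stereo = [(math.sin(2 * math.pi * 4 * t / 32),) * 2 for t in range(128)]
spec = spectrogram(to_mono(stereo))
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
print(len(spec), peak_bins)
```

A small frame size gives the coarse, low-resolution time-frequency image the description mentions; the resulting two-dimensional array is what a 2-D CNN would take as input.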

The present invention can increase the accuracy of emotion classification by combining the emotion probability values extracted in FIG. 5 with the emotion vector values extracted in FIG. 6.

An artificial intelligence-based user-customized music recommendation service device according to one embodiment can predict the user's situation and emotion and recommend a personalized music playlist, thereby satisfying the user's needs. Furthermore, it can introduce a wide variety of music to consumers in a personalized way, without being limited to particular popular music, and can give creators an opportunity to publicize their creative work.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the invention can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

100: Artificial intelligence-based user-customized music recommendation service system
110: User terminal 130: Music recommendation service device
150: Database
210: Processor 230: Memory
250: User input/output unit 270: Network input/output unit
310: Music database construction unit 330: User situation analysis unit
350: User emotion analysis unit 370: Preferred music prediction unit
390: Recommended music providing unit

Claims

A music database construction unit that analyzes sound sources based on artificial intelligence (AI), subdivides the labeled genres based on BPM, instrument composition, and acoustic characteristics, classifies music emotions, and updates the labels to build a music database;
A user situation analysis unit that presets metadata, including user-preferred music, activity state, place, and time, serving as criteria for judging the user's situation, collects the raw data required to generate the set metadata from a sensor of the user terminal or an external API, and generates metadata related to the user's situation through semantic conversion of the collected data to infer the user's situation;
A user emotion analysis unit that infers the user's emotion using a pre-trained emotion analysis model that takes as input at least two of the activity state, place, and time in the user's situation;
A preferred music prediction unit that predicts preferred music based on the user's situation and preferred music based on the user's emotion, and determines music to recommend through mapping between the predicted preferred music; and
Including a recommended music providing unit that provides a recommended music list to the user terminal, retrieves from the music database the music selected by the user from the recommended music list, and provides it to the user terminal,
wherein the music database construction unit performs:
A step of examining the spectral profile of the sound source, extracting the interval between the first extracted beat and the last extracted beat through beat extraction, calculating the BPM through interpolation, inferring the composition of instruments used in the sound source through a deep learning instrument classification model, extracting acoustic characteristics of the sound source through a Python spectrum extraction library, and creating a dataset of the BPM, instrument composition, and acoustic characteristics of the sound source to classify the labeled music genres at a finer granularity; and
i) extracting data on the tempo, noise, and amplitude change of the sound source, obtaining emotions as probability values by fitting a normal distribution using the mean and standard deviation of each extracted quantity, extracting features of the sound source through a two-dimensional CNN deep learning music emotion classification model, converting the extracted features into emotion vector values, and classifying music emotions through a combination of the emotion probability values and the emotion vector values, or ii) collecting social media mention data, labeling it through machine learning, performing morphological analysis on data mentioning the sound source and the singer, assigning scores according to sentence-keyword similarity, synonyms, and context, analyzing users' situations and emotions regarding music through Bayesian network learning, and applying the result to music classification,
wherein the user situation analysis unit
collects data only when a change has occurred compared with previously collected data, and
wherein the preferred music prediction unit
An artificial intelligence-based user-customized music recommendation service device characterized in that a first preferred music list is created by selecting N pieces of music (where N is a natural number) in genres having sound source characteristics and situation tags matching the user's situation from among the music stored in the music database built by the music database construction unit, a second preferred music list is created by selecting N pieces of music classified under music emotions matching the user's emotion from among the music stored in the music database, and a recommended music list is determined by merging the first preferred music list and the second preferred music list.

(Deleted)

The artificial intelligence-based user-customized music recommendation service device of claim 1, characterized in that the instrument classification model is a model trained to extract feature values from the spectrum of each instrument's signal using a Hanning window function, find features that distinguish instrument sounds using a Bayes classification algorithm, and mathematically determine the probability that each instrument is used.

(Deleted)

The artificial intelligence-based user-customized music recommendation service device of claim 1, characterized in that the user situation analysis unit analyzes the user's situation from the collected raw data through situation probability inference using POI (Point of Interest) clustering, naive Bayes, and a hierarchical Bayesian network.

The artificial intelligence-based user-customized music recommendation service device of claim 1, characterized in that the user emotion analysis unit computes a Gaussian mixture density over the emotion feature vectors of the music listened to, based on the user's recent music listening history, and determines the emotion with the highest probability value as the user's emotion.