KR102907945B1

KR102907945B1 - Artificial intelligence based mididata generating solution providing system

Info

Publication number: KR102907945B1
Application number: KR1020250041068A
Authority: KR
Inventors: 김진갑
Original assignee: 김진갑
Priority date: 2025-03-31
Filing date: 2025-03-31
Publication date: 2026-01-02
Anticipated expiration: 2045-03-31

Abstract

A system for providing an AI-based MIDI data extraction solution is provided. To generate the MIDI data of a MIDI file to be input into a karaoke accompaniment machine, the system includes a user terminal for selecting an original music file and an extraction service providing server. The server includes: a song analysis unit that inputs the original music file into a pre-built AI to convert it into a MIDI file and then extracts tempo, key, tonality, song structure, pitch range, and genre from the MIDI file as feature data; a setting unit that measures the similarity between a vector embedding the MIDI data of a pre-built MIDI data database and a vector embedding the feature data, extracts similar MIDI data based on the similarity, and sets the pattern of instrument, equalizer, and filter values applied to the similar MIDI data as candidate MIDI data; and a generation unit that performs a preset sound optimization process on the candidate MIDI data to generate MIDI data for the original music file.

Description

AI-based MIDI data extraction solution provision system {ARTIFICIAL INTELLIGENCE BASED MIDIDATA GENERATING SOLUTION PROVIDING SYSTEM}

The present invention relates to a system for providing an AI-based MIDI data extraction solution, and provides a method for automatically extracting MIDI data from an original music file.

MIDI (Musical Instrument Digital Interface) is a digital protocol for exchanging musical data between computers and musical instruments. Unlike MP3 or WAV, which store sound directly, MIDI stores performance data: which note to play, when, and how loudly. What is loaded into a karaoke accompaniment machine is not an MR (Music Recorded) track in MP3 or WAV form, such as the backing track an actual singer performs to, but MIDI-standard data (a file). When an ordinary person uses the MR that singers use, the vocal melody line is not prominent and the entry point can be confusing, so it is not easy to sing along. In addition, the key and speed cannot be adjusted. For these reasons, MIDI-standard data (files) are still used in karaoke accompaniment machines. Such MIDI is currently produced by playing back the original music file, having a trained musician transcribe it by ear, entering the score for each instrument into a dedicated program, and mixing the accompaniment.

Methods for generating or playing back accompaniment data have been studied and developed. In this regard, the prior art Korean Patent No. 10-2290901 (published August 19, 2021) and Korean Patent No. 10-1881854 (published July 25, 2018) disclose, respectively: a configuration in which, when a song is requested from a user terminal, TTS (Text-to-Speech) data to be output between lyric lines is generated based on a lyrics format including time information for each lyric line and for each lyric syllable of the song, the TTS data is combined with MR data into accompaniment data and transmitted to the user terminal, and the user terminal executes the accompaniment data; and a configuration in which MIDI data and lyrics data are extracted and then played back so that MIDI can be played without a karaoke accompaniment machine.

However, the former does not generate the MR that serves as the accompaniment sound source; it only discloses a configuration that applies after the MR data has already been generated. The latter likewise only discloses a configuration that allows already existing MR data to be output on devices other than a karaoke accompaniment machine. As described above, creating an MR requires trained musicians to listen to the actual original music file and produce the result manually, one song at a time, which consumes excessive time and cost, and even then the quality of the results varies widely. In other words, each MR inevitably reflects the personal background knowledge, experience, taste, and opinion of the musician who made it. For example, during arrangement a long intro may be simplified, a rubato passage may be deleted, or the fade-out or fermata of a demanding performance may be removed. Accordingly, research and development of a system that can convert an original music file into MIDI data using AI is required.

One embodiment of the present invention can provide a system for providing an AI-based MIDI data extraction solution that minimizes manual work so that results do not vary from person to person and that keeps MIDI in steady supply through an automated process. The system converts an original music file into a MIDI file using AI, analyzes song information by extracting tempo, key/tonality, song structure, pitch range, and genre from the MIDI file, sets the pattern of the most similar MIDI data as candidate MIDI data through a similarity comparison with the MIDI data stored in a pre-built MIDI data database, and performs a preset sound optimization process on the candidate MIDI data, thereby automatically generating MIDI data for the original music file. However, the technical problem to be solved by the present embodiment is not limited to the one described above, and other technical problems may exist.

As a technical means for achieving the above-described technical problem, one embodiment of the present invention includes, in order to generate the MIDI data of a MIDI file to be input into a karaoke accompaniment machine, a user terminal for selecting an original music file and an extraction service providing server. The server includes: a song analysis unit that inputs the original music file into a pre-built AI to convert it into a MIDI file and then extracts tempo, key, tonality, song structure, pitch range, and genre from the MIDI file as feature data; a setting unit that measures the similarity between a vector embedding the MIDI data of a pre-built MIDI data database and a vector embedding the feature data, extracts similar MIDI data based on the similarity, and sets the pattern of instrument, equalizer, and filter values applied to the similar MIDI data as candidate MIDI data; and a generation unit that performs a preset sound optimization process on the candidate MIDI data to generate MIDI data for the original music file.

According to any one of the above-described means for solving the problem, manual work is minimized so that results do not vary from person to person, and MIDI is kept in steady supply through an automated process. The original music file is converted into a MIDI file using AI, song information is analyzed by extracting tempo, key/tonality, song structure, pitch range, and genre from the MIDI file, the pattern of the most similar MIDI data is set as candidate MIDI data through a similarity comparison with the MIDI data stored in a pre-built MIDI data database, and a preset sound optimization process is performed on the candidate MIDI data, so that MIDI data for the original music file can be generated automatically. As a result, person-to-person variation is avoided, and a sufficient supply of MIDI prevents a shortage of the core asset of karaoke businesses and maximizes revenue.

FIG. 1 is a diagram illustrating a system for providing an AI-based MIDI data extraction solution according to one embodiment of the present invention.
FIG. 2 is a block diagram illustrating the extraction service providing server included in the system of FIG. 1.
FIGS. 3 and 4 are diagrams illustrating an example in which the AI-based MIDI data extraction service according to one embodiment of the present invention is implemented.
FIG. 5 is a flowchart illustrating a method for providing the AI-based MIDI data extraction service according to one embodiment of the present invention.

Below, embodiments of the present invention are described in detail with reference to the attached drawings so that those skilled in the art can easily implement them. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description have been omitted for clarity, and similar reference numerals are used for similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. Furthermore, when a part is said to "include" a component, this means that, unless specifically stated to the contrary, it may further include other components rather than excluding them, and it should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The terms of degree "about," "substantially," and the like used throughout the specification mean at or near the stated numerical value when manufacturing and material tolerances inherent to the stated meaning are given, and are used to prevent unscrupulous infringers from unfairly exploiting disclosures in which precise or absolute numerical values are mentioned to aid understanding of the present invention. The terms "step of (doing)" or "step of" used throughout the specification do not mean "step for."

In this specification, the term "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. The term "unit" is not limited to software or hardware; a "unit" may be configured to reside on an addressable storage medium or may be configured to run on one or more processors. Accordingly, as an example, a "unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided within the components and "units" may be combined into a smaller number of components and "units" or further separated into additional components and "units." In addition, the components and "units" may be implemented to run on one or more CPUs within a device or a secure multimedia card.

Some of the operations or functions described herein as being performed by a terminal, apparatus, or device may instead be performed by a server connected to that terminal, apparatus, or device. Similarly, some of the operations or functions described herein as being performed by a server may instead be performed by a terminal, apparatus, or device connected to that server.

In this specification, some of the operations or functions described as mapping or matching with a terminal may be interpreted as mapping or matching the terminal's unique number or an individual's identification information, which is the identifying data of the terminal.

The present invention will now be described in detail with reference to the attached drawings.

FIG. 1 is a diagram illustrating a system for providing an AI-based MIDI data extraction solution according to one embodiment of the present invention. Referring to FIG. 1, the system (1) for providing an AI-based MIDI data extraction solution may include at least one user terminal (100), an extraction service providing server (300), and at least one karaoke accompaniment machine (400). However, the system (1) of FIG. 1 is merely one embodiment of the present invention, and the present invention is not to be construed as limited by FIG. 1.

The components of FIG. 1 are generally connected via a network (200). For example, as illustrated in FIG. 1, the at least one user terminal (100) may be connected to the extraction service providing server (300) via the network (200). The extraction service providing server (300) may be connected to the at least one user terminal (100) and the at least one karaoke accompaniment machine (400) via the network (200). Likewise, the at least one karaoke accompaniment machine (400) may be connected to the extraction service providing server (300) via the network (200).

Here, a network means a connection structure that enables information exchange between nodes such as a plurality of terminals and servers. Examples of such a network include a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), 5GPP (5th Generation Partnership Project), 5G NR (New Radio), 6G (6th Generation of Cellular Networks), LTE (Long Term Evolution), WIMAX (World Interoperability for Microwave Access), Wi-Fi, the Internet, LAN (Local Area Network), Wireless LAN (Wireless Local Area Network), WAN (Wide Area Network), PAN (Personal Area Network), RF (Radio Frequency), Bluetooth networks, NFC (Near-Field Communication) networks, satellite broadcasting networks, analog broadcasting networks, and DMB (Digital Multimedia Broadcasting) networks.

In the following, the term "at least one" is defined as covering both the singular and the plural. Even where the term "at least one" is not used, it is self-evident that each component may exist in the singular or the plural and may mean either. Whether each component is provided in the singular or the plural may vary depending on the embodiment.

The at least one user terminal (100) may be a terminal of a person in charge or an administrator who inputs an original music file using a web page, app page, program, or application related to the AI-based MIDI data extraction service, and who edits the extracted MIDI data or manages its quality.

Here, the at least one user terminal (100) may be implemented as a computer capable of connecting to a remote server or terminal via a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop computer equipped with a web browser. The at least one user terminal (100) may also be implemented as a terminal capable of connecting to a remote server or terminal via a network. For example, the at least one user terminal (100) may be a wireless communication device that ensures portability and mobility, including all types of handheld wireless communication devices such as a navigation device, PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal, a smartphone, a smartpad, or a tablet PC.

The extraction service providing server (300) may be a server that provides the AI-based MIDI data extraction service web page, app page, program, or application. When an original music file is input at the user terminal (100), the extraction service providing server (300) may generate MIDI data for the original music file and provide it to the user terminal (100). In addition, when the MIDI data is edited at the user terminal (100), the server may store the edited MIDI data and then generate a MIDI file to be played on the karaoke accompaniment machine (400).

Here, the extraction service providing server (300) may be implemented as a computer capable of connecting to a remote server or terminal via a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop computer equipped with a web browser.

The at least one karaoke accompaniment machine (400) may be a device that receives and plays MIDI files using a web page, app page, program, or application related to the AI-based MIDI data extraction service. The karaoke accompaniment machine (400) may be an accompaniment machine located in a karaoke room, as shown in FIG. 1, but may also be an ordinary smartphone that outputs MIDI files.

Here, the at least one karaoke accompaniment machine (400) may be implemented as a computer capable of connecting to a remote server or terminal via a network. The computer may include, for example, a navigation device, or a notebook, desktop, or laptop computer equipped with a web browser. The at least one karaoke accompaniment machine (400) may also be implemented as a terminal capable of connecting to a remote server or terminal via a network. For example, the at least one karaoke accompaniment machine (400) may be a wireless communication device that ensures portability and mobility, including all types of handheld wireless communication devices such as a navigation device, PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), or Wibro (Wireless Broadband Internet) terminal, a smartphone, a smartpad, or a tablet PC.

FIG. 2 is a block diagram illustrating the extraction service providing server included in the system of FIG. 1, and FIGS. 3 and 4 are diagrams illustrating an example in which the AI-based MIDI data extraction service according to one embodiment of the present invention is implemented.

Referring to FIG. 2, the extraction service providing server (300) may include a song analysis unit (310), a setting unit (320), a generation unit (330), and an accompaniment playback unit (340).

When the extraction service providing server (300) according to one embodiment of the present invention, or another server (not shown) operating in conjunction with it, transmits an AI-based MIDI data extraction service application, program, app page, web page, or the like to the at least one user terminal (100) and the at least one karaoke accompaniment machine (400), the at least one user terminal (100) and the at least one karaoke accompaniment machine (400) may install or open it. The service program may also run on the at least one user terminal (100) and the at least one karaoke accompaniment machine (400) by means of a script executed in a web browser. Here, a web browser is a program that enables use of the web (WWW: World Wide Web) service by receiving and displaying hypertext written in HTML (Hyper Text Mark-up Language), and includes, for example, Chrome, Microsoft Edge, Safari, Firefox, Whale, and UC Browser. An application means an application program on a terminal, and includes, for example, an app running on a mobile terminal (smartphone).

<Song Information Analysis>

<MIDI File Conversion>

Referring to FIG. 2, the song analysis unit (310) may input the original music file into a pre-built AI to convert it into a MIDI file, and then extract tempo, key, tonality, song structure, pitch range, and genre from the MIDI file as feature data. The user terminal (100) may select the original music file in order to generate the MIDI data of the MIDI file to be input into the karaoke accompaniment machine.

Here, the pre-built AI may be an automatic conversion program that analyzes the original music file and converts it into a MIDI file. Examples include, but are not limited to, AnthemScore, an AI-based automatic conversion program; Melodyne, an advanced audio pitch analysis program; Sonic Visualiser, which converts to MIDI after spectrum analysis; and AmazingMIDI, for melody extraction. Alternatively, the conversion may be performed in a DAW (Digital Audio Workstation) such as FL Studio, Ableton Live, Cubase, or Logic Pro. Ableton Live provides the Convert Melody to MIDI function, FL Studio can convert audio into a MIDI file with the NewTone plug-in, and Logic Pro X provides the Flex Pitch function. Audio can also be converted into a MIDI file simply through a website, for example Bear Audio, Evano, or AudioToMidi. Alternatively, the conversion can be coded directly in Python, using the Librosa library to analyze the audio and convert it into a MIDI file.
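As a rough illustration of what the last step of such a coded conversion involves, the sketch below maps a detected fundamental frequency to a MIDI note number using the standard equal-temperament relation (A4 = 440 Hz = note 69). It deliberately does not call Librosa or any other tool named above; the function name `frequency_to_midi_note` and the assumption that a pitch tracker has already produced a frequency in hertz are illustrative only and are not taken from the specification.

```python
import math


def frequency_to_midi_note(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Map a fundamental frequency in Hz to the nearest MIDI note number.

    Hypothetical helper: assumes 12-tone equal temperament with A4 = note 69.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # 12 semitones per octave; each octave doubles the frequency.
    note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    if not 0 <= note <= 127:
        raise ValueError("frequency is outside the MIDI note range 0-127")
    return note
```

A real converter would additionally need onset detection, note durations, and velocities before a MIDI file could be written; this sketch covers only the pitch mapping.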

The most effective approach is to convert the original music file into a MIDI file using one of the automated programs described above and then perform additional editing in a DAW. Accordingly, in one embodiment of the present invention, the original music file is converted into a MIDI file, the information that makes up the song is analyzed through song analysis, similar MIDI data is extracted, the pattern of the similar MIDI data is set as candidate MIDI data, and a sound optimization process is then applied to generate MIDI data for the original music file.
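The similar-MIDI-data lookup in this pipeline can be sketched as a nearest-neighbor search over embedding vectors. The specification only says that a similarity between embedded vectors is measured; the use of cosine similarity, the function names `cosine_similarity` and `most_similar`, and the dictionary-based database are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(
    feature_vec: Sequence[float],
    database: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the top_k (entry id, similarity) pairs, best match first.

    `database` is a hypothetical mapping from a stored MIDI data id to its
    embedding vector.
    """
    scored = [(cosine_similarity(feature_vec, vec), key)
              for key, vec in database.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:top_k]]
```

Under these assumptions, the instrument, equalizer, and filter settings stored with the best-matching entry would then be copied as the candidate MIDI data before the sound optimization step.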

<Song Analysis>

Once the original music file has been converted into a MIDI file, the song analysis unit (310) may begin song analysis to extract ① tempo, ② key and tonality, ③ song structure, ④ pitch range, and ⑤ genre. ① To analyze the tempo, the song analysis unit (310) may extract the tempo meta event (Tempo Meta Event) from the MIDI file. The tempo meta event is an event that sets the tempo (speed) of the song in the MIDI file. A MIDI file stores only performance information, and the tempo meta event controls the playback speed by setting the BPM (beats per minute) of the song.

Tempo meta event
BPM (tempo) setting: determines the overall tempo of the song in a track of the MIDI file
Tempo change mid-song: possible, for example from 120 BPM to 90 BPM
Editing tool: can be edited in a MIDI sequencer or DAW
Code format: FF 51 03 tt tt tt

In other words, the tempo of a MIDI file can be adjusted freely through the tempo meta event. Here, FF 51 identifies the tempo meta event, 03 is the data length (3 bytes), and tt tt tt is the duration of one quarter note in microseconds. For example, 120 BPM is written as FF 51 03 07 A1 20, because 07 A1 20 (hexadecimal) = 500,000 microseconds (0.5 seconds) per quarter note, which corresponds to 120 BPM. The tempo in BPM is 60,000,000 divided by the quarter-note duration in microseconds.
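The tempo formula above can be checked with a short sketch. The function names `tempo_event_to_bpm` and `bpm_to_tempo_event` are illustrative assumptions, not part of the described system; the code only decodes and encodes the six bytes of the event as described in the text.

```python
def tempo_event_to_bpm(event: bytes) -> float:
    """Decode a tempo meta event (FF 51 03 tt tt tt) into BPM."""
    if len(event) != 6 or event[:3] != b"\xff\x51\x03":
        raise ValueError("not a tempo meta event")
    # tt tt tt is a 3-byte big-endian count of microseconds per quarter note
    us_per_quarter = int.from_bytes(event[3:], "big")
    return 60_000_000 / us_per_quarter


def bpm_to_tempo_event(bpm: float) -> bytes:
    """Encode a BPM value as a tempo meta event."""
    us_per_quarter = round(60_000_000 / bpm)
    return b"\xff\x51\x03" + us_per_quarter.to_bytes(3, "big")


event = bytes.fromhex("FF 51 03 07 A1 20")
print(tempo_event_to_bpm(event))        # 120.0
print(bpm_to_tempo_event(90).hex(" "))  # ff 51 03 0a 2c 2b
```

Changing the tempo mid-song, for example from 120 BPM to 90 BPM, would amount to placing a second event of this form later in the track.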

② The song analysis unit (310) may analyze the key and tonality (scale) by identifying the most frequently used pitches and the chord progression pattern from the note distribution of the MIDI file. First, all note data is extracted from the MIDI file, and the count and frequency of each note are analyzed. Notes are classified as C, C#, D, D# ... B (12 pitch classes in total), which makes it possible to check which notes are used most often in a given song. For example, if C, E, and G are used frequently, the key may be C major, and if A, C, and E appear predominantly, the key may be A minor. The note frequencies are analyzed as a pitch class distribution, and the key and tonality are determined with a similarity score against each scale (scale matching score). In summary, the note distribution is analyzed to find the most frequent pitches, and the major and minor scales are compared against it to find the best-matching scale.
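The pitch class distribution and scale matching score described above can be sketched as follows. The scoring rule is an assumption made for illustration: the text does not define the score, so this sketch adds the share of notes that fall inside a scale to the share that fall on its tonic triad, which is what separates C major from its relative A minor. The name `estimate_key` is hypothetical.

```python
from collections import Counter

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Semitone offsets from the tonic: (scale degrees, tonic triad)
MODES = {
    "major": ({0, 2, 4, 5, 7, 9, 11}, {0, 4, 7}),
    "minor": ({0, 2, 3, 5, 7, 8, 10}, {0, 3, 7}),  # natural minor
}


def estimate_key(notes):
    """Return (tonic, mode, score) for the best-matching major or minor key."""
    if not notes:
        raise ValueError("no notes to analyze")
    counts = Counter(n % 12 for n in notes)  # pitch class distribution
    total = sum(counts.values())
    best = None
    for tonic in range(12):
        for mode, (scale, triad) in MODES.items():
            in_scale = sum(c for pc, c in counts.items() if (pc - tonic) % 12 in scale)
            in_triad = sum(c for pc, c in counts.items() if (pc - tonic) % 12 in triad)
            score = (in_scale + in_triad) / total  # assumed scoring rule
            if best is None or score > best[2]:
                best = (NOTE_NAMES[tonic], mode, score)
    return best


c_major_melody = [60, 62, 64, 65, 67, 69, 71, 72, 60, 64, 67, 60, 64, 67]
a_minor_melody = [57, 60, 64, 57, 59, 62, 65, 67, 69, 57, 64, 60]
print(estimate_key(c_major_melody)[:2])  # ('C', 'major')
print(estimate_key(a_minor_melody)[:2])  # ('A', 'minor')
```

A production analyzer would more likely weight notes by duration and use empirically derived key profiles; this sketch only mirrors the counting approach in the text.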

③ The song analysis unit (310) may analyze the sections of the MIDI file that correspond to the intro, verse, chorus, bridge, and ending with a pre-built song structure analysis model. The song structure may, for example, be identified visually in a DAW (FL Studio, Ableton, Cubase, etc.), by waveform analysis in a program such as Melodyne or Sonic Visualiser, or automatically with Python.

Component: Description
Intro: the beginning of the song, which sets the mood
Verse: the part where the story (lyrics) unfolds
Pre-Chorus: a gradual build-up before the chorus
Chorus: the most intense and memorable part (refrain)
Bridge: the part that changes the flow of the song
Outro (Ending): the part that closes the song

Although it does not apply to every song, the structure of modern popular music generally proceeds as Intro, Verse 1, Pre-Chorus (Build-up), Chorus (Drop), Verse 2, Pre-Chorus (Build-up), Chorus (Drop), Bridge, Chorus (Drop), and Outro. The intro and outro are the measures that express the beginning and end of the song. The verse corresponds to the introduction [起] in the four-part introduction-development-turn-conclusion (起承轉結) structure. The pre-chorus leads into the chorus that follows and corresponds to the development [承]. The chorus expresses the most dramatic part of the song and corresponds to the turn [轉]. The bridge generally expresses a mood slightly different from the preceding verse, pre-chorus, and chorus, while connecting them naturally. Modern popular songs in this form usually repeat the same chords everywhere except the bridge, and change the instrumentation or rhythmic feel every 8 or 16 bars so that the overall song conveys this four-part arc. Among these sections, the chorus (refrain) expresses the most dramatic part of the song, so it is the most brilliant and richest section sonically. That is, it is filled with instruments spanning the low-frequency to high-frequency bands.

import mido
from collections import Counter

# Load the MIDI file
midi_file = mido.MidiFile('example.mid')

# Ticks per bar, assuming 4/4 time (read the time signature if it differs)
ticks_per_bar = midi_file.ticks_per_beat * 4

# Number of note onsets in each bar
note_counts = Counter()

for track in midi_file.tracks:
    abs_ticks = 0  # msg.time is a delta in ticks, so accumulate it per track
    for msg in track:
        abs_ticks += msg.time
        if msg.type == 'note_on' and msg.velocity > 0:
            note_counts[abs_ticks // ticks_per_bar] += 1  # notes in this bar

# The per-bar note density is the pattern used to infer the song structure
print("Note count per bar:", sorted(note_counts.items()))

Using the code in Table 3, the structure of the song (verse, chorus, bridge, etc.) can be analyzed and predicted through pattern analysis. Various methods other than the one described above may of course be used.

④ The song analysis unit (310) may extract the pitch range by calculating the span between the lowest and highest notes of the melody part that corresponds to the vocal part in the MIDI file. The pitch value of each note is extracted, and the lowest and highest notes are then found to determine the range. Note numbers in a MIDI file run from 0 (C-1) to 127 (G9), and have the meanings shown in Table 4 below.

Note number: Note name, Description
21: A0, the lowest piano note
60: C4, middle C
127: G9, the highest note
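The mapping in Table 4 can be reproduced with a small helper. This is a sketch that assumes the common convention in which note 60 is written C4 (some tools label it C3); the name `note_name` is illustrative.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(number: int) -> str:
    """Convert a MIDI note number (0-127) to a name such as 'C4'."""
    if not 0 <= number <= 127:
        raise ValueError("MIDI note numbers run from 0 to 127")
    octave = number // 12 - 1  # note 0 is C-1 when note 60 is C4
    return f"{NOTE_NAMES[number % 12]}{octave}"


for n in (0, 21, 60, 127):
    print(n, note_name(n))  # C-1, A0, C4, G9
```

Applied to the lowest and highest notes of the melody track, the same helper would express a vocal range in readable form, for example A3 to E5.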

The pitch range can be extracted by loading the note data from the MIDI file and finding the lowest and highest notes. Alternatively, the range can be extracted automatically in Python with code such as that in Table 5.

import mido

# Load the MIDI file
midi_file = mido.MidiFile('example.mid')

# Pitches of every sounding note
notes = []

# Scan every track; restrict this loop to the melody track to get the vocal range
for track in midi_file.tracks:
    for msg in track:
        # A note_on with velocity 0 acts as a note-off, so skip it
        if msg.type == 'note_on' and msg.velocity > 0:
            notes.append(msg.note)

# Compute the range
if notes:
    lowest_note = min(notes)
    highest_note = max(notes)
    print(f"Lowest note: {lowest_note} (MIDI note), highest note: {highest_note} (MIDI note)")
else:
    print("There are no notes in the MIDI file.")

⑤ The song analysis unit (310) may classify the genre with a pre-built genre classification model based on indices of the MIDI file that include tempo, rhythm pattern, and chord progression. The pre-built genre classification model may be a machine learning model, for example an SVM (Support Vector Machine), that classifies the genre using tempo, rhythm pattern, chord progression, and the like as features. Classification by these features may, for example, follow Table 6, but is not limited thereto.

Genre classification
Tempo (BPM) analysis
- Fast tempo: EDM, rock, or metal is more likely
- Slow tempo: ballad or classical is more likely
Rhythm pattern analysis
- Mostly 4/4 time: pop, rock, EDM
- Mostly 3/4 or 6/8 time: classical, jazz, waltz
Chord progression analysis
- I-IV-V-I (major progression) → pop, rock
- ii-V-I (jazz progression) → jazz, blues
- i-VI-III-VII (minor progression) → Latin, EDM
Instrument usage analysis
- Strong drums → rock, metal, EDM
- Mostly piano and strings → classical, jazz
Note density and distribution analysis
- Many fast, short notes → trap, EDM, jazz
- Many long sustained notes → classical, ballad

<Similarity Measurement>

The setting unit (320) may measure the similarity between vectors that embed the MIDI data of a pre-built MIDI data database and a vector that embeds the feature data, extract similar MIDI data based on that similarity, and set the pattern of instrument, equalizer, and filter values applied to the similar MIDI data as candidate MIDI data. For each pre-built MIDI sound source, the MIDI data database may include a MIDI file containing MIDI data and lyric data, metadata for the MIDI file, the instrument effect chain settings actually applied, and public response and reaction data for the MIDI sound source.

MIDI data database
- MIDI file: MIDI data, lyric data
- Metadata
- Instrument effect chain settings
- Public response and reaction data

<Vector Database>

The MIDI data database may be a vector database, that is, a database that stores information as vectors. A vector here, also known as a vector embedding, is a numerical representation of a data object. Vector embeddings make it possible to index and search large datasets of unstructured and semi-structured data such as images, text, or sensor data, and because a vector database is built to manage vector embeddings, it is well suited to managing such data. A vector database differs from a vector search library or a vector index: it is a data management solution that supports metadata storage and filtering, scales, allows dynamic data changes, performs backups, and provides security features. A vector database organizes data as high-dimensional vectors, which may contain hundreds of dimensions, each corresponding to a particular feature or attribute of the data object.

A vector embedding is a numerical representation of a topic, word, image, or other data, and is generated by large language models (LLMs) and other AI models. The distance between vector embeddings lets a vector database or vector search engine determine how similar the vectors are. Because distance can reflect several dimensions of the data objects, machine learning and AI can use it to capture patterns, relationships, and underlying structure. A vector database works by indexing and querying vector embeddings with algorithms that enable approximate nearest neighbor (ANN) search through hashing, quantization, or graph-based search. To retrieve information, an ANN search finds the vectors closest to the query. ANN search is less computationally intensive than an exact k-nearest neighbor (kNN) search and is therefore also less accurate, but it works efficiently and at scale on large datasets of high-dimensional vectors. The vector database pipeline may be as shown in Table 8 below. Further details are the same as in the known art, so further explanation is omitted.

Index
- Hashing: hashing algorithms such as locality-sensitive hashing (LSH) are best suited to approximate nearest neighbor search because they return fast, approximate results. LSH maps nearest neighbors with hash tables, much like a Sudoku puzzle. A query is hashed into a table and then compared with the set of vectors in the same table to determine similarity.
- Quantization: techniques such as product quantization (PQ) split a vector into smaller parts, represent each part as a code, and then put the parts back together. The result is a code representation of the vector and its components, and the collection of these codes is called a codebook. At query time, a vector database that uses quantization splits the query into codes and matches them against the codebook to find the most similar codes and produce the result.
- Graph: graph algorithms such as Hierarchical Navigable Small World (HNSW) represent vectors as nodes. They cluster the nodes and draw edges between similar nodes to build a hierarchical graph. When a query starts, the algorithm traverses the graph hierarchy to find the nodes containing the vectors most similar to the query vector.
Query
- Cosine similarity: sets similarity on a range from -1 to 1. It measures the cosine of the angle between two vectors in the vector space to identify vectors that are opposite (-1), orthogonal (0), or identical (1).
- Euclidean distance: measures the straight-line distance between vectors and sets similarity on a range from 0 to infinity. Identical vectors give 0, and larger values mean a greater difference between vectors.
- Dot product: sets similarity on a range from minus infinity to infinity. The dot product is the product of the magnitudes of the two vectors and the cosine of the angle between them, so it assigns negative values to vectors pointing away from each other, 0 to orthogonal vectors, and positive values to vectors pointing in the same direction.
Post-processing
- The last stage of a vector database pipeline is sometimes post-processing or post-filtering, in which the vector database re-ranks the nearest neighbors with a different similarity measure. At this stage the database also filters the nearest neighbors found by the search based on metadata.
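The three query measures in Table 8 can be written out directly. The following is a plain-Python sketch for illustration; the function names are assumptions, and a real vector database would compute these measures internally on indexed vectors.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products; ranges over all real numbers."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ranges from -1 to 1."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / norm


def euclidean_distance(a, b):
    """Straight-line distance; 0 for identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(cosine_similarity([1, 0], [1, 0]))    # 1.0  (identical)
print(cosine_similarity([1, 0], [0, 1]))    # 0.0  (orthogonal)
print(cosine_similarity([1, 0], [-1, 0]))   # -1.0 (opposite)
print(euclidean_distance([0, 0], [3, 4]))   # 5.0
```

Cosine similarity ignores vector length, so it may be the more natural choice when feature vectors differ in scale, whereas Euclidean distance and the dot product are sensitive to magnitude.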

The concepts of MIDI file, MIDI data, lyric data, MIDI sound source, and metadata are summarized and compared in Table 9.

MIDI file
- Description: a digital file containing performance information (MIDI data, metadata, etc.)
- Form / examples: .mid and .kar files
- Contains sound: no
- Main uses: accompaniment playback, composition, score generation, karaoke, etc.
MIDI data
- Description: the actual note, instrument, and performance information inside the MIDI file
- Form / examples: Note On/Off, Control Change, Program Change
- Contains sound: no
- Main uses: the core of music playback, performance instructions
Lyric data
- Description: song lyric text (can be synchronized), used for subtitles
- Form / examples: .txt, .lrc, Lyrics events in a MIDI file
- Contains sound: no
- Main uses: subtitle display, singing along
MIDI sound source
- Description: the synthesized sound actually heard when MIDI data is played (the accompaniment audible to the human ear)
- Form / examples: .mp3, .wav, real-time audio output
- Contains sound: yes
- Main uses: providing accompaniment to the user, generating audio
Metadata
- Description: supplementary descriptive information inside the MIDI file (tempo, track name, markers, lyrics, etc.)
- Form / examples: meta events such as Set Tempo, Marker, Lyrics
- Contains sound: no
- Main uses: describing the song, visual segmentation, analysis, etc.

Here, a MIDI file may be a combination of MIDI data, a standard for producing sound through an external device, and lyric data, which is text. The metadata of a MIDI file is data that includes basic information about the song (title, composer, tempo, key signature, etc.) and performance annotations (track name, lyrics, markers, etc.). The metadata is carried mainly in meta events, and its main items are shown in Table 10 below.

Metadata item: Description
Song title (Track Name, Sequence Name): the title of the song or the track name
Composer (Composer, Copyright Notice): composer and copyright information
Tempo: the speed of the song (BPM)
Key Signature: the key of the song (e.g. C major, A minor)
Time Signature: the meter of the song (e.g. 4/4, 3/4)
Marker: an annotation for a specific point (e.g. "Verse Start")
Lyrics: the lyric data of the MIDI file
Cue Point: a point for synchronizing with video or other events

An instrument effect chain is a way of shaping sound by connecting the various audio effects applied to an instrument (in particular the accompaniment sound) in a specific order. The components of an instrument effect chain may be, for example, as shown in Table 11.

Instrument effect chain: Description
Equalizer (EQ): changes the timbre by adjusting specific frequencies
Reverb: adds reverberation to create a sense of space
Chorus: slightly layers the sound for a richer feel
Delay: repeats the sound at set intervals to give an echo effect
Distortion: adds a rough tone to instruments such as guitars
Compressor: controls volume dynamics to keep the level consistent

Because the character of the sound changes with the order in which the effects in the chain are connected, each manufacturer of karaoke accompaniment machines may apply different settings.
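Why the connection order matters can be seen in a toy example. The sketch below models an effect as a function over a list of samples and uses a gain stage and a hard clipper as crude stand-ins for real effects; all names are hypothetical and the code is not the signal path of any actual karaoke machine.

```python
def gain(factor):
    """Effect that scales every sample by `factor`."""
    return lambda samples: [s * factor for s in samples]


def hard_clip(limit):
    """Crude stand-in for distortion: clamp samples to [-limit, limit]."""
    return lambda samples: [max(-limit, min(limit, s)) for s in samples]


def run_chain(samples, chain):
    """Apply the effects in `chain` one after another, in order."""
    for effect in chain:
        samples = effect(samples)
    return samples


signal = [0.2, 0.6, -0.9]
boost_then_clip = run_chain(signal, [gain(2.0), hard_clip(1.0)])
clip_then_boost = run_chain(signal, [hard_clip(1.0), gain(2.0)])
print(boost_then_clip)  # [0.4, 1.0, -1.0]
print(clip_then_boost)  # [0.4, 1.2, -1.8]
```

The same two effects give different output depending on their order, which is why an effect chain has to be stored as an ordered list rather than as an unordered set of settings.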

In one embodiment of the present invention, the similarity between the song information of the MIDI file converted from the original music file and the song information in the pre-built MIDI data database is measured. In this embodiment the MIDI data database holds about 60,000 songs. The song information of the database entries is extracted and compared with that of the original music file, and the MIDI data of the N songs whose song information is most similar to the original music file is extracted. This is referred to as similar MIDI data. From the N similar MIDI data, listed in order of similarity, the pattern of instrument, equalizer (EQ), and filter values used in them may be set as candidate MIDI data. In other words, a pattern is extracted from sound sources similar to the original music file, selected as the MIDI data for the original music file, and then refined.
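The extraction of the N most similar entries and the derivation of a candidate pattern can be sketched as follows. Everything concrete here is an assumption made for illustration: the three-dimensional vectors, the entry fields (`title`, `instrument`, `eq`, `filter`), the use of cosine similarity, and the rule of taking the most common value of each setting are not specified in the text.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_n_similar(query, database, n):
    """Return the n database entries most similar to the query vector."""
    ranked = sorted(database, key=lambda e: cosine(query, e["vector"]), reverse=True)
    return ranked[:n]


def candidate_settings(similar):
    """Take the most common value of each setting among the similar entries."""
    keys = ("instrument", "eq", "filter")
    return {k: Counter(e[k] for e in similar).most_common(1)[0][0] for k in keys}


# Hypothetical database entries; real embeddings would have many more dimensions
database = [
    {"title": "song_a", "vector": [0.90, 0.10, 0.30], "instrument": "piano", "eq": "vocal_boost", "filter": "low_cut"},
    {"title": "song_b", "vector": [0.80, 0.20, 0.40], "instrument": "piano", "eq": "vocal_boost", "filter": "high_cut"},
    {"title": "song_c", "vector": [0.10, 0.90, 0.20], "instrument": "synth", "eq": "flat", "filter": "low_cut"},
    {"title": "song_d", "vector": [0.85, 0.15, 0.35], "instrument": "guitar", "eq": "vocal_boost", "filter": "low_cut"},
]
query = [0.88, 0.12, 0.33]  # embedding of the analyzed original music file

similar = top_n_similar(query, database, n=3)
print([e["title"] for e in similar])
print(candidate_settings(similar))
```

With about 60,000 songs an exhaustive scan like this is still feasible, but an approximate nearest neighbor index would usually replace the `sorted` call as the database grows.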

<Sound Optimization Process>

The generation unit (330) may generate MIDI data for the original music file by performing a preset sound optimization process on the candidate MIDI data. To the candidate MIDI data extracted as described above, an additional rule of prioritizing vocal clarity is applied; drums and bass are processed to sound relatively clear in consideration of the characteristics of karaoke; and, to minimize interference with the vocal, logic that automatically attenuates the volume of high-range instruments during chorus sections or while the vocal is in the high range may be applied (the ducking technique). In addition, an effect chain may be assigned automatically with presets optimized per genre, so that the timbre and tone specific to each genre are captured.

Here, the additional rule of prioritizing vocal clarity is an audio processing approach that makes the vocal (the singer's voice) more clearly audible when the accompaniment and the vocal are output together from the karaoke accompaniment machine (400). The following principles and techniques are applied for this purpose.

Key rules for improving vocal clarity
Equalizer (EQ) adjustment
- Vocal frequency emphasis → boost the 1 kHz to 5 kHz band to sharpen the voice
- Accompaniment low-range attenuation → slightly reduce lows below 200 Hz so the vocal is not buried
- Accompaniment high-range attenuation → slightly reduce content above 5 kHz to prevent clashes between instruments and the vocal
Dynamic processing
- Compressor → keeps the vocal volume consistent so it is not buried
- Ducking → automatically lowers the accompaniment volume when the vocal comes in
Stereo balance adjustment
- The accompaniment is spread wide to the left and right and the vocal is placed in the center so that it is heard more clearly
Minimizing reverb and delay
- Reduce the spatial effects (reverb, delay) of the accompaniment to separate it from the vocal
- Apply appropriate reverb to the vocal to keep a natural resonance
Automatic accompaniment volume adjustment
- Karaoke machines sometimes use a vocal detection sensor to adjust the accompaniment according to the vocal level

In addition, because the low end must be emphasized to suit the karaoke environment, low-range instruments such as drums and bass may be processed to sound clear, and when the vocal melody is in the high range, high-range instruments may be reduced with the ducking technique. As noted above, ducking is an audio mixing technique that automatically lowers the volume of other sounds when a particular sound comes in. Its principle is that when a trigger sound is detected, the volume of the target sound is automatically lowered, and when the trigger sound disappears, the volume of the target sound is restored. The technique is used mainly to balance the vocal against the accompaniment, automatically adjusting the accompaniment so that it does not cover the vocal. It can be described as a technique that automatically controls volume in order to emphasize the important sound.
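The trigger and target behavior of ducking can be illustrated with a minimal per-sample sketch. The threshold, the reduced gain, and the linear release step are assumed values chosen for readability; a real implementation would typically follow a smoothed envelope of the trigger and work on audio buffers or MIDI volume events.

```python
def duck(target, trigger, threshold=0.1, reduced_gain=0.5, release=0.25):
    """Lower `target` while `trigger` is above `threshold`, then recover.

    The gain drops to `reduced_gain` as soon as the trigger is active and
    rises by `release` per sample, up to 1.0, once the trigger is gone.
    """
    gain = 1.0
    output = []
    for sample, level in zip(target, trigger):
        if abs(level) > threshold:
            gain = reduced_gain                # trigger detected: attenuate
        else:
            gain = min(1.0, gain + release)    # trigger gone: restore gradually
        output.append(sample * gain)
    return output


accompaniment = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]   # target sound
vocal = [0.0, 0.5, 0.5, 0.0, 0.0, 0.0]           # trigger sound
print(duck(accompaniment, vocal))  # [1.0, 0.5, 0.5, 0.75, 1.0, 1.0]
```

For the karaoke case described above, the trigger would be the vocal melody in its high range and the target would be the high-range instrument tracks.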

In summary, the preset sound optimization process may include applying a preset vocal clarity priority rule to the candidate MIDI data; setting the candidate MIDI data so that drums and bass are heard clearly; automatically attenuating, with the ducking technique, the volume of at least one preset instrument sound in the candidate MIDI data during chorus sections or high-range vocal sections; and automatically assigning an effect chain to the candidate MIDI data to match the genre of the original music file. Accordingly, the MIDI data may be generated by including all the data obtained in <Song Analysis>, <Similarity Measurement> (candidate MIDI data extraction), and <Sound Optimization Process>.

However, because the MIDI data generated in this way is based on candidate MIDI data similar to existing, previously built MIDI data, that is, karaoke accompaniment (MR), it may not fit the original music file exactly, and it may contain unsuitable equalizer settings, instrument choices, or filter settings. Accordingly, the generated MIDI data may finally be reviewed at the user terminal (100), and only MIDI data that passes the review may be provided as the final MIDI data.

<Accompaniment Playback>

When MIDI data has been generated by the generation unit (330), the accompaniment playback unit (340) can input a MIDI file, in which lyric data corresponding to the MIDI data is combined with the MIDI data, into the karaoke accompaniment machine (400) and play it. The accompaniment playback unit (340) can extract instrument data from the MIDI data of the MIDI file, extract instrument sounds from the instrument data, and extract MIDI messages from the MIDI data so that the MIDI file is played on the karaoke accompaniment machine (400). The MIDI data of the present invention is configured to be output not only on the karaoke accompaniment machine (400) but also on various other devices; the details are disclosed in the present applicant's registered Korean Patent No. 10-1881854 (published July 25, 2018), to which reference is made.

<Karaoke Song Recommendation>

In addition, based on the MIDI data database according to an embodiment of the present invention, songs or an appropriate key suited to the singer's vocal range can be recommended. Because focusing only on vocal range or key could ignore the singer's preferences, a recommendation system that takes preference into account, such as collaborative filtering, can be added. To assess the fit with the singer's vocal range, Kullback-Leibler divergence is used to quantitatively compare the probability distribution of the user's vocal range with that of the MIDI sound sources of the karaoke accompaniment machine (400), and recommendations are made from among the most similar MIDI sound sources in light of the user's preferences. Here, a MIDI sound source is the actual sound heard when a MIDI file is played.

Kullback-Leibler divergence, an important concept in information theory, measures the difference between two probability distributions P and Q; the more similar the two distributions are, the closer its value is to 0. In Equation 1, which computes this difference, x indexes each value of the random variable, and the sum runs over all possible values of the distribution. P(x) and Q(x) are the probabilities that the respective distributions assign to x. In an embodiment of the present invention, Kullback-Leibler divergence is used for the purpose of evaluating whether a song is suitable for the user to sing and providing feedback.
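Equation 1 itself is not reproduced in this text. Assuming it is the standard discrete Kullback-Leibler divergence, which matches the description given here (a sum over all values x of a term in P(x) and Q(x), equal to 0 when the two distributions coincide), it would read:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}
```

Note that this quantity is not symmetric in P and Q, and it is finite only if Q(x) > 0 wherever P(x) > 0.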

This property of Kullback-Leibler divergence provides a quantitative basis for judging which songs suit the user's vocal range, and is therefore suitable for recommending songs or for suggesting a reference key setting. In Equation 1, P is the probability distribution of the vocal range data input by the user, and Q is the probability distribution of the karaoke score data. If, according to the Kullback-Leibler divergence result, the required key adjustment falls within the range of up to 6 steps up or down that the karaoke accompaniment machine (400) supports, the song can be treated as a recommended song. Among the songs in the recommended range, a smaller Kullback-Leibler divergence value is judged to indicate a better fit with the user, so such songs can be recommended first.
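A minimal sketch of this comparison follows. It assumes that both distributions are histograms over MIDI pitch numbers, that one key step equals one semitone, and that a small smoothing constant is added so that the divergence stays finite; the function names and these choices are illustrative and are not taken from the specification.

```python
import math
from collections import Counter


def pitch_distribution(pitches, low=0, high=127, eps=1e-6):
    """Smoothed probability distribution over MIDI pitch numbers low..high."""
    counts = Counter(p for p in pitches if low <= p <= high)
    n = high - low + 1
    total = sum(counts.values()) + eps * n
    return [(counts.get(p, 0) + eps) / total for p in range(low, high + 1)]


def kl_divergence(p, q):
    """Discrete Kullback-Leibler divergence D(P || Q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def best_key_shift(user_pitches, song_pitches, max_shift=6):
    """Try every shift within +/- max_shift and return (shift, divergence)
    for the shift whose song distribution is closest to the user's."""
    p = pitch_distribution(user_pitches)
    best = None
    for shift in range(-max_shift, max_shift + 1):
        shifted = [n + shift for n in song_pitches if 0 <= n + shift <= 127]
        d = kl_divergence(p, pitch_distribution(shifted))
        if best is None or d < best[1]:
            best = (shift, d)
    return best


def rank_songs(user_pitches, songs, max_shift=6):
    """songs: dict of title -> pitch list. Smallest divergence is ranked first."""
    scored = [(best_key_shift(user_pitches, pitches, max_shift), title)
              for title, pitches in songs.items()]
    scored.sort(key=lambda item: item[0][1])
    return [(title, shift, d) for (shift, d), title in scored]


user = [60, 62, 64, 65, 67]
song = [63, 65, 67, 68, 70]   # same contour, three semitones higher
print(best_key_shift(user, song))
```

A preference-aware step such as collaborative filtering, mentioned above, would re-rank the output of `rank_songs`; it is not shown here.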

<Song Generation Using Generative AI>

Recently, generative AI has come to generate songs as well as text, but problems occur such as harmonically unstable progressions or notes that jump out of place. To address this, a Bi-LSTM (Bidirectional Long Short-Term Memory) can be trained on the MIDI data database (vector database) according to an embodiment of the present invention to learn song development, song structure, pitch range, and the like, and songs can be generated on that basis. Bi-LSTM is an extension of LSTM (Long Short-Term Memory): a neural network model that learns more context information by using two LSTMs, one forward and one backward. Here, the treble clef and the bass clef of the MIDI file are separated and learned on two channels.

Conventional single-layer learning centered on the piano roll extracts and learns only the melody notes from a MIDI file, which limits how rich the generated songs can sound. In a Bi-LSTM, by contrast, the input data enters in sequence order in the forward direction and in reverse sequence order in the backward direction, so the output joins the forward and backward outputs into one. An attention layer can be applied to this joining; it reflects weights for the elements of the encoder's input data that have the most influence. The learned data is concatenated and gathered in a merge layer. Finally, a normalization step produces the final output.

By separating the treble clef and the bass clef in the MIDI file, extracting notes, note durations, rests, rest durations, chords, and the like, and training an attention-based Bi-LSTM on them, notes and chords that fit the development of the song can be generated, and harmonically stable songs can be produced. Of course, it is apparent that the MIDI data database described above can also be learned by various methods other than the attention-based Bi-LSTM described above.
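Before any such training, the note data has to be turned into two sequences. The following is a minimal preprocessing sketch only; the Bi-LSTM itself is not shown. It assumes that notes are given as `(start, duration, pitch)` tuples, that each channel is monophonic, and that the treble and bass parts can be separated at middle C (MIDI pitch 60). These are illustrative assumptions: a real MIDI file would more likely separate the parts by track or channel, and chords are omitted here.

```python
def to_tokens(notes):
    """Convert (start, duration, pitch) notes into note/rest tokens.

    A gap between the end of one note and the start of the next becomes a
    ("rest", None, length) token. Assumes a monophonic part.
    """
    tokens = []
    cursor = 0.0
    for start, duration, pitch in sorted(notes):
        if start > cursor:
            tokens.append(("rest", None, start - cursor))
        tokens.append(("note", pitch, duration))
        cursor = max(cursor, start + duration)
    return tokens


def split_channels(notes, split_pitch=60):
    """Split notes into a treble and a bass sequence at split_pitch (hypothetical rule)."""
    treble = [n for n in notes if n[2] >= split_pitch]
    bass = [n for n in notes if n[2] < split_pitch]
    return to_tokens(treble), to_tokens(bass)


notes = [(0.0, 1.0, 64), (1.0, 1.0, 67), (3.0, 1.0, 72),
         (0.0, 2.0, 48), (2.0, 2.0, 43)]
treble_tokens, bass_tokens = split_channels(notes)
print(treble_tokens)
print(bass_tokens)
```

Each of the two token sequences would then be fed to its own input channel of the model.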

Hereinafter, the operation of the extraction service providing server of FIG. 2 described above will be explained in detail using FIGS. 3 and 4 as examples. It will be apparent, however, that this embodiment is only one of various embodiments of the present invention, which is not limited thereto.

Referring to FIG. 3, (a) the extraction service providing server (300) can convert an original music file into a MIDI file using AI and, as in (b), perform song analysis on the MIDI file. After the tempo, key/tonality, song structure, vocal range, and genre are extracted in this way, (c) similar MIDI data with a similar tempo, key/tonality, song structure, vocal range, and genre is extracted from the pre-built MIDI data database, and (d) candidate MIDI data can be extracted by identifying the pattern of instrument, equalizer, and filter values in the similar MIDI data. The extraction service providing server (300) can then, as in (a) of FIG. 4, run the sound optimization process to generate an accompaniment that is easy to sing along to in a karaoke room. The extraction service providing server (300) can (b) gather all of these data (song structure analysis, similarity identification, sound optimization) to generate the MIDI data.

(c) Once the MIDI data has been generated in this way, the extraction service providing server (300) adds lyric data to create a MIDI file and uses it to extract the instrument data of the MIDI file. At this point, the instrument data can be extracted from the sound library of the synth module of the karaoke accompaniment machine (400) and loaded into memory, and MIDI messages can be extracted from the MIDI data for playback. For details, reference is made to the present applicant's Korean registered patent mentioned above. (d) The MIDI file played in this way can be given final confirmation, or edited and refined, at the user terminal (100).

Matters not described for the method of providing an AI-based MIDI data extraction service of FIGS. 2 to 4 are the same as, or can be easily inferred from, what was described for the method of providing an AI-based MIDI data extraction service with reference to FIG. 1 above, and their description is therefore omitted.

FIG. 5 is a diagram illustrating a process in which data is transmitted and received between the components included in the AI-based MIDI data extraction solution providing system of FIG. 1 according to an embodiment of the present invention. An example of this process is described below with reference to FIG. 5, but the present application is not to be construed as limited to this embodiment, and it will be apparent to those skilled in the art that the process of transmitting and receiving data illustrated in FIG. 5 may be modified according to the various embodiments described above.

Referring to FIG. 5, the extraction service providing server inputs the original music file into a pre-built AI to convert it into a MIDI file, and then extracts the tempo, key, tonality, song structure, vocal range, and genre from the MIDI file as feature data (S5100).
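Three of these features can be illustrated with a small sketch. It relies on the fact that a MIDI tempo meta event stores the tempo as microseconds per quarter note. The helper names are hypothetical, and taking the most frequent pitch class is only a crude proxy for the key: the claims describe key and tonality analysis as also using chord progression patterns, which is not reproduced here.

```python
from collections import Counter

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def bpm_from_tempo_event(microseconds_per_quarter):
    """Convert the value of a MIDI tempo meta event to beats per minute."""
    return 60_000_000 / microseconds_per_quarter


def most_frequent_pitch_class(pitches):
    """Name of the pitch class that occurs most often in the note distribution."""
    counts = Counter(p % 12 for p in pitches)
    pitch_class, _ = counts.most_common(1)[0]
    return NOTE_NAMES[pitch_class]


def vocal_range(melody_pitches):
    """Lowest and highest MIDI pitch of the melody (vocal) part."""
    return min(melody_pitches), max(melody_pitches)


melody = [60, 64, 67, 72, 60, 62]
print(bpm_from_tempo_event(500_000))
print(most_frequent_pitch_class(melody))
print(vocal_range(melody))
```

Song structure and genre are obtained from trained models according to the claims, so they are left out of this sketch.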

Next, the extraction service providing server measures the similarity between vectors embedding the MIDI data of the pre-built MIDI data database and a vector embedding the feature data, extracts similar MIDI data based on the similarity, and sets the pattern of instrument, equalizer, and filter values applied to the similar MIDI data as candidate MIDI data (S5200).
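The similarity search in this step can be sketched as follows. Cosine similarity is used here as an assumed similarity measure, since this passage does not name one; the embedding vectors, the database layout (a dictionary of name to vector), and the function names are likewise illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, database, k=3):
    """Return the k database entries most similar to the query vector.

    database: dict mapping a MIDI data identifier to its embedding vector.
    """
    scored = [(cosine_similarity(query, vec), name) for name, vec in database.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]


database = {"song_a": [1.0, 0.0], "song_b": [0.0, 1.0], "song_c": [1.0, 1.0]}
print(top_k_similar([1.0, 0.1], database, k=2))
```

The instrument, equalizer, and filter settings attached to the returned entries would then serve as the candidate MIDI data.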

The extraction service providing server then generates MIDI data for the original music file by performing the preset sound optimization process on the candidate MIDI data (S5300).

The order of the steps described above (S5100 to S5300) is merely an example, and the invention is not limited thereto. That is, the order of the steps (S5100 to S5300) may be changed, and some of the steps may be performed simultaneously or omitted.

Matters not described for the method of providing an AI-based MIDI data extraction service of FIG. 5 are the same as, or can be easily inferred from, what was described for the method of providing an AI-based MIDI data extraction service with reference to FIGS. 1 to 4 above, and their description is therefore omitted.

The method of providing an AI-based MIDI data extraction service according to the embodiment described with reference to FIG. 5 may also be implemented in the form of a recording medium containing computer-executable instructions, such as an application or program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media as well as removable and non-removable media. A computer-readable medium may also include any computer storage medium. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The method of providing an AI-based MIDI data extraction service according to the embodiment of the present invention described above may be executed by an application installed on a terminal by default (which may include a program included in a platform or operating system shipped with the terminal), or by an application (that is, a program) that the user installs directly on a master terminal through an application providing server, such as an application store server or a web server associated with the application or the service. In this sense, the method may be implemented as an application (that is, a program) installed on a terminal by default or installed directly by the user, and may be recorded on a recording medium readable by a computer such as a terminal.

The foregoing description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

A system for providing an AI-based MIDI data extraction solution, comprising:
a user terminal for selecting an original music file in order to generate MIDI data of a MIDI file to be input into a karaoke accompaniment machine; and
an extraction service providing server including a song analysis unit that inputs the original music file into a pre-built AI to convert it into a MIDI file and then extracts tempo, key, tonality, song structure, vocal range, and genre from the MIDI file as feature data, a setting unit that measures the similarity between a vector embedding MIDI data of a pre-built MIDI data database and a vector embedding the feature data, extracts similar MIDI data based on the similarity, and sets the pattern of instrument, equalizer, and filter values applied to the similar MIDI data as candidate MIDI data, and a generation unit that generates MIDI data for the original music file by performing a preset sound optimization process on the candidate MIDI data.

The system of claim 1,
wherein the song analysis unit:
analyzes the tempo by extracting the tempo meta event in the MIDI file;
analyzes the key and tonality by identifying the most frequent notes and the chord progression patterns from the note distribution of the MIDI file;
analyzes the sections of the MIDI file corresponding to the intro, verse, chorus, interlude, and ending using a pre-built song structure analysis model;
extracts the range between the lowest and highest notes of the melody part corresponding to the vocal part in the MIDI file; and
classifies the genre using a genre classification model built on indices including the tempo, rhythm pattern, and chord progression of the MIDI file.

The system of claim 1,
wherein the MIDI data database includes:
for a MIDI sound source, which is the actual sound heard when a MIDI file is played, the MIDI file including MIDI data and lyric data;
metadata for the MIDI file;
the instrument effect chain settings actually applied to the MIDI file; and
public reception and reaction data for the MIDI sound source.

The system of claim 1,
wherein the preset sound optimization process includes:
a step of applying a preset vocal clarity priority rule to the candidate MIDI data;
a step of setting the drums and bass in the candidate MIDI data to be clearly audible;
a step of using the ducking technique on the candidate MIDI data to automatically attenuate the volume of at least one preset instrument sound in a chorus section or a high-register section of the vocal; and
a step of automatically assigning an effect chain to the candidate MIDI data so that it corresponds to the genre of the original music file.

The system of claim 1,
wherein the extraction service providing server further includes
an accompaniment playback unit that, when the MIDI data has been generated by the generation unit, inputs a MIDI file in which lyric data corresponding to the MIDI data is combined with the MIDI data into the karaoke accompaniment machine and plays it.

The system of claim 5,
wherein the accompaniment playback unit
extracts instrument data from the MIDI data of the MIDI file, extracts instrument sounds from the instrument data, and causes the MIDI file to be played on the karaoke accompaniment machine.