KR101864925B1

KR101864925B1 - Global Model-based Audio Object Separation method and system

Info

Publication number: KR101864925B1
Application number: KR1020160014914A
Authority: KR
Inventors: 조충상; 김제우; 이영한; 이혜인
Original assignee: 전자부품연구원
Priority date: 2016-02-05
Filing date: 2016-02-05
Publication date: 2018-06-05
Anticipated expiration: 2036-02-05
Also published as: WO2017135487A1; KR20170093474A

Abstract

A global model-based audio object separation method and system are provided. An audio separation method according to an embodiment of the present invention automatically extends a first model used for sound source separation to generate a second model, and separates the sound source into a plurality of audio objects using the generated second model. A short global NMF model can thus be extended to automatically generate a long NMF model for use in audio object separation, which makes audio object separation possible for all sound sources more simply and in less time.

Description

[0001] Global model-based audio object separation method and system

The present invention relates to audio processing technology and, more particularly, to a method and system for separating a sound source into a plurality of audio objects.

An audio sound source is composed of a plurality of audio objects, such as vocals, drums, guitar, and piano. Such a sound source can be separated into its audio objects.

Currently, one of the most widely used techniques for audio object separation is separation based on an NMF (Non-negative Matrix Factorization) model.

Separating audio objects with an NMF model requires an NMF model of the same length as the sound source to be separated. Because the length differs from one sound source to another, a different NMF model is used for each sound source.

In addition, to improve separation quality, audio engineers design the NMF model by hand, taking into account attributes of the sound source beyond its length. This work is very difficult and time-consuming.

The present invention was devised to solve the above problems. Its object is to provide a global model-based audio object separation method and system that simply and automatically generates, and then uses, the model used for sound source separation.

To achieve the above object, an audio separation method according to an embodiment of the present invention includes: generating a second model by automatically extending a first model used for separating a sound source; and separating the sound source into a plurality of audio objects using the generated second model.

The audio separation method according to an embodiment of the present invention may further include determining the length of the sound source, and the first model may be extended with reference to that length.

The first model may be a first NMF (Non-negative Matrix Factorization) model and the second model a second NMF model, and the generating step may perform the extension by repeatedly arranging the W matrix of the first NMF model.

The generating step may perform the extension by repeatedly arranging the W matrix in units of all or part of the matrix.

The generating step may also perform the extension by selecting and arranging parts of the W matrix.

The generating step may select the part of the W matrix randomly.

The generating step may also select the part of the W matrix based on a result of analyzing the sound source.

Meanwhile, an audio separation system according to another embodiment of the present invention includes: a generating unit that generates a second model by automatically extending a first model used for separating a sound source; and a separation unit that separates the sound source into a plurality of audio objects using the generated second model.

As described above, according to embodiments of the present invention, a short global NMF model can be extended to automatically generate a long NMF model for use in audio object separation, which makes audio object separation possible for all sound sources more simply and in less time.

FIG. 1 is a diagram illustrating an audio object separation system according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating the NMF model;
FIG. 3 is a diagram illustrating extension of the global NMF model;
FIG. 4 is a diagram illustrating in detail the NMF model extension engine shown in FIG. 1;
FIGS. 5 to 8 are diagrams illustrating methods of extending/converting H into H'; and
FIG. 9 is a diagram illustrating in detail the index determination method used by the audio analysis module.

Hereinafter, the present invention is described in more detail with reference to the drawings.

FIG. 1 is a diagram illustrating an audio object separation system according to an embodiment of the present invention. The audio object separation system according to this embodiment generates the NMF models needed to separate an input sound source of length T into audio objects by extending smaller NMF models.

As shown in FIG. 1, the audio object separation system that performs this function includes an NMF model extension engine (110) and an NMF model-based object separation engine (120).

The NMF model extension engine (110) automatically extends the global NMF models (10-1, ... 10-n) used for sound source separation according to the length of the sound source, generating the NMF models (20-1, ... 20-n) to be used for the separation.

The NMF model-based object separation engine (120) separates the sound source into a plurality of audio objects using the NMF models (20-1, ... 20-n) generated by the NMF model extension engine (110).

The global NMF models (10-1, ... 10-n) are short NMF models prepared for each audio object (vocals, drums, guitar, piano) and are used in common for all input sound sources.

As shown in FIG. 2, an NMF model is determined by setting W (F by k) and H (k by N) from the result of an STFT (Short Term Fourier Transform) operation on a sound source in which several audio objects are mixed (hereinafter, the "mixed sound source").

N is the window size used in the STFT operation, F is the number of frequency bins in the STFT operation, and k is the dimension applied to the STFT operation. The parameter affected by the length T of the mixed sound source is N.
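
The factorization described here can be sketched in a few lines. The sketch below is only an illustration of the W (F by k) and H (k by N) shapes: the patent does not state which NMF algorithm is used, so the Lee-Seung multiplicative updates, the function name `nmf_multiplicative`, and the parameters `n_iter`, `eps`, and `seed` are all assumptions, and `V` stands in for the magnitude of the STFT result.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative F x N matrix V into W (F x k) and H (k x N).

    Assumption: Lee-Seung multiplicative updates for the Euclidean cost;
    the source document does not specify the update rule.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, k)) + eps   # spectral bases, one per column
    H = rng.random((k, N)) + eps   # activations, one column per time frame
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Only the number of columns of H depends on the length of the input, which is why the extension methods described later operate on H and leave W untouched.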

Accordingly, in generating the extended NMF models (20-1, ... 20-n) from the global NMF models (10-1, ... 10-n), the H (k by N1) matrix of each global NMF model is extended, as shown in FIG. 3, into the H' (k by N2) matrix required for a mixed sound source of length T.

W of the global NMF models (10-1, ... 10-n) is used as is in generating the NMF models (20-1, ... 20-n). That is, W of the global NMF models (10-1, ... 10-n) and W of the NMF models (20-1, ... 20-n) are the same.

FIG. 4 is a diagram illustrating in detail the NMF model extension engine (110) shown in FIG. 1. FIG. 4 shows the process by which the NMF model extension engine (110) converts the global NMF models (10-1, ... 10-n) into the NMF models (20-1, ... 20-n).

Specifically, FIG. 4 shows the NMF model extension engine (110) determining the length T of the mixed sound source and, based on that length, converting the short H of each global NMF model (10-1, ... 10-n) into a long H' to generate the NMF models (20-1, ... 20-n).

H can be extended into H' in many ways, which are described in detail below. The conversion may use H alone, or it may also use the result of analyzing the mixed sound source with the audio analysis module (115).

FIG. 5 shows one method of extending/converting H into H'. In this method, H, which has N1 columns, is arranged repeatedly, and the portion exceeding the required N2 columns is deleted to produce H'.
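
A minimal sketch of this repeat-and-truncate extension, assuming H is held as a NumPy array of shape (k, N1). The function name `extend_by_tiling` and the argument name `n2` are illustrative and do not come from the patent.

```python
import numpy as np

def extend_by_tiling(H, n2):
    """Repeat the whole of H side by side, then cut off columns beyond n2."""
    k, n1 = H.shape
    reps = -(-n2 // n1)                    # ceiling of n2 / n1
    return np.tile(H, (1, reps))[:, :n2]   # result has shape (k, n2)
```

For a 2 by 3 matrix and `n2 = 7`, the result is two full copies of H followed by its first column.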

FIG. 6 shows another method of extending/converting H into H'. In this method, the columns of H, which has N1 columns, are repeated column by column to produce H'. The number of repetitions may be set differently for each column so that the total matches the required N2.
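
A minimal sketch of the column-wise repetition, again assuming a NumPy array of shape (k, N1). The text only says that the repetition count may differ per column; giving the first `n2 % n1` columns one extra repetition is an assumption made here so that the counts sum to exactly `n2`. The name `extend_by_column_repeat` is illustrative.

```python
import numpy as np

def extend_by_column_repeat(H, n2):
    """Repeat each column of H in place so that the result has n2 columns."""
    k, n1 = H.shape
    base, extra = divmod(n2, n1)
    counts = np.full(n1, base)
    counts[:extra] += 1                    # assumed policy: earliest columns absorb the remainder
    return np.repeat(H, counts, axis=1)    # column order of H is preserved
```

Unlike whole-matrix tiling, this keeps the original temporal order of the columns and stretches each one in time.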

FIG. 7 shows yet another method of extending/converting H into H'. In this method, one of the columns of H, which has N1 columns, is selected at random and appended, and this is repeated to produce H'.
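
A minimal sketch of the random selection, assuming uniform sampling with replacement (the text does not say how the random choice is distributed). The function name `extend_by_random_columns` and the `seed` parameter are illustrative.

```python
import numpy as np

def extend_by_random_columns(H, n2, seed=None):
    """Build H' by drawing n2 column indices of H uniformly at random."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, H.shape[1], size=n2)   # indices in [0, N1)
    return H[:, idx], idx                        # H' of shape (k, n2) and the chosen indices
```

Returning the indices alongside H' makes the selection reproducible and easy to inspect.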

FIG. 8 shows yet another method of extending/converting H into H'. This method is the same as the method of FIG. 7 in that H' is produced by repeatedly selecting one of the columns of H and appending it.

It differs from the method of FIG. 7, however, in that the column of H to be appended to H' is not selected at random but according to an index that the audio analysis module (115) determines from its analysis of the mixed sound source.

FIG. 9 is a diagram illustrating in detail the index determination method used by the audio analysis module (115). As shown in FIG. 9, the audio analysis module (115) calculates the absolute value of the STFT result for the mixed sound source and, moving a window across the calculated result, repeatedly selects the most similar column of H through similarity analysis to generate the indices.
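
One way the index determination could be sketched is shown below. Several details are assumptions, because the text does not specify them: the window is taken to be a single STFT frame, each column of H is mapped to the frequency domain through W so that it can be compared with a frame, and cosine similarity is used as the similarity measure. The names `select_indices_by_similarity` and `extend_by_indices` are illustrative.

```python
import numpy as np

def select_indices_by_similarity(V, W, H, eps=1e-12):
    """For each frame of V, pick the index of the most similar column of H.

    V: F x N2 magnitude of the STFT of the mixed sound source.
    W: F x k, H: k x N1 (global NMF model).
    Assumptions: one-frame window, comparison against W @ H, cosine similarity.
    """
    templates = W @ H                                     # F x N1 spectra implied by each H column
    Vn = V / (np.linalg.norm(V, axis=0, keepdims=True) + eps)
    Tn = templates / (np.linalg.norm(templates, axis=0, keepdims=True) + eps)
    sim = Tn.T @ Vn                                       # N1 x N2 similarity matrix
    return np.argmax(sim, axis=0)                         # one index per frame

def extend_by_indices(H, idx):
    """Build H' (k x N2) by arranging the columns of H in the order given by idx."""
    return H[:, idx]
```

With the indices in hand, the extension itself is the same gather operation as in the random method; only the source of the indices changes.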

The global model-based audio object separation method and system have been described in detail above with reference to preferred embodiments.

The above embodiments assume audio object separation using NMF models, but this is only an example. The technical idea of the present invention can of course also be applied when a model derived from the NMF model, or a different kind of model, is used instead.

Likewise, the vocals, drums, guitar, and piano mentioned as audio objects in the above embodiments are merely examples. The technical idea of the present invention is also applicable when a sound source is separated into a wider variety of audio objects.

The audio object separation method and system presented in the embodiments of the present invention can be applied to fields such as audio effects, content production, and surveillance systems, as well as to fields that require voice separation or other kinds of sound source separation.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Those of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

10-1, ... 10-n: global NMF models
20-1, ... 20-n: NMF models
110: NMF model extension engine
120: NMF model-based object separation engine

Claims

An audio separation method comprising:
generating a second NMF model by using the W matrix of a first NMF (Non-negative Matrix Factorization) model, which is commonly used for separating sound sources, and by repeatedly arranging the H matrix of the first NMF model; and
separating the sound source into a plurality of audio objects using the generated second NMF model.

(Deleted)

The method according to claim 1,
wherein the generating comprises
repeatedly arranging the H matrix in units of all or part of the H matrix.

The method according to claim 1,
wherein the generating comprises
selecting and arranging a part of the H matrix.

The method according to claim 5,
wherein the generating comprises
selecting the part of the H matrix randomly.

The method according to claim 5,
wherein the generating comprises
calculating an absolute value of an STFT (Short Term Fourier Transform) operation result for the sound source, and selecting, through similarity analysis against the calculated result, the most similar part of the H matrix.

An audio separation system comprising:
a generating unit that generates a second NMF model by using the W matrix of a first NMF (Non-negative Matrix Factorization) model, which is commonly used for separating sound sources, and by repeatedly arranging the H matrix of the first NMF model; and
a separation unit that separates the sound source into a plurality of audio objects using the generated second NMF model.