KR20080091099A - Audio Channel Extraction Using Interchannel Amplitude Spectrum

Info

Publication number: KR20080091099A
Application number: KR1020087014637A
Authority: KR
Inventors: 파벨 츄바레프
Original assignee: 디티에스 라이센싱 리미티드
Priority date: 2005-12-06
Filing date: 2006-12-01
Publication date: 2008-10-09
Also published as: CA2632496A1; CN101405717A; CN101405717B; JP2009518684A; EP1958086A2; RU2432607C2; HK1128786A1; WO2007067429A2; TW200739366A; WO2007067429A3; US20070135952A1; RU2008127329A; IL191701A0; MX2008007226A; BRPI0619468A2; AU2006322079A1; NZ568402A; EP1958086A4; WO2007067429B1

Abstract

Interchannel amplitude spectra can be used to extract multiple audio channels from two or more audio input channels that contain a mix of audio sources. The approach produces multiple audio channels that are not merely linear combinations of the input channels, and can therefore be used, for example, in combination with a blind source separation (BSS) algorithm.

Description

AUDIO CHANNEL EXTRACTION USING INTER-CHANNEL AMPLITUDE SPECTRA

The present invention relates to the extraction of multiple audio channels from two or more audio input channels comprising a mix of audio sources, and more particularly to the use of interchannel amplitude spectra to perform the extraction.

Blind source separation (BSS) is a class of methods used extensively in areas where individual original audio sources must be estimated from stereo channels that carry a linear mixture of those sources. The difficulty in separating the individual original sources from their linear mixtures is that, in many practical applications, little is known about the original signals or about the way they were mixed. To demix blindly, some assumptions about the statistical properties of the signals are generally made.

Independent component analysis (ICA) is one of the most widely used methods for performing blind source separation. ICA assumes that the audio sources are statistically independent and have non-Gaussian distributions. In addition, the number of audio input channels must be at least as large as the number of audio sources to be separated. Furthermore, the input channels must be linearly independent; they must not be linear combinations of one another. In other words, if the goal is, for example, to extract three or possibly four audio sources such as voice, strings and percussion from a stereo mix, forming a third or fourth channel as a linear combination of the left and right channels will not suffice. ICA algorithms are well known in the art and are described in Aapo Hyvarinen and Erkki Oja, "Independent Component Analysis: Algorithms and Applications", Neural Networks, April 1999, which is hereby incorporated by reference.

Unfortunately, in many real-world situations only a stereo mix is available. This severely limits ICA-based BSS algorithms to separating at most two audio sources from the mix. In many applications, audio mixing and playback have moved from conventional stereo to multi-channel audio with 5.1, 6.1 or even higher channel configurations. There is great demand for the ability to remix the vast catalog of stereo music for multi-channel audio. To do this effectively, it would often be highly desirable, if not necessary, to separate three or more sources from the stereo mix. Current ICA techniques cannot support this.

The following is a summary of the invention, provided to give a basic understanding of some of its aspects.

This summary is not intended to identify key or critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some concepts of the invention in a simplified form as a prelude to the more detailed description and the claims presented later.

The present invention provides a method of extracting, from two or more audio input channels, multiple audio output channels that are not merely linear combinations of the input channels. Such output channels can then be used, for example, in combination with a blind source separation (BSS) algorithm, which requires at least as many linearly independent input channels as there are sources to be separated, or directly in remixing applications, for example from 2.0 to 5.1.

This is accomplished by generating at least one interchannel amplitude spectrum for each pair of M framed audio input channels that carry a mix of audio sources. These amplitude spectra may represent, for example, the linear difference, log difference or norm difference, or the sum, of a pair of input spectra. Each spectral line of the interchannel amplitude spectra is then mapped to one of N defined outputs, suitably in a channel extraction space of M-1 dimensions. Data from the M input channels is combined according to the spectral mapping to form N audio output channels. In one embodiment, the input spectra are combined according to the mapping, the combined spectra are inverse transformed, and the frames are recombined to form the N audio output channels. In another embodiment, a convolution filter is constructed for each of the N outputs using the corresponding spectral map. The input channels are passed through the N filters and recombined to form the N audio output channels.

These and other features and advantages of the invention will be apparent to those skilled in the art from the following detailed description of preferred embodiments, taken together with the accompanying drawings.

FIG. 1 is a block diagram including a channel extractor and a source separator for separating multiple audio sources from an audio mix.

FIG. 2 is a block diagram for extracting additional audio channels using interchannel amplitude spectra in accordance with the present invention.

FIGS. 3a to 3c are diagrams depicting various mappings from the interchannel amplitude spectra to the channel extraction space.

FIG. 4 is a block diagram of an exemplary embodiment for extracting three output channels from a stereo mix using spectral synthesis of the input channels according to the spectral mapping.

FIGS. 5a to 5 are diagrams depicting the windowing of an audio channel to form a sequence of input audio frames.

FIG. 6 is a plot of the frequency spectra of a stereo audio signal.

FIG. 7 is a plot of the difference spectrum.

FIG. 8 is a table illustrating two different approaches to combining the input spectra.

FIGS. 9a to 9c are plots of the combined spectra for the three output audio channels.

FIG. 10 is a block diagram of an alternative embodiment that uses convolution filters to perform time-domain synthesis of the input channels according to the spectral mapping.

The present invention provides a method of extracting multiple audio channels from two or more audio input channels comprising a mix of audio sources, and more particularly a method that uses interchannel amplitude spectra to perform the extraction. The approach produces multiple audio channels that are not merely linear combinations of the input channels, and can therefore be used, for example, in combination with a blind source separation (BSS) algorithm, or to provide additional channels directly for various remixing applications.

By way of example only, the extraction technique is described in the context of its use with a BSS algorithm. As noted above, for a BSS algorithm to extract Q original audio sources from a mix of audio sources, it must receive as input at least Q linearly independent audio channels that carry the mix. As shown in FIG. 1, M audio input channels 10 are input to a channel extractor 12, which uses the interchannel amplitude spectra of the input channels in accordance with the invention to generate N>M audio output channels 14. A source separator 16 executes an ICA-based BSS algorithm to separate Q original audio sources 18 from the N audio output channels, where Q≤N. For example, when used together, the channel extractor and source separator can extract three or four audio sources from a conventional stereo mix. This would find good application in remixing music catalogs that currently exist only in stereo into multi-channel configurations.

As shown in FIG. 2, the channel extractor executes an algorithm that uses interchannel amplitude spectra. The channel extractor transforms each of the M audio input channels 10 into a respective input spectrum, where M is at least two (step 20). A fast Fourier transform (FFT), or a DCT, MDCT or wavelet transform, for example, may be used to generate the frequency spectra. The channel extractor then generates at least one interchannel amplitude spectrum from the input spectra for at least one pair of input channels (step 22). These interchannel amplitude spectra may represent, for example, the linear difference, log difference or norm difference, or the sum, of the spectral lines of a pair of input spectra. More specifically, if A and B are the amplitudes of a spectral line for the first and second channels, then A-B is the linear difference, Log(A)-Log(B) is the log difference, (A²-B²) is the L2 norm difference, and A+B is the sum. It will be apparent to those skilled in the art that many other functions f(A,B) of A and B can be used to compare the interchannel amplitude relationship of the two channels.
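The four comparison functions named above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `interchannel_amplitude`, the `eps` guard against zero amplitudes, and the choice to express the log difference in decibels as 20·log10(A/B) are all assumptions made here.

```python
import numpy as np

def interchannel_amplitude(a, b, kind="log", eps=1e-12):
    """Compare per-line amplitudes A and B of two input spectra.

    a, b : non-negative amplitude arrays, one entry per spectral line.
    kind : "linear" (A-B), "log" (log difference, in dB here by assumption),
           "l2" (A^2 - B^2) or "sum" (A+B).
    eps  : hypothetical guard so that silent lines do not produce log(0).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if kind == "linear":
        return a - b
    if kind == "log":
        return 20.0 * np.log10((a + eps) / (b + eps))
    if kind == "l2":
        return a**2 - b**2
    if kind == "sum":
        return a + b
    raise ValueError(f"unknown comparison kind: {kind!r}")
```

For an FFT-based front end, `a` and `b` would typically be `np.abs(np.fft.rfft(frame))` for the two channels of one frame.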

The channel extractor maps each spectral line of the interchannel amplitude spectra to one of N defined outputs, suitably in a channel extraction space of M-1 dimensions (step 24). As shown in FIG. 3a, the log difference for a pair of input channels (L/R) is thresholded at -3 dB and +3 dB to define outputs S₁ (-∞, -3 dB), S₂ (-3 dB, +3 dB) and S₃ (+3 dB, ∞) in a one-dimensional space 26. If the amplitude of a particular spectral line is, say, 0 dB, it is mapped to output S₂, and so on. The mapping is easily extended to N>3 by defining additional thresholds. As shown in FIG. 3b, three input channels L, R and C are mapped to 13 output channels S₁, S₂ ... S₁₃ in a two-dimensional channel extraction space 28. The log difference of L/C is plotted against the log difference of R/C and thresholded to define 16 cells. In this particular example, the extreme corner cells are all mapped to the same output S₁. Other combinations of cells are possible depending, for example, on the desired number of outputs or on any prior knowledge of the sound field relationship of the input channels. For each spectral line, the amplitudes of the R/C and L/C log differences are mapped into the space and assigned to the appropriate output. In this way, each spectral line is mapped to a single output only. Alternatively, the R/C and L/C interchannel amplitude spectra may each be thresholded separately in a one-dimensional space as shown in FIG. 3a. An alternative mapping of the three input channels L, R and C to nine outputs in another two-dimensional channel extraction space 30 is shown in FIG. 3c. These three examples are intended to show that the interchannel amplitude spectra can be mapped to the N outputs in many different ways, and that the principle extends to any number of input and output channels. Each spectral line can be mapped to a unique output in the M-1 dimensional extraction space.
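The thresholding described for FIGS. 3a and 3b can be sketched with `numpy.digitize`. This is an illustration under stated assumptions: the function names are hypothetical, outputs are numbered from zero, and the three thresholds used for the two-dimensional case (-3, 0 and +3 dB) are invented for the example, since the text gives threshold values only for the one-dimensional case. The numbering of the twelve non-corner cells is likewise arbitrary.

```python
import numpy as np

def map_lines_1d(diff_db, thresholds_db=(-3.0, 3.0)):
    """FIG. 3a style: index 0 is below -3 dB, 1 is between, 2 is above +3 dB.

    Adding thresholds extends the mapping to more outputs.
    """
    return np.digitize(np.asarray(diff_db, dtype=float), thresholds_db)

def map_lines_2d(lc_db, rc_db, thresholds_db=(-3.0, 0.0, 3.0)):
    """FIG. 3b style: threshold two log differences into a k-by-k grid of cells
    and merge the four corner cells into one output (16 cells -> 13 outputs
    when k = 4). Threshold values are hypothetical.
    """
    k = len(thresholds_db) + 1
    i = np.digitize(np.asarray(lc_db, dtype=float), thresholds_db)
    j = np.digitize(np.asarray(rc_db, dtype=float), thresholds_db)
    cell = k * i + j
    corners = {0, k - 1, k * (k - 1), k * k - 1}
    # Lookup table: corners share output 0, every other cell gets its own output.
    lut = np.zeros(k * k, dtype=int)
    others = [c for c in range(k * k) if c not in corners]
    lut[others] = np.arange(1, len(others) + 1)
    return lut[cell]
```

A lookup table keeps the cell-to-output assignment separate from the thresholding, so other groupings of cells (as in FIG. 3c) only require a different table.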

Once each spectral line has been mapped to one of the N outputs, the channel extractor combines the data of the M input channels for each of the N outputs according to the mapping (step 32). Consider, for example, the case of stereo channels L and R mapped to outputs S₁, S₂ and S₃ as shown in FIG. 3a, and assume that the input spectra have eight spectral lines. If, based on the interchannel amplitude spectrum, lines 1-3 are mapped to S₁, lines 4-6 are mapped to S₂ and lines 7-8 are mapped to S₃, the channel extractor combines the input data for each of lines 1, 2 and 3 and directs the combined data to the audio output channel for S₁, and so on. In general, the input data is combined as a weighted average. The weights may be equal or may vary. For example, if specific information is known about the sound field relationship of the input channels L, R and C, that information may influence the choice of weights. For example, if L≫R, the L channel may be weighted more heavily in the combination. Furthermore, the weights may be the same for all of the outputs or may vary, for the same or different reasons.
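The eight-line stereo example can be worked through numerically, taking lines 1-3, 4-6 and 7-8 as mapped to the first, second and third outputs respectively. The sketch below is an assumption-laden illustration: `combine_spectra` and its arguments are hypothetical names, outputs are numbered from zero, and each output has its own row of per-channel weights that sums to one so the combination is a weighted average.

```python
import numpy as np

def combine_spectra(spectra, line_to_output, n_outputs, weights=None):
    """Combine M input spectra into N output spectra according to a mapping.

    spectra        : array of shape (M, K), one row per input channel.
    line_to_output : length-K integer array, output index of each spectral line.
    weights        : optional (N, M) array of averaging weights per output;
                     defaults to equal weights 1/M.
    Each spectral line contributes to exactly one output; all others get zero.
    """
    spectra = np.asarray(spectra)
    line_to_output = np.asarray(line_to_output)
    m, k = spectra.shape
    if weights is None:
        weights = np.full((n_outputs, m), 1.0 / m)
    weights = np.asarray(weights, dtype=float)
    out = np.zeros((n_outputs, k), dtype=np.result_type(spectra.dtype, np.float64))
    for n in range(n_outputs):
        mask = line_to_output == n
        out[n, mask] = weights[n] @ spectra[:, mask]
    return out

# Eight spectral lines: lines 1-3 -> output 0, 4-6 -> output 1, 7-8 -> output 2.
L = np.arange(1.0, 9.0)
R = np.ones(8)
mapping = np.array([0, 0, 0, 1, 1, 1, 2, 2])
equal = combine_spectra([L, R], mapping, 3)
```

Passing `weights=[[1, 0], [0.5, 0.5], [0, 1]]` instead would pass only L to the first output, average L and R for the second, and pass only R to the third.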

The input data can be combined using either frequency-domain or time-domain synthesis. In frequency-domain synthesis, shown in FIGS. 4 to 9, the input spectra are combined according to the mapping, the combined spectra are inverse transformed, and the frames are recombined to form the N audio output channels. In time-domain synthesis, shown in FIG. 10, a convolution filter is constructed for each of the N outputs using the corresponding spectral map. The input channels are passed through the N filters and recombined to form the N audio output channels.

FIGS. 4 to 10 illustrate in detail an exemplary embodiment of the channel extraction algorithm for the case of extracting N=3 output channels from a stereo pair (M=2) of input channels. The channel extractor applies a window 38, for example a raised cosine, Hamming or Hanning window, to the left and right audio input signals 44, 46 (steps 40, 42) to create respective sequences of suitably overlapping frames 48 (left frame). Each frame is frequency transformed using an FFT to generate a left input spectrum 54 and a right input spectrum 56. In this embodiment, the log difference of each spectral line of the input spectra 54, 56 is computed to create an interchannel amplitude spectrum 58 (step 60). A 1-D channel extraction space 62 is defined, for example with -3 dB and +3 dB thresholds bounding outputs S₁, S₂ and S₃ (step 64), and each spectral line of the interchannel amplitude spectrum 58 is mapped to the appropriate output (step 66).

Once the mapping is complete, the channel extractor combines the input spectra 54 and 56, for example the amplitude coefficients of the spectral lines, for each of the three outputs according to the mapping (step 67). As shown in FIG. 8 and FIGS. 9a-9c, in Case 1 the channels are weighted equally and the weights are the same for every output, generating the respective audio output channel spectra 68, 70, 72. As shown, for a given spectral line the input spectra are combined for only one output. In Case 2, perhaps given prior knowledge of the L/R sound field, only the L input channel is passed when a spectral line is mapped to output 1 (L≫R). If L and R are approximately equal, L and R are weighted equally, and if R≫L, only the R input channel is passed. Successive frames of each output spectrum are inverse transformed (steps 74, 76, 78), and the frames are recombined using standard overlap-add reconstruction (steps 80, 82, 84) to generate the three audio output channels 86, 88 and 90.
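The frequency-domain path for a stereo pair (window, FFT, log difference, threshold, combine, inverse FFT, overlap-add) can be sketched end to end as follows. This is a simplified illustration of Case 1 (equal weights), not the patented implementation. Assumptions made here: a periodic Hann window with 50% overlap applied only before the transform (so the overlapped windows sum to one), a log difference expressed in dB as 20·log10(|L|/|R|), complex spectra rather than amplitude coefficients being combined, zero-based output numbering, and hypothetical names throughout.

```python
import numpy as np

def extract_channels(left, right, frame_len=1024, thresholds_db=(-3.0, 3.0), eps=1e-12):
    """Split a stereo pair into len(thresholds_db) + 1 output channels."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    hop = frame_len // 2
    # Periodic Hann window: copies shifted by frame_len/2 sum exactly to one.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(frame_len) / frame_len)
    n_out = len(thresholds_db) + 1
    out = np.zeros((n_out, len(left)))
    for start in range(0, len(left) - frame_len + 1, hop):
        seg = slice(start, start + frame_len)
        spec_l = np.fft.rfft(window * left[seg])
        spec_r = np.fft.rfft(window * right[seg])
        # Interchannel amplitude spectrum: log difference of the line amplitudes.
        diff_db = 20.0 * np.log10((np.abs(spec_l) + eps) / (np.abs(spec_r) + eps))
        # Each spectral line is assigned to exactly one output.
        line_to_output = np.digitize(diff_db, thresholds_db)
        mixed = 0.5 * (spec_l + spec_r)  # Case 1: equal weights for every output
        for n in range(n_out):
            out_spec = np.where(line_to_output == n, mixed, 0.0)
            # Inverse transform and overlap-add the frame into output n.
            out[n, seg] += np.fft.irfft(out_spec, n=frame_len)
    return out
```

Because every spectral line goes to exactly one output, the outputs of this sketch sum back to the equally weighted mix (L+R)/2 wherever two frames overlap. With the sign convention chosen here, lines where L dominates land in the highest-numbered output; reversing the order of the log difference reverses that ordering.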

FIG. 10 shows an alternative embodiment that uses time-domain synthesis to extract three audio output channels from a stereo pair. The left and right input channels are divided into frames using a window such as a Hanning window (step 100) and transformed using an FFT to form the input spectra (step 102). The spectral lines are then separated by forming the difference spectrum and comparing each spectral line against the thresholds (-3 dB and +3 dB) (step 104), which constructs three 'maps' 106a, 106b and 106c, one for each output channel. An element of a map is set to one if the difference for that spectral line falls within the corresponding category, and to zero otherwise. These steps are equivalent to steps 40-66 shown in FIG. 4.

The input channels are passed through the convolution filters constructed for each of the N outputs using the corresponding spectral maps, the M×N partial results are summed, and the frames are recombined to form the N audio output channels (step 108). To reduce artifacts, smoothing may be applied to the maps before the multiplication. Smoothing may be performed using the following formula.

Other smoothing methods are also possible. As shown in the figure, if no weighting is required, the summation of the input channels (step 110) may be performed before filtering.
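One way to picture the time-domain path is to turn each 0/1 spectral map into a short FIR filter and convolve the (pre-summed) input with it. The sketch below is a simplified, single-frame illustration under several assumptions: the names `map_to_fir` and `filter_outputs` are hypothetical, the map is treated as a zero-phase frequency response, and the impulse response is tapered with a Hann window as a stand-in smoothing step. That taper is a common filter-design technique chosen for this example; it is not the smoothing formula referred to in the text.

```python
import numpy as np

def map_to_fir(spectral_map, frame_len):
    """Build a linear-phase FIR filter from a 0/1 map over rfft bins.

    spectral_map : length frame_len // 2 + 1, one entry per spectral line.
    """
    h = np.fft.irfft(np.asarray(spectral_map, dtype=float), n=frame_len)
    h = np.roll(h, frame_len // 2)  # centre the response (delay of frame_len/2)
    # Hann taper peaking at the centre tap: smooths the brick-wall response.
    taper = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(frame_len) / frame_len)
    return h * taper

def filter_outputs(x, maps, frame_len):
    """Pass one input signal through the filter built from each map."""
    return np.stack([np.convolve(x, map_to_fir(m, frame_len)) for m in maps])

# Three maps that partition 33 spectral lines (frame_len = 64).
frame_len = 64
assignment = np.arange(frame_len // 2 + 1) % 3
maps = [(assignment == n).astype(float) for n in range(3)]
x = np.random.default_rng(1).standard_normal(200)
y = filter_outputs(x, maps, frame_len)
```

Since the three maps add up to one at every spectral line, the three filters add up to a pure delay of `frame_len // 2` samples, so the outputs of this sketch sum to a delayed copy of the input. In the embodiment described above the maps change from frame to frame, so the filters would be rebuilt per frame and the filtered frames recombined.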

While several illustrative embodiments of the invention have been shown and described, numerous variations and alternative embodiments will occur to those skilled in the art. Such variations and alternative embodiments are contemplated and can be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method of extracting N audio output channels from M≤N audio input channels, the method comprising:

converting each of the M audio input channels into a respective input spectrum;

generating at least one interchannel amplitude spectrum from the input spectra for each pair of the M audio input channels;

mapping each spectral line of the interchannel amplitude spectrum to one of N outputs; and

combining data from the M input channels according to the spectral mapping to form the N audio output channels.

2. The method of claim 1, wherein an overlapping window is applied to the audio input channels before the transform to form a sequence of frames, and an overlapping inverse window is applied after the inverse transform of the frames to recombine the frames into the N audio output channels.

3. The method of claim 1, wherein the interchannel amplitude spectrum is generated as a linear difference, log difference or norm difference, or a summation, of the input spectra.

4. The method of claim 1, wherein the spectral lines are mapped in an M-1 dimensional space whose axes correspond to the respective interchannel amplitude spectra.

5. The method of claim 4, wherein each spectral line is mapped to one output.

6. The method of claim 1, wherein the spectral lines are thresholded to map the spectral lines to one of the N outputs.

7. The method of claim 1, wherein the data from the input channels is combined as a weighted average.

8. The method of claim 7, wherein the weights are determined at least in part by a sound field relationship of the audio input channels.

9. The method of claim 1, wherein the data from the input channels is combined by:

combining the input spectra of the M input channels for each of the spectral lines mapped to each of the N outputs; and

inverse transforming each of the combined spectra to form the N audio output channels.

10. The method of claim 1, wherein the data from the input channels is combined by:

constructing a filter for each of the N outputs using the corresponding map;

passing each of the M input channels through the N filters; and

combining the outputs of the filters to form N output channel frames.

11. The method of claim 1, wherein the N audio output channels are linearly independent.

12. The method of claim 1, wherein the audio input channels comprise a mix of audio sources, the method further comprising using a source separation algorithm to separate the N audio output channels into an equal or smaller number of the audio sources.

13. A method of separating Q audio sources from M audio input channels comprising a mix of audio sources, the method comprising:

converting each of the M audio input channels into a respective input spectrum;

mapping each spectral line of the interchannel amplitude spectrum to one of N≥Q outputs to produce a map for each output;

combining data from the M input channels according to the maps to form the N audio output channels; and

using a source separation algorithm to separate the N audio output channels into the Q audio sources.

14. The method of claim 13, wherein the N audio output channels are linearly independent.

15. A method of extracting N audio output channels from two audio input channels, the method comprising:

converting each of the audio input channels into a respective input spectrum;

generating an interchannel amplitude spectrum from the input spectra;

thresholding each spectral line of the interchannel amplitude spectrum to map it to one of N outputs; and

combining data from the audio input channels according to the spectral mapping to form the N audio output channels.

16. The method of claim 15, wherein the interchannel amplitude spectrum is generated as a linear difference, log difference or norm difference, or a summation, of the input spectra.

17. The method of claim 15, wherein the number N of audio output channels is three.

18. The method of claim 15, wherein the audio input channels are transformed using a fast Fourier transform (FFT).

19. A channel extractor for extracting N audio output channels from M≤N audio input channels, the channel extractor comprising:

means for converting each of the M audio input channels into a respective input spectrum;

means for generating at least one interchannel amplitude spectrum from the input spectra for each pair of the M audio input channels;

means for mapping each spectral line of the interchannel amplitude spectrum to one of N outputs; and

means for combining data from the M input channels according to the spectral mapping to form the N audio output channels.

20. The channel extractor of claim 19, wherein the means for combining the data comprises:

means for combining the input spectra of the M input channels for each of the spectral lines mapped to each of the N outputs; and

means for inverse transforming each of the combined spectra to form the N audio output channels.

21. The channel extractor of claim 19, wherein the means for combining the data comprises:

means for constructing a filter for each of the N outputs using the corresponding map;

means for passing each of the M input channels through the N filters; and

means for combining the filter outputs to form N output channel frames.