JP2020508590A

JP2020508590A - Apparatus and method for downmixing multi-channel audio signals

Info

Publication number: JP2020508590A
Application number: JP2019503460A
Authority: JP
Inventors: シェ，ペイ−ルン; ウー，ツァイ−イ
Original assignee: アンビディオ，インコーポレイテッド
Priority date: 2017-02-17
Filing date: 2018-02-16
Publication date: 2020-03-19
Also published as: EP3583786A1; WO2018151858A1; CN109644315A; KR20190109726A; EP3583786A4; TW201843675A

Abstract

A method for processing a multi-channel input audio signal is performed at a computing device. The method includes the following steps: selecting a left input channel and a right input channel from the multi-channel input audio signal, the left and right input channels corresponding to a pair of spatially symmetric signal sources; generating one or more cross-channel features from the left and right input channels; processing the left and right input channels in accordance with the cross-channel features to generate a left intermediate channel and a right intermediate channel; and combining each of the left and right intermediate channels with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal. [Selected drawing] FIG. 4B

Description

Technical Field
[0001] This application relates generally to audio signal processing and, in particular, to computer-implemented methods, apparatuses, and computer-usable program code for downmixing a multi-channel audio signal.

Background
[0002] Surround sound is a technique for producing, delivering, and reproducing audio using multiple audio channels that surround the listener. It is usually realized with multiple discrete audio channels. Two popular multi-channel (surround sound) configurations are 5.1 surround sound and 7.1 surround sound. The 5.1 configuration consists of a pair of front speakers (L and R), one center front channel (C), a pair of side speakers (Ls and Rs), and one low-frequency effects channel (LFE), in the conventional order L, C, R, Ls, Rs, LFE. The 7.1 configuration consists of a pair of front speakers (L and R), one center front channel (C), a pair of side surround speakers (Lss and Rss), a pair of rear surround speakers (Lrs and Rrs), and one low-frequency effects channel (LFE), in the conventional order L, C, R, Lss, Rss, Lrs, Rrs, LFE.

[0003] Downmixing is the process of converting a program with a multi-channel configuration (for example, a multi-channel audio file) into a program with fewer channels. For example, a 5.1 or 7.1 surround sound file can be downmixed and played back on a two-channel stereo playback system while still giving the listener a good listening experience.

[0004] In a conventional downmix process, each channel of the multi-channel audio input is processed individually and separately by its own processor to produce a one- or two-channel output. The processing applied to each channel does not properly take into account the meaningful information shared between speaker pairs. As a result, the audio output obtained with these conventional downmixing processes has relatively low fidelity and may detract from an immersive listening experience.

Summary
[0005] One object of the present application is to develop an audio downmix pipeline that processes input audio channels in pairs. The resulting downmixed output has relatively better fidelity while preserving the spatial information of the original multi-channel audio stream. The numbers of input and output channels can be any numbers that are meaningful under the definition of downmixing; the following description uses 5.1-to-stereo and 7.1-to-stereo downmixes as examples.

[0006] According to a first aspect of the present application, a method of processing a multi-channel input audio signal is performed at a computing device having one or more processors, memory, and a plurality of program modules stored in the memory and executed by the one or more processors. The method includes the following steps: selecting a left input channel and a right input channel from the multi-channel input audio signal, the left and right input channels corresponding to a pair of spatially symmetric signal sources; generating one or more cross-channel features from the left and right input channels; processing the left and right input channels in accordance with the cross-channel features to generate a left intermediate channel and a right intermediate channel; and combining each of the left and right intermediate channels with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.

[0007] According to another aspect of the present application, a computing device includes one or more processors, memory, and a plurality of program modules stored in the memory and executed by the one or more processors. The program modules, when executed by the one or more processors, cause the computing device to perform the method of processing a multi-channel input audio signal described above. According to yet another aspect of the present application, a computer program product is stored on a non-transitory computer-readable storage medium for use in conjunction with a computing device having one or more processors. The computer program product includes a plurality of program modules that, when executed by the one or more processors, cause the computing device to perform the method of processing a multi-channel audio signal described above.

Brief Description of the Drawings
[0008] The accompanying drawings, which are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification, illustrate the described embodiments and, together with the description, serve to explain the underlying principles. Like reference numerals refer to corresponding parts.

[0009] FIG. 1A is a block diagram illustrating a conventional downmixing process performed on a 5.1 input signal, according to some embodiments.
[0010] FIG. 1B is a block diagram illustrating a conventional LoRo downmixing process performed on a 5.1 input signal, according to some embodiments.
[0011] FIG. 1C is a block diagram illustrating a conventional downmixing process for surround sound virtualization or spatialization from a 7.1 input signal, according to some embodiments.
[0012] FIG. 2 is a block diagram of a data processing system configured to perform audio downmixing according to an exemplary embodiment of the present application.
[0013] FIGS. 3A-3B are block diagrams illustrating an audio downmix pipeline for processing a multi-channel input signal, according to some embodiments.
[0014] A block diagram illustrating a signal workflow including a PROC applied to an input pair, according to some embodiments.
[0015] FIG. 4B is a block diagram illustrating a signal workflow applied to 7.1 surround sound, according to some embodiments.
[0016] A user interface of a software application, or of a plug-in component of a software application, used to manage an implementation of the signal pipeline described with reference to FIG. 4B for a 7.1 surround sound file, according to some embodiments.
[0017] Flowcharts (three drawings) illustrating a process for downmixing a multi-channel audio signal, according to some embodiments.

Detailed Description
[0018] Embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous non-limiting specific details are set forth to assist in understanding the subject matter presented herein. It will be apparent to one of ordinary skill in the art, however, that various alternatives may be used without departing from the scope of the claims and that the subject matter may be practiced without these specific details. For example, it will be apparent to one of ordinary skill in the art that the subject matter presented herein can be implemented on many types of wireless communication systems, such as smartphones and tablets.

[0019] Referring now to the figures, exemplary block diagrams of data processing environments in which exemplary embodiments may be implemented are presented. These figures are only exemplary and are not intended to assert or imply any limitation on the environments in which different embodiments may be implemented. Many modifications may be made to the depicted environments.

[0020] FIG. 1A is a block diagram illustrating a conventional downmixing process performed on a 5.1 input signal. As shown in FIG. 1A, each input channel (i.e., L, C, R, Ls, Rs, and LFE) is sent to its own processor module (PROC) and processed separately, regardless of its relationship to the other channels. A processor may include one or more sub-modules (not shown), such as gain, time delay, low-pass filter, and/or other acoustic processing modules. The output of the processor module for each channel may include one or more channels, depending on the type of processing performed in that module. Finally, these outputs are summed (Σ) into, in this example, two-channel audio (i.e., the L and R output channels).

[0021] FIG. 1B is a block diagram illustrating a conventional left only/right only (LoRo) downmixing process performed on a 5.1 input signal. Each input channel (i.e., L, C, R, Ls, Rs, and LFE) passes through a gain module individually. The gain adjustment depends on the channel's physical location as it would be reproduced by a surround sound system. The surround channels may be attenuated more than the L/R channels, but the relationship between the left side and the right side is ignored. The left channel output is created by summing all channels from the left side together with the attenuated C and LFE signals. Because a stereo reproduction setup has no physical speaker on the centerline, the center channel is split in two. The LFE is likewise split into two channels. The right channel output is created by summing all channels from the right side together with the attenuated C and LFE signals. In this example, every input channel is processed by its own processor, a simple gain module that takes a one-channel input and produces a one- or two-channel output. Finally, all PROC outputs are summed according to the intended reproduction position of each input.
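The LoRo summation described in paragraph [0021] can be sketched as follows. This is only an illustration under stated assumptions: the function name `loro_downmix` and the gain values (about 0.707 for the center and surround channels, 0.5 for the LFE) are chosen for the example and are not taken from this application, which says only that these channels are attenuated.

```python
import math

def loro_downmix(frame,
                 center_gain=math.sqrt(0.5),    # assumed value, not from the source
                 surround_gain=math.sqrt(0.5),  # assumed value, not from the source
                 lfe_gain=0.5):                 # assumed value, not from the source
    """Downmix one 5.1 sample frame to a (left, right) stereo frame.

    `frame` maps the channel names L, C, R, Ls, Rs, LFE to one sample each.
    Each channel only passes through its own gain; no information is
    shared between the left and right sides.
    """
    # C and LFE have no side, so the same attenuated copy goes to both outputs.
    shared = center_gain * frame["C"] + lfe_gain * frame["LFE"]
    left = frame["L"] + surround_gain * frame["Ls"] + shared
    right = frame["R"] + surround_gain * frame["Rs"] + shared
    return left, right

silence = {"L": 0.0, "C": 0.0, "R": 0.0, "Ls": 0.0, "Rs": 0.0, "LFE": 0.0}
front_left_only = dict(silence, L=1.0)
center_only = dict(silence, C=1.0)
print(loro_downmix(front_left_only))
print(loro_downmix(center_only))
```

A signal present only in L stays entirely in the left output, while a center-only signal appears equally in both outputs, which matches the splitting of the center channel described above.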

[0022] FIG. 1C is a block diagram illustrating a conventional downmixing process for surround sound virtualization or spatialization from a 7.1 input signal. In this example, each channel of the input multi-channel audio (i.e., L, C, R, Lss, Rss, Lrs, Rrs, and LFE) is processed by its individual gain and by a respective head-related transfer function (HRTF) that represents the intended position of the corresponding physical speaker, producing a two-channel output. For example, the left channel input is processed by the HRTF that represents the left-channel speaker of the surround sound system. Similar processing is applied to all other input channels. All of the two-channel outputs derived from the individual input channels are then summed into a left channel output and a right channel output. In this example, too, every input channel is processed by its own processor (e.g., consisting of a gain module and an HRTF filter), which takes a one-channel input and produces a two-channel output. All of the two-channel outputs are summed to form the final two-channel output.
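The per-channel virtualization in paragraph [0022] can be sketched in the time domain as below. This is a minimal sketch, assuming each HRTF is available as a pair of short impulse responses (one per ear); the names `convolve` and `virtualize` and the toy impulse responses are hypothetical and are not taken from this application.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def virtualize(channels, hrirs, gains):
    """Filter every channel independently and sum into (left, right).

    channels: name -> list of samples
    hrirs:    name -> (left_ear_ir, right_ear_ir), assumed impulse responses
    gains:    name -> per-channel gain (defaults to 1.0 when missing)
    """
    length = max(
        len(channels[n]) + max(len(hrirs[n][0]), len(hrirs[n][1])) - 1
        for n in channels
    )
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in channels.items():
        gain = gains.get(name, 1.0)
        ir_left, ir_right = hrirs[name]
        # One channel in, two channels out; other channels are never consulted.
        for k, v in enumerate(convolve(samples, ir_left)):
            left[k] += gain * v
        for k, v in enumerate(convolve(samples, ir_right)):
            right[k] += gain * v
    return left, right

demo_channels = {"L": [1.0, 0.0], "R": [0.0, 1.0]}
demo_hrirs = {"L": ([1.0, 0.5], [0.25]), "R": ([0.25], [1.0, 0.5])}
demo_left, demo_right = virtualize(demo_channels, demo_hrirs, {})
print(demo_left, demo_right)
```

Each loop iteration touches exactly one input channel, which is the limitation the pairwise approach described later is meant to address.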

[0023] FIG. 2 shows a block diagram of a data processing system 100 configured to perform audio downmixing according to an exemplary embodiment of the present application. In this illustrative example, data processing system 100 includes a communication mechanism 102 that provides communication among a processor unit 104, memory 106, persistent storage 108, a communication unit 110, an input/output (I/O) unit 112, a display device 114, and one or more speakers 116. Note that the speakers 116 may be built into data processing system 100 or may be external to it. In some embodiments, data processing system 100 takes the form of a laptop computer, a desktop computer, a tablet computer, a mobile phone (such as a smartphone), a multimedia player device, a navigation device, an educational device (such as a child's learning toy), a gaming system, an audio/video (AV) receiver, or a control device (e.g., a home or industrial controller).

[0024] Processor unit 104 serves to execute instructions for software programs that may be loaded into memory 106. Processor unit 104 may be a set of one or more processors or may be a multi-processor core, depending on the particular implementation. Further, processor unit 104 may be implemented using one or more heterogeneous processor systems in which a main processor is present with secondary processors on a single chip. As another illustrative example, processor unit 104 may be a symmetric multiprocessor system containing multiple processors of the same type.

[0025] In these examples, memory 106 may be a random access memory or any other suitable volatile or non-volatile storage device. Persistent storage 108 may take various forms, depending on the particular implementation. For example, persistent storage 108 may contain one or more components or devices, such as a hard drive, a flash memory, a rewritable optical disk, a rewritable magnetic tape, or some combination of the above. The media used by persistent storage 108 may also be removable. For example, a removable hard drive may be used for persistent storage 108.

[0026] In these examples, communication unit 110 provides for communication with other data processing systems or devices. In these examples, communication unit 110 is a network interface card. Communication unit 110 may provide communication through the use of physical communication links, wireless communication links, or both.

[0027] Input/output unit 112 allows for input and output of data with other devices that may be connected to data processing system 100. For example, input/output unit 112 may provide a connection for user input through a keyboard and mouse. Further, input/output unit 112 may send output to a printer. Display device 114 provides a mechanism to display information to a user. Speakers 116 play sound to the user.

[0028] Instructions for the operating system and for applications or programs are located in persistent storage 108. These instructions may be loaded into memory 106 for execution by processor unit 104. The processes of the different embodiments described below may be performed by processor unit 104 using computer-implemented instructions, which may be located in a memory such as memory 106. These instructions are referred to as program code (or modules), computer-usable program code (or modules), or computer-readable program code (or modules) that may be read and executed by a processor in processor unit 104. The program code (or modules) in the different embodiments may be embodied on different physical or tangible computer-readable media, such as memory 106 or persistent storage 108.

[0029] Program code/module 120 is located in a functional form on a computer-readable storage medium 118 that is selectively removable and may be loaded onto or transferred to data processing system 100 for execution by processor unit 104. Program code/module 120 and computer-readable storage medium 118 form a computer program product 122 in these examples. In one example, computer-readable storage medium 118 may be in a tangible form, such as an optical or magnetic disk that is inserted or placed into a drive or other device that is part of persistent storage 108 for transfer onto a storage device, such as a hard drive, that is part of persistent storage 108. In a tangible form, computer-readable storage medium 118 may also take the form of persistent storage, such as a hard drive, a thumb drive, or a flash memory that is connected to data processing system 100. The tangible form of computer-readable storage medium 118 is also referred to as a computer-recordable storage medium. In some instances, computer-readable storage medium 118 may not be removable from data processing system 100.

[0030] Alternatively, program code/module 120 may be transferred to data processing system 100 from computer-readable storage medium 118 through a communication link to communication unit 110 and/or through a connection to input/output unit 112. In the illustrative examples, the communication link and/or the connection may be physical or wireless. The computer-readable media may also take the form of non-tangible media, such as communication links or wireless transmissions containing the program code/module.

[0031] The different components illustrated for data processing system 100 are not meant to impose architectural limitations on the manner in which different embodiments may be implemented. The different illustrative embodiments may be implemented in a data processing system that includes components in addition to, or in place of, those illustrated for data processing system 100. Other components shown in FIG. 2 can be varied from the illustrative examples shown.

[0032] As one example, a storage device in data processing system 100 is any hardware apparatus that can store data. Memory 106, persistent storage 108, and computer-readable storage medium 118 are examples of storage devices in a tangible form.

[0033] In another example, a bus system may be used to implement communication mechanism 102 and may be composed of one or more buses, such as a system bus or an input/output bus. The bus system may be implemented using any suitable type of architecture that provides for a transfer of data between the different components or devices attached to the bus system. Additionally, a communication unit may include one or more devices used to transmit and receive data, such as a modem or a network adapter. Further, a memory may be, for example, memory 106 or a cache such as that found in an interface and memory controller hub that may be present in communication mechanism 102.

[0034] To overcome the problems with the conventional approaches described in the background of this application, various embodiments are described below. They relate to systems and methods that downmix the input audio channels pair by pair, achieving better audio fidelity and preserving the spatial information of the original multi-channel audio stream. Unlike conventional methods, the relationship between input channels is considered in pairs. Each pair enters its own processor, where the information in the pair is compared and analyzed against each other to produce a tighter sound image and a better spatial result. For the pairing to be meaningful, at least one of the modules in the process must cross-reference information between the two channels of the pair. The two-channel output of each pair is summed with the single channels to produce the left channel output and the right channel output.

[0035] FIGS. 3A-3B are block diagrams illustrating an audio downmix pipeline for processing a multi-channel input signal, according to some embodiments. The multi-channel input signal in FIG. 3A is a 5.1 surround sound file 210 that includes a left front channel (L), a center front channel (C), a right front channel (R), a left side channel (Ls), a right side channel (Rs), and a low-frequency effects channel (LFE). The multi-channel input signal in FIG. 3B is a 7.1 surround sound file 240 that includes a left front channel (L), a center front channel (C), a right front channel (R), a left side surround channel (Lss), a right side surround channel (Rss), a left rear surround channel (Lrs), a right rear surround channel (Rrs), and a low-frequency effects channel (LFE).

[0036] In some embodiments, the system selects one or more input pairs from the multi-channel input signal. In some embodiments, an input pair corresponds to two audio streams that are intended to be reproduced by symmetrically placed speakers. The input pair therefore comprises a pair of spatially symmetric signal sources. In some embodiments, an input pair includes two audio channels on the two sides (i.e., the left side and the right side) at the same angle from the centerline. For example, the front pair includes the left front (L) channel and the right front (R) channel, which are at 30° to the left and right of the centerline, respectively. In another example, the rear pair of a Dolby 7.1 surround sound setup places the left rear surround (Lrs) channel and the right rear surround (Rrs) channel at 135° to the left and right of the centerline, respectively. Each selected input pair is then sent to a respective processor (PROC), which produces an output audio pair.

[0037] In some embodiments, as shown in FIG. 3A, from among the input channels of the 5.1 surround sound file, the system selects the left front channel (L) and the right front channel (R) as pair 222 and the left side channel (Ls) and the right side channel (Rs) as pair 224. In some embodiments, as shown in FIG. 3B, from among the input channels of the 7.1 surround sound file, the system selects the left front channel (L) and the right front channel (R) as pair 242, the left side surround channel (Lss) and the right side surround channel (Rss) as pair 244, and the left rear surround channel (Lrs) and the right rear surround channel (Rrs) as pair 246. Channels that lie on the centerline (e.g., the center front C channel) and channels that are non-directional (e.g., the LFE) are single channels and are not paired with any other channel.
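The pair selection in paragraph [0037] can be sketched as follows. The sketch pairs channels by their names (an `L…` channel with the matching `R…` channel), which is an assumption made for brevity; the application defines pairs by spatial symmetry about the centerline, not by naming. The function name `select_pairs` and the layout constants are hypothetical.

```python
LAYOUT_5_1 = ["L", "C", "R", "Ls", "Rs", "LFE"]
LAYOUT_7_1 = ["L", "C", "R", "Lss", "Rss", "Lrs", "Rrs", "LFE"]

def select_pairs(layout):
    """Split a channel layout into symmetric (left, right) pairs and singles.

    Pairing is done by name only: "Lxyz" is matched with "Rxyz".
    Channels without a partner (such as C and LFE) are returned as singles.
    """
    names = set(layout)
    pairs, singles = [], []
    for name in layout:
        if name.startswith("L") and name != "LFE":
            partner = "R" + name[1:]
            if partner in names:
                pairs.append((name, partner))
            else:
                singles.append(name)
        elif name.startswith("R") and ("L" + name[1:]) in names:
            continue  # already emitted together with its left partner
        else:
            singles.append(name)
    return pairs, singles

print(select_pairs(LAYOUT_5_1))
print(select_pairs(LAYOUT_7_1))
```

For the 5.1 layout this yields the pairs corresponding to 222 and 224, and for the 7.1 layout the pairs corresponding to 242, 244, and 246, with C and LFE left unpaired in both cases.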

[0038] In some embodiments, each pair enters its own processor. For example, in FIG. 3A, pairs 222 and 224 are sent to PROC 232 and PROC 234, respectively, and in FIG. 3B, pairs 242, 244, and 246 are sent to PROC 252, PROC 254, and PROC 256, respectively. In this process, the information of the two channels in a pair is compared and analyzed against each other to produce a tighter sound image and better spatial results. For the pairing to be meaningful, at least one of the one or more modules in each processor PROC must cross-reference information between the two channels of the pair. The output signal of each processor PROC includes two channels, and the two-channel output of each pair is summed (Σ) with the single channels to produce an output signal that includes a left channel output (L') and a right channel output (R'), as shown in FIGS. 3A and 3B, respectively.

[0039] In some embodiments, the pair processor (PROC) can be any feasible "two-in, two-out" audio signal processor. As noted above, the processor comprises at least one module that incorporates cross-channel features from the input pair. In some embodiments, the pair processor (PROC) includes one or more modules that extract signal information from the input pair. Based on the extracted pair signal information, the pair processor (PROC) then modifies the output stream for each pair.

[0040] In some embodiments, the pair processor (PROC) comprises a plurality of different components (or modules), including channel-dependent components and/or channel-independent components. In some embodiments, a channel-dependent component performs a multi-channel-in, multi-channel-out process, for example, a two-in, two-out process in a pair processor. In some embodiments, a channel-dependent component generates a plurality of output channels, each based on two or more input channels. In some embodiments, a channel-dependent component uses information from the input signals and adjusts its processing based on the extracted cross-channel features. In some embodiments, the cross-channel features include a comparison of the volumes of the respective input channels, a relationship between the frequency spectral characteristics (e.g., amplitude and/or phase) of the left and right input channels, and/or differences in signal onset time and amplitude between the left and right input channels. In some embodiments, the pair processor (PROC) includes one or more channel-dependent components, such as a mid/side (M/S) mixer and/or a width controller (WC). For example, an M/S mixer can generate a mid signal and a side signal using the sum and difference of the left and right input signals. In another example, the M/S mixer can generate the mid signal and the side signal by comparing overlapping regions of the frequency spectra of the input signals.
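To make the cross-channel features and the sum-and-difference M/S mixer concrete, the following is a minimal sketch under stated assumptions. The feature set (an RMS level comparison and an onset-time lag), the onset threshold, and the 0.5 scaling in `mid_side` are illustrative choices, not values given in the text, and all function names are hypothetical.

```python
import math


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def onset_index(x, threshold):
    """Index of the first sample whose magnitude reaches the threshold, else None."""
    for i, v in enumerate(x):
        if abs(v) >= threshold:
            return i
    return None


def cross_channel_features(left, right, threshold=0.1):
    """Volume comparison and onset-time difference for one left/right pair."""
    l_rms, r_rms = rms(left), rms(right)
    l_on, r_on = onset_index(left, threshold), onset_index(right, threshold)
    return {
        "rms_left": l_rms,
        "rms_right": r_rms,
        "louder": "left" if l_rms > r_rms else ("right" if r_rms > l_rms else "equal"),
        # Positive lag: the right channel starts later than the left channel.
        "onset_lag": None if l_on is None or r_on is None else r_on - l_on,
    }


def mid_side(left, right):
    """Sum-and-difference M/S split; the 0.5 scaling is an assumption."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


left = [0.0, 0.5, 0.5, 0.0]
right = [0.0, 0.0, 0.25, 0.25]
features = cross_channel_features(left, right)
mid, side = mid_side(left, right)
```

With this scaling the inputs are recovered as `left = mid + side` and `right = mid - side`. The spectral-overlap variant of the M/S mixer mentioned in the text is not covered by this sketch.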

[0041] In some embodiments, a channel-independent component is a multi-channel-in, multi-channel-out process (including two-in, two-out) that processes each channel of the multi-channel input file separately. In some embodiments, the channel-independent component processes the multi-channel input file in the same way as if the channels were split into the same number of mono signals, each mono signal were processed independently, and the processed mono signals of the respective channels were summed together. In some embodiments, the pair processor (PROC) includes one or more channel-independent components, such as an equalizer (EQ) and/or a dynamic range compressor (DRC). For example, an equalizer (EQ) module takes in each channel individually and generates a corresponding channel without using any information from other channels. The result of the equalizer (EQ) is the same as if each channel had been equalized separately with the same input parameters.
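The defining property of a channel-independent component can be illustrated with a short sketch: every channel goes through the same mono process, and no channel reads another channel's samples. The one-pole low-pass filter below is only a stand-in for an equalizer band (an assumption for illustration), and both function names are hypothetical.

```python
def one_pole_lowpass(x, alpha=0.5):
    """Stand-in mono process: first-order low-pass, y[n] = alpha*x[n] + (1-alpha)*y[n-1]."""
    y, state = [], 0.0
    for v in x:
        state = alpha * v + (1.0 - alpha) * state
        y.append(state)
    return y


def apply_independently(channels, process):
    """Run the same mono process on every channel, with no cross-channel information."""
    return [process(channel) for channel in channels]


stereo_in = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
stereo_out = apply_independently(stereo_in, one_pole_lowpass)
```

Processing the two channels together gives exactly the same result as processing each as a separate mono signal, which is the equivalence the paragraph describes.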

[0042] FIG. 4A is a block diagram illustrating a signal workflow that includes PROC 420 applied to an input pair 410, according to some embodiments. In some embodiments, PROC 420 includes a plurality of modules (also referred to as components) configured to process the input pair signal 410. In some embodiments, PROC 420 can be any pair processor, such as PROC 232, PROC 234, PROC 252, PROC 254, or PROC 256 shown in FIGS. 3A-3B. Accordingly, pair processor 420 can be applied to any input pair, such as pair 222, 224, 242, 244, or 246 shown in FIGS. 3A-3B. In some embodiments, pair processor 420 includes a mid/side (M/S) mixer 422, an equalizer (EQ) 428, a dynamic range compressor (DRC) 430, and a crosstalk cancellation (XTC) module 432 coupled to one another, as shown in FIG. 4A. The output signal of PROC 420 is further processed by a width controller (WC) 434 and another dynamic range compressor (DRC) 436 to obtain output pair 440, as shown in FIG. 4A.

[0043] In some embodiments, data processing system 100 first sends the left and right input signals 410 to M/S mixer 422. In some embodiments, M/S mixer 422 is a mixing tool configured to generate three components from input pair 410: two side components (S) 424, namely a left side and a right side, and one mid component (M) 426. The left side component represents sound sources that appear only in the left channel, and the right side component corresponds to sounds that appear only in the right channel. The mid component consists of sound sources that appear only at the phantom center of the sound stage, such as main musical elements and dialogue.

[0044] By doing so, M/S mixer 422 separates out information that is useful for the various subsequent sound stage enhancements and minimizes unwanted distortion of the sound quality (e.g., coloration). This step also helps lower the correlation between the left and right components. In some embodiments, M/S mixer 422 analyzes the sound image and estimates the sounds arriving from the center, left, and right. M/S mixer 422 then splits the two input channels, i.e., the left and right signals 410, into a one-channel mid signal 426 and a two-channel side signal 424. A more detailed description of M/S mixer 422 can be found in PCT Application No. PCT/US2015/057616, filed October 27, 2015, entitled "Apparatus and Method for Sound Stage Enhancement," which is incorporated by reference in its entirety.

[0045] Next, the system sends each side signal 424 to an equalizer (EQ) 428 to adjust the frequency content of the side signal 424. In some embodiments, EQ 428, which processes the two side signals 424, includes one or more multi-band equalizers that perform bandpass filtering on the two side signals. In some embodiments, the multi-band equalizer applied to each side signal is the same. In some other embodiments, the multi-band equalizer applied to one side signal differs from the one applied to the other side signal. In either case, their function is to preserve the original timbre of the audio signal and to avoid ambiguous spatial cues present in the two signals. In some embodiments, EQ 428 can also be used to select a target sound source based on a spectral analysis of the two side components. In some embodiments, as shown in FIG. 4A, EQ 428 generates two output signals 450 and 452. In some embodiments, each of output signals 450 and 452 is a two-channel audio signal. In some embodiments, EQ 428 applies a respective bandpass filter to each channel of the two-channel side signal 424 to obtain the bandpass-filtered signal 450.

[0046] In some embodiments, EQ 428 also generates a residual signal 452 based on the frequency-band difference between the input signal of EQ 428 (i.e., the two-channel side signal 424) and the output signal of EQ 428 (i.e., the two-channel bandpass-filtered signal 450). In some embodiments, data processing system 100 generates a left side residual component and a right side residual component by subtracting the left side component and the right side component from the bandpass-filtered left side component and the bandpass-filtered right side component, respectively. In some embodiments, respective amplifiers are applied to the residual signal and to the signal resulting from crosstalk cancellation to adjust the gains of the two signals before they are combined. A more detailed description of EQ 428 can be found in PCT Application No. PCT/US2015/057616, filed October 27, 2015, entitled "Apparatus and Method for Sound Stage Enhancement," which is incorporated by reference in its entirety.
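The residual computation can be sketched as follows. This is a minimal illustration, assuming a trivial two-tap averaging filter as a placeholder for the actual bandpass filter (whose design is not given here); the function names are hypothetical. The subtraction order follows the wording above, where the side component is subtracted from its filtered version.

```python
def two_tap_average(x):
    """Placeholder filter (assumption): average of the current and previous sample."""
    out, prev = [], 0.0
    for v in x:
        out.append(0.5 * (v + prev))
        prev = v
    return out


def filter_and_residual(side, band_filter):
    """Return the filtered side component and its residual for one side channel."""
    filtered = band_filter(side)
    # Residual = filtered component minus the unfiltered side component.
    residual = [f - s for f, s in zip(filtered, side)]
    return filtered, residual


left_side = [1.0, 0.0, -1.0, 0.0]
filtered, residual = filter_and_residual(left_side, two_tap_average)
```

With this sign convention the original side component equals `filtered - residual`; the same routine would be applied separately to the right side component.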

[0047] In some embodiments, the bandpass-filtered signal 450 is sent to a dynamic range compressor (DRC) 430. In some embodiments, DRC 430 includes a bandpass filter (different from the bandpass filter of EQ 428) for amplifying the two audio signals (i.e., the two-channel bandpass-filtered signal 450) within a predetermined frequency range, so as to maximize the sound stage enhancement effect achieved by the crosstalk cancellation block (XTC) 432. In some embodiments, a user (e.g., a sound engineer) can adjust the bandpass filter of DRC 430 to exclude particular frequency bands. By doing so, the user can emphasize a particular sound event of his or her choosing. For example, after performing equalization on the left side component and the right side component using the first bandpass filter of EQ 428, data processing system 100 uses the second bandpass filter of DRC 430 to remove a predetermined frequency band from the left side component and the right side component. Representative bandpass filters used in EQ block 428 and DRC block 430 include a fourth-order filter or a Butterworth filter. In some embodiments, after performing equalization on the left side component and the right side component using the first bandpass filter of EQ 428, data processing system 100 performs a first dynamic range compression on the left side component and the right side component using DRC 430 to emphasize a predetermined frequency band relative to other frequencies. A more detailed description of DRC 430 can be found in PCT Application No. PCT/US2015/057616, filed October 27, 2015, entitled "Apparatus and Method for Sound Stage Enhancement," which is incorporated by reference in its entirety.

[0048] In some embodiments, the output signal from DRC 430 is then sent to a crosstalk cancellation (XTC) module 432, which performs a crosstalk cancellation process. Crosstalk is an inherent problem in stereo (i.e., two-channel) loudspeaker playback. It occurs when sound from each speaker reaches the ear on the opposite side, and it introduces unwanted spectral coloration into the original signal. One solution to this problem is a crosstalk cancellation (XTC) algorithm. One type of XTC algorithm uses generalized directional binaural transfer functions, such as head-related transfer functions (HRTF) and/or binaural room impulse responses (BRIR), to represent the angles of the two physical loudspeakers relative to the listener's position. Another type of XTC algorithm is a recursive crosstalk cancellation method that requires no head-related transfer function (HRTF), binaural room impulse response (BRIR), or any other binaural transfer function. The basic algorithm can be formulated as follows:

\[ \mathrm{left}[n] = \mathrm{left}[n] - A_L \cdot \mathrm{right}[n - d_L] \]
\[ \mathrm{right}[n] = \mathrm{right}[n] - A_R \cdot \mathrm{left}[n - d_R] \]

where \(A_L\) and \(A_R\) are the attenuation coefficients of the signals, and \(d_L\) and \(d_R\) are the delays, in number of data samples, from the respective speakers to the opposite ears. In some embodiments, XTC 432 as shown in FIG. 4A uses either the recursive crosstalk cancellation method or generalized directional binaural transfer functions. A more detailed description of XTC 432 can be found in U.S. Patent Application No. 14/569,490, filed December 12, 2014, entitled "Apparatus and Method for Sound Stage Enhancement" (issued as U.S. Patent No. 9,532,156 on December 27, 2016), and in U.S. Patent Application No. 15/349,822, filed November 11, 2016, entitled "Apparatus and Method for Sound Stage Enhancement," both of which are incorporated by reference in their entirety.
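The recursive update can be sketched directly from the two formulas. This is a minimal interpretation, not the referenced implementation: it assumes the update is applied in place, so that the delayed terms read samples that have already been processed, that samples before n = 0 are zero, and that both delays are at least one sample. The function name and argument names are hypothetical.

```python
def recursive_xtc(left, right, a_l, a_r, d_l, d_r):
    """Recursive crosstalk cancellation on one stereo pair.

    a_l, a_r: attenuation coefficients; d_l, d_r: delays in samples (>= 1).
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    if d_l < 1 or d_r < 1:
        raise ValueError("delays must be at least one sample")
    out_l, out_r = list(left), list(right)
    for n in range(len(out_l)):
        # left[n] = left[n] - A_L * right[n - d_L]
        if n - d_l >= 0:
            out_l[n] -= a_l * out_r[n - d_l]
        # right[n] = right[n] - A_R * left[n - d_R]
        if n - d_r >= 0:
            out_r[n] -= a_r * out_l[n - d_r]
    return out_l, out_r


# A unit impulse in the left channel, silence in the right channel.
xtc_l, xtc_r = recursive_xtc([1.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 5,
                             a_l=0.5, a_r=0.5, d_l=1, d_r=1)
```

In this example the impulse produces alternating, decaying cancellation terms in the two channels, which shows the recursive character of the method: each correction is itself corrected one delay later.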

[0049] In some embodiments, as shown in FIG. 4A, the output signal of XTC 432 is fed to amplifier 462, the pair of residual signals 452 is fed to amplifier 464, and the mid component (M) 426 is fed to amplifier 466, after which these signals are sent to the width controller (WC) 434 to be processed and combined.

[0050] In some embodiments, the output signals of amplifiers 462, 464, and 466 are sent to WC 434 to adjust the stage width. In some embodiments, WC 434 uses the analyzed information of the input signal pair to control the sound stage width of the output audio signal. The stage width can range from as narrow as 0° to as wide as 360° for totally immersive sound. In some embodiments, the cross-channel information of a pair (e.g., output pair 472 or 474) is analyzed and adjusted in different ways. In some embodiments, the user can assign a desired stage width to the width controller, as shown in FIG. 5 below. The assigned stage width may affect the pair summing matrix of WC 434 based on the previously analyzed information. In some examples, the width of the sound stage can be adjusted using the following equation:

[equation not reproduced in the source text]

where −5 ≤ β ≤ 0 is the stage width parameter. The resulting signal has the maximum sound stage width when β = 0 and is close to a mono signal when β = −5.

[0051] In some embodiments, the output signal 476 of WC 434 is sent to a second dynamic range compressor (DRC) 436 to amplify the overall output level of the audio signal in an audio mastering process. In some embodiments, data processing system 100 performs a second dynamic range compression on the left side component and the right side component using DRC 436 to preserve the localization cues in the digital audio output signal.

[0052] As shown in FIG. 4A, the output of the pipeline is a stereo audio signal 440 that includes a left channel (L') and a right channel (R'). In some embodiments, the signal workflow including PROC 420 can be applied to a 5.1 surround sound file as shown in FIG. 3A. For example, PROC 232 and/or PROC 234 may be similar to PROC 420 shown in FIG. 4A.

[0053] FIG. 4B is a block diagram illustrating a signal workflow applied to a 7.1 surround sound file, according to some embodiments. In some embodiments, as described with reference to FIG. 3B, the input signals of the 7.1 surround sound file are grouped into pairs, namely L/R pair 242, Lss/Rss pair 244, and Lrs/Rrs pair 246. L/R pair 242, Lss/Rss pair 244, and Lrs/Rrs pair 246 are then sent to PROC 252, PROC 254, and PROC 256, respectively. In some embodiments, each of PROC 252, PROC 254, and PROC 256 is similar to PROC 420 described above with reference to FIG. 4A. In some other embodiments, PROC 252, PROC 254, or PROC 256 may include one or more other modules (or components) in various configurations. The output signals of PROC 252, PROC 254, and PROC 256 are sent to respective width controllers, e.g., WC 482, WC 484, and WC 486. WC 482, WC 484, or WC 486 may be similar to WC 434 described above with reference to FIG. 4A. The outputs of WC 482, WC 484, and WC 486 are combined with the center signal C and the low-frequency effects channel LFE to generate the output stereo audio signal 488.

[0054] FIG. 5 shows a user interface (UI) 500 of a software application, or of a plug-in component of a software application, used to manage an implementation of the signal pipeline described with reference to FIG. 4B for a 7.1 surround sound file, according to some embodiments. The signal pipeline may include a plurality of pair processors, such as PROC 420 shown in FIG. 4A. In this case, there are three pairs: a front pair (L, R), a side pair (Lss, Rss), and a rear pair (Lrs, Rrs). The left panel 510 of UI 500 controls the gain of each input channel. Control region 520 controls the frequency content of the equalizer (EQ). Control region 530 controls the width of the width controller. As described above with reference to FIG. 4B, each pair passes through a different pair processor PROC with different parameters; control region 540 is therefore used to select the pair (e.g., the front pair, side pair, or rear pair) for which the PROC processing parameters are entered.

[0055] As noted above, sound is an important part of media and entertainment. Sound appeals to the audience's emotions and draws the audience into the story. Multi-channel surround sound was introduced to provide a more immersive listening experience. Multi-channel audio formats use multiple audio tracks to reconstruct sound on a corresponding multi-channel sound reproduction system. Downmixing can occur on both the production side and the reproduction side. On the production side, a sound mixer usually starts mixing with the largest number of channels and downmixes to smaller numbers of channels. On the reproduction side, a multi-channel audio track can be downmixed to a smaller number of channels to match the number of channels of the reproduction system. In both cases, the goal is to maintain a use and placement of sounds that matches the original creative intent.

[0056] As shown in FIGS. 1A-1C, conventional downmixing methods apply a process to each individual track and do not take the relationships between tracks into account. The methods and systems presented herein take audio input pair by pair. First, the multi-channel audio tracks are grouped into pairs according to the physical arrangement of the reproduction system. Second, the relationship between the two channels of each pair is analyzed. Finally, the pair input is processed based on the analysis result. By incorporating the pairwise relationships, the spatial information of the original multi-channel audio input can be better preserved. As a result, the multi-channel audio output downmixed with the systems and methods introduced herein produces a more accurate soundscape, closer to the original multi-channel audio input than the same audio input downmixed by conventional methods.

[0057] FIGS. 6A-6C are flowcharts illustrating a process of downmixing a multi-channel audio signal using the data processing system 100 shown in FIG. 2, according to some embodiments. Data processing system 100 selects (602) a left input channel and a right input channel from the multi-channel input audio signal. In some embodiments, the left input channel and the right input channel correspond to a pair of spatially symmetric signal sources. In some embodiments, the multi-channel audio input signal is the 5.1 surround sound file 210 of FIG. 3A or the 7.1 surround sound file 240 of FIG. 3B. In some embodiments, the 5.1 surround input signal 210 includes the left front channel L and the right front channel R as pair 222, and the left side channel Ls and the right side channel Rs as pair 224. In some embodiments, the 7.1 surround input signal 240 includes the left front channel L and the right front channel R as pair 242, the left side surround channel Lss and the right side surround channel Rss as pair 244, and the left rear surround channel Lrs and the right rear surround channel Rrs as pair 246.

[0058] Data processing system 100 then generates (604) one or more cross-channel features from the left input channel and the right input channel of the selected pair. In some embodiments, the one or more cross-channel features include a comparison of the volumes of the left and right input channels of the pair, a relationship between the frequency spectral characteristics (e.g., amplitude and/or phase) of the left and right input channels, and/or differences in signal onset time and amplitude between the left and right input channels.

[0059] Data processing system 100 then processes (606) the left input channel and the right input channel of the selected pair according to the cross-channel features to generate a left intermediate channel and a right intermediate channel. In some embodiments, the left input channel and the right input channel of the pair are processed using a processor (PROC) as shown in FIGS. 3A-3B and FIGS. 4A-4B. In some embodiments, the PROC includes one or more modules as shown in FIG. 4A.

[0060] Next, data processing system 100 combines (608) each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal. For example, as shown in FIG. 3A, the left and right intermediate channels produced by the respective PROCs (e.g., from the L/R pair or the Ls/Rs pair) are combined with the center channel C and/or the low-frequency effects channel (LFE) to generate the two-channel output audio signal L'/R'. Similarly, as shown in FIG. 3B, the left and right intermediate channels produced by the respective PROCs (e.g., from the L/R pair, the Lss/Rss pair, or the Lrs/Rrs pair) are combined with the center channel C and/or the low-frequency effects channel (LFE) to generate the two-channel output audio signal L'/R'.
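The final combining step can be sketched as a plain summation. This is a minimal illustration under stated assumptions: the text does not specify mixing gains, so the `single_gain` applied to the unpaired channels (C, LFE) is a hypothetical parameter, and the function name is likewise illustrative.

```python
def combine_to_stereo(intermediate_pairs, singles, single_gain=0.5):
    """Sum processed (left, right) intermediate pairs with unpaired channels into L' and R'."""
    n = len(intermediate_pairs[0][0])
    out_l, out_r = [0.0] * n, [0.0] * n
    for left, right in intermediate_pairs:
        for i in range(n):
            out_l[i] += left[i]
            out_r[i] += right[i]
    # Single channels (e.g., C and LFE) feed both outputs equally (assumed gain).
    for single in singles:
        for i in range(n):
            out_l[i] += single_gain * single[i]
            out_r[i] += single_gain * single[i]
    return out_l, out_r


front_pair = ([0.5, 0.0], [0.0, 0.5])      # processed L/R pair
side_pair = ([0.25, 0.25], [0.0, 0.0])     # processed Ls/Rs pair
center = [0.5, 0.5]
lfe = [0.0, 0.25]
out_left, out_right = combine_to_stereo([front_pair, side_pair], [center, lfe])
```

For a 7.1 input the same routine would take three intermediate pairs instead of two.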

[0061] In audio engineering, the sound stage is usually defined as the region between the leftmost and rightmost perceived positions in audio reproduction. That is, the sound stage is the limit of how far away a sound object can be perceived. The sound stage width is accordingly defined as the distance between the left boundary and the right boundary. In the usual case, the sound stage width of a stereo reproduction is the separation distance between the two loudspeakers. In this application, the sound stage concept is applied independently to each symmetric pair of channels. For example, in some embodiments, data processing system 100 uses a width controller (e.g., WC 434) to further adjust (610) the sound stage width associated with the left intermediate channel and the right intermediate channel before combining the left and right intermediate channels with the third input channel. In some embodiments, data processing system 100 receives (612) a user input specifying the sound stage width of the two-channel output audio signal. The user input can be received at UI 500, as shown in FIG. 5.

[0062] In some embodiments, the step 606 of processing the left and right input channels further includes extracting (614) a middle component, a left side component, and a right side component from the left and right input channels. For example, as shown in FIG. 4A, the input pair 410 is processed by the M/S mixer 422 to generate the middle component 426 and the left and right side components S 424. In some embodiments, the data processing system 100 processes (616) the left and right input channels, and the left and right side components, before combining these with the middle component to generate the left and right intermediate channels.
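As a rough illustration of the extraction step 614, the sketch below assumes the conventional mid/side split: the middle component is the average of the two inputs and each side component is what remains of its channel after the middle is removed. This passage does not give the formula used by the M/S mixer 422, so both the formula and the name `extract_middle_side` are assumptions.

```python
def extract_middle_side(left, right):
    """Split a channel pair into middle, left side and right side parts.

    Assumed definition: middle = (L + R) / 2, side_left = L - middle,
    side_right = R - middle, so that L = middle + side_left and
    R = middle + side_right.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    middle = [0.5 * (l + r) for l, r in zip(left, right)]
    side_left = [l - m for l, m in zip(left, middle)]
    side_right = [r - m for r, m in zip(right, middle)]
    return middle, side_left, side_right
```

Under this assumption, adding the middle component back to each unprocessed side component reproduces the original pair.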

[0063] In some embodiments, the step 606 of processing the left and right input channels further includes performing (618) equalization on the left and right side components using a band-pass filter (e.g., by the EQ block 428 of FIG. 4A) to obtain a left band-pass-filtered component and a right band-pass-filtered component (e.g., the output signals of EQ 450). In some embodiments, the equalization process further generates (620) a left side residual component based on the difference between the left side component and the left band-pass-filtered component, and a right side residual component based on the difference between the right side component and the right band-pass-filtered component, such as the left and right side residual components 452 shown in FIG. 4A.
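Steps 618 and 620 can be sketched for one side component as follows. The filter type is an assumption: a second-order band-pass biquad with the widely used "constant 0 dB peak gain" coefficients stands in for the EQ block, and the center frequency, Q, and sample rate defaults are illustrative values not taken from this description. The residual is computed exactly as the text states, as the difference between the side component and its band-pass-filtered version.

```python
import math


def bandpass_coeffs(f0, q, fs):
    """Normalized biquad band-pass coefficients (unity gain at f0)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(x, b, a):
    """Direct-form I filtering of the sample list x."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def equalize_with_residual(side, f0=1000.0, q=0.7, fs=48000.0):
    """Return (band-pass-filtered component, residual component)."""
    b, a = bandpass_coeffs(f0, q, fs)
    band = biquad(side, b, a)
    residual = [s - p for s, p in zip(side, band)]
    return band, residual
```

Because the residual is defined as a difference, the band-pass-filtered part and the residual always sum back to the original side component, so only the in-band part is exposed to the later compression and crosstalk-cancellation stages.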

[0064] In some embodiments, after performing the equalization on the left and right side components, the data processing system 100 performs (622) a first dynamic range compression (e.g., by the DRC 430 of FIG. 4A) on the left band-pass-filtered component and the right band-pass-filtered component (e.g., as generated by the EQ 428 of FIG. 4A), respectively, to obtain a left compressed component and a right compressed component accordingly.
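To make the compression step 622 concrete, here is a deliberately simple static compressor applied sample by sample. It is a sketch under assumptions: the threshold and ratio defaults are illustrative, and a practical DRC would normally add attack and release smoothing, which this description does not detail and the sketch omits.

```python
import math


def compress(samples, threshold_db=-20.0, ratio=4.0):
    """Static downward compression above a threshold (no time smoothing).

    Levels above threshold_db are reduced so that every `ratio` dB of
    input overshoot yields 1 dB of output overshoot. Parameter values
    are illustrative assumptions.
    """
    out = []
    for x in samples:
        magnitude = abs(x)
        if magnitude == 0.0:
            out.append(0.0)
            continue
        level_db = 20.0 * math.log10(magnitude)
        if level_db > threshold_db:
            target_db = threshold_db + (level_db - threshold_db) / ratio
            gain = 10.0 ** ((target_db - level_db) / 20.0)
            out.append(x * gain)
        else:
            out.append(x)
    return out
```

The same function would be called once on the left band-pass-filtered component and once on the right one.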

[0065] In some embodiments, after performing the first dynamic range compression, the data processing system 100 performs (624) crosstalk cancellation (e.g., by the XTC 432 of FIG. 4A) on the left compressed component and the right compressed component (e.g., as generated by the DRC 430 of FIG. 4A), respectively, to obtain a crosstalk-cancelled left side component and a crosstalk-cancelled right side component.
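The details of the XTC block are not given in this passage. As a hedged illustration of the general idea, the sketch below uses a textbook recursive canceller with a simplified acoustic model in which each loudspeaker leaks into the opposite ear as a delayed, attenuated copy. The `gain` and `delay` values and both function names are hypothetical; `simulate_ears` exists only to show that the canceller undoes the assumed leakage.

```python
def cancel_crosstalk(left, right, gain=0.7, delay=3):
    """Recursive crosstalk canceller for a delay-and-attenuate model.

    Each output subtracts the delayed, attenuated opposite output.
    Requires delay >= 1 and 0 <= gain < 1 so the recursion stays stable.
    """
    if delay < 1 or not 0.0 <= gain < 1.0:
        raise ValueError("need delay >= 1 and 0 <= gain < 1")
    n = len(left)
    out_l = [0.0] * n
    out_r = [0.0] * n
    for i in range(n):
        prev_l = out_l[i - delay] if i >= delay else 0.0
        prev_r = out_r[i - delay] if i >= delay else 0.0
        out_l[i] = left[i] - gain * prev_r
        out_r[i] = right[i] - gain * prev_l
    return out_l, out_r


def simulate_ears(spk_l, spk_r, gain=0.7, delay=3):
    """Assumed acoustic model: each ear hears its own speaker plus leakage."""
    n = len(spk_l)
    ear_l = [spk_l[i] + (gain * spk_r[i - delay] if i >= delay else 0.0)
             for i in range(n)]
    ear_r = [spk_r[i] + (gain * spk_l[i - delay] if i >= delay else 0.0)
             for i in range(n)]
    return ear_l, ear_r
```

Under this model, feeding the canceller's outputs through `simulate_ears` with the same `gain` and `delay` returns the original left and right signals at the respective ears.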

[0066] In some embodiments, the data processing system 100 combines (626) the crosstalk-cancelled left and right side components, the left and right side residual components, and the middle component to generate the left and right intermediate channels. In some embodiments, this combining step further includes adjusting (628) the sound stage width associated with the left and right intermediate channels (e.g., by the WC 434 of FIG. 4A) before combining them with the third input channel. For example, as shown in FIG. 4B, the left and right intermediate channels generated by each PROC are sent to a respective WC, which adjusts the sound stage width. The adjusted signals are then combined with the third input channel, e.g., the C or LFE channel, to generate the output stereo signal 488.

[0067] In some embodiments, after adjusting the sound stage width, the data processing system 100 performs (630) a second dynamic range compression (e.g., by the DRC 436 of FIG. 4A) to generate the left and right intermediate channels.
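The order of operations in steps 618 through 630 can be summarized in a short sketch. The stages are passed in as callables so that only the ordering is shown; combining by simple addition in step 626 and applying the second compression to each channel separately are assumptions, and all names are hypothetical.

```python
def process_pair(middle, side_left, side_right, equalize, compress_1,
                 cancel_crosstalk, adjust_width, compress_2):
    """Chain the per-pair stages in the order the description gives."""
    # 618/620: band-pass equalization and residual for each side component
    band_l, residual_l = equalize(side_left)
    band_r, residual_r = equalize(side_right)
    # 622: first dynamic range compression on the band-passed parts
    comp_l, comp_r = compress_1(band_l), compress_1(band_r)
    # 624: crosstalk cancellation on the compressed parts
    xtc_l, xtc_r = cancel_crosstalk(comp_l, comp_r)
    # 626: combine with residual and middle (summation is an assumption)
    inter_l = [x + r + m for x, r, m in zip(xtc_l, residual_l, middle)]
    inter_r = [x + r + m for x, r, m in zip(xtc_r, residual_r, middle)]
    # 628: sound stage width adjustment
    inter_l, inter_r = adjust_width(inter_l, inter_r)
    # 630: second dynamic range compression
    return compress_2(inter_l), compress_2(inter_r)


# Pass-through stand-ins, useful for checking the wiring only.
def passthrough_eq(side):
    return list(side), [0.0] * len(side)


def passthrough(samples):
    return list(samples)


def passthrough_pair(left, right):
    return list(left), list(right)
```

With the pass-through stand-ins, the function returns the middle component plus each side component, which makes it easy to confirm the wiring before real stages are substituted.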

[0068] Finally, it should be noted that the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like.

[0069] Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium that provides program code for use by, or in connection with, a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by, or in connection with, the instruction execution system, apparatus, or device.

[0070] The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device), or a propagation medium. Examples of a computer-readable medium include semiconductor or solid-state memory, magnetic tape, a removable computer diskette, random access memory (RAM), read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include compact disk read-only memory (CD-ROM), compact disk read/write (CD-R/W), and DVD.

[0071] A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories that provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. In some embodiments, the data processing system is implemented in the form of a semiconductor chip (e.g., a system-on-chip) that integrates all components of a computer or other electronic system onto a single chip substrate.

[0072] Input/output (I/O) devices (including but not limited to keyboards, displays, and pointing devices) can be coupled to the system either directly or through intervening I/O controllers.

[0073] Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems, or to remote printers or storage devices, through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.

[0074] The description of the present application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

[0075] The terminology used in the description of the embodiments herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the claims. As used in the description of the embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

[0076] It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first port could be termed a second port, and, similarly, a second port could be termed a first port, without departing from the scope of the embodiments. The first port and the second port are both ports, but they are not the same port.

[0077] Many modifications and alternative embodiments of the embodiments described herein will come to mind to one skilled in the art having the benefit of the teachings presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the scope of the claims is not to be limited to the specific examples of the embodiments disclosed, and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

[0078] The embodiments were chosen and described in order to best explain the underlying principles and their practical applications, and thereby to enable others skilled in the art to best utilize the underlying principles and the various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A computer-implemented method for processing a multi-channel input audio signal, comprising:
at a computing device having one or more processors, memory, and a plurality of program modules stored in the memory and to be executed by the one or more processors:
selecting a left input channel and a right input channel from the multi-channel input audio signal, wherein the left input channel and the right input channel correspond to a pair of spatially symmetric signal sources;
generating one or more cross-channel features from the left input channel and the right input channel;
processing the left input channel and the right input channel in accordance with the cross-channel features to generate a left intermediate channel and a right intermediate channel; and
combining each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.

2. The computer-implemented method of claim 1, further comprising adjusting a sound stage width associated with the left intermediate channel and the right intermediate channel before combining them with the third input channel.

3. The computer-implemented method of claim 2, further comprising receiving a user input specifying the sound stage width of the two-channel output audio signal.

4. The computer-implemented method of claim 1, wherein the step of processing the left input channel and the right input channel further comprises:
extracting a middle component, a left side component, and a right side component from the left input channel and the right input channel; and
processing the left side component and the right side component, and then combining them with the middle component to generate the left intermediate channel and the right intermediate channel.

5. The computer-implemented method of claim 4, wherein processing the left side component and the right side component comprises:
performing equalization on the left side component and the right side component using a band-pass filter to obtain a left band-pass-filtered component and a right band-pass-filtered component; and
generating a left side residual component based on a difference between the left side component and the left band-pass-filtered component, and generating a right side residual component based on a difference between the right side component and the right band-pass-filtered component.

6. The computer-implemented method of claim 5, further comprising, after performing the equalization on the left side component and the right side component, performing a first dynamic range compression on the left band-pass-filtered component and the right band-pass-filtered component, respectively, to obtain a left compressed component and a right compressed component accordingly.

7. The computer-implemented method of claim 6, further comprising, after performing the first dynamic range compression, performing crosstalk cancellation on the left compressed component and the right compressed component, respectively, to obtain a crosstalk-cancelled left side component and a crosstalk-cancelled right side component.

8. The computer-implemented method of claim 7, further comprising combining the crosstalk-cancelled left side component and the crosstalk-cancelled right side component, the left side residual component and the right side residual component, and the middle component to generate the left intermediate channel and the right intermediate channel, wherein the step of combining further comprises:
adjusting a sound stage width associated with the left intermediate channel and the right intermediate channel before combining them with the third input channel.

9. The computer-implemented method of claim 8, further comprising performing a second dynamic range compression after adjusting the sound stage width to generate the left intermediate channel and the right intermediate channel.

10. The computer-implemented method of claim 1, wherein the left input channel is a left front channel and the right input channel is a right front channel.

11. The computer-implemented method of claim 1, wherein the left input channel is a left surround channel and the right input channel is a right surround channel.

12. The computer-implemented method of claim 1, wherein the left input channel is a left rear surround channel and the right input channel is a right rear surround channel.

13. The computer-implemented method of claim 1, wherein the third input channel is a center channel.

14. The computer-implemented method of claim 1, wherein the third input channel is a low-frequency effects (LFE) channel.

15. A computing device for processing a multi-channel input audio signal, comprising:
one or more processors;
memory; and
a plurality of program modules stored in the memory and to be executed by the one or more processors,
wherein the plurality of program modules, when executed by the one or more processors, cause the computing device to perform:
selecting a left input channel and a right input channel from the multi-channel input audio signal, wherein the left input channel and the right input channel correspond to a pair of spatially symmetric signal sources;
generating one or more cross-channel features from the left input channel and the right input channel;
processing the left input channel and the right input channel in accordance with the cross-channel features to generate a left intermediate channel and a right intermediate channel; and
combining each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.

16. The computing device of claim 15, wherein the computing device is further configured to adjust a sound stage width associated with the left intermediate channel and the right intermediate channel before combining them with the third input channel.

17. The computing device of claim 16, wherein the computing device is further configured to receive a user input specifying the sound stage width of the two-channel output audio signal.

18. The computing device of claim 15, wherein the step of processing the left input channel and the right input channel further comprises:
extracting a middle component, a left side component, and a right side component from the left input channel and the right input channel; and
processing the left side component and the right side component, and then combining them with the middle component to generate the left intermediate channel and the right intermediate channel.

19. The computing device of claim 18, wherein processing the left side component and the right side component comprises:
performing equalization on the left side component and the right side component using a band-pass filter to obtain a left band-pass-filtered component and a right band-pass-filtered component; and
generating a left side residual component based on a difference between the left side component and the left band-pass-filtered component, and generating a right side residual component based on a difference between the right side component and the right band-pass-filtered component.

20. The computing device of claim 19, wherein the computing device is further configured to perform, after the equalization on the left side component and the right side component, a first dynamic range compression on the left band-pass-filtered component and the right band-pass-filtered component, respectively, to obtain a left compressed component and a right compressed component accordingly.

21. The computing device of claim 20, wherein the computing device is further configured to perform, after the first dynamic range compression, crosstalk cancellation on the left compressed component and the right compressed component, respectively, to obtain a crosstalk-cancelled left side component and a crosstalk-cancelled right side component.

22. The computing device of claim 21, wherein the computing device is further configured to combine the crosstalk-cancelled left side component and the crosstalk-cancelled right side component, the left side residual component and the right side residual component, and the middle component to generate the left intermediate channel and the right intermediate channel, wherein the step of combining further comprises:
adjusting a sound stage width associated with the left intermediate channel and the right intermediate channel before combining them with the third input channel.

23. The computing device of claim 22, wherein the computing device is further configured to perform, after adjusting the sound stage width, a second dynamic range compression to generate the left intermediate channel and the right intermediate channel.

24. The computing device of claim 15, wherein the left input channel is a left front channel and the right input channel is a right front channel.

25. The computing device of claim 15, wherein the left input channel is a left surround channel and the right input channel is a right surround channel.

26. The computing device of claim 15, wherein the left input channel is a left rear surround channel and the right input channel is a right rear surround channel.

27. The computing device of claim 15, wherein the third input channel is a center channel.

28. The computing device of claim 15, wherein the third input channel is a low-frequency effects (LFE) channel.

29. A computer program product stored in a non-transitory computer-readable storage medium for use in conjunction with a computing device having one or more processors for processing audio signals, the computer program product including a plurality of program modules that, when executed by the one or more processors, cause the computing device to perform:
selecting a left input channel and a right input channel from a multi-channel input audio signal, wherein the left input channel and the right input channel correspond to a pair of spatially symmetric signal sources;
generating one or more cross-channel features from the left input channel and the right input channel;
processing the left input channel and the right input channel in accordance with the cross-channel features to generate a left intermediate channel and a right intermediate channel; and
combining each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.

30. The computer program product of claim 29, wherein the computing device is further caused to adjust a sound stage width associated with the left intermediate channel and the right intermediate channel before combining them with the third input channel.

31. The computer program product of claim 30, wherein the computing device is further adapted to receive user input specifying the sound stage width of the two-channel output audio signal.

32. The computer program product of claim 29, wherein the step of processing the left input channel and the right input channel further comprises:
extracting a middle component, a left side component, and a right side component from the left input channel and the right input channel; and
processing the left side component and the right side component, and then combining them with the middle component to generate the left intermediate channel and the right intermediate channel.

33. The computer program product of claim 32, wherein processing the left side component and the right side component comprises:
performing equalization on the left side component and the right side component using a band-pass filter to obtain a left band-pass-filtered component and a right band-pass-filtered component; and
generating a left side residual component based on a difference between the left side component and the left band-pass-filtered component, and generating a right side residual component based on a difference between the right side component and the right band-pass-filtered component.

34. The computer program product of claim 33, wherein the computing device is further caused to perform, after the equalization on the left side component and the right side component, a first dynamic range compression on the left band-pass-filtered component and the right band-pass-filtered component, respectively, to obtain a left compressed component and a right compressed component accordingly.

35. The computer program product of claim 34, wherein the computing device is further caused to perform, after the first dynamic range compression, crosstalk cancellation on the left compressed component and the right compressed component, respectively, to obtain a crosstalk-cancelled left side component and a crosstalk-cancelled right side component.

36. The computer program product of claim 35, wherein the computing device is further caused to combine the crosstalk-cancelled left side component and the crosstalk-cancelled right side component, the left side residual component and the right side residual component, and the middle component to generate the left intermediate channel and the right intermediate channel, wherein the step of combining further comprises:
adjusting a sound stage width associated with the left intermediate channel and the right intermediate channel before combining them with the third input channel.

37. The computer program product of claim 36, wherein the computing device is further caused to perform, after adjusting the sound stage width, a second dynamic range compression to generate the left intermediate channel and the right intermediate channel.

38. The computer program product of claim 29, wherein the left input channel is a left front channel and the right input channel is a right front channel.

39. The computer program product of claim 29, wherein the left input channel is a left surround channel and the right input channel is a right surround channel.

40. The computer program product of claim 29, wherein the left input channel is a left rear surround channel and the right input channel is a right rear surround channel.

41. The computer program product of claim 29, wherein the third input channel is a center channel.

42. The computer program product of claim 29, wherein the third input channel is a low-frequency effects (LFE) channel.