JP2009526467A

JP2009526467A - Method and apparatus for encoding and decoding object-based audio signal

Info

Publication number: JP2009526467A
Application number: JP2008554147A
Authority: JP
Inventors: ヨンユーン，スン; スクパン，ヒー; クークリー，ヒュン; スーキム，ドン; ヒュンリム，ジェ
Original assignee: LG Electronics Inc
Current assignee: LG Electronics Inc
Priority date: 2006-02-09
Filing date: 2007-02-09
Publication date: 2009-07-16
Also published as: EP1984916A1; CA2646278A1; BRPI0708047A2; AU2007212873B2; TW200741649A; TWI326448B; WO2007091870A1; EP1984916A4; WO2007091870A8; AU2007212873A1

Abstract

Problem: To provide a method and apparatus for encoding and decoding an object-based audio signal.
Solution: The audio signal decoding method extracts an object-based downmix signal and object-based parameter information from an input audio signal, and generates an object audio signal using the object-based downmix signal and the object-based parameter information. 3D information corresponding to index data is then applied to the object audio signal to generate an object audio signal to which a 3D effect is applied. As a result, a sound image can be localized for each object audio signal, and a more elaborate sense of realism can be provided when the object audio signal is reproduced.

Description

The present invention relates to a method and apparatus for encoding and decoding an audio signal, and more particularly to a method and apparatus that encode and decode an audio signal so that a sound image can be localized at a desired spatial position for each object audio signal.

In general, in an object-based audio encoding method, an object encoder generates a downmix signal by downmixing a plurality of object audio signals, and generates parameter information containing several pieces of information extracted from the object audio signals. In a typical object-based audio decoding method, an object decoder restores the plurality of object audio signals by decoding the received downmix signal using the object-based parameter information, and a renderer synthesizes the object audio signals into a two-channel or multi-channel signal based on the control data needed to specify the positions of the restored object signals.

However, the control data is merely level information, so there is a limit to the 3D effect that can be realized by simple sound image localization based on that level information.

Accordingly, an object of the present invention is to provide an audio signal encoding and decoding method and apparatus that encode and decode an audio signal so that a sound image can be localized at a desired spatial position for each object audio signal.

To achieve the above object, an audio signal decoding method according to the present invention includes: extracting a downmix signal and object-based parameter information from an input audio signal; generating an object audio signal using the downmix signal and the object-based parameter information; and generating an object audio signal to which a three-dimensional (3D) effect is applied by applying 3D information to the object audio signal.

Another audio signal decoding method according to the present invention for achieving the above object includes: extracting a downmix signal and object-based parameter information from an input audio signal; converting the object-based parameter information to generate channel-based parameter information; and generating an audio signal using the downmix signal and the channel-based parameter information, and generating an audio signal to which a 3D effect is applied by applying 3D information to that audio signal.

Meanwhile, an audio signal decoding apparatus according to the present invention includes: a demultiplexer that extracts an object-based downmix signal and object-based parameter information from an input audio signal; an object decoder that generates an object audio signal using the object-based downmix signal and the object-based parameter information; and a renderer that generates an object audio signal to which a 3D effect is applied by applying 3D information to the object audio signal.

Another audio signal decoding apparatus according to the present invention includes: a demultiplexer that extracts a downmix signal and object-based parameter information from an input audio signal; a renderer that retrieves 3D information using index data and outputs the 3D information; a transcoder that converts the object-based parameter information into channel-based parameter information, converts the 3D information into channel-based 3D information, and outputs each of them; and a multi-channel decoder that generates an audio signal using the downmix signal and the channel-based parameter information, and generates an audio signal to which a 3D effect is applied by applying the channel-based 3D information to that audio signal.

According to the present invention, there is provided an audio signal decoding apparatus including: a demultiplexer that extracts a downmix signal and object-based parameter information from an input audio signal; a renderer that retrieves 3D information using input index data and outputs the 3D information; a transcoder that converts the object-based parameter information into channel-based parameter information, converts the 3D information into channel-based 3D information, and outputs each of them; and a multi-channel decoder that generates an audio signal using the downmix signal and the channel-based parameter information, and generates an audio signal to which a 3D effect is applied by applying the channel-based 3D information to that audio signal.

An audio signal encoding method according to the present invention for achieving the above object includes: generating a downmix signal by downmixing object audio signals; extracting information about the object audio signals and generating object-based parameter information; and inserting, into the object-based parameter information, index data for retrieving the 3D information used when a 3D effect is applied to the object audio signals.

To achieve the above object, the present invention also provides a computer-readable recording medium on which a program for causing a computer to execute the above method is recorded.

As described above, according to the present invention, a sound image can be localized for each object audio signal while making full use of the advantages of object-based audio encoding and decoding, so a more vivid sense of realism can be provided when the object audio signals are reproduced. The present invention is also useful for interactive games in which the position information of game characters operated by players over a network changes frequently, and can provide an elaborate sense of realism in such games.

Hereinafter, the present invention will be described in more detail with reference to the accompanying drawings.

The audio signal encoding and decoding method and apparatus according to the present invention are basically applied to the encoding and decoding of object-based audio signals, but are not necessarily limited to this and can also be applied to other signal processing that satisfies the conditions of the present invention. The method and apparatus apply 3D information such as a head-related transfer function (HRTF) to an object audio signal, so that the sound image of each object audio signal can be localized at a desired spatial position.

FIG. 1 is a block diagram illustrating a general object-based audio encoding apparatus. Referring to FIG. 1, the object-based audio signal encoding apparatus includes an object encoder 110 and a bitstream generation unit 120.

The object encoder 110 receives N object audio signals and generates an object-based downmix signal and object-based parameter information that contains information extracted from the N object audio signals. The information extracted from each object audio signal is based on quantities such as energy difference values and correlation values.
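
The role of the object encoder can be illustrated with a minimal sketch. It assumes a mono downmix formed by summing the objects, a level expressed relative to the strongest object, and a correlation measured against the first object; the function name `encode_objects` and these particular parameter definitions are illustrative assumptions, not the parameters defined by this publication.

```python
import math
from typing import Dict, List


def encode_objects(objects: List[List[float]]) -> Dict[str, List[float]]:
    """Downmix N equal-length object signals and extract simple parameters."""
    n_samples = len(objects[0])
    # Mono downmix: sample-wise sum of all object signals (an assumption).
    downmix = [sum(obj[i] for obj in objects) for i in range(n_samples)]

    energies = [sum(x * x for x in obj) for obj in objects]
    reference = max(energies) or 1.0
    # Energy of each object relative to the strongest object, in dB.
    level_db = [10.0 * math.log10(max(e, 1e-12) / reference) for e in energies]

    def correlation(a: List[float], b: List[float]) -> float:
        # Normalized zero-lag correlation between two signals.
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        return num / den if den else 0.0

    corr = [correlation(objects[0], obj) for obj in objects]
    return {"downmix": downmix, "level_db": level_db, "correlation": corr}
```

Only the downmix and these few numbers per object would need to be transmitted, which is the point of the parametric approach described above.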

The bitstream generation unit 120 generates a bitstream that combines the object-based downmix signal and the parameter information generated by the object encoder 110. The bitstream generated by the bitstream generation unit 120 can include default mixing parameters for the default settings of the decoding apparatus, and the default mixing parameters can include index data used to retrieve 3D information, such as an HRTF, that is applied when a 3D effect is realized.
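
As a rough illustration of combining the downmix and the parameter information in one stream, and of the matching extraction on the decoder side, the sketch below uses a made-up container: two 32-bit length fields, JSON side information, then 16-bit PCM. The layout, the function names, and the `hrtf_index` key are hypothetical and do not reflect the actual bitstream syntax.

```python
import json
import struct
from typing import List, Tuple


def pack_bitstream(downmix: List[int], params: dict) -> bytes:
    """Combine 16-bit PCM downmix samples and parameter information."""
    side = json.dumps(params).encode("utf-8")
    pcm = struct.pack(f"<{len(downmix)}h", *downmix)
    # Header: two little-endian 32-bit lengths (side info bytes, PCM bytes).
    return struct.pack("<II", len(side), len(pcm)) + side + pcm


def unpack_bitstream(stream: bytes) -> Tuple[List[int], dict]:
    """Split the stream back into downmix samples and parameter information."""
    side_len, pcm_len = struct.unpack_from("<II", stream, 0)
    side = stream[8:8 + side_len]
    pcm = stream[8 + side_len:8 + side_len + pcm_len]
    downmix = list(struct.unpack(f"<{pcm_len // 2}h", pcm))
    return downmix, json.loads(side.decode("utf-8"))
```

In this sketch the default mixing parameters would simply be one more entry in `params`, for example `{"default_mixing": {"hrtf_index": [3, 7]}}`.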

FIG. 2 is a block diagram showing an audio signal decoding apparatus according to a first embodiment of the present invention. The audio signal decoding apparatus of this embodiment adds the concept of HRTF-based 3D binaural localization to a general object-based coding method. An HRTF is a transfer function between the sound wave emitted from a sound source at an arbitrary position and the sound wave arriving at the eardrum, and its value varies with the azimuth and elevation of the source. When a signal with no directivity is filtered with an HRTF, a listener perceives the sound as if it came from a specific direction.

Referring to FIG. 2, the audio signal decoding apparatus according to this embodiment includes a demultiplexer 130, an object decoder 140, a renderer 150, and a 3D information database 160.

The demultiplexer 130 extracts a downmix signal and object-based parameter information from the input bitstream. The object decoder 140 generates object audio signals using the downmix signal and the object-based parameter information. The 3D information database 160 stores 3D information such as HRTFs, and retrieves and outputs the 3D information corresponding to input index data. The renderer 150 then generates a 3D signal using the object audio signals generated by the object decoder 140 and the 3D information output from the 3D information database 160.

FIG. 3 is a flowchart illustrating the operation of the audio signal decoding apparatus according to the first embodiment of the present invention. Referring to FIGS. 2 and 3, when the audio signal decoding apparatus receives a bitstream transmitted from an encoding apparatus or the like (S170), the demultiplexer 130 extracts a downmix signal and object-based parameter information from the bitstream (S172). The object decoder 140 generates object audio signals using the downmix signal and the object-based parameter information extracted by the demultiplexer 130 (S174).

The renderer 150 retrieves 3D information from the 3D information database 160 using the index data contained in the control data needed to specify the positions of the object audio signals (S176). The renderer 150 then performs 3D rendering using the object audio signals output from the object decoder 140 and the 3D information retrieved from the 3D information database 160, thereby generating a 3D-based signal that produces a 3D effect (S178).
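
Steps S176 and S178 can be sketched as a table lookup followed by per-ear filtering. The sketch assumes that the 3D information is stored as a pair of short impulse responses per index and that rendering is plain FIR convolution; `HRTF_DB`, its toy values, and `render_3d` are hypothetical names and data, not measured HRTFs or the renderer's actual algorithm.

```python
from typing import Dict, List, Tuple

Hrir = Tuple[List[float], List[float]]  # (left-ear, right-ear) impulse responses

# Hypothetical 3D information database: index data -> impulse-response pair.
# The values are toy numbers, not measured HRTF data.
HRTF_DB: Dict[int, Hrir] = {
    0: ([1.0], [1.0]),        # front: both ears identical
    1: ([1.0], [0.0, 0.5]),   # left: right ear delayed and attenuated
    2: ([0.0, 0.5], [1.0]),   # right: mirror image of index 1
}


def convolve(signal: List[float], ir: List[float]) -> List[float]:
    """Direct-form FIR filtering (full convolution)."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def render_3d(objects: List[List[float]],
              indices: List[int]) -> Tuple[List[float], List[float]]:
    """Look up 3D information per object (S176) and mix two channels (S178)."""
    left: List[float] = []
    right: List[float] = []
    for obj, index in zip(objects, indices):
        ir_left, ir_right = HRTF_DB[index]
        for channel, ir in ((left, ir_left), (right, ir_right)):
            filtered = convolve(obj, ir)
            # Grow the output buffer if this object needs more samples.
            channel.extend([0.0] * (len(filtered) - len(channel)))
            for n, value in enumerate(filtered):
                channel[n] += value
    # Pad both channels to a common length.
    length = max(len(left), len(right))
    left.extend([0.0] * (length - len(left)))
    right.extend([0.0] * (length - len(right)))
    return left, right
```

Because each object carries its own index, changing where one object is heard only requires changing that object's index, not re-encoding the downmix.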

The 3D signal generated by the renderer 150 can be a two-channel signal with three or more directivities, and can be reproduced as three-dimensional stereo sound through two-channel speakers such as headphones. That is, when the 3D signal generated by the renderer 150 is reproduced through two-channel speakers, the user perceives it as if it were reproduced from sound sources of three or more channels. The direction of a sound source is determined by at least one of the intensity difference between the sounds entering the two ears, the time difference between the two sounds, and the phase difference between the two sounds, so the renderer 150 can generate the 3D signal by exploiting the mechanism by which humans perceive the three-dimensional position of a sound source through hearing.
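
Two of the cues named above, the intensity difference and the time difference between the ears, can be estimated from a two-channel signal as in the following sketch. The function name `interaural_cues` and the use of an energy ratio and a cross-correlation peak are illustrative assumptions, not a method prescribed by this publication.

```python
import math
from typing import List, Tuple


def interaural_cues(left: List[float], right: List[float],
                    max_lag: int) -> Tuple[float, int]:
    """Return (level difference in dB, lag in samples) between two ear signals."""
    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    # Positive value: the left-ear signal is stronger.
    level_db = 10.0 * math.log10(max(energy_left, 1e-12) / max(energy_right, 1e-12))

    def xcorr(lag: int) -> float:
        # Correlation of left[n] with right[n + lag].
        return sum(left[n] * right[n + lag]
                   for n in range(len(left)) if 0 <= n + lag < len(right))

    # Positive lag: the right-ear signal arrives later than the left-ear signal.
    best_lag = max(range(-max_lag, max_lag + 1), key=xcorr)
    return level_db, best_lag
```

A source on the listener's left would typically give a positive level difference and a positive lag in this convention.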

For the default settings, the audio signal encoding apparatus can include the index data needed to retrieve 3D information in the default mixing parameter information. In this case, the renderer 150 can retrieve 3D information from the 3D information database 160 using the index data contained in the default mixing parameter information.

The audio signal encoding apparatus can include, in the control data, the index data needed to retrieve 3D information, such as an HRTF, that is applied to an object signal when a 3D effect is realized. That is, the mixing parameters contained in the control data used by the audio signal encoding apparatus of this embodiment can include index data for retrieving 3D information in addition to level information. Besides the level information and the index data, the mixing parameters in the control data can include time information such as inter-channel time difference information, position information, and a combination of level information and time information.

When a 3D effect needs to be applied to one or more of a plurality of object audio signals, the 3D information corresponding to the given index information is retrieved from the 3D information database 160, which stores 3D information specifying the target position of each object audio signal to which the 3D effect is applied. The renderer 150 then performs 3D rendering with the retrieved 3D information to produce the 3D effect. 3D information for all object signals can be used as mixing parameter information. When 3D information is applied to only a few object signals, level information and time information for the remaining object signals can be used as mixing parameter information.
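
The mixed case, in which some objects use 3D information and the others use level and time information, can be sketched by treating level and time information as a one-tap impulse response per ear. The `MixingParam` fields, the database layout, and `render_object` are hypothetical illustrations rather than the control data format of this publication.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MixingParam:
    index: Optional[int] = None               # index data for 3D information, if any
    gains: Tuple[float, float] = (1.0, 1.0)   # level information (left, right)
    delays: Tuple[int, int] = (0, 0)          # time information in samples (left, right)


def fir(signal: List[float], ir: List[float]) -> List[float]:
    """Direct-form FIR filtering (full convolution)."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def render_object(signal: List[float], param: MixingParam,
                  database: Dict[int, Tuple[List[float], List[float]]]
                  ) -> Tuple[List[float], List[float]]:
    """Render one object with 3D information if indexed, else level/time info."""
    if param.index is not None:
        ir_left, ir_right = database[param.index]
    else:
        # Level and time information expressed as a delayed, scaled single tap.
        ir_left = [0.0] * param.delays[0] + [param.gains[0]]
        ir_right = [0.0] * param.delays[1] + [param.gains[1]]
    return fir(signal, ir_left), fir(signal, ir_right)
```

Expressing both cases as impulse responses keeps a single rendering path, at the cost of ignoring any more efficient handling a real renderer might use for pure gain and delay.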

FIG. 4 is a block diagram showing an audio signal decoding apparatus according to a second embodiment of the present invention. This embodiment uses a multi-channel decoder 270 in place of the object decoder.

Referring to FIG. 4, the audio signal decoding apparatus according to this embodiment includes a demultiplexer 230, a transcoder 240, a renderer 250, a 3D information database 260, and a multi-channel decoder 270.

The demultiplexer 230 extracts a downmix signal and object-based parameter information from the input bitstream. The renderer 250 specifies a 3D position for each object signal using the 3D information corresponding to the index data contained in the control data. The transcoder 240 combines the object-based parameter information with the 3D position information of each object audio signal to which the renderer 250 has applied 3D information, and generates channel-based parameter information. The multi-channel decoder 270 generates a 3D signal using the downmix signal and the channel-based parameter information.

FIG. 5 is a flowchart illustrating the operation of the audio signal decoding apparatus according to the second embodiment of the present invention. Referring to FIGS. 4 and 5, when the audio signal decoding apparatus receives a bitstream (S280), the demultiplexer 230 extracts an object-based downmix signal and object-based parameter information from the received bitstream (S282).

The renderer 250 extracts the index data contained in the control data used to specify the positions of the object audio signals, and retrieves 3D information from the 3D information database 260 using the extracted index data (S284). The position of an object audio signal initially specified by the default mixing parameters can be changed by specifying, through the mixing control data, 3D information that corresponds to the desired position of the object audio signal.

The transcoder 240 combines the object-based parameter information for the N object signals transmitted by the audio signal encoding apparatus with the 3D position information for each object signal obtained by the renderer 250 using 3D information such as HRTFs, and generates channel-based parameter information for M channels (S286).
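
One way to picture the conversion from N object parameters to M channel parameters is to map per-object energies through a rendering gain matrix derived from the object positions. The sketch below assumes mutually uncorrelated objects and expresses each channel's level relative to the total energy; `channel_levels_db` and the gain matrix are hypothetical, and the actual transcoding rule is not specified in this passage.

```python
import math
from typing import List


def channel_levels_db(object_energy: List[float],
                      gains: List[List[float]]) -> List[float]:
    """Convert N object energies into M channel levels (dB relative to total).

    gains[m][n] is the assumed rendering gain of object n in channel m.
    Objects are assumed to be mutually uncorrelated, so energies add.
    """
    channel_energy = [sum(g * g * e for g, e in zip(row, object_energy))
                      for row in gains]
    total = sum(channel_energy) or 1.0
    return [10.0 * math.log10(max(c, 1e-12) / total) for c in channel_energy]
```

For example, two objects of equal energy routed to separate channels give two channel levels of about -3 dB each.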

The multi-channel decoder 270 generates an audio signal using the downmix signal supplied from the demultiplexer 230 and the channel-based parameter information supplied from the transcoder 240, performs 3D rendering using the 3D information contained in the channel-based parameter information, and generates a 3D-based multi-channel signal (S290).

FIG. 6 is a block diagram showing an audio signal decoding apparatus according to a third embodiment of the present invention. Referring to FIG. 6, the audio signal decoding apparatus of this embodiment differs from the preceding embodiments in that the transcoder 440 transfers the channel-based parameter information and the 3D information to the multi-channel decoder 470 separately. That is, unlike the transcoder of the second embodiment, which transfers channel-based parameter information that contains the 3D information, the transcoder 440 of this embodiment transfers to the multi-channel decoder 470 the channel-based parameter information for M channels obtained from the object-based parameter information for the N object signals, with the 3D information sent apart from it.

As shown in FIG. 7, the channel-based parameter information and the 3D information each include their own frame index. The multi-channel decoder 470 can therefore synchronize the channel-based parameter information with the 3D information using their frame indexes, and so apply the 3D information to a specific frame of the bitstream. For example, referring to FIG. 7, the 3D information corresponding to index 2 is applied at the start of frame 2, which has index 2.

Even when the 3D information is updated over time, the position in the channel-based parameter information at which the 3D information must be applied can be determined by referring to the frame index of the 3D information. That is, so that the multi-channel decoder 470 can synchronize the channel-based parameter information and the 3D information in time, the transcoder 440 can insert frame index information into both the channel-based parameter information and the 3D information.
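
The frame-index synchronization can be sketched as pairing each parameter frame with the most recent 3D information whose frame index does not exceed that frame's index. Holding the previous 3D information until the next update arrives is an assumption of this sketch, and `synchronize` and its data shapes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def synchronize(param_frames: Dict[int, dict],
                info_updates: Dict[int, str]
                ) -> List[Tuple[int, dict, Optional[str]]]:
    """Pair each parameter frame with the 3D information in force at its start."""
    updates = sorted(info_updates.items())
    schedule = []
    current: Optional[str] = None
    pos = 0
    for index in sorted(param_frames):
        # Apply every update whose frame index has been reached.
        while pos < len(updates) and updates[pos][0] <= index:
            current = updates[pos][1]
            pos += 1
        schedule.append((index, param_frames[index], current))
    return schedule
```

With parameter frames 1 to 3 and a single update carrying frame index 2, frame 1 has no 3D information, and frames 2 and 3 both use the update, starting at the beginning of frame 2.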

FIG. 8 is a block diagram showing an audio signal decoding apparatus according to a fourth embodiment of the present invention. Referring to FIG. 8, the audio signal decoding apparatus of this embodiment differs from the preceding embodiments in that it further includes a preprocessor 543 and an effect processor 580, and in that the 3D information database 560 is provided inside the renderer 550.

That is, the functions and configurations of the demultiplexer 530, the transcoder 547, the renderer 550, the 3D information database 560, and the multi-channel decoder 570 are the same as in the embodiment shown in FIG. 6. Referring to FIG. 8, the effect processor 580 can add a predetermined effect to the downmix signal. The preprocessor 543 can, for example, preprocess a stereo downmix signal and thereby adjust the position of the stereo downmix signal. The 3D information database 560 can be included in the renderer 550.

FIG. 9 is a block diagram showing an audio signal decoding apparatus according to a fifth embodiment of the present invention. Referring to FIG. 9, the audio signal decoding apparatus of this embodiment differs from the preceding embodiments in that the unit 680 that generates the 3D signal consists of a multi-channel decoder 670 and a memory 675. In this case, the multi-channel decoder 670 copies the 3D information stored in an unused memory of the multi-channel decoder 670 to the memory 675, and performs 3D rendering using the 3D information copied to the memory 675. Accordingly, if the 3D information output from the transcoder 647 is arranged to directly update the 3D information stored in the memory 675, a 3D-based signal can be generated with the desired 3D information without changing the configuration of the multi-channel decoder 670.
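
The arrangement with memory 675 can be pictured as a decoder that always reads its 3D information from a working copy, which another component is allowed to overwrite. The class `DecoderSketch`, its attributes, and `update_from_transcoder` are hypothetical and only illustrate why the decoder itself needs no change.

```python
from typing import Dict


class DecoderSketch:
    """Toy stand-in for the 3D signal generating unit (decoder plus memory)."""

    def __init__(self, built_in: Dict[str, str]) -> None:
        self.built_in = dict(built_in)     # 3D information held inside the decoder
        self.memory = dict(self.built_in)  # working copy, standing in for memory 675

    def info_for(self, position: str) -> str:
        # Rendering always reads from the working copy.
        return self.memory[position]


def update_from_transcoder(decoder: DecoderSketch, position: str, info: str) -> None:
    """Overwrite the working copy directly; the decoder logic is untouched."""
    decoder.memory[position] = info
```

After `update_from_transcoder(decoder, "front", "hrtf_b")`, later calls to `decoder.info_for("front")` return the new 3D information while the built-in data stays as it was.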

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. Computer-readable recording media include all types of recording devices that store data readable by a computer. Examples include ROM, RAM, CD-ROM, magnetic tape, flexible disks, and optical data storage devices, as well as media embodied in the form of a carrier wave, such as transmission over the Internet. A computer-readable recording medium can also be distributed across systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner. The functional programs, code, and code segments needed to realize the present invention can be readily constructed by those skilled in the art.

Although the present invention has been described above with reference to preferred embodiments, it is not limited to the specific embodiments described. Various modifications can of course be made by those with ordinary knowledge in the technical field without departing from the gist of the invention as claimed, and such modifications should be understood as falling within the technical idea of the present invention.

The present invention can be applied to processes such as the decoding of object-based audio signals, where it localizes a sound image for each object audio signal and provides a more elaborate sense of realism.

FIG. 1 is a block diagram showing a general object-based audio signal encoding apparatus.
FIG. 2 is a block diagram showing an audio signal decoding apparatus according to a first embodiment of the present invention.
FIG. 3 is a flowchart illustrating the operation of the audio signal decoding apparatus according to the first embodiment of the present invention.
FIG. 4 is a block diagram showing an audio signal decoding apparatus according to a second embodiment of the present invention.
FIG. 5 is a flowchart illustrating the operation of the audio signal decoding apparatus according to the second embodiment of the present invention.
FIG. 6 is a block diagram showing an audio signal decoding apparatus according to a third embodiment of the present invention.
FIG. 7 is a diagram showing an example in which the audio signal decoding apparatus shown in FIG. 6 applies 3D information to a frame.
FIG. 8 is a block diagram showing an audio signal decoding apparatus according to a fourth embodiment of the present invention.
FIG. 9 is a block diagram showing an audio signal decoding apparatus according to a fifth embodiment of the present invention.

Claims

An audio signal decoding method comprising:
extracting a downmix signal and object-based parameter information from an input audio signal;
generating an object audio signal using the downmix signal and the object-based parameter information; and
generating a 3D-effect-applied object audio signal by applying 3D information to the object audio signal.

The audio signal decoding method according to claim 1, wherein the 3D information is head-related transfer function (HRTF) information.

The method of claim 1, further comprising a step of storing the 3D information in a database.

The audio signal decoding method according to claim 1, wherein the 3D information is information corresponding to index data included in control data used for rendering the object audio signal.

The audio signal decoding method according to claim 4, wherein the control data includes at least one of inter-channel level information, inter-channel time information, position information, and information obtained by combining the level information and the time information.

The method of claim 4, further comprising rendering the object audio signal based on the control data.

The audio signal decoding method according to claim 1, wherein the index data is included in a default mixing parameter included in the object-based parameter information.

An audio signal decoding apparatus comprising:
a demultiplexer that extracts an object-based downmix signal and object-based parameter information from an input audio signal;
an object decoder that generates an object audio signal using the object-based downmix signal and the object-based parameter information; and
a renderer that generates a 3D-effect-applied object audio signal by applying 3D information to the object audio signal.

The audio signal decoding apparatus according to claim 8, further comprising a 3D information database in which the 3D information is stored.

The audio signal decoding apparatus according to claim 8, wherein the 3D information is head-related transfer function (HRTF) information.

The audio signal decoding apparatus according to claim 8, wherein the 3D information is information corresponding to index data included in control data used for rendering the object audio signal.

The audio signal decoding apparatus according to claim 11, wherein the control data includes at least one of inter-channel level information, inter-channel time information, position information, and information obtained by combining the level information and the time information.

An audio signal decoding method comprising:
extracting a downmix signal and object-based parameter information from an input audio signal;
converting the object-based parameter information to generate channel-based parameter information; and
generating an audio signal using the downmix signal and the channel-based parameter information, and generating a 3D-effect-applied audio signal by applying 3D information to the audio signal.

The method of claim 13, further comprising storing the 3D information in a database.

The audio signal decoding method according to claim 13, wherein the 3D information is HRTF information.

The audio signal decoding method according to claim 13, wherein the 3D information is included in mixing control data used for rendering the object audio signal.

The audio signal decoding method according to claim 16, wherein the mixing control data includes at least one of inter-channel level information, inter-channel time information, position information, and information obtained by combining the level information and the time information.

The method of claim 16, further comprising rendering the object audio signal based on the control data.

The audio signal decoding method according to claim 13, further comprising adding a predetermined effect to the downmix signal.

An audio signal decoding apparatus comprising:
a demultiplexer that extracts a downmix signal and object-based parameter information from an input audio signal;
a renderer that extracts 3D information using index data and outputs the 3D information;
a transcoder that generates channel-based parameter information using the object-based parameter information and the 3D information; and
a channel decoder that generates an audio signal using the downmix signal and the channel-based parameter information, and generates a 3D-effect-applied audio signal by applying to the audio signal the 3D information included in the channel-based parameter information.

The audio signal decoding apparatus according to claim 20, further comprising a 3D information database in which 3D information corresponding to the index data is stored as a database.

The audio signal decoding apparatus according to claim 20, wherein the 3D information database is provided in the renderer.

The audio signal decoding apparatus according to claim 20, further comprising an effect processor for adding a predetermined effect to the downmix signal.

The audio signal decoding apparatus according to claim 20, wherein the index data is included in control data used for rendering the object audio signal.

The audio signal decoding apparatus according to claim 24, wherein the control data includes at least one of inter-channel level information, inter-channel time information, position information, and information obtained by combining the level information and the time information.

An audio signal decoding apparatus comprising:
a demultiplexer that extracts a downmix signal and object-based parameter information from an input audio signal;
a renderer that extracts 3D information using input index data and outputs the 3D information;
a transcoder that converts the object-based parameter information into channel-based parameter information, converts the 3D information into channel-based 3D information, and outputs each of them; and
a multi-channel decoder that generates an audio signal using the downmix signal and the channel-based parameter information, and generates a 3D-effect-applied audio signal by applying the channel-based 3D information to the audio signal.

The audio signal decoding apparatus according to claim 26, wherein the multi-channel decoder includes a memory that stores 3D information commonly used to generate an audio signal that exhibits the 3D effect.

The audio signal decoding apparatus according to claim 27, wherein the 3D information stored in the memory is updated by the channel-based 3D information output from the transcoder.

The audio signal decoding apparatus according to claim 26, wherein the index data is included in mixing control data used for rendering the object audio signal.

The audio signal decoding apparatus according to claim 26, wherein the channel-based parameter information and the channel-based 3D information include index information for synchronizing the channel-based parameter information with the channel-based 3D information.

An audio signal encoding method comprising:
generating a downmix signal by downmixing an object audio signal;
extracting information about the object audio signal to generate object-based parameter information; and
inserting, into the object-based parameter information, index data for searching for 3D information used to realize a 3D effect on the object audio signal.

The audio signal encoding method of claim 31, further comprising generating a bitstream by combining the object-based downmix signal and the object-based parameter information into which the index data is inserted.

The audio signal encoding method of claim 31, wherein the 3D information is HRTF information.

A computer-readable recording medium in which a program for causing a computer to execute the method according to claim 1 is recorded.
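
For illustration only, the encoding steps recited in claims 31 and 32 can be sketched in Python as follows. This is a simplified reading under stated assumptions, not the claimed implementation. The downmix is assumed to be a mono sample-wise sum of equal-length object signals, per-object energy stands in for the object-based parameter information, and a plain dictionary stands in for the multiplexed bitstream. The name `encode_objects` and the keys `energy`, `hrtf_index`, `downmix`, and `object_parameters` are hypothetical.

```python
def encode_objects(object_signals, index_data):
    """Downmix the objects, derive per-object parameters, and attach index data.

    Assumptions (not from the source): all object signals have equal length,
    the downmix is mono, and energy is the only extracted parameter.
    """
    # Step 1: downmix signal, here the sample-wise sum of all objects.
    downmix = [sum(samples) for samples in zip(*object_signals)]

    # Step 2: object-based parameter information, here one energy per object.
    parameters = [
        {"energy": sum(x * x for x in signal)} for signal in object_signals
    ]

    # Step 3: insert the index data used later to look up 3D information.
    for params, index in zip(parameters, index_data):
        params["hrtf_index"] = index

    # Step 4: combine everything; a dict stands in for the bitstream.
    return {"downmix": downmix, "object_parameters": parameters}


bitstream = encode_objects([[1.0, 2.0], [3.0, 4.0]], index_data=[0, 1])
```

Carrying only an index per object keeps the side information small, since the 3D information itself can reside in a database on the decoder side.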