JP2011066868A

JP2011066868A - Audio signal encoding method, encoding device, decoding method, and decoding device

Info

Publication number: JP2011066868A
Application number: JP2009282358A
Authority: JP
Inventors: Sadahiro Yasura; 定浩安良
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2009-08-18
Filing date: 2009-12-14
Publication date: 2011-03-31

Abstract

PROBLEM TO BE SOLVED: To make it possible, with an encoded stream conforming to a predetermined coding scheme, to reproduce a three-dimensional sound field in which three-dimensional sound source positions are explicitly specified, and to encode audio signals with high transmission efficiency.

SOLUTION: A three-dimensional space division unit 11 outputs plane information and channel mapping information. It derives them from two inputs: the positions of a plurality of speakers arranged three-dimensionally in a three-dimensional space to output audio signals of a plurality of channels, and a division direction for dividing the three-dimensional space into a plurality of planes. On the basis of the plane information and the channel mapping information, plane encoders 12 to 14 encode the audio signals as a single program for each two-dimensional plane to generate encoding elements, and further generate and output plane position information. A stream integration unit 15 integrates all the encoding elements and the plane position information to generate and output a single encoded stream.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an audio signal encoding method, an encoding device, a decoding method, and a decoding device. More particularly, it relates to an audio signal encoding method and encoding device that generate, from a multi-channel audio signal, an encoded audio signal conforming to a predetermined audio coding standard. It also relates to a decoding method and decoding device that decode the encoded audio signal.

As an audio signal encoding scheme for encoding multi-channel audio signals, a scheme using a plurality of encoders is known (see, for example, Patent Document 1).

In this scheme, on the encoding side the multi-channel audio signal is converted into digital signals by A/D converters provided for the respective channels. A plurality of encoder groups, driven in parallel and assigned to the channels, then encode the signals group by group according to, for example, the MPEG-2 AAC (Moving Picture Experts Group 2 Advanced Audio Coding) standard, which has a proven track record in current digital broadcasting. The results are multiplexed into a single transmission stream and sent out.

On the decoding side, the multi-channel audio signals of each group are separated from the single received transmission stream. A plurality of parallel-driven decoder groups decompress each of them in accordance with the MPEG-2 AAC standard. D/A converters corresponding to each of the original channels then convert the results into analog audio signals. The decoded multi-channel audio signals are supplied to a plurality of speakers. These speakers are arranged on a horizontal plane centred on the listening position and on a horizontal plane above it, and driving them reproduces a three-dimensional sound field.

Patent Document 1: JP 2000-236599 A

However, the MPEG-2 AAC standard is a scheme for encoding audio signals to be reproduced by a plurality of speakers installed on a two-dimensional plane (width × depth), and its channel arrangement information likewise assumes a two-dimensional plane. In other words, the standard does not contemplate reproduction through speakers installed in a three-dimensional space (width × depth × height). Because of this, it cannot define channel arrangement information for a multi-plane (three-dimensional) layout, for example to signal that there are N upper front channels.

For this reason, the conventional audio signal encoding method described above cannot distinguish a front channel from an upper front channel, and cannot transmit multiple planes as a single transmission stream. It therefore cannot reproduce a three-dimensional sound field in which the three-dimensional positions of the sound sources are explicitly specified.

Also, in the conventional audio signal encoding method described above, the transmitting (encoding) side works on a plurality of independent MPEG-2 AAC encoded signals for the multiple channels. It divides each of them into units of a predetermined number of bits and sends them out as a stream, time-division multiplexed onto a single transmission path. The resulting transmission stream therefore does not conform to the MPEG-2 AAC standard. Consequently, a decoding device built around an MPEG-2 AAC-compliant decoder still cannot decode the received transmission stream directly. It must first perform the separation processing specific to the conventional method.

Furthermore, the conventional audio signal encoding method requires many encoders, in proportion to the number of channels, to create a plurality of independent encoded streams before multiplexing. It also requires a multiplexer for time-division multiplexing, so the circuit scale is large. Moreover, each encoded stream carries its own header information (synchronization codes and the like), bits for transfer-rate adjustment, and so on. Multiplexing these streams therefore inflates the single combined stream with redundant information, which lowers transmission efficiency. Likewise, the conventional audio signal decoding method requires not only many decoders but also a demultiplexer, which enlarges the circuit scale.
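The redundancy argument can be made concrete with a rough calculation. The sketch below is for illustration only and rests on assumptions the document does not state. It assumes each independently encoded group is an MPEG-2 AAC stream framed with a 7-byte ADTS header (no CRC) on every 1024-sample frame at 48 kHz. Rate-adjustment (fill) bits are ignored.

```python
# Illustrative arithmetic only. Assumptions (not from the source): one 7-byte
# ADTS header (no CRC) per 1024-sample AAC frame at 48 kHz; fill bits ignored.
ADTS_HEADER_BYTES = 7
SAMPLES_PER_FRAME = 1024
SAMPLE_RATE_HZ = 48_000


def header_overhead_bps(num_streams: int) -> float:
    """Header bits per second spent when num_streams independent streams are multiplexed."""
    frames_per_second = SAMPLE_RATE_HZ / SAMPLES_PER_FRAME  # 46.875 frames/s
    return num_streams * ADTS_HEADER_BYTES * 8 * frames_per_second


if __name__ == "__main__":
    separate = header_overhead_bps(3)  # e.g. three independently encoded groups
    single = header_overhead_bps(1)    # one integrated stream with one header per frame
    print(f"3 streams: {separate:.0f} bit/s, 1 stream: {single:.0f} bit/s")
```

Under these assumptions, carrying three groups in one stream instead of three saves about 5.25 kbit/s of header overhead alone, before counting any per-stream fill bits.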

The present invention has been made in view of the above points. Its object is to provide an audio signal encoding method, encoding device, decoding method, and decoding device that achieve two things. First, they make it possible to reproduce a three-dimensional sound field, with explicitly specified three-dimensional sound source positions, from an encoded stream conforming to a predetermined coding scheme. Such schemes include the MPEG-2 AAC standard, the MPEG-4 AAC standard, and the E-AC-3 (Enhanced AC-3) scheme based on AC-3 (Audio Code number 3). Second, they encode the audio signal with high transmission efficiency.

To achieve the above object, the audio signal encoding method of the present invention comprises three steps. The first step outputs plane information and channel mapping information. It works from the position of each of a plurality of speakers arranged three-dimensionally in a three-dimensional space for outputting audio signals of a plurality of channels, and from a division direction, which is the direction in which the three-dimensional space is divided into a plurality of two-dimensional planes. The plane information includes the number of two-dimensional planes, the number of channels corresponding to each plane, and the division order of the two-dimensional planes. The channel mapping information indicates the position, within its two-dimensional plane, of the speaker to which each channel corresponds. The second step, on the basis of the plane information and the channel mapping information, encodes the audio signals of the plurality of channels as a single program for each two-dimensional plane to generate encoding elements. It further generates plane position information including information indicating the channel arrangement within each two-dimensional plane, and outputs the encoding elements and the plane position information for each two-dimensional plane. The third step integrates all the encoding elements and plane position information output for each two-dimensional plane in the second step to generate and output a single encoded stream.
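The first step can be sketched as follows. This is a minimal illustration resting on several assumptions that are not in the source:

- Speaker positions are given in metres as (x, y, z), with x to the right, y to the front, and z upward.
- The division direction is named by an axis.
- Speakers whose coordinate along that axis agrees after rounding share a plane.
- The division order is ascending coordinate along the division axis.

The names `divide_space` and `PlaneInfo` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

AXES = {"x": 0, "y": 1, "z": 2}  # assumed coordinate convention


@dataclass
class PlaneInfo:
    num_planes: int
    channels_per_plane: list[int]
    division_order: list[float]  # plane coordinates along the division axis, in order


def divide_space(speakers: dict[str, tuple[float, float, float]],
                 direction: str = "z", ndigits: int = 2):
    """Group speakers into 2D planes perpendicular to `direction`.

    Returns (PlaneInfo, channel_mapping); channel_mapping maps each channel name
    to (plane_index, (u, v)), the speaker's position within its plane.
    """
    axis = AXES[direction]
    in_plane = [i for i in range(3) if i != axis]  # the two remaining axes
    # Speakers sharing a (rounded) coordinate on the division axis share a plane.
    levels = sorted({round(pos[axis], ndigits) for pos in speakers.values()})
    plane_of = {level: idx for idx, level in enumerate(levels)}
    counts = [0] * len(levels)
    mapping = {}
    for name, pos in speakers.items():
        plane = plane_of[round(pos[axis], ndigits)]
        counts[plane] += 1
        mapping[name] = (plane, (pos[in_plane[0]], pos[in_plane[1]]))
    return PlaneInfo(len(levels), counts, levels), mapping
```

Consider a 5-channel layer at ear height plus two front height speakers (the labels such as `TpFL` are illustrative only). For that layout, `divide_space` yields two planes with 5 and 2 channels and maps `TpFL` to plane 1 at in-plane position (-1.0, 1.0).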

Here, the single encoded stream generated in the third step may carry, as plane position information, information that allows only some of the channels of the multi-channel audio signal arranged three-dimensionally in the three-dimensional space to be decoded.
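One way a decoder might use such partial-decoding information is to drop whole planes it cannot render. The sketch below is hypothetical. It assumes each plane's position information lists the plane's channels, and it keeps planes in division order while they fit within the decoder's channel budget. The source does not specify the selection rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlaneEntry:
    plane_index: int     # position in the division order (0 = first plane)
    channels: list[str]  # channels listed in the plane position information
    payload: bytes       # encoding element for this plane (opaque here)


def select_decodable(entries: list[PlaneEntry], channel_budget: int) -> list[PlaneEntry]:
    """Keep whole planes, in division order, while they fit in the decoder's channel budget."""
    chosen: list[PlaneEntry] = []
    used = 0
    for entry in sorted(entries, key=lambda e: e.plane_index):
        if used + len(entry.channels) > channel_budget:
            break  # stop at the first plane that does not fit
        chosen.append(entry)
        used += len(entry.channels)
    return chosen
```

Under this policy, a 6-channel decoder receiving a 5-channel base plane and a 2-channel upper plane would decode only the base plane.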

The single encoded stream generated in the third step may also carry conversion coefficient information. This information allows the multi-channel audio signal arranged three-dimensionally in the three-dimensional space to be reproduced as a signal converted to fewer channels than the original.
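Applying conversion coefficient information amounts to multiplying the decoded channels by a coefficient matrix. The sketch below is a minimal plain-Python version. It assumes the coefficients arrive as an M x N matrix of gains (M output channels, N input channels). That matrix layout, and the 0.5 fold-down gain in the usage note, are assumptions rather than values from the source.

```python
from __future__ import annotations


def downmix(signals: list[list[float]], coeffs: list[list[float]]) -> list[list[float]]:
    """Apply an M x N conversion-coefficient matrix to N channels of samples.

    out[m][t] = sum over n of coeffs[m][n] * signals[n][t]
    """
    n = len(signals)
    length = len(signals[0]) if signals else 0
    if any(len(s) != length for s in signals):
        raise ValueError("all channels must have the same length")
    out = []
    for row in coeffs:
        if len(row) != n:
            raise ValueError("each coefficient row needs one entry per input channel")
        out.append([sum(g * s[t] for g, s in zip(row, signals)) for t in range(length)])
    return out
```

Take `coeffs = [[1, 0, 0.5], [0, 1, 0.5]]` as an example. It reduces a three-channel input (L, R, and a hypothetical top-centre channel) to two channels, each receiving half of the top-centre signal.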

The conversion coefficient information may comprise two sets of filter coefficients for the reduced-channel speakers arranged three-dimensionally in the three-dimensional space. The first set corresponds to the head-related transfer function from the position of each such speaker to the listener's right ear. The second set corresponds to the head-related transfer function from the same position to the listener's left ear.
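In the time domain, HRTF-based filter coefficients are head-related impulse responses (HRIRs). Binaural conversion is then a sum of convolutions: each channel is filtered with its left-ear and right-ear responses, and the results are summed per ear. The sketch below assumes the coefficients are transmitted as one FIR filter per channel and ear. Any filter values used with it here are toy numbers, not measured HRIRs.

```python
from __future__ import annotations


def convolve(x: list[float], h: list[float]) -> list[float]:
    """Direct-form FIR convolution (full length: len(x) + len(h) - 1)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def binauralize(signals: dict[str, list[float]],
                hrirs: dict[str, tuple[list[float], list[float]]]):
    """Render channels to two ears; hrirs[ch] = (left-ear FIR, right-ear FIR)."""
    length = max(
        (len(x) + max(len(hrirs[ch][0]), len(hrirs[ch][1])) - 1
         for ch, x in signals.items()),
        default=0,
    )
    left, right = [0.0] * length, [0.0] * length
    for ch, x in signals.items():
        h_left, h_right = hrirs[ch]
        for out, h in ((left, h_left), (right, h_right)):
            for k, v in enumerate(convolve(x, h)):
                out[k] += v  # sum every channel's contribution at each ear
    return left, right
```

A practical renderer would typically use FFT-based convolution for long HRIRs; the direct form is used here only to keep the sketch dependency-free.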

Further, in the present invention, the channel mapping information generated in the first step may also cover a subset of the speakers arranged three-dimensionally in the three-dimensional space. For each speaker that outputs an audio signal converted in advance into fewer channels than the plurality of channels, it indicates that speaker's position within a two-dimensional plane. In the second step, the multi-channel audio signals to be output from the speakers arranged three-dimensionally in the three-dimensional space are encoded as before. Separately, the audio signal converted in advance to fewer channels is encoded as a single program for each two-dimensional plane to generate second encoding elements. Second plane position information, including information indicating the channel arrangement within each two-dimensional plane, is also generated. The second encoding elements and the second plane position information are then output for each two-dimensional plane. In the third step, the second encoding elements and the second plane position information are integrated with the original encoding elements and plane position information. The original encoding elements are those generated by encoding the multi-channel audio signals as a single program for each two-dimensional plane, and the original plane position information includes information indicating the channel arrangement within each two-dimensional plane. The result is generated and output as a single encoded stream.

To achieve the above object, the audio signal encoding device of the present invention comprises three units.

The first is a three-dimensional space division unit. It works from the position of each of a plurality of speakers arranged three-dimensionally in a three-dimensional space for outputting audio signals of a plurality of channels, and from a division direction, which is the direction in which the three-dimensional space is divided into a plurality of two-dimensional planes. From these it outputs plane information, including the number of two-dimensional planes, the number of channels corresponding to each two-dimensional plane, and the division order of the two-dimensional planes. It further outputs channel mapping information indicating the position, within its two-dimensional plane, of the speaker to which each channel corresponds.

The second is a plane encoding unit. On the basis of the plane information and the channel mapping information, it encodes the multi-channel audio signals to be output from the speakers arranged in the three-dimensional space as a single program for each two-dimensional plane to generate encoding elements. It further generates plane position information including information indicating the channel arrangement within each two-dimensional plane, and outputs the encoding elements and the plane position information for each two-dimensional plane.

The third is a stream integration unit. It integrates all the encoding elements and plane position information output for each two-dimensional plane by the plane encoding unit to generate and output a single encoded stream.
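The plane encoding and stream integration units can be illustrated with a toy container. Everything here is hypothetical. `encode_plane` packs 16-bit PCM as a stand-in for a real codec's encoding elements. The byte layout is invented for illustration: a single `3DAU` sync word, a plane count, then per plane an index, the comma-separated channel layout as plane position information, and the payload. It does not reproduce MPEG-2 AAC or E-AC-3 syntax. What it shows is that the integrated stream carries one stream header rather than one per independently encoded group.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass


@dataclass
class PlaneElement:
    plane_index: int           # position of the plane in the division order
    channel_layout: list[str]  # plane position info: channel order within the plane
    payload: bytes             # stand-in for the codec's encoding element


def encode_plane(plane_index: int, channels: dict[str, list[int]]) -> PlaneElement:
    """Hypothetical plane encoder: packs 16-bit PCM instead of running a real codec."""
    layout = sorted(channels)
    payload = b"".join(
        struct.pack(f"<{len(channels[name])}h", *channels[name]) for name in layout
    )
    return PlaneElement(plane_index, layout, payload)


def integrate(elements: list[PlaneElement]) -> bytes:
    """Stream integration: one stream header, then plane position info + payload per plane."""
    out = bytearray(b"3DAU")                 # single (hypothetical) sync word
    out += struct.pack("<B", len(elements))  # number of planes
    for el in sorted(elements, key=lambda e: e.plane_index):
        names = ",".join(el.channel_layout).encode("ascii")
        out += struct.pack("<BBH", el.plane_index, len(names), len(el.payload))
        out += names + el.payload
    return bytes(out)
```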

Here, the stream integration unit may add, as plane position information, information that allows only some of the channels of the multi-channel sound source information arranged three-dimensionally in the three-dimensional space to be decoded. It then generates a single encoded stream carrying that information.

The stream integration unit may also add conversion coefficient information and generate a single encoded stream carrying it. This information allows the multi-channel audio signal arranged three-dimensionally in the three-dimensional space to be reproduced as a signal converted to fewer channels than the original.

The conversion coefficient information may comprise two sets of filter coefficients for the reduced-channel speakers arranged three-dimensionally in the three-dimensional space. The first set corresponds to the head-related transfer function from the position of each such speaker to the listener's right ear. The second set corresponds to the head-related transfer function from the same position to the listener's left ear.

In the audio signal encoding device of the present invention, the three-dimensional space division unit may also cover a subset of the speakers arranged three-dimensionally in the three-dimensional space. For each speaker that outputs an audio signal converted in advance into fewer channels than the plurality of channels, it outputs information indicating that speaker's position within a two-dimensional plane. The plane encoding unit then encodes, separately from the multi-channel audio signals, the audio signal converted in advance to fewer channels as a single program for each two-dimensional plane to generate second encoding elements. It further generates second plane position information including information indicating the channel arrangement within each two-dimensional plane. It then outputs the second encoding elements and the second plane position information for each two-dimensional plane. The stream integration unit integrates the second encoding elements and the second plane position information with the original encoding elements and plane position information. The original encoding elements are those generated by encoding the multi-channel audio signals as a single program for each two-dimensional plane, and the original plane position information includes information indicating the channel arrangement within each two-dimensional plane. The stream integration unit generates and outputs the result as a single encoded stream.

To achieve the above object, the audio signal decoding method of the present invention takes as input a single encoded stream obtained by integrating four components:

- first encoding elements, generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a single program for each of a plurality of two-dimensional planes contained in the three-dimensional space;
- first plane position information, including information indicating the channel arrangement within each two-dimensional plane;
- second encoding elements, generated by likewise encoding an audio signal converted in advance into fewer channels than the multi-channel audio signal as a single program for each of the two-dimensional planes;
- second plane position information, including information indicating the channel arrangement within each two-dimensional plane.

The method comprises three steps. The first step separates, from the encoded stream, the first and second plane position information and the first and second encoding elements for each of the plurality of two-dimensional planes. The second step decodes the first and second encoding elements separated for each two-dimensional plane in the first step, yielding the multi-channel audio signal and the audio signal converted in advance into fewer channels, respectively. The third step combines the first and second plane position information separated for each two-dimensional plane in the first step to generate three-dimensional channel arrangement information. This information indicates the positions of the speakers that output each channel of the decoded multi-channel audio signal and of the decoded audio signal converted in advance into fewer channels.
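The third step's synthesis can be sketched as merging per-plane layouts into one table keyed by program and channel. This sketch assumes two things beyond what the source states. First, each plane's position information carries the plane's coordinate along the division axis (`height`) and in-plane (x, y) positions. Second, the first and second plane position information are distinguished by a program tag; `"full"` and `"reduced"` are used here, both hypothetical.

```python
from __future__ import annotations


def synthesize_3d(plane_infos: list[dict]) -> dict[tuple[str, str], tuple[float, float, float]]:
    """Combine per-plane position info into 3D channel arrangement info.

    Each item of plane_infos is a dict with (hypothetical) keys:
      'program'  : which program the plane belongs to, e.g. 'full' or 'reduced'
      'height'   : plane coordinate along the division axis
      'channels' : {channel_name: (x, y)} positions within the plane
    Returns {(program, channel_name): (x, y, z)}.
    """
    arrangement: dict[tuple[str, str], tuple[float, float, float]] = {}
    for info in plane_infos:
        z = info["height"]
        for name, (x, y) in info["channels"].items():
            key = (info["program"], name)
            if key in arrangement:
                raise ValueError(f"duplicate channel {key}")
            arrangement[key] = (x, y, z)
    return arrangement
```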

To achieve the above object, the audio signal decoding method of the present invention takes as input a single encoded stream obtained by integrating three components:

- encoding elements, generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a single program for each of a plurality of two-dimensional planes contained in the three-dimensional space;
- plane position information, including information indicating the channel arrangement within each two-dimensional plane;
- conversion coefficient information, including information indicating conversion coefficients that allow the multi-channel audio signal arranged in the three-dimensional space to be reproduced as an audio signal with fewer channels.

The method comprises four steps. The first step separates, from the encoded stream, the plane position information and the encoding elements for each of the plurality of two-dimensional planes, and further separates the conversion coefficient information. The second step decodes the encoding elements separated for each two-dimensional plane in the first step into the multi-channel audio signal. The third step combines the plane position information separated for each two-dimensional plane in the first step to generate three-dimensional channel arrangement information, indicating the position of the speaker that outputs each channel of the multi-channel audio signal. The fourth step multiplies the decoded multi-channel audio signal by the conversion coefficient information separated in the first step and obtained on the basis of the three-dimensional channel arrangement information. This converts it into an audio signal with fewer channels than the multi-channel audio signal, for reproduction on one or more two-dimensional planes.

To achieve the above object, the audio signal decoding method of the present invention takes as input a single encoded stream obtained by integrating three components:

- encoding elements, generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a single program for each of a plurality of two-dimensional planes contained in the three-dimensional space;
- plane position information, including information indicating the channel arrangement within each two-dimensional plane;
- conversion coefficient information, including information indicating conversion coefficients that allow the multi-channel audio signal arranged in the three-dimensional space to be reproduced with fewer channels.

The method comprises four steps. The first step separates, from the encoded stream, the plane position information and the encoding elements for each of the plurality of two-dimensional planes, and further separates the conversion coefficient information. The second step decodes the encoding elements separated for each two-dimensional plane in the first step into the multi-channel audio signal. The third step combines the plane position information separated for each two-dimensional plane in the first step to generate three-dimensional channel arrangement information, indicating the position of the speaker that outputs each channel of the multi-channel audio signal. The fourth step multiplies the decoded multi-channel audio signal by the conversion coefficient information separated in the first step and obtained on the basis of the three-dimensional channel arrangement information. This converts the multi-channel audio signal into a two-channel binaural signal.

To achieve the above object, the audio signal decoding device of the present invention comprises three units.

The first is a stream separation unit. It takes as input a single encoded stream obtained by integrating encoding elements and plane position information. The encoding elements are generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a single program for each of a plurality of two-dimensional planes contained in the three-dimensional space. The plane position information includes information indicating the channel arrangement within each two-dimensional plane. The unit separates, from the encoded stream, the plane position information and the encoding elements for each of the plurality of two-dimensional planes.

The second is a plane decoding unit. It decodes the encoding elements separated for each two-dimensional plane by the stream separation unit into the multi-channel audio signal.

The third is a three-dimensional space synthesis unit. It combines the plane position information separated for each two-dimensional plane by the stream separation unit to generate three-dimensional channel arrangement information, indicating the position of the speaker that outputs each channel of the decoded multi-channel audio signal.
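The stream separation unit is essentially a parser for the integrated stream. The sketch below parses a hypothetical layout invented for illustration. The layout is: a `3DAU` sync word, a one-byte plane count, then for each plane a one-byte plane index, a one-byte name length, a little-endian two-byte payload length, the comma-separated channel names, and the payload. A real stream would follow the syntax of the chosen coding scheme, which the source does not detail here.

```python
from __future__ import annotations

import struct

SYNC = b"3DAU"         # hypothetical sync word
PLANE_HEADER = "<BBH"  # plane index, name length, payload length (little-endian)


def separate(stream: bytes) -> list[tuple[int, list[str], bytes]]:
    """Stream separation: return [(plane_index, channel_names, payload), ...]."""
    if not stream.startswith(SYNC):
        raise ValueError("missing sync word")
    pos = len(SYNC)
    (count,) = struct.unpack_from("<B", stream, pos)
    pos += 1
    header_size = struct.calcsize(PLANE_HEADER)
    planes = []
    for _ in range(count):
        index, name_len, payload_len = struct.unpack_from(PLANE_HEADER, stream, pos)
        pos += header_size
        names = stream[pos:pos + name_len].decode("ascii").split(",")  # plane position info
        pos += name_len
        payload = stream[pos:pos + payload_len]  # encoding element for the plane decoder
        if len(payload) != payload_len:
            raise ValueError("truncated plane payload")
        pos += payload_len
        planes.append((index, names, payload))
    return planes
```

Each returned payload would go to the plane decoding unit, and each channel list to the three-dimensional space synthesis unit.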

To achieve the above object, the audio signal decoding device of the present invention comprises three units.

The first is a stream separation unit. It takes as input a single encoded stream obtained by integrating four components:

- first encoding elements, generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a single program for each of a plurality of two-dimensional planes contained in the three-dimensional space;
- first plane position information, including information indicating the channel arrangement within each two-dimensional plane;
- second encoding elements, generated by likewise encoding an audio signal converted in advance into fewer channels than the plurality of channels as a single program for each of the two-dimensional planes;
- second plane position information, including information indicating the channel arrangement within each two-dimensional plane.

The unit separates, from the encoded stream, the first and second plane position information and the first and second encoding elements for each of the plurality of two-dimensional planes.

The second is a plane decoding unit. It decodes the first and second encoding elements separated for each two-dimensional plane by the stream separation unit, yielding the multi-channel audio signal and the audio signal converted in advance into fewer channels, respectively.

The third is a three-dimensional space synthesis unit. It combines the first and second plane position information separated for each two-dimensional plane by the stream separation unit to generate three-dimensional channel arrangement information. This information indicates the positions of the speakers that output each channel of the decoded multi-channel audio signal and of the decoded audio signal converted in advance into fewer channels.

In order to achieve the above object, an audio signal decoding apparatus according to the present invention receives as input a single encoded stream obtained as follows: multi-channel audio signals to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space are encoded as a group of programs for each of a plurality of two-dimensional planes contained in the three-dimensional space to generate encoding elements; plane position information, including information indicating the channel arrangement within each two-dimensional plane, is generated; transform coefficient information, including information indicating transform coefficients that allow reproduction with an audio signal having fewer channels than the plurality of channels arranged in the three-dimensional space, is generated; and all of these are integrated. The apparatus comprises: a stream separation unit that separates, from the encoded stream, the plane position information and the encoding elements for each of the plurality of two-dimensional planes, and further separates the transform coefficient information; a plane decoding unit that decodes the encoding elements for each two-dimensional plane separated by the stream separation unit into a multi-channel audio signal; a three-dimensional space synthesis unit that combines the plane position information for each two-dimensional plane separated by the stream separation unit to generate three-dimensional channel arrangement information indicating the position of the speaker that outputs each channel of the decoded multi-channel audio signal; and a downmix unit that multiplies the decoded multi-channel audio signal by the transform coefficient information separated by the stream separation unit and obtained on the basis of the three-dimensional channel arrangement information, thereby converting it into an audio signal with fewer channels so that it can be reproduced on one or more two-dimensional planes.
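
The downmix step described above, multiplying the decoded multi-channel signal by transform coefficients, can be pictured as a plain matrix multiplication. The following Python sketch is illustrative only: the function name `downmix`, the list-of-lists sample layout and the coefficient values in the usage lines are hypothetical assumptions, not values defined in this specification, which carries its actual coefficients in the encoded stream.

```python
from typing import List, Sequence


def downmix(signals: Sequence[Sequence[float]],
            coeffs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return out[j][t] = sum over i of coeffs[j][i] * signals[i][t].

    signals: one sample sequence per decoded input channel (equal lengths).
    coeffs:  one row per output channel, one column per input channel.
    """
    n_in = len(signals)
    n_samples = len(signals[0]) if signals else 0
    if any(len(row) != n_in for row in coeffs):
        raise ValueError("each coefficient row needs one entry per input channel")
    return [[sum(row[i] * signals[i][t] for i in range(n_in))
             for t in range(n_samples)]
            for row in coeffs]


# Hypothetical example: fold a 3-channel front row (L, C, R) into 2 channels,
# feeding the centre to both outputs with a gain of about -3 dB.
k = 0.7071
front = [[1.0, 0.5],    # L samples
         [0.2, 0.2],    # C samples
         [0.0, -0.5]]   # R samples
stereo = downmix(front, [[1.0, k, 0.0],
                         [0.0, k, 1.0]])
```

Each output channel is simply a weighted sum of the decoded input channels, so changing the target layout only means supplying a different coefficient matrix.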

Also, to achieve the above object, another audio signal decoding apparatus according to the present invention receives as input a single encoded stream obtained as follows: multi-channel audio signals to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space are encoded as a group of programs for each of a plurality of two-dimensional planes contained in the three-dimensional space to generate encoding elements; plane position information, including information indicating the channel arrangement within each two-dimensional plane, is generated; transform coefficient information, including information indicating transform coefficients that allow reproduction with fewer channels than the plurality of channels arranged in the three-dimensional space, is generated; and all of these are integrated. The apparatus comprises: a stream separation unit that separates, from the encoded stream, the plane position information and the encoding elements for each of the plurality of two-dimensional planes, and further separates the transform coefficient information; a plane decoding unit that decodes the encoding elements for each two-dimensional plane separated by the stream separation unit into a multi-channel audio signal; a three-dimensional space synthesis unit that combines the plane position information for each two-dimensional plane separated by the stream separation unit to generate three-dimensional channel arrangement information indicating the position of the speaker that outputs each channel of the decoded multi-channel audio signal; and a downmix unit that multiplies the decoded multi-channel audio signal by the transform coefficient information separated by the stream separation unit and obtained on the basis of the three-dimensional channel arrangement information, thereby converting the multi-channel audio signal into a 2-channel binaural signal.

According to the audio signal encoding method and apparatus of the present invention, it is possible to generate an encoded stream that enables reproduction of a three-dimensional sound field in which the position, in the three-dimensional space, of each audio signal corresponding to that space is clearly specified.

Further, according to the audio signal decoding method and apparatus of the present invention, the decoder decodes an encoded stream produced on the encoding side by dividing the audio signals corresponding to the three-dimensional space into audio signals for each two-dimensional plane, encoding the audio signals of each plane into encoding elements, and integrating those encoding elements together with the division information and the in-plane channel arrangement information. A three-dimensional sound field in which sound source positions in the three-dimensional space are clearly specified can therefore be reproduced.

FIG. 1 is a block diagram of an embodiment of the audio signal encoding apparatus of the present invention.
FIG. 2 shows a 22.2ch speaker arrangement.
FIG. 3 shows the speaker arrangements (channel positions) of the two-dimensional planes obtained by dividing the three-dimensional space of the speaker arrangement shown in FIG. 2 into three in the height direction.
FIG. 4 shows the three-dimensional coordinates of each corresponding element (speaker position) of the three two-dimensional planes shown in FIG. 3.
FIG. 5 shows the format of a first example of an MPEG-compliant encoded stream obtained, according to the present invention, by dividing into three two-dimensional planes and encoding.
FIG. 6 shows the structure of the PCE in FIG. 5 in a description language.
FIG. 7 shows, in a description language, the PCEs defined in the MPEG-compliant encoded stream generated by the audio signal encoding apparatus of FIG. 1.
FIG. 8 shows the speaker arrangements (channel positions) of the two-dimensional planes obtained by dividing the three-dimensional space of the speaker arrangement shown in FIG. 2 into three in the depth direction.
FIG. 9 shows the format of a second example of an MPEG-compliant encoded stream generated by the audio signal encoding apparatus of the present invention.
FIG. 10 shows a setting example for the 5.1ch-compatible PCE shown as PCE3 in FIG. 9.
FIG. 11 shows the format of a third example of an MPEG-compliant encoded stream generated by the audio signal encoding apparatus of the present invention.
FIG. 12 shows the format of a fourth example of an MPEG-compliant encoded stream generated by the audio signal encoding apparatus of the present invention.
FIG. 13 shows the format of a fifth example of an MPEG-compliant encoded stream generated by the audio signal encoding apparatus of the present invention.
FIG. 14 shows the format of a sixth example of an MPEG-compliant encoded stream generated by the audio signal encoding apparatus of the present invention.
FIG. 15 shows the format of a seventh example of an MPEG-compliant encoded stream generated by the audio signal encoding apparatus of the present invention.
FIG. 16 shows the structure of DSE0 in FIG. 15 in a description language.
FIG. 17 is a block diagram of a first embodiment of the audio signal decoding apparatus of the present invention.
FIG. 18 is a block diagram of a second embodiment of the audio signal decoding apparatus of the present invention.
FIG. 19 is a block diagram of an example of the downmix unit in FIG. 18.
FIG. 20 is a block diagram of an example of the mode 1 block in FIG. 19.
FIG. 21 shows the format of an eighth example of an MPEG-compliant encoded stream generated by the audio signal encoding apparatus of the present invention.
FIG. 22 shows the configuration of an example of the upper-layer channel downmix unit in FIG. 20.
FIG. 23 shows the configuration of an example of the middle-layer channel downmix unit in FIG. 20.
FIG. 24 shows the configuration of an example of the lower-layer channel downmix unit in FIG. 20.
FIG. 25 shows the configuration of an example of the 5.1ch synthesis unit in FIG. 20.
FIG. 26 is a block diagram of an example of the mode 2 block in FIG. 19.
FIG. 27 is a block diagram of an example of the 2ch synthesis unit in FIG. 26.
FIG. 28 is a block diagram of an example of the mode 3 block in FIG. 19.
FIG. 29 is a block diagram of an example of the mode 4 block in FIG. 19.
FIG. 30 shows an example speaker arrangement illustrating the effect of mode 4 in the audio signal decoding apparatus of the present invention shown in FIG. 18.

Next, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram of an embodiment of the audio signal encoding apparatus according to the present invention. As shown in the figure, the audio signal encoding apparatus 10 of this embodiment comprises a three-dimensional space division unit 11, three plane encoding units 12, 13 and 14, and a stream integration unit 15. It receives as external inputs an N-channel audio signal (N is a natural number), a division direction, and three-dimensional channel arrangement information for the audio signal of each channel, and outputs an encoded stream, for example one compliant with MPEG.

The "division direction" is information indicating which of three types of division, X-Y plane division, X-Z plane division or Y-Z plane division, is used when the three-dimensional space spanned by the X, Y and Z axes, in which the N speakers that reproduce the audio signals of the N channels are placed, is divided into two-dimensional planes. The division direction need not be supplied externally; the three-dimensional space division unit 11 may instead always use one fixed division direction out of X-Y, X-Z and Y-Z plane division. The "three-dimensional channel arrangement information" gives the position of the speaker (or sound source) of each channel of the input N-channel (ch) audio signal in the three-dimensional space as three-dimensional coordinates (x, y, z), with the listener's position (the listening point) as the origin (0, 0, 0). The Nch audio signal is a digital signal obtained by converting the analog audio signal of each of the N channels, for example by pulse code modulation (PCM).

Based on the division direction and the three-dimensional channel arrangement information, the three-dimensional space division unit 11 divides the information about the N-channel sound sources, which correspond to speakers placed in the three-dimensional space, into information about sound sources on a plurality of two-dimensional planes, and outputs plane information together with channel mapping information indicating the sound source positions on the resulting two-dimensional planes. The plane information includes the total number of planes, the total number of channels in each plane, and the plane division order. The channel mapping information associates each channel with a plane and with a position within that plane (front, side, rear, etc.).
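
As an illustration of X-Y division, the sketch below groups channels that share the same height (z) into one plane and derives the total number of planes and the number of channels per plane. It is a hypothetical sketch rather than the apparatus itself: the function `split_xy`, the exact comparison of heights, and the coordinates in the usage lines are all assumptions, and the coordinates are not the Table 1 values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Position = Tuple[float, float, float]  # (x, y, z); listener at (0, 0, 0)


def split_xy(positions: Dict[str, Position], top_down: bool = True):
    """X-Y division: channels with the same height z form one plane.

    Returns (plane_info, mapping) where mapping[name] is the plane index.
    Heights are compared exactly; real coordinates may need a tolerance.
    """
    by_height: Dict[float, List[str]] = defaultdict(list)
    for name, (_x, _y, z) in positions.items():
        by_height[z].append(name)
    heights = sorted(by_height, reverse=top_down)  # highest plane first if top_down
    planes = [by_height[z] for z in heights]
    plane_info = {
        "total_planes": len(planes),
        "channels_per_plane": [len(p) for p in planes],
    }
    mapping = {name: idx for idx, names in enumerate(planes) for name in names}
    return plane_info, mapping


# Hypothetical coordinates for a few channels of each layer (not Table 1 values).
positions = {
    "UFL": (-1.0, 1.0, 1.0), "UFC": (0.0, 1.0, 1.0), "UFR": (1.0, 1.0, 1.0),
    "MFL": (-1.0, 1.0, 0.0), "MFC": (0.0, 1.0, 0.0), "MFR": (1.0, 1.0, 0.0),
    "LFC": (0.0, 1.0, -0.5), "LFEL": (-1.0, 1.0, -0.5),
}
info, mapping = split_xy(positions)
print(info)  # {'total_planes': 3, 'channels_per_plane': [3, 3, 2]}
```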

The plane encoding units 12, 13 and 14 are each assigned one of the three two-dimensional planes obtained by dividing the three-dimensional space. Based on the plane information and channel mapping information, each unit encodes the audio signals of the channels in its plane as a group of programs, for example with an MPEG-compliant coding scheme. It outputs encoding elements (SCE and CPE elements), i.e. the main data (encoded data) obtained by encoding the audio signals themselves, together with plane position information (a PCE element) as auxiliary data, which includes information indicating the channel arrangement within the plane (how many front, side and rear channels).

In this embodiment, the MPEG-2 AAC coding scheme used in BS digital broadcasting is described as an example of the coding scheme. The MPEG-4 AAC coding scheme may also be used; since it is MPEG-2 AAC with new optional tools added, the description given for MPEG-2 AAC applies to it as well. In the following description, the two schemes are referred to collectively as "MPEG-2/4 AAC".

The stream integration unit 15 integrates the plane position information and encoding elements of the three planes, output by the plane encoding units 12, 13 and 14 respectively, and generates and outputs a single MPEG-compliant (here, MPEG-2/4 AAC) encoded stream.

Next, the operation of the audio signal encoding apparatus 10 of this embodiment will be described in detail. In the example described here, in order to generate the three-dimensional sound field of the 22.2ch speaker arrangement shown in FIG. 2, the three-dimensional space of that arrangement is divided into the three two-dimensional planes shown in FIGS. 3(A) to 3(C), and an MPEG-2/4 AAC encoded stream with the structure shown in FIG. 5 is generated.

FIG. 2 shows the speaker arrangement of a 22.2ch sound system. This arrangement is defined in the specifications for advanced satellite digital broadcasting (advanced BS). As shown in FIG. 2, a three-dimensional space (width × depth × height) is formed by a total of 22.2 channels: 9 channels in the upper layer, 10 in the middle layer, 3 in the lower layer, and 2 LFE (Low Frequency Effect) channels. The LFE channels are channels to which mainly low-frequency sound is assigned.

If an encoded stream representing this 22.2ch three-dimensional sound field were to be built by encoding with MPEG-2/4 AAC, as in BS digital broadcasting, the current standard basically could not construct it, because, as described above, MPEG-2/4 AAC has no way to define the height direction.

Therefore, in this embodiment, the three-dimensional space (width × depth × height) shown in FIG. 2 is, as one example, divided in the height direction into three planes: a two-dimensional upper-layer plane with 9 channels (UFL, UFC, UFR, USL, USC, USR, UBL, UBC, UBR) as shown in FIG. 3(A); a two-dimensional middle-layer plane with 10 channels (MFL, MFLC, MFC, MFRC, MFR, MSL, MSR, MBL, MBC, MBR) as shown in FIG. 3(B); and a two-dimensional lower-layer plane with 3.2 channels, consisting of the 3 lower-layer channels (LFL, LFC, LFR) and the 2 LFE channels (LFEL, LFER), as shown in FIG. 3(C). Dividing the three-dimensional space in the height direction in this way makes it possible to define each plane's channel arrangement as a two-dimensional arrangement, which MPEG-2/4 AAC can express. Here the space is divided into three layers to match the 22.2ch sound system, but the number of divisions is not limited to three, and the division direction is not limited to the height direction.

Returning to FIG. 1: assume, as an example, that the division direction input to the three-dimensional space division unit 11 indicates X-Y division, and that the information shown in Table 1 is input as the three-dimensional channel arrangement information.

Table 1 shows, for the case where the three-dimensional space of the 22.2ch speaker arrangement shown in FIG. 2 is divided in the height direction into the three two-dimensional planes shown in FIGS. 3(A) to 3(C), the relationship between the three-dimensional channel arrangement information, each channel, and the corresponding elements (speaker positions) shown in FIGS. 3(A) to 3(C) and FIGS. 4(A) to 4(C). The three-dimensional channel arrangement information in Table 1 gives distances measured from the listener position shown in FIG. 4(B), taken as the origin. FIGS. 4(A) to 4(C) show the same three two-dimensional planes as FIGS. 3(A) to 3(C), together with the three-dimensional coordinates of each corresponding element.

Based on the three-dimensional channel arrangement information in Table 1, the three-dimensional space division unit 11 generates and outputs plane information consisting of the total number of planes, the total number of channels in each plane, and the plane division order. The total number of planes is the number of two-dimensional planes, here "3"; the total number of channels in each plane is shown in Table 2; and the plane division order is a 4-bit value, for example "0011". The plane division order "0011" indicates X-Y division with the planes ordered from the upper layer, through the middle layer, to the lower layer + LFE. For X-Y division ordered from the lower layer + LFE, through the middle layer, to the upper layer, the plane division order is "0010". When no division is performed, the plane division order is "0000".
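
The three plane division order values given in this paragraph can be held in a small lookup table. The following sketch decodes only those three 4-bit values; the names `PLANE_DIVISION_ORDER` and `describe_division_order` are hypothetical, and any other value is reported as undefined because the text does not define it.

```python
# 4-bit plane division order values named in the text; all others are left undefined here.
PLANE_DIVISION_ORDER = {
    0b0011: "X-Y division: upper -> middle -> lower+LFE",
    0b0010: "X-Y division: lower+LFE -> middle -> upper",
    0b0000: "no division",
}


def describe_division_order(code: int) -> str:
    """Return a readable description of a 4-bit plane division order value."""
    if not 0 <= code <= 0b1111:
        raise ValueError("plane division order is a 4-bit field")
    meaning = PLANE_DIVISION_ORDER.get(code, "not defined in this description")
    return f"{code:04b}: {meaning}"


print(describe_division_order(0b0011))  # 0011: X-Y division: upper -> middle -> lower+LFE
```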

Table 2 shows the total number of channels in each plane for the case where the audio signal encoding apparatus 10 generates an encoded stream in the format shown in FIG. 5, described later; it gives the relationship between the plane number, the total number of channels, and the corresponding element in FIG. 5, the PCE (Program_Config_Element).

Plane number "0" denotes the upper-layer two-dimensional plane, whose total number of channels is "9", as shown in FIGS. 3(A) and 4(A). Plane number "1" denotes the middle-layer two-dimensional plane, whose total number of channels is "10", as shown in FIGS. 3(B) and 4(B). Plane number "2" denotes the lower layer + LFE two-dimensional plane; as shown in FIGS. 3(C) and 4(C) it is 3.2ch, so its total number of channels is "5". The total across all planes is therefore 24 channels, ch0 to ch23.
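
The channel totals above can be checked by numbering the channels plane by plane. In this sketch the channel names are those listed for FIGS. 3(A) to 3(C), but the order of channels within each plane, and therefore which name receives which channel number, is an assumption (Table 1's exact assignment is not reproduced); the function `number_channels` is hypothetical.

```python
# Channel names per plane as listed for FIGS. 3(A)-(C); the within-plane order
# is an assumption, not the Table 1 channel-number assignment.
PLANES = [
    ("upper", ["UFL", "UFC", "UFR", "USL", "USC", "USR", "UBL", "UBC", "UBR"]),
    ("middle", ["MFL", "MFLC", "MFC", "MFRC", "MFR",
                "MSL", "MSR", "MBL", "MBC", "MBR"]),
    ("lower+LFE", ["LFL", "LFC", "LFR", "LFEL", "LFER"]),
]


def number_channels(planes):
    """Assign global channel numbers ch0, ch1, ... plane by plane.

    Returns {name: (plane_number, channel_number)}.
    """
    table = {}
    ch = 0
    for plane_no, (_label, names) in enumerate(planes):
        for name in names:
            table[name] = (plane_no, ch)
            ch += 1
    return table


table = number_channels(PLANES)
print([len(names) for _, names in PLANES], len(table))  # [9, 10, 5] 24
```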

In the case of X-Y division, the three-dimensional space division unit 11 groups channels at the same height (Z axis) into one plane, and within each plane classifies channels that have the same Y-axis distance from the origin as front, side, or back. It also separates single channels, such as the center, from paired channels, such as L/R.

As a result, the three-dimensional space division unit 11 generates channel mapping information that assigns each channel one of the following identifiers: "front single" (a single front channel), "front pair" (a paired front channel), "side single" (a single side channel), "side pair" (a paired side channel), "back single" (a single back channel), "back pair" (a paired back channel), and "LFE single" (a single LFE channel). Table 3 shows this channel mapping information.
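
A minimal sketch of how such identifiers might be derived from coordinates follows. It rests on assumed conventions that the text does not state: +y points toward the front, channels with y = 0 form the side row, and a channel on the centre line (x = 0) is single while any other channel is assumed to have a mirror-image partner. The function `classify` is hypothetical.

```python
from typing import Tuple


def classify(pos: Tuple[float, float, float], is_lfe: bool = False) -> str:
    """Channel mapping identifier for one channel inside its X-Y plane.

    Assumed conventions: +y points toward the front, y == 0 is the side row,
    and a channel on the centre line (x == 0) is a single channel; any other
    channel is assumed to have a mirror-image partner and is coded as a pair.
    """
    if is_lfe:
        return "LFE single"
    x, y, _z = pos
    region = "front" if y > 0 else "back" if y < 0 else "side"
    kind = "single" if x == 0 else "pair"
    return f"{region} {kind}"


print(classify((0.0, 1.0, 0.0)))   # front single  (e.g. a centre channel)
print(classify((-1.0, 0.0, 0.0)))  # side pair
```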

In Table 3, the corresponding element is the element of the format shown in FIG. 5, described later, that is used when the audio signal encoding apparatus 10 generates an encoded stream in that format.

Based on the plane information and channel mapping information from the three-dimensional space division unit 11, the plane encoding unit 12 encodes with MPEG-2/4 AAC the audio signals of the channels in the upper-layer two-dimensional plane (plane number "0") of the 22.2ch input audio signal, and generates and outputs plane position information and encoding elements.

In parallel with the encoding operation of the plane encoding unit 12, and likewise based on the plane information and channel mapping information from the three-dimensional space division unit 11, the plane encoding unit 13 encodes the audio signals of the channels in the middle-layer two-dimensional plane (plane number "1"), and the plane encoding unit 14 encodes those in the lower layer + LFE two-dimensional plane (plane number "2"), each with MPEG-2/4 AAC; each unit generates and outputs plane position information and encoding elements.

At this time, the plane encoding units 12 to 14 determine from the channel mapping information how the channels in the same plane are to be encoded (as a channel pair: CPE; or as a single channel: SCE), perform the encoding, and generate the encoding elements. The plane encoding units 12 to 14 also generate the plane position information from the channel mapping information; this corresponds to the program config element (PCE) in the stream.
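
The choice between SCE and CPE, and the tag numbering that results, can be sketched as follows. Single channels become SCEs (or LFE elements on the LFE row), channel pairs become CPEs, and tags are numbered per element type in the order the planes appear in the stream. This is an assumed reconstruction using the hypothetical names `LAYOUT` and `assign_elements`; with per-region counts taken from the channel lists of FIGS. 3(A) to 3(C), it yields the same element names as FIGS. 5(C) to 5(E).

```python
from typing import Dict, List, Tuple

# (plane label, [(region, number of single channels, number of channel pairs), ...])
# Counts follow the channel lists of FIGS. 3(A)-(C).
LAYOUT: List[Tuple[str, List[Tuple[str, int, int]]]] = [
    ("upper_layer", [("front", 1, 1), ("side", 1, 1), ("back", 1, 1)]),
    ("middle_layer", [("front", 1, 2), ("side", 0, 1), ("back", 1, 1)]),
    ("lower+LFE_layer", [("front", 1, 1), ("lfe", 2, 0)]),
]


def assign_elements(layout) -> Dict[str, List[str]]:
    """Map singles to SCE (or LFE) and pairs to CPE, numbering tags per
    element type in stream order across all planes."""
    next_tag = {"SCE": 0, "CPE": 0, "LFE": 0}
    result: Dict[str, List[str]] = {}
    for plane, regions in layout:
        elements: List[str] = []
        for region, singles, pairs in regions:
            single_kind = "LFE" if region == "lfe" else "SCE"
            for _ in range(singles):
                elements.append(f"{single_kind}{next_tag[single_kind]}")
                next_tag[single_kind] += 1
            for _ in range(pairs):
                elements.append(f"CPE{next_tag['CPE']}")
                next_tag["CPE"] += 1
        result[plane] = elements
    return result


for plane, elements in assign_elements(LAYOUT).items():
    print(plane, elements)
```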

The stream integration unit 15 generates and outputs an encoded stream compliant with MPEG-2/4 AAC from the plane position information and encoding elements output by the plane encoding units 12 to 14.

Next, examples of the MPEG-compliant encoded stream output from the audio signal encoding apparatus 10 will be described.

FIG. 5 shows the format of a first example of an MPEG-compliant encoded stream obtained, according to the present invention, by dividing the 22.2ch three-dimensional space shown in FIG. 2 into the three two-dimensional planes shown in FIGS. 3(A) to 3(C) and encoding them. This stream is encoded with MPEG-2/4 AAC in the format known as ADTS (Audio_Data_Transport_Stream), and the format in FIG. 5(A) also conforms to ADTS.

As shown in FIG. 5(A), the encoded format consists of "adts_frame" units, each corresponding to one audio frame, concatenated in time order. Each "adts_frame" consists of a header called "adts_header", which contains information such as the synchronization code and the frame length as well as a CRC error-detection code, and a "raw_data_block", in which the encoded audio information is stored in units called elements.
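
To make the header fields concrete, the following sketch parses the 7-byte fixed and variable ADTS header using the field widths of the ADTS syntax in ISO/IEC 13818-7 and ISO/IEC 14496-3. It is a minimal illustration, not this specification's implementation: the names `ADTS_FIELDS` and `parse_adts_header` are hypothetical, and the optional 16-bit CRC that follows the header when `protection_absent` is 0 is not read. A stream whose channel layout is carried in PCEs, as here, would normally signal `channel_configuration` = 0.

```python
from typing import Dict

# (field name, width in bits) for the 56-bit ADTS fixed + variable header.
ADTS_FIELDS = [
    ("syncword", 12), ("id", 1), ("layer", 2), ("protection_absent", 1),
    ("profile", 2), ("sampling_frequency_index", 4), ("private_bit", 1),
    ("channel_configuration", 3), ("original_copy", 1), ("home", 1),
    ("copyright_identification_bit", 1), ("copyright_identification_start", 1),
    ("aac_frame_length", 13), ("adts_buffer_fullness", 11),
    ("number_of_raw_data_blocks_in_frame", 2),
]


def parse_adts_header(data: bytes) -> Dict[str, int]:
    """Parse the first 7 bytes of an ADTS frame (the CRC, if present, is not read)."""
    if len(data) < 7:
        raise ValueError("an ADTS header is at least 7 bytes")
    bits = int.from_bytes(data[:7], "big")
    remaining = 56
    header: Dict[str, int] = {}
    for name, width in ADTS_FIELDS:
        remaining -= width
        header[name] = (bits >> remaining) & ((1 << width) - 1)
    if header["syncword"] != 0xFFF:
        raise ValueError("ADTS syncword 0xFFF not found")
    return header


# AAC LC, 44.1 kHz (index 4), 2 channels, 16-byte frame.
hdr = parse_adts_header(bytes([0xFF, 0xF1, 0x50, 0x80, 0x02, 0x1F, 0xFC]))
print(hdr["channel_configuration"], hdr["aac_frame_length"])  # 2 16
```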

As shown in FIGS. 5(A) and 5(B), in the encoded stream of this embodiment the "raw_data_block" consists of PCEs (Program_Config_Element) for channel information; the upper-layer information "upper_layer", the middle-layer information "middle_layer", and the lower layer + LFE information "lower+LFE_layer"; a FIL (Fill_Element) for stuffing bits; and an END (Terminator) marking the end of the frame.
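
Each element inside a raw_data_block starts with a 3-bit `id_syn_ele` code defined by the AAC syntax (SCE = 0, CPE = 1, CCE = 2, LFE = 3, DSE = 4, PCE = 5, FIL = 6, END = 7); apart from FIL and END, the element then carries a 4-bit `element_instance_tag`. The sketch below lists the element order of FIGS. 5(B) to 5(E) with those codes. The names `FIG5_ORDER` and `element_headers` are hypothetical, and element payloads are not modeled.

```python
# id_syn_ele values of the AAC raw_data_block syntax (3 bits each).
ID_SYN_ELE = {"SCE": 0, "CPE": 1, "CCE": 2, "LFE": 3,
              "DSE": 4, "PCE": 5, "FIL": 6, "END": 7}

# Element order of the raw_data_block in FIGS. 5(B)-(E).
FIG5_ORDER = [
    "PCE0", "PCE1", "PCE2",
    "SCE0", "CPE0", "SCE1", "CPE1", "SCE2", "CPE2",   # upper_layer
    "SCE3", "CPE3", "CPE4", "CPE5", "SCE4", "CPE6",   # middle_layer
    "SCE5", "CPE7", "LFE0", "LFE1",                   # lower+LFE_layer
    "FIL", "END",
]


def element_headers(order):
    """Return (id_syn_ele, element_instance_tag or None) for each element label."""
    headers = []
    for label in order:
        kind, digits = label[:3], label[3:]
        tag = int(digits) if digits else None  # FIL and END carry no instance tag
        headers.append((ID_SYN_ELE[kind], tag))
    return headers


headers = element_headers(FIG5_ORDER)
print(headers[:4], headers[-2:])  # [(5, 0), (5, 1), (5, 2), (0, 0)] [(6, None), (7, None)]
```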

As shown in FIG. 5(B), the channel-information PCEs are "PCE0", "PCE1" and "PCE2". "PCE0" is the upper-layer plane position information output by the plane encoding unit 12 shown in FIG. 1. Likewise, "PCE1" and "PCE2" are the plane position information for the middle layer and for the lower layer + LFE, output by the plane encoding units 13 and 14 shown in FIG. 1, respectively.

As shown in FIG. 5(C), the upper-layer information "upper_layer" consists of the front SCE (Single_Channel_Element) 0 and CPE (Channel_Pair_Element) 0, the side SCE1 and CPE1, and the back (rear) SCE2 and CPE2. This "upper_layer" information is the set of encoding elements output by the plane encoding unit 12 shown in FIG. 1. Here each element is written as its name followed by its tag number (element_instance_tag); for example, the SCE with tag number 0 is written "SCE0".

Similarly, as shown in FIG. 5(D), the middle-layer information "middle_layer" consists of the front SCE3, CPE3 and CPE4, the side CPE5, and the back SCE4 and CPE6; it is the set of encoding elements output by the plane encoding unit 13 shown in FIG. 1. As shown in FIG. 5(E), the lower layer + LFE information "lower+LFE_layer" consists of the front SCE5 and CPE7 and the LFE elements LFE (LFE_Channel_Element) 0 and LFE1; it is the set of encoding elements output by the plane encoding unit 14 shown in FIG. 1.

FIG. 6 shows the structure of the PCE in a description language. Each field name is followed by its number of bits and its type (uimsbf: unsigned integer, most significant bit first; bslbf: bit string, left bit first). The leading "element_instance_tag" is a tag number that distinguishes the PCEs when more than one is present. "num_front_channel_elements" gives the number of elements among the front channels; likewise, "num_side_channel_elements", "num_back_channel_elements" and "num_lfe_channel_elements" give the numbers for the side, back and LFE channels.

For each element, the PCE then carries a 1-bit flag indicating whether the element is an SCE or a CPE, followed by a 4-bit copy of the tag number (element_instance_tag) assigned to that element (for example, "front_element_is_cpe" and "front_element_tag_select"). Because an LFE is never encoded as a channel pair, no SCE/CPE flag is needed for it, and only its 4-bit tag number is added (lfe_element_tag_select).
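The channel-element portion of the PCE described above can be sketched as a small bit writer. This is a partial illustration only: it writes the four element counts and the per-element is_cpe/tag_select fields, and deliberately omits the other PCE fields of ISO/IEC 14496-3 (object type, sampling frequency index, num_assoc_data_elements, num_valid_cc_elements, mixdown fields, comment field). BitWriter and write_pce_channel_lists are hypothetical helpers, not a library API.

```python
class BitWriter:
    """Minimal MSB-first bit accumulator (illustrative, not a library API)."""

    def __init__(self):
        self.bits = []

    def write(self, value: int, width: int) -> None:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)


def write_pce_channel_lists(w, front, side, back, lfe):
    """Write only the channel-element portion of a PCE (other PCE fields omitted)."""
    w.write(len(front), 4)  # num_front_channel_elements
    w.write(len(side), 4)   # num_side_channel_elements
    w.write(len(back), 4)   # num_back_channel_elements
    w.write(len(lfe), 2)    # num_lfe_channel_elements
    # In the real PCE, num_assoc_data_elements, num_valid_cc_elements and the
    # mixdown fields come next; this sketch skips them.
    for group in (front, side, back):
        for label in group:
            kind, tag = label[:3], int(label[3:])
            if kind not in ("SCE", "CPE"):
                raise ValueError(f"{label} cannot appear in a front/side/back list")
            w.write(1 if kind == "CPE" else 0, 1)  # *_element_is_cpe
            w.write(tag, 4)                        # *_element_tag_select
    for label in lfe:
        w.write(int(label[3:]), 4)                 # lfe_element_tag_select


# PCE0 (upper layer, FIG. 5(C)) and PCE2 (lower layer + LFE, FIG. 5(E)).
pce0 = BitWriter()
write_pce_channel_lists(pce0, ["SCE0", "CPE0"], ["SCE1", "CPE1"], ["SCE2", "CPE2"], [])
pce2 = BitWriter()
write_pce_channel_lists(pce2, ["SCE5", "CPE7"], [], [], ["LFE0", "LFE1"])
print(len(pce0.bits), len(pce2.bits))  # 44 32
```

Each front/side/back element costs 5 bits and each LFE element 4 bits, which is why the tag number, not the element order, is what links a PCE entry to a channel element in the raw_data_block.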

FIG. 7 shows, in the description language, the PCEs defined in the MPEG-compliant encoded stream generated by the audio signal encoding device 10 of this embodiment. FIG. 7(A) shows PCE0, which describes the 9 channels of the upper layer; FIG. 7(B) shows PCE1, which describes the 10 channels of the middle layer; and FIG. 7(C) shows PCE2, which describes the 3.2 channels of the lower layer + LFE. As these figures show, PCE0, PCE1, and PCE2 can all be defined in this embodiment without any problem.

Thus, in this embodiment, a single 22.2ch program is treated as consisting of three programs obtained by dividing the space in the height direction, and, as shown in FIG. 5(B), the PCE0 that appears first in the encoded stream is defined for the upper layer, the PCE1 that appears second for the middle layer, and the PCE2 that appears last for the lower layer + LFE. The channels contained in each resulting two-dimensional plane are encoded with the MPEG-2/4 AAC encoding method to form one program per plane, and the stream integration unit 15 integrates all the programs. In this way, an encoded stream that both conforms to the MPEG standard and supports three-dimensional space can be constructed.
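The default definition above binds PCEs to planes purely by their order of appearance. A minimal sketch, assuming the stream is modeled as a flat list of element labels (a simplification of the real raw_data_block) and using the hypothetical helper name bind_pces_by_order:

```python
# Default definition for height-direction division: the n-th PCE found in the
# stream is bound to the n-th plane role (first -> upper, second -> middle,
# last -> lower + LFE).
HEIGHT_ROLES = ("upper", "middle", "lower+LFE")


def bind_pces_by_order(stream_labels, roles=HEIGHT_ROLES):
    """Map each PCE tag to a plane role according to its order of appearance."""
    pce_tags = [int(label[3:]) for label in stream_labels if label.startswith("PCE")]
    if len(pce_tags) != len(roles):
        raise ValueError(f"expected {len(roles)} PCEs, found {len(pce_tags)}")
    return dict(zip(pce_tags, roles))


# Illustrative, truncated stream: three PCEs followed by the first audio elements.
stream = ["PCE0", "PCE1", "PCE2", "SCE0", "CPE0", "SCE1"]
print(bind_pces_by_order(stream))  # {0: 'upper', 1: 'middle', 2: 'lower+LFE'}
```

Because the binding depends only on order, no new syntax element is needed; the rule itself is the "definition" that the text proposes to add.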

According to this embodiment, therefore, an encoded stream compliant with the MPEG-2/4 AAC standard can be generated that enables reproduction of a three-dimensional sound field with clearly specified three-dimensional sound source positions, without adding any new channel-arrangement definitions for three-dimensional space. In addition, because the encoding elements of the three two-dimensional planes are integrated into a single encoded stream rather than multiplexed as independent encoded streams, no redundant information is present and the audio signal can be encoded with high transmission efficiency.

Furthermore, according to this embodiment, a single audio signal encoding device 10 can encode a 22.2ch audio signal without a multiplexer, so the circuit scale can be kept relatively small. This embodiment also makes it possible to extract and decode, from the encoded stream, only the stream information for a selected two-dimensional plane.

Note that the report on advanced BS broadcasting was issued by the Association of Radio Industries and Businesses (ARIB), and ARIB standard STD-B32 has long restricted and tightened the use of the MPEG standards. Adding a definition, as in this embodiment, that specifies the correspondence between the PCEs serving as plane position information and the encoding elements of the divided two-dimensional planes is therefore not particularly problematic. Modifying the MPEG international standard itself to accommodate a Japanese domestic broadcasting system would be far more problematic.

In the above embodiment, the three-dimensional space (width × depth × height) shown in FIG. 2 is divided in the height direction (the X-Y division described above) to obtain two-dimensional planes, but the division method is not limited to this. For example, as shown in FIGS. 8(A) to 8(C), the space may be divided in the depth direction (X-Z division) to obtain three two-dimensional planes (width × height), and the audio signals of the channels in each plane may be encoded plane by plane. In X-Z division, channels at the same depth (Y coordinate) are grouped into one plane. FIG. 8(A) shows the channel positions (speaker positions) in the front plane obtained by dividing the three-dimensional space in the depth direction, FIG. 8(B) those in the middle plane, and FIG. 8(C) those in the rear plane.
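The three division methods differ only in which coordinate is held constant within a plane: height (Z) for X-Y division, depth (Y) for X-Z division, and width (X) for Y-Z division. The Python sketch below groups speakers accordingly. The speaker names MFL, MFC, MFR, MBL, and MBR appear in this document, but every coordinate, and the names TFC and BFC, are hypothetical placeholders on a coarse grid, not the actual positions of FIG. 2.

```python
from collections import defaultdict

# Coordinate held constant inside one plane for each division type.
# Positions are (x, y, z) = (width, depth, height).
CONSTANT_AXIS = {"X-Y": 2, "X-Z": 1, "Y-Z": 0}


def divide_space(speakers, division):
    """Group speaker names into planes that share one coordinate value."""
    axis = CONSTANT_AXIS[division]
    planes = defaultdict(list)
    for name, position in speakers.items():
        planes[position[axis]].append(name)
    # Planes are returned in ascending coordinate order (e.g. bottom to top for X-Y).
    return [sorted(planes[key]) for key in sorted(planes)]


# Hypothetical coordinates: x left<right, y back<front, z low<high.
SPEAKERS = {
    "MFL": (-1, 1, 0), "MFC": (0, 1, 0), "MFR": (1, 1, 0),
    "MBL": (-1, -1, 0), "MBR": (1, -1, 0),
    "TFC": (0, 1, 1),   # hypothetical top-front-center speaker
    "BFC": (0, 1, -1),  # hypothetical bottom-front-center speaker
}

print(divide_space(SPEAKERS, "X-Y"))  # height layers
print(divide_space(SPEAKERS, "X-Z"))  # depth planes
```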

In this case, two kinds of default definitions are required. The first, as with height-direction division, is that one 22.2ch program consists of three programs divided in the depth direction; for example, the PCE0 that appears first in the encoded stream is defined for the front plane + LFE, the PCE1 that appears second for the middle plane, and the PCE2 that appears last for the rear plane.

The second definition is needed because, within a depth plane, the meaning of front, side, and back channels is no longer clear. For example, it may be defined that the lower layer is represented by the front channels, the middle layer by the side channels, and the upper layer by the back channels.
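Both default definitions for depth-direction division can be sketched together: PCE order selects the plane (front + LFE, middle, rear), and each plane's lower/middle/upper rows are relabeled as the PCE front/side/back groups. The helper names are hypothetical, and TBL/TBR are placeholder channel names introduced only for the example.

```python
DEPTH_ROLES = ("front+LFE", "middle", "rear")              # first, second, last PCE
ROW_TO_GROUP = {"lower": "front", "middle": "side", "upper": "back"}


def bind_depth_pces(pce_tags_in_stream_order):
    """First definition: bind PCE tags to depth planes by order of appearance."""
    if len(pce_tags_in_stream_order) != len(DEPTH_ROLES):
        raise ValueError(
            f"expected {len(DEPTH_ROLES)} PCEs, got {len(pce_tags_in_stream_order)}")
    return dict(zip(pce_tags_in_stream_order, DEPTH_ROLES))


def pce_groups_for_depth_plane(channels_by_row):
    """Second definition: relabel the rows of a width x height plane as PCE groups."""
    groups = {"front": [], "side": [], "back": []}
    for row, channels in channels_by_row.items():
        groups[ROW_TO_GROUP[row]].extend(channels)
    return groups


# Rear plane example; TBL/TBR are hypothetical upper-rear channel names.
rear_plane = {"lower": [], "middle": ["MBL", "MBR"], "upper": ["TBL", "TBR"]}
print(bind_depth_pces([0, 1, 2]))
print(pce_groups_for_depth_plane(rear_plane))
```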

With these definitions, an audio signal with 22.2ch sound source positions arranged in three-dimensional space can be encoded into an encoded stream using an MPEG-compliant encoding method (here, MPEG-2/4 AAC). The advantage of dividing in the depth direction is that, within each plane, the distance from the front is constant and the horizontal angle is the same, so the sources differ only in propagation time due to elevation angle; the correlation between the sound sources within each two-dimensional plane is therefore expected to be high.

Besides the X-Y and X-Z divisions described above, Y-Z division is also possible. In Y-Z division, channels with the same width position (X coordinate) are grouped into one plane, and channels at the same distance from the origin along the Z axis are classified as front, side, or rear. The 4-bit plane division order mentioned earlier indicates, with a predetermined 4-bit value, whether the planes are ordered from front to rear of the speaker arrangement or in the reverse order in the case of X-Z division, and from left to right or in the reverse order in the case of Y-Z division.

Next, a second example of the MPEG-compliant encoded stream generated by the audio signal encoding device according to the present invention will be described.

FIG. 9 shows the format of the second example of an MPEG-2/4 AAC encoded stream generated by the audio signal encoding device according to the present invention. This format is obtained by adding a 5.1ch-compatible PCE, shown as PCE3 in FIG. 9(B), to the three-dimensional-space encoded stream based on the height-direction division shown in FIG. 5. PCE3 is plane position information that allows only the information located at the same sound source positions as the audio of an existing surround system, which has fewer channels than the 22.2ch arranged in three-dimensional space, to be decoded. The stream integration unit 15 generates PCE3 from the plane position information and adds it to the encoded stream.

Among the middle-layer encoding elements shown in FIG. 9(D), the MFC element SCE3 stands in for the 5.1ch center channel, and the MFL/MFR element CPE4 stands in for the 5.1ch front left and right channels. The MBL/MBR element CPE6 stands in for the 5.1ch back left and right channels. Among the lower-layer + LFE encoding elements shown in FIG. 9(E), the element LFE0 stands in for the 5.1ch LFE. Although a 5.1ch surround system is used as the example, adding information for decoding only some of the 22.2ch channels arranged in three-dimensional space in this way allows existing surround systems such as 7.1ch and 9.1ch, not just 5.1ch, to be supported in the same manner.
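The effect of the 5.1ch-compatible PCE3 can be sketched as a filter over the FIG. 9 element order: the decoder keeps only the elements whose tags PCE3 references. The stream is again modeled as a list of element labels (a simplification), and PCE3_COMPAT_51 and elements_to_decode are hypothetical names; the element-to-speaker correspondence follows the paragraph above.

```python
# Element order of the FIG. 9 stream (upper, middle, lower + LFE layers).
FIG9_ELEMENTS = [
    "SCE0", "CPE0", "SCE1", "CPE1", "SCE2", "CPE2",   # upper layer
    "SCE3", "CPE3", "CPE4", "CPE5", "SCE4", "CPE6",   # middle layer
    "SCE5", "CPE7", "LFE0", "LFE1",                   # lower layer + LFE
]

# 5.1ch-compatible PCE3: existing elements that stand in for each 5.1ch group.
PCE3_COMPAT_51 = {
    "front": ["SCE3", "CPE4"],  # center (MFC) and front left/right (MFL, MFR)
    "side": [],
    "back": ["CPE6"],           # back left/right (MBL, MBR)
    "lfe": ["LFE0"],
}


def elements_to_decode(pce, stream_elements):
    """Return, in stream order, only the elements referenced by the given PCE."""
    wanted = {label for group in pce.values() for label in group}
    missing = wanted - set(stream_elements)
    if missing:
        raise ValueError(f"PCE references absent elements: {sorted(missing)}")
    return [label for label in stream_elements if label in wanted]


print(elements_to_decode(PCE3_COMPAT_51, FIG9_ELEMENTS))  # ['SCE3', 'CPE4', 'CPE6', 'LFE0']
```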

As a result, when an MPEG-compliant encoded stream in the second-example format is decoded for playback on a 5.1ch surround system, there is no need to decode the full 22.2ch signal and then downmix it to 5.1ch; instead, only the stream portion corresponding to 5.1ch (hatched in the figure) is decoded at decoding time to produce the 5.1ch signal.

In the MPEG-compliant encoded stream of the second-example format, the elements to be decoded span both the middle layer and the lower layer + LFE. To streamline decoding, the encoded stream may therefore be configured with 10.1 channels in the middle layer so that the LFE is also handled in the middle layer. Alternatively, the encoded stream may be configured so that only the elements related to 5.1ch output are defined as one program.

FIG. 10 shows an example of the settings in the 5.1ch-compatible PCE shown as PCE3 in FIG. 9(B). This 5.1ch-compatible PCE differs from a conventional 5.1ch PCE only in the tag numbers of the elements it references.

Next, a third example of the MPEG-compliant encoded stream generated by the audio signal encoding device according to the present invention will be described.

FIG. 11 shows the format of the third example of an MPEG-compliant encoded stream generated by the audio signal encoding device according to the present invention. This is the format of an encoded stream constructed by the audio signal encoding device by encoding, with the AAC+SBR encoding method, a 22.2ch audio signal intended for playback on the 22.2ch speaker arrangement shown in FIG. 2.

SBR (Spectral Band Replication) is a technology added to MPEG in 2003 as an optional tool for AAC. At low bit rates, where ordinary AAC encoding has difficulty achieving high sound quality, the AAC+SBR method encodes the signal with AAC at half the sampling frequency. For the high-frequency components discarded as a result, it generates side information from the difference between an estimate of the high-frequency components, predicted and restored from the low-frequency components, and the actual high-frequency components of the original signal, and multiplexes that information into the FIL (fill element) of the AAC encoded stream.
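The principle described above (estimate the high band from the low band, transmit only a correction) can be illustrated with a deliberately simplified toy on a list of spectral magnitudes. This is not the MPEG SBR algorithm: real SBR works on a QMF filter bank with time/frequency envelopes, noise floors, and added sinusoids. Here the "estimate" is a plain copy of the low band, and the side information is one energy-matching gain per band; all names are hypothetical.

```python
import math


def sbr_side_info(spectrum, num_bands):
    """Toy encoder side: one gain per band for the upper half of `spectrum`."""
    half = len(spectrum) // 2
    if half % num_bands:
        raise ValueError("half the spectrum length must be divisible by num_bands")
    low, high = spectrum[:half], spectrum[half:2 * half]
    estimate = low                      # toy "patch": copy the low band upward
    width = half // num_bands
    gains = []
    for b in range(num_bands):
        band = slice(b * width, (b + 1) * width)
        orig_energy = sum(x * x for x in high[band])
        est_energy = sum(x * x for x in estimate[band])
        # Gain that makes the estimated band energy match the original band energy.
        gains.append(math.sqrt(orig_energy / est_energy) if est_energy > 0 else 0.0)
    return gains


def sbr_reconstruct(low, gains):
    """Toy decoder side: rebuild the high band from the low band and the gains."""
    width = len(low) // len(gains)
    high = [x * gains[i // width] for i, x in enumerate(low)]
    return list(low) + high


spectrum = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 0.5, 0.5]
gains = sbr_side_info(spectrum, num_bands=2)
print(gains)                                   # [2.0, 0.5]
print(sbr_reconstruct(spectrum[:4], gains))    # [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 0.5, 0.5]
```

Only the low band (coded by AAC at the reduced rate) and the small list of gains need to be transmitted, which is the source of SBR's bit-rate savings.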

In the format shown in FIG. 11, as in the first-example format of FIG. 5, one 22.2ch program is treated as three programs divided in the height direction, and, as shown in FIG. 11(B), the PCE0 that appears first in the stream is defined for the upper layer, the PCE1 that appears second for the middle layer, and the PCE2 that appears last for the lower layer + LFE. As shown in FIGS. 11(C) to 11(E), the "raw_data_block" of this format contains the main-audio elements (SCEs and CPEs) of the upper layer, the middle layer, and the lower layer + LFE, and each SCE or CPE is immediately followed by a FIL SBR element containing its SBR information.
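The element ordering of FIGS. 11(C) to 11(E) can be sketched as follows: every SCE or CPE is followed immediately by its FIL SBR element, while LFE elements are not. The label "FIL_SBR(...)" is a hypothetical notation used only to show ordering; it is not real bitstream syntax.

```python
def interleave_sbr(elements):
    """Place a FIL (SBR) element directly after each SCE/CPE, as in FIG. 11."""
    out = []
    for label in elements:
        out.append(label)
        if label.startswith(("SCE", "CPE")):
            out.append(f"FIL_SBR({label})")   # SBR side info for the preceding element
    return out


UPPER = ["SCE0", "CPE0", "SCE1", "CPE1", "SCE2", "CPE2"]
LOWER_LFE = ["SCE5", "CPE7", "LFE0", "LFE1"]
print(interleave_sbr(LOWER_LFE))
# ['SCE5', 'FIL_SBR(SCE5)', 'CPE7', 'FIL_SBR(CPE7)', 'LFE0', 'LFE1']
```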

The AAC+SBR encoding described above is performed by the plane encoding units 12 to 14 shown in FIG. 1. In this way, the MPEG-compliant AAC+SBR encoded stream for three-dimensional space shown in FIG. 11 can be constructed.

Next, a fourth example of the MPEG-compliant encoded stream generated by the audio signal encoding device according to the present invention will be described.

FIG. 12 shows the format of the fourth example of an MPEG-2/4 AAC encoded stream generated by the audio signal encoding device according to the present invention. This is the format of an encoded stream constructed by the audio signal encoding device so that a 22.2ch audio signal intended for playback on the 22.2ch speaker arrangement shown in FIG. 2 can also be reproduced compatibly as an existing 5.1ch surround audio signal.

Like the format of FIG. 9, the format of FIG. 12 adds a 5.1ch-compatible PCE3, as shown in FIG. 12(B), to the three-dimensional-space encoded stream based on the height-direction division shown in FIG. 5. It differs from the format of FIG. 9, however, in that, as shown in FIG. 12(D), the middle-layer encoding elements include not only the MFC element SCE3 (5.1ch center), the MFL/MFR element CPE4 (5.1ch front left and right), and the MBL/MBR element CPE6 (5.1ch back left and right), but also the 5.1ch LFE element LFE0.

With the format of FIG. 9, after CPE6 is decoded, SCE5 and CPE7 must be read and discarded before LFE0 can be decoded; with the format of FIG. 12, LFE0 can be decoded immediately after CPE6.
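The saving can be quantified by counting the elements a 5.1ch decoder must read and discard before it reaches the last element it needs. The sketch assumes that in FIG. 12 LFE0 simply moves to the end of the middle layer and the lower layer keeps SCE5, CPE7, and LFE1; that ordering is an assumption, since the exact contents of FIG. 12(E) are not given in the text.

```python
def elements_discarded(stream_elements, wanted):
    """Count elements read and discarded before the last wanted element is reached."""
    last = max(stream_elements.index(label) for label in wanted)
    return sum(1 for label in stream_elements[:last] if label not in wanted)


WANTED_51 = {"SCE3", "CPE4", "CPE6", "LFE0"}
UPPER = ["SCE0", "CPE0", "SCE1", "CPE1", "SCE2", "CPE2"]
MIDDLE = ["SCE3", "CPE3", "CPE4", "CPE5", "SCE4", "CPE6"]

FIG9 = UPPER + MIDDLE + ["SCE5", "CPE7", "LFE0", "LFE1"]
FIG12 = UPPER + MIDDLE + ["LFE0"] + ["SCE5", "CPE7", "LFE1"]  # assumed lower-layer order

print(elements_discarded(FIG9, WANTED_51), elements_discarded(FIG12, WANTED_51))  # 11 9
```

The difference of two elements corresponds exactly to SCE5 and CPE7, which no longer sit between CPE6 and LFE0.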

Next, a fifth example of the MPEG-compliant encoded stream generated by the audio signal encoding device according to the present invention will be described.

FIG. 13 shows the format of the fifth example of an MPEG-2/4 AAC encoded stream generated by the audio signal encoding device according to the present invention. Like the formats of FIGS. 9 and 12, this is the format of an encoded stream constructed so that a 22.2ch audio signal intended for playback on the 22.2ch speaker arrangement shown in FIG. 2 can also be reproduced compatibly as an existing 5.1ch surround audio signal; in this format, however, only the elements related to 5.1ch are defined as one program.

That is, as shown in FIG. 13(B), in this example format the element PCE0 that appears first in the encoded stream is defined for 5.1ch, the element PCE1 that appears second for the upper layer of the height-direction division, the element PCE2 that appears third for the middle layer, and the element PCE3 that appears fourth for the lower layer + LFE. Like PCE3 shown in FIG. 9(B), this PCE0 is plane position information that allows only the information located at the same sound source positions as existing surround audio, such as 5.1ch, 7.1ch, or 9.1ch, to be decoded. The stream integration unit 15 generates these PCEs from the plane position information and integrates them into the encoded stream.

Furthermore, as shown in FIG. 13(B), this format is characterized by placing the 5.1ch information "5.1ch compatible_layer" after the element PCE3, followed in order by the upper-layer information "upper_layer", the middle-layer information "middle_layer", and the lower-layer + LFE information "lower+LFE_layer". The 5.1ch information "5.1ch compatible_layer" is shown in FIG. 13(C). The stream integration unit adds "5.1ch compatible_layer" to the encoded stream on the basis of the individual encoding elements.

When 5.1ch playback is selected while decoding an encoded stream of this format, only the 5.1ch information "5.1ch compatible_layer" is decoded after PCE0 shown in FIG. 13(B), and the subsequent two-dimensional-plane information ("upper_layer", "middle_layer", and "lower+LFE_layer") is skipped. This makes playback processing faster than decoding an encoded stream in the format of FIG. 9 or FIG. 12.

FIG. 14 shows the format of the sixth example of an MPEG-2/4 AAC encoded stream generated by the audio signal encoding device according to the present invention. In this format, a 22.2ch audio signal intended for playback on the 22.2ch speaker arrangement shown in FIG. 2 and a signal obtained by downmixing that 22.2ch audio signal to 5.1ch are input simultaneously to the audio signal encoding device according to the present invention, and both are encoded into a single encoded stream.

The audio signal encoding device that generates the encoded stream of the sixth-example format has one plane encoding unit in addition to the plane encoding units 12 to 14 shown in FIG. 1. The 22.2ch audio signal is divided into the three two-dimensional planes shown in FIGS. 3(A) to 3(C) and encoded by the plane encoding units 12 to 14, just as when generating the first example of the MPEG-2/4 AAC encoded stream. The 5.1ch downmix signal, on the other hand, is encoded as a new, separate program by the added plane encoding unit, which generates its encoding elements. The stream integration unit 15 of FIG. 1 then integrates all the encoding elements and all the plane position information for both the 22.2ch audio signal and the 5.1ch downmix signal to construct an MPEG-compliant encoded stream.

As shown in FIG. 14(B), in this example format the element PCE0 that appears first in the encoded stream is defined for the 5.1ch downmix signal, the element PCE1 that appears second for the upper layer of the height-direction division, the element PCE2 that appears third for the middle layer, and the element PCE3 that appears fourth for the lower layer + LFE.

Furthermore, as shown in FIG. 14(B), this format is characterized by placing the 5.1ch downmix signal information "5.1ch downmix_layer" after the element PCE3, followed in order by the upper-layer information "upper_layer", the middle-layer information "middle_layer", and the lower-layer + LFE information "lower+LFE_layer". The 5.1ch downmix signal information "5.1ch downmix_layer" is shown in FIG. 14(C).

When 5.1ch playback is selected while decoding an encoded stream of this format, only the 5.1ch downmix signal information "5.1ch downmix_layer" is decoded after PCE0 shown in FIG. 14(B), and the subsequent two-dimensional-plane information ("upper_layer", "middle_layer", and "lower+LFE_layer") is skipped, making playback faster than decoding a stream in the format of FIG. 9 or FIG. 12. Moreover, because the 5.1ch signal is produced in advance, separately from the 22.2ch signal, decoding a stream of this format yields better sound quality and musical expression than downmixing by digitally summing the channels with downmix coefficients.

Table 4 adds information on the 5.1ch downmix signal to Table 1. It shows the three-dimensional channel arrangement information of the 5.1ch downmix signal and the correspondence between each channel and the corresponding elements (speaker positions) shown in FIGS. 3(A) to 3(C) and FIGS. 4(A) to 4(C). Here, the 5.1ch downmix signal is assumed to be located in the middle layer. Downmix identification information is also added to distinguish the 22.2ch audio signal from the 5.1ch downmix signal.

Table 5 adds information on the 5.1ch downmix signal to Table 2. Plane number "1" denotes the middle-layer two-dimensional plane, and its total channel count is "6". The PCE tag number (element_instance_tag) is set to "0". Because the number of PCEs varies with the total number of planes into which the three-dimensional space is divided, and the tag numbers attached to the PCEs change accordingly, the tag number "0" may always be used for the 5.1ch downmix signal.

Table 6 adds channel mapping information for the 5.1ch downmix signal to Table 3. Downmix identification information is added and set to "1" only for ch24 to ch29, which carry the 5.1ch downmix signal. Channels ch24 to ch29 are assigned plane number "1", denoting the middle-layer two-dimensional plane, and, as in the existing 5.1ch surround arrangement, consist of a channel assigned to "front single identification", which denotes a single front channel (ch24); channels assigned to "front pair identification", which denotes a front channel pair (ch25 and ch26); channels assigned to "back pair identification", which denotes a back channel pair (ch27 and ch28); and a channel assigned to "LFE single identification", which denotes a single LFE channel (ch29).

In the plane encoding unit newly added to the configuration of FIG. 1, the channel assigned to "front single identification" is encoded as an SCE, the channels assigned to "front pair identification" and "back pair identification" are each encoded as a CPE, and the channel assigned to "LFE single identification" is encoded as an LFE.
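The rule above (single identification to SCE or LFE, pair identification to CPE) can be sketched as a grouping pass over the Table 6 channel mapping. Tag assignment is an assumption: by default tags start at 0, and the optional first_tags argument shows one way to avoid reusing tag numbers already taken by the 22.2ch elements of the same type in the shared stream (SCE0 to SCE5, CPE0 to CPE7, LFE0 to LFE1). build_elements is a hypothetical helper.

```python
IDENT_TO_ELEMENT = {
    "front single": "SCE",
    "front pair": "CPE",
    "back pair": "CPE",
    "LFE single": "LFE",
}

# Channel mapping of the 5.1ch downmix signal (Table 6): (channel number, identification).
DOWNMIX_51 = [(24, "front single"), (25, "front pair"), (26, "front pair"),
              (27, "back pair"), (28, "back pair"), (29, "LFE single")]


def build_elements(channels, first_tags=None):
    """Group channels into SCE/CPE/LFE elements; pairs consume two consecutive channels."""
    next_tag = {"SCE": 0, "CPE": 0, "LFE": 0}
    if first_tags:
        next_tag.update(first_tags)     # assumed way to keep tags unique per element type
    elements, i = [], 0
    while i < len(channels):
        ch, ident = channels[i]
        kind = IDENT_TO_ELEMENT[ident]
        if kind == "CPE":
            if i + 1 >= len(channels) or channels[i + 1][1] != ident:
                raise ValueError(f"channel {ch}: pair identification without a partner")
            members, i = (ch, channels[i + 1][0]), i + 2
        else:
            members, i = (ch,), i + 1
        elements.append((f"{kind}{next_tag[kind]}", members))
        next_tag[kind] += 1
    return elements


print(build_elements(DOWNMIX_51))
print(build_elements(DOWNMIX_51, first_tags={"SCE": 6, "CPE": 8, "LFE": 2}))
```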

FIG. 15 shows the format of the seventh example of an MPEG-2/4 AAC encoded stream generated by the audio signal encoding device according to the present invention. This format carries, together with the 22.2ch audio signal intended for playback on the 22.2ch speaker arrangement shown in FIG. 2, the conversion coefficients for converting that signal into a 5.1ch downmix signal.

That is, as shown in FIG. 15(B), in this example format the element PCE0 that appears first in the encoded stream is defined for the upper layer, the element PCE1 that appears second for the middle layer, and the element PCE2 that appears third for the lower layer + LFE, followed by DSE0, a DSE (data stream element). DSE0 describes the conversion coefficients for each channel needed to convert the signal into a 5.1ch downmix signal.
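On the decoder side, per-channel conversion coefficients of the kind carried in DSE0 would be applied as a weighted sum of decoded channels. The sketch below models the coefficients as a nested mapping from each output channel to input-channel gains; that representation, the channel names, and the gain values in the example are all hypothetical, since the actual coefficient layout in DSE0 is not specified.

```python
def downmix(channels, coefficients):
    """channels: {name: [samples]}; coefficients: {out_name: {in_name: gain}}."""
    lengths = {len(samples) for samples in channels.values()}
    if len(lengths) != 1:
        raise ValueError("all input channels must have the same length")
    n = lengths.pop()
    out = {}
    for out_name, gains in coefficients.items():
        acc = [0.0] * n
        for in_name, gain in gains.items():
            # Weighted digital summation of one decoded input channel.
            for i, sample in enumerate(channels[in_name]):
                acc[i] += gain * sample
        out[out_name] = acc
    return out


# Hypothetical decoded samples and gains ("SiL" is a placeholder side-left channel).
decoded = {"MFL": [1.0, 2.0], "SiL": [0.5, 0.5]}
coeffs = {"L": {"MFL": 1.0, "SiL": 0.5}}
print(downmix(decoded, coeffs))  # {'L': [1.25, 2.25]}
```

This is the "digital summation with downmix coefficients" that the sixth example avoids by transmitting a separately produced 5.1ch mix.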

FIG. 16 shows the structure of DSE0 in the description language, as in FIG. 6. In FIG. 16, each field name is followed by its bit count and data type (uimsbf: unsigned integer, most significant bit first). The leading field "element_instance_tag" is a tag number that distinguishes DSEs when more than one is present. "data_byte_align_flag" is a flag indicating whether byte alignment is performed within the DSE. "count" holds the data length, and "esc_count" extends it so that data lengths of 255 bytes or more can be expressed (when count is 255, the length is count + esc_count). "data_stream_byte" holds the actual data, repeated for the data length. Because the DSE imposes no format constraints on its actual data, DSE0 can be written freely; by describing the downmix conversion coefficients in this part, the downmix can be performed with those coefficients at the end of decoding.
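A DSE carrying downmix coefficients could be serialized roughly as follows. The field order and widths follow the data_stream_element syntax of ISO/IEC 14496-3 (3-bit element ID with ID_DSE = 4, 4-bit tag, 1-bit align flag, 8-bit count, optional 8-bit esc_count, optional byte alignment, payload bytes). Two points are assumptions: alignment here is measured from the start of this writer, which stands in for the start of the raw_data_block, and the one-byte-per-coefficient payload produced by pack_coefficients is a purely hypothetical layout, since the text leaves the payload format free.

```python
ID_DSE = 4  # id_syn_ele value identifying a data stream element


class BitWriter:
    """Minimal MSB-first bit accumulator (illustrative, not a library API)."""

    def __init__(self):
        self.bits = []

    def write(self, value: int, width: int) -> None:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def align(self) -> None:
        while len(self.bits) % 8:
            self.bits.append(0)


def write_dse(w, tag, payload, byte_align=True):
    if len(payload) > 255 + 255:
        raise ValueError("a single DSE carries at most 510 bytes")
    w.write(ID_DSE, 3)
    w.write(tag, 4)                        # element_instance_tag
    w.write(1 if byte_align else 0, 1)     # data_byte_align_flag
    count = min(len(payload), 255)
    w.write(count, 8)                      # count
    if count == 255:
        w.write(len(payload) - 255, 8)     # esc_count
    if byte_align:
        w.align()
    for byte in payload:
        w.write(byte, 8)                   # data_stream_byte


def pack_coefficients(gains):
    """Hypothetical payload layout: one byte per gain in [0, 1], value = round(g * 255)."""
    if any(not 0.0 <= g <= 1.0 for g in gains):
        raise ValueError("gains must lie in [0, 1] for this layout")
    return bytes(round(g * 255) for g in gains)


dse0 = BitWriter()
write_dse(dse0, 0, pack_coefficients([1.0, 0.7071, 0.5]))
print(len(dse0.bits))  # 16 header bits + 3 payload bytes = 40
```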

Next, the audio signal decoding apparatus according to the present invention will be described.

FIG. 17 is a block diagram of a first embodiment of the audio signal decoding apparatus according to the present invention. As shown in the figure, the audio signal decoding apparatus 20 of this embodiment comprises a stream separation unit 21, three plane decoding units 22, 23 and 24, and a three-dimensional space synthesis unit 25. It receives as input an MPEG-compliant encoded stream in the format shown in FIG. 5, 9, 11, 12 or 13, decodes it, and outputs three-dimensional channel arrangement information and an Nch audio signal.

The stream separation unit 21 separates the encoding element and the plane position information of each of the three two-dimensional planes from the input MPEG-compliant encoded stream. It supplies the three items of plane position information (PCEs), excluding any used for 5.1ch compatibility, to the three-dimensional space synthesis unit 25, and supplies the three encoding elements separately to the plane decoding units 22, 23 and 24, one of which is provided for each two-dimensional plane.

The plane decoding units 22, 23 and 24 receive from the stream separation unit 21 the same encoding elements that were output by the plane encoding units 12, 13 and 14 shown in FIG. 1. Each unit decodes its encoding element and outputs the audio signals of the channels corresponding to the speaker positions on the two-dimensional plane that the element represents.

For example, when an MPEG-2/4 AAC encoded stream in the format shown in FIG. 5 is input, the plane decoding unit 22 decodes the encoding element of the upper-layer information "upper_layer" shown in FIG. 5(C) and outputs the 9ch upper-layer audio signals shown in FIG. 3(A) to the corresponding channels. The plane decoding unit 23 decodes the encoding element of the middle-layer information "middle_layer" shown in FIG. 5(D) and outputs the 10ch middle-layer audio signals shown in FIG. 3(B) to the corresponding channels. Further, the plane decoding unit 24 decodes the encoding element of the lower-layer and LFE information "lower+LFE_layer" shown in FIG. 5(E) and outputs the 3.2ch lower-layer and LFE audio signals shown in FIG. 3(C) to the corresponding channels.

Meanwhile, the three-dimensional space synthesis unit 25 identifies, from the plane position information (PCE) supplied by the stream separation unit 21, the type and number of plane divisions and the front, side and rear channel arrangement within each two-dimensional plane. It then associates these channel arrangements with the channels in each plane and outputs three-dimensional channel arrangement information (x, y, z), which expresses the position of each channel relative to the origin (0, 0, 0), the listening point.

In this way, by outputting the 22.2ch (= Nch) audio signals decoded by the plane decoding units 22, 23 and 24 together with the three-dimensional channel arrangement information output by the three-dimensional space synthesis unit 25, the position of each channel in three-dimensional space is made explicit, and a three-dimensional sound field can be reproduced by the speakers (not shown) of the corresponding channels. Thus, according to this embodiment, an encoded stream compliant with the MPEG-2/4 AAC standard can be decoded to reproduce a three-dimensional sound field in which the sound source position of each of the 22.2ch channels is clearly specified.
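How the three plane outputs add up to 22.2ch can be sketched in Python. The channel names are those used in equations (3a) to (5f) of this description; the dictionary-based interface and the function name `merge_planes` are hypothetical, chosen only to show that 9 upper, 10 middle and 3 lower channels plus 2 LFE channels form one Nch set of 22 full-band channels and 2 LFE channels.

```python
# Channel names per two-dimensional plane, as used in equations (3a)-(5f).
UPPER = ["UFL", "UFC", "UFR", "USL", "USC", "USR", "UBL", "UBC", "UBR"]
MIDDLE = ["MFL", "MFLC", "MFC", "MFRC", "MFR", "MSL", "MSR", "MBL", "MBC", "MBR"]
LOWER = ["LFL", "LFC", "LFR"]
LFE = ["LFEL", "LFER"]


def merge_planes(upper: dict, middle: dict, lower_lfe: dict) -> dict:
    """Combine the outputs of the three plane decoders into one Nch mapping.

    Each argument maps channel name -> decoded samples for one plane.
    """
    merged = {}
    for plane, expected in ((upper, UPPER), (middle, MIDDLE), (lower_lfe, LOWER + LFE)):
        missing = [name for name in expected if name not in plane]
        if missing:
            raise ValueError(f"plane is missing channels: {missing}")
        merged.update({name: plane[name] for name in expected})
    return merged


def silent(names):
    """Placeholder decoder output: one zero sample per channel."""
    return {name: [0.0] for name in names}


if __name__ == "__main__":
    nch = merge_planes(silent(UPPER), silent(MIDDLE), silent(LOWER + LFE))
    full_band = [name for name in nch if name not in LFE]
    print(f"{len(full_band)}.{len(nch) - len(full_band)}ch")  # 22.2ch
```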

An audio signal decoding apparatus of the present invention that decodes an MPEG-compliant encoded stream in the format shown in FIG. 14 is obtained from the audio signal decoding apparatus 20 of FIG. 17 by adding one more plane decoding unit in parallel with the plane decoding units 22 to 24, and by having the stream separation unit 21 separate not only the encoding elements and plane position information of the three two-dimensional planes but also the encoding element and plane position information for the 5.1ch downmix signal.

As a result, the three encoding elements corresponding to the 22.2ch audio signal are decoded by the plane decoding units 22, 23 and 24 provided for each two-dimensional plane, while the encoding element for the 5.1ch downmix signal is decoded by the newly added plane decoding unit and output as an audio signal. The three-dimensional space synthesis unit 25 outputs three-dimensional channel arrangement information (x, y, z) from the three items of plane position information for the 22.2ch audio signal and the plane position information for the 5.1ch downmix signal.

FIG. 18 is a block diagram of a second embodiment of the audio signal decoding apparatus according to the present invention. Components identical to those in FIG. 17 are given the same reference numerals, and their description is omitted. As shown in the figure, the audio signal decoding apparatus 30 of this embodiment comprises a stream separation unit 31, three plane decoding units 22, 23 and 24, a three-dimensional space synthesis unit 25, and a downmix unit 32. The apparatus 30 receives as input an MPEG-compliant encoded stream in the format shown in FIG. 15, decodes it to generate three-dimensional channel arrangement information and an Nch audio signal, and then, according to an externally supplied downmix selection flag, outputs either the Nch audio signal or a downmixed audio signal.

The audio signal decoding apparatus 30 differs from the audio signal decoding apparatus 20 of FIG. 17 in two respects: its stream separation unit 31 separates conversion coefficient information together with the three items of plane position information and the encoding elements, and it has a downmix unit 32 that receives a downmix selection flag from outside. These differences are described in detail below.

For example, when an MPEG-2/4 AAC encoded stream in the format shown in FIG. 15 is input, the stream separation unit 31 of the audio signal decoding apparatus 30 separates DSE0 shown in FIG. 15(B), extracts the conversion coefficient information for each channel needed for conversion into the 5.1ch downmix signal, and passes it to the downmix unit 32.

The downmix unit 32 receives as inputs the three-dimensional channel arrangement information from the three-dimensional space synthesis unit 25, the Nch decoded audio signals from the plane decoding units 22, 23 and 24 combined, the conversion coefficient information, and the downmix selection flag, and outputs either the Nch audio signal or a downmixed audio signal according to the flag. Because the three-dimensional channel arrangement information and the conversion coefficient information are paired, the conversion coefficient by which each input channel is multiplied is identified by its channel number. Table 7 shows examples of the downmix selection flag values.

In Table 7, downmix selection flag number "0" is a mode in which no downmix is performed and the Nch audio signal is output as is. Flag number "1" is a mode in which a downmix is performed and the result is output as 5.1ch on one two-dimensional plane. Flag number "2" performs a downmix and outputs 2ch on one two-dimensional plane. Flag number "3" generates, by downmixing, a 2ch binaural signal on one two-dimensional plane. Finally, flag number "4" outputs 5.1ch on each of the two-dimensional planes, that is, as many 5.1ch sets as there are planes.

FIG. 19 is a block diagram of an example of the downmix unit 32 in FIG. 18. As shown in FIG. 19, the downmix unit 32 comprises a mode 1 block 321, a mode 2 block 322, a mode 3 block 323, a mode 4 block 324, and an output selector 325. The mode numbers correspond to the downmix selection flag numbers in Table 7. According to the input downmix selection flag, the mode block with the corresponding number among the mode 1 block 321 to the mode 4 block 324 is enabled and performs the downmix using the input audio signal, three-dimensional channel arrangement information and conversion coefficient information, and the resulting audio signal is output from the output selector 325. Each mode is described in detail below.

First, the configuration and operation when downmix selection flag number "0" is input to the downmix unit 32 are described. In this case, the downmix unit 32 performs no downmix; the output selector 325 selects the input Nch audio signal and outputs it unchanged. The mode 1 block 321 to the mode 4 block 324 are not used.
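The flag-to-mode mapping of Table 7 and the role of the output selector 325 can be sketched in Python as follows. The enum member names, the `mode_blocks` dictionary of callables standing in for blocks 321 to 324, and `select_output` are all hypothetical; the sketch only mirrors the dispatch logic described above, including the pass-through for flag "0".

```python
from enum import IntEnum


class DownmixFlag(IntEnum):
    """Downmix selection flag values as described for Table 7 (names hypothetical)."""
    NONE = 0           # no downmix: output the Nch signal unchanged
    PLANE_5_1 = 1      # 5.1ch on one two-dimensional plane (mode 1 block 321)
    PLANE_2CH = 2      # 2ch on one two-dimensional plane (mode 2 block 322)
    BINAURAL_2CH = 3   # 2ch binaural signal (mode 3 block 323)
    PER_PLANE_5_1 = 4  # 5.1ch on each of the planes (mode 4 block 324)


def select_output(flag, nch_signal, mode_blocks):
    """Mimic the output selector 325.

    flag:        raw flag value; unknown values raise ValueError.
    nch_signal:  decoded Nch audio (any type the mode blocks accept).
    mode_blocks: mapping DownmixFlag -> callable implementing that mode.
    """
    flag = DownmixFlag(flag)
    if flag is DownmixFlag.NONE:
        return nch_signal          # mode blocks are not used
    return mode_blocks[flag](nch_signal)
```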

Next, the configuration and operation when downmix selection flag number "1" is input to the downmix unit 32 are described. In this case, the downmix unit 32 enables the mode 1 block 321 and sets the output selector 325 to select the audio signal output from the mode 1 block 321.

FIG. 20 is an overall block diagram of an example of the mode 1 block 321. As shown in the figure, the mode 1 block 321 comprises an upper-layer channel downmix unit 101, a middle-layer channel downmix unit 102, a lower-layer channel downmix unit 103, and a 5.1ch synthesis unit 104. Based on the input three-dimensional channel arrangement information, each of the downmix units 101 to 103 selects the channels it needs from the input Nch audio signal and performs the downmix.

The upper-layer channel downmix unit 101 converts the audio signals of the channels in the upper layer of the Nch audio signal into a 5.1ch downmix signal using the conversion coefficient information and outputs it to the 5.1ch synthesis unit 104. Likewise, the middle-layer channel downmix unit 102 and the lower-layer channel downmix unit 103 use the conversion coefficient information to convert the audio signals of the channels in the middle and lower layers, respectively, into 5.1ch downmix signals and output them to the 5.1ch synthesis unit 104.

The 5.1ch synthesis unit 104 adds the 5.1ch downmix signals from the downmix units 101 to 103 channel by channel, generating and outputting the final 5.1ch audio signal on one two-dimensional plane.

The downmix method is now described. For example, the MPEG-2 AAC standard mentioned above performs a downmix on one two-dimensional plane using the following equations (1a) and (1b), converting the signal into left and right 2ch (L′, R′).

In equations (1a) and (1b), the right-hand side refers to the original audio: L is the left front channel signal, R the right front channel signal, C the front center channel signal, Ls the left rear channel signal, Rs the right rear channel signal, and A the downmix coefficient. L′ is the left-channel downmix signal and R′ the right-channel downmix signal.

Here, the coefficient A is variable and takes one of the values 1/√2, 1/2, 1/(2√2) or 0. For example, with A = 1/√2, equations (1a) and (1b) can be written as the following equations (2a) and (2b).

$$L' = C_1 \times \left[ L + C_2 \times (C + L_s) \right] \tag{2a}$$
$$R' = C_1 \times \left[ R + C_2 \times (C + R_s) \right] \tag{2b}$$

where C1 and C2 in equations (2a) and (2b) are coefficients.

The downmix method of the MPEG-2 AAC standard multiplies the channels beside each output channel (L, R), namely C and Ls for L and C and Rs for R, by coefficients and adds them. Applying this to the middle layer, let the middle-layer downmix L, R, C, BL, BR and LFE be M_L, M_R, M_C, M_BL, M_BR and M_LFE. Using the coefficients C1 and C2 and the ten middle-layer channels (MFL, MFLC, MFC, MFRC, MFR, MSL, MSR, MBL, MBC, MBR), these are given by the following equations. Here, for example, C1 = 2/3 and C2 = 1/√2.

$$M_L = C_1 \times \left[ \mathrm{MFL} + C_2 \times (\mathrm{MFLC} + \mathrm{MSL}) \right] \tag{3a}$$
$$M_R = C_1 \times \left[ \mathrm{MFR} + C_2 \times (\mathrm{MFRC} + \mathrm{MSR}) \right] \tag{3b}$$
$$M_C = C_1 \times \left[ \mathrm{MFC} + C_2 \times (\mathrm{MFLC} + \mathrm{MFRC}) \right] \tag{3c}$$
$$M_{BL} = C_1 \times \left[ \mathrm{MBL} + C_2 \times (\mathrm{MSL} + \mathrm{MBC}) \right] \tag{3d}$$
$$M_{BR} = C_1 \times \left[ \mathrm{MBR} + C_2 \times (\mathrm{MSR} + \mathrm{MBC}) \right] \tag{3e}$$
$$M_{LFE} = 0 \tag{3f}$$

Applying the same approach to the upper layer, let the upper-layer downmix L, R, C, BL, BR and LFE be U_L, U_R, U_C, U_BL, U_BR and U_LFE. Using the coefficients C1, C2 and C3 and the nine upper-layer channels (UFL, UFC, UFR, USL, USC, USR, UBL, UBC, UBR), these are given by the following equations. Here, for example, C1 = 2/3 and C2 = C3 = 1/√2.

$$U_L = C_1 \times \left[ C_3 \times (\mathrm{UFL} + C_2 \times \mathrm{USL}) \right] \tag{4a}$$
$$U_R = C_1 \times \left[ C_3 \times (\mathrm{UFR} + C_2 \times \mathrm{USR}) \right] \tag{4b}$$
$$U_C = C_1 \times \left[ C_3 \times (\mathrm{UFC} + C_2 \times \mathrm{USC}) \right] \tag{4c}$$
$$U_{BL} = C_1 \times \left[ C_3 \times \left\{ \mathrm{UBL} + C_2 \times (\mathrm{USL} + \mathrm{UBC}) + \mathrm{USC} \right\} \right] \tag{4d}$$
$$U_{BR} = C_1 \times \left[ C_3 \times \left\{ \mathrm{UBR} + C_2 \times (\mathrm{USR} + \mathrm{UBC}) + \mathrm{USC} \right\} \right] \tag{4e}$$
$$U_{LFE} = 0 \tag{4f}$$

For the lower layer, only the LFE channels actually need to be combined; the three lower-layer channels map directly onto L, R and C. Let the lower-layer downmix L, R, C, BL, BR and LFE be L_L, L_R, L_C, L_BL, L_BR and L_LFE. Using the coefficients C1 and C2, the three lower-layer channels (LFL, LFC, LFR) and the two LFE channels (LFEL, LFER), these are given by the following equations. Here, for example, C1 = 2/3 and C2 = 1/√2.

$$L_L = C_1 \times \mathrm{LFL} \tag{5a}$$
$$L_R = C_1 \times \mathrm{LFR} \tag{5b}$$
$$L_C = C_1 \times \mathrm{LFC} \tag{5c}$$
$$L_{BL} = 0 \tag{5d}$$
$$L_{BR} = 0 \tag{5e}$$
$$L_{LFE} = C_2 \times (\mathrm{LFEL} + \mathrm{LFER}) \tag{5f}$$

From the above, the final 5.1ch downmix signals L′, R′, C′, BL′, BR′ and LFE′ are given by the following equations.

$$L' = U_L + M_L + L_L \tag{6a}$$
$$R' = U_R + M_R + L_R \tag{6b}$$
$$C' = U_C + M_C + L_C \tag{6c}$$
$$\mathrm{BL}' = U_{BL} + M_{BL} + L_{BL} \tag{6d}$$
$$\mathrm{BR}' = U_{BR} + M_{BR} + L_{BR} \tag{6e}$$
$$\mathrm{LFE}' = U_{LFE} + M_{LFE} + L_{LFE} \tag{6f}$$

With the example values C1 = 2/3 and C2 = C3 = 1/√2 given above, C1 ≈ −3.5 dB and C2 = C3 ≈ −3.0 dB. Calculating the change in signal level from these values together with the increase due to addition (+6 dB), the downmixed signals (L′, R′, C′, BL′, BR′) come out at −1.44 dB and LFE′ at 0 dB, so a downmix signal whose level is close to that of the original signals can be generated.
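As a quick check of the decibel figures above, the following Python sketch converts the example amplitude coefficients to dB with 20·log10(g). It reproduces the rounded per-coefficient values (about −3.52 dB and −3.01 dB) and the roughly +6 dB gain from summing two identical, in-phase signals. It does not attempt to reproduce the −1.44 dB total, since the text does not spell out the summation model behind that figure.

```python
import math


def to_db(gain: float) -> float:
    """Express an amplitude gain in decibels (20 * log10)."""
    return 20.0 * math.log10(gain)


C1 = 2.0 / 3.0
C2 = C3 = 1.0 / math.sqrt(2.0)

print(f"C1      = {to_db(C1):+.2f} dB")   # C1      = -3.52 dB
print(f"C2 = C3 = {to_db(C2):+.2f} dB")   # C2 = C3 = -3.01 dB
# Adding two identical, in-phase signals doubles the amplitude.
print(f"sum x2  = {to_db(2.0):+.2f} dB")  # sum x2  = +6.02 dB
```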

When these conversion coefficients are transmitted in the encoded stream using a DSE, only C1, C2 and C3 need to be transmitted if the audio signal encoding device and the audio signal decoding device agree in advance to calculate the downmix signal as in the equations above. If freedom in the calculation method is desired, the coefficients are instead decomposed into the factor applied to each channel, and each factor is transmitted. Table 8 shows equations (3a) to (6f) decomposed and mapped to the individual channels.

In this way, a downmix channel can be generated simply by multiplying the audio signal of each channel by the coefficient of the corresponding downmix-channel generation element, without being bound to a particular calculation formula. Further, as equations (3a) to (6f) show, the 5.1ch downmix signal can be generated per two-dimensional plane, so the encoded stream can be structured in the format shown in FIG. 21.
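The per-channel decomposition can be illustrated for the middle layer. The Python sketch below derives, from equations (3a) to (3e), the gain with which each middle-layer channel contributes to each 5.1ch output, and applies such a table by multiply-and-accumulate. The table layout and function names are hypothetical and are not the actual contents of Table 8, which is not reproduced in the text; the point is that once per-channel gains are transmitted, the decoder needs no fixed formula.

```python
import math


def middle_layer_table(c1: float, c2: float) -> dict:
    """Per-channel gains for the middle layer, derived from equations (3a)-(3e).

    Maps each input channel to {5.1ch output: gain}. M_LFE is 0 (3f), so no
    middle-layer channel feeds the LFE output.
    """
    g = c1 * c2
    return {
        "MFL": {"L": c1},
        "MFLC": {"L": g, "C": g},
        "MFC": {"C": c1},
        "MFRC": {"R": g, "C": g},
        "MFR": {"R": c1},
        "MSL": {"L": g, "BL": g},
        "MSR": {"R": g, "BR": g},
        "MBL": {"BL": c1},
        "MBC": {"BL": g, "BR": g},
        "MBR": {"BR": c1},
    }


def apply_table(signals: dict, table: dict,
                outputs=("L", "R", "C", "BL", "BR", "LFE")) -> dict:
    """Multiply each input channel by its gains and accumulate into the outputs."""
    out = {name: 0.0 for name in outputs}
    for channel, gains in table.items():
        for dest, gain in gains.items():
            out[dest] += gain * signals[channel]
    return out


if __name__ == "__main__":
    table = middle_layer_table(2 / 3, 1 / math.sqrt(2))
    ones = {name: 1.0 for name in table}
    print(apply_table(ones, table))
```

In this derived table every middle-layer channel uses the same gain for all of its destinations, which is consistent with one multiplier per channel followed by adders, as described for FIG. 23.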

FIG. 21 shows the format of an eighth example of an MPEG-2/4 AAC encoded stream generated by the audio signal encoding device according to the present invention. As shown in FIG. 21(B), this format defines the first element in the stream, PCE0, for the upper layer, the second, PCE1, for the middle layer, and the third, PCE2, for the lower layer + LFE, and then defines DSEs in the same order: DSE0 for the upper layer, DSE1 for the middle layer and DSE2 for the lower layer + LFE. Each DSE carries only the conversion coefficients of the channels contained in its own two-dimensional plane. The relationship among the conversion coefficients, channels, plane numbers and corresponding elements is, for example, as shown in Tables 9A, 9B and 9C.

Using these conversion coefficients, the mode 1 block 321 performs the 5.1ch downmix. FIG. 22 is a configuration diagram of an example of the upper-layer channel downmix unit 101 of FIG. 20, which downmixes the upper-layer channels corresponding to Table 9A within the mode 1 block 321. As shown in FIG. 22, the upper-layer channel downmix unit 101 comprises nine multipliers 1011 that multiply each of the nine upper-layer channels by its conversion coefficient, adders 1012 to 1014 that add predetermined multiplier outputs, a multiplier 1015, and adders 1016 and 1017. With this configuration, the upper-layer channel downmix unit 101 generates and outputs the upper-layer 5.1ch downmix outputs (U_L, U_R, U_C, U_BL, U_BR, U_LFE) given by equations (4a) to (4f).

Similarly, FIG. 23 is a configuration diagram of an example of the middle-layer channel downmix unit 102 of FIG. 20, which downmixes the middle-layer channels corresponding to Table 9B within the mode 1 block 321. As shown in FIG. 23, the middle-layer channel downmix unit 102 comprises ten multipliers 1021 that multiply each of the ten middle-layer channels by its conversion coefficient, and adders 1022 to 1026 that add predetermined multiplier outputs. With this configuration, it generates and outputs the middle-layer 5.1ch downmix outputs (M_L, M_R, M_C, M_BL, M_BR, M_LFE) given by equations (3a) to (3f).

Similarly, FIG. 24 is a configuration diagram of an example of the lower-layer channel downmix unit 103 of FIG. 20, which downmixes the lower-layer channels corresponding to Table 9C within the mode 1 block 321. As shown in FIG. 24, the lower-layer channel downmix unit 103 comprises five multipliers 1031 that multiply each of the five lower-layer channels (LFL, LFC, LFR, LFEL and LFER) by its conversion coefficient, and an adder 1032 that adds predetermined multiplier outputs. With this configuration, it generates and outputs the lower-layer 5.1ch downmix outputs (L_L, L_R, L_C, L_BL, L_BR, L_LFE) given by equations (5a) to (5f).

FIG. 25 is a block diagram of an example of the 5.1ch synthesis unit 104 in the mode 1 block 321 of FIG. 20. As shown in FIG. 25, the 5.1ch synthesis unit 104 uses six adders 1041 to 1046 to add together the signals of corresponding 5.1ch channels from the upper, middle and lower layers, obtaining the sums given by equations (6a) to (6f), and outputs the final 5.1ch downmix signals L′, R′, C′, BL′, BR′ and LFE′ described above.
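Putting equations (3a) to (6f) together, the whole mode 1 block 321 (the three layer downmix units followed by the 5.1ch synthesis unit 104) can be sketched in Python as below. Inputs are keyed by the channel names used in the equations; the default coefficients are the example values from the text (C1 = 2/3, C2 = C3 = 1/√2), and the function name is hypothetical. Values may be floats or NumPy arrays of samples.

```python
import math


def mode1_downmix(s: dict, c1: float = 2 / 3, c2: float = 1 / math.sqrt(2),
                  c3: float = 1 / math.sqrt(2)) -> dict:
    """5.1ch downmix of a 22.2ch signal following equations (3a)-(6f).

    `s` maps channel names (UFL, MFLC, LFEL, ...) to samples.
    """
    # Middle layer, equations (3a)-(3f)
    m = {
        "L": c1 * (s["MFL"] + c2 * (s["MFLC"] + s["MSL"])),
        "R": c1 * (s["MFR"] + c2 * (s["MFRC"] + s["MSR"])),
        "C": c1 * (s["MFC"] + c2 * (s["MFLC"] + s["MFRC"])),
        "BL": c1 * (s["MBL"] + c2 * (s["MSL"] + s["MBC"])),
        "BR": c1 * (s["MBR"] + c2 * (s["MSR"] + s["MBC"])),
        "LFE": 0.0,
    }
    # Upper layer, equations (4a)-(4f)
    u = {
        "L": c1 * c3 * (s["UFL"] + c2 * s["USL"]),
        "R": c1 * c3 * (s["UFR"] + c2 * s["USR"]),
        "C": c1 * c3 * (s["UFC"] + c2 * s["USC"]),
        "BL": c1 * c3 * (s["UBL"] + c2 * (s["USL"] + s["UBC"]) + s["USC"]),
        "BR": c1 * c3 * (s["UBR"] + c2 * (s["USR"] + s["UBC"]) + s["USC"]),
        "LFE": 0.0,
    }
    # Lower layer + LFE, equations (5a)-(5f)
    low = {
        "L": c1 * s["LFL"],
        "R": c1 * s["LFR"],
        "C": c1 * s["LFC"],
        "BL": 0.0,
        "BR": 0.0,
        "LFE": c2 * (s["LFEL"] + s["LFER"]),
    }
    # 5.1ch synthesis unit 104 (adders 1041-1046), equations (6a)-(6f)
    return {ch: u[ch] + m[ch] + low[ch] for ch in ("L", "R", "C", "BL", "BR", "LFE")}
```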

Next, the configuration and operation when downmix selection flag number "2" is input to the downmix unit 32 are described. In this case, the downmix unit 32 enables the mode 2 block 322 of FIG. 19 and sets the output selector 325 to select the audio signal output from the mode 2 block 322.

FIG. 26 is an overall block diagram of an example of the mode 2 block 322. As shown in the figure, the mode 2 block 322 comprises an upper-layer channel downmix unit 201, a middle-layer channel downmix unit 202, a lower-layer channel downmix unit 203, a 5.1ch synthesis unit 204, and a 2ch synthesis unit 205. Based on the input three-dimensional channel arrangement information, the downmix units 201 to 203 select the channels they need from the input Nch audio signal and downmix them; the 5.1ch synthesis unit 204 then generates a 5.1ch downmix signal, which the 2ch synthesis unit 205 converts into a 2ch audio signal for output.

The mode 2 block 322 has the same configuration as the mode 1 block 321 shown in FIG. 20 with the 2ch synthesis unit 205 added, so only the 2ch synthesis unit 205 is described next.

FIG. 27 is a block diagram of an example of the 2ch synthesis unit 205 in FIG. 26. As shown in FIG. 27, the 2ch synthesis unit 205 comprises multipliers 2051 to 2055, which multiply the 5.1ch downmix signals L′, R′, C′, BL′, BR′ and LFE′ by the respective conversion coefficient information; an adder 2056 that sums the outputs of the multipliers 2051, 2053 and 2054; and an adder 2057 that sums the outputs of the multipliers 2052, 2053 and 2055. The adder 2056 outputs the downmixed left channel signal L″, and the adder 2057 outputs the downmixed right channel signal R″.
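A Python sketch of the 2ch synthesis unit 205 follows. The text gives the adder wiring but neither the multiplier-to-signal assignment nor the coefficient values, so the assignment 2051→L′, 2052→R′, 2053→C′, 2054→BL′, 2055→BR′ is an assumption inferred from which multipliers feed each adder (multiplier 2053 is shared by both adders, so it is taken to carry C′), and the gains are supplied by the caller. With only five multipliers, LFE′ has no path in this wiring, so the sketch leaves it out. The function name and placeholder gains are hypothetical.

```python
def two_channel_synthesis(d: dict, k: dict) -> tuple:
    """Fold a 5.1ch downmix into 2ch (L'', R'') following the adders of FIG. 27.

    d: 5.1ch downmix keyed "L", "R", "C", "BL", "BR" (LFE' unused here).
    k: gain for each multiplier, keyed 2051-2055 (hypothetical values).
    """
    # Assumed multiplier inputs: 2051 L', 2052 R', 2053 C', 2054 BL', 2055 BR'.
    left = k[2051] * d["L"] + k[2053] * d["C"] + k[2054] * d["BL"]    # adder 2056
    right = k[2052] * d["R"] + k[2053] * d["C"] + k[2055] * d["BR"]   # adder 2057
    return left, right


if __name__ == "__main__":
    gains = {2051: 1.0, 2052: 1.0, 2053: 0.5, 2054: 0.5, 2055: 0.5}  # placeholders
    downmix = {"L": 1.0, "R": 0.0, "C": 1.0, "BL": 0.0, "BR": 0.0}
    print(two_channel_synthesis(downmix, gains))  # (1.5, 0.5)
```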

Next, the configuration and operation when downmix selection flag number "3" is input to the downmix unit 32 are described. In this case, the downmix unit 32 enables the mode 3 block 323 of FIG. 19 and sets the output selector 325 to select the audio signal output from the mode 3 block 323.

When the downmix selection flag number is "3", the mode 3 block 323 generates, by downmixing, a 2ch binaural signal on one two-dimensional plane. To generate the binaural signal, the head-related transfer functions to the right ear (HRTF_R) and to the left ear (HRTF_L) of a listener seated at the listener position (0, 0, 0) are measured in advance from each sound source position (X, Y, Z) given by the three-dimensional channel arrangement information for the Nch audio signal. Each signal is then filtered with filter coefficients based on these HRTFs, and the results are combined separately for the right ear and the left ear.

FIG. 28 is an overall block diagram of an example of the mode 3 block 323. In FIG. 28, the mode 3 block 323 comprises N left-ear filters 3231_0 to 3231_(N-1), which receive the audio signals of channels ch0 to chN-1; N right-ear filters 3232_0 to 3232_(N-1), which receive the same audio signals; an adder 3233 that sums the outputs of the filters 3231_0 to 3231_(N-1); and an adder 3234 that sums the outputs of the filters 3232_0 to 3232_(N-1).

When downmix selection flag number "3" is selected, the filters 3231_0 to 3231_{N-1} and 3232_0 to 3232_{N-1} of the mode 3 block 323 are set to filter coefficients based on the input transform coefficient information. These filter coefficients are based on the head-related transfer function (HRTF) corresponding to the element (such as UFC) assigned to each channel. If a database of head-related transfer functions for fixed positions is sufficient, the coefficients need not be transmitted in the encoded stream. If, however, a three-dimensional arrangement with a high degree of freedom is required, the head-related transfer functions from that arrangement to the listener's right and left ears are needed, and they must therefore be supplied as transform coefficient information.
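The trade-off in the preceding paragraph can be sketched as a lookup in which coefficients carried in the stream take precedence over a fixed-position database. This is a minimal Python sketch under stated assumptions: `FIXED_HRTF_DB`, `select_hrtf`, the speaker labels used as keys, and the two-tap coefficient values are all hypothetical placeholders, not measured HRTFs or anything defined by the patent.

```python
# Hypothetical fixed-position HRTF database: speaker label -> (left FIR, right FIR).
# The numbers are placeholders for illustration, not measured HRTFs.
FIXED_HRTF_DB = {
    "UFC": ([0.9, 0.1], [0.9, 0.1]),
    "UFL": ([1.0, 0.2], [0.5, 0.1]),
}


def select_hrtf(label, transmitted=None):
    """Return (h_left, h_right) FIR coefficients for the channel `label`.

    Coefficients carried in the stream (the transform coefficient
    information, modelled here as a dict) take precedence; otherwise the
    fixed-position database is used. A freely placed channel with no
    transmitted coefficients cannot be rendered, so KeyError is raised.
    """
    if transmitted is not None and label in transmitted:
        return transmitted[label]
    if label in FIXED_HRTF_DB:
        return FIXED_HRTF_DB[label]
    raise KeyError(f"no HRTF for {label!r}; a free 3-D position needs transmitted coefficients")
```

A decoder following this pattern only needs transmitted coefficients for channels whose positions fall outside the database, which keeps the stream small for standard layouts.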

In FIG. 28, continuing the earlier example, ch0 is UFC. The filter 3231_0 convolves the ch0 signal with the transfer characteristic HRTF(0, L) of sound traveling from the UFC three-dimensional position (0, Y, Z) to the listener's left ear, and the filter 3232_0 convolves it with the transfer characteristic HRTF(0, R) of sound traveling to the listener's right ear; each filter outputs the result. Similarly, for the ch1 signal, the filter 3231_1 convolves it with the UFL transfer characteristic HRTF(1, L), and the filter 3232_1 convolves it with the transfer function HRTF(1, R).

The adder 3233 sums the output signals of the left-ear filters 3231_0 to 3231_{N-1} for all channels and outputs the L-channel binaural signal. In parallel, the adder 3234 sums the output signals of the right-ear filters 3232_0 to 3232_{N-1} for all channels and outputs the R-channel binaural signal. This mode is useful for headphone listening.
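The filter bank of FIG. 28 and its two adders can be sketched with NumPy as follows. The sketch assumes FIR realizations of the HRTFs, all of the same length K; the function name `binaural_downmix` and its array layout are illustrative assumptions rather than the patent's implementation.

```python
import numpy as np


def binaural_downmix(x, h_left, h_right):
    """Mode-3 style binaural downmix (sketch).

    x       : array (N, T), one row per channel ch0..chN-1
    h_left  : array (N, K), FIR taps for HRTF(i, L), i.e. filters 3231_i
    h_right : array (N, K), FIR taps for HRTF(i, R), i.e. filters 3232_i
    returns : (y_left, y_right), each of length T + K - 1
    """
    x = np.asarray(x, dtype=float)
    h_left = np.asarray(h_left, dtype=float)
    h_right = np.asarray(h_right, dtype=float)
    n_ch, t = x.shape
    k = h_left.shape[1]
    y_left = np.zeros(t + k - 1)
    y_right = np.zeros(t + k - 1)
    for i in range(n_ch):
        # Left-ear filter for channel i, accumulated as adder 3233 would.
        y_left += np.convolve(x[i], h_left[i])
        # Right-ear filter for channel i, accumulated as adder 3234 would.
        y_right += np.convolve(x[i], h_right[i])
    return y_left, y_right
```

For example, with two channels and two-tap filters, `binaural_downmix([[1, 0, 0], [0, 1, 0]], [[1, 0.5], [0.2, 0]], [[0, 1], [1, 1]])` yields a left-ear signal of [1, 0.7, 0, 0] and a right-ear signal of [0, 2, 1, 0]. A production renderer would more likely use FFT-based block convolution, but the per-channel filter-then-sum structure is the same.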

Next, the configuration and operation when downmix selection flag number "4" is input to the downmix unit 32 will be described. In this case, the downmix unit 32 enables the mode 4 block 324 of FIG. 19, and the output selector 325 selects the audio signal output from the mode 4 block 324.

FIG. 29 is an overall block diagram of an example of the mode 4 block 324. As shown in the figure, the mode 4 block 324 consists of an upper layer channel downmix unit 3241, a middle layer channel downmix unit 3242, and a lower layer channel downmix unit 3243. The mode 4 block 324 has the same configuration as the mode 1 block 321 shown in FIG. 20, except that the 5.1ch synthesis unit 104 is omitted.

In this mode, a 5.1ch downmix is performed and output separately for each of the two-dimensional planes (as many as the total number of planes). The advantage of this mode is that, even when nine speakers cannot be installed for the upper layer, arranging tallboy speakers 41 to 45, each with three units (one each for the upper, middle, and lower layers) as shown in FIG. 30, allows the five speakers 41 to 45 plus one subwoofer (LFE) to produce an effect that a 5.1ch downmix onto a single two-dimensional plane cannot achieve.
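In matrix form, mode 4 applies an independent downmix to the channels of each plane. The sketch below assumes one gain matrix per plane; the function name, the plane names, and any gain values are hypothetical, since the actual coefficients of the upper, middle, and lower layer channel downmix units are not given in this section.

```python
import numpy as np


def per_plane_downmix(planes, matrices):
    """Mode-4 style downmix (sketch): each 2-D plane is downmixed on its own.

    planes   : dict plane name -> array (n_in, T) of that plane's channel signals
    matrices : dict plane name -> array (n_out, n_in) of hypothetical downmix gains
    returns  : dict plane name -> array (n_out, T)
    """
    return {
        name: np.asarray(matrices[name], dtype=float) @ np.asarray(sig, dtype=float)
        for name, sig in planes.items()
    }
```

Because the planes never mix with each other, each plane's output rows could drive the matching (upper, middle, or lower) unit of the tallboy speakers 41 to 45, with the LFE routed to the subwoofer.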

The present invention is not limited to the above embodiment. For example, if no convention such as "PCE0, appearing first in the stream, is for the upper layer; PCE1, appearing second, is for the middle layer; and PCE2, appearing third, is for the lower layer + LFE" is adopted, the three types of plane information described above may instead be written in the comment field of each PCE, or in a DSE (data stream element), and associated with the PCEs in ascending order of their "element_instance_tag" (this number denotes the program number). In this case, the information for the three planes can be arranged in any order.
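The tag-based association can be sketched as follows. `Pce`, its `comment` field, and `map_planes` are hypothetical simplifications (a real AAC program_config_element carries many more fields), and the rule that the i-th DSE entry belongs to the PCE with the i-th smallest element_instance_tag is this sketch's reading of the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Pce:
    """Minimal stand-in for an AAC program_config_element (hypothetical fields)."""
    element_instance_tag: int  # program number
    comment: str = ""          # plane label, if written in the comment field


def map_planes(pces, dse_plane_info=None):
    """Associate plane information with programs by element_instance_tag.

    pces           : PCEs in the order they appear in the stream (any order)
    dse_plane_info : plane entries read from a DSE, listed for ascending
                     element_instance_tag; if None, each PCE's comment is used
    returns        : {element_instance_tag: plane label}
    """
    ordered = sorted(pces, key=lambda p: p.element_instance_tag)
    if dse_plane_info is None:
        return {p.element_instance_tag: p.comment for p in ordered}
    if len(dse_plane_info) != len(ordered):
        raise ValueError("expected one DSE plane entry per PCE")
    return {p.element_instance_tag: info for p, info in zip(ordered, dse_plane_info)}
```

For instance, PCEs arriving in stream order with tags 2, 0, 1 and a DSE listing "upper", "middle", "lower+LFE" yield `{0: "upper", 1: "middle", 2: "lower+LFE"}`, regardless of the order in which the PCEs appear.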

Although the audio signal encoding device 10 has three plane encoding units 12 to 14, a single encoding unit may instead perform the encoding for all three planes while buffering data in memory. Similarly, although the audio signal decoding device 20 has three plane decoding units 22 to 24, a single decoding unit may perform the decoding for all three planes while buffering data in memory. Furthermore, the present invention can of course also be applied to multi-channel audio signals other than 22.2ch that form a three-dimensional sound field with speakers arranged in a three-dimensional space.
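Time-sharing one encoder across the planes amounts to a loop with a memory buffer. In this Python sketch, `encode_all_planes` is an illustrative name, `encode_plane` is a hypothetical placeholder for a single plane encoding unit, and the returned list stands in for the memory mentioned above.

```python
def encode_all_planes(planes, encode_plane):
    """Run one shared encoder over every plane in turn (sketch).

    planes       : list of (plane_id, channel_signals), in division order
    encode_plane : hypothetical single-plane encoder standing in for units 12 to 14
    returns      : list of (plane_id, encoded_element), held in memory until
                   all planes are done and can be passed to stream integration
    """
    buffer = []
    for plane_id, signals in planes:
        buffer.append((plane_id, encode_plane(signals)))
    return buffer
```

The same loop, with a decoder in place of `encode_plane`, would describe a single plane decoding unit handling the three planes of the audio signal decoding device 20.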

In the above embodiment, the MPEG-2/4 AAC system has been described as an example, but the present invention can also be applied to, for example, the E-AC3 system. With E-AC3, an encoded stream can be generated in accordance with the channel arrangement including upper speakers given in a known document ("SMPTE Proposed Recommended Practice, Digital Cinema Channel Mapping and Labeling, RP 226," (c) SMPTE 2004); however, because the channel arrangement is limited to that of the SMPTE proposal, 22.2ch cannot be encoded. Since the audio signal encoding device of the present invention performs encoding that integrates a plurality of programs into a single stream, the present invention is nevertheless applicable to the E-AC3 system.

The present invention also includes an encoding program that causes a computer to execute the operation of the audio signal encoding device 10, and a decoding program that causes a computer to execute the operations of the audio signal decoding devices 20 and 30.

DESCRIPTION OF SYMBOLS
10 Audio signal encoding device
11 Three-dimensional space division unit
12, 13, 14 Plane encoding unit
15 Stream integration unit
20, 30 Audio signal decoding device
21, 31 Stream separation unit
22, 23, 24 Plane decoding unit
25 Three-dimensional space synthesis unit
32 Downmix unit
41 to 45 Tallboy speaker
101, 201, 3241 Upper layer channel downmix unit
102, 202, 3242 Middle layer channel downmix unit
103, 203, 3243 Lower layer channel downmix unit
104, 204 5.1ch synthesis unit
205 2ch synthesis unit
321 Mode 1 block
322 Mode 2 block
323 Mode 3 block
324 Mode 4 block
325 Output selector

Claims

An audio signal encoding method comprising: a first step of outputting, based on the position of each of a plurality of speakers arranged three-dimensionally in a three-dimensional space to output audio signals of a plurality of channels and on a dividing direction in which the three-dimensional space is divided into a plurality of two-dimensional planes, plane information including the number of the two-dimensional planes, the number of channels corresponding to each of the two-dimensional planes, and the division order of the two-dimensional planes, and outputting channel mapping information indicating the position, in its two-dimensional plane, of each speaker corresponding to each channel;
a second step of generating, based on the plane information and the channel mapping information, an encoding element by encoding the audio signals of the plurality of channels as a group of programs for each two-dimensional plane, further generating plane position information including information indicating the channel arrangement in the two-dimensional plane, and outputting the encoding element and the plane position information for each two-dimensional plane; and
a third step of generating and outputting one encoded stream by integrating all of the encoding elements and the plane position information output for each of the two-dimensional planes in the second step.
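The three steps of this claim can be illustrated with a small Python sketch. It assumes Cartesian speaker positions, treats every distinct coordinate along the dividing direction as one two-dimensional plane, orders the planes from top to bottom, and uses a caller-supplied `encode` function in place of a real AAC encoder; `divide_space`, `build_stream`, and the dictionary layout are all hypothetical.

```python
def divide_space(speakers, axis=2):
    """First step (sketch): split the speakers into 2-D planes along `axis`.

    speakers : list of (channel_label, (x, y, z))
    axis     : dividing direction; 2 stacks the planes along z (height)
    returns  : plane_info, channel_mapping
    """
    # One plane per distinct coordinate along the dividing direction,
    # ordered top to bottom (the order is an assumption of this sketch).
    levels = sorted({pos[axis] for _, pos in speakers}, reverse=True)
    index = {level: n for n, level in enumerate(levels)}
    plane_info = {
        "num_planes": len(levels),
        "channels_per_plane": [sum(1 for _, p in speakers if p[axis] == lv) for lv in levels],
        "division_order": levels,
    }
    # Channel mapping: plane index plus the position within that plane
    # (the coordinate along the dividing axis is dropped).
    channel_mapping = {
        label: (index[pos[axis]], tuple(c for i, c in enumerate(pos) if i != axis))
        for label, pos in speakers
    }
    return plane_info, channel_mapping


def build_stream(speakers, signals, encode, axis=2):
    """Second and third steps (sketch): encode per plane, then integrate."""
    plane_info, mapping = divide_space(speakers, axis)
    elements = []
    for p in range(plane_info["num_planes"]):
        labels = [lab for lab, (idx, _) in mapping.items() if idx == p]
        element = encode([signals[lab] for lab in labels])  # one program per plane
        position_info = {lab: mapping[lab][1] for lab in labels}
        elements.append({"element": element, "plane_position_info": position_info})
    return {"plane_info": plane_info, "elements": elements}
```

With five speakers at three heights, for example, `divide_space` reports three planes holding 2, 2, and 1 channels, and `build_stream` returns one encoded element and one plane position record per plane inside a single container.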

The audio signal encoding method according to claim 1, wherein information that enables only some of the channels of the audio signals of the plurality of channels arranged three-dimensionally in the three-dimensional space to be decoded is added, as the plane position information, to the one encoded stream generated in the third step.

The audio signal encoding method according to claim 1, wherein transform coefficient information is added to the one encoded stream generated in the third step, the transform coefficient information enabling the audio signals of the plurality of channels arranged three-dimensionally in the three-dimensional space to be reproduced as audio signals converted into a smaller number of channels than the plurality of channels.

The audio signal encoding method according to claim 3, wherein the transform coefficient information includes filter coefficients corresponding to head-related transfer functions from the position of each of the speakers for the smaller number of channels, arranged three-dimensionally in the three-dimensional space, to the listener's right ear, and filter coefficients corresponding to head-related transfer functions from those positions to the listener's left ear.

The audio signal encoding method according to claim 1, wherein the channel mapping information generated in the first step includes information indicating the position, in the two-dimensional plane, of each speaker, among the plurality of speakers arranged three-dimensionally in the three-dimensional space, that outputs an audio signal converted in advance into a smaller number of channels than the plurality of channels;
in the second step, separately from the audio signals of the plurality of channels to be output from the plurality of speakers arranged three-dimensionally in the three-dimensional space, the audio signals converted in advance into the smaller number of channels are encoded as a group of programs for each two-dimensional plane to generate a second encoding element, second plane position information including information indicating the channel arrangement in the two-dimensional plane is further generated, and the second encoding element and the second plane position information are output for each two-dimensional plane; and
in the third step, one encoded stream is generated and output in which the second encoding element and the second plane position information are integrated with the encoding element, generated by encoding the audio signals of the plurality of channels as a group of programs for each two-dimensional plane, and with the plane position information including information indicating the channel arrangement in the two-dimensional plane.

An audio signal encoding device comprising: a three-dimensional space division unit that outputs, based on the position of each of a plurality of speakers arranged three-dimensionally in a three-dimensional space to output audio signals of a plurality of channels and on a dividing direction in which the three-dimensional space is divided into a plurality of two-dimensional planes, plane information including the number of the two-dimensional planes, the number of channels corresponding to each of the two-dimensional planes, and the division order of the two-dimensional planes, and that outputs channel mapping information indicating the position, in its two-dimensional plane, of each speaker corresponding to each channel;
a plane encoding unit that generates, based on the plane information and the channel mapping information, an encoding element by encoding the audio signals of the plurality of channels as a group of programs for each two-dimensional plane, further generates plane position information including information indicating the channel arrangement in the two-dimensional plane, and outputs the encoding element and the plane position information for each two-dimensional plane; and
a stream integration unit that integrates all of the encoding elements and the plane position information output by the plane encoding unit for each of the two-dimensional planes, and generates and outputs one encoded stream.

The audio signal encoding device according to claim 6, wherein the stream integration unit generates the one encoded stream with information added as the plane position information, the information enabling only some of the channels of the sound source information of the plurality of channels arranged three-dimensionally in the three-dimensional space to be decoded.

The audio signal encoding device according to claim 6, wherein the stream integration unit generates the one encoded stream with transform coefficient information added, the transform coefficient information enabling the audio signals of the plurality of channels arranged three-dimensionally in the three-dimensional space to be reproduced as signals converted into a smaller number of channels than the plurality of channels.

The audio signal encoding device according to claim 8, wherein the transform coefficient information includes filter coefficients corresponding to head-related transfer functions from the position of each of the speakers for the smaller number of channels, arranged three-dimensionally in the three-dimensional space, to the listener's right ear, and filter coefficients corresponding to head-related transfer functions from those positions to the listener's left ear.

The audio signal encoding device according to claim 6, wherein the three-dimensional space division unit outputs information indicating the position, in the two-dimensional plane, of each speaker, among the plurality of speakers arranged three-dimensionally in the three-dimensional space, that outputs an audio signal converted in advance into a smaller number of channels than the plurality of channels;
the plane encoding unit, separately from the audio signals of the plurality of channels, encodes the audio signals converted in advance into the smaller number of channels as a group of programs for each two-dimensional plane to generate a second encoding element, further generates second plane position information including information indicating the channel arrangement in the two-dimensional plane, and outputs the second encoding element and the second plane position information for each two-dimensional plane; and
the stream integration unit generates and outputs one encoded stream in which the second encoding element and the second plane position information are integrated with the encoding element, generated by encoding the audio signals of the plurality of channels as a group of programs for each two-dimensional plane, and with the plane position information including information indicating the channel arrangement in the two-dimensional plane.

An audio signal decoding method comprising: a first step of receiving, as input, one encoded stream obtained by integrating encoding elements and plane position information, the encoding elements having been generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a group of programs for each of a plurality of two-dimensional planes included in the three-dimensional space, and the plane position information including information indicating the channel arrangement in each two-dimensional plane, and separating the plane position information and the encoding element for each of the plurality of two-dimensional planes from the encoded stream;
a second step of decoding each of the encoding elements for the two-dimensional planes separated in the first step into the audio signals of the plurality of channels arranged in the three-dimensional space; and
a third step of combining the plane position information for the two-dimensional planes separated in the first step to generate three-dimensional channel arrangement information indicating the position of the speaker that outputs the audio signal of each channel of the decoded audio signals of the plurality of channels.
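The third step of this decoding method, combining per-plane position information into three-dimensional channel arrangement information, reverses the plane division performed at the encoder. This Python sketch assumes, hypothetically, that each plane's coordinate along the dividing direction is known to the decoder (for example, carried with the plane position information); `synthesize_arrangement` and its argument layout are illustrative names, not part of the patent.

```python
def synthesize_arrangement(plane_levels, plane_position_info, axis=2):
    """Third decoding step (sketch): rebuild the 3-D channel arrangement.

    plane_levels        : coordinate of each plane along the dividing direction
                          (a hypothetical field assumed to reach the decoder)
    plane_position_info : list, one entry per plane, of {channel_label: 2-D position}
    axis                : index of the dividing direction in (x, y, z)
    returns             : {channel_label: 3-D position}
    """
    arrangement = {}
    for level, channels in zip(plane_levels, plane_position_info):
        for label, pos2d in channels.items():
            pos = list(pos2d)
            pos.insert(axis, level)  # re-insert the dividing-axis coordinate
            arrangement[label] = tuple(pos)
    return arrangement
```

The resulting dictionary is the kind of per-channel (X, Y, Z) table that the mode 3 block needs to pick or apply HRTFs for each channel.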

An audio signal decoding method comprising: a first step of receiving, as input, one encoded stream obtained by integrating a first encoding element, first plane position information, a second encoding element, and second plane position information, the first encoding element having been generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a group of programs for each of a plurality of two-dimensional planes included in the three-dimensional space, the first plane position information including information indicating the channel arrangement in each two-dimensional plane, the second encoding element having been generated by likewise encoding audio signals converted in advance into a smaller number of channels than the plurality of channels as a group of programs for each of the plurality of two-dimensional planes, and the second plane position information including information indicating the channel arrangement in each two-dimensional plane, and separating the first and second encoding elements and the first and second plane position information for each of the plurality of two-dimensional planes from the encoded stream;
a second step of decoding the first and second encoding elements for each of the two-dimensional planes separated in the first step into the audio signals of the plurality of channels and the audio signals converted in advance into the smaller number of channels; and
a third step of combining the first and second plane position information for each of the two-dimensional planes separated in the first step to generate three-dimensional channel arrangement information indicating the position of the speaker that outputs the audio signal of each channel of the decoded audio signals of the plurality of channels and of the audio signals converted in advance into the smaller number of channels.

An audio signal decoding method comprising: a first step of receiving, as input, one encoded stream obtained by integrating encoding elements, plane position information, and transform coefficient information, the encoding elements having been generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a group of programs for each of a plurality of two-dimensional planes included in the three-dimensional space, the plane position information including information indicating the channel arrangement in each two-dimensional plane, and the transform coefficient information including information indicating transform coefficients that enable the audio signals of the plurality of channels arranged in the three-dimensional space to be reproduced as audio signals with a smaller number of channels than the plurality of channels, separating the plane position information and the encoding element for each of the plurality of two-dimensional planes from the encoded stream, and further separating the transform coefficient information;
a second step of decoding each of the encoding elements for the two-dimensional planes separated in the first step into the audio signals of the plurality of channels;
a third step of combining the plane position information for the two-dimensional planes separated in the first step to generate three-dimensional channel arrangement information indicating the position of the speaker that outputs the audio signal of each channel of the audio signals of the plurality of channels; and
a fourth step of multiplying the decoded audio signals of the plurality of channels by the transform coefficient information separated in the first step, which is obtained based on the three-dimensional channel arrangement information, thereby converting them into audio signals with a smaller number of channels than the plurality of channels, to be reproduced on one or more two-dimensional planes.

An audio signal decoding method comprising: a first step of receiving, as input, one encoded stream obtained by integrating encoding elements, plane position information, and transform coefficient information, the encoding elements having been generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a group of programs for each of a plurality of two-dimensional planes included in the three-dimensional space, the plane position information including information indicating the channel arrangement in each two-dimensional plane, and the transform coefficient information including information indicating transform coefficients that enable the audio signals of the plurality of channels arranged in the three-dimensional space to be reproduced with a smaller number of channels, separating the plane position information and the encoding element for each of the plurality of two-dimensional planes from the encoded stream, and further separating the transform coefficient information;
a second step of decoding each of the encoding elements for the two-dimensional planes separated in the first step into the audio signals of the plurality of channels;
a third step of combining the plane position information for the two-dimensional planes separated in the first step to generate three-dimensional channel arrangement information indicating the position of the speaker that outputs the audio signal of each channel of the audio signals of the plurality of channels; and
a fourth step of multiplying the decoded audio signals of the plurality of channels by the transform coefficient information separated in the first step, which is obtained based on the three-dimensional channel arrangement information, thereby converting the audio signals of the plurality of channels into a two-channel binaural signal.

An audio signal decoding device comprising: a stream separation unit that receives, as input, one encoded stream obtained by integrating encoding elements and plane position information, the encoding elements having been generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a group of programs for each of a plurality of two-dimensional planes included in the three-dimensional space, and the plane position information including information indicating the channel arrangement in each two-dimensional plane, and that separates the plane position information and the encoding element for each of the plurality of two-dimensional planes from the encoded stream;
a plane decoding unit that decodes each of the encoding elements for the two-dimensional planes separated by the stream separation unit into the audio signals of the plurality of channels; and
a three-dimensional space synthesis unit that combines the plane position information for the two-dimensional planes separated by the stream separation unit to generate three-dimensional channel arrangement information indicating the position of the speaker that outputs the audio signal of each channel of the decoded audio signals of the plurality of channels.

An audio signal decoding device comprising: a stream separation unit that receives, as input, one encoded stream obtained by integrating a first encoding element, first plane position information, a second encoding element, and second plane position information, the first encoding element having been generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a group of programs for each of a plurality of two-dimensional planes included in the three-dimensional space, the first plane position information including information indicating the channel arrangement in each two-dimensional plane, the second encoding element having been generated by encoding audio signals converted in advance into a smaller number of channels than the plurality of channels as a group of programs for each of the plurality of two-dimensional planes, and the second plane position information including information indicating the channel arrangement in each two-dimensional plane, and that separates the first and second encoding elements and the first and second plane position information for each of the plurality of two-dimensional planes from the encoded stream;
a plane decoding unit that decodes the first and second encoding elements for each of the two-dimensional planes separated by the stream separation unit into the audio signals of the plurality of channels and the audio signals converted in advance into the smaller number of channels; and
a three-dimensional space synthesis unit that combines the first and second plane position information for each of the two-dimensional planes separated by the stream separation unit to generate three-dimensional channel arrangement information indicating the position of the speaker that outputs the audio signal of each channel of the decoded audio signals of the plurality of channels and of the audio signals converted in advance into the smaller number of channels.

An audio signal decoding device comprising: a stream separation unit that receives, as input, one encoded stream obtained by integrating encoding elements, plane position information, and transform coefficient information, the encoding elements having been generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a group of programs for each of a plurality of two-dimensional planes included in the three-dimensional space, the plane position information including information indicating the channel arrangement in each two-dimensional plane, and the transform coefficient information including information indicating transform coefficients that enable the audio signals of the plurality of channels arranged in the three-dimensional space to be reproduced as audio signals with a smaller number of channels than the plurality of channels, and that separates the plane position information and the encoding element for each of the plurality of two-dimensional planes from the encoded stream and further separates the transform coefficient information;
a plane decoding unit that decodes each of the encoding elements for the two-dimensional planes separated by the stream separation unit into the audio signals of the plurality of channels;
a three-dimensional space synthesis unit that combines the plane position information for the two-dimensional planes separated by the stream separation unit to generate three-dimensional channel arrangement information indicating the position of the speaker that outputs the audio signal of each channel of the decoded audio signals of the plurality of channels; and
a downmix unit that multiplies the decoded audio signals of the plurality of channels by the transform coefficient information separated by the stream separation unit, which is obtained based on the three-dimensional channel arrangement information, thereby converting them into audio signals with a smaller number of channels than the plurality of channels, to be reproduced on one or more two-dimensional planes.

An audio signal decoding device comprising: a stream separation unit that receives, as input, one encoded stream obtained by integrating encoding elements, plane position information, and transform coefficient information, the encoding elements having been generated by encoding audio signals of a plurality of channels, to be output from a plurality of speakers arranged three-dimensionally in a three-dimensional space, as a group of programs for each of a plurality of two-dimensional planes included in the three-dimensional space, the plane position information including information indicating the channel arrangement in each two-dimensional plane, and the transform coefficient information including information indicating transform coefficients that enable the audio signals of the plurality of channels arranged in the three-dimensional space to be reproduced with a smaller number of channels, and that separates the plane position information and the encoding element for each of the plurality of two-dimensional planes from the encoded stream and further separates the transform coefficient information;
a plane decoding unit that decodes each of the encoding elements for the two-dimensional planes separated by the stream separation unit into the audio signals of the plurality of channels;
a three-dimensional space synthesis unit that combines the plane position information for the two-dimensional planes separated by the stream separation unit to generate three-dimensional channel arrangement information indicating the position of the speaker that outputs the audio signal of each channel of the decoded audio signals of the plurality of channels; and
a downmix unit that multiplies the decoded audio signals of the plurality of channels by the transform coefficient information separated by the stream separation unit, which is obtained based on the three-dimensional channel arrangement information, thereby converting the audio signals of the plurality of channels into a two-channel binaural signal.