JP4907522B2

JP4907522B2 - Speech coding apparatus and speech coding method

Info

Publication number: JP4907522B2
Application number: JP2007514799A
Authority: JP
Inventors: 幸司吉田
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2005-04-28
Filing date: 2006-04-27
Publication date: 2012-03-28
Anticipated expiration: 2026-04-27
Also published as: US20090083041A1; EP1876586B1; US8428956B2; EP1876586A4; CN101167126A; RU2007139784A; CN101167126B; JPWO2006118179A1; DE602006011600D1; EP1876586A1; WO2006118179A1

Description

The present invention relates to a speech coding apparatus and a speech coding method, and more particularly to a speech coding apparatus and a speech coding method for stereo speech.

As transmission bandwidth in mobile communication and IP communication widens and services diversify, demand is growing for higher sound quality and a greater sense of presence in speech communication. For example, demand is expected to increase for hands-free calls in videophone services, speech communication in video conferencing, multipoint speech communication in which multiple speakers at multiple locations talk at the same time, and speech communication that can transmit the surrounding sound environment while preserving the sense of presence. In such cases, it is desirable to realize speech communication using stereo speech, which offers a greater sense of presence than a monaural signal and allows the positions of multiple speakers to be recognized. Realizing such stereo speech communication requires stereo speech coding.

In addition, for speech data communication over IP networks, a speech coding scheme with a scalable configuration is desired in order to implement traffic control and multicast communication on the network. A scalable configuration is one in which the receiving side can decode speech data even from part of the encoded data. The encoding process in a speech coding scheme with a scalable configuration is layered, and includes a part corresponding to the core layer and a part corresponding to the enhancement layer. Accordingly, the encoded data generated by that process also includes core layer encoded data and enhancement layer encoded data.

Likewise, when stereo speech is encoded and transmitted, a speech coding scheme is desired that has a scalable configuration between monaural and stereo (a monaural-stereo scalable configuration), in which the receiving side can choose between decoding the stereo signal and decoding a monaural signal from part of the encoded data.

One speech coding method based on such a scheme predicts signals between channels (hereinafter sometimes abbreviated as “ch”), that is, predicts the second channel signal from the first channel signal or the first channel signal from the second channel signal, by means of pitch prediction between the channels. In other words, it performs encoding by exploiting the correlation between the two channels (see Non-Patent Document 1).
Non-Patent Document 1: Ramprashad, S.A., “Stereophonic CELP coding using cross channel prediction”, Proc. IEEE Workshop on Speech Coding, pp. 136-138, Sep. 2000

However, with the conventional speech coding method described above, when the correlation between the two channels is small, sufficient prediction performance (prediction gain) cannot be obtained, and coding efficiency may degrade.

An object of the present invention is to provide a speech coding apparatus and a speech coding method that can encode stereo speech efficiently even when the correlation between the two channels is small.

The speech coding apparatus of the present invention is a speech coding apparatus that encodes a stereo signal including a first channel signal and a second channel signal, and comprises: monaural signal generation means for generating a monaural signal using the first channel signal and the second channel signal; selection means for selecting one of the first channel signal and the second channel signal; and encoding means for encoding the generated monaural signal to obtain core layer encoded data and encoding the selected channel signal to obtain enhancement layer encoded data corresponding to the core layer encoded data. The selection means selects one of the first channel signal and the second channel signal for each frame, based on either the coding distortion for the first channel signal and the second channel signal or the degree of intra-channel correlation corresponding to the first channel signal and the second channel signal. The encoding means encodes the monaural signal and the channel signal selected for each frame on a frame-by-frame basis.

The speech coding method of the present invention is a speech coding method for encoding a stereo signal including a first channel signal and a second channel signal, and includes: a step of generating a monaural signal using the first channel signal and the second channel signal; a selection step of selecting one of the first channel signal and the second channel signal; and an encoding step of encoding the generated monaural signal to obtain core layer encoded data and encoding the selected channel signal to obtain enhancement layer encoded data corresponding to the core layer encoded data. In the selection step, one of the first channel signal and the second channel signal is selected for each frame, based on either the coding distortion for the first channel signal and the second channel signal or the degree of intra-channel correlation corresponding to the first channel signal and the second channel signal. In the encoding step, the monaural signal and the channel signal selected for each frame are encoded on a frame-by-frame basis.

According to the present invention, stereo speech can be encoded efficiently even when the correlation between the channel signals of a stereo signal is small.

Embodiments of the present invention relating to speech coding with a monaural-stereo scalable configuration are described in detail below with reference to the accompanying drawings.

(Embodiment 1)
FIG. 1 is a block diagram showing the configuration of a speech coding apparatus according to Embodiment 1 of the present invention. The speech coding apparatus 100 in FIG. 1 has a core layer encoding unit 102, which is the component corresponding to the core layer of the scalable configuration, and an enhancement layer encoding unit 104, which is the component corresponding to the enhancement layer of the scalable configuration. The description below assumes that each component operates on a frame-by-frame basis.

The core layer encoding unit 102 has a monaural signal generation unit 110 and a monaural signal encoding unit 112. The enhancement layer encoding unit 104 has an encoding channel selection unit 120, a first channel encoding unit 122, a second channel encoding unit 124, and a switch unit 126.

In the core layer encoding unit 102, the monaural signal generation unit 110 generates a monaural signal s_mono(n) from the first channel input speech signal s_ch1(n) and the second channel input speech signal s_ch2(n) contained in the stereo input speech signal (where n = 0 to NF-1, and NF is the frame length), based on the relationship shown in Equation (1), and outputs it to the monaural signal encoding unit 112. The stereo signal described in this embodiment consists of signals of two channels, namely a first channel signal and a second channel signal.
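Equation (1) itself is not shown in this text. As a minimal sketch, assuming that it takes the common form of a per-sample average, s_mono(n) = (s_ch1(n) + s_ch2(n)) / 2, the downmix performed by the monaural signal generation unit 110 might look like the following. The function name `make_mono` and the sample values are illustrative assumptions, not taken from the source.

```python
def make_mono(s_ch1, s_ch2):
    """Downmix one frame of two channels to mono.

    Assumption: Equation (1) is a per-sample average of the two channels.
    Both inputs hold the NF samples of one frame.
    """
    if len(s_ch1) != len(s_ch2):
        raise ValueError("both channels must have the same frame length NF")
    return [(a + b) / 2.0 for a, b in zip(s_ch1, s_ch2)]


# One short illustrative frame (NF = 4).
frame_ch1 = [0.5, -0.25, 1.0, 0.0]
frame_ch2 = [0.25, 0.25, -1.0, 0.5]
s_mono = make_mono(frame_ch1, frame_ch2)
```

Under this assumption the monaural signal carries the content common to both channels, which is why either channel can later be recovered from the monaural signal and the other channel.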

The monaural signal encoding unit 112 encodes the monaural signal s_mono(n) for each frame. Any encoding scheme may be used. The encoded data obtained by encoding the monaural signal s_mono(n) is output as core layer encoded data. More specifically, the core layer encoded data is multiplexed with the enhancement layer encoded data and the encoded channel selection information described later, and is output from the speech coding apparatus 100 as transmission encoded data.
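The multiplexing described above can be illustrated with a small sketch. The byte layout used here (a one-byte channel flag followed by two 16-bit length fields) is purely hypothetical, as are the names `HEADER`, `mux_frame` and `demux_frame`; the source states only that the three items are multiplexed, not how.

```python
import struct

# Hypothetical per-frame header: selected channel (1 byte),
# core layer length and enhancement layer length (2 bytes each, big-endian).
HEADER = struct.Struct(">BHH")


def mux_frame(core, selected_channel, enhancement):
    """Pack core layer data, channel selection info and enhancement layer data."""
    if selected_channel not in (1, 2):
        raise ValueError("selected_channel must be 1 or 2")
    header = HEADER.pack(selected_channel, len(core), len(enhancement))
    return header + core + enhancement


def demux_frame(payload):
    """Split a frame back into (core, selected_channel, enhancement)."""
    selected_channel, n_core, n_enh = HEADER.unpack_from(payload)
    start = HEADER.size
    core = payload[start:start + n_core]
    enhancement = payload[start + n_core:start + n_core + n_enh]
    return core, selected_channel, enhancement
```

With a layout of this kind, a receiver that only needs monaural output can read the core layer bytes and ignore the rest, which is the property the scalable configuration is meant to provide.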

The monaural signal encoding unit 112 also decodes the encoded monaural signal s_mono(n), and outputs the resulting monaural decoded speech signal to the first channel encoding unit 122 and the second channel encoding unit 124 of the enhancement layer encoding unit 104.

In the enhancement layer encoding unit 104, the encoding channel selection unit 120 uses the first channel input speech signal s_ch1(n) and the second channel input speech signal s_ch2(n) to select, based on a predetermined selection criterion, whichever of the first channel and the second channel is best suited as the channel to be encoded in the enhancement layer. The optimal channel is selected for each frame. The predetermined selection criterion is one that allows enhancement layer encoding to be performed with high efficiency or with high sound quality (low coding distortion). The encoding channel selection unit 120 generates encoded channel selection information indicating the selected channel. The generated encoded channel selection information is output to the switch unit 126, and is also multiplexed with the core layer encoded data described above and the enhancement layer encoded data described later.

Instead of using the first channel input speech signal s_ch1(n) and the second channel input speech signal s_ch2(n), the encoding channel selection unit 120 may use any parameter or signal obtained in the course of encoding in the first channel encoding unit 122 and the second channel encoding unit 124, or the results of that encoding (that is, the first channel encoded data and the second channel encoded data described later).

The first channel encoding unit 122 encodes the first channel input speech signal for each frame using the first channel input speech signal and the monaural decoded speech signal, and outputs the resulting first channel encoded data to the switch unit 126.

The first channel encoding unit 122 also decodes the first channel encoded data to obtain a first channel decoded speech signal. In this embodiment, however, the first channel decoded speech signal obtained by the first channel encoding unit 122 is omitted from the drawing.

The second channel encoding unit 124 encodes the second channel input speech signal for each frame using the second channel input speech signal and the monaural decoded speech signal, and outputs the resulting second channel encoded data to the switch unit 126.

The second channel encoding unit 124 also decodes the second channel encoded data to obtain a second channel decoded speech signal. In this embodiment, however, the second channel decoded speech signal obtained by the second channel encoding unit 124 is omitted from the drawing.

The switch unit 126 selectively outputs either the first channel encoded data or the second channel encoded data for each frame, in accordance with the encoded channel selection information. The encoded data that is output is that of the channel selected by the encoding channel selection unit 120. Therefore, when the selected channel switches from the first channel to the second channel, or from the second channel to the first channel, the encoded data output from the switch unit 126 likewise switches from the first channel encoded data to the second channel encoded data, or from the second channel encoded data to the first channel encoded data.

The combination of the monaural signal encoding unit 112, the first channel encoding unit 122, the second channel encoding unit 124, and the switch unit 126 described above constitutes an encoding unit that encodes the monaural signal to obtain core layer encoded data and encodes the selected channel signal to obtain enhancement layer encoded data corresponding to the core layer encoded data.

FIG. 2 is a block diagram showing the configuration of a speech decoding apparatus that receives the transmission encoded data output from the speech coding apparatus 100 as received encoded data and decodes it to obtain a monaural decoded speech signal and a stereo decoded speech signal. The speech decoding apparatus 150 in FIG. 2 has a core layer decoding unit 152, which is the component corresponding to the core layer of the scalable configuration, and an enhancement layer decoding unit 154, which is the component corresponding to the enhancement layer of the scalable configuration.

The core layer decoding unit 152 has a monaural signal decoding unit 160. The monaural signal decoding unit 160 decodes the core layer encoded data contained in the received encoded data to obtain a monaural decoded speech signal sd_mono(n). The monaural decoded speech signal sd_mono(n) is output to a subsequent audio output unit (not shown), the first channel decoding unit 172, the second channel decoding unit 174, the first channel decoded signal generation unit 176, and the second channel decoded signal generation unit 178.

The enhancement layer decoding unit 154 has a switch unit 170, a first channel decoding unit 172, a second channel decoding unit 174, a first channel decoded signal generation unit 176, a second channel decoded signal generation unit 178, and switch units 180 and 182.

The switch unit 170 refers to the encoded channel selection information contained in the received encoded data, and outputs the enhancement layer encoded data contained in the received encoded data to the decoding unit corresponding to the selected channel. Specifically, when the selected channel is the first channel, the enhancement layer encoded data is output to the first channel decoding unit 172, and when the selected channel is the second channel, the enhancement layer encoded data is output to the second channel decoding unit 174.

When enhancement layer encoded data is input from the switch unit 170, the first channel decoding unit 172 decodes the first channel decoded speech signal sd_ch1(n) using that enhancement layer encoded data and the monaural decoded speech signal sd_mono(n), and outputs the first channel decoded speech signal sd_ch1(n) to the switch unit 180 and the second channel decoded signal generation unit 178.

When enhancement layer encoded data is input from the switch unit 170, the second channel decoding unit 174 decodes the second channel decoded speech signal sd_ch2(n) using that enhancement layer encoded data and the monaural decoded speech signal sd_mono(n), and outputs the second channel decoded speech signal sd_ch2(n) to the switch unit 182 and the first channel decoded signal generation unit 176.

When the second channel decoded speech signal sd_ch2(n) is input from the second channel decoding unit 174, the first channel decoded signal generation unit 176 generates a first channel decoded speech signal sd_ch1(n) from that second channel decoded speech signal sd_ch2(n) and the monaural decoded speech signal sd_mono(n), based on the relationship shown in Equation (2). The generated first channel decoded speech signal sd_ch1(n) is output to the switch unit 180.

When the first channel decoded speech signal sd_ch1(n) is input from the first channel decoding unit 172, the second channel decoded signal generation unit 178 generates a second channel decoded speech signal sd_ch2(n) from that first channel decoded speech signal sd_ch1(n) and the monaural decoded speech signal sd_mono(n), based on the relationship shown in Equation (3). The generated second channel decoded speech signal sd_ch2(n) is output to the switch unit 182.

The switch unit 180 selectively outputs, in accordance with the encoded channel selection information, either the first channel decoded speech signal sd_ch1(n) input from the first channel decoding unit 172 or the first channel decoded speech signal sd_ch1(n) input from the first channel decoded signal generation unit 176. Specifically, when the selected channel is the first channel, the first channel decoded speech signal sd_ch1(n) input from the first channel decoding unit 172 is selected and output. When the selected channel is the second channel, the first channel decoded speech signal sd_ch1(n) input from the first channel decoded signal generation unit 176 is selected and output.

The switch unit 182 selectively outputs, in accordance with the encoded channel selection information, either the second channel decoded speech signal sd_ch2(n) input from the second channel decoding unit 174 or the second channel decoded speech signal sd_ch2(n) input from the second channel decoded signal generation unit 178. Specifically, when the selected channel is the first channel, the second channel decoded speech signal sd_ch2(n) input from the second channel decoded signal generation unit 178 is selected and output. When the selected channel is the second channel, the second channel decoded speech signal sd_ch2(n) input from the second channel decoding unit 174 is selected and output.

The first channel decoded speech signal sd_ch1(n) output from the switch unit 180 and the second channel decoded speech signal sd_ch2(n) output from the switch unit 182 are output to the subsequent audio output unit (not shown) as a stereo decoded speech signal.
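The routing performed by the enhancement layer decoding unit 154 can be sketched as follows. Equations (2) and (3) are not shown in this text, so the sketch assumes that they invert an averaging downmix, that is, sd_ch1(n) = 2 sd_mono(n) - sd_ch2(n) and sd_ch2(n) = 2 sd_mono(n) - sd_ch1(n). The function name `reconstruct_stereo` is hypothetical.

```python
def reconstruct_stereo(sd_mono, sd_selected, selected_channel):
    """Return (sd_ch1, sd_ch2) for one frame.

    sd_selected is the decoded signal of the channel that was coded in the
    enhancement layer; the other channel is derived from it and the decoded
    mono signal. Assumption: mono is the average of the two channels.
    """
    derived = [2.0 * m - s for m, s in zip(sd_mono, sd_selected)]
    if selected_channel == 1:
        # Roles of switch units 180/182: ch1 from its decoder, ch2 generated.
        return list(sd_selected), derived
    if selected_channel == 2:
        # ch2 from its decoder, ch1 generated.
        return derived, list(sd_selected)
    raise ValueError("selected_channel must be 1 or 2")
```

Because the encoded channel selection information is sent for every frame, the roles of decoded and derived channel can change from one frame to the next without any extra signalling.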

Thus, according to this embodiment, the monaural signal s_mono(n) generated from the first channel input speech signal s_ch1(n) and the second channel input speech signal s_ch2(n) is encoded to obtain core layer encoded data, and the input speech signal of the channel selected from the first channel and the second channel (the first channel input speech signal s_ch1(n) or the second channel input speech signal s_ch2(n)) is encoded to obtain enhancement layer encoded data. This avoids insufficient prediction performance (prediction gain) when the correlation between the channels of the stereo signal is small, and allows stereo speech to be encoded efficiently.

(Embodiment 2)
FIG. 3 is a block diagram showing the configuration of a speech coding apparatus according to Embodiment 2 of the present invention.

The speech coding apparatus 200 in FIG. 3 has the same basic configuration as the speech coding apparatus 100 described in Embodiment 1. Components in this embodiment that are the same as those described in Embodiment 1 are therefore given the same reference numerals as in Embodiment 1, and their detailed description is omitted.

The transmission encoded data output from the speech coding apparatus 200 can be decoded by a speech decoding apparatus having the same basic configuration as the speech decoding apparatus 150 described in Embodiment 1.

The speech coding apparatus 200 has a core layer encoding unit 102 and an enhancement layer encoding unit 202. The enhancement layer encoding unit 202 has a first channel encoding unit 122, a second channel encoding unit 124, a switch unit 126, and an encoding channel selection unit 210.

The encoding channel selection unit 210 has a second channel decoded speech generation unit 212, a first channel decoded speech generation unit 214, a first distortion calculation unit 216, a second distortion calculation unit 218, and an encoding channel determination unit 220.

The second channel decoded speech generation unit 212 uses the monaural decoded speech signal obtained by the monaural signal encoding unit 112 and the first channel decoded speech signal obtained by the first channel encoding unit 122 to generate, based on the relationship shown in Equation (1) above, a second channel decoded speech signal as a second channel estimated signal. The generated second channel decoded speech signal is output to the first distortion calculation unit 216.

The first channel decoded speech generation unit 214 uses the monaural decoded speech signal obtained by the monaural signal encoding unit 112 and the second channel decoded speech signal obtained by the second channel encoding unit 124 to generate, based on the relationship shown in Equation (1) above, a first channel decoded speech signal as a first channel estimated signal. The generated first channel decoded speech signal is output to the second distortion calculation unit 218.

The combination of the second channel decoded speech generation unit 212 and the first channel decoded speech generation unit 214 described above constitutes an estimated signal generation unit.

The first distortion calculation unit 216 calculates a first coding distortion using the first channel decoded speech signal obtained by the first channel encoding unit 122 and the second channel decoded speech signal obtained by the second channel decoded speech generation unit 212. The first coding distortion corresponds to the coding distortion for two channels that arises when the first channel is selected as the channel to be encoded in the enhancement layer. The calculated first coding distortion is output to the encoding channel determination unit 220.

The second distortion calculation unit 218 calculates a second coding distortion using the second channel decoded speech signal obtained by the second channel encoding unit 124 and the first channel decoded speech signal obtained by the first channel decoded speech generation unit 214. The second coding distortion corresponds to the coding distortion for two channels that arises when the second channel is selected as the channel to be encoded in the enhancement layer. The calculated second coding distortion is output to the encoding channel determination unit 220.

Two example methods of calculating the coding distortion for two channels (the first coding distortion or the second coding distortion) are as follows. One is to take, as the coding distortion for two channels, the two-channel average of the ratio of the error power of each channel's decoded speech signal (the first channel decoded speech signal or the second channel decoded speech signal) to the corresponding input speech signal (the first channel input speech signal or the second channel input speech signal), that is, the signal-to-coding-distortion ratio. The other is to take the two-channel sum of the error power described above as the coding distortion for two channels.
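The two calculation methods can be sketched as below. This is an illustration rather than the patented procedure: the ratio is written here as error power divided by input signal power (the inverse of a signal-to-distortion ratio) so that a smaller value always means less distortion, and the small constant `eps` guarding against silent frames is an added assumption. All function names are hypothetical.

```python
def error_power(original, decoded):
    """Sum of squared sample errors over one frame."""
    return sum((o - d) ** 2 for o, d in zip(original, decoded))


def signal_power(signal):
    """Sum of squared samples over one frame."""
    return sum(x * x for x in signal)


def distortion_sum(s_ch1, s_ch2, sd_ch1, sd_ch2):
    """Second method: error power summed over both channels."""
    return error_power(s_ch1, sd_ch1) + error_power(s_ch2, sd_ch2)


def distortion_mean_ratio(s_ch1, s_ch2, sd_ch1, sd_ch2, eps=1e-12):
    """First method: two-channel average of error power relative to input power."""
    r1 = error_power(s_ch1, sd_ch1) / (signal_power(s_ch1) + eps)
    r2 = error_power(s_ch2, sd_ch2) / (signal_power(s_ch2) + eps)
    return (r1 + r2) / 2.0
```

The ratio-based measure normalizes each channel by its own level, so a quiet channel counts as much as a loud one, whereas the summed error power is dominated by whichever channel has the larger absolute error.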

The combination of the first distortion calculation unit 216 and the second distortion calculation unit 218 described above constitutes a distortion calculation unit. The combination of this distortion calculation unit and the estimated signal generation unit described above constitutes a calculation unit.

The encoding channel determination unit 220 compares the value of the first coding distortion with the value of the second coding distortion, and selects whichever of the two is smaller. The encoding channel determination unit 220 selects the channel corresponding to the selected coding distortion as the channel to be encoded in the enhancement layer (the coding channel), and generates encoded channel selection information indicating the selected channel. More specifically, the encoding channel determination unit 220 selects the first channel when the first coding distortion is smaller than the second coding distortion, and selects the second channel when the second coding distortion is smaller than the first coding distortion. The generated encoded channel selection information is output to the switch unit 126, and is also multiplexed with the core layer encoded data and the enhancement layer encoded data.
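Putting the estimation, distortion calculation and comparison together, the per-frame decision of the encoding channel selection unit 210 might be sketched as follows. The sketch assumes the averaging form of Equation (1), uses summed error power as the distortion measure, and resolves ties in favor of the first channel; the source does not specify tie handling, and the function names are hypothetical.

```python
def error_power(original, decoded):
    """Sum of squared sample errors over one frame."""
    return sum((o - d) ** 2 for o, d in zip(original, decoded))


def estimate_other(sd_mono, sd_coded):
    """Estimate the non-coded channel, assuming mono is the channel average."""
    return [2.0 * m - c for m, c in zip(sd_mono, sd_coded)]


def choose_channel(s_ch1, s_ch2, sd_mono, sd_ch1, sd_ch2):
    """Return (selected_channel, first_distortion, second_distortion)."""
    # Case 1: ch1 is coded in the enhancement layer, ch2 is estimated.
    est_ch2 = estimate_other(sd_mono, sd_ch1)
    d1 = error_power(s_ch1, sd_ch1) + error_power(s_ch2, est_ch2)
    # Case 2: ch2 is coded in the enhancement layer, ch1 is estimated.
    est_ch1 = estimate_other(sd_mono, sd_ch2)
    d2 = error_power(s_ch2, sd_ch2) + error_power(s_ch1, est_ch1)
    # Assumption: a tie selects the first channel.
    selected = 1 if d1 <= d2 else 2
    return selected, d1, d2
```

Note that both channel encoders have to run for every frame before the decision can be made, which is the cost of this closed-loop selection.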

このように、本実施の形態によれば、符号化チャネルの選択基準として、符号化歪みの大きさを使用するため、拡張レイヤの符号化歪みを低減することができ、効率的にステレオ音声を符号化することができる。 Thus, according to the present embodiment, since the magnitude of the coding distortion is used as the coding channel selection criterion, the coding distortion of the enhancement layer can be reduced, and stereo speech can be encoded efficiently.

なお、本実施の形態では、対応する入力音声信号に対する各チャネルの復号音声信号の誤差パワーの比または総和を算出し、この算出結果を符号化歪みとして用いているが、その代わりに、第１ｃｈ符号化部１２２および第２ｃｈ符号化部１２４での符号化の過程で得られる符号化歪みを用いても良い。また、この符号化歪みは、聴覚重み付きの歪みであっても良い。 In the present embodiment, the ratio or sum of the error power of the decoded speech signal of each channel with respect to the corresponding input speech signal is calculated, and this calculation result is used as the coding distortion. Instead, the coding distortion obtained in the course of encoding in the first channel encoding unit 122 and the second channel encoding unit 124 may be used. The coding distortion may also be a perceptually weighted distortion.

（実施の形態３） (Embodiment 3)

図４は、本発明の実施の形態３に係る音声符号化装置の構成を示すブロック図である。なお、図４の音声符号化装置３００は、前述した実施の形態で説明した音声符号化装置１００、２００と同様の基本的構成を有する。よって、本実施の形態で説明する構成要素のうち前述の実施の形態で説明したものと同様のものについては、前述の実施の形態で用いたものと同一の参照符号を付し、その詳細な説明を省略する。 FIG. 4 is a block diagram showing the configuration of the speech coding apparatus according to Embodiment 3 of the present invention. Note that speech coding apparatus 300 in FIG. 4 has the same basic configuration as speech coding apparatuses 100 and 200 described in the above-described embodiments. Therefore, among the components described in this embodiment, those that are the same as components described in the above embodiments are denoted by the same reference numerals as used there, and their detailed description is omitted.

また、音声符号化装置３００から出力される送信符号化データは、実施の形態１で説明した音声復号化装置１５０と同様の基本的構成を有する音声復号化装置において復号することができる。 Also, transmission encoded data output from speech coding apparatus 300 can be decoded by a speech decoding apparatus having the same basic configuration as speech decoding apparatus 150 described in the first embodiment.

音声符号化装置３００は、コアレイヤ符号化部１０２および拡張レイヤ符号化部３０２を有する。拡張レイヤ符号化部３０２は、符号化チャネル選択部３１０、第１ｃｈ符号化部３１２、第２ｃｈ符号化部３１４およびスイッチ部１２６を有する。 Speech encoding apparatus 300 includes core layer encoding section 102 and enhancement layer encoding section 302. The enhancement layer encoding unit 302 includes an encoding channel selection unit 310, a first channel encoding unit 312, a second channel encoding unit 314, and a switch unit 126.

符号化チャネル選択部３１０は、図５に示すように、第１ｃｈチャネル内相関度算出部３２０、第２ｃｈチャネル内相関度算出部３２２および符号化チャネル決定部３２４を有する。 As illustrated in FIG. 5, the encoding channel selection unit 310 includes a first channel intra-channel correlation calculation unit 320, a second channel intra-channel correlation calculation unit 322, and an encoding channel determination unit 324.

第１ｃｈチャネル内相関度算出部３２０は、第１ｃｈ入力音声信号に対する正規化最大自己相関係数値を用いて、第１チャネルのチャネル内相関度cor1を算出する。 The first channel intra-channel correlation degree calculation unit 320 calculates the first channel intra-channel correlation degree cor1 using the normalized maximum autocorrelation coefficient value for the first channel input speech signal.

第２ｃｈチャネル内相関度算出部３２２は、第２ｃｈ入力音声信号に対する正規化最大自己相関係数値を用いて、第２チャネルのチャネル内相関度cor2を算出する。 The second channel intra-channel correlation calculation unit 322 calculates the intra-channel correlation degree cor2 of the second channel using the normalized maximum autocorrelation coefficient value for the second channel input speech signal.

なお、各チャネルのチャネル内相関度の算出には、各チャネルの入力音声信号に対する正規化最大自己相関係数値を用いる代わりに、各チャネルの入力音声信号に対するピッチ予測ゲイン値を用いたり、ＬＰＣ（Linear Prediction Coding）予測残差信号に対する正規化最大自己相関係数値およびピッチ予測ゲイン値を用いたりすることができる。 For calculating the intra-channel correlation for each channel, instead of using the normalized maximum autocorrelation coefficient value for the input speech signal of each channel, the pitch prediction gain value for the input speech signal of each channel is used, or the LPC ( Linear Prediction Coding) normalized maximum autocorrelation coefficient value and pitch prediction gain value for the prediction residual signal can be used.

符号化チャネル決定部３２４は、チャネル内相関度cor1、cor2を相互比較し、これらのうち、より高い値を有するものを選択する。符号化チャネル決定部３２４は、選択されたチャネル内相関度に対応するチャネルを、拡張レイヤでの符号化チャネルとして選択し、選択されたチャネルを示す符号化チャネル選択情報を生成する。より具体的には、符号化チャネル決定部３２４は、チャネル内相関度cor1がチャネル内相関度cor2よりも高い場合、第１チャネルを選択し、チャネル内相関度cor2がチャネル内相関度cor1よりも高い場合、第２チャネルを選択する。生成された符号化チャネル選択情報は、スイッチ部１２６に出力されるとともに、コアレイヤ符号化データおよび拡張レイヤ符号化データと多重される。 The coding channel determination unit 324 compares the intra-channel correlation degrees cor1 and cor2 with each other, and selects the one having the higher value. The coding channel determination unit 324 selects the channel corresponding to the selected intra-channel correlation degree as the coding channel in the enhancement layer, and generates coding channel selection information indicating the selected channel. More specifically, the coding channel determination unit 324 selects the first channel when the intra-channel correlation degree cor1 is higher than the intra-channel correlation degree cor2, and selects the second channel when the intra-channel correlation degree cor2 is higher than the intra-channel correlation degree cor1. The generated coding channel selection information is output to the switch unit 126 and multiplexed with the core layer encoded data and the enhancement layer encoded data.
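As an illustration only, the following Python sketch computes a normalized maximum autocorrelation value for each channel and picks the channel with the higher value. The lag search range, the exact normalization, the function names, and the tie-breaking toward the first channel are assumptions; this description does not fix them.

```python
import math


def normalized_max_autocorr(frame, min_lag, max_lag):
    """Largest normalized autocorrelation of `frame` over the lag range.

    Assumption: each lag is normalized by the energies of the two
    overlapping segments, so the result lies in [-1, 1].
    """
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        idx = range(lag, len(frame))
        num = sum(frame[n] * frame[n - lag] for n in idx)
        e_cur = sum(frame[n] ** 2 for n in idx)
        e_del = sum(frame[n - lag] ** 2 for n in idx)
        den = math.sqrt(e_cur * e_del)
        if den > 0.0:
            best = max(best, num / den)
    return best


def select_by_correlation(cor1, cor2):
    """Return 1 or 2: the channel with the higher intra-channel correlation.

    Assumption: a tie is resolved in favor of channel 1.
    """
    return 1 if cor1 >= cor2 else 2
```

A strongly periodic (voiced) frame yields a value close to 1, so that channel is the one favored for intra-channel prediction.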

第１ｃｈ符号化部３１２および第２ｃｈ符号化部３１４は、互いに同様の内部構成を有する。よって、説明の簡略化のために、第１ｃｈ符号化部３１２および第２ｃｈ符号化部３１４のうちいずれか一方を「第Ａｃｈ符号化部３３０」として示し、その内部構成について図６を用いて説明する。なお、「Ａｃｈ」の「Ａ」は１または２を表す。また、図中においておよび以下の説明において用いられる「Ｂ」も１または２を表す。但し、「Ａ」が１の場合「Ｂ」は２であり、「Ａ」が２の場合「Ｂ」は１である。 The first channel encoding unit 312 and the second channel encoding unit 314 have the same internal configuration. Therefore, for simplification of description, either one of the first channel encoding unit 312 and the second channel encoding unit 314 is referred to as the "Ach encoding unit 330", and its internal configuration will be described with reference to FIG. 6. Note that "A" in "Ach" represents 1 or 2. "B", used in the drawings and in the following description, also represents 1 or 2. However, when "A" is 1, "B" is 2, and when "A" is 2, "B" is 1.

第Ａｃｈ符号化部３３０は、スイッチ部３３２、第Ａｃｈ信号チャネル内予測部３３４、減算器３３６、３３８、第Ａｃｈ予測残差信号符号化部３４０および第Ｂｃｈ推定信号生成部３４２を有する。 The Ach encoding unit 330 includes a switch unit 332, an Ach signal intra-channel prediction unit 334, subtracters 336 and 338, an Ach prediction residual signal encoding unit 340, and a Bch estimation signal generation unit 342.

スイッチ部３３２は、第Ａｃｈ予測残差信号符号化部３４０によって得られた第Ａｃｈ復号音声信号、または、第Ｂｃｈ符号化部（図示せず）によって得られた第Ａｃｈ推定信号を、符号化チャネル選択情報に従って第Ａｃｈ信号チャネル内予測部３３４に出力する。具体的には、選択されたチャネルが第Ａチャネルの場合は、第Ａｃｈ復号音声信号が第Ａｃｈ信号チャネル内予測部３３４に出力され、選択されたチャネルが第Ｂチャネルの場合は、第Ａｃｈ推定信号が第Ａｃｈ信号チャネル内予測部３３４に出力される。 The switch unit 332 outputs either the Ach decoded speech signal obtained by the Ach prediction residual signal encoding unit 340 or the Ach estimation signal obtained by the Bch encoding unit (not shown) to the Ach signal intra-channel prediction unit 334, in accordance with the coding channel selection information. Specifically, when the selected channel is the A-th channel, the Ach decoded speech signal is output to the Ach signal intra-channel prediction unit 334, and when the selected channel is the B-th channel, the Ach estimation signal is output to the Ach signal intra-channel prediction unit 334.

第Ａｃｈ信号チャネル内予測部３３４は、第Ａチャネルのチャネル内予測を行う。チャネル内予測は、チャネル内の信号の相関性を利用して過去のフレームの信号から現在のフレームの信号を予測するものである。チャネル内予測の結果として、チャネル内予測信号Sp(n)およびチャネル内予測パラメータ量子化符号が得られる。例えば１次のピッチ予測フィルタを用いる場合、チャネル内予測信号Sp(n)は、次の式（４）によって算出される。 The Ach signal intra-channel prediction unit 334 performs intra-channel prediction of the A-th channel. Intra-channel prediction predicts the signal of the current frame from the signal of past frames by using the correlation of signals within the channel. As a result of the intra-channel prediction, an intra-channel prediction signal Sp(n) and an intra-channel prediction parameter quantization code are obtained. For example, when a first-order pitch prediction filter is used, the intra-channel prediction signal Sp(n) is calculated by the following equation (4).

$$S_p(n) = g_p \cdot S_{in}(n - T) \qquad (4)$$

ここで、Sin(n)はピッチ予測フィルタへの入力信号、Ｔはピッチ予測フィルタのラグ、ｇｐはピッチ予測フィルタのピッチ予測係数である。 Here, Sin(n) is the input signal to the pitch prediction filter, T is the lag of the pitch prediction filter, and gp is the pitch prediction coefficient of the pitch prediction filter.
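A first-order pitch prediction filter is commonly written as Sp(n) = gp · Sin(n − T). The following Python sketch applies that form to a buffer of past samples. The function name, the buffer layout (most recent sample last), and the handling of lags shorter than the frame (already predicted samples are reused) are assumptions made for illustration.

```python
def pitch_predict(history, lag, gain, frame_len):
    """Predict `frame_len` samples as gain * (sample `lag` positions earlier).

    `history` holds past samples, most recent last. Assumption: when
    `lag` is shorter than the frame, predicted samples are fed back.
    """
    if lag < 1 or lag > len(history):
        raise ValueError("lag must be between 1 and len(history)")
    buf = list(history)
    start = len(buf)
    predicted = []
    for n in range(frame_len):
        sample = gain * buf[start + n - lag]
        predicted.append(sample)
        buf.append(sample)
    return predicted
```

In an encoder, T and gp would be searched so that the prediction residual is minimized, and their quantized values would form the intra-channel prediction parameter quantization code.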

前述した過去のフレームの信号は、第Ａｃｈ信号チャネル内予測部３３４の内部に設けられたチャネル内予測バッファ（第Ａｃｈチャネル内予測バッファ）に保持される。また、第Ａｃｈチャネル内予測バッファは、次フレームの信号の予測のために、スイッチ部３３２から入力された信号で更新される。チャネル内予測バッファの更新の詳細については後述する。 The above-mentioned past frame signals are held in an intra-channel prediction buffer (the Ach intra-channel prediction buffer) provided inside the Ach signal intra-channel prediction unit 334. The Ach intra-channel prediction buffer is updated with the signal input from the switch unit 332 so that the signal of the next frame can be predicted. Details of the update of the intra-channel prediction buffer will be described later.

減算器３３６は、第Ａｃｈ入力音声信号からモノラル復号音声信号を減算する。減算器３３８は、減算器３３６での減算によって得られた信号から、第Ａｃｈ信号チャネル内予測部３３４でのチャネル内予測によって得られたチャネル内予測信号Sp(n)を減算する。減算器３３８での減算によって得られた信号、すなわち第Ａｃｈ予測残差信号は、第Ａｃｈ予測残差信号符号化部３４０に出力される。 The subtracter 336 subtracts the monaural decoded audio signal from the Ach input audio signal. The subtracter 338 subtracts the intra-channel prediction signal Sp (n) obtained by the intra-channel prediction in the Ach signal intra-channel prediction unit 334 from the signal obtained by the subtraction in the subtracter 336. The signal obtained by the subtraction in the subtracter 338, that is, the Ach prediction residual signal is output to the Ach prediction residual signal encoding unit 340.

第Ａｃｈ予測残差信号符号化部３４０は、第Ａｃｈ予測残差信号を任意の符号化方式で符号化する。この符号化によって、予測残差符号化データおよび第Ａｃｈ復号音声信号が得られる。予測残差符号化データは、チャネル内予測パラメータ量子化符号とともに、第Ａｃｈ符号化データとして出力される。第Ａｃｈ復号音声信号は、第Ｂｃｈ推定信号生成部３４２およびスイッチ部３３２に出力される。 The Ach prediction residual signal encoding unit 340 encodes the Ach prediction residual signal using an arbitrary encoding method. By this encoding, prediction residual encoded data and the Ach decoded speech signal are obtained. The prediction residual encoded data is output as the Ach encoded data together with the intra-channel prediction parameter quantization code. The Ach decoded speech signal is output to the Bch estimated signal generation unit 342 and the switch unit 332.

第Ｂｃｈ推定信号生成部３４２は、第Ａｃｈ復号音声信号およびモノラル復号音声信号から、第Ａチャネル符号化時の第Ｂｃｈ復号音声信号として第Ｂｃｈ推定信号を生成する。生成された第Ｂｃｈ推定信号は、図示されない第Ｂｃｈ符号化部のスイッチ部（スイッチ部３３２と同様）に出力される。 B-th channel estimation signal generation section 342 generates a B-th channel estimation signal from the A-channel decoded speech signal and monaural decoded speech signal as a B-channel decoded speech signal at the time of A-th channel coding. The generated Bch estimation signal is output to a switch unit (similar to the switch unit 332) of the Bch encoding unit (not shown).

次いで、チャネル内予測バッファの更新動作について説明する。ここでは、符号化チャネル選択部３１０によって第Ａチャネルが選択された場合を例にとり、第Ａチャネルのチャネル内予測バッファの更新動作例を図７を用いて説明し、第Ｂチャネルのチャネル内予測バッファの更新動作例を図８を用いて説明する。 Next, the update operation of the intra-channel prediction buffer will be described. Here, taking as an example the case where the A-th channel is selected by the coding channel selection unit 310, an example of the update operation of the intra-channel prediction buffer for the A-th channel will be described with reference to FIG. 7, and an example of the update operation of the intra-channel prediction buffer for the B-th channel will be described with reference to FIG. 8.

図７に示す動作例では、第Ａｃｈ予測残差信号符号化部３４０によって得られた、第ｉフレーム（ｉは任意の自然数）の第Ａｃｈ復号音声信号を用いて、第Ａｃｈ信号チャネル内予測部３３４の内部の第Ａｃｈチャネル内予測バッファ３５１が更新される（ＳＴ１０１）。そして、更新された第Ａｃｈチャネル内予測バッファ３５１は、次フレームである第ｉ＋１フレームについてのチャネル内予測に用いられる（ＳＴ１０２）。 In the operation example illustrated in FIG. 7, the Ach intra-channel prediction buffer 351 inside the Ach signal intra-channel prediction unit 334 is updated using the Ach decoded speech signal of the i-th frame (i is an arbitrary natural number) obtained by the Ach prediction residual signal encoding unit 340 (ST101). The updated Ach intra-channel prediction buffer 351 is then used for intra-channel prediction for the (i+1)-th frame, which is the next frame (ST102).

図８に示す動作例では、第ｉフレームの第Ａｃｈ復号音声信号および第ｉフレームのモノラル復号音声信号を用いて、第ｉフレームの第Ｂｃｈ推定信号が生成される（ＳＴ２０１）。生成された第Ｂｃｈ推定信号は、第Ａｃｈ符号化部３３０から図示されない第Ｂｃｈ符号化部に出力される。そして、第Ｂｃｈ符号化部において、第Ｂｃｈ推定信号は、スイッチ部（スイッチ部３３２と同様）を経由して第Ｂｃｈ信号チャネル内予測部（第Ａｃｈ信号チャネル内予測部３３４と同様）に出力される。第Ｂｃｈ信号チャネル内予測部の内部に設けられた第Ｂｃｈチャネル内予測バッファ３５２は、第Ｂｃｈ推定信号によって更新される（ＳＴ２０２）。そして、更新された第Ｂｃｈチャネル内予測バッファ３５２は、第ｉ＋１フレームについてのチャネル内予測に用いられる（ＳＴ２０３）。 In the operation example shown in FIG. 8, the Bch estimation signal of the i-th frame is generated using the Ach decoded speech signal of the i-th frame and the monaural decoded speech signal of the i-th frame (ST201). The generated Bch estimation signal is output from the Ach encoding unit 330 to the Bch encoding unit (not shown). In the Bch encoding unit, the Bch estimation signal is output to the Bch signal intra-channel prediction unit (similar to the Ach signal intra-channel prediction unit 334) via a switch unit (similar to the switch unit 332). The Bch intra-channel prediction buffer 352 provided inside the Bch signal intra-channel prediction unit is updated with the Bch estimation signal (ST202). The updated Bch intra-channel prediction buffer 352 is then used for intra-channel prediction for the (i+1)-th frame (ST203).
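The two buffer updates (ST101 and ST201 to ST202) can be sketched in Python as follows. The sketch assumes that the monaural signal is the average of the two channel signals, so that the B-th channel estimate is twice the monaural decoded signal minus the A-th channel decoded signal; that relation, the fixed-length shift buffer, and the function names are assumptions for illustration.

```python
def estimate_other_channel(decoded_a, decoded_mono):
    """Bch estimation signal from the Ach and monaural decoded signals.

    Assumption: mono = (chA + chB) / 2, hence chB = 2 * mono - chA.
    """
    return [2.0 * m - a for a, m in zip(decoded_a, decoded_mono)]


def update_buffer(buffer, new_frame):
    """Shift `new_frame` into a fixed-length buffer (oldest samples dropped)."""
    combined = list(buffer) + list(new_frame)
    return combined[len(new_frame):]


def update_after_coding_channel_a(buffer_a, buffer_b, decoded_a, decoded_mono):
    """Update both intra-channel prediction buffers when channel A was coded."""
    new_a = update_buffer(buffer_a, decoded_a)             # ST101
    estimated_b = estimate_other_channel(decoded_a, decoded_mono)  # ST201
    new_b = update_buffer(buffer_b, estimated_b)           # ST202
    return new_a, new_b
```

Both returned buffers would then serve intra-channel prediction for frame i+1, whichever channel is selected in that frame.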

あるフレームにおいて、第Ａチャネルが符号化チャネルとして選択された場合、第Ｂｃｈ符号化部では、第Ｂｃｈチャネル内予測バッファ３５２の更新動作以外の動作は要求されないため、そのフレームにおいては第Ｂｃｈ入力音声信号の符号化を休止することができる。 When the A-th channel is selected as the coding channel in a certain frame, the B-th channel encoding unit does not require any operation other than the update operation of the intra-B-channel prediction buffer 352. Signal encoding can be paused.

このように、本実施の形態によれば、符号化チャネルの選択基準として、チャネル内相関度の高さを使用するため、チャネル内相関度が高いチャネルの信号を符号化することができ、チャネル内予測による符号化効率を向上させることができる。 As described above, according to the present embodiment, since the level of intra-channel correlation is used as the selection criterion for the coding channel, the signal of the channel having the higher intra-channel correlation can be encoded, and the coding efficiency obtained by intra-channel prediction can be improved.

なお、音声符号化装置３００の構成に、チャネル間予測を実行する構成要素を加えることもできる。この場合、音声符号化装置３００は、モノラル復号音声信号を減算器３３６に入力する代わりに、モノラル復号音声信号を用いて第Ａｃｈ音声信号を予測するチャネル間予測を行い、それによって生成されたチャネル間予測信号を減算器３３６に入力する構成を、採用することができる。 Note that a component that performs inter-channel prediction can be added to the configuration of the speech encoding apparatus 300. In this case, the speech encoding apparatus 300 can adopt a configuration in which, instead of inputting the monaural decoded speech signal to the subtracter 336, inter-channel prediction that predicts the Ach speech signal from the monaural decoded speech signal is performed, and the inter-channel prediction signal generated thereby is input to the subtracter 336.

（実施の形態４） (Embodiment 4)

図９は、本発明の実施の形態４に係る音声符号化装置の構成を示すブロック図である。 FIG. 9 is a block diagram showing the configuration of the speech coding apparatus according to Embodiment 4 of the present invention.

なお、図９の音声符号化装置４００は、前述の実施の形態で説明した音声符号化装置１００、２００、３００と同様の基本的構成を有する。よって、本実施の形態で説明する構成要素のうち前述の実施の形態で説明したものと同様のものについては、前述の実施の形態で用いたものと同一の参照符号を付し、その詳細な説明を省略する。 Note that speech encoding apparatus 400 in FIG. 9 has the same basic configuration as speech encoding apparatuses 100, 200, and 300 described in the above embodiments. Therefore, among the components described in this embodiment, those that are the same as components described in the above embodiments are denoted by the same reference numerals as used there, and their detailed description is omitted.

また、音声符号化装置４００から出力される送信符号化データは、実施の形態１で説明した音声復号化装置１５０と同様の基本的構成を有する音声復号化装置において復号することができる。 Also, transmission encoded data output from speech coding apparatus 400 can be decoded by a speech decoding apparatus having the same basic configuration as speech decoding apparatus 150 described in the first embodiment.

音声符号化装置４００は、コアレイヤ符号化部４０２および拡張レイヤ符号化部４０４を有する。コアレイヤ符号化部４０２は、モノラル信号生成部１１０およびモノラル信号ＣＥＬＰ（Code Excited Linear Prediction）符号化部４１０を有する。拡張レイヤ符号化部４０４は、符号化チャネル選択部３１０、第１ｃｈＣＥＬＰ符号化部４２２、第２ｃｈＣＥＬＰ符号化部４２４およびスイッチ部１２６を有する。 Speech encoding apparatus 400 includes core layer encoding section 402 and enhancement layer encoding section 404. The core layer encoding unit 402 includes a monaural signal generation unit 110 and a monaural signal CELP (Code Excited Linear Prediction) encoding unit 410. The enhancement layer encoding unit 404 includes an encoding channel selection unit 310, a first ch CELP encoding unit 422, a second ch CELP encoding unit 424, and a switch unit 126.

コアレイヤ符号化部４０２において、モノラル信号ＣＥＬＰ符号化部４１０は、モノラル信号生成部１１０によって生成されたモノラル信号に対してＣＥＬＰ符号化を行う。この符号化によって得られた符号化データは、コアレイヤ符号化データとして出力される。また、この符号化によって、モノラル駆動音源信号が得られる。さらに、モノラル信号ＣＥＬＰ符号化部４１０は、モノラル信号を復号し、それによって得られるモノラル復号音声信号を出力する。コアレイヤ符号化データは、拡張レイヤ符号化データおよび符号化チャネル選択情報と多重される。また、コアレイヤ符号化データ、モノラル駆動音源信号およびモノラル復号音声信号は、第１ｃｈＣＥＬＰ符号化部４２２および第２ｃｈＣＥＬＰ符号化部４２４に出力される。 In the core layer encoding unit 402, the monaural signal CELP encoding unit 410 performs CELP encoding on the monaural signal generated by the monaural signal generation unit 110. The encoded data obtained by this encoding is output as core layer encoded data. In addition, a monaural driving sound source signal is obtained by this encoding. Further, the monaural signal CELP encoding unit 410 decodes the monaural signal and outputs a monaural decoded audio signal obtained thereby. The core layer encoded data is multiplexed with enhancement layer encoded data and encoded channel selection information. Also, the core layer encoded data, the monaural driving excitation signal, and the monaural decoded speech signal are output to the first ch CELP encoding unit 422 and the second ch CELP encoding unit 424.

拡張レイヤ符号化部４０４において、第１ｃｈＣＥＬＰ符号化部４２２および第２ｃｈＣＥＬＰ符号化部４２４は、互いに同様の内部構成を有する。よって、説明の簡略化のために、第１ｃｈＣＥＬＰ符号化部４２２および第２ｃｈＣＥＬＰ符号化部４２４のうちいずれか一方を「第ＡｃｈＣＥＬＰ符号化部４３０」として示し、その内部構成について図１０を用いて説明する。なお、前述したように、「Ａｃｈ」の「Ａ」は１または２を表し、図中においておよび以下の説明において用いられる「Ｂ」も１または２を表し、「Ａ」が１の場合「Ｂ」は２であり、「Ａ」が２の場合「Ｂ」は１である。 In enhancement layer encoding section 404, first ch CELP encoding section 422 and second ch CELP encoding section 424 have the same internal configuration. Therefore, for simplification of description, either one of the first ch CELP encoding unit 422 and the second ch CELP encoding unit 424 is referred to as the "Ach CELP encoding unit 430", and its internal configuration will be described with reference to FIG. 10. As described above, "A" in "Ach" represents 1 or 2, "B" used in the drawings and in the following description also represents 1 or 2, and when "A" is 1, "B" is 2, and when "A" is 2, "B" is 1.

第ＡｃｈＣＥＬＰ符号化部４３０は、第ＡｃｈＬＰＣ（Linear Prediction Coding）分析部４３１、乗算器４３２、４３３、４３４、４３５、４３６、スイッチ部４３７、第Ａｃｈ適応符号帳４３８、第Ａｃｈ固定符号帳４３９、加算器４４０、合成フィルタ４４１、聴覚重み付け部４４２、歪最小化部４４３、第Ａｃｈ復号部４４４、第Ｂｃｈ推定信号生成部４４５、第ＡｃｈＬＰＣ分析部４４６、第ＡｃｈＬＰＣ予測残差信号生成部４４７および減算器４４８を有する。 The Ach CELP encoding unit 430 includes an Ach LPC (Linear Prediction Coding) analysis unit 431, multipliers 432, 433, 434, 435, and 436, a switch unit 437, an Ach adaptive codebook 438, an Ach fixed codebook 439, an adder 440, a synthesis filter 441, a perceptual weighting unit 442, a distortion minimizing unit 443, an Ach decoding unit 444, a Bch estimation signal generation unit 445, an Ach LPC analysis unit 446, an Ach LPC prediction residual signal generation unit 447, and a subtractor 448.

第ＡｃｈＣＥＬＰ符号化部４３０において、第ＡｃｈＬＰＣ分析部４３１は、第Ａｃｈ入力音声信号に対するＬＰＣ分析を行い、それによって得られた第ＡｃｈＬＰＣパラメータを量子化する。第ＡｃｈＬＰＣ分析部４３１は、第ＡｃｈＬＰＣパラメータとモノラル信号に対するＬＰＣパラメータとの相関が一般に高いことを利用して、ＬＰＣパラメータの量子化に際して、コアレイヤ符号化データからモノラル信号量子化ＬＰＣパラメータを復号し、復号されたモノラル信号量子化ＬＰＣパラメータに対する第ＡｃｈＬＰＣパラメータの差分成分を量子化して、第ＡｃｈＬＰＣ量子化符号を得る。第ＡｃｈＬＰＣ量子化符号は、合成フィルタ４４１に出力される。また、第ＡｃｈＬＰＣ量子化符号は、後述の第Ａｃｈ駆動音源符号化データとともに第Ａｃｈ符号化データとして出力される。差分成分の量子化を行うことにより、拡張レイヤのＬＰＣパラメータの量子化を効率化することができる。 In the AchCELP encoding unit 430, the AchLPC analysis unit 431 performs LPC analysis on the Ach input speech signal, and quantizes the AchLPC parameters obtained thereby. The AchLPC analysis unit 431 uses the fact that the correlation between the AchLPC parameter and the LPC parameter for the monaural signal is generally high, and decodes the monaural signal quantized LPC parameter from the core layer encoded data when the LPC parameter is quantized. The difference component of the AchLPC parameter with respect to the decoded monaural signal quantization LPC parameter is quantized to obtain the AchLPC quantized code. The Ach LPC quantization code is output to the synthesis filter 441. The Ach LPC quantization code is output as Ach encoded data together with Ach drive excitation encoded data described later. By performing the quantization of the difference component, the quantization of the LPC parameters of the enhancement layer can be made efficient.

第ＡｃｈＣＥＬＰ符号化部４３０において、第Ａｃｈ駆動音源符号化データは、第Ａｃｈ駆動音源信号のモノラル駆動音源信号に対する残差成分を符号化することによって得られる。この符号化は、ＣＥＬＰ符号化における音源探索によって実現される。 In the AchCELP encoding unit 430, the Ach drive excitation code data is obtained by encoding the residual component of the Ach drive excitation signal with respect to the monaural drive excitation signal. This encoding is realized by sound source search in CELP encoding.

つまり、第ＡｃｈＣＥＬＰ符号化部４３０では、適応音源信号、固定音源信号およびモノラル駆動音源信号に、それぞれに対応するゲインが乗じられ、ゲイン乗算後のこれらの音源信号が加算され、その加算によって得られた駆動音源信号に対して、歪み最小化による閉ループ型音源探索（適応符号帳探索、固定符号帳探索およびゲイン探索）が行われる。そして、適応符号帳インデクス（適応音源インデクス）、固定符号帳インデクス（固定音源インデクス）ならびに適応音源信号、固定音源信号およびモノラル駆動音源信号に対するゲイン符号が、第Ａｃｈ駆動音源符号化データとして出力される。コアレイヤの符号化、拡張レイヤの符号化および符号化チャネルの選択がフレーム毎に行われるのに対し、この音源探索は、フレームを複数の部分に分割することによって得られるサブフレーム毎に行われる。以下、この構成についてより具体的に説明する。 That is, the AchCELP encoding unit 430 multiplies the adaptive excitation signal, the fixed excitation signal, and the monaural driving excitation signal by the corresponding gain, adds these excitation signals after gain multiplication, and obtains the result by addition. A closed-loop type sound source search (adaptive codebook search, fixed codebook search, and gain search) by distortion minimization is performed on the drive sound source signal. Then, the adaptive codebook index (adaptive excitation index), fixed codebook index (fixed excitation index), and the gain code for the adaptive excitation signal, fixed excitation signal, and monaural driving excitation signal are output as the Ach driving excitation encoded data. . While the coding of the core layer, the coding of the enhancement layer, and the selection of the coding channel are performed for each frame, the sound source search is performed for each subframe obtained by dividing the frame into a plurality of parts. Hereinafter, this configuration will be described more specifically.

合成フィルタ４４１は、第ＡｃｈＬＰＣ分析部４３１から出力された第ＡｃｈＬＰＣ量子化符号を用いて、加算器４４０から出力された信号を駆動音源としてＬＰＣ合成フィルタによる合成を行う。この合成によって得られた合成信号は、減算器４４８に出力される。 The synthesis filter 441 uses the AchLPC quantization code output from the AchLPC analysis unit 431 to perform synthesis by the LPC synthesis filter using the signal output from the adder 440 as a driving sound source. A synthesized signal obtained by this synthesis is output to the subtracter 448.

減算器４４８は、第Ａｃｈ入力音声信号から合成信号を減算することにより誤差信号を算出する。誤差信号は、聴覚重み付け部４４２に出力される。誤差信号は、符号化歪みに相当する。 The subtracter 448 calculates an error signal by subtracting the synthesized signal from the Ach input audio signal. The error signal is output to the auditory weighting unit 442. The error signal corresponds to coding distortion.

聴覚重み付け部４４２は、符号化歪み（つまり、前述の誤差信号）に対して聴覚的な重み付けを行い、重み付け後の符号化歪みを歪最小化部４４３に出力する。 The auditory weighting unit 442 performs auditory weighting on the coding distortion (that is, the error signal described above), and outputs the weighted coding distortion to the distortion minimizing unit 443.

歪最小化部４４３は、符号化歪みを最小とするような適応符号帳インデクスおよび固定符号帳インデクスを決定し、適応符号帳インデクスを第Ａｃｈ適応符号帳４３８に、固定符号帳インデクスを第Ａｃｈ固定符号帳４３９に、それぞれ出力する。また、歪最小化部４４３は、それらのインデクスに対応するゲイン、具体的には、後述する適応ベクトルおよび後述する固定ベクトルの各々に対するゲイン（適応符号帳ゲインおよび固定符号帳ゲイン）を生成し、適応符号帳ゲインを乗算器４３３に、固定符号帳ゲインを乗算器４３５に、それぞれ出力する。 The distortion minimizing section 443 determines the adaptive codebook index and the fixed codebook index that minimize the coding distortion, and outputs the adaptive codebook index to the Ach adaptive codebook 438 and the fixed codebook index to the Ach fixed codebook 439. Further, the distortion minimizing unit 443 generates gains corresponding to these indexes, specifically, gains (adaptive codebook gain and fixed codebook gain) for each of an adaptive vector described later and a fixed vector described later, and outputs the adaptive codebook gain to multiplier 433 and the fixed codebook gain to multiplier 435.

また、歪最小化部４４３は、モノラル駆動音源信号、ゲイン乗算後の適応ベクトルおよびゲイン乗算後の固定ベクトルの間でゲインを調整するためのゲイン（第１調整用ゲイン、第２調整用ゲインおよび第３調整用ゲイン）を生成し、第１調整用ゲインを乗算器４３２に、第２調整用ゲインを乗算器４３４に、第３調整用ゲインを乗算器４３６に、それぞれ出力する。これらの調整用ゲインは、好ましくは、相互に関係性を持つように生成される。例えば、第１ｃｈ入力音声信号と第２ｃｈ入力音声信号との間のチャネル間相関が高い場合は、モノラル駆動音源信号の寄与分が、ゲイン乗算後の適応ベクトルおよびゲイン乗算後の固定ベクトルの寄与分に対して相対的に大きくなるように、３つの調整用ゲインが生成される。逆に、チャネル間相関が低い場合は、モノラル駆動音源信号の寄与分がゲイン乗算後の適応ベクトルおよびゲイン乗算後の固定ベクトルの寄与分に対して相対的に小さくなるように、３つの調整用ゲインが生成される。 The distortion minimizing unit 443 also generates gains (a first adjustment gain, a second adjustment gain, and a third adjustment gain) for adjusting the balance among the monaural driving excitation signal, the adaptive vector after gain multiplication, and the fixed vector after gain multiplication, and outputs the first adjustment gain to the multiplier 432, the second adjustment gain to the multiplier 434, and the third adjustment gain to the multiplier 436. These adjustment gains are preferably generated so as to be related to each other. For example, when the inter-channel correlation between the first channel input speech signal and the second channel input speech signal is high, the three adjustment gains are generated so that the contribution of the monaural driving excitation signal is relatively large compared with the contributions of the adaptive vector after gain multiplication and the fixed vector after gain multiplication. Conversely, when the inter-channel correlation is low, the three adjustment gains are generated so that the contribution of the monaural driving excitation signal is relatively small compared with the contributions of the adaptive vector after gain multiplication and the fixed vector after gain multiplication.

また、歪最小化部４４３は、適応符号帳インデクス、固定符号帳インデクス、適応符号帳ゲインの符号、固定符号帳ゲインの符号および３つのゲイン調整用ゲインの符号を、第Ａｃｈ駆動音源符号化データとして出力する。 The distortion minimizing section 443 also outputs the adaptive codebook index, the fixed codebook index, the code of the adaptive codebook gain, the code of the fixed codebook gain, and the codes of the three adjustment gains as the Ach driving excitation encoded data.

第Ａｃｈ適応符号帳４３８は、過去に生成された合成フィルタ４４１への駆動音源の音源ベクトルを内部バッファに記憶している。また、第Ａｃｈ適応符号帳４３８は、記憶されている音源ベクトルから１サブフレーム分のベクトルを適応ベクトルとして生成する。適応ベクトルの生成は、歪最小化部４４３から入力された適応符号帳インデクスに対応する適応符号帳ラグ（ピッチラグまたはピッチ周期）に基づいて行われる。生成された適応ベクトルは、乗算器４３３に出力される。 The Ach adaptive codebook 438 stores the excitation vector of the driving excitation to the synthesis filter 441 generated in the past in the internal buffer. In addition, the Ach adaptive codebook 438 generates a vector for one subframe as an adaptive vector from the stored excitation vector. The generation of the adaptive vector is performed based on the adaptive codebook lag (pitch lag or pitch period) corresponding to the adaptive codebook index input from the distortion minimizing unit 443. The generated adaptation vector is output to the multiplier 433.

第Ａｃｈ適応符号帳４３８の内部バッファは、スイッチ部４３７から出力された信号によって更新される。この更新動作の詳細については後述する。 The internal buffer of the Ach adaptive codebook 438 is updated by the signal output from the switch unit 437. Details of this update operation will be described later.

第Ａｃｈ固定符号帳４３９は、歪最小化部４４３から出力された固定符号帳インデクスに対応する音源ベクトルを、固定ベクトルとして乗算器４３５に出力する。 Ach fixed codebook 439 outputs the excitation vector corresponding to the fixed codebook index output from distortion minimizing section 443 to multiplier 435 as a fixed vector.

乗算器４３３は、第Ａｃｈ適応符号帳４３８から出力された適応ベクトルに適応符号帳ゲインを乗じ、ゲイン乗算後の適応ベクトルを乗算器４３４に出力する。 Multiplier 433 multiplies the adaptive vector output from A-th adaptive codebook 438 by the adaptive codebook gain, and outputs the adaptive vector after gain multiplication to multiplier 434.

乗算器４３５は、第Ａｃｈ固定符号帳４３９から出力された固定ベクトルに固定符号帳ゲインを乗じ、ゲイン乗算後の固定ベクトルを乗算器４３６に出力する。 Multiplier 435 multiplies the fixed vector output from Ach fixed codebook 439 by a fixed codebook gain, and outputs the fixed vector after gain multiplication to multiplier 436.

乗算器４３２は、モノラル駆動音源信号に第１調整用ゲインを乗じ、ゲイン乗算後のモノラル駆動音源信号を加算器４４０に出力する。乗算器４３４は、乗算器４３３から出力された適応ベクトルに第２調整用ゲインを乗じ、ゲイン乗算後の適応ベクトルを加算器４４０に出力する。乗算器４３６は、乗算器４３５から出力された固定ベクトルに第３調整用ゲインを乗じ、ゲイン乗算後の固定ベクトルを加算器４４０に出力する。 Multiplier 432 multiplies the monaural driving sound source signal by the first adjustment gain, and outputs the monaural driving sound source signal after gain multiplication to adder 440. Multiplier 434 multiplies the adaptive vector output from multiplier 433 by the second adjustment gain, and outputs the adaptive vector after gain multiplication to adder 440. Multiplier 436 multiplies the fixed vector output from multiplier 435 by the third adjustment gain, and outputs the fixed vector after gain multiplication to adder 440.

加算器４４０は、乗算器４３２から出力されたモノラル駆動音源信号と、乗算器４３４から出力された適応ベクトルと、乗算器４３６から出力された固定ベクトルと、を加算し、加算後の信号をスイッチ部４３７および合成フィルタ４４１に出力する。 The adder 440 adds the monaural driving sound source signal output from the multiplier 432, the adaptive vector output from the multiplier 434, and the fixed vector output from the multiplier 436, and switches the signal after the addition Output to the unit 437 and the synthesis filter 441.
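The signal path through multipliers 432 to 436 and adder 440 amounts to a weighted sum of three excitation components. A minimal Python sketch follows; the function and parameter names are hypothetical, and the gain values would in practice come from the closed-loop search in the distortion minimizing unit 443.

```python
def build_excitation(mono_exc, adaptive_vec, fixed_vec,
                     g_adaptive, g_fixed, adj1, adj2, adj3):
    """Driving excitation for one subframe.

    adj1 scales the monaural driving excitation (multiplier 432),
    g_adaptive then adj2 scale the adaptive vector (multipliers 433, 434),
    g_fixed then adj3 scale the fixed vector (multipliers 435, 436),
    and the three products are summed (adder 440).
    """
    return [
        adj1 * m + adj2 * (g_adaptive * a) + adj3 * (g_fixed * f)
        for m, a, f in zip(mono_exc, adaptive_vec, fixed_vec)
    ]
```

Raising `adj1` relative to `adj2` and `adj3` corresponds to the high inter-channel-correlation case, in which the monaural driving excitation dominates.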

スイッチ部４３７は、加算器４４０から出力された信号または第ＡｃｈＬＰＣ予測残差信号生成部４４７から出力された信号を、符号化チャネル選択情報に従って第Ａｃｈ適応符号帳４３８に出力する。より具体的には、選択されたチャネルが第Ａチャネルの場合は、加算器４４０からの信号が第Ａｃｈ適応符号帳４３８に出力され、選択されたチャネルが第Ｂチャネルの場合は、第ＡｃｈＬＰＣ予測残差信号生成部４４７からの信号が第Ａｃｈ適応符号帳４３８に出力される。 The switch unit 437 outputs the signal output from the adder 440 or the signal output from the Ach LPC prediction residual signal generation unit 447 to the Ach adaptive codebook 438 in accordance with the coding channel selection information. More specifically, when the selected channel is the A-th channel, the signal from the adder 440 is output to the Ach adaptive codebook 438, and when the selected channel is the B-th channel, the signal from the Ach LPC prediction residual signal generation unit 447 is output to the Ach adaptive codebook 438.

第Ａｃｈ復号部４４４は、第Ａｃｈ符号化データを復号し、それによって得られた第Ａｃｈ復号音声信号を第Ｂｃｈ推定信号生成部４４５に出力する。 The Ach decoding unit 444 decodes the Ach encoded data, and outputs the obtained Ach decoded speech signal to the Bch estimated signal generation unit 445.

第Ｂｃｈ推定信号生成部４４５は、第Ａｃｈ復号音声信号およびモノラル復号音声信号を用いて、第Ａｃｈ符号化時の第Ｂｃｈ復号音声信号として第Ｂｃｈ推定信号を生成する。生成された第Ｂｃｈ推定信号は、第ＢｃｈＣＥＬＰ符号化部（図示せず）に出力される。 Bch estimated signal generation section 445 generates a Bch estimated signal as a Bch decoded speech signal at the time of Ach encoding, using the Ach decoded speech signal and the monaural decoded speech signal. The generated Bch estimation signal is output to a BchCELP encoding unit (not shown).

第ＡｃｈＬＰＣ分析部４４６は、図示されない第ＢｃｈＣＥＬＰ符号化部から出力された第Ａｃｈ推定信号に対してＬＰＣ分析を行い、それによって得られた第ＡｃｈＬＰＣパラメータを、第ＡｃｈＬＰＣ予測残差信号生成部４４７に出力する。ここで、第ＢｃｈＣＥＬＰ符号化部から出力された第Ａｃｈ推定信号は、第ＢｃｈＣＥＬＰ符号化部において第Ｂｃｈ入力音声信号が符号化されたとき（第Ｂｃｈ符号化時）に生成された第Ａｃｈ復号音声信号に相当する。 The Ach LPC analysis unit 446 performs LPC analysis on the Ach estimation signal output from the Bch CELP encoding unit (not shown), and outputs the Ach LPC parameters obtained thereby to the Ach LPC prediction residual signal generation unit 447. Here, the Ach estimation signal output from the Bch CELP encoding unit corresponds to the Ach decoded speech signal generated when the Bch input speech signal is encoded in the Bch CELP encoding unit (that is, at the time of Bch encoding).

第ＡｃｈＬＰＣ予測残差信号生成部４４７は、第ＡｃｈＬＰＣ分析部４４６から出力された第ＡｃｈＬＰＣパラメータを用いて、第Ａｃｈ推定信号に対する符号化ＬＰＣ予測残差信号を生成する。生成された符号化ＬＰＣ予測残差信号は、スイッチ部４３７に出力される。 The AchLPC prediction residual signal generation unit 447 generates an encoded LPC prediction residual signal for the Ach estimation signal using the AchLPC parameter output from the AchLPC analysis unit 446. The generated encoded LPC prediction residual signal is output to the switch unit 437.
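Generating an LPC prediction residual is, in general, inverse filtering of the signal with the LPC coefficients. The Python sketch below assumes the predictor convention e(n) = s(n) − Σ a_k · s(n − k) and treats samples before the frame as zero; the coefficient convention, the zero history, and the function name are assumptions rather than details given in this description.

```python
def lpc_residual(signal, coeffs):
    """LPC prediction residual of `signal`.

    coeffs[k-1] is the predictor coefficient a_k for the sample k
    positions earlier. Assumption: samples before the frame are zero.
    """
    residual = []
    for n in range(len(signal)):
        predicted = sum(
            a * signal[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        residual.append(signal[n] - predicted)
    return residual
```

When the B-th channel is the coding channel, a residual of this kind, computed from the Ach estimation signal, is what the switch unit 437 would pass on to update the Ach adaptive codebook 438.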

次いで、第ＡｃｈＣＥＬＰ符号化部４３０および図示されない第ＢｃｈＣＥＬＰ符号化部での適応符号帳更新動作について説明する。図１１は、符号化チャネル選択部３１０によって第Ａチャネルが選択された場合の、適応符号帳更新動作を示すフロー図である。 Next, the adaptive codebook update operation in the AchCELP encoding unit 430 and the BchCELP encoding unit (not shown) will be described. FIG. 11 is a flowchart showing an adaptive codebook update operation when the channel A is selected by the coding channel selection unit 310.

ここに例示されたフローは、第ＡｃｈＣＥＬＰ符号化部４３０でのＣＥＬＰ符号化処理（ＳＴ３１０）、第ＡｃｈＣＥＬＰ符号化部４３０内の適応符号帳の更新処理（ＳＴ３２０）および第ＢｃｈＣＥＬＰ符号化部内の適応符号帳の更新処理（ＳＴ３３０）に分けられる。また、ステップＳＴ３１０は、２つのステップＳＴ３１１、ＳＴ３１２を含み、ステップＳＴ３３０は、４つのステップＳＴ３３１、ＳＴ３３２、ＳＴ３３３、ＳＴ３３４を含む。 The flow illustrated here is divided into CELP encoding processing in the AchCELP encoding unit 430 (ST310), update processing of the adaptive codebook in the AchCELP encoding unit 430 (ST320), and update processing of the adaptive codebook in the BchCELP encoding unit (ST330). Step ST310 includes two steps, ST311 and ST312, and step ST330 includes four steps, ST331, ST332, ST333, and ST334.

まず、ステップＳＴ３１１では、第ＡｃｈＣＥＬＰ符号化部４３０の第ＡｃｈＬＰＣ分析部４３１によって、ＬＰＣ分析および量子化が行われる。そして、第Ａｃｈ適応符号帳４３８、第Ａｃｈ固定符号帳４３９、乗算器４３２、４３３、４３４、４３５、４３６、加算器４４０、合成フィルタ４４１、減算器４４８、聴覚重み付け部４４２および歪最小化部４４３を主に含む閉ループ型音源探索部によって、音源探索（適応符号帳探索、固定符号帳探索およびゲイン探索）が行われる（ＳＴ３１２）。 First, in step ST311, LPC analysis and quantization are performed by the Ach LPC analysis unit 431 of the AchCELP encoding unit 430. Then, an excitation (sound source) search, consisting of an adaptive codebook search, a fixed codebook search, and a gain search, is performed by a closed-loop excitation search unit that mainly includes the Ach adaptive codebook 438, the Ach fixed codebook 439, the multipliers 432, 433, 434, 435, and 436, the adder 440, the synthesis filter 441, the subtractor 448, the perceptual weighting unit 442, and the distortion minimizing unit 443 (ST312).

ステップＳＴ３２０では、前述の音源探索によって得られた第Ａｃｈ駆動音源信号で第Ａｃｈ適応符号帳４３８の内部バッファが更新される。 In step ST320, the internal buffer of the Ach adaptive codebook 438 is updated with the Ach drive excitation signal obtained by the excitation search described above.

ステップＳＴ３３１では、第ＡｃｈＣＥＬＰ符号化部４３０の第Ｂｃｈ推定信号生成部４４５によって、第Ｂｃｈ推定信号が生成される。生成された第Ｂｃｈ推定信号は、第ＡｃｈＣＥＬＰ符号化部４３０から第ＢｃｈＣＥＬＰ符号化部に送られる。そして、ステップＳＴ３３２では、第ＢｃｈＣＥＬＰ符号化部の図示されない第ＢｃｈＬＰＣ分析部（第ＡｃｈＬＰＣ分析部４４６の同等物）によって、第Ｂｃｈ推定信号に対するＬＰＣ分析が行われ、第ＢｃｈＬＰＣパラメータが得られる。 In step ST331, the Bch estimation signal generation section 445 of the AchCELP encoding section 430 generates a Bch estimation signal. The generated Bch estimation signal is sent from the AchCELP encoding unit 430 to the BchCELP encoding unit. In step ST332, an LPC analysis is performed on the Bch estimation signal by a BchLPC analysis unit (equivalent to the AchLPC analysis unit 446) (not shown) of the BchCELP encoding unit to obtain a BchLPC parameter.

そして、ステップＳＴ３３３では、第ＢｃｈＣＥＬＰ符号化部の図示されない第ＢｃｈＬＰＣ予測残差信号生成部（第ＡｃｈＬＰＣ予測残差信号生成部４４７の同等物）によって、第ＢｃｈＬＰＣパラメータが用いられ、第Ｂｃｈ推定信号に対する符号化ＬＰＣ予測残差信号が生成される。この符号化ＬＰＣ予測残差信号は、第ＢｃｈＣＥＬＰ符号化部の図示されないスイッチ部（スイッチ部４３７の同等物）を経由して、図示されない第Ｂｃｈ適応符号帳（第Ａｃｈ適応符号帳４３８の同等物）に出力される。そして、ステップＳＴ３３４において、第Ｂｃｈ適応符号帳の内部バッファが、第Ｂｃｈ推定信号に対する符号化ＬＰＣ予測残差信号で更新される。 Then, in step ST333, the Bch LPC prediction residual signal generation unit (not shown; the counterpart of the Ach LPC prediction residual signal generation unit 447) of the BchCELP encoding unit uses the Bch LPC parameters to generate an encoded LPC prediction residual signal for the Bch estimated signal. This encoded LPC prediction residual signal is output to the Bch adaptive codebook (not shown; the counterpart of the Ach adaptive codebook 438) via the switch unit (not shown; the counterpart of the switch unit 437) of the BchCELP encoding unit. Then, in step ST334, the internal buffer of the Bch adaptive codebook is updated with the encoded LPC prediction residual signal for the Bch estimated signal.
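
Steps ST332 to ST334 can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes autocorrelation-method LPC analysis solved with the Levinson-Durbin recursion, omits windowing and LPC quantization, and uses hypothetical names (`lpc_coefficients`, `lpc_residual`, `update_bch_codebook`) and a hypothetical default order of 10.

```python
import numpy as np

def lpc_coefficients(signal, order):
    """LPC analysis (ST332 analogue): returns a[0..order] with a[0] = 1."""
    x = np.asarray(signal, dtype=float)
    # Autocorrelation values r[0..order].
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err <= 0.0:
        return a  # silent frame: no prediction
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:
            break
    return a

def lpc_residual(signal, a):
    """Prediction residual (ST333 analogue): e[n] = sum_k a[k] * x[n - k]."""
    x = np.asarray(signal, dtype=float)
    return np.convolve(x, a)[:len(x)]  # zero history assumed before the frame

def update_bch_codebook(buffer, bch_estimate, order=10):
    """Buffer update (ST334 analogue): shift in the residual, keep the length."""
    buf = np.asarray(buffer, dtype=float)
    res = lpc_residual(bch_estimate, lpc_coefficients(bch_estimate, order))
    return np.concatenate([buf, res])[-len(buf):]
```

The residual, rather than the estimated signal itself, is written into the buffer because an adaptive codebook stores past excitation, which lives in the LPC residual domain.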

続いて、適応符号帳更新動作についてより具体的に説明する。ここでは、符号化チャネル選択部３１０によって第Ａチャネルが選択された場合を例にとり、第Ａｃｈ適応符号帳４３８の内部バッファの更新動作例を図１２を用いて説明し、第Ｂｃｈ適応符号帳の内部バッファの更新動作例を図１３を用いて説明する。 Next, the adaptive codebook update operation will be described more specifically. Taking as an example the case where the A channel is selected by the coding channel selection unit 310, an example of the update operation of the internal buffer of the Ach adaptive codebook 438 will be described with reference to FIG. 12, and an example of the update operation of the internal buffer of the Bch adaptive codebook will be described with reference to FIG. 13.

図１２に示す動作例では、歪最小化部４４３によって得られた、第ｉフレーム内の第ｊサブフレームについての第Ａｃｈ駆動音源信号を用いて、第Ａｃｈ適応符号帳４３８の内部バッファが更新される（ＳＴ４０１）。そして、更新された第Ａｃｈ適応符号帳４３８は、次サブフレームである第ｊ＋１サブフレームについての音源探索に用いられる（ＳＴ４０２）。 In the operation example shown in FIG. 12, the internal buffer of the Ach adaptive codebook 438 is updated using the Ach drive excitation signal for the jth subframe in the ith frame obtained by the distortion minimizing section 443. (ST401). The updated Ach adaptive codebook 438 is used for sound source search for the j + 1-th subframe that is the next subframe (ST402).
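
The subframe-by-subframe behavior of FIG. 12 can be sketched as a loop in which each subframe reads the adaptive codebook and then writes its excitation back. This is a simplified illustration: the search itself is not modeled, so the per-subframe lag, gain, and fixed-codebook contribution are passed in as hypothetical, already-chosen values, and all function names are assumptions.

```python
import numpy as np

def adaptive_vector(buffer, lag, length):
    """Read past excitation delayed by `lag`; repeat it if lag < length."""
    buf = np.asarray(buffer, dtype=float)
    if not 1 <= lag <= len(buf):
        raise ValueError("lag must lie within the buffer")
    idx = len(buf) - lag + (np.arange(length) % lag)
    return buf[idx]

def update_buffer(buffer, excitation):
    """Shift the newest excitation into the buffer, keeping its length."""
    buf = np.asarray(buffer, dtype=float)
    exc = np.asarray(excitation, dtype=float)
    return np.concatenate([buf, exc])[-len(buf):]

def encode_frame(buffer, subframe_params):
    """Process one frame; subframe_params holds (lag, gain, fixed) per subframe."""
    for lag, gain, fixed in subframe_params:
        fixed = np.asarray(fixed, dtype=float)
        # Subframe j+1 reads the buffer already updated by subframe j (ST402).
        excitation = gain * adaptive_vector(buffer, lag, len(fixed)) + fixed
        # The buffer is updated with this subframe's excitation (ST401).
        buffer = update_buffer(buffer, excitation)
    return buffer
```

For example, starting from a buffer holding 0 to 7 and processing two subframes of length 2, the second subframe reads samples that the first subframe has just written.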

図１３に示す動作例では、第ｉフレームの第Ａｃｈ復号音声信号および第ｉフレームのモノラル復号音声信号を用いて、第ｉフレームの第Ｂｃｈ推定信号が生成される（ＳＴ５０１）。生成された第Ｂｃｈ推定信号は、第ＡｃｈＣＥＬＰ符号化部４３０から第ＢｃｈＣＥＬＰ符号化部に出力される。そして、第ＢｃｈＣＥＬＰ符号化部の第ＢｃｈＬＰＣ予測残差信号生成部において、第ｉフレームについての第Ｂｃｈ符号化ＬＰＣ予測残差信号（第Ｂｃｈ推定信号に対する符号化ＬＰＣ予測残差信号）４５１が生成される（ＳＴ５０２）。第Ｂｃｈ符号化ＬＰＣ予測残差信号４５１は、第ＢｃｈＣＥＬＰ符号化部のスイッチ部を経由して第Ｂｃｈ適応符号帳４５２に出力される。第Ｂｃｈ適応符号帳４５２は、第Ｂｃｈ符号化ＬＰＣ予測残差信号４５１によって更新される（ＳＴ５０３）。更新された第Ｂｃｈ適応符号帳４５２は、次フレームである第ｉ＋１フレームについての音源探索に用いられる（ＳＴ５０４）。 In the operation example shown in FIG. 13, the Bch estimated signal of the i-th frame is generated using the Ach decoded speech signal of the i-th frame and the monaural decoded speech signal of the i-th frame (ST501). The generated Bch estimated signal is output from the AchCELP encoding unit 430 to the BchCELP encoding unit. Then, in the Bch LPC prediction residual signal generation unit of the BchCELP encoding unit, a Bch encoded LPC prediction residual signal 451 (the encoded LPC prediction residual signal for the Bch estimated signal) is generated for the i-th frame (ST502). The Bch encoded LPC prediction residual signal 451 is output to the Bch adaptive codebook 452 via the switch unit of the BchCELP encoding unit. The Bch adaptive codebook 452 is updated with the Bch encoded LPC prediction residual signal 451 (ST503). The updated Bch adaptive codebook 452 is used for the excitation search for the (i+1)-th frame, which is the next frame (ST504).

あるフレームにおいて、第Ａチャネルが符号化チャネルとして選択された場合、第ＢｃｈＣＥＬＰ符号化部では、第Ｂｃｈ適応符号帳４５２の更新動作以外の動作は要求されないため、そのフレームにおいては第Ｂｃｈ入力音声信号の符号化を休止することができる。 When the A channel is selected as the coding channel in a certain frame, the BchCELP encoding unit is not required to perform any operation other than updating the Bch adaptive codebook 452, so encoding of the Bch input speech signal can be suspended in that frame.

このように、本実施の形態によれば、ＣＥＬＰ符号化方式に基づいて各レイヤの音声符号化を行った場合において、チャネル内相関度が高いチャネルの信号を符号化することができ、チャネル内予測による符号化効率を向上させることができる。 As described above, according to the present embodiment, when speech encoding of each layer is performed based on the CELP encoding scheme, the signal of the channel having the higher intra-channel correlation can be encoded, and the encoding efficiency obtained through intra-channel prediction can be improved.

なお、本実施の形態では、ＣＥＬＰ符号化方式を採用した音声符号化装置において実施の形態３で説明した符号化チャネル選択部３１０を用いた場合を例にとって説明したが、実施の形態１および実施の形態２でそれぞれ説明した符号化チャネル選択部１２０および符号化チャネル選択部２１０を、符号化チャネル選択部３１０の代わりに、あるいは、符号化チャネル３１０とともに、使用することもできる。よって、ＣＥＬＰ符号化方式に基づいて各レイヤの音声符号化を行った場合において、前述の各実施の形態で説明した効果を実現することができる。 In the present embodiment, the case where the coding channel selection unit 310 described in Embodiment 3 is used in a speech encoding apparatus adopting the CELP encoding scheme has been described as an example. However, the coding channel selection unit 120 and the coding channel selection unit 210, described in Embodiment 1 and Embodiment 2 respectively, can also be used instead of, or together with, the coding channel selection unit 310. Therefore, when speech encoding of each layer is performed based on the CELP encoding scheme, the effects described in the above embodiments can be realized.

また、拡張レイヤの符号化チャネルの選択基準として、前述したもの以外のものを使用することもできる。例えば、あるフレームに関して、第ＡｃｈＣＥＬＰ符号化部４３０の適応符号帳探索および第ＢｃｈＣＥＬＰ符号化部の適応符号帳探索をそれぞれ行い、それらの結果として得られる符号化歪みのうちより小さい値を有するものに対応するチャネルを、符号化チャネルとして選択しても良い。 Criteria other than those described above can also be used to select the coding channel of the enhancement layer. For example, for a certain frame, the adaptive codebook search of the AchCELP encoding unit 430 and the adaptive codebook search of the BchCELP encoding unit may each be performed, and the channel corresponding to the smaller of the resulting coding distortions may be selected as the coding channel.
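
The alternative selection criterion can be sketched as below. This is a simplified illustration under stated assumptions: the adaptive codebook search is run directly on the target signal without a synthesis filter or perceptual weighting, the gain is taken as the unquantized optimum, and ties go to channel A. The function names and the `lags` argument are hypothetical.

```python
import numpy as np

def adaptive_search_distortion(target, buffer, lags):
    """Smallest squared error reachable with one gain-scaled adaptive vector."""
    t = np.asarray(target, dtype=float)
    buf = np.asarray(buffer, dtype=float)
    energy_t = float(np.dot(t, t))
    best = energy_t  # distortion when the gain is zero
    for lag in lags:
        idx = len(buf) - lag + (np.arange(len(t)) % lag)
        v = buf[idx]
        energy_v = float(np.dot(v, v))
        if energy_v > 0.0:
            corr = float(np.dot(t, v))
            # With the optimal gain corr / energy_v the error is:
            best = min(best, energy_t - corr * corr / energy_v)
    return best

def select_channel(target_a, buffer_a, target_b, buffer_b, lags):
    """Pick the channel whose adaptive codebook search gives less distortion."""
    d_a = adaptive_search_distortion(target_a, buffer_a, lags)
    d_b = adaptive_search_distortion(target_b, buffer_b, lags)
    return ("A" if d_a <= d_b else "B"), d_a, d_b
```

A channel whose current frame closely repeats its own past excitation yields a small distortion here, which is the situation in which intra-channel prediction is most effective.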

また、音声符号化装置４００の構成に、チャネル間予測を実行する構成要素を加えることもできる。この場合、音声符号化装置４００は、モノラル駆動音源信号に対して第１調整用ゲインを直接乗算する代わりに、モノラル駆動音源信号を用いて第Ａｃｈ復号音声信号を予測するチャネル間予測を行い、それによって生成されたチャネル間予測信号に対して第１調整用ゲインを乗算する構成を、採用することができる。 A component that performs inter-channel prediction can also be added to the configuration of the speech encoding apparatus 400. In this case, instead of directly multiplying the monaural drive excitation signal by the first adjustment gain, the speech encoding apparatus 400 can adopt a configuration that performs inter-channel prediction, in which the Ach decoded speech signal is predicted using the monaural drive excitation signal, and multiplies the resulting inter-channel prediction signal by the first adjustment gain.

以上、本発明の各実施の形態について説明した。上記実施の形態に係る音声符号化装置および音声復号化装置は、移動体通信システムにおいて使用される無線通信移動局装置および無線通信基地局装置などの無線通信装置に搭載することができる。 The embodiments of the present invention have been described above. The speech encoding apparatus and speech decoding apparatus according to the above embodiments can be mounted on a wireless communication apparatus such as a wireless communication mobile station apparatus and a wireless communication base station apparatus used in a mobile communication system.

また、上記実施の形態では、本発明をハードウェアで構成する場合を例にとって説明したが、本発明はソフトウェアで実現することも可能である。 Further, although cases have been described with the above embodiment as examples where the present invention is configured by hardware, the present invention can also be realized by software.

また、上記実施の形態の説明に用いた各機能ブロックは、典型的には集積回路であるＬＳＩとして実現される。これらは個別に１チップ化されてもよいし、一部又は全てを含むように１チップ化されてもよい。 Each functional block used in the description of the above embodiment is typically realized as an LSI which is an integrated circuit. These may be individually made into one chip, or may be made into one chip so as to include a part or all of them.

ここでは、ＬＳＩとしたが、集積度の違いにより、ＩＣ、システムＬＳＩ、スーパーＬＳＩ、ウルトラＬＳＩと呼称されることもある。 The name used here is LSI, but it may also be called IC, system LSI, super LSI, or ultra LSI depending on the degree of integration.

また、集積回路化の手法はＬＳＩに限るものではなく、専用回路又は汎用プロセッサで実現してもよい。ＬＳＩ製造後に、プログラムすることが可能なＦＰＧＡ（Field Programmable Gate Array）や、ＬＳＩ内部の回路セルの接続や設定を再構成可能なリコンフィギュラブル・プロセッサーを利用してもよい。 Further, the method of circuit integration is not limited to LSI's, and implementation using dedicated circuitry or general purpose processors is also possible. An FPGA (Field Programmable Gate Array) that can be programmed after manufacturing the LSI or a reconfigurable processor that can reconfigure the connection and setting of circuit cells inside the LSI may be used.

さらには、半導体技術の進歩又は派生する別技術によりＬＳＩに置き換わる集積回路化の技術が登場すれば、当然、その技術を用いて機能ブロックの集積化を行ってもよい。バイオ技術の適応等が可能性としてありえる。 Furthermore, if integrated circuit technology that replaces LSI emerges as a result of advances in semiconductor technology or another derivative technology, the functional blocks may naturally be integrated using that technology. The application of biotechnology is one such possibility.

本明細書は、２００５年４月２８日出願の特願２００５−１３２３６６に基づくものである。この内容はすべてここに含めておく。 This specification is based on Japanese Patent Application No. 2005-132366, filed on April 28, 2005, the entire content of which is incorporated herein.

本発明は、移動体通信システムやインターネットプロトコルを用いたパケット通信システムなどにおける通信装置の用途に適用できる。 The present invention can be applied to the use of a communication apparatus in a mobile communication system or a packet communication system using the Internet protocol.

本発明の実施の形態１に係る音声符号化装置の構成を示すブロック図 FIG. 1: Block diagram showing the configuration of the speech encoding apparatus according to Embodiment 1 of the present invention.
本発明の実施の形態１に係る音声復号化装置の構成を示すブロック図 FIG. 2: Block diagram showing the configuration of the speech decoding apparatus according to Embodiment 1 of the present invention.
本発明の実施の形態２に係る音声符号化装置の構成を示すブロック図 FIG. 3: Block diagram showing the configuration of the speech encoding apparatus according to Embodiment 2 of the present invention.
本発明の実施の形態３に係る音声符号化装置の構成を示すブロック図 FIG. 4: Block diagram showing the configuration of the speech encoding apparatus according to Embodiment 3 of the present invention.
本発明の実施の形態３に係る符号化チャネル選択部の構成を示すブロック図 FIG. 5: Block diagram showing the configuration of the coding channel selection unit according to Embodiment 3 of the present invention.
本発明の実施の形態３に係る第Ａｃｈ符号化部の構成を示すブロック図 FIG. 6: Block diagram showing the configuration of the Ach encoding unit according to Embodiment 3 of the present invention.
本発明の実施の形態３に係る第Ａチャネルのチャネル内予測バッファの更新動作の一例を説明するための図 FIG. 7: Diagram illustrating an example of the update operation of the intra-channel prediction buffer of the A channel according to Embodiment 3 of the present invention.
本発明の実施の形態３に係る第Ｂチャネルのチャネル内予測バッファの更新動作の一例を説明するための図 FIG. 8: Diagram illustrating an example of the update operation of the intra-channel prediction buffer of the B channel according to Embodiment 3 of the present invention.
本発明の実施の形態４に係る音声符号化装置の構成を示すブロック図 FIG. 9: Block diagram showing the configuration of the speech encoding apparatus according to Embodiment 4 of the present invention.
本発明の実施の形態４に係る第ＡｃｈＣＥＬＰ符号化部の構成を示すブロック図 FIG. 10: Block diagram showing the configuration of the AchCELP encoding unit according to Embodiment 4 of the present invention.
本発明の実施の形態４に係る適応符号帳更新動作の一例を示すフロー図 FIG. 11: Flowchart showing an example of the adaptive codebook update operation according to Embodiment 4 of the present invention.
本発明の実施の形態４に係る第Ａｃｈ適応符号帳の更新動作の一例を説明するための図 FIG. 12: Diagram illustrating an example of the update operation of the Ach adaptive codebook according to Embodiment 4 of the present invention.
本発明の実施の形態４に係る第Ｂｃｈ適応符号帳の更新動作の一例を説明するための図 FIG. 13: Diagram illustrating an example of the update operation of the Bch adaptive codebook according to Embodiment 4 of the present invention.

Claims

In a speech encoding apparatus that encodes a stereo signal including a first channel signal and a second channel signal,
Monaural signal generating means for generating a monaural signal using the first channel signal and the second channel signal;
Selecting means for selecting one of the first channel signal and the second channel signal;
Encoding means for encoding the generated monaural signal to obtain core layer encoded data, and encoding the selected channel signal to obtain enhancement layer encoded data corresponding to the core layer encoded data;
Comprising,
The selection means includes
Based on the coding distortion for the first channel signal and the second channel signal or the intra-channel correlation corresponding to the first channel signal and the second channel signal, the first channel signal and the second channel Select one of the signals for each frame,
The encoding means includes
Encoding the monaural signal and the channel signal selected for each frame for each frame;
Speech encoding device.

Further comprising calculation means for calculating a first coding distortion that occurs when the first channel signal is selected and a second coding distortion that occurs when the second channel signal is selected,
The selection means includes
Selecting the first channel signal when the calculated first coding distortion is smaller than the calculated second coding distortion, and selecting the second channel signal when the calculated second coding distortion is smaller than the calculated first coding distortion,
The speech encoding apparatus according to claim 1.

The encoding means includes
The first channel signal and the second channel signal are encoded to obtain first encoded data and second encoded data, respectively, and the selected channel is selected from the first encoded data and the second encoded data. A signal corresponding to the signal is output as the enhancement layer encoded data,
Estimated signal generating means for generating a second channel estimation signal corresponding to the second channel signal, using the monaural decoded signal obtained when the encoding means encodes the monaural signal and the first channel decoded signal obtained when the encoding means encodes the first channel signal, and for generating a first channel estimation signal corresponding to the first channel signal, using the monaural decoded signal and a second channel decoded signal obtained when the encoding means encodes the second channel signal;
Distortion calculating means for calculating the first coding distortion based on an error of the first channel decoded signal with respect to the first channel signal and an error of the second channel estimation signal with respect to the second channel signal, and for calculating the second coding distortion based on an error of the first channel estimation signal with respect to the first channel signal and an error of the second channel decoded signal with respect to the second channel signal;
The speech encoding apparatus according to claim 2, comprising:

The selection means includes
Calculating means for calculating a first intra-channel correlation corresponding to the first channel signal and a second intra-channel correlation corresponding to the second channel signal;
When the calculated first intra-channel correlation is higher than the calculated second intra-channel correlation, the first channel signal is selected, and when the calculated second intra-channel correlation is higher than the calculated first intra-channel correlation, the second channel signal is selected.
The speech encoding apparatus according to claim 1.

The encoding means includes
When the first channel signal is selected by the selection unit, CELP (Code Excited Linear Prediction) encoding of the first channel signal is performed using a first adaptive codebook, and a CELP encoding result is used. Obtaining the enhancement layer encoded data and updating the first adaptive codebook using the CELP encoding result;
The speech encoding apparatus according to claim 1.

The encoding means includes
Generating a second channel estimation signal corresponding to the second channel signal using the enhancement layer encoded data and a monaural decoded signal obtained when the monaural signal is encoded;
Updating a second adaptive codebook used in CELP coding of the second channel signal using an LPC (Linear Prediction Coding) prediction residual signal of the second channel estimation signal;
The speech encoding apparatus according to claim 5.

The selection means includes
Selecting the first channel signal in association with a frame having subframes;
The encoding means includes
Obtaining the enhancement layer encoded data of the frame while performing sound source search for each subframe for the monaural signal and the first channel signal selected in association with the frame;
The speech encoding apparatus according to claim 6.

The encoding means includes
Updating the first adaptive codebook in units of the subframe and updating the second adaptive codebook in units of the frame;
The speech encoding apparatus according to claim 7.

A mobile station apparatus comprising the speech encoding apparatus according to claim 1.

A base station apparatus comprising the speech encoding apparatus according to claim 1.

In a speech encoding method for encoding a stereo signal including a first channel signal and a second channel signal,
Generating a monaural signal using the first channel signal and the second channel signal;
A selection step of selecting one of the first channel signal and the second channel signal;
An encoding step of encoding the generated monaural signal to obtain core layer encoded data, and encoding the selected channel signal to obtain enhancement layer encoded data corresponding to the core layer encoded data,
Including
In the selection step,
Based on the coding distortion for the first channel signal and the second channel signal or the intra-channel correlation corresponding to the first channel signal and the second channel signal, the first channel signal and the second channel Select one of the signals for each frame,
In the encoding step,
Encoding the monaural signal and the channel signal selected for each frame for each frame;
Speech encoding method.