JP4486560B2

JP4486560B2 - Scalable encoding method and apparatus, scalable decoding method and apparatus, program thereof, and recording medium thereof

Info

Publication number: JP4486560B2
Application number: JP2005205325A
Authority: JP
Inventors: 幸浩坂東; 誠之高村; 一人上倉; 由幸八島
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2005-07-14
Filing date: 2005-07-14
Publication date: 2010-06-23
Anticipated expiration: 2025-07-14
Also published as: JP2007028034A

Description

The present invention relates to a high-efficiency image signal encoding method and decoding method that perform scalable encoding and decoding by combining inter-frame prediction and inter-layer prediction.

In recent years, scalable coding has attracted attention as a way to cope with increasingly diverse network and terminal environments. In scalable coding, an image signal is divided hierarchically and each layer is encoded separately. Methods of layer division include:
(i) band division with respect to spatial frequency,
(ii) band division with respect to temporal frequency.
A typical example of (i) is wavelet decomposition (see Non-Patent Document 1), and a typical example of (ii) is Motion Compensation Temporal Filtering (MCTF) (see Non-Patent Document 2).

Scalable Video Coding (SVC) has attracted attention as an encoding method that supports temporal, spatial, and SNR scalability. The Scalable Video Model (SVM) presented in Non-Patent Document 3 is a coding scheme based on AVC and MCTF. It uses unidirectional and bidirectional prediction in the temporal direction, spatial prediction within a frame, and inter-layer prediction using an interpolated signal of the lower layer.

An example of applying scalable coding to image transmission is the remote-site imaging system described in Patent Document 1, which shows that the scalable function of MPEG-2 can be used to encode thumbnail images. In this MPEG-2 scalable coding framework, prediction from a lower layer is performed block by block, and all pixels in a block are multiplied by the same prediction coefficient.

Non-Patent Document 1: S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, pp. 674-693, July 1989.
Non-Patent Document 2: J. R. Ohm, "Three-dimensional subband coding with motion compensation", IEEE Trans. Image Processing, Vol. 3, No. 5, pp. 559-571, Sept. 1994.
Non-Patent Document 3: J. Reichel, M. Wien and H. Schwarz, "Scalable Video Model 3.0", ISO/IEC JTC1/SC29/WG11 doc. no. N6716, Palma, October 2004.
Patent Document 1: Japanese Patent Laid-Open No. 2003-101855

All prediction methods in SVM are block-based, so block distortion is inherent in principle. Reducing block distortion is therefore essential for improving the quality of the decoded image. SVM reduces block distortion with a deblocking filter. A more direct way to suppress block distortion is to improve the performance of the prediction method and thereby reduce the prediction error itself. However, in conventional SVM, temporal prediction and inter-layer prediction are performed independently, which leaves room for improvement in terms of reducing prediction error.

The present invention has been made in view of these circumstances. Its object is to establish a design method for a prediction scheme that, in block distortion reduction for signals composed of layers with different temporal and spatial resolutions, suppresses the prediction error at block boundaries by exploiting the correlation between layers.

With block-based prediction, larger prediction errors tend to occur at block boundaries than at block centers. This is the cause of block distortion. Reducing block distortion therefore requires suppressing the prediction error around block boundaries.

Considering two layers with different spatial resolutions, pixels on a block boundary can be classified into the following two types according to their positional relationship with the lower layer.

(i) Pixels whose corresponding position in the lower layer is not near a block boundary
(ii) Pixels whose corresponding position in the lower layer is near a block boundary
FIG. 2 shows examples of (i) and (ii). The figure shows frames in two layers with different spatial resolutions. The darkly shaded part is region A, corresponding to (i), and the lightly shaded part is region B, corresponding to (ii).
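The classification into regions A and B can be illustrated with a small one-dimensional sketch. It is not taken from the patent: it assumes a resolution ratio of 2 between the layers, the same block size in both layers, block grids aligned at the origin, and a hypothetical `margin` parameter that defines what counts as "near" a boundary.

```python
def near_block_boundary(pos: int, block_size: int, margin: int = 1) -> bool:
    """True if pos lies within `margin` pixels of a block edge (assumed definition)."""
    offset = pos % block_size
    return offset < margin or offset >= block_size - margin


def classify_boundary_pixel(x: int, block_size: int, margin: int = 1) -> str:
    """Classify an upper-layer pixel position along one axis.

    Returns "interior" if the pixel is not on an upper-layer block boundary,
    "A" if its lower-layer position is away from a block boundary,
    "B" if its lower-layer position is also near a block boundary.
    """
    if not near_block_boundary(x, block_size, margin):
        return "interior"
    lower_x = x // 2  # assumed 2:1 resolution ratio between layers
    return "B" if near_block_boundary(lower_x, block_size, margin) else "A"


# With 8-pixel blocks, the upper-layer boundary between pixels 7 and 8 maps to
# the middle of a lower-layer block, while the boundary between 15 and 16 maps
# onto a lower-layer block boundary.
labels = {x: classify_boundary_pixel(x, 8) for x in (3, 7, 8, 15, 16)}
```

Under these assumptions every other upper-layer block boundary falls into region A, which is where the lower layer can supply a reference free of block edges.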

Focusing on region A of (i), the corresponding region in the lower layer is not on a block boundary. In other words, if prediction uses the lower layer signal, no block distortion occurs in this region. The present invention therefore applies to region A of (i) a bidirectional prediction that combines inter-frame prediction by motion compensation with inter-layer prediction by interpolation. That is, as reference signals for inter-frame prediction, it uses the interpolated signal generated by interpolating the lower layer signal in addition to frames at different times in the same layer.

That is, the features of the present invention are as follows. In image coding with inter-frame prediction, temporal inter-frame prediction, which refers to adjacent frame signals of equal spatial resolution, is combined with inter-layer prediction, which refers to lower layer signals of different spatial resolution, giving a prediction scheme that uses multiple frames as reference signals. In doing so, the present invention varies the prediction strengths of inter-frame prediction and inter-layer prediction according to the spatial position of the signal to be predicted within the prediction block.

In conventional frameworks such as MPEG-2 scalable coding, prediction from a lower layer is performed block by block, and all pixels in a block are multiplied by the same prediction coefficient. In contrast, the present invention introduces adaptive processing that changes the prediction coefficient according to the spatial position within the prediction block. This makes it possible to reduce block distortion in the decoded image.

The present invention is also characterized in that the prediction strengths of inter-frame prediction and inter-layer prediction are varied according to both the spatial position of the signal to be predicted within the prediction block and the coding distortion contained in the decoded image used as the reference signal. As a way of varying the prediction strength according to coding distortion, the coding distortion contained in the decoded image of the lower layer can be weighted according to the frequency characteristic of the downsampling filter used to generate the lower layer signal, and the prediction strengths of inter-frame prediction and inter-layer prediction varied accordingly.

Dynamically changing the prediction coefficients in this way makes it possible to generate a predicted image with less coding distortion.

FIG. 1 is a diagram illustrating the outline of the present invention. In the figure, 1 denotes prediction coefficient calculation means, 2 prediction coefficient storage means, 3 inter-layer prediction processing means, 4 inter-frame prediction processing means, and 5 prediction signal generation means.

The prediction coefficient calculation means 1 calculates the prediction coefficient α₀ for inter-frame prediction and the prediction coefficient α_l for inter-layer prediction, and stores them in the prediction coefficient storage means 2. The coefficients are calculated from the image quality prediction strength of the reference signals in inter-frame and inter-layer prediction, which is determined from the coding distortion contained in the decoded image used as the reference signal, and from the spatial prediction strength of the reference signals in inter-frame and inter-layer prediction, which is determined according to the spatial position of the signal to be predicted within the prediction block, or from either one of these two strengths.

The inter-layer prediction processing means 3 performs inter-layer prediction that refers to a lower layer signal of different spatial resolution, and generates the prediction signal U(g_{i+1}) based on inter-layer prediction. Here, U() is a function that performs upsampling.
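As a rough illustration of what U() does, the sketch below doubles the resolution of a lower-layer image by pixel replication. The patent does not state which interpolation filter is used (the upsampling filter coefficients are an input to the prediction step), so pixel replication and the name `upsample_2x` are stand-ins chosen only to keep the example short.

```python
def upsample_2x(image):
    """Double width and height by replicating each pixel into a 2x2 block.

    Assumed stand-in for U(); a real codec would apply an interpolation filter.
    """
    out = []
    for row in image:
        wide = [value for value in row for _ in range(2)]  # repeat horizontally
        out.append(wide)
        out.append(list(wide))  # repeat vertically
    return out


lower = [[1, 2],
         [3, 4]]
upper = upsample_2x(lower)
```

The result has the same dimensions as a frame of the layer above, which is what allows it to be added pixel by pixel to the inter-frame prediction signal.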

The inter-frame prediction processing means 4 performs temporal inter-frame prediction that refers to neighboring frame signals of equal spatial resolution, and generates the prediction signal g_i based on inter-frame prediction.

The prediction signal generation means 5 multiplies the prediction signal based on inter-layer prediction and the prediction signal based on inter-frame prediction by their corresponding prediction coefficients stored in the prediction coefficient storage means 2, and adds the products to generate the prediction signal (α₀ g_i + α_l U(g_{i+1})).

In this way, the prediction strengths of inter-frame prediction and inter-layer prediction are varied according to the spatial position of the signal to be predicted within the prediction block, according to the coding distortion contained in the decoded image used as the reference signal, or according to both.
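The weighted sum α₀ g_i + α_l U(g_{i+1}) with position-dependent coefficients can be sketched as follows. The function name, the list-of-lists image representation, and the sample coefficient values are assumptions made for illustration; how the coefficients are actually derived is described later in the text.

```python
from typing import List

Image = List[List[float]]


def combine_predictions(inter_frame: Image, inter_layer_up: Image,
                        alpha_0: Image, alpha_l: Image) -> Image:
    """Per-pixel weighted sum alpha_0 * g_i + alpha_l * U(g_{i+1}).

    All four arguments must have the same dimensions; the coefficients are
    given per pixel so that they can vary with position inside the block.
    """
    height, width = len(inter_frame), len(inter_frame[0])
    return [
        [alpha_0[y][x] * inter_frame[y][x] + alpha_l[y][x] * inter_layer_up[y][x]
         for x in range(width)]
        for y in range(height)
    ]


# Illustrative 2x2 block: only the top-left pixel mixes in the lower layer.
g_i = [[100.0, 100.0], [100.0, 100.0]]   # inter-frame prediction signal
u_g = [[80.0, 80.0], [80.0, 80.0]]       # upsampled lower-layer signal
a_0 = [[0.5, 1.0], [1.0, 1.0]]           # hypothetical coefficients
a_l = [[0.5, 0.0], [0.0, 0.0]]
pred = combine_predictions(g_i, u_g, a_0, a_l)
```

Passing the coefficients as full arrays rather than scalars is the point of the sketch: a conventional block-level scheme would use one value per block, whereas here each pixel can carry its own pair of weights.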

According to the present invention, referring to the lower layer signal in inter-frame prediction makes it possible to suppress prediction errors localized at block boundaries and thus to reduce block distortion in the decoded image. In addition, dynamically changing the prediction coefficients in consideration of the coding distortion of the reference signals makes it possible to generate a predicted image with less coding distortion, which leads to a reduction in prediction error.

First, the notation used in the description of the present invention is summarized. Let f_j(x, y, t) be the pixel value at coordinates (x, y) in the frame at time t of the j-th layer, and let g_j(x, y, t) be the decoded signal for f_j(x, y, t). f_{j+1}(x, y, t) has half the spatial resolution of f_j(x, y, t). For example, if f₀(x, y, t) is CIF size, f₁(x, y, t) is QCIF size. The block size in inter-frame prediction is L × L.

Two prediction modes with different reference signals are described below:
- a first prediction mode that combines inter-layer prediction with unidirectional prediction in the temporal direction,
- a second prediction mode that combines inter-layer prediction with bidirectional prediction in the temporal direction.

FIG. 3 shows examples of multi-reference-signal prediction that also takes the lower layer into account. FIG. 3(A) shows the first prediction mode, which combines inter-layer prediction with unidirectional temporal prediction, and FIG. 3(B) shows the second prediction mode, which combines inter-layer prediction with bidirectional temporal prediction.

[First prediction mode: extension of unidirectional inter-frame prediction]
Unidirectional inter-frame prediction is extended to multi-frame reference prediction in which the lower layer signal is also used as a reference signal, as shown in the following equation.
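The equation itself is not reproduced in this text. Based on the term definitions given for it and on the prediction signal α₀ g_i + α_l U(g_{i+1}) stated in the overview, Equation (1) presumably has the following form, where \hat{f}_j is an assumed name for the prediction signal:

```latex
\hat{f}_j(x, y, t) = \alpha_0 \, g_j(x - d_{x0},\; y - d_{y0},\; t - \tau_0)
                   + \alpha_l \, U\bigl(g_{j+1}(x, y, t)\bigr) \tag{1}
```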

Here, g_j(x − d_{x0}, y − d_{y0}, t − τ₀) is the temporal reference signal with motion compensation, and α₀ is its weighting coefficient. g_{j+1}(x, y, t) is the reference signal for inter-layer prediction, namely the decoded signal at the corresponding position in the layer immediately below. U() is a function that performs upsampling, and U(g_{j+1}(x, y, t)) is a signal with the same spatial resolution as g_j(x, y, t). α_l is the weighting coefficient of this reference signal. α₀ and α_l are called the prediction coefficients of the reference signals g_j(x − d_{x0}, y − d_{y0}, t − τ₀) and g_{j+1}(x, y, t), respectively.

Three methods for setting α₀ and α_l are described below.

[1] Adaptive weighting according to the spatial position of the signal to be predicted
First, a method of setting the values of α₀ and α_l according to the spatial position of the signal to be predicted is described.

Here, s_t(x, y) and s_l(x, y) are weight functions that depend on the spatial position of the signal to be predicted, called spatial prediction strengths. Examples of the spatial prediction strength are shown below.

<Example 1 of spatial prediction strength>

Here, ψ() is the following function.

Here, c is a real number satisfying c > 0; the larger the value of c, the greater the weight given to prediction from the lower layer. a is a real number satisfying 0 ≤ a ≤ L; the larger the value of a, the larger the proportion of region A described above (FIG. 2). The same applies to ψ(y). FIG. 4 shows an example of ψ().

<Example 2 of spatial prediction strength>

Here, a and c are the same as those used in Equation (4).

[2] Adaptive weighting according to the decoded image quality of the reference signals
Next, a method of setting the values of α₀ and α_l according to the decoded image quality of the reference signals is described.

Here, Q₀ and Q_l are the quantization step sizes used to generate the decoded signals g_j(x − d_{x0}, y − d_{y0}, t − τ₀) and g_{j+1}(x, y, t), respectively. w_t() and w_l() are parameters called image quality prediction strengths. They are weight functions that reflect the decoded image quality, that is, the coding distortion, of the reference signals g_j(x − d_{x0}, y − d_{y0}, t − τ₀) and g_{j+1}(x, y, t), respectively, and are expressed as functions of the quantization step size Q as follows.

Here, β is a weighting coefficient that reflects the distortion caused by reduction (such as attenuation of high-frequency components) when the reference signal of the j-th layer is a reduced image. The following equation is one example of β.

Here, H(ω) is the frequency characteristic of the low-pass filter used to generate the j-th layer signal from the (j+1)-th layer signal.

[3] Weighting according to both the spatial position of the signal to be predicted and the decoded image quality of the reference signals
Finally, a method of setting the values of α₀ and α_l that combines the two adaptive processes above is described.

Here, s_t(x, y) and s_l(x, y) are as given in Equation (3) or Equation (5), and w_t() and w_l() are as given in Equation (7).

[Second prediction mode: extension of bidirectional inter-frame prediction]
Bidirectional inter-frame prediction is extended to multi-frame reference prediction in which the lower layer signal is also used as a reference signal, as shown in the following equation.
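The equation itself is not reproduced in this text. Based on the three reference signals and the prediction coefficients α₀, α₁, and α_l defined for this mode, Equation (10) presumably has the following form, where \hat{f}_j is an assumed name for the prediction signal:

```latex
\hat{f}_j(x, y, t) = \alpha_0 \, g_j(x - d_{x0},\; y - d_{y0},\; t - \tau_0)
                   + \alpha_1 \, g_j(x - d_{x1},\; y - d_{y1},\; t - \tau_1)
                   + \alpha_l \, U\bigl(g_{j+1}(x, y, t)\bigr) \tag{10}
```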

Here, g_j(x − d_{x0}, y − d_{y0}, t − τ₀) and g_j(x − d_{x1}, y − d_{y1}, t − τ₁) are the temporal reference signals with motion compensation, and α₀ and α₁ are their weighting coefficients. g_{j+1}(x, y, t) is the reference signal for inter-layer prediction, namely the decoded signal at the corresponding position in the layer immediately below. U() is a function that performs upsampling, and U(g_{j+1}(x, y, t)) is a signal with the same spatial resolution as g_j(x, y, t). α_l is the weighting coefficient of this reference signal.

α₀, α₁, and α_l are called the prediction coefficients of the reference signals g_j(x − d_{x0}, y − d_{y0}, t − τ₀), g_j(x − d_{x1}, y − d_{y1}, t − τ₁), and g_{j+1}(x, y, t), respectively.

Three methods for setting α₀, α₁, and α_l are described below.

[1] Adaptive weighting according to the spatial position of the signal to be predicted
First, a method of setting the values of α₀, α₁, and α_l according to the spatial position of the signal to be predicted is described.

Here, the prediction strengths s_t(x, y) and s_l(x, y) follow Equation (3) or Equation (5).

[2] Adaptive weighting according to the decoded image quality of the reference signals
Next, a method of setting the values of α₀, α₁, and α_l according to the decoded image quality of the reference signals is described.

Here, Q₀, Q₁, and Q_l are the quantization step sizes used to generate the decoded signals g_j(x − d_{x0}, y − d_{y0}, t − τ₀), g_j(x − d_{x1}, y − d_{y1}, t − τ₁), and g_{j+1}(x, y, t), respectively. w_t() and w_l() are weight functions that reflect the decoded image quality, that is, the coding distortion, of the reference signals g_j(x − d_{x0}, y − d_{y0}, t − τ₀), g_j(x − d_{x1}, y − d_{y1}, t − τ₁), and g_{j+1}(x, y, t), and are expressed as functions of the quantization step size Q as in Equation (7).

[3] Weighting according to both the spatial position of the signal to be predicted and the decoded image quality of the reference signals
Finally, a method of setting the values of α₀, α₁, and α_l that combines the two adaptive processes above is described.

Here, s_t(x, y) and s_l(x, y) follow Equation (3) or Equation (5), and w_t() and w_l() follow Equation (7).

[Prediction process]
An embodiment of the prediction apparatus used in the present invention is described with reference to FIG. 5.

In step S1, the prediction mode information is taken as input, the mode is identified as either the first or the second prediction mode, and the decoded frames corresponding to that mode are written to a register as reference signals. Here, the prediction mode information is assumed to be supplied externally. The prediction mode may be fixed in advance, or it may be determined adaptively, for example by predicting or estimating the amount of generated code.

In step S2, the quantization parameters and the functional forms of the weight functions used to calculate the image quality prediction strength and the spatial prediction strength are taken as input, the prediction coefficients for the reference signals are calculated, and the calculated prediction coefficients are written to a register. The details of this prediction coefficient calculation are described later with reference to FIG. 6.

In step S3, the reference signals, the prediction coefficient for each reference signal, the motion vectors, and the upsampling filter coefficients are taken as input, a prediction signal consisting of inter-frame prediction and inter-layer prediction is generated, and the prediction signal is output. The calculation follows Equation (1) or Equation (10).

In step S4, it is determined whether prediction has been completed for all inter-frame prediction blocks. If so, a true value is output and the process ends. Otherwise, a false value is output and the process returns to step S2 and repeats.
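The control flow of steps S1 to S4 can be sketched as a loop over prediction blocks. Everything named here is hypothetical: the reference labels, the callback signatures, and the equal-weight coefficients used in the example are placeholders, not the patent's coefficient rules.

```python
def select_references(mode: str):
    """Step S1: pick the reference signals for the given prediction mode."""
    if mode == "first":
        return ["temporal_0", "lower_layer"]
    if mode == "second":
        return ["temporal_0", "temporal_1", "lower_layer"]
    raise ValueError(f"unknown prediction mode: {mode}")


def predict_all_blocks(mode, blocks, compute_coefficients, generate_prediction):
    """Run steps S1 to S4 over every inter-frame prediction block."""
    references = select_references(mode)                    # S1
    predictions = []
    for block in blocks:                                    # S4: until all blocks are done
        coeffs = compute_coefficients(block, references)    # S2
        predictions.append(
            generate_prediction(block, references, coeffs)  # S3
        )
    return predictions


# Placeholder callbacks: equal weights and a scalar "signal" per reference.
def equal_coefficients(block, references):
    return {name: 1.0 / len(references) for name in references}


def weighted_sum(block, references, coeffs):
    return sum(coeffs[name] * block[name] for name in references)


blocks = [{"temporal_0": 100.0, "lower_layer": 80.0}]
result = predict_all_blocks("first", blocks, equal_coefficients, weighted_sum)
```

Keeping coefficient calculation and signal generation as separate callbacks mirrors the split between step S2 and step S3, so either can be swapped without touching the loop.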

Next, the details of step S2 in FIG. 5, that is, the flow of the prediction coefficient calculation process in this embodiment, are described with reference to FIG. 6.

In step S21, the quantization parameter for the decoded signal that is the reference signal written in step S1 is read.

In step S22, the transfer function of the low-pass filter used to generate the reduced image is taken as input, a weighting coefficient reflecting the reduction process is calculated, and the weighting coefficient is output to a register. The calculation follows Equation (8).

In step S23, the quantization parameter read in step S21 and the weighting coefficient written in step S22 are taken as input, the image quality prediction strength is calculated, and the result is output to a register. The calculation follows Equation (7).

In step S24, the block size and the spatial prediction strength function are taken as input, a weighting coefficient that depends on spatial position is calculated for each pixel in the block, and the result is output to a register. The calculation follows Equation (4). This step is unnecessary when step S25 below uses Equation (5).

In step S25, the block size and the spatial prediction strength function are taken as input, the spatial prediction strength is calculated for each pixel in the block, and the result is output to a register. The calculation follows Equation (3) or Equation (5).

In step S26, the image quality prediction strength and the spatial prediction strength output in steps S23 and S25 are taken as input, the prediction coefficients are calculated, and the result is written to a register. The calculation follows Equation (9) or Equation (13).

In step S27, it is determined whether the prediction coefficients have been calculated for all reference signals in the inter-frame prediction block. If so, a true value is output and the process proceeds to step S3 in FIG. 5. Otherwise, a false value is output and the processing from step S24 onward is repeated.
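The structure of steps S21 to S27 can be sketched with the actual formulas left as pluggable callbacks, since Equations (3) to (9) are not reproduced in this text. The function and field names are hypothetical, and the three lambdas in the example are placeholders that must not be read as the patent's equations.

```python
from typing import Callable, Dict, List, Sequence

Grid = List[List[float]]


def compute_coefficients(
    references: Sequence[dict],
    block_size: int,
    quality_strength: Callable[[float, float], float],
    spatial_strength: Callable[[str, int, int, int], float],
    combine: Callable[[float, float], float],
) -> Dict[str, Grid]:
    """Return one block_size x block_size coefficient grid per reference signal."""
    # S21-S23: image quality strength from quantization step and reduction weight.
    quality = {
        ref["name"]: quality_strength(ref["q_step"], ref["beta"])
        for ref in references
    }
    coefficients: Dict[str, Grid] = {}
    for ref in references:  # S27: repeat S24-S26 for every reference signal
        # S24-S25: spatial strength for each pixel position in the block.
        strengths = [
            [spatial_strength(ref["kind"], x, y, block_size)
             for x in range(block_size)]
            for y in range(block_size)
        ]
        # S26: merge quality and spatial strength into a prediction coefficient.
        coefficients[ref["name"]] = [
            [combine(quality[ref["name"]], s) for s in row] for row in strengths
        ]
    return coefficients


refs = [
    {"name": "temporal_0", "kind": "temporal", "q_step": 2.0, "beta": 1.0},
    {"name": "lower_layer", "kind": "layer", "q_step": 4.0, "beta": 1.0},
]
coeffs = compute_coefficients(
    refs, 2,
    quality_strength=lambda q_step, beta: beta / q_step,  # placeholder formula
    spatial_strength=lambda kind, x, y, size: 1.0,        # placeholder: flat
    combine=lambda w, s: w * s,                           # placeholder formula
)
```

The quality strengths are computed once before the loop because the flowchart returns to step S24, not S21, when more reference signals remain.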

［予測装置］
図７に本発明の実施形態に係る予測装置のブロック図を示す。 [Prediction device]
FIG. 7 shows a block diagram of a prediction apparatus according to the embodiment of the present invention.

予測モード情報が入力され，予測モード記憶部１０に書き出される。同モードが第一予測モードの場合，予測係数算出部１１から予測信号記憶部１６までの処理と，予測係数算出部２１から予測信号記憶部２６までの処理とを行い，同モードが第二予測モードの場合，さらに予測係数算出部３１から予測信号記憶部３６までの処理を行う。 Prediction mode information is input and written to the prediction mode storage unit 10. When the mode is the first prediction mode, the process from the prediction coefficient calculation unit 11 to the prediction signal storage unit 16 and the process from the prediction coefficient calculation unit 21 to the prediction signal storage unit 26 are performed, and the mode is the second prediction mode. In the mode, processing from the prediction coefficient calculation unit 31 to the prediction signal storage unit 36 is further performed.

予測係数算出部１１は，予測モード記憶部１０から読み出した予測モードを入力とし，参照信号の予測係数を算出する処理を行う。本処理の詳細については，図８を用いて後述する。 The prediction coefficient calculation unit 11 receives the prediction mode read from the prediction mode storage unit 10 and performs processing for calculating the prediction coefficient of the reference signal. Details of this processing will be described later with reference to FIG.

まず，参照信号となる下位階層の復号信号が，復号信号記憶部１２に書き出される。階層間予測処理部１３は，復号信号記憶部１２から読み出した復号信号を入力として，アップサンプリングによる補間処理および階層間予測処理を行い，予測信号を予測信号記憶部１４に書き出す。予測係数乗算処理部１５は，予測係数算出部１１が算出した予測係数および予測信号記憶部１４から読み出した予測信号を入力とし，入力された予測信号に予測係数を乗じる処理を行い，乗算後の結果を予測信号記憶部１６に書き出す。 First, a decoded signal of a lower layer that becomes a reference signal is written to the decoded signal storage unit 12. The inter-layer prediction processing unit 13 receives the decoded signal read from the decoded signal storage unit 12, performs interpolation processing by upsampling and inter-layer prediction processing, and writes the prediction signal to the prediction signal storage unit 14. The prediction coefficient multiplication processing unit 15 receives the prediction coefficient calculated by the prediction coefficient calculation unit 11 and the prediction signal read from the prediction signal storage unit 14, performs a process of multiplying the input prediction signal by the prediction coefficient, and performs multiplication. The result is written in the prediction signal storage unit 16.

予測係数算出部２１は，予測係数算出部１１と同じである。復号信号記憶部２２には，参照信号となる同一階層の復号信号が書き出される。フレーム間予測処理部２３は，復号信号記憶部２２から読み出した復号信号を入力として，動き補償によるフレーム間予測処理を行い，予測信号を予測信号記憶部２４に書き出す。予測係数乗算処理部２５は，予測係数算出部２１が算出した予測係数および予測信号記憶部２４から読み出した予測信号を入力とし，入力された予測信号に予測係数を乗じる処理を行い，乗算後の結果を予測信号記憶部２６に書き出す。 The prediction coefficient calculation unit 21 is the same as the prediction coefficient calculation unit 11. In the decoded signal storage unit 22, a decoded signal of the same layer as a reference signal is written. The inter-frame prediction processing unit 23 receives the decoded signal read from the decoded signal storage unit 22, performs inter-frame prediction processing by motion compensation, and writes the prediction signal to the prediction signal storage unit 24. The prediction coefficient multiplication processing unit 25 receives the prediction coefficient calculated by the prediction coefficient calculation unit 21 and the prediction signal read from the prediction signal storage unit 24, performs a process of multiplying the input prediction signal by the prediction coefficient, The result is written in the prediction signal storage unit 26.

The units from the prediction coefficient calculation unit 31 through the prediction signal storage unit 36 are used when the prediction mode is the second prediction mode. Their processing is the same as that of the prediction coefficient calculation unit 21 through the prediction signal storage unit 26 described above; only the decoded signal stored in the decoded signal storage unit 32 as the reference signal differs.

The multiplexing processing unit 41 reads the prediction signals from the prediction signal storage units 16, 26, and 36 and multiplexes them into a single prediction signal. In the first prediction mode, no prediction signal is written to the prediction signal storage unit 36, so prediction signals are read only from the prediction signal storage units 16 and 26.
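
The data flow of FIG. 7 described above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: it assumes that the multiplexing in unit 41 is a sample-wise sum of the weighted prediction signals, that upsampling and motion compensation have already produced the three input signals, and that the second prediction mode adds one further inter-frame reference. The names `scale`, `multiplex`, `predict`, and the dictionary keys are hypothetical.

```python
from typing import Dict, List

Signal = List[float]


def scale(signal: Signal, coeff: float) -> Signal:
    """Multiply each sample by a prediction coefficient (units 15, 25, 35)."""
    return [coeff * s for s in signal]


def multiplex(weighted: List[Signal]) -> Signal:
    """Merge weighted signals into one prediction signal (unit 41).

    Assumption: multiplexing is a sample-wise sum.
    """
    return [sum(samples) for samples in zip(*weighted)]


def predict(mode: str, inter_layer: Signal, inter_frame_a: Signal,
            inter_frame_b: Signal, coeffs: Dict[str, float]) -> Signal:
    """Combine the prediction signals for the 'first' or 'second' mode."""
    weighted = [
        scale(inter_layer, coeffs["inter_layer"]),
        scale(inter_frame_a, coeffs["inter_frame_a"]),
    ]
    # Only the second mode uses the third branch (units 31 to 36).
    if mode == "second":
        weighted.append(scale(inter_frame_b, coeffs["inter_frame_b"]))
    return multiplex(weighted)
```

In the first mode the third signal is ignored, which mirrors the statement that nothing is read from the prediction signal storage unit 36.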

FIG. 8 shows a configuration example of the prediction coefficient calculation unit 11 shown in FIG. 7. The details of the prediction coefficient calculation unit 11 are described below with reference to FIG. 8.

The quantization parameter used to encode the reference signal is read into and stored in the quantization parameter storage unit 51. The weighting coefficient β used to calculate the image quality prediction strength is stored in advance in the weighting coefficient storage unit 52. The image quality prediction strength calculation unit 53 takes as input the quantization parameter read from the quantization parameter storage unit 51 and the weighting coefficient read from the weighting coefficient storage unit 52, calculates the image quality prediction strength, and writes it to the image quality prediction strength storage unit 54. The image quality prediction strength is calculated according to equation (7).

Meanwhile, the spatial prediction strength calculation unit 55 calculates the spatial prediction strength and writes it to the spatial prediction strength storage unit 56. The calculation follows equation (3) or equation (5). The parameters a and c are assumed to be supplied externally.

The prediction coefficient calculation unit 57 takes as input the image quality prediction strength read from the image quality prediction strength storage unit 54 and the spatial prediction strength read from the spatial prediction strength storage unit 56, calculates the prediction coefficient from them, and writes it to the prediction coefficient storage unit 58. The calculation follows equation (9) or equation (13).
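
The structure of FIG. 8 can be sketched in Python as follows. Equations (3), (5), (7), (9), and (13) are not reproduced in this part of the description, so the functional forms below are placeholders and not the equations of the embodiment. They are chosen only to match the stated behavior: the image quality prediction strength falls as quantization becomes coarser, and the prediction coefficient grows with both strengths. The exponential form, the normalization to a sum of 1, and all function names are assumptions.

```python
import math
from typing import List


def image_quality_strength(qp: float, beta: float) -> float:
    """Stand-in for unit 53 / equation (7).

    Hypothetical form: decreases monotonically as the quantization
    parameter qp grows, with beta as the weighting coefficient.
    """
    return math.exp(-beta * qp)


def prediction_coefficients(quality: List[float],
                            spatial: List[float]) -> List[float]:
    """Stand-in for unit 57 / equation (9) or (13).

    Hypothetical form: per-reference product of the two strengths,
    normalized so that the coefficients of all references sum to 1.
    """
    raw = [q * s for q, s in zip(quality, spatial)]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("strengths must give a positive total")
    return [r / total for r in raw]
```

With equal spatial strengths, the reference encoded with the larger quantization parameter receives the smaller coefficient under these placeholder forms.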

[Encoding device]
The prediction device described above is used, for example, as part of the scalable encoding device shown in FIG. 9. In FIG. 9, the prediction processing unit 79 in the enhancement layer encoding unit 70 corresponds to the prediction device shown in FIG. 7.

In this device, the layer separator 61 takes the encoding target frame (the input image) as input, separates it into layers of different spatial resolution, and writes the signal of each layer to the base layer signal storage unit 62 and the enhancement layer signal storage unit 63, respectively.

The base layer encoding unit 64 takes the base layer signal read from the base layer signal storage unit 62 as input, encodes it, and writes the encoded stream to the encoded stream storage unit 65. The specific encoding method is assumed to be supplied externally. For example, a well-known video encoding method using motion compensation and the discrete cosine transform can be used.

The local decoded image acquisition unit 66 takes the base layer encoded stream read from the encoded stream storage unit 65 as input, performs the decoding process corresponding to the encoding performed by the base layer encoding unit 64, and writes the decoded image to the local decoded image storage unit 67.

The transform unit 71 in the enhancement layer encoding unit 70 takes the enhancement layer signal read from the enhancement layer signal storage unit 63 as input, performs a transform (for example, the discrete cosine transform), and writes the resulting transform coefficients to the transform coefficient storage unit 72. The quantization unit 73 takes the transform coefficients read from the transform coefficient storage unit 72 as input, quantizes them, and writes the quantized values to the quantized value storage unit 74.

The inverse quantization unit 75 takes the quantized values read from the quantized value storage unit 74 as input, performs inverse quantization, and writes the result to the inverse quantized value storage unit 76. The inverse transform unit 77 takes the transform coefficients read from the inverse quantized value storage unit 76 as input, performs the inverse transform, and writes the result to the local decoded signal storage unit 78.

The prediction processing unit 79 takes as input (i) the sum of the enhancement layer local decoded image read from the local decoded signal storage unit 78 and the output of the delay unit 81, and (ii) the base layer local decoded image read from the local decoded image storage unit 67. It performs prediction and writes the result to the prediction signal storage unit 80. The details of this processing are as described with reference to FIGS. 7 and 8.

The prediction signal read from the prediction signal storage unit 80 is input to the delay unit 81, delayed by one frame, and then added to the enhancement layer local decoded image read from the local decoded signal storage unit 78. The prediction signal stored in the prediction signal storage unit 80 is also used when the enhancement layer signal is input to the transform unit 71: the difference signal between the enhancement layer signal and the prediction signal is the input to the transform unit 71.

The entropy encoding unit 82 takes the quantized values read from the quantized value storage unit 74 as input, performs entropy encoding, and writes the result to the encoded stream storage unit 83. The multiplexer 68 multiplexes the encoded streams read from the encoded stream storage unit 65 and the encoded stream storage unit 83 and outputs the result as the scalable encoding result.
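
The enhancement layer loop of FIG. 9 (difference signal, transform, quantization, inverse quantization, inverse transform, local decoding) can be illustrated for a single one-dimensional block as follows. This is a sketch under simplifying assumptions, not the encoder of the embodiment: it uses an orthonormal 1-D DCT-II, a uniform scalar quantizer with step `step`, and omits entropy coding and the one-frame delay. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def _norm(k: int, n: int) -> float:
    """Orthonormal DCT scaling factor for frequency index k."""
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(x: List[float]) -> List[float]:
    """Orthonormal DCT-II (stand-in for transform unit 71)."""
    n = len(x)
    return [_norm(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                              for i in range(n)) for k in range(n)]


def idct(c: List[float]) -> List[float]:
    """Inverse of dct (stand-in for inverse transform unit 77)."""
    n = len(c)
    return [sum(_norm(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n)) for i in range(n)]


def encode_block(block: List[float], prediction: List[float],
                 step: float) -> Tuple[List[int], List[float]]:
    """Return (quantized values, locally decoded block)."""
    # The difference between the input and the prediction is transformed.
    residual = [b - p for b, p in zip(block, prediction)]
    # Uniform quantization (stand-in for unit 73).
    levels = [round(c / step) for c in dct(residual)]
    # Local decoding: inverse quantization (75), inverse transform (77).
    recon = idct([level * step for level in levels])
    # Adding the prediction back gives the local decoded block.
    return levels, [p + r for p, r in zip(prediction, recon)]
```

Because the encoder reconstructs the block from the quantized values rather than from the original residual, a decoder that applies the same inverse steps obtains the same reference signal.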

[Decoding device]
The prediction device described above is also used, for example, as part of the scalable decoding device shown in FIG. 10. The prediction processing unit 105 in FIG. 10 corresponds to the prediction device shown in FIG. 7.

In this device, the separator 91 takes the encoded stream output from the scalable encoding device as input, separates it into a base layer encoded stream and an enhancement layer encoded stream, and writes them to the base layer encoded stream storage unit 92 and the enhancement layer encoded stream storage unit 95, respectively.

The base layer decoding unit 93 takes the encoded stream read from the base layer encoded stream storage unit 92 as input, decodes it, and writes the decoding result to the base layer signal storage unit 94. The specific decoding method is assumed to be supplied externally. For example, a video decoding method using motion compensation and the inverse discrete cosine transform can be used.

The entropy decoding unit 96 takes the encoded stream read from the enhancement layer encoded stream storage unit 95 as input, performs entropy decoding, and writes the decoded quantized values to the quantized value storage unit 97. The inverse quantization unit 98 takes the quantized values read from the quantized value storage unit 97 as input, performs inverse quantization, and writes the result to the transform coefficient storage unit 99. The inverse transform unit 100 takes the transform coefficients read from the transform coefficient storage unit 99 as input, performs the inverse transform, and writes the result to the decoded signal storage unit 101. The adder 102 adds the decoded signal read from the decoded signal storage unit 101 to the output of the prediction signal storage unit 106 and writes the sum to the enhancement layer signal storage unit 103.

The enhancement layer signal stored in the enhancement layer signal storage unit 103 is output externally and is also written to the delay unit 104, where it is delayed by one frame.

The prediction processing unit 105 takes as input the decoded signal read from the base layer signal storage unit 94 and the enhancement layer signal output from the delay unit 104, performs prediction, and writes the result to the prediction signal storage unit 106. The details of this processing are as described with reference to FIGS. 7 and 8.
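
The decoder-side loop of FIG. 10, in which each decoded enhancement layer frame is delayed by one frame and fed back into the prediction, can be sketched as follows. This is an illustration under stated assumptions, not the decoder of the embodiment: frames are one-dimensional blocks, the base layer signal is assumed to be already upsampled, motion compensation is omitted, the two prediction coefficients are fixed constants, and the first frame is assumed to use the inter-layer signal alone. The class and method names are hypothetical.

```python
import math
from typing import List, Optional


def idct(c: List[float]) -> List[float]:
    """Orthonormal inverse DCT (stand-in for inverse transform unit 100)."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


class EnhancementDecoder:
    """Hypothetical decoder keeping the one-frame delay (delay unit 104)."""

    def __init__(self, w_layer: float, w_frame: float) -> None:
        self.w_layer = w_layer  # coefficient for inter-layer prediction
        self.w_frame = w_frame  # coefficient for inter-frame prediction
        self.previous: Optional[List[float]] = None  # delayed frame

    def decode(self, levels: List[int], base: List[float],
               step: float) -> List[float]:
        # Prediction (unit 105): base layer plus delayed enhancement frame.
        if self.previous is None:
            prediction = list(base)  # assumption: no temporal reference yet
        else:
            prediction = [self.w_layer * b + self.w_frame * p
                          for b, p in zip(base, self.previous)]
        # Inverse quantization (98) and inverse transform (100).
        residual = idct([level * step for level in levels])
        # Adder 102: decoded residual plus prediction signal.
        frame = [p + r for p, r in zip(prediction, residual)]
        self.previous = frame  # stored for the next frame (delay unit 104)
        return frame
```

Calling `decode` once per frame in display order reproduces the feedback path from the enhancement layer signal storage unit 103 through the delay unit 104 to the prediction processing unit 105.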

The scalable encoding and decoding processing described above can also be implemented by a computer and a software program. The program can be provided recorded on a computer-readable recording medium or provided over a network.

FIG. 1 is a diagram explaining the outline of the present invention.
FIG. 2 is a diagram showing an example of classifying regions by spatial position.
FIG. 3 is a diagram showing an example of multi-reference-signal prediction that also takes the lower layer into account.
FIG. 4 is a diagram showing an example of the spatial position within a block and the spatial prediction strength.
FIG. 5 is a diagram showing the flow of prediction processing according to an embodiment of the present invention.
FIG. 6 is a diagram showing the flow of prediction coefficient calculation processing according to an embodiment of the present invention.
FIG. 7 is a block diagram of a prediction device according to an embodiment of the present invention.
FIG. 8 is a diagram showing a configuration example of the prediction coefficient calculation unit.
FIG. 9 is a diagram showing a configuration example of the scalable encoding device.
FIG. 10 is a diagram showing a configuration example of the scalable decoding device.

Explanation of symbols

1 Prediction coefficient calculation means
2 Prediction coefficient storage means
3 Inter-layer prediction processing means
4 Inter-frame prediction processing means
5 Prediction signal generation means
10 Prediction mode storage unit
11, 21, 31 Prediction coefficient calculation unit
12, 22, 32 Decoded signal storage unit
13 Inter-layer prediction processing unit
23, 33 Inter-frame prediction processing unit
14, 24, 34 Prediction signal storage unit
15, 25, 35 Prediction coefficient multiplication processing unit
16, 26, 36 Prediction signal storage unit
41 Multiplexing processing unit
42 Prediction signal storage unit

Claims

A scalable video coding method using a prediction method using a plurality of frames as reference signals,
A prediction coefficient storage step of storing, in prediction coefficient storage means, prediction coefficients of inter-frame prediction and inter-layer prediction calculated such that, between the case of a pixel region whose corresponding lower-layer position within the prediction block of the signal to be predicted is not near a block boundary and the case of a pixel region near the block boundary, the prediction coefficient of inter-layer prediction in the former case is larger than the prediction coefficient of inter-layer prediction in the latter case, and the prediction coefficient of inter-frame prediction in the former case is smaller than the prediction coefficient of inter-frame prediction in the latter case;
An inter-layer prediction processing step for performing inter-layer prediction with reference to lower layer signals having different spatial resolutions and generating a prediction signal based on inter-layer prediction;
An inter-frame prediction processing step for performing inter-frame prediction in a temporal direction with reference to neighboring frame signals having the same spatial resolution and generating a prediction signal based on the inter-frame prediction;
A prediction signal generation step of generating a prediction signal by multiplying the prediction signal based on the inter-layer prediction and the prediction signal based on the inter-frame prediction by the respective corresponding prediction coefficients stored in the prediction coefficient storage means;
and a step of encoding a block of an encoding target frame using the generated prediction signal.

A scalable video coding method using a prediction method using a plurality of frames as reference signals,
A prediction coefficient storage step of storing, in prediction coefficient storage means, prediction coefficients of inter-frame prediction and inter-layer prediction, the prediction coefficients being calculated from an image quality prediction strength and a spatial prediction strength of the reference signal in inter-frame prediction and inter-layer prediction so as to take a larger value as the image quality prediction strength is larger and as the spatial prediction strength is larger, wherein the image quality prediction strength is determined so as to be smaller as the quantization step width used to generate the decoded image serving as the reference signal is larger, and the spatial prediction strength is determined such that, between the case of a pixel region whose corresponding lower-layer position within the prediction block of the signal to be predicted is not near a block boundary and the case of a pixel region near the block boundary, the spatial prediction strength of inter-layer prediction in the former case is larger than the spatial prediction strength of inter-layer prediction in the latter case, and the spatial prediction strength of inter-frame prediction in the former case is smaller than the spatial prediction strength of inter-frame prediction in the latter case;
An inter-layer prediction processing step for performing inter-layer prediction with reference to lower layer signals having different spatial resolutions and generating a prediction signal based on inter-layer prediction;
An inter-frame prediction processing step for performing inter-frame prediction in a temporal direction with reference to neighboring frame signals having the same spatial resolution and generating a prediction signal based on the inter-frame prediction;
A prediction signal generation step of generating a prediction signal by multiplying the prediction signal based on the inter-layer prediction and the prediction signal based on the inter-frame prediction by the respective corresponding prediction coefficients stored in the prediction coefficient storage means;
and a step of encoding a block of an encoding target frame using the generated prediction signal.

The scalable encoding method according to claim 1 or 2, wherein
the prediction modes include a first prediction mode that combines inter-layer prediction with temporally unidirectional inter-frame prediction, and a second prediction mode that combines inter-layer prediction with temporally bidirectional inter-frame prediction, and
the inter-frame prediction processing step performs unidirectional prediction in the temporal direction in the first prediction mode and bidirectional prediction in the temporal direction in the second prediction mode.

A scalable video coding apparatus using a prediction method using a plurality of frames as reference signals,
Prediction coefficient storage means for storing prediction coefficients of inter-frame prediction and inter-layer prediction calculated such that, between the case of a pixel region whose corresponding lower-layer position within the prediction block of the signal to be predicted is not near a block boundary and the case of a pixel region near the block boundary, the prediction coefficient of inter-layer prediction in the former case is larger than the prediction coefficient of inter-layer prediction in the latter case, and the prediction coefficient of inter-frame prediction in the former case is smaller than the prediction coefficient of inter-frame prediction in the latter case;
Inter-layer prediction processing means for performing inter-layer prediction with reference to lower layer signals having different spatial resolutions and generating a prediction signal based on inter-layer prediction;
Inter-frame prediction processing means for performing inter-frame prediction in a temporal direction referring to neighboring frame signals having the same spatial resolution and generating a prediction signal based on the inter-frame prediction;
Prediction signal generation means for generating a prediction signal by multiplying the prediction signal based on the inter-layer prediction and the prediction signal based on the inter-frame prediction by the respective corresponding prediction coefficients stored in the prediction coefficient storage means;
and means for encoding a block of an encoding target frame using the generated prediction signal.

A scalable video coding apparatus using a prediction method using a plurality of frames as reference signals,
Prediction coefficient storage means for storing prediction coefficients of inter-frame prediction and inter-layer prediction, the prediction coefficients being calculated from an image quality prediction strength and a spatial prediction strength of the reference signal in inter-frame prediction and inter-layer prediction so as to take a larger value as the image quality prediction strength is larger and as the spatial prediction strength is larger, wherein the image quality prediction strength is determined so as to be smaller as the quantization step width used to generate the decoded image serving as the reference signal is larger, and the spatial prediction strength is determined such that, between the case of a pixel region whose corresponding lower-layer position within the prediction block of the signal to be predicted is not near a block boundary and the case of a pixel region near the block boundary, the spatial prediction strength of inter-layer prediction in the former case is larger than the spatial prediction strength of inter-layer prediction in the latter case, and the spatial prediction strength of inter-frame prediction in the former case is smaller than the spatial prediction strength of inter-frame prediction in the latter case;
Inter-layer prediction processing means for performing inter-layer prediction with reference to lower layer signals having different spatial resolutions and generating a prediction signal based on inter-layer prediction;
Inter-frame prediction processing means for performing inter-frame prediction in the temporal direction with reference to neighboring frame signals having the same spatial resolution and generating a prediction signal based on inter-frame prediction;
Prediction signal generation means for generating a prediction signal by multiplying the prediction signal based on the inter-layer prediction and the prediction signal based on the inter-frame prediction by the respective corresponding prediction coefficients stored in the prediction coefficient storage means;
and means for encoding a block of an encoding target frame using the generated prediction signal.

A scalable decoding method for decoding a moving image encoded using a prediction method using a plurality of frames as reference signals,
A prediction coefficient storage step of storing, in prediction coefficient storage means, prediction coefficients of inter-frame prediction and inter-layer prediction calculated such that, between the case of a pixel region whose corresponding lower-layer position within the prediction block of the signal to be predicted is not near a block boundary and the case of a pixel region near the block boundary, the prediction coefficient of inter-layer prediction in the former case is larger than the prediction coefficient of inter-layer prediction in the latter case, and the prediction coefficient of inter-frame prediction in the former case is smaller than the prediction coefficient of inter-frame prediction in the latter case;
An inter-layer prediction processing step for performing inter-layer prediction with reference to lower layer signals having different spatial resolutions and generating a prediction signal based on inter-layer prediction;
An inter-frame prediction processing step for performing inter-frame prediction in a temporal direction with reference to neighboring frame signals having the same spatial resolution and generating a prediction signal based on the inter-frame prediction;
A prediction signal generation step of generating a prediction signal by multiplying the prediction signal based on the inter-layer prediction and the prediction signal based on the inter-frame prediction by the respective corresponding prediction coefficients stored in the prediction coefficient storage means;
and a step of decoding a block of a decoding target frame using the generated prediction signal.

A scalable decoding method for decoding a moving image encoded using a prediction method using a plurality of frames as reference signals,
A prediction coefficient storage step of storing, in prediction coefficient storage means, prediction coefficients of inter-frame prediction and inter-layer prediction, the prediction coefficients being calculated from an image quality prediction strength and a spatial prediction strength of the reference signal in inter-frame prediction and inter-layer prediction so as to take a larger value as the image quality prediction strength is larger and as the spatial prediction strength is larger, wherein the image quality prediction strength is determined so as to be smaller as the quantization step width used to generate the decoded image serving as the reference signal is larger, and the spatial prediction strength is determined such that, between the case of a pixel region whose corresponding lower-layer position within the prediction block of the signal to be predicted is not near a block boundary and the case of a pixel region near the block boundary, the spatial prediction strength of inter-layer prediction in the former case is larger than the spatial prediction strength of inter-layer prediction in the latter case, and the spatial prediction strength of inter-frame prediction in the former case is smaller than the spatial prediction strength of inter-frame prediction in the latter case;
An inter-layer prediction processing step for performing inter-layer prediction with reference to lower layer signals having different spatial resolutions and generating a prediction signal based on inter-layer prediction;
An inter-frame prediction processing step for performing inter-frame prediction in a temporal direction with reference to neighboring frame signals having the same spatial resolution and generating a prediction signal based on the inter-frame prediction;
A prediction signal generation step of generating a prediction signal by multiplying the prediction signal based on the inter-layer prediction and the prediction signal based on the inter-frame prediction by the respective corresponding prediction coefficients stored in the prediction coefficient storage means;
and a step of decoding a block of a decoding target frame using the generated prediction signal.

A scalable decoding device that decodes a moving image encoded using a prediction method using a plurality of frames as reference signals,
Prediction coefficient storage means for storing prediction coefficients of inter-frame prediction and inter-layer prediction calculated such that, between the case of a pixel region whose corresponding lower-layer position within the prediction block of the signal to be predicted is not near a block boundary and the case of a pixel region near the block boundary, the prediction coefficient of inter-layer prediction in the former case is larger than the prediction coefficient of inter-layer prediction in the latter case, and the prediction coefficient of inter-frame prediction in the former case is smaller than the prediction coefficient of inter-frame prediction in the latter case;
Inter-layer prediction processing means for performing inter-layer prediction with reference to lower layer signals having different spatial resolutions and generating a prediction signal based on inter-layer prediction;
Inter-frame prediction processing means for performing inter-frame prediction in a temporal direction referring to neighboring frame signals having the same spatial resolution and generating a prediction signal based on the inter-frame prediction;
Prediction signal generation means for generating a prediction signal by multiplying the prediction signal based on the inter-layer prediction and the prediction signal based on the inter-frame prediction by the respective corresponding prediction coefficients stored in the prediction coefficient storage means;
and means for decoding a block of a decoding target frame using the generated prediction signal.

A scalable decoding device that decodes a moving image encoded using a prediction method using a plurality of frames as reference signals,
Prediction coefficient storage means for storing prediction coefficients of inter-frame prediction and inter-layer prediction, the prediction coefficients being calculated from an image quality prediction strength and a spatial prediction strength of the reference signal in inter-frame prediction and inter-layer prediction so as to take a larger value as the image quality prediction strength is larger and as the spatial prediction strength is larger, wherein the image quality prediction strength is determined so as to be smaller as the quantization step width used to generate the decoded image serving as the reference signal is larger, and the spatial prediction strength is determined such that, between the case of a pixel region whose corresponding lower-layer position within the prediction block of the signal to be predicted is not near a block boundary and the case of a pixel region near the block boundary, the spatial prediction strength of inter-layer prediction in the former case is larger than the spatial prediction strength of inter-layer prediction in the latter case, and the spatial prediction strength of inter-frame prediction in the former case is smaller than the spatial prediction strength of inter-frame prediction in the latter case;
Inter-layer prediction processing means for performing inter-layer prediction with reference to lower layer signals having different spatial resolutions and generating a prediction signal based on inter-layer prediction;
Inter-frame prediction processing means for performing inter-frame prediction in the temporal direction with reference to neighboring frame signals having the same spatial resolution and generating a prediction signal based on inter-frame prediction;
Prediction signal generation means for generating a prediction signal by multiplying the prediction signal based on the inter-layer prediction and the prediction signal based on the inter-frame prediction by the respective corresponding prediction coefficients stored in the prediction coefficient storage means;
and means for decoding a block of a decoding target frame using the generated prediction signal.

A scalable encoding program for causing a computer to execute the scalable encoding method according to claim 1, claim 2, or claim 3.

A computer-readable recording medium storing a scalable encoding program for causing a computer to execute the scalable encoding method according to claim 1, 2, or 3.

A scalable decoding program for causing a computer to execute the scalable decoding method according to claim 6.

A computer-readable recording medium on which a scalable decoding program for causing a computer to execute the scalable decoding method according to claim 6 or 7 is recorded.