JPH10174126A

JPH10174126A - Synchronization of stereoscopic video sequence

Info

Publication number: JPH10174126A
Application number: JP9329371A
Authority: JP
Inventors: Xuemin Chen; スーミン・チェン
Original assignee: NextLevel Systems Inc
Current assignee: Arris Technology Inc
Priority date: 1996-10-24
Filing date: 1997-10-24
Publication date: 1998-06-26
Also published as: CN1112806C; DE69722556T2; EP0838959A3; CA2218607C; CA2218607A1; TW349311B; NO974891L; HK1010775A1; US5886736A; NO974891D0; DE69722556D1; MX9708213A; CN1187734A; EP0838959B1; EP0838959A2; KR19980033135A

Abstract

PROBLEM TO BE SOLVED: To provide an optimum picture transmission order that minimizes the decoder input buffer size required by a system, by ordering the video pictures so that each disparity-predicted enhancement layer picture is transmitted after its corresponding lower layer picture. SOLUTION: Lower layer and enhancement layer video sequences are received by a temporal remultiplexer 105; the enhancement layer video is supplied to an enhancement encoder 110, and the base layer video is supplied to a lower encoder 115. The encoded enhancement and base layers are then passed through a system multiplexer 120 for transmission to a decoder as a transport stream, and are demultiplexed by a system demultiplexer 125. The encoded enhancement layer data are passed to an enhancement decoder 130 and the lower layer data to a lower decoder 135. A temporal remultiplexer 140 combines the decoded base layer data and enhancement layer data, and the enhancement and lower layer output signals are output.

Description

DETAILED DESCRIPTION OF THE INVENTION

【０００１】[0001]

FIELD OF THE INVENTION The present invention relates to an apparatus and method for synchronizing the decoding and display (e.g., presentation) of a stereoscopic video sequence. In particular, an apparatus is provided for determining the presentation time stamp and the decoding time stamp of the enhancement layer, together with a corresponding optimal bitstream transmission order that minimizes the required decoder input buffer size.

【０００２】[0002]

BACKGROUND OF THE INVENTION Digital technology has revolutionized the delivery of video and audio services to consumers, since it provides signals of much higher quality than analog technology and offers additional features that were previously unavailable. Digital systems are particularly advantageous for signals broadcast over cable television networks, or by satellite to cable television affiliates and/or directly to home satellite television receivers. In such systems, a subscriber receives the digital data stream through a receiver/descrambler that decompresses and decodes the data to reconstruct the original video and audio signals. The digital receiver includes a microcomputer and memory storage elements for use in this processing.

[0003] To provide a low-cost receiver while still delivering high-quality video and audio, the amount of data to be processed must be limited. In addition, the bandwidth available for transmitting digital signals is limited by physical constraints, existing communication protocols and government regulations. Accordingly, various intra-frame data compression techniques have been developed that exploit the spatial correlation between adjacent pixels in a particular video picture (e.g., frame).

[0004] Further, inter-frame compression techniques exploit the temporal correlation between corresponding regions of successive frames by using motion compensation data and block-matching motion estimation algorithms. In this case, a motion vector is determined for each block in the current picture of the video by identifying the block in a preceding picture that most closely resembles the particular current block. The entire current picture is then reconstructed at the decoder by sending data that represents the difference between the corresponding block pairs, together with the motion vectors needed to identify those pairs. Block-matching motion estimation algorithms are particularly effective when combined with block-based spatial compression techniques such as the discrete cosine transform (DCT).
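As an illustration of the block-matching idea described above, the following is a minimal sketch and not the method of any particular encoder. It assumes an exhaustive search over a small window and the sum of absolute differences (SAD) as the matching cost; the function names, block size and search range are hypothetical choices made for this example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    the current picture and the block displaced by (dx, dy) in the reference."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n=4, search=2):
    """Exhaustive search: return (dx, dy, cost) of the best-matching block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference picture.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2], best[0]


# Toy data: the current picture is the reference shifted by one pixel in x and y.
ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 1, 7)] for x in range(8)] for y in range(8)]
mv = best_motion_vector(cur, ref, bx=2, by=2)
```

Here `mv` holds the displacement and residual cost for the block at (2, 2). An encoder would transmit the vector plus the (ideally small) block difference rather than the raw block.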

[0005] Additionally, one proposed stereoscopic video transmission format of interest is the Motion Picture Experts Group (MPEG) MPEG-2 Multi-view Profile (MVP) system, described in document ISO/IEC JTC1/SC29/WG11 N1088, entitled "Proposed Draft Amendment No. 3 to 13818-2 (Multi-view Profile)," November 1995, which is incorporated herein by reference. Stereoscopic video provides slightly offset views of the same image to produce a combined picture with a greater depth of field, thereby creating a three-dimensional (3-D) effect. In such a system, two cameras are positioned about two inches apart to record an event as two separate video signals. The spacing of the cameras is approximately the same as the distance between the left and right human eyes. Moreover, in some stereoscopic video camcorders, the two lenses are housed in a single camcorder head and therefore move in synchronism, for example, when panning across a scene. The two video signals are received and recombined at a receiver to produce an image with a depth of field that corresponds to normal human vision. Other special effects can also be provided.

[0006] The MPEG MVP system includes two video layers that are transmitted in a multiplexed signal. First, the base layer (e.g., lower layer) represents the left view of a three-dimensional object. Second, the enhancement layer (e.g., auxiliary or upper layer) represents the right view of the object. Since the left and right views are of the same object and are only slightly offset from each other, there is often a large degree of correlation between the video pictures of the base and enhancement layers. This correlation is used to compress the enhancement layer relative to the base layer, thereby reducing the amount of data that must be transmitted in the enhancement layer to maintain a given picture quality. Picture quality generally corresponds to the quantization level of the video data.

[0007] The MPEG MVP system includes three types of video pictures: intra-coded pictures (I-pictures), predictive-coded pictures (P-pictures), and bi-directionally predictive-coded pictures (B-pictures). Further, while the base layer accommodates either frame-structure or field-structure video sequences, the enhancement layer accommodates only frame structure. An I-picture completely describes a single video picture without reference to any other picture. For improved error concealment, motion vectors can be included with an I-picture. Since both the P-pictures and the B-pictures in the base layer are predicted from I-pictures, an error in an I-picture has the potential for a greater impact on the displayed video. Moreover, pictures in the enhancement layer can be predicted from pictures in the base layer in a cross-layer prediction process known as disparity prediction. Prediction from one frame to another within a layer is known as temporal prediction.

[0008] In the base layer, a P-picture is predicted from a preceding I- or P-picture. The reference is from the preceding I- or P-picture to the future P-picture, and is known as forward prediction. A B-picture is predicted from the closest preceding I- or P-picture and the closest following I- or P-picture.

[0009] In the enhancement layer, a P-picture is predicted from (a) the most recently decoded picture in the enhancement layer, (b) the most recent base layer picture in display order, or (c) the next lower layer picture in display order. Case (b) is often used when the most recent base layer picture in display order is an I-picture. Further, a B-picture in the enhancement layer is predicted using (d) the most recently decoded enhancement layer picture for forward prediction and the most recent lower layer picture in display order for backward prediction, (e) the most recently decoded enhancement layer picture for forward prediction and the next lower layer picture in display order for backward prediction, or (f) the most recent lower layer picture in display order for forward prediction and the next lower layer picture in display order for backward prediction. When the most recent lower layer picture in display order is an I-picture, only that I-picture is used for predictive coding (e.g., there is no forward prediction).

[0010] Note that only prediction modes (a), (b) and (d) are included in the MPEG MVP system. The MVP system is a subset of MPEG temporal scalability coding, which includes each of modes (a) through (f).

[0011] In one optional configuration, the enhancement layer has only P- and B-pictures, and no I-pictures. A reference to a future picture (i.e., one that has not yet been displayed) is called backward prediction. Note that backward prediction does not occur within the enhancement layer; accordingly, enhancement layer pictures are transmitted in display order. There are situations where backward prediction is very useful in increasing the compression rate. For example, in a scene in which a door opens, the current picture can predict what is behind the door from a future picture in which the door is already open.

[0012] B-pictures yield the most compression but also incorporate the most error. To eliminate error propagation, a B-picture may never be predicted from another B-picture in the base layer. P-pictures yield less error and less compression. I-pictures yield the least compression, but are able to provide random access.

【００１３】[0013]

PROBLEM TO BE SOLVED BY THE INVENTION Thus, in the base layer, to decode a P-picture, the preceding I- or P-picture must be available. Similarly, to decode a B-picture, the preceding P- or I-picture and the future P- or I-picture must be available. Consequently, the video pictures are encoded and transmitted in dependency order, such that every picture used for prediction is coded before the pictures predicted from it. When the encoded signal is received at a decoder, the video pictures are decoded and reordered for display. Temporary storage elements are therefore required to buffer the data before display. However, the need for a relatively large decoder input buffer increases the cost of manufacturing the decoder. This is undesirable, since decoders are mass-market items that must be produced at the lowest possible cost.
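The dependency ordering described above can be sketched for a single (base) layer as follows. This is a simplified illustration under the assumption that every B-picture is predicted from the nearest preceding and nearest following I- or P-picture; the function name and the `(type, index)` tuple representation are hypothetical and chosen only for this example.

```python
def transmission_order(display):
    """Reorder pictures from display order into dependency (coding) order.

    `display` is a list of (picture_type, display_index) tuples. Each anchor
    picture (I or P) is emitted before the B-pictures that precede it in
    display order, because those B-pictures need it for backward prediction.
    """
    out = []
    pending_b = []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)   # must wait for the following anchor
        else:
            out.append(pic)         # anchor picture is sent first
            out.extend(pending_b)   # then the B-pictures that reference it
            pending_b = []
    out.extend(pending_b)           # trailing B-pictures with no later anchor
    return out


display = [("I", 0), ("B", 1), ("B", 2), ("P", 3), ("B", 4), ("B", 5), ("P", 6)]
coded = transmission_order(display)
```

For the display order I, B, B, P, B, B, P this yields I0, P3, B1, B2, P6, B4, B5, so the decoder must hold already-decoded pictures until their display time, which is the buffering cost the paragraph above refers to.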

[0014] Accordingly, there is a need to synchronize the decoding and presentation of the enhancement and base layer video sequences. Synchronization of decoding and presentation for stereoscopic video is a particularly important aspect of MVP. Since it is inherent in stereoscopic video that the two views are tightly coupled to one another, loss of presentation or display synchronization can cause many problems for the viewer, such as eye strain and headaches.

[0015] Moreover, the difficulties in addressing this problem for a digitally compressed bitstream differ from those for an uncompressed bitstream or for an analog signal such as one conforming to the NTSC or PAL standard. For example, with NTSC or PAL signals, the pictures are transmitted synchronously, so that a clock signal can be derived directly from the picture synch. In this case, synchronization of the two views is easily achieved using the picture synch.

[0016] However, in a digitally compressed stereoscopic bitstream, the amount of data for each picture in each layer varies, and depends on the bit rate, the picture coding type and the complexity of the scene. Accordingly, the decoding and presentation timing cannot be derived directly from the start of the picture data. That is, unlike analog video transmission, a digitally compressed bitstream has no natural concept of synch pulses.

[0017] Accordingly, it would be advantageous to provide a system for synchronizing the decoding and presentation of a stereoscopic video sequence. The system should also be compatible with decoders that decode pictures either sequentially (e.g., one picture at a time) or in parallel (e.g., two pictures at a time). Moreover, the system should provide an optimal picture transmission order that minimizes the required decoder input buffer size. The present invention provides an apparatus having the above and other advantages.

【００１８】[0018]

SUMMARY OF THE INVENTION In accordance with the present invention, a method and apparatus are provided for ordering the transmission sequence of the video pictures of the lower and enhancement layers of a stereoscopic video sequence. In particular, the pictures are transmitted in an order such that the number of pictures that must be temporarily stored before presentation is minimized. Furthermore, a decode time stamp (DTS) and a presentation time stamp (PTS) are determined for each picture so as to provide synchronization between the lower layer and enhancement layer pictures at a decoder in which decoding occurs either sequentially or in parallel.

[0019] In particular, a method is provided for ordering the transmission of a sequence of video pictures in the lower layer and the enhancement layer of a stereoscopic video signal, where the enhancement layer includes disparity-predicted pictures that are predicted using corresponding lower layer pictures. The method includes the step of ordering the video pictures such that each disparity-predicted enhancement layer picture is transmitted after its corresponding lower layer picture.

[0020] In a first embodiment, the lower layer includes only intra-coded pictures (I-pictures), including consecutive pictures I_{Li}, I_{Li+1}, I_{Li+2}, I_{Li+3}, I_{Li+4} and so on, and the corresponding enhancement layer pictures are represented by H_{Ei}, H_{Ei+1}, H_{Ei+2}, H_{Ei+3}, H_{Ei+4} and so on. In this case, the video pictures are transmitted in the order I_{Li}, I_{Li+1}, H_{Ei}, I_{Li+2}, H_{Ei+1}, I_{Li+3}, H_{Ei+2}, I_{Li+4}, H_{Ei+3} and so on (e.g., sequence 1).

[0021] Alternatively, in a second embodiment, the video pictures are transmitted in the order I_{Li}, H_{Ei}, I_{Li+1}, H_{Ei+1}, I_{Li+2}, H_{Ei+2}, I_{Li+3}, H_{Ei+3} and so on (e.g., sequence 2).

[0022] In a third embodiment, the lower layer includes only intra-coded pictures (I-pictures) and predictive-coded pictures (P-pictures), including consecutive pictures I_{Li}, P_{Li+1}, P_{Li+2}, P_{Li+3}, P_{Li+4} and so on, and the corresponding enhancement layer pictures are represented by H_{Ei}, H_{Ei+1}, H_{Ei+2}, H_{Ei+3}, H_{Ei+4} and so on, respectively. Here, the video pictures are transmitted in the order I_{Li}, P_{Li+1}, H_{Ei}, P_{Li+2}, H_{Ei+1}, P_{Li+3}, H_{Ei+2}, P_{Li+4}, H_{Ei+3} and so on (e.g., sequence 3).

[0023] Alternatively, in a fourth embodiment, the video pictures are transmitted in the order I_{Li}, H_{Ei}, P_{Li+1}, H_{Ei+1}, P_{Li+2}, H_{Ei+2}, P_{Li+3}, H_{Ei+3} and so on (e.g., sequence 4).

[0024] In a fifth embodiment, the lower layer includes intra-coded pictures (I-pictures), predictive-coded pictures (P-pictures) and non-consecutive bi-directionally predictive-coded pictures (B-pictures), including consecutive pictures I_{Li}, B_{Li+1}, P_{Li+2}, B_{Li+3}, P_{Li+4}, B_{Li+5}, P_{Li+6} and so on, and the corresponding enhancement layer pictures are represented by H_{Ei}, H_{Ei+1}, H_{Ei+2}, H_{Ei+3}, H_{Ei+4}, H_{Ei+5}, H_{Ei+6} and so on, respectively. The video pictures are transmitted in the order I_{Li}, P_{Li+2}, B_{Li+1}, H_{Ei}, H_{Ei+1}, P_{Li+4}, B_{Li+3}, H_{Ei+2}, H_{Ei+3} and so on (e.g., sequence 5).

[0025] Alternatively, in a sixth embodiment, the video pictures are transmitted in the order I_{Li}, H_{Ei}, P_{Li+2}, B_{Li+1}, H_{Ei+1}, H_{Ei+2}, P_{Li+4}, B_{Li+3}, H_{Ei+3}, H_{Ei+4} and so on (e.g., sequence 6).

[0026] Alternatively, in a seventh embodiment, the video pictures are transmitted in the order I_{Li}, P_{Li+2}, H_{Ei}, B_{Li+1}, H_{Ei+1}, P_{Li+4}, H_{Ei+2}, B_{Li+3}, H_{Ei+3} and so on (e.g., sequence 7).

[0027] In an eighth embodiment, the lower layer includes intra-coded pictures (I-pictures), predictive-coded pictures (P-pictures) and consecutive bi-directionally predictive-coded pictures (B-pictures), including consecutive pictures I_{Li}, B_{Li+1}, B_{Li+2}, P_{Li+3}, B_{Li+4}, B_{Li+5}, P_{Li+6} and so on, and the corresponding enhancement layer pictures are represented by H_{Ei}, H_{Ei+1}, H_{Ei+2}, H_{Ei+3}, H_{Ei+4}, H_{Ei+5}, H_{Ei+6} and so on, respectively. The video pictures are transmitted in the order I_{Li}, P_{Li+3}, B_{Li+1}, H_{Ei}, B_{Li+2}, H_{Ei+1}, H_{Ei+2}, P_{Li+6}, B_{Li+4}, H_{Ei+3}, B_{Li+5}, H_{Ei+4}, H_{Ei+5} and so on (e.g., sequence 8).

[0028] Alternatively, in a ninth embodiment, the video pictures are transmitted in the order I_{Li}, H_{Ei}, P_{Li+3}, B_{Li+1}, H_{Ei+1}, B_{Li+2}, H_{Ei+2}, H_{Ei+3}, P_{Li+6}, B_{Li+4}, H_{Ei+4}, B_{Li+5}, H_{Ei+5}, H_{Ei+6} and so on (e.g., sequence 9).

[0029] Alternatively, in a tenth embodiment, the video pictures are transmitted in the order I_{Li}, P_{Li+3}, H_{Ei}, B_{Li+1}, H_{Ei+1}, B_{Li+2}, H_{Ei+2}, P_{Li+6}, H_{Ei+3}, B_{Li+4}, H_{Ei+4}, B_{Li+5}, H_{Ei+5} and so on (e.g., sequence 10).
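The ordering rule behind sequences 1 through 10 can be checked mechanically. The sketch below is only an illustration, not part of the disclosed apparatus. It assumes a compact, hypothetical label notation in which "I0", "P3" or "B1" stands for the lower layer picture at offset i+0, i+3 or i+1, and "H2" stands for the enhancement layer picture H_{Ei+2}. It verifies two properties stated in the text: each enhancement layer picture follows its corresponding lower layer picture, and enhancement layer pictures are sent in display order.

```python
# Transmission orders of sequences 1 to 10, written in the label notation above.
SEQUENCES = {
    1: "I0 I1 H0 I2 H1 I3 H2 I4 H3",
    2: "I0 H0 I1 H1 I2 H2 I3 H3",
    3: "I0 P1 H0 P2 H1 P3 H2 P4 H3",
    4: "I0 H0 P1 H1 P2 H2 P3 H3",
    5: "I0 P2 B1 H0 H1 P4 B3 H2 H3",
    6: "I0 H0 P2 B1 H1 H2 P4 B3 H3 H4",
    7: "I0 P2 H0 B1 H1 P4 H2 B3 H3",
    8: "I0 P3 B1 H0 B2 H1 H2 P6 B4 H3 B5 H4 H5",
    9: "I0 H0 P3 B1 H1 B2 H2 H3 P6 B4 H4 B5 H5 H6",
    10: "I0 P3 H0 B1 H1 B2 H2 P6 H3 B4 H4 B5 H5",
}


def order_is_valid(sequence):
    """Return True if every H picture follows its lower layer counterpart
    and the H pictures appear in increasing (display) order."""
    seen_lower = set()
    last_h = -1
    for label in sequence.split():
        kind, offset = label[0], int(label[1:])
        if kind == "H":
            if offset not in seen_lower or offset <= last_h:
                return False
            last_h = offset
        else:
            seen_lower.add(offset)
    return True


results = {number: order_is_valid(seq) for number, seq in SEQUENCES.items()}
```

All ten orders satisfy the check, while an order such as "H0 I0", which sends the enhancement layer picture before the lower layer picture it is predicted from, does not.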

[0030] A corresponding apparatus is also provided.

[0031] Additionally, a receiver is provided for processing a sequence of video pictures of a stereoscopic signal that includes a lower layer and an enhancement layer. The receiver includes a memory, a decompression/prediction processor, and a memory manager that cooperates with the memory and the processor. The memory manager schedules the storage of selected lower layer pictures in the memory such that each lower layer picture is processed by the decompression/prediction processor before the corresponding disparity-predicted enhancement layer picture. Moreover, decoding may occur either sequentially or in parallel.

【００３２】[0032]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS A method and apparatus are provided for synchronizing the decoding and presentation of a stereoscopic video picture sequence.

[0033] FIG. 1 is a block diagram of a coder/decoder for stereoscopic video. The MPEG MVP standard and similar systems involve the coding of two video layers, namely a lower layer and an enhancement layer. For such applications, the lower layer is assigned to the left view and the enhancement layer is assigned to the right view. In the coder/decoder (e.g., codec) structure of FIG. 1, the lower layer and enhancement layer video sequences are received by a temporal remultiplexer (temporal remux) 105. Using time division multiplexing (TDMX), the enhancement layer video is provided to an enhancement encoder 110, while the base layer video is provided to a lower encoder 115. Note that the lower layer video data is also provided to the enhancement encoder 110 for disparity prediction.

[0034] The encoded enhancement and base layers are then provided to a system multiplexer 120 for transmission to a decoder as a transport stream. Typically, the transmission path is a satellite link to a cable system headend, or a satellite link directly to a consumer's home. At the decoder 122, the transport stream is demultiplexed at a system demultiplexer 125. The encoded enhancement layer data is provided to an enhancement decoder 130, while the encoded lower layer data is provided to a lower decoder 135. Note that decoding is preferably carried out concurrently for the lower and enhancement layers in a parallel processing configuration. Alternatively, the enhancement decoder 130 and the lower decoder 135 may share common processing hardware, in which case decoding is carried out sequentially, one picture at a time.

[0035] The decoded lower layer data is output from the lower decoder 135 as a separate data stream, and is also provided to a temporal remultiplexer 140. At the temporal remultiplexer 140, the decoded base layer data and the decoded enhancement layer data are combined to provide the enhancement layer output signal as shown. The enhancement and lower layer output signals are then provided to a display device for viewing.

[0036] Moreover, the coded bitstreams for both the lower and enhancement layers must be multiplexed at the system multiplexer 120 in such a way that the decoder can decode every frame or field depending only on frames or fields that have already been decoded. However, this problem is complicated by the fact that the prediction modes for P- and B-pictures differ between the lower and enhancement layers. Furthermore, enhancement layer pictures are always transmitted in presentation order (e.g., display order), whereas this is often not the case for the lower layer. Consequently, video pictures must often be stored and reordered at the decoder so that decoding and display occur in the proper order.

[0037] Additionally, difficulties arise in synchronizing the decoding and presentation of the lower and enhancement layer data. As mentioned, the video bitstreams for the lower layer and the enhancement layer are transmitted as two elementary video streams. For a transport stream, two packet identifiers (PIDs) of transport stream packets are specified for the two layers in the transport stream program map section. Furthermore, timing information is carried in an optional field of selected packets for the lower layer (e.g., in the PCR_PID field) to serve as a reference for timing comparisons at the decoder. In particular, samples of a 27 MHz clock are transmitted in the program_clock_reference (PCR) field. More precisely, the samples are carried in the program_clock_reference_base and program_clock_reference_extension fields, as described in the MPEG-2 systems document ITU-T Rec. H.262, ISO/IEC 13818-1, April 27, 1995, which is incorporated herein by reference. Further details of the MPEG standard are given in document ISO/IEC JTC1/SC29/WG11 N0702, entitled "Information Technology - Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262," March 25, 1994, which is incorporated herein by reference.

[0038] The PCR indicates the expected time at which reading of the field from the bitstream is completed at the decoder. The phase of the local clock running at the decoder is compared with the PCR value in the bitstream, at the instant at which the PCR value is obtained, to determine whether the decoding of video, audio and other data is synchronized. Moreover, the sample clocks in the decoder are locked to the system clock derived from the PCR values. The PCR values are calculated using the equations described in ITU-T Rec. H.262, ISO/IEC 13818-1, as follows.

【００３９】PCR(i)=PCR_base(i) × 300 ＋ PCR_ext
(i)，ここで，PCR_base(i)=((システム_クロック_周波数 ×
t(i))DIV300)%2³³，及びPCR_ext(i)=((システム_クロッ
ク_周波数 × t(i))DIV 1)%300，ここで記号％はモジュ
ーロ演算を示す。PCR(i) = PCR_base(i) × 300 + PCR_ext(i), where PCR_base(i) = ((system_clock_frequency × t(i)) DIV 300) % 2^33 and PCR_ext(i) = ((system_clock_frequency × t(i)) DIV 1) % 300. Here the symbol % denotes the modulo operation.
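The PCR equations above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `pcr_fields` and the use of `Fraction` for exact time values are assumptions made here, not part of the patent or of the MPEG-2 systems specification.

```python
from fractions import Fraction

SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz; the 27 MHz clock named in the text


def pcr_fields(t):
    """Return (PCR_base, PCR_ext, PCR) for an instant t(i) given in seconds."""
    ticks = int(SYSTEM_CLOCK_FREQUENCY * t)  # "DIV 1": whole 27 MHz ticks
    pcr_base = (ticks // 300) % 2**33        # counts at 27 MHz / 300 = 90 kHz
    pcr_ext = ticks % 300                    # remainder in the range 0..299
    return pcr_base, pcr_ext, pcr_base * 300 + pcr_ext


print(pcr_fields(1))                          # fields for t = 1 second
print(pcr_fields(Fraction(301, 27_000_000)))  # fields for t = 301 clock ticks
```

The same arithmetic applies to the SCR fields of a program stream, since the SCR equations have the same form.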

【００４０】同様の方法で，立体ビデオ信号のプログラ
ムストリームに対し，タイミング情報がシステム_クロ
ック_リファレンス(SCR)フィールド内の27MHzクロック
のサンプルとしてパケットヘッダ内で運ばれる。SCR値
はITU-T Rec. H.262 ISO/IEC 13818-1に説明された方程
式を使って以下のように計算される。In a similar manner, for a program stream of a stereoscopic video signal, timing information is carried in the packet header as a 27 MHz clock sample in the system_clock_reference (SCR) field. The SCR value is calculated as follows using the equation described in ITU-T Rec. H.262 ISO / IEC 13818-1.

【００４１】SCR(i)=SCR_base(i) × 300 ＋ SCR_ext
(i)，ここで，SCR_base(i)=((システム_クロック_周波数 ×
t(i))DIV300)%2³³，及びSCR_ext(i)=((システム_クロッ
ク_周波数 × t(i))DIV 1)%300。SCR(i) = SCR_base(i) × 300 + SCR_ext(i), where SCR_base(i) = ((system_clock_frequency × t(i)) DIV 300) % 2^33 and SCR_ext(i) = ((system_clock_frequency × t(i)) DIV 1) % 300.

【００４２】下位層及び強化層の両方におけるビデオパ
ケットの識別は２つのストリーム識別子としてのプログ
ラム・ストリーム・マップ内で特定される。移送ストリ
ーム及びプログラムストリームの両方に対して，立体ビ
デオ用の復合化及びプレゼンテーション処理の同期化が
パケット化された基本ストリーム(PES)パケット内で与
えられる。特に，プレゼンテーション・タイム・スタン
プ(PTS)及び／またはデコーディング・タイム・スタン
プ(DTS)はPESヘッダの付加的フィールド内に与えられ
る。The identification of video packets at both the lower and enhancement layers is specified in the program stream map as two stream identifiers. For both transport and program streams, synchronization of the decoding and presentation processing for stereoscopic video is provided in packetized elementary stream (PES) packets. In particular, the presentation time stamp (PTS) and / or the decoding time stamp (DTS) are provided in an additional field of the PES header.

【００４３】PESパケットは移送またはプログラムのパ
ケット化以前に各基本ビデオストリームに対して与えら
れる。もしPTS及び／またはDTSをデコーダに送る必要が
あれば，新しいPESパケットがPESストリーム内に与えら
れる。したがって，同期化のためのひとつの重要な要因
は正確にPTS及びDTSを計算することである。PTS及びDTS
は，ITU-T Rec. H.262 ISO/IEC 13818-1に説明された移
送ストリームシステムターゲットデコーダ(T-STD)，ま
たはプログラムストリームシステムターゲットデコーダ
(P-STD)などの仮想的なデコーダモデルに基づいたエン
コーダによって決定される。A PES packet is provided for each elementary video stream prior to transport or program packetization. If a PTS and/or DTS needs to be sent to the decoder, a new PES packet is provided in the PES stream. Therefore, one key factor for synchronization is calculating the PTS and DTS accurately. The PTS and DTS are determined by the encoder based on a hypothetical decoder model such as the transport stream system target decoder (T-STD) or the program stream system target decoder (P-STD) described in ITU-T Rec. H.262, ISO/IEC 13818-1.

【００４４】PTS及びDTSの両方の値は，単位あたり90kH
zを与える300に分割されたシステムクロック周波数の単
位期間内で特定される。特に，ITU-T Rec. H.262 ISO/I
EC 13818-1に説明されるように， PTS(k)=((システム_クロック_周波数 × tpn(k))DIV30
0)%2³³，ここで，tpn(k)はプレゼンテーション単位Pn(k)のプレ
ゼンテーション時間である。同様に， DTS(j)=((システム_クロック_周波数 × tdn(k))DIV30
0)%2³³，ここで，tdn(k)はアクセス単位An(j)の復合化時間であ
る。このように，ビデオDTSは画像がSTDによって復合化
されるべく要求される時間を指示する。ビデオPTSは復
合化された画像が視聴者に（例えば，テレビ画面上に表
示されて）与えられるべき時間を示す。さらに，PTS及
びDTSによって指示された時間はカレントPCRまたはSCR
値に関して評価される。Both PTS and DTS values are specified in units of the period of the system clock frequency divided by 300, which yields units of 90 kHz. In particular, as described in ITU-T Rec. H.262, ISO/IEC 13818-1, PTS(k) = ((system_clock_frequency × tpn(k)) DIV 300) % 2^33, where tpn(k) is the presentation time of presentation unit Pn(k). Similarly, DTS(j) = ((system_clock_frequency × tdn(j)) DIV 300) % 2^33, where tdn(j) is the decoding time of access unit An(j). Thus, the video DTS indicates the time at which an image is to be decoded by the STD. The video PTS indicates the time at which the decoded image is to be presented to the viewer (e.g., displayed on a television screen). In addition, the times indicated by the PTS and DTS are evaluated with respect to the current PCR or SCR value.
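As a rough illustration of the 90 kHz time stamp arithmetic, the following Python sketch evaluates the PTS/DTS expression for a given time and shows the wrap-around implied by the modulo 2^33 term. The names `time_stamp` and `WRAP_SECONDS` are hypothetical and chosen only for this example.

```python
from fractions import Fraction

SYSTEM_CLOCK_FREQUENCY = 27_000_000  # Hz; the 27 MHz system clock


def time_stamp(t):
    """PTS or DTS value, in 90 kHz units, for a time t given in seconds."""
    return (int(SYSTEM_CLOCK_FREQUENCY * t) // 300) % 2**33


# The 33-bit counter wraps after 2**33 / 90000 seconds (about 26.5 hours).
WRAP_SECONDS = Fraction(2**33, 90_000)

print(time_stamp(1))             # time stamp one second after time zero
print(time_stamp(WRAP_SECONDS))  # time stamp at the wrap point
```

Because of this wrap-around, a decoder has to compare time stamps against the current PCR or SCR value modulo 2^33 rather than as unbounded integers.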

【００４５】ビデオビットストリームは理論的なSTDモ
デル内で即座に復合化される。しかし，もしＢ画像が立
体ビットストリームの下位層内に存在すれば，該ビット
ストリームはプレゼンテーション（例えば，ディスプレ
イ）順にデコーダに到着する。そのような場合におい
て，いくつかのＩ及び／またはＰ画像は，復合化された
後の適当なプレゼンテーション時間までの間，STD内の
デコーダバッファ内に一時的に保存されなければならな
い。しかし，強化層に関して，すべての画像はプレゼン
テーション順でデコーダに到着し，その結果PTS及びDTS
値は同一または固定間隔だけオフセットされなければな
らない。The video bitstream is decoded instantaneously in the theoretical STD model. However, if B images are present in the lower layer of the stereoscopic bitstream, the bitstream does not arrive at the decoder in presentation (e.g., display) order. In such a case, some I and/or P images must be temporarily stored in the decoder buffer in the STD after decoding until the appropriate presentation time. For the enhancement layer, however, all images arrive at the decoder in presentation order, so the PTS and DTS values must be identical or offset by a fixed interval.

【００４６】下位層及び強化層シーケンスを同期化する
ために，下位層及び強化層内の対応する画像は同じPTS
を有しなければならない。MPEG-2メインプロファイルに
対するDTSを計算する現在の方法は，下位層内のDTS，例
えば，DTS_L（ここで，"L"は下位層を示す）の計算に関
して使用される。後続のPTS及びDTS値は対応するDTS_Lを
参照する。特に，DTS_Li及びPTS_Liは，下位層内のi番目
の画像に対するDTS及びPTSをそれぞれ示す。また，DTS
_Ei及びPTS_Eiは，強化層内のi番目の画像に対するDTS及
びPTSをそれぞれ示す。その際，連続画像のプレゼンテ
ーションの間の時間間隔Fは，F = 90×10³／フレーム速
度として定義される。例えば，NTSC規格の下で，29.97
フレーム/秒のフレーム速度について，F=3,003である。
Fは90kHzクロックサイクルにおける公称フレーム時間間
隔であり，3,003サイクル/90kHz=0.03336秒の実際の経
過時間に対応する。PAL規格の下では，25フレーム/秒の
フレーム速度に関してF=3,600である。To synchronize the lower layer and enhancement layer sequences, corresponding images in the lower layer and the enhancement layer must have the same PTS. The existing method of calculating the DTS for the MPEG-2 main profile is used to calculate the DTS in the lower layer, e.g., DTS_L (where "L" denotes the lower layer). Subsequent PTS and DTS values are referenced to the corresponding DTS_L. In particular, DTS_Li and PTS_Li denote the DTS and PTS, respectively, for the i-th image in the lower layer, and DTS_Ei and PTS_Ei denote the DTS and PTS, respectively, for the i-th image in the enhancement layer. The time interval F between the presentation of successive images is then defined as F = 90 × 10^3 / frame rate. For example, under the NTSC standard, with a frame rate of 29.97 frames/sec, F = 3,003. F is the nominal frame time interval in 90 kHz clock cycles and corresponds to an actual elapsed time of 3,003 cycles / 90 kHz = 0.03336 seconds. Under the PAL standard, F = 3,600 for a frame rate of 25 frames/sec.
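The two values of F quoted above can be checked with a short calculation. The helper name `frame_interval` is an assumption made for this sketch; rounding to the nearest integer is also an assumption, chosen because 90,000 / 29.97 is not an exact integer.

```python
def frame_interval(frame_rate):
    """Nominal frame interval F in 90 kHz clock cycles."""
    return round(90_000 / frame_rate)


F_NTSC = frame_interval(29.97)  # NTSC frame rate from the text
F_PAL = frame_interval(25)      # PAL frame rate from the text

print(F_NTSC, F_PAL)
print(F_NTSC / 90_000)          # elapsed time of one NTSC frame in seconds
```

For the exact NTSC rate of 30000/1001 frames/sec, 90,000 divided by the frame rate is exactly 3,003, so no rounding is needed in that case.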

【００４７】さらに，下位層及び強化層シーケンスの同
期化はビデオシーケンスの送信及びディスプレイ順に密
接に依存している。概して，非立体ビデオ信号に対する
MPEG-2規格は，Ｉ画像，Ｐ画像及びＢ画像がベース層内
のシーケンス内で取らねばならない特別の分布を特定化
しないが，異なる分布に対し異なる程度の圧縮及びラン
ダムアクセス能力をもたらす。ひとつの可能な分布にお
いて，ベース層内の各画像はＩ画像である。他の可能な
分布において，Ｉ及びＰ画像の両方が与えられるか，
Ｉ，Ｐ及び非連続的に与えられるところのＢ画像が与え
られるか，またはＩ，Ｐ及び２つが連続して与えられる
ところのＢ画像が与えられる。概して，３つ以上の連続
Ｂ画像は映像品質が低いことより使用されていない。強
化層において，Ｂ画像及びＰ画像が与えられ，またＩ画
像も付加的に与えられ得る。Furthermore, the synchronization of the lower layer and enhancement layer sequences depends closely on the transmission and display order of the video sequences. Generally, the MPEG-2 standard for non-stereoscopic video signals does not specify any particular distribution that I, P and B images must take within a sequence in the base layer, but allows different distributions that provide different degrees of compression and random access capability. In one possible distribution, each image in the base layer is an I image. In other possible distributions, both I and P images are provided; I, P and B images are provided, where the B images are non-consecutive; or I, P and B images are provided, where two B images are provided consecutively. Generally, three or more consecutive B images are not used because of the resulting poor video quality. In the enhancement layer, B and P images are provided, and I images may optionally be provided as well.

【００４８】図２は，本発明の装置で使用するための強
化層画像シーケンス及び第１ベース層画像シーケンスを
図示したものである。ここで，下位層はＩ画像のみを含
む。強化層画像シーケンスは200で示され，下位層シー
ケンスは250で示される。シーケンス200及び250はディ
スプレイ順で示されている。各画像は，画像タイプ（例
えば，Ｉ，ＢまたはＰ），層の指示（例えば，強化層に
対して"E"及び下位層に対して"L"），及び添字"0"はシ
ーケンス内のゼロ番目の画像でありまた"1"はシーケン
ス内の最初の画像であるところの画像の連続的配置，を
示すべくラベル化されている。FIG. 2 illustrates an enhancement layer image sequence and a first base layer image sequence for use with the apparatus of the present invention. Here, the lower layer includes only I images. The enhancement layer image sequence is indicated by 200 and the lower layer sequence is indicated by 250. Sequences 200 and 250 are shown in display order. Each image is labeled to indicate its image type (e.g., I, B or P), its layer (e.g., "E" for the enhancement layer and "L" for the lower layer), and its sequential position, where the subscript "0" denotes the zeroth image in the sequence and "1" denotes the first image.

【００４９】強化層200は画像I_E0(202),B_E1(204),B_E2(2
06),P_E3(208),B_E4(210),B_E5(212),P_E6(214),B_E7(216),B
_E8(218),P_E9(220),B_E10(222),B_E11(224)及びI_E12(226)
を含む。しかし，示された特定の強化層シーケンスは一
例にすぎない。図２〜５を含めここで議論されるあらゆ
る強化層シーケンスにおいて，強化層はディスプレイ順
に送信されるため特定の強化層画像タイプに制限されな
い。したがって，あらゆる強化層画像は一般画像タイプ
（例えば，H_Ei（ここで"H"は画像タイプを示す））であ
ると考えられる。The enhancement layer 200 includes images I_E0 (202), B_E1 (204), B_E2 (206), P_E3 (208), B_E4 (210), B_E5 (212), P_E6 (214), B_E7 (216), B_E8 (218), P_E9 (220), B_E10 (222), B_E11 (224) and I_E12 (226). However, the particular enhancement layer sequence shown is only an example. In any of the enhancement layer sequences discussed herein, including those of FIGS. 2-5, the enhancement layer is transmitted in display order and is therefore not limited to particular enhancement layer image types. Thus, any enhancement layer image can be regarded as being of a generic image type, e.g., H_Ei (where "H" denotes the image type).

【００５０】下位層250は，この例において，I_L0(252),
I_L1(254),I_L2(256),I_L3(258),I_L4(260),I_L5(262),I_L6(2
64),I_L7(266),I_L8(268),I_L9(270),I_L10(272),I_L11(274)
及びI_L12(276)を含むＩ画像のみを含む。付加的に，各
シーケンスごとの画像グループ(GOP)の開始が示されて
いる。GOPは他のGOP内の画像を参照せずに復合化される
一つ以上の連続画像を示す。概して，下位層及び強化層
のGOPは並んでおらず，また異なる長さを有する。例え
ば，強化層200内の最初のGOPの開始は画像I_E0(202)で示
されるが，第２GOPの開始は画像I_E12(226)である。同様
に，下位層250内の第１GOPの開始は画像I_L2(256)で示さ
れ，第２GOPの開始は画像I_L8(268)である。The lower layer 250 in this example includes only I images, namely I_L0 (252), I_L1 (254), I_L2 (256), I_L3 (258), I_L4 (260), I_L5 (262), I_L6 (264), I_L7 (266), I_L8 (268), I_L9 (270), I_L10 (272), I_L11 (274) and I_L12 (276). Additionally, the start of a group of pictures (GOP) is indicated for each sequence. A GOP denotes one or more consecutive images that can be decoded without reference to images in another GOP. Generally, the GOPs of the lower and enhancement layers are not aligned and have different lengths. For example, the start of the first GOP in enhancement layer 200 is at image I_E0 (202), while the start of the second GOP is at image I_E12 (226). Similarly, the start of the first GOP in lower layer 250 is at image I_L2 (256), and the start of the second GOP is at image I_L8 (268).

【００５１】さらに，図２で示される矢印は，矢の先端
によって示される画像が矢の後端に結合された画像に基
づいて予測されるような許容予測モードを示す。例え
ば，画像B_E1(204)は画像I_L1(254)から予測される。Ｉ画
像は予測符号化されず，独立である。Further, the arrows shown in FIG. 2 indicate the allowed prediction modes, such that the image at the head of an arrow is predicted from the image at the tail of the arrow. For example, image B_E1 (204) is predicted from image I_L1 (254). I images are not predictively coded and are self-contained.

【００５２】図２の画像ディスプレイ順に関して，本発
明に従うI_L2で始まる有利な送信シーケンスは，I_L2,
B_E1,I_L3,B_E2,I_L4,P_E3,I_L5,B_E4,I_L6,B_E5,I_L7,P_E6,I_L8,B
_E7,I_L9,B_E8,I_L10,P_E9,I_L11,B_E10,I_L12,B_E11など（シー
ケンス１）である。この画像順に関して，デコーダに到
着する各予測符号化画像は復合化前に並べ替えられなく
てもよい。したがって，デコーダでの保存及び処理の必
要性は減少し，それによってデコーダのコストが減少す
る。他の適当な画像送信シーケンスは，I_L2,B_E2,I_L3,P
_E3,I_L4,B_E4,I_L5,B_E5,I_L6,P_E6,I_L7,B_E7,I_L8,B_E8,I_L9,
P_E9,I_L10,B_E10,I_L11,B_E11,I_L12,I_E12など（シーケンス
２）である。With respect to the image display order of FIG. 2, one advantageous transmission sequence in accordance with the present invention, starting with I_L2, is I_L2, B_E1, I_L3, B_E2, I_L4, P_E3, I_L5, B_E4, I_L6, B_E5, I_L7, P_E6, I_L8, B_E7, I_L9, B_E8, I_L10, P_E9, I_L11, B_E10, I_L12, B_E11, and so on (Sequence 1). With this image order, the predictively coded images arriving at the decoder need not be reordered before decoding. Thus, the storage and processing requirements at the decoder are reduced, which reduces the cost of the decoder. Another suitable image transmission sequence is I_L2, B_E2, I_L3, P_E3, I_L4, B_E4, I_L5, B_E5, I_L6, P_E6, I_L7, B_E7, I_L8, B_E8, I_L9, P_E9, I_L10, B_E10, I_L11, B_E11, I_L12, I_E12, and so on (Sequence 2).
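Both transmission orders for FIG. 2 follow a simple interleaving rule: each lower layer image i is followed by enhancement layer image i-1 (Sequence 1) or by enhancement layer image i (Sequence 2). The sketch below reproduces the two orders under that reading; the function `transmission_order` and its `lag` parameter are names invented for this illustration.

```python
# Image types of enhancement layer 200 for indices 0..12, as listed in the text.
ENH_TYPES = "IBBPBBPBBPBBI"


def transmission_order(first, last, lag):
    """Interleave lower layer I images first..last with enhancement images.

    Lower image i is followed by enhancement image i - lag.
    """
    order = []
    for i in range(first, last + 1):
        order.append(f"I_L{i}")
        j = i - lag
        order.append(f"{ENH_TYPES[j]}_E{j}")
    return order


sequence_1 = transmission_order(2, 12, lag=1)  # I_L2, B_E1, I_L3, B_E2, ...
sequence_2 = transmission_order(2, 12, lag=0)  # I_L2, B_E2, I_L3, P_E3, ...
print(", ".join(sequence_1))
print(", ".join(sequence_2))
```

The only difference between the two orders is the lag of the enhancement layer relative to the lower layer, which is what changes the amount of decoded-image storage the decoder needs.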

【００５３】これらの画像送信シーケンスに関して，す
べての画像はプレゼンテーション順にデコーダに到着す
る。さらに，各画像に対し適当なPTS及びDTSを決定する
ことが可能である。最初に，i番目の下位層画像のDTSの
DTS_Liを仮定することが知られている。With these image transmission sequences, all images arrive at the decoder in presentation order. Further, an appropriate PTS and DTS can be determined for each image. First, assume that the DTS of the i-th lower layer image, DTS_Li, is known.

【００５４】特定の例として，図２の第１画像送信シー
ケンス１に関して，以下の表１に示されるような復合化
及び呈示(presenting)が生じる。ここで，連続的な復合
化が仮定されている。表１において，第１列は開始時間
としてDTS_L2を使った増分0.5Fの時間を示し，第２列は
下位層画像の復合化時間を示し，第３列は強化層画像の
復合化時間を示し，第４列は下位層及び強化層画像のプ
レゼンテーション時間を示す。As a specific example, for the first image transmission sequence (Sequence 1) of FIG. 2, decoding and presentation occur as shown in Table 1 below. Here, sequential decoding is assumed. In Table 1, the first column gives the time in increments of 0.5F, using DTS_L2 as the start time; the second column gives the decoding times of the lower layer images; the third column gives the decoding times of the enhancement layer images; and the fourth column gives the presentation times of the lower layer and enhancement layer images.

【００５５】[0055]

【表１】ここで，２つの復合化された画像に対してのみ保存が必
要である。例えば，I_L2及びI_L3が復合化されかつB_E2が
受信される前に保存される。受信されると，B_E2はすぐ
に復合化されプレゼンテーション用に実質的にI_L2と同
時に出力される。[Table 1] Here, storage is required for only two decoded images. For example, I_L2 and I_L3 are decoded and stored before B_E2 is received. Once received, B_E2 is decoded immediately and output for presentation substantially concurrently with I_L2.

【００５６】さらにまた，下位または強化シーケンスの
いずれかにおけるi番目画像に対し，DTS及びPTSが以下
のようにDTS_Liから決定される。Furthermore, for the i-th image in either the lower layer or the enhancement layer sequence, the DTS and PTS are determined from DTS_Li as follows.

【００５７】PTS_Li=DTS_Li+1.5F DTS_Ei=DTS_Li+1.5F PTS_Ei=PTS_Li 例えば，図２内のP_E3(208)に対するPTSは1.5FとI_L3に対
するDTSの和に等しい。したがって，P_E3の復合化及びプ
レゼンテーションは1.5画像時間間隔（すなわち，1.5
F）だけI_L3の復合化に遅れる。[0057] PTS_Li = DTS_Li + 1.5F; DTS_Ei = DTS_Li + 1.5F; PTS_Ei = PTS_Li. For example, the PTS for P_E3 (208) in FIG. 2 is equal to the sum of 1.5F and the DTS for I_L3. Thus, the decoding and presentation of P_E3 lag the decoding of I_L3 by 1.5 image time intervals (i.e., 1.5F).
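The Sequence 1 relations can be written as a small helper that derives the remaining three time stamps from a known DTS_Li. This is a sketch only: the function name, the dictionary layout and the sample DTS value of 10,000 are assumptions for illustration, and the PAL value F = 3,600 is used so that 1.5F is a whole number of 90 kHz cycles.

```python
F_PAL = 3_600  # frame interval in 90 kHz cycles for 25 frames/sec (from the text)


def sequence1_stamps(dts_l, frame_interval=F_PAL):
    """Time stamps for the i-th image pair of Sequence 1, given DTS_Li."""
    offset = 3 * frame_interval // 2   # 1.5F, integer arithmetic
    pts_l = dts_l + offset             # PTS_Li = DTS_Li + 1.5F
    dts_e = dts_l + offset             # DTS_Ei = DTS_Li + 1.5F
    pts_e = pts_l                      # PTS_Ei = PTS_Li
    return {"DTS_L": dts_l, "PTS_L": pts_l, "DTS_E": dts_e, "PTS_E": pts_e}


# Hypothetical DTS for I_L3; P_E3 is then decoded and presented 1.5F later.
stamps = sequence1_stamps(10_000)
print(stamps)
```

Since the enhancement layer image is decoded and presented at the same instant in this scheme, it never has to be held in a decoded-image buffer.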

【００５８】図２の第２画像移送シーケンスに関して，
復合化及び呈示は以下の表２に記載されているように生
じる。For the second image transmission sequence (Sequence 2) of FIG. 2, decoding and presentation occur as shown in Table 2 below.

【００５９】[0059]

【表２】ここで，ひとつの復合化された画像に対してのみ保存が
必要である。例えば，I_L2はB_E2が受信される前に復合化
されかつ保存される。受信されると，B_E2はすぐに復合
化されかつI_L2と同時にプレゼンテーション用に出力さ
れる。[Table 2] Here, storage is required for only one decoded image. For example, I_L2 is decoded and stored before B_E2 is received. Once received, B_E2 is decoded immediately and output for presentation concurrently with I_L2.

【００６０】下位または強化シーケンスのいずれか内の
i番目画像に対し，DTS及びPTSは表２の移送シーケンス
に対して以下のようにDTS_Liから決定される。For the i-th image in either the lower layer or the enhancement layer sequence, the DTS and PTS are determined from DTS_Li as follows for the transmission sequence of Table 2.

【００６１】PTS_Li=DTS_Li+0.5F DTS_Ei=DTS_Li+0.5F PTS_Ei=PTS_Li 図３は本発明の装置に使用するための強化層画像シーケ
ンス及び第２ベース層画像シーケンスを図示したもので
ある。ここで，下位層はＩ及びＰ画像の両方を含む。図
２と同一のエレメントは同一符号で示される。強化層20
0は上記したものと同一である。下位層300は，画像シー
ケンスP_L0(302),P_L1(304),I_L2(306),P_L3(308),P_L4(31
0),P_L5(312),P_L6(314),I_L8(316),P_L9(318),P_L10(320),P
_L11(322)及びP_L12(326)を含む。GOPはI_L2(306)及びI
_L8(318)で開始する。PTS_Li = DTS_Li + 0.5F; DTS_Ei = DTS_Li + 0.5F; PTS_Ei = PTS_Li. FIG. 3 illustrates an enhancement layer image sequence and a second base layer image sequence for use with the apparatus of the present invention. Here, the lower layer includes both I and P images. Elements identical to those in FIG. 2 are denoted by the same reference numerals. The enhancement layer 200 is the same as described above. The lower layer 300 includes the image sequence P_L0 (302), P_L1 (304), I_L2 (306), P_L3 (308), P_L4 (310), P_L5 (312), P_L6 (314), I_L8 (316), P_L9 (318), P_L10 (320), P_L11 (322) and P_L12 (326). The GOPs start at I_L2 (306) and I_L8 (318).

【００６２】ここで，予測手法は幾分複雑である。ベー
ス層においてＰ画像は最も近接した先行ＩまたはＰ画像
を使って予測符号化されることを思い出そう。強化層に
おいて，Ｂ画像は可能な異なるモードを３つまで使って
予測符号化され得る。しかし，対応する下位層画像がＩ
画像であるとき，そのＩ画像のみが使用される。また，
強化層においてＰ画像は，最も最近の強化層画像，ディ
スプレイ順に最も最近の下位層画像，またはディスプレ
イ順に次の下位層画像を使って予測符号化される。再
び，対応する下位層画像がＩ画像であるとき，そのＩ画
像のみが使用される。ある場合において，示された予測
モードは付加的経路を含むことに注意すべきである。Here, the prediction scheme is somewhat more complex. Recall that in the base layer, a P image is predictively coded using the closest preceding I or P image. In the enhancement layer, a B image can be predictively coded using up to three different possible modes. However, when the corresponding lower layer image is an I image, only that I image is used. Also, in the enhancement layer, a P image is predictively coded using the most recent enhancement layer image, the most recent lower layer image in display order, or the next lower layer image in display order. Again, when the corresponding lower layer image is an I image, only that I image is used. Note that in some cases the prediction modes shown include additional paths.

【００６３】したがって，下位層シーケンス300におい
て，例えば，P_L4はP_L3及びP_L5を使って符号化される。
強化層200において，P_E3はB_E2またはP_L3を使って符号化
され得る。本発明にしたがって，I_L2で始まる適当な画
像移送シーケンスは，I_L2,B_E1,P_L3,B_E2,P_L4,P_E3,P_L5,B
_E4,P_L6,B_E5,P_L7,P_E6,I_L8,B_E7,P_L9,B_E8,P_L10,P_E9,P_L11,B
_E10,P_L12,B_E11などである（シーケンス３）。このシー
ケンスに対して，復合化及び呈示は以下の表３に説明さ
れるように生じる。Therefore, in the lower layer sequence 300, for example, P_L4 is coded using P_L3 and P_L5. In the enhancement layer 200, P_E3 may be coded using B_E2 or P_L3. In accordance with the present invention, a suitable image transmission sequence starting with I_L2 is I_L2, B_E1, P_L3, B_E2, P_L4, P_E3, P_L5, B_E4, P_L6, B_E5, P_L7, P_E6, I_L8, B_E7, P_L9, B_E8, P_L10, P_E9, P_L11, B_E10, P_L12, B_E11, and so on (Sequence 3). For this sequence, decoding and presentation occur as shown in Table 3 below.

【００６４】[0064]

【表３】ここで，２つの復合化された画像に対してのみ保存が必
要である。例えば，I_L2及びP_L3はB_E2が受信される前に
復合化されかつ保存される。受信されると，B_E2はすぐ
に復合化されかつI_L2と同時にプレゼンテーション用に
出力される。[Table 3] Here, storage is required for only two decoded images. For example, I_L2 and P_L3 are decoded and stored before B_E2 is received. Once received, B_E2 is decoded immediately and output for presentation concurrently with I_L2.

【００６５】下位または強化シーケンスのいずれか内の
i番目画像に対し，DTS及びPTSは表３の移送シーケンス
に対して以下のようにDTS_Liから決定される。For the i-th image in either the lower layer or the enhancement layer sequence, the DTS and PTS are determined from DTS_Li as follows for the transmission sequence of Table 3.

【００６６】PTS_Li=DTS_Li+1.5F DTS_Ei=DTS_Li+1.5F PTS_Ei=PTS_Li 替わって，図３の例に対して他の適当な移送シーケンス
は，I_L2,B_E2,P_L3,P_E3,P_L4,B_E4,P_L5,B_E5,P_L6,P_E6,P_L7,B
_E7,I_L8,B_E8,P_L9,P_E9,P_L10,B_E10,P_L11,B_E11,P_L12,I_E12な
どである（シーケンス４）。復合化及び呈示は以下に示
される表４のように生じる。PTS_Li = DTS_Li + 1.5F; DTS_Ei = DTS_Li + 1.5F; PTS_Ei = PTS_Li. Alternatively, another suitable transmission sequence for the example of FIG. 3 is I_L2, B_E2, P_L3, P_E3, P_L4, B_E4, P_L5, B_E5, P_L6, P_E6, P_L7, B_E7, I_L8, B_E8, P_L9, P_E9, P_L10, B_E10, P_L11, B_E11, P_L12, I_E12, and so on (Sequence 4). Decoding and presentation occur as shown in Table 4 below.

【００６７】[0067]

【表４】ここで，１つの復合化された画像に対してのみ保存が必
要である。例えば，I_L2はB_E2が受信される前に復合化さ
れかつ保存され，受信されると，B_E2はすぐに復合化さ
れかつI_L2と同時にプレゼンテーション用に直接出力さ
れる。[Table 4] Here, storage is required for only one decoded image. For example, I_L2 is decoded and stored before B_E2 is received; once received, B_E2 is decoded immediately and output directly for presentation concurrently with I_L2.

【００６８】下位または強化シーケンスのいずれか内の
i番目画像に対し，DTS及びPTSは表４の移送シーケンス
に対して以下のようにDTS_Liから決定される。For the i-th image in either the lower layer or the enhancement layer sequence, the DTS and PTS are determined from DTS_Li as follows for the transmission sequence of Table 4.

【００６９】PTS_Li=DTS_Li+0.5F DTS_Ei=DTS_Li+0.5F PTS_Ei=PTS_Li 図４は本発明に使用する強化層画像シーケンス及び第３
ベース層画像シーケンスを示したものである。ここで，
下位層はＩ，Ｐ及びＢ画像を含み，該Ｂ画像は非連続で
ある。図２及び３と同一のエレメントは同一の符号で示
されている。強化層200は上記したものと同一である。
下位層400は，画像シーケンスP_L0(402),B_L1(404),I_L2(4
06),B_L3(408),P_L4(410),B_L5(412),P_L6(414),B_L7(416),I
_L8(418),B_L9(420),P_L10(422),B_L11(424)及びP_L12(426)
を含む。GOPはI_L2(406)及びI_L8(418)で開始する。PTS_Li = DTS_Li + 0.5F; DTS_Ei = DTS_Li + 0.5F; PTS_Ei = PTS_Li. FIG. 4 illustrates an enhancement layer image sequence and a third base layer image sequence for use with the present invention. Here, the lower layer includes I, P and B images, where the B images are non-consecutive. Elements identical to those in FIGS. 2 and 3 are denoted by the same reference numerals. The enhancement layer 200 is the same as described above. The lower layer 400 includes the image sequence P_L0 (402), B_L1 (404), I_L2 (406), B_L3 (408), P_L4 (410), B_L5 (412), P_L6 (414), B_L7 (416), I_L8 (418), B_L9 (420), P_L10 (422), B_L11 (424) and P_L12 (426). The GOPs start at I_L2 (406) and I_L8 (418).

【００７０】ここで，予測手法は以下の通りである。ベ
ース層においてＢ画像は，最近の先行ＩまたはＰ画像及
び最近の後続ＩまたはＰ画像を使って予測符号化される
ことを思い出そう。したがって，下位層シーケンス400
において，例えば，B_L3はI_L2及びP_L4を使って符号化さ
れる。本発明に従うI_L2で始まる適当な画像送信シーケ
ンスは，I_L2,P_L4,B_L3,B_E2,P_E3,P_L6,B_L5,B_E4,B_E5,I_L8,B
_L7,P_E6,B_E7,P_L10,B_L9,B_E8,P_E9,P_L12,B_L11,B_E10,B_E11な
どである（シーケンス５）。替わって，他の適当な送信
シーケンスは，I_L2,B_E2,P_L4,B_L3,P_E3,B_E4,P_L6,B_L5,B_E5,
P_E6,I_L8,B_L7,B_E7,B_E8,P_L10,B_L9,P_E9,B_E10,P_L12,B_L11,B
_E11,I_E12などである（シーケンス６）。さらに，適当な
送信シーケンスは，I_L2,P_L4,B_E2,B_L3,P_E3,P_L6,B_E4,B_L5,
B_E5,I_L8,P_E6,B_L7,B_E7,P_L10,B_E8,B_L9,P_E9,P_L12,B_E10,B
_L11,B_E11などである（シーケンス７）。Here, the prediction scheme is as follows. Recall that in the base layer, a B image is predictively coded using the closest preceding I or P image and the closest following I or P image. Thus, in the lower layer sequence 400, for example, B_L3 is coded using I_L2 and P_L4. A suitable image transmission sequence in accordance with the present invention, starting with I_L2, is I_L2, P_L4, B_L3, B_E2, P_E3, P_L6, B_L5, B_E4, B_E5, I_L8, B_L7, P_E6, B_E7, P_L10, B_L9, B_E8, P_E9, P_L12, B_L11, B_E10, B_E11, and so on (Sequence 5). Alternatively, another suitable transmission sequence is I_L2, B_E2, P_L4, B_L3, P_E3, B_E4, P_L6, B_L5, B_E5, P_E6, I_L8, B_L7, B_E7, B_E8, P_L10, B_L9, P_E9, B_E10, P_L12, B_L11, B_E11, I_E12, and so on (Sequence 6). A further suitable transmission sequence is I_L2, P_L4, B_E2, B_L3, P_E3, P_L6, B_E4, B_L5, B_E5, I_L8, P_E6, B_L7, B_E7, P_L10, B_E8, B_L9, P_E9, P_L12, B_E10, B_L11, B_E11, and so on (Sequence 7).

【００７１】下位または強化シーケンスのいずれか内の
i番目画像に対し，DTS及びPTSは以下のようにDTS_Liから
決定される。各画像に対し，画像のプレゼンテーション
は，画像の復合化に続きFの整数倍だけ遅れる。For the i-th image in either the lower layer or the enhancement layer sequence, the DTS and PTS are determined from DTS_Li as follows. For each image, presentation of the image follows decoding of the image by an integer multiple of F.

【００７２】例えば，上記第１送信シーケンス（シーケ
ンス５）に関して，復合化及び呈示は以下の表５の記載
のように生じる。For example, for the first transmission sequence above (Sequence 5), decoding and presentation occur as shown in Table 5 below.

【００７３】[0073]

【表５】ここで，３つの復合化された画像に対してのみ保存が必
要である。例えば，I_L2,P_L4及びB_L3は，B_E2が受信され
る前に復合化されかつ保存され，受信されると，B_E2は
すぐに復合化されかつI_L2と同時にプレゼンテーション
用に直接出力される。[Table 5] Here, storage is required for only three decoded images. For example, I_L2, P_L4 and B_L3 are decoded and stored before B_E2 is received; once received, B_E2 is decoded immediately and output directly for presentation concurrently with I_L2.

【００７４】下位または強化シーケンスのいずれか内の
i番目画像に対し，DTS及びPTSは表５の移送シーケンス
に対して以下のようにDTS_Liから決定される。For the i-th image in either the lower layer or the enhancement layer sequence, the DTS and PTS are determined from DTS_Li as follows for the transmission sequence of Table 5.

【００７５】PTS_Li=DTS_Li+(mod2(i+1)+1)1.5F，すべて
のiに対してDTS_Ei=DTS_Li+1.5F，i=2に対してDTS_Ei=DTS
_Li+(1+2mod2(i+1))F，i>2に対してPTS_Ei=PTS_Li，すべて
のiに対してここで，mod2(i)は，iが偶数のときmod2(i)
=0であり，iが奇数のときmod2(i)=1であるような整数i
のベース２モジューロである。[0075]
PTS_Li = DTS_Li + (mod2(i+1) + 1)1.5F, for all i;
DTS_Ei = DTS_Li + 1.5F, for i = 2;
DTS_Ei = DTS_Li + (1 + 2mod2(i+1))F, for i > 2;
PTS_Ei = PTS_Li, for all i.
Here, mod2(i) is the base 2 modulo of the integer i, such that mod2(i) = 0 when i is even and mod2(i) = 1 when i is odd.
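The Sequence 5 relations are piecewise in i and are easy to misread in running text, so a direct transcription may help. The sketch below assumes that a coefficient written next to F, as in "(mod2(i+1) + 1)1.5F", denotes a product; the function name `sequence5_stamps` and the choice F = 3,600 in the example are likewise assumptions made for illustration.

```python
def mod2(i):
    """Base 2 modulo: 0 for even i, 1 for odd i."""
    return i % 2


def sequence5_stamps(i, dts_l, F):
    """Return (PTS_Li, DTS_Ei, PTS_Ei) for image i of Sequence 5, given DTS_Li."""
    if i < 2:
        raise ValueError("the relations are stated only for i >= 2")
    pts_l = dts_l + (mod2(i + 1) + 1) * 1.5 * F
    if i == 2:
        dts_e = dts_l + 1.5 * F
    else:
        dts_e = dts_l + (1 + 2 * mod2(i + 1)) * F
    pts_e = pts_l  # corresponding images share one presentation time
    return pts_l, dts_e, pts_e


for i in (2, 3, 4):
    print(i, sequence5_stamps(i, dts_l=0, F=3_600))
```

In every branch the enhancement layer PTS is tied to the lower layer PTS, which is the synchronization condition stated earlier in the description.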

【００７６】シーケンス６に関して，復合化及び呈示は
以下の表６に説明されたように生じる。For Sequence 6, decoding and presentation occur as shown in Table 6 below.

【００７７】[0077]

【表６】ここで，２つの復合化された画像に対してのみ保存が必
要である。例えば，P_L4及びB_L3は，P_E3が受信される前
に復合化されかつ保存され，受信されると，P_E3はすぐ
に復合化されかつI_L2と同時にプレゼンテーション用に
直接出力される。[Table 6] Here, storage is required for only two decoded images. For example, P_L4 and B_L3 are decoded and stored before P_E3 is received; once received, P_E3 is decoded immediately and output directly for presentation concurrently with I_L2.

【００７８】下位または強化シーケンスのいずれか内の
i番目画像に対し，DTS及びPTSは表６の移送シーケンス
に対して以下のようにDTS_Liから決定される。For the i-th image in either the lower layer or the enhancement layer sequence, the DTS and PTS are determined from DTS_Li as follows for the transmission sequence of Table 6.

【００７９】PTS_Li=DTS_Li+F，i=2に対してPTS_Li=DTS_Li+
(3mod2(i+1)+1)0.5F，i>2に対してDTS_Ei=DTS_Li+0.5F，i
=2に対してDTS_Ei=DTS_Li+(1+2mod2(i+1))0.5F，i>2に対
してPTS_Ei=PTS_Li，すべてのiに対してシーケンス７に関
して，復合化及び呈示は以下の表７に説明されたように
生じる。
PTS_Li = DTS_Li + F, for i = 2;
PTS_Li = DTS_Li + (3mod2(i+1) + 1)0.5F, for i > 2;
DTS_Ei = DTS_Li + 0.5F, for i = 2;
DTS_Ei = DTS_Li + (1 + 2mod2(i+1))0.5F, for i > 2;
PTS_Ei = PTS_Li, for all i.
For Sequence 7, decoding and presentation occur as shown in Table 7 below.

【００８０】[0080]

【表７】ここで，２つの復合化された画像に対してのみ保存が必
要である。例えば，I_L2及びP_L4は，B_E2が受信される前
に復合化されかつ保存され，受信されると，B_E2はすぐ
に復合化されかつI_L2と同時にプレゼンテーション用に
直接出力される。[Table 7] Here, storage is required for only two decoded images. For example, I_L2 and P_L4 are decoded and stored before B_E2 is received; once received, B_E2 is decoded immediately and output directly for presentation concurrently with I_L2.

【００８１】下位または強化シーケンスのいずれか内の
i番目画像に対し，DTS及びPTSは表７の移送シーケンス
に対して以下のようにDTS_Liから決定される。For the i-th image in either the lower layer or the enhancement layer sequence, the DTS and PTS are determined from DTS_Li as follows for the transmission sequence of Table 7.

【００８２】PTS_Li=DTS_Li+F，i=2に対してPTS_Li=DTS_Li+
(4mod2(i+1)+1)0.5F，i>2に対してDTS_Ei=DTS_Li+F，i=2
に対してDTS_Ei=DTS_Li+(4mod2(i+1)+1)0.5F，i>2に対し
てPTS_Ei=PTS_Li，すべてのiに対して図５は本発明の装置
とともに使用する強化層画像シーケンス及び第４ベース
層画像シーケンスを図示したものである。ここで，下位
層はＩ，Ｐ及び２つの連続するＢ画像を含む。図２〜４
の同一のエレメントは同一の符号で示されている。強化
層200は上記したものと同一である。下位層500は画像シ
ーケンスB_L0(502),B_L1(504),I_L2(506),B_L3(508),B_L4(51
0),P_L5(512),B_L6(514),B_L7(516),I_L8(518),B_L9(520),B
_L10(522),P_L11(524)及びB_L12(526)を含む。GOPはI_L2(50
6)及びI_L8(518)で開始する。
PTS_Li = DTS_Li + F, for i = 2;
PTS_Li = DTS_Li + (4mod2(i+1) + 1)0.5F, for i > 2;
DTS_Ei = DTS_Li + F, for i = 2;
DTS_Ei = DTS_Li + (4mod2(i+1) + 1)0.5F, for i > 2;
PTS_Ei = PTS_Li, for all i.
FIG. 5 illustrates an enhancement layer image sequence and a fourth base layer image sequence for use with the apparatus of the present invention. Here, the lower layer includes I, P and two consecutive B images. Elements identical to those in FIGS. 2-4 are denoted by the same reference numerals. The enhancement layer 200 is the same as described above. The lower layer 500 includes the image sequence B_L0 (502), B_L1 (504), I_L2 (506), B_L3 (508), B_L4 (510), P_L5 (512), B_L6 (514), B_L7 (516), I_L8 (518), B_L9 (520), B_L10 (522), P_L11 (524) and B_L12 (526). The GOPs start at I_L2 (506) and I_L8 (518).

【００８３】I_L2で始まる，本発明に従う適当な画像シ
ーケンスは，I_L2,P_L5,B_L3,B_E2,B_L4,P_E3,B_E4,I_L8,B_L6,B
_E5,B_L7,P_E6,B_E7,P_L11,B_L9,B_E8,B_L10,P_E9,B_E10などであ
る（シーケンス８）。この送信シーケンスに関して，復
合化及び呈示は以下の表８に説明されるように生じる。[0083] A suitable image transmission sequence in accordance with the present invention, starting with I_L2, is I_L2, P_L5, B_L3, B_E2, B_L4, P_E3, B_E4, I_L8, B_L6, B_E5, B_L7, P_E6, B_E7, P_L11, B_L9, B_E8, B_L10, P_E9, B_E10, and so on (Sequence 8). For this transmission sequence, decoding and presentation occur as shown in Table 8 below.

【００８４】[0084]

【表８】ここで，３つの復合化された画像に対してのみ保存が必
要である。例えば，I_L2,P_L5及びB_L3は，B_E2が受信され
る前に復合化されかつ保存され，受信されると，B_E2は
すぐに復合化されかつI_L2と同時にプレゼンテーション
用に直接出力される。[Table 8] Here, storage is required for only three decoded images. For example, I_L2, P_L5 and B_L3 are decoded and stored before B_E2 is received; once received, B_E2 is decoded immediately and output directly for presentation concurrently with I_L2.

【００８５】下位または強化シーケンスのいずれか内の
i番目画像に対し，DTS及びPTSは表８の移送シーケンス
に対して以下のようにDTS_Liから決定される。[0085] For the i-th image in either the lower layer or the enhancement layer sequence, the DTS and PTS are determined from DTS_Li as follows for the transmission sequence of Table 8.

【００８６】PTS_Li=DTS_Li+1.5F，i=2に対してPTS_Li=DTS
_Li+(5mod2(mod3(i-1))+3)0.5F，i>2に対してDTS_Ei=DTS
_Li+1.5F，i=2に対してDTS_Ei=DTS_Li+(3-mod2(mod3(i))+5
mod2(mod3(i-1))0.5F，i>2に対してPTS_Ei=PTS_Li，すべ
てのiに対してここで，mod3(i)は，i=0+3nのときmod3
(i)=0であり，i=1+3nのときmod3(i)=1であり，i=2+3nの
ときmod3(i)=2であるような(n=0,1,2,3,...)，整数iの
ベース３モジューロである。
PTS_Li = DTS_Li + 1.5F, for i = 2;
PTS_Li = DTS_Li + (5mod2(mod3(i-1)) + 3)0.5F, for i > 2;
DTS_Ei = DTS_Li + 1.5F, for i = 2;
DTS_Ei = DTS_Li + (3 - mod2(mod3(i)) + 5mod2(mod3(i-1)))0.5F, for i > 2;
PTS_Ei = PTS_Li, for all i.
Here, mod3(i) is the base 3 modulo of the integer i, such that mod3(i) = 0 when i = 0 + 3n, mod3(i) = 1 when i = 1 + 3n, and mod3(i) = 2 when i = 2 + 3n (n = 0, 1, 2, 3, ...).
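The nested mod2(mod3(...)) terms for Sequence 8 select images by their position within each group of three lower layer images. A literal transcription is sketched below. Two points are assumptions rather than statements from the source: the DTS_Ei expression for i > 2 is read with a closing parenthesis before 0.5F (the printed expression has unbalanced parentheses), and the function name `sequence8_stamps` is invented for this example.

```python
def mod2(i):
    return i % 2


def mod3(i):
    return i % 3


def sequence8_stamps(i, dts_l, F):
    """Return (PTS_Li, DTS_Ei, PTS_Ei) for image i of Sequence 8, given DTS_Li."""
    if i < 2:
        raise ValueError("the relations are stated only for i >= 2")
    if i == 2:
        pts_l = dts_l + 1.5 * F
        dts_e = dts_l + 1.5 * F
    else:
        pts_l = dts_l + (5 * mod2(mod3(i - 1)) + 3) * 0.5 * F
        # Assumed grouping: (3 - mod2(mod3(i)) + 5 mod2(mod3(i-1))) * 0.5F
        dts_e = dts_l + (3 - mod2(mod3(i)) + 5 * mod2(mod3(i - 1))) * 0.5 * F
    return pts_l, dts_e, pts_l  # PTS_Ei = PTS_Li


for i in (2, 3, 4, 5):
    print(i, sequence8_stamps(i, dts_l=0, F=3_600))
```

Note that mod2(mod3(i-1)) equals 1 only when i-1 leaves remainder 1 on division by 3, so the larger offset applies to one image in every three.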

【００８７】替わって，他の適当な送信シーケンスは，
I_L2,B_E2,P_L5,B_L3,P_E3,B_L4,B_E4,B_E5,I_L8,B_L6,P_E6,B_L7,B
_E7,B_E8,P_L11,B_L9,P_E9,B_L10,B_E10,B_E11などである（シー
ケンス９）。この送信シーケンスに関して，復合化及び
呈示は以下の表９に説明されるように生じる。Alternatively, another suitable transmission sequence is I_L2, B_E2, P_L5, B_L3, P_E3, B_L4, B_E4, B_E5, I_L8, B_L6, P_E6, B_L7, B_E7, B_E8, P_L11, B_L9, P_E9, B_L10, B_E10, B_E11, and so on (Sequence 9). For this transmission sequence, decoding and presentation occur as shown in Table 9 below.

【００８８】[0088]

【表９】ここで，２つの復合化された画像に対してのみ保存が必
要である。例えば，I_L2及びB_E2は，P_L5が受信される前
に復合化されかつ保存され，受信されると，I_L2及びB_E2
は同時のプレゼンテーション用に出力される。[Table 9] Here, storage is required for only two decoded images. For example, I_L2 and B_E2 are decoded and stored before P_L5 is received; once P_L5 is received, I_L2 and B_E2 are output for concurrent presentation.

【００８９】下位または強化シーケンスのいずれか内の
i番目画像に対し，DTS及びPTSは表９の移送シーケンス
に対して以下のようにDTS_Liから決定される。For the i-th image in either the lower layer or the enhancement layer sequence, the DTS and PTS are determined from DTS_Li as follows for the transmission sequence of Table 9.

【００９０】PTS_Li=DTS_Li+F，i=2に対してPTS_Li=DTS_Li+
(5mod2(mod3(i-1))+1)0.5F，i>2に対してDTS_Ei=DTS_Li+
0.5F，i=2に対してDTS_Ei=DTS_Li+(5mod2(mod3(i-1))+1)
0.5F，i>2に対してPTS_Ei=PTS_Li，すべてのiに対して他
の適当な送信シーケンスは，I_L2,P_L5,B_E2,B_L3,P_E3,B_L4,
B_E4,I_L8,B_E5,B_L6,P_E6,B_L7,B_E7,P_L11,B_E8,B_L9,P_E9,B_L10,
B_E10などである（シーケンス１０）。この送信シーケン
スに関して，復合化及び呈示は以下の表１０に説明され
るように生じる。
PTS_Li = DTS_Li + F, for i = 2;
PTS_Li = DTS_Li + (5mod2(mod3(i-1)) + 1)0.5F, for i > 2;
DTS_Ei = DTS_Li + 0.5F, for i = 2;
DTS_Ei = DTS_Li + (5mod2(mod3(i-1)) + 1)0.5F, for i > 2;
PTS_Ei = PTS_Li, for all i.
Another suitable transmission sequence is I_L2, P_L5, B_E2, B_L3, P_E3, B_L4, B_E4, I_L8, B_E5, B_L6, P_E6, B_L7, B_E7, P_L11, B_E8, B_L9, P_E9, B_L10, B_E10, and so on (Sequence 10). For this transmission sequence, decoding and presentation occur as shown in Table 10 below.

【００９１】[0091]

【表１０】[Table 10]

Here, storage is required for only two decoded pictures. For example, I_L2 and P_L5 are decoded and stored before B_E2 is received; once received, B_E2 is decoded and output directly for presentation simultaneously with I_L2.

【００９２】[0092] For the i-th picture in either the lower or the enhancement sequence, the DTS and PTS are determined from DTS_Li as follows for the transport sequence of Table 10.

【００９３】[0093]
PTS_Li = DTS_Li + F, for i = 2
PTS_Li = DTS_Li + (6mod2(mod3(i-1)) + 1)0.5F, for i > 2
DTS_Ei = DTS_Li + F, for i = 2
DTS_Ei = DTS_Li + (6mod2(mod3(i-1)) + 1)0.5F, for i > 2
PTS_Ei = PTS_Li, for all i

Note that sequential coding was assumed in the above cases for sequences 1 to 10. When parallel coding is used, the relationship between PTS and DTS can be characterized in a more general way. In particular, when the lower layer has no B pictures but only I and/or P pictures, all pictures in both layers arrive at the decoder in presentation order. Therefore, for the i-th picture in either the lower or the enhancement sequence, the DTS and PTS are determined from DTS_Li as follows.

【００９４】[0094]
PTS_Li = DTS_Li + F
DTS_Ei = DTS_Li + F
PTS_Ei = PTS_Li

This relationship is illustrated in Table 11 below. The difference between DTS_Li and DTS_L(i-1) is F.
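As a rough illustration of the parallel case with no B pictures in the lower layer, the following Python sketch builds a small schedule from the three relations above together with the stated spacing DTS_Li - DTS_L(i-1) = F. The function name, the starting DTS and the choice of time unit are assumptions made for the example only.

```python
def parallel_no_b_schedule(first_dts, frame_period, count):
    """Return rows (i, DTS_Li, PTS_Li, DTS_Ei, PTS_Ei) for `count` pictures.

    first_dts is the DTS of the first lower layer picture and
    frame_period is F; both use the same (arbitrary) time unit.
    """
    rows = []
    for i in range(count):
        dts_l = first_dts + i * frame_period  # consecutive DTS_Li differ by F
        pts_l = dts_l + frame_period          # PTS_Li = DTS_Li + F
        dts_e = dts_l + frame_period          # DTS_Ei = DTS_Li + F
        pts_e = pts_l                         # PTS_Ei = PTS_Li
        rows.append((i, dts_l, pts_l, dts_e, pts_e))
    return rows


if __name__ == "__main__":
    for row in parallel_no_b_schedule(first_dts=0, frame_period=1, count=4):
        print(row)
```

In this sketch each enhancement layer picture is decoded at the instant its lower layer counterpart is presented, and the two are presented together.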

【００９５】[0095]

【表１１】[Table 11]

For example, referring to sequence 1 described in connection with FIG. 2, decoding and presentation occur as shown in Table 12 below.

【００９６】[0096]

【表１２】[Table 12]

Here, storage is required for only one decoded picture. For example, I_L2 is decoded and stored before B_E2 is received. Once received, B_E2 is immediately decoded and output for presentation substantially simultaneously with I_L2.

【００９７】[0097] When the lower layer has non-consecutive B pictures, the DTS and PTS are determined from DTS_Li as follows. If the i-th picture in the lower layer is an I picture with a "closed GOP" indicator, or a P picture followed by such an I picture, then PTS_Li = DTS_Li + 2F. If the i-th picture in the lower layer is a P picture or an I picture of an "open GOP", and the (i+1)-th picture is not an I picture with a "closed GOP" indicator, then PTS_Li = DTS_Li + 3F. If the i-th picture in the lower layer is a B picture, then PTS_Li = DTS_Li + F. For the enhancement layer, DTS_Ei = DTS_Li + 2F and PTS_Ei = DTS_Li + 2F. In the MPEG-2 video protocol, a group of pictures header is included at the beginning of a GOP and is set by a one-bit indicator to closed_gop = 0 (where closed_gop = 1 indicates a closed GOP). An I picture of an open GOP is treated in the same way as a P picture with respect to decoding order.
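The case analysis above for a lower layer with non-consecutive B pictures can be written as a small lookup function. The sketch below is illustrative only: the `Picture` type and function name are hypothetical, the offset is returned as a multiple of F, and an open-GOP I picture is handled exactly like a P picture, following the last sentence of the paragraph.

```python
from typing import NamedTuple, Optional


class Picture(NamedTuple):
    kind: str                 # "I", "P" or "B"
    closed_gop: bool = False  # meaningful only for I pictures


# Enhancement layer: DTS_Ei = PTS_Ei = DTS_Li + 2F, as stated in the text.
ENHANCEMENT_OFFSET_IN_F = 2


def lower_pts_offset_in_f(pic: Picture, next_pic: Optional[Picture]) -> int:
    """Return k such that PTS_Li = DTS_Li + k * F (non-consecutive B case)."""
    if pic.kind not in ("I", "P", "B"):
        raise ValueError(f"unknown picture type: {pic.kind!r}")
    if pic.kind == "B":
        return 1
    if pic.kind == "I" and pic.closed_gop:
        return 2
    # Remaining cases: a P picture, or an open-GOP I picture treated like P.
    next_is_closed_i = (
        next_pic is not None and next_pic.kind == "I" and next_pic.closed_gop
    )
    return 2 if next_is_closed_i else 3
```

For instance, with pictures I (closed GOP), P, B decoded at times 0, F and 2F, the function gives presentation times 2F, 4F and 3F, which places the B picture between the I and P pictures on display. The decode times in that example are assumed values chosen for illustration.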

【００９８】[0098] Decoding and presentation for non-consecutive B pictures in the lower layer are illustrated in Table 13 below.

【００９９】[0099]

【表１３】[Table 13]

In a specific example, the lower layer sequence is, in display order, I_L0, B_L1, P_L2, B_L3, P_L4, B_L5, I_L6, I_L7, etc. The enhancement layer sequence is, in display and transmission order, P_E0, B_E1, B_E2, B_E3, B_E4, B_E5, P_E6, P_E7, etc. One possible transmission order according to the present invention is I_L0, P_L2, B_L1, P_E0, P_L4, B_E1, B_L3, B_E2, I_L6, B_E3, B_L5, B_E4, I_L7, B_E5, etc. The DTS and PTS are determined as shown in Table 14.
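One property of the transmission order given above can be checked mechanically: every enhancement layer picture is sent after the lower layer picture with the same index. The Python sketch below performs that check. The helper name and the string encoding of picture labels (type, underscore, layer letter, index) are assumptions made for the example.

```python
def enhancement_follows_lower(order):
    """Return True if each E picture comes after the L picture of equal index.

    `order` is a list of labels such as "I_L0" or "B_E1" in transmission order.
    """
    received_lower = set()
    for name in order:
        _kind, tag = name.split("_")
        layer, index = tag[0], int(tag[1:])
        if layer == "L":
            received_lower.add(index)
        elif layer == "E":
            if index not in received_lower:
                return False
        else:
            raise ValueError(f"unknown layer in label {name!r}")
    return True


# Transmission order quoted in the text for non-consecutive B pictures.
TRANSMISSION_ORDER = [
    "I_L0", "P_L2", "B_L1", "P_E0", "P_L4", "B_E1", "B_L3",
    "B_E2", "I_L6", "B_E3", "B_L5", "B_E4", "I_L7", "B_E5",
]

if __name__ == "__main__":
    print(enhancement_follows_lower(TRANSMISSION_ORDER))
```

The check covers only the ordering constraint; it does not model decoder buffer occupancy or the time stamps of Table 14.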

【０１００】[0100]

【表１４】[Table 14]

When the lower layer has two consecutive B pictures, the DTS and PTS are calculated by the following rules. If the i-th picture in the lower layer is an I picture with a "closed GOP" indicator, or a P picture followed by such an I picture, then PTS_Li = DTS_Li + 2F. If the i-th picture in the lower layer is a P picture or an I picture of an "open GOP", and the (i+1)-th picture is not an I picture with a "closed GOP" indicator, then PTS_Li = DTS_Li + 4F. If the i-th picture in the lower layer is a B picture, then PTS_Li = DTS_Li + F. For the enhancement layer, DTS_Ei = DTS_Li + 2F and PTS_Ei = DTS_Li + 2F.
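The rules for two consecutive B pictures differ from the non-consecutive case only in the 4F offset. A minimal Python sketch that applies them to a short run of lower layer pictures follows. The function name, the tuple encoding of pictures and the sample decode times are assumptions, and an open-GOP I picture followed by a closed-GOP I picture is given the same 2F offset as a P picture in that position, which is an interpretation rather than something the rules state explicitly.

```python
def lower_layer_pts_two_b(pictures, dts_values, frame_period):
    """Return the list of PTS_Li values for the two-consecutive-B rules.

    pictures:   list of (kind, closed_gop) tuples in decoding order,
                where kind is "I", "P" or "B".
    dts_values: DTS_Li for each picture, in the same order.
    frame_period: F, in the same time unit as dts_values.
    """
    pts_values = []
    for idx, ((kind, closed), dts) in enumerate(zip(pictures, dts_values)):
        if kind not in ("I", "P", "B"):
            raise ValueError(f"unknown picture type: {kind!r}")
        nxt = pictures[idx + 1] if idx + 1 < len(pictures) else None
        next_is_closed_i = nxt is not None and nxt[0] == "I" and nxt[1]
        if kind == "B":
            k = 1
        elif (kind == "I" and closed) or next_is_closed_i:
            k = 2
        else:
            k = 4  # P or open-GOP I, next picture is not a closed-GOP I
        pts_values.append(dts + k * frame_period)
    return pts_values
```

With pictures I (closed GOP), P, B, B decoded at times 0, 1, 2 and 3 and F = 1, the sketch returns presentation times 2, 5, 3 and 4, so the pictures are displayed in the order I, B, B, P.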

【０１０１】[0101] Decoding and presentation for two consecutive B pictures in the lower layer are illustrated in Table 15 below.

【０１０２】[0102]

【表１５】[Table 15]

In a specific example, the lower layer sequence is, in display order, I_L0, B_L1, B_L2, P_L3, B_L4, B_L5, I_L6, I_L7, etc. The enhancement layer sequence is, in display and transmission order, P_E0, B_E1, B_E2, B_E3, B_E4, B_E5, P_E6, P_E7, etc. One possible transmission order according to the present invention is I_L0, P_L3, B_L1, P_E0, B_L2, B_E1, I_L6, B_E2, B_L4, B_E3, B_L5, B_E4, I_L7, B_E5, etc. The DTS and PTS are determined as shown in Table 16.

【０１０３】[0103]

【表１６】[Table 16]

The above rules, which apply to frame mode, can be generalized to the corresponding cases of film mode.

【０１０４】[0104] FIG. 6 is a block diagram of an enhancement layer decoder for stereoscopic video. The decoder 130 includes an input terminal 605 for receiving compressed enhancement layer data and a transport level syntax parser 610 for parsing the data. The parsed data is provided to a memory manager 630, which comprises a central processing unit. The memory manager 630 communicates with a memory 620, which comprises, for example, a dynamic random access memory (DRAM). The memory manager also communicates with a decompression/prediction processor 640 and receives, via terminal 650, decoded lower level data, which is temporarily stored in the memory 620 for subsequent use by the processor 640 in decoding disparity-predicted enhancement layer pictures.

【０１０５】[0105] The decompression/prediction processor 640 provides various processing functions, such as error detection and correction, motion vector decoding, inverse quantization, inverse discrete cosine transform, Huffman decoding, and prediction calculations. After being processed by the decompression/prediction function 640, the decoded enhancement layer data is output by the memory manager. Alternatively, the decoded data may be output directly from the decompression/prediction function 640 via means not shown.

【０１０６】[0106] A similar structure can be used for the lower layer. Furthermore, the enhancement and lower layers may share common hardware. For example, the memory 620 and the processor 640 can be shared. However, this is not possible when parallel coding is employed. A common clock signal (not shown) is provided so that decoding is coordinated according to the transmission sequences disclosed herein. In particular, the lower layer pictures used for prediction of the disparity-predicted enhancement layer pictures, or other lower layer pictures, must be temporarily stored before the predicted picture data is received. According to the present invention, the number of pictures that must be stored before decoding is minimized, thereby reducing the required memory size.

【０１０７】[0107] As can be seen, the present invention provides an advantageous picture transmission scheme for stereoscopic video picture sequences. In particular, the pictures are transmitted in an order that minimizes the number of pictures that must be temporarily stored prior to presentation. Moreover, the example transmission sequences disclosed herein are compatible with both the MPEG-2 MVP protocol and the proposed MPEG-4 protocol. Furthermore, a decode time stamp (DTS) and a presentation time stamp (PTS) are determined for each picture to provide synchronization between the lower layer and the enhancement layer at the decoder. The DTS and PTS are set according to whether decoding is sequential or parallel, and whether the lower layer has no B pictures, non-consecutive B pictures, or two consecutive B pictures.

【０１０８】[0108] Although the invention has been described in connection with various specific embodiments, those skilled in the art will appreciate that numerous additions and modifications may be made without departing from the spirit and scope of the invention as set forth in the claims. For example, those skilled in the art will appreciate that the techniques disclosed herein may be applied to lower and enhancement layer sequences other than those specifically illustrated herein.

[Brief description of the drawings]

FIG. 1 is a block diagram of a coder/decoder structure for stereoscopic video.

FIG. 2 shows an enhancement layer picture sequence and a first base layer picture sequence for use with the apparatus of the present invention.

FIG. 3 shows an enhancement layer picture sequence and a second base layer picture sequence for use with the apparatus of the present invention.

FIG. 4 shows an enhancement layer picture sequence and a third base layer picture sequence for use with the apparatus of the present invention.

FIG. 5 shows an enhancement layer picture sequence and a fourth base layer picture sequence for use with the apparatus of the present invention.

FIG. 6 is a block diagram of an enhancement layer decoder structure for stereoscopic video.

[Explanation of symbols]

105 Temporary remultiplexer
110 Enhancement encoder
115 Lower encoder
120 System multiplexer
122 Decoder
125 System demultiplexer
130 Enhancement decoder
135 Lower decoder
140 Temporary remultiplexer

Claims


1. A method for arranging a sequence of video pictures in a lower layer and an enhancement layer of a stereoscopic video signal for transmission to a decoder, wherein the enhancement layer includes disparity-predicted pictures that are predicted using corresponding lower layer pictures, the method comprising the step of: arranging the video pictures such that the disparity-predicted enhancement layer pictures are transmitted after the corresponding respective lower layer pictures.

2. The method according to claim 1, wherein the lower layer includes only intra-coded pictures (I pictures), including consecutive pictures I_Li, I_L(i+1), and I_L(i+2), and the corresponding enhancement layer pictures are represented by H_Ei, H_E(i+1), and H_E(i+2), respectively, the method further comprising the step of: arranging the video pictures such that they are transmitted in the order I_Li, I_L(i+1), H_Ei, I_L(i+2).

3. The method according to claim 1, wherein the lower layer includes only intra-coded pictures (I pictures), including consecutive pictures I_Li and I_L(i+1), and the corresponding enhancement layer pictures are represented by H_Ei and H_E(i+1), respectively, the method further comprising the step of: arranging the video pictures such that they are transmitted in the order I_Li, H_Ei, I_L(i+1), H_E(i+1).

4. The method according to claim 1, wherein the lower layer includes only intra-coded pictures (I pictures) and predictive-coded pictures (P pictures), including consecutive pictures I_Li, P_L(i+1), and P_L(i+2), and the corresponding enhancement layer pictures are represented by H_Ei, H_E(i+1), and H_E(i+2), respectively, the method further comprising the step of: arranging the video pictures such that they are transmitted in the order I_Li, P_L(i+1), H_Ei, P_L(i+2).

5. The method according to claim 1, wherein the lower layer includes only intra-coded pictures (I pictures) and predictive-coded pictures (P pictures), including consecutive pictures I_Li and P_L(i+1), and the corresponding enhancement layer pictures are represented by H_Ei and H_E(i+1), respectively, the method further comprising the step of: arranging the video pictures such that they are transmitted in the order I_Li, H_Ei, P_L(i+1), H_E(i+1).

6. The method according to claim 1, wherein the lower layer includes intra-coded pictures (I pictures), predictive-coded pictures (P pictures), and non-consecutive bidirectionally predictive-coded pictures (B pictures), including consecutive pictures I_Li, B_L(i+1), and P_L(i+2), and the corresponding enhancement layer pictures are represented by H_Ei, H_E(i+1), and H_E(i+2), respectively, the method further comprising the step of: arranging the video pictures such that they are transmitted in the order I_Li, P_L(i+2), B_L(i+1), H_Ei, H_E(i+1).

7. The method according to claim 1, wherein the lower layer includes intra-coded pictures (I pictures), predictive-coded pictures (P pictures), and non-consecutive bidirectionally predictive-coded pictures (B pictures), including consecutive pictures I_Li, B_L(i+1), and P_L(i+2), and the corresponding enhancement layer pictures are represented by H_Ei, H_E(i+1), and H_E(i+2), respectively, the method further comprising the step of: arranging the video pictures such that they are transmitted in the order I_Li, H_Ei, P_L(i+2), B_L(i+1), H_E(i+1), H_E(i+2).

8. The method according to claim 1, wherein the lower layer includes intra-coded pictures (I pictures), predictive-coded pictures (P pictures), and non-consecutive bidirectionally predictive-coded pictures (B pictures), including consecutive pictures I_Li, B_L(i+1), and P_L(i+2), and the corresponding enhancement layer pictures are represented by H_Ei, H_E(i+1), and H_E(i+2), respectively, the method further comprising the step of: arranging the video pictures such that they are transmitted in the order I_Li, P_L(i+2), H_Ei, B_L(i+1), H_E(i+1).

9. The method according to claim 1, wherein the lower layer includes intra-coded pictures (I pictures), predictive-coded pictures (P pictures), and consecutive bidirectionally predictive-coded pictures (B pictures), including consecutive pictures I_Li, B_L(i+1), B_L(i+2), and P_L(i+3), and the corresponding enhancement layer pictures are represented by H_Ei, H_E(i+1), H_E(i+2), and H_E(i+3), respectively, the method further comprising the step of: arranging the video pictures such that they are transmitted in the order I_Li, P_L(i+3), B_L(i+1), H_Ei, B_L(i+2), H_E(i+1), H_E(i+2).

10. The method according to claim 1, wherein the lower layer includes intra-coded pictures (I pictures), predictive-coded pictures (P pictures), and consecutive bidirectionally predictive-coded pictures (B pictures), including consecutive pictures I_Li, B_L(i+1), B_L(i+2), and P_L(i+3), and the corresponding enhancement layer pictures are represented by H_Ei, H_E(i+1), H_E(i+2), and H_E(i+3), respectively, the method further comprising the step of: arranging the video pictures such that they are transmitted in the order I_Li, H_Ei, P_L(i+3), B_L(i+1), H_E(i+1), B_L(i+2), H_E(i+2), H_E(i+3).

11. The method according to claim 1, wherein the lower layer includes intra-coded pictures (I pictures), predictive-coded pictures (P pictures), and consecutive bidirectionally predictive-coded pictures (B pictures), including consecutive pictures I_Li, B_L(i+1), B_L(i+2), and P_L(i+3), and the corresponding enhancement layer pictures are represented by H_Ei, H_E(i+1), H_E(i+2), and H_E(i+3), respectively, the method further comprising the step of: arranging the video pictures such that they are transmitted in the order I_Li, P_L(i+3), H_Ei, B_L(i+1), H_E(i+1), B_L(i+2), H_E(i+2).

12. A method for decoding, in parallel, a sequence of video pictures in a lower layer and an enhancement layer of a stereoscopic video signal, wherein the lower layer includes at least one of intra-coded pictures (I pictures) and predictive-coded pictures (P pictures) but no bidirectionally predictive-coded pictures (B pictures), the method comprising the step of: providing the pictures with a decode time stamp (DTS) and a presentation time stamp (PTS) for indicating, respectively, a time to decode and a time to present each picture, wherein the DTS of the i-th lower layer picture is DTS_Li, the PTS of the i-th lower layer picture is PTS_Li, the DTS of the i-th enhancement layer picture is DTS_Hi, the PTS of the i-th enhancement layer picture is PTS_Hi, F is the time interval between the presentation of successive pictures, and PTS_Li = DTS_Hi = PTS_Hi = DTS_Li + F.

13. A method for decoding, in parallel, a sequence of video pictures in a lower layer and an enhancement layer of a stereoscopic video signal, wherein the lower layer includes non-consecutive bidirectionally predictive-coded pictures (B pictures), the method comprising the step of: providing the pictures with a decode time stamp (DTS) and a presentation time stamp (PTS) for indicating, respectively, a time to decode and a time to present each picture, wherein the DTS of the i-th lower layer picture is DTS_Li, the PTS of the i-th lower layer picture is PTS_Li, the DTS of the i-th enhancement layer picture is DTS_Hi, the PTS of the i-th enhancement layer picture is PTS_Hi, F is the time interval between the presentation of successive pictures, and PTS_Li = DTS_Li + 2F when the i-th lower layer picture is an intra-coded picture (I picture) with a closed GOP indicator.

14. The method according to claim 13, wherein PTS_Li = DTS_Li + 2F when the i-th lower layer picture is a predictive-coded picture (P picture) and the (i+1)-th lower layer picture is an I picture with a closed GOP indicator.

15. The method according to claim 13, wherein PTS_Li = DTS_Li + 3F when the i-th lower layer picture is a P picture and the (i+1)-th lower layer picture is not an I picture with a closed GOP indicator.

16. The method according to claim 13, wherein PTS_Li = DTS_Li + 3F when the i-th lower layer picture is an I picture with an open GOP indicator and the (i+1)-th lower layer picture is not an I picture with a closed GOP indicator.

17. The method according to claim 13, wherein PTS_Li = DTS_Li + F when the i-th lower layer picture is a B picture.

18. The method according to claim 13, wherein DTS_Hi = PTS_Hi = PTS_Li = DTS_Li + 2F.

19. A method for decoding, in parallel, a sequence of video pictures in a lower layer and an enhancement layer of a stereoscopic video signal, wherein the lower layer includes at least one group of two consecutive bidirectionally predictive-coded pictures (B pictures), the method comprising the step of: providing the pictures with a decode time stamp (DTS) and a presentation time stamp (PTS) for indicating, respectively, a time to decode and a time to present each picture, wherein the DTS of the i-th lower layer picture is DTS_Li, the PTS of the i-th lower layer picture is PTS_Li, the DTS of the i-th enhancement layer picture is DTS_Hi, the PTS of the i-th enhancement layer picture is PTS_Hi, F is the time interval between the presentation of successive pictures, and PTS_Li = DTS_Li + 2F when the i-th lower layer picture is an intra-coded picture (I picture) with a closed GOP indicator.

20. The method according to claim 19, wherein PTS_Li = DTS_Li + 2F when the i-th lower layer picture is a predictive-coded picture (P picture) and the (i+1)-th lower layer picture is an I picture with a closed GOP indicator.

21. The method according to claim 19, wherein PTS_Li = DTS_Li + 4F when the i-th lower layer picture is a P picture and the (i+1)-th lower layer picture is not an I picture with a closed GOP indicator.

22. The method according to claim 19, wherein PTS_Li = DTS_Li + 4F when the i-th lower layer picture is an I picture with an open GOP indicator and the (i+1)-th lower layer picture is not an I picture with a closed GOP indicator.

23. The method according to claim 19, wherein PTS_Li = DTS_Li + F when the i-th lower layer picture is a B picture.

24. The method according to claim 19, wherein DTS_Hi = PTS_Hi = PTS_Li = DTS_Li + 2F.