JP3693407B2 - Multi-view image encoding apparatus and decoding apparatus

Info

Publication number: JP3693407B2
Application number: JP8221596A
Authority: JP
Inventors: 竜二北浦; 敏男野村
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1996-04-04
Filing date: 1996-04-04
Publication date: 2005-09-07
Anticipated expiration: 2016-04-04
Also published as: JPH09275578A

Description

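[0001]
[Technical field of the invention]
The present invention relates to high-efficiency image encoding and decoding, and in particular to an apparatus for high-efficiency encoding and decoding of multi-view images using motion-compensated or disparity-compensated prediction.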
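[0002]
[Prior art]
In multi-view image encoding, a well-known way to improve coding efficiency is to combine motion compensation, which is commonly used in moving-image encoding, with disparity compensation between images from different viewpoints. For example, in the method described in the literature (W. A. シュップ, Yasuda, "Data compression of stereo moving images using disparity compensation and motion compensation", PCSJ88 (1988), pp. 63-64), only motion compensation is applied to the right image, as shown in FIG. 11. For the left image, whichever of disparity compensation and motion compensation gives the smaller prediction error is selected and used for encoding.
[0003]
Japanese Patent Laid-Open No. 6-113283 discloses an example in which, when motion compensation by block matching is performed in moving-image encoding, motion vectors are obtained with different block sizes. In this method, as shown in FIG. 12, motion compensation is first performed using small blocks 71 to obtain a motion vector for each small block 71 (hereinafter, small-block vector). Several small blocks 71 are then grouped into a large block, motion compensation is performed, and a motion vector for the large block (hereinafter, large-block vector) is obtained.
[0004]
Suppose the large block 70 consists of four small blocks. If the number of small-block vectors that lie within a predetermined range of the large-block vector exceeds a predetermined threshold, the large-block vector and the differences between the small-block vectors and the large-block vector are transmitted, which reduces the amount of motion-vector information.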
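[0005]
[Problems to be solved by the invention]
However, the conventional method is inefficient because disparity compensation is applied to every part of the image, both the parts with disparity and the parts without it. Moreover, the disparity vectors obtained are usually larger in magnitude than the motion vectors obtained in motion compensation, and because they are encoded at that magnitude, the coding efficiency of the disparity vectors is poor.
[0006]
In addition, because this method selects between motion compensation and disparity compensation, a disparity vector is not always transmitted. Consequently, when several objects are displayed stereoscopically, editing such as changing the depth of only some of the objects is not possible.
[0007]
On the other hand, the method that encodes a moving image with two kinds of motion vectors, for large blocks and small blocks, requires a two-stage search over the entire screen, so the amount of computation is enormous. Also, because a large block is made up of several small blocks, the method merely obtains two kinds of vectors of different reliability and their differences; there is no difference in physical meaning between them. Furthermore, because the two kinds of motion vectors cannot be given different roles, they are difficult to use for editing component images.
[0008]
An object of the present invention is to provide a disparity-compensated encoding apparatus and decoding apparatus that achieve highly efficient encoding of multi-view images by using motion-compensated and disparity-compensated prediction, improve the coding efficiency of disparity vectors by dividing them into two stages, reduce the amount of computation needed to obtain the disparity vectors, and make it easy to edit stereoscopic component images using the two-stage disparity vectors.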
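[0009]
[Means for solving the problems]
To achieve the above object, the encoding apparatus of the present invention is a multi-view encoding apparatus comprising input means for inputting a multi-view stereoscopic image divided into at least one component image with disparity and at least one background image without disparity, and, for each component image, motion compensation means for performing motion-compensated prediction using the amount of motion between frames or fields of the component image, disparity compensation means for performing disparity-compensated prediction using the amount of disparity between frames or fields of the component image, and disparity vector dividing means, wherein the disparity vector dividing means takes one of the disparity vectors obtained by the disparity compensation means as a global disparity vector and outputs, as local disparity vectors, the remaining disparity vectors minus the global disparity vector.
[0010]
Another encoding apparatus of the present invention is a multi-view encoding apparatus comprising input means for inputting a multi-view stereoscopic image of three or more viewpoints divided into at least one component image with disparity and at least one background image without disparity; for each component image, motion compensation means for performing motion-compensated prediction using the amount of motion between frames or fields of the component image and disparity compensation means for performing disparity-compensated prediction using the amount of disparity between frames or fields of the component image; and, for each component image, one disparity vector dividing means and at least one differencer, wherein the disparity vector dividing means takes one of the disparity vectors obtained between a specific pair of viewpoints as a global disparity vector and outputs, as local disparity vectors, the remaining disparity vectors minus the global disparity vector, and the differencer outputs the differences between all the disparity vectors obtained between the other viewpoints and the global disparity vector.
[0011]
In the present invention, the average of all the disparity vectors in the stereoscopic component image may be used as the global disparity vector.
[0012]
In the present invention, the disparity vector obtained first in the stereoscopic component image may be used as the global disparity vector.
[0013]
In the present invention, a histogram of the magnitudes of the disparity vectors in the stereoscopic component image may be formed, and the most frequently occurring disparity vector may be used as the global disparity vector.
[0014]
In the present invention, the difference between the global disparity vector of the current frame or field and that of the immediately preceding frame or field may be used as the global disparity vector of the current frame or field.
[0015]
In the present invention, the difference from the local disparity vector obtained immediately before may be used as the local disparity vector.
[0016]
The decoding apparatus of the present invention is a multi-view image decoding apparatus comprising motion compensation means for performing motion compensation and disparity compensation means for performing disparity compensation using the amounts of motion and disparity between frames or fields of a multi-view stereoscopic image, and further comprising: disparity vector synthesizing means for outputting the disparity vector to the disparity compensation means using the input global disparity vector and local disparity vectors; parameter input means for outputting a depth-direction movement parameter to image editing means; image editing means for changing the depth of a stereoscopic component image using the input parameter and outputting the result to image synthesizing means; and image synthesizing means for combining the input background image and stereoscopic component images.
[0017]
In the present invention, the image editing means may output the stereoscopic component images in ascending order of global disparity vector, and the image synthesizing means may compose the image by overwriting the stereoscopic component images, input one after another once the background image has been input, onto the background image.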
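[0018]
[Embodiments of the invention]
Embodiments of the present invention are described in detail below.
[0019]
FIG. 1 shows a first embodiment of the image encoding apparatus of the present invention. This embodiment is a two-view (binocular) example, and the images input to the image encoding apparatus are a background image without disparity and stereoscopic component images with disparity. Either may be a still image or a moving image; here both are assumed to be moving images.
[0020]
A stereoscopic component image is obtained, for example as shown in FIG. 2, by inputting a stereo image 2 captured with two cameras placed in parallel into a clipping device 3 and cutting out only the object of interest. A VOP (Video Object Plane) in the image input unit 5 of FIG. 2 is one of the individual constituent elements when a composite image is formed from several component images. A VOP normally consists of shape data and texture data, but to simplify the description it is treated here as a component image segmented into a region of arbitrary shape. A stereoscopic component image obj_n (n = 1, 2, ... distinguishes different components) is separated into as many VOP_mn as there are viewpoints (m = 1, 2, ... distinguishes different viewpoints; m = 1, 2 in the two-view case), and VOP₀ is the background image 1. The spacing of the two cameras may be the same as that of human eyes. Because the two cameras are placed in parallel and adjusted so that there is no vertical offset or distortion, the disparity of the images cut out by the clipping device has no vertical component, only a horizontal one.
[0021]
First, the method of encoding the background image is described with reference to FIG. 1. In moving-image encoding, motion-compensated predictive coding using block matching, in which the unit of encoding is a block (for example, 16 × 16 pixels), is widely known. In this embodiment the background image is encoded in the same way. For example, the background image VOP₀ is motion-compensated by the motion compensation unit 6 using the decoded image of the previous frame or field, and the detected motion vector is transmitted to the variable-length coding unit 10.
[0022]
The subtractor 7 then takes the difference between the background image VOP₀ and the motion-compensated image, and the difference data is transmitted to the variable-length coding unit 10 through the transform unit 8 and the quantization unit 9. As well as being input to the variable-length coding unit 10, the difference data passes through the inverse quantization unit 11 and the inverse transform unit 12, the motion-compensated image is added to it in the adder 13, and the resulting decoded image of the background image VOP₀ is stored in the frame memory 14. Subsequent inputs are encoded by repeating this process.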
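[0023]
Next, encoding of a stereoscopic component image is described. The order in which the stereoscopic component image obj₁ is encoded is described with reference to FIG. 3. Encoding follows steps (a) to (d) below.
[0024]
(a) The left-eye component image 43 of the stereoscopic component image obj₁ is encoded within the frame or field.
[0025]
(b) The right-eye component image 44 is inter-frame or inter-field coded by disparity compensation using the decoded image of the component image 43.
[0026]
(c) The component image 45 is inter-frame or inter-field coded by motion compensation using the decoded image of the component image 43.
[0027]
(d) The right-eye component image 46 is inter-frame or inter-field coded by disparity compensation using the decoded image of the component image 45.
[0028]
That is, the left-eye image is encoded by motion compensation, and the right-eye image is encoded by disparity compensation using the left-eye image of the same time instant. Encoding then proceeds by repeating steps (c) and (d). To prevent prediction errors from propagating, however, steps (a) and (b) may be performed in place of steps (c) and (d) at a fixed period.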
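[0029]
In FIG. 1, the left-eye image VOP₁₁ of the stereoscopic component image obj₁ is encoded in the same way as the background image. The right-eye image VOP₂₁ of the same stereoscopic component image obj₁ is encoded with disparity compensation by block matching, using the decoded image of the left-eye image VOP₁₁. The right-eye image VOP₂₁ and the decoded left-eye image VOP₁₁ from the adder 15 are input to the disparity compensation unit 16, and disparity vectors are obtained.
[0030]
As shown in FIG. 4, disparity vectors are obtained by superimposing the left-eye component image 31 and the right-eye component image 32 on the same plane, as in image 33, and matching each block of interest. One disparity vector is therefore obtained per block. The disparity vectors are input to the disparity vector divider 17 and divided into one global disparity vector and local disparity vectors, each of which is a disparity vector minus the global disparity vector.
[0031]
In other words, within a stereoscopic component image the global disparity vector represents the disparity of one predetermined block, and the local disparity vectors are the disparities of the other blocks minus the global disparity vector. For example, if the disparity vector v₁ in FIG. 5 is taken as the global disparity vector GV, then the disparity vector v₂ minus GV is the local disparity vector LV₂, and v₂ can be expressed as GV + LV₂.
[0032]
Thus k disparity vectors (k = 1, 2, ...) can be represented by one global disparity vector and k−1 local disparity vectors. The global disparity vector corresponds to the depth (position) of the stereoscopic component image in three-dimensional space, and the local disparity vectors correspond to the local depth distribution (three-dimensional shape) within the stereoscopic component image.
[0033]
The amount of local disparity vector information can be reduced by using the average of all the disparity vectors as the global disparity vector. Alternatively, the disparity vector obtained first may be used as the global disparity vector, which reduces the computation needed to select it. Or a histogram of disparity vector magnitudes may be formed and the most frequent disparity vector used as the global disparity vector, which represents the position of the stereoscopic component image in three-dimensional space appropriately.
[0034]
In general, one global disparity vector plus k−1 small-valued local disparity vectors carries less information than k disparity vectors.
[0035]
In a stereoscopic image, disparity changes little between adjacent blocks, but if the image contains several objects that have not been separated into components, the disparity can change abruptly where different objects overlap. Because the present invention encodes component by component using stereoscopic component images, no such abrupt change occurs within one stereoscopic component image. The amount of local disparity vector information can therefore be reduced further by taking differences between the local disparity vectors of adjacent blocks.
[0036]
These vectors are transmitted to the variable-length coding unit 10 in FIG. 1. The subtractor 18 takes the difference between the right-eye image VOP₂₁ and the disparity-compensated image, and the difference is transmitted to the variable-length coding unit 10 through the transform unit 19 and the quantization unit 20. Subsequent inputs are encoded by repeating this process, and the other stereoscopic component images are encoded in the same way. The data transmitted to the variable-length coding unit 10 is variable-length coded there and multiplexed by the multiplexing unit 21.
[0037]
Here one background image and n stereoscopic component images are encoded, but there may be more than one background image. Also, because the global disparity vector does not change abruptly over time in FIG. 3, the amount of global disparity vector information can be reduced by taking the difference between the current global disparity vector (for example, the one obtained from the left-eye image 45 and the right-eye image 46) and the global disparity vector one frame or field earlier (for example, the one obtained from the left-eye image 43 and the right-eye image 44).
[0038]
This embodiment describes encoding of two-view stereoscopic component images, but multi-view stereoscopic component images can be encoded in the same way.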
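[0039]
Next, a second embodiment of the present invention is described. This embodiment extends the encoding apparatus of the first embodiment to multi-view stereoscopic images. The background image is encoded in the same way as in the first embodiment.
[0040]
FIG. 6 shows the second embodiment of the image encoding apparatus of the present invention. The input images in FIG. 6 are multi-view images VOP_mn (m = 1, 2, ...; n = 1, 2, ...). Images VOP₁₁ and VOP₂₁ are encoded in the same way as images VOP₁₁ and VOP₂₁ in FIG. 1. The decoded image VOP₂₁ is output from the adder 51 and input to the disparity compensation unit 53. The global disparity vector obtained using images VOP₁₁ and VOP₂₁ is also input to the disparity compensation unit 53.
[0041]
The disparity compensation unit 53 performs disparity compensation of the input image VOP₃₁ using the decoded image of VOP₂₁ and obtains disparity vectors. These disparity vectors and the global disparity vector obtained using images VOP₁₁ and VOP₂₁ are input to the differencer 54, which outputs their differences as local disparity vectors. No new global disparity vector needs to be obtained here.
[0042]
The reason the global disparity can be shared even for multi-view images is explained next. For simplicity, a stereoscopic component image with three viewpoints is considered.
[0043]
For example, as shown in FIG. 7, take a coordinate system with origin O₁, the x-axis horizontal and the z-axis in the depth direction. Cameras C_m (m = 1, 2, 3) are placed at points O_m (m = 1, 2, 3), spaced a apart, with their optical axes parallel to the z-axis. As in the first embodiment, the cameras are assumed to be adjusted so that there is no vertical offset or distortion.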
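[0044]
Let point A₀ be the representative point of the stereoscopic component image obj₁, and let A_m (m = 1, 2, 3) be the points where the lines joining A₀ to each O_m cross the virtual image plane of the cameras (the plane of the images captured by the cameras). Let Q_m (m = 1, 2, 3) be the points where the lines through each O_m parallel to the z-axis meet the virtual image plane. The points Q_m are the centers of the images from the respective cameras.
[0045]
Let the x-coordinates of A₁, A₂ and A₃ be x₁, x₂ and x₃. Because the virtual image plane is parallel to the x-axis and the cameras are equally spaced, |A₁A₂| and |A₂A₃| are equal. From this and x₁ < x₂ < x₃,
x₁ − x₂ = x₂ − x₃ (1)
holds.
[0046]
The disparity vector d₁ between cameras C₁ and C₂ has only a horizontal component, so it is a one-dimensional vector equal to the vector Q₁A₁ minus the vector Q₂A₂:
d₁ = x₁ − (x₂ − a) (2)
[0047]
Similarly, the disparity vector d₂ between cameras C₂ and C₃ is the vector Q₂A₂ minus the vector Q₃A₃:
d₂ = x₂ − (x₃ − a) (3)
[0048]
From equations (1) to (3),
d₁ = d₂ (4)
Thus the disparity vectors at the representative point between adjacent cameras are all equal. This holds not only for three-viewpoint stereoscopic images but for any other number of viewpoints.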
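Equations (2) to (4) can be checked numerically. The following is a minimal sketch, assuming an ideal pinhole model with the virtual image plane at z = w; the function names `image_x` and `adjacent_disparities` are illustrative and do not come from the patent.

```python
def image_x(cam_x, point, w):
    """x-coordinate where the ray from the camera at (cam_x, 0) to `point` crosses z = w."""
    px, pz = point
    return cam_x + (px - cam_x) * w / pz


def adjacent_disparities(point, a, w, num_cams=3):
    """Disparity of `point` between each pair of adjacent cameras spaced `a` apart."""
    xs = [image_x(m * a, point, w) for m in range(num_cams)]
    # Same form as equations (2) and (3): x_m - (x_{m+1} - a)
    return [xs[m] - (xs[m + 1] - a) for m in range(num_cams - 1)]


d1, d2 = adjacent_disparities(point=(3.0, 8.0), a=2.0, w=1.0)
print(d1, d2)
```

Under this model both values equal a × w / z, which depends only on the depth of the representative point and not on which adjacent camera pair is used.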
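[0049]
In this scheme the global disparity vector can therefore be regarded as the disparity vector at the representative point, so a single global disparity vector suffices however many viewpoints there are. This reduces both the computation needed to obtain disparity vectors for multi-view images and the amount of disparity vector information itself.
[0050]
Hence, in FIG. 6, the value output from the differencer 54 can be the local disparity vector obtained by subtracting the global disparity vector found between VOP₁₁ and VOP₂₁ from the disparity vector output by the disparity compensation unit 53. The other VOPs in FIG. 6 are encoded in the same way.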
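[0051]
Next, a third embodiment of the present invention is described.
[0052]
FIG. 8 shows the image decoding apparatus of this embodiment, which decodes data encoded by the encoding apparatus of FIG. 1. The background image and the left-eye image VOP₁₁ of the stereoscopic component image obj₁ are decoded in the same way as in ordinary motion-compensated predictive decoding using block matching. The right-eye image VOP₂₁ of the stereoscopic component image obj₁ is decoded with disparity compensation using the decoded image of the left-eye image VOP₁₁.
[0053]
The global disparity vector and the local disparity vectors are input to the disparity vector synthesizer 81, which reconstructs the ordinary disparity vectors. From the parameter input unit 82, the image editing unit 80 receives as editing values the amount of movement used to move a component image parallel to the image plane and the change in the global disparity vector used to move a component image perpendicular to the image plane. The image editing unit 80 also receives the decoded stereoscopic component image, the global disparity vector, and the position of the representative block that carries the global disparity vector. It edits the stereoscopic component image using these inputs. The decoded background image and the edited component images are combined into one image by the image synthesizing unit 83.
[0054]
Because the global disparity vector corresponds to the depth (position) of a stereoscopic component image in three-dimensional space, a component image with a larger global disparity vector lies closer to the cameras in the depth direction. The image editing unit 80 outputs the stereoscopic component images in ascending order of global disparity vector. The image synthesizing unit 83 receives the background image first and then overwrites the stereoscopic component images onto it in the order in which they arrive. As a result, where several component images overlap on top of the initially input left and right images, the component image nearer the viewer is written on top. Using the global disparity vector in this way allows overlaps between component images to be rendered correctly when stereoscopic component images are combined.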
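The overwrite order described above amounts to painting from far to near. A minimal sketch follows, assuming each component is given as a hypothetical pair of its global disparity and a sparse pixel map; this data layout is an illustration, not the patent's.

```python
def composite(background, components):
    """Overwrite components onto the background in ascending order of global disparity.

    background: dict mapping (x, y) -> pixel value
    components: list of (global_disparity, {(x, y): pixel}) pairs (assumed layout)
    """
    canvas = dict(background)
    for _, pixels in sorted(components, key=lambda c: c[0]):
        # A larger global disparity means the component is nearer,
        # so it is written later and overwrites farther components.
        canvas.update(pixels)
    return canvas


bg = {(0, 0): "bg", (1, 0): "bg"}
parts = [(8, {(0, 0): "near"}), (3, {(0, 0): "far", (1, 0): "far"})]
print(composite(bg, parts))
```

The same ordering would be applied to each view's image separately.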
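[0055]
First, horizontal translation of a stereoscopic component image by inputting an x-direction movement parameter is described.
[0056]
In FIG. 9, take a coordinate system with origin O₁, the x-axis horizontal and the z-axis in the depth direction. Cameras L and R are placed at points O_m (m = 1, 2), spaced a apart, with their optical axes parallel to the z-axis. Let point A₀ (x_A, z_A) be the representative point of the stereoscopic component image obj₁, and let A_m (m = 1, 2) be the points where the lines joining A₀ to each O_m cross the virtual image plane of the cameras.
[0057]
Let Q_m (m = 1, 2) be the points where the lines through each O_m parallel to the z-axis meet the virtual image plane. The points Q_m are the centers of the images from the respective cameras. Let w be the distance from the virtual image plane to the x-axis, let S₁ be the point where the plane z = z_A meets the optical axis of camera L, and let S₂ (a, z_A) be the point where it meets the optical axis of camera R. If the coordinates of A₁ and A₂ are (x_a1, w) and (x_a2, w), the one-dimensional vectors Q₁A₁ and Q₂A₂ are x_a1 and x_a2 − a. Since △O₁A₀S₁ ∽ △O₁A₁Q₁ and △O₂A₀S₂ ∽ △O₂A₂Q₂,
z_A/w = x_A/x_a1 = (x_A − a)/(x_a2 − a) (5)
so the x-coordinate x_A of point A₀ is
x_A = a × x_a1/(x_a1 − x_a2 + a) = a × x_a1/d_A (6)
where d_A is the global disparity vector of object 100,
d_A = x_a1 − (x_a2 − a) (7)
[0058]
Suppose the x-direction movement of the stereoscopic component image on the left image plane is given by the parallel-movement parameter P_x. If inputting P_x translates object 100 to the position of object 101, the representative point A₀ (x_A, z_A) of object 100 moves to point B₀ (x_B, z_A), and point A₁ moves to point B₁ (x_a1 + P_x, w). Let B₂ (x_b2, w) be the point where the segment joining the representative point B₀ of object 101 to O₂ meets the virtual image plane. Since △O₁O₂A₀ ∽ △A₁A₂A₀ and △O₁O₂B₀ ∽ △B₁B₂B₀,
z_A/(z_A − w) = a/(x_a2 − x_a1) = a/(x_b2 − x_a1 − P_x) (8)
so the movement x_b2 − x_a2 on the right image plane when the representative point A₀ of object 100 moves to the representative point B₀ of object 101 is
x_b2 − x_a2 = P_x (9)
[0059]
The movement on both the left and right image planes when A₀ moves to B₀ is therefore equal to P_x. In the same way as equations (6) and (7), the x-coordinate x_B of point B₀ is
x_B = a × (x_a1 + P_x)/d_B (10)
[0060]
where d_B is the global disparity vector of object 101,
d_B = (x_a1 + P_x) − (x_b2 − a) (11)
[0061]
From equations (7), (9) and (11),
d_A = d_B (12)
so the global disparity vectors d_A and d_B are equal.
[0062]
From equations (6), (10) and (12),
x_B − x_A = P_x/x_a1 × x_A (13)
An object can therefore be moved horizontally by steps 1) and 2) below.
[0063]
1) Input the x-direction movement parameter P_x and, on the left image, translate the representative point of object 100 from x-coordinate x_a1 to x_a1 + P_x, the x-coordinate of the representative point B₀ on the left image.
[0064]
2) On the right image, translate the representative point of object 100 from x-coordinate x_a2 to x_a2 + P_x, the x-coordinate of the representative point B₀ on the right image.
[0065]
As a result object 100 moves to the position of object 101, and by equation (13) the movement in three-dimensional space is P_x/x_a1 × x_A.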
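Steps 1) and 2) can be sketched as follows, assuming the FIG. 9 geometry with the left camera at x = 0, the right camera at x = a and the image plane at z = w. The helper names `project`, `world_x` and `shift_horizontally` are illustrative only.

```python
def project(point, cam_x, w):
    """Image-plane x-coordinate of `point` seen from the camera at (cam_x, 0)."""
    px, pz = point
    return cam_x + (px - cam_x) * w / pz


def world_x(x_a1, x_a2, a):
    """Equation (6): x-coordinate of the representative point from its two image positions."""
    return a * x_a1 / (x_a1 - x_a2 + a)


def shift_horizontally(x_a1, x_a2, p_x):
    """Steps 1) and 2): both image positions move by the same amount P_x."""
    return x_a1 + p_x, x_a2 + p_x


a, w = 2.0, 1.0
x_a1, x_a2 = project((3.0, 8.0), 0.0, w), project((3.0, 8.0), a, w)
x_b1, x_b2 = shift_horizontally(x_a1, x_a2, p_x=0.125)
print(world_x(x_a1, x_a2, a), world_x(x_b1, x_b2, a))
```

Because both image positions shift by the same amount, the disparity x_a1 − (x_a2 − a) is unchanged, so the object keeps its depth while its x-coordinate changes as in equation (13).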
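[0066]
Next, moving a stereoscopic component image in the depth direction (perpendicular to the image plane) by inputting a z-direction movement parameter is described with reference to FIG. 10. The coordinate system and the placement of object 100 in FIG. 10 are the same as in FIG. 9. The z-direction movement parameter P_z makes the distance between the cameras and the object after the move P_z times the distance before the move. In FIG. 10, if inputting P_z moves object 100 to the position of object 102 on the line through O₁ and A₀, the representative point A₀ (x_A, z_A) of object 100 moves to point C₀ (x_C, z_C), but point A₁ does not move. Then
z_C = P_z × z_A (14)
Let C₂ be the point where the segment joining O₂ and C₀ meets the virtual image plane, and T₁ the point where the plane z = z_C meets the z-axis. Let the coordinates of C₂ be (x_c2, w), and let d_C be the global disparity vector of point C₀, the point to which A₀ of object 100 has moved along the line through O₁ and A₀.
[0067]
In the same way as equation (6) was derived, x_C is
x_C = a × x_a1/d_C (15)
where
d_C = x_a1 − x_c2 + a (16)
Since △O₁A₀S₁ ∽ △O₁C₀T₁,
x_C/x_A = z_C/z_A (17)
and substituting equations (6) and (15) into equation (17) and solving for z_C gives
z_C = (d_A/d_C) × z_A (18)
[0068]
From equations (14) and (18), the global disparity vector d_C, the x-coordinate x_C of point C₀ and the x-coordinate x_c2 of point C₂ are
d_C = d_A/P_z (19)
x_C = P_z × a × x_a1/d_A (20)
x_c2 = x_a1 + a − d_A/P_z (21)
To move object 100 to object 102, therefore, the representative point A₁ of object 100 on the left image is left where it is, and point A₂ on the right image is moved by x_c2 − x_a2.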
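[0069]
Now move object 102 by −(x_C − x_A) and call the result object 103. Let D₀, the representative point of object 103, be the point to which the representative point C₀ of object 102 has moved, and let D₁ (x_d1, w) be the point where the line joining D₀ and O₁ meets the virtual image plane. From equations (6) and (19), the x-coordinate x_d1 of point D₁ is
x_d1 = x_a1/P_z (22)
To move object 102 to object 103, therefore, point A₁ on the left image and point C₂ on the right image are each moved by x_d1 − x_a1. Accordingly, object 100 is moved in the depth direction (to the position of object 103) by steps 3) and 4) below.
[0070]
3) Input the z-direction movement parameter P_z, obtain the global disparity d_C from equation (19), and move point A₁ on the left image of the stereoscopic component image horizontally by (x_d1 − x_a1).
[0071]
4) Obtain x_c2 and x_d1 from equations (21) and (22), and move point A₂ on the right image of the stereoscopic component image by (x_c2 − x_a2 + x_d1 − x_a1).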
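[0072]
Holding the disparity vector in two stages, as a global disparity vector and local disparity vectors, thus makes it easy to change the depth position of an individual component image by inputting the z-direction movement parameter P_z and changing the global disparity vector. Combining horizontal movement (steps 1) and 2)) with depth movement (steps 3) and 4)) moves an object to any position in three-dimensional space.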
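The depth move can be sketched with equations (7), (19), (21) and (22), again assuming the pinhole geometry of FIG. 10 (left camera at x = 0, right camera at x = a, image plane at z = w). The function names are illustrative. In this sketch the right-image shift is computed as (x_c2 − x_a2) + (x_d1 − x_a1), which is the combination that agrees with projecting the moved point directly.

```python
def project(point, cam_x, w):
    """Image-plane x-coordinate of `point` seen from the camera at (cam_x, 0)."""
    px, pz = point
    return cam_x + (px - cam_x) * w / pz


def depth_shift(x_a1, x_a2, a, p_z):
    """Return the new global disparity and the left/right image shifts for a depth move."""
    d_a = x_a1 - (x_a2 - a)      # equation (7)
    d_c = d_a / p_z              # equation (19)
    x_c2 = x_a1 + a - d_c        # equation (21)
    x_d1 = x_a1 / p_z            # equation (22)
    left_shift = x_d1 - x_a1
    right_shift = (x_c2 - x_a2) + (x_d1 - x_a1)
    return d_c, left_shift, right_shift


a, w = 2.0, 1.0
x_a1, x_a2 = project((3.0, 8.0), 0.0, w), project((3.0, 8.0), a, w)
print(depth_shift(x_a1, x_a2, a, p_z=2.0))
```

With P_z = 2 the object is placed twice as far away and the global disparity halves, matching equation (19).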
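[0073]
Vertical movement does not involve disparity, so the image is simply translated on the screen.
[0074]
This embodiment uses two-view stereoscopic component images, but the same applies to multi-view stereoscopic component images.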
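[0075]
[Effects of the invention]
As described above, according to the encoding apparatus of the present invention, of all the disparity vectors used to encode a stereoscopic component image, one is taken as the global disparity vector (corresponding to the depth of the stereoscopic component image in three-dimensional space) and the differences between the other disparity vectors and this global disparity vector are taken as local disparity vectors (corresponding to the local depth distribution within the stereoscopic component image). Holding the disparity vectors in these two parts reduces the amount of disparity vector information and improves coding efficiency. Although the present invention uses two kinds of vectors, global and local, a single search suffices to obtain them, so the amount of computation is smaller than when two searches are made.
[0076]
According to the encoding apparatus of the present invention, when a stereoscopic image is encoded as a background image and stereoscopic component images, the part without disparity is encoded as a single background image for the images of all viewpoints, which reduces the amount of background image information and improves coding efficiency.
[0077]
According to the encoding apparatus of the present invention, encoding the stereoscopic component images cut out of a stereoscopic image separately from the background image means that no abrupt change in disparity occurs within a stereoscopic image, which improves the reliability of matching; and even when the amount of local disparity vector information is reduced by taking differences between local disparity vectors at adjacent positions, no values of large absolute magnitude arise.
[0078]
According to the encoding apparatus of the present invention, using the average of all the disparity vectors in a stereoscopic component image as the global disparity vector reduces the amount of local disparity vector information.
[0079]
According to the encoding apparatus of the present invention, using the disparity vector obtained first in a stereoscopic component image as the global disparity vector reduces the computation needed to select the global disparity vector.
[0080]
According to the encoding apparatus of the present invention, forming a histogram of the magnitudes of the disparity vectors in a stereoscopic component image and using the most frequent disparity vector as the global disparity vector represents the depth of the stereoscopic component image in three-dimensional space appropriately.
[0081]
According to the encoding apparatus of the present invention, taking the difference between the global disparity vector of the current frame or field and that of the preceding frame or field reduces the amount of global disparity vector information.
[0082]
According to the encoding apparatus of the present invention, in the spatial direction a single global disparity vector suffices for one multi-view stereoscopic component image, which reduces the computation and information for the global disparity vector and improves coding efficiency.
[0083]
In the decoding apparatus of the present invention, where several component images overlap when the images of the viewpoints are combined, the component image with the larger global disparity is written on top, so a multi-view stereoscopic image with correctly rendered overlaps can be decoded.
[0084]
In the decoding apparatus of the present invention, inputting a depth-direction movement parameter and changing the value of the global disparity vector of a stereoscopic component image makes it easy to decode a stereoscopic component image with any desired depth.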
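[Brief description of the drawings]
[FIG. 1] Block diagram of an encoding apparatus, according to one embodiment of the present invention, that encodes a background image and several stereoscopic component images.
[FIG. 2] Diagram explaining how a stereoscopic component image is cut out of the camera input images.
[FIG. 3] Diagram explaining the order in which stereoscopic component images are encoded.
[FIG. 4] Diagram explaining an example of disparity vector calculation.
[FIG. 5] Diagram explaining how a disparity vector is expressed as a global disparity vector and a local disparity vector.
[FIG. 6] Block diagram of an encoding apparatus, according to another embodiment of the present invention, that encodes multi-view stereoscopic component images.
[FIG. 7] Diagram explaining that a common global disparity vector is shared in the spatial direction when multi-view stereoscopic component images are encoded.
[FIG. 8] Block diagram of a decoding apparatus, according to one embodiment of the present invention, that decodes a background image and several stereoscopic component images.
[FIG. 9] Diagram explaining horizontal translation of a stereoscopic component image.
[FIG. 10] Diagram explaining movement of a stereoscopic component image in the depth direction.
[FIG. 11] Diagram explaining a conventional example of multi-view image compression.
[FIG. 12] Diagram explaining a conventional example in which two-stage motion vectors are obtained.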
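[Description of reference numerals]
1 Background image
2 Original image
3 Clipping device
43, 44, 45, 46 Stereoscopic component images
5 Image input unit
6 Motion compensation unit
7, 18 Subtractors
8, 19 Transform units
9, 20 Quantization units
10 Variable-length coding unit
11 Inverse quantization unit
12 Inverse transform unit
13, 15, 51 Adders
14 Frame memory
16, 53 Disparity compensation units
17, 52 Disparity vector dividers
21 Multiplexing unit
31 Left-eye stereoscopic component image
32 Right-eye stereoscopic component image
33 Composite image
54 Differencer
60 Comparator
70 Large block
71 Small block
80 Editing unit
81 Disparity vector synthesizer
82 External input unit
83 Image synthesizing unit
100, 101, 102, 103 Objects
[0001]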
BACKGROUND OF THE INVENTION
The present invention relates to a high-efficiency encoding and decoding apparatus for multi-view images using motion compensation or parallax compensation prediction, particularly in high-efficiency image encoding and decoding.
[0002]
[Prior art]
In multi-view image encoding, a well-known way to improve encoding efficiency is to combine motion compensation, which is commonly used for moving image encoding, with parallax compensation between images from different viewpoints. For example, according to the literature (W. A. シュップ, Yasuda, “Data compression of stereo moving images using parallax compensation and motion compensation”, PCSJ88 (1988), pp. 63-64), only motion compensation is performed on the right image, as shown in FIG. 11. For the left image, whichever of parallax compensation and motion compensation has the smaller prediction error is selected and used for encoding.
[0003]
Japanese Patent Laid-Open No. 6-113283 discloses an example in which a motion vector is obtained by changing the block size when performing motion compensation by block matching in moving image encoding. In this method, as shown in FIG. 12, first, motion compensation is performed using the small block 71 to obtain a motion vector for the small block 71 (hereinafter abbreviated as a small block vector). A plurality of small blocks 71 are collected to form a large block, motion compensation is performed, and a motion vector for the large block (hereinafter abbreviated as a large block vector) is obtained.
[0004]
Assuming that the large block 70 is composed of four small blocks, when the number of small block vectors that lie within a predetermined range of the large block vector exceeds a predetermined threshold, the amount of motion vector information is reduced by transmitting the large block vector and the differences between the small block vectors and the large block vector.
[0005]
[Problems to be solved by the invention]
However, the conventional method is inefficient because, when performing parallax compensation, it applies the compensation to all parts of the image, both those with parallax and those without. In addition, the magnitude of the obtained disparity vector is usually larger than that of the motion vector obtained in motion compensation, and because it is encoded at that magnitude, the encoding efficiency of the disparity vector is poor.
[0006]
In this method, since it is selectively determined whether to use motion compensation or parallax compensation, the parallax vector is not necessarily transmitted. Therefore, when a plurality of objects are stereoscopically displayed, editing such as changing only the depth of some objects cannot be performed.
[0007]
On the other hand, in the method of encoding a moving image with two types of motion vectors, a large block and a small block, the amount of calculation becomes enormous because it is necessary to perform a two-stage search over the entire screen. Further, since the large block is composed of a plurality of small blocks, only two types of vectors having different degrees of reliability and their differences are obtained, and there is no difference in physical meaning between these vectors. Further, since these two types of motion vectors cannot have different actions, it is difficult to use them for component image editing.
[0008]
The object of the present invention is to provide a parallax compensation encoding and decoding apparatus that realizes highly efficient encoding of multi-viewpoint images by using motion compensation and parallax compensation prediction, improves the encoding efficiency of the disparity vector by dividing it into two stages, reduces the amount of calculation when obtaining the disparity vector, and allows a three-dimensional component image to be edited easily using the disparity vector divided into two stages.
[0009]
[Means for Solving the Problems]
In order to achieve the above object, the encoding apparatus of the present invention is a multi-viewpoint encoding apparatus comprising input means for inputting a multi-viewpoint stereoscopic image divided into at least one component image with parallax and at least one background image without parallax, and, for each component image, a motion compensation unit that performs motion compensation prediction using a motion amount between frames or fields of the component image, a parallax compensation unit that performs parallax compensation prediction using a parallax amount between frames or fields of the component image, and a disparity vector dividing unit, wherein the disparity vector dividing unit sets one of the disparity vectors obtained by the parallax compensation unit as a global disparity vector and outputs, as local disparity vectors, the remaining disparity vectors minus the global disparity vector.
[0010]
Another encoding device of the present invention is a multi-viewpoint encoding apparatus comprising input means for inputting a multi-view stereoscopic image having three or more viewpoints divided into at least one component image having parallax and at least one background image having no parallax; for each component image, a motion compensation unit that performs motion compensation prediction using a motion amount between frames or fields of the component image and a parallax compensation unit that performs parallax compensation prediction using a parallax amount between frames or fields of the component image; and, for each component image, one disparity vector dividing unit and at least one differentiator, wherein the disparity vector dividing unit sets one of the disparity vectors obtained between specific viewpoints as a global disparity vector and outputs, as local disparity vectors, the remaining disparity vectors minus the global disparity vector, and the differentiator outputs the differences between all the disparity vectors obtained between the other viewpoints and the global disparity vector.
[0011]
In the present invention, the average of all the parallax vectors in the three-dimensional component image may be used as the global parallax vector.
[0012]
In the present invention, the parallax vector obtained first in the three-dimensional component image may be used as the global parallax vector.
[0013]
In the present invention, a histogram of the magnitudes of the parallax vectors in the three-dimensional component image may be formed, and the parallax vector with the highest frequency of occurrence may be used as the global parallax vector.
[0014]
In the present invention, the difference between the current frame or field global disparity vector and the previous frame or field global disparity vector may be used as the current frame or field global disparity vector.
[0015]
In the present invention, a difference from the local parallax vector obtained immediately before may be used as the local parallax vector.
[0016]
The decoding device of the present invention is a multi-view image decoding device that includes a motion compensation unit that performs motion compensation and a parallax compensation unit that performs parallax compensation using the motion amount and the parallax amount between frames or fields of a multi-view stereoscopic image, and that further includes: a parallax vector synthesizing unit that outputs the parallax vector to the parallax compensation unit using the input global parallax vector and local parallax vector; a parameter input unit that outputs a movement parameter in the depth direction to an image editing unit; the image editing unit, which changes the depth of the three-dimensional component image using the input parameter and outputs it to an image synthesizing unit; and the image synthesizing unit, which synthesizes the input background image and three-dimensional component images.
[0017]
In the present invention, the image editing unit may output the three-dimensional component images in ascending order of global parallax vector, and the image synthesizing unit may synthesize the image by overwriting the three-dimensional component images, input one after another once the background image has been input, onto the background image.
[0018]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail.
[0019]
FIG. 1 shows a first embodiment of an image encoding apparatus according to the present invention. This embodiment is a twin-lens example, and an image input to this image encoding apparatus is a background image without parallax and a three-dimensional component image with parallax. Both the background image and the three-dimensional component image may be either a still image or a moving image, but both are assumed to be moving images here.
[0020]
Here, as shown for example in FIG. 2, a three-dimensional component image is obtained by inputting a stereo image 2, photographed with two cameras arranged in parallel, to the clipping device 3 and cutting out only the object of interest. The VOP (Video Object Plane) in the image input unit 5 in FIG. 2 means each individual constituent element when a composite image is formed from a plurality of component images. It is usually composed of shape data and texture data, but here, to simplify the description, it is handled as a component image divided into a region of arbitrary shape. A three-dimensional component image obj_n (n = 1, 2, ... distinguishes different components) is separated into the same number of VOP_mn as the number of viewpoints (m = 1, 2, ... distinguishes different viewpoints; m = 1, 2 in the case of a twin-lens system), and VOP₀ is the background image 1. The interval between the two cameras may be the same as that of human eyes. Since the two cameras are arranged in parallel and adjusted so that there is no vertical displacement or distortion, the parallax of the image cut out by the clipping device has no vertical component, only a horizontal component.
[0021]
First, a background image encoding method will be described with reference to FIG. 1. In moving picture coding, a method called motion compensation predictive coding using block matching, in which the coding unit is a block (for example, composed of 16 × 16 pixels), is well known. In the present embodiment, the background image encoding method is the same as this motion compensated prediction encoding method using block matching. For example, the background image VOP₀ is motion-compensated by the motion compensation unit 6 using the decoded image of the previous frame or field, and the detected motion vector is transmitted to the variable length coding unit 10.
[0022]
Then the subtractor 7 takes the difference between the background image VOP₀ and the motion-compensated image, and the difference data is transmitted to the variable length coding unit 10 through the conversion unit 8 and the quantization unit 9. As well as being input to the variable-length encoding unit 10, the difference data passes through the inverse quantization unit 11 and the inverse conversion unit 12, the motion-compensated image is added to it by the adder 13, and the decoded image of the background image VOP₀ is stored in the frame memory 14. After this, each subsequent input is encoded by repeating this process.
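The loop through the subtractor, quantizer, inverse quantizer and adder can be sketched as below. This is a simplified illustration that assumes a uniform scalar quantizer with a hypothetical step size and omits the transform and its inverse; it is not the patent's actual transform or quantizer.

```python
def encode_block(block, prediction, step=8):
    """One pass of the predictive coding loop on a flat list of pixel values.

    Returns the quantized levels (sent on to entropy coding) and the locally
    decoded block (stored as the reference for the next frame or field).
    """
    residual = [b - p for b, p in zip(block, prediction)]      # subtractor
    levels = [round(r / step) for r in residual]               # quantization
    decoded = [max(0, min(255, p + q * step))                  # inverse quantization + adder
               for p, q in zip(prediction, levels)]
    return levels, decoded


levels, decoded = encode_block([100, 120, 90], [90, 120, 100])
print(levels, decoded)
```

Storing the locally decoded block rather than the original keeps the encoder's reference identical to what a decoder would reconstruct.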
[0023]
Next, encoding of a three-dimensional component image will be described. The order of encoding the three-dimensional component image obj₁ will be described with reference to FIG. 3. Encoding is performed according to the following procedures (a) to (d).
[0024]
(a) The left-eye component image 43 of the three-dimensional component image obj₁ is encoded within a frame or field.
[0025]
(b) Inter-frame or inter-field encoding is performed on the right-eye component image 44 by parallax compensation using a decoded image of the component image 43.
[0026]
(c) The component image 45 is subjected to inter-frame or inter-field encoding by motion compensation using a decoded image of the component image 43.
[0027]
(d) The right-eye component image 46 is subjected to inter-frame or inter-field encoding by parallax compensation using a decoded image of the component image 45.
[0028]
That is, the left-eye image is encoded by motion compensation, and the right-eye image is encoded by parallax compensation using the left-eye image at the same time. Thereafter, encoding is performed by repeating the procedures (c) and (d). However, in order to prevent propagation of prediction errors, encoding by the procedures (a) and (b) may be performed in place of the procedures (c) and (d) at a constant cycle.
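The ordering of procedures (a) to (d) can be written as a small schedule. This is only an illustrative sketch; the function name and the `refresh_period` parameter are hypothetical and stand in for the constant cycle mentioned above.

```python
def coding_schedule(num_frames, refresh_period=None):
    """Return (time index, left-eye mode, right-eye mode) for each time instant.

    The left-eye image is intra coded at the start and at each refresh,
    otherwise motion compensated; the right-eye image is always predicted
    from the left-eye image of the same time instant by parallax compensation.
    """
    plan = []
    for t in range(num_frames):
        intra = t == 0 or (refresh_period is not None and t % refresh_period == 0)
        plan.append((t, "intra" if intra else "motion", "disparity"))
    return plan


for entry in coding_schedule(4, refresh_period=3):
    print(entry)
```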
[0029]
In FIG. 1, the left-eye image VOP₁₁ constituting the three-dimensional component image obj₁ is encoded in the same manner as the background image. The right-eye image VOP₂₁, which also constitutes the three-dimensional component image obj₁, is encoded by performing parallax compensation by block matching using the decoded image of the left-eye image VOP₁₁. At this time, the right-eye image VOP₂₁ and the decoded image of the left-eye image VOP₁₁ from the adder 15 are input to the parallax compensation unit 16, and a disparity vector is obtained.
[0030]
As shown in FIG. 4, the parallax vector is obtained by superimposing the left-eye component image 31 and the right-eye component image 32 on the same plane, as in image 33, and performing matching on the block of interest. Therefore, one disparity vector is obtained for one block. The obtained disparity vectors are input to the disparity vector divider 17 and divided into one global disparity vector and local disparity vectors obtained by subtracting the global disparity vector from each disparity vector.
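A horizontal-only block search of this kind might look like the sketch below. It assumes a sum-of-absolute-differences (SAD) matching cost and an exhaustive search over a hypothetical `max_shift` range; the text does not specify the matching criterion, so both are assumptions.

```python
def block_disparity(left, right, y0, x0, size, max_shift):
    """Horizontal shift d that best matches the right-image block at (y0, x0)
    to the left-image block at (y0, x0 + d), using SAD as the cost."""
    width = len(left[0])
    best_d, best_sad = 0, None
    for d in range(-max_shift, max_shift + 1):
        if x0 + d < 0 or x0 + d + size > width:
            continue  # candidate block would fall outside the reference image
        sad = sum(
            abs(right[y0 + j][x0 + i] - left[y0 + j][x0 + d + i])
            for j in range(size)
            for i in range(size)
        )
        if best_sad is None or sad < best_sad:
            best_d, best_sad = d, sad
    return best_d


left = [[0, 0, 0, 9, 7, 5, 0, 0]] * 2
right = [[0, 9, 7, 5, 0, 0, 0, 0]] * 2
print(block_disparity(left, right, y0=0, x0=1, size=2, max_shift=3))
```

Only horizontal shifts are searched because the cameras are assumed to be aligned so that the parallax has no vertical component.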
[0031]
That is, in the three-dimensional component image, the global parallax vector represents the parallax of one predetermined block, and the local parallax vector is obtained by subtracting the global parallax vector from the parallax of each of the other blocks. For example, if the disparity vector v₁ is taken as the global disparity vector GV as shown in FIG. 5, the disparity vector v₂ minus the global disparity vector GV is the local disparity vector LV₂, and the disparity vector v₂ can be expressed as GV + LV₂.
[0032]
Therefore, k (k = 1, 2,...) Disparity vectors can be represented by one global disparity vector and k−1 local disparity vectors. At this time, the global disparity vector corresponds to the depth (existing position) of the three-dimensional component image in the three-dimensional space, and the local disparity vector corresponds to the local depth distribution (three-dimensional shape) in the three-dimensional component image.
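The split into one global vector and k−1 local vectors, and the inverse operation used on the decoding side, can be sketched as follows. Disparities are treated as signed integers (horizontal component only), and the function names are illustrative.

```python
def split_disparities(vectors, global_index=0):
    """Take vectors[global_index] as GV and return GV plus the k-1 local vectors."""
    gv = vectors[global_index]
    local = [v - gv for i, v in enumerate(vectors) if i != global_index]
    return gv, local


def merge_disparities(gv, local, global_index=0):
    """Rebuild the k disparity vectors as GV + LV, restoring GV at its block position."""
    vectors = [gv + lv for lv in local]
    vectors.insert(global_index, gv)
    return vectors


gv, local = split_disparities([12, 14, 11, 13])
print(gv, local)
print(merge_disparities(gv, local))
```

The local values are small when the blocks of one component lie at similar depths, which is what makes them cheaper to code than the raw disparities.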
[0033]
At this time, the information amount of the local parallax vector can be reduced by setting the average of all the parallax vectors as the global parallax vector. The parallax vector obtained first may be used as a global parallax vector, which reduces the amount of calculation when selecting the global parallax vector. Further, the position of the three-dimensional component image in the three-dimensional space can be appropriately expressed by using the magnitude of the parallax vector as a histogram and the parallax vector with the highest occurrence frequency as the global parallax vector.
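The three ways of choosing the global vector can be compared with a short sketch. It assumes integer horizontal disparities, rounds the average to an integer, and uses the sum of absolute local values as a rough stand-in for the amount of information; none of these details are specified in the text.

```python
from collections import Counter


def choose_global(vectors, method):
    """Select the global disparity by one of the three options described above."""
    if method == "first":
        return vectors[0]
    if method == "mean":
        return round(sum(vectors) / len(vectors))  # rounding is an assumption
    if method == "mode":
        return Counter(vectors).most_common(1)[0][0]
    raise ValueError(method)


def local_cost(vectors, gv):
    """Sum of absolute local disparities, used here only as a rough size proxy."""
    return sum(abs(v - gv) for v in vectors)


vectors = [10, 14, 14, 15, 14]
for method in ("first", "mean", "mode"):
    gv = choose_global(vectors, method)
    print(method, gv, local_cost(vectors, gv))
```

Taking the first vector needs no extra pass over the data, while the average and the most frequent value each require all vectors to be known before the split.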
[0034]
In general, the amount of disparity vector information can be reduced by having one global disparity vector and k−1 small-valued local disparity vectors rather than k disparity vectors.
[0035]
Also, in a stereoscopic image, there is little change in the amount of parallax between adjacent blocks in the stereoscopic image, but when there are multiple objects in the stereoscopic image and these objects are not componentized, the parallax in the part where different objects overlap May change rapidly. In the present invention, since encoding is performed for each component using a three-dimensional component image, such a rapid change in parallax does not occur in one three-dimensional component image. Therefore, the information amount of the local parallax vector can be further reduced by taking the difference between the local parallax vectors of adjacent blocks.
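Taking differences between the local vectors of adjacent blocks is a form of differential coding. A minimal sketch, assuming the blocks are visited in a fixed scan order and the first value is sent as-is:

```python
def delta_encode(values):
    """Replace each value by its difference from the previous one (the first is kept)."""
    prev, out = 0, []
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    total, out = 0, []
    for d in deltas:
        total += d
        out.append(total)
    return out


local = [2, 3, 3, 4, 2]
print(delta_encode(local))
```

The same pair of functions would also serve for the frame-to-frame difference of the global vector described elsewhere in the text.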
[0036]
These vectors are transmitted to the variable length coding unit 10 in FIG. 1. In the subtractor 18, the difference between the right-eye image VOP₂₁ and the parallax-compensated image is taken and transmitted to the variable length coding unit 10 through the conversion unit 19 and the quantization unit 20. Hereinafter, each subsequent input is encoded by repeating this process. Other three-dimensional component images are encoded in the same manner. The data transmitted to the variable length coding unit 10 is variable length coded there and multiplexed by the multiplexing unit 21.
[0037]
Here, one background image and n number of three-dimensional component images are encoded, but the number of background images is not limited to one and may be plural. In FIG. 3, since the global disparity vector does not change so rapidly in the time direction, the current global disparity vector (for example, the global disparity vector obtained from the left-eye image 45 and the right-eye image 46) and one frame or field before The amount of information of the global parallax vector can be reduced by taking the difference between the global parallax vectors (for example, the global parallax vector obtained from the left-eye image 43 and the right-eye image 44).
[0038]
In the present embodiment, encoding of a twin-lens stereoscopic component image has been described. However, encoding can be similarly performed in a multi-viewpoint stereoscopic component image.
[0039]
Next, a second embodiment of the present invention will be described. In the present embodiment, the encoding apparatus according to the first embodiment is adapted to support multi-viewpoint stereoscopic images. At this time, the background image is encoded by the same method as in the first embodiment.
[0040]
FIG. 6 shows a second embodiment of the image coding apparatus according to the present invention. The input image in FIG. 6 is a multi-viewpoint image VOPₘₙ (m = 1, 2, ...; n = 1, 2, ...). Image VOP₁₁ and image VOP₂₁ are encoded in the same manner as image VOP₁₁ and image VOP₁₂ in FIG. 1, respectively. The decoded image of VOP₂₁ is output from the adder 51 and input to the parallax compensation unit 53. In addition, the global parallax vector obtained using image VOP₁₁ and image VOP₂₁ is input to the parallax compensation unit 53.
[0041]
In the parallax compensation unit 53, parallax compensation is performed on the input image VOP₃₁ using the decoded image of VOP₂₁, and a parallax vector is obtained. The obtained disparity vector and the global disparity vector obtained using image VOP₁₁ and image VOP₂₁ are input to the differentiator 54, and their difference is output as a local disparity vector. That is, there is no need to obtain a new global parallax vector here.
[0042]
The following explains why the global parallax vector can be shared between viewpoints even for multi-viewpoint images. To simplify the description, a three-dimensional component image with three viewpoints is considered.
[0043]
For example, as shown in FIG. 7, consider a coordinate system with origin $O_1$ in which the x-axis is the horizontal direction and the z-axis is the depth direction. In this coordinate system, the cameras $C_m$ (m = 1, 2, 3) are placed at intervals of a, at the positions of the points $O_m$ (m = 1, 2, 3), so that their optical axes are parallel to the z-axis. As in the first embodiment, the cameras are assumed to be adjusted so that there is no vertical displacement or distortion.
[0044]
Let point $A_0$ be the representative point of the three-dimensional component image $obj_1$, and let $A_m$ (m = 1, 2, 3) be the points where the line segments connecting $A_0$ and $O_m$ intersect the virtual image plane of the cameras (the plane representing the images taken by the cameras). Let $Q_m$ (m = 1, 2, 3) be the points where the straight lines extending from $O_m$ parallel to the z-axis intersect the virtual image plane. Each point $Q_m$ is the center point of the image from the corresponding camera.
[0045]
Let the x-coordinates of $A_1$, $A_2$, $A_3$ be $x_1$, $x_2$, $x_3$. Since the virtual image plane of the cameras is parallel to the x-axis, $|A_1A_2|$ and $|A_2A_3|$ are equal. From this and $x_1 < x_2 < x_3$,
$x_1 - x_2 = x_2 - x_3$ (1)
holds.
[0046]
The disparity vector $d_1$ between cameras $C_1$ and $C_2$ has only a horizontal component and is therefore a one-dimensional vector. It is obtained by subtracting the vector $Q_2A_2$ from the vector $Q_1A_1$, so that
$d_1 = x_1 - (x_2 - a)$ (2)
holds.
[0047]
Next, the disparity vector $d_2$ between cameras $C_2$ and $C_3$ is obtained by subtracting the vector $Q_3A_3$ from the vector $Q_2A_2$, so that
$d_2 = x_2 - (x_3 - a)$ (3)
holds.
[0048]
Therefore, from Equations (1) to (3),
$d_1 = d_2$ (4)
holds. In this way, the parallax vectors at the representative point between adjacent cameras are all equal. This is not limited to a three-viewpoint stereoscopic image; the same applies to any other number of viewpoints.
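The equality of the adjacent-camera disparities can be checked numerically. The sketch below assumes an ideal pinhole model matching the setup described above (camera centres on the x-axis at spacing a, image plane at z = w); the function names and the sample numbers are illustrative assumptions, not part of the described apparatus.

```python
from typing import List


def project(x_point: float, z_point: float, camera_x: float, w: float) -> float:
    """x-coordinate where the ray from the camera centre (camera_x, 0)
    to the point (x_point, z_point) crosses the image plane z = w."""
    return camera_x + (x_point - camera_x) * w / z_point


def adjacent_disparities(x_point: float, z_point: float, a: float, w: float,
                         num_cameras: int) -> List[float]:
    """Disparity between each pair of adjacent cameras for a single scene point."""
    centers = [m * a for m in range(num_cameras)]
    # Offset of the projected point from each image centre Q_m (the vector Q_m A_m).
    offsets = [project(x_point, z_point, c, w) - c for c in centers]
    return [offsets[m] - offsets[m + 1] for m in range(num_cameras - 1)]


# Hypothetical scene: point at (3, 8), camera spacing a = 2, image plane at w = 1.
ds = adjacent_disparities(x_point=3.0, z_point=8.0, a=2.0, w=1.0, num_cameras=4)
```

Every entry of `ds` comes out as a * w / z_point, so the disparity at the representative point is the same for every adjacent camera pair regardless of the number of viewpoints.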
[0049]
Therefore, in this method, since the global disparity vector can be regarded as the disparity vector at the representative point, only one global disparity vector is required regardless of the number of viewpoints. Both the amount of calculation for obtaining disparity vectors in a multi-viewpoint image and the amount of information of the disparity vectors themselves can thus be reduced.
[0050]
Accordingly, in FIG. 6, the value output from the differentiator 54 is the local disparity vector obtained by subtracting the global disparity vector obtained between VOP₁₁ and VOP₂₁ from the disparity vector output from the parallax compensation unit 53. The other VOPs in FIG. 6 are encoded in the same manner.
[0051]
Next, a third embodiment of the present invention will be described.
[0052]
FIG. 8 shows an image decoding apparatus according to the present embodiment. The decoding apparatus of this embodiment decodes data encoded using the encoding apparatus of FIG. 1. The background image and the left-eye image VOP₁₁ of the three-dimensional component image obj₁ are decoded by the same method as general motion-compensated predictive decoding using block matching. The right-eye image VOP₂₁ of the three-dimensional component image obj₁ is decoded by performing parallax compensation using the decoded left-eye image VOP₁₁.
[0053]
The global disparity vector and the local disparity vector are input to the disparity vector combiner 81, which combines them into an ordinary disparity vector. From the parameter input unit 82, a parameter for moving the component image parallel to the image plane and a parameter for changing the global parallax vector, which is used to move the component image perpendicular to the image plane, are input to the image editing unit 80 as editing values. The image editing unit 80 also receives the decoded three-dimensional component image, the global disparity vector, and the position information of the representative block having the global disparity vector. The image editing unit 80 edits the three-dimensional component image using these input values. The decoded background image and the edited component image are combined into one image by the image combining unit 83.
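The recombination performed on the decoder side can be sketched in a few lines. The function name `combine` and its argument layout are assumptions for illustration; the only behaviour taken from the text is that an ordinary disparity vector is the sum of the global vector and a local vector, and that the representative block carries the global vector itself.

```python
from typing import List


def combine(global_vec: int, local_vecs: List[int], representative_index: int = 0) -> List[int]:
    """Rebuild per-block disparity vectors from one global vector and k-1 local vectors.

    representative_index is the position of the block whose disparity was chosen
    as the global vector, so that block is restored to the global value itself.
    """
    full = [global_vec + local for local in local_vecs]
    full.insert(representative_index, global_vec)
    return full


# Hypothetical decoded values for one component image.
restored = combine(24, [1, 1, 2, 0])
# Editing the depth only touches the global vector; the local vectors are reused unchanged.
edited = combine(12, [1, 1, 2, 0])
```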
[0054]
Since the global parallax vector corresponds to the depth (position) of the three-dimensional component image in three-dimensional space, a component image with a larger global parallax vector lies closer to the camera in the depth direction. The image editing unit 80 outputs the three-dimensional component images in ascending order of global parallax vector. The background image is input to the image composition unit 83 first, and the three-dimensional component images that are input one after another are then written over the background image. As a result, when a plurality of component images overlap in the left or right image, the component image nearer to the front in the depth direction is written on top. By using the global parallax vector in this way, the overlap of the component images can be represented correctly when the three-dimensional component images are synthesized.
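The overwrite order described above amounts to painting components from farthest to nearest. The sketch below assumes a toy image representation (nested lists, with `None` marking transparent pixels) and hypothetical dictionary keys; it only illustrates the ordering rule, not the actual image composition unit.

```python
from typing import Dict, List, Optional


def compose(background: List[List[int]], components: List[Dict]) -> List[List[int]]:
    """Overwrite components onto the background in ascending order of global disparity,
    so that nearer components (larger global disparity) end up on top."""
    canvas = [row[:] for row in background]
    for comp in sorted(components, key=lambda c: c["global_disparity"]):
        pixels: List[List[Optional[int]]] = comp["pixels"]
        for dy, row in enumerate(pixels):
            for dx, value in enumerate(row):
                if value is not None:  # None marks a pixel outside the component
                    canvas[comp["y"] + dy][comp["x"] + dx] = value
    return canvas


background = [[0, 0, 0, 0]]
near = {"global_disparity": 20, "x": 2, "y": 0, "pixels": [[2, 2]]}
far = {"global_disparity": 5, "x": 0, "y": 0, "pixels": [[1, 1, 1]]}
# The input order is deliberately "near first" to show that sorting decides the result.
result = compose(background, [near, far])
```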
[0055]
First, a description will be given of a case where a three-dimensional component image is translated in the horizontal direction by inputting a movement parameter in the x direction.
[0056]
In FIG. 9, consider a coordinate system with origin $O_1$ in which the x-axis is the horizontal direction and the z-axis is the depth direction. In this coordinate system, cameras L and R are placed at an interval of a, at the positions of the points $O_m$ (m = 1, 2), so that their optical axes are parallel to the z-axis. Let point $A_0(x_A, z_A)$ be the representative point of the three-dimensional component image $obj_1$, and let $A_m$ (m = 1, 2) be the points where the line segments connecting $A_0$ and $O_m$ intersect the virtual image plane of the cameras.
[0057]
Let $Q_m$ (m = 1, 2) be the points where the straight lines extending from $O_m$ parallel to the z-axis intersect the virtual image plane of the cameras. Each point $Q_m$ is the center point of the image from the corresponding camera. Further, let w be the distance from the virtual image plane to the x-axis, let $S_1$ be the point where the plane $z = z_A$ intersects the optical axis of camera L, and let $S_2(a, z_A)$ be the point where it intersects the optical axis of camera R. If the coordinates of $A_1$ and $A_2$ are $(x_{a1}, w)$ and $(x_{a2}, w)$, the one-dimensional vectors $Q_1A_1$ and $Q_2A_2$ are $x_{a1}$ and $x_{a2} - a$, respectively. From $\triangle O_1A_0S_1 \sim \triangle O_1A_1Q_1$ and $\triangle O_2A_0S_2 \sim \triangle O_2A_2Q_2$,
$z_A / w = x_A / x_{a1} = (x_A - a) / (x_{a2} - a)$ (5)
holds, and the x-coordinate $x_A$ of point $A_0$ is expressed as
$x_A = a \times x_{a1} / (x_{a1} - x_{a2} + a) = a \times x_{a1} / d_A$ (6)
Here, $d_A$ represents the global disparity vector of the object 100 and is given by
$d_A = x_{a1} - (x_{a2} - a)$ (7)
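The link between the global disparity vector and depth, on which the later editing steps rely, can be made explicit. The short derivation below is a worked sketch that uses only Equations (5) to (7); the closed form for $z_A$ in the last line is not stated in the text and is derived here.

```latex
\begin{align*}
x_A\,(x_{a2} - a) &= x_{a1}\,(x_A - a) && \text{cross-multiplying (5)} \\
a\,x_{a1} &= x_A\,(x_{a1} - x_{a2} + a) = x_A\,d_A && \text{which gives (6), using (7)} \\
z_A &= w\,\frac{x_A}{x_{a1}} = \frac{a\,w}{d_A} && \text{from (5) and (6)}
\end{align*}
```

For a fixed camera interval a and image-plane distance w, a larger $d_A$ therefore means a smaller depth $z_A$, in line with the statement that a component image with a larger global parallax vector lies closer to the camera.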
[0058]
Here, let the parallel movement parameter $P_x$ be the amount of movement in the x direction of the three-dimensional component image on the left image plane. When the object 100 is translated to the position of the object 101 by inputting the parameter $P_x$, the representative point $A_0(x_A, z_A)$ of the object 100 moves to point $B_0(x_B, z_A)$, and point $A_1$ moves to point $B_1(x_{a1} + P_x, w)$. Let $B_2(x_{b2}, w)$ be the point where the line segment connecting the representative point $B_0$ of the object 101 and point $O_2$ crosses the virtual image plane of the cameras. From $\triangle O_1O_2A_0 \sim \triangle A_1A_2A_0$ and $\triangle O_1O_2B_0 \sim \triangle B_1B_2B_0$,
$z_A / (z_A - w) = a / (x_{a2} - x_{a1}) = a / (x_{b2} - x_{a1} - P_x)$ (8)
holds. Therefore, when the representative point $A_0$ of the object 100 moves to the representative point $B_0$ of the object 101, the amount of movement $x_{b2} - x_{a2}$ on the right image plane is
$x_{b2} - x_{a2} = P_x$ (9)
[0059]
Therefore, when the representative point $A_0$ of the object 100 moves to the representative point $B_0$ of the object 101, the amount of movement on both the left and right image planes is equal to $P_x$. At this time, in the same manner as in Equations (6) and (7), the x-coordinate $x_B$ of point $B_0$ is
$x_B = a \times (x_{a1} + P_x) / d_B$ (10)
[0060]
Here, $d_B$ represents the global disparity vector of the object 101 and is given by
$d_B = (x_{a1} + P_x) - (x_{b2} - a)$ (11)
[0061]
From Equations (7), (9), and (11),
$d_A = d_B$ (12)
holds, so the global disparity vectors $d_A$ and $d_B$ are equal.
[0062]
At this time, from Equations (6), (10), and (12),
$x_B - x_A = (P_x / x_{a1}) \times x_A$ (13)
holds. Therefore, to move the object in the horizontal direction, the following steps 1) and 2) may be performed.
[0063]
1) Input the movement parameter $P_x$ in the x direction, and translate the representative point $A_0$ of the object 100 on the left image from x-coordinate $x_{a1}$ to $x_{a1} + P_x$, the x-coordinate of the representative point $B_0$ on the left image.
[0064]
2) Translate the representative point $A_0$ of the object 100 on the right image from x-coordinate $x_{a2}$ to $x_{a2} + P_x$, the x-coordinate of the representative point $B_0$ on the right image.
[0065]
As a result, the object 100 moves to the position of the object 101, and from Equation (13) the amount of movement in three-dimensional space at this time is $(P_x / x_{a1}) \times x_A$.
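Steps 1) and 2) and the resulting movement in space can be checked with a small numerical sketch. The helper `triangulate` and the sample values (a = 2, w = 1, representative point at (3, 8)) are assumptions chosen for illustration; the depth formula z = a * w / d is not stated in the text but follows from Equations (5) and (6).

```python
from typing import Tuple


def triangulate(x_left: float, x_right: float, a: float, w: float) -> Tuple[float, float, float]:
    """Recover (x, z, d) of a point from its x-coordinates on the left and right image planes."""
    d = x_left - (x_right - a)   # Equation (7): global disparity
    x = a * x_left / d           # Equation (6)
    z = a * w / d                # follows from Equations (5) and (6)
    return x, z, d


def translate_horizontally(x_a1: float, x_a2: float, p_x: float) -> Tuple[float, float]:
    """Steps 1) and 2): shift the point by p_x on both the left and the right image."""
    return x_a1 + p_x, x_a2 + p_x


a, w = 2.0, 1.0
x_a1, x_a2 = 0.375, 2.125            # projections of the hypothetical point (3, 8)
p_x = 0.125

x_A, z_A, d_A = triangulate(x_a1, x_a2, a, w)
x_b1, x_b2 = translate_horizontally(x_a1, x_a2, p_x)
x_B, z_B, d_B = triangulate(x_b1, x_b2, a, w)
predicted_move = p_x / x_a1 * x_A    # Equation (13)
```

In this example the disparity and depth are unchanged by the shift, and the displacement `x_B - x_A` agrees with the value predicted by Equation (13).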
[0066]
Next, a case where the three-dimensional component image is moved in the depth direction (perpendicular to the image plane) by inputting a movement parameter in the z direction will be described with reference to FIG. 10. The coordinate system and the arrangement of the object 100 used in FIG. 10 are the same as in FIG. 9. Let the movement parameter in the z direction be $P_z$, and suppose that the distance between the camera and the object after the movement becomes $P_z$ times the original distance. In FIG. 10, when the parameter $P_z$ is input, the object 100 moves along the straight line passing through point $O_1$ and point $A_0$ to the position of the object 102, and the representative point $A_0(x_A, z_A)$ of the object 100 moves to point $C_0(x_C, z_C)$, while point $A_1$ does not move. At this time,
$z_C = P_z \times z_A$ (14)
holds. Let $C_2$ be the point where the line segment connecting point $O_2$ and point $C_0$ crosses the virtual image plane of the cameras, let $T_1$ be the point where the plane $z = z_C$ intersects the z-axis, let the coordinates of $C_2$ be $(x_{c2}, w)$, and let $d_C$ be the global disparity vector at point $C_0$, the position reached after point $A_0$ of the object 100 moves along the straight line passing through point $O_1$ and point $A_0$.
[0067]
In the same way as Equation (6) was obtained, $x_C$ is
$x_C = a \times x_{a1} / d_C$ (15)
where
$d_C = x_{a1} - x_{c2} + a$ (16)
From $\triangle O_1A_0S_1 \sim \triangle O_1C_0T_1$,
$x_C / x_A = z_C / z_A$ (17)
holds. Therefore, substituting Equations (6) and (15) into Equation (17) and solving for $z_C$ gives
$z_C = (d_A / d_C) \times z_A$ (18)
[0068]
Therefore, from Equations (14) and (17), the global disparity vector $d_C$, the x-coordinate $x_C$ of point $C_0$, and the x-coordinate $x_{c2}$ of point $C_2$ are
$d_C = d_A / P_z$ (19)
$x_C = P_z \times a \times x_{a1} / d_A$ (20)
$x_{c2} = x_{a1} + a - d_A / P_z$ (21)
Therefore, when the object 100 is moved to the object 102, point $A_1$ on the left image does not move, and point $A_2$ on the right image need only be moved by $x_{c2} - x_{a2}$.
[0069]
Next, the object 102 is moved by $-(x_C - x_A)$ in the x direction to become the object 103, so that the representative point $C_0$ of the object 102 becomes the representative point $D_0$ of the object 103. Let $D_1(x_{d1}, w)$ be the point where the straight line connecting point $D_0$ and point $O_1$ crosses the virtual image plane of the cameras. From Equations (6) and (19), the x-coordinate $x_{d1}$ of point $D_1$ is
$x_{d1} = x_{a1} / P_z$ (22)
Therefore, when the object 102 is moved to the object 103, the corresponding points on the left image and on the right image need only be moved by $x_{d1} - x_{a1}$. Accordingly, to move the object 100 in the depth direction (to the position of the object 103), the following steps 3) and 4) are performed.
[0070]
3) Input the movement parameter $P_z$ in the z direction, obtain the global parallax $d_C$ from Equation (19), and move point $A_1$ on the left image of the three-dimensional component image by $(x_{d1} - x_{a1})$.
[0071]
4) Obtain $x_{c2}$ and $x_{d1}$ from Equations (21) and (22), and move point $A_2$ on the right image of the three-dimensional component image by $(x_{c2} - x_{a2} + x_{d1} - x_{a1})$.
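The depth-direction procedure can be sketched and checked numerically as follows. The function names and sample values are assumptions for illustration. The sign convention is also an assumption that should be stated plainly: shifts are signed displacements along the x-axis, the left-image shift is taken as $x_{d1} - x_{a1}$ as in step 3), and the right-image shift is the sum of the shift for moving object 100 to 102 ($x_{c2} - x_{a2}$) and the same horizontal shift $x_{d1} - x_{a1}$. The check re-triangulates the shifted points to confirm that the depth is multiplied by $P_z$ while the x-coordinate is unchanged.

```python
from typing import Tuple


def triangulate(x_left: float, x_right: float, a: float, w: float) -> Tuple[float, float]:
    """Recover (x, z) of a point from its x-coordinates on the two image planes."""
    d = x_left - (x_right - a)   # Equation (7)
    return a * x_left / d, a * w / d


def depth_move_shifts(x_a1: float, x_a2: float, a: float, p_z: float) -> Tuple[float, float]:
    """Signed shifts for the left and right images when the depth becomes p_z times larger."""
    d_a = x_a1 - (x_a2 - a)      # Equation (7)
    d_c = d_a / p_z              # Equation (19)
    x_c2 = x_a1 + a - d_c        # Equation (21)
    x_d1 = x_a1 / p_z            # Equation (22)
    left_shift = x_d1 - x_a1                      # step 3)
    right_shift = (x_c2 - x_a2) + (x_d1 - x_a1)   # step 4), signed convention assumed here
    return left_shift, right_shift


a, w = 2.0, 1.0
x_a1, x_a2 = 0.375, 2.125        # projections of the hypothetical point (3, 8)
left_shift, right_shift = depth_move_shifts(x_a1, x_a2, a, p_z=2.0)
x_new, z_new = triangulate(x_a1 + left_shift, x_a2 + right_shift, a, w)
```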
[0072]
Therefore, by splitting the disparity vector into two stages, the global disparity vector and the local disparity vector, the position of each component image in the depth direction can easily be changed by inputting the movement parameter $P_z$ in the z direction and changing the global parallax vector. Further, by combining the horizontal movement (steps 1) and 2)) with the depth movement (steps 3) and 4)), the object can be moved to an arbitrary position in three-dimensional space.
[0073]
Note that movement in the vertical direction does not affect the parallax, so for vertical movement the component image may simply be translated on the screen.
[0074]
In the present embodiment, a twin-lens three-dimensional component image is used, but the same applies to a multi-view three-dimensional component image.
[0075]
[Effect of the invention]
As described above, according to the encoding device of the present invention, among all the disparity vectors used when encoding a three-dimensional component image, one is taken as the global disparity vector (corresponding to the depth of the three-dimensional component image in three-dimensional space), and the differences between this global disparity vector and the other disparity vectors are taken as local disparity vectors (corresponding to the local depth distribution within the three-dimensional component image). Dividing the disparity vectors into these two kinds reduces their information amount and improves the encoding efficiency. Although two kinds of vectors, the global disparity vector and the local disparity vector, are used in the present invention, the search for obtaining them needs to be performed only once, so the amount of calculation is smaller than when the search is performed twice.
[0076]
According to the encoding apparatus of the present invention, when a stereoscopic image is encoded by dividing it into a background image and three-dimensional component images, the part having no parallax is encoded as a single background image shared by all viewpoint images, so the information amount of the background image can be reduced and the encoding efficiency can be improved.
[0077]
According to the encoding device of the present invention, when a three-dimensional component image cut out from a stereoscopic image and a background image are encoded separately, no abrupt change in parallax occurs within the three-dimensional component image, so the reliability of matching improves. In addition, when the information amount of the local parallax vectors is reduced by taking the difference between local parallax vectors at adjacent positions, differences with large absolute values do not arise.
[0078]
According to the encoding apparatus of the present invention, the information amount of the local parallax vector can be reduced by setting the average of all the parallax vectors in the three-dimensional component image as the global parallax vector.
[0079]
According to the encoding device of the present invention, the amount of calculation when selecting a global disparity vector is reduced by using the first disparity vector obtained in the three-dimensional component image as the global disparity vector.
[0080]
According to the encoding device of the present invention, by forming a histogram of the magnitudes of the disparity vectors in the three-dimensional component image and using the most frequently occurring disparity vector as the global disparity vector, the depth of the three-dimensional component image in three-dimensional space can be expressed appropriately.
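The three ways of choosing the global disparity vector mentioned above (average, first obtained, most frequent) can be compared in a short sketch. The function names and sample values are hypothetical, and rounding the average to an integer is an assumption made here so that the result stays on the same pixel grid as the other two choices.

```python
from collections import Counter
from statistics import mean
from typing import List


def global_by_average(disparities: List[int]) -> int:
    """Average of all disparity vectors (rounded to an integer; the rounding is an assumption)."""
    return round(mean(disparities))


def global_by_first(disparities: List[int]) -> int:
    """The first disparity vector obtained; no extra computation is needed."""
    return disparities[0]


def global_by_mode(disparities: List[int]) -> int:
    """The most frequent value in the histogram of disparity vectors."""
    return Counter(disparities).most_common(1)[0][0]


# Hypothetical horizontal disparities (in pixels) for six blocks of one component image.
disparities = [24, 25, 25, 26, 24, 25]
choices = {
    "average": global_by_average(disparities),
    "first": global_by_first(disparities),
    "mode": global_by_mode(disparities),
}
```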
[0081]
According to the encoding apparatus of the present invention, the information amount of the global disparity vector can be reduced by calculating the difference between the global disparity vector of the current frame or field and the global disparity vector of the previous frame or field.
[0082]
According to the encoding apparatus of the present invention, only one global disparity vector is required in the spatial direction for one multi-viewpoint three-dimensional component image, so the amount of calculation and the information amount of the global disparity vector can be reduced and the encoding efficiency can be improved.
[0083]
In the decoding apparatus of the present invention, when a plurality of component images overlap during composition, the component image with the larger global parallax is written over the others, so that a multi-view stereoscopic image in which the overlap is correctly expressed can be decoded.
[0084]
In the decoding device of the present invention, a three-dimensional component image having an arbitrary depth can be easily decoded by inputting a movement parameter in the depth direction and changing the value of the global parallax vector of the three-dimensional component image.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of an encoding apparatus that encodes a background image and a plurality of three-dimensional component images according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining cutting out a three-dimensional component image from an input image by a camera.
FIG. 3 is a diagram illustrating an order in which a three-dimensional component image is encoded.
FIG. 4 is a diagram illustrating an example of calculating a disparity vector.
FIG. 5 is a diagram illustrating that a disparity vector is represented by a global disparity vector and a local disparity vector.
FIG. 6 is a configuration diagram of an encoding apparatus when encoding a multi-viewpoint three-dimensional component image according to another embodiment of the present invention.
FIG. 7 is a diagram for explaining that a global parallax vector is shared in the spatial direction when a multi-viewpoint three-dimensional component image is encoded.
FIG. 8 is a configuration diagram of a decoding apparatus that decodes a background image and a plurality of three-dimensional component images according to an embodiment of the present invention.
FIG. 9 is a diagram for explaining parallel movement of a three-dimensional component image.
FIG. 10 is a diagram for explaining movement in the depth direction of a three-dimensional component image.
FIG. 11 is an explanatory diagram of a conventional example relating to compression of a multi-viewpoint image.
FIG. 12 is an explanatory diagram of a conventional example for obtaining a two-stage motion vector.
[Explanation of symbols]
1 Background image
2 Original image
3 Cutting device
43, 44, 45, 46 3D parts image
5 Image input section
6 Motion compensation unit
7,18 subtractor
8,19 Conversion unit
9,20 Quantization unit
10 Variable length coding unit
11 Inverse quantization section
12 Inverse conversion unit
13, 15, 51 Adder
14 frame memory
16, 53 Parallax compensation unit
17, 52 Parallax vector divider
21 Multiplexer
31 Three-dimensional image for left eye
32 Three-dimensional image for right eye
33 Composite image
54 Differentiator
60 comparator
70 large blocks
71 small blocks
80 Image editing unit
81 Disparity vector synthesizer
82 External input section
83 Image composition unit
100, 101, 102, 103 objects

Claims

A multi-view image encoding apparatus comprising: input means for inputting at least one multi-viewpoint image having parallax; motion compensation means for performing motion-compensated prediction on at least one image of the multi-viewpoint image using a motion amount between frames or fields; parallax compensation means for performing parallax-compensated prediction on the remaining images of the multi-viewpoint image using a parallax amount between frames or fields; and disparity vector dividing means, wherein the disparity vector dividing means outputs one of the disparity vectors obtained by the parallax compensation means as a global disparity vector, and outputs the values obtained by subtracting the global disparity vector from the remaining disparity vectors as local disparity vectors.

A multi-view image encoding apparatus comprising: input means for inputting at least one multi-view image of three or more viewpoints; motion compensation means for performing motion-compensated prediction on at least one image of the multi-view image using a motion amount between frames or fields; parallax compensation means for performing parallax-compensated prediction on the remaining images of the multi-view image using a parallax amount between frames or fields; disparity vector dividing means; and at least one differentiator, wherein the disparity vector dividing means takes one of the disparity vectors obtained between specific viewpoints as a global disparity vector and outputs the values obtained by subtracting the global disparity vector from the remaining disparity vectors as local disparity vectors, and the differentiator outputs the difference between each disparity vector obtained between the other viewpoints and the global disparity vector.

The multi-view image encoding apparatus according to claim 1 or 2, wherein the multi-view image is a three-dimensional component image.

The multi-viewpoint image encoding apparatus according to claim 3, wherein the three-dimensional component image includes a background image.

The multi-view image encoding apparatus according to claim 1, wherein the disparity vector dividing means outputs the average of all the disparity vectors in the multi-view image as the global disparity vector.

The multi-view image encoding apparatus according to claim 1, wherein the disparity vector dividing means outputs the disparity vector first obtained in the multi-view image as the global disparity vector.

The multi-view image encoding apparatus according to claim 1 or 2, wherein the disparity vector dividing means forms a histogram of the magnitudes of the disparity vectors in the multi-view image and outputs the disparity vector with the highest frequency of occurrence as the global disparity vector.

The multi-view image encoding apparatus according to claim 1 or 2, wherein the disparity vector dividing means outputs, as the global disparity vector of the frame or field currently being parallax-compensated, the difference between that global disparity vector and the global disparity vector of the frame or field that was parallax-compensated immediately before.

The multi-view image encoding apparatus according to claim 1, wherein the disparity vector dividing means outputs, as a local disparity vector, the difference from the previously obtained local disparity vector.

A multi-view image decoding apparatus including motion compensation means for performing motion compensation and parallax compensation means for performing parallax compensation, using a motion amount and a parallax amount between frames or fields of a multi-view stereoscopic image, the apparatus comprising: disparity vector synthesizing means for outputting, to the parallax compensation means, a disparity vector created from an input global disparity vector, which is a disparity vector between specific viewpoints, and an input disparity vector obtained by subtracting the global disparity vector from a disparity vector between other viewpoints; parameter input means for outputting a movement parameter in the depth direction to image editing means; the image editing means, which changes the depth of a three-dimensional component image using the input parameter and outputs the image to image synthesizing means; and the image synthesizing means, which synthesizes an input background image and the three-dimensional component image.

The multi-view image decoding apparatus according to claim 10, wherein the image editing means outputs the three-dimensional component images in ascending order of global parallax vector, and the image synthesizing means, after the background image has been input, synthesizes an image by overwriting the sequentially input three-dimensional component images onto the background image.