JP2004165894A

JP2004165894A - Bit rate control method for encoder

Info

Publication number: JP2004165894A
Application number: JP2002328194A
Authority: JP
Inventors: Koichi Takagi; 幸一高木; Yasuhiro Takishima; 康弘滝嶋; Yasuyuki Nakajima; 康之中島
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2002-11-12
Filing date: 2002-11-12
Publication date: 2004-06-10

Abstract

[Problem] To provide a bit rate control method for an encoder that can reduce processing time.
[Solution] A quantization parameter determination unit 14 obtains a virtual buffer occupancy d using the screen complexity obtained by a screen complexity calculation unit 12, and obtains a quantization parameter QP from the following equation (1), which includes the virtual buffer occupancy d.
$\mathrm{QP} = \mathrm{QSIZE} \times d / r \qquad (1)$
Here, QSIZE is the total number of values that QP can take, r = bitrate / picrate, and QSIZE and r are constants.
Next, using the quantization parameter QP, the optimal motion vector and mode are selected based on the RD optimization method.
[Selected drawing] FIG. 1

Description

[0001]
[Technical field of the invention]
The present invention relates to a bit rate control method for an encoder, and more particularly, to a bit rate control method suitable for being applied to a JVT (Joint Video Team) encoder.
[0002]
[Prior art]
Conventional bit rate control adopts the following scheme, which builds on the bit rate control method of the MPEG-2 test model TM5 [2].
[0003]
1. The target code amount is assigned at the picture level (corresponding to step 1 of TM5 rate control).
[0004]
2. The following processing is performed on each macroblock (MB).
[0005]
(2-1) The quantization parameter (QP) is set to the QP value of the MB coded immediately before, and the known RD optimization method is used to obtain the optimal mode and motion vector (MV).
[0006]
(2-2) The virtual buffer occupancy is calculated in consideration of the activity value, and the QP used for the MB is calculated therefrom (corresponding to step 2 of the TM5 rate control).
[0007]
(2-3) Using the QP thus obtained, the optimal MV and mode are selected again by the procedure of (2-1). (This step is skipped if the QP value obtained in (2-2) is the same as the QP value used in (2-1).)
[0008]
Here, an outline of the RDOptimization (hereinafter, RDO) method will be described. The motion estimation and mode selection in the RDO mode in (2-1) are performed in the following procedure.
[0009]
(1) For a given quantization parameter QP, a Lagrange constant is given as follows.
$\lambda_{\mathrm{MODE}} = 0.85 \times 2^{\mathrm{QP}/3}, \qquad \lambda_{\mathrm{MOTION}} = \sqrt{\lambda_{\mathrm{MODE}}}$
[0010]
(2) A mode that minimizes the following cost function is selected from a plurality of 4 × 4 intra prediction modes.
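$J(s, c, \mathrm{IMODE} \mid \mathrm{QP}, \lambda_{\mathrm{MODE}}) = \mathrm{SSD}(s, c, \mathrm{IMODE} \mid \mathrm{QP}) + \lambda_{\mathrm{MODE}} \cdot R(s, c, \mathrm{IMODE} \mid \mathrm{QP})$
Here, s is the original image, c is the predicted image, IMODE is the set of 4 × 4 intra prediction modes, and R() denotes a bit amount, as in step (5).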
[0011]
(3) The 16 × 16 intra prediction mode is selected that minimizes the SATD, defined as half the sum of the absolute values obtained by applying a two-dimensional Hadamard transform to the difference between the original image s and the predicted image c.
[0012]
(4) For each mode obtained by dividing a block of 8 × 8 pixels into a plurality of blocks, motion estimation and reference frame estimation are performed, and the candidate that minimizes the cost function, that is, the value of SSD + λR(MV, REF), is selected for each mode.
[0013]
(5) For each mode obtained by dividing an MB into blocks of 16 × 16, 8 × 16, or 16 × 8 pixels, motion estimation and reference frame estimation are performed, and the candidate that minimizes the following cost function is selected for each mode.
$J(\mathrm{REF}, m(\mathrm{REF}) \mid \lambda_{\mathrm{MOTION}}) = \mathrm{SATD}(s, c(\mathrm{REF}, m(\mathrm{REF}))) + \lambda_{\mathrm{MOTION}} \cdot (R(m(\mathrm{REF}) - p(\mathrm{REF})) + R(\mathrm{REF}))$
Here, m (REF) and p (REF) are a motion vector and a predicted value thereof for the reference image REF, respectively, and R () is a bit amount calculated using Universal VLC (UVLC).
[0014]
(6) For the B frame, taking into account the prediction direction, select the prediction direction that minimizes the cost function.
$J(\mathrm{PDIR} \mid \lambda_{\mathrm{MOTION}}) = \mathrm{SATD}(s, c(\mathrm{PDIR}, m(\mathrm{PDIR}))) + \lambda_{\mathrm{MOTION}} \cdot (R(m(\mathrm{PDIR}) - p(\mathrm{PDIR})) + R(\mathrm{PDIR}))$
Here, PDIR indicates the prediction direction.
[0015]
(7) Finally, given the parameters selected for each of the modes in steps (2) to (5), the mode MODE that minimizes the following cost function is selected.
$J(s, c, \mathrm{MODE} \mid \mathrm{QP}, \lambda_{\mathrm{MODE}}) = \mathrm{SSD}(s, c, \mathrm{MODE} \mid \mathrm{QP}) + \lambda_{\mathrm{MODE}} \cdot R(s, c, \mathrm{MODE} \mid \mathrm{QP})$
[0016]
It should be noted that the SKIP mode is also taken into account for the P frame, and the DIRECT mode is taken into account for the B frame.
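The RDO steps above can be illustrated with a minimal Python sketch. It only covers the arithmetic: the Lagrange multipliers of step (1), a 4 × 4 SATD as described in step (3), and a generic "minimum J = D + λR" selection. The function names, the 4 × 4 block size for the SATD, the Hadamard matrix ordering, and the candidate tuples are assumptions made for illustration and are not taken from the JVT reference software.

```python
import math

def lagrange_multipliers(qp):
    """Step (1): lambda_MODE = 0.85 * 2^(QP/3), lambda_MOTION = sqrt(lambda_MODE)."""
    lambda_mode = 0.85 * 2.0 ** (qp / 3.0)
    return lambda_mode, math.sqrt(lambda_mode)

# 4x4 Hadamard matrix (one common row ordering; an assumption of this sketch).
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]

def hadamard4(block):
    """Two-dimensional Hadamard transform H * block * H^T of a 4x4 block."""
    tmp = [[sum(H4[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd4(orig, pred):
    """Half the sum of absolute Hadamard coefficients of (orig - pred)."""
    diff = [[orig[i][j] - pred[i][j] for j in range(4)] for i in range(4)]
    coeffs = hadamard4(diff)
    return sum(abs(v) for row in coeffs for v in row) / 2

def best_mode(candidates, lam):
    """candidates: (name, distortion, rate_bits); return the name minimizing D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]
```

Because the multiplier grows with QP, a larger QP weights the rate term more heavily, which is why the selected MV and mode depend on the QP that is passed in.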
[0017]
[Problems to be solved by the invention]
However, in the above-described conventional technique, the optimum MV search and the mode determination must be performed twice at worst for each MB. That is, the processes (2-1) and (2-3) must be performed. Since these processes occupy a large part of the time required for encoding, there has been a problem that the encoding processing time becomes long.
[0018]
The present invention has been made in view of the above problem, and an object of the present invention is to provide a bit rate control method for an encoder that can reduce processing time.
[0019]
[Means for Solving the Problems]
In order to achieve the above object, a first feature of the present invention is that a bit rate control method for a JVT (Joint Video Team) encoder comprises: a step of allocating a target code amount to each screen based on the screen complexity of each picture type; a step of obtaining a virtual buffer occupancy d at each macroblock of the screen; a step of obtaining a quantization parameter QP from the following equation (1) or (2), which includes the virtual buffer occupancy d; and a step of selecting the optimal motion vector and mode based on the RD optimization method using the quantization parameter QP.
$\mathrm{QP} = \mathrm{QSIZE} \times d / r \qquad (1)$
Here, QSIZE is the total number of values that QP can take, r = bitrate / picrate, and QSIZE and r are constants.
$\mathrm{QP} = a \log(\mathrm{QSIZE} \times d / r) + b \qquad (2)$
Here, a and b are constants.
[0020]
Further, a second feature of the present invention is that the quantization parameter QP satisfies 0 ≦ QP ≦ 51, values outside this range being clipped to 0 or 51, and that QSIZE = 52.
[0021]
According to the first and second features, even if the RDOptimization method is performed only once, the rate-distortion characteristics can be improved and the encoding processing time can be significantly reduced.
[0022]
Further, a third feature of the present invention is that the initial value $d_0$ of the virtual buffer occupancy is set for each picture type (i, p, b) as $d_{i0} = 30 \times r / \mathrm{QSIZE}$, $d_{p0} = K_p \times d_{i0}$, and $d_{b0} = K_b \times d_{i0}$, with $r = 2 \times \mathrm{bitrate} / \mathrm{picrate}$.
[0023]
According to this feature, it is possible to stabilize the variation of the code amount for each frame in the initial state and the variation of the PSNR.
[0024]
[Embodiments of the invention]
Hereinafter, the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing a schematic configuration of a JVT (Joint Video Team) encoder suitable for applying the present invention.
[0025]
In the figure, when the input image signal 1 of the first screen of the moving image signal is input, the switch 2 is connected to the terminal a by the control signal from the intra prediction unit 15, and the input image signal is input to the orthogonal transformer and quantizer 3 in order to obtain high coding efficiency. In the orthogonal transformer and quantizer 3, the input image signal is orthogonally transformed using DCT (discrete cosine transform) or the like, and the orthogonal transform coefficients are quantized. The quantized coefficients are entropy-coded by the entropy coding unit 4.
[0026]
On the other hand, the quantized coefficients input to the inverse quantizer and inverse orthogonal transformer 5 are subjected to inverse quantization and inverse orthogonal transform, and image data is restored. The restored image data is stored in the frame buffer 7 via the loop filter 6.
[0027]
When the input image signal 1 of the next screen is input, the switch 2 is connected to the terminal b by a control signal from the intra prediction unit 15, and the input image signal 1 is input to the prediction signal subtractor 10 and the motion estimation unit 8. The motion estimation unit 8 estimates the motion of the image data read from the frame buffer 7 with reference to the input image signal 1 and detects the motion vector MV. The motion compensation unit 9 generates a motion-compensated prediction signal based on the motion vector MV, and sends it to the prediction signal subtractor 10 and the local decoding adder 11.
[0028]
The prediction signal subtractor 10 subtracts the motion-compensated prediction signal from the input image signal 1 and outputs the result to the orthogonal transformer and quantizer 3. The orthogonal transformer and quantizer 3 orthogonally transforms and quantizes the signal from the prediction signal subtractor 10 in order to obtain high coding efficiency. Meanwhile, the local decoding adder 11 adds the motion-compensated prediction signal to the image data restored by the inverse quantizer and inverse orthogonal transformer 5, and outputs the result to the loop filter 6.
[0029]
The quantization parameter determination unit 14 determines a quantization parameter (QP) using the screen complexity obtained from the screen complexity calculation unit 12 and the virtual buffer occupancy obtained from the virtual buffer occupancy calculation unit 13. The quantization parameter QP is sent to the intra prediction unit 15 and the motion estimation unit 8. The intra prediction unit 15 predicts whether or not to switch to the intra mode according to the quantization parameter QP, and generates a control signal for switching the switch 2.
[0030]
Next, a bit rate control method according to an embodiment of the present invention will be described. The bit rate control is mainly performed by the quantization parameter determination unit 14.
[0031]
1. First, a target code amount is allocated at a picture level (for each screen), as in the prior art (corresponding to step 1 of TM5 rate control). FIG. 2 shows a conceptual diagram of a picture to be encoded.
[0032]
The allocation of the target code amount at the picture level is described below. First, the screen complexity $X_{type}$ is defined, and it is updated according to the type of the picture that has been coded. Here, the subscript "type" indicates each of the picture types I, P, and B, and the same applies below.
[0033]
The update formula is $X_{type} = S_{type} \times QP_{type}$. Here, $S_{type}$ is the code amount of the picture, and $QP_{type}$ is the average of the quantization parameters QP actually obtained. The initial values of the screen complexity $X_{type}$ are $X_i = 155 \times \mathrm{bitrate} / 115$, $X_p = 15 \times \mathrm{bitrate} / 115$, and $X_b = 5 \times \mathrm{bitrate} / 115$. Here, "bitrate" is the rate of the bit stream output from the encoder, and is known.
[0034]
Using the above screen complexity $X_{type}$, the target code amount $T_{type}$ of each picture, that is, $T_i$, $T_p$, and $T_b$, is obtained as follows in the same manner as TM5.

[0035]
Here, $K_{type}$ is a proportionality factor representing the difference in complexity between picture types, and $K_p = 1.1$ and $K_b = 1.5$ are preferable. $N_{type}$ is the number of frames of each type that have not yet been encoded in the current GOP (group of pictures), and R is the amount of bits that can be allocated to the frames not yet encoded in the GOP.
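The allocation equations themselves are not reproduced in this text. Since the description states that the targets are obtained in the same manner as TM5, they presumably take the form of TM5 rate control step 1, shown below with the symbols defined above. This is an assumption based on TM5, not a transcription of the patent's own equations. TM5 additionally bounds each target from below by bitrate / (8 × picrate); whether that bound is retained here cannot be determined from this text.

```latex
\begin{aligned}
T_i &= \frac{R}{1 + \dfrac{N_p X_p}{X_i K_p} + \dfrac{N_b X_b}{X_i K_b}} \\[6pt]
T_p &= \frac{R}{N_p + \dfrac{N_b K_p X_b}{K_b X_p}} \\[6pt]
T_b &= \frac{R}{N_b + \dfrac{N_p K_b X_p}{K_p X_b}}
\end{aligned}
```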
[0036]
Note that, before encoding the first picture of the GOP, R is as follows.
$R = (\mathrm{bitrate} / \mathrm{picrate}) \times N + R_{prev}$
Here, picrate is the picture rate (frames/sec), N is the number of frames included in the GOP, and $R_{prev}$ is the value of R in the immediately preceding GOP. After a picture is encoded, R is updated as $R = R - S_{type}$.
As described above, the picture-level target code amount $T_{type}$ is obtained.
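The picture-level bookkeeping described above (initial complexities, the GOP bit budget R, and the updates after each coded picture) can be sketched as follows. The function and variable names are hypothetical, and the sketch deliberately leaves out the computation of the targets themselves, whose equations are not reproduced in this text.

```python
def initial_complexities(bitrate):
    """Initial screen complexities X_i, X_p, X_b as given in the description."""
    return {
        "i": 155 * bitrate / 115,
        "p": 15 * bitrate / 115,
        "b": 5 * bitrate / 115,
    }

def gop_budget(bitrate, picrate, n_frames, r_prev=0.0):
    """R = (bitrate / picrate) * N + R_prev, set before the first picture of a GOP."""
    return (bitrate / picrate) * n_frames + r_prev

def after_picture(r_remaining, complexities, ptype, s_bits, avg_qp):
    """Apply R = R - S_type and X_type = S_type * QP_type after coding one picture.

    Returns the new remaining budget and a new complexity dict; the inputs
    are not modified.
    """
    updated = dict(complexities)
    updated[ptype] = s_bits * avg_qp
    return r_remaining - s_bits, updated
```

For example, at 1 Mbit/s and 25 pictures/s, a 15-frame GOP starts with a budget of 600,000 bits, and an I picture that consumes 150,000 bits leaves 450,000 bits for the remaining 14 frames.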
[0037]
2. Next, the following processing is performed on each macro block MB.
[0038]
(2-1) The virtual buffer occupancy calculation unit 13 calculates the virtual buffer occupancy. The virtual buffer occupancy $d_m$ at the m-th macroblock MBm is defined as follows.
$d_m = d_0 + B_{m-1} - T_{type} \times (m-1) / \mathrm{MBcnt}$
Here, $B_{m-1}$ is the code amount consumed up to the (m-1)-th MB, and MBcnt is the total number of MBs. For $d_0$, the virtual buffer occupancy obtained at the end of the most recently encoded frame of the same picture type is used. The initial values of $d_0$ are set for each picture type as $d_{i0} = 30 \times r / \mathrm{QSIZE}$, $d_{p0} = K_p \times d_{i0}$, and $d_{b0} = K_b \times d_{i0}$, with $r = 2 \times \mathrm{bitrate} / \mathrm{picrate}$. Here, QSIZE is the total number of values that QP can take, and is preferably 52.
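A minimal sketch of the virtual buffer computation described in (2-1), assuming 1-based macroblock indices and the preferred constants QSIZE = 52, K_p = 1.1, and K_b = 1.5. The function names are hypothetical.

```python
QSIZE = 52          # total number of QP values (preferred value in the description)
K_P, K_B = 1.1, 1.5  # preferred proportionality factors

def initial_occupancy(bitrate, picrate):
    """Initial d_0 per picture type, with r = 2 * bitrate / picrate."""
    r = 2 * bitrate / picrate
    d_i0 = 30 * r / QSIZE
    return {"i": d_i0, "p": K_P * d_i0, "b": K_B * d_i0}

def occupancy(d0, bits_used, target_bits, m, mb_count):
    """d_m = d_0 + B_{m-1} - T_type * (m - 1) / MBcnt.

    d0          -- occupancy carried over from the last frame of the same type
    bits_used   -- B_{m-1}, bits consumed by macroblocks 1 .. m-1
    target_bits -- T_type, the picture-level target
    m           -- 1-based index of the current macroblock
    mb_count    -- MBcnt, total macroblocks in the picture
    """
    return d0 + bits_used - target_bits * (m - 1) / mb_count
```

The occupancy rises when the macroblocks coded so far have used more bits than their proportional share of the target, and falls when they have used fewer.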
[0039]
Based on the above, the quantization parameter QP is determined according to the following equation (1).
$\mathrm{QP} = \mathrm{QSIZE} \times d / r \qquad (1)$
Here, QP is restricted to 0 ≦ QP ≦ 51, and values outside this range are clipped to 0 or 51, respectively.
Note that the following equation (2) may be used instead of equation (1).
$\mathrm{QP} = a \log(\mathrm{QSIZE} \times d / r) + b \qquad (2)$
Here, a and b are constants.
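Equations (1) and (2) with the clipping rule can be sketched as below. The description does not say how a fractional result is turned into an integer QP, which logarithm base is meant, or what values a and b take; rounding to the nearest integer, the natural logarithm, and clipping to 0 when the logarithm is undefined are all assumptions of this sketch, and the function names are hypothetical.

```python
import math

QSIZE = 52  # total number of QP values, so valid QPs are 0 .. 51

def clip_qp(qp):
    """Clip to the valid range 0 <= QP <= QSIZE - 1."""
    return max(0, min(QSIZE - 1, qp))

def qp_linear(d, r):
    """Equation (1): QP = QSIZE * d / r (rounding is an assumption)."""
    return clip_qp(round(QSIZE * d / r))

def qp_log(d, r, a, b):
    """Equation (2): QP = a * log(QSIZE * d / r) + b (natural log assumed)."""
    x = QSIZE * d / r
    if x <= 0:
        return 0  # log undefined; treated as below-range (assumption)
    return clip_qp(round(a * math.log(x) + b))
```

As a worked check: if the same r is used in equation (1) and in the initial occupancies, then $d_{i0} = 30 \times r / \mathrm{QSIZE}$ gives QP = 30 for the first I picture, and the factors $K_p = 1.1$ and $K_b = 1.5$ give QP = 33 and QP = 45 for the first P and B pictures.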
[0040]
(2-2) Using the QP obtained as described above, the optimal motion vector MV and mode are selected by using the RDoptimization method.
[0041]
As is clear from the above description, in the present embodiment the RD optimization method is performed only once, and the other processing is equivalent to the conventional method. Therefore, the encoding processing time can be significantly reduced compared with the conventional method.
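The difference in the number of RDO passes can be illustrated with a small counting sketch. It does not run any motion search; it only counts how often the search would be invoked for a given sequence of rate-control QPs, following steps (2-1) to (2-3) of the conventional method and step (2-2) of the present embodiment. The function names and the sample QP sequence are made up for illustration.

```python
def conventional_rdo_calls(rate_control_qps, initial_qp):
    """Count RDO passes in the conventional flow.

    Each MB is first searched with the previous MB's QP; the search is
    repeated only when the rate-control QP differs from that value.
    """
    calls = 0
    prev_qp = initial_qp
    for qp in rate_control_qps:
        calls += 1           # (2-1) search with the previous MB's QP
        if qp != prev_qp:    # (2-3) repeat only if the QP changed
            calls += 1
        prev_qp = qp
    return calls

def proposed_rdo_calls(rate_control_qps):
    """Count RDO passes when QP is fixed before the search: one per MB."""
    return len(rate_control_qps)
```

For the sequence [30, 30, 31, 31, 29] starting from QP 30, the conventional flow needs 7 passes and the single-pass flow needs 5. The gap widens as the QP changes more often between macroblocks, up to the worst case of two passes per MB.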
[0042]
Further, a simulation experiment according to the present embodiment using a test image was performed, and a rate-distortion characteristic was obtained as follows.
[0043]
The experimental conditions are shown in FIG. 3. In the experiment using the test image Foreman, as shown in FIG. 4A, the rate-distortion characteristics of the conventional method and the method of the present embodiment were almost the same, whereas in the experiment using Mobile & Calendar, as shown in FIG. 4B, the method of the present embodiment was confirmed to be superior. From this, it can be confirmed that the method of the present embodiment is superior at all bit rates regardless of whether B pictures are present. In particular, when B pictures are present, the improvement at high rates is remarkable. This effect is considered to be mainly due to the fact that the quantization parameter is obtained by equation (1) with QSIZE = 52.
[0044]
Next, FIG. 5 shows the transition of the code amount and the PSNR for each frame. The solid line indicates the method of the present embodiment, and the broken line indicates the conventional method.
[0045]
FIG. 5 suggests that the conventional method is unstable, particularly in the initial state, whereas the method of the present embodiment remains stable throughout, and that the conventional method consumes a large amount of code on the I picture, which adversely affects the P pictures of the same GOP.
The former effect is considered to be mainly due to setting the initial values of the virtual buffer occupancy d for each picture type as $d_{i0} = 30 \times r / \mathrm{QSIZE}$, $d_{p0} = K_p \times d_{i0}$, and $d_{b0} = K_b \times d_{i0}$, with $r = 2 \times \mathrm{bitrate} / \mathrm{picrate}$.
From the above, the effectiveness of the method of the present embodiment was confirmed.
[0046]
As described above, the method of the present embodiment is considered to be effective because equation (1), which includes the virtual buffer occupancy d, is used to obtain the quantization parameter QP, and/or because the initial values of $d_0$ are set for each picture type as $d_{i0} = 30 \times r / \mathrm{QSIZE}$, $d_{p0} = K_p \times d_{i0}$, and $d_{b0} = K_b \times d_{i0}$, with $r = 2 \times \mathrm{bitrate} / \mathrm{picrate}$ and QSIZE = 52.
[0047]
[Effects of the invention]
As is apparent from the above description, according to the first aspect of the present invention, it is possible to achieve an effect equal to or higher than that of the conventional method, even though the RDOptimization method is performed only once. Therefore, the processing time required for encoding can be significantly reduced.
[0048]
According to the second aspect of the present invention, since the quantization parameter QP satisfies 0 ≦ QP ≦ 51 and the total number QSIZE of its values is 52, the rate-distortion characteristics can be improved compared with the conventional method.
[0049]
Furthermore, according to the invention of claim 3, since the initial value $d_0$ of the virtual buffer occupancy is set for each picture type (i, p, b) as $d_{i0} = 30 \times r / \mathrm{QSIZE}$, $d_{p0} = K_p \times d_{i0}$, and $d_{b0} = K_b \times d_{i0}$, with $r = 2 \times \mathrm{bitrate} / \mathrm{picrate}$, it is possible to stabilize the fluctuations in the per-frame code amount and in the PSNR in the initial state.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of a JVT (Joint Video Team) encoder to which the present invention is applied.
FIG. 2 is an explanatory diagram of a picture to be encoded and a macroblock MB.
FIG. 3 is a view showing experimental conditions.
FIG. 4 is a diagram illustrating the rate-distortion characteristics obtained with the test images Foreman and Mobile & Calendar.
FIG. 5 is a diagram illustrating transitions of a code amount and a PSNR in a conventional method and a method of the present invention.
[Explanation of symbols]
12: screen complexity calculation unit, 13: virtual buffer occupancy calculation unit, 14: quantization parameter determination unit

Claims

In a bit rate control method of an encoder of JVT (Joint Video Team),
Allocating a target code amount for each screen based on the screen complexity for each picture type;
Determining a virtual buffer occupancy d in each macroblock of the screen;
Obtaining a quantization parameter QP from the following equation (1) or (2) including the virtual buffer occupancy d;
Selecting an optimal motion vector and mode based on the RDOptimization method using the quantization parameter QP.
$\mathrm{QP} = \mathrm{QSIZE} \times d / r \qquad (1)$
Here, QSIZE is the total number of values that QP can take, r = bitrate / picrate, and QSIZE and r are constants.
$\mathrm{QP} = a \log(\mathrm{QSIZE} \times d / r) + b \qquad (2)$
Here, a and b are constants.

A method for controlling a bit rate of an encoder according to claim 1,
The bit rate control method for an encoder, wherein the quantization parameter QP satisfies 0 ≦ QP ≦ 51, and values outside the range are clipped to 0 or 51, and the QSIZE = 52.

A method for controlling a bit rate of an encoder according to claim 1 or 2,
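The bit rate control method for an encoder, wherein the initial value $d_0$ of the virtual buffer occupancy is set for each picture type (i, p, b) as $d_{i0} = 30 \times r / \mathrm{QSIZE}$, $d_{p0} = K_p \times d_{i0}$, and $d_{b0} = K_b \times d_{i0}$, and $r = 2 \times \mathrm{bitrate} / \mathrm{picrate}$.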