JP2009033652A

JP2009033652A - Moving image encoding method

Info

Publication number: JP2009033652A
Application number: JP2007197741A
Authority: JP
Inventors: Hideaki Hattori; 秀昭服部
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2007-07-30
Filing date: 2007-07-30
Publication date: 2009-02-12

Abstract

PROBLEM TO BE SOLVED: To perform moving image encoding with low power consumption by appropriately selecting an encoding mode.

SOLUTION: The present invention relates to a moving image encoding method for encoding a moving image by selecting one encoding mode from among a plurality of encoding modes, wherein the encoding mode to be selected is determined from the bit rate of the already-encoded moving image, a target bit rate, and power consumption information corresponding to the respective encoding modes.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a moving image encoding method. It is well suited to reducing the power consumption of moving image coding schemes that achieve high compression efficiency.

To use digital images effectively, they must be stored and transmitted efficiently, so compression encoding of image data is essential. JPEG (Joint Photographic Experts Group), a standard method for image coding, compresses image data as follows.

First, the image data to be compressed is divided into blocks, and each block is converted by an orthogonal transform into orthogonal transform coefficients consisting of a DC component and AC components. The well-known approach then quantizes the coefficients to reduce the code amount and entropy-codes the quantized coefficients. In doing so, low-frequency components, to which human vision is sensitive, are generally quantized with small quantization values, and high-frequency components, to which it is less sensitive, with large quantization values, so that quantization is efficient and causes little visible degradation.

Inter-frame predictive moving image coding, which uses the correlation between frames of a moving image to reduce the code amount effectively, is also in common use, for example in MPEG-2 (Moving Picture Experts Group Phase 2). FIG. 4 shows a conceptual diagram of inter-frame prediction. The current frame is divided into regions of, for example, 16 pixels × 16 lines, and a motion vector is searched for each region, treating it as the processing target region. Reconstructed moving image data, obtained by decoding the encoded moving image data, serves as the reference frame, and the motion vector is measured from the region in the reference frame located at the position corresponding to the processing target region of the current frame. Taking the surrounding pixels including that region as the motion search range, the encoder finds the relative position of the region estimated to yield the smallest code amount for the prediction error of the orthogonal transform coefficients. This code amount is estimated by block matching: the processing target region of the current frame is compared with regions of the same size within the motion search range (hereinafter, reference regions) while the reference region is shifted.
Block matching commonly uses the sum of absolute differences (SAD) between pixels at corresponding positions in the two regions. The smaller the SAD, the higher the correlation and the smaller the estimated code amount.
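The block matching described above can be sketched in C as follows. This is a minimal illustration, not the implementation of any particular encoder: the names `sad_block`, `full_search`, and `MatchResult`, the 8-bit single-plane frame layout, and the exhaustive search order are all assumptions made for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK 16  /* region size from the text: 16 pixels x 16 lines */

/* Sum of absolute differences between two BLOCK x BLOCK regions.
   `stride` is the number of bytes per image row. */
unsigned sad_block(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < BLOCK; y++)
        for (int x = 0; x < BLOCK; x++)
            sum += (unsigned)abs((int)cur[y * stride + x] - (int)ref[y * stride + x]);
    return sum;
}

/* Best displacement found and its SAD (hypothetical result type). */
typedef struct { int dx, dy; unsigned sad; } MatchResult;

/* Exhaustive search over displacements in [-range, +range].
   (bx, by) is the top-left corner of the processing target region;
   candidates that would fall outside the reference frame are skipped. */
MatchResult full_search(const uint8_t *cur_frame, const uint8_t *ref_frame,
                        int width, int height, int bx, int by, int range)
{
    assert(bx >= 0 && by >= 0 && bx + BLOCK <= width && by + BLOCK <= height);
    MatchResult best = { 0, 0, UINT_MAX };
    const uint8_t *cur = cur_frame + by * width + bx;
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + BLOCK > width || ry + BLOCK > height)
                continue;
            /* One memory read of a reference region plus one SAD per candidate. */
            unsigned s = sad_block(cur, ref_frame + ry * width + rx, width);
            if (s < best.sad) {
                best.dx = dx; best.dy = dy; best.sad = s;
            }
        }
    }
    return best;
}
```

Each loop iteration costs one reference-region read and one SAD computation, which is why the number of candidates dominates the work of the search.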

MPEG-2, MPEG-4, and H.264 also define bidirectional inter-frame predictive coding. A region may refer to a reference region in one reference frame that precedes it in display order (forward reference region), to a reference region in one reference frame that follows it (backward reference region), or to both at once. When both are used, the average of the two reference regions is taken as the prediction region, and the difference data between the target picture and the predicted picture is encoded. H.264 allows more flexible prediction, such as using two reference frames in the same direction.

FIG. 5 shows a conceptual diagram of bidirectional inter-frame predictive coding.
Bidirectional inter-frame prediction defines, alongside the commonly used method of encoding motion vector information, a direct mode in which the motion information is generated from the motion information of already-encoded blocks (so that once the reference frames are fixed, the prediction error is uniquely determined).

FIG. 6 shows a conceptual diagram of the direct mode. In FIG. 6, MV_base is the motion vector of the region co-located with the processing target region, pointing from Frame1, an already-encoded backward frame, to Frame0, an already-encoded forward frame. Using MV_base, the forward motion vector MV_F and the backward motion vector MV_B of the processing target region in the direct mode are given by:
MV_F = MV_base × T_F / T_D
MV_B = MV_F - MV_base
Here T_F is the time interval between Frame0 and the frame being processed, and T_D is the time interval between Frame0 and Frame1.
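The two direct mode equations can be written out as a short C sketch. The `MotionVector` type, the function name, and the use of plain integer division are assumptions of this example; an actual codec specifies its own vector precision and rounding rule.

```c
#include <assert.h>

/* Two-component motion vector (hypothetical type for this sketch). */
typedef struct { int x, y; } MotionVector;

/* Direct mode vectors as given in the text:
     MV_F = MV_base * T_F / T_D
     MV_B = MV_F - MV_base
   t_f: time interval between Frame0 and the frame being processed
   t_d: time interval between Frame0 and Frame1 (must be positive)
   Integer division is an assumption made for illustration. */
void direct_mode_vectors(MotionVector mv_base, int t_f, int t_d,
                         MotionVector *mv_f, MotionVector *mv_b)
{
    assert(t_d > 0);
    mv_f->x = mv_base.x * t_f / t_d;
    mv_f->y = mv_base.y * t_f / t_d;
    mv_b->x = mv_f->x - mv_base.x;
    mv_b->y = mv_f->y - mv_base.y;
}
```

For example, with MV_base = (8, -4), T_F = 1, and T_D = 2, the sketch yields MV_F = (4, -2) and MV_B = (-4, 2): the frame being processed lies halfway between the two reference frames, so each vector covers half of MV_base in opposite directions.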

Because the direct mode determines the motion vector uniquely from already-encoded motion vectors, no motion vector search is needed. Without a search, the code amount for the prediction error of the orthogonal transform coefficients generally increases substantially and compression efficiency falls; however, because no motion information has to be encoded, the direct mode can improve coding efficiency in some cases.
Another advantage of the direct mode is its simplicity. A typical motion vector search performs dozens of block matching operations and reads of reference regions from memory, whereas evaluating the direct mode takes only one block matching operation and one read of a reference region.

In bidirectional inter-frame prediction of this kind, whether to use the direct mode, and which motion vector is optimal, are generally decided with a cost function such as the following.
for (i = 0; i < search_number; i++) {
    COST[i] = SAD[i] + MV[i];
}
Here search_number is the total number of evaluations, that is, the direct mode evaluation plus the motion vector search candidates (i = 0 is the direct mode evaluation).

The wider the motion vector search range is made to improve coding efficiency, and the more motion vector search operations (memory reads and block matching operations) are performed, the larger search_number becomes.
SAD[i] is the block matching result for the direct mode and for each motion vector candidate, and serves as code amount estimation information. The smaller SAD[i] is, the higher the correlation and the smaller the estimated code amount for the prediction error of the orthogonal transform coefficients.
MV[i] is the code amount of the motion vector; it is 0 in the direct mode.
COST[i] is the cost determined from the block matching result and the motion vector information. The candidate with the smallest COST[i] is taken as the optimal motion vector, the one with the highest coding efficiency, and is used for the actual encoding. If the direct mode has the smallest COST[i], the region is encoded in the direct mode and no motion vector information is encoded.
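The conventional selection rule, in which every candidate is evaluated and the one with the smallest cost wins, might be sketched as below. The function name and the array-based interface are assumptions for illustration; `sad` and `mv_bits` stand for SAD[i] and MV[i] in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index i that minimizes COST[i] = SAD[i] + MV[i].
   Index 0 is the direct mode candidate, for which mv_bits[0] is 0.
   Ties keep the earlier candidate, so the direct mode wins a tie. */
size_t select_min_cost(const unsigned sad[], const unsigned mv_bits[],
                       size_t search_number)
{
    assert(search_number > 0);
    size_t best = 0;
    unsigned best_cost = sad[0] + mv_bits[0];
    for (size_t i = 1; i < search_number; i++) {
        unsigned cost = sad[i] + mv_bits[i];
        if (cost < best_cost) {
            best_cost = cost;
            best = i;
        }
    }
    return best;
}
```

With SAD = {500, 300, 480} and MV = {0, 250, 10}, the costs are 500, 550, and 490, so candidate 2 is selected even though candidate 1 has the smallest SAD: the bits spent on its motion vector outweigh the better match.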

Thus, in the conventional example, many motion vector searches and the direct mode evaluation are always carried out for every region of the frame, and the motion vector or direct mode giving the best coding efficiency is selected.
Moving image coding also generally sets a target bit rate at which the output is to be held constant. One well-known technique is TM5 of MPEG-2 (ISO/IEC-JTC1/SC29/WG11: "Test Model 5 (Draft)", MPEG93/N0400, 1993). In that technique, when the bit rate has headroom, the quantization scale value is reduced, which raises the generated bit rate.

The present invention can be applied to portable devices such as mobile phones, digital cameras, and digital video cameras. Most such devices are battery-powered, and because moving image coding consumes a large share of the device's power, reducing its power consumption is essential for long operating times. However, when the direct mode evaluation and a large number of motion vector searches are performed for every region of the frame, as in the known technique described above, the number of reads from the storage device and the number of block matching operations become large. The power consumed by a storage device is generally proportional to the number of reads and writes, and the power consumed by block matching is roughly proportional to the number of block matching operations. A moving image coding scheme based on the known technique therefore performs many motion vector searches without regard to the power they consume, and its power consumption is high.

To address this problem, Patent Document 1 discloses an invention that sets the subsampling used by the correlation calculation means according to image feature information, such as the frequency components of the block to be encoded, and usage information, such as the remaining battery level, and that selects all or only some of the correlation calculation blocks for operation.

Patent Document 2 discloses an invention that reduces power consumption by narrowing the motion vector search range when the shooting angle of view is wide or the shooting distance is long.

Patent Document 1: Japanese Patent Laid-Open No. 11-136682
Patent Document 2: Japanese Patent Laid-Open No. 11-205656

However, the invention disclosed in Patent Document 1 does not take into account the bit rate of the already-encoded moving image or the target bit rate. Consequently, if subsampling further lowers coding efficiency while the bit rate of the already-encoded moving image is high, image quality degrades. In addition, changing only the subsampling setting does not change the power consumed by memory reads, and the subsampling circuit itself has to be operated, so the reduction in power consumption is small.

The invention disclosed in Patent Document 2 has a similar problem: narrowing the motion search range lowers coding efficiency, which degrades image quality. Moreover, when the shooting angle of view is narrow or the shooting distance is short, the motion vector search always covers a wide range, so power consumption is not reduced.

The present invention has been made in view of the above problems. Its object is to provide a moving image coding scheme with low power consumption by selecting the encoding mode appropriately. Further objects are to set a cost function that avoids extreme image quality degradation, to select encoding modes in a way that increases the reduction in power consumption, and to select a low power consumption encoding mode more appropriately.

The present invention provides, among other things, a moving image encoding method that encodes a moving image by selecting one encoding mode from a plurality of encoding modes, characterized in that the encoding mode is determined from the bit rate of the already-encoded moving image, a target bit rate, and power consumption information corresponding to each encoding mode.

According to the present invention, power consumption can be reduced by selecting the encoding mode appropriately. This is explained with reference to FIG. 2, which shows the generated code amount over time. The horizontal axis is time and the vertical axis is the cumulative code amount; the dotted line corresponds to the target bit rate and the solid line to the bit rate of the already-encoded moving image (the slope of each line is the bit rate). As FIG. 2 shows, the bit rate is allowed to rise only when there is headroom in the bit rate, and in that case a low power consumption encoding mode such as the direct mode is selected to reduce power consumption.
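The headroom pictured in FIG. 2 is the vertical gap between the target line and the line for the already-encoded moving image. A minimal C sketch of that bookkeeping follows; the function names, the units (bits and seconds), and the simple "positive headroom" test are assumptions for illustration and are not taken from the publication.

```c
#include <assert.h>
#include <stdint.h>

/* Headroom in bits at a given time: the cumulative code amount allowed by
   the target bit rate minus the cumulative code amount actually generated.
   A positive value means fewer bits were produced than budgeted. */
int64_t bitrate_headroom(int64_t target_bps, double elapsed_sec, int64_t bits_so_far)
{
    assert(target_bps > 0 && elapsed_sec >= 0.0);
    int64_t budget = (int64_t)((double)target_bps * elapsed_sec);
    return budget - bits_so_far;
}

/* In this sketch a low power consumption mode is favoured only while
   the headroom is positive. */
int low_power_mode_allowed(int64_t target_bps, double elapsed_sec, int64_t bits_so_far)
{
    return bitrate_headroom(target_bps, elapsed_sec, bits_so_far) > 0;
}
```

At a target of 4,000,000 bit/s, two seconds into the stream the budget is 8,000,000 bits; if 7,500,000 bits have been generated, the headroom is 500,000 bits and the lower-efficiency, lower-power mode can be afforded.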

Because the quantization scale value is not reduced as it is in the conventional example, image quality does not improve temporarily while the bit rate has headroom. A temporary improvement in image quality, however, has little effect on the subjective impression of quality, so reducing power consumption is the greater benefit to the user.

In addition, according to the present invention, the low power consumption encoding mode is not applied when its coding efficiency is markedly low, which prevents image quality degradation. The present invention also uses image feature information to predict the generated code amount accurately, so that the choice between an encoding mode with high coding efficiency (and high power consumption) and an encoding mode with low power consumption (and low coding efficiency) can be made more appropriately.

(First Embodiment)
In the moving image coding scheme of the present invention, whether to use the direct mode, and which motion vector is optimal, are decided with the following cost function.
for (i = 0; i < search_number; i++) {
    if (i == 0) {
        COST[i] = SAD[i] + MV[i] - OFFSET;
        if (COST[i] < Thr) break;
    }
    else {
        COST[i] = SAD[i] + MV[i];
    }
}
Here i = 0 is the direct mode evaluation. OFFSET is an offset amount defined as follows.

In the above equation, the encoding difficulty is image feature information that indicates how readily code is generated; image activity is generally used for it. Image activity is explained with reference to FIG. 7, a graph of the relationship between the quantization scale value (Q scale) and the generated code amount, with the quantization scale value on the horizontal axis and the generated code amount on the vertical axis. As FIG. 7 shows, even at the same quantization scale value, the generated code amount is larger for images with higher activity. The higher the activity (the higher the spatial frequency of the image, and the higher its encoding difficulty), the more code is generated; the lower the activity (the lower the spatial frequency and encoding difficulty), the less code is generated.
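The text does not state how the activity is computed. As one hedged illustration, the C sketch below uses the pixel variance of the block, which is an assumption of this example rather than the measure specified by the publication; the function name `block_activity` is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Activity of an n x n block of 8-bit samples, taken here (as an assumption)
   to be the integer part of the pixel variance:
     variance = (N * sum(p^2) - (sum(p))^2) / N^2, with N = n * n.
   `stride` is the number of bytes per image row. */
uint32_t block_activity(const uint8_t *blk, int stride, int n)
{
    assert(n > 0);
    uint64_t sum = 0, sumsq = 0;
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++) {
            uint64_t p = blk[y * stride + x];
            sum += p;
            sumsq += p * p;
        }
    }
    uint64_t count = (uint64_t)n * (uint64_t)n;
    return (uint32_t)((count * sumsq - sum * sum) / (count * count));
}
```

A flat block gives an activity of 0, while a block with strong high-frequency content, such as a black-and-white checkerboard, gives a large value, in line with the relationship between activity and generated code amount described above.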

Extracting the image activity, from which the generated code amount can be predicted, and reflecting it in the encoding mode decision allows the encoding mode to be decided more appropriately. Without the activity, drawbacks arise: for example, a high-compression encoding mode with high power consumption may be selected even for an image that in fact generates little code.

α_POWER is power consumption information determined by the ratio of the power consumed by a normal motion vector search to the power consumed by the low power direct mode. It is a parameter that the user specifies in advance according to the configuration and implementation of each encoding mode, and it always takes a positive value. The larger the difference between the normal encoding mode and the low power consumption encoding mode, that is, the greater the power saving, the larger α_POWER becomes. As described later, this parameter can also be set according to an operation mode specified by the user. Because α_POWER is always positive, OFFSET grows as the bit rate headroom increases and as the encoding difficulty decreases.
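The defining equation for OFFSET does not appear in this text, so the following C sketch is purely hypothetical. It is not the formula of the publication; it is only one simple function that has the properties the description states: it scales with a positive α_POWER, grows with the bit rate headroom, and shrinks as the encoding difficulty rises. The function name, the clamp at zero headroom, and the `1.0 + difficulty` denominator are all assumptions.

```c
#include <assert.h>

/* HYPOTHETICAL offset, for illustration only (not the patent's equation):
     offset = alpha_power * headroom / (1 + difficulty)
   where headroom = target_bitrate - generated_bitrate, and the offset is
   zero when there is no headroom. */
double offset_sketch(double alpha_power, double target_bitrate,
                     double generated_bitrate, double difficulty)
{
    assert(alpha_power > 0.0 && difficulty >= 0.0);
    double headroom = target_bitrate - generated_bitrate;
    if (headroom <= 0.0)
        return 0.0;  /* no headroom: no bias toward the direct mode */
    return alpha_power * headroom / (1.0 + difficulty);
}
```

Any other function with the same monotonic behaviour would serve equally well for the purpose of this illustration.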

As OFFSET grows, the cost of the direct mode becomes lower relative to the normal motion vector search modes. If the cost of the direct mode falls below the threshold Thr, the direct mode is chosen as the encoding mode and no motion vector search is performed. As a result, very little power is needed to encode the block. Making the direct mode easier to select according to the bit rate, the image feature information, and the power consumption information in this way reduces power consumption.
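The early termination described above can be sketched in C as follows. The callback interface `SadFn`, the `ModeDecision` result type, and the helper `sad_from_table` are hypothetical names introduced for this example; the callback stands in for one reference-region read plus one block matching operation, so the `evaluations` count is a rough proxy for the power spent.

```c
#include <assert.h>
#include <stddef.h>

/* Performs one block matching evaluation for candidate i and returns SAD[i].
   Hypothetical interface for this sketch. */
typedef long (*SadFn)(size_t i, void *ctx);

typedef struct {
    size_t best_index;   /* 0 means the direct mode */
    size_t evaluations;  /* block matching operations actually performed */
} ModeDecision;

/* Cost decision of the first embodiment: the direct mode (i = 0) is biased
   by `offset`, and the search stops at once if its cost is below `thr`. */
ModeDecision decide_mode(SadFn sad_of, void *ctx, const long mv_bits[],
                         size_t search_number, long offset, long thr)
{
    assert(search_number > 0);
    ModeDecision d = { 0, 1 };
    long best_cost = sad_of(0, ctx) + mv_bits[0] - offset;
    if (best_cost < thr)
        return d;  /* direct mode chosen; motion vector search skipped */
    for (size_t i = 1; i < search_number; i++) {
        long cost = sad_of(i, ctx) + mv_bits[i];
        d.evaluations++;
        if (cost < best_cost) {
            best_cost = cost;
            d.best_index = i;
        }
    }
    return d;
}

/* Example callback: reads precomputed SAD values from an array of long. */
long sad_from_table(size_t i, void *ctx)
{
    return ((const long *)ctx)[i];
}
```

With SAD = {500, 300, 480}, MV = {0, 250, 10}, and Thr = 100, an offset of 0 leads to three evaluations and selects candidate 2, whereas an offset of 450 brings the direct mode cost down to 50, so the decision ends after a single evaluation.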

FIG. 1 is a block diagram of the moving image encoding apparatus according to the present invention. As shown in FIG. 1, the apparatus includes a frame memory 101, a subtractor 102, an orthogonal transformer 103, a quantizer 104, a scan processor 105, an entropy encoder 106, an inverse quantizer 107, an inverse orthogonal transformer 108, and an adder 109.
It also includes a frame memory 110, a motion vector generator 111, a block matching calculator 112, an offset amount calculator 113, an activity extractor 114, a cost function calculator 115, and a reference region generator 116.

The frame memory 101 stores a plurality of frames of the input moving image and reorders the frames for processing so that bidirectional inter-frame prediction can be performed.
The subtractor 102, in inter-frame predictive coding that predicts from temporally different frames, subtracts the reference region data output by the reference region generator 116 from the input processing target region (motion prediction) and outputs the resulting prediction error to the orthogonal transformer 103.
The orthogonal transformer 103 applies an orthogonal transform to the data from the subtractor 102 block by block (this block does not necessarily coincide with the region used for motion prediction) and outputs the orthogonal transform coefficients to the quantizer 104.
The quantizer 104 quantizes the orthogonal transform coefficients using the quantization table value for each position in the region and the quantization scale value of the region, and outputs all quantized orthogonal transform coefficients to the scan processor 105 and the inverse quantizer 107.
The scan processor 105 performs scan processing, such as a zigzag scan, according to the encoding mode.
The entropy encoder 106 entropy-codes the output of the scan processor 105 and outputs the result as the code.

In the moving image encoding apparatus shown in FIG. 1, local decoding is performed using the inverse quantizer 107 and the inverse orthogonal transformer 108 so that motion vector search and motion prediction can be carried out. The inverse quantizer 107 inverse-quantizes the quantized orthogonal transform coefficients of the region using the quantization scale value of the region and outputs the inverse-quantized coefficients to the inverse orthogonal transformer 108.
The inverse orthogonal transformer 108 applies an inverse orthogonal transform to the inverse-quantized coefficients block by block and outputs the decoded prediction error to the adder 109.
The adder 109 adds the prediction value output by the reference region generator 116 to the decoded prediction error from the inverse orthogonal transformer 108 and stores the sum in the frame memory 110 as decoded, reconstructed image data.
The motion vector generator 111 calculates the motion vectors corresponding to the direct mode and to the motion vector search candidates for the next processing target region and outputs them to the cost function calculator 115. At the same time, it outputs the address for reading the reference region corresponding to the direct mode or to each motion vector from the frame memory 110. The reference region read from the frame memory 110 is output to the block matching calculator 112.
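The reconstruction step performed by the adder 109 in the local decoding loop can be illustrated with a few lines of C. The function name, the 8-bit sample depth, and the clipping to the range 0 to 255 are assumptions of this sketch; the text only states that the prediction value and the decoded prediction error are added.

```c
#include <assert.h>
#include <stdint.h>

/* Reconstructed sample = prediction + decoded prediction error,
   clipped to the 8-bit range (8-bit samples are an assumption here). */
void reconstruct_pixels(const uint8_t *pred, const int16_t *resid,
                        uint8_t *recon, int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        int v = (int)pred[i] + (int)resid[i];
        if (v < 0) v = 0;
        if (v > 255) v = 255;
        recon[i] = (uint8_t)v;
    }
}
```

Because the encoder searches against these reconstructed samples rather than the original input, its reference frames match the ones a decoder will have.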

The block matching calculator 112 performs block matching between the supplied reference region and the processing target region and outputs the resulting SAD value to the cost function calculator 115.
The activity extractor 114 calculates the activity of the processing target region and outputs it to the offset amount calculator 113.
The offset amount calculator 113 calculates the offset amount from the target bit rate, the generated bit rate, the power consumption information α_POWER, and the activity of the processing target region supplied by the activity extractor 114, and outputs it to the cost function calculator 115.
The cost function calculator 115 determines the optimal motion vector or the direct mode from the input SAD, motion vector information, and offset amount, and outputs the result to the reference region generator 116.
The reference region generator 116 outputs the reference region corresponding to the determined motion vector or direct mode to the subtractor 102 for motion prediction.

(Second Embodiment)
FIG. 3 shows the internal configuration of a digital camera to which the moving image coding scheme according to the present invention is applied. The digital camera includes a lens 301, an image sensor 302, a signal processing circuit 303, a storage device 304, a moving image encoding apparatus 305 to which the present invention is applied, a recording medium 306, a battery 307, and a user interface 308.

Light from the subject enters the image sensor 302 through the lens 301 and is converted into a video signal. The signal processing circuit 303 applies color space conversion, noise removal, and similar processing to the video signal.
The video signal output from the signal processing circuit 303 is stored temporarily in the storage device 304 to adjust the processing speed. The video signal read from the storage device 304 is input to the moving image encoding apparatus 305 and output as encoded data, which is recorded on the recording medium 306, such as a flash memory.
The battery 307 supplies power to the entire device, including the moving image encoding apparatus 305.

Here, through the user interface 308, the user specifies the operation mode of the moving image encoding device 305 in addition to operations such as shooting and playback. The user can select either a high compression efficiency mode or a low power consumption mode as the operation mode. When the high compression efficiency mode is selected, the power consumption amount information α_POWER (see the first embodiment) in the moving image encoding device 305 becomes small, so that the highest possible compression efficiency is achieved even at the cost of higher power consumption.
In the low power consumption mode, on the other hand, encoding is performed with low power consumption even at the cost of lower compression efficiency, so the power consumption amount information α_POWER in the moving image encoding device 305 becomes large. Because the moving image encoding device 305 consumes less power in the low power consumption mode, the battery 307 can be used for a longer time.
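As a rough illustration of how the operation mode could influence encoding mode selection, the sketch below weights each candidate's power consumption by α_POWER and adds it as an offset to a cost. The additive form of the cost, the α_POWER values, the candidate names, and all numbers are assumptions made for this example. The text states only that α_POWER is small in the high compression efficiency mode and large in the low power consumption mode, and that the cost function draws on a code amount estimate, the motion vector code amount, and a power-dependent offset.

```python
from dataclasses import dataclass

# Hypothetical weights: only "small" vs. "large" comes from the description.
ALPHA_POWER = {"high_compression": 0.5, "low_power": 4.0}


@dataclass(frozen=True)
class Candidate:
    name: str
    est_code_amount: float  # code amount estimate from block matching
    mv_code_amount: float   # code amount of the motion vector information
    power: float            # relative power consumption (assumed units)


def cost(c: Candidate, alpha_power: float) -> float:
    # Assumed form: the power-dependent offset is simply added to the cost.
    offset = alpha_power * c.power
    return c.est_code_amount + c.mv_code_amount + offset


def select_mode(candidates, operation_mode: str) -> Candidate:
    alpha = ALPHA_POWER[operation_mode]
    return min(candidates, key=lambda c: cost(c, alpha))


# Invented example values, for illustration only.
CANDIDATES = [
    Candidate("bidirectional_with_mv", 100.0, 20.0, 30.0),
    Candidate("direct_no_mv", 150.0, 0.0, 10.0),
]

if __name__ == "__main__":
    for op_mode in ALPHA_POWER:
        print(op_mode, "->", select_mode(CANDIDATES, op_mode).name)
```

With these invented numbers, a small α_POWER favors the candidate with the lower code amount, while a large α_POWER shifts the choice toward the candidate that consumes less power.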

If the capacity of the user's recording media is small, the shooting time is limited by the media capacity, so the user should select the high compression efficiency mode. If the media capacity is sufficiently large, the shooting time is limited by the battery capacity and power consumption of the digital camera, so the user should select the low power consumption mode.
Thus, according to the present invention, the user can secure a longer shooting time by selecting the operation mode appropriate to the media on hand and other conditions.
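The trade-off described above can be checked with simple arithmetic: the shooting time is the smaller of the time until the media fills up and the time until the battery runs out. The following sketch is only an illustration; the bit rates, power figures, and function names are hypothetical and do not come from the embodiment.

```python
# Hypothetical per-mode figures: the high compression efficiency mode is
# assumed to halve the bit rate while doubling the power draw.
MODES = {
    "high_compression": {"bitrate_bps": 4_000_000, "power_w": 4.0},
    "low_power": {"bitrate_bps": 8_000_000, "power_w": 2.0},
}


def shooting_time_s(media_bytes, battery_wh, bitrate_bps, power_w):
    """Seconds of recording before the media or the battery is exhausted."""
    media_limit = media_bytes * 8 / bitrate_bps
    battery_limit = battery_wh * 3600 / power_w
    return min(media_limit, battery_limit)


def best_mode(media_bytes, battery_wh):
    """Return the operation mode that gives the longest shooting time."""
    return max(
        MODES,
        key=lambda m: shooting_time_s(media_bytes, battery_wh, **MODES[m]),
    )


if __name__ == "__main__":
    print(best_mode(media_bytes=1_000_000_000, battery_wh=5.0))   # small media
    print(best_mode(media_bytes=64_000_000_000, battery_wh=5.0))  # large media
```

Under these assumed figures, a 1 GB medium is the bottleneck and the high compression efficiency mode records longer, whereas with a 64 GB medium the battery becomes the bottleneck and the low power consumption mode records longer.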

Each means constituting the moving image encoding device and each step of the moving image encoding method in the embodiments of the present invention described above can be realized by running a program stored in a RAM, ROM, or the like of a computer. This program and a computer-readable recording medium on which the program is recorded are included in the present invention.

The present invention can also be embodied as, for example, a system, an apparatus, a method, a program, or a recording medium; specifically, it may be applied to an apparatus consisting of a single device.

The present invention also includes the case where a software program that realizes the functions of the above-described embodiments is supplied directly or remotely to a system or apparatus, and the invention is achieved by a computer of that system or apparatus reading out and executing the supplied program code.

Accordingly, the program code itself that is installed in a computer in order to realize the functional processing of the present invention on that computer also realizes the present invention. In other words, the present invention includes the computer program itself for realizing the functional processing of the present invention. In that case, as long as it has the functions of the program, it may take the form of object code, a program executed by an interpreter, script data supplied to an OS, or the like.

The functions of the above-described embodiments are realized when the computer executes the read program. Furthermore, based on the instructions of the program, an OS or the like running on the computer may perform part or all of the actual processing, and the functions of the above-described embodiments can also be realized by that processing.

As yet another method, a program read from a recording medium is first written into a memory provided in a function expansion board inserted into the computer or in a function expansion unit connected to the computer. Then, based on the instructions of the program, a CPU or the like provided in the function expansion board or function expansion unit performs part or all of the actual processing, and the functions of the above-described embodiments are also realized by that processing.

A diagram showing the configuration of the moving image encoding device according to the first embodiment.
A diagram showing the time-series transition of the generated code amount according to the first embodiment.
A diagram showing the configuration of the imaging device according to the second embodiment.
A diagram showing the concept of inter-frame prediction.
A diagram showing the concept of bidirectional inter-frame prediction.
A diagram showing the direct mode in bidirectional inter-frame prediction.
A diagram showing the relationship between the quantization scale value and the generated code amount.

Explanation of symbols

101 Frame memory
102 Subtractor
103 Orthogonal transformer
104 Quantizer
105 Scan converter
106 Entropy encoder
107 Inverse quantizer
108 Inverse orthogonal transformer
109 Adder
110 Frame memory
111 Motion vector generator
112 Block matching calculator
113 Offset amount generator
114 Activity extractor
115 Cost function calculator
116 Reference region generator
301 Lens
302 Image sensor
303 Signal processing circuit
304 Storage device
305 Moving image encoding device
306 Recording medium
307 Battery
308 User interface

Claims

1. A moving picture encoding method for encoding a moving picture by selecting one encoding mode from among a plurality of encoding modes, wherein the encoding mode to be selected is determined from a bit rate of the moving picture that has already been encoded, a target bit rate, and power consumption information corresponding to each encoding mode.

2. The moving picture encoding method according to claim 1, wherein the encoding mode to be selected is determined using a cost function, and the cost function is determined based on code amount estimation information calculated by block matching, motion vector code amount information, and an offset amount determined using the power consumption amount information corresponding to each encoding mode.

3. The moving picture encoding method according to claim 1, wherein the encoding modes to be selected include a bidirectional inter-frame prediction encoding scheme, and the bidirectional inter-frame prediction encoding scheme includes a mode that encodes motion vector information and a mode that does not encode motion vector information.

4. The moving image encoding method according to claim 1, wherein the selected encoding mode is determined using image feature information of an input moving image.