JPH06113290A - Motion vector detector

Info

Publication number: JPH06113290A
Application number: JP10543293A
Authority: JP
Inventors: Shinichi Uramoto; 紳一浦本; Mitsuyoshi Suzuki; 光義鈴木; Akihiko Takahata; 明彦高畠
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1992-08-13
Filing date: 1993-05-06
Publication date: 1994-04-22
Anticipated expiration: 2015-09-04
Also published as: JP3084170B2

Abstract

PURPOSE: To obtain a motion vector detector that occupies a small area, consumes little power, and operates at high speed. CONSTITUTION: The motion vector detector includes a processor array (10) in which element processors that store search window data and template data are arranged in a two-dimensional array. Each element processor in the processor array (10) has a function for computing the absolute difference between its stored search window data and template data, and a function for shifting its stored data to an adjacent element processor. The motion vector detector further includes a summation section (12) that sums the absolute differences output from the element processors of the processor array (10), and a comparator section (3) that detects the motion vector for the template block from the output of the summation section (12).

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a device for detecting motion vectors used for motion compensation of moving images.

[0002]

[Prior Art] Transmitting or storing image signals, which carry an enormous amount of data, requires data compression techniques that reduce the data volume. Image data contains considerable redundancy arising from the correlation between neighboring pixels and from the characteristics of human perception. Data compression that suppresses this redundancy to reduce the amount of transmitted data is called high-efficiency coding. One such scheme is inter-frame predictive coding, which performs the following processing.

[0003]

A prediction error is calculated as the difference between each pixel of the current frame being encoded and the pixel at the same position in the preceding frame used as the reference. The calculated prediction error is used for subsequent encoding. For images with little motion, the correlation between frames is high, so this method encodes efficiently. For images with large motion, however, the correlation between frames is low, so the prediction error becomes large and the amount of data to be transmitted actually increases.

[0004]

Motion-compensated inter-frame predictive coding addresses this problem. In this method, before the prediction error is calculated, a motion vector is computed from the pixel data of the current frame and the previous frame. The predicted image of the previous frame is then shifted according to the computed motion vector. That is, pixel data at the position in the previous frame displaced by the motion vector is taken as the reference pixel and used as the prediction value. The prediction error of each pixel between the shifted previous frame and the current frame is then calculated, and the prediction error and the motion vector are transmitted.
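The effect of motion compensation on the prediction error can be illustrated with a small sketch. The function names, the toy frames, and the border clamping are assumptions made for illustration and are not taken from the patent.

```python
def prediction_error(cur, prev, mv=(0, 0)):
    """Per-pixel difference between the current frame and the previous frame
    read at the position displaced by the motion vector mv = (di, dj).
    Frames are lists of rows. Reference positions outside the previous frame
    are clamped to the border (an assumption of this sketch)."""
    di, dj = mv
    h, w = len(cur), len(cur[0])
    err = []
    for r in range(h):
        row = []
        for c in range(w):
            rr = min(max(r + di, 0), h - 1)
            cc = min(max(c + dj, 0), w - 1)
            row.append(cur[r][c] - prev[rr][cc])
        err.append(row)
    return err


def energy(err):
    """Sum of absolute prediction errors, a rough proxy for data to transmit."""
    return sum(abs(v) for row in err for v in row)


# A bright 2 x 2 patch moves one pixel to the right between frames.
prev = [[0, 0, 0, 0, 0],
        [0, 9, 9, 0, 0],
        [0, 9, 9, 0, 0],
        [0, 0, 0, 0, 0]]
cur = [[0, 0, 0, 0, 0],
       [0, 0, 9, 9, 0],
       [0, 0, 9, 9, 0],
       [0, 0, 0, 0, 0]]

plain = energy(prediction_error(cur, prev))                    # no compensation
compensated = energy(prediction_error(cur, prev, mv=(0, -1)))  # reference shifted left
print(plain, compensated)
```

In this toy case the uncompensated error is nonzero while the compensated error vanishes, which is the situation the text describes for images with large motion.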

[0005]

FIG. 86 is a block diagram showing the overall configuration of an encoder that encodes image data according to a conventional motion-compensated predictive coding scheme. In FIG. 86, the encoder includes a preprocessing circuit 910 that performs predetermined preprocessing on the input image signal; a source coding circuit 912 that removes redundancy from, and quantizes, the signal preprocessed by the preprocessing circuit 910; and a video multiplex coding circuit 914 that encodes the signal from the source coding circuit 912 according to a predetermined format and multiplexes it into a code string having a predetermined data structure.

[0006]

The preprocessing circuit 910 converts the input image signal into the common intermediate format (CIF) using temporal and spatial filters, and performs filtering for noise removal.

[0007]

The source coding circuit 912 applies an orthogonal transform, such as the discrete cosine transform (DCT), to the given signal, performs motion compensation on the input signal, and quantizes the orthogonally transformed image data.

[0008]

The video multiplex coding circuit 914 applies two-dimensional variable-length coding to the given image signal, also variable-length codes the various attributes (such as motion vectors) of each block, which is the unit of data processing, and then multiplexes the result into a code string having a predetermined data structure.

[0009]

The encoder further includes a transmission buffer 916 that buffers the image data from the video multiplex coding circuit 914, and a transmission coding circuit 918 that adapts the image data from the transmission buffer 916 to the transmission channel.

[0010]

The transmission buffer 916 smooths the information generation rate to a constant rate. The transmission coding circuit 918 adds error correction bits, audio signal data, and the like.

[0011]

FIG. 87 shows an example of a specific configuration of the source coding circuit shown in FIG. 86. In FIG. 87, the source coding circuit includes a motion-compensated predictor 920 that detects a motion vector for the input image signal and generates reference pixels motion-compensated according to that vector; a loop filter 922 that filters the reference pixel data from the motion-compensated predictor 920; a subtractor 924 that takes the difference between the output of the loop filter 922 and the input image signal; an orthogonal transformer 926 that orthogonally transforms the output of the subtractor 924; and a quantizer 928 that quantizes the data transformed by the orthogonal transformer 926.

[0012]

The configuration of the motion-compensated predictor 920 is described in detail later. It includes a frame memory that stores the pixel data of the preceding frame, and it detects the motion vector and generates motion-compensated reference pixel data from the input image signal data and the pixel data in this frame memory. The loop filter 922 is provided to improve image quality.

[0013]

The orthogonal transformer 926 applies an orthogonal transform such as the DCT to the data from the subtractor 924, taking a block of predetermined size (usually 8 × 8 pixels) as one unit. The quantizer 928 quantizes the orthogonally transformed pixel data.

[0014]

The motion-compensated predictor 920 and the subtractor 924 perform motion-compensated inter-frame prediction, which removes temporal redundancy from the moving image signal. The orthogonal transform in the orthogonal transformer 926 removes spatial redundancy from the moving image signal.

[0015]

The source coding circuit further includes an inverse quantizer 930 that converts the data quantized by the quantizer 928 back to its pre-quantization signal state; an inverse orthogonal transformer 932 that applies the inverse orthogonal transform to the output of the inverse quantizer 930; and an adder 934 that adds the output of the loop filter 922 to the output of the inverse orthogonal transformer 932. The inverse quantizer 930 and the inverse orthogonal transformer 932 generate the image used for inter-frame prediction of the next frame. The generated image data is written to the frame memory included in the motion-compensated predictor 920. Because the input image signal (the inter-frame difference data) is added, the data of the current frame is reproduced. This sequence of inverse quantization, inverse orthogonal transform, and addition is generally called the local decoding process.
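The local decoding loop can be sketched as follows. This is a simplified illustration that omits the orthogonal transform and the loop filter and uses a uniform quantizer; the function name, the step size, and the sample values are assumptions, not details from the patent.

```python
def encode_block(cur, pred, step):
    """Quantize the prediction residual, then locally decode it.

    cur   -- pixel values of the current block (flat list)
    pred  -- motion-compensated prediction for the same pixels
    step  -- quantizer step size (hypothetical uniform quantizer)

    Returns the quantized levels that would be transmitted and the
    reconstruction that would be written back to the frame memory."""
    levels = [round((c - p) / step) for c, p in zip(cur, pred)]
    # Local decoding: inverse quantization followed by adding the prediction.
    recon = [p + q * step for p, q in zip(pred, levels)]
    return levels, recon


cur = [101, 104, 97, 120]
pred = [98, 105, 90, 100]
levels, recon = encode_block(cur, pred, step=4)
print(levels)  # quantized residual
print(recon)   # reconstructed pixels, generally not identical to cur
```

The point of the sketch is that the frame memory holds the reconstructed values rather than the original input, so the encoder predicts from the same data a decoder would have.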

[0016]

Next, the calculation of the motion vector is described concretely. Motion vectors are generally calculated by the block matching method.

[0017]

As shown in FIG. 88(A), consider the case where an image A in the (m−1)th frame has moved to A′ in the mth frame. In the block matching method, an image (one frame) is divided into blocks of P × Q pixels (generally P = Q). For the block of interest in the current frame, the most similar block is searched for in the previous frame. The displacement from the block of interest to that most similar block in the previous frame is called the motion vector. This is described in more detail below.

[0018]

As shown in FIG. 88(B), let the mth frame be the frame to be encoded. The frame is divided into blocks of N × N pixels. Let Xm(Nk, Nl) denote the pixel value at the top-left pixel position (Nk, Nl) of an N × N block in the mth frame. The sum of absolute differences between the block in the current frame and the block in the previous frame whose pixel positions are shifted by (i, j) is computed. The shift (i, j) is then varied over a range of values, and the sum of absolute differences is computed for each. The shift (i, j) that gives the minimum sum of absolute differences is called the motion vector.

[0019]

One motion vector must be transmitted per pixel block. Reducing the block size increases the amount of transmitted information and prevents effective data compression. Increasing the block size, on the other hand, makes effective motion detection difficult. The block size is therefore generally set to 16 × 16 pixels, and the motion vector search range (the maximum variation of i and j) to −15 to +15. The calculation of the motion vector by the block matching method is described concretely below.

[0020]

FIG. 89 illustrates how a motion vector is calculated by the block matching method. Consider an image 950 of 352 dots × 288 lines. The image 950 is divided into blocks of 16 × 16 pixels each. Motion vectors are detected block by block. Taking as the reference the block 954 in the previous frame at the same position as the block 952 to be processed (hereinafter, the template block), the block larger by ±16 pixels in the horizontal and vertical directions, that is, the block 956 of 48 × 48 pixels centered on block 954, is the search block (hereinafter, the search area). The search for the motion vector of the template block 952 is performed within this search area. The motion vector search according to the block matching method comprises the following processing steps.
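The figures in this paragraph imply a substantial amount of arithmetic per frame. The following back-of-the-envelope sketch uses only the numbers stated in the text; the variable names are illustrative, and the per-frame total is an upper bound that ignores candidates falling outside the picture at frame borders.

```python
# Figures from the text: 352 x 288 image, 16 x 16 blocks, +/-16 pixel search.
WIDTH, HEIGHT = 352, 288
BLOCK = 16
RANGE = 16

blocks_per_frame = (WIDTH // BLOCK) * (HEIGHT // BLOCK)
search_area_side = BLOCK + 2 * RANGE                   # side of the search area
candidates_per_block = (2 * RANGE + 1) ** 2            # displacements -16..+16 in each direction
abs_diffs_per_block = candidates_per_block * BLOCK * BLOCK
abs_diffs_per_frame = blocks_per_frame * abs_diffs_per_block

print(blocks_per_frame, search_area_side, candidates_per_block)
print(abs_diffs_per_block, abs_diffs_per_frame)
```

Roughly 110 million absolute-difference operations per frame for an exhaustive search helps explain why dedicated hardware is considered for this task.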

[0021]

A predicted image block (indicated by (i, j) in FIG. 89) having the displacement corresponding to a motion vector candidate is obtained. An evaluation function value, such as the sum of absolute differences (or the sum of squared differences) between pixels at corresponding positions of this block and the template block, is then computed.

[0022]

This operation is performed for every displacement (i, j) from (−16, −16) to (+16, +16). After the evaluation function (evaluation value) has been computed for all predicted image blocks, the predicted image block with the minimum evaluation value is detected. The vector from the block at the same position as the template block (hereinafter, the "directly behind" position; block 954, indicated by (0, 0) in FIG. 89) to the predicted image block with the minimum evaluation value is determined to be the motion vector for this template block.
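The exhaustive search described in the steps above can be sketched in a few lines. The function names, the small synthetic frame, and the choice to skip candidates that fall outside the frame are assumptions of this sketch rather than details from the patent.

```python
def sad(template, frame, top, left):
    """Sum of absolute differences between the template and the block of the
    same size whose top-left corner is at (top, left) in frame."""
    n = len(template)
    return sum(abs(template[m][k] - frame[top + m][left + k])
               for m in range(n) for k in range(n))


def full_search(template, prev, top, left, search_range):
    """Evaluate every displacement (i, j) in [-search_range, +search_range]
    around the co-located position (top, left) in the previous frame and
    return (best displacement, best SAD)."""
    n = len(template)
    h, w = len(prev), len(prev[0])
    best = None
    for i in range(-search_range, search_range + 1):
        for j in range(-search_range, search_range + 1):
            r, c = top + i, left + j
            if r < 0 or c < 0 or r + n > h or c + n > w:
                continue  # candidate lies outside the frame (skipped in this sketch)
            cost = sad(template, prev, r, c)
            if best is None or cost < best[1]:
                best = ((i, j), cost)
    return best


# Synthetic 8 x 8 previous frame with distinct pixel values.
prev = [[8 * r + c for c in range(8)] for r in range(8)]
# Template: the 2 x 2 block found at row 3, column 4 of the previous frame.
template = [row[4:6] for row in prev[3:5]]
# The template sits at (2, 2) in the current frame, so the displacement is (1, 2).
best = full_search(template, prev, top=2, left=2, search_range=2)
print(best)
```

The returned displacement is the vector from the co-located block to the best-matching block, matching the definition of the motion vector in the text.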

[0023]

Various hardware configurations for obtaining such motion vectors have been proposed.

[0024]

FIG. 90 shows the overall configuration of a conventional motion vector detection device, described for example by A. Artieri et al. in the Proceedings of ICASSP '89, IEEE, 1989, pp. 2453 to 2456. In FIG. 90, the motion vector detection device includes a search area input register 962 for inputting one column of search area pixel data; a processor array 966 containing a plurality of processors arranged in a matrix of rows and columns of the same size as the evaluation points of the template block; search area side registers 964a and 964b that store data of the same column of the search area for the processor array; and a motion vector detection unit 968 that detects the motion vector from the computation results of the processor array 966.

[0025]

In the processor array 966, processors are arranged in correspondence with the displacement vectors (i, j). That is, the processor Pij in the ith row and jth column performs the calculation for the displacement vector D(i, j).

[0026]

FIG. 91 shows the configuration of a processor included in the processor array shown in FIG. 90. In FIG. 91, the processor 970 includes a three-input register 972 that receives search area data transferred from processors in three directions, horizontal and vertical, in the array, and passes one of these inputs in response to a selection signal SEL; a distortion calculation unit 974 that computes the distortion (sum of absolute differences) from the search area data Y supplied by the three-input register 972 and the externally supplied template block data X; and a two-input register 976 that receives the distortion D from the distortion calculation unit 974 and the distortion from the horizontally adjacent processor, and selectively passes one of them according to a selection signal To.

[0027]

In the processor array shown in FIG. 90, these processors are arranged two-dimensionally, one for each displacement vector that is a motion vector candidate in the search area. The same template block data X is supplied to every processor 970 of the processor array 966 (see FIG. 90). At the same time, each processor 970 is supplied with the corresponding data of the search area block. For example, when the template block data X is X(m, n), the processor Pij is supplied with the search area block data Y(i+m, j+n). The search window data is transferred through the search area side registers 964a and 964b and through the processors 970 of the processor array 966. To supply exactly the search area block data Y(m+i, n+j) for the externally supplied template block data X(m, n), the template block data and the search area block data must be scanned in a regular order.

[0028]

FIG. 92 shows how the template block data is scanned. In FIG. 92, as indicated by the arrows, the template block data in the template block 999 is generated by first scanning one column from top to bottom and then scanning the data of the adjacent column from bottom to top. This scanning method is called a "snake scan". The search area block data supplied to the processor array is scanned in the same way, following the snake scan of the template block data. Depending on its position in the array, each processor must transfer the search area data either upward or downward in the figure or to the left in FIG. 91. The three-input register 972 is provided for this purpose.
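The snake scan order can be written down as a short generator. This is only a sketch of the ordering; the function name and the zero-based (row, column) indexing are assumptions.

```python
def snake_scan(rows, cols):
    """Yield (row, col) positions column by column: top to bottom in the
    first column, bottom to top in the next, and so on."""
    for c in range(cols):
        row_order = range(rows) if c % 2 == 0 else range(rows - 1, -1, -1)
        for r in row_order:
            yield (r, c)


order = list(snake_scan(3, 3))
print(order)

# Consecutive positions always differ by one step up, down, or sideways.
steps = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(order, order[1:])]
print(sorted(set(steps)))
```

Because every step in this order moves by exactly one position vertically or horizontally, the stored data only ever needs to shift to a neighboring processor in one of three directions, which is consistent with the role the text gives to the three-input register.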

[0029]

The two-input register 976 is provided to transfer the distortion computed in each processor to the motion vector detection unit 968 after each displacement vector has been computed, so that the motion vector detection unit 968 can find the displacement vector giving the minimum distortion. The motion vector detection unit 968 detects the minimum among the distortions from the processors and obtains the position of the processor giving that minimum, that is, the motion vector. The operation of the motion vector detection device shown in FIG. 90 is briefly described next.

[0030]

In the processor array 966, the processor Pij in the ith row and jth column computes the distortion D(i, j) given by

D(i, j) = Σ |X(m, n) − Y(m+i, n+j)|

where the summation Σ is taken over m and n. The range of m and n is determined by the size of the search area.

[0031]

Now, as shown in FIG. 93, consider a template block 980 of pixels arranged in M rows and N columns. In the first cycle, each processor in the processor array holds the search area block data indicated by reference numeral 982. The pixel X(1, 1) in the first row and first column of the template block 980 is supplied externally to all processors of the array. Each processor 970 computes the absolute difference between the search window data Y it holds and the supplied template block data X, and accumulates it.

[0032]

In the next cycle, the search area data in the processor array is shifted by one row downward in FIG. 93. In this state, the next pixel data X(2, 1) is supplied from the template block 980. The search area data held in the processor 970 is then Y(m+i+1, n+j). The absolute difference is again computed from these data and accumulated. This operation is repeated M times.

[0033]

After the above operation has been repeated M times, one column of search area pixel data is written in from outside through the search area input register 962 shown in FIG. 90. The one column of search area pixel data that is no longer needed is discarded. New search area data is thus stored in the search area side registers 964a and 964b and the processor array 966. This operation is executed repeatedly.

[0034]

That is, as shown in FIG. 94, the sum of absolute differences is first computed using the search window 990; after M cycles are complete, the same computation is performed with the data of the next search window 992, and the same operation is then repeated for search window 994 and so on. When the computation has finally been performed on the pixel data of the entire search area 996, the distortion D(i, j) has been obtained and is held in the processor Pij.
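The cycle-by-cycle behavior can be modeled functionally: one template pixel is broadcast per cycle, and every processor accumulates the absolute difference against the search area pixel it currently holds. The sketch below models only the arithmetic, not the register shifting; the function name, zero-based indices, and non-negative displacement indexing (origin at the top-left of the search area) are assumptions.

```python
def processor_array_sad(template, area, disp_rows, disp_cols):
    """Functional model of the array: D[i][j] accumulates
    |X(m, n) - Y(m + i, n + j)| while template pixels X(m, n) are
    broadcast in snake-scan order, one pixel per cycle."""
    M, N = len(template), len(template[0])
    D = [[0] * disp_cols for _ in range(disp_rows)]
    for n in range(N):
        rows = range(M) if n % 2 == 0 else range(M - 1, -1, -1)
        for m in rows:
            x = template[m][n]               # same pixel sent to every processor
            for i in range(disp_rows):
                for j in range(disp_cols):   # every processor works in the same cycle
                    D[i][j] += abs(x - area[m + i][n + j])
    return D


# 4 x 4 search area with distinct values; the 2 x 2 template is cut from
# the area at displacement (1, 2), so that processor should end with zero.
area = [[4 * r + c for c in range(4)] for r in range(4)]
template = [row[2:4] for row in area[1:3]]
D = processor_array_sad(template, area, disp_rows=3, disp_cols=3)
best = min(((i, j) for i in range(3) for j in range(3)),
           key=lambda p: D[p[0]][p[1]])
print(D)
print(best)
```

In hardware the two inner loops run in parallel, so the number of cycles depends on the template size rather than on the number of candidate displacements.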

[0035]

The distortion D(i, j) obtained in each processor Pij is transferred to the motion vector detection unit 968 (see FIG. 90), where the displacement giving the minimum distortion is detected as the motion vector.

[0036]

The motion vector detection device described above detects the motion vector using the pixel data of the search area and the pixel data of the template block. The motion vector obtained in this way is called an integer-precision motion vector, because the smallest unit of its horizontal and vertical components is one pixel. Integer-precision motion vectors are specified in the moving image coding scheme for videophones and videoconferencing (CCITT Recommendation H.261).

[0037]

In storage-oriented moving image coding schemes that use digital storage media, on the other hand, a detection precision of 1/2 pixel is required (hereinafter, 1/2-pixel precision; detection at any precision finer than integer precision is called fractional precision). In motion compensation with 1/2-pixel precision, not only are positions shifted in whole-pixel units on the reference frame examined, but positions between pixels are also generated by interpolating the data, and block matching is performed on this interpolated data as well. Motion compensation using 1/2-pixel-precision motion vectors is called "half-pel motion compensation".

[0038]

FIG. 95 schematically shows the overall configuration of a conventional fractional-precision motion vector detection device. The configuration shown in FIG. 95 is disclosed, for example, by Young et al. in the Proceedings of ICASSP '89, IEEE, 1989, pp. 2437 to 2440.

[0039]

In FIG. 95, the fractional-precision motion vector detection device includes a reference image frame memory 801 that stores the pixel data of the entire reference frame image one frame earlier; a current image frame memory 802 that stores the pixel data of the entire current frame image; and a first arithmetic unit 805 that reads search window block data and template block data from the frame memories 801 and 802 through data lines 807 and 809 and detects the motion vector with integer precision. The processing performed in the first arithmetic unit 805 is the same as that described above.

[0040]

The fractional-precision motion vector detection device further includes a fractional-precision search window memory 804 that receives, through a data line 808, and stores the search window block data from the reference image frame memory 801 for detecting the motion vector with fractional precision; a fractional-precision template memory 803 that receives, through a data line 810, and stores the template block data in use at that time from the current image frame memory 802; and a second arithmetic unit 806 that receives pixel data from the fractional-precision template memory 803 and the fractional-precision search window memory 804 through data lines 811 and 812, respectively, and detects the motion vector with fractional precision. The motion vector detected by the first arithmetic unit 805 is supplied to the second arithmetic unit 806 through a signal line 813. The second arithmetic unit 806 outputs the integer-precision motion vector and the newly obtained fractional-precision motion vector as a pair onto a signal line 814. The operation is described next.

[0041]

The first arithmetic unit 805 reads search window block data from the reference image frame memory 801 and template block data from the current image frame memory 802, and computes the evaluation function (evaluation value) for each displacement vector. After all the search window block data in the search area has been read from the reference image frame memory 801 and the evaluation values have been computed, the first arithmetic unit 805 detects the minimum evaluation value and determines the displacement vector corresponding to it as the motion vector.

[0042]

After the motion vector has been detected, the current image frame memory 802 transfers the template block data in use at that time to the fractional-precision template memory 803 through the data line 810. The reference image frame memory 801 transfers the search window block data corresponding to the displacement vector that gives the motion vector, together with its surrounding data, to the fractional-precision search window memory 804 through the data line 808. For example, when the search window block consists of 16 × 16 pixels, the reference image frame memory 801 supplies 18 × 18 pixels, including the surrounding pixel data, to the fractional-precision search window memory 804.

[0043]

As a result, as shown in FIG. 96, the template memory 803 stores the pixel data of the template block in region 820 (indicated by white circles), while the search window memory 804 stores the search window block data indicated by region 822. The second arithmetic unit 806 performs the following operations.

[0044] First, interpolation is performed on the pixel data (indicated by white circles) of the search window block 822 to generate interpolated data between pixels. FIG. 96 shows the case of detecting a motion vector with 1/2-pixel precision. The motion vector candidates are 16 displacement vectors containing fractional components between (−1/2, −1/2) and (1/2, 1/2). That is, in the case of 1/2-pixel precision, 16 search window blocks are formed and an evaluation value is calculated for each block. As indicated by the × marks in FIG. 96, when a motion vector is detected with 1/2-pixel precision, the interpolated data between two pixels P(A) and P(B) is calculated as (P(A) + P(B))/2, and the interpolated data among four pixels P(A), P(B), P(C), and P(D) is calculated as (P(A) + P(B) + P(C) + P(D))/4. The interpolated data is rounded. The same operation as in integer-precision motion vector detection is then performed on the generated interpolated data.
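The two interpolation formulas above can be sketched as follows. This is a minimal illustration, not the circuit of FIG. 95: the function name `half_pel_grid` is hypothetical, and round-half-up is an assumed rounding rule, since the text only says that the interpolated data is rounded.

```python
def half_pel_grid(block):
    """Build a half-pixel grid from integer-position pixels.

    Even (row, col) positions hold the original pixels. Positions between
    two pixels hold (P(A) + P(B)) / 2, and positions between four pixels
    hold (P(A) + P(B) + P(C) + P(D)) / 4, rounded half up (an assumption).
    """
    h, w = len(block), len(block[0])
    out = [[0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(2 * h - 1):
        for x in range(2 * w - 1):
            y0, x0 = y // 2, x // 2
            y1, x1 = y0 + (y % 2), x0 + (x % 2)
            # Summing the four corner samples covers all three cases:
            # at integer positions the corners coincide, and between two
            # pixels each of the two samples is counted twice.
            total = (block[y0][x0] + block[y0][x1]
                     + block[y1][x0] + block[y1][x1])
            out[y][x] = (total + 2) // 4
    return out
```

For an 18 × 18 input such as the data held in the search window memory 804, this produces a 35 × 35 grid from which half-pixel-shifted blocks can be read.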

[0045] FIG. 97 shows the relationship between the integer-precision motion vector and the fractional-precision motion vector. In FIG. 97, displacement vectors are arranged at one-pixel intervals in the vertical and horizontal directions (indicated by white circles). The second arithmetic unit obtains evaluation values for fractional-precision displacement vectors around the motion vector MI (indicated by broken-line circles in FIG. 97), and the displacement vector giving the minimum value among them is found. The fractional-precision (fine-precision) motion vector MF is obtained by combining this fractional component with the integer-precision motion vector MI.
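The combination of MI with the best fractional component can be sketched as below. The function name `refine_motion_vector` and the dictionary of evaluation values are hypothetical; the sketch only assumes that an evaluation value has already been computed for each fractional offset around MI.

```python
from fractions import Fraction


def refine_motion_vector(mi, fractional_costs):
    """Return MF = MI + the fractional offset with the smallest evaluation value.

    mi               -- integer-precision motion vector (x, y)
    fractional_costs -- mapping {(fx, fy): evaluation value} for the
                        fractional offsets examined around MI
    """
    best_offset = min(fractional_costs, key=fractional_costs.get)
    return (mi[0] + best_offset[0], mi[1] + best_offset[1])


# Usage: three candidate offsets around MI = (3, -2).
half = Fraction(1, 2)
example = refine_motion_vector(
    (3, -2), {(0, 0): 40, (half, 0): 35, (-half, half): 37})
```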

【００４６】[0046]

[Problems to be Solved by the Invention] In the motion vector detecting device shown in FIG. 90, the same template block data is supplied to all the processors in the processor array. The circuit that writes the template block pixel data therefore requires a large driving capability, so the current consumed in this write circuit increases and the power consumption of the device as a whole becomes large.

[0047] Further, in the motion vector detecting device shown in FIG. 90, each processor in the processor array corresponds to one displacement vector that is a motion vector candidate. If the search area is +16 to −16 in the vertical direction and −16 to +16 in the horizontal direction, the number of candidate displacement vectors is 33 × 33 = 1089. The number of processors becomes correspondingly very large, and the area occupied by the device increases.
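The candidate count follows directly from the search range: a range of ±r in one direction contains 2r + 1 integer displacements. A small sketch (the function name is hypothetical):

```python
def candidate_count(vertical_range, horizontal_range):
    """Number of integer displacement vectors in a +/- search range."""
    return (2 * vertical_range + 1) * (2 * horizontal_range + 1)
```

With a range of ±16 in both directions this gives the 1089 processors that the device of FIG. 90 would need.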

[0048] In each operation cycle, data is transferred via the processors. A 3-input register is used to determine the data transfer direction, which increases the power consumed during data transfer.

[0049] Furthermore, in the motion vector detecting device shown in FIG. 90, each processor calculates an evaluation value (distortion). That is, each processor computes the absolute difference (or squared difference) and accumulates the results. This increases both the area occupied by each processor and its current consumption.

[0050] In the configuration of the fractional-precision motion vector detecting device shown in FIG. 95, a frame memory for integer precision and a frame memory for fractional precision are provided. During fractional-precision motion vector detection, the required search window block data must be transferred from the integer-precision reference image frame memory to the fractional-precision search window memory. After this transfer, the search window block data must be read out again from the search window memory to detect the fractional-precision motion vector. The number of memory accesses therefore becomes extremely large, the throughput of the motion vector detecting device as a whole is limited by the memory access time, and motion vectors cannot be detected at high speed.

[0051] Similarly, the template block data must be read out from the template memory after being transferred from the current image frame memory to the fractional-precision template memory. The access time to the frame memory storing the current frame image data becomes long, the throughput of the motion vector detecting device is limited by the access time to the current image frame memory, and motion vectors cannot be detected at high speed.

[0052] An object of the present invention is therefore to provide a motion vector detecting device capable of detecting motion vectors at high speed.

[0053] Another object of the present invention is to provide a motion vector detecting device that operates with low power consumption.

[0054] Still another object of the present invention is to provide a motion vector detecting device that occupies a small area.

【００５５】[0055]

[Means for Solving the Problems] In summary, the motion vector detecting device according to the present invention arranges, in an array, element processors each of which stores mutually different template block data and mutually different search window block data, and drives the arrayed processors simultaneously to calculate the evaluation value of one displacement vector. The search window block data is transferred through the processor array along one direction, and search window block data that is no longer needed is sequentially shifted out.

[0056] During fractional-precision motion vector detection and during interpolative-prediction motion vector detection, the data shifted out of the processor array is used.

[0057] That is, the motion vector detecting device according to claim 1 comprises a plurality of element processors, each including transfer means for transferring data substantially along one direction. Each element processor includes first storage means for storing pixel data of a first block of the current frame image, second storage means for storing pixel data of a second block in the reference frame image that is related to the first block, and arithmetic means for performing a predetermined arithmetic operation on the data stored in the first and second storage means.

[0058] The motion vector detecting device according to claim 1 further comprises evaluation value generating means, responsive to the outputs of the arithmetic means of the plurality of element processors, for generating an evaluation value indicating the degree of correlation between the image of the first block and the image of the second block, and determining means for determining a motion vector for the first block in accordance with a plurality of evaluation values related to the first block supplied from the evaluation value generating means.

[0059] In the motion vector detecting device according to claim 2, the determining means of claim 1 determines a motion vector with integer precision. The device further comprises arithmetic processing means that receives reference frame image data and current frame image data and calculates the motion vector of the first block with a fractional precision finer than integer precision, and means for supplying at least one of the reference frame image data and the current frame image data output from the plurality of element processors to the arithmetic processing means as the data to be processed.

[0060] In the motion vector detecting device according to claim 3, each of the element processors of claim 1 includes, as the first storage means, storage means for storing M pixel data corresponding to mutually different pixels, and the second storage means includes storage means for storing N pixel data corresponding to mutually different reference frame pixels. M and N are both positive integers and satisfy N ≥ M.

[0061] In the motion vector detecting device according to claim 4, the arithmetic means of the element processor of claim 1 includes subtracting means that performs a subtraction between the pixel data of the first block and the pixel data of the second block and outputs the result as a combination of a sign bit indicating the sign and magnitude bits indicating the magnitude, and gate means that outputs a value corresponding to the absolute difference of the subtraction result by performing a modulo-2 addition (exclusive OR) of each magnitude bit and the sign bit from the subtracting means. The output of the arithmetic means is given as the pair of the sign bit and the value corresponding to the absolute difference.

[0062] In the motion vector detecting device according to claim 5, the evaluation value generating means of the device according to claim 4 includes a summing circuit that obtains the sum of the outputs of the arithmetic means. The summing circuit includes full adder circuits arranged in a tree in a plurality of stages, with all outputs transmitted to the next stage. The sign bits supplied from the arithmetic means are applied to the carry inputs of the least-significant-bit full adders of the full adder stages.

[0063] The motion vector detecting device according to claim 6 obtains, by block matching, the motion vector of a template block composed of pixels arranged in Q rows and P columns. It includes a processor array in which P linear processor arrays are arranged in parallel. Each linear processor array includes Q/M element processors, each having first storage means for storing M template block data and second storage means for storing N search window block data, and data buffer means cascade-connected to the Q/M element processors. The data buffer means includes means for storing R search window data. N + R indicates the size of one column of the search area. Each element processor further includes arithmetic means for performing predetermined processing on the data stored in the first storage means and the data stored in the second storage means.
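The array dimensions described for claim 6 can be worked out with a short sketch. The function name and the requirement that M divide Q evenly are assumptions made for illustration; the text only states that each linear array holds Q/M element processors and that P such arrays are placed in parallel.

```python
def array_dimensions(q_rows, p_cols, m):
    """Return (element processors per linear array, linear arrays, total).

    q_rows, p_cols -- template block size: Q rows by P columns
    m              -- template pixels stored per element processor
    """
    if q_rows % m:
        raise ValueError("this sketch assumes M divides Q evenly")
    per_linear_array = q_rows // m
    return per_linear_array, p_cols, per_linear_array * p_cols
```

For a 16 × 16 template block, M = 1 gives 256 element processors and M = 2 gives 128, compared with the 1089 processors needed when one processor is assigned to each displacement vector over a ±16 search range.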

[0064] In the motion vector detecting device according to claim 7, each of the element processors of the device according to claim 6 includes means for transferring the search window data only along one direction within the linear processor array.

[0065] In the motion vector detecting device according to claim 8, the template block data in the device according to claim 7 is transferred in a direction orthogonal to the transfer direction of the search window data.

[0066] In the motion vector detecting device according to claim 9, the template block data in the device according to claim 7 is transferred in the same direction as the transfer direction of the search window data.

[0067] In the motion vector detecting device according to claim 10, the operation speed of the arithmetic means is N times the transfer rate of the search window data.

[0068] In the motion vector detecting device according to claim 11, the arithmetic means of the device according to claim 6 executes its operation at a speed N times the transfer rate of the search window data.

[0069] In the motion vector detecting device according to claim 12, the first storage means in each element processor of the device according to claim 6 includes first memory means for storing the first block of the current frame and second memory means for storing second template block data of the current frame.

[0070] The motion vector detecting device according to claim 13 includes first storage means for storing pixel data of a first block of the current frame image, second storage means for storing pixel data of a second block of the current frame image, third storage means for storing pixel data of a block of the reference frame related to the first and second blocks, first arithmetic means for performing a predetermined arithmetic operation on the pixel data stored in the first storage means and the pixel data stored in the third storage means, and second arithmetic means for performing a predetermined arithmetic operation on the pixel data stored in the second storage means and the pixel data stored in the third storage means. The first and second arithmetic means both use the same pixel data of the reference frame to perform their respective operations. The first and second arithmetic means may operate in parallel, or may share the same configuration and operate in a time-division manner.

[0071] The motion vector detecting device of claim 13 further includes evaluation value generating means for generating, on the basis of the operation results produced by the first and second arithmetic means, evaluation values indicating the degree of correlation between the first block and the reference frame image block and the degree of correlation between the second block and the reference frame image block, respectively, and means responsive to the output of the evaluation value generating means for determining the motion vectors for the first block and the second block in parallel.

[0072] The motion vector detecting device according to claim 14 comprises: a plurality of one-sided predictive motion detecting means, each of which calculates a motion vector for a template block image to be encoded in the current frame image, in accordance with block-matching evaluation values, from the template block image and a partial reference image in a reference frame image related to the template block image; interpolative predictive motion detecting means that receives the partial reference image data used by the plurality of one-sided predictive motion detecting means, generates an interpolated partial reference image by interpolation, performs block matching between the template block image and the interpolated partial reference image, and detects a motion vector in accordance with the resulting evaluation function value; and output determining means that determines and outputs the final motion vector of the template block image in accordance with the evaluation function values calculated by the plurality of one-sided predictive motion detecting means and the evaluation function value calculated by the interpolative predictive motion detecting means.

[0073] The plurality of one-sided predictive motion detecting means receive partial reference image data from mutually different reference images.
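The output decision of claim 14 can be sketched as follows for two one-sided predictions. The function names, the labels "forward" and "backward", the use of the sum of absolute differences as the evaluation function, and the rounded average used as the interpolation are all assumptions made for illustration; the text only states that an interpolated partial reference image is generated and that the final result is chosen from the evaluation function values.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def decide_prediction(template, forward_ref, backward_ref):
    """Pick the prediction with the smallest evaluation value.

    forward_ref / backward_ref are the partial reference images already
    selected by the two one-sided detectors. The interpolated reference
    is assumed here to be their rounded average.
    """
    interpolated = [[(f + b + 1) // 2 for f, b in zip(row_f, row_b)]
                    for row_f, row_b in zip(forward_ref, backward_ref)]
    costs = {
        "forward": sad(template, forward_ref),
        "backward": sad(template, backward_ref),
        "interpolated": sad(template, interpolated),
    }
    mode = min(costs, key=costs.get)
    return mode, costs[mode]
```

Because the interpolated reference is built from blocks the one-sided detectors have already fetched, no further frame memory access is needed in this sketch, which mirrors the motivation given for claim 14.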

[0074] In the motion vector detecting device according to claim 15, each of the one-sided predictive motion detecting means of claim 14 includes integer-precision motion vector detecting means for detecting a motion vector with integer precision, buffer memory means for receiving and storing the corresponding partial reference image data from the integer-precision motion vector detecting means, and fractional-precision motion vector detecting means for calculating a motion vector with fractional precision in accordance with the partial reference image data from the buffer memory means and the template block image data. The interpolative predictive motion detecting means receives the corresponding partial reference image data from each of the buffer memory means.

[0075] The motion vector detecting device according to claim 16 is the device of claim 14 further provided with transfer means for transferring the block-matching evaluation function value, the motion vector, and the partial reference image data calculated in each of the plurality of one-sided predictive motion detecting means from the corresponding one-sided predictive motion detecting means to the interpolative predictive motion detecting means.

【００７６】[0076]

[Operation] In the motion vector detecting device according to claim 1, the number of element processors is at most the number of pixels in the template block, so the number of element processors can be reduced.

[0077] The template block data remains resident in the processors until the associated motion vector is detected. Since only the search window data is transferred, the current and power consumed during data transfer can be reduced.

[0078] In the element processors only the predetermined operation is executed, and the final evaluation value indicating the degree of correlation between the blocks is obtained by the evaluation value generating means provided outside the array. The area occupied by the processor array, and hence by the device, can therefore be reduced.

[0079] In the invention according to claim 2, the data used for integer-precision motion vector determination is passed on as it is as the data required for fractional-precision motion vector detection. The number of accesses to the frame memory can therefore be greatly reduced, and the fractional-precision motion vector can be detected at high speed.

[0080] In the invention according to claim 3, each element processor stores M template block data and N search window data, so the number of element processors in the processor array can be reduced. Because one arithmetic means in each processor performs the operations on the M template block data and the N search window data, the number of arithmetic means operating simultaneously is correspondingly reduced, and the current consumed by the operations can be reduced. In addition, by driving the arithmetic means in a time-division manner, the circuit scale of the externally provided evaluation value generating means is correspondingly reduced, and so is the power consumption.

[0081] In the invention according to claim 4, the arithmetic means performs a modulo-2 addition (exclusive OR) of the sign bit and each magnitude bit of the signed subtraction result, and forms its output as the pair of the sign bit and the value corresponding to the absolute difference. This eliminates the incrementer that would otherwise be needed to handle negative numbers in two's complement representation, and reduces the scale of the arithmetic circuit. Moreover, since the sign bit and the value corresponding to the absolute difference are generated using gate means alone, the operation result can be output at high speed.
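The effect of the gate means can be checked numerically with a small model. The function name `gated_difference` and the 8-bit pixel width are assumptions; the model only reproduces the stated operation, an exclusive OR of each magnitude bit with the sign bit of a two's complement difference.

```python
def gated_difference(a, b, bits=8):
    """Model of the subtracter plus XOR gates.

    Returns (sign, value). For a >= b, value is |a - b|. For a < b the XOR
    yields the one's complement of the difference, i.e. |a - b| - 1, so
    value + sign always equals |a - b| and no incrementer is needed here.
    """
    mask = (1 << bits) - 1
    diff = (a - b) & ((1 << (bits + 1)) - 1)  # (bits+1)-bit two's complement
    sign = diff >> bits                        # 1 when a - b is negative
    magnitude = diff & mask
    value = magnitude ^ (mask if sign else 0)  # XOR every bit with the sign
    return sign, value
```

For example, with a = 5 and b = 9 the model returns sign 1 and value 3, and 3 + 1 = 4 = |5 − 9|. The missing +1 is carried by the sign bit, which is why the sign bit is output together with the value.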

[0082] In the invention according to claim 5, the summing circuit that sums the outputs of the arithmetic means is composed of full adder circuits arranged in a tree in a plurality of stages, and the sign bits are applied to the carry inputs of the least significant bits of the full adder stages. This greatly reduces the carry propagation delay, so the summation can be executed at high speed and the evaluation value can accordingly be generated at high speed.
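A behavioural sketch of this summation is given below. It is not a gate-level model of the patented circuit: the function name is hypothetical, each adder in the tree is modelled as an ordinary integer addition with one carry input, and feeding any sign bit left over after the last adder into a final addition is an assumption. The inputs are (sign, value) pairs in which value equals the absolute difference when sign is 0 and the absolute difference minus 1 when sign is 1.

```python
def tree_sum_with_carry_in(pairs):
    """Sum (sign, value) pairs with a pairwise adder tree.

    Each adder adds two partial sums and takes one pending sign bit on its
    carry input, so the +1 corrections are absorbed inside the tree.
    """
    values = [value for _, value in pairs]
    carries = [sign for sign, _ in pairs]
    if not values:
        return 0
    while len(values) > 1:
        next_stage = []
        for i in range(0, len(values) - 1, 2):
            carry_in = carries.pop() if carries else 0
            next_stage.append(values[i] + values[i + 1] + carry_in)
        if len(values) % 2:           # odd element passes to the next stage
            next_stage.append(values[-1])
        values = next_stage
    # Assumption: any sign bit not yet consumed is added at the output.
    return values[0] + sum(carries)
```

For the pairs (1, 3), (0, 4), (1, 254), (0, 0), which correspond to absolute differences 4, 4, 255 and 0, the result is 263.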

[0083] The motion vector detecting device according to claim 6 includes element processors arranged in an array. The evaluation value of a displacement vector can be generated efficiently with a minimum number of element processors, achieving low power consumption and a small occupied area.

[0084] In the motion vector detecting device according to claim 7, the search window data in the device according to claim 6 is always transferred in only one direction within each linear array. The search window data can thus be transferred efficiently and with low power consumption.

[0085] In the motion vector detecting device according to claim 8, the search window data transfer direction and the template block data transfer direction are arranged to be orthogonal. In this case, the template block data can be transferred in correspondence with raster-scanned image data, and data transfer can be performed efficiently.

[0086] In the motion vector detecting device according to claim 9, the search window data and the template block data are transferred along the same direction. In this case, the data transfer signal lines are arranged along this one direction within the width of the element processors, so the area occupied by wiring can be greatly reduced.

[0087] In the motion vector detecting device according to claim 10, the arithmetic means of the device according to claim 6 executes its operation at a speed N times the search window data transfer rate. Each element processor can thus perform the operations on a plurality of search window block data and the template block data using only one arithmetic means, and the circuit scale of the element processor can be reduced.

[0088] In the invention according to claim 11, the motion vector detecting device according to claim 6 executes its operation at a speed N times the transfer rate of the search window data. In this case, there is search window data on which no operation is performed. By thinning out (subsampling) the template block data and storing it in the storage means of each element processor, the motion vector can be determined using the minimum necessary template block data, and the motion vector can be detected at high speed.

[0089] In the invention according to claim 12, the motion vector detecting device according to claim 6 calculates the motion vectors of two template blocks in parallel using the same search window block data. Motion vectors can thus be detected at high speed.

[0090] In the motion vector detecting device according to claim 13, the motion vectors of two template blocks are determined in parallel using the data of the same search window block. Motion vectors can thus be detected at high speed without increasing the scale or the power consumption of the device.

[0091] In the motion vector detecting device according to claim 14, the partial reference image data used for interpolative-prediction motion vector detection is supplied from each corresponding one-sided predictive motion detecting means to the interpolative predictive motion detecting means. The number of accesses to the frame memory storing the reference image data is therefore reduced, and interpolative predictive motion detection can be performed at high speed.

[0092] In the motion vector detecting device according to claim 15, the partial reference image data is read from the buffer memory means to perform interpolative predictive motion detection, so fine-precision interpolative-prediction motion vectors can be detected without increasing the scale of the device.

[0093] In the motion vector detecting device according to claim 16, all the necessary data is transferred by the transfer means from the one-sided predictive motion detecting means to the interpolative predictive motion detecting means, so the processing can be executed in a pipelined manner and motion vectors can be detected at high speed.

【００９４】[0094]

【実施例】［実施例１］図１はこの発明の第１の実施例
である動きベクトル検出装置の全体の構成を概略的に示
すブロック図である。図１に示す動きベクトル検出装置
は整数精度で動きベクトルを検出する。[Embodiment 1] FIG. 1 is a block diagram schematically showing the overall configuration of a motion vector detecting device according to a first embodiment of the present invention. The motion vector detection device shown in FIG. 1 detects a motion vector with integer precision.

【００９５】図１において、動きベクトル検出装置は、
サーチウインドーデータＹとテンプレートデータＸとを
受け、与えられたデータを所定のタイミングで出力する
入力部２と、入力部２から与えられたデータに基づい
て、１つのテンプレートブロックに対する変位ベクトル
に関する評価値（評価関数）を算出する演算部１と、演
算部１で求められた評価値（Σ｜ａ−ｂ｜）を受け、１
つのテンプレートブロックに関連する最小の評価値を求
め、その最小の評価値に対応する変位ベクトルを動きベ
クトルと決定する比較部３を含む。比較部３から動きベ
クトルが出力される。In FIG. 1, the motion vector detecting device includes an input unit 2 that receives search window data Y and template data X and outputs the received data at a predetermined timing; an arithmetic unit 1 that calculates, based on the data supplied from the input unit 2, an evaluation value (evaluation function) for each displacement vector with respect to one template block; and a comparison unit 3 that receives the evaluation values (Σ|a−b|) obtained by the arithmetic unit 1, finds the minimum evaluation value associated with the one template block, and determines the displacement vector corresponding to that minimum evaluation value as the motion vector. The motion vector is output from the comparison unit 3.
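The division of work between the arithmetic unit 1 and the comparison unit 3 can be illustrated with a minimal Python sketch. The function names `evaluation_value` and `motion_vector` are hypothetical, and the tie-breaking rule (the first minimum encountered wins) is an assumption that this passage does not specify.

```python
def evaluation_value(template, window):
    """Sum of absolute differences, i.e. the evaluation value sum(|a - b|)."""
    return sum(abs(a - b)
               for row_a, row_b in zip(template, window)
               for a, b in zip(row_a, row_b))


def motion_vector(evaluations):
    """Return the displacement vector whose evaluation value is smallest.

    `evaluations` maps a displacement vector (dx, dy) to its evaluation value.
    Ties go to the first entry encountered (an assumption).
    """
    return min(evaluations, key=evaluations.get)
```

For example, `motion_vector({(0, 0): 5, (1, 0): 3, (0, 1): 9})` selects `(1, 0)`, the candidate with the smallest evaluation value.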

【００９６】演算部１は、アレイ状に配置された複数の
要素プロセサを含むプロセサアレイ１０と、プロセサア
レイ１０の各要素プロセサが出力する演算結果値（本実
施例においては差分絶対値）の総和を求める総和部１２
とを含む。プロセサアレイ１０に含まれる要素プロセサ
は互いに異なるテンプレートブロックデータを格納し、
１つのテンプレートブロックと１つのサーチウインドー
ブロックとの相関度を示す評価値の成分を算出する。プ
ロセサアレイ１０においては、テンプレートブロックデ
ータはこのテンプレートブロックに関する動きベクトル
を求める動作サイクル中常時格納されている。サーチウ
インドーブロックデータは、１演算サイクル毎にこのプ
ロセサアレイ内を１画素分ずつシフトされる。これによ
り最小のサーチウインドーデータ転送動作で各変位ベク
トルに対する評価値を算出することができ、消費電流／
電力の低減を得ることができる。次に図１に示す演算部
１の具体的構成について説明する。The arithmetic unit 1 includes a processor array 10 containing a plurality of element processors arranged in an array, and a summation unit 12 that obtains the sum of the operation result values (absolute difference values in this embodiment) output by the element processors of the processor array 10. The element processors in the processor array 10 store mutually different template block data, and each calculates a component of the evaluation value indicating the degree of correlation between one template block and one search window block. In the processor array 10, the template block data remains stored throughout the operation cycle in which the motion vector for that template block is obtained. The search window block data is shifted through the processor array by one pixel per operation cycle. As a result, the evaluation value for each displacement vector can be calculated with a minimum of search window data transfer, and current and power consumption can be reduced. Next, a specific configuration of the arithmetic unit 1 shown in FIG. 1 will be described.

【００９７】図２はこの実施例において利用されるテン
プレートブロックおよびサーチエリアの大きさを示す図
である。テンプレートブロック２０は、Ｑ行Ｐ列に配置
された画素を含む。サーチエリア２２は水平方向におけ
る検索範囲が＋ｔ１〜−ｔ２であり、垂直方向の検索範
囲が＋ｒ１〜−ｒ２である。すなわち、サーチエリア２
２は、（ｔ２＋Ｐ＋ｔ１）×（ｒ２＋Ｑ＋ｒ１）の画素
を含む。FIG. 2 is a diagram showing the sizes of the template block and the search area used in this embodiment. Template block 20 includes pixels arranged in Q rows and P columns. Search area 22 has a horizontal search range of +t1 to −t2 and a vertical search range of +r1 to −r2. That is, search area 22 includes (t2 + P + t1) × (r2 + Q + r1) pixels.
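As a quick check of the size formula, the following Python sketch evaluates it for the values assumed later in paragraph [0111] (P = Q = 16, r1 = r2 = 15, t2 = 0, t1 = 15). The function names are hypothetical, and the candidate count assumes one displacement vector per integer offset, in line with the integer-precision search of this embodiment.

```python
def search_area_shape(P, Q, t1, t2, r1, r2):
    """Return (width, height, pixel count) of the search area of FIG. 2."""
    width = t2 + P + t1    # horizontal extent in pixels
    height = r2 + Q + r1   # vertical extent in pixels
    return width, height, width * height


def candidate_count(t1, t2, r1, r2):
    """Number of integer displacement vectors inside the search range."""
    return (t1 + t2 + 1) * (r1 + r2 + 1)


print(search_area_shape(16, 16, 15, 0, 15, 15))  # (31, 46, 1426)
print(candidate_count(15, 0, 15, 15))            # 496
```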

【００９８】図３は、図１に示すプロセサアレイに含ま
れる要素プロセサの構成を概略的に示す図である。図３
において要素プロセサＰＥは、テンプレートブロックデ
ータを格納するための縦続接続されたＭ個のデータレジ
スタ２５−１〜２５−Ｍを含む。データレジスタ２５−
１〜２５−Ｍには互いに異なるテンプレートブロックデ
ータが格納される。このデータレジスタ２５−１〜２５
−Ｍは第１の格納手段内の記憶手段に対応する。FIG. 3 is a diagram schematically showing the configuration of an element processor included in the processor array shown in FIG. 1. In FIG. 3, the element processor PE includes M cascade-connected data registers 25-1 to 25-M for storing template block data. Data registers 25-1 to 25-M store mutually different template block data. Data registers 25-1 to 25-M correspond to the storage means in the first storage means.

【００９９】要素プロセサＰＥはまた、サーチウインド
ーデータを格納するためのＮ段の縦続接続されたデータ
レジスタ２６−１〜２６−Ｎを含む。このデータレジス
タ２６−１〜２６−Ｎは第２の格納手段の各記憶手段に
対応する。ＮはＭの整数倍（ｎ倍）である。またテンプ
レートブロックの行の数Ｑは、データレジスタ２５−１
〜２５−Ｍの段数Ｍの整数倍である。この要素プロセサ
ＰＥにおいては、Ｍ個のデータレジスタ２５−１〜２５
−Ｍに格納されたテンプレートブロックデータを用いて
演算が実行される。このときサーチウインドーデータ格
納用のデータレジスタ２６−１〜２６−Ｎがテンプレー
トブロックデータ格納用データレジスタ２５−１〜２５
−Ｍと１対１に対応し（Ｎ＝Ｍ）、各対応のレジスタの
格納データを利用して演算が実行されてもよい。別の組
合せが用いられてもよい。The element processor PE also includes N stages of cascade-connected data registers 26-1 to 26-N for storing search window data. Data registers 26-1 to 26-N correspond to the storage means of the second storage means. N is an integer multiple (n times) of M. The number Q of rows of the template block is an integer multiple of the number M of stages of data registers 25-1 to 25-M. In the element processor PE, operations are executed using the template block data stored in the M data registers 25-1 to 25-M. Here, the search window data registers 26-1 to 26-N may correspond one-to-one to the template block data registers 25-1 to 25-M (N = M), with each operation using the data stored in a corresponding pair of registers. Other combinations may also be used.

【０１００】要素プロセサＰＥはＭ個のテンプレートブ
ロックデータに関する演算を実行する。要素プロセサＰ
Ｅにおける演算手段は、このＭ個のデータレジスタ２５
−１〜２５−Ｍに対して多重化態様で利用される。すな
わち、この実施例においては演算手段は１個のみ要素プ
ロセサＰＥ内に設けられる。The element processor PE executes operations on M template block data. The arithmetic means in the element processor PE is used in a multiplexed manner for the M data registers 25-1 to 25-M. That is, in this embodiment, only one arithmetic means is provided in the element processor PE.

【０１０１】サーチウインドーデータおよびテンプレー
トブロックデータは一方方向の隣接要素プロセサ間にお
いてのみ転送される。Search window data and template block data are transferred only between adjacent element processors in one direction.

【０１０２】図４は、図１に示すプロセサアレイの構成
を示す図である。図４において、プロセサアレイ１０
は、Ｐ列に配置された線形プロセサアレイＬＡ１〜ＬＡ
Ｐを含む。線形プロセサアレイＬＡ１〜ＬＡＰは同じ構
成を備え、縦続形態で配置されるｍ個の要素プロセサＰ
Ｅ１〜ＰＥｍと、Ｒ（＝ｒ１＋ｒ２）個のサーチウイン
ドーデータを格納するとともに遅延手段としても機能す
るデータバッファＤＬを含む。FIG. 4 is a diagram showing the configuration of the processor array shown in FIG. 1. In FIG. 4, the processor array 10 includes linear processor arrays LA1 to LAP arranged in P columns. The linear processor arrays LA1 to LAP have the same configuration, each including m element processors PE1 to PEm arranged in cascade and a data buffer DL that stores R (= r1 + r2) search window data and also functions as delay means.

【０１０３】要素プロセサＰＥ１〜ＰＥｍは同じ線形プ
ロセサアレイＬＡ内においては一方方向（図４における
垂直上方向）に沿ってサーチウインドーデータおよびテ
ンプレートブロックデータを伝達する。隣接する線形プ
ロセサアレイへのデータ転送時においては、最上流の要
素プロセサＰＥ１からのサーチウインドーデータは隣接
線形プロセサアレイ（図４において左側）のデータバッ
ファＤＬへ与えられる。最上流の要素プロセサＰＥ１か
らのテンプレートブロックデータは隣接線形プロセサア
レイの最下流の要素プロセサＰＥｍへ与えられる。すな
わち、サーチウインドーデータは要素プロセサＰＥおよ
びデータバッファＤＬを介して伝達され、一方テンプレ
ートブロックデータは要素プロセサＰＥのみを介して伝
達される。The element processors PE1 to PEm transmit the search window data and the template block data along one direction (vertical upward direction in FIG. 4) in the same linear processor array LA. During data transfer to the adjacent linear processor array, the search window data from the most upstream element processor PE1 is supplied to the data buffer DL of the adjacent linear processor array (left side in FIG. 4). The template block data from the most upstream element processor PE1 is given to the most downstream element processor PEm of the adjacent linear processor array. That is, the search window data is transmitted via the element processor PE and the data buffer DL, while the template block data is transmitted only via the element processor PE.

【０１０４】データバッファＤＬは前述のごとく遅延機
能を備えており、また与えられたデータをファーストイ
ン・ファーストアウト（ＦＩＦＯ）態様で出力する機能
を備える。データバッファＤＬとしては、したがってＲ
個のシフト機能付データラッチが用いられてもよく、ま
たＲ個のデータを格納するレジスタファイルが用いられ
てもよい。The data buffer DL has a delay function as described above and also outputs the supplied data in a first-in first-out (FIFO) manner. Accordingly, the data buffer DL may be implemented with R data latches having a shift function, or with a register file that stores R data.

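The FIFO and delay behavior described for the data buffer DL can be modeled in a few lines of Python. This is only a behavioral sketch: the class name `DelayBuffer`, the `shift` method and the initial fill value are assumptions, and it says nothing about whether the hardware uses latches or a register file.

```python
from collections import deque


class DelayBuffer:
    """Behavioral model of a data buffer DL holding `depth` pixels (FIFO)."""

    def __init__(self, depth, fill=0):
        # The oldest entry sits at index 0, the newest at the right end.
        self._cells = deque([fill] * depth, maxlen=depth)

    def shift(self, pixel):
        """Shift one pixel in and return the pixel that falls out."""
        out = self._cells[0]
        self._cells.append(pixel)  # maxlen drops the oldest entry
        return out
```

A pixel written into a buffer of depth R reappears at the output R shift operations later, which is the delay that keeps side window data out of the element processors until it is needed.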
【０１０５】上述のようにプロセサアレイ１０内におい
て、サーチウインドーデータを転送する信号線とテンプ
レートデータを転送する信号線とを平行に配置すること
により信号線の配線面積を十分に確保することができる
とともに配線占有面積を最小とすることができる。要素
プロセサＰＥの幅に対応する領域を配線領域として利用
することができるからである。As described above, in the processor array 10, by arranging the signal lines for transmitting the search window data and the signal lines for transmitting the template data in parallel, it is possible to secure a sufficient wiring area for the signal lines. In addition, the wiring occupancy area can be minimized. This is because the area corresponding to the width of the element processor PE can be used as the wiring area.

【０１０６】図５は、このデータ転送経路の別の形態を
示す図である。図５に示すようにテンプレートデータ転
送方向３２とサーチウインドーデータ転送方向３０とが
互いに直交するようにデータ転送経路が設定されてもよ
い。通常フレームメモリなどにおいては画像信号はラス
タスキャンされているため行方向（図の水平方向）に沿
って隣接して連続する画像データが格納される。したが
って、この場合図５に示すように図の水平方向にデータ
を転送する場合フレームメモリから同一行のデータをア
クセスして順次読出していくことができ、データの読出
および転送にとって都合がよい。この場合においても、
サーチウインドーデータおよびテンプレートデータは各
要素プロセサにおいて一方方向に沿ってのみ転送され
る。次に動作について説明する。FIG. 5 is a diagram showing another form of this data transfer path. As shown in FIG. 5, the data transfer path may be set so that the template data transfer direction 32 and the search window data transfer direction 30 are orthogonal to each other. In a frame memory or the like, image signals are normally raster-scanned, so consecutive image data are stored adjacent to one another along the row direction (the horizontal direction in the figure). Therefore, when data is transferred in the horizontal direction of the figure as shown in FIG. 5, data in the same row can be accessed and read out sequentially from the frame memory, which is convenient for reading and transferring the data. In this case as well, the search window data and the template data are each transferred in only one direction through the element processors. Next, the operation will be described.

【０１０７】図６に示すように、１フレームの画像を８
×８のマクロブロックに分割した状態を考える。図６
（Ａ）に示すように現フレーム画像３６における斜線で
示すマクロブロックをテンプレートブロックＴＢ１とす
る。このテンプレートブロックＴＢ１に対し動きベクト
ルを検出する場合を考える。As shown in FIG. 6, consider a one-frame image divided into 8 × 8 macroblocks. As shown in FIG. 6(A), the hatched macroblock in the current frame image 36 is taken as template block TB1. Consider the case where a motion vector is detected for this template block TB1.

【０１０８】この場合、プロセサアレイ１０において
は、図６（Ｂ）に示すように前フレーム画像３５におけ
る斜線で示す３つのマクロブロックＭＢ１、ＭＢ２およ
びＭＢ３が格納される。図６（Ａ）に示すテンプレート
ブロックＴＢ１の画素データが要素プロセサＰＥの各デ
ータレジスタに格納される。１個の要素プロセサＰＥに
はＱ／ｍ個の図の垂直方向に配列されたテンプレートブ
ロックデータが格納される。一方、サーチウインドーデ
ータについては１つの要素プロセサにＱ・ｎ／ｍ個の図
の垂直方向に隣接する画素データが格納される。プロセ
サアレイの要素プロセサには垂直方向Ｑ個、水平方向Ｐ
個、合計Ｐ・Ｑ個のサーチウインドー画素データが格納
される。このＰ・Ｑ個の画素データを以下の説明におい
てはサーチウインドーブロック画素データと称する。残
りのＲ・Ｐ個のサーチウインドー画素データはデータバ
ッファＤＬに格納される。この状態を図７に示す。In this case, the processor array 10 stores the three hatched macroblocks MB1, MB2 and MB3 of the previous frame image 35 shown in FIG. 6(B). The pixel data of template block TB1 shown in FIG. 6(A) are stored in the data registers of the element processors PE. One element processor PE stores Q/m template block data arranged in the vertical direction of the figure. As for the search window data, one element processor stores Q·n/m pixel data adjacent in the vertical direction of the figure. The element processors of the processor array store Q pixels vertically by P pixels horizontally, P·Q search window pixel data in total. These P·Q pixel data are referred to as search window block pixel data in the following description. The remaining R·P search window pixel data are stored in the data buffers DL. This state is shown in FIG. 7.

【０１０９】すなわち、Ｐ・Ｑ個の画素データからなる
サーチウインドーデータはサーチウインドーブロック４
２を構成し、プロセサアレイの各要素プロセサＰＥに格
納される。残りの（ｒ１＋ｒ２）・Ｐ個の画素データは
データバッファＤＬに格納される。このデータバッファ
ＤＬに格納される画素データ領域４４を、サイドウイン
ドーブロックと以下の説明では称する。また以下の説明
においては、このサーチウインドーブロック４２とサイ
ドウインドーブロック４４とを合わせてサーチウインド
ー４０と称する。That is, the search window data consisting of P·Q pixel data constitute a search window block 42 and are stored in the element processors PE of the processor array. The remaining (r1 + r2)·P pixel data are stored in the data buffers DL. The pixel data region 44 stored in the data buffers DL is referred to as the side window block in the following description. Also in the following description, the search window block 42 and the side window block 44 together are referred to as the search window 40.
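The storage counts given above can be tabulated with a short Python sketch. The function name `storage_layout` and the dictionary keys are hypothetical; the example values (P = Q = 16, r1 = r2 = 15, m = 8, M = N = 2, hence n = 1) are the ones the description assumes in paragraph [0111].

```python
def storage_layout(P, Q, r1, r2, m, n=1):
    """Pixel counts held by the element processors and the data buffers DL."""
    return {
        "template_per_pe": Q // m,            # template pixels per element processor
        "window_per_pe": Q * n // m,          # search window pixels per element processor
        "window_block": P * Q,                # search window block 42 (in the PEs)
        "side_window": (r1 + r2) * P,         # side window block 44 (in the buffers)
        "search_window": (r1 + r2 + Q) * P,   # search window 40 as a whole
    }


layout = storage_layout(P=16, Q=16, r1=15, r2=15, m=8)
print(layout["window_block"], layout["side_window"], layout["search_window"])
# 256 480 736
```

With these values the side window block holds 30 × 16 = 480 pixels, which matches the 16 × 30 pixel side window region mentioned for FIG. 16.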

【０１１０】１個の要素プロセサＰＥには、一般に、図
８に示すように、Ｍ個のテンプレートブロックデータと
Ｎ個のサーチウインドーブロックデータとが格納され
る。この要素プロセサＰＥがｍ個縦続接続されることに
よりＱ個の１列に配置された画素データが線形プロセサ
アレイ内の要素プロセサＰＥ内に格納される。次に具体
的動作について説明する。In general, one element processor PE stores M template block data and N search window block data, as shown in FIG. By connecting m element processors PE in cascade, the pixel data arranged in one column of Q pieces are stored in the element processor PE in the linear processor array. Next, a specific operation will be described.

【０１１１】以下の説明を簡単にするために以下の条件
を仮定する。テンプレートブロックサイズ：Ｐ＝Ｑ＝１６動きベクトルの検索範囲：ｒ１＝ｒ２＝１５，ｔ２＝
０，ｔ１＝１５ｍ＝８Ｍ＝Ｎ＝２図９は、動きベクトル検出動作における最初の動作サイ
クルにおけるデータ格納状況を示す図である。図９にお
いて、１６行×１６列の画素データからなるテンプレー
トブロック４３のデータがプロセサアレイ１０の各要素
プロセサＰＥに格納される。これに対応してまた１６行
×１６列のサーチウインドーブロック４２の画素データ
がプロセサアレイ内の各要素プロセサＰＥに格納され
る。この状態は変位ベクトル（０，−１５）に対応す
る。この状態において各要素プロセサＰＥはテンプレー
トブロックデータと対応のサーチウインドーブロックデ
ータとの差分絶対値を求める。各要素プロセサＰＥにお
いて求められた差分絶対値は総和部１２へ伝達され、そ
こで総和が実行されてこの変位ベクトル（０，−１５）
に対する評価値（評価関数）が求められる。To simplify the following description, the following conditions are assumed.
Template block size: P = Q = 16
Motion vector search range: r1 = r2 = 15, t2 = 0, t1 = 15
m = 8, M = N = 2
FIG. 9 is a diagram showing the data storage state in the first operation cycle of the motion vector detection operation. In FIG. 9, the data of the template block 43, consisting of pixel data of 16 rows × 16 columns, is stored in the element processors PE of the processor array 10. Correspondingly, the pixel data of the search window block 42 of 16 rows × 16 columns is also stored in the element processors PE of the processor array. This state corresponds to displacement vector (0, −15). In this state, each element processor PE obtains the absolute difference between its template block data and the corresponding search window block data. The absolute differences obtained in the element processors PE are transmitted to the summation unit 12, where they are summed to obtain the evaluation value (evaluation function) for displacement vector (0, −15).

【０１１２】次いで、テンプレートブロックデータは各
要素プロセサＰＥ内に保持した状態でサーチウインドー
ブロックデータのみを１画素分転送する。Next, with the template block data held in each element processor PE, only the search window block data is transferred by one pixel.

【０１１３】この状態においては図１０（Ａ）に示すよ
うにサーチウインドーブロック４２における最上行のデ
ータが隣接列のデータバッファＤＬへ転送され、応じて
このサーチウインドーブロックのうちの最初のデータが
シフトアウトされる。このシフトアウトと並行して、新
たにサーチウインドーブロックデータが入力される。こ
のシフトアウトされるサーチウインドーブロックデータ
および新たにシフトインされるサーチウインドーブロッ
クデータを図１０（Ａ）において斜線領域で示す。In this state, as shown in FIG. 10A, the data in the uppermost row in the search window block 42 is transferred to the data buffer DL in the adjacent column, and accordingly the first data in this search window block is transferred. Is shifted out. In parallel with this shift-out, new search window block data is input. The shift-out search window block data and the new shift-in search window block data are indicated by hatched areas in FIG.

【０１１４】これにより、図１０（Ｂ）に示すようにプ
ロセサアレイ１０内の要素プロセサＰＥ内には、サーチ
ウインドー４０における１行下方向にずれたサーチウイ
ンドーブロック４２ａが格納される。この状態において
はテンプレートブロック４３とサーチウインドーブロッ
ク４２ａとの変位はベクトル（０，−１４）となる。As a result, as shown in FIG. 10B, the element processor PE in the processor array 10 stores the search window block 42a which is shifted downward by one line in the search window 40. In this state, the displacement between the template block 43 and the search window block 42a becomes the vector (0, -14).

【０１１５】この状態において再び上述と同様の差分絶
対値の演算および総和演算が行なわれ、変位ベクトル
（０，−１４）に対する評価値が求められる。In this state, the difference absolute value calculation and the summation calculation similar to the above are performed again to obtain the evaluation value for the displacement vector (0, -14).

【０１１６】この動作を繰返し、変位ベクトル（０，
０）となった場合、図１１（Ａ）に示すように、サーチ
ウインドー４０におけるサーチウインドーブロック４２
ｂはテンプレートブロック４３の真裏に対応する。この
状態においては図１１（Ｂ）に示すように、プロセサア
レイ内に格納されるサーチウインドー４０のデータとし
ては、その上部１５画素×１６画素の領域において１列
図の右方向にずれた位置のサーチウインドーデータが格
納される。上述のように評価値算出動作にとって不要と
なるサーチウインドーデータを１ビットシフトアウト
し、新たに１画素分ずつサーチウインドーデータをシフ
トインすることにより評価値算出動作と並行して、新た
に次の列のサーチウインドーデータが格納される。This operation is repeated. When the displacement vector reaches (0, 0), the search window block 42b in the search window 40 lies directly behind the template block 43, as shown in FIG. 11(A). In this state, as shown in FIG. 11(B), the upper 15-pixel × 16-pixel region of the search window 40 data stored in the processor array holds search window data from positions shifted one column to the right in the figure. As described above, search window data no longer needed for the evaluation value calculation is shifted out one pixel at a time while new search window data is shifted in one pixel at a time, so the search window data of the next column is stored in parallel with the evaluation value calculation.

【０１１７】最終的に変位ベクトルが（０，＋１５）状
態となった場合、図１２（Ａ）に示すようにサーチウイ
ンドーブロック４２ｃは、サーチウインドー４０の一番
下の領域に配置される。この状態での変位ベクトル
（０，＋１５）に対する評価値が算出された後、プロセ
サアレイからは１画素分不要となったサーチウインドー
データＰＹ１がシフトアウトされ、新たにサーチウイン
ドーデータＰＹ２がシフトインされる。この後、１６回
シフト動作を繰返す。このシフト動作により、次のステ
ップの演算に必要なデータがすべてプロセサアレイ内に
揃うことになる。When the displacement vector finally reaches (0, +15), the search window block 42c is located in the lowermost region of the search window 40, as shown in FIG. 12(A). After the evaluation value for displacement vector (0, +15) is calculated in this state, one pixel of search window data PY1 that is no longer needed is shifted out of the processor array, and new search window data PY2 is shifted in. The shift operation is then repeated 16 times. This brings all the data needed for the calculation of the next step into the processor array.

【０１１８】次のステップにおいては、サーチエリアに
おいて１列図の右方向にずれたサーチウインドーに対し
て評価値の算出が行なわれる。すなわち図１３に示すよ
うに、サーチエリア４５内において次のサーチウインド
ー４０ａは元のサーチウインドー４０よりも１列図の右
方向にずれた位置の画素データで構成される。いま、図
１２（Ｂ）に示す状態においてプロセサアレイにおいて
は斜線で示す領域５０の画素データが格納されている。In the next step, evaluation values are calculated for the search window shifted one column to the right in the figure within the search area. That is, as shown in FIG. 13, the next search window 40a in the search area 45 consists of pixel data at positions shifted one column to the right of the original search window 40. In the state shown in FIG. 12(B), the processor array stores the pixel data of the hatched region 50.

【０１１９】新たにプロセサアレイにおける各要素プロ
セサＰＥに対しこの図１３に示すサーチウインドー４０
ａの最上位に位置するサーチウインドーブロックを格納
するために、この状態（図１２（Ｂ））で残りの１５画
素分のデータを前述のごとくシフトインする。これによ
り、図１４に示すように、新たなサーチウインドー４０
ａにおける最上位のサーチウインドーブロック４２ｄが
プロセサアレイの各要素プロセサＰＥ内に格納される。To store the search window block located at the top of the search window 40a shown in FIG. 13 in the element processors PE of the processor array, the remaining 15 pixels of data are shifted in from this state (FIG. 12(B)) as described above. As a result, as shown in FIG. 14, the uppermost search window block 42d of the new search window 40a is stored in the element processors PE of the processor array.

【０１２０】この状態は、変位ベクトル（１，−１５）
に対応するサーチウインドーブロック４２ｄとテンプレ
ートブロック４３とが要素プロセサに格納された状態に
対応する。この状態において再び上述の動作すなわち差
分絶対値の算出および総和の算出による評価値の導出を
実行する。ここで、図１４において点線で示す１列の領
域はシフトアウトされたサーチウインドーデータおよび
シフトインされたサーチウインドーデータを示す。This corresponds to a state in which the search window block 42d corresponding to displacement vector (1, −15) and the template block 43 are stored in the element processors. In this state, the operation described above, that is, derivation of the evaluation value by calculating the absolute differences and their sum, is executed again. In FIG. 14, the one-column region indicated by the dotted line represents the search window data that has been shifted out and the search window data that has been shifted in.

【０１２１】上述のシフト動作を（１５＋１５＋１６）
×１５＋（１５＋１５＋１）＝７３６回実行すると、図
１５に示すような変位ベクトル（１５，１５）に対する
評価値の算出が実行される。すなわち、この状態におい
ては、テンプレートブロック４３に対し、サーチエリア
４５における右下隅のサーチウインドーブロック４２ｅ
に対する評価値の算出が実行される。When the shift operation described above has been executed (15 + 15 + 16) × 15 + (15 + 15 + 1) = 736 times, the evaluation value for displacement vector (15, 15) is calculated, as shown in FIG. 15. That is, in this state the evaluation value is calculated for the template block 43 against the search window block 42e at the lower right corner of the search area 45.

【０１２２】この状態においては、図１６に示すよう
に、プロセサアレイ内における要素プロセサには斜線で
示す領域５０ａのデータが格納されている。残りの１６
×３０画素の領域（サイドウインドー領域）５２は、デ
ータバッファ内に格納されている。上述の動作を行なう
ことにより、テンプレートブロック４３に対する必要な
評価関数（評価値）のすべてを算出することができる。
この算出されたすべての評価関数（評価値）に対し比較
部において最小の評価値（評価関数）が求められ、それ
に対応する変位ベクトルがこのテンプレートブロック４
３に対する動きベクトルとして決定される。In this state, as shown in FIG. 16, the element processors in the processor array store the data of the hatched region 50a. The remaining region of 16 × 30 pixels (side window region) 52 is stored in the data buffers. The operations described above yield all the evaluation functions (evaluation values) required for the template block 43. The comparison unit finds the minimum among all the calculated evaluation values, and the corresponding displacement vector is determined as the motion vector for the template block 43.
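The result of the scan walked through above, one column of the search area at a time and top to bottom within each column, can be reproduced in software by a plain full search. The Python sketch below is a functional reference only and does not model the processor array; the name `full_search`, the list-of-lists pixel layout, and the choice of the first minimum on ties are assumptions.

```python
def full_search(template, area, t1, t2, r1, r2):
    """Return (motion_vector, minimum evaluation value).

    `template` is a Q x P block; `area` is the (r2 + Q + r1) x (t2 + P + t1)
    search area. Displacement (dx, dy) = (-t2, -r2) is the top-left position.
    """
    Q, P = len(template), len(template[0])
    best = None
    for dx in range(-t2, t1 + 1):        # one search window per horizontal offset
        for dy in range(-r2, r1 + 1):    # the block slides down one row per cycle
            top, left = dy + r2, dx + t2
            sad = sum(abs(template[i][j] - area[top + i][left + j])
                      for i in range(Q) for j in range(P))
            if best is None or sad < best[0]:
                best = (sad, (dx, dy))
    return best[1], best[0]
```

With the conditions of paragraph [0111], the call would be `full_search(template, area, t1=15, t2=0, r1=15, r2=15)` on a 16 × 16 template and a 46 × 31 search area, visiting displacement vectors from (0, −15) to (15, 15).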

【０１２３】上述の画像データの動きを具体的にサーチ
ウィンドー画素データに注目して説明する。今、図１７
に示すように、プロセサアレイ１０は８行８列に配置さ
れた６４個の要素プロセサＰＥ０〜ＰＥ６３を含むと仮
定する。図１７には、図面を簡略化するため、バッファ
ＤＬは示していない。要素プロセサＰＥ０〜ＰＥ６３に
は、テンプレートブロックの各画素データが格納され
る。The movement of the image data described above will now be explained concretely, focusing on the search window pixel data. Assume, as shown in FIG. 17, that the processor array 10 includes 64 element processors PE0 to PE63 arranged in 8 rows and 8 columns. To simplify the drawing, the buffers DL are not shown in FIG. 17. The pixel data of the template block are stored in the element processors PE0 to PE63.

【０１２４】このテンプレートブロックは図１８に示す
ように、８行８列に配置された画素データａ０，０〜ａ
７，７を含む。画素データａｉ，ｊは、８行８列のサイ
ズを有するテンプレートブロックにおいてｉ行ｊ列の位
置に配置された画素データを示す。テンプレート画素デ
ータａ０，０〜ａ７，７は、動きベクトル算出期間中対
応の要素プロセサ内に常駐する。検索範囲としては水平
方向−８〜＋８および垂直方向−８〜＋８を想定する。
図１９にサーチエリアの半分すなわち水平方向０〜＋
８の検索範囲のエリアの画素データの配置を示す。図１
９において、サーチエリアは２４行１６列に配置された
画素データｂ０，０〜ｂ２３，１５を含む。ここで画素
データｂｉ，ｊはこのサーチエリアにおいてｉ行ｊ列に
配置された参照画素のデータを示す。As shown in FIG. 18, this template block includes pixel data a0,0 to a7,7 arranged in 8 rows and 8 columns. Pixel data ai,j denotes the pixel data located at row i, column j of the 8-row, 8-column template block. The template pixel data a0,0 to a7,7 remain resident in the corresponding element processors during the motion vector calculation period. The search range is assumed to be −8 to +8 horizontally and −8 to +8 vertically. FIG. 19 shows the arrangement of pixel data in half of the search area, namely the area for the horizontal search range 0 to +8. In FIG. 19, the search area includes pixel data b0,0 to b23,15 arranged in 24 rows and 16 columns. Here, pixel data bi,j denotes the data of the reference pixel located at row i, column j of this search area.

【０１２５】図１７、１８および１９に示す配置におい
ては、図２０に示すように変位ベクトル（０，−８）〜
（８，８）各々に対する評価値が算出される。要素プロ
セサＰＥ０〜ＰＥ６３は行列状に配置されるが、そのデ
ータの転送方向は一方方向のみである。したがって、図
２１に示すように、プロセサアレイ１０に含まれる要素
プロセサＰＥ０〜ＰＥ６３およびバッファＢＬは、サー
チウインドー画素データに関しては１列に配置される。In the arrangement shown in FIGS. 17, 18 and 19, an evaluation value is calculated for each of displacement vectors (0, −8) to (8, 8), as shown in FIG. 20. Although the element processors PE0 to PE63 are arranged in a matrix, they transfer data in one direction only. Therefore, as shown in FIG. 21, the element processors PE0 to PE63 and the buffers DL in the processor array 10 form a single line with respect to the search window pixel data.

【０１２６】図２１において、プロセサ列ＰＬ１は要素
プロセサＰＥ０〜ＰＥ７を含み、プロセサ列ＰＬ２は要
素プロセサＰＥ８〜ＰＥ１５を含む。プロセサ列ＰＬｋ
は、要素プロセサＰＥ（８ｋ）〜ＰＥ（８ｋ＋７）を含
む。ただしｋ＝０〜７である。In FIG. 21, processor row PL0 includes element processors PE0 to PE7, and processor row PL1 includes element processors PE8 to PE15. In general, processor row PLk includes element processors PE(8k) to PE(8k+7), where k = 0 to 7.

【０１２７】プロセサ列ＰＬｋとプロセサ列ＰＬ（ｋ＋
１）との間にバッファＤＬ（ｋ＋１）が配置され、さら
に最下流（画素データ入力部に最も近い位置）のプロセ
サ列ＰＬ７の入力部にバッファＤＬ８が設けられる。バ
ッファＤＬ１〜ＤＬ８の各々は１６画素のデータを格納
し、一方方向にのみデータを転送する。A buffer DL(k+1) is arranged between processor row PLk and processor row PL(k+1), and a buffer DL8 is provided at the input of processor row PL7, which is the most downstream row (the one closest to the pixel data input). Each of buffers DL1 to DL8 stores 16 pixels of data and transfers data in one direction only.

【０１２８】出発状態として、今、図２１に示すように
プロセサ列ＰＬ０にサーチウインドー画素データｂ０，
０〜ｂ７，０が格納され、バッファＤＬ１に画素データ
ｂ８，０〜ｂ２３，０が格納されるというように、図１
９に示すサーチエリアの第０行〜第２３行および第０列
〜第７列の画素データｂがそれぞれ対応の要素プロセサ
に格納された状態を考える。As a starting state, consider the state shown in FIG. 21, in which search window pixel data b0,0 to b7,0 are stored in processor row PL0, pixel data b8,0 to b23,0 are stored in buffer DL1, and so on, so that the pixel data b in rows 0 to 23 and columns 0 to 7 of the search area shown in FIG. 19 are stored in the corresponding element processors and buffers.

【０１２９】プロセサ列ＰＬ０〜ＰＬ７には図１９のサ
ーチエリアの第０行ないし第７行および第０列ないし第
７列の画素データが格納される。このとき、要素プロセ
サＰＥ０〜Ｐ６３には、図１８に示すテンプレートブロ
ックの画素データａ０，０〜ａ７，７１つのプロセサ列
ＰＬｋが１列のテンプレート画素データａ０，ｋ〜ａ
７，ｋを格納するようにして、対応の画素データが格納
されている。この状態は変位ベクトル（０，−８）に対
応し、この変位ベクトル（０，−８）に対する評価値が
算出される。Processor rows PL0 to PL7 store the pixel data of rows 0 to 7 and columns 0 to 7 of the search area of FIG. 19. At this time, the element processors PE0 to PE63 hold the pixel data a0,0 to a7,7 of the template block shown in FIG. 18, with each processor row PLk storing one column of template pixel data a0,k to a7,k. This state corresponds to displacement vector (0, −8), and the evaluation value for displacement vector (0, −8) is calculated.

【０１３０】次のサイクルにおいては、サーチウインド
ー画素データのシフトが行なわれる。すなわち、図２２
（ａ）に示すように、画素データｂ０，０がシフトアウ
トされ、画素データｂ０，８がシフトインされる。この
シフト動作はバッファ（図２２においては斜線で示す）
を介して行なわれる。In the next cycle, the search window pixel data is shifted. That is, as shown in FIG. 22(a), pixel data b0,0 is shifted out and pixel data b0,8 is shifted in. This shift is performed through the buffers (hatched in FIG. 22).

【０１３１】プロセッサ列ＰＬ０〜ＰＬ７においては、
図１９に示すサーチエリアの１行下のサーチウインドー
ブロックの画素データが格納される。この新たなサーチ
ウインドーブロックは変位ベクトル（０，−７）に対応
する（テンプレートブロック画素データはシフトされな
い）。この状態で計算が行なわれると、変位ベクトル
（０，−７）に対する評価値が算出される。Processor rows PL0 to PL7 now store the pixel data of the search window block located one row lower in the search area shown in FIG. 19. This new search window block corresponds to displacement vector (0, −7) (the template block pixel data is not shifted). When the calculation is performed in this state, the evaluation value for displacement vector (0, −7) is obtained.

【０１３２】次に図２２（ｂ）に示すように、第１行第
８列の画素データｂ１，８のシフトインが行なわれ、画
素データｂ１，０のシフトアウトが行なわれると、プロ
セサ列ＰＬ０〜ＰＬ７が格納する画素データは図１９に
おいて１行下方向にシフトされる。すなわち、プロセサ
列ＰＬｋにはサーチウインドーブロック画素データｂ
２，ｋ〜ｂ９，ｋが格納される。このサーチウインドー
ブロックは変位ベクトル（０，−６）に対応し、この変
位ベクトル（０，−６）に対する評価値（評価関数）が
算出される。Next, as shown in FIG. 22(b), pixel data b1,8 at row 1, column 8 is shifted in and pixel data b1,0 is shifted out, so that the pixel data stored in processor rows PL0 to PL7 moves down one more row in FIG. 19. That is, processor row PLk stores search window block pixel data b2,k to b9,k. This search window block corresponds to displacement vector (0, −6), and the evaluation value (evaluation function) for displacement vector (0, −6) is calculated.

【０１３３】このシフトおよび算出動作を繰返すと、図
２２（ｃ）に示す状態に到達し、画素データｂ７，８が
シフトインされ、画素データｂ７，０がシフトアウトさ
れる。プロセサ列ＰＬｋには画素データｂ８，ｋ〜ｂ１
５，ｋが格納される。このプロセサ列ＰＬ０〜ＰＬ７に
格納された画素データを含むサーチウインドーブロック
は変位ベクトル（０，０）に対応する。Repeating this shift and calculation operation leads to the state shown in FIG. 22(c), in which pixel data b7,8 is shifted in and pixel data b7,0 is shifted out. Processor row PLk stores pixel data b8,k to b15,k. The search window block formed by the pixel data stored in processor rows PL0 to PL7 corresponds to displacement vector (0, 0).

【０１３４】変位ベクトル（０，０）に対する評価値算
出の後、８回シフト／算出動作を実行すると、図２２
（ｄ）に示すように、画素データｂ１５，８がシフトイ
ンされ、画素データｂ１５，０がシフトアウトされる。
変位ベクトル（０，８）に対応するサーチウインドーブ
ロックの画素データプロセサ列ＰＬ０〜ＰＬ７に格納さ
れる。このサーチウインドーブロックは変位ベクトル
（０，８）に対応する。After the evaluation value for displacement vector (0, 0) is calculated, the shift/calculation operation is executed eight more times, whereupon pixel data b15,8 is shifted in and pixel data b15,0 is shifted out, as shown in FIG. 22(d). The pixel data of the search window block corresponding to displacement vector (0, 8) is then stored in processor rows PL0 to PL7.

【０１３５】この変位ベクトル（０，８）の評価値の算
出の後、変位ベクトル（１，−８）に対する評価値の算
出が行なわれる。このとき、まだ、サーチエリアの第８
列の画素データはすべてシフトインされていない。そこ
で、第８列の残りのサーチウインドー画素データｂ１
６，８〜ｂ２３，８が順次シフトインされ、応じて不要
となった画素データｂ１６，０〜ｂ２３，０がシフトア
ウトされる。この状態を図２３（ａ）に示す。After the evaluation value for displacement vector (0, 8) is calculated, the evaluation value for displacement vector (1, −8) is calculated. At this point, not all of the pixel data in column 8 of the search area has been shifted in yet. Therefore, the remaining search window pixel data b16,8 to b23,8 of column 8 are shifted in sequentially, and the pixel data b16,0 to b23,0 that are no longer needed are shifted out accordingly. This state is shown in FIG. 23(a).

【０１３６】この図２３（ａ）に示す状態においてはプ
ロセサ例ＰＬｋには画素データｂ０，ｋ＋１〜ｂ７，ｋ
＋１が格納される。すなわち、変位ベクトル（１，−
８）に対するサーチウインドーブロックの画素データが
要素プロセサＰＥ０〜ＰＥ６３に格納されており、これ
らの格納されている画素データを用いることにより変位
ベクトル（１，−８）に対する評価値が算出される。In the state shown in FIG. 23(a), processor row PLk stores pixel data b0,k+1 to b7,k+1. That is, the pixel data of the search window block for displacement vector (1, −8) is stored in the element processors PE0 to PE63, and the evaluation value for displacement vector (1, −8) is calculated using the stored pixel data.

【０１３７】再び、１画素データのシフトおよびシフト
後の算出動作が実行される。この場合、図２３（ｂ）に
示すように画素データｂ０，９のシフトインおよび画素
データｂ０，１のシフトアウトが行なわれ、変位ベクト
ル（１，−７）に対する評価値の算出が行なわれる。Again, the shift of one pixel data and the calculation operation after the shift are executed. In this case, the pixel data b0 and 9 are shifted in and the pixel data b0 and 1 are shifted out as shown in FIG. 23B, and the evaluation value for the displacement vector (1, -7) is calculated.

【０１３８】以降は、上述のシフト／算出動作を繰返
し、図２３（ｃ）に示すように画素データｂ１５，１６
がシフトインされると、画素データｂ１５，８がシフト
アウトされ、最後の変位ベクトル（８，８）に対する評
価値の算出が行なわれる。Thereafter, the shift/calculation operation described above is repeated. When pixel data b15,16 is shifted in and pixel data b15,8 is shifted out, as shown in FIG. 23(c), the evaluation value for the final displacement vector (8, 8) is calculated.

【０１３９】上述のサーチエリア画素データのシフト動
作は図１９に示すサーチエリアのマトリクスの各列を連
結して１本の画素データ列を形成し、１画素データずつ
シフトさせる動作と同様である。必要な画素データ（サ
ーチウインドー画素データ）のみが要素プロセサ内に格
納されるようにバッファＤＬが挿入される。The above-described shift operation of the search area pixel data is similar to the operation of connecting the columns of the search area matrix shown in FIG. 19 to form one pixel data column and shifting the pixel data by one pixel. The buffer DL is inserted so that only necessary pixel data (search window pixel data) is stored in the element processor.

【０１４０】上述の構成においては、テンプレートブロ
ックのデータはプロセサアレイ内の各要素プロセサ内に
常駐している。サーチエリア内のデータのみが各動作サ
イクル毎にシフトする。このときのサーチエリア内のデ
ータのスキャン方向は、図２４に示すように、サーチエ
リア４５において一方方向（図の上から下方向）のみで
ある。サーチエリアのデータ転送は一方方向のみにおい
て行なわれるため、データ転送方向を選択するための回
路構成が不要となり、データ転送に必要とされる回路規
模を低減することができ、またこの回路規模低減に伴い
電力消費をも低減することができる。In the above structure, the data of the template block resides in each element processor in the processor array. Only the data in the search area shifts in each operation cycle. The scanning direction of the data in the search area at this time is only one direction (from the top to the bottom of the drawing) in the search area 45, as shown in FIG. Since the data transfer in the search area is performed only in one direction, the circuit configuration for selecting the data transfer direction becomes unnecessary, and the circuit scale required for the data transfer can be reduced. Accordingly, power consumption can be reduced.

【０１４１】図２５は、要素プロセサＰＥの具体的構成
を示す図である。先の構成においては単にサーチウイン
ドーデータ格納手段とテンプレートブロックデータ格納
手段のみを説明した。以下この要素プロセサＰＥの具体
的構成について図２５を参照して説明する。FIG. 25 shows a specific configuration of element processor PE. The preceding description covered only the search window data storage means and the template block data storage means. The specific configuration of element processor PE is described below with reference to FIG. 25.

【０１４２】図２５において、要素プロセサＰＥは、２
段の縦続接続されたサーチウインドーデータを格納する
ためのレジスタ２６−１および２６−２と、選択信号Ｓ
ＥＬ０に従ってレジスタ２６−１および２６−２の一方
の格納データを選択するセレクタ６０と、テンプレート
データを格納するための２段の縦続接続されたレジスタ
２５−１および２５−２と、選択信号ＳＥＬ１に従って
レジスタ２５−１および２５−２の一方の格納データを
選択するセレクタ６２と、セレクタ６０および６２によ
り選択されたデータに対し差分絶対値を求める差分絶対
値回路６４を含む。この構成は図３の構成おけるＭ＝Ｎ
＝２に対応する。In FIG. 25, element processor PE includes: two cascaded registers 26-1 and 26-2 for storing search window data; a selector 60 that selects the data stored in one of registers 26-1 and 26-2 according to selection signal SEL0; two cascaded registers 25-1 and 25-2 for storing template data; a selector 62 that selects the data stored in one of registers 25-1 and 25-2 according to selection signal SEL1; and an absolute difference circuit 64 that obtains the absolute difference of the data selected by selectors 60 and 62. This configuration corresponds to M = N = 2 in the configuration of FIG. 3.

【０１４３】レジスタ２６−１および２６−２はシフト
レジスタ段を構成し、図示しないクロック信号に従って
与えられたデータの転送およびラッチを行なう。レジス
タ２５−１および２５−２も同様にシフトレジスタを構
成し、図示しない転送クロック信号に従って与えられた
テンプレートブロックデータのシフトおよびラッチを行
なう。Registers 26-1 and 26-2 form a shift register stage, and transfer and latch data applied according to a clock signal (not shown). Registers 25-1 and 25-2 similarly configure a shift register, and perform shift and latch of template block data applied according to a transfer clock signal (not shown).

【０１４４】セレクタ６０および６２は同期動作をし、
対をなすレジスタ２６−１とレジスタ２５−１またはレ
ジスタ２６−２および２５−２の組を選択する。Selectors 60 and 62 operate in synchronization and select either the pair of registers 26-1 and 25-1 or the pair of registers 26-2 and 25-2.

【０１４５】図２５に示す構成においては１つの要素プ
ロセサＰＥが２画素のテンプレートブロックデータに対
する演算を実行する。In the configuration shown in FIG. 25, one element processor PE performs the calculation for two pixels of template block data.

【０１４６】テンプレートブロックデータはレジスタ２
５−１および２５−２において、対応の動きベクトルが
検出されるまでは常駐状態とされる。サーチウインドー
データを格納するレジスタ２６−１および２６−２は、
各サイクル毎にデータを１画素分シフトする。セレクタ
６０および６２はそれぞれこの１画素分のサーチウイン
ドーの転送周期の間に２つのレジスタを交互に選択す
る。すなわちサーチウインドーデータの転送周期の１／
２の周期で差分絶対値回路６４は差分絶対値を求める演
算を実行する。The template block data remains resident in registers 25-1 and 25-2 until the corresponding motion vector is detected. Registers 26-1 and 26-2, which store the search window data, shift the data by one pixel in each cycle. Selectors 60 and 62 each select the two registers alternately during the transfer period of one search window pixel. That is, absolute difference circuit 64 computes an absolute difference at half the period of the search window data transfer.
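
The time-multiplexed behavior of the FIG. 25 element processor can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `ElementProcessor`, its method names, and the direction in which the two search window registers shift are hypothetical and are not taken from the patent. The model only shows that the template pair stays resident while the search window pair shifts, and that two absolute differences are produced per transfer cycle.

```python
class ElementProcessor:
    """Behavioral sketch of an element processor with M = N = 2 (hypothetical model)."""

    def __init__(self, template_pair, search_pair):
        # Registers 25-1 and 25-2: template data, resident until the vector is found.
        self.template = list(template_pair)
        # Registers 26-1 and 26-2: search window data, shifted every transfer cycle.
        self.search = list(search_pair)

    def shift_in(self, pixel):
        """One search window transfer cycle: push one pixel, return the pixel shifted out."""
        shifted_out = self.search[-1]
        self.search = [pixel] + self.search[:-1]
        return shifted_out

    def evaluate(self):
        """Two selector phases inside one transfer cycle, one |difference| per phase."""
        results = []
        for phase in range(2):  # selectors 60 and 62 pick the same register index
            results.append(abs(self.template[phase] - self.search[phase]))
        return results


pe = ElementProcessor(template_pair=(10, 20), search_pair=(12, 15))
first = pe.evaluate()      # differences for the initial register contents
pe.shift_in(30)            # search window data moves by one pixel
second = pe.evaluate()     # template data is unchanged, search data has moved
```

Because both selectors use the same phase index, the register pairs are always matched, which is the synchronous selection described for selectors 60 and 62.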

【０１４７】図２６は図２５に示す差分絶対値回路の具
体的構成の一例を示す図である。図２４において、差分
絶対値回路６４は、図２５に示すセレクタ６０からのサ
ーチウインドーデータを負入力（Ａ）に受け、かつ図２
５に示すセレクタ６２からのテンプレートブロックデー
タを正入力（Ｂ）に受ける減算器７０を含む。減算器７
０はその減算結果を符号付マルチビット表示する。符号
ビットＳ（Ａ＞Ｂ）はサーチウインドーデータがテンプ
レートブロックデータよりも大きい場合には“１”とな
り、そうでない場合に“０”となる。この減算器７０の
出力が２の補数表示されるものとする。FIG. 26 shows an example of a specific configuration of the absolute difference circuit shown in FIG. 25. In FIG. 26, absolute difference circuit 64 includes a subtractor 70 that receives the search window data from selector 60 shown in FIG. 25 at its negative input (A) and the template block data from selector 62 shown in FIG. 25 at its positive input (B). Subtractor 70 outputs the subtraction result in signed multi-bit form. The sign bit S(A>B) is “1” when the search window data is larger than the template block data, and “0” otherwise. The output of subtractor 70 is assumed to be in two's complement representation.

【０１４８】差分絶対値回路６４はさらに、符号ビット
Ｓ（Ａ＞Ｂ）と減算器７０からの残りのビット（大きさ
ビットと称す）とを受けるＥｘＯＲ回路７２と、符号ビ
ットＳ（Ａ＞Ｂ）に応じてＥｘＯＲ回路７２の出力に１
を加算するインクリメンタ７４を含む。インクリメンタ
７４は、符号ビットＳ（Ａ＞Ｂ）が“１”の場合にＥｘ
ＯＲ回路７２の出力に１を加算する。インクリメンタ７
４は、符号ビットＳ（Ａ＞Ｂ）が“０”の場合にはＥｘ
ＯＲ回路７２の出力に１を加算せずにそのまま通過させ
る。ＥｘＯＲ回路７２は、マルチビットの画素データに
対して演算を行なう。画素データの各ビットが符号ビッ
トの値に従って反転または非反転される。Absolute difference circuit 64 further includes an ExOR circuit 72 that receives the sign bit S(A>B) and the remaining bits from subtractor 70 (referred to as the magnitude bits), and an incrementer 74 that adds 1 to the output of ExOR circuit 72 according to the sign bit S(A>B). Incrementer 74 adds 1 to the output of ExOR circuit 72 when the sign bit S(A>B) is “1”, and passes the output of ExOR circuit 72 unchanged, without adding 1, when the sign bit is “0”. ExOR circuit 72 operates on multi-bit pixel data: each bit of the pixel data is inverted or left non-inverted according to the value of the sign bit.

【０１４９】ＥｘＯＲ回路７２は、符号ビットＳ（Ａ＞
Ｂ）が“０”の場合には減算器７０からの大きさビット
（減算器７０の出力のうち符号ビットを除いた残りのビ
ット）をそのまま通過させる。符号ビットＳ（Ａ＞Ｂ）
が“１”の場合、ＥｘＯＲ回路７２は減算器７０の大き
さビットの各ビットを反転する。すなわち、ＥｘＯＲ回
路は減算器７０からの大きさビットの各ビットと符号ビ
ットＳ（Ａ＞Ｂ）とのモジュール２の加算を実行する。ExOR circuit 72 passes the magnitude bits from subtractor 70 (the bits of the subtractor output other than the sign bit) unchanged when the sign bit S(A>B) is “0”. When the sign bit S(A>B) is “1”, ExOR circuit 72 inverts each of the magnitude bits from subtractor 70. That is, the ExOR circuit performs modulo-2 addition of each magnitude bit from subtractor 70 and the sign bit S(A>B).

【０１５０】減算器７０は（Ｂ−Ａ）の演算を行なう。
この減算結果が正であれば符号ビットＳ（Ａ＞Ｂ）は
“０”であり、負の場合には符号ビットＳ（Ａ＞Ｂ）は
“１”である。減算器７０の出力は２の補数表示されて
いる。したがって、ＥｘＯＲ回路７２およびインクリメ
ンタ７４により符号ビットＳ（Ａ＞Ｂ）に従って減算器
出力のビット反転および１増分を選択的に行なうことに
より｜Ｂ−Ａ｜の差分絶対値が出力される。Subtractor 70 performs the operation (B - A). If the result is positive, the sign bit S(A>B) is “0”; if it is negative, the sign bit is “1”. The output of subtractor 70 is in two's complement representation. Therefore, by selectively inverting the bits of the subtractor output and incrementing by 1 according to the sign bit S(A>B), ExOR circuit 72 and incrementer 74 output the absolute difference |B - A|.
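
The subtract, invert, and increment sequence can be checked with a short bit-level model. The following is a minimal sketch, assuming unsigned pixels of `width` bits and a subtractor output of `width + 1` bits (one sign bit plus the magnitude bits). The function name `abs_diff_circuit` and the default width of 8 are illustrative assumptions and do not appear in the patent.

```python
def abs_diff_circuit(a: int, b: int, width: int = 8) -> int:
    """Bit-level model of |B - A| for unsigned `width`-bit inputs (illustrative sketch)."""
    mask = (1 << width) - 1

    # Subtractor 70: (B - A) in two's complement, width + 1 bits in total.
    diff = (b - a) & ((1 << (width + 1)) - 1)
    sign = diff >> width          # S(A>B): 1 when A > B, i.e. when B - A is negative
    magnitude_bits = diff & mask  # subtractor output without the sign bit

    # ExOR circuit 72: every magnitude bit is XORed with the sign bit.
    xored = magnitude_bits ^ (mask if sign else 0)

    # Incrementer 74: adds 1 only when the sign bit is 1.
    return xored + sign


# Exhaustive comparison against the arithmetic definition for 8-bit pixels.
all_match = all(
    abs_diff_circuit(a, b) == abs(b - a)
    for a in range(256)
    for b in range(256)
)
```

Inverting all bits and adding 1 is two's complement negation, so the negative result of (B - A) is turned back into the positive magnitude, while a non-negative result passes through unchanged.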

【０１５１】図２５に示す構成においては、要素プロセ
サＰＥは２個のサーチウインドーブロックデータ格納用
レジスタと２個のテンプレートブロックデータ格納用レ
ジスタを含んでいる。この状態はＭ＝Ｎ＝２の状態に対
応する。この状態においては、セレクタ６０および６２
がサーチウインドーデータのビット転送速度の２倍の周
期で選択動作を行ない、応じて差分絶対値回路６４もこ
のサーチウインドーデータビット転送速度の２倍の速度
で演算を実行している。一般に、Ｎ個のレジスタが設け
られている場合、図２７に示すように、サーチウインド
ーデータ転送の１周期内でセレクタ６０および６２は選
択信号ＳＥＬ０およびＳＥＬ１に従ってＮ個のレジスタ
を順次選択する。この場合、差分絶対値回路はサーチウ
インドーデータのビット転送周期のＮ倍の速度で演算を
実行することになる（ただしＮ＝Ｍの場合）。In the configuration shown in FIG. 25, element processor PE includes two registers for storing search window data and two registers for storing template block data. This corresponds to the case M = N = 2. In this case, selectors 60 and 62 perform the selection operation at twice the transfer rate of the search window data, and absolute difference circuit 64 accordingly operates at twice the search window data transfer rate. In general, when N registers are provided, selectors 60 and 62 sequentially select the N registers according to selection signals SEL0 and SEL1 within one search window data transfer cycle, as shown in FIG. 27. In this case, the absolute difference circuit operates at N times the search window data transfer rate (provided that N = M).

【０１５２】図２５に示すように、２個のサーチウイン
ドーブロックデータ格納用レジスタと２個のテンプレー
トブロックデータ格納用レジスタを用い、この２組のレ
ジスタを交互に駆動して差分絶対値演算を行なうことに
より、図２８に示すように、１つの要素プロセサにおい
て２つの画素データに関する差分絶対値算出演算が実行
される。この上述のような構成により、要素プロセサの
数を半減して差分絶対値を求めることができ、プロセサ
アレイのサイズを低減することができるという利点を備
える。この場合、複数のデータ格納用レジスタを設ける
ことにより別の利点を得ることもできる。As shown in FIG. 25, two search window data storage registers and two template block data storage registers are used, and the two register pairs are driven alternately to compute absolute differences, so that one element processor performs the absolute difference calculation for two pixels, as shown in FIG. 28. This configuration halves the number of element processors needed to obtain the absolute differences, which reduces the size of the processor array. Providing a plurality of data storage registers also offers a further advantage.

【０１５３】今、図２９に示すように選択信号ＳＥＬ０
およびＳＥＬ１をサーチウインドーデータの転送速度の
２倍の速度で切換える。この場合には、上述の２つの画
素データの組に対する演算時と同様の選択が実行され
る。演算器活性化制御信号φＳをサーチウインドーデー
タ転送速度と同じ周期で発生する（図２９参照）。この
とき、選択信号ＳＥＬ０およびＳＥＬ１により最初に選
択された画素データの組に対する演算結果（差分絶対
値）のみが生成される。２回目の選択信号ＳＥＬ０およ
びＳＥＬ１が指定する２つめの画素データの組に対する
演算結果は出力されない。Now suppose that, as shown in FIG. 29, selection signals SEL0 and SEL1 are switched at twice the transfer rate of the search window data. The same selection as in the calculation on the two pixel data pairs described above is then performed. The arithmetic unit activation control signal φS is generated with the same period as the search window data transfer (see FIG. 29). In this case, only the calculation result (absolute difference) for the pixel data pair selected first by selection signals SEL0 and SEL1 is generated. The calculation result for the second pixel data pair, designated by the second selection of SEL0 and SEL1, is not output.

【０１５４】サーチウインドーブロックデータは各周期
で１画素分ずつシフトされる。テンプレートブロックデ
ータは動きベクトルが決定されるまではプロセサアレイ
内において常駐する。したがって、この場合、図３０に
示すように、動きベクトル計算において演算結果に対す
るサブサンプリングを行なったことと等価となり、差分
絶対値の演算における評価点を間引くことができ、低速
の演算回路を用いて評価関数（評価値）を生成すること
ができる。これにより消費電力を低減することができ、
かつまた評価値生成回路規模を低減することができる。The search window data is shifted by one pixel in each cycle. The template block data remains resident in the processor array until the motion vector is determined. In this case, therefore, as shown in FIG. 30, the operation is equivalent to subsampling the calculation results in the motion vector computation: the evaluation points in the absolute difference calculation are thinned out, and the evaluation function (evaluation value) can be generated with a low-speed arithmetic circuit. This reduces power consumption and also reduces the scale of the evaluation value generation circuit.

【０１５５】この場合、図３０に示すように、要素プロ
セサにおいて、一方のテンプレートブロックデータのみ
を用いて差分絶対値演算が実行されている（図３０にお
いて、テンプレートブロック４３において斜線で示す画
素データのみが用いられている）。この場合、サブサン
プリングを行なうためには、テンプレートブロック４３
の全部の画素データをプロセサアレイ内の要素プロセサ
に格納する必要はない。必要とされる有効データのみを
要素プロセサ内に格納するだけで同様の効果を得ること
ができる。In this case, as shown in FIG. 30, the element processor computes the absolute difference using only one of its two template block data (in FIG. 30, only the hatched pixel data in template block 43 is used). To perform this subsampling, it is not necessary to store all the pixel data of template block 43 in the element processors of the processor array. The same effect is obtained by storing only the required valid data in the element processors.
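
The effect of using only part of the template pixels can be illustrated numerically. The sketch below assumes, for illustration only, that the valid pixels are every second pixel of a flattened template block; the function name `sad`, the `step` parameter, and the sample values are hypothetical and are not taken from the patent or its figures.

```python
def sad(template, window, step=1):
    """Sum of absolute differences over every `step`-th pixel of two equal-length sequences."""
    return sum(abs(t - w) for t, w in zip(template[::step], window[::step]))


template_pixels = [10, 20, 30, 40]
window_pixels = [12, 25, 30, 31]

full_value = sad(template_pixels, window_pixels)                 # all evaluation points
subsampled_value = sad(template_pixels, window_pixels, step=2)   # half of the evaluation points
```

The subsampled evaluation value uses half as many absolute difference operations, which is why a slower arithmetic circuit suffices, at the cost of a coarser match criterion.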

【０１５６】図３１は要素プロセサの他の構成例を示す
図である。図３１において、要素プロセサＰＥは、２段
の縦続接続されたサーチウインドーデータ格納用レジス
タ２６−１および２６−２と、選択信号ＳＥＬに従って
レジスタ２６−１および２６−２の格納データを順次選
択するセレクタ６０と、テンプレートブロックデータを
格納する１段のレジスタ２５と、セレクタ６０で選択さ
れたデータとレジスタ２５の格納データとの差分絶対値
を求める差分絶対値回路６４を含む。FIG. 31 shows another configuration example of the element processor. In FIG. 31, element processor PE includes two cascaded registers 26-1 and 26-2 for storing search window data, a selector 60 that sequentially selects the data stored in registers 26-1 and 26-2 according to selection signal SEL, a single-stage register 25 that stores template block data, and an absolute difference circuit 64 that obtains the absolute difference between the data selected by selector 60 and the data stored in register 25.

【０１５７】テンプレートブロックを格納するレジスタ
２５には、図３０に示すテンプレートブロック４３にお
ける有効データ（斜線で示すデータ）のみが格納され
る。セレクタ６０は選択信号ＳＥＬに従って、サーチウ
インドーデータＹの転送速度の２倍の速度でレジスタ２
６−１および２６−２の格納データを選択する。差分絶
対値回路６４はこのサーチウインドーデータＹの転送速
度と同じ速度で演算を実行する。これにより、常時、レ
ジスタ２６−１および２６−２の一方のレジスタの格納
データとレジスタ２５に格納された格納データとの差分
絶対値が求められる。差分絶対値回路６４が、常時、与
えられたデータに基づいて演算を実行し、この差分絶対
値回路６４の出力段にサブサンプリング用のデータラッ
チが設けられていてもよい。Register 25, which stores the template block data, holds only the valid data (hatched data) of template block 43 shown in FIG. 30. Selector 60 selects the data stored in registers 26-1 and 26-2 at twice the transfer rate of search window data Y, according to selection signal SEL. Absolute difference circuit 64 operates at the same rate as the transfer rate of search window data Y. As a result, the absolute difference between the data stored in one of registers 26-1 and 26-2 and the data stored in register 25 is always obtained. Alternatively, absolute difference circuit 64 may operate continuously on the data supplied to it, with a data latch for subsampling provided at its output stage.

【０１５８】上述の構成により高速でテンプレートブロ
ックに対する動きベクトルの算出を行なうことができ、
また要素プロセサの回路規模も低減される。With the above configuration, the motion vector for the template block can be calculated at high speed, and the circuit scale of the element processor is also reduced.

【０１５９】セレクタ６０を用いる代わりに図３０に示
すテンプレートブロック４３の有効データに対する評価
値演算が実行されるように配線によりレジスタ２６−１
および２６−２の一方のみが差分絶対値回路６４へデー
タを転送するように構成されてもよい。Instead of using selector 60, the wiring may be arranged so that only one of registers 26-1 and 26-2 transfers data to absolute difference circuit 64, such that the evaluation value calculation is performed on the valid data of template block 43 shown in FIG. 30.

【０１６０】図３２はデータレジスタの構成の一例を示
す図である。図３２において、データレジスタはシフト
レジスタの構成を備える。シフトレジスタＳＲは、転送
クロック信号φをゲートに受けるｎチャネルＭＯＳトラ
ンジスタ（絶縁ゲート型電界効果トランジスタ）ＮＴ１
と、相補転送クロック信号／φをゲートに受けるｐチャ
ネルＭＯＳトランジスタＰＴ１を含む。トランジスタＮ
Ｔ１およびＰＴ１はクロック信号φに応答して与えられ
たデータを伝達するＣＭＯＳトランスミッションゲート
を構成する。FIG. 32 shows an example of the configuration of the data register. In FIG. 32, the data register is configured as a shift register. Shift register SR includes an n-channel MOS transistor (insulated gate field effect transistor) NT1 that receives transfer clock signal φ at its gate, and a p-channel MOS transistor PT1 that receives complementary transfer clock signal /φ at its gate. Transistors NT1 and PT1 form a CMOS transmission gate that transmits the applied data in response to clock signal φ.

【０１６１】シフトレジスタＳＲはさらに、このトラン
ジスタＮＴ１およびＰＴ１から転送されたデータを反転
するインバータ回路ＩＶ１と、インバータ回路ＩＶ１の
出力データを反転してインバータ回路ＩＶ１の入力へ伝
達するインバータ回路ＩＶ２と、転送クロック信号／φ
をゲートに受けるｎチャネルＭＯＳトランジスタＮＴ２
と、転送クロック信号φをゲートに受けるｐチャネルＭ
ＯＳトランジスタＰＴ２と、トランジスタＮＴ２および
ＰＴ２により伝達されたデータを反転するインバータ回
路ＩＶ３と、インバータ回路ＩＶ３の出力データを反転
してインバータ回路ＩＶ３の入力へ伝達するインバータ
回路ＩＶ４を含む。Shift register SR further includes an inverter circuit IV1 that inverts the data transferred through transistors NT1 and PT1, an inverter circuit IV2 that inverts the output data of inverter circuit IV1 and transmits it to the input of inverter circuit IV1, an n-channel MOS transistor NT2 that receives transfer clock signal /φ at its gate, a p-channel MOS transistor PT2 that receives transfer clock signal φ at its gate, an inverter circuit IV3 that inverts the data transmitted through transistors NT2 and PT2, and an inverter circuit IV4 that inverts the output data of inverter circuit IV3 and transmits it to the input of inverter circuit IV3.

【０１６２】インバータ回路ＩＶ１およびＩＶ２はイン
バータラッチ回路を構成し、インバータ回路ＩＶ３およ
びＩＶ４はまた他方のインバータラッチ回路を構成す
る。トランジスタＮＴ２およびＰＴ２は相補転送クロッ
ク信号／φに応答して与えられたデータを伝達するＣＭ
ＯＳトランスミッションゲートを構成する。Inverter circuits IV1 and IV2 form one inverter latch circuit, and inverter circuits IV3 and IV4 form the other inverter latch circuit. Transistors NT2 and PT2 form a CMOS transmission gate that transmits the applied data in response to complementary transfer clock signal /φ.

【０１６３】図３２に示すようなシフトレジスタは、デ
ータビット数に応じて複数個並列に設けられる。A plurality of shift registers as shown in FIG. 32 are provided in parallel according to the number of data bits.

【０１６４】図３２に示すようなシフトレジスタＳＲを
サーチウインドー格納用データレジスタおよびテンプレ
ートブロックデータ格納用データレジスタとして利用す
ることにより、容易にデータの一方方向転送およびデー
タの選択が実現される。差分絶対値回路へデータを与え
る場合には、このインバータ回路ＩＶ３の入力部からデ
ータを取出せばよい。By utilizing the shift register SR as shown in FIG. 32 as a search window storage data register and a template block data storage data register, one-way transfer of data and selection of data can be easily realized. When data is given to the absolute difference circuit, the data may be taken out from the input part of the inverter circuit IV3.

【０１６５】図３３は要素プロセサの他の構成例を示す
図である。図３３において、要素プロセサＰＥは、Ｎワ
ードのサーチウインドーデータを格納するＮワードレジ
スタファイル７３と、Ｍワードのテンプレートブロック
データを格納するＭワードレジスタファイル７５と、Ｎ
ワードレジスタファイル７３およびＭワードレジスタフ
ァイル７５から読出された１ワードのデータの差分絶対
値をとる差分絶対値回路６４を含む。１ワードが１画素
データに対応する。Ｎワードレジスタファイル７３およ
びＭワードレジスタファイル７５からはそれぞれ１ワー
ドずつデータが読出され、差分絶対値回路６４へ与えら
れる。レジスタファイル７３および７５は、ＦＩＦＯ構
成を備える複数のレジスタを備える。FIG. 33 shows another configuration example of the element processor. In FIG. 33, element processor PE includes an N-word register file 73 that stores N words of search window data, an M-word register file 75 that stores M words of template block data, and an absolute difference circuit 64 that takes the absolute difference of the one-word data read from N-word register file 73 and M-word register file 75. One word corresponds to one pixel of data. Data is read one word at a time from each of N-word register file 73 and M-word register file 75 and applied to absolute difference circuit 64. Register files 73 and 75 each include a plurality of registers in a FIFO configuration.

【０１６６】この構成においては、セレクタ回路を設け
る必要がなく、データの読出をレジスタファイル７３お
よび７５において同期して行なうことにより対応の画素
データの組を差分絶対値回路６４へ与えることができ
る。レジスタファイルの構成については後に詳細に説明
する。この場合においても差分絶対値回路６４は、その
処理方法（サブサンプリングまたはすべての画素データ
のサンプリング）に応じて演算速度が決定されるかまた
はデータ出力速度が決定される。In this configuration, no selector circuit is needed: reading data from register files 73 and 75 in synchronization supplies the corresponding pixel data pair to absolute difference circuit 64. The configuration of the register file is described in detail later. In this case as well, either the calculation rate or the data output rate of absolute difference circuit 64 is determined by the processing method (subsampling or sampling of all pixel data).

【０１６７】図３４は要素プロセサのさらに他の構成例
を示す図である。図３４に示す要素プロセサＰＥは、図
３３に示す要素プロセサＰＥにおいて差分絶対値回路６
４に代えて差分絶対値和回路６５を含む。すなわちこの
図３４に示す要素プロセサＰＥにおいては、Ｍワードレ
ジスタファイル７５に格納されたＭワードのテンプレー
トブロックデータに対する差分絶対値の演算および求め
られた差分絶対値の累算が実行される。総和部にはＭ個
のテンプレートブロック画素データについての部分和が
伝達されるため、総和演算を高速化することができる。FIG. 34 shows still another configuration example of the element processor. The element processor PE shown in FIG. 34 includes a sum-of-absolute-differences circuit 65 in place of absolute difference circuit 64 of the element processor PE shown in FIG. 33. That is, the element processor PE shown in FIG. 34 computes the absolute differences for the M words of template block data stored in M-word register file 75 and accumulates the resulting absolute differences. Because partial sums over M template block pixels are transmitted to the summing unit, the summation can be performed faster.

【０１６８】図３５は図３３および図３４に示すレジス
タファイルにおける１ビットの構成を示す図である。図
３５において、レジスタファイルの基本単位は、ワード
線ＷＬとビット線対ＢＬおよび／ＢＬとの交差部に対応
して配置されるメモリセルＭＣを含む。メモリセルＭＣ
はＳＲＡＭセル構造を備え、ソースとドレインとが交差
接続された、ｎチャネルＭＯＳトランジスタＮＴ１０お
よびＮＴ１２と、ストレージノードＳＮ１を“Ｈ”レベ
ルへプルアップするための負荷素子として機能する抵抗
接続されたｎチャネルＭＯＳトランジスタＮＴ１４と、
ストレージノードＳＮ２をプルアップするための抵抗接
続されたｎチャネルＭＯＳトランジスタＮＴ１６と、ワ
ード線ＷＬ上の信号電位に応答してストレージノードＳ
Ｎ１およびＳＮ２をビット線ＢＬおよび／ＢＬへそれぞ
れ接続するｎチャネルＭＯＳトランジスタＮＴ１８およ
びＮＴ２０を含む。FIG. 35 shows the configuration of one bit of the register file shown in FIGS. 33 and 34. In FIG. 35, the basic unit of the register file includes a memory cell MC arranged at the intersection of a word line WL and a bit line pair BL and /BL. Memory cell MC has an SRAM cell structure and includes n-channel MOS transistors NT10 and NT12 whose sources and drains are cross-connected, a resistance-connected n-channel MOS transistor NT14 that functions as a load element for pulling storage node SN1 up to the “H” level, a resistance-connected n-channel MOS transistor NT16 for pulling up storage node SN2, and n-channel MOS transistors NT18 and NT20 that connect storage nodes SN1 and SN2 to bit lines BL and /BL, respectively, in response to the signal potential on word line WL.

【０１６９】このメモリセルＭＣは実質的に交差結合さ
れたインバータ回路を記憶素子として利用する。構成要
素数は６個のトランジスタであり、図３２に示すシフト
レジスタよりもその１ビット当りの占有面積をより小さ
くすることができる。This memory cell MC utilizes substantially cross-coupled inverter circuits as storage elements. The number of constituent elements is six transistors, and the occupied area per bit can be made smaller than that of the shift register shown in FIG.

【０１７０】図３６はレジスタファイルの全体の構成を
示す図である。図３６において、レジスタファイルは、
図３５に示すメモリセルが行列状に配置されたメモリセ
ルアレイ８０と、書込アドレスを発生する書込用アドレ
スポインタ８２と、アドレスポインタ８２からのアドレ
スに従ってメモリセルアレイ８０における１ワードを選
択し、選択された１ワードへデータを書込む書込制御回
路８４と、読出用のワードアドレスを発生する読出用ア
ドレスポインタ８８と、アドレスポインタ８８からのア
ドレスに従ってメモリセルアレイ８０における対応のワ
ードを選択し、選択されたワードのデータを読出す読出
制御回路８６を含む。メモリセルアレイ８０は、１ワー
ド領域に１つの画素データを格納する。１ワードが１本
のワード線に対応してもよい。FIG. 36 shows the overall configuration of the register file. In FIG. 36, the register file includes a memory cell array 80 in which the memory cells shown in FIG. 35 are arranged in a matrix, a write address pointer 82 that generates a write address, a write control circuit 84 that selects one word in memory cell array 80 according to the address from address pointer 82 and writes data to the selected word, a read address pointer 88 that generates a read word address, and a read control circuit 86 that selects the corresponding word in memory cell array 80 according to the address from address pointer 88 and reads the data of the selected word. Memory cell array 80 stores one pixel of data in each one-word area. One word may correspond to one word line.

【０１７１】データ書込時においては、アドレスポイン
タ８２が書込アドレスを発生し、書込制御回路８４がこ
の発生された書込アドレスに従って対応のワード線ＷＬ
およびビット線対（１本のワード線に複数ワードが格納
される場合）を選択する。この選択されたワードに対し
書込データが書込制御回路８４の制御の下に書込まれ
る。データ読出時においてはアドレスポインタ８８およ
び読出制御回路８６により同様にワードの選択が実行さ
れ、選択されたワードのデータが読出される。In data writing, address pointer 82 generates a write address, and write control circuit 84 selects the corresponding word line WL, and a bit line pair when a plurality of words are stored on one word line, according to the generated write address. The write data is written to the selected word under the control of write control circuit 84. In data reading, word selection is performed in the same way by address pointer 88 and read control circuit 86, and the data of the selected word is read.

【０１７２】画素データのシフト動作時においては、ま
ずアドレスポインタ８８から最も古いデータが格納され
たアドレスが発生され、読出制御回路８６はこの最も古
いデータワードを選択し、対応のワードのデータをメモ
リセルアレイ８０から読出す。アドレスポインタ８２
は、最も新しく書込まれたアドレスの次のアドレスを指
定する。書込制御回路８４はこの最も新しく書込まれた
ワードの次のアドレス位置を選択し、そのアドレス位置
に書込データを書込む。これにより１画素分のサーチウ
インドーデータの転送またはテンプレートブロックデー
タの転送が実現される。書込経路と読出経路が異なって
いるため、読出データと書込データとが衝突することは
ない。アドレスポインタ８２および８８として最大カウ
ント値Ｍのリングカウンタを用いればこの構成は容易に
実現することができる。In the pixel data shift operation, address pointer 88 first generates the address at which the oldest data is stored; read control circuit 86 selects this oldest data word and reads its data from memory cell array 80. Address pointer 82 designates the address following the most recently written address. Write control circuit 84 selects that address position and writes the write data there. This realizes the transfer of one pixel of search window data or of template block data. Because the write path and the read path are separate, read data and write data do not collide. This configuration is easily realized by using ring counters with a maximum count value of M as address pointers 82 and 88.
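
The pointer-based shift can be modeled as a circular buffer with two ring counters. The sketch below is a behavioral illustration only: the class name `RegisterFileFifo` and its methods are hypothetical, and the model assumes the register file is already full, so the slot following the newest word is the slot of the oldest word.

```python
class RegisterFileFifo:
    """Behavioral sketch of a full register file shifted via read/write pointers."""

    def __init__(self, initial_words):
        self.cells = list(initial_words)  # memory cell array 80: one pixel per word
        self.size = len(self.cells)
        self.read_ptr = 0                 # read address pointer 88: oldest word
        self.write_ptr = 0                # write address pointer 82: slot after the newest word

    def shift(self, new_word):
        """Transfer one pixel: read out the oldest word and write the incoming word."""
        oldest = self.cells[self.read_ptr]
        self.cells[self.write_ptr] = new_word
        # Both pointers behave as ring counters that wrap at the register file size.
        self.read_ptr = (self.read_ptr + 1) % self.size
        self.write_ptr = (self.write_ptr + 1) % self.size
        return oldest

    def words_oldest_first(self):
        """Stored words in age order, as a read sequence for the calculation would see them."""
        return [self.cells[(self.read_ptr + i) % self.size] for i in range(self.size)]


fifo = RegisterFileFifo([1, 2, 3])
shifted_out = fifo.shift(4)             # the oldest word leaves, the new word enters
contents = fifo.words_oldest_first()
```

No stored word is physically moved during a shift; only the two pointers advance, which is the reason such a register file can replace a chain of shift register stages.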

【０１７３】差分絶対値を求める演算動作時において
は、読出用のアドレスポインタ８８は所定の順序で格納
されたＭワードを順次選択するようにアドレスを発生す
る。この場合サブサンプリングが行なわれるＮ＞Ｍの構
成の場合には最も古くから書込まれているデータから順
次データワードの読出が行なわれる。Ｎ＝Ｍであり、サ
ブサンプリングが行なわれない場合にはアドレスポイン
タ８８は所定の順序で格納されたすべてのワードデータ
を順次読出すようにアドレスを発生する。In the calculation of absolute differences, read address pointer 88 generates addresses so as to sequentially select the M stored words in a predetermined order. In a configuration with N > M, where subsampling is performed, data words are read sequentially starting from the oldest written data. When N = M and no subsampling is performed, address pointer 88 generates addresses so as to sequentially read all stored word data in a predetermined order.

【０１７４】サブサンプリングが行なわれる場合、必要
とされるデータのみが所定のサブサンプリング速度で読
出されてもよい。When sub-sampling is performed, only the required data may be read at a given sub-sampling rate.

【０１７５】初期設定値において各レジスタファイルに
ＮワードまたはＭワードのデータを書込む場合、このレ
ジスタファイルを介してデータの転送が行なわれる。こ
の場合最下流のレジスタファイルにおいてＮまたはＭワ
ードのデータワードが格納されたときには順次１ワード
ずつデータが読出されかつ書込みが行なわれるように構
成されてもよい。この構成は書込用のアドレスポインタ
８２のアドレスが所定値に達したときに読出制御回路８
６がイネーブル状態とされるように構成することにより
実現される。When N or M words of data are written into each register file at initialization, the data is transferred through the register files. In this case, the register files may be configured so that, once N or M data words have been stored in the most downstream register file, data is read and written one word at a time in sequence. This is realized by configuring read control circuit 86 to be enabled when the address of write address pointer 82 reaches a predetermined value.

【０１７６】また初期設定値のみ書込データを読出デー
タバス経路へバイパスする経路が設けられてもよい。こ
の構成の場合、各レジスタファイルにおいて転送クロッ
クをカウントしそのクロックカウント値が所定値に達し
たときにバイパス経路が遮断され、データの書込が行な
われ、所定ワードの書込が完了したときにデータの書込
が禁止される。Alternatively, a path that bypasses write data to the read data bus only during initialization may be provided. In this configuration, each register file counts the transfer clock; when the count reaches a predetermined value, the bypass path is cut off and data writing is performed, and when the predetermined number of words has been written, data writing is inhibited.

【０１７７】図３６に示すレジスタファイルはまた図４
に示すサイドウインドーデータ格納用データバッファと
しても利用することができる。The register file shown in FIG. 36 can also be used as the data buffer for storing side window data shown in FIG. 4.

【０１７８】図３７は、図４に示すデータバッファの具
体的構成の一例を示す図である。図３７においてデータ
バッファＤＬはＲワードのデータを格納するＲワードレ
ジスタ９０を備える。Ｒワードレジスタ９０は図３６に
示すデュアルポートメモリの構成を備えてもよい。また
Ｒワードのデータを格納するシフトレジスタにより構成
されてもよい。また、さらに、Ｄラッチのような遅延素
子が縦続接続される構成が利用されてもよい。FIG. 37 shows an example of a specific configuration of the data buffer shown in FIG. 4. In FIG. 37, data buffer DL includes an R-word register 90 that stores R words of data. R-word register 90 may have the dual-port memory configuration shown in FIG. 36. It may instead be formed of a shift register that stores R words of data. A configuration in which delay elements such as D latches are cascaded may also be used.

【０１７９】遅延手段としてシフトレジスタと異なりレ
ジスタファイルを用いることにより回路規模および消費
電力を低減することができる（構成トランジスタ数が少
ないためである）。Using a register file rather than a shift register as the delay means reduces circuit scale and power consumption, because fewer transistors are required.

【０１８０】図３８は差分絶対値回路の他の構成例を示
す図である。図３８に示す差分絶対値回路は、サーチウ
インドーデータとテンプレートブロックデータとの減算
を行なう減算器７０と、減算器７０の出力の大きさビッ
トと符号ビットＳ（Ａ＞Ｂ）を受けるＥｘＯＲ回路７２
を含む。この図３８に示す差分絶対値回路６４は、図２
６に示す差分絶対値回路と異なりインクリメンタを含ん
でいない。インクリメンタの機能は、符号ビットＳ（Ａ
＞Ｂ）が“１”のときにＥｘＯＲ回路７２の出力に１を
加算することである。この１を加算する動作は、図３８
に示す差分絶対値回路６４においては実行されない。次
段の総和部において実行される。すなわち、図３８に示
す差分絶対値回路６４は、差分絶対値ＰＥ♯と符号ビッ
トＳ♯を出力し、次段の総和部へ与える。FIG. 38 shows another configuration example of the absolute difference circuit. The absolute difference circuit shown in FIG. 38 includes a subtractor 70 that subtracts the search window data and the template block data, and an ExOR circuit 72 that receives the magnitude bits and the sign bit S(A>B) of the output of subtractor 70. Unlike the absolute difference circuit shown in FIG. 26, the absolute difference circuit 64 shown in FIG. 38 does not include an incrementer. The function of the incrementer is to add 1 to the output of ExOR circuit 72 when the sign bit S(A>B) is “1”. This addition of 1 is not performed in the absolute difference circuit 64 shown in FIG. 38; it is performed in the summing unit of the next stage. That is, the absolute difference circuit 64 shown in FIG. 38 outputs an absolute difference value PE# and a sign bit S# and supplies them to the summing unit of the next stage.

【０１８１】図３９は、図３８に示す差分絶対値回路を
用いた場合の総和部の全体の構成を示すブロック図であ
る。図３９において総和部１２は、ｎ個の要素プロセサ
からの差分絶対値ＰＥ♯１〜ＰＥ♯ｎと符号ビットＳ♯
１〜Ｓ♯ｎを加算する。差分絶対値ＰＥ♯１〜ＰＥ♯ｎ
はマルチビットデータであり、符号ビットＳ♯１〜Ｓ♯
ｎは１ビットデータである。総和部１２は、その構成は
後に説明するが、差分絶対値ＰＥ♯１〜ＰＥ♯ｎを受け
る、全加算器からなるコンプレッサで構成される。符号
ビットＳ♯１〜Ｓ♯ｎはこのコンプレッサの最下位ビッ
トのキャリ入力へ与えられる。これにより加算操作の高
速化および装置規模の低減を図る。FIG. 39 is a block diagram showing the overall configuration of the summing unit when the absolute difference circuit shown in FIG. 38 is used. In FIG. 39, summing unit 12 adds the absolute difference values PE#1 to PE#n and the sign bits S#1 to S#n from n element processors. The absolute difference values PE#1 to PE#n are multi-bit data, and the sign bits S#1 to S#n are 1-bit data. Summing unit 12, whose configuration is described later, is formed of compressors composed of full adders that receive the absolute difference values PE#1 to PE#n. The sign bits S#1 to S#n are applied to the carry inputs of the least significant bits of these compressors. This speeds up the addition and reduces the device scale.

【０１８２】図４０は、図３９に示す総和部の具体的構
成の一例を示す図である。図４０においては、ｎ個の要
素プロセサの出力を加算する場合の構成が一例として示
される。要素プロセサの数に応じてこの図４０に示す構
成が拡張される。FIG. 40 is a diagram showing an example of a specific structure of the summing unit shown in FIG. In FIG. 40, a configuration in the case of adding the outputs of n element processors is shown as an example. The configuration shown in FIG. 40 is expanded according to the number of element processors.

【０１８３】図４０において、総和部１２は、４つの要
素プロセサからの差分絶対値に対応する値Ｐ♯１〜Ｐ♯
４を入力Ａ、Ｂ、ＣおよびＤにそれぞれ受けかつキャリ
入力に符号ビットＳ♯１を受けて加算を行ない、該加算
結果を２出力ＥおよびＦから出力する４対２コンプレッ
サ１０２ａと、差分絶対値に対応する値Ｐ♯５〜Ｐ♯８
を入力Ａ〜Ｄにそれぞれ受けかつ最下位ビットのキャリ
入力に符号ビットＳ♯２を受けるとともに与えられたデ
ータの加算を行ない、２出力ＥおよびＦから該加算結果
を出力する４対２コンプレッサ１０２ｂと、４対２コン
プレッサ１０２ａおよび１０２ｂの出力をその４入力
Ａ、Ｂ、ＣおよびＤに受け、かつその最下位ビット位置
に符号ビットＳ♯３、Ｓ♯４およびＳ♯５を受ける４対
２コンプレッサ１０２ｃを含む。In FIG. 40, summing unit 12 includes: a 4-to-2 compressor 102a that receives the values P#1 to P#4 corresponding to the absolute differences from four element processors at its inputs A, B, C, and D, receives the sign bit S#1 at its carry input, performs the addition, and outputs the result from two outputs E and F; a 4-to-2 compressor 102b that receives the values P#5 to P#8 corresponding to the absolute differences at its inputs A to D, receives the sign bit S#2 at the carry input of its least significant bit, adds the supplied data, and outputs the result from two outputs E and F; and a 4-to-2 compressor 102c that receives the outputs of 4-to-2 compressors 102a and 102b at its four inputs A, B, C, and D and receives the sign bits S#3, S#4, and S#5 at its least significant bit position.

【０１８４】４対２コンプレッサ１０２ｃが３ビットの
符号ビットＳ♯３、Ｓ♯４およびＳ♯５を受けることが
できるのは、この４対２コンプレッサ１０２ｃが３段の
全加算回路段を含むためである。この構成については後
により具体的に説明する。The 4-to-2 compressor 102c can receive the three sign bits S#3, S#4, and S#5 because it includes three full-adder circuit stages. This configuration is described in detail later.

【０１８５】総和部１２はさらに、４対２コンプレッサ
１０２ｃの出力（ＥおよびＦ）をその入力ＡおよびＢに
受けかつその最下位ビットのキャリ入力に符号ビットＳ
♯６、Ｓ♯７およびＳ♯８を受ける加算器１０４を含
む。加算器１０４から総和結果が出力される。Summing unit 12 further includes an adder 104 that receives the outputs (E and F) of 4-to-2 compressor 102c at its inputs A and B and receives the sign bits S#6, S#7, and S#8 at the carry input of its least significant bit. The summation result is output from adder 104.

【０１８６】この総和部１２は、いわゆるワレスツリー
（ＷａｌｌａｃｅＴｒｅｅ）の構成を備え、キャリ伝
搬遅延を最小にして高速で加算を実行することができ
る。ここで、図４０に示す総和部１２は、累算器を備え
ていない。演算速度がサーチウインドーデータ転送速度
よりも早い場合には、複数回の加算を実行する必要があ
る。このため、加算回路１０４の出力部に累算器が設け
られる（要素プロセサが複数のテンプレートブロックデ
ータ格納手段を備える場合）。各演算サイクル毎に要素
プロセサから演算結果データ（差分絶対値和データ）が
総和部へ転送されてもよい。Summation unit 12 has a so-called Wallace tree configuration and can execute addition at high speed with minimal carry propagation delay. Summation unit 12 shown in FIG. 40 does not include an accumulator. When the operation speed is higher than the search window data transfer rate, addition must be executed a plurality of times. In that case, an accumulator is provided at the output of adder 104 (when each element processor has a plurality of template block data storage means). Operation result data (absolute difference sum data) may also be transferred from the element processors to the summation unit in every operation cycle.

【０１８７】上述の構成においては、符号ビットを最下
位ビットのキャリ入力に与えているため、小さい回路規
模で高速で加算を実行することができる。次にこの４対
２コンプレッサ（４入力２出力加算器）の構成および総
和部の具体的構成について説明する。In the above structure, since the sign bit is given to the carry input of the least significant bit, the addition can be executed at a high speed with a small circuit scale. Next, the configuration of the 4-to-2 compressor (4-input 2-output adder) and the specific configuration of the summing unit will be described.
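
The role of the sign bit can be illustrated with a minimal sketch. It assumes, as one plausible reading of the text, that each element processor outputs the difference with its bits inverted when the result is negative, together with a sign bit, so that the deferred "+1" of the two's-complement negation can enter the adder tree through a spare carry input. The names `pe_output`, `sum_abs_diff` and `WIDTH` are illustrative and do not appear in the patent.

```python
WIDTH = 8  # assumed pixel bit width (illustrative only)


def pe_output(a, b, width=WIDTH):
    """Return (P, S) for one element processor (hypothetical model).

    P is a - b when the difference is non-negative; otherwise it is the
    bitwise inverse of the two's-complement difference. S is the sign bit.
    """
    mask = (1 << width) - 1
    diff = (a - b) & ((1 << (width + 1)) - 1)  # (width+1)-bit two's complement
    sign = (diff >> width) & 1
    low = diff & mask
    p = (low ^ mask) if sign else low          # invert only when negative
    return p, sign


def sum_abs_diff(pairs):
    """Sum |a - b| by adding each P and feeding each S in as a carry."""
    total = 0
    for a, b in pairs:
        p, s = pe_output(a, b)
        total += p + s  # the sign bit supplies the missing "+1"
    return total
```

Under these assumptions, `P + S` equals `|a - b|` for every pair, so no separate incrementer is needed in each element processor.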

【０１８８】図４１は図４０に示す４対２コンプレッサ
の具体的構成の一例を示す図である。図４１に示す４対
２コンプレッサは、与えられたデータが４ビット幅の場
合に対する構成を備える。入力データのビット幅が大き
くなればこの図４１に示す構成が拡張される。FIG. 41 shows an example of a specific configuration of the 4-to-2 compressor shown in FIG. 40. The 4-to-2 compressor shown in FIG. 41 is configured for the case where the applied data is 4 bits wide. For wider input data, the configuration shown in FIG. 41 is extended accordingly.

【０１８９】図４１において、４対２コンプレッサ１０
２は、各々が入力ＡおよびＢと、キャリ入力Ｃｉｎと、
キャリ出力Ｃｏと、和出力Ｓとを備える並列に配置され
る全加算回路１１０ａ、１１０ｂ、１１０ｃおよび１１
０ｄを含む。全加算回路１１０ａ〜１１０ｄのＡ入力お
よびＢ入力に４ビット入力データＡ＜３；０＞およびＢ
＜３；０＞が与えられかつキャリ入力Ｃｉｎに入力デー
タＣ＜３；０＞が与えられる。ここで、「Ａ＜３；０
＞」はデータＡがビットＡ０を最下位ビットとしかつビ
ットＡ３を最上位ビットとする４ビットデータであるこ
とを示す。In FIG. 41, 4-to-2 compressor 102 includes full adder circuits 110a, 110b, 110c and 110d arranged in parallel, each having inputs A and B, a carry input Cin, a carry output Co and a sum output S. The 4-bit input data A<3;0> and B<3;0> are applied to the A and B inputs of full adder circuits 110a to 110d, and input data C<3;0> is applied to their carry inputs Cin. Here, "A<3;0>" indicates that data A is 4-bit data whose least significant bit is bit A0 and whose most significant bit is bit A3.

【０１９０】４対２コンプレッサ１０２はさらに、初段
の全加算回路１１０ａ〜１１０ｄの和出力Ｓおよびキャ
リ出力Ｃｏと入力データＤ＜３；０＞との加算を行なう
全加算回路１１０ｅ、１１０ｆ、１１０ｇおよび１１０
ｈを含む。全加算回路１１０ａ〜１１０ｄと全加算回路
１１０ｅ〜１１０ｈとは桁合わせして配置される。初段
の全加算回路１１０ａ〜１１０ｄの和出力Ｓは次段の対
応の全加算回路の入力（ＡまたはＢ）に与えられる。初
段の全加算回路１１０ａ〜１１０ｄのキャリ出力Ｃｏは
次段の全加算回路において１ビット上位の全加算回路の
キャリ入力へ与えられる。4-to-2 compressor 102 further includes full adder circuits 110e, 110f, 110g and 110h that add the sum outputs S and carry outputs Co of first-stage full adder circuits 110a to 110d to input data D<3;0>. Full adder circuits 110a to 110d and full adder circuits 110e to 110h are arranged with their bit positions aligned. The sum output S of each first-stage full adder circuit 110a to 110d is applied to an input (A or B) of the corresponding second-stage full adder circuit. The carry output Co of each first-stage full adder circuit 110a to 110d is applied to the carry input of the second-stage full adder circuit one bit higher.

【０１９１】全加算回路（ＦＡ）において最下位ビット
の全加算回路１１０ｈのキャリ入力１０４には０が印加
される。すなわち全加算回路１１０ｈのキャリ入力は空
きキャリとなる。本実施例においてはこの空きキャリ１
０４へ符号ビットＳ♯を与える。４対２コンプレッサ１
０２からは５ビットデータＥ＜４；０＞およびＦ＜４；
０＞が出力される。全加算回路（ＦＡ）１１０ａ〜１１
０ｈの和出力ＳがデータビットＦ＜３；０＞を与え、全
加算回路（ＦＡ）１１０ｅ〜１１０ｈのキャリ出力がデ
ータビットＥ＜４；０＞を与える。初段の全加算回路１
１０ａのキャリ出力がデータビットＦ＜４＞を与える。Among the full adder circuits (FA), 0 would ordinarily be applied to carry input 104 of the least-significant-bit full adder circuit 110h; that is, the carry input of full adder circuit 110h is an empty carry. In this embodiment, sign bit S# is applied to this empty carry 104. 4-to-2 compressor 102 outputs 5-bit data E<4;0> and F<4;0>. The sum outputs S of full adder circuits (FA) 110a to 110h provide data bits F<3;0>, and the carry outputs of full adder circuits (FA) 110e to 110h provide data bits E<4;0>. The carry output of first-stage full adder circuit 110a provides data bit F<4>.

【０１９２】この図４１に示す４対２コンプレッサ１０
２の構成においては、キャリ伝搬は存在しない。演算に
要する遅延時間は全加算回路２段分だけである。これに
より高速で加算を実行することができる。また空きキャ
リ１０４へ符号ビットＳ♯を与える構成とするため、回
路規模を増加させることなく差分絶対値の加算を実行す
ることができる。In the configuration of 4-to-2 compressor 102 shown in FIG. 41, there is no carry propagation. The delay time required for the operation is only that of two full adder stages, so addition can be executed at high speed. Moreover, because sign bit S# is applied to empty carry 104, the absolute difference values can be added without increasing the circuit scale.
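
The two-stage structure can be checked with a small bit-level model. This is a minimal sketch under the wiring described for FIG. 41, not the circuit itself; the function names and the `width` parameter are illustrative assumptions.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    co = (a & b) | (b & cin) | (a & cin)
    return s, co


def compressor_4to2(a, b, c, d, cin=0, width=4):
    """Reduce four width-bit operands plus one spare carry bit to two words.

    Returns (e, f) such that e + f == a + b + c + d + cin.
    """
    def bit(value, i):
        return (value >> i) & 1

    # First stage: one full adder per bit position, inputs A, B and C.
    stage1 = [full_adder(bit(a, i), bit(b, i), bit(c, i)) for i in range(width)]

    e = f = 0
    carry_in = cin  # empty carry slot of the least significant second-stage adder
    for i in range(width):
        s1, c1 = stage1[i]
        # Second stage: adds D and the first-stage carry from one bit below.
        s2, c2 = full_adder(s1, bit(d, i), carry_in)
        f |= s2 << i        # second-stage sums form the low bits of F
        e |= c2 << (i + 1)  # second-stage carries form E, one bit higher
        carry_in = c1       # taken from stage 1, so nothing ripples within a stage
    f |= carry_in << width  # top first-stage carry becomes the highest bit of F
    return e, f
```

Every second-stage adder takes its carry input from the first stage, never from its second-stage neighbor, which is why the delay in this model is two full adders regardless of the operand width.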

【０１９３】図４２は、図４０に示す回路構成の具体的
接続形態を示す図である。この図４２に示す総和部は、
図４１に示す４対２コンプレッサを利用する。今、差分
絶対値Ｐ♯ｉを（Ｐｉ３，Ｐｉ２，Ｐｉ１，Ｐｉ０）で
表わす。FIG. 42 shows a specific connection arrangement of the circuit configuration shown in FIG. 40. The summation unit shown in FIG. 42 uses the 4-to-2 compressor shown in FIG. 41. Here, absolute difference value P#i is represented by (Pi3, Pi2, Pi1, Pi0).

【０１９４】４対２コンプレッサ１０２ａは差分絶対値
Ｐ♯１〜Ｐ♯３を加算する全加算回路ＦＡ１〜ＦＡ４
と、全加算回路ＦＡ１〜ＦＡ４の出力と差分絶対値Ｐ♯
４とを加算する全加算回路ＦＡ５、ＦＡ６、Ａ７および
ＦＡ８を含む。全加算回路ＦＡ８のキャリ入力へ符号ビ
ットＳ♯１が与えられる。4-to-2 compressor 102a includes full adder circuits FA1 to FA4 that add absolute difference values P#1 to P#3, and full adder circuits FA5, FA6, FA7 and FA8 that add the outputs of full adder circuits FA1 to FA4 to absolute difference value P#4. Sign bit S#1 is applied to the carry input of full adder circuit FA8.

【０１９５】４対２コンプレッサ１０２ｂは、差分絶対
値Ｐ♯５〜Ｐ♯７を加算する全加算回路ＦＡ９、ＦＡ１
０、ＦＡ１１およびＦＡ１２と、全加算回路ＦＡ９〜Ｆ
Ａ１２の出力と差分絶対値Ｐ♯８とを加算する全加算回
路ＦＡ１３、ＦＡ１４、ＦＡ１５およびＦＡ１６を含
む。全加算回路ＦＡ１６のキャリ入力へ符号ビットＳ♯
２が与えられる。4-to-2 compressor 102b includes full adder circuits FA9, FA10, FA11 and FA12 that add absolute difference values P#5 to P#7, and full adder circuits FA13, FA14, FA15 and FA16 that add the outputs of full adder circuits FA9 to FA12 to absolute difference value P#8. Sign bit S#2 is applied to the carry input of full adder circuit FA16.

【０１９６】４対２コンプレッサ１０２ｃは、４対２コ
ンプレッサ１０２ａの出力と、４対２コンプレッサ１０
２ｂの一方出力（全加算回路ＦＡ１３〜ＦＡ１６の和出
力および全加算回路ＦＡ９のキャリ出力）とを加算する
全加算回路ＦＡ１７〜ＦＡ２１と、全加算回路ＦＡ１７
〜ＦＡ２１の出力と４対２コンプレッサ１０２ｂの他方
出力（全加算回路ＦＡ１３〜ＦＡ１６のキャリ出力）を
加算する全加算回路ＦＡ２２ないしＦＡ２６を含む。全
加算回路ＦＡ２６のキャリ入力および一方入力へ符号ビ
ットＳ♯４およびＳ♯５が与えられる。4-to-2 compressor 102c includes full adder circuits FA17 to FA21 that add the outputs of 4-to-2 compressor 102a to one output of 4-to-2 compressor 102b (the sum outputs of full adder circuits FA13 to FA16 and the carry output of full adder circuit FA9), and full adder circuits FA22 to FA26 that add the outputs of full adder circuits FA17 to FA21 to the other output of 4-to-2 compressor 102b (the carry outputs of full adder circuits FA13 to FA16). Sign bits S#4 and S#5 are applied to the carry input and one input of full adder circuit FA26.

【０１９７】加算器１０４は、符号ビットＳ♯６ないし
Ｓ♯８を加算する全加算回路ＦＡ２７と、この全加算回
路ＦＡ２７の出力と４対２コンプレッサ１０２ｃの出力
とを加算する全加算回路ＦＡ２８ないしＦＡ３３と、全
加算回路ＦＡ２８ないしＦＡ３３の出力を受けて最終加
算結果を出力する全加算回路ＦＡ３４ないしＦＡ３９を
含む。全加算回路ＦＡ２８ないしＦＡ３３は３対２コン
プレッサを構成する。全加算回路ＦＡ３４〜ＦＡ３９は
リップルキャリ型加算器を構成する。他の加算器の構成
（たとえば桁上げ先見型加算器）の構成が利用されても
よい。Adder 104 includes a full adder circuit FA27 that adds sign bits S#6 to S#8, full adder circuits FA28 to FA33 that add the output of full adder circuit FA27 to the outputs of 4-to-2 compressor 102c, and full adder circuits FA34 to FA39 that receive the outputs of full adder circuits FA28 to FA33 and output the final addition result. Full adder circuits FA28 to FA33 form a 3-to-2 compressor, and full adder circuits FA34 to FA39 form a ripple-carry adder. Other adder configurations (for example, a carry look-ahead adder) may be used instead.

【０１９８】図４２に示すようにコンプレッサを利用し
て加算を行なうことによりキャリ伝搬に伴う遅延を最小
限に抑えることができ、高速で加算を実行することがで
きる。As shown in FIG. 42, by using a compressor to perform addition, the delay caused by carry propagation can be minimized, and the addition can be performed at high speed.
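
The overall tree of FIG. 40 can be modelled at word level as follows. This is a behavioral sketch only: it uses Python integers instead of individual full adders, and it assumes that sign bits S#3 and S#4 occupy the empty bit-0 positions of the E words entering compressor 102c, which is one plausible reading of the description rather than a confirmed wiring. All function names are illustrative.

```python
def carry_save(x, y, z):
    """Row of full adders: three words in, (sum word, shifted carry word) out."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def compressor_4to2(a, b, c, d, cin=0):
    """Word-level 4-to-2 compressor; returns (e, f) with e + f == a+b+c+d+cin."""
    s1, c1 = carry_save(a, b, c)
    # Bit 0 of c1 is always 0, so OR-ing cin into it equals adding cin.
    f, e = carry_save(s1, d, c1 | cin)
    return e, f


def summation_unit(p, s):
    """p: values P#1..P#8, s: sign bits S#1..S#8 (each 0 or 1)."""
    e_a, f_a = compressor_4to2(p[0], p[1], p[2], p[3], cin=s[0])  # 102a
    e_b, f_b = compressor_4to2(p[4], p[5], p[6], p[7], cin=s[1])  # 102b
    # Assumed placement: E words have an empty bit 0 that can hold a sign bit.
    e_c, f_c = compressor_4to2(e_a | s[2], f_a, e_b | s[3], f_b, cin=s[4])  # 102c
    # Adder 104: final carry-propagating addition plus the last three sign bits.
    return e_c + f_c + s[5] + s[6] + s[7]
```

For any inputs, `summation_unit(p, s)` equals `sum(p) + sum(s)`, and the only carry-propagating addition occurs in the final step that stands in for adder 104.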

【０１９９】図４３は図１に示す比較部の具体的構成を
示す図である。図４３において、比較部３は、総和部か
ら与えられる評価値（評価関数）を格納するためのレジ
スタラッチ１３０と、レジスタラッチ１３０に格納され
た評価値と総和部から新たに与えられる評価値（評価関
数）の大きさを比較する比較器１３２と、評価値算出サ
イクル数をカウントするカウンタ１３４と、比較器１３
２の出力に応答してカウンタ１３４のカウント値を格納
するレジスタラッチ１３６を含む。レジスタラッチ１３
６から動きベクトルがそのまま出力されてもよく、また
図において破線のブロック１３７で示すように、レジス
タラッチ１３６の出力を所定の形式でコード化するデコ
ーダが設けられてもよい。次に動作について説明する。FIG. 43 shows a specific configuration of the comparison unit shown in FIG. 1. In FIG. 43, comparison unit 3 includes a register latch 130 for storing the evaluation value (evaluation function) given from the summation unit, a comparator 132 that compares the evaluation value stored in register latch 130 with the evaluation value (evaluation function) newly given from the summation unit, a counter 134 that counts the number of evaluation value calculation cycles, and a register latch 136 that stores the count value of counter 134 in response to the output of comparator 132. The motion vector may be output directly from register latch 136, or, as indicated by broken-line block 137 in the figure, a decoder that encodes the output of register latch 136 in a predetermined format may be provided. The operation is described next.

【０２００】１つのテンプレートブロックに対する動き
ベクトルの動作開始時においてカウンタ１３４、レジス
タラッチ１３０、およびレジスタラッチ１３６はリセッ
トされる。レジスタラッチ１３０の初期設定値は最大評
価値よりも大きい値に設定される（たとえば全ビット
“１”）。１つの変位ベクトルに対する評価値が与えら
れると、比較器１３２はこのレジスタラッチ１３０に格
納された値と総和部から与えられる評価値とを比較す
る。比較器１３２は、総和部から新たに与えられる評価
値（評価関数）がレジスタラッチ１３０に格納された値
よりも小さいときにはラッチ指示信号を発生する。レジ
スタラッチ１３０はこのラッチ指示信号に応答して総和
部から与えられる評価値（評価関数）を格納する。同様
にレジスタラッチ１３６もカウンタ１３４のカウント値
を動きベクトルの候補としてラッチする。At the start of motion vector detection for one template block, counter 134, register latch 130 and register latch 136 are reset. Register latch 130 is initialized to a value larger than the maximum evaluation value (for example, all bits “1”). When the evaluation value for one displacement vector is given, comparator 132 compares the value stored in register latch 130 with the evaluation value given from the summation unit. Comparator 132 generates a latch instruction signal when the evaluation value (evaluation function) newly given from the summation unit is smaller than the value stored in register latch 130. In response to this latch instruction signal, register latch 130 stores the evaluation value (evaluation function) given from the summation unit. Likewise, register latch 136 latches the count value of counter 134 as a motion vector candidate.

【０２０１】次の変位ベクトルに対してカウンタ１３４
はその制御信号φＣに応答してカウント値を１インクリ
メントする。評価値算出が完了すると、比較器１３２は
総和部から新たに与えられた評価値（評価関数）とレジ
スタラッチ１３０に格納された値との大きさの比較を行
なう。新たに与えられた評価値がレジスタラッチ１３０
に格納されている値よりも大きい場合には、比較器１３
２はラッチ指示信号を発生しない。新たに与えられた評
価値（評価関数）がレジスタラッチ１３０に格納されて
いる値よりも小さいときにはラッチ指示信号が発生され
る。この動作をすべての変位ベクトルに対する評価値に
対して実行する。これによりレジスタラッチ１３０に
は、すべての評価値のうち最小の評価値が格納される。
またレジスタラッチ１３６にはその最小の評価値を与え
る動作サイクルを示すカウンタ１３４のカウント値がラ
ッチされる。このカウンタ１３４のカウント値が動きベ
クトルとして利用される。For the next displacement vector, counter 134 increments its count value by 1 in response to control signal φC. When the evaluation value calculation is complete, comparator 132 compares the evaluation value (evaluation function) newly given from the summation unit with the value stored in register latch 130. If the new evaluation value is larger than the value stored in register latch 130, comparator 132 does not generate the latch instruction signal. If the new evaluation value (evaluation function) is smaller than the value stored in register latch 130, the latch instruction signal is generated. This operation is performed for the evaluation values of all displacement vectors. As a result, register latch 130 holds the smallest of all the evaluation values, and register latch 136 holds the count value of counter 134 indicating the operation cycle that gave that minimum evaluation value. This count value of counter 134 is used as the motion vector.
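
The comparison unit's behavior can be sketched as a running-minimum search. This is a minimal software model, assuming a 16-bit evaluation value for the all-ones initial setting; the function name and `max_value` parameter are illustrative.

```python
def detect_motion_vector(evaluation_values, max_value=0xFFFF):
    """Return (count, value) of the smallest evaluation value.

    best_value models register latch 130, best_count models register
    latch 136, and the loop index models counter 134.
    """
    best_value = max_value  # initial setting: above any real evaluation value
    best_count = 0
    for count, value in enumerate(evaluation_values):
        if value < best_value:  # comparator 132 issues the latch instruction
            best_value = value
            best_count = count
    return best_count, best_value
```

For example, `detect_motion_vector([50, 20, 35, 20, 90])` returns `(1, 20)`. Because the latch instruction is issued only for a strictly smaller value, the earliest of several equal minima is kept.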

【０２０２】この動きベクトルが求められた後は再びカ
ウンタ１３４、レジスタラッチ１３０および１３６が初
期設定され、次のテンプレートブロックに対する動きベ
クトルの算出が実行される。After the motion vector is obtained, the counter 134 and the register latches 130 and 136 are initialized again, and the motion vector for the next template block is calculated.

【０２０３】画像データ格納のためのフレームメモリと
しては、ダイナミック・ランダム・アクセス・メモリま
たはスタティック・ランダム・アクセス・メモリが用い
られてもよい。ランダム・アクセス・メモリの場合、連
続データを読出す場合にページモードなどの高速動作モ
ードが利用される。A dynamic random access memory or a static random access memory may be used as a frame memory for storing image data. In the case of a random access memory, a high speed operation mode such as page mode is used when reading continuous data.

【０２０４】またプロセサアレイへフレームメモリから
画像データを読みだして与えるコントローラは、動きベ
クトル検出装置内に設けられてもよい。A controller for reading out image data from the frame memory and giving it to the processor array may be provided in the motion vector detecting device.

【０２０５】［実施例２］図４４はこの発明の第２の実
施例である動きベクトル検出装置の全体の構成を概略的
に示す図である。図４４において、動きベクトル検出装
置は、サーチウインドーデータＹとテンプレートブロッ
クデータＸとを受け、整数精度での動きベクトルを決定
する第１の演算装置２１０と、第１の演算装置２１０か
らのサーチウインドーデータＳＹおよびテンプレートブ
ロックデータＴＸを直接受けて分数精度での動きベクト
ルを決定する第２の演算装置２５０を含む。[Embodiment 2] FIG. 44 schematically shows the overall configuration of a motion vector detection device according to a second embodiment of the present invention. In FIG. 44, the motion vector detection device includes a first arithmetic unit 210 that receives search window data Y and template block data X and determines a motion vector with integer precision, and a second arithmetic unit 250 that directly receives search window data SY and template block data TX from first arithmetic unit 210 and determines a motion vector with fractional precision.

【０２０６】第１の演算装置２１０は、先の第１の実施
例に示したように、２次元アレイ状に配置された要素プ
ロセサを含むプロセサアレイ１０と、プロセサアレイ１
０の各要素プロセサからの差分絶対値を総和する総和部
１２と、総和部１２からの総和（評価値）に従って動き
ベクトルを決定する比較部３を含む。プロセサアレイ１
０に格納されたテンプレートブロックデータＴＸおよび
サーチウインドーデータＳＹがその動作に従ってシフト
アウトされるとき直接第２の演算装置２５０へ与えられ
る。第２の演算装置２５０は比較部３からの比較結果指
示（ラッチ指示信号）により動作が制御される。As in the first embodiment described above, first arithmetic unit 210 includes a processor array 10 containing element processors arranged in a two-dimensional array, a summation unit 12 that sums the absolute difference values from the element processors of processor array 10, and a comparison unit 3 that determines the motion vector according to the sum (evaluation value) from summation unit 12. Template block data TX and search window data SY stored in processor array 10 are applied directly to second arithmetic unit 250 as they are shifted out in the course of operation. The operation of second arithmetic unit 250 is controlled by the comparison result indication (latch instruction signal) from comparison unit 3.

【０２０７】この図４４に示す動きベクトル検出装置の
構成においては、サーチウインドーデータおよびテンプ
レートブロックデータは一旦フレームメモリから読出さ
れた後第１の演算装置２１０において整数精度での動き
ベクトル検出に利用され、次いでこの第１の演算装置か
ら第２の演算装置へ、利用されたサーチウインドーデー
タおよびテンプレートブロックデータが転送される。こ
れにより、分数精度での動きベクトル決定動作時におい
てフレームメモリへアクセスする必要がなく、高速で分
数精度での動きベクトルを決定することができる。In the structure of the motion vector detecting device shown in FIG. 44, the search window data and the template block data are once read from the frame memory and then used for motion vector detection with integer precision in the first arithmetic unit 210. Then, the used search window data and template block data are transferred from the first arithmetic unit to the second arithmetic unit. As a result, it is not necessary to access the frame memory during the motion vector determination operation with fractional precision, and the motion vector with fractional precision can be determined at high speed.

【０２０８】図４５は、この第２の実施例における第１
の演算装置２１０に含まれるプロセサアレイ１０と第２
の演算装置２１０との間のデータ伝達系の構成の一例を
示す図である。図４５において、プロセサアレイ１０か
らのテンプレートブロックデータＴＸは直接第２の演算
装置２５０へ与えられ、一方プロセサアレイ１０からの
サーチウインドーデータＳＹは遅延回路２６０を介して
所定時間遅延された後バッファメモリ回路２７０に格納
される。バッファメモリ回路２７０は、分数精度での動
きベクトル検出に必要とされるサーチウインドーデータ
を格納する。たとえばマクロブロック（テンプレートブ
ロックおよびサーチウインドーブロック）のサイズが１
６画素×１６画素の場合、分数精度においてはサーチウ
インドーブロックとしてその周辺を含む１８画素×１８
画素がバッファメモリ回路２７０に格納される。FIG. 45 shows an example of the configuration of the data transfer system between processor array 10 included in first arithmetic unit 210 and second arithmetic unit 250 in this second embodiment. In FIG. 45, template block data TX from processor array 10 is applied directly to second arithmetic unit 250, while search window data SY from processor array 10 is delayed for a predetermined time by delay circuit 260 and then stored in buffer memory circuit 270. Buffer memory circuit 270 stores the search window data required for motion vector detection with fractional precision. For example, when the macroblock (template block and search window block) size is 16 pixels × 16 pixels, the 18 pixel × 18 pixel region consisting of the search window block and its surrounding pixels is stored in buffer memory circuit 270 for fractional precision.

【０２０９】バッファメモリ回路２７０において格納さ
れたサーチウインドーデータとプロセサアレイ１０から
のテンプレートブロックデータＴＸとを用いて分数精度
での動きベクトルを決定する。遅延回路２６０が与える
遅延時間はバッファメモリ回路２７０の記憶容量により
決定される。バッファメモリ回路２７０が、サーチエリ
アの全画素を格納する容量を備える場合には、遅延回路
２６０は特に設けられなくてもよい。今、バッファメモ
リ回路２７０の記憶容量を１８画素×１８画素の最小容
量とする状態を考える。遅延回路２６０は、一例とし
て、サーチエリアにおける１列の画素転送に要する時間
に等しい遅延時間を与える。The search window data stored in the buffer memory circuit 270 and the template block data TX from the processor array 10 are used to determine a motion vector with a fractional precision. The delay time provided by the delay circuit 260 is determined by the storage capacity of the buffer memory circuit 270. When the buffer memory circuit 270 has a capacity for storing all pixels in the search area, the delay circuit 260 may not be provided. Now, consider a state in which the storage capacity of the buffer memory circuit 270 is set to a minimum capacity of 18 pixels × 18 pixels. The delay circuit 260 gives, for example, a delay time equal to the time required to transfer pixels in one column in the search area.

【０２１０】図４６は図４５に示すバッファメモリ回路
の具体的構成の一例を示す図である。図４０において、
バッファメモリ回路２７０は、プロセサアレイ１０から
のサーチウインドーデータＳＹを格納するメモリ２７２
と、メモリ２７２のデータの書込および読出を制御する
書込／読出制御回路２７４を含む。メモリ２７２の出力
ノードＤｏは第２の演算装置２５０に接続される。FIG. 46 shows an example of a specific configuration of the buffer memory circuit shown in FIG. 45. In FIG. 46, buffer memory circuit 270 includes a memory 272 that stores search window data SY from processor array 10, and a write/read control circuit 274 that controls the writing and reading of data in memory 272. Output node Do of memory 272 is connected to second arithmetic unit 250.

【０２１１】書込／読出制御回路２７４は、メモリ２７
２への書込アドレスを発生する書込アドレス発生回路２
８１と、メモリ２７２の読出アドレスを発生する読出ア
ドレス発生回路２８３と、メモリ２７２の読出モードお
よび書込モードを制御する信号を発生する制御回路２８
６と、制御回路２８６の制御の下に書込アドレスおよび
読出アドレスの一方を選択してメモリ２７２のアドレス
入力ノードＡへ与えるセレクタ２８４を含む。Write/read control circuit 274 includes a write address generating circuit 281 that generates write addresses for memory 272, a read address generating circuit 283 that generates read addresses for memory 272, a control circuit 286 that generates signals controlling the read mode and write mode of memory 272, and a selector 284 that, under the control of control circuit 286, selects either the write address or the read address and applies it to address input node A of memory 272.

【０２１２】書込アドレス発生回路２８１は、図４３に
示す比較器からのラッチ指示信号Ｒφに応答して書込ア
ドレスを発生する。このラッチ指示信号Ｒφが発生され
たときには書込アドレスは初期値にリセットされる。書
込アドレス発生回路２８１はたとえば０番地から順次書
込アドレスをクロック信号φに応答して発生する。読出
アドレス発生回路２８３も同様に、０番地から順次読出
アドレスを発生する。セレクタ２８１は、制御回路２８
６がデータ書込を指示している場合には書込アドレス発
生回路２８１からの書込アドレスを選択する。制御回路
２８６が読出モードを指示している場合にはセレクタ２
８４は読出アドレス発生回路２８３からの読出アドレス
を選択する。Write address generating circuit 281 generates write addresses in response to latch instruction signal Rφ from the comparator shown in FIG. 43. When latch instruction signal Rφ is generated, the write address is reset to its initial value. Write address generating circuit 281 generates write addresses sequentially, for example starting from address 0, in response to clock signal φ. Read address generating circuit 283 likewise generates read addresses sequentially starting from address 0. When control circuit 286 indicates data writing, selector 284 selects the write address from write address generating circuit 281. When control circuit 286 indicates the read mode, selector 284 selects the read address from read address generating circuit 283.

【０２１３】制御回路２８６は、サーチウインドーデー
タ転送クロック信号φに応答してメモリ２７２へのデー
タ書込タイミングおよび読出タイミングを決定する信号
を発生する。制御回路２８６はまた動作モード指示信号
φＲＷに応答してメモリ２７２の書込および読出を制御
する信号を発生する。メモリ２７２がダイナミック・ラ
ンダム・アクセス・メモリで構成される場合、制御回路
２８６はロウアドレスストローブ信号／ＲＡＳ、コラム
アドレスストローブ信号ＣＡＳ、ライトイネーブル信号
／ＷＥおよびアウトプットイネーブル信号／ＯＥ（出力
が３状態とされる場合）を発生する。Control circuit 286 generates signals that determine the data write timing and read timing for memory 272 in response to search window data transfer clock signal φ. Control circuit 286 also generates signals that control writing to and reading from memory 272 in response to operation mode instruction signal φRW. When memory 272 is a dynamic random access memory, control circuit 286 generates row address strobe signal /RAS, column address strobe signal CAS, write enable signal /WE and output enable signal /OE (the last when the output is tri-state).

【０２１４】制御回路２８６へ与えられる制御信号φＲ
Ｗは、外部のコントローラから与えられてもよい。また
図４３に示す比較部３のカウンタ１３４のカウント値が
所定のカウント値に達したときにカウントアップ信号を
発生し、このカウントアップ信号を制御信号φＲＷとし
て利用してもよい。カウンタ（図４３参照）が所定のカ
ウント値に達するまでは１つのテンプレートブロックに
対する評価値の導出が持続的に実行されているからであ
る。次に動作について説明する。Control signal φRW applied to control circuit 286 may be supplied from an external controller. Alternatively, a count-up signal may be generated when the count value of counter 134 of comparison unit 3 shown in FIG. 43 reaches a predetermined count value, and this count-up signal may be used as control signal φRW. This is possible because the derivation of evaluation values for one template block continues until the counter (see FIG. 43) reaches the predetermined count value. The operation is described next.

【０２１５】今、図４７に示すようにサーチウインドー
が４８画素×１６画素の大きさを備え、マクロブロック
（テンプレートブロックおよびサーチウインドーブロッ
ク）が１６画素×１６画素の大きさを備える状態を考え
る。今、サーチウインドーブロック４２に対し評価値算
出動作が行なわれているとする。このサーチウインドー
ブロック４２に対する分数精度の動きベクトルを求める
ために必要とされる領域は、領域４２を含む１８画素×
１８画素の領域４８である。領域４８は図４１に示すよ
うに画素データＰ０〜Ｐ３２５を含む。Consider now the case shown in FIG. 47, in which the search window has a size of 48 pixels × 16 pixels and the macroblock (template block and search window block) has a size of 16 pixels × 16 pixels. Assume that the evaluation value calculation is currently being performed for search window block 42. The region required to obtain a fractional-precision motion vector for search window block 42 is the 18 pixel × 18 pixel region 48 that contains region 42. As shown in FIG. 47, region 48 contains pixel data P0 to P325.

【０２１６】図４８に示すように、クロック信号φは、
このサーチウインドー画素データの各転送時に発生され
る。クロック信号φが１つ発生されるとサーチウインド
ーデータが１画素分シフトアウトされる。サーチウイン
ドーブロック４２に対する評価値算出動作時において
は、図４５に示す遅延回路２６０の出力は画素Ｐ０に対
応するデータである。このとき、サーチウインドーブロ
ック４２の評価値がそれまでに得られている評価値のう
ちで最小である場合には、図４３に示す比較器からラッ
チ指示信号Ｒφが発生される。それに応答して図４７に
示す書込アドレス発生回路２８１の書込アドレスが初期
値０にリセットされる。メモリ２７２のアドレス０の位
置に画素データＰ０が書込まれる。以降連続して１８画
素のデータすなわちＰ１…Ｐ１７がアドレス１〜１７の
位置に格納される（信号Ｒφが発生されないとき）。As shown in FIG. 48, clock signal φ is generated at each transfer of search window pixel data. Each time clock signal φ is generated, the search window data is shifted out by one pixel. During the evaluation value calculation for search window block 42, the output of delay circuit 260 shown in FIG. 45 is the data corresponding to pixel P0. If the evaluation value of search window block 42 is the smallest of the evaluation values obtained so far, latch instruction signal Rφ is generated from the comparator shown in FIG. 43. In response, the write address of write address generating circuit 281 shown in FIG. 46 is reset to the initial value 0, and pixel data P0 is written at address 0 of memory 272. Thereafter, the remaining data of the 18 pixels, namely P1 ... P17, are stored successively at addresses 1 to 17 (provided signal Rφ is not generated).

【０２１７】次いで、不要データの書込を禁止するため
に、書込アドレス発生回路２８１は３０クロック期間す
なわち、３０φの期間休止状態となり、メモリ２７２は
書込禁止状態となる。この書込禁止の休止期間の決定
は、ラッチ指示信号Ｒφが制御回路２８６へまた与えら
れ、この制御回路２８６がラッチ指示信号Ｒφが与えら
れてから１８回クロック信号φをカウントした後３０φ
期間メモリ２７２を休止状態とする構成が利用される。Next, to inhibit the writing of unnecessary data, write address generating circuit 281 is idle for 30 clock periods, that is, for a period of 30φ, and memory 272 is placed in a write-inhibited state. To determine this write-inhibited idle period, latch instruction signal Rφ is also applied to control circuit 286, and control circuit 286 counts clock signal φ 18 times after latch instruction signal Rφ is applied and then holds memory 272 idle for a period of 30φ.

【０２１８】３０クロック期間（３０φサイクル期間）
が経過すると、再び書込アドレス発生回路２８１は書込
アドレスを発生する。このときのアドレスはアドレス１
８であり、このアドレス１８の位置に画素データＰ１８
が格納される。以降この動作が繰返し実行される。テン
プレートブロック変更時には、サーチエリアは１マクロ
ブロックサイズ水平方向へシフトする。したがって、動
きベクトルを与えるマクロブロックがサーチエリアの境
界に接していても、分数精度の動きベクトル検出に必要
なデータはすべて得られる。この場合、サーチエリア外
部のデータは無視されてもよい。隣接データが用いられ
てもよい。When the 30 clock periods (30 φ cycles) have elapsed, write address generating circuit 281 generates write addresses again. The address at this time is address 18, and pixel data P18 is stored at address 18. This operation is then repeated. When the template block is changed, the search area shifts horizontally by one macroblock size. Therefore, even if the macroblock giving the motion vector touches the boundary of the search area, all the data required for fractional-precision motion vector detection is obtained. In this case, data outside the search area may be ignored, or adjacent data may be used.
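
The write sequence described above (reset on Rφ, 18 writes, 30 idle clocks, repeat) can be simulated with a short sketch. It is a software model under the stated 48-pixel column length, not the control circuit itself; `buffer_writes`, `BLOCK` and `SKIP` are illustrative names.

```python
BLOCK = 18  # pixels stored per column of the 18 x 18 region
SKIP = 30   # idle clocks per column (48 - 18)


def buffer_writes(stream, latch_clocks):
    """Simulate writes into the buffer memory.

    stream: iterable of pixel values, one per clock.
    latch_clocks: set of clock indices at which the latch signal is asserted.
    Returns a dict mapping memory address to the stored pixel.
    """
    memory = {}
    address = None  # no writing until the first latch instruction
    phase = 0       # position within the 48-clock column period
    for clock, pixel in enumerate(stream):
        if clock in latch_clocks:
            address, phase = 0, 0  # write address reset to its initial value
        if address is None or address >= BLOCK * BLOCK:
            continue               # nothing to write yet, or region complete
        if phase < BLOCK:
            memory[address] = pixel
            address += 1
        phase = (phase + 1) % (BLOCK + SKIP)
    return memory
```

With `stream = range(1000)` and a single latch at clock 5, address 0 receives pixel 5, address 17 receives pixel 22, and address 18 receives pixel 53, that is, 48 clocks after address 0. A later latch instruction simply restarts the sequence and overwrites the earlier candidate.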

【０２１９】上述の動作により、メモリ２７２には、常
に動きベクトル候補となる変位ベクトルに対応するサー
チウインドーブロックのデータのみが格納される。この
ときメモリ２７２はその記憶容量が１８ワード×１８ワ
ードの最小の記憶容量でよく、装置構成を小さくするこ
とができる。As a result of the above-described operation, the memory 272 always stores only the data of the search window block corresponding to the displacement vector which is the motion vector candidate. At this time, the memory 272 has a minimum storage capacity of 18 words × 18 words, and the device configuration can be reduced.

【０２２０】バッファメモリ回路２７０が１つのテンプ
レートブロックに対するサーチエリア全体の画素データ
すべてが格納される構成の場合には、遅延回路２６０は
設けなくてもよい。このときには、読出アドレスは比較
部３（図４３）からの動きベクトルの値に従って発生さ
れる。In the case where buffer memory circuit 270 is constructed to store all pixel data of the entire search area for one template block, delay circuit 260 may not be provided. At this time, the read address is generated according to the value of the motion vector from the comparison unit 3 (FIG. 43).

【０２２１】図４９は、図４５に示す第２の演算装置の
具体的構成例を示す図である。図４９において、第２の
演算装置２５０は、第１の演算装置（プロセサアレイ）
からのサーチウインドーデータ（より正確にはバッファ
メモリの出力）を受けて分数精度に必要とされる予測画
像を生成する分数精度予測画像生成回路３０２と、分数
精度予測画像生成回路３０２で生成された予測画像の画
素データとテンプレートブロックデータＸとの差分絶対
値和を求める差分絶対値和回路３０４と、差分絶対値和
回路３０４の出力のうち最小の差分絶対値和を与える変
位ベクトルを検出する比較部３０６を含む。FIG. 49 shows a specific configuration example of the second arithmetic unit shown in FIG. 45. In FIG. 49, second arithmetic unit 250 includes a fractional-precision predicted image generation circuit 302 that receives search window data from the first arithmetic unit (processor array) (more precisely, the output of the buffer memory) and generates the predicted image required for fractional precision, an absolute difference sum circuit 304 that obtains the sum of absolute differences between the pixel data of the predicted image generated by fractional-precision predicted image generation circuit 302 and template block data X, and a comparison unit 306 that detects the displacement vector giving the smallest sum of absolute differences among the outputs of absolute difference sum circuit 304.

【０２２２】分数精度予測画像生成回路３０２は、複数
の予測画像画素データを並列に生成する。差分絶対値和
回路３０４もまた、動きベクトル候補となる変位ベクト
ルに対する評価値を並列態様で生成する。比較部３０６
は、差分絶対値和回路３０４から与えられる複数の差分
絶対値和のうち最小の絶対値和を検出し、その最小の絶
対値和に対応する変位ベクトルを動きベクトルと決定す
る。次に各回路構成の具体的構成について説明する。Fractional-precision predicted image generation circuit 302 generates a plurality of predicted image pixel data in parallel. Absolute difference sum circuit 304 likewise generates, in parallel, the evaluation values for the displacement vectors that are motion vector candidates. Comparison unit 306 detects the smallest of the sums of absolute differences given from absolute difference sum circuit 304 and determines the displacement vector corresponding to that smallest sum to be the motion vector. The specific configuration of each circuit is described next.

【０２２３】図５０は図４９に示す分数精度予測画像生
成回路の具体的構成例を示す図である。図５０におい
て、分数精度予測画像生成回路３０２は、与えられたサ
ーチウインドーデータを所定時間遅延する遅延回路３１
０と、遅延回路３１０の出力とサーチウインドーデータ
Ｙとを加算する加算器３１２と、加算器３１２の出力に
係数（１／２）を掛ける乗算器３１４を含む。FIG. 50 shows a specific configuration example of the fractional-precision predicted image generation circuit shown in FIG. 49. In FIG. 50, fractional-precision predicted image generation circuit 302 includes a delay circuit 310 that delays the applied search window data for a predetermined time, an adder 312 that adds the output of delay circuit 310 to search window data Y, and a multiplier 314 that multiplies the output of adder 312 by a coefficient (1/2).

【０２２４】乗算器３１４を１ビット下位ビット方向へ
与えられたデータをシフトするシフタで構成してもよ
い。遅延回路３１０が与える遅延時間は、分数精度予測
画像生成回路３０２がいずれの変位ベクトルに対応する
かにより決定される。この遅延回路３１０の遅延時間に
ついては後に説明する。図５０に示す分数精度予測画像
生成回路３０２は、１／２画素精度で動きベクトルを検
出するために用いられる。Multiplier 314 may be implemented as a shifter that shifts the applied data by one bit toward the lower-order bits. The delay time provided by delay circuit 310 is determined by which displacement vector fractional-precision predicted image generation circuit 302 corresponds to. This delay time is described later. Fractional-precision predicted image generation circuit 302 shown in FIG. 50 is used to detect a motion vector with 1/2-pixel precision.
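
The delay, add and halve structure of FIG. 50 can be sketched on a serial pixel stream as follows. The function name and the `delay` parameter are illustrative; truncation by a right shift is an assumption that follows from the shifter implementation mentioned above, since the text does not specify rounding.

```python
def half_pel_stream(pixels, delay):
    """Average each pixel with the one `delay` clocks earlier.

    Models delay circuit 310, adder 312 and multiplier 314 (as a 1-bit
    right shift). delay=1 pairs adjacent pixels in the stream, while a
    larger delay (for example one row period) pairs pixels in adjacent rows.
    """
    out = []
    for i in range(delay, len(pixels)):
        out.append((pixels[i - delay] + pixels[i]) >> 1)
    return out
```

For example, `half_pel_stream([10, 20, 30, 41], 1)` returns `[15, 25, 35]`; the last value is truncated from 35.5 because of the shift.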

【０２２５】図５１は、図４９に示す差分絶対値和回路
の具体的構成の一例を示す図である。図５１において、
差分絶対値和回路３０４は、分数精度予測画像生成回路
３０２から与えられる補間データＹＦとテンプレートブ
ロックデータＸとの減算を行なう減算器３２０と、減算
器３２０の出力の絶対値を求める絶対値回路３２２と、
絶対値回路３２２の出力を累算する累算回路３２４を含
む。次にこの分数精度での動きベクトルを求める第２の
演算装置の具体的構成例について説明する。FIG. 51 shows an example of a specific configuration of the absolute difference sum circuit shown in FIG. 49. In FIG. 51, absolute difference sum circuit 304 includes a subtractor 320 that performs subtraction between interpolated data YF given from fractional-precision predicted image generation circuit 302 and template block data X, an absolute value circuit 322 that obtains the absolute value of the output of subtractor 320, and an accumulation circuit 324 that accumulates the output of absolute value circuit 322. A specific configuration example of the second arithmetic unit for obtaining this fractional-precision motion vector is described next.

【０２２６】図５２は、第２の演算装置における分数精
度での動きベクトルを求める動作を示す図である。第２
の演算装置２５０は１／２画素精度で動きベクトルを検
出する。今、動きベクトルの候補として８点を考える。
すなわち、着目画素Ｐに対しその８近傍Ｑ１〜Ｑ４およ
びＱ６〜Ｑ９の画素データを補間により求める。この補
間により求められた画素データＱ１〜Ｑ４およびＱ６〜
Ｑ９それぞれとテンプレートブロックデータＸとの差分
絶対値が求められかつ累算されて動きベクトルの候補を
示す評価値が算出される。FIG. 52 illustrates the operation of obtaining a fractional-precision motion vector in the second arithmetic unit. Second arithmetic unit 250 detects the motion vector with 1/2-pixel precision. Consider eight points as motion vector candidates. That is, for pixel of interest P, the pixel data of its eight neighbors Q1 to Q4 and Q6 to Q9 are obtained by interpolation. The absolute difference between each of the interpolated pixel data Q1 to Q4 and Q6 to Q9 and template block data X is obtained and accumulated, yielding an evaluation value for each motion vector candidate.
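
The eight-candidate half-pixel search can be sketched in software as follows. This is a sequential illustration, whereas the text describes the evaluation values being generated in parallel. It assumes that the window holds the best integer-precision block plus a one-pixel border, that diagonal positions are the truncated average of four pixels, and that the function name and the (dy, dx) half-pixel convention are illustrative rather than taken from the patent.

```python
def half_pel_search(win, tpl):
    """Return (sad, (dy, dx)) for the best of the eight half-pixel neighbors.

    win: (n+2) x (n+2) list of lists, the block plus a one-pixel border.
    tpl: n x n template block.
    dy, dx are offsets in half-pixel units, each in {-1, 0, 1}.
    """
    n = len(tpl)
    best = None
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # the integer position itself is not a candidate
            ys, xs = {0, dy}, {0, dx}  # integer pixels that get averaged
            sad = 0
            for r in range(n):
                for c in range(n):
                    taps = [win[r + 1 + ay][c + 1 + ax] for ay in ys for ax in xs]
                    sad += abs(sum(taps) // len(taps) - tpl[r][c])
            if best is None or sad < best[0]:
                best = (sad, (dy, dx))
    return best
```

With a 2 × 2 template for brevity, a template equal to the horizontal average of each window pixel and its right-hand neighbor yields a sum of absolute differences of 0 at offset `(0, 1)`.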

【０２２７】今、サーチウインドーデータをＰ１〜Ｐ９
とし、この隣接列画素間の期間をＴとし、隣接行間の遅
延時間をＨ（１８Ｔ：分数精度でのサーチウインドーブ
ロックサイズが１８画素×１８画素の場合）とする。こ
こで図５２においては、行と列の配置が先に示した実施
例のものと転置されている。プロセサアレイからのサー
チウインドーブロックデータはサーチウインドーの列方
向に沿って順次シフトアウトされ、またテンプレートブ
ロックデータも同様に列方向に沿ってシフトされる（テ
ンプレートブロックデータ伝達線とサーチウインドーデ
ータ伝達線とが互いに平行に配置されている場合）。こ
の場合、単に各画素データに付されている参照番号を変
える（行と列を入れ替える）だけでよく、特に問題は生
じない。図５２に示す補間サーチウインドーデータＱ１
〜Ｑ４、およびＱ６〜Ｑ９を用いて分数精度での動きベ
クトルを求めるための構成について次に説明する。Let the search window data be P1 to P9, let the period between pixels in adjacent columns be T, and let the delay time between adjacent rows be H (18T when the search window block size for fractional accuracy is 18 pixels × 18 pixels). In FIG. 52, the arrangement of rows and columns is transposed relative to the embodiment described above. The search window block data from the processor array is sequentially shifted out along the column direction of the search window, and the template block data is likewise shifted along the column direction (when the template block data transmission lines and the search window data transmission lines are arranged parallel to each other). In this case, it suffices to change the reference numbers attached to the pixel data (swap rows and columns), and no particular problem arises. Next, a configuration for obtaining a motion vector with fractional accuracy using the interpolated search window data Q1 to Q4 and Q6 to Q9 shown in FIG. 52 will be described.

【０２２８】図５３は第２の演算装置の具体的構成を示
す図である。図５３において、第２の演算装置２５０
は、与えられたサーチウインドーデータを１Ｈ期間遅延
する遅延回路３３５ａと、この遅延回路３３５ａ出力を
さらに１Ｈ遅延する遅延回路３３５ｂを含む。この２段
の縦続接続された遅延回路３３５ａおよび３３５ｂによ
り、図５２における各行に対応するデータを発生する経
路が形成される。FIG. 53 is a diagram showing a specific structure of the second arithmetic unit. In FIG. 53, the second arithmetic unit 250 includes a delay circuit 335a that delays the supplied search window data by a period of 1H, and a delay circuit 335b that delays the output of the delay circuit 335a by a further 1H. These two cascaded delay circuits 335a and 335b form a path for generating the data corresponding to each row in FIG. 52.

【０２２９】図５３においてこの第２の演算装置はさら
に、入力サーチウインドーデータＰを１Ｔ期間遅延する
遅延回路３３６ａと、遅延回路３３６ａの出力をさらに
１Ｔ期間遅延する遅延回路３３６ｄと、１Ｈ遅延回路３
３５ａの出力を１Ｔ期間遅延する遅延回路３３６ｂと、
遅延回路３３６ｂの出力を１Ｔ期間遅延する遅延回路３
３６ｅと、１Ｈ遅延回路３３５ｂの出力を１Ｔ期間遅延
する遅延回路３３６ｃと、遅延回路３３６ｃの出力を１
Ｔ期間さらに遅延する遅延回路３３６ｆを含む。この１
Ｔ遅延回路３３６ａ〜３３６ｆにより、補間に必要とさ
れるサーチウインドーデータが生成される。In FIG. 53, the second arithmetic unit further includes a delay circuit 336a that delays the input search window data P by a period of 1T, a delay circuit 336d that delays the output of the delay circuit 336a by a further 1T, a delay circuit 336b that delays the output of the 1H delay circuit 335a by 1T, a delay circuit 336e that delays the output of the delay circuit 336b by 1T, a delay circuit 336c that delays the output of the 1H delay circuit 335b by 1T, and a delay circuit 336f that delays the output of the delay circuit 336c by a further 1T. The 1T delay circuits 336a to 336f generate the search window data required for interpolation.

【０２３０】第２の演算装置２５０はさらに入力サーチ
ウインドーデータＰと１Ｔ遅延回路３３６ａの出力を加
算しかつ係数（１／２）を乗算する加算シフト回路３３
０ａを含む。加算シフト回路３３０ａは係数（１／２）
の乗算をシフト動作により実現する。この第２の演算装
置２５０はさらに、１Ｔ遅延回路３３６ａの出力と１Ｔ
遅延回路３３６ｅの出力に対し加算シフト動作を実行す
る加算シフト回路３３０ｂと、１Ｔ遅延回路３３６ｂの
出力と１Ｔ遅延回路３３６ｅの出力とに対し加算シフト
動作を実行する加算シフト回路３３０ｃと、１Ｈ遅延回
路３３５ａの出力と１Ｔ遅延回路３３６ｂの出力とに対
し加算シフト動作を実行する加算シフト回路３３０ｄ
と、１Ｈ遅延回路３３５ｂの出力と１Ｔ遅延回路３３６
ｃの出力とに対し加算シフト動作を実行する加算シフト
回路３３０ｅと、１Ｔ遅延回路３３６ｃの出力と１Ｔ遅
延回路３３６ｆの出力とに対し加算シフト動作を実行す
る加算シフト回路３３０ｆとを含む。この加算シフト回
路３３０ａ〜３３０ｆにより、４画素間の補間データを
生成するためのデータが生成される。The second arithmetic unit 250 further includes an addition shift circuit 330a that adds the input search window data P and the output of the 1T delay circuit 336a and multiplies the sum by a coefficient (1/2). The addition shift circuit 330a implements the multiplication by the coefficient (1/2) as a shift operation. The second arithmetic unit 250 further includes an addition shift circuit 330b that performs an addition shift operation on the output of the 1T delay circuit 336a and the output of the 1T delay circuit 336e, an addition shift circuit 330c that performs an addition shift operation on the output of the 1T delay circuit 336b and the output of the 1T delay circuit 336e, an addition shift circuit 330d that performs an addition shift operation on the output of the 1H delay circuit 335a and the output of the 1T delay circuit 336b, an addition shift circuit 330e that performs an addition shift operation on the output of the 1H delay circuit 335b and the output of the 1T delay circuit 336c, and an addition shift circuit 330f that performs an addition shift operation on the output of the 1T delay circuit 336c and the output of the 1T delay circuit 336f. The addition shift circuits 330a to 330f generate the data used to produce interpolation data between four pixels.
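
As a rough illustration of the addition shift operation, the sketch below averages two pixel values by adding them and shifting the sum right by one bit, and then cascades two such stages to interpolate between four pixels. The function names are hypothetical, and the truncating right shift is an assumption: the text states only that the multiplication by (1/2) is realized by a shift and does not say how rounding is handled.

```python
def add_shift(a, b):
    """Add two non-negative pixel values and halve the sum with a 1-bit right shift."""
    return (a + b) >> 1  # the shift stands in for multiplication by 1/2


def four_pixel_interpolation(p00, p01, p10, p11):
    """Cascade two add-shift stages to interpolate between four pixels."""
    upper = add_shift(p00, p01)  # first stage, one pixel pair
    lower = add_shift(p10, p11)  # first stage, the other pixel pair
    return add_shift(upper, lower)  # second stage combines the two halves
```

Because each stage truncates, the cascaded result can differ by one from a single rounded four-pixel average.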

【０２３１】第２の演算装置２５０はさらに、加算シフ
ト回路３３０ａの出力と加算回路３３０ｄの出力とに対
し加算シフト操作を実行する加算シフト回路３３０ｇ
と、１Ｔ遅延回路３３６ａの出力と１Ｔ遅延回路３３６
ｂの出力とに対し加算シフト操作を実行する加算シフト
回路３３０ｈと、加算シフト回路３３０ｂの出力と加算
シフト回路３３０ｃの出力とに対し加算シフト操作を実
行する加算シフト回路３３０ｉと、１Ｔ遅延回路３３６
ｂの出力と１Ｈ遅延回路３３５ａの出力とに対し加算シ
フト操作を実行する加算シフト回路３３０ｊと、１Ｔ遅
延回路３３６ｅの出力と１Ｔ遅延回路３３６ｂの出力と
に対し加算シフト操作を実行する加算シフト回路３３０
ｋと、加算シフト回路３３０ｄの出力と加算シフト回路
３３０ｅの出力とに対し加算シフト操作を実行する加算
シフト回路３３０ｌと、１Ｔ遅延回路３３６ｂの出力と
１Ｔ遅延回路３３６ｃの出力とに対し加算シフト操作を
実行する加算シフト回路３３０ｍと、加算シフト回路３
３０ｃの出力と加算シフト回路３３０ｆの出力とに対し
加算シフト操作を実行する加算シフト回路３３０ｎとを
含む。加算シフト回路３３０ｇないし３３０ｎから、図
５２に示す補間画素データＱ９〜Ｑ６およびＱ４〜Ｑ１
の位置にある画素データが生成される。The second arithmetic unit 250 further includes an addition shift circuit 330g that performs an addition shift operation on the output of the addition shift circuit 330a and the output of the addition shift circuit 330d, an addition shift circuit 330h that performs an addition shift operation on the output of the 1T delay circuit 336a and the output of the 1T delay circuit 336b, an addition shift circuit 330i that performs an addition shift operation on the output of the addition shift circuit 330b and the output of the addition shift circuit 330c, an addition shift circuit 330j that performs an addition shift operation on the output of the 1T delay circuit 336b and the output of the 1H delay circuit 335a, an addition shift circuit 330k that performs an addition shift operation on the output of the 1T delay circuit 336e and the output of the 1T delay circuit 336b, an addition shift circuit 330l that performs an addition shift operation on the output of the addition shift circuit 330d and the output of the addition shift circuit 330e, an addition shift circuit 330m that performs an addition shift operation on the output of the 1T delay circuit 336b and the output of the 1T delay circuit 336c, and an addition shift circuit 330n that performs an addition shift operation on the output of the addition shift circuit 330c and the output of the addition shift circuit 330f. The addition shift circuits 330g to 330n generate the pixel data at the positions of the interpolated pixel data Q9 to Q6 and Q4 to Q1 shown in FIG. 52.

【０２３２】差分絶対値和回路３０４は、加算シフト回
路３３０ｇ〜３３０ｎの出力Ｑ９〜Ｑ６およびＱ４〜Ｑ
１とテンプレートブロックデータＡとを受け、与えられ
た信号の差分絶対値和を求める差分絶対値和回路３０４
ａ〜３０４ｈを含む。サーチウインドーブロックデータ
ＰとテンプレートブロックデータＡとは真裏の状態の関
係にある。The difference absolute value sum circuit 304 includes difference absolute value sum circuits 304a to 304h, which receive the outputs Q9 to Q6 and Q4 to Q1 of the addition shift circuits 330g to 330n together with the template block data A and obtain the sum of absolute differences of the supplied signals. The search window block data P and the template block data A are in a relationship in which one lies directly behind the other.

【０２３３】差分絶対値和回路３０４ａ〜３０４ｈの出
力は並列に比較部３０６へ与えられ、そこで最小の差分
絶対値和を与える変位ベクトルが動きベクトルとして検
出される。比較部３０６は、この差分絶対値和回路３０
４ａ〜３０４ｈの出力を比較し、最小の値を与える差分
絶対値和回路に付されたコードを動きベクトルとして出
力する。差分絶対値和回路３０４ａ〜３０４ｈはそれぞ
れ８個の評価点を与える変位ベクトルに対応する。した
がって、これにより動きベクトルを分数精度で決定する
ことができる。The outputs of the difference absolute value sum circuits 304a to 304h are applied in parallel to the comparison unit 306, where the displacement vector giving the minimum sum of absolute differences is detected as the motion vector. The comparison unit 306 compares the outputs of the difference absolute value sum circuits 304a to 304h and outputs, as the motion vector, the code assigned to the circuit that gives the minimum value. The difference absolute value sum circuits 304a to 304h correspond respectively to the displacement vectors of the eight evaluation points. The motion vector can therefore be determined with fractional accuracy.
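
The selection performed by the comparison unit can be modeled as picking the code whose evaluation value is smallest. The sketch below is an assumption-laden illustration: the function name, the use of a dictionary keyed by circuit label, and the tie-breaking rule (the first minimum in insertion order wins) are all choices made here, not details given in the text.

```python
def select_min_code(sad_by_code):
    """Return (code, value) for the smallest evaluation value.

    sad_by_code maps a code (here, a circuit label) to its sum of absolute
    differences. On a tie, min() keeps the first entry in insertion order.
    """
    return min(sad_by_code.items(), key=lambda item: item[1])
```

For instance, with values 40, 12 and 30 assigned to three codes, the code carrying 12 is selected.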

【０２３４】図５３に示す構成において、差分絶対値和
回路は１つまたは４つだけ設けられ、時分割的に活性化
されて加算シフト回路３３０ｇ〜３３０ｎの出力を順次
加算および累算してもよい。この共有構造とされる場
合、差分絶対値和回路における累算結果はレジスタファ
イル内に格納され、レジスタファイルの値を用いて累算
動作が実行される。この場合比較部３０６には、図４３
に示す比較部と同様の構成を利用することができる。順
次比較動作が実行されるためである。In the structure shown in FIG. 53, only one or four difference absolute value sum circuits may be provided and activated in a time-division manner to sequentially add and accumulate the outputs of the addition shift circuits 330g to 330n. With this shared structure, the accumulation results of the difference absolute value sum circuit are stored in a register file, and the accumulation is performed using the values in the register file. In this case, a configuration similar to that of the comparison unit shown in FIG. 43 can be used for the comparison unit 306, because the comparisons are executed sequentially.

【０２３５】また上述の構成においては、１／２画素精
度で動きベクトルを検出している。１／４画素精度など
のより細かな分数精度の動きベクトルを検出する構成が
利用されてもよい。また評価点は８点であるがさらに数
多くの評価点が利用される構成が用いられてもよい。Further, in the above configuration, the motion vector is detected with 1/2 pixel precision. A configuration for detecting a motion vector with finer fractional precision, such as 1/4 pixel precision, may be used. Although eight evaluation points are used here, a configuration that uses more evaluation points may also be employed.

【０２３６】図５４は、この発明の第２の実施例の変形
例を示す図である。図５４に示す動きベクトル検出装置
においては、整数精度での動きベクトルを検出する第１
の演算装置２１０と分数精度での動きベクトルを検出す
るための第２の演算装置２５０との間にバッファメモリ
２８０および２８２が設けられる。バッファメモリ２８
０および２８２は、第１の演算装置２１０から出力され
るテンプレートブロックデータＴＸおよびサーチウイン
ドーデータＳＹをそれぞれ格納する。バッファメモリ２
８０および２８２の構成は図４６に示すものと同様であ
る。FIG. 54 is a diagram showing a modification of the second embodiment of the present invention. In the motion vector detection device shown in FIG. 54, buffer memories 280 and 282 are provided between the first arithmetic unit 210, which detects a motion vector with integer precision, and the second arithmetic unit 250, which detects a motion vector with fractional precision. The buffer memories 280 and 282 store the template block data TX and the search window data SY output from the first arithmetic unit 210, respectively. The configuration of the buffer memories 280 and 282 is similar to that shown in FIG. 46.

【０２３７】また、バッファメモリ２８０はテンプレー
トブロックすべての画素データを格納する容量を備え、
バッファメモリ２８２はこのテンプレートブロックに対
するサーチエリアすべての画素データを記憶する容量を
備えてもよい。バッファメモリ２８２は、先の実施例と
同様分数精度での動きベクトル検出に必要とされる領域
の画素データのみを記憶する容量を備えてもよい。The buffer memory 280 may have a capacity for storing the pixel data of an entire template block, and the buffer memory 282 may have a capacity for storing the pixel data of the entire search area for that template block. Alternatively, the buffer memory 282 may have a capacity for storing only the pixel data of the region required for motion vector detection with fractional accuracy, as in the previous embodiment.

【０２３８】図４５に示す構成においては、第１の演算
装置２１０からのテンプレートブロックデータＴＸは第
２の演算装置２５０へ直接与えられている。その構成の
場合、分数精度での動きベクトルの検出動作期間は、第
１の演算装置２１０に含まれるプロセサアレイへのテン
プレートブロックデータのロード期間である。この期間
中にテンプレートブロックデータがプロセサアレイから
順次シフトアウトされるからである。この場合、第２の
演算装置２５０にはテンプレートブロックデータの転送
速度とほぼ同じ動作速度で演算を実行することが要求さ
れる。In the structure shown in FIG. 45, the template block data TX from the first arithmetic unit 210 is directly applied to the second arithmetic unit 250. In the case of that configuration, the motion vector detection operation period with the fractional accuracy is the load period of the template block data to the processor array included in the first arithmetic unit 210. This is because the template block data is sequentially shifted out from the processor array during this period. In this case, the second arithmetic unit 250 is required to perform arithmetic at an operation speed substantially the same as the transfer speed of the template block data.

【０２３９】しかしながら、図５４に示すようにテンプ
レートブロックデータＴＸおよびサーチウインドーデー
タＳＹ両者に対しバッファメモリ２８０および２８２を
設けることにより、第２の演算装置２５０は、十分なマ
ージンをもって演算を実行することができる。またテン
プレートブロックデータＴＸとサーチウインドーデータ
ＳＹをタイミングを合わせて第２の演算装置２５０へ与
えるのが容易となる。However, by providing the buffer memories 280 and 282 for both the template block data TX and the search window data SY as shown in FIG. 54, the second arithmetic unit 250 can execute its arithmetic operations with a sufficient margin. It also becomes easy to supply the template block data TX and the search window data SY to the second arithmetic unit 250 with matched timing.

【０２４０】またこの場合、図５５に示すように、第１
の演算装置２１０および第２の演算装置２５０における
演算動作をパイプライン化することができる。ここで、
図５５は、図５４に示す動きベクトル検出装置の動作態
様を示す図であり、横軸に時間を示す。第１の演算装置
２１０においてＮ番目のマクロブロック（テンプレート
ブロック）についての処理を行ない、このＮ番目のマク
ロブロック（テンプレートブロック）についての動きベ
クトル決定動作が完了後、このＮ番目のマクロブロック
の処理に用いられたサーチウインドーデータおよびテン
プレートブロックデータを用いて第２の演算装置２５０
において、Ｎ番目のマクロブロックに対する分数精度で
の動きベクトルの検出を実行する。このＮ番目のマクロ
ブロックの分数精度での動きベクトルの決定動作と並行
して第１の演算装置においては（Ｎ＋１）番目のマクロ
ブロック（テンプレートブロック）についての整数精度
での動きベクトルの検出動作を実行する。In this case, as shown in FIG. 55, the arithmetic operations in the first arithmetic unit 210 and the second arithmetic unit 250 can be pipelined. FIG. 55 is a diagram showing the operation mode of the motion vector detection device shown in FIG. 54, with time on the horizontal axis. The first arithmetic unit 210 processes the Nth macroblock (template block). After the motion vector determination for the Nth macroblock is completed, the second arithmetic unit 250 detects the motion vector with fractional precision for the Nth macroblock, using the search window data and the template block data that were used in processing the Nth macroblock. In parallel with this fractional-precision motion vector determination for the Nth macroblock, the first arithmetic unit performs the integer-precision motion vector detection for the (N + 1)th macroblock (template block).

【０２４１】このように第１の演算装置２１０と第２の
演算装置２５０の動作をパイプライン化することによ
り、第１の演算装置２１０の動作と第２の演算装置２５
０における演算動作とを互いに時間的に切り離して実行
することができ、演算操作に対するタイミング要件に余
裕をもって分数精度での動きベクトル検出を実行するこ
とができる。またこの場合においても、第１の演算装置
２１０と第２の演算装置２５０とは互いに並列に動作し
ており、高速で分数精度での動きベクトルの検出を行な
うことができる。By pipelining the operations of the first arithmetic unit 210 and the second arithmetic unit 250 in this way, the operation of the first arithmetic unit 210 and the arithmetic operation of the second arithmetic unit 250 can be executed separately from each other in time, and motion vector detection with fractional accuracy can be performed with a margin in the timing requirements. In this case as well, the first arithmetic unit 210 and the second arithmetic unit 250 operate in parallel with each other, so the motion vector can be detected with fractional accuracy at high speed.
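
The two-stage pipelining described above can be pictured with a small scheduling sketch. It assumes, purely for illustration, that both stages take one time slot per macroblock; the function name and the tuple layout are hypothetical.

```python
def pipeline_schedule(num_blocks):
    """List, per time slot, (block in the integer stage, block in the fractional stage).

    None marks an idle stage. Block N enters the fractional-precision stage
    one slot after it was handled by the integer-precision stage.
    """
    slots = []
    for t in range(num_blocks + 1):
        integer_stage = t if t < num_blocks else None
        fractional_stage = t - 1 if t >= 1 else None
        slots.append((integer_stage, fractional_stage))
    return slots
```

With three macroblocks, the middle slots show both units busy at once: while the integer stage works on block N + 1, the fractional stage works on block N.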

【０２４２】この図５４に示す構成において、テンプレ
ートブロックデータＴＸを格納するためにバッファメモ
リ２８０が用いられているが、これは遅延機能を備える
素子（遅延線またはラインメモリ）であってもよい。In the structure shown in FIG. 54, buffer memory 280 is used to store template block data TX, but it may be an element (delay line or line memory) having a delay function.

【０２４３】図５６はこの発明の第２の実施例のさらに
他の変更例を示す図である。図５６においては、整数精
度での動きベクトルを検出する第１の演算装置の部分が
示される。図５６において、第１の演算装置４００は、
互いに縦続接続された要素演算器４１１ａ〜４１１ｗを
含む。要素演算器４１１ａ〜４１１ｗの各々は同一の構
成を備え、サーチウインドーデータをラッチするための
レジスタ４１３ａ〜４１３ｗと、テンプレートブロック
データをラッチするためのレジスタ４１４ａ〜４１４ｗ
と、対応のサーチウインドー用レジスタ４１３ａ〜４１
３ｗとテンプレートブロックデータ格納用レジスタ４１
４ａ〜４１４ｗに格納されたデータ値の差分絶対値を求
める差分絶対値回路４１２ａ〜４１２ｗを含む。FIG. 56 shows still another modification of the second embodiment of the present invention. FIG. 56 shows the portion of the first arithmetic unit that detects a motion vector with integer precision. In FIG. 56, the first arithmetic unit 400 includes element arithmetic units 411a to 411w connected in cascade. Each of the element arithmetic units 411a to 411w has the same configuration and includes a register 413a to 413w for latching search window data, a register 414a to 414w for latching template block data, and an absolute difference circuit 412a to 412w that obtains the absolute difference between the data values stored in the corresponding search window register 413a to 413w and template block data register 414a to 414w.

【０２４４】レジスタ４１３ａ〜４１３ｗはそれぞれ一
方方向に沿ってデータを転送することができる。またレ
ジスタ４１４ａ〜４１４ｗもそれぞれ一方方向に沿って
隣接要素演算器へ格納データを転送することができる。
初段の要素演算器４１１ａのサーチウインドー用レジス
タ４１３ａへは信号線４１５を介してフレームメモリ４
０１からサーチウインドーデータが伝達される。また要
素演算器４１１ａのテンプレートブロックデータ用レジ
スタ４１４ａへは信号線４１６を介してフレームメモリ
４０２からテンプレートブロックデータが伝達される。Each of the registers 413a to 413w can transfer data along one direction. Each of the registers 414a to 414w can likewise transfer its stored data to the adjacent element arithmetic unit along one direction. Search window data is transmitted from the frame memory 401 via the signal line 415 to the search window register 413a of the first-stage element arithmetic unit 411a. Template block data is transmitted from the frame memory 402 via the signal line 416 to the template block data register 414a of the element arithmetic unit 411a.

【０２４５】フレームメモリ４０１は参照フレーム画像
（前フレーム画像）の全画素データを格納する。フレー
ムメモリ４０２は、現フレーム画像の全画素データを格
納する。要素演算器４１１ａ〜４１１ｗは１つのマクロ
ブロックのサイズに対応する数だけ設けられる。すなわ
ち、テンプレートブロックサイズが１６行×１６列の場
合には、要素演算器４１１ａ〜４１１ｗは合計２５６個
設けられる。The frame memory 401 stores all pixel data of the reference frame image (previous frame image). The frame memory 402 stores all pixel data of the current frame image. The element arithmetic units 411a to 411w are provided in a number corresponding to the size of one macroblock. That is, when the template block size is 16 rows × 16 columns, a total of 256 element arithmetic units 411a to 411w are provided.

【０２４６】第１の演算装置４００はさらに、この要素
演算器４１１ａ〜４１１ｗのそれぞれの差分絶対値回路
４１２ａ〜４１２ｗの出力を加算する総和部４１７と、
この総和部４１７の出力に応答して最小の評価値を検出
し、対応のサーチウインドーブロックの変位ベクトルを
動きベクトルとして決定する比較部４１８を含む。The first arithmetic unit 400 further includes a summation unit 417 that adds the outputs of the absolute difference circuits 412a to 412w of the element arithmetic units 411a to 411w, and a comparison unit 418 that detects the minimum evaluation value in response to the output of the summation unit 417 and determines the displacement vector of the corresponding search window block as the motion vector.
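
A software analogue of the element arithmetic units, the summation unit and the comparison unit is an exhaustive block-matching search. The sketch below is a minimal model under stated assumptions: frames are lists of rows, the scan order over displacements is arbitrary (it does not reproduce the shift sequence of the hardware), candidates falling outside the frame are skipped, and all names are hypothetical.

```python
def block_sad(template, block):
    """Sum the per-pixel absolute differences of two equally sized blocks."""
    return sum(abs(s - t)
               for t_row, s_row in zip(template, block)
               for t, s in zip(t_row, s_row))


def full_search(template, reference, top, left, search_range):
    """Return (min_value, (dx, dy)) over all displacements within +/- search_range."""
    n = len(template)
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > len(reference) or x + n > len(reference[0]):
                continue  # candidate block would fall outside the frame
            block = [row[x:x + n] for row in reference[y:y + n]]
            value = block_sad(template, block)
            if best is None or value < best[0]:  # running-minimum comparison
                best = (value, (dx, dy))
    return best
```

The running-minimum update mirrors a comparison unit that receives one evaluation value per cycle and keeps the smallest seen so far.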

【０２４７】図５６に示す構成においては、第１の演算
装置４００においてテンプレートブロックデータはレジ
スタ４１４ａ〜４１４ｗに格納され、１つのサーチウイ
ンドーブロックデータがレジスタ４１３ａ〜４１３ｗに
格納される。１つの演算サイクル（評価値決定サイク
ル）が完了するとサーチウインドーブロックの１行また
は１列データがシフトアウトされる。In the structure shown in FIG. 56, template block data is stored in registers 414a to 414w and one search window block data is stored in registers 413a to 413w in first arithmetic unit 400. When one operation cycle (evaluation value determination cycle) is completed, one row or one column data of the search window block is shifted out.

【０２４８】この場合においても、シフトアウトされた
サーチウインドーデータＳＹは第２の演算装置へ与えら
れ、また同様にテンプレートブロックデータＴＸが次の
動きベクトル検出サイクルにおいてシフトアウトされ、
第２の演算装置へ与えられる。シフトアウトされたサー
チウインドーデータＳＹのうち分数精度での動きベクト
ルに必要とされるサーチウインドーデータは先の図４６
に示した構成を利用して取出される。Also in this case, the shifted-out search window data SY is given to the second arithmetic unit, and the template block data TX is likewise shifted out in the next motion vector detection cycle and given to the second arithmetic unit. Of the shifted-out search window data SY, the search window data required for the motion vector with fractional accuracy is extracted using the configuration shown in FIG. 46 above.

【０２４９】この図５６に示す構成において、レジスタ
４１４ａ〜４１４ｗをすべてスルー状態とし、要素演算
器４１１ａ〜４１１ｗへ同じテンプレートブロックデー
タを与え、またレジスタ４１３ａ〜４１３ｗへは、サー
チウインドーデータを１画素分ずつシフトインおよびシ
フトアウトしても同様の効果を得ることができる。この
場合、単位演算器へは、与えられたテンプレートブロッ
クＴＸと与えられたサーチウインドーデータとが常に同
じ変位ベクトルを有する関係にあるようにサーチウイン
ドーデータが与えられる。その場合には差分絶対値回路
に代えて差分絶対値和回路が用いられる。また他の構成
が利用されてもよく、整数精度での動きベクトルを検出
する機能を備える回路構成であればよい。In the structure shown in FIG. 56, the same effect can be obtained by setting all the registers 414a to 414w to the through state, supplying the same template block data to the element arithmetic units 411a to 411w, and shifting search window data into and out of the registers 413a to 413w one pixel at a time. In this case, search window data is supplied to each unit arithmetic unit such that the supplied template block data TX and the supplied search window data always correspond to the same displacement vector. In that case, a difference absolute value sum circuit is used in place of the absolute difference circuit. Other configurations may also be used, as long as the circuit configuration has the function of detecting a motion vector with integer precision.

【０２５０】また図５６に示すように要素演算器を縦続
接続する構成の場合、分数精度に必要とされるサーチウ
インドーデータを容易に得ることができる。すなわち、
サーチウインドーデータの出力段にシフトレジスタを設
け、かつ要素演算器において適当な間隔でサーチウイン
ドーデータをラッチするラッチ回路を設ける。このラッ
チ回路が配置される位置は、１つのサーチウインドーブ
ロックよりも１画素外部の位置に対応する。すなわち、
第１の演算装置４００の内部において、分数精度におい
て必要とされるサーチウインドーデータをすべてラッチ
しておき、整数精度での動きベクトル検出に必要とされ
る画素データに対応してのみ要素演算器が配置されて整
数精度での動きベクトルの検出が実行される。Further, when the element arithmetic units are connected in cascade as shown in FIG. 56, the search window data required for fractional accuracy can be obtained easily. That is, a shift register is provided at the output stage of the search window data, and latch circuits that latch the search window data are provided at appropriate intervals among the element arithmetic units. The positions at which the latch circuits are arranged correspond to positions one pixel outside a search window block. In other words, all the search window data required for fractional accuracy is latched inside the first arithmetic unit 400, while element arithmetic units are arranged only for the pixel data required for motion vector detection with integer precision, and the motion vector detection with integer precision is performed with them.

【０２５１】この場合、テンプレートブロックデータＴ
Ｘの送出と並行して、分数精度に必要とされるサーチウ
インドーデータが順次シフトアウトされるため、分数精
度での動きベクトル検出時において、容易に参照フレー
ムにおける必要なウインドーデータを得ることが可能と
なり、装置構成が簡略化される。In this case, the search window data required for fractional accuracy is sequentially shifted out in parallel with the transmission of the template block data TX, so the necessary window data in the reference frame can be obtained easily when detecting the motion vector with fractional accuracy, and the device configuration is simplified.

【０２５２】たとえばテンプレートブロックおよびサー
チウインドーブロックのサイズが１６画素×１６画素の
場合、分数精度での動きベクトルの検出に必要とされる
サーチウインドーのブロックサイズは１８画素×１８画
素である。このとき、最初の１８画素分をラッチするシ
フトレジスタを出力段に設けておき、次にサーチウイン
ドーブロック（１６画素×１６画素）の各列の上下１ビ
ットずつのデータをそれぞれラッチするラッチ回路を要
素演算器間の所定の位置に配置する。この場合、フレー
ムメモリからは分数精度に必要とされるサーチウインド
ーデータが読出されて第１の演算装置へ与えられる。こ
の状態を図５７に示す。For example, when the size of the template block and the search window block is 16 pixels × 16 pixels, the search window block size required for detecting the motion vector with fractional accuracy is 18 pixels × 18 pixels. In this case, a shift register that latches the first 18 pixels is provided at the output stage, and latch circuits that latch the upper and lower 1 bit of data of each column of the search window block (16 pixels × 16 pixels) are arranged at predetermined positions between the element arithmetic units. The search window data required for fractional accuracy is then read from the frame memory and supplied to the first arithmetic unit. This state is shown in FIG. 57.
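
The size arithmetic in this paragraph can be checked with a short sketch. It assumes a one-pixel border on every side of the block, which is what the 16 to 18 pixel figures imply; the function names and the `border` parameter are introduced here only for illustration.

```python
def fractional_window_size(block_size, border=1):
    """Side length of the window needed once a border is added on each side."""
    return block_size + 2 * border


def peripheral_pixel_count(block_size, border=1):
    """Number of pixels in the surrounding ring, outside the block itself."""
    outer = fractional_window_size(block_size, border)
    return outer * outer - block_size * block_size
```

For a 16 × 16 block this gives an 18 × 18 window, of which 68 pixels lie in the peripheral ring that must be held in latches and shift registers rather than in element arithmetic units.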

【０２５３】図５７において、要素演算器４１１ａ〜４
１１ｆ内にサーチウインドーブロック４３４のデータが
格納される。図５７においては、サーチウインドーブロ
ックデータＹＢが要素演算器４１１ｄに格納される状態
が例示される。分数精度での動きベクトル検出時におい
ては、このサーチウインドーブロック４３４の周辺画素
４３６を含む領域４３８の画素データが利用される。こ
の周辺画素領域４３６におけるデータが各ラッチおよび
シフトレジスタに格納される。図５７においては、周辺
画素ＹＡのデータがラッチ４３０ａにラッチされた状態
が示され、また周辺の画素列（画素ＹＣで代表する）は
出力段のシフトレジスタ４３２ａに格納され、他方の画
素列（画素ＹＤで代表する）は入力部のシフトレジスタ
４３２ｂに格納される。フレームメモリ４０１からは領
域４３８のデータがすべて読出される。ただし、評価値
生成サイクル毎に領域４３８の１列または１行のサーチ
ウインドーデータがシフトインおよびシフトアウトされ
る。In FIG. 57, the data of the search window block 434 is stored in the element arithmetic units 411a to 411f. FIG. 57 illustrates the state in which search window block data YB is stored in the element arithmetic unit 411d. When detecting a motion vector with fractional accuracy, the pixel data of the region 438, which includes the peripheral pixels 436 of the search window block 434, is used. The data of the peripheral pixel region 436 is stored in the latches and shift registers. FIG. 57 shows the state in which the data of the peripheral pixel YA is latched in the latch 430a; one peripheral pixel column (represented by the pixel YC) is stored in the shift register 432a of the output stage, and the other pixel column (represented by the pixel YD) is stored in the shift register 432b of the input section. All the data of the region 438 is read from the frame memory 401. In each evaluation value generation cycle, however, one column or one row of search window data of the region 438 is shifted in and shifted out.

【０２５４】この構成により、分数精度での動きベクト
ル検出に必要とされる画素データを容易に生成して分数
精度での動きベクトル検出を行なう第２の演算装置へ与
えることができる。また、要素演算器４１１ａ〜４１１
ｂはマトリクス状に配列されてよい（データ転送方向が
実質的に一方向であればよい）。With this configuration, the pixel data required for motion vector detection with fractional precision can be generated easily and supplied to the second arithmetic unit, which performs motion vector detection with fractional precision. The element arithmetic units 411a to 411b may be arranged in a matrix (it suffices that the data transfer direction is substantially one direction).

【０２５５】［実施例３］上述の実施例１および実施例
２いずれにおいても１つのテンプレートブロックに対し
動きベクトルが検出されている。この場合、２つのテン
プレートブロックに対し並列して動きベクトルの検出動
作を行なうこともできる。以下、この構成について説明
する。[Embodiment 3] In both Embodiments 1 and 2 described above, a motion vector is detected for one template block. It is also possible to perform the motion vector detection for two template blocks in parallel. This configuration is described below.

【０２５６】図５８にサーチエリアとテンプレートブロ
ックのサイズを示す。図５８において、サーチエリア４
５は４８画素×４８画素の大きさを備え、テンプレート
ブロック４３は１６画素×１６画素の大きさを備える。
検索範囲は水平方向＋１６〜−１６および垂直方向＋１
６〜−１６である。FIG. 58 shows the sizes of the search area and the template block. In FIG. 58, the search area 45 has a size of 48 pixels × 48 pixels, and the template block 43 has a size of 16 pixels × 16 pixels. The search range is +16 to -16 in the horizontal direction and +16 to -16 in the vertical direction.

【０２５７】今、図５９に示すように、現フレーム画像
４９において隣接する２のマクロブロックＴＢ１および
ＴＢ２をテンプレートブロックとして動きベクトルを検
出する動作を考える。この実施例においては、図１およ
び図４に示すプロセサアレイの構成が利用される。プロ
セサアレイ内においては図６０に示すサーチウインドー
４０の画素データが格納される。Now, as shown in FIG. 59, consider the operation of detecting a motion vector using two adjacent macroblocks TB1 and TB2 in the current frame image 49 as template blocks. In this embodiment, the configuration of the processor array shown in FIGS. 1 and 4 is used. The pixel data of the search window 40 shown in FIG. 60 is stored in the processor array.

【０２５８】今、図６０に示すようにサーチウインドー
ブロック４２がプロセサアレイ内の各要素プロセサに格
納されている状態を考える。このときテンプレートブロ
ックＴＢ１に対する変位ベクトルは（０，−１６）であ
り、テンプレートブロックＴＢ２については変位ベクト
ル（動きベクトルの候補）は（−１６，−１６）であ
る。またサーチウインドーブロック４２ａに対しては、
テンプレートブロックＴＢ１の変位ベクトルは（＋１
６，−１６）であり、テンプレートブロックＴＢ２につ
いては変位ベクトルは（０，−１６）である。この２つ
のテンプレートブロックＴＢ１およびＴＢ２に対する評
価値の演算を並列して実行する。Now consider the state in which the search window block 42 is stored in the element processors of the processor array, as shown in FIG. 60. At this time, the displacement vector for the template block TB1 is (0, -16), and the displacement vector (motion vector candidate) for the template block TB2 is (-16, -16). For the search window block 42a, the displacement vector of the template block TB1 is (+16, -16), and that of the template block TB2 is (0, -16). The evaluation values for these two template blocks TB1 and TB2 are calculated in parallel.
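
The relationship between the two displacement vectors in these examples is a fixed horizontal offset of one block width: the same search window block that sits at (dx, dy) relative to TB1 sits at (dx - 16, dy) relative to TB2. The sketch below encodes that observation; the function name and the `block_width` parameter are assumptions made for illustration.

```python
def tb2_displacement(tb1_displacement, block_width=16):
    """Displacement of a search window block relative to TB2, given its displacement from TB1.

    Assumes TB2 is the macroblock immediately to the right of TB1.
    """
    dx, dy = tb1_displacement
    return (dx - block_width, dy)
```

Both pairs quoted in the paragraph, (0, -16) with (-16, -16) and (+16, -16) with (0, -16), follow this rule.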

【０２５９】プロセサアレイの要素プロセサＰＥ内に
は、図６０に示すサーチウインドーブロック４２が格納
されている。同様にプロセサアレイ内の各要素エレメン
トにはテンプレートブロックＴＢ１およびＴＢ２の画素
データがそれぞれ画面上の配列順序を維持して格納され
る。この要素プロセサの構成については後に説明する。
要素プロセサの基本的動作は先の実施例１および２で説
明したものと同様である。The search window block 42 shown in FIG. 60 is stored in the element processors PE of the processor array. Likewise, the pixel data of the template blocks TB1 and TB2 are stored in the element processors of the processor array, each maintaining its arrangement order on the screen. The structure of this element processor will be described later. The basic operation of the element processor is the same as that described in Embodiments 1 and 2.

【０２６０】まず図６１に示すように、サーチウインド
ーブロック４２を用いてテンプレートブロックＴＢ１お
よびＴＢ２の変位ベクトル（０，−１６）および（−１
６，−１６）に対する評価値が計算される。First, as shown in FIG. 61, the search window block 42 is used to calculate the evaluation values for the displacement vectors (0, -16) and (-16, -16) of the template blocks TB1 and TB2.

【０２６１】この変位ベクトル（０，−１６）および
（−１６，−１６）に対する評価値の算出が完了する
と、プロセサアレイにおいては、１画素分のシフト動作
が実行される。When the calculation of the evaluation values for the displacement vectors (0, -16) and (-16, -16) is completed, the processor array executes the shift operation for one pixel.

【０２６２】この状態では、図６２に示すように、サー
チウインドー４０においてサーチウインドーブロック４
２が１行図の下方方向へシフトした状態となる（ブロッ
ク４２ａで示す）。この状態においては、テンプレート
ブロックＴＢ１のサーチウインドーブロック４２ａに対
する変位ベクトルは（０，−１５）であり、テンプレー
トブロックＴＢ２の変位ベクトルは（−１６，−１５）
である。各変位ベクトルに対し評価値が算出される。In this state, as shown in FIG. 62, the search window block 42 in the search window 40 has shifted downward in the figure by one row (indicated by block 42a). In this state, the displacement vector of the template block TB1 with respect to the search window block 42a is (0, -15), and the displacement vector of the template block TB2 is (-16, -15). An evaluation value is calculated for each displacement vector.

【０２６３】この動作を繰返す。３３サイクル完了する
と、図６３に示すように、プロセサアレイ内の要素プロ
セサにはサーチウインドーブロック４２ｂの画素データ
が格納される。この状態では、テンプレートブロックＴ
Ｂ１の変位ベクトルが（０，１６）、テンプレートブロ
ックＴＢ２についての変位ベクトルが（−１６，１６）
となる。さらにサーチウインドーデータのシフトを実行
すると、サーチウインドー４０は１列図の右方向にシフ
トし、新たなサーチウインドー４０ａを用いて動きベク
トルの評価が実行される。This operation is repeated. When 33 cycles have been completed, the element processors in the processor array hold the pixel data of the search window block 42b, as shown in FIG. 63. In this state, the displacement vector of the template block TB1 is (0, 16), and the displacement vector of the template block TB2 is (-16, 16). When the search window data is shifted further, the search window 40 shifts by one column to the right in the figure, and the motion vectors are evaluated using the new search window 40a.

【０２６４】すなわち、図６４に示すように、新たなサ
ーチウインドー４０ａにおける最も上方向のサーチウイ
ンドーブロック４２ｃを用いて動きベクトルの評価が行
なわれる。テンプレートブロックＴＢ１はこのサーチウ
インドーブロック４２ｃに対してはその変位ベクトルが
（１，−１５）であり、テンプレートブロックＴＢ２に
関してはその変位ベクトルは（−１５，−１６）であ
る。That is, as shown in FIG. 64, the motion vector is evaluated using the uppermost search window block 42c in the new search window 40a. The displacement vector of the template block TB1 is (1, -15) for the search window block 42c, and the displacement vector of the template block TB2 is (-15, -16).

【０２６５】以降この動作を繰返すことにより最終のサ
ーチウインドー４０ｂの最下部のサーチウインドーブロ
ック４２ｄに対する動きベクトルの評価が完了すると、
テンプレートブロックＴＢ１に対する動きベクトル決定
サイクルが完了する。サーチウインドー４０が与えられ
る前に、既にその水平成分−１６から−１までの１６列
分は処理が完了していると考える。この状態において
は、テンプレートブロックＴＢ１に対し動きベクトルの
候補を示す評価点１０８９点について評価値を算出する
ことができ、動きベクトルが算出される。テンプレート
ブロックＴＢ２に対してはこの操作をさらに繰返すこと
により動きベクトルを検出することができる。Thereafter, this operation is repeated, and when the evaluation of the motion vector for the lowermost search window block 42d of the final search window 40b is completed, the motion vector determination cycle for the template block TB1 is completed. It is assumed that, before the search window 40 was supplied, the 16 columns with horizontal components -16 to -1 had already been processed. In this state, evaluation values have been calculated for the 1089 evaluation points representing motion vector candidates for the template block TB1, and its motion vector is determined. For the template block TB2, the motion vector can be detected by repeating this operation further.
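
The figures of 33 cycles per column and 1089 evaluation points both follow from the search range of -16 to +16 in each direction. A short sketch of that arithmetic, with hypothetical function names:

```python
def positions_per_axis(search_range):
    """Number of integer displacements from -search_range to +search_range inclusive."""
    return 2 * search_range + 1


def evaluation_points(search_range):
    """Total number of displacement vectors for a square search range."""
    return positions_per_axis(search_range) ** 2
```

With a search range of 16 there are 33 positions along each axis, and 33 × 33 = 1089 candidate displacement vectors in total.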

【０２６６】すなわち、図６５に示すように、テンプレ
ートブロックＴＢ１およびＴＢ２の動きベクトル評価操
作を、その動きベクトル検出サイクルの半周期を重なり
合うようにして並列態様で動きベクトル評価処理を実行
することができる。これにより高速で画像の動き補償を
実行することが可能となる。次にこの２つのテンプレー
トブロックに対する動きベクトルの評価を並列して行な
うための構成について説明する。That is, as shown in FIG. 65, the motion vector evaluation operations for the template blocks TB1 and TB2 can be executed in parallel, with their motion vector detection cycles overlapping by half a cycle. This makes it possible to perform motion compensation of an image at high speed. Next, a configuration for evaluating the motion vectors for these two template blocks in parallel will be described.

【０２６７】図６６は、この発明の第３の実施例に用い
られる要素プロセサの構成を示す図である。図６６にお
いて、要素プロセサＰＥは、隣接要素プロセサから与え
られるサーチウインドーデータを格納するサーチウイン
ドーデータ用レジスタ５０５ｃと、隣接要素プロセサか
ら与えられる右側のテンプレートブロックの画素データ
を格納するための右テンプレートデータ用レジスタ５０
５ａと、隣接要素プロセサから与えられるテンプレート
ブロックの画素データを格納するための左テンプレート
データ用レジスタ５０５ｂと、テンプレートデータ用レ
ジスタ５０５ａおよび５０５ｂの出力の一方を選択する
ためのセレクタ５０６と、レジスタ５０５ｃの出力とセ
レクタ５０６の出力をそれぞれ信号線５１０および５０
９を介して受け、差分絶対値演算を行なう差分絶対値回
路５０７を含む。FIG. 66 is a diagram showing the structure of an element processor used in the third embodiment of the present invention. In FIG. 66, the element processor PE includes a search window data register 505c for storing search window data supplied from an adjacent element processor, a right template data register 505a for storing pixel data of the right template block supplied from an adjacent element processor, a left template data register 505b for storing pixel data of a template block supplied from an adjacent element processor, a selector 506 for selecting one of the outputs of the template data registers 505a and 505b, and an absolute difference circuit 507 that receives the output of the register 505c and the output of the selector 506 via signal lines 510 and 509, respectively, and computes the absolute difference.
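
The element processor of FIG. 66 can be pictured with a small behavioral model. This is only a sketch under stated assumptions: the class name, the method names and the right-then-left selection order are hypothetical, and register widths and shifting between adjacent processors are not modeled.

```python
# Hypothetical model of the FIG. 66 element processor; class and
# method names are illustrative, not taken from the patent.
class ElementProcessor:
    def __init__(self):
        self.search = 0          # models search window data register 505c
        self.right_template = 0  # models right template data register 505a
        self.left_template = 0   # models left template data register 505b

    def absolute_difference(self, select_right):
        """Model selector 506 feeding absolute difference circuit 507."""
        template = self.right_template if select_right else self.left_template
        return abs(self.search - template)

    def time_division_outputs(self):
        """Return the (right, left) absolute differences, one per time slot."""
        return (self.absolute_difference(True), self.absolute_difference(False))


pe = ElementProcessor()
pe.search, pe.right_template, pe.left_template = 120, 100, 135
print(pe.time_division_outputs())  # (20, 15)
```

A single absolute difference circuit is shared by both template registers here, which is why the two results appear in successive time slots rather than simultaneously.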

【０２６８】差分絶対値回路５０７の出力は信号線５１
１を介して総和部へ与えられる。この図６６に示す要素
プロセサＰＥが図４に示すような２次元アレイ状に配置
される。テンプレートブロックデータは水平成分につい
て１６列（サーチエリアにおける１６列）に対する動き
ベクトル評価が行なわれた後に更新される。このとき、
右側テンプレートブロックおよび左側テンプレートブロ
ックが同時に更新されるのではなく、一方のテンプレー
トブロックデータレジスタの格納データのみが更新され
てもよい。それに応じてセレクタ５０６の選択タイミン
グが切換えられる。１つのテンプレートブロックが選択
される順序がどの動作サイクルにおいても同じであれば
よい。The output of the absolute difference circuit 507 is supplied to the summing unit via signal line 511. The element processors PE shown in FIG. 66 are arranged in a two-dimensional array as shown in FIG. 4. The template block data is updated after motion vector evaluation has been performed for 16 columns of the horizontal component (16 columns in the search area). At this time, instead of updating the right template block and the left template block simultaneously, only the data stored in one of the template block data registers may be updated. The selection timing of the selector 506 is switched accordingly. It is sufficient that the order in which a given template block is selected is the same in every operation cycle.

【０２６９】差分絶対値回路５０７および総和部の構成
は先の第１および第２の実施例において説明したものと
同様の構成が利用される。この要素プロセサＰＥの動作
は同じであるため詳細には繰返さない。セレクタ５０６
によりテンプレートデータ用レジスタ５０５ａおよび５
０５ｂが順次選択され、サーチウインドーデータ用レジ
スタ５０５ｃに格納されたサーチウインドーブロックデ
ータとの差分絶対値が求められる。差分絶対値回路５０
７からはしたがって右側テンプレートブロックデータに
対する差分絶対値と左側テンプレートブロックデータに
対する差分絶対値とが時分割的に総和部へ与えられる。The absolute difference circuit 507 and the summing unit use the same configurations as those described in the first and second embodiments. Since the operation of this element processor PE is the same, it will not be repeated in detail. The selector 506 sequentially selects the template data registers 505a and 505b, and the absolute difference from the search window block data stored in the search window data register 505c is obtained. The absolute difference circuit 507 therefore supplies the absolute difference for the right template block data and the absolute difference for the left template block data to the summing unit in a time-division manner.

【０２７０】図６７，図６８は、右側テンプレートブロ
ックデータおよび左側テンプレートブロックデータを生
成するための構成を示す図である。図６７において、右
側テンプレートブロックデータは現フレーム画像用フレ
ームメモリ５５０から直接読出され、左側テンプレート
ブロックデータは、バッファメモリ５５２から読出され
る。バッファメモリ５５２は、現フレーム画像用フレー
ムメモリ５５０から読出される右側テンプレートブロッ
クデータを順次格納する。バッファメモリ５５２の構成
は先に図４７を参照して説明した書込経路と読出経路と
が別々のメモリを利用することができる。またＦＩＦＯ
型のメモリが用いられてもよい。FIGS. 67 and 68 are diagrams showing structures for generating the right template block data and the left template block data. In FIG. 67, the right template block data is read directly from the current frame image frame memory 550, and the left template block data is read from the buffer memory 552. The buffer memory 552 sequentially stores the right template block data read from the current frame image frame memory 550. The buffer memory 552 may be the memory with separate write and read paths described above with reference to FIG. 47. A FIFO-type memory may also be used.
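
The data flow of FIG. 67 can be sketched as a one-block delay. The sketch assumes that the block held in the buffer memory is reused as the left template block when the next block is read as the right template block; the function name `template_pairs` and the block labels are hypothetical.

```python
from collections import deque


# Hypothetical model of FIG. 67: the block read from the frame memory is the
# right template block; the previously read block, held in a buffer, is the left.
def template_pairs(blocks):
    """Yield (right, left) template blocks; left is None until the buffer fills."""
    buffer = deque(maxlen=1)  # models buffer memory 552 holding one block
    for block in blocks:
        left = buffer[0] if buffer else None
        yield block, left
        buffer.append(block)


print(list(template_pairs(["TB1", "TB2", "TB3"])))
# [('TB1', None), ('TB2', 'TB1'), ('TB3', 'TB2')]
```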

【０２７１】バッファメモリ５５２は、テンプレートブ
ロックの全画素データを格納する記憶容量が必要とされ
る。現フレーム画像用フレームメモリ５５０は、現フレ
ーム画像のすべての画素データを格納する。この図６７
に示す構成においては、プロセサアレイにおいて水平方
向において１６列の動きベクトル評価動作が完了された
後右側テンプレートブロックおよび左側テンプレートブ
ロックが共に更新される。The buffer memory 552 requires a storage capacity sufficient to store all pixel data of a template block. The current frame image frame memory 550 stores all pixel data of the current frame image. In the configuration shown in FIG. 67, both the right template block and the left template block are updated after the motion vector evaluation operation for 16 columns in the horizontal direction is completed in the processor array.

【０２７２】図６８は一方のテンプレートブロックデー
タのみをプロセサアレイ内において更新するための構成
を示す。図６８において、現フレーム画像用フレームメ
モリ５５０から読出されたテンプレートブロックデータ
はセレクタ５５４により右側テンプレートブロックデー
タまたは左側テンプレートブロックデータとして出力さ
れる。セレクタ５５４は、現フレーム画像用フレームメ
モリ５５０から読出されたテンプレートブロックデータ
を右側テンプレートブロックデータ経路または左側テン
プレートブロックデータ経路へ伝達する。FIG. 68 shows a structure for updating only one template block data in the processor array. In FIG. 68, the template block data read from the current frame image frame memory 550 is output as the right template block data or the left template block data by the selector 554. The selector 554 transmits the template block data read from the current frame image frame memory 550 to the right template block data path or the left template block data path.

【０２７３】この場合、図６６に示す要素プロセサＰＥ
において、右テンプレートデータ用レジスタ５０５ａま
たは左テンプレートデータ用レジスタ５０５ｂの一方の
格納データのみが更新される。このセレクタ５５４の選
択態様の切換えに応じて、図６６に示すセレクタ５０６
の選択順序が切換えられる。In this case, in the element processor PE shown in FIG. 66, only the data stored in one of the right template data register 505a and the left template data register 505b is updated. The selection order of the selector 506 shown in FIG. 66 is switched in accordance with the switching of the selection mode of the selector 554.

【０２７４】図６９は要素プロセサの他の構成例を示す
図である。図６９に示す要素プロセサＰＥは、２つの差
分絶対値回路５０７ａおよび５０７ｂを含む。差分絶対
値回路５０７ａは、サーチウインドーデータ用レジスタ
５０５ｃの格納データと右テンプレートデータ用レジス
タ５０５ａの格納データとの差分絶対値を求める。差分
絶対値回路５０７ｂは、サーチウインドーデータ用レジ
スタ５０５ｃの格納データと左テンプレートデータ用レ
ジスタ５０５ｂの格納データとの差分絶対値を求める。FIG. 69 is a diagram showing another example of the configuration of the element processor. The element processor PE shown in FIG. 69 includes two difference absolute value circuits 507a and 507b. The absolute difference circuit 507a calculates the absolute difference between the data stored in the search window data register 505c and the data stored in the right template data register 505a. The absolute difference circuit 507b calculates the absolute difference between the data stored in the search window data register 505c and the data stored in the left template data register 505b.

【０２７５】差分絶対値回路５０７ａおよび５０７ｂの
出力はそれぞれ信号線５１１ａおよび５１１ｂを介して
総和部へ伝達される。総和部には、差分絶対値回路５０
７ａおよび５０７ｂそれぞれに対応して２つの総和回路
が設けられてもよい。１つの総和回路がセレクタを介し
て時分割的に駆動される構成が利用されてもよい。The outputs of the absolute difference circuits 507a and 507b are transmitted to the summing unit via signal lines 511a and 511b, respectively. The summing unit may be provided with two summing circuits corresponding to the absolute difference circuits 507a and 507b, respectively. Alternatively, a configuration in which one summing circuit is driven in a time-division manner via a selector may be used.

【０２７６】上述の要素プロセサの構成においては隣接
要素プロセサ間でテンプレートデータも転送されている
と説明している。分数精度での動きベクトル検出部にこ
のテンプレートブロックデータが直接伝達されない場合
には、隣接要素プロセサ間でテンプレートブロックデー
タをシフトさせることなく外部から直接各要素プロセサ
へテンプレートデータが与えられる構成が利用されても
よい。テンプレートブロックデータは動きベクトル評価
動作期間中このプロセサアレイ内に常駐しているためで
ある（シフト操作は必要とされない）。The above element processor configurations have been described as also transferring template data between adjacent element processors. If the template block data is not transmitted directly to the fractional-precision motion vector detection unit, a configuration may be used in which the template data is supplied directly to each element processor from outside, without shifting the template block data between adjacent element processors. This is because the template block data remains resident in the processor array during the motion vector evaluation operation (no shift operation is required).

【０２７７】図７０，図７１は総和部および比較部の構
成を示す図である。図７０において、総和部５６０は、
要素プロセサＰＥに含まれる差分絶対値回路からの差分
絶対値を並列に受け、総和演算を実行する。総和部５６
０の出力は右比較部５６２および左比較部５６４へ与え
られる。右比較部５６２および左比較部５６４はそれぞ
れ活性化信号φ１およびφ２に応答して活性化される。
右比較部５６２は、右側テンプレートブロックに対する
動きベクトルを決定し、左比較部５６４は左側テンプレ
ートブロックに対する動きベクトルを検出する。FIGS. 70 and 71 are diagrams showing the structures of the summing unit and the comparison units. In FIG. 70, the summing unit 560 receives in parallel the absolute differences from the absolute difference circuits included in the element processors PE and executes the summation. The output of the summing unit 560 is supplied to the right comparison unit 562 and the left comparison unit 564. The right comparison unit 562 and the left comparison unit 564 are activated in response to activation signals φ1 and φ2, respectively. The right comparison unit 562 determines the motion vector for the right template block, and the left comparison unit 564 detects the motion vector for the left template block.

【０２７８】この総和部５６０および比較部５６２なら
びに５６４の構成は先の実施例に示したものを利用する
ことができる。すなわち総和部５６０としては図４０な
いし４２に示す構成を利用することができ、比較部５６
２および５６４に対しては図４３に示す構成を利用する
ことができる。活性化制御信号φ１およびφ２はその比
較部において含まれる比較器（図４３参照）の活性化お
よびレジスタラッチのラッチ動作を制御するために与え
られる。The configurations shown in the previous embodiments can be used for the summing unit 560 and the comparison units 562 and 564. That is, the configurations shown in FIGS. 40 to 42 can be used for the summing unit 560, and the configuration shown in FIG. 43 can be used for the comparison units 562 and 564. The activation control signals φ1 and φ2 are supplied to control the activation of the comparator (see FIG. 43) included in each comparison unit and the latch operation of the register latch.

【０２７９】図７０に示す構成においては、総和部５６
０からは右側テンプレートブロックに対する評価値およ
び左側テンプレートブロックに対する評価値が時分割的
に出力される。この場合、比較動作を確実に行なうため
に、図７１に示すように、総和部５６０と比較部５６２
および５６４との間に制御信号φＣに応答してその信号
伝達経路を切換えるセレクタ５６６が設けられてもよ
い。セレクタ５６６は、右側テンプレートブロックの評
価値および左側テンプレートブロックの評価値に応じて
その制御信号φＣの制御の下に信号伝達経路を切換え、
総和部５６０から出力される評価値をそれぞれ対応の右
比較部５６２または左比較部５６４へ伝達する。In the structure shown in FIG. 70, the summing unit 560 outputs the evaluation value for the right template block and the evaluation value for the left template block in a time-division manner. In this case, to ensure a reliable comparison operation, a selector 566 that switches the signal transmission path in response to a control signal φC may be provided between the summing unit 560 and the comparison units 562 and 564, as shown in FIG. 71. Under the control of the control signal φC, the selector 566 switches the signal transmission path according to whether the evaluation value belongs to the right template block or the left template block, and transmits each evaluation value output from the summing unit 560 to the corresponding right comparison unit 562 or left comparison unit 564.
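
The routing performed by the selector of FIG. 71 can be sketched as follows. This is a minimal model under stated assumptions: the names `ComparisonUnit` and `route` are hypothetical, the boolean flag stands in for control signal φC, and each comparison unit is assumed to keep the displacement vector with the smallest evaluation value.

```python
# Hypothetical sketch of FIG. 71: a selector routes time-multiplexed
# evaluation values to a right or left comparison unit.
class ComparisonUnit:
    """Keeps the smallest evaluation value seen and its displacement vector."""

    def __init__(self):
        self.best_value = None
        self.best_vector = None

    def update(self, value, vector):
        if self.best_value is None or value < self.best_value:
            self.best_value, self.best_vector = value, vector


def route(stream, right_unit, left_unit):
    """stream yields (is_right, value, vector); is_right models the control signal."""
    for is_right, value, vector in stream:
        (right_unit if is_right else left_unit).update(value, vector)


right, left = ComparisonUnit(), ComparisonUnit()
stream = [
    (True, 40, (0, -16)), (False, 55, (-16, -16)),
    (True, 12, (0, -15)), (False, 70, (-16, -15)),
]
route(stream, right, left)
print(right.best_vector, left.best_vector)  # (0, -15) (-16, -16)
```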

【０２８０】図６９に示すように要素プロセサが２つの
差分絶対値回路を含む場合、総和部５６０は２つの総和
回路を含んでもよい。この場合、総和部５６０からは並
列に右側テンプレートブロックおよび左側テンプレート
ブロックそれぞれに対する評価値が出力される。When the element processor includes two difference absolute value circuits as shown in FIG. 69, summation unit 560 may include two summation circuits. In this case, the summation unit 560 outputs evaluation values for the right side template block and the left side template block in parallel.

【０２８１】図７２は分数精度での動きベクトルの検出
をも２つのテンプレートブロックに対し並列に行なうた
めの構成を示す図である。FIG. 72 is a diagram showing a structure for detecting a motion vector with fractional accuracy in parallel for two template blocks.

【０２８２】図７２において、分数精度での動きベクト
ルを検出するための装置は、整数精度での動きベクトル
を検出するプロセサアレイから出力されるサーチウイン
ドーデータのうち、分数精度での動きベクトル検出に必
要とされるサーチウインドーデータを格納するバッファ
メモリ５８０および５８２と、バッファメモリ５８０か
らのサーチウインドーデータと左側テンプレートブロッ
クデータとを受けて分数精度での動きベクトルを検出す
る分数精度動きベクトル演算装置５８４と、バッファメ
モリ５８２からのサーチウインドーデータと右側テンプ
レートブロックデータとを受けて右側テンプレートブロ
ックに対する分数精度での動きベクトルを検出する分数
精度動きベクトル演算装置５８６を含む。In FIG. 72, the device for detecting motion vectors with fractional precision includes buffer memories 580 and 582 that store, out of the search window data output from the processor array that detects motion vectors with integer precision, the search window data required for fractional-precision motion vector detection; a fractional-precision motion vector arithmetic unit 584 that receives the search window data from the buffer memory 580 and the left template block data and detects a motion vector with fractional precision; and a fractional-precision motion vector arithmetic unit 586 that receives the search window data from the buffer memory 582 and the right template block data and detects a motion vector with fractional precision for the right template block.

【０２８３】バッファメモリ５８０および５８２はそれ
ぞれ書込制御信号φＡおよびφＢに応答してサーチウイ
ンドーデータの書込を実行する。この制御信号φＡおよ
びφＢはそれぞれ図７０，図７７に示す比較部５６４お
よび５６２からそれぞれ発生される。バッファメモリ５
８０および５８２には先の第２の実施例において説明し
たものと同様の構成が利用される。バッファメモリ５８
０は、左側テンプレートブロックに関する整数精度での
動きベクトル決定時にその記憶データが順次読出され、
バッファメモリ５８２は、右側テンプレートブロックに
対する動きベクトル決定時にその記憶内容が読出され
る。Buffer memories 580 and 582 write search window data in response to write control signals φA and φB, respectively. The control signals φA and φB are generated by the comparison units 564 and 562 shown in FIGS. 70 and 77, respectively. Buffer memories 580 and 582 use the same structure as that described in the second embodiment. The data stored in the buffer memory 580 is read out sequentially when the integer-precision motion vector for the left template block is determined, and the contents of the buffer memory 582 are read out when the motion vector for the right template block is determined.

【０２８４】分数精度動きベクトル演算装置５８４およ
び５８６は、先に第２の実施例において説明したものと
同様の構成が利用される。その分数精度動きベクトル演
算装置５８４および５８６の動作は先の第２の実施例に
説明したものと同じであり、その詳細は繰返さない。The fractional precision motion vector arithmetic units 584 and 586 have the same structure as that described in the second embodiment. The operations of the fractional precision motion vector arithmetic units 584 and 586 are the same as those described in the second embodiment, and the details thereof will not be repeated.

【０２８５】ここで、右側テンプレートブロックデータ
と左側テンプレートブロックデータとはそれぞれ１６水
平成分に対応するサイクル期間の出力タイミングのずれ
を有している。右側テンプレートブロックデータおよび
左側テンプレートブロックデータがともに１６水平成分
サイクル毎に更新される場合には、左側テンプレートブ
ロックデータおよび右側テンプレートブロックデータそ
れぞれに対しバッファ手段が設けられ、タイミング調整
が実行される。またこのとき、右側テンプレートブロッ
クと左側テンプレートブロックの動きベクトル決定期間
には時間のずれがある。分数精度動きベクトル演算に対
しても同様にタイミングのずれが生ずる。この場合、分
数精度動きベクトル演算装置を右側テンプレートブロッ
クおよび左側テンプレートブロック両者に対し共通に利
用することもできる。Here, the right template block data and the left template block data are offset in output timing by a cycle period corresponding to 16 horizontal components. When both the right template block data and the left template block data are updated every 16 horizontal component cycles, buffer means are provided for each of the left and right template block data to adjust the timing. The motion vector determination periods of the right template block and the left template block are then also offset in time. The same timing offset arises in the fractional-precision motion vector calculation. In this case, a single fractional-precision motion vector arithmetic unit can be shared by the right and left template blocks.

【０２８６】図７３はこの図７２に示す分数精度動きベ
クトル検出構成の変更例を示す図である。図７３におい
て、分数精度での動きベクトルを検出するための構成
は、左側テンプレートブロックデータを格納するための
バッファメモリ５９０と、サーチウインドーデータをそ
れぞれ制御信号φＡおよびφＢに応答して記憶するバッ
ファメモリ５９２および５９４と、右側テンプレートブ
ロックデータを格納するバッファメモリ５９６と、バッ
ファメモリ５９０、５９２、５９４および５９６の読出
データから１組のテンプレートデータおよびサーチウイ
ンドーデータを選択するセレクタ５９８と、セレクタ５
９８から出力されるテンプレートデータおよびサーチウ
インドーデータに従って分数精度での動きベクトルを検
出する第２の演算装置５９９を含む。FIG. 73 is a diagram showing a modification of the fractional-precision motion vector detection configuration shown in FIG. 72. In FIG. 73, the configuration for detecting motion vectors with fractional precision includes a buffer memory 590 for storing left template block data, buffer memories 592 and 594 for storing search window data in response to control signals φA and φB, respectively, a buffer memory 596 for storing right template block data, a selector 598 for selecting one set of template data and search window data from the data read from the buffer memories 590, 592, 594 and 596, and a second arithmetic unit 599 for detecting a motion vector with fractional precision according to the template data and search window data output from the selector 598.

【０２８７】セレクタ５９８は、左側テンプレートブロ
ックに対する分数精度での動きベクトル決定時において
は、バッファメモリ５９０および５９２の記憶データを
選択する。右側テンプレートブロックに対する分数精度
での動きベクトル検出時においてはセレクタ５９８はバ
ッファメモリ５９４および５９６の読出データを選択す
る。第２の演算装置５９９はこの与えられたテンプレー
トデータおよびサーチウインドーデータに従って分数精
度での動きベクトルを決定する。The selector 598 selects the data stored in the buffer memories 590 and 592 when the motion vector for the left side template block is determined with the fractional precision. When the motion vector is detected with the fractional accuracy for the right template block, the selector 598 selects the read data from the buffer memories 594 and 596. The second arithmetic unit 599 determines a motion vector with a fractional accuracy according to the given template data and search window data.

【０２８８】バッファメモリ５９０、５９２、５９４お
よび５９６を用いることにより左側テンプレートブロッ
クおよび右側テンプレートブロックそれぞれに対する分
数精度での動きベクトルを確実に検出することができ
る。このときまた、図７４に示すように、左側テンプレ
ートブロックおよび右側テンプレートブロックそれぞれ
に対する整数精度での動きベクトル決定動作と並行して
パイプライン態様で右側テンプレートブロックおよび左
側テンプレートブロックに対する分数精度での動きベク
トルを検出することができる。この構成により、より効
率的で回路規模の小さな動きベクトル検出装置を得るこ
とができる。By using the buffer memories 590, 592, 594 and 596, the fractional-precision motion vectors for the left template block and the right template block can each be detected reliably. In addition, as shown in FIG. 74, the fractional-precision motion vectors for the right and left template blocks can be detected in a pipelined manner, in parallel with the integer-precision motion vector determination for the left and right template blocks. This configuration yields a more efficient motion vector detection device with a small circuit scale.

【０２８９】ここで、図７３に示す構成においてバッフ
ァメモリからのデータの読出タイミングについては厳密
に説明していない。しかしながら、先の分数精度での動
きベクトル検出を行なう第２の実施例において説明した
と同様にテンプレートブロックデータについては、整数
精度演算装置の比較部からの動きベクトル決定信号に応
じてテンプレートブロックデータの書込を実行し、すべ
てのデータの書込完了後にデータを読出す構成が利用さ
れればよい。The timing of reading data from the buffer memories in the structure shown in FIG. 73 has not been described in strict detail here. However, as in the second embodiment, which performs fractional-precision motion vector detection, it is sufficient for the template block data to be written in response to the motion vector determination signal from the comparison unit of the integer-precision arithmetic unit and to be read out after all the data has been written.

【０２９０】このようにバッファメモリによりテンプレ
ートブロックデータをもバッファ処理することにより、
図７４に示すように整数精度での動きベクトル検出動作
と分数精度での動きベクトル検出動作とをパイプライン
化して高速で決定することができる。ここで、図７４に
おいて、マクロブロック♯１〜♯６に対する整数精度で
の動きベクトル検出と分数精度での動きベクトル検出の
シーケンスが一例として示される。By buffering the template block data in buffer memory as well, the integer-precision and fractional-precision motion vector detection operations can be pipelined, as shown in FIG. 74, so that motion vectors are determined at high speed. FIG. 74 shows, as an example, the sequence of integer-precision and fractional-precision motion vector detection for macroblocks #1 to #6.
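
The pipelining can be pictured with a simple two-stage schedule. FIG. 74 is not reproduced here, so the one-slot offset between the integer-precision and fractional-precision stages, and the function name `pipeline_schedule`, are assumptions made only for illustration.

```python
# Hypothetical two-stage pipeline schedule in the spirit of FIG. 74.
def pipeline_schedule(num_blocks):
    """Return, per time slot, (integer-stage block, fractional-stage block)."""
    slots = []
    for t in range(1, num_blocks + 2):
        integer_mb = t if t <= num_blocks else None      # block in the integer stage
        fractional_mb = t - 1 if t >= 2 else None        # block in the fractional stage
        slots.append((integer_mb, fractional_mb))
    return slots


for slot in pipeline_schedule(6):
    print(slot)
```

With six macroblocks this assumed schedule needs seven slots instead of the twelve that strictly sequential processing of both stages would take.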

【０２９１】「第４の実施例」蓄積エリア符号化技術で
は、画像信号が記憶装置に格納されるため、画像処理に
おいて時間軸の制約を受けない。画像信号はＣＤ−ＲＯ
Ｍなどの記憶媒体に格納されるからである。このような
蓄積メディア用動画像動き補償の場合、（ｉ）過去から
現在を予測する順方向動き補償、（ｉｉ）未来から現在
を予測する逆方向動き補償および（ｉｉｉ）過去および
未来両者から現在を予測する補間動き補償（内挿動き補
償）の３つの動き補償が可能である。[Fourth Embodiment] In storage-area coding techniques, the image signal is stored in a storage device, so image processing is not constrained along the time axis. This is because the image signal is stored in a storage medium such as a CD-ROM. For moving picture motion compensation for such storage media, three types of motion compensation are possible: (i) forward motion compensation, which predicts the present from the past, (ii) backward motion compensation, which predicts the present from the future, and (iii) interpolation motion compensation, which predicts the present from both the past and the future.

【０２９２】順方向および逆方向動き補償の一方方向の
予測（以下、片側予測と称す）においては、１つの参照
フレームと現在のフレーム（符号化されるべき画像フレ
ーム）とが用いられるため、先の実施例１ないし３の構
成を利用することができる。補間動き補償（内挿動き補
償）においては過去および未来の参照フレームと現在の
フレームとを必要とする。この補間動き補償において
は、２つの参照フレーム（過去の参照フレームと未来の
参照フレーム）と現在のフレームとの時間距離を考慮し
て、これらの２つの参照フレームの合成を行なう。この
合成参照フレーム（またはブロック）を用いて現在のフ
レームの画素の動き補償付予測値を求める。この合成参
照フレーム（またはブロック）に対する動きベクトルを
求めるためには、過去の参照フレームのマクロブロック
により求めた動きベクトル（ブロックマッチングによる
評価値算出に基づいて求めた動きベクトル）と未来の参
照フレームのマクロブロックにより求めた動きベクトル
とから最適な動きベクトルを求める必要がある。In prediction in one direction, that is, forward or backward motion compensation (hereinafter referred to as one-sided prediction), one reference frame and the current frame (the image frame to be coded) are used, so the configurations of the first to third embodiments can be used. Interpolation motion compensation requires past and future reference frames as well as the current frame. In interpolation motion compensation, the two reference frames (a past reference frame and a future reference frame) are combined, taking into account their temporal distances from the current frame. The combined reference frame (or block) is used to obtain motion-compensated predicted values for the pixels of the current frame. To obtain the motion vector for the combined reference frame (or block), the optimum motion vector must be found from the motion vector obtained with a macroblock of the past reference frame (obtained from evaluation values calculated by block matching) and the motion vector obtained with a macroblock of the future reference frame.
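
One way to picture the combination of two reference blocks is a distance-weighted average. The text above states only that temporal distance is taken into account; the specific rule used below (weights inversely proportional to distance), the rounding, and the name `interpolate_block` are assumptions for illustration.

```python
# Hypothetical sketch: blend a past and a future reference block with
# weights based on temporal distance (the weighting rule is an assumption).
def interpolate_block(past_block, future_block, dist_past, dist_future):
    """Weight each reference inversely to its distance from the current frame."""
    total = dist_past + dist_future
    w_past = dist_future / total    # a nearer past frame gets a larger weight
    w_future = dist_past / total
    return [
        [round(w_past * p + w_future * f) for p, f in zip(prow, frow)]
        for prow, frow in zip(past_block, future_block)
    ]


past = [[100, 104], [96, 100]]
future = [[108, 112], [104, 108]]
print(interpolate_block(past, future, dist_past=1, dist_future=1))
# [[104, 108], [100, 104]]
```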

【０２９３】このような処理は、一例として、ＭＰＥＧ
（ｍｏｔｉｏｎｐｉｃｔｕｒｅｉｍａｇｅｃｏｄｉ
ｎｇｅｘｐｅｒｔｓｇｒｏｕｐ）の蓄積用動画像符
号化標準（ＭｏｔｉｏｎＰｉｃｔｕｒｅＩｍａｇｅ
ＣｏｄｉｎｇＳｔａｎｄａｒｄｆｏｒＳｔｏｒ
ａｇｅＭｅｄｉａ）におけるＢピクチャーに対して実
行される。以下、特にＢピクチャーなどの用途に限定さ
れるものではないが、第４の実施例として内挿予測（補
間予測）による動きベクトル検出を行なうための構成に
ついて説明する。Such processing is performed, for example, on B pictures in the MPEG (motion picture image coding experts group) moving picture coding standard for storage media (Motion Picture Image Coding Standard for Storage Media). Although the application is not limited to B pictures, a configuration for detecting motion vectors by interpolation prediction will be described below as the fourth embodiment.

【０２９４】図７５はこの発明の第４の実施例である動
きベクトル検出装置の全体の構成を概略的に示す図であ
る。図７５において、内挿予測動きベクトル検出装置
は、第１のサーチエリアの画素データを格納するための
第１のサーチエリアデータ記憶部１００１と、現在のフ
レームのテンプレートブロックの画素データを格納する
テンプレートデータ記憶部１０００と、第２のサーチエ
リアの画素データを格納するための第２のサーチエリア
データ格納部１００２とを含む。FIG. 75 is a diagram schematically showing the overall structure of a motion vector detection device according to the fourth embodiment of the present invention. In FIG. 75, the interpolation prediction motion vector detection device includes a first search area data storage unit 1001 for storing pixel data of a first search area, a template data storage unit 1000 for storing pixel data of a template block of the current frame, and a second search area data storage unit 1002 for storing pixel data of a second search area.

【０２９５】第１および第２のサーチエリアデータ格納
部１００１および１００２は、各々第１および第２の参
照フレームの画素データを格納するフレームメモリであ
ってもよく、また、テンプレートデータ記憶部１０００
は、現在のフレームの画素データを格納するフレームメ
モリであってもよい。第１の参照フレームおよび第２の
参照フレームの一方は過去のフレームであり、他方は、
未来のフレームである。時間軸に関して一方方向の複数
のフレームの合成により動き補償を行なう構成のような
場合には、第１および第２の参照フレームは現在のフレ
ームに関して時間軸上で同一方向に配置されるものであ
ってもよい。The first and second search area data storage units 1001 and 1002 may be frame memories that store the pixel data of the first and second reference frames, respectively, and the template data storage unit 1000 may be a frame memory that stores the pixel data of the current frame. One of the first and second reference frames is a past frame, and the other is a future frame. In a configuration that performs motion compensation by combining a plurality of frames lying in one direction on the time axis, the first and second reference frames may be located in the same direction on the time axis with respect to the current frame.

【０２９６】内挿予測動きベクトル検出装置はさらに、
テンプレートデータ記憶部１０００からのテンプレート
ブロック画素データと第１のサーチエリアデータ記憶部
１００１からのサーチエリアデータ（サーチウインドー
データ）とを受けて第１の動きベクトルｍｖａを求める
第１の片側予測動き検出部１００３と、記憶部１０００
からのテンプレートブロック画素データと第２のサーチ
エリアデータ記憶部１００２から読出されたサーチエリ
アデータとから第２の動きベクトルを検出する第２の片
側予測動き検出部１００４とを含む。The interpolation prediction motion vector detection device further includes a first one-sided predictive motion detection unit 1003 that receives the template block pixel data from the template data storage unit 1000 and the search area data (search window data) from the first search area data storage unit 1001 and obtains a first motion vector mva, and a second one-sided predictive motion detection unit 1004 that detects a second motion vector from the template block pixel data from the storage unit 1000 and the search area data read from the second search area data storage unit 1002.

【０２９７】第１の片側予測動き検出部１００３は、第
１ないし第３の実施例において利用された要素プロセサ
を含み、与えられたサーチエリアデータとテンプレート
ブロックデータとのブロックマッチング処理を行なって
動きベクトルｍｖａおよび評価値ｅｖａを求める第１の
演算処理回路１０１１と、第１の演算処理回路１０１１
からの、算出された動きベクトルに対応するサーチウイ
ンドーブロック画素データを格納する第１のバッファメ
モリ１０１３を含む。The first one-sided predictive motion detection unit 1003 includes a first arithmetic processing circuit 1011, which contains the element processors used in the first to third embodiments and performs block matching between the supplied search area data and template block data to obtain the motion vector mva and the evaluation value eva, and a first buffer memory 1013, which stores the search window block pixel data from the first arithmetic processing circuit 1011 that corresponds to the calculated motion vector.

【０２９８】このバッファメモリ１０１３は先の第２の
実施例で示された詳細精度の動きベクトル検出のために
用いられるものと同様の構成を有する。すなわち、第１
のバッファメモリ１０１３は、第１の演算処理回路１０
１１から順次シフトアウトされるサーチウインドー画素
データを格納する。第１のバッファメモリ１０１３はサ
ーチエリアの画素データのすべてを格納する記憶容量を
有してもよく、１つのサーチウインドーブロック画素デ
ータを格納する記憶容量を有していてもよい。このバッ
ファメモリ１０１３への画素データの書込および読出
は、たとえば図４６に示す制御系を利用して実行され
る。ただし、整数精度での動きベクトル検出のために
は、サーチウインドーブロック周辺画素は利用する必要
がない。The buffer memory 1013 has the same structure as the one used for fine-precision motion vector detection in the second embodiment. That is, the first buffer memory 1013 stores the search window pixel data sequentially shifted out from the first arithmetic processing circuit 1011. The first buffer memory 1013 may have a storage capacity for all the pixel data of the search area, or a storage capacity for the pixel data of one search window block. Writing and reading of pixel data to and from the buffer memory 1013 is performed using, for example, the control system shown in FIG. 46. However, for integer-precision motion vector detection, the pixels surrounding the search window block need not be used.

【０２９９】第２の片側予測動き検出部１００４は、第
１の片側動き予測検出部１００３と同様の構成を有し、
第２の演算処理回路１０１２と、第２のバッファメモリ
１０１４とを含む。第２の演算処理回路１０１２は、与
えられたサーチエリア画素データとテンプレートブロッ
ク画素データとからブロックマッチング処理により動き
ベクトルｍｖｂとその評価値ｅｖｂを求める。The second one-sided predictive motion detection unit 1004 has the same configuration as the first one-sided predictive motion detection unit 1003 and includes a second arithmetic processing circuit 1012 and a second buffer memory 1014. The second arithmetic processing circuit 1012 obtains a motion vector mvb and its evaluation value evb by block matching from the supplied search area pixel data and template block pixel data.

【０３００】第２のバッファメモリ１０１４は、第２の
演算処理回路１０１２からのサーチウインドー画素デー
タを受け、動きベクトルｍｖｂに対応するサーチウイン
ドーブロック画素データを格納する。第２のバッファメ
モリ１０１４は、また、サーチエリアの全画素データを
格納してもよく、また動きベクトルｍｖｂに対応するサ
ーチウインドーブロック画素データのみを格納出力する
ように構成されてもよい。The second buffer memory 1014 receives the search window pixel data from the second arithmetic processing circuit 1012 and stores the search window block pixel data corresponding to the motion vector mvb. The second buffer memory 1014 may also store all pixel data in the search area, or may be configured to store and output only the search window block pixel data corresponding to the motion vector mvb.

【０３０１】第１および第２のバッファメモリ１０１３
および１０１４から、動きベクトルｍｖａおよびｍｖｂ
に対応するサーチウインドーブロック画素データが部分
参照画像データＲＰａおよびＲＰｂとして読出されて動
きベクトルｍｖａ、ｍｖｂおよび評価値ｅｖａ，ｅｖｂ
とともに内挿予測動き検出部１００５へ与えられる。The search window block pixel data corresponding to the motion vectors mva and mvb are read from the first and second buffer memories 1013 and 1014 as partial reference image data RPa and RPb, and are supplied to the interpolation prediction motion detection unit 1005 together with the motion vectors mva and mvb and the evaluation values eva and evb.

【０３０２】第１および第２の演算処理回路１０１１お
よび１０１２は、先の第１ないし第３の実施例に示した
構成と同様、プロセサアレイ、総和部および比較部を有
しており、先の実施例と同様の手法に従ってそれぞれ並
列態様で評価値算出および動きベクトル検出を行なう。
第１および第２のバッファメモリ１０１３および１０１
４は、各々第１および第２の演算処理回路１０１１およ
び１０１２からシフトアウトされたサーチウインドー画
素データを格納する。The first and second arithmetic processing circuits 1011 and 1012 each have a processor array, a summing unit and a comparison unit, as in the configurations shown in the first to third embodiments, and calculate evaluation values and detect motion vectors in parallel with each other by the same method as in the previous embodiments. The first and second buffer memories 1013 and 1014 store the search window pixel data shifted out from the first and second arithmetic processing circuits 1011 and 1012, respectively.

【０３０３】第１および第２のバッファメモリ１０１３
および１０１４は第１および第２の演算処理回路１０１
１および１０１２それぞれに含まれる比較部からの制御
信号に応答して動きベクトルの候補となる変位ベクトル
に対するサーチウインドーブロック画素データを格納す
る（記憶容量がサーチウインドーブロックに対応する場
合）。比較部において変位ベクトルの更新の有無判別時
に発生される信号をこれらの第１および第２のバッファ
メモリ１０１３および１０１４に対するデータ書込制御
信号として利用する。対応の処理回路からはサーチウイ
ンドーブロックの先頭画素となるデータがシフトアウト
されるため、この比較部からの制御信号に応答してアド
レス信号のリセットを行なえば、常に、動きベクトル候
補となる変位ベクトルに対応するサーチウインドーブロ
ックの画素データを格納することができる。The first and second buffer memories 1013 and 1014 store the search window block pixel data for the displacement vector that is a motion vector candidate, in response to control signals from the comparison units included in the first and second arithmetic processing circuits 1011 and 1012, respectively (when the storage capacity corresponds to a search window block). The signal generated when the comparison unit decides whether to update the displacement vector is used as the data write control signal for the first and second buffer memories 1013 and 1014. Because the corresponding processing circuit shifts out the data that forms the first pixel of the search window block, resetting the address signal in response to the control signal from the comparison unit ensures that the pixel data of the search window block corresponding to the candidate displacement vector is always stored.

【０３０４】データ読出時には、動きベクトル検出サイ
クル完了後、先頭アドレス（リセットアドレス）からデ
ータの読出が行なわれる。この制御の方法は形図４６な
いし図４８を参照して説明したものと同様である（図４
７に示すブロック４８をサーチウインドー画素ブロック
と見放せばよい。周辺画素のための遅延段は不要であ
る）。When data is read, it is read from the head address (reset address) after the motion vector detection cycle is completed. This control method is the same as that described with reference to FIGS. 46 to 48 (the block 48 shown in FIG. 47 may be regarded as a search window pixel block; no delay stage for peripheral pixels is needed).

【０３０５】第１および第２のバッファメモリ１０１３
および１０１４がサーチエリアの全画素データを格納す
る構成の場合には、データ読出時に動きベクトルの値を
先頭アドレスとして利用すればよい（動きベクトルがカ
ウンタのカウント値で表現される場合：図４３参照）。
アドレス飛越し動作を行なえば、不要なサイドウインド
ー画素データの読出は防止することができる。When the first and second buffer memories 1013 and 1014 are configured to store all the pixel data of the search area, the value of the motion vector may be used as the head address when data is read (when the motion vector is represented by the count value of a counter; see FIG. 43). Performing an address jump operation prevents unnecessary side window pixel data from being read.

【０３０６】第１および第２の片側予測動き検出部１０
０３および１００４の各々の動作は第１ないし第３の実
施例において説明したものと同様であり、その詳細動作
は説明しない。次に、内挿予測動き検出部１００５の構
成および動作について説明する。内挿動きベクトルの検
出時にバッファメモリ内の部分参照画像データが読出さ
れる。この読出された部分参照画像データが内挿予測値
生成のための参照画像生成に際して利用される。The operation of each of first and second one-sided prediction motion detection units 1003 and 1004 is the same as that described in the first to third embodiments, so its details are not repeated. Next, the configuration and operation of interpolation prediction motion detection unit 1005 are described. When the interpolation motion vector is detected, the partial reference image data in the buffer memories is read. The read partial reference image data is used to generate the reference image from which the interpolated prediction value is produced.

【０３０７】図７６は図７５に示す内挿予測動き検出部
の構成を概略的に示す図である。図７６を参照して、内
挿予測動き検出部１００５は、第１および第２のバッフ
ァメモリ（図７５参照）から読出された部分参照画像デ
ータ（動きベクトルｍｖａおよびｍｖｂに対応する第１
および第２のサーチエリア内のサーチウインドーブロッ
ク画素データ）を受けて、内挿予測に必要とされる参照
画像（合成サーチウインドーブロックと以下称す）を生
成する内挿予測用参照画像生成回路１０２０と、テンプ
レートブロック画素データＴＰを格納するバッファメモ
リ１０２１と、内挿予測用参照画像生成回路１０２０か
らの合成サーチウインドーブロック画素データとバッフ
ァメモリ１０２１から読出されたテンプレートブロック
画素データとのブロックマッチング（合成サーチウイン
ドーブロックとテンプレートブロックとの相関度を示す
評価値の算出）を行なって、内挿予測評価値ｅｖｃを生
成する内挿予測動きベクトル検出演算部１０２２と、内
挿予測動きベクトル検出演算部１０２２からの内挿予測
評価値ｅｖｃと第１および第２の演算処理回路から与え
られた第１および第２の評価値ｅｖａおよびｅｖｂとか
ら最適な動きベクトルを検出する動きベクトル判定回路
１０２４とを含む。FIG. 76 schematically shows the configuration of the interpolation prediction motion detection unit shown in FIG. 75. Referring to FIG. 76, interpolation prediction motion detection unit 1005 includes: an interpolation prediction reference image generation circuit 1020 that receives the partial reference image data read from the first and second buffer memories (see FIG. 75), namely the search window block pixel data in the first and second search areas corresponding to motion vectors mva and mvb, and generates the reference image required for interpolation prediction (hereinafter called the synthetic search window block); a buffer memory 1021 that stores template block pixel data TP; an interpolation prediction motion vector detection calculation unit 1022 that performs block matching between the synthetic search window block pixel data from interpolation prediction reference image generation circuit 1020 and the template block pixel data read from buffer memory 1021 (that is, calculates an evaluation value indicating the degree of correlation between the synthetic search window block and the template block) and generates interpolation prediction evaluation value evc; and a motion vector determination circuit 1024 that detects the optimum motion vector from interpolation prediction evaluation value evc supplied by calculation unit 1022 and first and second evaluation values eva and evb supplied by the first and second arithmetic processing circuits.

【０３０８】バッファメモリ１０２１は、図７５に示す
テンプレートデータ記憶部１０００からのテンプレート
データの第１および第２の演算処理回路１０１１および
１０１２へのロード時に並行して必要なテンプレート画
素データを格納する。バッファメモリ１０２１からのデ
ータの読出は第１および第２のバッファメモリ１０１３
および１０１４（図７５参照）からのデータ読出と同期
して行なわれる。Buffer memory 1021 stores the required template pixel data in parallel with the loading of template data from template data storage unit 1000 shown in FIG. 75 into first and second arithmetic processing circuits 1011 and 1012. Data is read from buffer memory 1021 in synchronization with the data read from first and second buffer memories 1013 and 1014 (see FIG. 75).

【０３０９】内挿予測用参照画像生成回路１０２０は、
与えられた画素データＲＰａおよびＲＰｂに対し所定の
演算を実行する。この演算は、（ＲＰａ＋ＲＰｂ）／２
のような算術平均であってもよい。また、（ｍ・ＲＰａ
＋ｎ・ＲＰｂ）／（ｍ＋ｎ）のような加重平均が行なわ
れてもよい。重みｍおよびｎは、現フレームと第１およ
び第２の参照フレーム（第１および第２のサーチエリア
をそれぞれ含む）各々との時間距離により決定される。
時間距離が小さい参照フレームに対する重みが大きくさ
れる。現クレームとの変化が、時間距離が小さくなれば
小さくなるためである。Interpolation prediction reference image generation circuit 1020 performs a predetermined calculation on the supplied pixel data RPa and RPb. This calculation may be an arithmetic mean such as (RPa + RPb) / 2, or a weighted average such as (m·RPa + n·RPb) / (m + n). Weights m and n are determined by the temporal distance between the current frame and each of the first and second reference frames (which contain the first and second search areas, respectively). The reference frame at the smaller temporal distance is given the larger weight, because the change relative to the current frame becomes smaller as the temporal distance decreases.
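The averaging described above can be sketched as follows. The formula is the one given in the text; the rule in `weights_from_distance` is only one possible assumption, since the text says no more than that the nearer reference frame receives the larger weight. Both function names are hypothetical.

```python
def interpolate_pixel(rpa, rpb, m=1, n=1):
    """Weighted average (m*RPa + n*RPb) / (m + n); m == n gives the arithmetic mean."""
    return (m * rpa + n * rpb) / (m + n)


def weights_from_distance(dist_a, dist_b):
    """Hypothetical weighting rule: each frame is weighted by the temporal
    distance of the other frame, so the nearer frame gets the larger weight."""
    return dist_b, dist_a


# Reference frame A is 1 frame away, reference frame B is 2 frames away.
m, n = weights_from_distance(1, 2)
rpc = interpolate_pixel(100, 200, m, n)  # pulled toward the nearer frame A
```

With equal weights the result is the plain mean of the two pixels; with the weights above, the synthesized pixel lies closer to the value from frame A.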

【０３１０】内挿予測動きベクトル検出演算部１０２２
は、内挿予測用参照画像生成回路１０２０からの合成サ
ーチウインドー画素データＲＰｃとテンプレート画素デ
ータＴＰとのたとえば差分絶対値和を求めて評価値ｅｖ
ｃを生成する。差分絶対値和演算の代わりとして、差分
二乗和などの演算が用いられてもよい。ブロックの相関
度を示す評価値が得られる演算であればよい。内挿予測
用参照画像生成回路１０２０からの合成参照画像画素デ
ータＲＰｃおよびバッファメモリ１０２１からのテンプ
レートブロック画素データは１画素ずつ演算部１０２２
へ与えられて順次所定の演算が実行される。この演算の
終了時に評価値ｅｖｃが生成される。Interpolation prediction motion vector detection calculation unit 1022 obtains, for example, the sum of absolute differences between synthetic search window pixel data RPc from interpolation prediction reference image generation circuit 1020 and template pixel data TP, and generates evaluation value evc. Instead of the sum of absolute differences, a calculation such as the sum of squared differences may be used; any calculation that yields an evaluation value indicating the degree of correlation between blocks is acceptable. Synthetic reference image pixel data RPc from interpolation prediction reference image generation circuit 1020 and the template block pixel data from buffer memory 1021 are supplied to calculation unit 1022 one pixel at a time, and the predetermined calculation is executed sequentially. Evaluation value evc is generated when this calculation ends.
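The two evaluation measures named above can be compared with a small sketch. The pixel values are invented for illustration and the function names `sad` and `ssd` are not taken from the patent.

```python
def sad(template, candidate):
    """Sum of absolute differences between two equally sized pixel lists."""
    return sum(abs(t - c) for t, c in zip(template, candidate))


def ssd(template, candidate):
    """Sum of squared differences between two equally sized pixel lists."""
    return sum((t - c) ** 2 for t, c in zip(template, candidate))


template = [0, 0, 0, 0]
cand_a = [4, 0, 0, 0]  # one large error
cand_b = [2, 2, 1, 0]  # several small errors
```

For these values the sum of absolute differences ranks `cand_a` as the better match (4 against 5), while the sum of squared differences ranks `cand_b` as the better match (9 against 16), because squaring penalizes a single large error more heavily. Either measure serves as an evaluation value; they simply need not agree on the best candidate.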

【０３１１】動きベクトル判定回路１０２４は、演算部
１０２２からの評価値ｅｖｃと第１および第２の処理回
路（図７５参照）からの評価値ｅｖａおよびｅｖｂの大
きさを比較し、最小の評価値に対応する動きベクトルを
最適な内挿予測のための動きベクトルＭＶとして決定し
て出力する。Motion vector determination circuit 1024 compares the magnitudes of evaluation value evc from calculation unit 1022 and evaluation values eva and evb from the first and second processing circuits (see FIG. 75), and determines and outputs the motion vector corresponding to the minimum evaluation value as motion vector MV for optimum interpolation prediction.

【０３１２】判定回路１０２４は、演算部１０２２から
の評価値ｅｖｃが最小のときには動きベクトルｍｖａお
よびｍｖｂを順次所定の順序で（または並列に）出力す
る。判定回路１０２４は、最適な動きベクトルＭＶに対
応する評価値ＥＶをも合わせて出力するように構成され
てもよい。When evaluation value evc from calculation unit 1022 is the minimum, determination circuit 1024 outputs motion vectors mva and mvb sequentially in a predetermined order (or in parallel). Determination circuit 1024 may also be configured to output evaluation value EV corresponding to optimum motion vector MV.

【０３１３】上述の構成により、参照フレーム画像を読
出して内挿予測用動きベクトルを検出することが可能と
なるため、参照画像を格納するフレームメモリへアクセ
スする必要がなくなり、高速で内挿予測用動きベクトル
を算出することができる。動きベクトルＭＶの検出とと
もにバッファメモリ１０１３および１０１４の部分参照
画像が出力され予測符号化のために利用されてもよい。With the above configuration, the reference image data needed to detect the motion vector for interpolation prediction is read from the buffer memories, so the frame memory storing the reference image need not be accessed, and the motion vector for interpolation prediction can be calculated at high speed. Along with the detection of motion vector MV, the partial reference images in buffer memories 1013 and 1014 may be output and used for predictive coding.

【０３１４】なお、第１および第２のサーチエリアの画
素データを内挿予測用参照画像生成回路１０２０により
合成（内挿）し、合成画素データとテンプレートブロッ
ク画素データをプロセサアレイ（第１または第２の演算
処理回路と同様の構成を有する）へ投入して動きベクト
ルを検出してもよい。第１の片側予測動き検出部１００
３を順方向予測のための動きベクトル検出、第２の片側
予測動き検出部１００４を逆方向予測のための動きベク
トル検出のために利用する。内挿予測（双方向予測）の
ための動きベクトル検出を装置規模を増加させることな
く実現することができる。次に各部の構成について説明
する。Alternatively, the pixel data of the first and second search areas may be synthesized (interpolated) by interpolation prediction reference image generation circuit 1020, and the synthesized pixel data and the template block pixel data may be fed to a processor array (having the same configuration as the first or second arithmetic processing circuit) to detect the motion vector. First one-sided prediction motion detection unit 1003 is used for motion vector detection for forward prediction, and second one-sided prediction motion detection unit 1004 is used for motion vector detection for backward prediction. Motion vector detection for interpolation prediction (bidirectional prediction) can thus be realized without increasing the device scale. Next, the configuration of each part is described.

【０３１５】図７７は、図７６に示す内挿予測用参照画
像生成回路および内挿予測動きベクトル検出演算部の構
成を示す図である。内挿予測用参照画像生成回路１０２
０は、部分参照画像ＲＰａおよびＲＰｂの画素データＰ
ＸａおよびＰＸｂの平均値を求める平均化回路１０３０
を含む。平均化回路１０３０は、演算（ｖ・ＰＸａ＋ｗ
・ＰＸｂ）／（ｖ＋ｗ）を実行する。ｖ＝ｗであっても
よく、またｖおよびｗは現フレーム（テンプレートブロ
ック）に対する各参照画像フレームの時間距離に応じて
定められるものであってもよい。平均化回路１０３０
は、乗算回路（係数ｖおよびｗの乗算）、加算回路およ
び割算回路で構成されてもよく、またＲＯＭ（データＰ
ＸａおよびＰＸｂをアドレスとする）のようなテーブル
を用いて構成されてもよい。FIG. 77 shows the configuration of the interpolation prediction reference image generation circuit and the interpolation prediction motion vector detection calculation unit shown in FIG. 76. Interpolation prediction reference image generation circuit 1020 includes an averaging circuit 1030 that obtains the average of pixel data PXa and PXb of partial reference images RPa and RPb. Averaging circuit 1030 executes the calculation (v·PXa + w·PXb) / (v + w). The weights may satisfy v = w, or v and w may be set according to the temporal distance of each reference image frame from the current frame (template block). Averaging circuit 1030 may be composed of multiplication circuits (for multiplying by coefficients v and w), an addition circuit, and a division circuit, or may be implemented with a table such as a ROM (addressed by data PXa and PXb).

【０３１６】演算部１０２２は、平均化回路１０３０か
らの合成画素データＰＸｃとテンプレートブロック画素
データＴＰａとの差分絶対値｜ＴＰａ−ＰＸｃ｜を求め
る部分絶対値回路１０３１と、この差分絶対値回路１０
３１の出力を累算して評価値ｅｖｃを生成する累算器１
０３２を含む。演算部１０２２に用いられる差分絶対値
回路１０３１には、先に第１の実施例において説明した
もの（図２６参照）と同様の構成を利用することができ
る。累算器１０３２には通常のレジスタと加算器とを含
む構成を利用することができる。この演算部１０２２
は、差分二乗和回路で構成されてもよい。Calculation unit 1022 includes a difference absolute value circuit 1031 that obtains the absolute difference |TPa − PXc| between synthetic pixel data PXc from averaging circuit 1030 and template block pixel data TPa, and an accumulator 1032 that accumulates the output of difference absolute value circuit 1031 to generate evaluation value evc. Difference absolute value circuit 1031 used in calculation unit 1022 can have the same configuration as that described in the first embodiment (see FIG. 26). Accumulator 1032 can use an ordinary configuration including a register and an adder. Calculation unit 1022 may instead be composed of a sum-of-squared-differences circuit.
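The pixel-by-pixel data flow through averaging circuit 1030, difference absolute value circuit 1031, and accumulator 1032 can be modeled in software as below. This is a behavioral sketch under the assumption that the three pixel streams arrive in step; the function name `evaluate_stream` and the sample values are hypothetical.

```python
def evaluate_stream(pxa_stream, pxb_stream, tp_stream, v=1, w=1):
    """Model of the path averaging circuit -> |difference| -> accumulator."""
    acc = 0  # register inside the accumulator
    for pxa, pxb, tp in zip(pxa_stream, pxb_stream, tp_stream):
        pxc = (v * pxa + w * pxb) / (v + w)  # averaging circuit 1030
        acc += abs(tp - pxc)                 # circuit 1031 feeding the adder of 1032
    return acc                               # evaluation value evc


# Two pixels per block, for illustration only.
evc = evaluate_stream([10, 20], [30, 40], [20, 35])
```

Each iteration consumes one pixel from each source, which mirrors the one-pixel-at-a-time supply described for calculation unit 1022.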

【０３１７】図７８は、図７６に示す動きベクトル判定
部１０２４の構成を示す図である。図７８において、動
きベクトル判定部１０２４は、評価値ｅｖａ、ｅｖｂお
よびｅｖｃのうちの最小値ｍｉｎ｛ｅｖａ，ｅｖｂ，ｅ
ｖｃ｝を求める最小値回路１０３５と、最小値回路１０
３５の最小値情報に応答して動きベクトルｍｖａおよび
ｍｖｂの一方または両方を選択する選択回路１０３６を
含む。選択回路１０３６は、最小値回路１０３５が評価
値ｅｖａまたはｅｖｂを選択したときには、対応の動き
ベクトルｍｖａまたはｍｖｂを選択し、最小値回路１０
３５が評価値ｅｖｃを選択したときには動きベクトルｍ
ｖａおよびｍｖｂ両者を選択する。FIG. 78 shows the configuration of motion vector determination unit 1024 shown in FIG. 76. In FIG. 78, motion vector determination unit 1024 includes a minimum value circuit 1035 that obtains the minimum value min{eva, evb, evc} of evaluation values eva, evb, and evc, and a selection circuit 1036 that selects one or both of motion vectors mva and mvb in response to the minimum value information from minimum value circuit 1035. When minimum value circuit 1035 selects evaluation value eva or evb, selection circuit 1036 selects the corresponding motion vector mva or mvb; when minimum value circuit 1035 selects evaluation value evc, selection circuit 1036 selects both motion vectors mva and mvb.

【０３１８】図７９は図７８に示す最小値回路および選
択回路のより詳細構成を示す図である。図７９におい
て、最小値回路１０３５は、評価値ｅｖａおよびｅｖｂ
を各々一時的に格納するレジスタ１０４１および１０４
２と、レジスタ１０４１および１０４２に格納された評
価値ｅｖａおよびｅｖｂの大きさを比較する比較器１０
４３と、比較器１０４３の出力に応答してレジスタ１０
４１および１０４２に格納された評価値ｅｖａおよびｅ
ｖｂの一方を通過させる選択回路１０４４を含む。FIG. 79 shows the minimum value circuit and the selection circuit of FIG. 78 in more detail. In FIG. 79, minimum value circuit 1035 includes registers 1041 and 1042 that temporarily store evaluation values eva and evb, respectively; a comparator 1043 that compares the magnitudes of evaluation values eva and evb stored in registers 1041 and 1042; and a selection circuit 1044 that passes one of evaluation values eva and evb stored in registers 1041 and 1042 in response to the output of comparator 1043.

【０３１９】比較器１０４３は、評価値ｅｖａが評価値
ｅｖｂよりも大きい場合には第１の論理レベル（たとえ
ば“Ｈ”）の信号を出力し、そうでない場合には第２の
論理レベル（たとえば“Ｌ”）の信号を出力する。選択
回路１０４４は、第１の論理レベルの信号を受けたとき
にはレジスタ１０４２からの比較値ｅｖｂを選択し、第
２の論理レベルの信号を受けたときにはレジスタ１０４
１からの評価値ｅｖａを選択する。したがって、選択回
路１０４４からは評価値ｅｖａおよびｅｖｂのうちの小
さな評価値ＭＩＮ｛ｅｖａ，ｅｖｂ｝が出力される。Comparator 1043 outputs a signal at a first logic level (for example, "H") when evaluation value eva is larger than evaluation value evb, and a signal at a second logic level (for example, "L") otherwise. Selection circuit 1044 selects evaluation value evb from register 1042 on receiving the first-logic-level signal, and selects evaluation value eva from register 1041 on receiving the second-logic-level signal. Selection circuit 1044 therefore outputs the smaller evaluation value MIN{eva, evb}.

【０３２０】最小値回路１０３５はさらに、評価値ｅｖ
ｃ（内挿予測のための評価値）を一時的に格納するレジ
スタ１０４５と、選択回路１０４４の出力とレジスタ１
０４５からの評価値ｅｖｃとを比較する比較器１０４６
と、比較器１０４６の出力に応答して選択回路１０４４
の出力とレジスタ１０４５の格納する評価値ｅｖｃの一
方を通過させる選択回路１０４７を含む。Minimum value circuit 1035 further includes a register 1045 that temporarily stores evaluation value evc (the evaluation value for interpolation prediction); a comparator 1046 that compares the output of selection circuit 1044 with evaluation value evc from register 1045; and a selection circuit 1047 that passes one of the output of selection circuit 1044 and evaluation value evc stored in register 1045 in response to the output of comparator 1046.

【０３２１】比較器１０４６は、選択回路１０４４の出
力が評価値ｅｖｃよりも大きいとき第１の論理レベルを
信号を出力し、そうでないときには第２の論理レベルの
信号を出力する。選択回路１０４７は、第１の論理レベ
ルの信号を比較器１０４６から与えられるとレジスタ１
０４５からの評価値ｅｖｃを選択し、第２の論理レベル
の信号が与えられると選択回路１０４４の出力を選択す
る。すなわち、選択回路１０４７からはＭＩＮ｛ｅｖ
ｃ，ＭＩＮ（ｅｖａ，ｅｖｂ）｝＝ＭＩＮ（ｅｖａ，ｅ
ｖｂ，ｅｖｃ）となる最小の評価値が評価値ＥＶとして
出力される。Comparator 1046 outputs a first-logic-level signal when the output of selection circuit 1044 is larger than evaluation value evc, and a second-logic-level signal otherwise. Selection circuit 1047 selects evaluation value evc from register 1045 when it receives the first-logic-level signal from comparator 1046, and selects the output of selection circuit 1044 when it receives the second-logic-level signal. That is, selection circuit 1047 outputs, as evaluation value EV, the minimum evaluation value MIN{evc, MIN(eva, evb)} = MIN(eva, evb, evc).

【０３２２】比較器１０４３および１０４６およびにお
いて、両入力の値が等しいときには予め定められた優先
順位に従って一方が選択される構成が利用されてもよ
い。Comparators 1043 and 1046 may be configured so that, when both inputs are equal, one of them is selected according to a predetermined priority.
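The two-stage minimum search can be modeled behaviorally as below. The logic-level constants `H` and `L` and the function names are illustrative. The tie behavior shown (the first input wins on equality) is just what follows from the "larger than, otherwise" rule stated for the comparators, and is one example of a predetermined priority.

```python
H, L = 1, 0  # first and second logic levels (illustrative names)


def compare(x, y):
    """Comparator: first logic level when x > y, second logic level otherwise."""
    return H if x > y else L


def minimum_value_circuit(eva, evb, evc):
    """Return (EV, output of comparator 1043, output of comparator 1046)."""
    c1043 = compare(eva, evb)
    sel1044 = evb if c1043 == H else eva  # smaller of eva and evb
    c1046 = compare(sel1044, evc)
    ev = evc if c1046 == H else sel1044   # smallest of all three
    return ev, c1043, c1046
```

For example, `minimum_value_circuit(5, 3, 4)` yields `(3, H, L)`: evb wins the first stage and survives the second. The two comparator outputs are returned as well because the selection circuit for the motion vectors is driven by them.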

【０３２３】選択回路１０３６は、動きベクトルｍｖａ
およびｍｖｂを各々一時的に格納するレジスタ１０５０
および１０５１と、最小値回路１０３５内の比較器１０
４３の出力に応答してレジスタ１０５０および１０５１
の一方の格納データを通過させる選択回路１０５２と、
比較器１０４６の出力に応答して選択的に選択回路１０
５２の出力を通過させるゲート回路１０５３と、比較器
１０４６の出力に応答してレジスタ１０５０および１０
５１からの動きベクトルｍｖａおよびｍｖｂを順次また
は並列に出力するレジスタ１０５４とを含む。Selection circuit 1036 includes registers 1050 and 1051 that temporarily store motion vectors mva and mvb, respectively; a selection circuit 1052 that passes the data stored in one of registers 1050 and 1051 in response to the output of comparator 1043 in minimum value circuit 1035; a gate circuit 1053 that selectively passes the output of selection circuit 1052 in response to the output of comparator 1046; and a register 1054 that outputs motion vectors mva and mvb from registers 1050 and 1051 sequentially or in parallel in response to the output of comparator 1046.

【０３２４】選択回路１０５２は、選択回路１０４４と
同様の動作を行ない、選択された（小さい方の）評価値
に対応する動きベクトルを通過させる。ゲート回路１０
５３は、比較器１０４６が評価値ｅｖａを選択する第１
の論理レベルの信号を出力すると出力ハイインピーダン
ス状態となり、第２の論理レベルの信号が比較器１０４
６から与えられると選択回路１０５２の出力を通過させ
る。Selection circuit 1052 operates in the same way as selection circuit 1044 and passes the motion vector corresponding to the selected (smaller) evaluation value. Gate circuit 1053 enters an output high-impedance state when comparator 1046 outputs the first-logic-level signal, which selects evaluation value evc, and passes the output of selection circuit 1052 when the second-logic-level signal is supplied from comparator 1046.

【０３２５】レジスタ１０５４はゲート回路１０５３と
相補的に動作し、第１の論理レベルの信号が与えられる
とレジスタ１０５０および１０５１からの動きベクトル
ｍｖａおよびｍｖｂをともに並列または直列に出力し、
第２の論理レベルの信号が与えられると出力ハイインピ
ーダンス状態となる。Register 1054 operates complementarily to gate circuit 1053: when the first-logic-level signal is supplied, it outputs both motion vectors mva and mvb from registers 1050 and 1051, in parallel or in series; when the second-logic-level signal is supplied, it enters an output high-impedance state.
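The complementary behavior of gate circuit 1053 and register 1054 can be sketched as follows. `None` stands in for the output high-impedance state; the constants and the function name are hypothetical, and the two comparator signals are taken as plain inputs.

```python
H, L = 1, 0  # first and second logic levels (illustrative names)
HI_Z = None  # stands in for an output high-impedance state


def selection_circuit(mva, mvb, c1043, c1046):
    """Model of selection circuit 1036 driven by the two comparator outputs."""
    sel1052 = mvb if c1043 == H else mva          # selection circuit 1052
    gate1053 = HI_Z if c1046 == H else sel1052    # gate circuit 1053
    reg1054 = (mva, mvb) if c1046 == H else HI_Z  # register 1054
    # Exactly one of the two outputs drives the shared output at any time.
    return reg1054 if gate1053 is HI_Z else gate1053
```

When comparator 1046 reports that evc is the minimum (`c1046 == H`), both vectors are emitted as a pair; otherwise the single vector chosen by comparator 1043 passes through the gate.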

【０３２６】図８０は、内挿予測用参照画像生成回路の
変更例を示す図である。この図８０に示す内挿予測用参
照画像生成回路１０２２は、図７７に示す構成に加えて
さらに、第１および第２の演算処理回路から与えられる
動きベクトルｍｖａおよびｍｖｂの平均値を求める平均
化回路１０６０を含む。平均化回路１０６０は図７７に
示す画素データの平均値を求める平均化回路と同様の構
成を備える。内挿予測用参照画像に応じて動きベクトル
も重み付けされる。FIG. 80 shows a modification of the interpolation prediction reference image generation circuit. In addition to the configuration shown in FIG. 77, interpolation prediction reference image generation circuit 1020 shown in FIG. 80 further includes an averaging circuit 1060 that obtains the average of motion vectors mva and mvb supplied from the first and second arithmetic processing circuits. Averaging circuit 1060 has the same configuration as the averaging circuit of FIG. 77 that obtains the average of the pixel data. The motion vector is thus weighted in the same way as the interpolation prediction reference image.
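A sketch of the motion vector averaging follows. It assumes that the weighted-average formula given for the pixel data is applied to each vector component separately, which the text implies but does not spell out; the function name `average_vector` is hypothetical.

```python
def average_vector(mva, mvb, v=1, w=1):
    """Component-wise weighted average (v*mva + w*mvb) / (v + w)."""
    return tuple((v * a + w * b) / (v + w) for a, b in zip(mva, mvb))


mvc = average_vector((2, -4), (4, 0))          # equal weights
mvc_weighted = average_vector((3, 0), (0, 3), v=2, w=1)
```

With equal weights the result is the midpoint of the two vectors; with v = 2 and w = 1 the result lies closer to mva.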

【０３２７】選択回路１０３６は、動きベクトルｍｖ
ａ，ｍｖｂおよびｍｖｃのうち最小値回路で求められた
最小の評価値に対応する動きベクトルを選択する。この
図８０に示す選択回路の構成は図７９に示すレジスタ１
０５４を比較器１０４６出力により選択的に通過させる
ゲート回路により実現される。Selection circuit 1036 selects, from motion vectors mva, mvb, and mvc, the one corresponding to the minimum evaluation value found by the minimum value circuit. The selection circuit shown in FIG. 80 is realized by replacing register 1054 shown in FIG. 79 with a gate circuit that selectively passes its input according to the output of comparator 1046.

【０３２８】図８１はこの発明の第４の実施例である内
挿予測動きベクトル検出装置の変更例の構成を概略的に
示す図である。図８１においては、図７５に示す第１お
よび第２のバッファメモリ１０１３および１０１４は動
きベクトルｍｖａおよびｍｖｂに対応する第１および第
２のサーチウインドーブロック１１０２および１１０４
を含む部分参照画像データをそれぞれ格納する。この部
分参照画像ブロック１１０１および１１０３は、それぞ
れ第１および第２のサーチウインドーブロック１１０２
および１１０４とその周辺画素を含む。第１および第２
のバッファメモリ１０１３および１０１４にこのような
部分参照画素ブロック１１０２および１１０４を格納す
るための構成としては第３の実施例において分数精度の
動きベクトル算出のために用いられた構成を利用するこ
とができる。FIG. 81 schematically shows the configuration of a modification of the interpolation prediction motion vector detection device according to the fourth embodiment of the present invention. In FIG. 81, first and second buffer memories 1013 and 1014 shown in FIG. 75 respectively store partial reference image data containing first and second search window blocks 1102 and 1104, which correspond to motion vectors mva and mvb. Partial reference image blocks 1101 and 1103 contain first and second search window blocks 1102 and 1104, respectively, together with their peripheral pixels. To store such partial reference image blocks 1101 and 1103 in first and second buffer memories 1013 and 1014, the configuration used for fractional-precision motion vector calculation in the third embodiment can be used.

【０３２９】内挿予測用参照画像生成回路において、部
分参照画像１１０３および１１０１の平均化処理を行な
って内挿予測用参照画像１１０５を生成し、この合成参
照画像１１０５をサーチエリアとしてテンプレートブロ
ック１１０７の動きベクトル求める。すなわち、合成参
照画像１１０５内の各合成サーチウインドーブロック１
１０６に対するブロックマッチングによる評価値の算出
を行ない、算出された評価値に基づいてこのテンプレー
トブロック１１０７に対する動きベクトルを算出する。In the interpolation prediction reference image generation circuit, partial reference images 1103 and 1101 are averaged to generate interpolation prediction reference image 1105, and the motion vector of template block 1107 is obtained using this synthetic reference image 1105 as the search area. That is, an evaluation value is calculated by block matching for each synthetic search window block 1106 in synthetic reference image 1105, and the motion vector for template block 1107 is calculated from the resulting evaluation values.
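The search over the synthesized reference image can be sketched in software as below. This is an exhaustive search using the arithmetic mean and the sum of absolute differences, which are the examples given in the text; the displacement is measured from the top-left corner of the synthesized area, which is an assumption made for simplicity, and the function names and pixel values are hypothetical.

```python
def synthesize(ref_a, ref_b):
    """Pixel-wise arithmetic mean of two equally sized partial reference images."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(ref_a, ref_b)]


def best_displacement(template, area):
    """Exhaustive block matching; returns (evaluation value, (dx, dy))."""
    th, tw = len(template), len(template[0])
    best = None
    for dy in range(len(area) - th + 1):
        for dx in range(len(area[0]) - tw + 1):
            ev = sum(abs(template[y][x] - area[dy + y][dx + x])
                     for y in range(th) for x in range(tw))
            if best is None or ev < best[0]:
                best = (ev, (dx, dy))
    return best


ref_a = [[0, 0, 0], [0, 8, 18], [0, 28, 38]]
ref_b = [[0, 0, 0], [0, 12, 22], [0, 32, 42]]
template = [[10, 20], [30, 40]]
result = best_displacement(template, synthesize(ref_a, ref_b))
```

Here neither partial reference image contains the template exactly, but their average does, at displacement (1, 1). This is the situation in which interpolation prediction yields a smaller evaluation value than either one-sided prediction.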

【０３３０】この方法においては、バッファメモリ１０
１３および１０１４ならびに１０２１から１画素ずつデ
ータが読出されて合成サーチウインドーの生成および評
価値の算出が行なわれてもよい。内挿予測のためのフレ
ームメモリへのアクセスが必要はない。したがって、内
挿予測による予測信号算出動作と並行して動きベクトル
の算出を行なうことができ、高速の内挿動き補償を行な
うことができる。In this method, data may be read one pixel at a time from buffer memories 1013, 1014, and 1021 to generate the synthetic search window and calculate the evaluation value. No access to the frame memory is needed for interpolation prediction. The motion vector can therefore be calculated in parallel with the prediction signal calculation by interpolation prediction, which enables high-speed interpolation motion compensation.

【０３３１】この図８１に示す構成に変えてさらに実施
例１ないし３に示したプロセサアレイを含む処理装置に
より評価値算出が行なわれてもよい。テンプレートブロ
ック画素データをプロセサアレイに格納しておき、第１
および第２のバッファメモリ１０１３および１０１４か
らの画素データの平均化により得られる合成（内挿）画
素データをプロセサアレイに導入する。動きベクトルお
よび評価値が求められると、判定部において評価時ｅｖ
ｃが第１および第２の処理装置１００３および１００４
からの評価値ｅｖａおよびｅｖｂと比較され、この比較
結果に従って動きベクトルの決定が行なわれる。Instead of the configuration shown in FIG. 81, the evaluation value may be calculated by a processing device including the processor array shown in the first to third embodiments. The template block pixel data is stored in the processor array, and the synthesized (interpolated) pixel data obtained by averaging the pixel data from first and second buffer memories 1013 and 1014 is fed into the processor array. Once the motion vector and the evaluation value are obtained, the determination unit compares evaluation value evc with evaluation values eva and evb from first and second processing devices 1003 and 1004, and the motion vector is determined according to the comparison result.

【０３３２】さらに、図８２に示すように、テンプレー
トブロック画素データは片側予測動き検出部１００３ま
たは１００４の演算処理回路１０１１または１０１２
（図８２においては演算処理回路１０１１）からシフト
アウトされたものがそのまま内挿予測動き検出部１００
５へ与えられてもよい。この構成の場合、図８３に示す
ように、片側予測動き検出と内挿予測動き検出とをパイ
プライン的に実行することができる。Further, as shown in FIG. 82, the template block pixel data shifted out from arithmetic processing circuit 1011 or 1012 of one-sided prediction motion detection unit 1003 or 1004 (arithmetic processing circuit 1011 in FIG. 82) may be supplied directly to interpolation prediction motion detection unit 1005. With this configuration, as shown in FIG. 83, one-sided prediction motion detection and interpolation prediction motion detection can be executed in a pipelined manner.

【０３３３】図８３において、内挿予測動き検出（ｉ）
では、片側予測動き検出に用いられたテンプレートブロ
ックＴＰ（ＴＰ１およびＴＰ２）のアンロード（シフト
アウト）と並行して内挿予測の動き検出が実行される。
第１および第２のバッファメモリ１０１３および１０１
４には必要なサーチウインドーブロック画素データが格
納されている。したがってバッファメモリ１０１３およ
び１０１４から、この処理回路からのテンプレートブロ
ック画素データのシフトアウトと同期してデータを読出
すことにより図８３に示すパイプライン動作を実現する
ことができる。In interpolation prediction motion detection (i) of FIG. 83, motion detection for interpolation prediction is executed in parallel with the unloading (shift-out) of template block TP (TP1 and TP2) used for one-sided prediction motion detection. First and second buffer memories 1013 and 1014 hold the required search window block pixel data. The pipeline operation shown in FIG. 83 can therefore be realized by reading data from buffer memories 1013 and 1014 in synchronization with the shift-out of the template block pixel data from the processing circuit.

【０３３４】この構成では、内挿予測動き検出部１００
５内に設けられたバッファメモリはタイミング調整用の
バッファとして機能することになる。In this configuration, the buffer memory provided in interpolation prediction motion detection unit 1005 functions as a buffer for timing adjustment.

【０３３５】図８３に示す内挿動き検出（ｉｉ）におい
ては、片側予測動き検出におけるテンプレートブロック
の処理と並行して内挿予測動き検出が実行される。バッ
ファメモリ内にテンプレートブロックデータを格納して
処理を実行する。テンプレートブロックＴＰのアンロー
ド時に内挿予測動き検出部のバッファメモリにこの片側
予測動き検出部１００３からアンロード（シフトアウ
ト）されたテンプレートブロック画素データがロードさ
れる。ロード完了後内挿予測動き検出処理が行なわれ
る。内挿予測のための部分参照画像がサーチウインドー
ブロックの周辺画素を含む場合であっても高速で処理を
実行することができる。In interpolation motion detection (ii) of FIG. 83, interpolation prediction motion detection is executed in parallel with the processing of a template block in one-sided prediction motion detection. The template block data is stored in the buffer memory before processing. When template block TP is unloaded, the template block pixel data unloaded (shifted out) from one-sided prediction motion detection unit 1003 is loaded into the buffer memory of the interpolation prediction motion detection unit. Interpolation prediction motion detection is performed after loading completes. Processing can be executed at high speed even when the partial reference image for interpolation prediction includes the peripheral pixels of the search window block.

【０３３６】図８４は、この発明の第４の実施例におい
て求められた動きベクトルを利用する装置の全体の構成
を概略的に示す図である。図８４においては、動画像の
符号化に用いられるソース符号化回路の構成が示され
る。図８４において、ソース符号化回路は図８７に示す
構成と同様、減算器１１１０、直交変換器（ＤＣＴ）１
１１１、量子化器１１１２、逆量子化器１１１３、逆直
交変換器（ＩＤＣＴ）１１１４および加算器１１１５を
含む。これらの構成要素の機能は図８７に示すものと同
様である。FIG. 84 schematically shows the overall configuration of a device that uses the motion vector obtained in the fourth embodiment of the present invention. FIG. 84 shows the configuration of a source coding circuit used for coding moving images. In FIG. 84, the source coding circuit includes, as in the configuration shown in FIG. 87, a subtractor 1110, an orthogonal transformer (DCT) 1111, a quantizer 1112, an inverse quantizer 1113, an inverse orthogonal transformer (IDCT) 1114, and an adder 1115. The functions of these components are the same as those shown in FIG. 87.

【０３３７】ソース符号化回路はさらに、加算器１１１
５の出力をスイッチＳＷ３を介して受けて格納するフレ
ームメモリ１１１６と、フレームメモリ１１１６の格納
データを受けて格納するフレームメモリ１１１７と、フ
レームメモリ１１１６の格納データを用いて逆方向動き
補償を行なう逆方向動き補償回路１１１８と、フレーム
メモリ１１１７の格納データを用いて順方向動き補償を
行なう順方向動き補償回路１１２０と、フレームメモリ
１１１６および１１１７の格納データを用いて内挿予測
動き補償を行なう内挿動き補償回路１１１９を含む。The source coding circuit further includes: a frame memory 1116 that receives and stores the output of adder 1115 via switch SW3; a frame memory 1117 that receives and stores the data stored in frame memory 1116; a backward motion compensation circuit 1118 that performs backward motion compensation using the data stored in frame memory 1116; a forward motion compensation circuit 1120 that performs forward motion compensation using the data stored in frame memory 1117; and an interpolation motion compensation circuit 1119 that performs interpolation prediction motion compensation using the data stored in frame memories 1116 and 1117.

【０３３８】順方向動き補償回路１１１８は、フレーム
メモリ１１１６に格納された未来のフレームを参照画像
として、現フレームに対するブロックマッチング処理を
行なって動きベクトルｍｖａの算出を行なうとともにこ
の現フレーム画像に対する予測信号を生成する。Backward motion compensation circuit 1118 uses the future frame stored in frame memory 1116 as the reference image, performs block matching against the current frame to calculate motion vector mva, and generates a prediction signal for the current frame image.

【０３３９】順方向動き補償回路１１２０は、フレーム
メモリ１１１７の格納する過去のフレームを参照フレー
ムとして利用して、現フレーム画像に対する動きベクト
ルｍｖｂを生成するとともにこの現フレームに対する予
測信号を生成する。The forward motion compensation circuit 1120 uses the past frame stored in the frame memory 1117 as a reference frame to generate a motion vector mvb for the current frame image and a prediction signal for this current frame.

【０３４０】内挿動き補償回路１１１９はフレームメモ
リ１１１６および１１１７から読された画像データを合
成して参照フレーム画像を生成してブロックマッチング
処理を行なって、現フレームに対する動きベクトルＭＶ
を生成するとともにこの現クレーム画像に対する予測信
号を生成する。Interpolation motion compensation circuit 1119 synthesizes the image data read from frame memories 1116 and 1117 to generate a reference frame image, performs block matching to generate motion vector MV for the current frame, and generates a prediction signal for the current frame image.

【０３４１】本実施例では、片側予測動き検出部は補償
回路１１１８〜１１２０で共用される。機能的構成を明
確にするため補償回路１１１８〜１１２０は別々に示
す。In this embodiment, the one-side predictive motion detecting section is shared by the compensation circuits 1118 to 1120. Compensation circuits 1118-1120 are shown separately to clarify the functional configuration.

【０３４２】動き補償回路１１１８〜１１２０の出力
は、スイッチＳＷ４を介して減算器１１１０へ与えられ
る。減算器１１１０は、また、画素再配置回路１１２４
の出力を受ける。減算器１１１０の出力と画素再配置回
路１１２４の出力の一方がスイッチＳＷ１により選択さ
れた直交器１１１１へ与えられる。スイッチＳＷ４の出
力は、また、ＳＷ２を介して加算器１１１５へ与えられ
る。スイッチＳＷ１〜ＳＷ４の端子の選択は制御回路１
１２２により行なわれる。それにより逆方向動き補償、
順方向動き補償および内挿動き補償それぞれに応じて生
成された予測信号に対する誤差信号が新たなフレームに
対する参照画像データを生成することができる。The outputs of motion compensation circuits 1118 to 1120 are supplied to subtractor 1110 via switch SW4. Subtractor 1110 also receives the output of pixel rearrangement circuit 1124. One of the output of subtractor 1110 and the output of pixel rearrangement circuit 1124 is selected by switch SW1 and supplied to orthogonal transformer 1111. The output of switch SW4 is also supplied to adder 1115 via switch SW2. Terminal selection for switches SW1 to SW4 is performed by control circuit 1122. In this way, an error signal with respect to the prediction signal generated by backward, forward, or interpolation motion compensation is obtained, and reference image data for a new frame can be generated.

【０３４３】画素再配置回路１１２４が設けられている
のは双方予測すなわち内挿予測を行なうための処理され
るフレームの順序が現画像のフレーム順序と異ならせる
必要があるためである（ＭＰＥＧ標準参照）。Pixel rearrangement circuit 1124 is provided because bidirectional prediction, that is, interpolation prediction, requires the order in which frames are processed to differ from the frame order of the original image (see the MPEG standard).

【０３４４】数字は、図示されるような装置の構成で
は、画像は、フレーム内補間が行なわれるＩピクチャ
ー、一方方向の予測（フレーム間予測）符号化が行なわ
れるＰピクチャーおよび内挿予測が行なわれるＢピクチ
ャーを含む。符号化処理においては、現フレームシーケ
ンスにおいてＢピクチャーよりも後のＩピクチャーおよ
びＢピクチャーの処理を行ない、次いで、過去および未
来のＩおよびＰピクチャーを用いてＢピクチャーの処理
を行なう。この処理シーケンスを画素再配置回路１１２
４によるフレーム再配置により実現する。スイッチＳＷ
１〜ＳＷ４が処理されるべき画像に応じて信号経路を選
択する。スイッチＳＷ１の画素再配置回路１１２４の出
力１０００を選択するのはＩピクチャーの処理時であ
り、このときスイッチＳＷ２およびＳＷ３は開放状態と
なる。In the device configuration shown in the figure, images include I pictures, which are coded within the frame; P pictures, which are coded by one-directional (interframe) prediction; and B pictures, which are coded by interpolation prediction. In the coding process, the I and P pictures that follow a B picture in the original frame sequence are processed first, and the B picture is then processed using the past and future I and P pictures. This processing sequence is realized by frame rearrangement in pixel rearrangement circuit 1124. Switches SW1 to SW4 select signal paths according to the picture to be processed. Switch SW1 selects the output of pixel rearrangement circuit 1124 when an I picture is processed; at this time, switches SW2 and SW3 are open.
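The frame rearrangement can be illustrated with a short sketch. It assumes that every B picture is coded immediately after the next I or P picture that follows it in display order; the picture labels and the function name `coding_order` are hypothetical.

```python
def coding_order(display):
    """Reorder pictures so that each B picture follows the I or P picture
    that comes after it in display order."""
    out, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)  # hold until the next I or P picture is coded
        else:
            out.append(pic)
            out.extend(pending_b)
            pending_b = []
    out.extend(pending_b)  # assumption: trailing B pictures are emitted as they are
    return out


order = coding_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"])
```

For this sequence the coding order is I0, P3, B1, B2, P6, B4, B5, so both the past and the future reference pictures are available when each B picture is processed.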

【０３４５】図８４に示すソース符号化回路に対し、こ
の発明の第４の実施例の構成は動きベクトルｍｖａ、ｍ
ｖｂおよびＭＶをプロセサアレイ（演算処理回路）を共
用して生成することができる。すなわち、内挿予測動き
補償回路１１１９は、フレームメモリ１１１６および１
１１７へアクセスすることなく逆方向動き補償回路１１
１８および順方向動き補償回路１１２０から部分参照画
像データを直接受けることにより内挿動き補償を行なう
ことができる。これにより、装置規模の低減のみならず
内挿予測のための動きベクトル検出時におけるフレーム
メモリ１１１６および１１１７へのアクセス回数を低減
することができ、高速画像処理が可能となる。また、検
出された動きベクトルとともにサーチウインドーブロッ
ク（部分参照画像）が並行して生成されるため、予測信
号生成のために新たにフレームメモリへアクセスする必
要がなく、このサーチウインドーブロック画素データを
予測信号として利用することができる。For the source coding circuit shown in FIG. 84, the configuration of the fourth embodiment of the present invention can generate motion vectors mva, mvb, and MV by sharing the processor array (arithmetic processing circuit). That is, interpolation prediction motion compensation circuit 1119 can perform interpolation motion compensation by receiving partial reference image data directly from backward motion compensation circuit 1118 and forward motion compensation circuit 1120, without accessing frame memories 1116 and 1117. This not only reduces the device scale but also reduces the number of accesses to frame memories 1116 and 1117 during motion vector detection for interpolation prediction, enabling high-speed image processing. In addition, because the search window block (partial reference image) is produced in parallel with the detected motion vector, the frame memory need not be accessed again to generate the prediction signal; the search window block pixel data can be used as the prediction signal.

[0346] FIG. 85 is a diagram showing still another modification of the fourth embodiment of the present invention. In FIG. 85, the motion vector detection device includes a first one-sided prediction motion detection unit 1150, a second one-sided prediction motion detection unit 1160, and an interpolation prediction motion detection unit 1170.

[0347] The first one-sided prediction motion detection unit 1150 includes: a first integer precision motion detection unit 1151, which receives first search window pixel data and template block pixel data and calculates a motion vector with integer precision and the corresponding evaluation value by block matching; a first buffer memory 1152, which stores search window data Sa from the first integer precision motion detection unit 1151; and a first detailed precision motion detection unit 1153, which receives the motion vector mvai, the evaluation value evai and the template block data TPa from the motion detection unit 1151 and obtains a motion vector mva and an evaluation value with fractional precision (detailed precision).

[0348] The first buffer memory 1152 stores the pixel data of an area that includes the search window block corresponding to the integer-precision motion vector mvai and its peripheral pixels. The first detailed precision motion detection unit 1153 uses the template block data TPa and the pixel data in the first buffer memory 1152 to generate the detailed-precision motion vector mva and the corresponding evaluation value eva. The configuration and operation of the first one-sided prediction motion detection unit 1150 are the same as those of the second embodiment described above with reference to FIGS. 44 to 57.
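The two-stage search performed by the first one-sided prediction motion detection unit can be illustrated with a minimal sketch. It assumes half-pixel steps as the fractional precision, bilinear averaging for the in-between samples, and a sum of absolute differences as the evaluation value. The function names and the one-pixel border around the buffered block are hypothetical choices for illustration and are not taken from the patent text.

```python
def half_pel_sample(area, y2, x2):
    """Sample `area` at half-pixel coordinates (y2 / 2, x2 / 2).

    Integer positions return the stored pixel; half positions return the
    rounded average of the 2 or 4 neighbouring pixels (assumed bilinear rule).
    """
    y0, x0 = y2 // 2, x2 // 2
    y1, x1 = y0 + (y2 % 2), x0 + (x2 % 2)
    total = area[y0][x0] + area[y0][x1] + area[y1][x0] + area[y1][x1]
    return (total + 2) // 4


def refine_half_pel(template, area):
    """Refine an integer-precision match to half-pixel precision.

    `template` is an n x n block. `area` is the (n + 2) x (n + 2) buffered
    region: the block matched at integer precision plus a one-pixel border.
    Returns ((hy, hx), ev), where the offset is in half-pixel units relative
    to the integer-precision vector and ev is the smallest evaluation value.
    """
    n = len(template)
    best = None
    for hy in (-1, 0, 1):
        for hx in (-1, 0, 1):
            ev = sum(
                abs(template[y][x]
                    - half_pel_sample(area, 2 * (y + 1) + hy, 2 * (x + 1) + hx))
                for y in range(n)
                for x in range(n)
            )
            if best is None or ev < best[0]:
                best = (ev, (hy, hx))
    return best[1], best[0]
```

The returned offset is in half-pixel units, so under these assumptions mva would be mvai plus half of that offset. Only the buffered area is read, which mirrors the point that no further frame memory access is needed for the refinement.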

[0349] The second one-sided prediction motion detection unit 1160 has the same configuration as the first one-sided prediction motion detection unit 1150. It includes a second integer precision motion detection unit 1161, which receives second search window data and template data; a second buffer memory 1162, which stores search window data Sb from the detection unit 1161; and a second detailed precision motion detection unit 1163, which receives the template block data TPb from the detection unit 1161 and the search window data from the second buffer memory 1162 and obtains the detailed-precision motion vector mvb and the corresponding evaluation value evb. The operation of the second one-sided prediction motion detection unit 1160 is also the same as that of the second embodiment described above.

[0350] The interpolation prediction motion detection unit 1170 receives the detailed-precision motion vectors mva and mvb, the evaluation values eva and evb, and the partial reference image data RPa and RPb from the first and second one-sided prediction motion detection units 1150 and 1160, and detects the optimum motion vector MV for the template block. The corresponding evaluation value EV may be output together with the motion vector MV. The block-level configuration of the interpolation prediction motion detection unit 1170 is the same as that described above with reference to FIG. 76. However, because motion vectors are detected with detailed precision, the operation of its components differs. The operation for interpolation prediction motion vector detection with detailed precision is briefly described below with reference to FIG. 76.

[0351] The interpolation prediction reference image generation circuit 1020 generates interpolation reference image data RPc from the partial reference image data RPa and RPb. This operation is the same as in the integer-precision case. The interpolation prediction motion vector detection calculation unit 1022 obtains a detailed-precision motion vector mvc and the corresponding evaluation value evc from the template block pixel data supplied by the buffer memory 1021 and the interpolation reference image data supplied by the circuit 1020. To perform detection with this detailed precision, the detection calculation unit 1022 has, for example, the configuration shown in FIG. 53.

[0352] The motion vector determination circuit 1024 finds the smallest of the evaluation values eva, evb and evc, and selects and outputs the motion vector corresponding to that smallest evaluation value. The corresponding evaluation value EV may be output as well.
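The selection among the two one-sided candidates and the interpolated candidate can be sketched as follows. This is an illustrative sketch only: it assumes that the interpolation reference image RPc is the rounded pixel-wise average of RPa and RPb and that the evaluation value is a sum of absolute differences. The function names are hypothetical, and ties are resolved in favour of the earlier candidate, which the text does not specify.

```python
def interpolated_reference(rpa, rpb):
    """Build RPc as the rounded average of RPa and RPb (assumed rule)."""
    return [[(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(rpa, rpb)]


def sad(template, block):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(t - p)
               for t_row, p_row in zip(template, block)
               for t, p in zip(t_row, p_row))


def judge(mva, eva, mvb, evb, mvc, evc):
    """Return (MV, EV): the vector with the smallest evaluation value."""
    ev, mv = min([(eva, mva), (evb, mvb), (evc, mvc)], key=lambda p: p[0])
    return mv, ev
```

For example, with a template `[[10, 20], [30, 40]]`, `RPa = [[10, 20], [30, 44]]` and `RPb = [[12, 20], [30, 40]]`, the three evaluation values are 4, 2 and 3, so the vector associated with RPb is selected.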

[0353] The buffer memories 1152 and 1162 used for one-sided prediction motion vector detection with detailed precision can also be used for interpolation prediction motion detection, so interpolation prediction motion detection with detailed precision can be performed at high speed without increasing the number of accesses to the frame memory or the device scale.

[0354] As in integer-precision interpolation motion detection, the template block data may be supplied directly from the one-sided prediction motion detection unit 1150 or 1160 to the detection unit 1170. In this case, integer-precision motion vector detection, detailed-precision motion vector detection and interpolation prediction motion vector detection can be executed in a pipelined manner.

[0355] The configurations described for interpolation prediction motion detection with integer precision can also be applied to the configuration for interpolation prediction motion detection with detailed precision.

【０３５６】[0356]

[Effects of the Invention] As described above, according to the present invention, element processors are arranged in a two-dimensional array, and the template block data is kept resident in this processor array while the evaluation values required for motion vector detection are calculated. Motion vectors can therefore be detected with a small circuit scale and low power consumption.
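As a software analogy for the evaluation performed by the processor array, the following minimal sketch keeps one template block fixed and computes an evaluation value for every displacement inside a search area. It assumes the evaluation value is the sum of absolute differences and that displacements are counted from the top-left corner of the search area; the names are hypothetical, and the sketch does not model the shifting of data between element processors.

```python
def evaluation_value(template, search, dy, dx):
    """Sum of absolute differences for one displacement (dy, dx)."""
    n = len(template)
    return sum(abs(template[y][x] - search[y + dy][x + dx])
               for y in range(n)
               for x in range(n))


def full_search(template, search):
    """Evaluate every displacement and return (vector, evaluation value).

    The template stays fixed ("resident") while the candidate block slides
    over the search area, one displacement per evaluation.
    """
    n = len(template)
    best = None
    for dy in range(len(search) - n + 1):
        for dx in range(len(search[0]) - n + 1):
            ev = evaluation_value(template, search, dy, dx)
            if best is None or ev < best[0]:
                best = (ev, (dy, dx))
    return best[1], best[0]
```

In the device described here each absolute difference is produced by its own element processor and the sum is formed by the summing unit; the nested loops above only reproduce the arithmetic, not the parallelism.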

[0357] Further, since the element processors are arranged in an array and the search window data and the template block data are each shifted through this array in one direction, the data used for integer-precision motion vector detection can be reused as is for fractional-precision motion vector detection, so fractional-precision motion vectors can be detected at high speed.

[0358] That is, according to the invention of claim 1, element processors capable of storing search window data and template data are arranged in a two-dimensional array, and data is transferred through this two-dimensional array in one direction only. The circuit scale required for data transfer is therefore reduced, and a motion vector detection device with a small occupied area that operates with low power consumption can be obtained.

[0359] According to the invention of claim 2, the search window data and template block data used for integer-precision motion vector detection are used directly, without passing through the frame memory, for the fractional-precision motion vector detection operation. The number of accesses to the frame memory is therefore reduced, and fractional-precision motion vectors can be detected at high speed.

[0360] According to the invention of claim 3, each element processor can store a plurality of search window data and/or a plurality of template block data, so the area occupied by the processor array can be reduced and a motion vector detection device with a small occupied area is obtained. Further, since the number of element processors is reduced, the number of components of the arithmetic circuitry required for calculating evaluation values, such as absolute differences, is reduced accordingly, and a motion vector detection device with low power consumption is obtained.

[0361] According to the invention of claim 4, the arithmetic means for calculating the evaluation value of a motion vector outputs its result as a combination of a sign bit and magnitude bits. The incrementer otherwise needed to represent a negative number in two's complement notation becomes unnecessary, and an element processor with a small occupied area can be obtained. The power that the incrementer would consume is saved accordingly.

[0362] According to the invention of claim 5, the summing circuit that generates the evaluation value is composed of full adder circuits arranged in a tree, and the sign bits are supplied to the least significant bit of the full adder stages. A summing circuit that occupies a small area and operates at high speed can therefore be obtained, and motion vectors can accordingly be detected at high speed.
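The arithmetic behind claims 4 and 5 can be checked numerically with a small sketch. It relies on the two's complement identity |d| = (d XOR s) + s, where s is the sign of d extended over all bits: each element processor emits only the sign bit and the conditionally inverted bits, and the "+1" corrections are absorbed by the summing stage. The 9-bit difference width, the 8-bit pixel range and the function names are assumptions for illustration; the sketch adds the sign bits with an ordinary sum rather than modelling the carry inputs of a full adder tree.

```python
WIDTH = 9                      # assumed width: enough for 8-bit pixel differences
MASK = (1 << WIDTH) - 1


def element_output(a, b):
    """Return (sign bit, magnitude bits) for the difference a - b.

    No incrementer is used: when the difference is negative the bits are
    only inverted, so the result is one less than the true magnitude.
    """
    d = (a - b) & MASK                     # two's complement difference
    sign = d >> (WIDTH - 1)                # 1 when a < b
    inverted = d ^ (MASK if sign else 0)   # one's complement if negative
    return sign, inverted & (MASK >> 1)    # drop the sign position


def total_sum(outputs):
    """Sum the magnitude bits and add each sign bit once as the deferred +1."""
    return sum(m for _, m in outputs) + sum(s for s, _ in outputs)
```

For the pixel pairs (3, 10), (10, 3), (0, 255) and (7, 7) the element outputs are (1, 6), (0, 7), (1, 254) and (0, 0), and the total is 269, which equals the directly computed sum of absolute differences.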

[0363] According to the invention of claim 6, the pixel data of the search window is stored in the processor array and the template block data is kept resident in the processor array, so a motion vector detection device that occupies a small area and operates with low power consumption can be obtained.

[0364] According to the invention of claim 7, the search window data is transferred through the processor array in one direction. Data that is no longer needed for evaluation value generation is shifted out of the processor array while the search window data needed next is shifted in, so search window data can be stored in the processor array efficiently, and a motion vector detection device with a small occupied area that operates at high speed and with low power consumption can be obtained.

[0365] According to the invention of claim 8, the transfer direction of the template block data and the transfer direction of the search window data are orthogonal within the processor array. Since the template block data is arranged in the frame memory in raster-scan order, the data can be stored in the processor array in the order in which it is stored, and the template data can be loaded into the processor array at high speed.

[0366] According to the invention of claim 9, the transfer directions of the search window data and the template block data are parallel within the processor array. This reduces the wiring area required for transmitting both kinds of data, so a processor array with a small occupied area can be realized.

[0367] According to the invention of claim 10, the calculation is performed at N times the transfer rate of the search window data, so motion vector evaluation values can be generated efficiently.

[0368] According to the invention of claim 11, the template block data required for evaluation value generation can be subsampled, so motion vectors can be detected accurately even for image data with a high frame rate. The number of element processors required is reduced accordingly, and the area occupied by the processor array can be reduced.

[0369] According to the invention of claim 12, the data of two template blocks is stored in each element processor, and evaluation values for the two template blocks are generated using the same search window block data. The frame memory can therefore be accessed efficiently to read the search window data, and motion vectors can be detected at high speed.

[0370] According to the invention of claim 13, motion vectors are evaluated for two template blocks simultaneously, so the number of accesses to the frame memory for reading search window data can be greatly reduced, and motion vectors can be detected efficiently and at high speed.

[0371] According to the invention of claim 14, interpolation prediction motion detection is performed by generating synthetic reference image data (interpolation reference image data) from the partial reference image data used by the one-sided prediction motion detection units. The number of accesses to the frame memory can therefore be greatly reduced, and interpolation prediction motion detection can be performed at high speed. According to the invention of claim 15, interpolation prediction detection with detailed precision is performed by generating synthetic reference image data from the partial reference image data stored in the buffer memory means used for detailed-precision one-sided prediction motion detection. No additional buffer memory is needed for detailed-precision interpolation prediction detection, and interpolation prediction motion detection with detailed precision can be performed at high speed without increasing the device scale.

[0372] According to the invention of claim 16, all the necessary data is transferred from the one-sided prediction motion detection means to the interpolation prediction motion detection means, so the motion detection processing can be executed in a pipelined manner, enabling high-speed processing.

[Brief description of drawings]

FIG. 1 is a diagram showing the overall configuration of a motion vector detection device according to one embodiment of the present invention.

FIG. 2 is a diagram showing the search area and the search range for a template block used by the motion vector detection device of FIG. 1.

FIG. 3 is a diagram showing the configuration of an element processor included in the processor array shown in FIG. 1.

FIG. 4 is a diagram showing the overall configuration of the processor array shown in FIG. 1.

FIG. 5 is a diagram showing a modification of the processor array shown in FIG. 4.

FIG. 6 is a diagram showing the positional relationship of stored data in the processor array shown in FIG. 1.

FIG. 7 is a diagram showing the distribution of stored data in the processor array shown in FIG. 4.

FIG. 8 is a diagram showing the distribution of stored data within one column of linear processor array in the processor array shown in FIG. 4.

FIG. 9 is a diagram for explaining the operation of the motion vector detection device shown in FIG. 1.

FIG. 10 is a diagram showing the movement and distribution of data in the processor array after generation of one evaluation value is completed.

FIG. 11 is a diagram showing the relationship between the stored data in the processor array shown in FIG. 4 and the displacement vector, and the distribution of stored data at that time.

FIG. 12 is a diagram showing the positional relationship between the template block and the search window block, and the distribution of stored data in the processor array, when evaluation value generation for one search window is completed.

FIG. 13 is a diagram showing the positional relationship between the template block and the search window when the next search window data is loaded in one embodiment of the present invention.

FIG. 14 is a diagram showing the positional relationship between the search window block and the template block when evaluation values are generated using the search window shown in FIG. 13.

FIG. 15 is a diagram showing the positional relationship between the search window block and the template block when the motion vector evaluation operation is completed.

FIG. 16 is a diagram showing the distribution of stored data in the processor array for the positional relationship between the search window block and the template block shown in FIG. 15.

FIG. 17 is a diagram showing the arrangement of element processors in the processor array.

FIG. 18 is a diagram showing the arrangement of template data stored in the element processors shown in FIG. 17.

FIG. 19 is a diagram showing the arrangement of pixel data in the search area for the template block shown in FIG. 18.

FIG. 20 is a diagram showing the range of variation of the displacement vector (motion vector search range) in the search area shown in FIG. 19.

FIG. 21 is a diagram showing the distribution of search window pixel data within the example processor array in the starting state.

FIG. 22 is a diagram showing the change in the distribution of search window pixel data as processing proceeds in the processor array.

FIG. 23 is a diagram showing the change in the distribution of search window pixel data as processing proceeds in the processor array.

FIG. 24 is a diagram showing the scan direction of search window data in the search area.

FIG. 25 is a diagram showing an example of a specific configuration of the element processor shown in FIG. 3.

FIG. 26 is a diagram showing a specific configuration example of the absolute difference circuit shown in FIG. 25.

FIG. 27 is a signal waveform diagram showing the operation when the element processor shown in FIG. 25 is generalized.

FIG. 28 is a diagram for explaining the operation of the element processor shown in FIG. 25.

FIG. 29 is a signal waveform diagram showing another operation mode of the element processor shown in FIG. 25.

FIG. 30 is a diagram showing the distribution of pixel data that gives valid calculation results during operation according to the operation waveform diagram shown in FIG. 29.

FIG. 31 is a diagram showing a specific configuration example of an element processor for realizing the operation shown in FIG. 30.

FIG. 32 is a diagram showing an example of the configuration of a data register used in the element processor.

FIG. 33 is a diagram showing still another configuration example of the element processor.

FIG. 34 is a diagram showing still another configuration example of the element processor.

FIG. 35 is a diagram showing another configuration example of the data register.

FIG. 36 is a diagram showing the configuration of a data register having the unit configuration shown in FIG. 35.

FIG. 37 is a diagram showing the configuration of the data buffer shown in FIG. 4.

FIG. 38 is a diagram showing another configuration example of the absolute difference circuit used in the element processor according to the present invention.

FIG. 39 is a diagram showing the configuration of the summing unit when the absolute difference circuit shown in FIG. 38 is used.

FIG. 40 is a diagram showing a specific configuration example of the summing unit shown in FIG. 39.

FIG. 41 is a diagram showing the configuration of the 4-to-2 compressor shown in FIG. 40.

FIG. 42 is a diagram showing a specific configuration example of the summing unit shown in FIG. 40.

FIG. 43 is a diagram showing a specific configuration example of the comparison unit shown in FIG. 1.

FIG. 44 is a diagram showing the conceptual configuration of a motion vector detection device according to a second embodiment of the present invention.

FIG. 45 is a diagram showing a specific configuration example of the motion vector detection device shown in FIG. 44.

FIG. 46 is a diagram showing a specific configuration example of the buffer memory circuit shown in FIG. 45.

FIG. 47 is a diagram showing the distribution of stored data in the buffer memory shown in FIG. 46.

FIG. 48 is a diagram showing the operation of the buffer memory circuit shown in FIG. 46.

FIG. 49 is a diagram showing a specific configuration example of the second arithmetic unit shown in FIG. 44.

FIG. 50 is a diagram showing a specific configuration example of the fractional precision predicted image generation circuit shown in FIG. 49.

FIG. 51 is a diagram showing the configuration of the absolute difference sum circuit shown in FIG. 49.

FIG. 52 is a diagram for explaining the operation of the second arithmetic unit.

FIG. 53 is a diagram showing a specific configuration example of the second arithmetic unit shown in FIG. 45.

FIG. 54 is a diagram showing a modification of the motion vector detection device shown in FIG. 45.

FIG. 55 is a diagram showing the operation of the motion vector detection device shown in FIG. 54.

FIG. 56 is a diagram showing still another configuration example of the motion vector detection device shown in FIG. 44.

FIG. 57 is a diagram showing a modification of the motion vector detection device shown in FIG. 56.

FIG. 58 is a diagram showing the search area, template blocks and search range used in the third embodiment.

FIG. 59 is a diagram showing the arrangement of template blocks used in the third embodiment.

FIG. 60 is a diagram for explaining the operation of the third embodiment of the present invention.

FIG. 61 is a diagram for explaining the operation of the motion vector detection device according to the third embodiment of the present invention.

FIG. 62 is a diagram showing the operation of the motion vector detection device according to the third embodiment of the present invention.

FIG. 63 is a diagram showing the operation of the motion vector detection device according to the third embodiment of the present invention.

FIG. 64 is a diagram showing the operation of the motion vector detection device according to the third embodiment of the present invention.

FIG. 65 is a diagram showing the timing of motion vector generation in the motion vector detection device according to the third embodiment of the present invention.

FIG. 66 is a diagram showing the configuration of an element processor used in the third embodiment of the present invention.

FIG. 67 is a diagram showing a configuration for generating the data of two template blocks for the element processor shown in FIG. 66.

FIG. 68 is a diagram showing another configuration for generating the data of two template blocks for the element processor shown in FIG. 66.

FIG. 69 is a diagram showing another configuration example of the element processor shown in FIG. 66.

FIG. 70 is a diagram showing a configuration example of the comparison unit in the motion vector detection device according to the third embodiment of the present invention.

FIG. 71 is a diagram showing a configuration example of the comparison unit in the motion vector detection device according to the third embodiment of the present invention.

FIG. 72 is a diagram showing the device configuration when the third embodiment of the present invention is combined with a fractional precision motion vector detection device.

FIG. 73 is a diagram showing another configuration example when the motion vector detection device according to the third embodiment of the present invention is combined with a fractional precision motion vector detection device.

FIG. 74 is a diagram showing the operation of the configuration of FIG. 73.

FIG. 75 is a diagram showing the overall configuration of a motion vector detection device according to a fourth embodiment of the present invention.

FIG. 76 is a diagram showing the configuration of the interpolation prediction motion detection unit shown in FIG. 75.

FIG. 77 is a diagram showing the configurations of the interpolation prediction reference image generation circuit and the interpolation prediction motion vector detection calculation unit shown in FIG. 76.

FIG. 78 is a diagram showing a specific configuration of the motion vector determination circuit shown in FIG. 76.

FIG. 79 is a diagram showing the detailed structure of the circuit shown in FIG. 78.

FIG. 80 is a diagram showing a modification of the interpolation prediction reference image generation circuit shown in FIG. 76.

FIG. 81 is a diagram showing another configuration example of the interpolation prediction motion detection unit.

FIG. 82 is a diagram showing the configuration of a modification of the motion vector detection device according to the fourth embodiment of the present invention.

FIG. 83 is a diagram showing the data processing mode in the motion vector detection device according to the fourth embodiment of the present invention.

FIG. 84 is a diagram showing an example of an application of the motion vector detection device according to the fourth embodiment of the present invention.

FIG. 85 is a diagram showing another configuration of the motion vector detection device according to the fourth embodiment of the present invention.

【図８６】従来の画像信号符号化回路の全体の構成を示
す図である。[Fig. 86] Fig. 86 is a diagram illustrating an overall configuration of a conventional image signal encoding circuit.

【図８７】図８６に示すソース符号化回路の構成を示す
図である。87 is a diagram showing the structure of the source encoding circuit shown in FIG. 86. FIG.

【図８８】画像の動き補償の操作を説明するための図で
ある。[Fig. 88] Fig. 88 is a diagram for describing an operation of image motion compensation.

【図８９】ブロックマッチング法による動き補償を行な
う際のサーチエリアおよびテンプレートブロックの配置
例および動きベクトルの関係を示す図である。[Fig. 89] Fig. 89 is a diagram illustrating an arrangement example of search areas and template blocks and a relationship between motion vectors when performing motion compensation by the block matching method.

FIG. 90 is a diagram showing the overall configuration of a conventional motion vector detection device.

FIG. 91 is a diagram showing the configuration of an element processor included in the processor array shown in FIG. 90.

FIG. 92 is a diagram showing the methods of scanning the template block and the search window in the motion vector detection device shown in FIG. 90.

FIG. 93 is a diagram showing the operation of the motion vector detection device shown in FIG. 90.

FIG. 94 is a diagram for explaining the operation of the conventional motion vector detection device.

FIG. 95 is a diagram showing another configuration example of the conventional motion vector detection device.

FIG. 96 is a diagram for explaining a method for generating a motion vector with fractional precision.

FIG. 97 is a diagram for explaining a motion vector with fractional precision.

[Explanation of symbols]

1: Calculation unit
2: Input unit
3: Comparison unit
10: Processor array
12: Summation unit
25-1 to 25-M: Data registers
26-1 to 26-N: Data registers
35: Reference frame image
36: Current frame image
40: Search window
42: Search window block
52: Side window block
43: Template block
64: Difference absolute value circuit
70: Subtractor
72: ExOR circuit
73: N-word register file
75: M-word register file
DL: Data buffer
102: 4-to-2 compressor
104: Full adder
110a to 110h: Full adder circuits
130: Register latch
132: Comparator
134: Counter
136: Register latch
150: Frame memory
154: Frame memory
156: Controller
200: Motion vector detection device
210: First arithmetic unit
250: Second arithmetic unit
270: Buffer memory circuit
280: Buffer memory
282: Buffer memory
302: Fractional precision predicted image generation circuit
304: Difference absolute value sum circuit
306: Comparison unit
400: Processor array
417: Summation unit
418: Comparison unit
411a to 411w: Element arithmetic units
413a to 413w: Registers
412a to 412w: Difference absolute value circuits
414a to 414w: Registers
401: Frame memory
402: Frame memory
432a: Shift register
432b: Shift register
430a: Latch
430b: Latch
505a: Register for storing right template block data
505b: Register for storing left template block data
505c: Register for storing search window data
506: Selector
507: Difference absolute value circuit
560: Summation unit
562: Right comparison unit
564: Left comparison unit
580: Buffer memory
582: Buffer memory
584: Fractional precision motion vector arithmetic unit
586: Fractional precision motion vector arithmetic unit
590: Buffer memory
592: Buffer memory
594: Buffer memory
596: Buffer memory
598: Selector
599: Second arithmetic unit
1000: Template data storage unit
1001: First search area data storage unit
1002: Second search area data storage unit
1003: First one-sided prediction motion detection unit
1004: Second one-sided prediction motion detection unit
1005: Interpolation prediction motion detection unit
1011: First arithmetic processing circuit
1012: Second arithmetic processing circuit
1013: First buffer memory
1014: Second buffer memory
1020: Interpolation prediction reference image generation circuit
1021: Buffer memory
1022: Interpolation prediction motion vector detection calculation unit
1024: Motion vector determination circuit

Claims

[Claims]

1. A motion vector detecting device for obtaining a motion vector used in motion-compensated predictive coding by block matching processing between a reference frame image and a current frame image, comprising: a plurality of processor means, each including means for transferring data substantially along one direction, each of the processor means including first storage means for storing pixel data of a first block of the current frame image, second storage means for storing pixel data of a second block in the reference frame image related to the first block, and calculation means for performing a predetermined calculation on the data stored in the first and second storage means; evaluation value generation means for generating, in response to the outputs of the calculation means of the plurality of processor means, an evaluation value indicating the degree of correlation between the image of the first block and the image of the second block; and determining means for determining a motion vector for the first block according to the evaluation value from the evaluation value generation means.
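
As an illustration of the block matching recited in claim 1, the following is a minimal software sketch, not the claimed hardware. It assumes that the "predetermined calculation" is the absolute difference described in the abstract and that the evaluation value is the sum of those differences. The function names `sad` and `find_motion_vector` and the parameters `origin` and `search_range` are hypothetical.

```python
def sad(template, reference, top, left):
    """Sum of absolute differences between the template block and the
    reference block whose upper-left pixel is at (top, left)."""
    return sum(
        abs(pixel - reference[top + r][left + c])
        for r, row in enumerate(template)
        for c, pixel in enumerate(row)
    )


def find_motion_vector(template, reference, origin, search_range):
    """Exhaustively test every displacement within +/- search_range of
    origin and return ((dy, dx), evaluation_value) for the smallest value.
    Displacements that fall outside the reference image are skipped."""
    rows, cols = len(template), len(template[0])
    oy, ox = origin
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = oy + dy, ox + dx
            if top < 0 or left < 0:
                continue
            if top + rows > len(reference) or left + cols > len(reference[0]):
                continue
            value = sad(template, reference, top, left)
            # Strict comparison keeps the first displacement found on ties.
            if best is None or value < best[1]:
                best = ((dy, dx), value)
    return best
```

In the claimed device the per-pixel differences are produced in parallel by the processor means and the comparison is made by the determining means; the nested loops here only mimic that behaviour sequentially.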

2. The motion vector detecting device according to claim 1, wherein the determining means determines a motion vector with integer precision, the device further comprising: arithmetic processing means for receiving the reference frame image data and the current frame image data and calculating a motion vector for the first block with fractional precision finer than the integer precision; and means for directly supplying at least one of the reference frame image data and the current frame image data stored in the plurality of processor means to the arithmetic processing means as data to be processed.

3. The motion vector detecting device according to claim 1, wherein, in each of the processor means, the first storage means includes storage means for storing M data corresponding to mutually different pixels, and the second storage means includes storage means for storing N data corresponding to mutually different pixels, where M and N are positive integers and N is larger than M.

4. The motion vector detecting device according to claim 1, wherein the calculation means includes: subtraction means for performing subtraction between the pixel data of the first block and the pixel data of the second block and expressing the subtraction result as a sign bit indicating its sign and magnitude bits indicating its magnitude; and gate means for performing modulo-2 addition of each magnitude bit from the subtraction means with the sign bit and outputting a difference absolute value of the subtraction result, the output of the calculation means being given by the pair of the sign bit and the difference absolute value.

5. The motion vector detecting device according to claim 4, wherein the evaluation value generation means includes a summation circuit for calculating the sum of the outputs of the calculation means, the summation circuit includes a plurality of stages of full adder circuits arranged in a tree in which every output is transmitted to the next stage, and the sign bit is applied to the carry input of the least-significant-bit full adder circuit of each full adder circuit stage.
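
The arithmetic behind claims 4 and 5 can be checked with a short sketch. It assumes two's-complement subtraction of 8-bit pixels (the width is an assumption, as are the names `xor_stage` and `sum_abs_diff`). Under that assumption, XORing each magnitude bit with the sign bit yields the one's complement of a negative difference, which is one less than its absolute value; adding the sign bit back as a carry input during summation supplies the missing 1.

```python
WIDTH = 8  # assumed pixel bit width; not stated in the claims


def xor_stage(a, b, width=WIDTH):
    """Model the subtractor and gate stage: return (sign, gated_bits).

    The difference a - b is taken in two's complement. Each of the low
    `width` bits is XORed with the sign bit (modulo-2 addition)."""
    diff = a - b
    sign = 1 if diff < 0 else 0
    mask = (1 << width) - 1
    magnitude_bits = diff & mask
    gated_bits = magnitude_bits ^ (mask if sign else 0)
    return sign, gated_bits


def sum_abs_diff(pairs, width=WIDTH):
    """Model the summation: each sign bit is added as a carry input, which
    turns every gated value into a true absolute difference."""
    total = 0
    for a, b in pairs:
        sign, gated_bits = xor_stage(a, b, width)
        total += gated_bits + sign  # sign bit enters as carry-in
    return total
```

For example, `xor_stage(3, 10)` gives a sign of 1 and gated bits of 6; adding the sign restores the absolute difference 7. Deferring that increment to the adder tree avoids a separate incrementer in every element processor.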

6. A motion vector detecting device for obtaining a motion vector of a template block composed of pixels in Q rows and P columns of a current frame image by block matching processing against a search area of a predetermined size in a reference frame image, the search area including a search window having the same width as the template block, the device comprising: linear arrays arranged in P columns, each of the linear arrays including Q/M cascaded element processors, each of the element processors including first storage means for storing M different data of the template block, second storage means for storing N different data of the search window, and calculation means for performing a predetermined calculation on the data stored in the first storage means and the data stored in the second storage means, where M and N are integers and N is an integer multiple of M; and third storage means provided corresponding to each of the linear arrays for storing R different data of the search area, wherein the sum of R and Q is equal to the number of pixels arranged in one column of the search area.
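
The sizing relations in claim 6 can be written out as a small helper. This is only a sketch of the arithmetic stated in the claim: the function name `array_dimensions`, the dictionary keys, and the example figures (a 16 x 16 template block with a 48-pixel search-area column) are assumptions chosen for illustration.

```python
def array_dimensions(P, Q, M, N, search_column_pixels):
    """Derive the array sizes implied by claim 6.

    P, Q: template block columns and rows.
    M, N: template and search window data held per element processor.
    search_column_pixels: pixels in one column of the search area."""
    if Q % M != 0:
        raise ValueError("Q must be divisible by M to give Q/M processors")
    if N % M != 0:
        raise ValueError("N must be an integer multiple of M")
    R = search_column_pixels - Q  # from R + Q = pixels per search-area column
    if R < 0:
        raise ValueError("search area column is shorter than the template")
    per_array = Q // M
    return {
        "linear_arrays": P,
        "element_processors_per_array": per_array,
        "total_element_processors": P * per_array,
        "third_storage_depth_R": R,
    }
```

With the assumed figures P = Q = 16, M = N = 2 and a 48-pixel column, each linear array holds 8 element processors, the whole array holds 128, and each third storage means holds R = 32 pixels.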

7. The motion vector detecting device according to claim 6, wherein each of the linear arrays corresponds to a column of the template block, and data of the search area is transmitted along one direction via the second storage means and is transmitted to the linear array of an adjacent column via the corresponding third storage means.

8. The motion vector detecting device according to claim 6, wherein a direction in which the template block data is transmitted and a direction in which the search window data is transmitted are substantially orthogonal to each other.

9. The motion vector detecting device according to claim 6, wherein the template block data is transferred along the same direction as the search window data, and the template block data is transmitted only between element processors, bypassing the third storage means corresponding to the linear array.

10. The motion vector detection device according to claim 6, wherein in each of the element processors, the calculation means executes the predetermined calculation at a calculation speed that is N times the transfer speed of the search window data.

11. The motion vector detection device according to claim 6, wherein the calculation means executes the predetermined calculation at a speed M times as high as a transfer speed of the search window data.

12. The motion vector detecting device according to claim 6, wherein the first storage means includes first template storage means for storing pixel data of a first template block and second template storage means for storing pixel data of a second template block, and the calculation means executes the predetermined calculation on the data stored in the second storage means and the data stored in each of the first and second template storage means.

13. A motion vector detecting device for obtaining a motion vector used in motion-compensated predictive coding by block matching processing between a reference frame image and a current frame image, comprising: first storage means for storing data of a first template block of a predetermined size in the current frame image; second storage means for storing data of a second template block of the predetermined size in the current frame image; third storage means for storing data of a search window block in the reference frame image associated with both the first and second template blocks; evaluation value generating means for deriving, by executing a predetermined calculation on the data stored in the first storage means, the second storage means, and the third storage means, an evaluation value indicating the degree of correlation between the first template block and the search window block and an evaluation value indicating the degree of correlation between the second template block and the search window block; and means for performing motion vector detection for the first and second template blocks in parallel in response to the output of the evaluation value generating means.

14. A motion vector detecting device for obtaining a motion vector used in motion-compensated predictive coding by block matching processing between a reference frame image and a current frame image, comprising: a plurality of one-sided prediction motion detecting means, each detecting a motion vector for a template block image to be encoded in the current frame image according to a block matching evaluation value between the template block image and a partial reference image, related to the template block image, in a mutually different reference frame image; interpolation prediction motion detecting means for generating an interpolated partial reference image by interpolating the partial reference image data used by the plurality of one-sided prediction motion detecting means, and for calculating a motion vector for the template block image according to a block matching evaluation value obtained by block matching processing between the generated interpolated partial reference image and the template block image; and output determining means for determining and outputting a final motion vector of the template block image according to the evaluation values calculated by the one-sided prediction motion detecting means and the evaluation value calculated by the interpolation prediction motion detecting means.
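
The selection among one-sided and interpolation prediction in claim 14 may be sketched as follows. Several details are assumptions rather than statements of the claim: there are exactly two one-sided detectors, labelled "forward" and "backward" here; the interpolation is a rounded average of the two partial reference blocks; and the evaluation value is a sum of absolute differences. The names `block_sad`, `interpolate` and `select_prediction` are hypothetical.

```python
def block_sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(x - y)
        for row_a, row_b in zip(block_a, block_b)
        for x, y in zip(row_a, row_b)
    )


def interpolate(block_a, block_b):
    """Pixel-wise rounded average of two partial reference blocks
    (the rounding rule is an assumption)."""
    return [
        [(x + y + 1) // 2 for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(block_a, block_b)
    ]


def select_prediction(template, forward_block, backward_block):
    """Return (mode, evaluation_value) for the candidate with the smallest
    evaluation value among the two one-sided predictions and their
    interpolation. On a tie the earlier-listed candidate wins."""
    candidates = {
        "forward": block_sad(template, forward_block),
        "backward": block_sad(template, backward_block),
        "interpolated": block_sad(
            template, interpolate(forward_block, backward_block)
        ),
    }
    mode = min(candidates, key=candidates.get)
    return mode, candidates[mode]
```

Here `forward_block` and `backward_block` stand for the best-matching partial reference images already found by the one-sided detectors, so the interpolation stage reuses their data instead of searching the reference frames again.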

15. The motion vector detecting device according to claim 14, wherein each of the one-sided prediction motion detecting means includes: integer precision motion vector detecting means for detecting a motion vector with integer precision; buffer memory means for receiving and storing the corresponding partial reference image data from the integer precision motion vector detecting means; and fractional precision motion vector detecting means for detecting a motion vector with fractional precision according to the partial reference image data from the buffer memory means and the template block image data, and wherein the interpolation prediction motion detecting means receives the partial reference image data from the buffer memory means provided for each of the one-sided prediction motion detecting means.

16. The motion vector detecting device according to claim 14, further comprising transfer means for transferring the evaluation value, the motion vector, and the partial reference image data calculated by each of the one-sided prediction motion detecting means from that one-sided prediction motion detecting means to the interpolation prediction motion detecting means.