JPH05268593A - Difference absolute sum/difference square sum parallel arithmetic operation device

Info

Publication number: JPH05268593A
Application number: JP6472192A
Authority: JP
Inventors: Toshihiro Minami (南 俊宏); Ryota Kasai (笠井 良太)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1992-03-23
Filing date: 1992-03-23
Publication date: 1993-10-15

Abstract

PURPOSE: To reduce the number of memory output ports, and to eliminate both the selectors that choose the pixels sent to each arithmetic circuit and the shift register that shifts the pixels of the current pixel block read from memory every cycle. CONSTITUTION: An L1/L2 norm parallel arithmetic device calculates in parallel the sum of absolute differences (L1 norm) or the sum of squared differences (L2 norm) between a plurality of pixel blocks cut out of the previous frame, each offset by one pixel in the horizontal direction, and a pixel block cut out of the current frame. The device holds the pixels of the previous frame in memories 7-0 to 7-3, which have a plurality of output ports and can read a plurality of data at consecutive addresses at once. It reads a plurality of pixels having consecutive addresses at once while simultaneously cutting out a pixel of the pixel block from the current frame, and broadcasts that pixel to all of a plurality of arithmetic units 22-0 to 22-3, whose number equals the number of ports.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an L1/L2 norm parallel arithmetic device that calculates in parallel the sum of absolute differences (hereinafter, the L1 norm) or the sum of squared differences (hereinafter, the L2 norm), which is the basis of the block matching required for motion compensation, one of the algorithms used in moving-picture coding.

[0002]

[Prior Art] (1) L1/L2 norm. FIG. 2 illustrates an example of the pixel blocks on which a conventional L1/L2 norm calculation operates. In this example, each pixel block is 8 × 8 pixels. The L1 and L2 norms are calculated between pixel block 1 in the current frame and the plurality of pixel blocks 3 to 5 in the previous frame 2 by equations (1) and (2) below.

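[0003]

[Equation 1]

$$\mathrm{L1}_j = \sum_{i} \left| X_j(i) - Y(i) \right| \qquad (1)$$

[0004]

[Equation 2]

$$\mathrm{L2}_j = \sum_{i} \left( X_j(i) - Y(i) \right)^2 \qquad (2)$$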

[0005] Here, Xj(i) is a pixel in the pixel blocks 3 to 5 cut out of the previous frame 2, and Y(i) is a pixel in the pixel block 1 cut out of the current frame. The index j is the number assigned to each of the previous pixel blocks. In FIG. 2, the pixel blocks j = 3, 4, 5 are offset from one another by only one pixel in the horizontal direction, so most of their pixels are shared. In practice, however, the offset between the previous pixel blocks subject to the L1 or L2 norm calculation is not limited to one pixel in the horizontal direction; the blocks may be offset by any number of pixels horizontally or vertically. The L1 and L2 norms differ only in whether, after the difference between two pixels is calculated, its absolute value is taken or it is squared, so only the L1 norm is described below.
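The two norms defined above can be illustrated with a minimal Python sketch. The function names and the sample pixel values are hypothetical and chosen only for illustration; blocks are assumed to be flattened into lists in the same pixel order.

```python
def l1_norm(prev_block, cur_block):
    """Sum of absolute differences between two equally sized pixel lists."""
    return sum(abs(x - y) for x, y in zip(prev_block, cur_block))


def l2_norm(prev_block, cur_block):
    """Sum of squared differences between two equally sized pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(prev_block, cur_block))


# Hypothetical 2x2 blocks, flattened row by row (values are made up).
X = [10, 12, 9, 7]   # pixels Xj(i) of one previous-frame block
Y = [11, 12, 6, 9]   # pixels Y(i) of the current block

print(l1_norm(X, Y))  # 1 + 0 + 3 + 2 = 6
print(l2_norm(X, Y))  # 1 + 0 + 9 + 4 = 14
```

The only difference between the two functions is the per-pixel operation applied to the difference, which is why the description treats the L1 case alone.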

[0006] Techniques relating to the L1 and L2 norms are described in, for example, K. Kikuchi, Y. Nukada, Y. Aoki, T. Kanou, Y. Endo, T. Nishitani, "A Single-Chip 16-bit 25ns Video/Image Signal Processor," ISSCC Digest of Technical Papers, pp. 170-171, Feb. 1989.

[0007] (2) First example of the prior art. FIG. 3 shows an example of a pixel block for which the L1 norm is calculated. For simplicity, the pixel block size is 4 × 4. The L1 norm is calculated for pixel block 6, the region enclosed by the broken line in the figure. X5, X6, X7, X8, X21, X22, ... correspond to X5(0), X5(1), X5(2), X5(3), X5(4), X5(5), ... in equation (1), and Y0, Y1, Y2, Y3, Y4, ... correspond to Y(0), Y(1), Y(2), Y(3), Y(4), .... FIG. 4 shows the circuit configuration of the first prior-art example, which calculates the L1 norm for pixel block 6 with four-way parallelism. The pixels of the previous frame 2 are stored in memories 7-0 to 7-3, which are organized as four banks. Addresses are interleaved across the banks: address 0 is in memory 7-0, address 1 in memory 7-1, address 2 in memory 7-2, address 3 in memory 7-3, address 4 in memory 7-0, address 5 in memory 7-1, and so on, so that the data at four consecutive addresses can be read at once. Each pixel is stored at the address equal to its subscript: X0 at address 0, X1 at address 1, X2 at address 2, and so on. The four data read from memories 7-0 to 7-3 are shifted by the 4-data rotation circuit 9 so that the data at the lowest address is input to the leftmost absolute difference calculator 11-0. The pixels of the current pixel block 1 are stored, at the addresses equal to their subscripts, in memories 8-0 to 8-3, which have the same configuration as memories 7-0 to 7-3, and the 4-data rotation circuit 9 operates on them in the same way. Consequently, as illustrated, the absolute difference calculators 11-0 to 11-3 can calculate |X5 - Y0|, |X6 - Y1|, |X7 - Y2|, |X8 - Y3| simultaneously, and the L1 norm for the previous pixel block 6 is finally obtained in accumulator 13.
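The interleaved four-bank read and the rotation step described above can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the circuit of FIG. 4: all function names are hypothetical, the frame width of 16 is inferred from X5 to X8 being followed by X21 in the text, and the sample pixel values are made up.

```python
NUM_BANKS = 4


def make_banks(pixels):
    """Store address a in bank a % 4 at row a // 4 (interleaved banking)."""
    banks = [[] for _ in range(NUM_BANKS)]
    for addr, value in enumerate(pixels):
        banks[addr % NUM_BANKS].append(value)
    return banks


def read_banks(banks, start):
    """Read one word per bank covering addresses start .. start+3.

    The result is in bank order, not address order.
    """
    out = []
    for b in range(NUM_BANKS):
        # The single address in [start, start + 3] that maps to bank b.
        addr = start + (b - start) % NUM_BANKS
        out.append(banks[b][addr // NUM_BANKS])
    return out


def rotate(words, start):
    """Rotate left so the word at the lowest address comes first."""
    k = start % NUM_BANKS
    return words[k:] + words[:k]


def l1_first_prior_art(prev_banks, cur_banks, block_start,
                       frame_width=16, size=4):
    """L1 norm of one 4x4 block: four differences per cycle, adder tree."""
    acc = 0
    for row in range(size):
        x_start = block_start + row * frame_width
        xs = rotate(read_banks(prev_banks, x_start), x_start)
        y_start = row * size
        ys = rotate(read_banks(cur_banks, y_start), y_start)
        d = [abs(x - y) for x, y in zip(xs, ys)]
        acc += (d[0] + d[1]) + (d[2] + d[3])  # tree of adders, then accumulate
    return acc


# Hypothetical data: pixel Xa has value 100 + a, so the subscripts are visible.
prev = [100 + a for a in range(64)]
banks = make_banks(prev)
print(read_banks(banks, 5))             # [108, 105, 106, 107] -> X8, X5, X6, X7
print(rotate(read_banks(banks, 5), 5))  # [105, 106, 107, 108] -> X5 .. X8
```

In this model both the previous-frame pixels and the current-block pixels pass through a four-bank read, which corresponds to the eight output ports that the description later identifies as the drawback of this configuration.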

[0008] Techniques relating to the first prior-art example are described in, for example, Minami, Yamauchi, Tashiro, Suzuki, Kasai, Takahashi, Endo, and Hamaguchi, "Data Flow Control of the Video Signal Processor IDSP," IEICE Technical Report, ICD91-12, pp. 25-32, 1991.

[0009] (3) Second example of the prior art. FIG. 5 shows a second example of the pixel blocks for which the L1 norm is calculated. In addition to pixel block 6 shown in the first example, it shows pixel blocks 15, 16, and 17, each offset by one pixel in the horizontal direction. FIG. 6 shows the circuit configuration of the second prior-art example, which calculates the L1 norms for these four pixel blocks 6, 15, 16, and 17 with four-way parallelism. The pixels of the previous frame 2 are stored, at the addresses equal to their subscripts, in memory 18, which has two output ports. The pixels of the current pixel block 1 are stored in memory 19 at the addresses equal to their subscripts. The pixels enclosed by the broken line are read from port 0 of memory 18, and the pixels enclosed by the solid line are read from port 1. Selectors 21-0, 21-1, and 21-2 select from these pixels the pixels of pixel blocks 17, 16, and 15, respectively. Registers 20-0 to 20-3 form a shift register that shifts the pixels of the current pixel block 1 read from memory 19 every cycle. The absolute difference calculators 22-0 to 22-3 therefore calculate the absolute differences between pixel blocks 17, 16, 15, and 6, respectively, and the current pixel block 1, and the L1 norms between pixel blocks 17, 16, 15, and 6 and the current pixel block 1 are obtained in accumulators 23-0 to 23-3.

[0010] Techniques relating to the second prior-art example are described in, for example, K. Yang, M. Sun, L. Wu, "A Family of VLSI Designs for the Motion Compensation Block-Matching Algorithm," IEEE Trans. on Circuits and Systems, vol. 36, pp. 1317-1325, Oct. 1989.

[0011]

[Problems to Be Solved by the Invention] In the first prior-art example, however, four-way parallel operation requires four ports to read the pixels of the previous frame 2 and four ports to read the pixels of the current pixel block 1, eight output ports in total, so a memory with a large number of output ports is needed. In addition, accumulating the absolute differences requires a path that connects the adders 12 in a tree.

[0012] In the second prior-art example, the two-port memory 18 is needed to read two pixels of the previous frame 2 simultaneously, and selectors 21-0 to 21-2 are needed to select the pixels sent to the individual arithmetic circuits. In addition, shift registers 20-0 to 20-3 are needed to shift the pixels of the current pixel block 1 read from memory 19 every cycle.

[0013] The present invention has been made to solve the above problems, and an object of the present invention is to provide a technique capable of reducing the number of memory output ports.

[0014] Another object of the present invention is to provide a technique that eliminates the need for the selectors that select the pixels sent to the individual arithmetic circuits and for the shift register that shifts the pixels of the current pixel block 1 read from memory every cycle.

[0015] The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

[0016]

[Means for Solving the Problems] To achieve the above objects, the present invention provides an L1/L2 norm parallel arithmetic device that calculates in parallel the L1 or L2 norms between a plurality of pixel blocks cut out of a previous frame, each offset by one pixel in the horizontal direction, and a pixel block cut out of a current frame, the device comprising: a memory having a plurality of output ports and capable of reading a plurality of data placed at consecutive addresses at once; pixel holding means for holding the pixels of the previous frame in the memory; means for reading a plurality of pixels having consecutive addresses from the pixel holding means at once and simultaneously cutting out a pixel of the pixel block from the current frame; and means for broadcasting the pixel of the pixel block cut out by the cutting-out means to all of a plurality of arithmetic units equal in number to the ports.

[0017] Each arithmetic unit comprises an absolute difference calculator or a squared difference calculator, and an accumulator.

[0018]

[Operation] With the above means, the pixels of the previous frame are held in a memory that is built from the multi-bank memory and data rotation circuit used in the first prior-art example and that can read a plurality of data placed at consecutive addresses at once. A plurality of pixels having consecutive addresses are read from this memory at once and sent in parallel to a plurality of arithmetic circuits, each comprising an absolute difference calculator or a squared difference calculator and an accumulator, while a pixel of the current pixel block is simultaneously broadcast to all of the arithmetic units. A single port therefore suffices for reading the pixels of the current block, the number of memory ports required is the degree of parallelism plus one, and the port count is greatly reduced compared with the first prior-art example. Nor is there any need to connect adders in a tree to accumulate the absolute differences.
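The port-count argument can be made concrete with a small Python calculation. The text gives 8 ports for the first prior-art example at four-way parallelism and "degree of parallelism plus one" for the invention; extending the prior-art figure to twice the degree of parallelism for other values is an assumption made here for illustration, and the function names are hypothetical.

```python
def ports_first_prior_art(parallelism):
    """Assumed generalization: P ports for previous-frame pixels
    plus P ports for current-block pixels (8 when P = 4, as in the text)."""
    return 2 * parallelism


def ports_broadcast_scheme(parallelism):
    """P ports for previous-frame pixels plus one port for the
    current-block pixel that is broadcast to every arithmetic unit."""
    return parallelism + 1


for p in (4, 8, 16):
    print(p, ports_first_prior_art(p), ports_broadcast_scheme(p))
```

For the four-way case in the text this gives 8 ports against 5.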

[0019] Furthermore, the selectors that select the pixels sent to the individual arithmetic circuits and the shift register that shifts the pixels of the current pixel block read from memory every cycle, both required in the second prior-art example, become unnecessary. Each memory may also have only one output port, which removes the restriction to two-port memory.

[0020]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

[0021] FIG. 1 is a block diagram showing the configuration of the L1/L2 norm parallel arithmetic device according to an embodiment of the present invention. As in the description of the second prior-art example, it shows the case where the L1 norms are calculated for pixel blocks 6, 15, 16, and 17 of FIG. 5, each offset by one pixel in the horizontal direction.

[0022] In FIG. 1, 7-0 to 7-3 are the four-bank memories holding the pixels of the previous frame, 9 is the 4-data rotation circuit, 10 is a register, 12 is an adder, 14 is the memory to which the L1 norms are written, 19 is the memory holding the pixels of the current pixel block, 22-0 to 22-3 are absolute difference calculators, 23-0 to 23-3 are accumulators, and 24-1 to 24-3 and 25-1 to 25-3 are 2-to-1 selectors.

[0023] First, the four pixels X8, X5, X6, X7 are read from the four-bank memories 7-0 to 7-3 used in the first prior-art example. The 4-data rotation circuit 9, likewise used in the first prior-art example, rearranges them in ascending address order X5, X6, X7, X8 and sends them to the absolute difference calculators 22-0 to 22-3.

[0024] Pixel Y0 of the current pixel block 1, read from memory 19, is broadcast to the absolute difference calculators 22-0 to 22-3. Next, the four pixels X8, X9, X6, X7 are read from memories 7-0 to 7-3, rearranged by the data rotation circuit 9 in ascending address order X6, X7, X8, X9, and sent to the absolute difference calculators 22-0 to 22-3. Pixel Y1 is broadcast from memory 19 to the absolute difference calculators 22-0 to 22-3. Continuing in the same way, the absolute difference calculators 22-0 to 22-3 calculate the absolute differences between the pixels of pixel blocks 6, 15, 16, and 17, respectively, and the pixels of the current pixel block 1, and the L1 norms are obtained in accumulators 23-0 to 23-3. To calculate L1 norms without interruption, the accumulators 23-1 to 23-3 must be duplicated, as shown in FIG. 1, so that a completed L1 norm is not overwritten before it is written to memory 14.
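The cycle-by-cycle behavior of the embodiment can be modeled with a short Python sketch. It is an illustration under stated assumptions rather than the circuit of FIG. 1: the function names are hypothetical, the frame width of 16 is inferred from the pixel numbering in the text, the four-bank memory and rotation circuit are replaced by a plain slice that returns four consecutive addresses, and the pixel values are made up.

```python
FRAME_WIDTH = 16  # inferred: X5..X8 is followed by X21 in the description
BLOCK = 4         # 4x4 pixel block, as in the embodiment
PARALLEL = 4      # number of arithmetic units (and of memory banks)


def read_consecutive(prev_frame, start):
    """Stand-in for the 4-bank memory plus rotation circuit:
    returns the pixels at addresses start .. start+3 in address order."""
    return prev_frame[start:start + PARALLEL]


def parallel_l1(prev_frame, cur_block, base):
    """Return the L1 norms of the 4 blocks whose top-left pixels sit at
    addresses base, base+1, base+2, base+3 of the previous frame."""
    acc = [0] * PARALLEL                    # one accumulator per unit
    for k, y in enumerate(cur_block):       # one current pixel Yk per cycle
        row, col = divmod(k, BLOCK)
        xs = read_consecutive(prev_frame, base + row * FRAME_WIDTH + col)
        for j, x in enumerate(xs):          # Yk is broadcast to every unit
            acc[j] += abs(x - y)
    return acc


def direct_l1(prev_frame, cur_block, top_left):
    """Reference: L1 norm of a single block, computed pixel by pixel."""
    total = 0
    for k, y in enumerate(cur_block):
        row, col = divmod(k, BLOCK)
        total += abs(prev_frame[top_left + row * FRAME_WIDTH + col] - y)
    return total


# Hypothetical pixel data for a 4-row, 16-column slice of the previous frame.
prev_frame = [(7 * a) % 251 for a in range(4 * FRAME_WIDTH)]
cur_block = [(3 * k + 1) % 17 for k in range(BLOCK * BLOCK)]

norms = parallel_l1(prev_frame, cur_block, base=5)
assert norms == [direct_l1(prev_frame, cur_block, 5 + j) for j in range(4)]
print(norms)
```

With base=5, cycle 0 reads X5 to X8 against Y0 and cycle 1 reads X6 to X9 against Y1, matching the sequence in the text. Unit j always sees the previous-frame pixel offset by j from the one paired with Yk, so each accumulator ends up holding the norm of a block shifted by j pixels without any selector or shift register.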

[0025] As is clear from the above description, according to this embodiment a single port suffices for reading the pixels of the current block, the number of memory ports required is the degree of parallelism plus one, and the port count is greatly reduced compared with the first prior-art example. Nor is there any need to connect the adders 12 in a tree to accumulate the absolute differences.

[0026] Furthermore, the selectors 21-0 to 21-2 that select the pixels sent to the individual arithmetic circuits and the shift registers 20-0 to 20-3 that shift the pixels of the current pixel block 1 read from memory 19 every cycle, both required in the second prior-art example, become unnecessary. Each memory may also have only one output port, which removes the restriction to two-port memory.

[0027] For simplicity, the above embodiment describes only the case of four-way parallel operation on 4 × 4 pixel blocks, but the present invention can be applied to any degree of parallelism and any pixel block size.

[0028] The present invention has been described specifically on the basis of an embodiment, but it is of course not limited to that embodiment and may be modified in various ways without departing from its gist.

[0029]

[Effects of the Invention] As described above, according to the present invention a single port suffices for reading the pixels of the current block, the number of memory ports required is the degree of parallelism plus one, and the port count is greatly reduced compared with the first prior-art example. Nor is there any need to connect adders in a tree to accumulate the absolute differences.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an L1/L2 norm parallel arithmetic device for full search according to an embodiment of the present invention.

FIG. 2 is a diagram showing the pixel blocks in the previous frame and the current block, for explaining the formulas for the L1 and L2 norms.

FIG. 3 is a diagram showing the pixel block subject to calculation, for explaining the first prior-art example.

FIG. 4 is a circuit configuration diagram for explaining the first prior-art example.

FIG. 5 is a diagram showing the pixel blocks subject to calculation, for explaining the second prior-art example.

FIG. 6 is a circuit configuration diagram for explaining the second prior-art example.

[Explanation of symbols]

1: current pixel block; 2: previous frame; 3, 4, 5, 6, 15, 16, 17: pixel blocks in the previous frame; 7-0 to 7-3: four-bank memories holding pixels of the previous frame; 8-0 to 8-3: four-bank memories holding pixels of the current pixel block; 9: 4-data rotation circuit; 10: register; 11-0 to 11-3, 22-0 to 22-3: absolute difference calculators; 12: adder; 13, 23-0 to 23-3: accumulators; 14: memory for writing L1 norms; 18: two-port memory holding pixels of the previous frame; 19: memory holding pixels of the current pixel block; 20-0 to 20-3: registers forming a 4-stage shift register; 21-0 to 21-2, 24-1 to 24-3, 25-1 to 25-3: 2-to-1 selectors.

Continuation of front page: (51) Int. Cl.5: G06F 15/70; identification symbol: 410; JPO file number: 9071-5L

Claims

[Claims]

1. A sum-of-absolute-differences / sum-of-squared-differences parallel arithmetic device that calculates in parallel the sum of absolute differences or the sum of squared differences between a plurality of pixel blocks cut out of a previous frame, each offset by one pixel in the horizontal direction, and a pixel block cut out of a current frame, the device comprising: a memory having a plurality of output ports and capable of reading a plurality of data placed at consecutive addresses at once; pixel holding means for holding pixels of the previous frame in the memory; means for reading a plurality of pixels having consecutive addresses from the pixel holding means at once and simultaneously cutting out a pixel of the pixel block from the current frame; and means for broadcasting the pixel of the pixel block cut out by the cutting-out means to all of a plurality of arithmetic units equal in number to the ports.

2. The sum-of-absolute-differences / sum-of-squared-differences parallel arithmetic device according to claim 1, wherein each arithmetic unit comprises an absolute difference calculator or a squared difference calculator, and an accumulator.