JPH03237887A

JPH03237887A - DCT processor unit

Info

Publication number: JPH03237887A
Application number: JP2034310A
Authority: JP
Inventors: Mikio Fujiwara; 藤原　美貴雄; Takayuki Minemaru; 貴行峯丸; Hisashi Takayama; 久高山
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-02-14
Filing date: 1990-02-14
Publication date: 1991-10-23
Anticipated expiration: 2014-06-23
Also published as: JP2910124B2

Abstract

PURPOSE: To complete one-dimensional DCT (discrete cosine transform) processing by dividing the bit length into pieces of a specific bit length, computing the partial products in parallel at that bit length, and finally summing the intermediate results. CONSTITUTION: The DCT processor has a 14-bit picture signal input u(j) 2, 14-bit data registers 3-10, and 14-bit picture signals {u(n), n = j mod 8, n = 0 to 7} 11-18. Bit-serial arithmetic sections 19-22 use shift registers to perform addition and subtraction bit-serially. The M-bit length is divided into L-bit lengths satisfying L < N, the partial products are computed in parallel at L-bit length, and the intermediate results are added at the end. Thus, for N = 8 and J = 2, 8×1 one-dimensional DCT processing is realized within a period of 8 sampling clocks, and internal arithmetic accuracy of up to M = 14 bits is ensured without using a multiplier.

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a DCT (Discrete Cosine Transform) processing device used in the 64 kbit/s image codec processing that is being standardized by the CCITT for moving-picture bandwidth compression in video conferencing systems and video telephones.

[Prior art]

When a DCT is performed on an N×N pixel block in which each pixel datum is M bits long, there is an advantage that, unlike filtering and similar processing, the processing in one dimension only has to be completed within the data access period of N pixels. A method that exploits this advantage by performing the arithmetic bit-serially has been published as a distributed arithmetic technique in, for example, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-22, pp. 456-462, Dec. 1974 ("A new hardware realization of digital filters," by A. Peled and B. Liu).

This processing technique computes an operation on M-bit data by focusing on the subset of operations that belongs to the i-th bit, applies a digit correction of $2^{(i-1)}$ (counting bits from 1) to each such result, and adds the corrected results to obtain the final value. Applied to DCT processing, the technique works as follows. Consider a sequence of N integer data, each M bits long, with negative numbers expressed in two's complement:

$$u(n) = \sum_{i=0}^{M-1} a_i(n)\,2^{i}, \quad a_{M-1}(n) \in \{0,-1\}, \quad a_i(n) \in \{0,1\}\ (0 \le i \le M-2), \quad 0 \le n \le N-1$$

The one-dimensional DCT of this sequence can be expressed by equations (1-1) to (1-3):

$$v(k) = \alpha(k) \sum_{n=0}^{N-1} u(n) \cos\left[\frac{\pi(2n+1)k}{2N}\right], \quad 0 \le k \le N-1 \tag{1-1}$$

$$\alpha(0) = \left(\frac{1}{N}\right)^{1/2} \tag{1-2}$$

$$\alpha(k) = \left(\frac{2}{N}\right)^{1/2}, \quad 1 \le k \le N-1 \tag{1-3}$$

Substituting the binary expansion of u(n) into equation (1-1) gives equation (1-4):

$$v(k) = \alpha(k) \sum_{n=0}^{N-1} \sum_{i=0}^{M-1} a_i(n)\,2^{i} \cos\left[\frac{\pi(2n+1)k}{2N}\right] \tag{1-4}$$

Collecting the terms by the sum over i gives:

$$v(k) = \sum_{i=0}^{M-1} 2^{i} \left\{ \sum_{n=0}^{N-1} a_i(n)\,\alpha(k) \cos\left[\frac{\pi(2n+1)k}{2N}\right] \right\}, \quad 0 \le k \le N-1 \tag{1-5}$$

In equation (1-5), each a_i(n) is 1-bit data (0 or 1, or 0 or -1), so every possible value of the expression inside the braces { } can be prepared in advance in a ROM. The operation inside the braces can therefore be carried out with additions and subtractions only, without multiplication, which has the advantage of a smaller chip size than a design with parallel multipliers when implemented as an integrated circuit.

Furthermore, when the DCT size N is even, writing N = 2N' lets equation (1-5) be expressed as follows:

$$v(k) = \sum_{i=0}^{M-1} 2^{i} \left\{ \sum_{n=0}^{N'-1} a_i(n)\,\alpha(k) \cos\left[\frac{\pi(2n+1)k}{4N'}\right] + \sum_{n=0}^{N'-1} a_i(2N'-1-n)\,\alpha(k) \cos\left[\frac{\pi(4N'-2n-1)k}{4N'}\right] \right\} \tag{1-7}$$

The cos(·) factor in the second term can be transformed as follows:

$$\cos\left[\frac{\pi(4N'-2n-1)(2k')}{4N'}\right] = \cos\left[\frac{\pi(2n+1)(2k')}{4N'}\right] \tag{1-8}$$

$$\cos\left[\frac{\pi(4N'-2n-1)(2k'+1)}{4N'}\right] = -\cos\left[\frac{\pi(2n+1)(2k'+1)}{4N'}\right] \tag{1-9}$$

Using equations (1-8) and (1-9) to rewrite equation (1-7) separately for even and odd k gives the following. For k = 2k', 0 ≤ k' ≤ N'-1:

$$v(2k') = \sum_{i=0}^{M-1} 2^{i} \left\{ \sum_{n=0}^{N'-1} \bigl(a_i(n) + a_i(2N'-1-n)\bigr)\,\alpha(2k') \cos\left[\frac{\pi(2n+1)(2k')}{4N'}\right] \right\} \tag{1-10}$$

Similarly, for k = 2k'+1, 0 ≤ k' ≤ N'-1:

$$v(2k'+1) = \sum_{i=0}^{M-1} 2^{i} \left\{ \sum_{n=0}^{N'-1} \bigl(a_i(n) - a_i(2N'-1-n)\bigr)\,\alpha(2k'+1) \cos\left[\frac{\pi(2n+1)(2k'+1)}{4N'}\right] \right\} \tag{1-11}$$

Equations (1-10) and (1-11) show that, by exploiting the symmetry of the DCT kernel cos[π(2n+1)k/2N], the ROM capacity needed to hold the α(k)cos[·] products can be reduced from 2^N words to 2^(N/2) words for each k. However, because carries and borrows arise from the terms (a_i(n) + a_i(2N'-1-n)) and (a_i(n) - a_i(2N'-1-n)), the number of additions over i becomes (M+1).

In this way, the scheme evaluates the expression inside the braces { } by ROM lookup of the DCT kernel while saving ROM capacity through the symmetry of the kernel. The arithmetic itself needs only additions and subtractions, with no multiplications. These features give the advantage that, in an integrated circuit, the chip can be made smaller than with parallel multipliers.

[Problem to be solved by the invention]

However, assume a synchronous system in which the sampling time of one pixel is one basic clock period, and in which one addition or one ROM access can be performed per clock period. In such a system, when the bit length M is larger than the DCT processing unit N, the processing cannot be completed as it stands. This is not a problem for N = 16 or more, but for the N = 8 DCT used in the 64 kbit/s image codec processing being standardized by the CCITT, the word length is limited to M ≤ 8 bits, so sufficient precision cannot be obtained in the intermediate processing stage.

In view of this point, the object of the present invention is to provide, at low cost, an N×N DCT processing device that completes N×1 one-dimensional DCT processing with a precision of M > N bits within a period of N sampling clocks.

[Means for solving the problem]

To solve the above problem, the DCT processing device of the present invention divides the M-bit length into L-bit lengths satisfying L < N, executes the partial-product computations in parallel at L-bit length, and finally adds those intermediate results.

[Operation]

With the above configuration, the partial products are computed in parallel at L-bit length, so the intermediate sums are generated in parallel and the computation runs faster. Processing therefore completes within N sampling clocks even when the bit length M is larger than the DCT processing unit N.

[Embodiment]

An embodiment of the DCT processing device of the present invention is described below with reference to the drawings. Fig. 1 is a block diagram of an 8×1 one-dimensional DCT processing device for a 14-bit image signal input u(j) in one embodiment of the present invention. In the figure, 2 is the 14-bit image signal input u(j); 3 to 10 are 14-bit data registers; and 11 to 18 are the 14-bit image signals {u(n), n = j mod 8, n = 0 to 7}. 19 to 22 are bit-serial arithmetic units, which use shift registers to perform addition and subtraction bit-serially. 23 to 38 are the 1-bit signals that result from the bit-serial operations of units 19 to 22, and 39 to 42 are data lines that bundle the 1-bit results 23 to 38 in groups of 4 bits. 43 to 46 are coefficient multiplication units built from ROMs and adders: taking the 4-bit data on data lines 39 to 42 as address information, they generate the partial products of the coefficient-by-data multiplication from ROM, apply a left shift to those values, and accumulate them. 47 to 54 are the 33-bit output signals of the 8×1 DCT result {v(k), k = 0 to 7}. 55 to 62 are 33-bit tri-state drivers, which perform parallel-to-serial conversion of the output data. 63 is the 33-bit signal output that has been time-multiplexed by the operation of tri-state drivers 55 to 62.

Fig. 2 is a circuit diagram of the bit-serial arithmetic units 19 to 22 of Fig. 1. 65 is the 14-bit image signal u(n), and 66 is the 14-bit image signal u(7-n). 67 and 68 are 14-bit right shifters with a data-load function whose upper 7 bits and lower 7 bits are independent; they perform the bit-by-bit processing needed for bit-serial arithmetic. 69 and 70 are 1-bit full adders, and 71 and 72 are 1-bit full subtractors. 73 to 76 are 1-bit data latches that hold the carries generated in the 1-bit full adders 69 and 70 and the borrows generated in the 1-bit full subtractors 71 and 72. 77 to 80 are the 1-bit result signals, which are used as address information when the partial products of the coefficient multiplication are read from ROM.

Fig. 3 is a circuit diagram of the ROM-and-adder coefficient multiplication units 43 to 46 of Fig. 1. 82 to 85 are the 4-bit data used as address information when the partial products of the coefficient multiplication are read from ROM. 86 to 89 are ROMs of 16 words × 18 bits that generate the partial products of the coefficient multiplication. 90 to 93 are 26-bit full adders; 94 to 97 are 26-bit right shifters with a data-load function; 98 and 99 are 33-bit full adders; 100 and 101 are 33-bit registers; 102 is the 33-bit output signal v(2k'); and 103 is the 33-bit output signal v(2k'+1).

The operation of the 8×1 one-dimensional DCT processing is explained with reference to Figs. 1, 2 and 3. In the present invention, one pixel datum of M > 8 bits is divided into L-bit data before processing. For example, when M-bit data is divided into J pieces of L-bit data, equation (1-5) can be rewritten as follows:

$$v(k) = \sum_{j=0}^{J-1} 2^{jL} \sum_{i=0}^{L-1} 2^{i} \left\{ \sum_{n=0}^{N-1} a_{i+jL}(n)\,\alpha(k) \cos\left[\frac{\pi(2n+1)k}{2N}\right] \right\} \tag{1-12}$$

This expression is the sum of J partial terms, and each partial term is evaluated with L additions. If the total of the time for L additions and the time to add the J terms is shorter than the sampling time of N data, the desired high-speed processing is achieved. As an example, consider N = 8 and J = 2. High-speed processing is then possible for M-bit data that satisfies:

$$8 \ge \operatorname{trunc}(M/2 + 0.5) + 1, \quad \operatorname{trunc}(\cdot):\ \text{truncation} \tag{1-13}$$

Hence M ≤ 14. In addition, to reduce ROM capacity as in the prior art, applying equations (1-10) and (1-11) to equation (1-12) gives equations (1-14) and (1-15). For k = 2k', 0 ≤ k' ≤ 3:

$$v(2k') = \sum_{i=0}^{6} 2^{i+7} \left\{ \sum_{n=0}^{3} \bigl(a_{i+7}(n) + a_{i+7}(7-n)\bigr)\,\alpha(2k') \cos\left[\frac{\pi(2n+1)(2k')}{16}\right] \right\} + \sum_{i=0}^{6} 2^{i} \left\{ \sum_{n=0}^{3} \bigl(a_i(n) + a_i(7-n)\bigr)\,\alpha(2k') \cos\left[\frac{\pi(2n+1)(2k')}{16}\right] \right\} \tag{1-14}$$

For k = 2k'+1, 0 ≤ k' ≤ 3:

$$v(2k'+1) = \sum_{i=0}^{6} 2^{i+7} \left\{ \sum_{n=0}^{3} \bigl(a_{i+7}(n) - a_{i+7}(7-n)\bigr)\,\alpha(2k'+1) \cos\left[\frac{\pi(2n+1)(2k'+1)}{16}\right] \right\} + \sum_{i=0}^{6} 2^{i} \left\{ \sum_{n=0}^{3} \bigl(a_i(n) - a_i(7-n)\bigr)\,\alpha(2k'+1) \cos\left[\frac{\pi(2n+1)(2k'+1)}{16}\right] \right\} \tag{1-15}$$

With this configuration, 8×1 one-dimensional DCT processing can be realized with a precision of M = 14 bits within a period of 8 pixel sampling clocks.

The operation of the 8×1 one-dimensional DCT processing in Fig. 1 is as follows. Because the 14-bit image signal input u(j) 2 is DCT-processed in subsets of 8 pixels, it is divided and held in the 14-bit registers 3 to 10 as {u(n), n = j mod 8, 0 ≤ n ≤ 7}. Until this subset of eight data {u(n), 0 ≤ n ≤ 7} has been completely updated, the 14-bit registers 3 to 10 perform one shift operation per data sampling, passing the data along in sequence.
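As a numerical sanity check on equations (1-12), (1-14) and (1-15), the sketch below evaluates an 8-point DCT twice: once directly from equation (1-1), and once from the bit-plane form with the 14-bit word split into J = 2 pieces of L = 7 bits and the kernel symmetry folded in. The function names (`kernel`, `bit`, `dct_direct`, `dct_split`) are illustrative and do not come from the patent, and floating-point arithmetic stands in for the fixed-point ROM words.

```python
import math

N, M, L = 8, 14, 7  # block size, word length, split length (J = M // L = 2)

def alpha(k):
    # Normalisation of equations (1-2) and (1-3).
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def kernel(n, k):
    # alpha(k) * cos[pi (2n+1) k / 2N], the factor that the ROM would hold.
    return alpha(k) * math.cos(math.pi * (2 * n + 1) * k / (2 * N))

def bit(u, i):
    # Two's-complement digit a_i: 0/1 for i < M-1, 0/-1 for the sign bit.
    b = (u >> i) & 1
    return -b if i == M - 1 else b

def dct_direct(u):
    # Equation (1-1).
    return [sum(u[n] * kernel(n, k) for n in range(N)) for k in range(N)]

def dct_split(u):
    # Equations (1-14) and (1-15): sum for even k, difference for odd k.
    out = []
    for k in range(N):
        sign = 1 if k % 2 == 0 else -1
        total = 0.0
        for j in range(M // L):          # the J partial terms
            for i in range(L):           # L bit planes per partial term
                p = i + j * L
                plane = sum(
                    (bit(u[n], p) + sign * bit(u[N - 1 - n], p)) * kernel(n, k)
                    for n in range(N // 2)
                )
                total += (2 ** p) * plane
        out.append(total)
    return out
```

Both functions should agree to within rounding error for any eight samples in the 14-bit two's-complement range, which is what the regrouping in the equations predicts.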

That is, every eight data samplings, a new subset of data is set in the 14-bit registers 3 to 10 as u(7), ..., u(0). These data are then supplied through the 14-bit signal lines 11 to 18 to the bit-serial arithmetic units 19 to 22, respectively.

The processing in the bit-serial arithmetic units 19 to 22 is explained with reference to Fig. 2. The 14-bit image inputs 65 and 66 are data from two of the 14-bit registers 3 to 10 of Fig. 1 and are expressed in two's complement:

$$u(n) = \sum_{i=0}^{13} a_i(n)\,2^{i}, \quad u(7-n) = \sum_{i=0}^{13} a_i(7-n)\,2^{i}$$

$$a_{13}(n),\ a_{13}(7-n) \in \{0,-1\}, \quad a_i(n),\ a_i(7-n) \in \{0,1\}\ (0 \le i \le 12), \quad 0 \le n \le 3$$

These data are loaded into the 14-bit right shifters with data-load function 67 and 68, whose upper 7 bits and lower 7 bits are independent. The upper and lower 7 bits are processed separately, and one right shift toward the LSB is performed per clock period. The signals output from the right shifters 67 and 68 are the 1-bit values at the 2^i digit of the upper 7 bits and lower 7 bits of u(n) and u(7-n), namely a_i(n), a_{i+7}(n), a_i(7-n) and a_{i+7}(7-n). From these signals, the 1-bit full adders 69 and 70 and the 1-bit full subtractors 71 and 72 execute the operations (a_{i+7}(n) + a_{i+7}(7-n)), (a_i(n) + a_i(7-n)), (a_{i+7}(n) - a_{i+7}(7-n)) and (a_i(n) - a_i(7-n)) of equations (1-14) and (1-15). The carries and borrows generated by these operations are held in the 1-bit latches 73 to 76 and fed back into the same 1-bit full adders 69 and 70 and 1-bit full subtractors 71 and 72, so that they are used in the operation one clock later. The results of the 1-bit full adders 69 and 70 are output to the 1-bit data lines 77 and 78, respectively, and the results of the 1-bit full subtractors 71 and 72 are output to the 1-bit data lines 79 and 80, respectively.

As explained for Fig. 2, the bit-serial arithmetic units 19 to 22 execute the operations (a_{i+7}(n) + a_{i+7}(7-n)), (a_i(n) + a_i(7-n)), (a_{i+7}(n) - a_{i+7}(7-n)) and (a_i(n) - a_i(7-n)) on the right-hand sides of equations (1-14) and (1-15). Unit 19 performs them for u(0) and u(7), unit 20 for u(1) and u(6), unit 21 for u(2) and u(5), and unit 22 for u(3) and u(4). As a result, of the 4-bit data lines 39 to 42 fed by the bit-serial arithmetic units 19 to 22, data line 39 carries {(a_{i+7}(n) + a_{i+7}(7-n)), n = 0, 1, 2, 3}, data line 40 carries {(a_i(n) + a_i(7-n)), n = 0, 1, 2, 3}, data line 41 carries {(a_{i+7}(n) - a_{i+7}(7-n)), n = 0, 1, 2, 3}, and data line 42 carries {(a_i(n) - a_i(7-n)), n = 0, 1, 2, 3}.

To explain the meaning of these 4-bit signals in more detail, return to equations (1-14) and (1-15). Expanding the sums over n in equations (1-14) and (1-15) gives the following. For k = 2k', 0 ≤ k' ≤ 3:

$$\begin{aligned} v(2k') ={} & \sum_{i=0}^{6} 2^{i+7} \Bigl\{ \bigl(a_{i+7}(0)+a_{i+7}(7)\bigr)\alpha(2k')\cos\left[\tfrac{\pi(2k')}{16}\right] + \bigl(a_{i+7}(1)+a_{i+7}(6)\bigr)\alpha(2k')\cos\left[\tfrac{3\pi(2k')}{16}\right] \\ & \qquad + \bigl(a_{i+7}(2)+a_{i+7}(5)\bigr)\alpha(2k')\cos\left[\tfrac{5\pi(2k')}{16}\right] + \bigl(a_{i+7}(3)+a_{i+7}(4)\bigr)\alpha(2k')\cos\left[\tfrac{7\pi(2k')}{16}\right] \Bigr\} \\ & + \sum_{i=0}^{6} 2^{i} \Bigl\{ \bigl(a_{i}(0)+a_{i}(7)\bigr)\alpha(2k')\cos\left[\tfrac{\pi(2k')}{16}\right] + \bigl(a_{i}(1)+a_{i}(6)\bigr)\alpha(2k')\cos\left[\tfrac{3\pi(2k')}{16}\right] \\ & \qquad + \bigl(a_{i}(2)+a_{i}(5)\bigr)\alpha(2k')\cos\left[\tfrac{5\pi(2k')}{16}\right] + \bigl(a_{i}(3)+a_{i}(4)\bigr)\alpha(2k')\cos\left[\tfrac{7\pi(2k')}{16}\right] \Bigr\} \end{aligned} \tag{1-16}$$

For k = 2k'+1, 0 ≤ k' ≤ 3:

$$\begin{aligned} v(2k'+1) ={} & \sum_{i=0}^{6} 2^{i+7} \Bigl\{ \bigl(a_{i+7}(0)-a_{i+7}(7)\bigr)\alpha(2k'+1)\cos\left[\tfrac{\pi(2k'+1)}{16}\right] + \bigl(a_{i+7}(1)-a_{i+7}(6)\bigr)\alpha(2k'+1)\cos\left[\tfrac{3\pi(2k'+1)}{16}\right] \\ & \qquad + \bigl(a_{i+7}(2)-a_{i+7}(5)\bigr)\alpha(2k'+1)\cos\left[\tfrac{5\pi(2k'+1)}{16}\right] + \bigl(a_{i+7}(3)-a_{i+7}(4)\bigr)\alpha(2k'+1)\cos\left[\tfrac{7\pi(2k'+1)}{16}\right] \Bigr\} \\ & + \sum_{i=0}^{6} 2^{i} \Bigl\{ \bigl(a_{i}(0)-a_{i}(7)\bigr)\alpha(2k'+1)\cos\left[\tfrac{\pi(2k'+1)}{16}\right] + \bigl(a_{i}(1)-a_{i}(6)\bigr)\alpha(2k'+1)\cos\left[\tfrac{3\pi(2k'+1)}{16}\right] \\ & \qquad + \bigl(a_{i}(2)-a_{i}(5)\bigr)\alpha(2k'+1)\cos\left[\tfrac{5\pi(2k'+1)}{16}\right] + \bigl(a_{i}(3)-a_{i}(4)\bigr)\alpha(2k'+1)\cos\left[\tfrac{7\pi(2k'+1)}{16}\right] \Bigr\} \end{aligned} \tag{1-17}$$

Thus, once k' is fixed, the operation for each 2^i digit in equation (1-16) is uniquely determined by the 4-bit data {(a_{i+7}(n) + a_{i+7}(7-n)), n = 0, 1, 2, 3} and the 4-bit data {(a_i(n) + a_i(7-n)), n = 0, 1, 2, 3}. The same holds for equation (1-17). It is therefore easy to use these 4-bit signals as address information and to read the result that corresponds to each address from a ROM.

Accordingly, the 4-bit data on data line 39 is used as the address information for obtaining the value of Σ(a_{i+7}(n) + a_{i+7}(7-n)) α(2k') cos[π(2n+1)(2k')/16] at the 2^(i+7) digit in equation (1-14), and is input to the ROM-and-adder coefficient multiplication units 43 to 46. Similarly, the 4-bit data on data line 40 is used as the address information for the value of Σ(a_i(n) + a_i(7-n)) α(2k') cos[π(2n+1)(2k')/16] at the 2^i digit in equation (1-14); the 4-bit data on data line 41 is used as the address information for the value of Σ(a_{i+7}(n) - a_{i+7}(7-n)) α(2k'+1) cos[π(2n+1)(2k'+1)/16] at the 2^(i+7) digit in equation (1-15); and the 4-bit data on data line 42 is used as the address information for the value of Σ(a_i(n) - a_i(7-n)) α(2k'+1) cos[π(2n+1)(2k'+1)/16] at the 2^i digit in equation (1-15). All of these are input to the coefficient multiplication units 43 to 46.

Next, the processing inside the ROM-and-adder coefficient multiplication units 43 to 46 is explained with reference to Fig. 3. In Fig. 3, the 4-bit signal 82 is the address information for the 2^(i+7) digit term of equation (1-14) and is input through 4-bit data line 39. Similarly, the 4-bit signal 83 is the address information for the 2^i digit term of equation (1-14) and is input through 4-bit data line 40. The 4-bit signal 84 is the address information for the 2^(i+7) digit term of equation (1-15) and is input through 4-bit data line 41. The 4-bit signal 85 is the address information for the 2^i digit term of equation (1-15) and is input through 4-bit data line 42.

The 16 word × 18 bit ROM 86 receives the 4-bit signal 82 as address information and outputs the corresponding value as 18-bit data. In the same way, the 16 word × 18 bit ROM 87 outputs 18-bit data for the 4-bit signal 83, the 16 word × 18 bit ROM 88 for the 4-bit signal 84, and the 16 word × 18 bit ROM 89 for the 4-bit signal 85.

Next, the 26-bit full adders 90 to 93 and the 26-bit right shifters with data-load function 94 to 97 work as four 26-bit accumulators. The 18-bit output data from ROMs 86 to 89 are applied to the 18 MSB-side bits of one input of the 26-bit full adders 90 to 93. The sums from the 26-bit full adders 90 to 93 are shifted by 1 bit toward the LSB (to the right) in the 26-bit right shifters 94 to 97 and are added to the outputs of ROMs 86 to 89 in the next clock period. In this operation, however, when i = 0 the data fed from shifters 94 to 97 into the 26-bit full adders 90 to 93 is initialized to 0. With this operation, the four partial terms of equations (1-14) and (1-15) are computed in 8 clock periods.

The 33-bit full adders 98 and 99 add the outputs of the 26-bit shifters 94 to 97. At the time of this addition the outputs of the 26-bit shifters 94 and 96 are digit-corrected by 2^7, which yields the values v(2k') and v(2k'+1) of equations (1-14) and (1-15). The results are set in the 33-bit registers 100 and 101. The 33-bit registers 100 and 101 hold their current values until v(2k') and v(2k'+1) have been computed for a new subset during the next 8 clocks.

Returning to Fig. 1, the data 102 and 103 from the 33-bit registers 100 and 101 of Fig. 3 correspond to 47 to 54 in Fig. 1. Together with the output signals of the other three blocks, they form the DCT-processed signal sequence {v(k), 0 ≤ k ≤ 7}.
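The data path of Figs. 2 and 3 can be modelled in a few lines. The following is a minimal behavioural sketch, not a description of the actual circuit: it assumes that each 14-bit sample is split into a signed upper 7-bit half and an unsigned lower 7-bit half, that the serial operations run for 8 clocks with the eighth clock sign-extending (or zero-extending) the halves, and that the last serial bit is weighted negatively whenever the result is signed. The names `make_rom`, `serial_addsub`, `accumulate` and `dct8_bit_serial` are hypothetical, floats replace the 18-bit ROM words, and the weighting by powers of two replaces the right-shifting accumulator.

```python
import math

N = 8

def kernel(n, k):
    a = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return a * math.cos(math.pi * (2 * n + 1) * k / (2 * N))

def make_rom(k):
    # 16-word table: word[addr] = sum of kernel(n, k) over the set bits n of addr.
    return [sum(kernel(n, k) for n in range(4) if (addr >> n) & 1)
            for addr in range(16)]

def serial_addsub(x, y, subtract, clocks=8):
    # LSB-first 1-bit full adder or full subtractor with a carry/borrow latch.
    latch, out = 0, []
    for i in range(clocks):
        a, b = (x >> i) & 1, (y >> i) & 1
        out.append(a ^ b ^ latch)
        if subtract:
            latch = ((1 - a) & b) | ((1 - (a ^ b)) & latch)   # borrow out
        else:
            latch = (a & b) | ((a ^ b) & latch)               # carry out
    return out

def accumulate(pairs, k, subtract, signed):
    # Four serial streams form a 4-bit ROM address on every clock.
    rom = make_rom(k)
    streams = [serial_addsub(x, y, subtract) for x, y in pairs]
    acc = 0.0
    for i in range(8):
        addr = sum(streams[n][i] << n for n in range(4))
        weight = -(1 << i) if (signed and i == 7) else (1 << i)
        acc += weight * rom[addr]
    return acc

def dct8_bit_serial(u):
    hi = [(u[n] >> 7, u[7 - n] >> 7) for n in range(4)]       # signed upper halves
    lo = [(u[n] & 0x7F, u[7 - n] & 0x7F) for n in range(4)]   # unsigned lower halves
    out = []
    for k in range(N):
        sub = (k % 2 == 1)                 # differences for odd k, sums for even k
        upper = accumulate(hi, k, sub, signed=True)
        lower = accumulate(lo, k, sub, signed=sub)
        out.append(upper * (1 << 7) + lower)   # 2^7 digit correction, then add
    return out
```

Under these assumptions the result matches a direct evaluation of equation (1-1) to within rounding error, while each output uses only 8 ROM lookups per half and no multiplier beyond the fixed power-of-two weights.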

The 33-bit output signal sequence {v(k), 0 ≤ k ≤ 7} is time-multiplexed by the tri-state drivers 55 to 62 and output from output terminal 63.

Fig. 4 shows the schematic configuration of an adaptive DCT processing device according to an embodiment of the present invention. 104 is a control signal input terminal; 105 is a data strobe signal input terminal; 106 is a 14-bit image signal input terminal; 107 is a 14-bit reference image signal input terminal; 108 is a subtractor; 109 is a clipping circuit; 110 is a timing signal generation circuit for the 8×1 one-dimensional DCT processing circuit 111; 112 is a clipping and rounding circuit; 113 is a write control circuit for the 128 word × 16 bit dual-port memory 114; 115 is a read control circuit for the dual-port memory 114; 116 is a timing signal generation circuit for the 8×1 one-dimensional DCT processing circuit 117; 118 is a clipping and rounding circuit; and 119 is a 14-bit image output terminal.

Fig. 4 is an example of an 8×8 adaptive DCT processing device that uses the 8×1 DCT processing circuit block of Fig. 1. When adaptive processing is selected by control signal 104, the subtractor 108 takes the difference between the 14-bit image signal input 106 and the 14-bit reference image signal input 107. If the resulting signal exceeds the assumed maximum or minimum threshold, it is clipped by the clipping circuit 109 and input to the 8×1 one-dimensional DCT processing circuit 111. When clipping is not performed, the signal from the subtractor 108 is passed through and input to the 8×1 one-dimensional DCT processing circuit 111, where the 8×1 DCT is applied. The processing timing in the 8×1 one-dimensional DCT processing circuit 111 is controlled by the timing signal generation circuit 110, triggered by a strobe signal that is input from the data strobe signal input terminal 105 and marks the first signal of each set of 64 digital image signals input from the 14-bit image signal input 106.

Next, the clipping and rounding circuit 112 applies clipping and rounding to the output of the 8×1 one-dimensional DCT processing circuit 111 and passes the result to the 128 word × 16 bit dual-port memory 114. Writing to and reading from the dual-port memory 114 are controlled by the write control circuit 113 and the read control circuit 115. The 8×1 one-dimensional DCT processing circuit 117 then DCT-processes the input signal from the dual-port memory 114; the processing timing here is controlled by the timing signal generation circuit 116. The output data from the 8×1 one-dimensional DCT processing circuit 117 passes through the clipping and rounding circuit 118 to the 14-bit image output terminal 119, which completes the two-dimensional 8×8 DCT processing.

In this embodiment, one pixel datum of 14 bits is divided into two 7-bit signals, but the same effect is obtained when an M-bit length satisfying M > N is divided into L-bit signals (except when it is divided into 1-bit signals).

[Effects of the invention]

As described above, according to the present invention, the M-bit length is divided into L-bit lengths satisfying L < N, the partial products are computed in parallel at L-bit length, and the intermediate results are added at the end. With this scheme, for N = 8 and J = 2, 8×1 one-dimensional DCT processing can be realized within a period of 8 sampling clocks, and internal arithmetic precision of up to M = 14 bits can be secured without using a multiplier, so the practical effect is large.
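The M = 14 limit quoted for N = 8 and J = 2 follows directly from equation (1-13). As a small illustration, the sketch below evaluates that inequality for increasing word lengths; the helper names `clocks_needed` and `max_word_length` are made up for this example, and the formula is used only for the two-way split (J = 2) that the description states.

```python
def clocks_needed(m_bits):
    # Right-hand side of equation (1-13): trunc(M/2 + 0.5) + 1, for J = 2.
    return int(m_bits / 2 + 0.5) + 1

def max_word_length(n_clocks):
    # Largest M whose two-way split still fits into n_clocks sampling clocks.
    m = 1
    while clocks_needed(m + 1) <= n_clocks:
        m += 1
    return m

print(max_word_length(8))  # word length that fits an 8-clock budget
```

For an 8-clock budget the loop stops at M = 14, since a 15-bit word would need trunc(8.0) + 1 = 9 clocks.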

[Brief explanation of drawings]

Fig. 1 is a block diagram of an 8×1 one-dimensional DCT processing circuit in an embodiment of the present invention; Fig. 2 is a circuit diagram of the bit-serial arithmetic unit; Fig. 3 is a circuit diagram of the ROM-and-adder coefficient multiplication unit; and Fig. 4 is a schematic configuration diagram of an adaptive DCT processing circuit according to an embodiment of the present invention.

2: image signal input; 3 to 10: data registers; 19 to 22: bit-serial arithmetic units; 43 to 46: ROM-and-adder coefficient multiplication units; 111, 117: 8×1 one-dimensional DCT processing circuits; 114: dual-port memory.

Claims


(1) A DCT processing device for the DCT processing used in band compression of image signals, characterized in that, when a signal of M-bit length is DCT-processed in a processing unit of N×N pixels and the relationship M > N holds, the M-bit length is divided into L-bit-length signals satisfying L < N, the partial products for each L-bit length are computed bit-serially using adders and ROM, and the results of these computations are finally added, whereby N×1 one-dimensional DCT processing of M-bit length is completed within N sampling clock periods.

(2) An N×N two-dimensional DCT processing device of M-bit length, characterized by using two N×1 one-dimensional DCT processing devices of M-bit length and a dual-port memory for converting the scanning direction of the data sequence.
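The arrangement in claim (2) is the usual row-column decomposition of a two-dimensional DCT. A minimal sketch, assuming floating-point arithmetic and leaving out the clipping and rounding stages, is given below; `dct_1d` stands in for the N×1 processing device and `transpose` for the change of scanning direction performed through the dual-port memory. Both names, and the final transpose that restores conventional coefficient ordering, are assumptions made for this illustration.

```python
import math

N = 8

def dct_1d(x):
    # N x 1 one-dimensional DCT with the normalisation alpha(0) = sqrt(1/N),
    # alpha(k) = sqrt(2/N) otherwise.
    out = []
    for k in range(N):
        a = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(a * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                           for n in range(N)))
    return out

def transpose(block):
    # Written row by row, read column by column: the scan-direction conversion.
    return [list(col) for col in zip(*block)]

def dct_2d(block):
    first_pass = [dct_1d(row) for row in block]                 # first 1-D device
    second_pass = [dct_1d(col) for col in transpose(first_pass)]  # second 1-D device
    return transpose(second_pass)                               # back to row-major order
```

For a block in which every sample equals 1, only the DC coefficient is non-zero and it equals 8, which gives a quick check that the two passes and the transposition are wired together correctly.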