TW379301B

TW379301B - Data processor and data processing method

Info

Publication number: TW379301B
Application number: TW086116630A
Authority: TW
Inventors: Masahiro Kainaga; Koji Yamada; Hiroyuki Ono
Original assignee: Hitachi Ltd
Priority date: 1997-08-05
Filing date: 1997-11-07
Publication date: 2000-01-11
Also published as: WO1999008204A1

Abstract

The invention relates to a data processor such as a microprocessor or microcomputer, and in particular to a data processor suited to executing image processing applications that include the two-dimensional discrete cosine transform and the two-dimensional inverse discrete cosine transform. The basic structure for performing these transforms at high speed comprises: an arithmetic circuit with multiply-accumulate units for linearly transforming a four-element vector; a first register file for storing a 4 x 4 matrix; a second register file for storing the four-element vector to be linearly transformed; and a third register file for storing the result of the linear transform. Two types of matrix operation instruction are provided for operating on the matrix in the first register file and the vector in the second register file. When the first type of matrix operation instruction is executed, values are read from the first register file in the row direction; when the second type is executed, values are read from the first register file in the column direction.

Description

Technical Field

The present invention relates to a data processing device such as a microprocessor or microcomputer, and in particular to a data processing device suited to executing image processing applications that include the two-dimensional discrete cosine transform or the two-dimensional inverse discrete cosine transform.

Background Art

Unprocessed image data is voluminous: storing it requires a large-capacity memory, and transmitting it takes a long time. Countermeasures are therefore taken, such as compressing image data before storing it in memory and expanding it just before use, or compressing image data before transmission and expanding it after reception. The compression and expansion of image data is explained below, taking the coding of still images according to the JPEG standard as an example. In the JPEG standard, compression combines the following two methods:

(1) two-dimensional discrete cosine transform (DCT)
(2) Huffman coding

Expansion, on the other hand, combines the following two methods:

(3) Huffman decoding
(4) two-dimensional inverse discrete cosine transform

The two-dimensional discrete cosine transform (1) operates on the group of values of a two-dimensional block of 8 x 8 pixels. Specifically, the values of the 8 x 8 pixel block are multiplied by a matrix called the DCT basis. The result of the transform (called the DCT coefficients) is therefore also a group of values forming an 8 x 8 two-dimensional block. The DCT coefficients are then quantized using a quantization table that has a different value at each coefficient position (quantization being the process of replacing the values in an interval with a representative value of that interval). In the two-dimensional discrete cosine transform of image data, many of the values in the lower right part of the transformed block are usually close to 0, and most of them become 0 after quantization.

The Huffman coding (2) converts the quantized values of the 8 x 8 pixel block into a bitstream. The coding exploits the fact that the block contains many zeros: it is a variable-length code that assigns short bit strings to the signal values that occur with high probability. As a result, the number of bytes in the coded bitstream is about 1/10 of the number of bytes in the data before conversion.

The Huffman decoding (3) is the inverse of the Huffman coding (2); it restores the bitstream to the values of an 8 x 8 pixel block. The two-dimensional inverse discrete cosine transform (4) is the inverse of the two-dimensional discrete cosine transform (1); it is applied to the values of the 8 x 8 block and restores the original 8 x 8 pixel block. Specifically, the image data is restored by multiplying the values (DCT coefficients) of the 8 x 8 block recovered by Huffman decoding by the DCT basis.

Because of the quantization performed as part of the two-dimensional discrete cosine transform (1), the values of the 8 x 8 pixel block before the transform and the values of the 8 x 8 pixel block after the two-dimensional inverse discrete cosine transform (4) are not strictly identical. In other words, the compression and expansion are irreversible (lossy).
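The transform and quantization steps described above can be illustrated with a small sketch. It assumes an orthonormal DCT-II basis and uses a made-up quantization table (`QTABLE`) and a made-up gradient block; none of the names or numbers come from the patent, and Huffman coding is left out entirely.

```python
import math

N = 8  # block size used in the JPEG example


def dct_basis(n=N):
    """Return the n x n orthonormal DCT-II basis matrix C (C times C-transpose is I)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block):
    """Forward 2D DCT as a matrix product: C * X * C^T."""
    c = dct_basis(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2D DCT as a matrix product: C^T * Y * C."""
    c = dct_basis(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


# Hypothetical quantization table: coarser steps toward the lower right.
QTABLE = [[8 + 4 * (u + v) for v in range(N)] for u in range(N)]


def quantize(coeffs, table=QTABLE):
    """Replace each coefficient by the index of its quantization interval."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]


def dequantize(levels, table=QTABLE):
    """Map each interval index back to its representative value."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, table)]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Illustrative smooth block (a gentle gradient), not data from the patent.
block = [[100 + 2 * x + 3 * y for x in range(N)] for y in range(N)]

levels = quantize(dct2(block))
restored = idct2(dequantize(levels))

zero_count = sum(v == 0 for row in levels for v in row)
print("zero coefficients after quantization:", zero_count, "of", N * N)
print("max error without quantization:", max_abs_diff(block, idct2(dct2(block))))
print("max error with quantization:   ", max_abs_diff(block, restored))
```

Without quantization the round trip reproduces the block up to floating-point error; with quantization most coefficients become 0 and the restored block differs slightly from the original, which is the irreversibility the text refers to.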

However, unless the quantization is extremely coarse, the difference between the original image and the restored image can hardly be discerned by the human eye, so this poses no practical problem. As explained above, image compression reduces the number of bytes needed for image data to about 1/10, which has the advantage of improving storage efficiency in a memory device, or transfer efficiency, by a factor of about 10. On the other hand, it has the disadvantage that compressing and expanding the image data takes additional processing and time. For example, when image data is stored in a memory device without compression, it can be used as soon as it has been read from the memory device. When compressed image data is stored, however, it can be used only after it has been read from the memory device and expanded to restore the original image data.

Prior to the present invention, the inventors considered the case in which image data is compressed and expanded by a general-purpose microprocessor. For each 8 x 8 pixel block, the Huffman coding (2) and Huffman decoding (3) each require on the order of 1000 executed instructions, and the discrete cosine transform (1) and inverse discrete cosine transform (4) each require on the order of 1000 to 2000. Encoding by (1) and (2), or decoding by (3) and (4), therefore requires on the order of 2000 to 3000 instructions per 8 x 8 pixel block. Converting this directly into the number of instructions needed for one 640 x 480 pixel color image, each color requires 4800 times the above processing, and the whole image requires 14400 times. That is, a conventional microprocessor must execute on the order of 28.8M to 43.2M instructions (M meaning mega, i.e. million) to encode or decode one image. If one instruction takes one clock cycle on average, a microprocessor operating at 100 MHz needs 288 to 432 ms of processing time per image. At this speed, even if still images are displayed one after another, only 2 to 4 images can be processed per second, so it is difficult to produce the effect of a moving picture by displaying still images in succession. Measures such as reducing the picture size, providing special hardware for compression or expansion, reducing the instruction count by introducing clever algorithms, or adopting a superscalar execution scheme that can process two or more instructions per clock cycle are therefore necessary. In the above explanation, the statement that processing (1) or (4) requires 1000 to 2000 instructions per 8 x 8 pixel block means about 2000 instructions with a simple algorithm that makes no special effort, and about 1000 instructions with a carefully designed algorithm.

An object of the present invention is to provide a data processing device capable of executing the two-dimensional discrete cosine transform or the two-dimensional inverse discrete cosine transform at high speed.
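The per-image estimate given in the background discussion is plain arithmetic and can be reproduced with a short script. The function name `estimate` and its parameter names are illustrative only; the three color components are an assumption inferred from the text's figures (4800 blocks per color, 14400 in total).

```python
def estimate(instructions_per_block, width=640, height=480, block=8,
             color_components=3, clock_hz=100_000_000,
             cycles_per_instruction=1):
    """Estimate the cost of coding or decoding one image on a scalar CPU."""
    blocks_per_component = (width // block) * (height // block)
    total_blocks = blocks_per_component * color_components
    total_instructions = total_blocks * instructions_per_block
    seconds = total_instructions * cycles_per_instruction / clock_hz
    return {
        "blocks_per_component": blocks_per_component,
        "total_blocks": total_blocks,
        "total_instructions": total_instructions,
        "seconds": seconds,
        "images_per_second": 1.0 / seconds,
    }


# The text quotes 2000 to 3000 instructions per 8 x 8 block.
for per_block in (2000, 3000):
    result = estimate(per_block)
    print(per_block, "instructions/block:",
          result["total_instructions"], "instructions,",
          round(result["seconds"] * 1000), "ms,",
          round(result["images_per_second"], 2), "images/s")
```

With these inputs the script gives 28.8 million to 43.2 million instructions and 288 to 432 ms per image, that is, roughly 2.3 to 3.5 images per second, in line with the 2 to 4 images per second quoted in the text.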
Another object of the present invention is to provide the basic mechanism (the minimum hardware required) of a data processing device needed to execute the two-dimensional discrete cosine transform or the two-dimensional inverse discrete cosine transform at high speed, an instruction format that makes effective use of that mechanism, and a method of controlling the basic mechanism by means of those instructions. The above and other objects and novel features of the present invention will become clear from the description in this specification and the accompanying drawings.

Disclosure of the Invention

A representative one of the inventions disclosed in this application is outlined below. In the present invention, the basic mechanism for executing the two-dimensional discrete cosine transform or the two-dimensional inverse discrete cosine transform at high speed is an arithmetic circuit that has multiply-accumulate units in the number required to linearly transform a four-element vector. In addition, a first register file for storing the 4 x 4 matrix of the linear transform, a second register file for storing the four-element vector to be transformed, and a third register file for storing the result of the linear transform are provided. Furthermore, two kinds of instruction are provided for operating on the matrix in the first register file and the vector in the second register file: a first-type matrix operation instruction and a second-type matrix operation instruction.
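According to the abstract, the two kinds of instruction differ in the direction in which values are read from the first register file: row direction for the first kind, column direction for the second. A minimal sketch of that idea follows. It assumes the 4 x 4 matrix is held in 16 registers with row i in registers 4i to 4i+3; that layout and the function names `first_kind` and `second_kind` are assumptions made for illustration, not the patent's definitions.

```python
def read_rows(regs):
    """Read 16 registers as four 4-element vectors in the row direction."""
    return [[regs[4 * i + j] for j in range(4)] for i in range(4)]


def read_columns(regs):
    """Read the same 16 registers in the column direction."""
    return [[regs[4 * j + i] for j in range(4)] for i in range(4)]


def multiply_accumulate(vectors, v):
    """One multiply-accumulate chain per output element."""
    return [sum(a * b for a, b in zip(vec, v)) for vec in vectors]


def first_kind(regs, v):
    """Row-direction read: computes M * v."""
    return multiply_accumulate(read_rows(regs), v)


def second_kind(regs, v):
    """Column-direction read: computes M^T * v with no rearranged copy of M."""
    return multiply_accumulate(read_columns(regs), v)


matrix_regs = list(range(1, 17))   # illustrative register contents
vector = [1, 0, 2, -1]

# Reference: physically re-store the transposed matrix, then read by rows.
restored = [matrix_regs[4 * j + i] for i in range(4) for j in range(4)]

print(first_kind(matrix_regs, vector))
print(second_kind(matrix_regs, vector))
print(second_kind(matrix_regs, vector) == first_kind(restored, vector))
```

The last comparison shows the point of having both read directions: the product with the transposed matrix is obtained from the registers as they are, so the rearranged copy built in `restored` is never needed.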
When the first-type matrix operation instruction is executed, the arithmetic circuit is controlled so that the values are read from the first register file in the row direction; when the second-type matrix operation instruction is executed, it is controlled so that they are read in the column direction. With only two register files, or with only one kind of matrix operation instruction, the data in the register files would have to be re-stored partway through the computation of the two-dimensional discrete cosine transform or the two-dimensional inverse discrete cosine transform. With the present invention such re-storing of the register files becomes unnecessary, and as a result the two-dimensional discrete cosine transform or the two-dimensional inverse discrete cosine transform can be executed at high speed.

Best Mode for Carrying Out the Invention

Embodiments of the present invention are described below with reference to the drawings. Fig. 1 is a block diagram of a microprocessor to which the present invention is suitably applied. In Fig. 1, reference numeral 1 denotes a central processing unit (hereinafter, CPU).

Reference numeral 2 denotes a coprocessor (hereinafter, FPU) that performs operations such as matrix products and floating-point arithmetic on behalf of the CPU 1. Reference numeral 3 denotes an interrupt control circuit that receives interrupt requests from the peripheral circuits 11, 12, and 13 and exception-handling request signals from the MMU described below, determines their priority, and outputs an interrupt signal IRQ to the CPU 1. Reference numeral 4 denotes a memory management unit (MMU) that translates the address signal output by the CPU 1 onto the bus 8a and manages virtual memory, and 5 denotes an address translation circuit consisting of, among other things, an address translation table that converts logical addresses into physical addresses. Reference numeral 6 denotes a high-speed cache memory that holds programs and data frequently used by the CPU 1. Reference numeral 7 denotes a cache controller that monitors the address signal output by the CPU 1 onto the bus, transfers data from the external main memory (a hard disk storage device or the like, not shown) to the cache memory 6 in units of a prescribed block according to a prescribed replacement algorithm, discards data in the cache memory 6 that is no longer needed, and stores data written to the cache memory 6 back into the main memory by the copy-back or write-through method. The cache memory 6 and the external main memory are accessed with the physical address signal produced by the address translation table 5.

In the single-chip microprocessor of this embodiment, separately from the logical address bus 8a and the data bus 9a that carry the logical address signals and data signals output by the CPU 1, there are provided a physical address bus 8b that carries the physical address signals produced by the address translation table 5 and a data bus 9b for transferring data between the cache memory 6 and the external main memory. An external bus interface circuit 10 that interfaces signals between the internal buses 8b, 9b and the external bus is also provided.

Furthermore, in this embodiment, separately from the logical-address-side buses 8a, 9a and the physical-address-side buses 8b, 9b, there are provided a peripheral address bus 8c and a peripheral data bus 9c to which peripheral circuits are connected. These peripheral circuits include a serial communication interface circuit 11 for serial communication, a real-time clock circuit 12 with functions such as keeping the current time and a calendar, and a timer circuit 13 that provides a timing function to the CPU 1.

In Fig. 1, reference numeral 14 denotes a bus controller that controls the bus state of the physical-address-side buses 8b, 9b and the peripheral buses 8c, 9c; 15 denotes a clock pulse generation circuit that uses a PLL (phase locked loop) circuit to generate the clock signals needed for the operation of the CPU 1 and the other circuit blocks in the chip; 16 denotes a watchdog timer for detecting hardware abnormalities; 17 denotes an I/O port that allows data to be input and output between the peripheral buses 8c, 9c and the external bus through the external bus interface circuit 10; and 18 denotes a break controller that supports debugging during development of a user system by providing a function that stops program execution at an arbitrary point.

The CPU 1, the circuit blocks (2 ~, 10 ~ 18, and SPF), and the buses (8a to 8c, 9a to 9c) shown in Fig. 1 are formed on a single semiconductor chip 100, such as a single-crystal silicon substrate. Although there is no particular limitation, in this embodiment, when the external main memory is a DRAM (dynamic random access memory), a refresh controller that performs its refresh operation is built into the external bus interface circuit 10.

Fig. 2 shows a specific configuration example of the CPU 1. In Fig. 2, reference numeral 20 denotes a program counter that indicates the address of the instruction to be executed; 21 denotes an instruction register, for example of 32 bits, that holds the instruction code fetched from the cache memory 6 or the external main memory through the data bus 9a; 22 denotes an instruction decoder that decodes the instruction code taken into the instruction register 21 and generates control signals; and 23 denotes an instruction execution circuit made up of general-purpose registers REG1 to REGn that hold data before and after operations, an adder-subtractor ALU that performs address calculation, addition and subtraction of data, and logical operations, a barrel shifter SFT that shifts the bits of data, an address output register ADR, a data input/output register DTR, and so on.

Operation buses BUS1, BUS2, and BUS3 are provided in the instruction execution circuit 23. Through these buses the registers REG1 to REGn and ADR, the adder-subtractor ALU, and the barrel shifter SFT can be connected to one another. Gates GT1 to GTm, placed between each register or arithmetic unit and the buses, are timing-controlled by the control signals CS1 to CSi output by the instruction decoder 22, so that the data processing corresponding to the instruction is carried out. When the CPU 1 determines that an instruction taken into the instruction register is an instruction dedicated to the FPU (coprocessor) 2, however, it leaves the execution of that instruction to the FPU 2 and itself either waits or proceeds to the execution of the next instruction.

The CPU 1 also contains control registers 24. These include a status register SR that reflects the internal control state, a saved status register SSR that saves the contents of the status register SR when an exception occurs, and a saved PC register SPC that saves the contents of the program counter 20 when an exception occurs.

The control registers 24 further include a base address register GBR that holds the base address used in the indirect addressing mode and a vector address register VBR that holds the vector address for exception handling or interrupt handling. The state of each bit of the control registers 24 is read and written according to the output of the instruction decoder 22, and the way instructions are executed is controlled according to the state of prescribed bits in the control registers 24.

Fig. 3 shows a specific configuration example of the FPU 2. As shown in Fig. 3, the FPU 2 consists of an arithmetic section 900 and an operation control section 990 that controls the arithmetic section 900 according to instructions. The arithmetic section 900 is made up of a register section 901, multiply-accumulate units 910, 911, 912, and 913 capable of computing a 4 x 4 matrix product, four latch circuits 920, 921, 922, and 923 corresponding to the respective multiply-accumulate units, and a latch circuit 924 common to the multiply-accumulate units. Although not illustrated, the operation control section 990 consists of an instruction register and an instruction decoder like those of the CPU 1. When it determines that the instruction taken into the instruction register is one of its own dedicated instructions (the first matrix operation instruction, the second matrix operation instruction, and so on), it generates control signals for the arithmetic section 900 so that the corresponding operation is executed. The instruction register and instruction decoder of the FPU 2 may also be shared with those of the CPU 1.

Fig. 4 shows a configuration example of the multiply-accumulate units 910 to 913. Each multiply-accumulate unit consists of a multiplier 960, an adder 961, and a temporary register 962. The multiplier 960 computes the product of the two 16-bit data items supplied on the signal lines 940 and 944. The adder 961 adds the result of the multiplier 960 to the data held in the temporary register 962 and stores the sum in the temporary register 962, updating its contents.
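A minimal software model of one multiply-accumulate unit of Fig. 4, and of four such units working on a 4 x 4 matrix-vector product, might look as follows. This is a sketch under stated assumptions: the class and function names are invented, the 16-bit data width is not modeled, and feeding one vector element to all four units per step is only one plausible reading of the latch circuit shared by the units. The `clear` method is included because an accumulator has to start from 0.

```python
class MultiplyAccumulateUnit:
    """Multiplier, adder, and a temporary register holding the running sum."""

    def __init__(self):
        self.temp = 0  # plays the role of the temporary register

    def clear(self):
        """Load the initial value 0 before a new accumulation starts."""
        self.temp = 0

    def accumulate(self, a, b):
        """Multiply the two inputs and add the product to the running sum."""
        self.temp = self.temp + a * b
        return self.temp


def matrix_vector_product(matrix, vector):
    """Compute a 4 x 4 matrix times a 4-element vector with four units."""
    units = [MultiplyAccumulateUnit() for _ in range(4)]
    for unit in units:
        unit.clear()
    for j in range(4):
        shared = vector[j]            # assumed: one operand common to all units
        for i, unit in enumerate(units):
            unit.accumulate(matrix[i][j], shared)
    return [unit.temp for unit in units]


matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]
print(matrix_vector_product(matrix, [1, 0, 2, -1]))
```

Each unit produces one element of the result after four accumulate steps, which is why four units suffice for a four-element linear transform.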

Because the temporary register 962 must be set to 0 before a multiply-accumulate operation, this embodiment provides a signal line 937 that supplies the initial value "0" and a selector 963. Before an operation starts, a control signal supplied on the signal line 935 makes the selector 963 select the initial value "0", which is then stored in the temporary register 962.

Fig. 5 shows the specific configuration of the register section 901. The register section 901 consists of four register files 500, 501, 502, and 503 and selectors 550, 551, 552, 553, and 554 corresponding to the latch circuits 920, 921, 922, 923, and 924. Each of the register files 500, 501, 502, and 503 contains 16 registers, and these registers are divided into four sub-files. For example, the register file 500 contains sub-files 510, 511, 512, and 513, each made up of four registers.

Each sub-file thus contains four registers, and a selector 516 for selecting the sub-file is provided. Registers 0, 4, 8, and 12 are assigned to sub-file 511, and registers 1, 5, 9, and 13 are assigned to sub-file 512. The 16 registers 0 to 15 can therefore be identified by a register number expressed as a 4-bit binary number. Reference numeral 936 denotes a signal line that supplies a write-enable control signal to the register file, and 950 denotes a common signal line that supplies write data to the register files.

Each sub-file is configured so that data can be read from two registers and written to one register at the same time. For this purpose, the upper 2 bits of the 4-bit register number that designates a register are input to the sub-files through the signal lines 930, 931, and 932. In the case of sub-file 510, for example, the data read from the register designated by the selection signal supplied through the signal line 930 is sent to the selector 550, where the data of the register file corresponding to the register-file selection signal supplied through the signal line 934 is selected. The data read from the register corresponding to the selection signal (the upper 2 bits of the register number) supplied through the signal line 931 is sent to the selector 516, where the data of the register corresponding to the selection signal (the lower 2 bits of the register number) supplied through the signal line 933 is selected. When the signal on the signal line 936 enables writing, the write data delivered through the signal line 950 is written to the register corresponding to the selection signal supplied through the signal line 932.

Properties of the Matrix Product

Before the two-dimensional discrete cosine transform or two-dimensional inverse discrete cosine transform performed by the microprocessor of the present invention is described, the properties of the matrix product that underlie the speedup achieved by the transform method of the present invention are explained first.

Let the 8 x 8 matrix M be defined by (Eq. 1-1) of Fig. 6. This matrix M is a matrix for the inverse discrete cosine transform; it can be obtained by permuting some of the rows or columns of the discrete cosine transform matrix. The present invention is described below using this matrix. Let the 4 x 4 matrices A and C be defined by (Eq. 1-2) and (Eq. 1-3) of Fig. 6. The matrix M can then be expressed as in (Eq. 1-4) of Fig. 6.

Let the 8 x 8 matrix X be defined by (Eq. 2-1) of Fig. 7. This matrix X is divided into four, and the submatrices X1, X2, X3, and X4 are defined by (Eq. 2-2), (Eq. 2-3), (Eq. 2-4), and (Eq. 2-5) of Fig. 7. The matrix X can then be expressed by (Eq. 2-6).

Fig. 8 shows the defining equation of the discrete cosine transform and the method of computing it when it is decomposed into 4 x 4 submatrices. (Eq. 3-1) is the defining equation of the discrete cosine transform. (Eq. 3-2) expresses the 8 x 8 matrices appearing in (Eq. 3-1) in terms of 4 x 4 submatrices: the matrix M in (Eq. 3-1) is replaced by the right-hand side of (Eq. 1-4), and the matrix X in (Eq. 3-1) is replaced by the right-hand side of (Eq. 2-6). Expanding (Eq. 3-2) gives (Eq. 3-3), (Eq. 3-4), and (Eq. 3-5).

The four kinds of term appearing in (Eq. 3-5) are labeled as in (Eq. 3-6) to (Eq. 3-9). The inverse discrete cosine transform can then be computed by (Eq. 3-10). Here (Eq. 3-10) is reworked slightly, as illustrated in Fig. 9. First, the elements of the 4 x 4 matrices T1, T2, T3, and T4 shown in (Eq. 3-10) of Fig. 8 are labeled according to (Eq. 4-1) to (Eq. 4-4). A 4 x 4 constant matrix B is defined by (Eq. 4-6), and the product TB of T and B (a 4 x 16 matrix) is defined as the matrix S, that is, S = TB. The 64 elements of the matrix S are labeled according to (Eq. 4-7). Using these labels, the 8 x 8 matrix shown in (Eq. 4-8) of Fig. 9 is defined as Y.

The Y defined by (Eq. 4-8) is then identical to the Y defined by (Eq. 3-10) of Fig. 8. For example, the upper left element of the 8 x 8 matrix Y defined by (Eq. 3-10) of Fig. 8 is the sum of the upper left elements t10, t20, t30, and t40 of the 4 x 4 matrices T1, T2, T3, and T4 defined by (Eq. 4-1) to (Eq. 4-4). The upper left element of the 8 x 8 matrix Y defined by (Eq. 4-8) of Fig. 9, on the other hand, is, because S = TB, the inner product of the first row (1 1 1 1) of the right-hand side of (Eq. 4-6), which gives B, and the first row (t10 t20 t30 t40) of the right-hand side of (Eq. 4-5), which gives T. Specifically:

1 x t10 + 1 x t20 + 1 x t30 + 1 x t40 = t10 + t20 + t30 + t40

The same holds for the other elements. The Y defined by (Eq. 4-8) of Fig. 9 and the Y defined by (Eq. 3-10) of Fig. 8 are therefore identical.

Register Files and Instruction Format

Next, of the hardware configuration required by the present invention, the configuration of the register files and the instruction format are described. A processor to which the present invention is applied has at least three register files, each made up of 16 registers. Fig. 10 shows the case of four register files RFL0, RFL1, RFL2, and RFL3. The register numbers given on the left side of Fig. 10 also appear in the sub-files 510 to 513 of Fig. 5, which shows how the registers of the register files of Fig. 5 and Fig. 10 correspond. A processor to which the present invention is applied also has two kinds of instruction that compute the product of a 4 x 4 matrix and a four-element vector. The instructions are written as follows:

TRV m, n, s, d (first-type matrix operation instruction)
TRVT m, n, s, d (second-type matrix operation instruction)

As shown in Fig. 11, for example, the instruction format consists of an instruction code field ICF that holds the instruction code and four fields OPF1, OPF2, OPF3, and OPF4 that hold the operands m, s, d, and n. Of these operands, m, s, and d are numbers that designate register files. The operand n is a number that designates a register within a register file and is one of the multiples of 4 (0, 4, 8, 12). That is, n is a 4-bit code: its lower 2 bits designate a sub-file of Fig. 5, and its upper 2 bits designate a register within the sub-file (in the TRV instruction the lower 2 bits are always 00).

The function of the first-type matrix operation instruction TRV is described first. It is defined by (Eq. 5-1) of Fig. 12. The values of the 16 registers in register file m are regarded as a 4 x 4 matrix, the values of registers n, n+1, n+2, and n+3 in register file s are regarded as a four-element vector, the matrix is multiplied by the vector, and the result is stored in registers n, n+1, n+2, and n+3 of register file d. In other words, the operand vector is taken one element from each of the four sub-files of Fig. 5, and the result is stored across the four sub-files. In terms of the register files of Fig. 10, the vector is read from four consecutive registers and the result is stored in the corresponding four registers. For the instruction

TRV 0, 0, 1, 2

the operation of (Eq. 5-2) of Fig. 12 is therefore performed. For the instruction sequence

TRV 0, 0, 1, 2
TRV 0, 4, 1, 2
TRV 0, 8, 1, 2
TRV 0, 12, 1, 2

the operation of (Eq. 5-3) of Fig. 12 is performed: the two 4 x 4 matrices on the right-hand side are multiplied together, yielding the 4 x 4 matrix on the left-hand side.

The function of the second-type matrix operation instruction TRVT is described next. It is defined by (Eq. 5-4) of Fig. 12. The value t used in (Eq. 5-4) is the 4-bit operand n shifted right by 2 bits, so the upper 2 bits of t are 00. The second-type matrix operation instruction TRVT m, n, s, d regards the 16 registers in register file m as a 4 x 4 matrix, regards registers t, 4+t, 8+t, and 12+t in register file s as a four-element vector, multiplies the matrix by the vector, and stores the result in registers n, n+1, n+2, and n+3 of register file d. In other words, the operand vector is taken from the four registers of one of the four sub-files of Fig. 5 ...
'Replace some ranks or ranks of the discrete cosine transform to obtain this rank.' Below this rank The description of the present invention will be made. 4 × 4 ranks A ′ C are defined by (Expression 1-2) and (Expression 1-3) in FIG. 6. In this way, the rank M can be expressed as shown in Fig. 6 (Equation 1-4). The 8x8 rank X is defined by Fig. 7 (Equation 2-1). Apply this paper size to Chinese National Standard (CNS) A4 specification U10X297 mm) -14- A7 ___B7____ V. Description of the invention (12) Divide the rank X into 4 and divide the ranks of each part X 1, X 2 'X 3 'X4 is defined by (Expression 2-2), (Expression 2-3), (Expression 2-4), and (Expression 2-5) in FIG. In this way, the rank X can be expressed (Equation 2-6). Next, FIG. 8 is a diagram showing a calculation method in a case where the definition of the discrete cosine transform and its 〆 are decomposed into 4x4 partial ranks. (Equation 3— _1) is the definition f of the discrete cosine transform. Decompose the 8 X 8 rows and columns expressed in (Equation 3 — 1) into 4 X 4 partial rows and express them as (Equation 3-2). Substitute the rank M in (Formula 3-1) with the right side of (Formula 1_4), and replace the rank X in (Formula 3_1) with the right side of (Formula 2-6). Then, when this (Expression 3-2) is developed, it becomes (Expression 3-3), (Expression 3-4), (Expression 3-5). Printed by the Consumer Cooperatives of the Central Standards Bureau of the Ministry of Economic Affairs (please read the precautions on the back before filling out this page). The four types of items shown in (Equations 3-5) are given respectively (Equations 3-6) ~ (Equations 3 — 9). In this way, the inverse discrete cosine transform can be calculated (Equation 3-1 0). Here / process the above (Equation 3-10) slightly. FIG. 9 is a diagram for explaining the processing. First, the respective elements of the 4 × 4 rows and columns 1, T2, T3, and T4 shown in (Equation 3-10) in FIG. 
8 are assigned with the following (Equation 4-1) to (Equation 4-4). In addition, the constant row B of 4x4 is defined by (Equation 4-6), and the product of T and B (4X16 row) is defined by row S, that is, S = TB. Each of the six or four elements in the rank S is given a code that follows (Equation 4-7). In addition, using these numbers, Y is used to define 8x8 ranks as shown in (9) (Equation 4-8). In this way, the paper size defined by (Equation 4-8) applies the Chinese National Standard (CNS) A4 specification (210X297 mm) • 15- A7 _____B7____ V. Description of the invention (13) Y and Figure 8 (Equation 3 — 10) Y is the same as defined. For example, the upper left element of the 8X8 rank Y defined by (Expression 3-1 0) in Fig. 8 is added to the 4X4 rank Ti, T2, T3, T4 defined by (Expression 4-1) ~ The respective top left elements are t 10, t 20, t 30, and t 40. On the other hand, the element on the upper left of the 8X8 rank Y as defined in Figure 9 (Equation 4-8), S = TB, is given to the first row (1 1 1) to the right of B (Equation 4-6). 1) The inner product of the initial trip (t 1-10-t20t3 0t40) to the right of T (Eq. 4-5), as follows. . lxt 10 + lxt20 + lxf30 + lxt40 = t 10 + t2 0 + t30 + t40 The same applies to other elements. Therefore, it can be understood that Y defined by (Equation 4-8) in FIG. 9 is the same as Y defined by (Equation 3-10) in FIG. 8. Printed by the Consumers' Cooperative of the Central Standards Bureau of the Ministry of Economic Affairs (please read the notes on the back before filling out this page) Register file and order form Next, the structure of the register file and the order formation among the necessary hardware components of the present invention will be explained. In the processor to which the present invention is applicable, a register file composed of 16 registers has at least three groups. Figure 10 shows the case of register files RFLO, RFL1, RFL2, RFL3 with 4 groups. The register numbers assigned to the left side of Fig. 
10 are shown in the auxiliary files 510 to 513 of Fig. 5. This paper size applies Chinese National Standard (CNS) A4 specification (210X297 mm) ~~ -16- Printed by the Consumers' Cooperative of the Central Standards Bureau of the Ministry of Economic Affairs A7 B7 V. Description of the invention (14) The register shown in Figure 5 and Figure 10 Correspondence between the registers of the file. Moreover, in the processor to which the present invention is applied, there are two types of commands for performing a row-column product of a 4 × 4 row and column and a vector of 4 elements. The command is set as shown below. TRV m, n, s, d (the first kind of row and column operation command) TRVT m, n, s, d (the second kind of row and column operation command) For example, the command form is as shown in Figure 11 and stored via the command code The command code field ICF is formed with four fields of which operands represented by m, s, d, η are stored: 0PF1, 0PF2, 0PF3, and 0PF4. Among these operands, m, s, and d are the numbers of the specified register file. Also, η is the number of the register in the designated register file, which is one of multiples of 4 (0, 4, 8, 12). That is, η is formed by a 4-bit code, and the sub file of FIG. 5 is designated by η lower and 2 bits. The upper 2 bits specify the register in the auxiliary file. (In the TRV command, the lower 2 bits are often set. 00). Next, the function of the first row and column operation command TRV will be described first. The function of the TRV command is defined in Figure 12 (Equation 5-1). That is, a group of 16 registers in the register file m is regarded as a 4 × 4 row 'column, and a group of registers η, n + 1., n + 2, and η + 3 in the register file s is considered.値 is regarded as a vector of 4 elements, the ranks are multiplied by the vector, and the result is stored in the register group η, n + 1, n + 2, n + 3 in the register file. 
That is, the calculated vector consists of 4 auxiliary files in Figure 5 and 1 in this paper. The paper size applies the Chinese National Standard (CNS) A4 specification (210X297 mm) (please read the precautions on the back before filling this page). 17- Printed by A7 B7, Shellfish Consumer Cooperative, Central Bureau of Standards, Ministry of Economic Affairs 5. Description of the Invention (15) One was taken out, and the calculation results were stored in four auxiliary files. Regarding the register file of FIG. 10, four consecutive register vectors are read out and the results are stored in the corresponding four registers. Therefore, when it is the following command, TRV 0 9 0, 1 > 2 Fig. 12 (Equation 5-2) operation is performed 0 and the following command line 9 RV 0 f 0, 1 9 2 TRV 0 9 4, 1 9 2 TRV 0 > 8, 1, 2 TRV 0 9 12 9 1, 2 Figure 12 (Equation 5-3) is performed > 4 X 4 rows and 4 X on the right Multiplication of 4 rows and columns 9 The result is to obtain 4 X 4 rows and columns 0 on the left. Then 5 describes the function of the second row and column operation command TR -VT. The function of the T R V T command is defined by rcrf Figure 12 (Equation 5-4). The t used in (Equation 5-4) is to form the above-mentioned operand η by a 4-bit code, and shift 2 to the right by 2 bits. 〇 The upper 2 bits of T are 0 0. The second type of rank operation command TRVT m 9 n, s 9 d is: treat 16 registers in the register file m as 4x4 ranks > treat the register group t in the register file S, 4 + t > 8 + t 9 1 2 + t is regarded as a vector of 4 elements, the ranks are multiplied by the vector, and the result is stored in the register group η, n + 1, n + 2, n + 3 in the register file. That is, the calculated vector is stored in four of one of the four sub-files in Figure 5. The paper size applies to the Chinese National Standard (CNS) A4 specification (210X297 mm) (please read the note on the back first) (Fill in this page again)
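The register addressing of TRV and TRVT described above can be sketched in Python. This is a behavioural sketch under stated assumptions, not the patent's circuit: Fig. 12 is not reproduced here, so the pairing of matrix elements with vector elements is taken from the step-by-step control sequence given later in the description (in step k+1, product-sum unit i receives register 4k+i of file m and element k of the vector). The function names (`trv`, `trvt`, `run4` and the helpers) are hypothetical.

```python
# Behavioural sketch of TRV / TRVT (assumed semantics, see the text above).
# A register file is a list of 16 numbers; rf maps file number -> list.

def subfile_and_index(reg):
    """Split a 4-bit register number: lower 2 bits = sub-file, upper 2 bits = slot."""
    return reg & 0b11, reg >> 2

def _multiply_and_store(rf, m, vec, n, d):
    # Assumption: product-sum unit i accumulates (register 4*k + i of file m) * vec[k].
    acc = [sum(rf[m][4 * k + i] * vec[k] for k in range(4)) for i in range(4)]
    rf[d][n:n + 4] = acc  # results go to registers n..n+3 of file d

def trv(rf, m, n, s, d):
    """TRV m,n,s,d: the operand vector is registers n..n+3 of file s."""
    assert n in (0, 4, 8, 12)
    vec = [rf[s][n + k] for k in range(4)]
    _multiply_and_store(rf, m, vec, n, d)

def trvt(rf, m, n, s, d):
    """TRVT m,n,s,d: the operand vector is registers t, 4+t, 8+t, 12+t of file s."""
    assert n in (0, 4, 8, 12)
    t = n >> 2
    vec = [rf[s][4 * k + t] for k in range(4)]
    _multiply_and_store(rf, m, vec, n, d)

def run4(instr, rf, m, s, d):
    """Issue the four-instruction sequence with n = 0, 4, 8, 12."""
    for n in (0, 4, 8, 12):
        instr(rf, m, n, s, d)

# Plain-Python helpers used only to check the model.
def as_rows(regs):
    return [regs[4 * r:4 * r + 4] for r in range(4)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(col) for col in zip(*a)]
```

If the 16 registers of a file are read row by row as a 4×4 matrix, the four-instruction TRV sequence in this model yields S·M in file d (S from file s, M from file m), while the TRVT sequence yields Sᵗ·M. That matches the statement in the description that the two sequences differ only in the transposition of one factor.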

The results are stored separately in the registers corresponding to the four sub-files. In terms of the register file of Fig. 10, every fourth register of the 16 is read as the operand vector, and the result is stored in four consecutive registers of the designated register file (starting at a register whose number is a multiple of 4).

Thus the instruction

    TRVT 0, 0, 1, 2

performs the operation of (Equation 5-5) in Fig. 12, and the instruction sequence

    TRVT 0, 0, 1, 2
    TRVT 0, 4, 1, 2
    TRVT 0, 8, 1, 2
    TRVT 0, 12, 1, 2

performs the operation of (Equation 5-6) in Fig. 12: a 4×4 matrix is multiplied by a 4×4 matrix, giving a 4×4 result.

The point to note is that the first matrix on the right-hand side is transposed between (Equation 5-3) and (Equation 5-6): the row-direction arrangement of the values of that matrix in (Equation 5-3) is the same as its column-direction arrangement in (Equation 5-6), and its column-direction arrangement in (Equation 5-3) is the same as its row-direction arrangement in (Equation 5-6). This means that, by using the two instructions TRV and TRVT, the operations of (Equation 5-3) and (Equation 5-6) can both be carried out without changing how the values are arranged in the register file, that is, without reloading the register file from memory. As a result, computing the inverse discrete cosine transform of image data according to (Equation 3-10) takes far fewer instructions than it would without the TRV and TRVT instructions.

To carry out the inverse discrete cosine transform according to (Equation 3-10), instructions that load registers from main memory and store registers to main memory are also needed. These are assumed to be the following, which earlier microprocessors also provide:

    LD4 b+disp, d, n
    ST s, n, b+disp

The LD4 instruction loads four data items, starting at the address offset from the base address b by the displacement disp (in the embodiment, an address in main memory), into registers n, n+1, n+2, n+3 of register file d. The ST instruction stores register n of register file s to the address offset from the base address b by the displacement disp (in the embodiment, an address in main memory).

Instruction execution sequence

Next, the sequence in which control unit 990 of Fig. 3 executes the first matrix operation instruction TRV m, n, s, d is described.

In the first step, control unit 990 sends the 2-bit binary code "00" over signal line 930 to sub-files 510 to 513 of each of the register files 500 to 503. In response, sub-files 510 to 513 send the contents of register 0 to selector 550, of register 1 to selector 551, of register 2 to selector 552 and of register 3 to selector 553. Control unit 990 also puts the register file number m on signal line 934, so that the data designated by m passes through selectors 550 to 553 and is latched in latches 920, 921, 922, 923 (that is, the contents of registers 0, 1, 2, 3 of register file m are latched in latches 920, 921, 922, 923).

Control unit 990 further sends the upper 2 bits of the 4-bit register number n to each register file over signal line 931. In response, each register file sends the contents of the registers corresponding to the upper 2 bits of n to selector 516. Control unit 990 sends the lower 2 bits of n to selector 516 over signal line 933, and the data thus selected passes from selector 516 to selector 554. Control unit 990 then sends the register file number s on signal line 935, so that the data designated by s is latched in latch 924 through selector 554. At the same time, control unit 990 controls selector 963 through signal line 935 so that the initial value "0" is set in temporary register 962.

In the second step, the contents of latches 920, 921, 922, 923 are sent to the corresponding product-sum units 910, 911, 912, 913, and the contents of latch 924 are sent to all four product-sum units 910, 911, 912, 913. Each product-sum unit performs its first product-sum operation, and the result is set in temporary register 962 and the like. At the same time, the binary code "01" is sent over signal line 930 to each of the register files 500 to 503. In response, each register file sends the contents of register 4 to selector 550, of register 5 to selector 551, of register 6 to selector 552 and of register 7 to selector 553. Control unit 990 puts the register file number m on signal line 934, so that the data designated by m passes through selectors 550 to 553 and is latched in latches 920, 921, 922, 923. Control unit 990 also sends, over signal line 931, the upper 2 bits of n+1 (the 4-bit register number n incremented by 1) to each of the register files 500 to 503. In response, each register file sends the contents of the registers corresponding to the upper 2 bits of n+1 to selector 516. Control unit 990 sends the lower 2 bits of n+1 to selector 516 over signal line 933, and the data thus selected passes from selector 516 to selector 554. Control unit 990 then sends the register file number s on signal line 934, so that the data designated by s is latched in latch 924 through selector 554.

The third step is the same as the second, except that register numbers 4, 5, 6, 7 become 8, 9, 10, 11 and the 4-bit register number n+1 becomes n+2.

The fourth step is also the same as the second, except that register numbers 4, 5, 6, 7 become 12, 13, 14, 15 and the 4-bit register number n+1 becomes n+3.

The fifth step writes the four values held in product-sum units 910, 911, 912, 913 back to the register section 901. The value held in product-sum unit 910 is sent to the register section 901 through signal line 920 and the like. Control unit 990 sends the upper 2 bits of the 4-bit register number n to each sub-file over signal line 932 and, through signal line 936, enables writing to the register file designated by the operand d. The four values are thereby set in the four sub-files of the register file designated by d.

By the above steps, the operation defined by (Equation 5-1) is carried out.
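The five steps above can be traced with a small Python sketch that lists, for each of the four product-sum steps, the 2-bit code placed on signal line 930, the registers of file m sent to latches 920 to 923, and the register of file s sent to latch 924. It is an illustration under assumptions: the function names are hypothetical, and the TRVT branch simply exchanges the roles of the two bit pairs of the register number, which reproduces the register set t, 4+t, 8+t, 12+t given in the TRVT definition.

```python
def trace_steps(instr, n):
    """Return [(code_on_930, registers_of_file_m, register_of_file_s), ...] for steps 1-4."""
    assert instr in ("TRV", "TRVT") and n in (0, 4, 8, 12)
    steps = []
    for k in range(4):
        # Sub-file j holds registers j, j+4, j+8, j+12; code k selects slot k in each,
        # so latches 920..923 receive registers 4k, 4k+1, 4k+2, 4k+3 of file m.
        matrix_regs = [4 * k + i for i in range(4)]
        number = n + k                      # register number presented in this step
        upper, lower = number >> 2, number & 0b11
        if instr == "TRV":
            slot, subfile = upper, lower    # upper bits pick the slot, lower bits the sub-file
        else:
            slot, subfile = lower, upper    # TRVT: the two bit pairs change roles
        vector_reg = 4 * slot + subfile     # register of file s sent to latch 924
        steps.append((format(k, "02b"), matrix_regs, vector_reg))
    return steps

def writeback_registers(n):
    """Step 5: slot n >> 2 of each of the four sub-files, i.e. registers n..n+3 of file d."""
    return [4 * (n >> 2) + i for i in range(4)]
```

For example, `trace_steps("TRV", 4)` reads registers 4, 5, 6, 7 of file s in successive steps, whereas `trace_steps("TRVT", 4)` reads registers 1, 5, 9, 13; in both cases the results are written back to registers 4 to 7 of file d.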
Next, the case in which control unit 990 of Fig. 3 executes the second matrix operation instruction TRVT m, n, s, d is described.

In the first step, control unit 990 sends the binary code "00" over signal line 930 to each register file. In response, each register file sends the contents of register 0 to selector 550, of register 1 to selector 551, of register 2 to selector 552 and of register 3 to selector 553. Control unit 990 puts the register file number m on signal line 934, so that the data designated by m passes through selectors 550 to 553 and is latched in latches 920, 921, 922, 923.

Control unit 990 also sends the lower 2 bits of the 4-bit register number n to each register file over signal line 931 (note that the TRV instruction sends the upper 2 bits here). In response, each register file sends the contents of the registers corresponding to the lower 2 bits of n to selector 516. Control unit 990 sends the upper 2 bits of n to selector 516 over signal line 933, and the data thus selected passes from selector 516 to selector 554. Control unit 990 then sends the register file number s on signal line 935, so that the data designated by s is latched in latch 924 through selector 554. At the same time, control unit 990 controls selector 963 through signal line 935 so that the initial value "0" is set in temporary register 962.

In the second step, the contents of latches 920, 921, 922, 923 are sent to the respective product-sum units 910, 911, 912, 913, and the contents of latch 924 are sent to all four product-sum units 910, 911, 912, 913. Each product-sum unit performs its first product-sum operation, and the result is set in temporary register 962 and the like. At the same time, the binary code "01" is sent over signal line 930 to each register file. In response, each register file sends the contents of register 4 to selector 550, of register 5 to selector 551, of register 6 to selector 552 and of register 7 to selector 553. Control unit 990 puts the register file number m on signal line 934, so that the data designated by m passes through selectors 550 to 553 and is latched in latches 920, 921, 922, 923. The lower 2 bits of register number n+1 are sent to each register file over signal line 931. In response, each register file sends the contents of the registers corresponding to the lower 2 bits of n+1 to selector 516. The upper 2 bits of n+1 are sent to selector 516 over signal line 933, and the data thus selected passes from selector 516 to selector 554. Control unit 990 then sends the register file number s on signal line 935, so that the data designated by s is latched in latch 924 through selector 554.

The third step is the same as the second, except that register numbers 4, 5, 6, 7 become 8, 9, 10, 11 and the 4-bit register number n+1 becomes n+2.

The fourth step is also the same as the second, except that register numbers 4, 5, 6, 7 become 12, 13, 14, 15 and the 4-bit register number n+1 becomes n+3.

The fifth step writes the four values held in product-sum units 910, 911, 912, 913 back to the register section 901. The values held in product-sum unit 910 and the others are sent to the register section 901 through signal line 920 and the like. Control unit 990 sends the upper 2 bits of register number n to each sub-file over signal line 932 and, through signal line 936, enables writing to the register file with number d. The four values are thereby set in the four sub-files of register file d.

By the above steps, the operation defined by (Equation 5-4) is carried out.

Data necessary for the inverse discrete cosine transform

Next, the data needed for the inverse discrete cosine transform are summarized. Seen from the transform program, these data are stored in the external main memory MEM in the layout shown in Fig. 13.

-25- A7 B7 經濟部中央樣準局員工消費合作社印裝五、發明説明（23 ) 之主記憶體Μ E Μ內。首先，變變對象之資料（DCT係數）爲必要，此被儲存於圖1 3之主記憶體內之Χ1Τ，Χ2Τ，Χ3Τ， X 4 Τ所示之記憶領域。變變必要之常數行列（相當於 DCT之基底）爲相關於（式1-2)、（式1 — 3)、 (式4_6)者，這些分別被儲存於圖1 3之AT，CT ，ΒΚ示之記憶領域。而且，變換結果被儲存於圖1 3內之Υ所示之記憶領域。逆離散餘弦變換逆離散餘弦變換依序進行圖8之（式3 — 6)、（式 3-7)、（式3-9)、（式3-10)即可。實行（式3 — 6)時，以4個之LD4命令將圖1 3 之XI Τ載入寄存器檔案0，以4個之LD4之命令將圖 13之AT載入寄存器檔案1，以4個之T"RVT命令將對應於（式3 — 6)內之AX1 t之結果於寄存器檔案2 獲得，以4個之TRV命令將對應於（式3 — 6 )之右邊之結果於寄存器檔案2獲得，最後，以1 6個之S T命令儲存於圖1 3之T即可。此可以以下之命令系列實行之》 L D 4 X 1 Τ + 0， 0 , 0 L D 4 X 1 Τ + 4 , 0，4 L D 4 X 1 丁 + 8 , 0 , 8 L D 4 X 1 Τ + 12 ，0 , L D 4 ( 0 ) A T， 1,0 本紙張尺度適用中國國家揉準（CNS ) A4規格（210X297公釐） (請先閲讀背面之注意事項再填寫本頁) ^ -'β Γ -26- 五、發明説明（24 ) A7 B7 經濟部中央梯準局員工消费合作社印製 L D 4 ( 4 ) AT, 1， L D 4 ( 8 ) A T i 1 9 L D 4 ( 1 2 ) A T y 1 T R V T 0 > 0 > 1 贅 2 T R V T 0 9 4 y 1 > 2 T R V T 0 9 8 y 1 ί 2 T R V T 0 > 1 2 9 1 > T R V 1 9 0 j 2 9 2 T R V 1 4 9 2 y 2 T R V 1 8 » 2 y 2 T R V 1 > 1 2 > 2 > S T 2 9 0 T + 0 S T 2 1 9 T + 4 S T 2 ， 1 5 + 6 0 於實行 ( 式 3 — 7 ) 時以 X 2 T 載入寄存器檔案 0 > 於 ( 式 3 — 6 ) 內之 A X 1 以 4 個之 T R V 命令將對果於寄存器檔案 2 獲得最於圖 1 3 之 T 即可 0 與 ( 式令系列實行之故省略命令意 A T 之載入被省略之點 0 實行 ( 式 3 — 8 ) ( 2 2 本紙張尺度適用中國國家標準（CNS ) A4規格（210X297公釐） 4 8，12 4個之LD4命令將圖13之以4個之TRVT命令將對應 t之結果於寄存器檔案2獲得應於（式3 — 6)之右邊之結後，以16個之ST命令儲存 3 — 6 )之場合相同地可以命系列之具體例。在此處，應注 • . 式3 — 9)用之具體之說明省 -27- (請先閲讀背面之注意事項再填寫本頁)-25- A7 B7 Printed by the Consumer Cooperatives of the Central Procurement Bureau of the Ministry of Economic Affairs 5. In the main memory Μ E Μ of the invention description (23). First of all, it is necessary to change the data of the object (DCT coefficient). This is stored in the memory fields shown in Fig. 13 in the X1T, X2T, X3T, and X4T. The necessary constants (equivalent to the base of DCT) are related to (Equation 1-2), (Equation 1-3), and (Equation 4_6). These are stored in AT, CT, and Β in Fig. 13 respectively. Shown the field of memory. Moreover, the transformation result is stored in the memory area shown by Υ in Fig. 13. 
Inverse discrete cosine transform

The inverse discrete cosine transform is carried out by evaluating (Eq. 3-6), (Eq. 3-7), (Eq. 3-9), (Eq. 3-8), and (Eq. 3-10) of Fig. 8 in that order.

To execute (Eq. 3-6), four LD4 commands load X1T of Fig. 13 into register file 0, and four LD4 commands load AT of Fig. 13 into register file 1. Four TRVT commands then produce, in register file 2, the result corresponding to A X1^t in (Eq. 3-6), and four TRV commands produce, again in register file 2, the result corresponding to the right-hand side of (Eq. 3-6). Finally, 16 ST commands store the result in T of Fig. 13. This can be executed with the following command series.

    LD4   X1T+0,  0, 0
    LD4   X1T+4,  0, 4
    LD4   X1T+8,  0, 8
    LD4   X1T+12, 0, 12
    LD4   AT+0,   1, 0
    LD4   AT+4,   1, 4
    LD4   AT+8,   1, 8
    LD4   AT+12,  1, 12
    TRVT  0, 0,  1, 2
    TRVT  0, 4,  1, 2
    TRVT  0, 8,  1, 2
    TRVT  0, 12, 1, 2
    TRV   1, 0,  2, 2
    TRV   1, 4,  2, 2
    TRV   1, 8,  2, 2
    TRV   1, 12, 2, 2
    ST    2, 0,  T+0
    ST    2, 1,  T+4
    ...
    ST    2, 15, T+60

To execute (Eq. 3-7), four LD4 commands load X2T of Fig. 13 into register file 0; then, exactly as for (Eq. 3-6), four TRVT commands produce the intermediate product in register file 2, four TRV commands produce the right-hand side in register file 2, and 16 ST commands store the result in T of Fig. 13. Because the command series has the same form as for (Eq. 3-6), a concrete listing is omitted. Note that the load of AT is omitted here.

A concrete description of the execution of (Eq. 3-8) and (Eq. 3-9) is omitted.
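The TRVT-then-TRV pattern in the (Eq. 3-6) listing can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented circuit: it assumes a register file holds a row-major 4x4 matrix, that the operand order is (matrix file, register number n, vector file, destination file), and that the result vector is written back to registers n..n+3. It ignores any reordering TRVT may apply to its output registers. The names `trv`, `trvt`, `two_pass`, and `reference` are hypothetical. Under these assumptions, four TRVT commands followed by four TRV commands yield the separable product A X A^T.

```python
# Simplified software model of the TRV / TRVT command pair.
# Assumptions (not taken from the patent text):
#   - a register file holds 16 values, viewed as a row-major 4x4 matrix;
#   - operand order is (matrix file, register number n, vector file, destination file);
#   - the result vector is written back to registers n..n+3 of the destination file.

N = 4  # vector length and matrix dimension


def trv(files, m, n, v, d):
    """y = M x, reading the matrix in register file m row by row."""
    mat, x = files[m], files[v][n:n + N]
    y = [sum(mat[N * i + j] * x[j] for j in range(N)) for i in range(N)]
    files[d][n:n + N] = y


def trvt(files, m, n, v, d):
    """y = M^T x, reading the matrix in register file m column by column."""
    mat, x = files[m], files[v][n:n + N]
    y = [sum(mat[N * j + i] * x[j] for j in range(N)) for i in range(N)]
    files[d][n:n + N] = y


def two_pass(a, x):
    """Run four TRVT then four TRV commands, mirroring the shape of the listing."""
    files = [list(x), list(a), [0] * 16, [0] * 16]  # register files 0..3
    for n in (0, 4, 8, 12):
        trvt(files, 0, n, 1, 2)   # file 2 now holds A X, one row per command
    for n in (0, 4, 8, 12):
        trv(files, 1, n, 2, 2)    # file 2 now holds (A X) A^T, updated in place
    return files[2]


def reference(a, x):
    """Directly compute A X A^T for row-major 4x4 inputs."""
    ax = [sum(a[N * i + k] * x[N * k + j] for k in range(N))
          for i in range(N) for j in range(N)]
    return [sum(ax[N * i + k] * a[N * j + k] for k in range(N))
            for i in range(N) for j in range(N)]
```

In this model the only difference between the two commands is the direction in which the matrix is read, which is why no separate transpose step appears between the two passes.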

Note, however, that computing (Eq. 3-9) first reduces the number of times AT or CT must be loaded, which is more efficient.

To execute (Eq. 3-10), first load B of Fig. 13 into register file 0 with four LD4 commands. Next, load four rows of data from T of Fig. 13 into register file 1 with four LD4 commands, execute 1/4 of the computation on the right-hand side of (Eq. 3-10) with four TRV commands, and store the result in Y of Fig. 13 with 16 ST commands. The same steps are then repeated three more times. The first 1/4 of the command series is as follows.

    LD4   B+0,  0, 0
    LD4   B+4,  0, 4
    LD4   B+8,  0, 8
    LD4   B+12, 0, 12
    LD4   T+0,  1, 0
    LD4   T+4,  1, 4
    LD4   T+8,  1, 8
    LD4   T+12, 1, 12
    TRV   0, 0,  1, 1
    TRV   0, 4,  1, 1
    TRV   0, 8,  1, 1
    TRV   0, 12, 1, 1
    ST    1, 0,  Y+0
    ST    1, 1,  Y+4
    ST    1, 2,  Y+32
    ST    1, 3,  Y+36
    ...   (ST commands for registers 4 through 15 follow)

The remaining 3/4 of the command series is omitted.
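The command counts implied by the two procedures can be tallied with a few lines of arithmetic. The sketch below takes the per-step counts from the text and assumes that B stays in register file 0 across all four passes of (Eq. 3-10); the function names are illustrative only. Under that assumption the total for (Eq. 3-10) comes to 100 commands, which agrees with the figure the description gives for that step.

```python
# Tally the commands in the (Eq. 3-6) listing and in the full (Eq. 3-10) procedure.
# Per-step counts come from the text; function names are illustrative only.

def count_eq_3_6():
    loads = 4 + 4        # LD4: X1T into file 0, AT into file 1
    products = 4 + 4     # four TRVT commands, then four TRV commands
    stores = 16          # ST: one per register of file 2
    return loads + products + stores


def count_eq_3_10(quarters=4):
    load_b = 4                   # LD4: B into file 0, assumed to be done once
    per_quarter = 4 + 4 + 16     # LD4 from T, TRV, ST for each quarter
    return load_b + quarters * per_quarter


if __name__ == "__main__":
    print(count_eq_3_6(), count_eq_3_10())
```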

Effects of the invention

The inverse discrete cosine transform can be executed as described above. Executing (Eq. 3-6) through (Eq. 3-9) takes 112 commands and executing (Eq. 3-10) takes 100 commands, for a total of roughly 200 commands. An inverse discrete cosine transform by the conventional method requires 1000 to 2000 commands, whereas with the present invention it completes in about 200, so the transform is made far more efficient.

The command count can be cut this sharply because the commands TRV and TRVT are provided for matrix products, and because the arithmetic circuit of the coprocessor is constructed and controlled so that they execute efficiently. The transposition function of the TRVT command is also noteworthy. A straightforward computation of (Eq. 3-1) would need a step that builds the transposed matrix after the expression MX inside the parentheses has been computed. The TRVT command, however, builds the transposed matrix and computes the matrix product at the same time, which improves performance further. It is also worth noting that A^t, once loaded into a register file, is used twice by means of the TRVT command.

In the above embodiment, (Eq. 3-1) is not computed directly but in the order (Eq. 3-6), (Eq. 3-7), (Eq. 3-9), (Eq. 3-8), (Eq. 3-10); other orders are also possible.

Although the above description mainly takes the inverse discrete cosine transform as an example, the invention also applies to the discrete cosine transform. Fig. 14 shows the defining equation of the discrete cosine transform, its decomposition into submatrices, and the expanded form of the defining equation. The result is obtained by evaluating (Eq. 6-6) through (Eq. 6-10) in order. Here too, the TRV and TRVT commands described in the above embodiment can be used effectively to reduce the number of executed commands substantially.

The above embodiment was described with four register files, but the TRV and TRVT commands of the invention can be executed in a coprocessor that has a register section formed of at least three register files 500, 501, and 502, as shown in Fig. 15, together with the arithmetic circuit shown in Fig. 3.

The above embodiment was described in terms of the discrete cosine transform and inverse discrete cosine transform on the coprocessor, but a coprocessor with the arithmetic circuit of Fig. 3 can also perform floating-point operations in addition to matrix operations.

The microprocessor of the embodiment was described for the case in which a coprocessor (second processor) 2, which performs matrix operations and floating-point operations, is provided separately from the central processing unit (first processor) 1. The functions of these two processors may instead be realized by a single processor.
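The statement that the same matrix-product commands serve both the forward and the inverse transform can be illustrated with the textbook separable form of the 2-D DCT. The sketch below uses the standard orthonormal 8-point DCT-II basis C, for which Y = C X C^T and X = C^T Y C. It is not the 4x4 submatrix decomposition of Figs. 8 and 14, which is not reproduced in this text, and the helper names are hypothetical.

```python
import math

N = 8  # block size used by JPEG-style 2-D DCT


def dct_basis(n=N):
    """Orthonormal DCT-II basis matrix C, so that C C^T = I."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block):
    """Forward 2-D DCT as two matrix products: Y = C X C^T."""
    c = dct_basis(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D DCT as two matrix products: X = C^T Y C."""
    c = dct_basis(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

Both directions reduce to the same kind of operation, a matrix multiplied by a sequence of vectors, with only the read-out direction of the basis matrix changing. That is the property a row-direction and column-direction command pair exploits.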

Industrial applicability

The above description has dealt mainly with the case in which the invention made by the present inventors is applied to a general-purpose microprocessor, but the invention is not limited to this. It can be used widely in processors that compress and expand image data and, more generally, in data processing devices that compute other matrix products.

Brief description of the drawings

Fig. 1 is a block diagram of one embodiment of a microprocessor to which the invention is suitably applied.
Fig. 2 is a block diagram of a concrete embodiment of the central processing unit (CPU) of the microprocessor.
Fig. 3 shows an embodiment of the arithmetic circuit of a coprocessor suited to executing the TRV and TRVT commands efficiently.
Fig. 4 shows an embodiment of a multiply-accumulator in the arithmetic circuit of the coprocessor.
Fig. 5 shows an embodiment of the register section of the coprocessor.
Fig. 6 shows the matrix of the inverse discrete cosine transform and its decomposition into submatrices.
Fig. 7 shows the matrix to be inverse-transformed and its decomposition into submatrices.
Fig. 8 shows the definition of the two-dimensional inverse discrete cosine transform and the expansion of the defining equation after decomposition into submatrices.
Fig. 9 shows the procedure for computing (Eq. 3-10) of Fig. 8.
Fig. 10 shows the concept of a register file.
Fig. 11 shows the command format of the TRV and TRVT commands.

Fig. 12 shows the arithmetic expressions that explain the functions of the TRV and TRVT commands.
Fig. 13 shows the layout, as seen from the conversion program, of the data stored in memory when the inverse discrete cosine transform is performed.
Fig. 14 shows the definition of the two-dimensional discrete cosine transform and the expansion of the defining equation after decomposition into submatrices.
Fig. 15 shows another configuration example of the register section of a coprocessor that can execute the TRV and TRVT commands.

Reference numerals of main components

1: central processing unit
2: coprocessor
3: interrupt controller
4: memory management unit
5: address translation circuit
6: mixed instruction/data cache memory
7: cache memory controller
8a: 32-bit logical address bus
8b: 32-bit physical address bus
8c: peripheral address bus
9a, 9b: 32-bit data bus
9c: 16-bit peripheral data bus
10: external bus interface
11: serial communication interface
12: real-time clock
13: timer
14: bus state controller
15: clock pulse generator circuit
16: watchdog timer
17: I/O port
18: user break controller
100: semiconductor chip
901: register section
910 to 913: multiply-accumulators
920 to 924: latch circuits
960: multiplier
961: adder
962: temporary register


Claims

Annex 1: Amended claims of patent application No. 86116630 (amended March 1999, ROC year 88)

1. A data processing device, characterized by having a matrix operation command composed of a command code, first, second, and third operands for designating register files, and a fourth operand for designating a register within a designated register file.

2. The data processing device according to claim 1, wherein the matrix operation command computes the matrix-vector product of an N x N matrix in the first register file designated by the first operand and an input vector of N elements in the registers designated by the fourth operand within the second register file designated by the second operand, and stores the resulting output vector in the third register file designated by the third operand.

3. The data processing device according to claim 2, wherein the register group selected by register number n in the fourth operand, which designates the vector, differs between the input vector and the output vector.

4. The data processing device according to claim 2 or 3, wherein the matrix operation command includes a first matrix operation command, for which the register group selected by register number n in the fourth operand designating the input vector lies in the row direction, and a second matrix operation command, for which that register group lies in the column direction.
5. A data processing device, characterized by performing at least an inverse discrete cosine transform using the command described in any one of claims 1, 2, 3, or 4.

6. A data processing device, characterized by having: a control device that forms control signals corresponding to a command that has been read in; and an arithmetic device that comprises at least three register files, each formed of a predetermined number of registers, and an arithmetic unit, and that can execute, under the control signals from the control device, the processing of each of a plurality of commands including a matrix operation command; wherein the matrix operation command is composed of a command code, first, second, and third operands for designating register files, and a fourth operand for designating a register within a designated register file.

7. The data processing device according to claim 6, wherein the matrix operation command computes the matrix-vector product of an N x N matrix in the first register file designated by the first operand and an input vector of N elements in the registers designated by the fourth operand within the second register file designated by the second operand, and stores the resulting output vector in the third register file designated by the third operand.

8. The data processing device according to claim 7, wherein the register group selected by register number n in the fourth operand, which designates the vector, differs between the input vector and the output vector.

9. The data processing device according to claim 7 or 8, wherein the matrix operation command includes a first matrix operation command, for which the register group selected by register number n in the fourth operand designating the input vector lies in the row direction, and a second matrix operation command, for which that register group lies in the column direction.

10. The data processing device according to claim 7, 8, or 9, wherein the arithmetic device comprises: at least three register files, each formed of 16 registers; multipliers that obtain the products of the data of four selected registers in the first register file designated by the first operand and the data of the register designated by the fourth operand in the second register file designated by the second operand; four adders that each add the result of a multiplier to the previous addition result; and four temporary registers that hold the results of the adders.

11. A data processing device, characterized by having: a first processor section that has a control device forming control signals corresponding to a command that has been read in, registers, and an arithmetic unit, and that can execute, under the control signals from the control device, the processing of each of a plurality of commands; and a second processor section that has a control device forming control signals corresponding to a command that has been read in, and an arithmetic device comprising at least three register files, each formed of a predetermined number of registers, and an arithmetic unit, and that can execute, under the control signals from the control device, each of a plurality of commands including a matrix operation command; wherein, of the first and second processor sections, the matrix operation command is executed by the second processor section.

12. A data processing method in which a control device forms control signals from a command that has been read in and supplies them to an execution device, which carries out the processing corresponding to the command, characterized in that: two kinds of matrix operation commands are provided, each composed of a command code, first, second, and third operands for designating register files, and a fourth operand for designating a register within a designated register file, and each computing the matrix-vector product of an N x N matrix in the first register file designated by the first operand and an input vector of N elements in the registers designated by the fourth operand within the second register file designated by the second operand, and storing the resulting output vector in the third register file designated by the third operand; for one of the matrix operation commands, the register group selected by register number n in the fourth operand designating the input vector lies in the row direction; and for the other matrix operation command, the register group selected by register number n lies in the column direction.

13. The data processing method according to claim 12, wherein, for the other matrix operation command, the register group selected by register number n in the fourth operand, which designates the vector, differs between the input vector and the output vector.