CN103365624A - Judgment system and method

Info

Publication number: CN103365624A
Application number: CN2013102439659A
Authority: CN
Inventors: 罗沙尔.L.史托兹; 雷蒙.A.贝特伦
Original assignee: Via Technologies Inc
Current assignee: Via Technologies Inc
Priority date: 2009-10-26
Filing date: 2010-09-07
Publication date: 2013-10-23
Anticipated expiration: 2030-09-07
Also published as: TWI423121B; CN103941601A; CN101937333B; CN101937333A; CN103365624B; CN103941601B; TW201419138A; TW201115460A; TWI489374B

Abstract

Judgment system and method. The system uses a shared adder circuit to execute either a horizontal minimum instruction or a sum of absolute differences instruction, and includes a plurality of adders, a summing circuit, a comparison circuit, and a routing circuit. The routing circuit routes a plurality of digital codes to the adders according to the instruction being executed. When the horizontal minimum instruction is executed, the adders are grouped into a plurality of adder pairs. Each adder pair provides a carry output and a propagate output, and has a high adder and a low adder. The high adder compares the high parts of a pair of the digital codes. The low adder compares the low parts of the same pair. The minimum digital code is found from the carry outputs and the propagate outputs.

Description

Judgment system and method

This application is a divisional of the invention patent application filed on September 7, 2010, with application number 201010277155.1 and titled "Judgment system and method".

Technical field

The present invention relates to a microprocessor instruction, and in particular to a system and method for determining the minimum code in a set of digital values, where the smallest digital code serves as the horizontal minimum.

Background

Modern microprocessors are often used to execute media instructions that improve the efficiency of multimedia applications. For example, a microprocessor architecture may include one or more media instructions that identify the horizontal minimum in a set of digital codes, together with the corresponding location of that minimum on a bus or in a register. A specific example is the PHMINPOSUW instruction in Intel's SSE4 Programming Reference manual. PHMINPOSUW finds the minimum word, and its position, among 8 unsigned 16-bit words (128 bits in total). Some known microprocessors need additional processing steps or several clock cycles to execute PHMINPOSUW. For example, to identify the smaller word of each word pair, four 16-bit magnitude comparators narrow the search from 8 words to 4 words in a first cycle; the 4 resulting words are then fed back to 2 of the comparators, which narrow the search from 4 words to 2 words in a second cycle; finally, the 2 results are fed back to 1 comparator, which finds the smaller of the 2 words in a third (final) cycle. In another known approach, the instruction is executed in a single cycle by increasing the number of 16-bit comparators. With seven 16-bit comparators, for example, 4 comparators perform the first comparison within the single cycle and narrow the search from 8 words to 4, 2 more comparators narrow it from 4 words to 2, and 1 last comparator picks the smaller of the 2 words. However, each 16-bit comparator occupies considerable area in the microprocessor, which increases cost and reduces performance.
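The comparator tree described in the background can be sketched in software as follows. This is only an illustrative model, not circuitry from the patent: the function name `tournament_min` is hypothetical, and the tie-breaking rule (the lower index wins) is an assumption that the text does not state.

```python
def tournament_min(words):
    """Model the 8 -> 4 -> 2 -> 1 comparator tree for 8 unsigned 16-bit words.

    Returns (minimum value, index of the minimum, number of comparisons).
    """
    assert len(words) == 8
    candidates = list(enumerate(words))  # (index, value) pairs
    comparisons = 0
    while len(candidates) > 1:
        survivors = []
        for i in range(0, len(candidates), 2):
            left, right = candidates[i], candidates[i + 1]
            comparisons += 1
            # Assumption: on a tie the candidate with the lower index survives.
            survivors.append(left if left[1] <= right[1] else right)
        candidates = survivors
    index, value = candidates[0]
    return value, index, comparisons
```

The three passes of the loop correspond to the three cycles of the multi-cycle approach, and the total of 4 + 2 + 1 = 7 comparisons corresponds to the seven comparators of the single-cycle approach.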

Summary of the invention

An object of the present invention is to find the minimum digital code in a set of digital codes, together with its corresponding position, in a single cycle without adding circuitry.

The invention provides a judgment system for finding the minimum binary code among at least two binary codes. In one embodiment, the judgment system includes a first adder, a second adder, and a comparison circuit. The first adder adds a plurality of first bits and a plurality of second bits to provide a first carry output and a first propagate output. The first bits are the high-order bits of a first binary code. The second bits are the inverse of the high-order bits of a second binary code. The second adder adds a plurality of third bits and a plurality of fourth bits to provide a second carry output. The third bits are the low-order bits of the first binary code. The fourth bits are the inverse of the low-order bits of the second binary code. The comparison circuit determines, from the first and second carry outputs and the first propagate output, whether the first binary code is greater than the second binary code. Both binary codes are unsigned, and both adders perform unsigned binary addition. The first propagate output indicates whether the first adder would propagate a carry input.
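The two-adder comparison can be modeled with the short sketch below. The passage does not spell out how the comparison circuit combines its three inputs, so the rule used here, greater = C1 OR (P1 AND C2), is an assumption based on standard carry-propagation reasoning: adding a value to the bitwise inverse of another produces a carry exactly when the first value is larger, and every bit position propagates exactly when the two values are equal. The name `is_greater` and the default 16-bit width are likewise illustrative.

```python
def is_greater(a: int, b: int, width: int = 16) -> bool:
    """Return True when unsigned a > b, using two half-width additions."""
    half = width // 2
    mask = (1 << half) - 1

    a_hi, a_lo = (a >> half) & mask, a & mask
    nb_hi, nb_lo = ~(b >> half) & mask, ~b & mask  # inverted halves of b

    hi_sum = a_hi + nb_hi  # high adder, no carry-in
    lo_sum = a_lo + nb_lo  # low adder, no carry-in

    c1 = (hi_sum >> half) & 1         # first carry output: a_hi > b_hi
    p1 = int((a_hi ^ nb_hi) == mask)  # propagate output: a_hi == b_hi
    c2 = (lo_sum >> half) & 1         # second carry output: a_lo > b_lo

    # Assumed combining rule: the high halves decide the result unless they
    # are equal, in which case the low halves decide.
    return bool(c1 | (p1 & c2))
```

Splitting the comparison this way means no carry has to ripple from the low half into the high half; the two half-width additions run side by side and a couple of gates merge the results.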

The present invention also provides a judgment system for quickly finding the horizontal minimum among a plurality of digital codes. The judgment system includes a plurality of difference circuits, a routing circuit, and a comparison circuit. Each difference circuit compares two digital codes. The routing circuit assigns each digital code to at least one difference circuit so that each digital code is compared with the other digital codes. Each difference circuit may include a high adder and a low adder. The high adder compares the high part of a first digital code with the high part of a second digital code to provide a first carry output and a propagate output. The low adder compares the low part of the first digital code with the low part of the second digital code to provide a second carry output. The comparison circuit evaluates the first and second carry outputs and the propagate outputs to determine the minimum digital code.

Each propagate output indicates whether the high adder of one of the difference circuits would propagate a carry input. The comparison circuit includes a decoding circuit, which decodes the comparison bits to provide a plurality of minimum bits. Each minimum bit indicates whether the corresponding digital code is the minimum digital code. A location circuit reports the storage location of the minimum digital code. The judgment system may be integrated in a microprocessor chip to execute a fast horizontal minimum instruction.
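One way to picture the decoding step is the sketch below, which compares every pair of codes once and then derives one minimum bit per code. The decoding rule and the tie-breaking choice (the lowest index wins, so exactly one minimum bit is set) are assumptions made for illustration; the names `min_bits` and `min_location` are hypothetical.

```python
from itertools import combinations


def min_bits(codes):
    """Return one boolean per code; True marks the (lowest-index) minimum."""
    n = len(codes)
    # One comparison bit per unordered pair (i, j) with i < j:
    # True when codes[i] > codes[j].
    greater = {(i, j): codes[i] > codes[j] for i, j in combinations(range(n), 2)}
    bits = []
    for k in range(n):
        not_above_later = all(not greater[(k, j)] for j in range(k + 1, n))
        below_earlier = all(greater[(i, k)] for i in range(k))
        bits.append(not_above_later and below_earlier)
    return bits


def min_location(codes):
    """Return the index of the code whose minimum bit is set."""
    return min_bits(codes).index(True)
```

For four codes this needs 4 × 3 / 2 = 6 pairwise comparisons, all of which are independent and can therefore be evaluated in parallel.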

The invention provides a judgment method for finding the minimum digital code among a plurality of digital codes. In one possible embodiment, the method includes the following steps: comparing the high-order bits of a first digital code with the high-order bits of a second digital code to provide a first carry output and a propagate output; comparing the low-order bits of the first digital code with the low-order bits of the second digital code to provide a second carry output; and determining, from the first and second carry outputs and the propagate output, which of the first and second digital codes is the smaller. The method may include routing each digital code to at least one of a plurality of adder pairs so that each digital code is compared with the other digital codes and the minimum digital code is obtained. The method further includes decoding the comparison bits, and determining the location of the minimum digital code in a memory.

The present invention provides a system that uses a shared adder circuit to execute either a horizontal minimum instruction or a sum of absolute differences instruction. In one embodiment, the system includes a plurality of adders, a summing circuit, a comparison circuit, and a routing circuit. The input operands comprise a plurality of digital codes. For the sum of absolute differences instruction, the digital codes comprise a first set of digital codes and a second set of digital codes. For the horizontal minimum instruction, the digital codes comprise a plurality of digital code pairs, each having a high digital code and a low digital code. Each adder compares a first digital code with a second digital code to provide an absolute difference and a carry output. The summing circuit sums the absolute differences to provide a plurality of sums of absolute differences. The adders form a plurality of adder pairs, each of which provides a propagate output. The comparison circuit combines the carry outputs and the propagate outputs to find the minimum digital code pair. When the horizontal minimum instruction is executed, the routing circuit routes each digital code pair to at least one of the adder pairs so that each digital code pair is compared with the other digital code pairs. When the sum of absolute differences instruction is executed, the routing circuit routes the first and second sets of digital codes to the adder pairs to obtain the absolute difference between each digital code of the first set and each digital code of the second set, where the second set consists of consecutive digital codes.

The present invention also provides a method that uses a shared adder circuit to execute either a horizontal minimum instruction or a sum of absolute differences instruction. In one embodiment, the method includes receiving a plurality of digital codes. When the sum of absolute differences instruction is executed, the digital codes comprise a first set of digital codes and a second set of digital codes. When the horizontal minimum instruction is executed, the digital codes comprise a high digital code and a low digital code. The method further includes providing a plurality of adders, each of which compares a first digital code with a second digital code to provide an absolute difference and a carry output. The method further includes: summing the absolute differences to provide a plurality of sums of absolute differences; grouping the adders into a plurality of adder pairs, each of which provides a propagate output; combining the carry outputs and the propagate outputs to find the minimum digital code pair; when the horizontal minimum instruction is executed, routing each digital code pair to at least one of the adder pairs so that each digital code pair is compared with the other digital code pairs; and, when the sum of absolute differences instruction is executed, routing the first and second sets of digital codes to the adder pairs to obtain the absolute difference between each digital code of the first set and each consecutive digital code of the second set.

Brief description of the drawings

FIG. 1 shows an embodiment of a microprocessor 100.

FIG. 2 shows an embodiment of the comparison circuit.

FIG. 3 shows an embodiment of the routing circuit of the present invention.

FIG. 4 shows an embodiment of the first adder circuit of the present invention.

FIG. 5 shows an embodiment of the difference unit DIFF1 of the present invention.

FIG. 6 shows an embodiment of the sum unit S1 of the present invention.

FIG. 7 shows an embodiment of the PMIN circuit 206 of the present invention.

FIG. 8 shows an embodiment of the high-order/low-order comparator circuit 212 of the present invention.

[Description of main reference numerals]

100: microprocessor;

102: scheduler;

104: complex integer execution unit;

106: simple integer execution unit;

108: floating point execution unit;

110: media unit;

114, 802: comparison circuits;

112: other units;

202: routing circuit;

203: low-order adder circuit;

204: first adder circuit;

206: first PMIN circuit;

207: high-order adder circuit;

208: second adder circuit;

210: second PMIN circuit;

212: high-order/low-order comparator circuit;

302: buffer circuit;

304, 306, 308, 506, 510, 804, 806, 808: multiplexers;

402: difference circuit;

404: summing circuit;

410, 412: selection logic circuits;

502, 504, 602, 604, 606: adders;

514, 708, 716, 722, 726: AND gates;

516: OR gate;

508, 512, 702, 704, 706, 712, 714, 720, 710, 718, 724: inverters;

728: selection circuit;

DIFF1 to DIFF8: difference units;

S1 to S4: sum units.

Detailed description

The following description of embodiments is provided to enable those of ordinary skill in the art to make and use the disclosed invention. Modifications to the preferred embodiments will be apparent to those skilled in the art, and the general principles described herein may be applied to other embodiments. The present invention is therefore not limited to the particular embodiments shown and described here, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The inventors have observed that known microprocessors take many cycles to execute a horizontal minimum instruction. The present invention executes the same instruction in a single cycle without a large increase in circuitry. The invention provides a system and method for quickly determining the horizontal minimum. To make its features and advantages easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings (FIG. 1 to FIG. 8).

FIG. 1 is a block diagram of a microprocessor 100 according to an embodiment of the present invention. The microprocessor 100 has a comparison circuit 114 that can quickly find the horizontal minimum in a set of digital codes and can also obtain the sums of absolute differences between a first set of digital codes and a second set of digital codes. FIG. 1 omits other known systems and functions, such as instruction fetch, the instruction queue, instruction decoding, and instruction reordering; their omission does not affect an understanding of the invention. The microprocessor 100 has a scheduler 102, which routes instructions or operations to the selected arithmetic logic units (ALUs) or execution units (EUs). As shown in FIG. 1, the scheduler 102 is coupled to a complex integer execution unit (IEU) 104, a simple IEU 106, a floating point execution unit (FPEU) 108, a media unit 110, and other units 112, where the other units 112 are other similar or different processing units. The media unit 110 generally executes media-based instructions and operations, such as those of Streaming SIMD Extensions (SSE), MultiMedia eXtension (MMX), and similar instruction sets. SSE is a single instruction, multiple data (SIMD) instruction set of Intel's x86 architecture. The media unit 110 includes the comparison circuit 114, which executes at least two separate media instructions. In this embodiment, the two media instructions are called the horizontal minimum instruction (PMIN instruction) and the sum of absolute differences instruction (PSAD instruction). The PSAD instruction yields the sums of absolute differences between a first set of digital codes (or binary codes) and a second set of digital codes (or binary codes), where the second set follows the first set. Both sets are described in detail below. The PMIN instruction yields the minimum digital code and its corresponding position. In this embodiment, the terms digital code and binary code, and their corresponding formats, are interchangeable, and each code represents a multi-bit or hexadecimal value. The scheduler 102 has a memory 116. The memory 116 stores the operands of the PSAD and PMIN instructions and has a first bus ABUS and a second bus BBUS. In one embodiment, ABUS and BBUS each carry 128 bits, although the invention is not limited to this width; in other embodiments they may carry other numbers of bits. Although the media unit 110 generally executes many other media instructions well known to those skilled in the art, the comparison circuit 114 is used to execute the PSAD and PMIN instructions.

In one possible embodiment, the first set of digital codes for the PSAD instruction has 4 unsigned bytes (8 bits each). The second set of digital codes for the PSAD instruction is a set of 11 consecutive bytes, taken 4 consecutive bytes at a time as a group. Each successive 4-byte group of the second set starts at the next higher byte; that is, each group is shifted by 1 byte and therefore overlaps the last 3 bytes of the previous group. Suppose the second set has 11 bytes B0 to B10. Bytes B0 to B3 form the first group; starting from the next higher byte (B1), bytes B1 to B4 form the second group. The second group (B1 to B4) thus overlaps the last 3 bytes (B1 to B3) of the first group (B0 to B3). The absolute value of the difference between each byte of the first set and the corresponding byte of a group from the second set is called an absolute difference, and the absolute differences for each group are summed. A specific example is the MPSADBW instruction in Intel's SSE4 Programming Reference manual. For the PSAD instruction, the first bus ABUS carries the first operand, which consists of 4 unsigned bytes, and the second bus BBUS carries the second operand, which consists of 11 unsigned bytes. The result is 8 unsigned 10-bit sums of absolute differences. The PSAD instruction may include one or more offsets for locating these operands. The invention does not limit the size of the offsets; any offset can be applied on ABUS and BBUS so that the corresponding operands are placed in the right-most bit positions of ABUS and BBUS. The offsets are omitted in this embodiment. In one embodiment, the PMIN instruction returns the minimum of 8 unsigned 16-bit words on the first bus ABUS together with the position of that minimum. A specific example is the PHMINPOSUW instruction in Intel's SSE4 Programming Reference manual. For the PMIN instruction, ABUS carries 8 words of 16 bits each. The bits carried by BBUS may be undefined or ignored, or BBUS may carry the same words as ABUS. In this embodiment, the comparison circuit 114 executes either of the two instructions (PMIN or PSAD) in a single cycle using the same adder circuits.
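The sliding-window arithmetic of the PSAD instruction can be illustrated with a few lines of Python. This is a behavioral sketch of the operation as described in the paragraph above, not of the hardware; the function name `psad` is hypothetical, and the offsets mentioned in the text are left out, as they are in the embodiment.

```python
def psad(a_bytes, b_bytes):
    """Return 8 sums of absolute differences.

    a_bytes: the 4 unsigned bytes of the first operand.
    b_bytes: the 11 unsigned bytes B0..B10 of the second operand.
    Sum k compares a_bytes with the group b_bytes[k : k + 4].
    """
    assert len(a_bytes) == 4 and len(b_bytes) == 11
    assert all(0 <= x <= 255 for x in list(a_bytes) + list(b_bytes))
    return [
        sum(abs(a_bytes[i] - b_bytes[k + i]) for i in range(4))
        for k in range(8)
    ]
```

Each sum is at most 4 × 255 = 1020, which fits in 10 bits, and 11 bytes yield 11 − 4 + 1 = 8 overlapping groups; both figures agree with the 8 unsigned 10-bit results stated in the text.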

FIG. 2 shows an embodiment of the comparison circuit 114 of the present invention. As shown, the comparison circuit 114 includes a routing circuit 202, a low-order (LO) adder circuit 203, a high-order (HI) adder circuit 207, and a high-order/low-order comparator circuit 212. The routing circuit 202 has two inputs coupled to the first bus ABUS and the second bus BBUS, respectively, and a further input that receives the control code INSTR. According to the received control code INSTR, the routing circuit 202 rearranges or reroutes the bytes from ABUS and BBUS in order to split them. The control code INSTR has at least 1 bit. In this embodiment, INSTR equal to 1 indicates the PMIN instruction and INSTR equal to 0 indicates the PSAD instruction. The first bus ABUS is split into a high part AH<31:0> and a low part AL<31:0>, each of 32 bits. The second bus BBUS is split into a high part BH<55:0> and a low part BL<55:0>, each of 56 bits. How the bytes of ABUS and BBUS are rearranged or rerouted according to the instruction being executed is described in detail later. The low-order adder circuit 203 has a first adder circuit 204, which is coupled to a first PMIN circuit 206. The high-order adder circuit 207 has a second adder circuit 208, which is coupled to a second PMIN circuit 210.

The first adder circuit 204 receives the control code INSTR and the low parts AL<31:0> and BL<55:0>, and outputs the 40-bit sums of absolute differences PSAD<39:0> and the 6 comparison bits C<5:0>. The comparison bits C<5:0>, AL<15:0>, and BL<47:0> are passed to the first PMIN circuit 206, which outputs the minimum PMINVAL<15:0> of the low part and its position PMINLOC<1:0>. The control code INSTR and the high parts AH<31:0> and BH<55:0> are passed to the second adder circuit 208, which outputs the 40-bit sums of absolute differences PSAD<79:40> and the 6 comparison bits C<11:6>. The comparison bits C<11:6>, AH<15:0>, and BH<47:0> are passed to the second PMIN circuit 210, which outputs the minimum PMINVAL<31:16> of the high part and its position PMINLOC<3:2>. Combining PMINVAL<15:0> and PMINLOC<1:0> from the first PMIN circuit 206 with PMINVAL<31:16> and PMINLOC<3:2> from the second PMIN circuit 210 gives PMINVAL<31:0> and PMINLOC<3:0>. The high-order/low-order comparator circuit 212 receives PMINVAL<31:0> and PMINLOC<3:0> and produces the final minimum digital code MINVAL<15:0> and its relative position MINLOC<2:0>.
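The final merge performed by the high-order/low-order comparator circuit 212 might be modeled as below. The internal logic of that circuit is not described in this passage, so the sketch rests on two assumptions: the low half covers words 0 to 3 and the high half covers words 4 to 7, and a tie is resolved in favor of the low half. The function name `merge_halves` is hypothetical.

```python
def merge_halves(pminval: int, pminloc: int):
    """Combine two half results into (MINVAL<15:0>, MINLOC<2:0>).

    pminval: 32 bits, low-half minimum in <15:0>, high-half minimum in <31:16>.
    pminloc: 4 bits, low-half position in <1:0>, high-half position in <3:2>.
    """
    lo_val, hi_val = pminval & 0xFFFF, (pminval >> 16) & 0xFFFF
    lo_loc, hi_loc = pminloc & 0b11, (pminloc >> 2) & 0b11
    if lo_val > hi_val:
        # The high half wins: bit 2 of the 3-bit location marks words 4..7.
        return hi_val, 0b100 | hi_loc
    # Assumption: on a tie the low half (lower word index) is reported.
    return lo_val, lo_loc
```

A 3-bit location is enough for 8 words: the two low bits come from the winning half and the top bit records which half won.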

The first adder circuit 204 and the second adder circuit 208 arrange the input bytes according to the instruction (that is, the control code INSTR) and compare them with one another. For the PSAD instruction, the combined PSAD<79:0> holds 8 unsigned 10-bit digital codes, which are the results of the sum of absolute differences operation. For the PSAD instruction, the first PMIN circuit 206, the second PMIN circuit 210, and the high-order/low-order comparator circuit 212 can be disregarded. For the PMIN instruction, PSAD<79:0> can be disregarded for both the high and low parts; from the comparison bits C<11:0> received by the first PMIN circuit 206 and the second PMIN circuit 210, the minimum digital code of each part and its relative position are obtained. When the first bus ABUS supplies 128 bits of input data, the high-order/low-order comparator circuit 212 receives and compares the minimum digital codes of the high part and the low part, and outputs the minimum MINVAL<15:0> and its relative position MINLOC<2:0>.

FIG. 3 shows an embodiment of the routing circuit 202 of the present invention. The routing circuit 202 arranges or reroutes the digital codes supplied on the first bus ABUS and the second bus BBUS according to the particular instruction. The buffer circuit 302 receives ABUS<31:0> and outputs AL<31:0> for both the PSAD instruction and the PMIN instruction. In one embodiment, the buffer circuit 302 contains a separate buffer for each bit, so that ABUS<31:0> is effectively copied to AL<31:0>; in other words, AL<31>=ABUS<31>, AL<30>=ABUS<30>, ..., AL<0>=ABUS<0>. For both instructions, AL<31:0> holds the 4 bytes A3 to A0. For the PMIN instruction, bytes A3 to A0 form two pairs: A3 and A2 form word W1, and A1 and A0 form word W0. Words W1 and W0 each have 16 bits. The multiplexer 304 receives ABUS<95:64> and ABUS<31:0>. When the control signal of the multiplexer 304 is logic 1 (high level), its output AH<31:0> equals ABUS<95:64>. When the control signal is logic 0 (low level), its output AH<31:0> equals ABUS<31:0>. In one embodiment, a separate 1-bit-wide multiplexer is provided for each of the 32 bits of AH<31:0>, so that each input and each output has its own multiplexer path. If the control code INSTR indicates the PMIN instruction, the multiplexer 304 selects ABUS<95:64> as AH<31:0>. These 32 bits form the 4 bytes A11 to A8. For PMIN, bytes A11 to A8 form two words: A11 and A10 form word W5, and A9 and A8 form word W4. If the control code INSTR indicates the PSAD instruction, the multiplexer 304 selects ABUS<31:0> as AH<31:0>. These 32 bits form the 4 bytes A3 to A0. The bytes are duplicated because the first operand of the PSAD instruction is the same for the high-order and low-order parts, as explained later.

When the control signal of the multiplexer 306 is logic 1 (that is, the control code INSTR=1), the multiplexer 306 selects eight high-order zero bits, denoted 0x8, followed by ABUS<63:16>, so that its output BL<55:0> is 0x8 concatenated with ABUS<63:16>. When the control signal of the multiplexer 306 is logic 0, the multiplexer 306 selects BBUS<55:0>, so that its output BL<55:0> equals BBUS<55:0>. In one embodiment, a 1-bit-wide multiplexer is used for each bit of each bus. If the control code INSTR indicates the PMIN instruction, ABUS<63:16> is selected. ABUS<63:16> holds six bytes A7 to A2, which are grouped into three pairs: bytes A7 and A6 form word W3, bytes A5 and A4 form word W2, and bytes A3 and A2 form word W1. If the control code INSTR indicates the PSAD instruction, BBUS<55:0> is selected. BBUS<55:0> holds the seven low bytes B6 to B0 of the second operand. Similarly, when the control input of the multiplexer 308 is logic 1, the multiplexer 308 selects eight high-order zero bits 0x8 followed by ABUS<127:80>, so that its output BH<55:0> is 0x8 concatenated with ABUS<127:80>. When the control input of the multiplexer 308 is logic 0, the multiplexer 308 selects BBUS<87:32>, so that its output BH<55:0> equals BBUS<87:32>. If the control code INSTR indicates the PMIN instruction, ABUS<127:80> is selected. ABUS<127:80> holds six bytes A15 to A10, which are grouped into three pairs: bytes A15 and A14 form word W7, bytes A13 and A12 form word W6, and bytes A11 and A10 form word W5. If the control code INSTR indicates the PSAD instruction, BBUS<87:32> is selected. BBUS<87:32> holds the seven high bytes B10 to B4 of the second operand of the PSAD instruction.
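The routing performed by the buffer circuit 302 and the multiplexers 304, 306 and 308 can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the function names `bits` and `route` are hypothetical, the encoding INSTR=1 for PMIN and INSTR=0 for PSAD follows the logic levels given above, and the upper field routed to BH is taken to be the six bytes A15 to A10, that is, ABUS<127:80>.

```python
PMIN, PSAD = 1, 0  # assumed encoding of the control code INSTR

def bits(value, hi, lo):
    """Return the bit field value<hi:lo> as an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def route(abus, bbus, instr):
    """Illustrative model of path selection circuit 202.

    Returns (AL<31:0>, AH<31:0>, BL<55:0>, BH<55:0>).
    """
    al = bits(abus, 31, 0)            # buffer circuit 302: plain copy
    if instr == PMIN:
        ah = bits(abus, 95, 64)       # multiplexer 304, input 1: bytes A11..A8
        bl = bits(abus, 63, 16)       # multiplexer 306: 8 zero bits + bytes A7..A2
        bh = bits(abus, 127, 80)      # multiplexer 308: 8 zero bits + bytes A15..A10
    else:
        ah = bits(abus, 31, 0)        # multiplexer 304, input 0: bytes A3..A0 again
        bl = bits(bbus, 55, 0)        # bytes B6..B0
        bh = bits(bbus, 87, 32)       # bytes B10..B4
    return al, ah, bl, bh
```

Because the eight padding bits are zero, the 48-bit fields selected for PMIN already represent the full 56-bit BL and BH values.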

Referring to FIG. 2, for the PMIN instruction, the routing performed by the path selection circuit 202 of FIG. 3 places words W1 and W0 on the AL bus and words W3 to W1 on the BL bus for delivery to the first adder circuit 204. The first adder circuit 204 compares word W0 with each of words W1 to W3, then word W1 with each of words W2 and W3, and then word W2 with word W3, and provides the corresponding comparison bits C<5:0>. The first PMIN circuit 206 receives words W3 to W0 and outputs the minimum word as PMINVAL<15:0>. The first PMIN circuit 206 thus indicates the minimum word of the low-order part of the first bus ABUS and its position PMINLOC<1:0>. For example, PMINLOC=00 if the minimum word is at ABUS<15:0>, and PMINLOC=01 if the minimum word is at ABUS<31:16>. Likewise, for the PMIN instruction, words W5 and W4 are placed on the AH bus and words W7 to W5 on the BH bus for delivery to the second adder circuit 208. The second adder circuit 208 compares word W4 with each of words W5 to W7, then word W5 with each of words W6 and W7, and then word W6 with word W7, and provides the corresponding comparison bits C<11:6>. The second PMIN circuit 210 receives words W7 to W4 and outputs the bits of the minimum word among them as PMINVAL<31:16>. The second PMIN circuit 210 also indicates the position PMINLOC<3:2> of the minimum word within the high-order part of the first bus ABUS. For example, PMINLOC=00 if the minimum word is at ABUS<79:64>, and PMINLOC=01 if the minimum word is at ABUS<95:80>. The high-order/low-order comparator circuit 212 compares the word in PMINVAL<15:0> with the word in PMINVAL<31:16> to identify which of them is the minimum of ABUS<127:0>. The comparison result of the high-order/low-order comparator circuit 212 also yields the relative position MINLOC<2:0> of the minimum value.
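The two-level search described above (a minimum per 64-bit half, followed by one final comparison) can be sketched behaviourally as below. This is a minimal model under stated assumptions: the function name `horizontal_min` is hypothetical, the words are treated as unsigned, and ties are resolved in favour of the lowest position, which the text does not specify.

```python
def horizontal_min(abus):
    """Return (MINVAL, MINLOC) for the eight 16-bit words W0..W7 of ABUS<127:0>."""
    words = [(abus >> (16 * i)) & 0xFFFF for i in range(8)]

    def half_min(ws):
        # min() returns the first smallest index, so the lowest position wins ties
        # (an assumption, not stated in the description).
        loc = min(range(4), key=lambda i: ws[i])
        return ws[loc], loc

    lo_val, lo_loc = half_min(words[0:4])  # models PMINVAL<15:0>, PMINLOC<1:0>
    hi_val, hi_loc = half_min(words[4:8])  # models PMINVAL<31:16>, PMINLOC<3:2>

    # Models the high-order/low-order comparator circuit 212.
    if hi_val < lo_val:
        return hi_val, 4 + hi_loc
    return lo_val, lo_loc
```

The third bit of the returned location simply records which half supplied the winner, which is why a 3-bit MINLOC suffices for eight words.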

Referring to FIG. 2, for the PSAD instruction, the path selection circuit 202 (shown in FIG. 3) routes bytes A3 to A0 of the first operand from the first bus ABUS to both AL<31:0> and AH<31:0>, providing AL<31:0> to the first adder circuit 204 and AH<31:0> to the second adder circuit 208. The path selection circuit 202 passes bytes B6 to B0 of the second operand from the second bus BBUS as BL<55:0> to the first adder circuit 204, and bytes B10 to B4 of the second operand as BH<55:0> to the second adder circuit 208. For the PSAD instruction, the first adder circuit 204 sums the absolute differences between bytes A0 and B0, A1 and B1, A2 and B2, and A3 and B3, and provides the first 10-bit result PSAD<9:0>. The first adder circuit 204 sums the absolute differences between bytes A0 and B1, A1 and B2, A2 and B3, and A3 and B4, and provides the second 10-bit result PSAD<19:10>. The first adder circuit 204 sums the absolute differences between bytes A0 and B2, A1 and B3, A2 and B4, and A3 and B5, and provides the third 10-bit result PSAD<29:20>. The first adder circuit 204 sums the absolute differences between bytes A0 and B3, A1 and B4, A2 and B5, and A3 and B6, and provides the fourth 10-bit result PSAD<39:30>. Likewise, the second adder circuit 208 sums the absolute differences between bytes A0 and B4, A1 and B5, A2 and B6, and A3 and B7, and provides its first 10-bit result PSAD<49:40>. The second adder circuit 208 sums the absolute differences between bytes A0 and B5, A1 and B6, A2 and B7, and A3 and B8, and provides its second 10-bit result PSAD<59:50>. The second adder circuit 208 sums the absolute differences between bytes A0 and B6, A1 and B7, A2 and B8, and A3 and B9, and provides its third 10-bit result PSAD<69:60>. The second adder circuit 208 sums the absolute differences between bytes A0 and B7, A1 and B8, A2 and B9, and A3 and B10, and provides its fourth 10-bit result PSAD<79:70>.
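The eight results enumerated above follow one pattern: result k sums the absolute differences between A0 to A3 and the four bytes of the second operand starting at Bk. A small behavioural sketch, with a hypothetical function name `psad` and operands passed as Python lists rather than bus fields, is:

```python
def psad(a_bytes, b_bytes):
    """a_bytes = [A0..A3], b_bytes = [B0..B10], all unsigned bytes.

    Returns (sums, packed): the eight sums and their packing into PSAD<79:0>.
    """
    assert len(a_bytes) == 4 and len(b_bytes) == 11
    # Result k compares A0..A3 with Bk..Bk+3.
    sums = [sum(abs(a_bytes[i] - b_bytes[i + k]) for i in range(4))
            for k in range(8)]
    packed = 0
    for k, s in enumerate(sums):
        packed |= s << (10 * k)  # result k occupies PSAD<10k+9:10k>
    return sums, packed
```

Each sum is at most 4 x 255 = 1020, which is why 10 bits per result are sufficient.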

FIG. 4 shows an embodiment of the first adder circuit 204 of the present invention. The first adder circuit 204 processes the bytes of AL<31:0> and BL<55:0> and provides PSAD<39:0> or C<5:0>. The first adder circuit 204 includes a difference circuit 402, a sum circuit 404, a selection logic circuit 410 and a selection logic circuit 412. The difference circuit 402 has a plurality of mutually independent difference units DIFF1 to DIFF8. The sum circuit 404 has mutually independent sum units S1 to S4. Each difference unit determines the unsigned differences for four bytes, that is, for two pairs of bytes. For each pair, the difference unit inverts one byte and adds it to the other byte. The difference obtained for each pair is the absolute error value. The bytes received by each difference unit depend on the instruction being executed. The selection logic circuit 410 has a plurality of mutually independent multiplexing circuits, which select the bytes supplied to the difference unit DIFF3 according to the instruction being executed. As shown, for the PMIN instruction, when the control input of the selection logic circuit 410 is logic 1 (that is, the control code INSTR=1), the selection logic circuit 410 selects bytes BL<47:40>, BL<31:24>, BL<39:32> and BL<23:16> and outputs them to the difference unit DIFF3. Bytes BL<47:40>, BL<31:24>, BL<39:32> and BL<23:16> correspond to bytes A7, A5, A6 and A4, respectively. For the PSAD instruction, when the control input of the selection logic circuit 410 is logic 0 (that is, the control code INSTR=0), the selection logic circuit 410 selects bytes BL<23:16>, AL<15:8>, BL<15:8> and AL<7:0> and outputs them to the difference unit DIFF3. Bytes BL<23:16>, AL<15:8>, BL<15:8> and AL<7:0> correspond to bytes B2, A1, B1 and A0, respectively. Similarly, for the PMIN instruction, when the control input of the selection logic circuit 412 is logic 1, the selection logic circuit 412 selects bytes AL<15:8> and AL<7:0>, which correspond to bytes A1 and A0, and outputs them to the difference unit DIFF8. For the PSAD instruction, when the control input of the selection logic circuit 412 is logic 0, the selection logic circuit 412 selects bytes AL<23:16> and AL<15:8>, which correspond to bytes A2 and A1, and outputs them to the difference unit DIFF8.

For the PSAD instruction, the first (inverting) input of the difference unit DIFF1 receives byte BL<15:8>, which corresponds to byte B1, and the second (non-inverting) input receives byte AL<15:8>, which corresponds to byte A1. The difference unit DIFF1 determines the absolute error value |A1-B1| between bytes A1 and B1 and outputs it as result AD1 at its first output. Likewise, the third (inverting) input of the difference unit DIFF1 receives byte BL<7:0>, which corresponds to byte B0, and the fourth (non-inverting) input receives byte AL<7:0>, which corresponds to byte A0. The difference unit DIFF1 determines the absolute error value |A0-B0| between bytes A0 and B0 and outputs it as result AD2 at its second output. Similarly, the difference unit DIFF2 determines the absolute error value |A3-B3| between bytes A3 and B3 and outputs it as AD3 at its first output, and determines the absolute error value |A2-B2| between bytes A2 and B2 and outputs it as AD4 at its second output. In summary, when the control code INSTR indicates the PSAD instruction, the difference circuit 402 determines the absolute error values between byte A0 and each of bytes B0 to B3, between byte A1 and each of bytes B1 to B4, between byte A2 and each of bytes B2 to B5, and between byte A3 and each of bytes B3 to B6.

The sum unit S1 computes the sum of the four bytes AD1 to AD4 and outputs the result as the 10-bit PSAD<9:0>. The result of the sum unit S1 corresponds to |A0-B0|+|A1-B1|+|A2-B2|+|A3-B3|. For the PSAD instruction, the difference unit DIFF3 determines the absolute error value between A0 and B1 as AD6 and the absolute error value between A1 and B2 as AD5. The difference unit DIFF4 determines the absolute error value between A2 and B3 as AD8 and the absolute error value between A3 and B4 as AD7. The sum unit S2 computes the sum of the four bytes AD5 to AD8 and outputs the result as the 10-bit PSAD<19:10>. The result of the sum unit S2 corresponds to |A0-B1|+|A1-B2|+|A2-B3|+|A3-B4|. Similarly, for the PSAD instruction, the sum unit S3 computes the sum of the four bytes AD9 to AD12 and outputs the result as the 10-bit PSAD<29:20>. The result of the sum unit S3 corresponds to |A0-B2|+|A1-B3|+|A2-B4|+|A3-B5|. Finally, for the PSAD instruction, the sum unit S4 computes the sum of the four bytes AD13 to AD16 and outputs the result as the 10-bit PSAD<39:30>. The result of the sum unit S4 corresponds to |A0-B3|+|A1-B4|+|A2-B5|+|A3-B6|. Although FIG. 4 shows only an embodiment of the first adder circuit 204, the second adder circuit 208 is substantially similar to the first adder circuit 204. It determines the absolute error values between byte A0 and each of bytes B4 to B7, between byte A1 and each of bytes B5 to B8, between byte A2 and each of bytes B6 to B9, and between byte A3 and each of bytes B7 to B10. In addition, the second adder circuit 208 sums each group of four absolute error values and provides four summed values, which make up PSAD<79:40>.

In summary, for the PSAD instruction, the difference circuit 402 determines the absolute error values between each byte of the first set of digital codes (A3:A0) and bytes of the second set of digital codes (B10:B0). After the first group B3:B0 has been processed, the comparison moves on starting from the next higher byte, for example B4:B1, B5:B2, B6:B3, and so on. Over the eight groups, the absolute error values AD1 to AD4, AD5 to AD8, ..., AD29 to AD32 are thus generated. The sum circuit 404 sums the absolute error values of each group and provides the corresponding sums of absolute error values PSAD<79:0>.

When the control code INSTR indicates the PMIN instruction, the difference circuit 402 operates in substantially the same way, except that different bytes are routed to it. The sums of AD1 to AD16 and PSAD<39:0> can be disregarded; only the comparison bits C<5:0> are needed. The difference unit DIFF1 compares, or otherwise determines the absolute error value between, A1 and A3, and likewise A0 and A2. The first byte A3 is the high byte of word W1, and the second byte A1 is the high byte of word W0. The third byte A2 is the low byte of word W1, and the fourth byte A0 is the low byte of word W0. In this embodiment, the difference unit DIFF1 compares the high bytes and the low bytes of words W1 and W0, respectively, and determines the comparison bit C<0>, which indicates which word (W1 or W0) is the smaller. Similarly, the difference unit DIFF2 compares the high bytes A5 and A3 and the low bytes A4 and A2 of words W2 and W1 to determine which word (W2 or W1) is the smaller, and provides the comparison bit C<3>. Similarly, the difference unit DIFF3 compares the high bytes A7 and A5 and the low bytes A6 and A4 of words W3 and W2 to determine which word (W3 or W2) is the smaller, and provides the comparison bit C<5>. For the PMIN instruction, the difference unit DIFF4 can be disregarded. The difference unit DIFF5 compares the high bytes A5 and A1 and the low bytes A4 and A0 of words W2 and W0 to determine which word (W2 or W0) is the smaller, and provides the comparison bit C<1>. The difference unit DIFF6 compares the high bytes A7 and A3 and the low bytes A6 and A2 of words W3 and W1 to determine which word (W3 or W1) is the smaller, and provides the comparison bit C<4>. For the PMIN instruction, the difference unit DIFF7 can be disregarded. The difference unit DIFF8 compares the high bytes A7 and A1 and the low bytes A6 and A0 of words W3 and W0 to determine which word (W3 or W0) is the smaller, and provides the comparison bit C<2>.

In summary, for the PMIN instruction, the comparison bit C<0> of the difference circuit 402 of the first adder circuit 204 indicates the smaller of words W0 and W1. The comparison bit C<1> indicates the smaller of words W0 and W2. The comparison bit C<2> indicates the smaller of words W0 and W3. The comparison bit C<3> indicates the smaller of words W1 and W2. The comparison bit C<4> indicates the smaller of words W1 and W3. The comparison bit C<5> indicates the smaller of words W2 and W3. Although FIG. 4 does not show the detailed circuit of the second adder circuit 208, the second adder circuit 208 has the same difference circuit as the first adder circuit 204, which performs the same comparisons on words W4 to W7 of the high-order adder circuit 207 and provides the corresponding comparison bits C<11:6>. Thus, for PMIN, the comparison bit C<6> indicates the smaller of words W4 and W5. The comparison bit C<7> indicates the smaller of words W4 and W6. The comparison bit C<8> indicates the smaller of words W4 and W7. The comparison bit C<9> indicates the smaller of words W5 and W6. The comparison bit C<10> indicates the smaller of words W5 and W7. The comparison bit C<11> indicates the smaller of words W6 and W7. The first PMIN circuit 206 uses the comparison bits C<5:0> to identify the minimum of words W0 to W3. The second PMIN circuit 210 uses the comparison bits C<11:6> to identify the minimum of words W4 to W7.
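One way the six pairwise comparison bits could be decoded into the position of the minimum word is sketched below. The description does not give the internal logic of the PMIN circuits, so this is an assumption-laden illustration: the names `PAIRS`, `compare_bits` and `locate_min` are hypothetical, each bit is assumed to be 1 when the lower-numbered word of its pair is strictly greater than the higher-numbered word, and ties go to the lowest position.

```python
# Word pair compared by C<0>..C<5>, as listed in the summary above.
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def compare_bits(words):
    """Assumed polarity: bit is 1 when W[i] > W[j] for the pair (i, j), i < j."""
    return [int(words[i] > words[j]) for i, j in PAIRS]

def locate_min(c):
    """Return the position (0..3) of the minimum word from the six compare bits."""
    greater = dict(zip(PAIRS, c))
    for i in range(4):
        # Every lower-numbered word must be strictly greater than W[i] ...
        beats_lower = all(greater[(j, i)] for j in range(i))
        # ... and W[i] must not be greater than any higher-numbered word.
        not_above_higher = all(not greater[(i, j)] for j in range(i + 1, 4))
        if beats_lower and not_above_higher:
            return i
    raise ValueError("inconsistent compare bits")
```

With this decoding, exactly one position satisfies both conditions for any four words, so no sequential chain of comparisons is needed once the six bits are available.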

FIG. 5 shows an embodiment of the difference unit DIFF1 of the present invention. As shown, the difference unit DIFF1 has an adder pair consisting of a high (or first) adder 502 and a low (or second) adder 504. Each of the adders 502 and 504 has an inverting input B and a non-inverting input A, so each can perform a subtraction to determine the difference between the signals at the inverting input B and the non-inverting input A. For the PSAD instruction, the inverting input B of the adder 502 receives byte B1. For the PMIN instruction, the inverting input B of the adder 502 receives byte A3. For both the PSAD and PMIN instructions, the non-inverting input A of the adder 502 receives byte A1. The adder 502 inverts each bit of the byte received at the inverting input B to obtain the inverted value ~B, where ~ denotes binary inversion. The adder 502 performs an unsigned addition of the inverted value ~B and the byte received at input A (that is, A+~B, corresponding to A-B) and outputs the result at output SUM. The adder 502 has a carry-out (CO) output that provides a carry-out signal CO1. When the sum obtained by the adder 502 overflows, the carry-out signal CO1 is logic 1. The adder 502 also increments the sum and outputs the incremented result at output INCSUM. The adder 502 further has a propagate output CP. The propagate output signal CP1 at output CP is logic 1 if the adder would propagate a carry input (not provided) to its carry output. Although there is no carry input in FIG. 5, CP1 is logic 1 whenever the adder 502 would pass a carry input through if it received one. In one embodiment, each bit of the byte received at input A is ORed with the corresponding bit of the byte received at input B, giving eight results, which are then ANDed together. The logic level of the propagate output signal CP1 is determined from the results of these OR and AND operations. Output SUM is coupled to the input of the inverter 508, which has a separate inverter for each bit of the byte. The output of the inverter 508 is coupled to input 0 of the multiplexer 506. Output INCSUM is coupled to input 1 of the multiplexer 506. The select input of the multiplexer 506 receives the carry-out signal CO1. The output signal AD1 of the multiplexer 506 is the absolute error value between the bytes received at inputs A and B of the adder 502.

Likewise, for the PSAD instruction, the inverting input B of the adder 504 receives byte B0. For the PMIN instruction, the inverting input B of the adder 504 receives byte A2. For both the PSAD and PMIN instructions, input A of the adder 504 receives byte A0. The adder 504 inverts each bit of the byte received at the inverting input B to produce the opposite logic value ~B. The adder 504 performs an unsigned addition of the inverted value ~B and the byte received at input A and drives outputs INCSUM, SUM and CO. These outputs behave like those of the adder 502 and are not described again. Output CO of the adder 504 provides a carry-out signal CO2. If the adder 504 has a propagate output CP, that output may be left unused or omitted, and need not output a signal. Output INCSUM of the adder 504 is coupled to input 1 of the multiplexer 510, which provides AD2. Output SUM of the adder 504 is coupled to the input of the inverter 512, whose output is coupled to input 0 of the multiplexer 510. The select input of the multiplexer 510 receives the carry-out signal CO2. The AND gate 514 receives the propagate output signal from output CP of the adder 502 at one of its two inputs and the carry-out signal CO2 from output CO of the adder 504 at the other. The OR gate 516, which generates the comparison bit C<0>, receives the carry-out signal CO1 at one of its two inputs and the output of the AND gate 514 at the other.
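The way gates 514 and 516 combine the two byte adders into a 16-bit comparison, C<0> = CO1 OR (CP1 AND CO2), can be checked with a short model. This is a sketch under assumptions: the names `byte_adder` and `compare_bit` are hypothetical, and the propagate signal is modelled as the AND over all bit positions of (A OR ~B), that is, the OR is assumed to be taken with the already inverted B input.

```python
def byte_adder(a, b):
    """Model of one 8-bit adder computing A + ~B.

    Returns (SUM, INCSUM, CO, CP).
    """
    nb = ~b & 0xFF                   # inverted B input
    total = a + nb
    s = total & 0xFF                 # SUM
    co = total >> 8                  # carry-out: 1 exactly when a > b
    cp = int((a | nb) == 0xFF)       # assumed propagate: every bit position passes a carry
    return s, (s + 1) & 0xFF, co, cp

def compare_bit(word_a, word_b):
    """CO1 OR (CP1 AND CO2): 1 when word_a > word_b as unsigned 16-bit words."""
    _, _, co1, cp1 = byte_adder(word_a >> 8, word_b >> 8)      # high adder 502
    _, _, co2, _ = byte_adder(word_a & 0xFF, word_b & 0xFF)    # low adder 504
    return co1 | (cp1 & co2)         # AND gate 514 feeding OR gate 516
```

When the high bytes differ, CO1 alone decides the comparison; only when they are equal does the propagate signal let the low-byte carry CO2 through.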

For the adders 502 and 504, if the byte at input A is greater than the byte at input B, output CO is logic 1 and output INCSUM gives the absolute error value between inputs A and B, that is, |A-B|. When the adder 502 sets the carry-out signal CO1 to logic 1, the comparison bit C<0> output by the OR gate 516 is 1. When CO1 is logic 1, the propagate output signal CP1 of the adder 502 may be logic 0 or 1 depending on the values at inputs A and B; since the OR gate 516 already sets C<0> to logic 1, the value of CP1 does not matter for the comparison bit C<0>. For example, if input A receives the binary code 00000100 (decimal 4) and input B receives the binary code 00000010 (decimal 2), the difference between inputs A and B is A-B=00000010 (decimal 2). The binary code received at input B is first inverted, giving ~B=11111101. When the binary code at input A is added to ~B as an unsigned value, the result A+~B is 00000001 and the carry-out signal CO1 is logic 1 (the propagate output signal CP1=0). The sum (the value at output SUM) is therefore not the correct value. The output of the inverter (508 or 512) is ~SUM (the inverse of the binary code at output SUM)=11111110, which is not the correct value either. The value at output INCSUM is 00000001+1=00000010, which is the correct value. Thus, for the adders 502 and 504, when the byte at input A is greater than the byte at input B, output CO=1, and the corresponding multiplexer (506 or 510) selects the value at its input 1 (that is, INCSUM) as the correct output (the absolute value of the difference between inputs A and B).

If the value at input terminal A is less than or equal to the value at input terminal B, the output terminal CO=0, and the corresponding multiplexer selects the output signal ~SUM of the corresponding inverter (508 or 512) as the correct output. When the value at input terminal A equals the value at input terminal B, the correct output is 00000000. Although the correct value appears at both INCSUM and ~SUM in this case, the corresponding multiplexer selects ~SUM because CO=0. When the value at input terminal A equals the value at input terminal B, the transfer output CP=1. For example, when inputs A and B both equal 00001111, the value at A plus the inverted value ~B is 00001111+11110000=11111111=SUM, and CP=1. The inverted value of SUM (that is, ~SUM) is 00000000, which is the correct value. The value at INCSUM is 1+11111111, which wraps to 00000000 and is also correct (although it is not selected by the multiplexer). When the value at input terminal A is less than the value at input terminal B, CO=0 and the multiplexer selects ~SUM as the correct value. For example, if input A is 00000010 and input B is 00000100, then |A-B|=00000010. In this example, A+~B=00000010+11111011=11111101=SUM. Because CO=0, ~SUM=00000010 is selected as the correct value. In this example, the value at INCSUM is 1+11111101=11111110, which is not the correct value.
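The behavior of a single byte adder described above can be sketched in Python. This is only an illustrative model, not the circuit of the embodiment: the function name `abs_diff_adder` and the dictionary keys are hypothetical, and the transfer output CP is modeled under the assumption that it is 1 when every bit position of A and ~B propagates a carry (which gives CP=1 exactly when A equals B, matching the worked examples).

```python
MASK8 = 0xFF  # the adders in this sketch are 8 bits wide

def abs_diff_adder(a, b):
    """Model one byte adder with an inverting B input (hypothetical helper)."""
    not_b = ~b & MASK8
    raw = a + not_b                      # unsigned sum A + ~B before truncation
    sum_out = raw & MASK8                # SUM output
    co = raw >> 8                        # carry output CO: 1 only when A > B
    cp = int((a ^ not_b) == MASK8)       # assumed CP model: all bits propagate
    incsum = (sum_out + 1) & MASK8       # INCSUM output (SUM + 1, 8 bits)
    not_sum = ~sum_out & MASK8           # inverter output ~SUM
    result = incsum if co else not_sum   # multiplexer controlled by CO
    return {"SUM": sum_out, "CO": co, "CP": cp,
            "INCSUM": incsum, "NOT_SUM": not_sum, "ABS": result}

# The three worked examples from the text: A > B, A == B, A < B.
for a, b in [(4, 2), (15, 15), (2, 4)]:
    print(a, b, abs_diff_adder(a, b))
```

In this model the multiplexer output equals |A-B| for every pair of bytes, because A+~B is one less than A-B when a carry is produced and is the bitwise inverse of B-A otherwise.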

When the control code INSTR indicates the PSAD instruction, the adder 502 produces the absolute error value AD1=|A1-B1| and the adder 504 produces the absolute error value AD2=|A0-B0| according to the PSAD operation, and the comparison bit C<0> can be ignored. When the control code INSTR indicates the PMIN instruction, if A1>A3, the high byte of word W0 is greater than the high byte of word W1, so W0>W1. In this case, CO1=1, so C<0>=1. When A3>A1, both CO1 and CP1 of the adder 502 are logic 0, so C<0>=0, indicating that W0<W1. If A1=A3, the outputs of the adder 502 are CO1=0 and CP1=1. In this case, the comparison of the low bytes of the two words by the adder 504 determines the relative values of W0 and W1. When the high bytes are equal, CP1=1; if A0>A2, the low byte of word W0 is greater than the low byte of word W1, so W0>W1. In this case, CP1 and CO2 are both logic 1, so C<0>=1. If the high bytes are equal (CP1=1) and A0 is less than or equal to A2, CO2 is logic 0, so C<0>=0. In this case, W0 is less than or equal to W1, and in either case W0 is treated as the minimum. The other difference circuits (DIFF2 to DIFF8) have the same structure and operation and are used to determine AD3 to AD16. The difference units DIFF4 and DIFF7 can be simplified. In particular, the logic that receives CO and CP to determine the corresponding comparison bit C<x> is not necessary in those units. The transfer logic of each individual adder can also be omitted if desired.
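The word comparison performed by one adder pair can be sketched as follows. This is a minimal model under stated assumptions: `carry_and_transfer` and `compare_bit` are hypothetical names, and CP is again assumed to be 1 when every bit position propagates a carry. The sketch combines the outputs as C<0> = CO1 OR (CP1 AND CO2), following the AND gate 514 and OR gate 516 described above.

```python
MASK8 = 0xFF

def carry_and_transfer(a, b):
    """Return (CO, CP) of an 8-bit adder computing A + ~B (hypothetical helper)."""
    not_b = ~b & MASK8
    co = (a + not_b) >> 8                # 1 only when A > B
    cp = int((a ^ not_b) == MASK8)       # assumed CP model: 1 only when A == B
    return co, cp

def compare_bit(w_first, w_second):
    """Return 1 when the 16-bit word w_first is greater than w_second, else 0."""
    # High adder compares the high bytes, low adder compares the low bytes.
    co1, cp1 = carry_and_transfer(w_first >> 8, w_second >> 8)
    co2, _ = carry_and_transfer(w_first & MASK8, w_second & MASK8)
    return co1 | (cp1 & co2)             # OR gate fed by CO1 and AND(CP1, CO2)

print(compare_bit(0x0200, 0x01FF))  # high byte decides
print(compare_bit(0x0105, 0x0103))  # equal high bytes, low byte decides
print(compare_bit(0x0103, 0x0103))  # equal words
```

The low adder's result only matters when the high bytes are equal, which is why a transfer output is needed on the high adder alone.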

Referring to FIG. 4 and FIG. 5, the same adder circuit is used for both the PMIN instruction and the PSAD instruction; in particular, each adder pair in each difference unit serves both instructions. For the PSAD instruction, each individual adder obtains the absolute error value between the bytes of an input byte pair. For the PMIN instruction, the sum of absolute error values produced for the PSAD instruction is not needed; instead, each adder pair uses byte comparisons to determine which word is smaller. The path selection circuit makes maximum use of the adders provided for the PSAD instruction to support the PMIN instruction. As described above, for the PMIN instruction the adders are grouped into adder pairs. The high parts (for example, the high bytes) of a pair of digital codes (for example, two words) are provided to the corresponding input terminals of a first adder, and the low parts (for example, the low bytes) of the pair are provided to the corresponding input terminals of a second adder. Both adders are modified to provide a carry output, and the high adder of each adder pair is modified to provide a transfer output. The carry outputs and the transfer output of each adder pair are used to determine the smaller digital code of each digital code pair. For the PSAD instruction, the adder results give the absolute error values between a first operand having 4 bytes and a second operand having 11 bytes; for the PMIN instruction, the adder results give the minimum of a set of 8 words.

FIG. 6 shows a possible embodiment of the summing unit S1 of the present invention. The summing unit S1 has an adder 602, an adder 604 and an adder 606, and provides a 10-bit result PSAD<9:0>. The adders 602 and 604 are 8-bit adders, and the adder 606 is a 9-bit adder. The adders 602 and 604 are similar to the adder 502, except that they have no inverting input and do not need the INCSUM circuit, which can be omitted. The transfer output circuit is likewise unnecessary and can be omitted. The adder 602 performs an unsigned addition of the binary values AD1 and AD2 and provides a first sum SUM1 (=AD1+AD2) and a corresponding carry output C1. The adder 604 performs an unsigned addition of the binary values AD3 and AD4 and provides a second sum SUM2 (=AD3+AD4) and a corresponding carry output C2. The carry output C1 serves as the most significant bit (MSB) of SUM1, and the carry output C2 serves as the MSB of SUM2. The first input terminal of the adder 606 receives the concatenation of C1 and SUM1, and the second input terminal receives the concatenation of C2 and SUM2, so that each input terminal of the adder 606 receives 9 bits. The adder 606 performs an unsigned addition of the two 9-bit inputs ({C1, SUM1} + {C2, SUM2}) and provides a 10-bit output PSAD<9:0>. The lower 9 bits PSAD<8:0> are the result of the unsigned binary addition, and the most significant bit PSAD<9> is the carry output of that addition. In this embodiment, the summing unit S1 sums the first group of absolute error values (AD1 to AD4) to obtain the first sum of absolute error values PSAD<9:0>. The other summing units S2 to S4 have the same structure and sum the groups AD5 to AD8, AD9 to AD12 and AD13 to AD16, respectively, to provide the sums of absolute error values PSAD<19:10>, PSAD<29:20> and PSAD<39:30>.
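The two-level addition of the summing unit can be sketched as follows. This is a simplified model, assuming each input is an 8-bit absolute error value; the function name `summing_unit` is hypothetical.

```python
def summing_unit(ad1, ad2, ad3, ad4):
    """Sum four 8-bit absolute error values into a 10-bit result (sketch)."""
    raw1 = ad1 + ad2
    sum1, c1 = raw1 & 0xFF, raw1 >> 8    # 8-bit adder: SUM1 and carry C1
    raw2 = ad3 + ad4
    sum2, c2 = raw2 & 0xFF, raw2 >> 8    # 8-bit adder: SUM2 and carry C2
    in1 = (c1 << 8) | sum1               # 9-bit value {C1, SUM1}
    in2 = (c2 << 8) | sum2               # 9-bit value {C2, SUM2}
    return (in1 + in2) & 0x3FF           # 9-bit adder with carry: 10 bits

print(summing_unit(1, 2, 3, 4))
print(summing_unit(255, 255, 255, 255))
```

Ten bits are sufficient because the largest possible sum of four bytes is 4 x 255 = 1020, which is below 1024.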

FIG. 7 shows an embodiment of the PMIN circuit 206 of the present invention. The PMIN circuit 206 has a decoding logic circuit 701, a selection logic circuit 728 and a location logic circuit 703. The decoding logic circuit 701 has inverters 702, 704, 706, 710, 712, 714, 718, 720 and 724 and AND gates 708, 716, 722 and 726. Each of the AND gates 708, 716, 722 and 726 has three input terminals. The location logic circuit 703 has an OR gate 730 and an OR gate 732, each with two input terminals. The comparison bits C<2:0> are provided to the inverters 702, 704 and 706, respectively. The AND gate 708 receives the outputs of the inverters 702, 704 and 706 and outputs the signal W0_MIN, which is logic 1 when word W0 is the minimum word. The comparison bits C<3> and C<4> are provided to the inverters 712 and 714, respectively. The three input terminals of the AND gate 716 receive the outputs of the inverters 712 and 714 and the comparison bit C<0>. The AND gate 716 outputs the signal W1_MIN, which is logic 1 when word W1 is the minimum word. The input terminal of the inverter 720 receives C<5>. The AND gate 722 receives the output of the inverter 720, C<1> and C<3>, and outputs the signal W2_MIN, which is logic 1 when word W2 is the minimum word. The inverters 710, 718 and 724 receive the signals W0_MIN, W1_MIN and W2_MIN, respectively, and generate the signals ~W0_MIN, ~W1_MIN and ~W2_MIN, each of which indicates that the corresponding word is not the minimum. The AND gate 726 receives the signals ~W0_MIN, ~W1_MIN and ~W2_MIN and outputs the signal W3_MIN, which is logic 1 when word W3 is the minimum word.
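The decoding logic can be sketched as plain Boolean expressions. The assignment of comparison bits to word pairs used here is an assumption inferred from the gate wiring described above (C<0>: W0>W1, C<1>: W0>W2, C<2>: W0>W3, C<3>: W1>W2, C<4>: W1>W3, C<5>: W2>W3); the function names are hypothetical.

```python
def comparison_bits(words):
    """Build C<0>..C<5> for four words, using the assumed pair ordering."""
    w0, w1, w2, w3 = words
    pairs = [(w0, w1), (w0, w2), (w0, w3), (w1, w2), (w1, w3), (w2, w3)]
    return [int(a > b) for a, b in pairs]

def decode_min(c):
    """Return (W0_MIN, W1_MIN, W2_MIN, W3_MIN) from the six comparison bits."""
    inv = lambda bit: bit ^ 1                      # model of an inverter
    w0_min = inv(c[0]) & inv(c[1]) & inv(c[2])     # AND gate 708
    w1_min = c[0] & inv(c[3]) & inv(c[4])          # AND gate 716
    w2_min = c[1] & c[3] & inv(c[5])               # AND gate 722
    w3_min = inv(w0_min) & inv(w1_min) & inv(w2_min)  # AND gate 726
    return w0_min, w1_min, w2_min, w3_min

print(decode_min(comparison_bits([5, 3, 9, 3])))
```

Under this assumed bit assignment exactly one output is 1 for any input, and ties resolve to the lowest-numbered word because a comparison bit is 1 only for a strict "greater than".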

AL<15:0>, BL<15:0>, BL<31:16> and BL<47:32> represent the words W0 to W3, respectively. The selection circuit 728 receives AL<15:0>, BL<15:0>, BL<31:16> and BL<47:32> together with the signals W0_MIN to W3_MIN. At any given time, only one of the signals W0_MIN to W3_MIN is logic 1, indicating that the corresponding word is the minimum in that cycle. The selection circuit 728 therefore selects one of the words W0 to W3 as the minimum word and outputs it as PMINVAL<15:0>. The OR gate 730 receives the signals W3_MIN and W2_MIN and outputs the corresponding location bit PMINLOC<1>. The OR gate 732 receives the signals W3_MIN and W1_MIN and outputs the corresponding location bit PMINLOC<0>. In this embodiment, PMINVAL<15:0> gives the minimum of the words W0 to W3, and PMINLOC<1:0> gives the position of that minimum word among the words of the second half of the first bus ABUS received by the low-order adder circuit 203. The PMIN circuit 210 has a structure similar to that of the PMIN circuit 206 and provides PMINVAL<31:16>, representing the minimum of the words W4 to W7, and PMINLOC<3:2>. PMINLOC<3:2> gives the position of the minimum word among the words of the first half of the first bus ABUS received by the high-order adder circuit 207.
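The selection and location logic can be sketched as follows, assuming the four minimum signals are one-hot as stated above. The function name `select_min` is hypothetical, and the selection circuit is modeled simply as picking the word whose flag is set.

```python
def select_min(words, flags):
    """Return (PMINVAL, PMINLOC) from four words and one-hot minimum flags."""
    w0_min, w1_min, w2_min, w3_min = flags
    loc1 = w3_min | w2_min               # OR gate 730 -> PMINLOC<1>
    loc0 = w3_min | w1_min               # OR gate 732 -> PMINLOC<0>
    # One-hot selection: only the flagged word contributes to the output.
    value = sum(w for w, f in zip(words, flags) if f)
    return value, (loc1 << 1) | loc0

print(select_min([5, 3, 9, 3], (0, 1, 0, 0)))
print(select_min([7, 8, 9, 1], (0, 0, 0, 1)))
```

The two OR gates encode the one-hot flags as a binary index: W1 and W3 set bit 0, and W2 and W3 set bit 1, so W0 yields location 0 without needing its own gate.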

FIG. 8 shows an embodiment of the high-order/low-order comparison circuit 212 of the present invention. The inverting input terminal of the 16-bit comparison circuit 802 receives PMINVAL<31:16> provided by the high-order adder circuit 207. The non-inverting input terminal of the comparison circuit 802 receives PMINVAL<15:0> provided by the low-order adder circuit 203. The comparison circuit 802 compares the minimum words of the high-order and low-order parts and provides its carry output, at the carry output terminal CO, as the signal MINLOC<2>. The carry output terminal CO of the comparison circuit 802 behaves in the same way as the output terminal CO of the adders described above. If the word PMINVAL<15:0> is greater than the word PMINVAL<31:16>, MINLOC<2> at the carry output terminal CO is logic 1; otherwise MINLOC<2> is logic 0. MINLOC<2> is the most significant bit (MSB) of the location value MINLOC<2:0>. If MINLOC<2> is logic 1, the minimum value is located among the words of the first half of the first bus ABUS. Conversely, if MINLOC<2> is logic 0, the minimum value is located among the words of the second half of the first bus ABUS. MINLOC<2> is provided to the selection input terminals of the multiplexers 804, 806 and 808. The multiplexer 804 selects the byte PMINVAL<23:16> or PMINVAL<7:0> as the low byte MINVAL<7:0>; these bytes are the low bytes of the minimum words found in the high-order and low-order parts. The multiplexer 806 selects the byte PMINVAL<31:24> or PMINVAL<15:8> as the high byte MINVAL<15:8>; these bytes are the high bytes of the minimum words found in the high-order and low-order parts. The multiplexer 808 selects the location bits PMINLOC<3:2> or PMINLOC<1:0> as MINLOC<1:0>; these are the least significant location bits of the high-order or low-order part. As described above, the comparison circuit 802 determines MINLOC<2>, the most significant bit of MINLOC. MINLOC<2:0> therefore indicates the position of the minimum word on the first bus ABUS.
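The final high-order/low-order stage can be sketched as follows. This is an illustrative model only: `final_min` is a hypothetical name, the comparison circuit is modeled as the carry out of PMINVAL<15:0> + ~PMINVAL<31:16>, and the three multiplexers are collapsed into a single conditional.

```python
MASK16 = 0xFFFF

def final_min(pminval_lo, pminloc_lo, pminval_hi, pminloc_hi):
    """Combine the two partial minima into (MINVAL<15:0>, MINLOC<2:0>)."""
    # Comparison circuit 802: carry out is 1 only when low-part min > high-part min.
    minloc2 = (pminval_lo + (~pminval_hi & MASK16)) >> 16
    if minloc2:
        minval, loc_low = pminval_hi, pminloc_hi   # take PMINVAL<31:16>, PMINLOC<3:2>
    else:
        minval, loc_low = pminval_lo, pminloc_lo   # take PMINVAL<15:0>, PMINLOC<1:0>
    return minval, (minloc2 << 2) | loc_low

print(final_min(5, 1, 3, 2))   # high-order part holds the smaller word
print(final_min(3, 1, 3, 2))   # tie: the low-order part is kept
```

Because the carry output is 1 only for a strict "greater than", a tie between the two partial minima resolves to the low-order part.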

Although a number of preferred embodiments of the present invention have been described in detail, other possible variations are also contemplated. For example, all of the circuits described above may be implemented using any logic devices or logic circuits. The functions of the logic circuits described above may also be implemented by software or firmware in an integrated device. The circuits described above may include inverting devices so that any signal can be provided in positive logic or negative logic. The circuits disclosed herein operate on digital codes, or binary bytes or words, but the number of bits of the digital codes or binary codes is not limited. Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make changes and modifications without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.

Claims

1. A system for executing one of a horizontal minimum instruction and an error absolute value sum instruction using a shared adder circuit, the system comprising:

A plurality of digital codes, wherein, for the error absolute value sum instruction, the digital codes include a first digital code set and a second digital code set, and, for the horizontal minimum instruction, the digital codes include a plurality of digital code pairs, each digital code pair having a high digital code and a low digital code;

A plurality of adders, each adder compares a first digital code with a second digital code to provide an absolute error value, a carry output and a transfer output;

A summation circuit, summing up these absolute values of errors to provide a plurality of summed absolute values of errors;

a comparison circuit, combining the carry outputs and the transfer outputs, to find a minimum digital code pair of the digital code pairs; and

A path selection circuit, wherein, when executing the horizontal minimum instruction, the path selection circuit transmits each digital code pair of the digital code pairs to at least one adder pair of the adder pairs, so as to compare each digital code pair with the other digital code pairs, and, when executing the error absolute value sum instruction, the path selection circuit sends the first digital code set and the second digital code set to the adder pairs to obtain the absolute error value between each digital code of the first digital code set and each digital code of the second digital code set, the second digital code set having consecutive digital codes.

2. The system according to claim 1, wherein, when executing the error absolute value sum instruction, the path selection circuit sends the first digital code set and the second digital code set from a first bus and a second bus to a third bus and a fourth bus, respectively, and, when executing the horizontal minimum instruction, the path selection circuit sends the digital code pairs from the first bus to the third bus and the fourth bus.

3. The system as claimed in claim 1, wherein, when executing the error absolute value sum instruction, the path selection circuit sends each digital code of the first digital code set to a first adder of a first adder pair of the adder pairs, and sends each digital code of the second digital code set to a second adder of the first adder pair.

4. The system of claim 3, wherein the first adder of the first adder pair provides an absolute error value.

5. The system of claim 1, wherein the summing circuit comprises:

a first adder for summing up a first error absolute value pair provided by a first adder pair of the adder pairs to provide a first total value;

a second adder for summing up a second error absolute value pair provided by a second adder pair of the adder pairs to provide a second total value;

A third adder, summing up the first total value and the second total value to provide a total error absolute value.

6. The system of claim 1, wherein each adder pair comprises:

a high adder that compares a high digital code of a first digital code pair with a high digital code of a second digital code pair, the high adder providing these transfer outputs; and

A low adder compares a low code of the first digital code pair with a low code of the second digital code pair.

7. The system as claimed in claim 1, wherein each of the digital codes comprises an unsigned byte when executing the error absolute value sum instruction, and each of the digital codes comprises an unsigned word when executing the horizontal minimum instruction.

8. The system of claim 1, wherein each of the transfer outputs indicates whether a carry input is incremented by a high adder of a pair of adders.

9. The system of claim 1, wherein the comparison circuit comprises:

a first comparison circuit that combines the carry output and the transfer output of each of the adder pairs to generate a compare bit; and

A second comparison circuit determines a minimum digital code pair of the digital code pairs according to the comparison bits.

10. The system of claim 9, wherein the first comparison circuit of each of the adder pairs has an AND gate and an OR gate, the AND gate combines a transfer output of a high adder with a transfer output of a low adder to generate a first bit, and the OR gate combines the first bit with a carry output of the high adder to provide a compare bit.

11. The system as claimed in claim 9, wherein the second comparison circuit decodes the comparison bits to provide a plurality of minimum bits, and each minimum bit indicates whether a corresponding digital code pair of the digital code pairs is the minimum digital code pair.

12. The system of claim 9, further comprising:

A memory for storing these digital code pairs, the second comparison circuit includes a decoding circuit, and the decoding circuit decodes these comparison bits to provide a plurality of minimum bits;

A selection circuit, selects a digital code pair of these digital code pairs, and uses the selected digital code pair as a minimum digital code pair according to the minimum bits, and stores the minimum digital code pair in the memory; and

A position circuit provides a position value according to the minimum bits, and the position value indicates the position of the minimum digital code pair in the memory.

13. A method of executing one of a horizontal minimum instruction and an error absolute value sum instruction using a shared adder circuit, the method comprising:

Receiving a plurality of digital codes, wherein the digital codes include a first digital code set and a second digital code set when executing the error absolute value sum instruction, and the digital codes include a high digital code and a low digital code when executing the horizontal minimum instruction;

Provide a plurality of adders, each adder compares a first digital code with a second digital code to provide an absolute error value and a carry output;

Aggregate these absolute values of errors to provide a sum of multiple absolute values of errors;

sorting the adders into pairs of adders and providing a transfer output;

Combining the carry outputs and the transfer outputs to obtain a minimum digital code pair of the digital code pairs; and

When executing the horizontal minimum instruction, sending each digital code pair of the digital code pairs to at least one adder pair of the adder pairs, so as to compare each digital code pair with the other digital code pairs, and, when executing the error absolute value sum instruction, sending the first digital code set and the second digital code set to the adder pairs, so as to obtain the absolute error value between each digital code of the first digital code set and each consecutive digital code of the second digital code set.

14. The method according to claim 13, further comprising, when executing the error absolute value sum instruction:

sending each digital code of the first digital code set to a first adder of a first adder pair of the adders;

Each digital code of the second digital code set is sent to a second adder of the first adder pair of the adders.

15. The method of claim 13, wherein the summing step comprises:

summing up a first error absolute value pair provided by a first adder pair of the adder pairs to generate a first summed value;

summing a second pair of absolute values of errors provided by a second pair of adders of the pair of adders to generate a second summed value; and

Summing up the first total value and the second total value to provide a total error absolute value.

16. The method of claim 13, wherein when executing the horizontal minimum instruction, the step of providing and sorting the adders comprises:

Comparing a high digital code of a first digital code pair and a high digital code of a second digital code pair through a high adder of each adder pair to provide a first carry output and the transfer outputs;

Through a low adder of each adder pair, a low digital code of the first digital code pair is compared with a low digital code of the second digital code pair to provide a second carry output.

17. The method of claim 13, wherein the combining step comprises:

combining a first carry output and a second carry output of each of the adder pairs with one of the transfer outputs to provide a compare bit; and

According to these comparison bits, a minimum digital code pair of these digital code pairs is obtained.

18. The method as claimed in claim 17, wherein the step of obtaining the minimum digital code pair of the digital code pairs comprises:

The comparison bits are decoded to provide a plurality of minimum bits, and each minimum bit represents whether the corresponding digital code pair is the minimum digital code pair.

19. The method of claim 18, further comprising:

storing the digital code pairs in a memory; and

According to these minimum bits, find out the position of the minimum digital code pair of these digital code pairs in the memory.