CN103365624A - Judgment system and method
- Publication number: CN103365624A
- Authority: CN (China)
- Prior art keywords: digital code, adder, pair, minimum, circuit
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Executing Machine-Instructions (AREA)
Abstract
A judgment system and method. The system uses a shared adder circuit to execute one of a horizontal minimum instruction and a sum of absolute differences instruction, and includes a plurality of adders, a summing circuit, a comparison circuit and a path selection circuit. The path selection circuit routes a plurality of digital codes to the adders according to the instruction being executed. When the horizontal minimum instruction is executed, the adders are grouped into adder pairs. Each adder pair provides a carry output and a transfer output, and has a high adder and a low adder. The high adder compares the high parts of a pair of the digital codes. The low adder compares the low parts of the same pair. The minimum digital code is found from the carry outputs and the transfer outputs.
Description
This application is a divisional of invention patent application No. 201010277155.1, filed on September 7, 2010 and entitled "Judgment system and method".
Technical field
The present invention relates to a microprocessor instruction, and in particular to a system and method for determining the minimum code in a set of digital values, where the smallest digital code serves as a horizontal minimum.
Background
Current microprocessors often execute media instructions to improve the efficiency of multimedia applications. For example, a microprocessor architecture may include one or more media instructions that identify a horizontal minimum in a set of digital codes, together with the location of that minimum on a bus or in a register. A specific example is the PHMINPOSUW instruction in Intel's SSE4 programming reference manual. PHMINPOSUW finds the minimum word, and its position, among 8 unsigned words (128 bits in total), each word being 16 bits wide. Some known microprocessors need more processing steps or more clock cycles to execute PHMINPOSUW. For example, to identify the minimum, four 16-bit magnitude comparators are used in a first cycle to narrow the search from 8 words to 4; the 4 surviving words are fed back to 2 of the comparators, which narrow the search from 4 words to 2 in a second cycle; and the result is fed back to 1 comparator, which picks the smaller of the 2 words in a third (final) cycle. A known alternative executes the instruction in a single cycle by increasing the number of 16-bit comparators. With seven 16-bit comparators, for example, 4 comparators first narrow the search from 8 words to 4, 2 more comparators then narrow it from 4 words to 2, and 1 last comparator picks the smaller of the 2 words, all within a single cycle. However, each 16-bit comparator takes up considerable area in the microprocessor, which increases cost and reduces processing performance.
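The horizontal minimum operation and the comparator tree discussed in this background can be illustrated with a small reference model. The sketch below is not taken from the patent: the function names `phminposuw_ref` and `tournament_min` are hypothetical, and resolving ties in favor of the lowest index is an assumption made for illustration.

```python
from typing import List, Tuple

def phminposuw_ref(words: List[int]) -> Tuple[int, int]:
    """Return (minimum value, index) for 8 unsigned 16-bit words."""
    assert len(words) == 8 and all(0 <= w <= 0xFFFF for w in words)
    best = 0
    for i in range(1, 8):
        if words[i] < words[best]:  # strict '<' keeps the lowest index on ties
            best = i
    return words[best], best

def tournament_min(words: List[int]) -> Tuple[int, int, int]:
    """Comparator-tree model: returns (minimum, index, comparators used)."""
    survivors = [(w, i) for i, w in enumerate(words)]
    comparators = 0
    while len(survivors) > 1:
        nxt = []
        for k in range(0, len(survivors), 2):
            a, b = survivors[k], survivors[k + 1]
            nxt.append(a if a[0] <= b[0] else b)  # one magnitude comparator
            comparators += 1
        survivors = nxt
    value, index = survivors[0]
    return value, index, comparators
```

For 8 words the tree uses 4 + 2 + 1 = 7 comparisons, which is the comparator count the single-cycle prior-art design has to provide in hardware.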
Summary of the invention
The object of the present invention is to find the minimum digital code and its corresponding position in a set of digital codes within a single cycle, without adding circuitry.
The present invention provides a judgment system for finding a minimum binary code among at least two binary codes. In one embodiment, the judgment system includes a first adder, a second adder and a comparison circuit. The first adder sums a plurality of first bits and a plurality of second bits to provide a first carry output and a first transfer output. The first bits are the high bits of a first binary code. The second bits are the inverse of the high bits of a second binary code. The second adder sums a plurality of third bits and a plurality of fourth bits to provide a second carry output. The third bits are the low bits of the first binary code. The fourth bits are the inverse of the low bits of the second binary code. The comparison circuit determines, from the first and second carry outputs and the first transfer output, whether the first binary code is greater than the second binary code. The first and second binary codes are both unsigned, and the first and second adders perform unsigned binary addition. The first transfer output indicates whether the first adder receives a carry input.
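The two-adder comparison described above can be sketched in software as follows. This is an illustration under assumptions rather than the patented circuit: the helper names `adder8` and `greater_than` are hypothetical, the codes are taken to be 16 bits wide and split into two 8-bit halves, the transfer output is modeled as a propagate signal (every bit position of the high adder would pass a carry along, which happens exactly when the high halves are equal), and the combining rule "high carry OR (high transfer AND low carry)" is an assumption, since the text only states that the three signals are combined.

```python
def adder8(a: int, b_inverted: int):
    """8-bit unsigned add of a and an already inverted byte.

    Returns (carry_out, propagate).
    """
    total = a + b_inverted
    carry = total >> 8                          # carry out of bit 7
    propagate = int((a ^ b_inverted) == 0xFF)   # every bit position propagates
    return carry, propagate

def greater_than(x: int, y: int) -> bool:
    """True when unsigned 16-bit x > y, using two 8-bit adders."""
    hi_carry, hi_prop = adder8(x >> 8, (~(y >> 8)) & 0xFF)
    lo_carry, _ = adder8(x & 0xFF, (~y) & 0xFF)
    # Assumed combining rule: the high halves decide, unless they are
    # equal, in which case the low halves decide.
    return bool(hi_carry or (hi_prop and lo_carry))
```

Adding a byte to the inverse of another byte produces a carry exactly when the first byte is larger, which is why no dedicated magnitude comparator is needed in this model.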
The present invention also provides a judgment system for quickly finding a horizontal minimum among a plurality of digital codes. The judgment system includes a plurality of difference circuits, a path selection circuit and a comparison circuit. Each difference circuit compares two digital codes. The path selection circuit assigns each of the digital codes to at least one difference circuit, so that each digital code is compared with the other digital codes. Each difference circuit may include a high adder and a low adder. The high adder compares the high part of a first digital code with the high part of a second digital code to provide a first carry output and a transfer output. The low adder compares the low part of the first digital code with the low part of the second digital code to provide a second carry output. The comparison circuit compares the first and second carry outputs and the transfer outputs to identify the minimum digital code among the digital codes.
Each transfer output indicates whether the high adder of one of the difference circuits receives a carry input. The comparison circuit includes a decoding circuit, which decodes the comparison bits to provide a plurality of minimum bits. Each minimum bit indicates whether the corresponding digital code is the minimum digital code. A location circuit reports the memory location of the minimum digital code. The judgment system may be integrated in a microprocessor chip to execute a fast horizontal minimum instruction.
The present invention provides a judgment method for finding a minimum digital code among a plurality of digital codes. In one possible embodiment, the judgment method includes the following steps: comparing the high bits of a first digital code with the high bits of a second digital code to provide a first carry output and a transfer output; comparing the low bits of the first digital code with the low bits of the second digital code to provide a second carry output; and determining, from the first and second carry outputs and the transfer output, which of the first and second digital codes is the smaller code. The judgment method may include sending each of the digital codes to at least one of a plurality of adder pairs, so that each digital code is compared with the other digital codes to identify a minimum digital code. The judgment method further includes decoding the comparison bits, and determining the location of the minimum digital code in a memory.
The present invention provides a system that uses a shared adder circuit to execute one of a horizontal minimum instruction and a sum of absolute differences instruction. In one embodiment, the system includes a plurality of adders, a summing circuit, a comparison circuit and a path selection circuit. The input operands include a plurality of digital codes. For the sum of absolute differences instruction, the digital codes include a first set of digital codes and a second set of digital codes. For the horizontal minimum instruction, the digital codes include a plurality of digital code pairs, each having a high digital code and a low digital code. Each adder compares a first digital code with a second digital code to provide an absolute difference and a carry output. The summing circuit sums the absolute differences to provide a plurality of sums of absolute differences. The adders form a plurality of adder pairs and provide a transfer output. The comparison circuit combines the carry outputs and the transfer outputs to find the minimum digital code pair among the digital code pairs. When the horizontal minimum instruction is executed, the path selection circuit sends each digital code pair to at least one of the adder pairs, so that each digital code pair is compared with the other digital code pairs. When the sum of absolute differences instruction is executed, the path selection circuit sends the first and second sets of digital codes to the adder pairs to obtain the absolute difference between each digital code of the first set and each digital code of the second set, the second set consisting of consecutive digital codes.
The present invention also provides a method that uses a shared adder circuit to execute one of a horizontal minimum instruction and a sum of absolute differences instruction. In one embodiment, the method includes receiving a plurality of digital codes. When the sum of absolute differences instruction is executed, the digital codes include a first set of digital codes and a second set of digital codes. When the horizontal minimum instruction is executed, the digital codes include a high digital code and a low digital code. The method further includes providing a plurality of adders, each of which compares a first digital code with a second digital code to provide an absolute difference and a carry output. The method further includes: summing the absolute differences to provide a plurality of sums of absolute differences; grouping the adders into a plurality of adder pairs, which provide a transfer output; combining the carry outputs and the transfer outputs to identify the minimum digital code pair among the digital code pairs; and, when the horizontal minimum instruction is executed, sending each digital code pair to at least one of the adder pairs so that each digital code pair is compared with the other digital code pairs, or, when the sum of absolute differences instruction is executed, sending the first and second sets of digital codes to the adder pairs to obtain the absolute difference between each digital code of the first set and each consecutive digital code of the second set.
Brief description of the drawings
FIG. 1 shows an embodiment of a microprocessor 100.
FIG. 2 shows an embodiment of a comparison circuit.
FIG. 3 shows an embodiment of the path selection circuit of the present invention.
FIG. 4 shows an embodiment of the first adder circuit of the present invention.
FIG. 5 shows an embodiment of the difference unit DIFF1 of the present invention.
FIG. 6 shows an embodiment of the sum unit S1 of the present invention.
FIG. 7 shows an embodiment of the PMIN circuit 206 of the present invention.
FIG. 8 shows an embodiment of the high-order/low-order comparison circuit 212 of the present invention.
[Description of main reference numerals]
100: microprocessor;
102: scheduler;
104: complex integer execution unit;
106: simple integer execution unit;
108: floating point execution unit;
110: media unit;
114, 802: comparison circuit;
112: other units;
202: path selection circuit;
203: low-order adder circuit;
204: first adder circuit;
206: first PMIN circuit;
207: high-order adder circuit;
208: second adder circuit;
210: second PMIN circuit;
212: high-order/low-order comparator circuit;
302: buffer circuit;
304, 306, 308, 506, 510, 804, 806, 808: multiplexer;
402: difference circuit;
404: sum circuit;
410, 412: selection logic circuit;
502, 504, 602, 604, 606: adder;
514, 708, 716, 722, 726: AND gate;
516: OR gate;
508, 512, 702, 704, 706, 712, 714, 720, 710, 718, 724: inverter;
728: selection circuit;
DIFF1~DIFF8: difference unit;
S1~S4: sum unit.
Detailed description
The following embodiments are described to enable a person of ordinary skill in the art to make and use the disclosed invention. Modifications to the preferred embodiments will be apparent to those skilled in the art, and the general principles described herein may be applied to other embodiments. The invention is therefore not limited to the particular embodiments shown and described here, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The inventors have observed that known microprocessors take many cycles to execute a horizontal minimum instruction. The present invention executes the same instruction in a single cycle without adding a large amount of circuitry. The invention provides a system and method for quickly determining a horizontal minimum. To make the features and advantages of the invention clearer, preferred embodiments are described in detail below with reference to the accompanying drawings (FIG. 1 to FIG. 8).
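FIG. 1 is a block diagram of a microprocessor 100 according to an embodiment of the present invention. The microprocessor 100 has a comparison circuit 114, which can quickly find a horizontal minimum in a set of digital codes and can also obtain the sum of absolute differences between a first set of digital codes and a second set of digital codes. FIG. 1 omits other well-known systems and functions, such as instruction fetch, instruction queue, instruction decoding and instruction reordering; their omission does not affect the understanding of the invention. The microprocessor 100 has a scheduler 102, which routes instructions or operations to selected arithmetic logic units (ALUs) or execution units (EUs). As shown in FIG. 1, the scheduler 102 is coupled to a complex integer execution unit (IEU) 104, a simple integer execution unit (simple IEU) 106, a floating point execution unit (FPEU) 108, a media unit 110 and other units 112, where the other units 112 are other similar or different processing units. The media unit 110 generally executes media-based instructions and operations, such as Streaming SIMD Extensions (SSE), MultiMedia extension (MMX) and similar instruction sets. SSE is a SIMD instruction set of Intel's x86 architecture, where SIMD stands for single instruction, multiple data.
The media unit 110 has the comparison circuit 114, which executes at least two separate media instructions. In this embodiment, the two media instructions are called the horizontal minimum instruction (PMIN instruction) and the sum of absolute differences instruction (PSAD instruction). The PSAD instruction yields the sums of absolute differences between a first set of digital (or binary) codes and a second set of digital (or binary) codes, where the second set immediately follows the first set. The two sets are described in detail later. Executing the PMIN instruction yields a minimum digital code and its corresponding position. In this embodiment, the terms digital code, binary code and the like are interchangeable, and each denotes a value made up of multiple bits or hexadecimal digits. The scheduler 102 has a memory 116, which stores the operands of the PSAD and PMIN instructions and has a first bus ABUS and a second bus BBUS. In one embodiment, the first bus ABUS and the second bus BBUS each carry 128 bits, but this is not intended to limit the invention; in other embodiments the buses may carry other numbers of bits. Although the media unit 110 generally executes many other media instructions well known to those skilled in the art, the comparison circuit 114 is used to execute the PSAD and PMIN instructions.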
In one possible embodiment, for the PSAD instruction, the first set of digital codes has 4 unsigned bytes (8 bits each), and the second set of digital codes has 11 consecutive bytes. The bytes of the second set are taken 4 consecutive bytes at a time as a group. Each successive 4-byte group starts at the next higher byte; that is, each group is shifted by 1 byte and therefore overlaps the last 3 bytes of the previous group. Suppose the second set has 11 bytes B0~B10. Bytes B0~B3 form the first group; the second group starts at the next higher byte, B1, and consists of B1~B4. The second group (B1~B4) therefore overlaps the last 3 bytes (B1~B3) of the first group (B0~B3). The difference between a byte of the first set and a byte of the second set is called an absolute difference, and the absolute differences of each group are summed together. A specific example is the MPSADBW instruction in Intel's SSE4 programming reference manual.
For the PSAD instruction, the first bus ABUS carries the first operand, which consists of 4 unsigned bytes, and the second bus BBUS carries the second operand, which has 11 unsigned bytes. The result is eight unsigned 10-bit sums of absolute differences. The PSAD instruction may include one or more offsets for locating the operands. The invention does not limit the size of the offsets; any offset can be applied on the first bus ABUS and the second bus BBUS, so that the corresponding operands are placed in the right-most bit positions of ABUS and BBUS. The offsets are omitted in this embodiment.
In one embodiment, the PMIN instruction returns the minimum of 8 unsigned 16-bit words on the first bus ABUS, together with the position of that minimum. A specific example is the PHMINPOSUW instruction in Intel's SSE4 programming reference manual. For the PMIN instruction, the first bus ABUS carries 8 words of 16 bits each. The bits on the second bus BBUS may be undefined or ignored, or BBUS may carry the same words as ABUS. In this embodiment, the comparison circuit 114 uses the same adder circuitry to execute either of the two instructions (PMIN and PSAD) in a single cycle.
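The sliding-window sum of absolute differences described above can be written as a short reference model. This is a minimal sketch for illustration only; the function name `psad_ref` and the list-based operand layout (index 0 holding A0 or B0) are assumptions, not part of the patent.

```python
from typing import List

def psad_ref(a: List[int], b: List[int]) -> List[int]:
    """Eight sums of absolute differences: a has 4 bytes, b has 11 bytes."""
    assert len(a) == 4 and len(b) == 11
    assert all(0 <= v <= 0xFF for v in a + b)
    sums = []
    for k in range(8):  # group k covers B[k]..B[k+3], shifted by one byte each time
        sums.append(sum(abs(a[i] - b[i + k]) for i in range(4)))
    return sums
```

For example, `psad_ref([1, 2, 3, 4], list(range(11)))` gives one sum per 4-byte group of the second operand, and the second group (B1~B4) matches the first operand exactly, so its sum is 0.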
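FIG. 2 shows an embodiment of the comparison circuit 114 of the present invention. As shown, the comparison circuit 114 includes a path selection (routing) circuit 202, a low-order (LO) adder circuit 203, a high-order (HI) adder circuit 207 and a high-order/low-order comparator circuit 212. The path selection circuit 202 has two inputs coupled to the first bus ABUS and the second bus BBUS respectively, and a further input that receives a control code INSTR. According to the control code INSTR, the path selection circuit 202 rearranges or reroutes the bytes from ABUS and BBUS so as to split them into high and low parts. The control code INSTR has at least 1 bit. In this embodiment, INSTR equal to 1 indicates the PMIN instruction and INSTR equal to 0 indicates the PSAD instruction. The first bus ABUS is split into a high part AH<31:0> and a low part AL<31:0>, each 32 bits wide. The second bus BBUS is split into a high part BH<55:0> and a low part BL<55:0>, each 56 bits wide. How the bytes of ABUS and BBUS are rearranged or rerouted according to the instruction being executed is described in detail later. The low-order adder circuit 203 has a first adder circuit 204 coupled to a first PMIN circuit 206. The high-order adder circuit 207 has a second adder circuit 208 coupled to a second PMIN circuit 210.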
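The first adder circuit 204 receives the control code INSTR and the low parts AL<31:0> and BL<55:0>, and outputs the 40-bit sums of absolute differences PSAD<39:0> and the 6 comparison bits C<5:0>. The comparison bits C<5:0>, AL<15:0> and BL<47:0> are passed to the first PMIN circuit 206, which outputs the minimum PMINVAL<15:0> of the low part and its position PMINLOC<1:0>. The control code INSTR and the high parts AH<31:0> and BH<55:0> are passed to the second adder circuit 208, which outputs the 40-bit sums of absolute differences PSAD<79:40> and the 6 comparison bits C<11:6>. The comparison bits C<11:6>, AH<15:0> and BH<47:0> are passed to the second PMIN circuit 210, which outputs the minimum PMINVAL<31:16> of the high part and its position PMINLOC<3:2>. Combining PMINVAL<15:0> and PMINLOC<1:0> from the first PMIN circuit 206 with PMINVAL<31:16> and PMINLOC<3:2> from the second PMIN circuit 210 gives PMINVAL<31:0> and PMINLOC<3:0>. The high-order/low-order comparator circuit 212 receives PMINVAL<31:0> and PMINLOC<3:0> and produces the final minimum digital code MINVAL<15:0> and its position MINLOC<2:0>.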
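The first adder circuit 204 and the second adder circuit 208 arrange the input bytes according to the instruction (that is, the control code INSTR) and compare them byte by byte. For the PSAD instruction, the combined PSAD<79:0> holds eight unsigned 10-bit digital codes, which are the results of the sum of absolute differences operation. For the PSAD instruction, the first PMIN circuit 206, the second PMIN circuit 210 and the high-order/low-order comparator circuit 212 can be disregarded. For the PMIN instruction, PSAD<79:0> can be disregarded; for each of the high and low parts, the minimum digital code and its position are obtained from the comparison bits C<11:0> received by the first PMIN circuit 206 and the second PMIN circuit 210. When the first bus ABUS supplies 128 bits of input data, the high-order/low-order comparator circuit 212 receives and compares the minimum digital codes of the high part and the low part, and outputs the minimum MINVAL<15:0> and its position MINLOC<2:0>.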
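FIG. 3 shows an embodiment of the path selection circuit 202 of the present invention. The path selection circuit 202 arranges or reroutes the digital codes supplied by the first bus ABUS and the second bus BBUS according to the particular instruction. A buffer circuit 302 receives ABUS<31:0> and outputs the corresponding AL<31:0> for both the PSAD and PMIN instructions. In one embodiment, the buffer circuit 302 may contain a separate buffer for each bit, so that ABUS<31:0> is effectively copied to AL<31:0>; in other words, AL<31>=ABUS<31>, AL<30>=ABUS<30>, ..., AL<0>=ABUS<0>. For both instructions, AL<31:0> holds 4 bytes A3~A0. For the PMIN instruction, bytes A3~A0 are divided into two pairs: A3 and A2 form word W1, and A1 and A0 form word W0. Words W1 and W0 are each 16 bits wide. A multiplexer 304 receives ABUS<95:64> and ABUS<31:0>. When the control signal of the multiplexer 304 is logic 1 (high), its output AH<31:0> equals ABUS<95:64>; when the control signal is logic 0 (low), AH<31:0> equals ABUS<31:0>. In one embodiment, a separate 1-bit-wide multiplexer is provided for each of the 32 bits of AH<31:0>, so that every input and output has its own multiplexer path (MUX path). If the control code INSTR indicates the PMIN instruction, the multiplexer 304 selects ABUS<95:64> as AH<31:0>. These 32 bits form 4 bytes A11~A8, which for PMIN are divided into two words: bytes A11 and A10 form word W5, and bytes A9 and A8 form word W4. If the control code INSTR indicates the PSAD instruction, the multiplexer 304 selects ABUS<31:0> as AH<31:0>. These 32 bits form 4 bytes A3~A0. The bytes are duplicated because the first operand of the PSAD instruction is the same for the high-order and low-order parts, as described in detail later.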
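When the control signal of the multiplexer 306 is logic 1 (that is, INSTR=1), the multiplexer 306 outputs eight high-order zero bits (denoted 0x8) followed by ABUS<63:16>, so that BL<55:0> consists of the 8 zero bits and ABUS<63:16>. When the control signal is logic 0, the multiplexer 306 outputs BBUS<55:0> as BL<55:0>. In one embodiment, 1-bit-wide multiplexers may be used for each byte of each bus. If the control code INSTR indicates the PMIN instruction, ABUS<63:16> is selected. ABUS<63:16> holds 6 bytes A7~A2, which are divided into 3 pairs: bytes A7 and A6 form word W3, bytes A5 and A4 form word W2, and bytes A3 and A2 form word W1. If the control code INSTR indicates the PSAD instruction, BBUS<55:0> is selected; it holds the 7 low bytes B6~B0 of the second operand. When the control input of the multiplexer 308 is logic 1, the multiplexer 308 outputs eight high-order zero bits (0x8) followed by ABUS<127:80>, so that BH<55:0> is the combination of the 8 zero bits and ABUS<127:80>. When the control input is logic 0, the multiplexer 308 outputs BBUS<87:32> as BH<55:0>. If the control code INSTR indicates the PMIN instruction, ABUS<127:80> is selected. ABUS<127:80> holds 6 bytes A15~A10, which are divided into 3 pairs: bytes A15 and A14 form word W7, bytes A13 and A12 form word W6, and bytes A11 and A10 form word W5. If the control code INSTR indicates the PSAD instruction, BBUS<87:32> is selected; it holds the 7 high bytes B10~B4 of the second operand of the PSAD instruction.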
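The routing performed by the buffer 302 and the multiplexers 304, 306 and 308 can be modeled as bit-field selection. The sketch below is illustrative only: the helper names `bits` and `route` are hypothetical, the buses are modeled as Python integers with bit 0 as the least significant bit, and the upper PMIN field is assumed to be the 48 bits ABUS<127:80> that hold bytes A15 to A10.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract value<hi:lo> as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def route(abus: int, bbus: int, instr: int):
    """Return (AL, AH, BL, BH) for INSTR=1 (PMIN) or INSTR=0 (PSAD)."""
    al = bits(abus, 31, 0)            # buffer 302: same for both instructions
    if instr == 1:                    # PMIN
        ah = bits(abus, 95, 64)       # multiplexer 304: bytes A11..A8
        bl = bits(abus, 63, 16)       # multiplexer 306: 8 zero bits + A7..A2
        bh = bits(abus, 127, 80)      # multiplexer 308: 8 zero bits + A15..A10
    else:                             # PSAD
        ah = bits(abus, 31, 0)        # first operand A3..A0 is duplicated
        bl = bits(bbus, 55, 0)        # bytes B6..B0
        bh = bits(bbus, 87, 32)       # bytes B10..B4
    return al, ah, bl, bh
```

In PMIN mode the 48-bit fields are zero-extended to 56 bits simply because the returned integers have no bits set above bit 47.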
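Referring to FIG. 2, for the PMIN instruction, the routing performed by the path selection circuit 202 of FIG. 3 places words W1 and W0 on the AL bus and words W3~W1 on the BL bus for delivery to the first adder circuit 204. The first adder circuit 204 compares word W0 with each of words W1~W3, then word W1 with each of words W2~W3, and then word W2 with word W3, and provides the corresponding comparison bits C<5:0>. The first PMIN circuit 206 receives words W3~W0 and outputs the smallest of them as PMINVAL<15:0>, together with the position PMINLOC<1:0> of that word within the low part of the first bus ABUS. For example, if the minimum word is at ABUS<15:0>, then PMINLOC=00; if it is at ABUS<31:16>, then PMINLOC=01. In the same way, for the PMIN instruction, words W5 and W4 are placed on the AH bus and words W7~W5 on the BH bus for delivery to the second adder circuit 208. The second adder circuit 208 compares word W4 with each of words W5~W7, then word W5 with each of words W6~W7, and then word W6 with word W7, and provides the corresponding comparison bits C<11:6>. The second PMIN circuit 210 receives words W7~W4 and outputs the smallest of them as PMINVAL<31:16>, together with the position PMINLOC<3:2> of that word within the high part of the first bus ABUS. For example, if the minimum word is at ABUS<79:64>, then PMINLOC=00; if it is at ABUS<95:80>, then PMINLOC=01. The high-order/low-order comparator circuit 212 compares the word in PMINVAL<15:0> with the word in PMINVAL<31:16> to identify which is the minimum of ABUS<127:0>. The result of this comparison also gives the position MINLOC<2:0> of the minimum.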
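The final selection made by the high-order/low-order comparator circuit 212 can be sketched as below. This is a hedged illustration: the function name `combine_halves` is hypothetical, and two details are assumptions not stated in this passage, namely that MINLOC<2> is the half-select bit placed above the 2-bit position of the winning half, and that the low half wins when both halves hold the same value.

```python
def combine_halves(pminval: int, pminloc: int):
    """Model of comparator 212: returns (MINVAL<15:0>, MINLOC<2:0>).

    pminval is PMINVAL<31:0> and pminloc is PMINLOC<3:0>.
    """
    lo_val, hi_val = pminval & 0xFFFF, (pminval >> 16) & 0xFFFF
    lo_loc, hi_loc = pminloc & 0b11, (pminloc >> 2) & 0b11
    if hi_val < lo_val:               # high half wins only when strictly smaller
        return hi_val, 0b100 | hi_loc
    return lo_val, lo_loc
```

With a low-half minimum of 5 at position 2 and a high-half minimum of 3 at position 1, the model returns the value 3 at overall word index 5.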
Referring to FIG. 2, for the PSAD instruction, the path selection circuit 202 (shown in FIG. 3) routes bytes A3~A0 of the first operand from the first bus ABUS to both AL<31:0> and AH<31:0>, supplying AL<31:0> to the first adder circuit 204 and AH<31:0> to the second adder circuit 208. The path selection circuit 202 takes bytes B6~B0 of the second operand from the second bus BBUS as BL<55:0> and sends BL<55:0> to the first adder circuit 204, and takes bytes B10~B4 of the second operand as BH<55:0> and sends BH<55:0> to the second adder circuit 208. For the PSAD instruction, the first adder circuit 204 sums four absolute differences for each of its four 10-bit results:
PSAD<9:0> = |A0-B0| + |A1-B1| + |A2-B2| + |A3-B3| (first result)
PSAD<19:10> = |A0-B1| + |A1-B2| + |A2-B3| + |A3-B4| (second result)
PSAD<29:20> = |A0-B2| + |A1-B3| + |A2-B4| + |A3-B5| (third result)
PSAD<39:30> = |A0-B3| + |A1-B4| + |A2-B5| + |A3-B6| (fourth result)
In the same way, the second adder circuit 208 sums four absolute differences for each of its four 10-bit results:
PSAD<49:40> = |A0-B4| + |A1-B5| + |A2-B6| + |A3-B7| (first result)
PSAD<59:50> = |A0-B5| + |A1-B6| + |A2-B7| + |A3-B8| (second result)
PSAD<69:60> = |A0-B6| + |A1-B7| + |A2-B8| + |A3-B9| (third result)
PSAD<79:70> = |A0-B7| + |A1-B8| + |A2-B9| + |A3-B10| (fourth result)
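FIG. 4 shows an embodiment of the first adder circuit 204 of the present invention. The first adder circuit 204 processes the bytes of AL<31:0> and BL<55:0> and provides PSAD<39:0> or C<5:0>. It includes a difference circuit 402, a sum circuit 404, a selection logic circuit 410 and a selection logic circuit 412. The difference circuit 402 has a plurality of independent difference units DIFF1~DIFF8. The sum circuit 404 has independent sum units S1~S4. Each difference unit determines the (unsigned) differences among 4 bytes, that is, 2 pairs of bytes. For each pair, the difference unit inverts one of the bytes and adds it to the other byte; the difference obtained for each pair is the absolute difference. Which bytes a difference unit receives is determined by the instruction being executed. The selection logic circuit 410 has a plurality of mutually independent multiplexer circuits, which select particular bytes for the difference unit DIFF3 according to the instruction being executed. As shown, for the PMIN instruction, when the control input of the selection logic circuit 410 is logic 1 (that is, INSTR=1), it selects bytes BL<47:40>, BL<31:24>, BL<39:32> and BL<23:16> and outputs them to the difference unit DIFF3. These correspond to bytes A7, A5, A6 and A4 respectively. For the PSAD instruction, when the control input of the selection logic circuit 410 is logic 0 (that is, INSTR=0), it selects bytes BL<23:16>, AL<15:8>, BL<15:8> and AL<7:0> and outputs them to the difference unit DIFF3. These correspond to bytes B2, A1, B1 and A0 respectively. Similarly, for the PMIN instruction, when the control input of the selection logic circuit 412 is logic 1, it selects bytes AL<15:8> and AL<7:0>, which correspond to bytes A1 and A0, and outputs them to the difference unit DIFF8. For the PSAD instruction, when the control input of the selection logic circuit 412 is logic 0, it selects bytes AL<23:16> and AL<15:8>, which correspond to bytes A2 and A1, and outputs them to the difference unit DIFF8.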
For the PSAD instruction, the first, inverting input of the difference unit DIFF1 receives byte BL<15:8>, which corresponds to byte B1, and its second, non-inverting input receives byte AL<15:8>, which corresponds to byte A1. DIFF1 determines the absolute difference |A1-B1| and outputs it as result AD1 at its first output. Likewise, the third, inverting input of DIFF1 receives byte BL<7:0>, which corresponds to byte B0, and its fourth, non-inverting input receives byte AL<7:0>, which corresponds to byte A0. DIFF1 determines the absolute difference |A0-B0| and outputs it as result AD2 at its second output. Similarly, the difference unit DIFF2 determines |A3-B3| and outputs it as AD3 at its first output, and determines |A2-B2| and outputs it as AD4 at its second output. In summary, when the control code INSTR indicates the PSAD instruction, the difference circuit 402 determines the absolute differences between byte A0 and each of bytes B0~B3, between byte A1 and each of bytes B1~B4, between byte A2 and each of bytes B2~B5, and between byte A3 and each of bytes B3~B6.
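One way an adder with an inverting input can yield both an absolute difference and a comparison result is sketched below. The text states only that one byte of each pair is inverted and added to the other; the end-around-carry correction used here is a standard technique chosen for illustration, and it, like the function name `abs_diff8`, is an assumption rather than the circuit of FIG. 5.

```python
def abs_diff8(a: int, b: int):
    """Return (|a - b|, carry) for unsigned bytes, computed from a + ~b."""
    total = a + ((~b) & 0xFF)
    carry = total >> 8                # 1 exactly when a > b
    low = total & 0xFF
    if carry:
        result = (low + 1) & 0xFF     # a - b: add the end-around carry
    else:
        result = (~low) & 0xFF        # b - a: invert the raw sum
    return result, carry
```

The same carry that selects between the two result paths is also the information a PMIN comparison needs, which is consistent with sharing the adders between the two instructions.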
The sum unit S1 computes the sum of the 4 bytes AD1~AD4 and outputs the result as the 10-bit PSAD<9:0>, which corresponds to |A0-B0|+|A1-B1|+|A2-B2|+|A3-B3|. For the PSAD instruction, the difference unit DIFF3 determines the absolute difference between A0 and B1 as AD6, and the absolute difference between A1 and B2 as AD5. The difference unit DIFF4 determines the absolute difference between A2 and B3 as AD8, and the absolute difference between A3 and B4 as AD7. The sum unit S2 computes the sum of the 4 bytes AD5~AD8 and outputs the result as the 10-bit PSAD<19:10>, which corresponds to |A0-B1|+|A1-B2|+|A2-B3|+|A3-B4|. Similarly, for the PSAD instruction, the sum unit S3 computes the sum of the 4 bytes AD9~AD12 and outputs the result as the 10-bit PSAD<29:20>, which corresponds to |A0-B2|+|A1-B3|+|A2-B4|+|A3-B5|. Finally, the sum unit S4 computes the sum of the 4 bytes AD13~AD16 and outputs the result as the 10-bit PSAD<39:30>, which corresponds to |A0-B3|+|A1-B4|+|A2-B5|+|A3-B6|. Although FIG. 4 shows only an embodiment of the first adder circuit 204, the second adder circuit 208 is substantially similar. It determines the absolute differences between byte A0 and each of bytes B4~B7, between byte A1 and each of bytes B5~B8, between byte A2 and each of bytes B6~B9, and between byte A3 and each of bytes B7~B10. The second adder circuit 208 also sums the absolute differences in groups of 4 and provides 4 sums, which make up PSAD<79:40>.
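In summary, for the PSAD instruction, the difference circuit 402 determines the absolute differences between the bytes of the first set of digital codes (A3:A0) and the bytes of the second set of digital codes (B10:B0). After the first group B3:B0 has been processed, the comparison starts again one byte position higher, with B4:B1, B5:B2, B6:B3 and so on. Eight groups of absolute differences are therefore produced: AD1-AD4, AD5-AD8, ..., AD29-AD32. The sum circuit 404 sums the absolute differences of each group and provides the corresponding sums of absolute differences PSAD<79:0>.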
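The PSAD result summarized above can be checked against a small software model. The following is a minimal Python sketch, not the patented circuit: the function name `psad` and the list-based operand layout are assumptions made for illustration, and the packing of sum k into bits <10k+9:10k> follows the PSAD<9:0> ... PSAD<79:70> fields described in the text.

```python
def psad(a_bytes, b_bytes):
    """Software model of the eight sums of absolute differences.

    a_bytes: 4 byte values A0..A3; b_bytes: 11 byte values B0..B10.
    Returns (sums, packed): the eight sums and an 80-bit integer in which
    sum k occupies bits <10k+9 : 10k>, mirroring PSAD<79:0>.
    (Hypothetical helper; the names are not taken from the patent.)
    """
    assert len(a_bytes) == 4 and len(b_bytes) == 11
    sums = []
    for k in range(8):                       # group k slides B up by k bytes
        group = [abs(a_bytes[i] - b_bytes[i + k]) for i in range(4)]
        sums.append(sum(group))              # at most 4 * 255 = 1020 < 2**10
    packed = 0
    for k, s in enumerate(sums):
        packed |= s << (10 * k)              # place each 10-bit field
    return sums, packed
```

Each sum fits in 10 bits because four byte-wide differences total at most 1020, which is why the fields can be packed side by side without overlap.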
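When the control code INSTR is a PMIN instruction, the difference circuit 402 operates in substantially the same way, except that different bytes are routed to it. The sums of AD1-AD16 and PSAD<39:0> may be ignored; only the compare bits C<5:0> are needed. The difference unit DIFF1 compares, or otherwise determines the absolute difference between, A1 and A3, and likewise A0 and A2. Byte A3 is the high byte of word W1 and byte A1 is the high byte of word W0; byte A2 is the low byte of word W1 and byte A0 is the low byte of word W0. In this embodiment, DIFF1 thus compares the high bytes and the low bytes of words W1 and W0 and determines compare bit C<0>, which indicates which of the two words is smaller. Similarly, the difference unit DIFF2 compares the high bytes A5 and A3 and the low bytes A4 and A2 of words W2 and W1 to determine which word is smaller, and provides compare bit C<3>. The difference unit DIFF3 compares the high bytes A7 and A5 and the low bytes A6 and A4 of words W3 and W2, and provides compare bit C<5>. The difference unit DIFF4 may be left unused for the PMIN instruction. The difference unit DIFF5 compares the high bytes A5 and A1 and the low bytes A4 and A0 of words W2 and W0, and provides compare bit C<1>. The difference unit DIFF6 compares the high bytes A7 and A3 and the low bytes A6 and A2 of words W3 and W1, and provides compare bit C<4>. The difference unit DIFF7 may likewise be left unused for the PMIN instruction. The difference unit DIFF8 compares the high bytes A7 and A1 and the low bytes A6 and A0 of words W3 and W0, and provides compare bit C<2>.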
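In summary, for the PMIN instruction, the difference circuit 402 of the first adder circuit 204 provides compare bit C<0> to indicate the smaller of words W0 and W1, C<1> for W0 and W2, C<2> for W0 and W3, C<3> for W1 and W2, C<4> for W1 and W3, and C<5> for W2 and W3. Although FIG. 4 does not show the second adder circuit 208 in detail, it has the same difference circuit as the first adder circuit 204 and performs the same comparisons on words W4-W7 of the high-order adder circuit 207, providing compare bits C<11:6>. Thus, for PMIN, C<6> indicates the smaller of W4 and W5, C<7> of W4 and W6, C<8> of W4 and W7, C<9> of W5 and W6, C<10> of W5 and W7, and C<11> of W6 and W7. The first PMIN circuit 206 uses compare bits C<5:0> to identify the minimum of words W0-W3, and the second PMIN circuit 210 uses compare bits C<11:6> to identify the minimum of words W4-W7.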
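The assignment of compare bits to word pairs can be written as a small table. The Python sketch below is illustrative only: `PAIRS` and `compare_bits` are hypothetical names, and the polarity (a bit is 1 when the lower-numbered word is the larger one) is an assumption inferred from the description of C<0> and of the decode logic of FIG. 7.

```python
# Word pair compared by each compare bit C<k> for words W0..W3.
# Assumed polarity: C<k> = 1 when the lower-numbered word is the larger one.
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def compare_bits(words):
    """Return [C<0>, ..., C<5>] for four 16-bit words W0..W3."""
    assert len(words) == 4
    return [1 if words[i] > words[j] else 0 for i, j in PAIRS]
```

The same table, applied to words W4-W7, would give C<11:6> for the second adder circuit.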
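FIG. 5 shows an embodiment of the difference unit DIFF1 of the present invention. As shown, DIFF1 has an adder pair consisting of a high (or first) adder 502 and a low (or second) adder 504. Each of adders 502 and 504 has an inverting input B and a non-inverting input A, so each can perform a subtraction to determine the difference between the values at inputs A and B. The inverting input B of adder 502 receives byte B1 for the PSAD instruction and byte A3 for the PMIN instruction; the non-inverting input A receives byte A1 for both instructions. Adder 502 inverts each bit of the byte received at input B to obtain ~B, where ~ denotes bitwise inversion. It then performs an unsigned addition of ~B and the byte received at input A (A + ~B, which is one less than A - B modulo 256) and provides the result at output SUM. Adder 502 has a carry out (CO) output that provides a carry-out signal CO1, which is logic 1 when the addition overflows. Adder 502 also increments the sum and provides the incremented result at output INCSUM. Adder 502 further has a propagate output CP. The propagate signal CP1 at output CP is logic 1 if the adder would propagate a carry input to its carry output; although FIG. 5 provides no carry input, CP1 indicates whether adder 502 would pass one through if it received one. In one embodiment, each bit of the byte at input A is ORed with the corresponding bit of the (inverted) byte from input B, giving eight results, and these eight results are then ANDed together to set the logic level of CP1. Output SUM is coupled to the input of inverter 508, which comprises a separate inverter for each bit of the byte. The output of inverter 508 is coupled to input 0 of multiplexer 506, and output INCSUM is coupled to input 1 of multiplexer 506. The select input of multiplexer 506 receives CO1. The output AD1 of multiplexer 506 is the absolute difference between the bytes received at inputs A and B of adder 502.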
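Similarly, the inverting input B of adder 504 receives byte B0 for the PSAD instruction and byte A2 for the PMIN instruction; input A receives byte A0 for both instructions. Adder 504 inverts each bit of the byte received at input B to produce ~B, performs an unsigned addition of ~B and the byte received at input A, and drives outputs INCSUM, SUM and CO. These outputs behave like those of adder 502 and are not described again. Output CO of adder 504 provides a carry-out signal CO2. If adder 504 has a propagate output CP, it may be left unused or omitted. Output INCSUM of adder 504 is coupled to input 1 of multiplexer 510, which provides AD2. Output SUM of adder 504 is coupled to the input of inverter 512, whose output is coupled to input 0 of multiplexer 510. The select input of multiplexer 510 receives CO2. AND gate 514 has two inputs: one is coupled to output CP of adder 502 and the other receives CO2 from output CO of adder 504. The output of AND gate 514 is coupled to one input of OR gate 516, whose other input receives CO1. OR gate 516 generates compare bit C<0>, so that C<0> = CO1 OR (CP1 AND CO2).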
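For each of adders 502 and 504, if the byte at input A is greater than the byte at input B, output CO is logic 1 and output INCSUM gives the absolute difference |A-B| between inputs A and B. When adder 502 sets CO1 to logic 1, OR gate 516 outputs compare bit C<0> = 1. With CO1 at logic 1, the propagate signal CP1 may be logic 0 or 1 depending on the values at inputs A and B, but since CO1 alone forces C<0> to logic 1, the value of CP1 does not matter for C<0>. For example, if input A receives binary 00000100 (decimal 4) and input B receives binary 00000010 (decimal 2), the difference is A-B = 00000010 (decimal 2). The value at input B is first inverted, giving ~B = 11111101. The unsigned addition A + ~B then yields 00000001 with CO1 at logic 1 (and CP1 = 0), so the value at output SUM is not the correct difference. The output of the inverter (508 or 512), ~SUM = 11111110, is not correct either. Output INCSUM is 00000001 + 1 = 00000010, which is the correct value. Therefore, for adders 502 and 504, when the byte at input A is greater than the byte at input B, CO = 1 and the corresponding multiplexer (506 or 510) selects its input 1 (INCSUM) as the correct output, namely the absolute difference between inputs A and B.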
If the value at input A is less than or equal to the value at input B, CO = 0 and the corresponding multiplexer selects the output ~SUM of the corresponding inverter (508 or 512) as the correct output. When the value at input A equals the value at input B, the correct output is 00000000. Both INCSUM and ~SUM carry the correct value in this case, but because CO = 0 the corresponding multiplexer selects ~SUM. When the two inputs are equal, the propagate output CP is 1. For example, when inputs A and B both equal 00001111, A plus the inverted value ~B is 00001111 + 11110000 = 11111111 = SUM, and CP = 1. The inverted value ~SUM is 00000000, which is correct. INCSUM is 11111111 + 1, which wraps to 00000000 and is also correct (although the multiplexer does not select it). When the value at input A is less than the value at input B, CO = 0 and the multiplexer selects ~SUM as the correct value. For example, if input A is 00000010 and input B is 00000100, then |A-B| = 00000010. Here A + ~B = 00000010 + 11111011 = 11111101 = SUM. Because CO = 0, ~SUM = 00000010 is taken as the correct value. In this example INCSUM is 11111101 + 1 = 11111110, which is not the correct value.
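The worked examples above can be reproduced with a short behavioral model of one adder and its multiplexer. This is a minimal Python sketch under stated assumptions: `adder8` and `abs_diff` are hypothetical names, and the propagate output is modeled as the AND of the bitwise OR of A with the inverted B, which is one reading of the embodiment described for FIG. 5.

```python
def adder8(a, b):
    """Model one 8-bit adder with an inverting B input.

    Returns (SUM, INCSUM, CO, CP) where SUM = (a + ~b) mod 256,
    INCSUM = (SUM + 1) mod 256, CO is the carry out of a + ~b, and
    CP is 1 when every bit of (a | ~b) is 1 (assumed OR/AND propagate rule).
    """
    nb = ~b & 0xFF                 # bitwise inversion of the B byte
    total = a + nb                 # unsigned 8-bit addition, carry kept
    s = total & 0xFF
    incsum = (s + 1) & 0xFF
    co = total >> 8                # 1 exactly when a > b
    cp = 1 if (a | nb) == 0xFF else 0
    return s, incsum, co, cp

def abs_diff(a, b):
    """Select INCSUM when CO = 1, otherwise ~SUM, as the multiplexer does."""
    s, incsum, co, _ = adder8(a, b)
    return incsum if co else (~s & 0xFF)
```

Selecting between INCSUM and ~SUM with the carry output avoids a second subtraction: one addition of A and ~B is enough to obtain |A-B| for either ordering of the operands.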
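When the control code INSTR is a PSAD instruction, adder 502 yields the absolute difference AD1 = |A1-B1| and adder 504 yields AD2 = |A0-B0|, and compare bit C<0> may be ignored. When the control code INSTR is a PMIN instruction, if A1 > A3 the high byte of word W0 is greater than the high byte of word W1, so W0 > W1; in this case CO1 = 1 and therefore C<0> = 1. If A3 > A1, both CO1 and CP1 of adder 502 are logic 0, so C<0> = 0, indicating W0 < W1. If A1 = A3, adder 502 outputs CO1 = 0 and CP1 = 1, and the low-byte comparison performed by adder 504 decides the relative values of W0 and W1. With the high bytes equal (CP1 = 1), if A0 > A2 the low byte of W0 is greater than the low byte of W1, so W0 > W1; CP1 and CO2 are then both logic 1 and C<0> = 1. With the high bytes equal (CP1 = 1), if A0 is less than or equal to A2 then CO2 is logic 0 and C<0> = 0; here W0 is less than or equal to W1, and in either case W0 is treated as the minimum. The other difference units (DIFF2-DIFF8) have the same structure and operation and determine AD3-AD16. The difference units DIFF4 and DIFF7 may be simplified: in particular, the logic that receives CO and CP to determine a corresponding compare bit C<x> is not needed, and the propagate logic of each individual adder may also be omitted if desired.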
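The case analysis above amounts to a 16-bit comparison assembled from two 8-bit adders. The following Python sketch illustrates it; the names `carry_and_propagate` and `compare_bit` are hypothetical, and the propagate rule (bitwise OR of A with the inverted B, then AND of all bits) is an assumption based on the embodiment described for FIG. 5.

```python
def carry_and_propagate(a, b):
    """Carry out and propagate of an 8-bit adder computing a + ~b."""
    nb = ~b & 0xFF
    co = (a + nb) >> 8                    # 1 exactly when a > b
    cp = 1 if (a | nb) == 0xFF else 0     # OR per bit, then AND of all bits
    return co, cp

def compare_bit(w_a, w_b):
    """C = CO1 OR (CP1 AND CO2) for two 16-bit words split into bytes."""
    co1, cp1 = carry_and_propagate(w_a >> 8, w_b >> 8)       # high adder
    co2, _ = carry_and_propagate(w_a & 0xFF, w_b & 0xFF)     # low adder
    return co1 | (cp1 & co2)
```

With this rule the propagate signal can only be 1 when the high byte on input A is at least the high byte on input B, so the result equals 1 exactly when the first word is greater than the second.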
Referring to FIG. 4 and FIG. 5, the PMIN and PSAD instructions use the same adder circuitry; in particular, every adder pair in every difference unit serves both instructions. For the PSAD instruction, each individual adder yields the absolute difference between the pair of bytes at its inputs. For the PMIN instruction, the sums of absolute differences produced for PSAD are not needed; instead, each adder pair compares bytes to determine which of two words is smaller. The path selection circuit makes maximal use of the PSAD adders to serve the PMIN instruction. As described above, for the PMIN instruction the adders are grouped into adder pairs. The high parts (for example, the high bytes) of a pair of digital codes (for example, two words) are supplied to the corresponding inputs of the first adder, and the low parts (for example, the low bytes) are supplied to the corresponding inputs of the second adder. Both adders are arranged to provide a carry output, and the high adder of each pair is arranged to provide a propagate output. The carry outputs and the propagate output of each adder pair determine the smaller value of each pair of digital codes. For the PSAD instruction, the adder results yield the absolute differences between a first operand of 4 bytes and a second operand of 11 bytes; for the PMIN instruction, the adder results identify the minimum of a set of eight words.
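FIG. 6 shows a possible embodiment of the sum unit S1 of the present invention. The sum unit S1 has adders 602, 604 and 606 and provides the 10-bit result PSAD<9:0>. Adders 602 and 604 are 8 bits wide and adder 606 is 9 bits wide. Adders 602 and 604 are similar to adder 502, except that they have no inverting input; the INCSUM circuit and the propagate output circuit are not needed and may be omitted. Adder 602 performs an unsigned addition of the binary values AD1 and AD2 and provides a first sum SUM1 (= AD1 + AD2) with a corresponding carry output C1. Adder 604 performs an unsigned addition of the binary values AD3 and AD4 and provides a second sum SUM2 (= AD3 + AD4) with a corresponding carry output C2. Carry output C1 serves as the most significant bit (MSB) of SUM1, and carry output C2 serves as the MSB of SUM2. The first input of adder 606 receives the concatenation of C1 and SUM1, and the second input receives the concatenation of C2 and SUM2, so each input receives 9 bits. Adder 606 performs an unsigned addition of its two inputs, {C1, SUM1} + {C2, SUM2}, and provides the 10-bit result PSAD<9:0>. The low 9 bits PSAD<8:0> are the unsigned binary sum, and the MSB PSAD<9> is the carry output of that addition. In this embodiment, the sum unit S1 sums the first group of absolute differences (AD1-AD4) to obtain the first sum of absolute differences PSAD<9:0>. The other sum units S2-S4 have the same structure and sum the groups AD5-AD8, AD9-AD12 and AD13-AD16 to provide the sums of absolute differences PSAD<19:10>, PSAD<29:20> and PSAD<39:30>, respectively.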
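The two-level adder tree of the sum unit can be modeled in a few lines. This Python sketch is an illustration under assumptions: `add_unsigned` and `sum_unit` are hypothetical names, and each adder is modeled simply as a fixed-width unsigned addition with a separate carry output.

```python
def add_unsigned(x, y, width):
    """Unsigned adder of the given width; returns (sum, carry_out)."""
    total = x + y
    return total & ((1 << width) - 1), total >> width

def sum_unit(ad1, ad2, ad3, ad4):
    """Two 8-bit adders feed one 9-bit adder; the result is 10 bits wide."""
    sum1, c1 = add_unsigned(ad1, ad2, 8)        # models adder 602
    sum2, c2 = add_unsigned(ad3, ad4, 8)        # models adder 604
    in1 = (c1 << 8) | sum1                      # {C1, SUM1}, 9 bits
    in2 = (c2 << 8) | sum2                      # {C2, SUM2}, 9 bits
    low9, carry = add_unsigned(in1, in2, 9)     # models adder 606
    return (carry << 9) | low9                  # PSAD<9:0>
```

Treating each carry output as the MSB of the next stage's input means no bits are lost: the largest possible total, 4 x 255 = 1020, still fits in the 10-bit result.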
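FIG. 7 shows an embodiment of the PMIN circuit 206 of the present invention. The PMIN circuit 206 has decode logic 701, select logic 728 and location logic 703. The decode logic 701 has inverters 702, 704, 706, 710, 712, 714, 718, 720 and 724 and three-input AND gates 708, 716, 722 and 726. The location logic 703 has two-input OR gates 730 and 732. Compare bits C<2:0> are supplied to inverters 702, 704 and 706, respectively. AND gate 708 receives the outputs of inverters 702, 704 and 706 and outputs signal W0_MIN, which is logic 1 when word W0 is the minimum word. Compare bits C<3> and C<4> are supplied to inverters 712 and 714, respectively. The three inputs of AND gate 716 receive the outputs of inverters 712 and 714 and compare bit C<0>. AND gate 716 outputs signal W1_MIN, which is logic 1 when word W1 is the minimum word. The input of inverter 720 receives C<5>. AND gate 722 receives the output of inverter 720, C<1> and C<3>, and outputs signal W2_MIN, which is logic 1 when word W2 is the minimum word. Inverters 710, 718 and 724 receive W0_MIN, W1_MIN and W2_MIN and generate ~W0_MIN, ~W1_MIN and ~W2_MIN, respectively, each indicating that the corresponding word is not the minimum. AND gate 726 receives ~W0_MIN, ~W1_MIN and ~W2_MIN and outputs signal W3_MIN, which is logic 1 when word W3 is the minimum word.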
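The decode logic reduces to four Boolean expressions over the six compare bits. The Python sketch below mirrors the gate connections described above; the function names `_inv` and `decode_min` and the list representation of C<5:0> are assumptions for illustration.

```python
def _inv(bit):
    """One-bit inverter."""
    return bit ^ 1

def decode_min(c):
    """c is a sequence [C<0>, ..., C<5>] of 0/1 compare bits.

    Returns (W0_MIN, W1_MIN, W2_MIN, W3_MIN) as 0/1 flags.
    """
    w0_min = _inv(c[0]) & _inv(c[1]) & _inv(c[2])        # AND gate 708
    w1_min = c[0] & _inv(c[3]) & _inv(c[4])              # AND gate 716
    w2_min = c[1] & c[3] & _inv(c[5])                    # AND gate 722
    w3_min = _inv(w0_min) & _inv(w1_min) & _inv(w2_min)  # AND gate 726
    return w0_min, w1_min, w2_min, w3_min
```

When the compare bits come from a consistent set of comparisons, exactly one flag is set, and ties resolve to the lowest-numbered word because a compare bit is 0 when the two words are equal.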
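AL<15:0>, BL<15:0>, BL<31:16> and BL<47:32> represent words W0-W3, respectively. The select logic 728 receives AL<15:0>, BL<15:0>, BL<31:16> and BL<47:32> together with signals W0_MIN-W3_MIN. Only one of W0_MIN-W3_MIN is logic 1 at any time, indicating which word is the minimum in the current cycle. The select logic 728 therefore selects one of words W0-W3 as the minimum word and outputs it as PMINVAL<15:0>. OR gate 730 receives signals W3_MIN and W2_MIN and outputs location bit PMINLOC<1>. OR gate 732 receives signals W3_MIN and W1_MIN and outputs location bit PMINLOC<0>. In this embodiment, PMINVAL<15:0> gives the minimum of words W0-W3, and PMINLOC<1:0> gives the position of that minimum word within the lower half of the words on the first bus ABUS, which are received by the low-order adder circuit 203. The PMIN circuit 210 is similar in structure to the PMIN circuit 206 and provides PMINVAL<31:16>, the minimum of words W4-W7, and PMINLOC<3:2>, the position of that minimum within the upper half of the words on the first bus ABUS, which are received by the high-order adder circuit 207.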
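The selection of the minimum word and the encoding of its position can be sketched as follows. This Python model is illustrative: `select_and_locate` is a hypothetical name, and the one-hot selection is written as a loop rather than as the AND-OR network a hardware multiplexer would typically use.

```python
def select_and_locate(words, flags):
    """words: (W0, W1, W2, W3); flags: one-hot (W0_MIN, ..., W3_MIN).

    Returns (PMINVAL, PMINLOC) where PMINLOC is a 2-bit position.
    """
    w0_min, w1_min, w2_min, w3_min = flags
    assert w0_min + w1_min + w2_min + w3_min == 1   # exactly one flag is set
    # One-hot select: pass the flagged word, contribute 0 for the others.
    pminval = 0
    for word, flag in zip(words, flags):
        pminval |= word if flag else 0
    loc1 = w3_min | w2_min          # models OR gate 730 -> PMINLOC<1>
    loc0 = w3_min | w1_min          # models OR gate 732 -> PMINLOC<0>
    return pminval, (loc1 << 1) | loc0
```

The two OR gates act as a one-hot to binary encoder: W2_MIN and W3_MIN set the upper location bit, and W1_MIN and W3_MIN set the lower one.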
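FIG. 8 shows an embodiment of the high-order/low-order compare circuit 212 of the present invention. The inverting input of a 16-bit compare circuit 802 receives PMINVAL<31:16> from the high-order adder circuit 207, and its non-inverting input receives PMINVAL<15:0> from the low-order adder circuit 203. The compare circuit 802 compares the high-order and low-order minimum words and provides the result at its carry output CO as signal MINLOC<2>. The carry output CO of the compare circuit 802 behaves like the CO outputs of the adders described above: if the word PMINVAL<15:0> is greater than the word PMINVAL<31:16>, MINLOC<2> is logic 1; otherwise MINLOC<2> is logic 0. MINLOC<2> is the most significant bit (MSB) of the location value MINLOC<2:0>. When MINLOC<2> is logic 1, the minimum lies in the upper half of the words on the first bus ABUS; conversely, when MINLOC<2> is logic 0, the minimum lies in the lower half. MINLOC<2> drives the select inputs of multiplexers 804, 806 and 808. Multiplexer 804 selects byte PMINVAL<23:16> or PMINVAL<7:0>, the low bytes of the minimum words found in the high-order and low-order halves, as the low byte MINVAL<7:0>. Multiplexer 806 selects byte PMINVAL<31:24> or PMINVAL<15:8>, the high bytes of those minimum words, as the high byte MINVAL<15:8>. Multiplexer 808 selects location bits PMINLOC<3:2> or PMINLOC<1:0>, the least significant location bits of the high-order or low-order half, as MINLOC<1:0>. As described above, the compare circuit 802 determines the MSB of MINLOC, namely MINLOC<2>. MINLOC<2:0> thus indicates the position of the minimum word on the first bus ABUS.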
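The final stage combines the two half results into one minimum value and a 3-bit location. The following Python sketch is a behavioral model under assumptions: `final_min` is a hypothetical name, the two location arguments are the 2-bit positions within each half, and the three byte-wide and bit-wide multiplexers are collapsed into a single conditional.

```python
def final_min(pminval_lo, pminloc_lo, pminval_hi, pminloc_hi):
    """Combine the low-half (W0..W3) and high-half (W4..W7) results.

    Returns (MINVAL, MINLOC) with MINLOC a 3-bit word position.
    """
    # Carry out of a 16-bit "low + ~high": 1 exactly when low > high.
    minloc2 = (pminval_lo + (~pminval_hi & 0xFFFF)) >> 16
    if minloc2:                                  # minimum is in the upper half
        minval, loc_low_bits = pminval_hi, pminloc_hi
    else:                                        # lower half wins ties
        minval, loc_low_bits = pminval_lo, pminloc_lo
    return minval, (minloc2 << 2) | loc_low_bits
```

Because the carry output is 0 when the two candidates are equal, a tie between the halves resolves to the lower half, consistent with the tie-breaking toward lower-numbered words in the earlier stages.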
Although the present invention has been described in detail with reference to certain preferred embodiments, other variations are possible and contemplated. For example, all of the circuits described above may be implemented with any logic devices or logic circuits. The functions of the logic circuits may also be implemented in software or firmware within an integrated device. The circuits may include any number of inverting devices so that any signal may use positive logic or negative logic. The disclosed circuits operate on digital codes, or binary bytes or words, but the number of bits of the digital or binary codes is not limited. Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection is therefore defined by the appended claims.
Claims (19)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/605,753 US8386545B2 (en) | 2009-10-26 | 2009-10-26 | System and method of using common adder circuitry for both a horizontal minimum instruction and a sum of absolute differences instruction |
US12/605,753 | 2009-10-26 | ||
US12/605,702 | 2009-10-26 | ||
US12/605,702 US8650232B2 (en) | 2009-10-26 | 2009-10-26 | System and method for determination of a horizontal minimum of digital values |
CN201010277155.1A CN101937333B (en) | 2009-10-26 | 2010-09-07 | Judgment system and method |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201010277155.1A Division CN101937333B (en) | 2009-10-26 | 2010-09-07 | Judgment system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103365624A true CN103365624A (en) | 2013-10-23 |
CN103365624B CN103365624B (en) | 2016-08-10 |
Family
ID=43390682
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310703550.5A Active CN103941601B (en) | 2009-10-26 | 2010-09-07 | Microprocessor |
CN201010277155.1A Active CN101937333B (en) | 2009-10-26 | 2010-09-07 | Judgment system and method |
CN201310243965.9A Active CN103365624B (en) | 2009-10-26 | 2010-09-07 | Judgment system and method |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310703550.5A Active CN103941601B (en) | 2009-10-26 | 2010-09-07 | Microprocessor |
CN201010277155.1A Active CN101937333B (en) | 2009-10-26 | 2010-09-07 | Judgment system and method |
Country Status (2)
Country | Link |
---|---|
CN (3) | CN103941601B (en) |
TW (2) | TWI423121B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI423121B (en) * | 2009-10-26 | 2014-01-11 | Via Tech Inc | System and method for determination of a horizontal minimum of digital values |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997023822A1 (en) * | 1995-12-21 | 1997-07-03 | Intel Corporation | A system for providing the absolute difference of unsigned values |
US6104836A (en) * | 1992-02-19 | 2000-08-15 | 8×8, Inc. | Computer architecture for video data processing and method thereof |
US6226737B1 (en) * | 1998-07-15 | 2001-05-01 | Ip-First, L.L.C. | Apparatus and method for single precision multiplication |
CN1588638A (en) * | 2004-08-09 | 2005-03-02 | 中芯联合(北京)微电子有限公司 | Multiple mold multiple scale movement evaluation super large scale integrated circuit system structure and method |
CN1949877A (en) * | 2005-10-12 | 2007-04-18 | 三星电子株式会社 | Adaptive quantization controller and method thereof |
CN101133389A (en) * | 2004-11-10 | 2008-02-27 | 辉达公司 | Multipurpose multiply-add functional unit |
US20080162896A1 (en) * | 2003-01-31 | 2008-07-03 | Via Technologies, Inc. | Apparatus and method for generating packed sum of absolute differences |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4039784A (en) * | 1976-01-30 | 1977-08-02 | Honeywell Inc. | Digital minimum/maximum vector crt display |
US4767949A (en) * | 1987-05-01 | 1988-08-30 | Rca Licensing Corporation | Multibit digital threshold comparator |
CN1063950A (en) * | 1991-01-29 | 1992-08-26 | 盛子昆 | "0" system and method for establishing the system |
US5220525A (en) * | 1991-11-04 | 1993-06-15 | Motorola, Inc. | Recoded iterative multiplier |
KR100265355B1 (en) * | 1997-05-22 | 2000-09-15 | 김영환 | Apparatus for performing multiply operation of floating point data with 2-cycle pipeline scheme in microprocessor |
US5991785A (en) * | 1997-11-13 | 1999-11-23 | Lucent Technologies Inc. | Determining an extremum value and its index in an array using a dual-accumulation processor |
US6377970B1 (en) * | 1998-03-31 | 2002-04-23 | Intel Corporation | Method and apparatus for computing a sum of packed data elements using SIMD multiply circuitry |
US6144975A (en) * | 1998-05-05 | 2000-11-07 | Fmr Corporation | Computer system for intelligent document management |
US7159003B1 (en) * | 2003-02-21 | 2007-01-02 | S3 Graphics Co., Ltd. | Method and apparatus for generating sign-digit format of sum of two numbers |
KR100780937B1 (en) * | 2004-12-20 | 2007-12-03 | 삼성전자주식회사 | Digital processing apparatus and method for horizontal synchronization extraction of video signal |
US20060288061A1 (en) * | 2005-06-20 | 2006-12-21 | Altera Corporation | Smaller and faster comparators |
US7725519B2 (en) * | 2005-10-05 | 2010-05-25 | Qualcom Incorporated | Floating-point processor with selectable subprecision |
CN101174200B (en) * | 2007-05-18 | 2010-09-08 | 清华大学 | A Floating-Point Multiply-Add Fusion Unit with a Five-Stage Pipeline Structure |
TWI423121B (en) * | 2009-10-26 | 2014-01-11 | Via Tech Inc | System and method for determination of a horizontal minimum of digital values |
-
2010
- 2010-09-06 TW TW99130018A patent/TWI423121B/en active
- 2010-09-06 TW TW102142392A patent/TWI489374B/en active
- 2010-09-07 CN CN201310703550.5A patent/CN103941601B/en active Active
- 2010-09-07 CN CN201010277155.1A patent/CN101937333B/en active Active
- 2010-09-07 CN CN201310243965.9A patent/CN103365624B/en active Active
Non-Patent Citations (1)
Title |
---|
孙海平: "计算机算术中若干前缀计算问题的研究", 《中国优秀博硕士学位论文全文数据库(博)信息科技辑》 * |
Also Published As
Publication number | Publication date |
---|---|
TWI423121B (en) | 2014-01-11 |
CN103941601A (en) | 2014-07-23 |
CN101937333B (en) | 2014-12-10 |
CN101937333A (en) | 2011-01-05 |
CN103365624B (en) | 2016-08-10 |
CN103941601B (en) | 2017-08-11 |
TW201419138A (en) | 2014-05-16 |
TW201115460A (en) | 2011-05-01 |
TWI489374B (en) | 2015-06-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8650232B2 (en) | System and method for determination of a horizontal minimum of digital values | |
JP7423886B2 (en) | Device for performing logical comparison operations | |
TWI470543B (en) | Simd integer multiply-accumulate instruction for multi-precision arithmetic | |
JP3750820B2 (en) | Device for performing multiplication and addition of packed data | |
EP1403762A2 (en) | Processor executing simd instructions | |
US20150134936A1 (en) | Single instruction multiple data add processors, methods, systems, and instructions | |
US10331404B2 (en) | Number format pre-conversion instructions | |
CN110515589B (en) | Multiplier, data processing method, chip and electronic equipment | |
TW201732637A (en) | Method and apparatus for performing a vector bit shuffle | |
US8386545B2 (en) | System and method of using common adder circuitry for both a horizontal minimum instruction and a sum of absolute differences instruction | |
CN103365624B (en) | Judgment system and method | |
CN112596699A (en) | Multiplier, processor and electronic equipment | |
US8849885B2 (en) | Saturation detector | |
US11221826B2 (en) | Parallel rounding for conversion from binary floating point to binary coded decimal | |
JP2000207210A (en) | Microprocessor | |
CN112667197B (en) | A Parameterized Addition and Subtraction Operation Circuit Based on POSIT Floating Point Format | |
JPH0511980A (en) | Overflow detecting method and circuit | |
US6519620B1 (en) | Saturation select apparatus and method therefor | |
US20240427602A1 (en) | Multi-condition branch instruction for conditional branch operations | |
CN109960486B (en) | Binary data processing method, and apparatus, medium, and system thereof | |
JP2591250B2 (en) | Data processing device | |
TWI599953B (en) | Method and apparatus for performing big-integer arithmetic operations | |
CN118426735A (en) | Variable pipeline error correction and detection addition operation system and method | |
CN115833845A (en) | Position output device and position output method | |
JP2000235477A (en) | Arithmetic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |