CN1963746A - Apparatus and method to realize multiplication of long data by instruction of multiply adding addition - Google Patents


Info

Publication number
CN1963746A
Authority
CN
China
Prior art keywords
source operand
long data
compression
result
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200610164874
Other languages
Chinese (zh)
Other versions
CN100378654C (en)
Inventor
高建良
何子键
徐勇军
李晓维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CNB2006101648746A (patent CN100378654C)
Publication of CN1963746A
Application granted
Publication of CN100378654C
Legal status: Active
Anticipated expiration

Landscapes

  • Executing Machine-Instructions (AREA)

Abstract

The present invention relates to the technical field of security processors and discloses a device for realizing long data multiplication using a multiply-add-add instruction. The device comprises: a partial product generation unit, which computes the product of the first source operand and the second source operand and outputs it to a compression unit; a compression unit, which compresses the partial products input by the partial product generation unit together with the third source operand and the fourth source operand and outputs the compression result to an addition unit, where the third source operand is the intermediate result of the corresponding weight and the fourth source operand is the carry produced by the previous multiply-add-add operation; and an addition unit, which performs the final addition on the compression result received from the compression unit and outputs the product and the carry. The invention also discloses a method for realizing long data multiplication using a multiply-add-add instruction. The invention greatly improves the efficiency and speed of a security processor when executing long data multiplication.

Description

Device and method for realizing long data multiplication using a multiply-add-add instruction
Technical field
The present invention relates to the technical field of security processors, and in particular to a device and method for realizing long data multiplication using a multiply-add-add instruction.
Background art
With the development and widespread use of computers and information technology, information security is becoming increasingly important, so chip designs need to take support for security-specific operations into account.
Many information security operations are based on encrypting and decrypting digital information. To improve the efficiency of such operations, it is necessary to design dedicated security processors.
Cryptographic algorithms often need to multiply large numbers whose length exceeds the word length of the security processor; such a large number is generally called long data. How to realize long data multiplication on a security processor with a short word length has therefore become one of the key technologies in security processor design.
The usual approach is to decompose the long data into a sequence of machine-word-length parts, use an ordinary multiply instruction to obtain each partial product and carry, and then accumulate the corresponding products and carries.
With this approach, each intermediate result and each carry needs a separate add instruction, so long data multiplication on a short-word-length security processor is inefficient and slow.
Summary of the invention
(1) Technical problem to be solved
In view of this, one object of the present invention is to provide a device that uses a multiply-add-add instruction to realize long data multiplication, so as to improve the efficiency and speed of a security processor when executing long data multiplication.
Another object of the present invention is to provide a method that uses a multiply-add-add instruction to realize long data multiplication, so as to improve the efficiency and speed of a security processor when executing long data multiplication.
(2) Technical solution
To achieve the above object, the present invention provides a device for realizing long data multiplication using a multiply-add-add instruction, the device comprising:
a partial product generation unit, configured to compute the product of the first source operand and the second source operand and output the resulting partial products to the compression unit;
a compression unit, configured to compress the partial products input by the partial product generation unit together with the third source operand and the fourth source operand, and output the compression result to the addition unit, where the third source operand is the intermediate result of the corresponding weight and the fourth source operand is the carry produced by the previous multiply-add-add operation;
an addition unit, configured to perform the final addition on the compression result received from the compression unit and output the product and the carry.
The partial product generation unit is a group of AND-gate logic.
The compression unit consists of three levels of compression components: level 0, level 1 and level 2.
There are 3 level-0 compression components, 2 level-1 compression components and 1 level-2 compression component.
The first, second and third source operands are stored in random access memory and can be fetched simultaneously using multi-bank memory technology.
The fourth source operand is fixed as the carry register.
To achieve the other object above, the present invention provides a method for realizing long data multiplication using a multiply-add-add instruction, the method comprising:
A. decomposing the two long data items to be multiplied, by machine word length, into a first source operand sequence and a second source operand sequence, and executing the multiply-add-add operation in turn on each part of the first source operand sequence and each part of the second source operand sequence, where the third source operand of each operation is the intermediate result of the corresponding weight and the fourth source operand is the carry produced by the previous multiply-add-add operation;
B. after one part of the first source operand sequence has completed one pass of multiply-add-add operations with every part of the second source operand sequence, saving the value in the carry register to the random access memory;
C. after all parts of the first source operand sequence and all parts of the second source operand sequence have completed the multiply-add-add operations, obtaining the final result in the random access memory.
The format of the multiply-add-add instruction is: opcode first source operand, second source operand, third source operand, destination operand.
The multiply-add-add instruction realizes {carry register, destination operand} ← first source operand × second source operand + third source operand + fourth source operand, where the first, second and third source operands are stored in the random access memory, and the fourth source operand is the carry register, which is implied by the opcode.
The result space used to hold the final result in the random access memory is twice the length of a source operand.
The result space also holds the intermediate results produced during the computation; once the computation is complete, the content of the result space is the final result.
Each intermediate result is stored in the unit of the result space that corresponds to its weight.
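Steps A to C can be sketched in software as follows. This is a minimal model and not the patented hardware: the 8-bit word length `W` and the helper names `maa`, `split_words`, `long_multiply` and `join_words` are assumptions introduced for illustration. Resetting the carry to zero at the start of each pass stands in for an instruction that does not add the carry register.

```python
W = 8                  # assumed machine word length in bits
MASK = (1 << W) - 1

def maa(src1, src2, src3, cr):
    """Model of {CR, dst} <- src1 * src2 + src3 + CR; returns (new_cr, dst)."""
    total = src1 * src2 + src3 + cr
    return total >> W, total & MASK

def split_words(value, n):
    """Decompose a long integer into n machine words, least significant first."""
    return [(value >> (W * i)) & MASK for i in range(n)]

def join_words(words):
    """Reassemble a long integer from words stored least significant first."""
    return sum(w << (W * i) for i, w in enumerate(words))

def long_multiply(a_words, b_words):
    """Schoolbook long multiplication built only from multiply-add-add operations."""
    n = len(a_words)
    # Result space: holds intermediate results, indexed by weight.
    result = [0] * (n + len(b_words))
    for j, b in enumerate(b_words):
        cr = 0  # the first operation of a pass does not add a carry
        for i, a in enumerate(a_words):
            # Step A: third operand is the intermediate result of weight i + j,
            # fourth operand is the carry from the previous operation.
            cr, result[i + j] = maa(a, b, result[i + j], cr)
        # Step B: after one pass, save the carry register into the result space.
        result[j + n] = cr
    # Step C: the result space now holds the final product.
    return result
```

For example, `join_words(long_multiply(split_words(0xBEEF, 2), split_words(0xCAFE, 2)))` equals `0xBEEF * 0xCAFE`. Because multiplication is commutative, which of the two sequences drives the outer loop does not affect the result.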
(3) Beneficial effects
As can be seen from the above technical solution, the present invention has the following beneficial effects:
1. With the device and method provided by the invention, the carry and the unit of the corresponding weight in the result space can both serve as source operands of the multiply-add-add instruction when realizing long data multiplication. This avoids executing two add instructions and greatly improves the efficiency and speed of the security processor when executing long data multiplication.
2. With the device and method provided by the invention, the intermediate result and the carry produced during the computation take part in the next operation immediately as the third and fourth source operands, so no separate add instructions are needed. This increases the speed of long data multiplication on a short-word-length security processor.
3. With the device and method provided by the invention, the intermediate results are stored directly in the result space that holds the final result in the random access memory, so no other storage is needed for them. This saves storage space for intermediate results and also reduces control complexity.
Description of the drawings
Fig. 1 is a structural schematic diagram of the device provided by the invention for realizing long data multiplication using a multiply-add-add instruction.
Fig. 2 is a flowchart of the method provided by the invention for realizing long data multiplication using a multiply-add-add instruction.
Fig. 3 is a schematic diagram of long data multiplication using ordinary instructions in the prior art, taking operands of twice the word length as an example.
Fig. 4 is a data flow diagram of long data multiplication using the multiply-add-add instruction provided by the invention.
Fig. 5 is a schematic diagram of long data multiplication using the multiply-add-add instruction according to an embodiment of the invention, taking operands of twice the word length as an example.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is a structural schematic diagram of the device provided by the invention for realizing long data multiplication using a multiply-add-add instruction. The function realized by the device is: {carry register, destination operand} ← first source operand × second source operand + third source operand + fourth source operand. The device comprises a partial product generation unit 10, a compression unit 11 and an addition unit 12. The partial product generation unit 10 is a group of AND-gate logic that computes the product of the first and second source operands and outputs the resulting partial products to the compression unit 11. The compression unit 11 consists of three levels of compression components (3 level-0 components, 2 level-1 components and 1 level-2 component); it compresses the partial products input by the partial product generation unit 10 together with the third and fourth source operands, and outputs the compression result to the addition unit 12. The third source operand is the intermediate result of the corresponding weight, and the fourth source operand is the carry produced by the previous multiply-add-add operation. The addition unit 12 adds the compression result received from the compression unit 11 and outputs the product and the carry.
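The three units can be modeled at bit level as in the sketch below. The text does not state the word length or the type of compression component, so two assumptions are made here: an 8-bit word, and 4:2 compressors (each built from two 3:2 carry-save adders). Under these assumptions the 8 partial-product rows plus the third and fourth source operands give 10 operands, which fit a tree of 3, 2 and 1 compressors. All function names are hypothetical.

```python
W = 8                  # assumed word length; not stated in the text
MASK = (1 << W) - 1

def partial_products(src1, src2):
    """AND-gate array: row i is src1 gated by bit i of src2, shifted left by i."""
    rows = []
    for i in range(W):
        bit = (src2 >> i) & 1
        gated = src1 & (MASK if bit else 0)   # each bit of src1 AND bit i of src2
        rows.append(gated << i)
    return rows

def csa(x, y, z):
    """3:2 carry-save adder: returns (s, c) with s + c == x + y + z."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def compress_4_2(a, b, c, d):
    """Assumed 4:2 compressor: two outputs whose sum equals a + b + c + d."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)

def compress_level(values, count):
    """Apply `count` 4:2 compressors, padding unused inputs with zeros."""
    padded = values + [0] * (4 * count - len(values))
    out = []
    for k in range(count):
        out.extend(compress_4_2(*padded[4 * k:4 * k + 4]))
    return out

def maa_datapath(src1, src2, src3, cr):
    """Partial products -> 3-level compression -> final adder; returns (new_cr, dst)."""
    operands = partial_products(src1, src2) + [src3, cr]   # 10 operands
    level0 = compress_level(operands, 3)   # 12 inputs -> 6 outputs
    level1 = compress_level(level0, 2)     # 8 inputs  -> 4 outputs
    s, c = compress_level(level1, 1)       # 4 inputs  -> 2 outputs
    total = s + c                          # final carry-propagate addition
    return total >> W, total & MASK
```

The design point this illustrates is that the two extra addends enter the compression tree alongside the partial products, so only one carry-propagate addition is needed at the end.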
The first, second and third source operands are stored in random access memory (RAM) and are fetched simultaneously using multi-bank memory technology, so operand fetch time does not increase. The fourth source operand is fixed as the carry register (CR); it is implied by the opcode and does not need to be given explicitly when the instruction is used.
Based on the device shown in Fig. 1, Fig. 2 shows the flowchart of the method provided by the invention for realizing long data multiplication using a multiply-add-add instruction. By using the multiply-add-add instruction, the method lets intermediate products and carries take part in the computation immediately, which greatly increases the computation speed. It comprises the following steps:
Step 201: decompose the two long data items to be multiplied, by machine word length, into a first source operand sequence and a second source operand sequence, and execute the multiply-add-add instruction in turn on each part of the first source operand sequence, each part of the second source operand sequence, the intermediate result and the carry; the third source operand of each operation is the intermediate result of the corresponding weight, and the fourth source operand is the carry produced by the previous multiply-add-add operation;
Step 202: after one part of the first source operand sequence has completed one pass of multiply-add-add operations with every part of the second source operand sequence, together with the intermediate results and carries, save the value in the carry register to the random access memory;
Step 203: after all parts of the first source operand sequence and all parts of the second source operand sequence, together with the intermediate results and carries, have completed the multiply-add-add operations, the final result is available in the random access memory.
The format of the multiply-add-add instruction is: opcode first source operand, second source operand, third source operand, destination operand. The instruction realizes the following function: {carry register, destination operand} ← first source operand × second source operand + third source operand + fourth source operand, where the first, second and third source operands are stored in the random access memory, and the fourth source operand is fixed as the carry register, which is implied by the opcode.
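A short check shows why the register pair {carry register, destination operand} can always hold the result. The symbol w for the machine word length in bits is introduced here for illustration; each of the four source operands is assumed to be an unsigned w-bit value.

```latex
\begin{aligned}
\mathrm{src}_1 \times \mathrm{src}_2 + \mathrm{src}_3 + \mathrm{CR}
  &\le (2^{w} - 1)^2 + 2\,(2^{w} - 1) \\
  &= \bigl(2^{2w} - 2^{w+1} + 1\bigr) + \bigl(2^{w+1} - 2\bigr) \\
  &= 2^{2w} - 1 .
\end{aligned}
```

The bound is exactly the largest unsigned 2w-bit value, so the sum of one product and two word-sized addends never overflows the two-word result.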
The result space used to hold the final result in the random access memory is twice the length of a source operand. The result space also holds the intermediate results produced during the computation; once the computation is complete, the content of the result space is the final result. Each intermediate result is stored in the unit of the result space that corresponds to its weight.
Ordinary multiply and add instructions usually take two source operands. The flow of realizing long data multiplication with ordinary instructions is introduced below with an example.
Fig. 3 is a schematic diagram of long data multiplication using ordinary instructions in the prior art, taking operands of twice the word length as an example. The machine word length is one byte, A and B are each two bytes long, and A × B is to be computed. A and B are decomposed into the byte sequences A1A0 and B1B0. The pseudocode in Fig. 3 shows the details of the flow. First, A1A0 is multiplied by B0; to prevent the value in the carry register from being overwritten, it must be moved out before the next multiply instruction executes, which in this example is done with an ADD instruction. Then A1A0 is multiplied by B1. Except for the multiplication with A0, every multiplication needs an ADD instruction to accumulate the partial product.
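The cost of working with two-operand instructions only can be illustrated with the sketch below. It does not reproduce the pseudocode of Fig. 3, which is not included in this text; it is one possible baseline under assumed semantics, in which `mul` returns a high and a low word and `add` returns a carry and a sum word. The `Counter` class and all function names are hypothetical, and the operation counts apply to this model only.

```python
W = 8                  # assumed machine word length in bits
MASK = (1 << W) - 1

class Counter:
    """Counts how many modeled instructions have been executed."""
    def __init__(self):
        self.count = 0

def mul(a, b, ops):
    """Two-operand multiply: returns (high word, low word)."""
    ops.count += 1
    p = a * b
    return p >> W, p & MASK

def add(x, y, ops):
    """Two-operand add: returns (carry out, sum word)."""
    ops.count += 1
    t = x + y
    return t >> W, t & MASK

def long_multiply_baseline(a_words, b_words, ops):
    """Schoolbook multiplication using only two-operand mul and add."""
    result = [0] * (len(a_words) + len(b_words))
    for j, b in enumerate(b_words):
        for i, a in enumerate(a_words):
            hi, lo = mul(a, b, ops)
            k = i + j
            carry, result[k] = add(result[k], lo, ops)   # accumulate low word
            _, hi = add(hi, carry, ops)   # hi <= 2**W - 2, so this cannot overflow
            k += 1
            carry, result[k] = add(result[k], hi, ops)   # accumulate high word
            while carry:                  # ripple any remaining carry upward
                k += 1
                carry, result[k] = add(result[k], carry, ops)
    return result
```

In this model a two-word by two-word multiplication needs at least 16 operations (4 multiplies and at least 12 adds), because every partial product and every carry is folded in by a separate add.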
Realizing long data multiplication with the multiply-add-add instruction requires fewer instructions than with ordinary multiply instructions. Fig. 4 is the data flow diagram of long data multiplication using the multiply-add-add instruction provided by the invention. Two numbers, each n machine words long, are multiplied; each is stored in n RAM units. A further 2n units in the RAM hold the result, and both the intermediate results and the final result are stored in these 2n memory units. The carry register holds the carry produced by each multiply-add-add operation.
Fig. 5 is a schematic diagram of long data multiplication using the multiply-add-add instruction according to an embodiment of the invention. Again, operands of twice the word length are taken as the example. A and B are each two bytes long, and A × B is to be computed. A and B are decomposed into the byte sequences A1A0 and B1B0. The process can be divided into the following 6 steps:
(1) MAC A0, B0, 0, E0, which denotes E0 ← A0*B0+0; the value of the carry register is not added at this point;
(2) MAA A1, B0, 0, E1, which denotes E1 ← A1*B0+0+CR;
(3) MAA 0, 0, 0, E2, which denotes E2 ← 0*0+0+CR, that is, the value of the carry register CR is saved to E2;
(4) MAC A0, B1, E1, E1, which denotes E1 ← A0*B1+E1;
(5) MAA A1, B1, E2, E2, which denotes E2 ← A1*B1+E2+CR;
(6) MAA 0, 0, 0, E3, which denotes E3 ← 0*0+0+CR, that is, the value of the carry register CR is saved to E3.
After these 6 steps, the value stored in E3E2E1E0 is the final result of the long multiplication A × B.
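The six steps can be traced in software as in the sketch below. The `Machine` class and the function `multiply_two_words` are hypothetical names. One point is an inference and not stated outright in the text: MAC is modeled as writing its carry into CR without adding the old CR, because step (5) relies on the carry produced by step (4).

```python
W = 8                  # one-byte machine word, as in the example
MASK = (1 << W) - 1

class Machine:
    """Minimal model holding only the carry register CR."""
    def __init__(self):
        self.cr = 0

    def mac(self, src1, src2, src3):
        """{CR, dst} <- src1 * src2 + src3 (old CR not added); returns dst."""
        total = src1 * src2 + src3
        self.cr = total >> W
        return total & MASK

    def maa(self, src1, src2, src3):
        """{CR, dst} <- src1 * src2 + src3 + CR; returns dst."""
        total = src1 * src2 + src3 + self.cr
        self.cr = total >> W
        return total & MASK

def multiply_two_words(a, b):
    """Multiply two 2-byte values with the six-step sequence of the example."""
    a1, a0 = a >> W, a & MASK
    b1, b0 = b >> W, b & MASK
    m = Machine()
    e0 = m.mac(a0, b0, 0)      # (1) E0 <- A0*B0+0
    e1 = m.maa(a1, b0, 0)      # (2) E1 <- A1*B0+0+CR
    e2 = m.maa(0, 0, 0)        # (3) E2 <- CR
    e1 = m.mac(a0, b1, e1)     # (4) E1 <- A0*B1+E1
    e2 = m.maa(a1, b1, e2)     # (5) E2 <- A1*B1+E2+CR
    e3 = m.maa(0, 0, 0)        # (6) E3 <- CR
    return (e3 << (3 * W)) | (e2 << (2 * W)) | (e1 << W) | e0
```

With A = B = 0xFFFF, the sequence leaves E3E2E1E0 = FF FE 00 01, which is 0xFFFF × 0xFFFF. The count of six instructions matches n × n multiply operations plus n carry saves for n = 2.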
The specific embodiments described above further explain the objects, technical solutions and beneficial effects of the present invention. It should be understood that the above are only specific embodiments of the invention and do not limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (12)

1. A device for realizing long data multiplication using a multiply-add-add instruction, characterized in that the device comprises:
a partial product generation unit, configured to compute the product of the first source operand and the second source operand and output the resulting partial products to the compression unit;
a compression unit, configured to compress the partial products input by the partial product generation unit together with the third source operand and the fourth source operand, and output the compression result to the addition unit, where the third source operand is the intermediate result of the corresponding weight and the fourth source operand is the carry produced by the previous multiply-add-add operation;
an addition unit, configured to perform the final addition on the compression result received from the compression unit and output the product and the carry.
2. The device for realizing long data multiplication using a multiply-add-add instruction according to claim 1, characterized in that the partial product generation unit is a group of AND-gate logic.
3. The device for realizing long data multiplication using a multiply-add-add instruction according to claim 1, characterized in that the compression unit consists of three levels of compression components: level 0, level 1 and level 2.
4. The device for realizing long data multiplication using a multiply-add-add instruction according to claim 3, characterized in that there are 3 level-0 compression components, 2 level-1 compression components and 1 level-2 compression component.
5. The device for realizing long data multiplication using a multiply-add-add instruction according to claim 1, characterized in that the first, second and third source operands are stored in random access memory.
6. The device for realizing long data multiplication using a multiply-add-add instruction according to claim 1, characterized in that the fourth source operand is fixed as the carry register.
7. A method for realizing long data multiplication using a multiply-add-add instruction, characterized in that the method comprises:
A. decomposing the two long data items to be multiplied, by machine word length, into a first source operand sequence and a second source operand sequence, and executing the multiply-add-add operation in turn on each part of the first source operand sequence and each part of the second source operand sequence, where the third source operand of each operation is the intermediate result of the corresponding weight and the fourth source operand is the carry produced by the previous multiply-add-add operation;
B. after one part of the first source operand sequence has completed one pass of multiply-add-add operations with every part of the second source operand sequence, saving the value in the carry register to the random access memory;
C. after all parts of the first source operand sequence and all parts of the second source operand sequence have completed the multiply-add-add operations, obtaining the final result in the random access memory.
8. The method for realizing long data multiplication using a multiply-add-add instruction according to claim 7, characterized in that the format of the multiply-add-add instruction is: opcode first source operand, second source operand, third source operand, destination operand.
9. The method for realizing long data multiplication using a multiply-add-add instruction according to claim 7 or 8, characterized in that the multiply-add-add instruction realizes {carry register, destination operand} ← first source operand × second source operand + third source operand + fourth source operand;
where the first, second and third source operands are stored in the random access memory, and the fourth source operand is the carry register, which is implied by the opcode.
10. The method for realizing long data multiplication using a multiply-add-add instruction according to claim 7, characterized in that the result space used to hold the final result in the random access memory is twice the length of a source operand.
11. The method for realizing long data multiplication using a multiply-add-add instruction according to claim 10, characterized in that the result space used to hold the final result in the random access memory also holds the intermediate results produced during the computation, and once the computation is complete, the content of the result space is the final result.
12. The method for realizing long data multiplication using a multiply-add-add instruction according to claim 11, characterized in that each intermediate result is stored in the unit of the result space that corresponds to its weight.
CNB2006101648746A 2006-12-07 2006-12-07 Device and method for realizing multiplication of long data by using multiply-accumulate-add instruction Active CN100378654C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006101648746A CN100378654C (en) 2006-12-07 2006-12-07 Device and method for realizing multiplication of long data by using multiply-accumulate-add instruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006101648746A CN100378654C (en) 2006-12-07 2006-12-07 Device and method for realizing multiplication of long data by using multiply-accumulate-add instruction

Publications (2)

Publication Number Publication Date
CN1963746A true CN1963746A (en) 2007-05-16
CN100378654C CN100378654C (en) 2008-04-02

Family

ID=38082820

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006101648746A Active CN100378654C (en) 2006-12-07 2006-12-07 Device and method for realizing multiplication of long data by using multiply-accumulate-add instruction

Country Status (1)

Country Link
CN (1) CN100378654C (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5862067A (en) * 1995-12-29 1999-01-19 Intel Corporation Method and apparatus for providing high numerical accuracy with packed multiply-add or multiply-subtract operations
JPH1049348A (en) * 1996-08-05 1998-02-20 Toshiba Corp Integer multiplicator
US7627114B2 (en) * 2002-10-02 2009-12-01 International Business Machines Corporation Efficient modular reduction and modular multiplication

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107861756A (en) * 2011-12-22 2018-03-30 英特尔公司 addition instruction with independent carry chain
US11080045B2 (en) 2011-12-22 2021-08-03 Intel Corporation Addition instructions with independent carry chains
CN107861756B (en) * 2011-12-22 2022-04-15 英特尔公司 Add instructions with independent carry chains
US11531542B2 (en) 2011-12-22 2022-12-20 Intel Corporation Addition instructions with independent carry chains
CN110428247A (en) * 2019-07-02 2019-11-08 常州市常河电子技术开发有限公司 The variable weight value Fast implementation of multiplication and divisions is counted in asymmetric encryption calculating greatly

Also Published As

Publication number Publication date
CN100378654C (en) 2008-04-02


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant