[go: up one dir, main page]

CN103699358B - A kind of Fast Modular square operation circuit being applicable to big number - Google Patents

A kind of Fast Modular square operation circuit being applicable to big number Download PDF

Info

Publication number
CN103699358B
CN103699358B CN201310653889.9A CN201310653889A CN103699358B CN 103699358 B CN103699358 B CN 103699358B CN 201310653889 A CN201310653889 A CN 201310653889A CN 103699358 B CN103699358 B CN 103699358B
Authority
CN
China
Prior art keywords
bit
circuit
output
input
data selector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310653889.9A
Other languages
Chinese (zh)
Other versions
CN103699358A (en
Inventor
雷绍充
魏晓彤
马璐钖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201310653889.9A priority Critical patent/CN103699358B/en
Publication of CN103699358A publication Critical patent/CN103699358A/en
Application granted granted Critical
Publication of CN103699358B publication Critical patent/CN103699358B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Complex Calculations (AREA)

Abstract

本发明公开了一种适用于大数的快速模平方运算电路,该电路结构包括:掐头去尾移位补值电路,两个二输入与门阵列,一级进位保存加法器(CSA)结构,一级全加器(FA)单元,以及一系列扫描寄存器。本发明将平方运算按多项式乘法展开,原先的m个部分积求和压缩成m/2个部分积求和,且从高位向低位累加,因此平方运算时间减少为原来的一半。

The invention discloses a fast modular square operation circuit suitable for large numbers. The circuit structure includes: a head-to-tail shift compensation circuit, two two-input AND gate arrays, and a one-stage carry-save adder (CSA) structure , a full adder (FA) unit, and a series of scan registers. The invention expands the square operation according to polynomial multiplication, compresses the original sum of m partial products into m/2 sum of partial products, and accumulates from high to low, so the square operation time is reduced to half of the original.

Description

一种适用于大数的快速模平方运算电路A Fast Modular Square Operation Circuit for Large Numbers

技术领域technical field

本发明涉及集成电路设计领域,具体涉及一种适用于大数的快速模平方运算电路。The invention relates to the field of integrated circuit design, in particular to a fast modular square operation circuit suitable for large numbers.

背景技术Background technique

目前,对于大数平方的研究通常采用的方案是蒙哥马利算法,该算法在平方的运算过程所耗费的时间和输入数据的长度成正比。At present, the research on the square of large numbers usually adopts the Montgomery algorithm, and the time spent by the algorithm in the square operation process is proportional to the length of the input data.

有鉴于此,有必要研究一种新的平方算法,通过对运算过程的部分积的优化,减少部分积的次数,从而减少整个平方的运行时间,解决上述问题。In view of this, it is necessary to study a new square algorithm, by optimizing the partial product of the operation process, reducing the number of partial products, thereby reducing the running time of the entire square, and solving the above problems.

发明内容Contents of the invention

本发明的目的在于提供一种能够有效减少模平方运算时间的适用于大数的快速模平方运算电路。The object of the present invention is to provide a fast modular square operation circuit suitable for large numbers that can effectively reduce the time for modular square operation.

为了达到上述目的,本发明所采用的技术方案是:包括向左移位电路、m位二选一数据选择器阵列、m位二输入与门阵列、m位部分积产生电路、全加器FA阵列、m+3位扫描寄存器、约简电路以及带有一个m/2位的johnson循环移位计数器的掐头去尾移位补值电路;掐头去尾移位补值电路的输入为m位的平方运算输入项A,输出为掐头去尾移位补值电路中m位寄存器的输出Q,同时输出掐头去尾移位补值电路中m/2位johnson循环移位计数器中寄存器的输出Qc;向左移位电路的输入为平方输入项A的低m/2位;m位部分积产生电路的输入端与m位二输入与门阵列的输出端相连,m位部分积产生电路的输出端与全加器FA阵列的输入端相连,全加器FA阵列通过m+3位扫描寄存器与简约电路相连;其中,160≤m≤15360。In order to achieve the above object, the technical solution adopted in the present invention is: comprising a leftward shift circuit, an m-bit two-to-one data selector array, an m-bit two-input AND gate array, an m-bit partial product generation circuit, a full adder FA Array, m+3-bit scan register, reduction circuit, and a shift-complement circuit with a johnson circular shift counter of m/2 bits; the input of the shift-complement circuit is m Bit square operation input item A, the output is the output Q of the m-bit register in the pinching head and tail shifting compensation circuit, and simultaneously outputs the register in the m/2-bit johnson circular shift counter in the pinching head and tail shifting compensation circuit The output Qc of the leftward shift circuit is the low m/2 bit of the square input item A; the input terminal of the m-bit partial product generation circuit is connected with the output end of the m-bit two-input AND gate array, and the m-bit partial product is generated The output terminal of the circuit is connected with the input terminal of the full adder FA array, and the full adder FA array is connected with the simple circuit through the m+3 scanning register; wherein, 160≤m≤15360.

所述的向左移位电路,每一个时钟上升沿到来时,向左移位电路中寄存器的值左移一位,并把低位补零,同时将最高位寄存器的输出端引出,定义为AL。In the leftward shift circuit, when each rising edge of the clock arrives, the value of the register in the leftward shift circuit is shifted to the left by one bit, and the lower bits are filled with zeros, and at the same time, the output terminal of the highest bit register is drawn out, which is defined as AL .

所述的m位二选一数据选择器阵列包括m位二选一数据选择器,其中m位的二选一数据选择器的控制信号由掐头去尾移位补值电路中的Johnson循环移位计数器输出Qc产生;其中,第一位二选一数据选择器和第二位二选一数据选择器的控制信号为Qc的第二位数值,第三位二选一数据选择器和第四位二选一数据选择器的控制信号为Qc的第三位数值,以此类推,第m-3位二选一数据选择器和第m-2位二选一数据选择器的制信号为Qc的第m/2位数值,第m-1位二选一数据选择器和第m位二选一数据选择器的制信号置为1;m位二选一数据选择器的“0”端接向左移位电路的输出AL,“1”端接掐头去尾移位补值电路的输出Q的最高位。The m-bit optional one data selector array includes an m-bit optional one-two data selector, wherein the control signal of the m-bit optional one data selector is shifted cyclically by Johnson in the head-cutting and tail-shifting compensation circuit. Bit counter output Qc is generated; wherein, the control signal of the first two-to-one data selector and the second two-to-one data selector is the second digit value of Qc, the third two-to-one data selector and the fourth The control signal of the two-bit one-to-one data selector is the third digit value of Qc, and so on, the control signal of the m-3 two-to-one data selector and the m-2 two-to-one data selector is Qc The value of the m/2th bit of the m-1 bit of the two-alternative data selector and the control signal of the m-th bit of the second-alternative data selector are set to 1; the "0" terminal of the m-bit two-alternative data selector is connected The output AL of the shift circuit to the left, "1" is connected to the most significant bit of the output Q of the head-cutting and tail-shifting compensation circuit.

所述的m位二输入与门的输入端中的一个端口与m位二选一数据选择器阵列的m位对应相连,另一个输入端按如下方式连接:One port in the input end of the described m two-input AND gate is correspondingly connected with the m bit of the m-bit two-to-one data selector array, and the other input end is connected as follows:

第一位与门连接掐头去尾移位补值电路的输出Q的最低位,第二位与门连接Q的第二位,以此类推,第m-1位与门连接Q的次高位的反码,第m位与门连接Q的次高位;m位与门阵列的输出为最终的m位部分积输出。The first AND gate is connected to the lowest bit of the output Q of the head-to-tail shift compensation circuit, the second AND gate is connected to the second bit of Q, and so on, the m-1th AND gate is connected to the second highest bit of Q The inverse code of the m-bit AND gate is connected to the second highest bit of Q; the output of the m-bit AND gate array is the final m-bit partial product output.

所述的m位部分积产生电路,在部分积产生后,送入m个全加器FA中作为输入,其中,最低位的全加器的进位输入连接一个二选一数据选择器的输出,二选一数据选择器的“0”端连接掐头去尾移位补值电路的m位输出Q的最低位Q[0],“1”端输入零,二选一数据选择器的控制信号为掐头去尾移位补值电路中Johnson循环移位计数器的输出Qc的第二位Qc[1];其余全加器的进位输入均为来自低位的进位输出。The m-bit partial product generation circuit, after the partial product is generated, is sent to m full adders FA as input, wherein the carry input of the lowest full adder is connected to the output of a two-to-one data selector, The "0" end of the two-to-one data selector is connected to the lowest bit Q[0] of the m-bit output Q of the head-clamping and tail-removing shift compensation circuit, and the "1" terminal inputs zero, and the control signal of the two-to-one data selector It is the second bit Qc[1] of the output Qc of the Johnson cyclic shift counter in the head-to-tail and shift-compensation circuit; the carry-in inputs of other full adders are all carry-in outputs from the lower bits.

所述的m位全加器FA的和位输出送入扫描寄存器的输入端“0”端;最低位扫描寄存器的“1”端连接一个二选一数据选择器的输出端,二选一数据选择器的输入“0”端连接平方运算输入项A的最低位A[0],“1”端输入零,二选一数据选择器的控制信号为Sel,该信号由掐头去尾移位补值电路中Johnson计数器的输出端Qc的最低位和最高位相异或产生;第二位扫描寄存器的“1”端输入零,第三位扫描寄存器的“1”端输入第一位全加器FA的和位,第四位扫描寄存器的“1”端输入第二位全加器FA的和位,以此类推,第m+2位扫描寄存器的“1”端输入第m位全加器FA的和位,第m+3位扫描寄存器的“1”端输入第m位全加器FA的进位输出。The sum bit output of the m-bit full adder FA is sent to the input end "0" of the scan register; the "1" end of the lowest bit scan register is connected to the output end of a two-to-one data selector, and the two-to-one data The input "0" terminal of the selector is connected to the lowest bit A[0] of the square operation input item A, and the "1" terminal inputs zero, and the control signal of the two-choice data selector is Sel, which is shifted by pinching the head and removing the tail The lowest bit and the highest bit of the output terminal Qc of the Johnson counter in the complement circuit are XORed; the "1" end of the second bit scan register inputs zero, and the "1" end of the third bit scan register enters the first full adder The sum bit of FA, the "1" end of the fourth scan register is input to the sum bit of the second full adder FA, and so on, the "1" end of the m+2 scan register is input to the m full adder The sum bit of FA, the "1" end of the m+3 scan register is input to the carry output of the m-th full adder FA.

所述的m+3位扫描寄存器的输出端送入约简电路进行约简。The output terminal of the m+3-bit scanning register is sent to the reduction circuit for reduction.

与现有技术相比,本发明具有以下有益效果:Compared with the prior art, the present invention has the following beneficial effects:

本发明部分积产生电路将m个部分积压缩为m/2个,最终平方运算采用从高位到低位运算,第一个部分积产生后向左移两位,送入约简电路约简后与第二个部分积相加,完成一次累加,以此类推,直到第m/2个部分积产生,完成m/2次累加,最终完成模平方运算。The partial product generating circuit of the present invention compresses m partial products into m/2, and the final square operation adopts an operation from high to low, and the first partial product is generated and shifted to the left by two bits, and then sent to the reduction circuit for reduction and then combined with Add the second partial product to complete an accumulation, and so on until the m/2th partial product is generated, complete m/2 accumulations, and finally complete the modular square operation.

进一步的,本发明将平方运算按多项式乘法展开,原先的m个部分积求和压缩成m/2个部分积求和,且从高位向低位累加,因此平方运算时间减少为原来的一半。Furthermore, the present invention expands the square operation according to polynomial multiplication, and compresses the original sum of m partial products into m/2 sum of partial products, and accumulates from high to low, so the square operation time is reduced to half of the original.

附图说明Description of drawings

图1为本发明的电路结构示意图;Fig. 1 is the schematic diagram of circuit structure of the present invention;

图2为本发部分积累加及约简的电路结构示意图。Fig. 2 is a schematic diagram of the circuit structure of the partial accumulation and reduction of the present invention.

具体实施方式detailed description

下面结合附图和实施例对本发明作进一步详细的说明:Below in conjunction with accompanying drawing and embodiment the present invention will be described in further detail:

参见图1和图2,本发明包括向左移位电路、m位二选一数据选择器阵列、m位二输入与门阵列、m位部分积产生电路、全加器FA阵列、m+3位扫描寄存器、约简电路以及带有一个m/2位的johnson循环移位计数器的掐头去尾移位补值电路;掐头去尾移位补值电路的输入为m位的平方运算输入项A,输出为掐头去尾移位补值电路中m位寄存器的输出Q,同时输出掐头去尾移位补值电路中m/2位johnson循环移位计数器中寄存器的输出Qc;向左移位电路的输入为平方输入项A的低m/2位;向左移位电路,每一个时钟上升沿到来时,向左移位电路中寄存器的值左移一位,并把低位补零,同时将最高位寄存器的输出端引出,定义为AL。m位部分积产生电路的输入端与m位二输入与门阵列的输出端相连,m位二输入与门的输入端中的一个端口与m位二选一数据选择器阵列的m位对应相连,另一个输入端按如下方式连接:第一位与门连接掐头去尾移位补值电路的输出Q的最低位,第二位与门连接Q的第二位,以此类推,第m-1位与门连接Q的次高位的反码,第m位与门连接Q的次高位;m位与门阵列的输出为最终的m位部分积输出。m位部分积产生电路的输出端与全加器FA阵列的输入端相连,全加器FA阵列通过m+3位扫描寄存器与简约电路相连,m+3位扫描寄存器的输出端送入约简电路进行约简;其中,160≤m≤15360。Referring to Fig. 1 and Fig. 2, the present invention comprises shifting circuit to the left, m-bit two-to-one data selector array, m-bit two-input AND gate array, m-bit partial product generating circuit, full adder FA array, m+3 A bit scan register, a reduction circuit, and an m/2-bit johnson circular shift counter with a head-to-tail shift-complement circuit; the input to the head-to-tail shift-to-compensation circuit is an m-bit square operation input Item A, the output is the output Q of the m-bit register in the shift compensation circuit for pinching the head and removing the tail, and outputs the output Qc of the register in the m/2 johnson circular shift counter in the shift compensation circuit for pinching the head and removing the tail simultaneously; The input of the left shift circuit is the low m/2 bits of the square input item A; for the left shift circuit, when each rising edge of the clock arrives, the value of the register in the left shift circuit is shifted to the left by one bit, and the low bit is complemented Zero, at the same time lead out the output terminal of the highest register, defined as AL. The input end of the m-bit partial product generation circuit is connected to the output end of the m-bit two-input AND gate array, and one port of the input end of the m-bit two-input AND gate is connected to the m-bit corresponding to the m-bit two-to-one data selector array , and the other input terminal is connected as follows: the first AND gate is connected to the lowest bit of the output Q of the head and tail shift compensation circuit, the second AND gate is connected to the second bit of Q, and so on, the mth The -1-bit AND gate is connected to the inverse code of the second highest bit of Q, and the m-th bit AND gate is connected to the second highest bit of Q; the output of the m-bit AND gate array is the final m-bit partial product output. The output end of the m-bit partial product generation circuit is connected to the input end of the full adder FA array, the full adder FA array is connected to the reduction circuit through the m+3-bit scanning register, and the output end of the m+3-bit scanning register is sent to the reduction circuit The circuit is reduced; among them, 160≤m≤15360.

m位二选一数据选择器阵列包括m位二选一数据选择器,其中m位的二选一数据选择器的控制信号由掐头去尾移位补值电路中的Johnson循环移位计数器输出Qc产生;其中,第一位二选一数据选择器和第二位二选一数据选择器的控制信号为Qc的第二位数值,第三位二选一数据选择器和第四位二选一数据选择器的控制信号为Qc的第三位数值,以此类推,第m-3位二选一数据选择器和第m-2位二选一数据选择器的制信号为Qc的第m/2位数值,第m-1位二选一数据选择器和第m位二选一数据选择器的制信号置为1;m位二选一数据选择器的“0”端接向左移位电路的输出AL,“1”端接掐头去尾移位补值电路的输出Q的最高位。The m-bit 2-to-1 data selector array includes m 2-to-1 data selectors, wherein the control signal of the m-bit 2-to-1 data selector is output by the Johnson circular shift counter in the head-to-head and tail-to-tail shift compensation circuit Qc is generated; wherein, the control signal of the first two-to-one data selector and the second two-to-one data selector is the second digit value of Qc, the third two-to-one data selector and the fourth two-to-one data selector The control signal of a data selector is the third digit value of Qc, and so on, the control signal of the m-3 bit two-to-one data selector and the m-2-th bit two-to-one data selector is the mth value of Qc /2-digit value, the control signal of the m-1th bit 2-alternative data selector and the m-th bit 2-alternative data selector is set to 1; the "0" terminal of the m-bit 2-alternative data selector is shifted to the left The output AL of the bit circuit, "1" is terminated to the highest bit of the output Q of the head-cutting and tail-shifting compensation circuit.

m位部分积产生电路,在部分积产生后,送入m个全加器FA中作为输入,其中,最低位的全加器的进位输入连接一个二选一数据选择器的输出,二选一数据选择器的“0”端连接掐头去尾移位补值电路的m位输出Q的最低位Q[0],“1”端输入零,二选一数据选择器的控制信号为掐头去尾移位补值电路中Johnson循环移位计数器的输出Qc的第二位Qc[1];其余全加器的进位输入均为来自低位的进位输出。The m-bit partial product generation circuit, after the partial product is generated, is sent to m full adders FA as input, wherein the carry input of the lowest full adder is connected to the output of a two-to-one data selector, and two to one The "0" terminal of the data selector is connected to the lowest bit Q[0] of the m-bit output Q of the pinch head and tail shift compensation circuit, and the "1" terminal inputs zero, and the control signal of the two-choice data selector is the pinch head The second bit Qc[1] of the output Qc of the Johnson circular shift counter in the tail shift complement circuit; the carry inputs of the other full adders are all carry outputs from the lower bits.

m位全加器FA的和位输出送入扫描寄存器的输入端“0”端;最低位扫描寄存器的“1”端连接一个二选一数据选择器的输出端,二选一数据选择器的输入“0”端连接平方运算输入项A的最低位A[0],“1”端输入零,二选一数据选择器的控制信号为Sel,该信号由掐头去尾移位补值电路中Johnson计数器的输出端Qc的最低位和最高位相异或产生;第二位扫描寄存器的“1”端输入零,第三位扫描寄存器的“1”端输入第一位全加器FA的和位,第四位扫描寄存器的“1”端输入第二位全加器FA的和位,以此类推,第m+2位扫描寄存器的“1”端输入第m位全加器FA的和位,第m+3位扫描寄存器的“1”端输入第m位全加器FA的进位输出。The sum bit output of the m-bit full adder FA is sent to the input terminal "0" of the scanning register; the "1" terminal of the lowest bit scanning register is connected to the output terminal of a two-to-one data selector, and the output terminal of the two-to-one data selector The input "0" terminal is connected to the lowest bit A[0] of the square operation input item A, and the "1" terminal inputs zero, and the control signal of the data selector is Sel, and the signal is determined by the head-to-tail shift compensation circuit The lowest bit and the highest bit of the output terminal Qc of the Johnson counter in the middle are XORed; the "1" end of the second scan register inputs zero, and the "1" end of the third scan register inputs the sum of the first full adder FA Bit, the "1" end of the fourth scan register is input to the sum of the second full adder FA, and so on, the "1" end of the m+2 scan register is input to the sum of the m full adder FA Bit, the "1" end of the m+3 scan register is input to the carry output of the m-th full adder FA.

本发明适用于大数模平方运算的快速平方算法的电路实现结构包括:掐头去尾移位补值电路、向左移位电路、一系列二选一电路、二输入与门阵列、全加器FA阵列和一系列扫描寄存器和约简电路。其中,掐头去尾移位补值电路中包含一个m/2位的johnson循环移位计数器。掐头去尾移位补值电路的输入为m位的平方输入项A,输出为掐头去尾移位补值电路中m位寄存器的输出Q,同时输出m/2位johnson循环移位计数器中寄存器的输出Qc。向左移位电路的输入为平方输入项A的低m/2位,每一个时钟上升沿到来时,向左移位电路中寄存器的值左移一位,并把低位补零,同时将最高位寄存器的输出端引出,定义为AL。m位二选一阵列的控制信号主要由掐头去尾移位补值电路中的Johnson循环移位计数器输出Qc产生。第一位和第二位二选一的控制信号为Qc的第二位数值,第三位和第四位二选一的控制信号为Qc的第三位数值,以此类推……第m-3位和第m-2位的二选一控制信号为Qc的第m/2位数值,第m-1位和第m位的二选一控制信号置为1。m位二选一阵列的“0”端接向左移位电路的输出AL,“1”端接掐头去尾移位补值电路的输出Q的最高位。m位二输入与门输入端其中一个端口与m位二选一阵列的m位对应相连,另一个输入端按如下方式连接:第一位与门连接掐头去尾移位补值电路的输出Q的最低位,第二位与门连接Q的第二位,以此类推……第m-1位与门连接Q的次高位的反码,第m位与门连接Q的次高位。m位与门阵列的输出为最终的m位部分积输出。The circuit implementation structure of the fast square algorithm suitable for the large-number modulus square operation of the present invention includes: a head-and-tail shift compensation circuit, a left shift circuit, a series of two-select-one circuits, a two-input AND gate array, a full-add tor FA array and a series of scan registers and reduction circuits. Wherein, an m/2-bit johnson circular shift counter is included in the head-cutting and tail-cutting shift compensation circuit. The input of the head-to-tail and shift-compensation circuit is an m-bit square input item A, and the output is the output Q of the m-bit register in the head-to-tail shift and complement circuit, and the m/2-bit johnson circular shift counter is output at the same time The output Qc of the register. The input of the left shift circuit is the low m/2 bits of the square input item A. When the rising edge of each clock arrives, the value of the register in the left shift circuit is shifted to the left by one bit, and the low bits are filled with zeros, and the highest The output terminal of the bit register is drawn, defined as AL. The control signal of the m-bit two-choice array is mainly generated by the output Qc of the Johnson circular shift counter in the head-cutting and tail-cutting shift compensation circuit. The control signal of the first and second bits is the second value of Qc, the control signal of the third and fourth bits is the third value of Qc, and so on... m-th The one-two control signal of the 3rd bit and the m-2th bit is the value of the m/2-th bit of Qc, and the one-two control signal of the m-1th bit and the m-th bit is set to 1. The "0" terminal of the m-bit two-choice array is connected to the output AL of the leftward shift circuit, and the "1" terminal is connected to the highest bit of the output Q of the head-cutting and tail-removing shift-compensation circuit. One of the input ports of the m-bit two-input AND gate is connected to the m-bit of the m-bit two-choice array, and the other input terminal is connected as follows: the first bit AND gate is connected to the output of the first-bit and tail-cutting shift compensation circuit The lowest bit of Q, the second AND gate is connected to the second bit of Q, and so on... The m-1th AND gate is connected to the inverse code of the second highest bit of Q, and the mth bit AND gate is connected to the second highest bit of Q. The output of the m-bit AND gate array is the final m-bit partial product output.

部分积产生后,送入m个全加器FA中作为输入,其中,最低位的全加器的进位输入连接一个二选一的输出,二选一的“0”端连接掐头去尾移位补值电路的m位输出Q的最低位Q[0],“1”端输入零,二选一的控制信号为掐头去尾移位补值电路中Johnson循环移位计数器的输出Qc的第二位Qc[1];其余全加器的进位输入均为来自低位的进位输出。m位全加器的和位输出送入扫描寄存器的输入端“0”端。最低位扫描寄存器的“1”端连接一个二选一的输出端,二选一的输入“0”端连接平方运算的输入项A的最低位A[0],“1”端输入零,二选一的控制信号为Sel,该信号由掐头去尾移位补值电路中Johnson计数器的输出端Qc的最低位和最高位相异或产生。第二位扫描寄存器的“1”端输入零,第三位扫描寄存器的“1”端输入第一位全加器FA的和位,第四位扫描寄存器的“1”端输入第二位全加器FA的和位,以此类推,第m+2位扫描寄存器的“1”端输入第m位全加器FA的和位,第m+3位扫描寄存器的“1”端输入第m位全加器FA的进位输出。m+3位扫描寄存器的输出端送入约简电路进行约简。After the partial product is generated, it is sent to m full adders FA as input, among which, the carry input of the lowest full adder is connected to a two-choice output, and the "0" terminal of the two-choice one is connected to pinch the head and remove the tail shift The m-bit output Q of the bit complement value circuit is the lowest bit Q[0], and the "1" terminal inputs zero, and the control signal to select one of the two is the output Qc of the Johnson circular shift counter in the shift complement value circuit of pinching the head and removing the tail. The second bit Qc[1]; the carry inputs of the other full adders are all carry outputs from the lower bits. The sum bit output of the m-bit full adder is sent to the input terminal "0" of the scanning register. The "1" end of the lowest bit scanning register is connected to an output end of the alternative one, and the input "0" end of the alternative one is connected to the lowest bit A[0] of the input item A of the square operation, and the "1" end inputs zero, two The selected one control signal is Sel, which is generated by the XOR of the lowest bit and the highest bit of the output terminal Qc of the Johnson counter in the head-cutting and tail-cutting shift compensation circuit. The "1" terminal of the second scanning register inputs zero, the "1" terminal of the third scanning register inputs the sum bit of the first full adder FA, and the "1" terminal of the fourth scanning register inputs the second full The sum bit of the adder FA, and so on, the "1" end of the m+2 scan register is input to the sum bit of the m full adder FA, and the "1" end of the m+3 scan register is input to the m The carry output of the full adder FA. The output terminal of the m+3-bit scan register is sent to the reduction circuit for reduction.

在本发明中,部分积产生电路将m个部分积压缩为m/2个,最终平方运算采用从高位到低位运算,第一个部分积产生后向左移两位,送入约简电路约简后与第二个部分积相加,完成一次累加,以此类推,直到第m/2个部分积产生,完成m/2次累加,最终完成模平方运算。In the present invention, the partial product generation circuit compresses m partial products into m/2, and the final square operation adopts the operation from high order to low order. Then add it to the second partial product to complete an accumulation, and so on until the m/2th partial product is generated, complete m/2 accumulations, and finally complete the modular square operation.

表1Table 1

参照表1,表1为按照多项式乘法展开后的平方运算的部分积求和表格,整个过程需要对m个部分积进行求和,整个过程需要m-1个时钟周期。Referring to Table 1, Table 1 is the partial product summation table of the square operation expanded according to polynomial multiplication. The whole process needs to sum m partial products, and the whole process requires m-1 clock cycles.

表2Table 2

参照表2,表2为本发明优化后的部分积求和表格,将部分积压缩为表1中的一半,整个过程只需m/2-1个时钟即可完成平方运算。With reference to Table 2, Table 2 is the optimized partial product sum table of the present invention, and the partial product is compressed into half of that in Table 1, and the whole process only needs m/2-1 clocks to complete the square operation.

表3table 3

A5A4A5A4 A5(!A4)A5 (!A4) A7A1A7A1 A6A1A6A1 A5A1A5A1 A4A1A4A1 A3A1A3A1 A2A1+A2A2A1+A2 00 A1A1 A6A5A6A5 A6(!A5)A6 (!A5) A6A4A6A4 A6A3A6A3 A6A2A6A2 A5A2A5A2 A4A2A4A2 A3A2+A3A3A2+A3 A7A6A7A6 A7(!A6)A7 (!A6) A7A5A7A5 A7A4A7A4 A7A3A7A3 A7A2A7A2 A5A3A5A3 A4A3+A4A4A3+A4 A8A7A8A7 A8(!A7)A8 (!A7) A8A6A8A6 A8A5A8A5 A8A4A8A4 A8A3A8A3 A8A2A8A2 A8A1A8A1

参照表3,表3为以8位二进制数为例采用本算法的平方部分积求和表格,以8位二进制数为例,采用本发明算法,将部分积压缩为4项。With reference to table 3, table 3 adopts the square partial product summation form of this algorithm as an example with 8 binary numbers, takes 8 binary numbers as example, adopts algorithm of the present invention, the partial product is compressed into 4 items.

本发明的运算过程:Operation process of the present invention:

参见图1和图2,初始化时RS=0,将寄存器全部置为0;工作时,RS=1,第一个有效时钟沿后,掐头去尾移位补值电路输出Q=[Am,Am-1,Am-2···A2,A1]、Qc=[11···111];第二个有效时钟沿之后,第一次部分积累加完成,掐头去尾移位补值电路输出Q=[Am-1,Am-2···A2,Am/2+1,Am/2]、Qc=[11···110];直至第m/2个时钟沿之后,完成m/2-1个部分积累加,同时输出数据左移存入寄存器,因此寄存器最低两位为00;第m/2+1个时钟沿到来时,所有的部分积累加完成同时在最后两位寄存器中存入0和平方运算的输入项A的最低位;在部分积累加的过程中,每个时钟上升沿到来后将m+3个扫描寄存器的输出值值送入约简电路,约简完成后等待下一个时钟沿到来,与下一个部分积累加,以此类推……直到第m/2+1个时钟沿之后,寄存器停止移位,最多再经过两个时钟沿,最终完成模平方运算过程。Referring to Figure 1 and Figure 2, when initializing RS=0, all registers are set to 0; when working, RS=1, after the first effective clock edge, the shift compensation circuit output Q=[A m ,A m-1 ,A m-2 ···A 2 ,A 1 ], Qc=[11···111]; after the second effective clock edge, the first partial accumulation is completed, and the head and tail are cut The shift compensation circuit output Q=[A m-1 , A m-2 ···A 2 ,A m/2+1 ,A m/2 ], Qc=[11···110]; until the mth After /2 clock edges, m/2-1 partial accumulations are completed, and the output data is shifted to the left and stored in the register, so the lowest two bits of the register are 00; when the m/2+1 clock edge arrives, all parts The accumulation and addition are completed and the lowest bit of the input item A of the square operation is stored in the last two registers at the same time; in the process of partial accumulation and addition, the output values of m+3 scanning registers are scanned after the rising edge of each clock arrives. Send it to the reduction circuit. After the reduction is completed, wait for the next clock edge to arrive, and accumulate with the next part, and so on... until the m/2+1th clock edge, the register stops shifting, and at most two more clock edge, and finally complete the modular square operation process.

Claims (7)

1.一种适用于大数的快速模平方运算电路,其特征在于:包括向左移位电路、m位二选一数据选择器阵列、m位二输入与门阵列、m位部分积产生电路、全加器FA阵列、m+3位扫描寄存器、约简电路以及带有一个m/2位的johnson循环移位计数器的掐头去尾移位补值电路;掐头去尾移位补值电路的输入为m位的平方运算输入项A,输出为掐头去尾移位补值电路中m位寄存器的输出Q,同时输出掐头去尾移位补值电路中m/2位johnson循环移位计数器中寄存器的输出Qc;向左移位电路的输入为平方输入项A的低m/2位;m位部分积产生电路的输入端与m位二输入与门阵列的输出端相连,m位部分积产生电路的输出端与全加器FA阵列的输入端相连,全加器FA阵列通过m+3位扫描寄存器与简约电路相连;其中,160≤m≤15360。1. A fast modular square operation circuit applicable to large numbers, characterized in that: comprise a shift circuit to the left, an m-bit two-select-one data selector array, an m-bit two-input AND gate array, and an m-bit partial product generation circuit , full adder FA array, m+3-bit scan register, reduction circuit and a head-to-tail shift and complement circuit with an m/2-bit johnson circular shift counter; head-to-tail shift and complement The input of the circuit is an m-bit square operation input item A, and the output is the output Q of the m-bit register in the head-to-tail shift and complement circuit, and at the same time output the m/2-bit johnson cycle in the head-to-tail shift and complement circuit The output Qc of the register in the shift counter; the input to the left shift circuit is the low m/2 bit of the square input item A; the input end of the m bit partial product generation circuit is connected with the output end of the m bit two-input AND gate array, The output end of the m-bit partial product generation circuit is connected to the input end of the full adder FA array, and the full adder FA array is connected to the simple circuit through m+3 bit scan registers; wherein, 160≤m≤15360. 2.根据权利要求1所述的适用于大数的快速模平方运算电路,其特征在于:所述的向左移位电路,每一个时钟上升沿到来时,向左移位电路中寄存器的值左移一位,并把低位补零,同时将最高位寄存器的输出端引出,定义为AL。2. The fast modular square operation circuit suitable for large numbers according to claim 1, characterized in that: the leftward shift circuit, when each clock rising edge arrives, the value of the register in the leftward shift circuit Shift one bit to the left, fill the low bit with zero, and lead out the output terminal of the highest bit register, which is defined as AL. 3.根据权利要求1所述的适用于大数的快速模平方运算电路,其特征在于:所述的m位二选一数据选择器阵列包括m位二选一数据选择器,其中m位的二选一数据选择器的控制信号由掐头去尾移位补值电路中的Johnson循环移位计数器输出Qc产生;其中,第一位二选一数据选择器和第二位二选一数据选择器的控制信号为Qc的第二位数值,第三位二选一数据选择器和第四位二选一数据选择器的控制信号为Qc的第三位数值,以此类推,第m-3位二选一数据选择器和第m-2位二选一数据选择器的制信号为Qc的第m/2位数值,第m-1位二选一数据选择器和第m位二选一数据选择器的制信号置为1;m位二选一数据选择器的“0”端接向左移位电路的输出AL,“1”端接掐头去尾移位补值电路的输出Q的最高位。3. the fast modular square operation circuit suitable for large numbers according to claim 1, characterized in that: the m-bit data selector array includes m-bit two-choice data selectors, wherein the m-bit The control signal of the two-to-one data selector is generated by the output Qc of the Johnson circular shift counter in the head-to-tail shift compensation circuit; wherein, the first two-to-one data selector and the second two-to-one data selector The control signal of the device is the second value of Qc, the control signal of the third data selector and the fourth data selector is the third value of Qc, and so on, m-3 The control signal of the two-to-one data selector and the m-2th two-to-one data selector is the m/2th digit value of Qc, the m-1-th two-to-one data selector and the m-th two-to-one data selector The control signal of the data selector is set to 1; the "0" terminal of the m-bit two-choice data selector is connected to the output AL of the left shift circuit, and the "1" terminal is connected to the output Q of the head and tail shift compensation circuit the highest bit. 4.根据权利要求1所述的适用于大数的快速模平方运算电路,其特征在于:所述的m位二输入与门的输入端中的一个端口与m位二选一数据选择器阵列的m位对应相连,另一个输入端按如下方式连接:4. the fast modular square operation circuit suitable for large numbers according to claim 1, characterized in that: a port in the input end of the two-input AND gate of the m position and an m-bit two-select-one data selector array The m bits of are connected correspondingly, and the other input terminal is connected as follows: 第一位与门连接掐头去尾移位补值电路的输出Q的最低位,第二位与门连接Q的第二位,以此类推,第m-1位与门连接Q的次高位的反码,第m位与门连接Q的次高位;m位与门阵列的输出为最终的m位部分积输出。The first AND gate is connected to the lowest bit of the output Q of the head-to-tail shift compensation circuit, the second AND gate is connected to the second bit of Q, and so on, the m-1th AND gate is connected to the second highest bit of Q The inverse code of the m-bit AND gate is connected to the second highest bit of Q; the output of the m-bit AND gate array is the final m-bit partial product output. 5.根据权利要求1或4所述的适用于大数的快速模平方运算电路,其特征在于:所述的m位部分积产生电路,在部分积产生后,送入m个全加器FA中作为输入,其中,最低位的全加器的进位输入连接一个二选一数据选择器的输出,二选一数据选择器的“0”端连接掐头去尾移位补值电路的m位输出Q的最低位Q[0],“1”端输入零,二选一数据选择器的控制信号为掐头去尾移位补值电路中Johnson循环移位计数器的输出Qc的第二位Qc[1];其余全加器的进位输入均为来自低位的进位输出。5. according to claim 1 or 4 described fast modular squaring circuit applicable to large numbers, it is characterized in that: described m bit partial product generation circuit, after partial product generation, sends into m full adders FA As the input, wherein, the carry input of the lowest full adder is connected to the output of a two-to-one data selector, and the "0" end of the two-to-one data selector is connected to the m bit of the head-to-tail shift compensation circuit Output the lowest bit Q[0] of Q, input zero at the "1" terminal, and the control signal of the data selector is the second bit Qc of the output Qc of the Johnson circular shift counter in the head-to-head and tail-to-tail shift compensation circuit [1]; The carry-in of other full adders is the carry-out from the lower bit. 6.根据权利要求1所述的适用于大数的快速模平方运算电路,其特征在于:所述的m位全加器FA的和位输出送入扫描寄存器的输入端“0”端;最低位扫描寄存器的“1”端连接一个二选一数据选择器的输出端,二选一数据选择器的输入“0”端连接平方运算输入项A的最低位A[0],“1”端输入零,二选一数据选择器的控制信号为Sel,该信号由掐头去尾移位补值电路中Johnson计数器的输出端Qc的最低位和最高位相异或产生;第二位扫描寄存器的“1”端输入零,第三位扫描寄存器的“1”端输入第一位全加器FA的和位,第四位扫描寄存器的“1”端输入第二位全加器FA的和位,以此类推,第m+2位扫描寄存器的“1”端输入第m位全加器FA的和位,第m+3位扫描寄存器的“1”端输入第m位全加器FA的进位输出。6. the fast modular square operation circuit suitable for large numbers according to claim 1 is characterized in that: the sum bit output of the full adder FA of the m position is sent into the input terminal "0" end of the scanning register; The "1" terminal of the bit scan register is connected to the output terminal of a two-to-one data selector, and the input "0" terminal of the two-to-one data selector is connected to the lowest bit A[0] of the square operation input item A, and the "1" terminal Input zero, the control signal of the two-choice one data selector is Sel, and this signal is produced by the lowest bit and the highest bit of the output terminal Qc of the Johnson counter in the head-to-tail shift compensation circuit; the second scan register The "1" terminal inputs zero, the "1" terminal of the third scanning register inputs the sum bit of the first full adder FA, and the "1" terminal of the fourth scanning register inputs the sum bit of the second full adder FA , and so on, the "1" end of the m+2 scan register is input to the sum bit of the m full adder FA, and the "1" end of the m+3 scan register is input to the m full adder FA carry out. 7.根据权利要求1所述的适用于大数的快速模平方运算电路,其特征在于:所述的m+3位扫描寄存器的输出端送入约简电路进行约简。7. The fast modular square operation circuit suitable for large numbers according to claim 1, characterized in that: the output terminal of the m+3-bit scanning register is sent to the reduction circuit for reduction.
CN201310653889.9A 2013-12-05 2013-12-05 A kind of Fast Modular square operation circuit being applicable to big number Expired - Fee Related CN103699358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310653889.9A CN103699358B (en) 2013-12-05 2013-12-05 A kind of Fast Modular square operation circuit being applicable to big number

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310653889.9A CN103699358B (en) 2013-12-05 2013-12-05 A kind of Fast Modular square operation circuit being applicable to big number

Publications (2)

Publication Number Publication Date
CN103699358A CN103699358A (en) 2014-04-02
CN103699358B true CN103699358B (en) 2016-11-23

Family

ID=50360899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310653889.9A Expired - Fee Related CN103699358B (en) 2013-12-05 2013-12-05 A kind of Fast Modular square operation circuit being applicable to big number

Country Status (1)

Country Link
CN (1) CN103699358B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1008026B1 (en) * 1997-05-04 2007-09-05 M-Systems Flash Disk Pioneers Ltd. Improved apparatus & method for modular multiplication & exponentiation based on montgomery multiplication
CN200976579Y (en) * 2006-07-28 2007-11-14 东南大学 Reversed phase shift restricted competition metering code circuit

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7346159B2 (en) * 2002-05-01 2008-03-18 Sun Microsystems, Inc. Generic modular multiplier using partial reduction
US7627114B2 (en) * 2002-10-02 2009-12-01 International Business Machines Corporation Efficient modular reduction and modular multiplication
KR100459732B1 (en) * 2002-12-30 2004-12-03 삼성전자주식회사 Montgomery modular multiplier by 4 to 2 compressor and multiplication method thereof
CN101834712B (en) * 2010-04-19 2012-11-14 浙江大学 Method for realizing accurate time synchronization by utilizing IEEE1588 protocol

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1008026B1 (en) * 1997-05-04 2007-09-05 M-Systems Flash Disk Pioneers Ltd. Improved apparatus & method for modular multiplication & exponentiation based on montgomery multiplication
CN200976579Y (en) * 2006-07-28 2007-11-14 东南大学 Reversed phase shift restricted competition metering code circuit

Also Published As

Publication number Publication date
CN103699358A (en) 2014-04-02

Similar Documents

Publication Publication Date Title
US11301213B2 (en) Reduced latency multiplier circuitry for very large numbers
CN102681815B (en) By the method having symbol multiply accumulating algorithm of totalizer tree structure
CN103793199B (en) A kind of fast rsa password coprocessor supporting dual domain
CN103369326B (en) Be suitable to the transform coder of high-performance video coding standard HEVC
CN104679474A (en) Multiplying unit on finite field GF (2 227) and modular multiplication algorithm
Sousa et al. On the Design of RNS Reverse Converters for the Four-Moduli Set ${\bf\{2^{\mmb n}+ 1, 2^{\mmb n}-1, 2^{\mmb n}, 2^{{\mmb n}+ 1}+ 1\}} $
US8352532B1 (en) Circuit structure for multiplying numbers using look-up tables and adders
Kakde et al. Design of area and power aware reduced Complexity Wallace Tree multiplier
CN102253822A (en) A Modular (2n-3) Multiplier
CN103699358B (en) A kind of Fast Modular square operation circuit being applicable to big number
CN104579366B (en) High speed QC-LDPC encoder in WPAN based on three class pipeline
CN103699357B (en) A kind of Fast Modular Algorithm for Reduction circuit for modular multiplication and mould square
CN110688094A (en) Remainder operation circuit and method based on parallel cyclic compression
Reyhani-Masoleh et al. Low complexity sequential normal basis multipliers over GF (2/sup m/)
Laxman et al. FPGA implementation of different multiplier architectures
Lee Super Digit-Serial Systolic Multiplier over GF (2^ m)
CN102955682B (en) Modular(23n-2n)multiplier
Imaña Low latency $ GF (2^{m}) $ polynomial basis multiplier
CN106873941B (en) A kind of Fast Modular Multiplication and mould squaring circuit and its implementation
Gerards et al. Streaming reduction circuit
CN103077005B (en) A kind ofly to go here and there and the large number modular multiplier circuit of the prime field GF (p) combined
Solomko Optimization of the acyclic adders of binary codes
Jayarajkumar et al. Design and implementation of 16 bit systolic multiplier using modular shifting algorithm
Madhuri et al. Analysis of reconfigurable multipliers for integer and Galois field multiplication based on high speed adders
KR970003979B1 (en) Multiplexer

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20161123

Termination date: 20201205

CF01 Termination of patent right due to non-payment of annual fee