Search results

2024/1246 (PDF) Last updated: 2024-08-06

MSMAC: Accelerating Multi-Scalar Multiplication for Zero-Knowledge Proof

Pengcheng Qiu, Guiming Wu, Tingqiang Chu, Changzheng Wei, Runzhou Luo, Ying Yan, Wei Wang, Hui Zhang

Implementation

Multi-scalar multiplication (MSM) is the most computation-intensive part in proof generation of Zero-knowledge proof (ZKP). In this paper, we propose MSMAC, an FPGA accelerator for large-scale MSM. MSMAC adopts a specially designed Instruction Set Architecture (ISA) for MSM and optimizes pipelined Point Addition Unit (PAU) with hybrid Karatsuba multiplier. Moreover, a runtime system is proposed to split MSM tasks with the optimal sub-task size and orchestrate execution of Processing Elements...

2023/1962 (PDF) Last updated: 2024-06-19

A Survey of Polynomial Multiplications for Lattice-Based Cryptosystems

Vincent Hwang

Implementation

We survey various mathematical tools used in software works multiplying polynomials in \[ \frac{\mathbb{Z}_q[x]}{\left\langle {x^n - \alpha x - \beta} \right\rangle}. \] In particular, we survey implementation works targeting polynomial multiplications in lattice-based cryptosystems Dilithium, Kyber, NTRU, NTRU Prime, and Saber with instruction set architectures/extensions Armv7-M, Armv7E-M, Armv8-A, and AVX2. There are three emphases in this paper: (i) modular arithmetic, (ii)...

2022/1095 (PDF) Last updated: 2022-08-24

Toffoli gate count Optimized Space-Efficient Quantum Circuit for Binary Field Multiplication

KIM, SUNYEOP, KIM, INSUNG, Seonggyeom Kim, Seokhie Hong

Implementation

Shor's algorithm solves Elliptic Curve Discrete Logarithm Problem(ECDLP) in polynomial time. To optimize Shor's algorithm for binary elliptic curve, reducing the cost for binary field multiplication is essential because it is most cost-critical basic arithmetic. In this paper, we propose Toffoli gate count optimized space-efficient quantum circuits for binary field $(\mathbb{F}_{2^{n}})$ multiplication. To do so, we take advantage of Karatsuba-like formula and show that its application can...

2022/921 (PDF) Last updated: 2022-07-20

Low-Delay 4, 5 and 6-Term Karatsuba Formulae in $\mathbb{F}_2[x]$ Using Overlap-free Splitting

Haining Fan

Implementation

The overlap-free splitting method, i.e., even-odd splitting and its generalization, can reduce the XOR delay of a Karatsuba multiplier. We use this method to derive Karatsuba formulae with one less XOR delay in each recursive iteration. These formulae need more multiplication operations, and are trade-offs between space and time. We also show that ``finding common subexpressions'' performs better than ``the refined identity'' in 4-term formula: we reduce the number of XOR gates given by...

2022/501 (PDF) Last updated: 2022-04-28

Another Concrete Quantum Cryptanalysis of Binary Elliptic Curves

Dedy Septono Catur Putranto, Rini Wisnu Wardhani, Harashta Tatimma Larasati, Howon Kim

Implementation

This paper presents concrete quantum cryptanalysis for binary elliptic curves for a time-efficient implementation perspective (i.e., reducing the circuit depth), complementing the previous research by Banegas et al., that focuses on the space-efficiency perspective (i.e., reducing the circuit width). To achieve the depth optimization, we propose an improvement to the existing circuit implementation of the Karatsuba multiplier and FLT-based inversion, then construct and analyze the resource...

2022/439 (PDF) Last updated: 2022-10-22

Efficient Multiplication of Somewhat Small Integers using Number-Theoretic Transforms

Hanno Becker, Vincent Hwang, Matthias J. Kannwischer, Lorenz Panny, Bo-Yin Yang

Implementation

Conventional wisdom purports that FFT-based integer multiplication methods (such as the Schönhage-Strassen algorithm) begin to compete with Karatsuba and Toom-Cook only for integers of several tens of thousands of bits. In this work, we challenge this belief, leveraging recent advances in the implementation of number-theoretic transforms (NTT) stimulated by their use in post-quantum cryptography. We report on implementations of NTT-based integer arithmetic on two Arm Cortex-M CPUs on...

2022/371 (PDF) Last updated: 2022-03-22

A High-performance ECC Processor over Curve448 based on a Novel Variant of the Karatsuba Formula for Asymmetric Digit Multiplier

Asep Muhamad Awaludin, Jonguk Park, Rini Wisnu Wardhani, Howon Kim

Implementation

In this paper, we present a high-performance architecture for elliptic curve cryptography (ECC) over Curve448, which to the best of our knowledge, is the fastest implementation of ECC point multiplication over Curve448 to date. Firstly, we introduce a novel variant of the Karatsuba formula for asymmetric digit multiplier, suitable for typical DSP primitive with asymmetric input. It reduces the number of required DSPs compared to previous work and preserves the performance via full...

2022/300 (PDF) Last updated: 2022-03-07

Faster NTRU on ARM Cortex-M4 with TMVP-based multiplication

Irem Keskinkurt Paksoy, Murat Cenk

Applications

The Number Theoretic Transform (NTT), Toom-Cook, and Karatsuba are the most commonly used algorithms for implementing lattice-based ?nalists of the NIST PQC competition. In this paper, we propose Toeplitz matrix-vector product (TMVP) based algorithms for multiplication for all parameter sets of NTRU. We implement the pro- posed algorithms on ARM Cortex-M4. The results show that TMVP- based multiplication algorithms using the four-way TMVP formula are more e?cient for NTRU. Our algorithms...

2022/108 (PDF) Last updated: 2022-01-31

Public Key Compression and Fast Polynomial Multiplication for NTRU using the Corrected Hybridized NTT-Karatsuba Method

Rohon Kundu, Alessandro de Piccoli, Andrea Visconti

Public-key cryptography

NTRU is a lattice-based public-key cryptosystem that has been selected as one of the Round III finalists at the NIST Post-Quantum Cryptography Standardization. Compressing the key sizes to increase efficiency has been a long-standing open question for lattice-based cryptosystems. In this paper we provide a solution to three seemingly opposite demands for NTRU cryptosystem: compress the key size, increase the security level, optimize performance by implementing fast polynomial...

2022/050 (PDF) Last updated: 2022-01-18

High-Speed and Unified ECC Processor for Generic Weierstrass Curves over GF(p) on FPGA

Asep Muhamad Awaludin, Harashta Tatimma Larasati, Howon Kim

Implementation

In this paper, we present a high-speed, unified elliptic curve cryptography (ECC) processor for arbitrary Weierstrass curves over GF(p), which to the best of our knowledge, outperforms other similar works in terms of execution time. Our approach employs the combination of the schoolbook long and Karatsuba multiplication algorithm for the elliptic curve point multiplication (ECPM) to achieve better parallelization while retaining low complexity. In the hardware implementation, the substantial...

2021/1117 (PDF) Last updated: 2021-09-03

All the Polynomial Multiplication You Need on RISC-V

Hwajeong Seo, Hyeokdong Kwon, Siwoo Eum, Kyungbae Jang, Hyunjun Kim, Hyunji Kim, Minjoo Sim, Gyeongju Song, Wai-Kong Lee

Implementation

Polynomial multiplication is a core operation for public key cryptography, such as pre-quantum cryptography (e.g. elliptic curve cryptography) and post-quantum cryptography (e.g. code-based cryptography and multivariate-based cryptography). For this reason, the efficient and secure implementation of polynomial multiplication has been actively conducted for high availability and security level in application services. In this paper, we present all polynomial multiplication methods on modern...

2021/1015 (PDF) Last updated: 2021-08-06

Look-up the Rainbow: Efficient Table-based Parallel Implementation of Rainbow Signature on 64-bit ARMv8 Processors

Hyeokdong Kwon, Hyunjun Kim, Minjoo Sim, Wai-Kong Lee, Hwajeong Seo

Implementation

Rainbow signature is one of the finalist in National Institute of Standards and Technology (NIST) standardization. It is also the only signature candidate that is designed based on multivariate quadratic hard problem. Rainbow signature is known to have very small signature size compared to other post-quantum candidates. In this paper, we propose an efficient implementation technique to improve performance of Rainbow signature schemes. A parallel polynomial-multiplication on a 64-bit ARMv8...

2021/998 (PDF) Last updated: 2021-10-15

Polynomial multiplication on embedded vector architectures

Hanno Becker, Jose Maria Bermudo Mera, Angshuman Karmakar, Joseph Yiu, Ingrid Verbauwhede

Public-key cryptography

High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms is an active area of research, and this article contributes to this line of work: Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those...

2021/446 (PDF) Last updated: 2021-04-08

Towards practical GGM-based PRF from (Module-)Learning-with-Rounding

Chitchanok Chuengsatiansup, Damien Stehle

Foundations

We investigate the efficiency of a (module-)LWR-based PRF built using the GGM design. Our construction enjoys the security proof of the GGM construction and the (module-)LWR hardness assumption which is believed to be post-quantum secure. We propose GGM-based PRFs from PRGs with larger ratio of output to input. This reduces the number of PRG invocations which improves the PRF performance and reduces the security loss in the GGM security reduction. Our construction bridges the gap between...

2021/185 (PDF) Last updated: 2021-06-18

No Silver Bullet: Optimized Montgomery Multiplication on Various 64-bit ARM Platforms

Hwajeong Seo, Pakize Sanal, Wai-Kong Lee, Reza Azarderakhsh

Implementation

In this paper, we firstly presented optimized implementations of Montgomery multiplication on 64-bit ARM processors by taking advantages of Karatsuba algorithm and efficient multiplication instruction sets for ARM64 architectures. The implementation of Montgomery multiplication can improve the performance of (pre-quantum and post-quantum) public key cryptography (e.g. CSIDH, ECC, and RSA) implementations on ARM64 architectures, directly. Last but not least, the performance of Karatsuba...

2020/1336 (PDF) Last updated: 2020-10-26

Faster Characteristic Three Polynomial Multiplication and Its Application to NTRU Prime Decapsulation

Esra Yeniaras, Murat Cenk

Implementation

Efficient computation of polynomial multiplication for characteristic three fields, $\mathbb{F}_{3^{n}}$ for $n\geq1$, is an important attribute for many cryptographic protocols. In this paper, we propose three new polynomial multiplication algorithms over $\mathbb{F}_{3}[x]$ and show that they are more efficient than the current state-of-the-art algorithms. We first examine through the well-known multiplication algorithms in $\mathbb{F}_{3}[x]$ including Karatsuba-2-way and 3-way split...

2020/1302 (PDF) Last updated: 2022-11-09

TMVP-based Multiplication for Polynomial Quotient Rings and Application to Saber on ARM Cortex-M4

İrem Keskinkurt Paksoy, Murat Cenk

Implementation

Lattice-based NIST PQC finalists need efficient multiplication in $\mathbb{Z}_q[x]/(f(x))$. Multiplication in this ring can be performed very efficiently via number theoretic transform (NTT) as done in CRYSTALS-Kyber if the parameters of the scheme allow it. If NTT is not supported, other multiplication algorithms must be employed. For example, if the modulus $q$ of the scheme is a power of two, as in Saber and NTRU, then NTT can not be used directly. In this case, Karatsuba and Toom-Cook...

2020/1109 (PDF) Last updated: 2021-09-05

Karatsuba-based square-root Vélu’s formulas applied to two isogeny-based protocols

Gora Adj, Jesús-Javier Chi-Domínguez, Francisco Rodríguez-Henríquez

Public-key cryptography

At a combined computational expense of about $6{\ell}$ field operations, Vélu's formulas are used to construct and evaluate degree-$\ell$ isogenies in the vast majority of isogeny-based cryptographic schemes. By adapting to Vélu's formulas a baby-step giant-step approach, Bernstein, De Feo, Leroux, and Smith presented a procedure that can computes isogeny operations at a reduced cost of just $\tilde{O}(\sqrt{\ell})$ field operations. In this paper, we present a concrete computational...

2020/1037 (PDF) Last updated: 2021-04-23

A High-performance Hardware Implementation of Saber Based on Karatsuba Algorithm

Yihong Zhu, Min Zhu, Bohan Yang, Wenping Zhu, Chenchen Deng, Chen Chen, Shaojun Wei, Leibo Liu

Implementation

Although large numbers of hardware and software implementations have been proposed to accelerate lattice-based cryptography, Saber, a module-LWR-based algorithm, which has advanced to second round of the NIST standardization process, has not been adequately supported by the current solutions. Based on these motivations, a high-performance crypto-processor is proposed based on an algorithm-hardware co-design in this paper. First, a hierarchical Karatsuba calculating framework, a...

2020/797 (PDF) Last updated: 2020-06-27

Fast, Small, and Area-Time Efficient Architectures for Key-Exchange on Curve25519

Mojtaba Bisheh Niasar, Rami El Khatib, Reza Azarderakhsh, Mehran Mozaffari-Kermani

Implementation

Abstract--- This paper demonstrates fast and compact implementations of Elliptic Curve Cryptography (ECC) for efficient key agreement over Curve25519. Curve25519 has been recently adopted as a key exchange method for several applications such as connected small devices as well as cloud, and included in the National Institute of Standards and Technology (NIST) recommendations for public key cryptography. This paper presents three different performance level designs including lightweight,...

2020/438 (PDF) Last updated: 2020-07-28

Fast hybrid Karatsuba multiplier for Type II pentanomials

Yin Li, Yu Zhang, Wei He

Implementation

We continue the study of Mastrovito form of Karatsuba multipliers under the shifted polynomial basis (SPB), recently introduced by Li et al. (IEEE TC (2017)). A Mastrovito-Karatsuba (MK) multiplier utilizes the Karatsuba algorithm (KA) to optimize polynomial multiplication and the Mastrovito approach to combine it with the modular reduction. The authors developed a MK multiplier for all trinomials, which obtain a better space and time trade-off compared with previous non-recursive Karatsuba...

2020/268 (PDF) Last updated: 2020-03-04

Time-memory trade-off in Toom-Cook multiplication: an application to module-lattice based cryptography

Jose Maria Bermudo Mera, Angshuman Karmakar, Ingrid Verbauwhede

Public-key cryptography

Since the introduction of the ring-learning with errors problem, the number theoretic transform (NTT) based polynomial multiplication algorithm has been studied extensively. Due to its faster quasilinear time complexity, it has been the preferred choice of cryptographers to realize ring-learning with errors cryptographic schemes. Compared to NTT, Toom-Cook or Karatsuba based polynomial multiplication algorithms, though being known for a long time, still have a fledgling presence in the...

2019/1451 (PDF) Last updated: 2020-01-09

Tight bound on NewHope failure probability

Thomas Plantard, Arnaud Sipasseuth, Willy Susilo, Vincent Zucca

Public-key cryptography

NewHope Key Encapsulation Mechanism (KEM) has been presented at USENIX 2016 by Alchim et al. and is one of the remaining lattice-based candidates to the post-quantum standardization initiated by the NIST. However, despite the relative simplicity of the protocol, the bound on the decapsulation failure probability resulting from the original analysis is not tight. In this work we refine this analysis to get a tight upper-bound on this probability which happens to be much lower than what was...

2019/1259 (PDF) Last updated: 2020-08-12

Security and Efficiency Trade-offs for Elliptic Curve Diffie-Hellman at the 128-bit and 224-bit Security Levels

Kaushik Nath, Palash Sarkar

Implementation

Within the Transport Layer Security (TLS) Protocol Version 1.3, RFC 7748 specifies elliptic curves targeted at the 128-bit and the 224-bit security levels. For the 128-bit security level, the Montgomery curve Curve25519 and its birationally equivalent twisted Edwards curve Ed25519 are specified; for the 224-bit security level, the Montgomery curve Curve448, the Edwards curve Edwards448 (which is isogenous to Curve448) and another Edwards curve which is birationally equivalent to Curve448...

2019/1170 (PDF) Last updated: 2019-10-10

Space-efficient quantum multiplication of polynomials for binary finite fields with sub-quadratic Toffoli gate count

Iggy van Hoof

Public-key cryptography

Multiplication is an essential step in a lot of calculations. In this paper we look at multiplication of 2 binary polynomials of degree at most $n-1$, modulo an irreducible polynomial of degree $n$ with $2n$ input and $n$ output qubits, without ancillary qubits, assuming no errors. With straightforward schoolbook methods this would result in a quadratic number of Toffoli gates and a linear number of CNOT gates. This paper introduces a new algorithm that uses the same space, but by utilizing...

2019/1079 (PDF) Last updated: 2019-10-05

When NTT Meets Karatsuba: Preprocess-then-NTT Technique Revisited

Yiming Zhu, Zhen Liu, Yanbin Pan

Public-key cryptography

The Number Theoretic Transform (NTT) technique is widely used in the implementation of the cryptographic schemes based on the Ring Learning With Errors problem(RLWE), since it provides efficient algorithm for multiplication of polynomials over the finite field. However, to employ NTT, the finite field is required to have some special root of unity, such as $n$-th root, which makes the module $q$ in RLWE big since we need $q\equiv 1\mod 2n$ to ensure such root exits. At Inscrypt 2018, Zhou...

2019/111 (PDF) Last updated: 2019-12-03

On the Complexity of non-recursive $n$-term Karatsuba Multiplier for Trinomials

Yin Li, Shantanu Sharma, Yu Zhang, Xingpo Ma, Chuanda Qi

Implementation

The $n$-term Karatsuba algorithm (KA) is an extension of 2-term KA, which can obtain even fewer multiplications than the original one. In this contribution, we proposed a novel hybrid $GF(2^m)$ Karatsuba multiplier using $n$-term KA for irreducible trinomials of arbitrary degree, i.e., $x^m+x^k+1$ where $m\geq2k$. We multiply two $m$-term polynomials using $n$-term KA, by decomposing $m$ into $n\ell+r$, such that $r<n, \ell$. Combined with shifted polynomial basis (SPB), a new approach other...

2018/1018 (PDF) Last updated: 2019-04-09

Faster multiplication in $\mathbb{Z}_{2^m}[x]$ on Cortex-M4 to speed up NIST PQC candidates

Matthias J. Kannwischer, Joost Rijneveld, Peter Schwabe

Implementation

In this paper we optimize multiplication of polynomials in $\mathbb{Z}_{2^m}[x]$ on the ARM Cortex-M4 microprocessor. We use these optimized multiplication routines to speed up the NIST post-quantum candidates RLizard, NTRU-HRSS, NTRUEncrypt, Saber, and Kindi. For most of those schemes the only previous implementation that executes on the Cortex-M4 is the reference implementation submitted to NIST; for some of those schemes our optimized software is more than factor of 20 faster. One of the...

2018/682 (PDF) Last updated: 2020-03-05

Saber on ARM CCA-secure module lattice-based key encapsulation on ARM

Angshuman Karmakar, Jose Maria Bermudo Mera, Sujoy Sinha Roy, Ingrid Verbauwhede

Public-key cryptography

The CCA-secure lattice-based post-quantum key encapsulation scheme Saber is a candidate in the NIST's post-quantum cryptography standardization process. In this paper, we study the implementation aspects of Saber in resource-constrained microcontrollers from the ARM Cortex-M series which are very popular for realizing IoT applications. In this work, we carefully optimize various parts of Saber for speed and memory. We exploit digital signal processing instructions and efficient memory access...

2018/605 (PDF) Last updated: 2018-06-18

N-term Karatsuba Algorithm and its Application to Multiplier designs for Special Trinomials

Yin Li, Yu Zhang, Xiaoli Guo, Chuanda Qi

Foundations

In this paper, we propose a new type of non-recursive Mastrovito multiplier for $GF(2^m)$ using a $n$-term Karatsuba algorithm (KA), where $GF(2^m)$ is defined by an irreducible trinomial, $x^m+x^k+1, m=nk$. We show that such a type of trinomial combined with the $n$-term KA can fully exploit the spatial correlation of entries in related Mastrovito product matrices and lead to a low complexity architecture. The optimal parameter $n$ is further studied. As the main contribution of this...

2018/554 (PDF) Last updated: 2018-11-10

A new class of irreducible pentanomials for polynomial based multipliers in binary fields

Gustavo Banegas, Ricardo Custodio, Daniel Panario

We introduce a new class of irreducible pentanomials over ${\mathbb F}_{2^m}$ of the form $f(x) = x^{2b+c} + x^{b+c} + x^b + x^c + 1$. Let $m=2b+c$ and use $f$ to define the finite field extension of degree $m$. We give the exact number of operations required for computing the reduction modulo $f$. We also provide a multiplier based on Karatsuba algorithm in $\mathbb{F}_2[x]$ combined with our reduction process. We give the total cost of the multiplier and found that the bit-parallel...

2018/425 (PDF) Last updated: 2018-10-16

Implementing RLWE-based Schemes Using an RSA Co-Processor

Martin R. Albrecht, Christian Hanser, Andrea Hoeller, Thomas Pöppelmann, Fernando Virdia, Andreas Wallner

We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers and hardware security modules (HSM). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card...

2018/091 (PDF) Last updated: 2018-01-28

Polynomial multiplication over binary finite fields: new upper bounds

Alessandro De Piccoli, Andrea Visconti, Ottavio Giulio Rizzo

Foundations

When implementing a cryptographic algorithm, efficient operations have high relevance both in hardware and software. Since a number of operations can be performed via polynomial multiplication, the arithmetic of polynomials over finite fields plays a key role in real-life implementations. One of the most interesting paper that addressed the problem has been published in 2009. In [5], Bernstein suggests to split polynomials into parts and presents a new recursive multiplication technique...

2017/523 (PDF) Last updated: 2018-02-16

Short, Invertible Elements in Partially Splitting Cyclotomic Rings and Applications to Lattice-Based Zero-Knowledge Proofs

Vadim Lyubashevsky, Gregor Seiler

When constructing practical zero-knowledge proofs based on the hardness of the Ring-LWE or the Ring-SIS problems over polynomial rings $Z_p[X]/(X^n+1)$, it is often necessary that the challenges come from a set $\mathcal{C}$ that satisfies three properties: the set should be large (around $2^{256}$), the elements in it should have small norms, and all the non-zero elements in the difference set $\mathcal{C}-\mathcal{C}$ should be invertible. The first two properties are straightforward to...

2017/108 (PDF) Last updated: 2017-02-14

Photonic Side Channel Attacks Against RSA

Elad Carmon, Jean-Pierre Seifert, Avishai Wool

Public-key cryptography

This paper describes the first attack utilizing the photonic side channel against a public-key crypto-system. We evaluated three common implementations of RSA modular exponentiation, all using the Karatsuba multiplication method. We discovered that the key length had marginal impact on resilience to the attack: attacking a 2048-bit key required only 9\% more decryption attempts than a 1024-bit key. We found that the most dominant parameter impacting the attacker's effort is the minimal block...

2016/694 (PDF) Last updated: 2017-02-26

Mastrovito Form of Non-recursive Karatsuba Multiplier for All Trinomials

Yin Li, Xingpo Ma, Yu Zhang, Chuanda Qi

We present a new type of bit-parallel non-recursive Karatsuba multiplier over $GF(2^m)$ generated by an arbitrary irreducible trinomial. This design effectively exploits Mastrovito approach and shifted polynomial basis (SPB) to reduce the time complexity and Karatsuba algorithm to reduce its space complexity. We show that this type of multiplier is only one $T_X$ slower than the fastest bit-parallel multiplier for all trinomials, where $T_X$ is the delay of one 2-input XOR gate. Meanwhile,...

2016/461 (PDF) Last updated: 2017-08-17

NTRU Prime: reducing attack surface at low cost

Daniel J. Bernstein, Chitchanok Chuengsatiansup, Tanja Lange, Christine van Vredendaal

Several ideal-lattice-based cryptosystems have been broken by recent attacks that exploit special structures of the rings used in those cryptosystems. The same structures are also used in the leading proposals for post-quantum lattice-based cryptography, including the classic NTRU cryptosystem and typical Ring-LWE-based cryptosystems. This paper (1) proposes NTRU Prime, which tweaks NTRU to use rings without these structures; (2) proposes Streamlined NTRU Prime, a public-key cryptosystem...

2015/1247 (PDF) Last updated: 2020-10-30

Missing a trick: Karatsuba variations

Mike Scott

Implementation

There are a variety of ways of applying the Karatsuba idea to multi-digit multiplication. These apply particularly well in the context where digits do not use the full word-length of the computer, so that partial products can be safely accumulated without fear of overflow. Here we re-visit the ``arbitrary degree'' version of Karatsuba and show that the cost of this little-known variant has been over-estimated in the past. We also attempt to definitively answer the question as to the...

2015/688 (PDF) Last updated: 2015-07-24

Binary Field Multiplication on ARMv8

Hwajeong Seo, Zhe Liu, Yasuyuki Nogami, Jongseok Choi, Howon Kim

Implementation

In this paper, we show efficient implementations of binary field multiplication over ARMv8. We exploit an advanced 64-bit polynomial multiplication (\texttt{PMULL}) supported by ARMv8 and conduct multiple levels of asymptotically faster Karatsuba multiplication. Finally, our method conducts binary field multiplication within 57 clock cycles for B-251. Our proposed method on ARMv8 improves the performance by a factor of $5.5$ times than previous techniques on ARMv7.

2015/465 (PDF) Last updated: 2015-05-20

Efficient Arithmetic on ARM-NEON and Its Application for High-Speed RSA Implementation

Hwajeong Seo, Zhe Liu, Johann Groschadl, Howon Kim

Implementation

Advanced modern processors support Single Instruction Multiple Data (SIMD) instructions (e.g. Intel-AVX, ARM-NEON) and a massive body of research on vector-parallel implementations of modular arithmetic, which are crucial components for modern public-key cryptography ranging from RSA, ElGamal, DSA and ECC, have been conducted. In this paper, we introduce a novel Double Operand Scanning (DOS) method to speed-up multi-precision squaring with non-redundant representations on SIMD...

2015/423 (PDF) Last updated: 2015-05-12

On the Implementation of Unified Arithmetic on Binary Huff Curves

Santosh Ghosh, Amit Kumar, Amitabh Das, Ingrid Verbauwhede

Unified formula for computing elliptic curve point addition and doubling are considered to be resistant against simple power-analysis attack. A new elliptic curve formula known as unified binary Huff curve in this regard has appeared into the literature in 2011. This paper is devoted to analyzing the applicability of this elliptic curve in practice. Our paper has two contributions.We provide an efficient implementation of the unified Huff formula in projective coordinates on FPGA....

2015/116 (PDF) Last updated: 2015-02-24

Efficient Hardware Design for Computing Pairings Using Few FPGA In-built DSPs

Riadh Brinci, Walid Khmiri, Mefteh Mbarek, Abdellatif Ben Rabâa, Ammar Bouallègue

Implementation

This paper is devoted to the design of a 258-bit multiplier for computing pairings over Barreto-Naehrig (BN) curves at 128-bit security level. The proposed design is optimized for Xilinx ﬁeld programmable gate array (FPGA). Each 258-bit integer is represented as a polynomial with ﬁve, 65 bit signed integer, coefficients. Exploiting this splitting we designed a pipelined 65-bit multiplier based on new Karatsuba- Ofman variant using non-standard splitting to ﬁt to the...

2014/852 (PDF) Last updated: 2015-03-23

Faster ECC over $\mathbb{F}_{2^{521}-1}$

Robert Granger, Michael Scott

Implementation

In this paper we present a new multiplication algorithm for residues modulo the Mersenne prime $2^{521} - 1$. Using this approach, on an Intel Haswell Core i7-4770, constant-time variable-base scalar multiplication on NIST's (and SECG's) curve P-521 requires 1,073,000 cycles, while on the recently proposed Edwards curve E-521 it requires just 943,000 cycles. As a comparison, on the same architecture openSSL's ECDH speed test for curve P-521 requires 1,319,000 cycles. Furthermore, our code...

2014/817 (PDF) Last updated: 2014-10-14

Optimized Karatsuba Squaring on 8-bit AVR Processors

Hwajeong Seo, Zhe Liu, Jongseok Choi, Howon Kim

Implementation

Multi-precision squaring is a crucial operation for implementation of Elliptic Curve Cryptography. Particularly, when it comes to embedded processors, the operation should be designed carefully to execute expensive ECC operation on resource constrained devices. In order to bridge the gap between high overheads and limited computation capabilities, we present optimized Karatsuba squaring method for embedded processors. Traditional squaring computation can be divided into two sub-squaring and...

2014/592 (PDF) Last updated: 2014-07-31

Multiprecision multiplication on AVR revisited

Michael Hutter, Peter Schwabe

Implementation

This paper presents new speed records for multiprecision multiplication on the AVR ATmega family of 8-bit microcontrollers. For example, our software takes only 1976 cycles for the multiplication of two 160-bit integers; this is more than 15% faster than previous work. For 256-bit inputs, our software is not only the first to break through the 6000-cycle barrier; with only 4797 cycles it also breaks through the 5000-cycle barrier and is more than 21% faster than previous work.We achieve...

2014/526 (PDF) Last updated: 2014-07-07

Curve41417: Karatsuba revisited

Daniel J. Bernstein, Chitchanok Chuengsatiansup, Tanja Lange

This paper introduces constant-time ARM Cortex-A8 ECDH software that (1) is faster than the fastest ECDH option in the latest version of OpenSSL but (2) achieves a security level above 2^200 using a prime above 2^400. For comparison, this OpenSSL ECDH option is not constant-time and has a security level of only 2^80. The new speeds are achieved in a quite different way from typical prime-field ECC software: they rely on a synergy between Karatsuba's method and choices of radix smaller than...

2014/268 (PDF) Last updated: 2015-07-03

New bit-parallel Montgomery multiplier for trinomials using squaring operation

Yin Li, Yiyang Chen

In this paper, a new bit-parallel Montgomery multiplier for $GF(2^m)$ is presented, where the field is generated with an irreducible trinomial. We first present a slightly generalized version of a newly proposed divide and conquer approach. Then, by combining this approach and a carefully chosen Montgomery factor, the Montgomery multiplication can be transformed into a composition of small polynomial multiplications and Montgomery squarings, which are simpler and more efficient. Explicit...

2014/142 Last updated: 2014-02-27

FPGA-Based High Performance AES-GCM Using Efficient Karatsuba Ofman Algorithm

Karim M. Abdellatif, R. Chotin-Avot, H. Mehrez

Implementation

AES-GCM has been utilized in various security applications. It consists of two components: an Advanced Encryption Standard (AES) engine and a Galois Hash (GHASH) core. The performance of the system is determined by the GHASH architecture because of the inherent computation feedback. This paper introduces a modification for the pipelined Karatsuba Ofman Algorithm (KOA)-based GHASH. In particular, the computation feedback is removed by analyzing the complexity of the computation process. The...

2013/427 (PDF) Last updated: 2013-07-03

Toeplitz matrix-vector product based GF(2^n) shifted polynomial basis multipliers for all irreducible pentanomials

Jiangtao Han, Haining Fan

Foundations

Besides Karatsuba algorithm, optimal Toeplitz matrix-vector product (TMVP) formulae is another approach to design GF(2^n) subquadratic multipliers. However, when GF(2^n) elements are represented using a shifted polynomial basis, this approach is currently appliable only to GF(2^n)s generated by all irreducible trinomials and a special type of irreducible pentanomials, not all general irreducible pentanomials. The reason is that no transformation matrix, which transforms the Mastrovito matrix...

2013/005 (PDF) Last updated: 2013-01-11

Efficient Multiplier for pairings over Barreto-Naehrig Curves on Virtex-6 FPGA

Riadh Brinci, Walid Khmiriy, Mefteh Mbarekz, Abdellatif Ben Rabaˆa, Ammar Bouallegue, Faouzi Chekir

This paper is devoted to the design of a 258- bit multiplier for computing pairings over Barreto-Naehrig (BN) curves at 128-bit security level. The proposed design is optimized for Xilinx field programmable gate array (FPGA). Each 258-bit integer is represented as a polynomial with five,65 bit signed integer, coefficients . Exploiting this splitting we designed a pipelined 65-bit multiplier based on new Karatsuba-Ofman variant using non-standard splitting to fit to the Xilinx embedded...

2012/414 (PDF) Last updated: 2012-08-01

Low complexity bit-parallel $GF(2^m)$ multiplier for all-one polynomials

Yin Li, Gong-liang Chen, Xiao-ning Xie

Implementation

This paper presents a new bit-parallel multiplier for the finite field $GF(2^m)$ generated with an irreducible all-one polynomial. Redundant representation is used to reduce the time delay of the proposed multiplier, while a three-term Karatsuba-like formula is combined with this representation to decrease the space complexity. As a result, the proposed multiplier requires about 10 percent fewer AND/XOR gates than the most efficient bit-parallel multipliers using an all-one polynomial, while...

2011/589 (PDF) Last updated: 2012-03-22

Impact of Intel's New Instruction Sets on Software Implementation of $GF(2)[x]$ Multiplication

Chen Su, Haining Fan

Implementation

PCLMULQDQ, a new instruction that supports $GF(2)[x]$ multiplication, was introduced by Intel in 2010. This instruction brings dramatic change to software implementation of multiplication in $GF(2^m)$ fields. In this paper, we present improved Karatsuba formulae for multiplying two small binary polynomials, compare different strategies for PCLMULQDQ-based multiplication in the five $GF(2^m)$ fields recommended by NIST and conclude the best design approaches to software implementation of...

2011/161 (PDF) Last updated: 2011-03-31

Efficient Hardware Implementations of BRW Polynomials and Tweakable Enciphering Schemes

Debrup Chakraborty, Cuauhtemoc Mancillas-Lopez, Francisco Rodriguez-Henriquez, Palash Sarkar

Implementation

A new class of polynomials was introduced by Bernstein (Bernstein 2007) which were later named by Sarkar as Bernstein-Rabin-Winograd (BRW) polynomials (Sarkar 2009). For the purpose of authentication, BRW polynomials offer considerable computational advantage over usual polynomials: $(m-1)$ multiplications for usual polynomial hashing versus $\lfloor\frac{m}{2}\rfloor$ multiplications and $\lceil\log_2 m\rceil$ squarings for BRW hashing, where $m$ is the number of message blocks to be...

2010/365 (PDF) Last updated: 2014-02-12

TASTY: Tool for Automating Secure Two-partY computations

Wilko Henecka, Stefan Kögl, Ahmad-Reza Sadeghi, Thomas Schneider, Immo Wehrenberg

Cryptographic protocols

Secure two-party computation allows two untrusting parties to jointly compute an arbitrary function on their respective private inputs while revealing no information beyond the outcome. Existing cryptographic compilers can automatically generate secure computation protocols from high-level specifications, but are often limited in their use and efficiency of generated protocols as they are based on either garbled circuits or (additively) homomorphic encryption only. In this paper we present...

2009/636 (PDF) Last updated: 2010-06-28

Obtaining More Karatsuba-Like Formulae over The Binary Field

Haining Fan, Ming Gu, Jiaguang Sun, Kwok-Yan Lam

Foundations

The aim of this paper is to find more Karatsuba-like formulae for a fixed set of moduli polynomials in GF(2)[x]. To this end, a theoretical framework is established. We first generalize the division algorithm, and then present a generalized definition of the remainder of integer division. Finally, a previously generalized Chinese remainder theorem is used to achieve our initial goal. As a by-product of the generalized remainder of integer division, we rediscover Montgomery’s N-residue and...

2009/398 (PDF) Last updated: 2009-08-19

Fast Architectures for the $\eta_T$ Pairing over Small-Characteristic Supersingular Elliptic Curves

Jean-Luc Beuchat, Jérémie Detrey, Nicolas Estibals, Eiji Okamoto, Francisco Rodríguez-Henríquez

Implementation

This paper is devoted to the design of fast parallel accelerators for the cryptographic $\eta_T$ pairing on supersingular elliptic curves over finite fields of characteristics two and three. We propose here a novel hardware implementation of Miller's algorithm based on a parallel pipelined Karatsuba multiplier. After a short description of the strategies we considered to design our multiplier, we point out the intrinsic parallelism of Miller's loop and outline the architecture of...

2009/122 (PDF) Last updated: 2009-08-04

Hardware Accelerator for the Tate Pairing in Characteristic Three Based on Karatsuba-Ofman Multipliers

Jean-Luc Beuchat, Jérémie Detrey, Nicolas Estibals, Eiji Okamoto, Francisco Rodríguez-Henríquez

Implementation

This paper is devoted to the design of fast parallel accelerators for the cryptographic Tate pairing in characteristic three over supersingular elliptic curves. We propose here a novel hardware implementation of Miller's loop based on a pipelined Karatsuba-Ofman multiplier. Thanks to a careful selection of algorithms for computing the tower field arithmetic associated to the Tate pairing, we manage to keep the pipeline busy. We also describe the strategies we considered to design our...

2008/127 (PDF) Last updated: 2008-03-25

A Pipelined Karatsuba-Ofman Multiplier over GF($3^{97}$) Amenable for Pairing Computation

Nidia Cortez-Duarte, Francisco Rodríguez-Henríquez, Jean-Luc Beuchat, Eiji Okamoto

Implementation

We present a subquadratic ternary field multiplier based on the combination of several variants of the Karatsuba-Ofman scheme recently published. Since one of the most relevant applications for this kind of multipliers is pairing computation, where several field multiplications need to be computed at once, we decided to design a $k$-stage pipeline structure for $k=1,\ldots,4$, where each stage is composed of a 49-trit polynomial multiplier unit. That architecture can compute an average of...

2007/463 (PDF) Last updated: 2008-02-07

Efficient GF(3m) Multiplication Algorithm for eta T Pairing

Gen Takahashi, Fumitaka Hoshino, Tetsutaro Kobayashi

Public-key cryptography

The computation speed of pairing based cryptosystems is slow compared with the other public key cryptosystems even though several efficient computation algorithms have been proposed. Thus more efficient computation of the Tate pairing is an important research goal. GF(3m) multiplication in GF(36m) in the pairing algorithm is the greatest consumer of time. Past research concentrated on reducing the number of GF(3m) multiplications, for instance the Karatsuba method. In this article, we...

2007/437 (PDF) Last updated: 2007-11-24

Reconfigurable Hardware Implementations of Tweakable Enciphering Schemes

Cuauhtemoc Mancillas-Lopez, Debrup Chakraborty, Francisco Rodriguez-Henriquez

Secret-key cryptography

Tweakable enciphering schemes are length preserving block cipher modes of operation that provide a strong pseudo-random permutation. It has been suggested that these schemes can be used as the main building blocks for achieving in-place disk encryption. In the past few years there has been an intense research activity towards constructing secure and efficient tweakable enciphering schemes. But, actual experimental performance data of these newly proposed schemes are yet to be reported....

2007/393 (PDF) Last updated: 2010-06-28

Overlap-free Karatsuba-Ofman Polynomial Multiplication Algorithms

Haining Fan, Jiaguang Sun, Ming Gu, Kwok-Yan Lam

We describe how a simple way to split input operands allows for fast VLSI implementations of subquadratic $GF(2)[x]$ Karatsuba-Ofman multipliers. The theoretical XOR gate delay of the resulting multipliers is reduced significantly. For example, it is reduced by about 33\% and 25\% for $n=2^{t}$ and $n=3^{t}$ $(t>1)$, respectively. To the best of our knowledge, this parameter has never been improved since the original Karatsuba-Ofman algorithm was first used to design $GF(2^n)$ multipliers in 1990.

2006/224 (PDF) (PS) Last updated: 2006-07-03

Generalizations of the Karatsuba Algorithm for Efficient Implementations

André Weimerskirch, Christof Paar

Implementation

In this work we generalize the classical Karatsuba Algorithm (KA) for polynomial multiplication to (i) polynomials of arbitrary degree and (ii) recursive use. We determine exact complexity expressions for the KA and focus on how to use it with the least number of operations. We develop a rule for the optimum order of steps if the KA is used recursively. We show how the usage of dummy coefficients may improve performance. Finally we provide detailed information on how to use the KA with least...

Search Help

62 results sorted by ID