Ed25519 20110926
Ed25519 20110926
1 Introduction
This paper introduces software for public-key signatures with several attractive
features:
– Fast single-signature verification. The software takes only 273364 cycles
to verify a signature on Intel’s widely deployed Nehalem/Westmere lines of
CPUs. (This performance measurement is for short messages; for very long
This work was supported by the National Science Foundation under grant 1018836,
by the European Commission under Contract ICT-2007-216676 ECRYPT II, and
by the National Science Council, National Taiwan University and Intel Corporation
under Grant NSC99-2911-I-002-001 and 99-2218-E-001-007, and the Academia Sinica
Career Award. Part of this work was carried out when Peter Schwabe was employed
by Academia Sinica, Taiwan. Part of this work was carried out when Niels Duif was
employed by Compumatica secure networks BV, the Netherlands. Permanent ID of
this document: a1a62a2f76d23f65d622484ddd09caf8. Date: 2011.09.26.
2 Bernstein, Duif, Lange, Schwabe, Yang
Longa and Gebotys in [51] claimed 281000 cycles on a Core 2 Duo E6750
(C2 65nm) for ECDH on a curve similar to ecfp256e, and 229000 cycles for
ECDH on a curve similar to gls1271. The software in [51] is not included in the
eBATS benchmarks and apparently is not publicly available, so we are unable
to benchmark it on a Westmere. More recently Käsper in [46] reported 457813
cycles for side-channel-protected ECDH on the NIST P-224 curve on a Core 2
Duo E8400 (C2 45nm); this software has been integrated into OpenSSL but not
yet into eBATS.
To aid comparisons we also implemented ECDH, specifically curve25519,
with the same side-channel defenses as our signature software (no secret array
indices, and no secret branch conditions). We submitted our ECDH software
to eBATS, which reports that the software uses 226872 cycles on hydra2 for
variable-base-point single-scalar multiplication. This is a new speed record for
public ECDH software, a new speed record for side-channel-protected ECDH
(out of all the papers mentioned above, the only ones that report side-channel
protection are [12] and [46]), and a new speed record for ECDH without endo-
morphisms.
We do not claim that curve25519 will maintain its current position on top of
eBATS: we would expect ECDH with endomorphisms (especially without side-
channel protection) to be somewhat faster than ECDH without endomorphisms
on many platforms. This expectation seems to be supported by the very recent
paper [42] by Hu, Longa, and Xu: [42, Table 2] claims 194000 cycles for a curve
with endomorphisms on an Intel Core 2 Duo E6750, and the accompanying web
site [42, reference 18] claims 182000 cycles on an Intel Core i5 540M (Westmere).
The same web site also claims 215000 Westmere cycles for a curve without
endomorphisms. However, like the software in [51], the software in [42] does not
appear to be publicly available. The resulting lack of verifiability raises questions
regarding accuracy. We are particularly skeptical of the Westmere speed claims,
given the Turbo Boost issues discussed above. After we wrote this paragraph,
the same web site was updated to claim 250000 cycles for the same software on
another Westmere CPU.
Given our 226872-cycle ECDH speed, given the ECDH-to-verification slow-
downs reported in [21] and [34], and given the extra costs that we incur for
decompressing keys and signatures, one would expect a verification speed close
to 400000 cycles. We do better than this for several reasons, the most important
reason being our use of batching. This requires careful design of the signature
system, as discussed later in this paper: ECDSA, like DSA and most other sig-
nature systems, is incompatible with fast batch verification.
Comparison to other signature systems. The eBATS benchmarks cover
42 different signature systems, including various sizes of RSA, DSA, ECDSA,
hyperelliptic-curve signatures, and multivariate-quadratic signatures. This paper
beats almost all of the signature times and verification times (and key-generation
times, which are an issue for some applications) by more than a factor of 2. The
only exceptions are as follows:
High-speed high-security signatures 5
The condition that d is not a square implies that d 6∈ {0, −1}, so this set E forms
a group with neutral element 0 = (0, 1) under the twisted Edwards addition law
x1 y2 + x2 y1 y1 y2 + x1 x2
(x1 , y1 ) + (x2 , y2 ) = ,
1 + dx1 x2 y1 y2 1 − dx1 x2 y1 y2
introduced by Bernstein, Birkner, Joye, Lange, and Peters in [13]. Completeness
of the addition law — the fact that the denominators 1±dx1 x2 y1 y2 are nonzero —
follows as explained in [13, Section 6]: −1 is a square in Fq (since q is congruent
to 1 modulo 4), so this addition law on E is Fq -isomorphic to the Edwards
addition law on the Edwards curve x2 + y 2 = 1 − dx2 y 2 , which is complete by
[14, Theorem 3.3] since −d is not a square in Fq . The latter follows from d being
a non-square and −1 being a square in Fq . The extra constraint mentioned above
is that `B = 0, where nB means the nth multiple of B in this group.
We use the encoding of Fq to define some field elements as being negative:
specifically, x is negative if the (b−1)-bit encoding of x is lexicographically larger
than the (b − 1)-bit encoding of −x. If q is an odd prime and the encoding is the
little-endian representation of {0, 1, . . . , q − 1} then the negative elements of Fq
are {1, 3, 5, . . . , q − 2}.
An element (x, y) ∈ E is encoded as a b-bit string (x, y), namely the (b − 1)-
bit encoding of y followed by a sign bit; the sign bit is 1 iff x is negative.
This encoding
p immediately determines y, and it determines x via the equation
x = ± (y − 1)/(dy 2 + 1).
2
EdDSA keys and signatures. An EdDSA secret key is a b-bit string k. The
hash H(k) = (h0 , h1 , . . . , h2b−1 ) determines an integer
X
a = 2b−2 + 2i hi ∈ 2b−2 , 2b−2 + 8, . . . , 2b−1 − 8 ,
3≤i≤b−3
EdDSA samples r from the interval [0, 22b − 1], ensuring almost uniformity of
the distribution modulo `. The guideline [2, Section 4.1.1, Algorithm 2] specifies
that the interval should be of size at least [0, 2b+61 − 1], i.e., 64 bits more than
`; for Ed25519 there are 259 extra bits.
Comparison to previous ElGamal variants. The original ElGamal system
[33, Section III] predated elliptic-curve cryptography; it instead used the mul-
tiplicative group F∗q . ElGamal took a large non-prime `, specifically ` = q − 1,
and focused on the case of prime q. ElGamal’s signatures were pairs (R, S) of
integers between 0 and q − 2 inclusive satisfying B H(M ) = AR RS in F∗q . See [33,
equation (3)]; see also [33, Attack 6] for the introduction of H. The signer, given
M , generates a random r coprime to ` and computes the signature (R, S), where
R = B r and S = r−1 (H(M ) − Ra) mod `.
Schnorr in [72] pointed out that one could safely work in an order-` subgroup
of F∗q with a prime ` much smaller than q, saving most of the space for S. Schnorr
also introduced several other improvements to ElGamal’s system, as discussed
below.
ElGamal’s verification equation involves R as an element of the group F∗q and
as a scalar, the exponent for A. For more general groups one needs a function
x mapping group elements to scalars. ECDSA works this way: it replaces F∗q
with an order-` subgroup of an elliptic-curve group over Fq and defines x(R) as
the x-coordinate of R. ECDSA also replaces A with −A, changing the signer’s
subtraction into an addition and obtaining the verification equation H(M )B +
x(R)A = SR. ECDSA replaces this three-scalar equation with the equivalent
two-scalar equation S −1 H(M )B + S −1 x(R)A = R at the expense of requiring
S to be invertible modulo `; note that both the signer and the verifier compute
inverses here.
Schnorr used a cryptographic hash function for x. This has minimal expense
and eliminates any concerns regarding the mathematical structure of simpler
functions x. Schnorr also compressed the group element R to the scalar x(R): a
Schnorr signature is (x(R), S) rather than (R, S). Given a compressed signature
(x(R), S), the verifier recomputes R as S −1 H(M )B +S −1 x(R)A and checks that
x(R) matches; at this point the verifier knows a valid uncompressed signature
(R, S), so the compression cannot reduce security.
Schnorr also merged the hashing of R with the hashing of M . One way to
understand this merging is to replace S with x(R)S, and to impose the extra
constraint x(R) 6= 0, obtaining the verification equation x(R)−1 H(M )B + A =
SR. There is no need for the multiplicative structure of x(R)−1 H(M ) here: one
can instead use the verification equation H(R, M )B + A = SR, with the signer
obtaining S as r−1 (H(R, M ) + a) mod `. Schnorr actually used the equation
SB = R + H(R, M )A, eliminating all inversions both for the signer and for the
verifier; this is an obvious advantage, saving time and reducing code size.
The presence of R as input to the hash function provides collision resilience:
the attacker cannot break Schnorr’s system by merely finding hash collisions.
Neven, Smart, and Warinschi in [60] proposed taking advantage of collision
resilience by choosing H to output only b/2 bits, reducing the size of compressed
10 Bernstein, Duif, Lange, Schwabe, Yang
signatures by 25%; but the same proposal had actually appeared twenty years
earlier in Schnorr’s original paper [72, Section 2].
Practical use of Schnorr’s system was hampered by a patent (which expired in
2008), but the system became well known to theoreticians, because the hashing
of R allowed various security proofs. Some proofs use the “forking lemma” to
show that any generic-hash attack against Schnorr’s system (i.e., any attack that
works for arbitrary functions H) can be converted into a DLP algorithm with
a polynomially bounded, although often quite severe, loss of efficiency. There
are also theorems with a different loss of efficiency for generic-group attacks
(i.e., attacks that work for arbitrary groups) under mild assumptions on H, and
theorems with no loss of efficiency for generic-group generic-hash attacks. See,
for example, [67], [74], [11], and [60]. We do not mean to exaggerate the real-
world relevance of “provable security”, but we find it obvious that Schnorr’s
system is a conservative, well-studied signature system.
Our verification equation is the same as Schnorr’s verification equation with
double-size hashing instead of half-size hashing, with A inserted as an extra hash
input, and without Schnorr’s compression of R. These modifications obviously do
not compromise security. The use of double-size hashing helps alleviate concerns
regarding hash-function security; the use of A is an inexpensive way to alleviate
concerns that several public keys could be attacked simultaneously; and the
avoidance of compression allows an important verification speedup, as discussed
in Section 5. We also reuse the double-size hash to alleviate concerns regarding
nonce randomness, as discussed above.
enforce these bounds except for comparisons. Multiplication accepts inputs with
each limb having up to 54 bits and produces results of which each limb can be
only slightly larger than 251 .
Multiplication and squaring. Schoolbook multiplication of two field elements
x and y, each represented in 5 unsigned integers, takes 25 mul instructions. The
results are again produced in two 64-bit integer registers, but as both inputs
12 Bernstein, Duif, Lange, Schwabe, Yang
have only up to 54 bits, the value in the upper result register has only up
to 44 bits. Adding two multiplication results now takes only one adc and one
add instruction. Furthermore reduction can be carried out simultaneously to
multiplication. For example, we do not compute a coefficient r5 . Whenever the
result of a mul instruction belongs to r5 , for example in the multiplication of
x2 · y3 , we multiply one of the inputs by 19 and add the result to r0 . Similarly
we do not compute r6 , r7 , r8 and r9 but directly add into r1 , . . . , r4 . Multiplying
one input by 19 yields a result with less than 64 bits so we can use the faster
imul instruction for these multiplications. The 5 result coefficients require 10
64-bit registers; the AMD64 architecture has 15 such registers, so we can keep
the result coefficients inside registers throughout the computation.
After the multiplication we need to reduce (carry) the 5 coefficients to obtain
a result with coefficients that are at most slightly larger than 251 . Denote the two
registers holding coefficient r0 as r00 and r01 with r0 = 264 r01 + r00 . Similarly
denote the two registers holding coefficient r1 as r10 and r11 . We first shift r01
left by 13, while shifting in the most significant bits of r00 (shld instruction)
and then compute the logical and of r00 with 251 − 1. We do the same with r10
and r11 and add r01 into r10 after the logical and with 251 − 1. We proceed this
way for coefficients r2 , . . . , r4 ; register r41 is multiplied by 19 before adding it
to r00 . Now all 5 coefficients fit into 64-bit registers but are still too large to be
used as input to another multiplication. We therefore carry from r0 to r1 , from
r1 to r2 , from r2 to r3 , from r3 to r4 , and finally from r4 to r0 . Each of these
carries is done as one copy, one right shift by 51, one logical and with 251 − 1,
and one addition.
Squaring needs only 15 mul instructions. Some inputs are multiplied by 2; this
is combined with multiplication by 19 where possible. The coefficient reduction
after squaring is the same as for multiplication.
Multiplication and squaring are implemented as separate functions, but calls
to these functions are used only for inversion (see below). Edwards-curve arith-
metic uses inlined functions for point addition and doubling.
Addition, subtraction, and inversion. The results of additions do not have
to be reduced if they are used as input to a multiplication. Long sequences of
additions that let coefficients grow larger than 54 bits would be a problem but we
do not have such sequences of additions. Field addition is therefore nothing but 5
integer additions without carries (add instruction). Subtraction is slightly more
expensive because we use unsigned coefficients. Therefore we first add a multiple
of q and then perform subtraction. This costs 5 add and 5 sub instructions.
Inversion is implemented as exponentiation with exponent q − 2. It uses the
same sequence of 255 squarings and 11 multiplications as [12].
4 Signing messages
Signature generation has three steps: (1) computing r = H(hb , . . . , h2b−1 , M );
(2) computing R = rB; (3) computing S = (r + H(R, A, M )a) mod `.
High-speed high-security signatures 13
Our primary concern is with short messages M , obviously the top concern for
a server trying to keep up with a given volume of data; longer messages take
more cycles per signature but far fewer cycles per byte. The computations of
H take negligible time for short messages. The reduction modulo ` also takes
negligible time with standard branchless techniques. For the rest of this section
we focus on the main signing bottleneck, namely computing rB given r.
High-level strategy. We begin by computing the 253-bit integer r mod `. We
then write r mod ` as r0 + 16r1 + · · · + 1663 r63 with
For each i we look up 16i |ri |B in a precomputed table, and P then conditionally
negate 16i |ri |B to obtain 16i ri B. Finally we compute rB as i 16i ri B.
There is nothing new in our computation at this level. Computing rB as a
sum of precomputed pieces is a special case of a standard scalar-multiplication
algorithm published by Pippenger in [64] (subsequently reinvented in [19] and
[50]); allowing negative coefficients is a standard tweak. The devil lies in the
lower-level
P i details — choosing the optimal radix 16, and computing 16i ri B and
i 16 ri B as efficiently as possible. These details are discussed below.
We also spend about 21000 cycles to invert Z at the end of the computation.
The complete signing procedure for a short message takes 87548 cycles.
5 Verifying signatures
Fast signature verification seems considerably more difficult than fast signa-
ture generation, for two reasons. First, the verifier has to recover the elliptic-
curve points A and R from the compressed points A and R. Second, checking
SB = R + H(R, A, M )A seems to require not merely a fixed-base scalar multi-
plication SB but also a much more expensive variable-base scalar multiplication
H(R, A, M )A. This section explains several techniques that we use to address
these problems.
Fast decompression. Recall that the encoding R of a point R = (x, y) contains
a straightforward encoding of y but contains p only a sign bit for x. One must
therefore recover x via the equation x = ± (y 2 − 1)/(dy 2 + 1); note that dy 2 +
1 6= 0 since −d is not a square. The division and square root here seem to involve
two exponentiations, about twice as expensive as the usual Weierstrass-curve
decompression.
Of course, we could use Montgomery’s trick to merge the two divisions in-
volved in decompressing two points, but two square roots and a division are still
more expensive than two Weierstrass-curve decompressions. We could also skip
the compression and decompression for applications willing to use 64-byte keys
and 96-byte signatures; but we think that 32-byte keys and 64-byte signatures
are considerably more attractive.
To save time we look more closely at the standard computation of square roots
in Fq . The prime q = 2255 − 19 is congruent to 5 modulo 8, so any square α ∈ Fq
satisfies α2 = β 4 where β = α(q+3)/8 , i.e., ±α = β 2 . The standard computation
is a√single exponentiation to compute β, followed by a quick multiplication of β
by −1 if β 2 = −α.
In the decompression context we are given α as a fraction u/v, where u = y 2 −1
and v = dy 2 + 1. Instead of computing α we merge the division with the square-
root computation:
discussed in Section 4 does not seem to offer any advantage. This computation
fits in very little space.
We have also considered the verification method suggested by Antipa, Brown,
Gallant, Lambert, Struik, and Vanstone in [7], but our very efficient elliptic-
curve arithmetic makes the overheads in this method — extra decompression
and a Euclidean computation — much more troublesome. In the batch context
discussed below, the only extra overhead of the method of [7] would be the
Euclidean computation, but the benefit would also be much smaller.
Fast batch verification. For any system bottlenecked by signature verification,
the problem is not to verify one signature at a time, but to verify many signatures
as quickly as possible.
Naccache, M’Raı̈hi, Vaudenay, and Raphaeli in [58, Section 2.2] proposed
verifying a batch of linear signature equations by verifying a random linear com-
bination of the equations. This proposal is not directly applicable to ElGamal,
DSA, Schnorr, ECDSA, etc., because all of those systems require computing lin-
ear functions (to compute R) rather than merely verifying linear functions; but
if R is transmitted instead of H(· · · ), as suggested in [58], then this problem
disappears.
Unfortunately, the verification algorithm in [58] was quite slow: [58, Table
1] reported “29n” multiplications to verify n signatures from the same signer
at a highly questionable 220 security level. If the same technique were adapted
to ECDSA and increased to a 2128 security level then it would require nearly
200 elliptic-curve additions for each signature from the same signer — somewhat
faster than verifying each signature separately, but not much.
The followup paper [10] by Bellare, Garay, and Rabin proposed a more com-
plicated verification technique using, e.g., 3200 multiplications to verify 100 ex-
ponentiations, or 6480 multiplications to verify 100 DSA signatures, in both
cases at a substandard 260 security level. See [10, Appendix A.1]. The number
of multiplications per signature begins to drop as the batch size grows towards
1000 — see [10, Figure 3] — but such large batches do not fit into cache on typical
CPUs.
The unimpressive theoretical performance of these batch-verification tech-
niques can be traced directly to the naive exponentiation algorithms used in
[58] and [10]. We do much better by using random linear combinations, as in
[58], together with state-of-the-art scalar-multiplication techniques.
Specifically, we start from a batch of (Mi , Ai , Ri , Si ) where (Ri , Si ) is an
alleged signature of Mi under key Ai . We choose independent uniform random
128-bit integers zi , compute Hi = H(Ri , Ai , Mi ), and verify the equation
X X X
− zi Si mod ` B + zi Ri + (zi Hi mod `)Ai = 0
i i i
cause it fits into less storage; see below for details. Note that zi is not secret, so
side-channel protection is not required.
The number of scalars here is 2n + 1. Half of the scalars are 253-bit and
half are 128-bit. If public keys appear repeatedly, the situation considered in
[58] and [10], then we could save some time by merging the 253-bit scalars;
this merging also explains why we do not use the similar signature equation
SB = A + H(R, A, M )R, which would allow only merging 128-bit scalars. Our
software focuses on general-purpose verification with arbitrary keys.
If verification succeeds then we are confident that 8Si B = 8Ri + 8Hi Ai for
each i, i.e., that each signature is valid. The logic is simple: the differences
Pi = 8Ri + 8Hi Ai − 8Si B arePelements of a cyclic group of prime order `, and
have been verified to satisfy i zi Pi = 0; but this equation cannot hold with
probability more than 2−128 unless all Pi = 0. For example, if P4 is nonzero then
the choices of z1 , z2 , z3 , z5 , z6 , . . . determine exactly one choice of z4 satisfying
−128
P
i zi Pi = 0, and z4 has chance at most 2 of matching that choice.
If verification fails then there must be at least one invalid signature. We then
fall back to verifying each signature separately. There are several techniques to
identify a small number of invalid signatures in a batch, but all known techniques
become slower than separate verification as the number of invalid signatures
increases; separate verification provides the best defense against denial-of-service
attacks.
Fast multi-scalar multiplication. The Bos–Coster method mentioned above
is as follows: to compute n1 P1 + n2 P2 + · · · , where n1 ≥ n2 ≥ · · · , we recursively
compute (n1 − n2 )P1 + n2 (P1 + P2 ) + · · · . For n1 much larger than n2 , say
2k+1 n2 > n1 ≥ 2k n2 , we could gain speed by instead recursively computing
(n1 − 2k n2 )P1 + n2 (2k P1 + P2 ) + · · · , but we have found this to occur so rarely
that checking for it is not worthwhile.
We keep the scalars ni in a heap so that identifying the two largest scalars is
easy. The usual method to replace the root of a heap is top-down, starting at the
root and swapping down for a variable number of steps. We instead use Floyd’s
1964 bottom-up algorithm discussed in [48, Exercise 5.2.3–18] (often miscredited
to [25] and [78]): start at the root, swap down to the bottom, and then swap
up for a variable number of steps. This has the advantage of somewhat reducing
the number of comparisons, and the not-so-well-known advantage of drastically
reducing the number of branches, especially for balanced heaps.
Results. The complete verification procedure takes under 134000 cycles per sig-
nature for batch size 64. Our batch-verification software is included in, although
not yet benchmarked by, the public eBATS benchmarking framework.
Doubling the batch size to 128 no longer fits into L1 cache but still improves
performance on our target CPU, taking under 125000 cycles per signature.
Larger batches take under 114000 cycles per signature while still fitting into
L2 cache. Our software spends about 44000 cycles on decompression, so verifi-
cation of uncompressed signatures (32 extra bytes) using uncompressed public
keys (another 32 extra bytes) would take only about 81000 cycles for batch size
18 Bernstein, Duif, Lange, Schwabe, Yang
128, even faster than signing. However, in this paper we have emphasized the
performance that we obtain without using so much space.
References
[1] — (no editor), 17th annual symposium on foundations of computer science, IEEE
Computer Society, Long Beach, California, 1976. MR 56:1766. See [64].
[2] — (no editor), Technical guideline TR-03111, elliptic curve cryptography
(2009). URL: https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/
Publikationen/TechnischeRichtlinien/TR03111/BSI-TR-03111_pdf.pdf?__
blob=publicationFile. Citations in this document: §2.
[3] — (no editor), SPEED: software performance enhancement for encryption and
decryption, 2007. URL: http://www.hyperelliptic.org/SPEED. See [35].
[4] — (no editor), Proceedings of the 6th ACM symposium on information, computer
and communications security, Hong Kong, March 22–24, 2011, Association for
Computing Machinery, 2011. ISBN 978-1-4503-0564-8. See [70].
[5] Michel Abdalla, Paulo S. L. M. Barreto (editors), Progress in cryptology —
LATINCRYPT 2010, first international conference on cryptology and information
security in Latin America, Puebla, Mexico, August 8–11, 2010, proceedings, Lec-
ture Notes in Computer Science, 6212, Springer, 2010. ISBN 978-3-642-14711-1.
See [59].
[6] Masayuki Abe (editor), Advances in cryptology — ASIACRYPT 2010, 16th inter-
national conference on the theory and application of cryptology and information
security, Singapore, December 5–9, 2010, proceedings, Lecture Notes in Computer
Science, 6477, Springer, 2010. ISBN 978-3-642-17372-1. See [38].
[7] Adrian Antipa, Daniel R. L. Brown, Robert P. Gallant, Robert J. Lam-
bert, René Struik, Scott A. Vanstone, Accelerated verification of ECDSA sig-
natures, in SAC 2005 [69] (2006), 307–318. MR 2007d:94044. URL: http://
www.cacr.math.uwaterloo.ca/techreports/2005/tech_reports2005.html. Ci-
tations in this document: §5, §5.
[8] Vijay Atluri, Trent Jaeger (program chairs), Proceedings of the 10th ACM con-
ference on computer and communications security, ACM Press, 2003. ISBN 1-
58113-738-9. See [47].
[9] George Barwood, Digital signatures using elliptic curves, message 32f519ad.
19609226@news.dial.pipex.com posted to sci.crypt (1997). URL: http://
groups.google.com/group/sci.crypt/msg/b28aba37180dd6c6. Citations in this
document: §2.
[10] Mihir Bellare, Juan A. Garay, Tal Rabin, Fast batch verification for modular ex-
ponentiation and digital signatures, in Eurocrypt ’98 [62] (1998), 236–250. URL:
http://cseweb.ucsd.edu/~mihir/papers/batch.html. Citations in this docu-
ment: §5, §5, §5, §5, §5.
[11] Mihir Bellare, Gregory Neven, Multi-signatures in the plain public-key model
and a general forking lemma, in CCS 2006 [45] (2006), 390–399. URL: http://
cseweb.ucsd.edu/~mihir/papers/multisignatures.html. Citations in this doc-
ument: §2.
[12] Daniel J. Bernstein, Curve25519: new Diffie-Hellman speed records, in PKC 2006
[81] (2006), 207–228. URL: http://cr.yp.to/papers.html#curve25519. Cita-
tions in this document: §1, §1, §2, §2, §2, §2, §3.
High-speed high-security signatures 19
[13] Daniel J. Bernstein, Peter Birkner, Marc Joye, Tanja Lange, Christiane Peters,
Twisted Edwards curves, in Africacrypt 2008 [77] (2008), 389–405. URL: http://
eprint.iacr.org/2008/013. Citations in this document: §2, §2, §4.
[14] Daniel J. Bernstein, Tanja Lange, Faster addition and doubling on elliptic curves,
in Asiacrypt 2007 [49] (2007), 29–50. URL: http://eprint.iacr.org/2007/286.
Citations in this document: §2, §2.
[15] Daniel J. Bernstein, Tanja Lange (editors), eBACS: ECRYPT Benchmarking of
Cryptographic Systems, accessed 19 September 2011 (2011). URL: http://bench.
cr.yp.to. Citations in this document: §1.
[16] G. R. Blakley, David Chaum (editors), Advances in cryptology, proceedings of
CRYPTO ’84, Santa Barbara, California, USA, August 19–22, 1984, proceedings,
Lecture Notes in Computer Science, 196, Springer, Berlin, 1985. ISBN 3-540-
15658-5. MR 86j:94003. See [32].
[17] Joppe W. Bos, High-performance modular multiplication on the Cell processor, in
WAIFI 2010 [39] (2010), 7–24. Citations in this document: §3.
[18] Gilles Brassard (editor), Advances in cryptology — CRYPTO ’89, 9th annual in-
ternational cryptology conference, Santa Barbara, California, USA, August 20–
24, 1989, proceedings, Lecture Notes in Computer Science, 435, Springer, Berlin,
1990. ISBN 3-540-97317-6. MR 91b:94002. See [72].
[19] Ernest F. Brickell, Daniel M. Gordon, Kevin S. McCurley, David B. Wilson,
Fast exponentiation with precomputation (extended abstract), in Eurocrypt ’92
[71] (1993), 200–207; see also newer version [20]. URL: http://cr.yp.to/bib/
entries.html#1993/brickell-exp. Citations in this document: §4.
[20] Ernest F. Brickell, Daniel M. Gordon, Kevin S. McCurley, David B. Wilson, Fast
exponentiation with precomputation: algorithms and lower bounds (1995); see also
older version [19]. URL: http://research.microsoft.com/~dbwilson/bgmw/.
[21] Michael Brown, Darrel Hankerson, Julio López, Alfred Menezes, Software imple-
mentation of the NIST elliptic curves over prime fields (2000); see also newer
version [22]. URL: http://www.cacr.math.uwaterloo.ca/techreports/2000/
corr2000-56.ps. Citations in this document: §1, §1.
[22] Michael Brown, Darrel Hankerson, Julio López, Alfred Menezes, Software imple-
mentation of the NIST elliptic curves over prime fields, in CT-RSA 2001 [56]
(2001), 250–265; see also older version [21]. MR 1907102.
[23] Billy Bob Brumley, Risto M. Hakala, Cache-timing template attacks, in Asiacrypt
2009 [53] (2009), 667–684. Citations in this document: §1.
[24] “Bushing”, Hector Martin “marcan” Cantero, Segher Boessenkool, Sven Peter,
PS3 epic fail (2010). URL: http://events.ccc.de/congress/2010/Fahrplan/
attachments/1780_27c3_console_hacking_2010.pdf. Citations in this docu-
ment: §2.
[25] Svante Carlsson, Average-case results on heapsort, BIT 27 (1987), 2–17. Citations
in this document: §5.
[26] Neil Costigan, Peter Schwabe, Fast elliptic-curve cryptography on the Cell
Broadband Engine, in Africacrypt 2009 [68] (2009), 368–385. URL: http://
cryptojedi.org/users/peter/#celldh. Citations in this document: §3.
[27] Peter de Rooij, Efficient exponentiation using precomputation and vector addition
chains, in Eurocrypt ’94 [28] (1995), 389–399. MR 1479665. Citations in this
document: §5.
[28] Alfredo De Santis (editor), Advances in cryptology — EUROCRYPT ’94, work-
shop on the theory and application of cryptographic techniques, Perugia, Italy,
May 9–12, 1994, proceedings, Lecture Notes in Computer Science, 950, Springer,
Berlin, 1995. ISBN 3-540-60176-7. MR 98h:94001. See [27], [58].
20 Bernstein, Duif, Lange, Schwabe, Yang
[29] Yvo Desmedt (editor), Advances in cryptology — CRYPTO ’94, 14th annual in-
ternational cryptology conference, Santa Barbara, California, USA, August 21–
25, 1994, proceedings, Lecture Notes in Computer Science, 839, Springer, Berlin,
1994. ISBN 3-540-58333-5. See [50].
[30] Vivien Dubois, Pierre-Alain Fouque, Adi Shamir, Jacques Stern, Practical crypt-
analysis of SFLASH, in Crypto 2007 [54] (2007), 1–12. URL: http://eprint.
iacr.org/2007/141. Citations in this document: §1.
[31] Niels Duif, Smart card implementation of a digital signature scheme for
Twisted Edwards curves, M.A. thesis, Technische Universiteit Eindhoven, 2011.
URL: http://www.nielsduif.nl/2011_05_20_report_final.pdf. Citations in
this document: §4.
[32] Taher ElGamal, A public key cryptosystem and a signature scheme based on dis-
crete logarithms, in Crypto ’84 [16] (1985), 10–18; see also newer version [33].
MR 87b:94037.
[33] Taher ElGamal, A public key cryptosystem and a signature scheme based on dis-
crete logarithms, IEEE Transactions on Information Theory 31 (1985), 469–472;
see also older version [32]. ISSN 0018-9448. MR 86j:94045. Citations in this doc-
ument: §2, §2, §2, §2, §2.
[34] Steven Galbraith, Xibin Lin, Michael Scott, Endomorphisms for faster elliptic
curve cryptography on a large class of curves, in Eurocrypt 2009 [43] (2009),
518–535. URL: http://eprint.iacr.org/2008/194. Citations in this document:
§1, §1, §1.
[35] Pierrick Gaudry, Emmanuel Thomé, The mpFq library and implementing curve-
based key exchanges, in SPEED [3] (2007), 49–64. URL: http://www.loria.fr/
~gaudry/papers.en.html. Citations in this document: §1.
[36] Danilo Gligoroski, Rune Steinsmo Odegøard, Rune Erlend Jensen, Ludovic Per-
ret, Jean-Charles Faugère, Svein Johan Knapskog, Smile Markovski, The digital
signature scheme MQQ-SIG (2010). URL: http://eprint.iacr.org/2010/527.
pdf. Citations in this document: §1.
[37] Eu-Jin Goh, Stanislaw Jarecki, Jonathan Katz, Nan Wang, Efficient signature
schemes with tight reductions to the Diffie-Hellman problems, Journal of Cryp-
tology 20 (2007), 493–514. URL: http://www.cs.umd.edu/~jkatz/papers.html.
See [47].
[38] Robert Granger, On the static Diffie–Hellman problem on elliptic curves over
extension fields, in Asiacrypt 2010 [6] (2010), 283–302. URL: http://eprint.
iacr.org/2010/177. Citations in this document: §1.
[39] M. Anwar Hasan, Tor Helleseth (editors), Arithmetic of finite fields, third interna-
tional workshop, WAIFI 2010, Istanbul, Turkey, June 27–30, 2010, proceedings,
Lecture Notes in Computer Science, 6087, Springer, 2010. ISBN 978-3-642-13796-
9. See [17].
[40] Hüseyin Hisil, Elliptic curves, group law, and efficient computation, Ph.D. thesis,
Queensland University of Technology, 2010. URL: http://eprints.qut.edu.au/
33233. Citations in this document: §1.
[41] Hüseyin Hisil, Kenneth Koon-Ho Wong, Gary Carter, Ed Dawson, Twisted Ed-
wards curves revisited, in Asiacrypt 2008 [63] (2008), 326–343. URL: http://
eprint.iacr.org/2008/522. Citations in this document: §4, §4, §4.
[42] Zhi Hu, Patrick Longa, Maozhi Xu, Implementing 4-dimensional GLV method on
GLS elliptic curves with j-invariant 0 (2011). URL: http://eprint.iacr.org/
2011/315. Citations in this document: §1, §1, §1, §1.
High-speed high-security signatures 21
[43] Antoine Joux (editor), Advances in cryptology — EUROCRYPT 2009, 28th an-
nual international conference on the theory and applications of cryptographic tech-
niques, Cologne, Germany, April 26–30, 2009, proceedings, Lecture Notes in Com-
puter Science, 5479, Springer, 2009. ISBN 978-3-642-01000-2. See [34].
[44] Antoine Joux, Vanessa Vitse, Elliptic curve discrete logarithm problem over
small degree extension fields. Application to the static Diffie–Hellman problem
on E(Fq5 ) (2010). URL: http://eprint.iacr.org/2010/157. Citations in this
document: §1.
[45] Ari Juels, Rebecca N. Wright, Sabrina De Capitani di Vimercati (editors), Pro-
ceedings of the 13th ACM conference on computer and communications security,
CCS 2006, Alexandria, VA, USA, October 30–November 3, 2006, Association for
Computing Machinery, 2006. See [11].
[46] Emilia Käsper, Fast elliptic curve cryptography in OpenSSL, in 2nd Workshop on
Real-Life Cryptographic Protocols and Standardization (RLCPS 2011), to appear
(2011). Citations in this document: §1, §1.
[47] Jonathan Katz, Nan Wang, Efficiency improvements for signature schemes with
tight security reductions, in CCS 2003 [8] (2003), 155–164; portions incorporated
into [37]. URL: http://www.cs.umd.edu/~jkatz/papers.html. Citations in this
document: §2.
[48] Donald E. Knuth, The art of computer programming, volume 3: sorting and
searching, 2nd edition, Addison-Wesley, Reading, 1998. ISBN 0-201-89685-0. Ci-
tations in this document: §5.
[49] Kaoru Kurosawa (editor), Advances in cryptology — ASIACRYPT 2007, 13th in-
ternational conference on the theory and application of cryptology and information
security, Kuching, Malaysia, December 2–6, 2007, proceedings, Lecture Notes in
Computer Science, 4833, Springer, 2007. ISBN 978-3-540-76899-9. See [14].
[50] Chae Hoon Lim, Pil Joong Lee, More flexible exponentiation with precomputation,
in [29] (1994), 95–107. Citations in this document: §4.
[51] Patrick Longa, Catherine H. Gebotys, Efficient techniques for high-speed elliptic
curve cryptography, in CHES 2010 [52] (2010), 80–94. Citations in this document:
§1, §1, §1.
[52] Stefan Mangard, François-Xavier Standaert (editors), Cryptographic hardware
and embedded systems, CHES 2010, 12th international workshop, Santa Barbara,
CA, USA, August 17–20, 2010, proceedings, Lecture Notes in Computer Science,
6225, Springer, 2010. ISBN 978-3-642-15030-2. See [51].
[53] Mitsuru Matsui (editor), Advances in cryptology — ASIACRYPT 2009, 15th in-
ternational conference on the theory and application of cryptology and informa-
tion security, Tokyo, Japan, December 6–10, 2009, proceedings, Lecture Notes in
Computer Science, 5912, Springer, 2009. ISBN 978-3-642-10365-0. See [23].
[54] Alfred Menezes (editor), Advances in cryptology — CRYPTO 2007, 27th annual
international cryptology conference, Santa Barbara, CA, USA, August 19–23,
2007, proceedings, Lecture Notes in Computer Science, 4622, Springer, 2007. ISBN
978-3-540-74142-8. See [30].
[55] David M’Raı̈hi, David Naccache, David Pointcheval, Serge Vaudenay, Computa-
tional alternatives to random number generators, in SAC ’98 [76] (1999), 72–80.
URL: http://www.di.ens.fr/~pointche/Documents/Papers/1998_sac.pdf. Ci-
tations in this document: §2.
[56] David Naccache (editor), Topics in cryptology — CT-RSA 2001: the cryptogra-
phers’ track at RSA Conference 2001, San Francisco, CA, USA, April 2001,
proceedings, Lecture Notes in Computer Science, 2020, Springer, 2001. ISBN 3-
540-41898-9. MR 2003a:94039. See [22].
22 Bernstein, Duif, Lange, Schwabe, Yang