BEAT: Asynchronous BFT Made Practical: Sisi Duan Michael K. Reiter Haibin Zhang

Session 10B: Protocols CCS’18, October 15-19, 2018, Toronto, ON, Canada
BEAT: Asynchronous BFT Made Practical

Sisi Duan Michael K. Reiter Haibin Zhang
UMBC UNC Chapel Hill UMBC
sduan@umbc.edu reiter@cs.unc.edu hbzhang@umbc.edu
ABSTRACT Iroha [56], R3 Corda [30], Tendermint [88], and many more. The
We present BEAT, a set of practical Byzantine fault-tolerant (BFT) Hyperledger umbrella [5], for instance, has become a global collab-
protocols for completely asynchronous environments. BEAT is flex- orative open-source project under the Linux Foundation, now with
ible, versatile, and extensible, consisting of five asynchronous BFT more than 250 members.
protocols that are designed to meet different goals (e.g., different Asynchronous BFT protocols [14, 21, 23, 70] are arguably the most
performance metrics, different application scenarios). Due to mod- appropriate solutions for building high-assurance and intrusion-
ularity in its design, features of these protocols can be mixed to tolerant permissioned blockchains in wide-area (WAN) environ-
achieve even more meaningful trade-offs between functionality ments, as these asynchronous protocols are inherently more ro-
and performance for various applications. Through a 92-instance, bust against timing and denial-of-service (DoS) attacks that can
five-continent deployment of BEAT on Amazon EC2, we show that be mounted over an unprotected network such as the Internet.
BEAT is efficient: roughly, all our BEAT instances significantly Asynchronous BFT ensures liveness of the protocol without de-
outperform, in terms of both latency and throughput, HoneyBad- pending on any timing assumptions, which is prudent when the
gerBFT, the most efficient asynchronous BFT known. network is controlled by an adversary. In contrast, partially syn-
chronous BFT (e.g., PBFT [27]) guarantees liveness only when the
CCS CONCEPTS network becomes synchronous (i.e., satisfies timing assumptions).
For instance, it was shown in [70] that PBFT would achieve zero
• Security and privacy → Systems security; Distributed sys-
throughput against an adversarial asynchronous scheduler.
tems security; • Computer systems organization → Reliabil-
ity; Availability; Challenges and opportunities in adopting asynchronous per-
missioned blockchains. While a recent asynchronous BFT proto-
KEYWORDS col, HoneyBadgerBFT [70], significantly improves prior asynchro-
nous BFT protocols [14, 21, 23, 70], there are still significant pain
Byzantine fault tolerance, BFT, asynchronous BFT, blockchain, ro-
points and challenges that prevent it from being used in practice.
bustness, threshold cryptography
Meanwhile, there are also new opportunities for asynchronous BFT
ACM Reference Format: with the rise of blockchains.
Sisi Duan, Michael K. Reiter, and Haibin Zhang. 2018. BEAT: Asynchronous Performance (latency, throughput) issues. Compared to partially syn-
BFT Made Practical. In 2018 ACM SIGSAC Conference on Computer and
chronous BFT protocols (e.g., PBFT [27]), HoneyBadgerBFT has
Communications Security (CCS ’18), October 15–19, 2018, Toronto, ON, Canada.
significantly higher latency and lower throughput, in part due to
ACM, New York, NY, USA, 14 pages. https://doi.org/10.1145/3243734.3243812
its use of expensive threshold cryptography (specifically, threshold
encryption [10] and threshold signatures [17]). This is particularly
1 INTRODUCTION visible in cases where each replica has limited computation power.
State machine replication (SMR) [64, 81] is a fundamental soft- These limitations are further exacerbated by various engineering
ware approach to enabling highly available services in practical issues. For example, HoneyBadgerBFT was evaluated at only 80-
distributed systems and cloud computing platforms (e.g., Google’s bit security and it will be even slower if implemented with now-
Chubby [20] and Spanner [29], Apache ZooKeeper [53]). Its Byzan- standard 128-bit security. Moreover, due to its use of an erasure-
tine failure counterpart, Byzantine fault-tolerant SMR (BFT), has coding library zfec [93], HoneyBadgerBFT can only support Reed-
recently regained its prominence, as BFT has been regarded as the Soloman codes (for which better alternatives exist) and at most 28
model for building permissioned blockchains where the distributed servers.
ledgers know each other’s identities but may not trust one another. No one-size-fits-all BFT. In partially synchronous environments, one-
As an emerging technology transforming business models, there has size-fits-all BFT protocols have been hard to achieve (as has been ar-
been a large number of industry implementations of permissioned gued in various works, e.g., [8, 31, 59]). Indeed, a variety of partially
blockchains, including Hyperledger Fabric [7, 87], Hyperledger synchronous BFT protocols [1, 8, 16, 27, 28, 31, 33, 59] have been
Permission to make digital or hard copies of all or part of this work for personal or
proposed to meet different needs. For instance, chain-based BFT
classroom use is granted without fee provided that copies are not made or distributed protocols, such as Aliph-Chain [8], BChain [33], and Shuttle [90],
for profit or commercial advantage and that copies bear this notice and the full citation favor throughput over latency. Q/U [1] achieves fault-scalability
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, that tolerates increasing numbers of faults without significantly
to post on servers or to redistribute to lists, requires prior specific permission and/or a decreasing performance. Zyzzyva [59] and Aliph [8] are hybrid
fee. Request permissions from permissions@acm.org. protocols that have high performance in failure-free cases. More-
CCS ’18, October 15–19, 2018, Toronto, ON, Canada
© 2018 Association for Computing Machinery.
over, a large number of robust BFT protocols [4, 9, 16, 28, 91] aim
ACM ISBN 978-1-4503-5693-0/18/10. . . $15.00
https://doi.org/10.1145/3243734.3243812
2028
to provide a trade-off between performance and liveness during BEAT family includes asynchronous BFT protocols that are de-
attacks that affect the timing behavior of the network. signed to meet different needs. BEAT’s design is modular, and it can
While robustness is natively achieved in asynchronous BFT, we be extended to provide many more meaningful trade-offs among
still require different designs and trade-offs for different perfor- functionality and performance. Third, BEAT is efficient. Roughly,
mance metrics. Unlike HoneyBadgerBFT, which was designed to all our BEAT instances significantly outperform, in terms of both
optimize throughput only, BEAT aims to be flexible and versatile, latency and throughput, HoneyBadgerBFT.
providing protocol instances optimized for latency, throughput, The BEAT protocols. BEAT includes five BEAT instances (BEAT0–
bandwidth, or scalability (in terms of the number of servers). BEAT4). BEAT0, BEAT1, are BEAT2 are general SMR that can sup-
Append-only ledger vs. smart contracts. We advocate distinguish- port both off-chain and on-chain smart contracts, while BEAT3 and
ing two different classes of blockchain applications: append-only BEAT4 are BFT storage that can support off-chain smart contracts
ledgers and on-chain smart contracts. The former corresponds to only. We summarize the characteristics of the BEAT protocols in
append-only, linearizable storage systems (hereinafter, BFT stor- Table 1 as a series of improvements to HoneyBadgerBFT.
age), and the latter corresponds to general SMR. While they share • BEAT0, our baseline protocol, incorporates a more secure and ef-
security requirements (agreement, total order of updates, liveness), ficient threshold encryption [85], a direct instantiation of thresh-
general SMR requires each replica to maintain a copy of all service old coin-flipping [22] (instead of using threshold signatures [17]),
state to support contracts that operate on that state. In contrast, and more flexible and efficient erasure-coding support.
BFT storage may leverage erasure coding to reduce overall storage • BEAT1 additionally replaces an erasure-coded broadcast (AVID
by allowing servers to keep only fragments. (See Sec. 3 for formal broadcast) [24] used in HoneyBadgerBFT with a replication-
definitions.) Both of the applications are rather popular. Applica- based broadcast (Bracha’s broadcast [19]). This helps reduce
tions such as food safety [92] and exchange of healthcare data [54] latency when there is low contention and the batch size is small.
are examples of append-only ledgers, while AI blockchain [86] and • BEAT2 opportunistically moves the encryption part of the thresh-
financial payments [55] fall into the category of requiring smart old encryption to the client, further reducing latency. BEAT2
contracts. Internet of things (IoT) with blockchains may be of either does so at the price of achieving a weaker liveness notion, but
type, depending on the applications: if one just uses blockchains can be combined with anonymous communication networks to
to store and distribute IoT data to avoid the single point of failure achieve full liveness. Asynchronous BFT with Tor networks has
that the clouds may have, then we just need the distributed ledger been demonstrated in HoneyBadgerBFT.
functionality; if one additionally uses blockchains to consume and BEAT2 additionally achieves causal order [21, 35, 79], a rather
analyze the data, then we will additionally need smart contracts. useful property for many blockchain applications that process
BFT storage may be extended to support off-chain smart con- transactions in a “first come, first served” manner, such as stock
tracts run among clients (e.g., Hyperledger Fabric [7]). While off- trading and financial payments.
chain smart contracts have many benefits (e.g., achieving some level • BEAT3 is a BFT storage system. While HoneyBadgerBFT, BEAT0,
of confidentiality, as argued in [7]), they also have limitations: 1) BEAT1, and BEAT2 use Byzantine reliable broadcast [19, 24, 67],
they are less suited to running complex smart contract applications we find that replacing Byzantine reliable broadcast with a differ-
with power- and computation-restricted clients (e.g., IoT devices); ent and more efficient primitive — bandwidth-efficient asynchro-
2) they require communication channels among clients; and 3) they nous verifiable information dispersal (AVID-FP) [45] (using fin-
do not support efficient cross-contract state update. gerprinted cross-checksum) suffices to build a BFT storage. The
Some blockchain systems use BFT for building consensus order- bandwidth consumption in BEAT3 is information-theoretically
ing services (e.g., Hyperledger Fabric). We find that BFT storage optimal. To order transactions of size B, the communication
may be used to model the consensus ordering service, and a more complexity of BEAT3 is O(B), while the complexity for Hon-
efficient BFT storage can lead to a more efficient ordering service. eyBadger and PBFT is O(nB) (where n is the total number of
When designing BEAT, we aimed to answer the following major replicas). This improvement is significant, as it allows running
question: Can we have asynchronous BFT storage that significantly BEAT in bandwidth-restricted environments, allows more ag-
outperforms asynchronous general SMR? gressive batching, and significantly improves scalability.
Flexible read. Some applications benefit from flexible reading, i.e., • BEAT4 further reduces read bandwidth. BEAT4 is particularly
reading just a portion of a data block as needed (instead of the whole useful when it is common that clients frequently read only a
block). For example, in a blockchain that stores video, a user may fraction of stored transactions. We provide a generic framework
only want to read the first portion of the stored video. This can be to enable this optimization, and BEAT4 is a specific instanti-
challenging when we use erasure-coding as the underlying storage ation of the framework. Roughly, BEAT4 reduces the access
mechanism. BEAT aims to achieve flexible read with significantly overhead by 50% with around 10% additional storage overhead.
reduced bandwidth. To achieve this, we extend fingerprinted cross-checksums [45]
BEAT in a nutshell. We design, implement, and evaluate BEAT — to handle partial read and to the case of pyramid codes [51],
a set of practical asynchronous BFT protocols that resolve the above and we design a novel erasure-coded asynchronous verifiable
challenges. First, BEAT leverages more secure and efficient cryp- information dispersal protocol with reduced read bandwidth
tography support and more flexible and efficient erasure-coding (AVID-FP-Pyramid). Both techniques may be of independent
support. Second, BEAT is flexible, versatile, and extensible; the interest.
2029
Discussed in New Techniques Features Applicability

more efficient labeled threshold encryption and coin flipping; baseline protocol; outperform
BEAT0 Section 6 general SMR
more flexible and efficient erasure coding support HoneyBadgerBFT in all metrics
BEAT1 Section 7 replacing AVID broadcast with Bracha’s broadcast latency optimized general SMR
latency optimized general SMR
BEAT2 Section 7 client-side labeled CCA threshold encryption
additional achieve causal order anonymous networks
BEAT3 Section 8 a novel protocol combining AVID-FP and ABA storage, bandwidth, throughput optimized BFT storage
extending fpcc to single fragments and pyramid
BEAT4 Section 9 further saving read bandwidth BFT storage
codes; a novel AVID-FP saving bandwidth
Table 1: Characteristics of BEAT protocols.
To our knowledge, all the erasure-coded systems against arbi- have rather different properties from BEAT storage (i.e., BEAT3 and
trary failures in reliable distributed systems community [6, 25, BEAT4). Loft has the same communication complexity as BEAT stor-
32, 41, 46] use conventional MDS (maximum distance separable) age, but it only achieves obstruction-freedom, vs. BEAT’s (random-
codes [69] such as Reed-Solomon codes [78] and they inherit ized) wait-freedom. AWE, Pasis, CT, and M-PoWerStore have larger
the large bandwidth features of MDS codes. On the other hand, communication complexity. Additionally, while AWE achieves wait-
a large number of works aim to reduce the read bandwidth by freedom, it relies on an architecture that separates storage from
designing new erasure coding schemes [42–44, 50–52, 58]. The metadata and therefore may rely on more servers.
systems using these codes work in synchronous environments Erasure-code choice in BEAT4 (Or: Why pyramid codes?). As
only, and do not achieve any strong consistency goals even in discussed in Section 1, an ingredient in BEAT is a novel adaptation
the crash failure model (let alone Byzantine failures). It is our of fingerprinted cross-checksums [45] to accommodate pyramid
goal to blend these two disjoint communities and offer new in- codes. Pyramid codes and their derivatives have already been used
sights to both, by designing novel Byzantine reliable broadcast in practice, although in a very different setting (data centers), and
and BFT protocols with reduced bandwidth. offer a significant performance boost [26, 51, 52]. We leverage them
• BEAT’s design is modular, and features of these protocols can here to reduce bandwidth costs for fragments that contain real
be mixed to achieve even more meaningful trade-offs among data. Its close competitor, Xorbas codes [80], reduces bandwidth
functionalities, performance metrics, and concrete applications. cost for both data and redundant fragments, though we do not
leverage them here. We also do not choose the (more complex)
2 RELATED WORK derivatives of basic pyramid codes such as generalized pyramid
The (subtle) differences between (BFT) SMR and (BFT) atomic codes [51] and local reconstruction codes [52] that offer maximal
registers. State machine replication [81] is a general technique to recoverability and improve the fault tolerance of basic pyramid
provide a fault-tolerant services using a number of server repli- codes. For our designed protocol, these codes offer even greater
cas. It can support arbitrary operations, not just read and write. In recoverability than we need and hence would be overkill. Weaver
SMR, the servers need to communicate with each other and run codes [43], HoVer codes [44], and Stepped Combination codes [42]
an interactive consensus protocol to keep the servers in the same (belonging to LDPC codes) do not provide the bandwidth savings
state. and flexibility that we need.
Register specifications were introduced by Lamport in a series of Another direction of research in code design is to read instead
papers [62, 65, 66], with atomic register as the strongest one. The from more fragments (see [50, 58] and references therein), but less
notions of linearizability and wait-freedom for atomic registers were data from each. However, the bandwidth savings are only around
introduced by Herlihy and Wing [48] and Herlihy [47], respectively. 20%∼30%, much less than pyramid codes and its derivatives. In
Atomic registers can only support reads and writes. addition, these codes do not fit our setting where we assume a fixed
Atomic registers can be realized in asynchronous distributed sys- number of servers may behave maliciously and we attempt to mask
tems with failures. However, state machine replication cannot be as many Byzantine servers as possible.
achieved in asynchronous environments [38], unless it uses random-
ization to circumvent this impossibility result. HoneyBadgerBFT 3 SYSTEM AND THREAT MODEL
and BEAT fall into this category. Timing assumptions. Distributed systems can be roughly divided
BFT SMR is suitable for a number of permissioned blockchain ap- into three categories according to their timing assumption: asyn-
plications (e.g., on-chain smart contracts), while atomic registers are chronous, synchronous, or partially synchronous. An asynchronous
more suitable to model data-centric and cloud storage applications. system makes no timing assumptions on message processing or
Comparison with erasure-coded Byzantine atomic registers. transmission delays. If there is a known bound on message process-
An active line of research studies erasure-coded Byzantine atomic ing delays and transmission delays, then the corresponding system
registers, as erasure coding can be used to provide storage reduc- is synchronous. The partial synchrony model [37] lies in-between:
tion and/or reduce bandwidth. Notable systems include Pasis [41], messages are guaranteed to be delivered within a time bound, but
CT [25], M-PoWerStore [32], Loft [46], and AWE [6]. These systems the bound may be unknown to participants of the system.
2030
In protocols for asynchronous systems, neither safety nor live- 4 BUILDING BLOCKS
ness can rely on timing assumptions. In contrast, a protocol built This section reviews the cryptographic and distributed systems
for a synchronous or partially synchronous system risks having its building blocks for BEAT.
safety or liveness properties violated if the synchrony assumption
Labeled threshold cryptosystems. We review robust labeled
on which it depends is violated. For this reason, protocols built for
threshold cryptosystem (i.e., threshold encryption) [85] where a
asynchronous systems are inherently more robust to timing and
public key is associated with the system and a decryption key is
denial-of-service (DoS) attacks [70, 94].
shared among all the servers. Syntactically, a (t, n) threshold en-
BFT SMR. We consider asynchronous Byzantine fault-tolerant cryption scheme ThreshEnc consists of the following algorithms.
state machine replication (BFT SMR) protocols, where f out of n A probabilistic key generation algorithm TGen takes as input a
replicas can fail arbitrarily (Byzantine failures) and a computation- security parameter l, the number n of total servers, and threshold
ally bounded adversary can coordinate faulty replicas. parameter t, and outputs (pk, vk, sk), where pk is the public key, vk
The replicas collectively implement the abstraction of a key- is the verification key, and sk = (sk1 , · · · , skn ) is a list of private
value store. A replica delivers operations, each submitted by some keys. A probabilistic encryption algorithm TEnc takes as input a
client. All operations must be deterministic functions of the key- public key pk, a message m, and a label lb, and outputs a ciphertext
value store contents. The client should be able to compute a final c. A probabilistic decryption share generation algorithm ShareDec
response to its submitted operation from the responses it receives takes as input a private key ski , a ciphertext c, and a label lb, and
from replicas. Correctness for a secure BFT SMR protocol is speci- outputs a decryption share σ . A deterministic share verification
fied as follows. algorithm Vrf takes as input the verification key vk, a ciphertext
• Agreement: If any correct replica delivers an operation m, then c, a label lb, and a decryption share σ , and outputs b ∈ {0, 1}. A
every correct replica delivers m. deterministic combining algorithm Comb takes as input the verifi-
• Total order: If a correct replica has delivered m 1 , m 2 , · · · , ms cation key vk, a ciphertext c, a label lb, a set of t decryption shares,
and another correct replica has delivered m 1′ , m 2′ , · · · , ms′ ′ , then and outputs a message m, or ⊥ (a distinguished symbol).
mi = mi′ for 1 ≤ i ≤ min(s, s ′ ). We require the threshold encryption scheme to be chosen cipher-
• Liveness: If an operation m is submitted to n− f correct replicas, text attack (CCA) secure against an adversary that controls up to
then all correct replicas will eventually deliver m. t − 1 servers. We also require consistency of decryptions, i.e., no
The liveness property has been referred to by other names, e.g., adversary that controls up to t − 1 servers can produce a ciphertext
“fairness” in CKPS [21] and SINTRA [23], and “censorship resilience” and two t-size sets of valid decryption shares (i.e., where Vrf returns
in HoneyBadgerBFT [70]. We use them interchangeably. b = 1 for each share) such that they yield different plaintexts.
For our purpose, we require a labeled threshold encryption
scheme [85]; threshold cryptosystems that do not support labels [10,
We consider two types of BFT SMR services.
18] are not suitable.
BFT storage. A BFT storage service implements only read(key) and
Threshold PRF. We review threshold PRF (e.g., [22]), where a
write(key, val) operations. The former should return to the client
public key is associated with the system and a PRF key is shared
the current value for key in the key-value store, and the latter
among all the servers. A (t, n) threshold PRF scheme for a func-
should update the value of key in the key-value store to val.
tion F consists of the following algorithms. A probabilistic key
General SMR. A general SMR service—which is our default concern, algorithm FGen takes as input a security parameter l, the num-
unless specified otherwise—supports operations that consist of ber n of total servers, and threshold parameter t, and outputs
arbitrary deterministic programs, or transactions, that operate on (pk, vk, sk), where pk is the public key, vk is the verification key,
the key-value store. and sk = (sk 1 , · · · , skn ) is a list of private keys. A PRF share evalu-
ation algorithm Eva takes a PRF input c, pk, and a private key ski ,
To support operations that are arbitrary transactions, each replica and outputs a PRF share yi . A deterministic share verification algo-
will typically maintain the contents of the key-value store in its en- rithm Vrf takes as input the verification key vk, a PRF input c, and
tirety. Then, total order and the determinism of transactions ensures a PRF share yi , and outputs b ∈ {0, 1}. A deterministic combining
that the key-value store contents remain synchronized at correct algorithm FCom takes as input the verification key vk, x, and a set
replicas (assuming they begin in the same state). BFT storage can be of t valid PRF shares, and outputs a PRF value y.1
implemented in more space-efficient ways, e.g., with each replica We require the threshold PRF value to be unpredictable against
storing only an erasure-coded fragment for the value of each key an adversary that controls up to t − 1 servers. We also require the
(e.g., [24, 41, 45, 46]). threshold PRF to be robust in the sense the combined PRF value for
Secure causal BFT protocols. One of the BEAT instances achieves c is equal to F(c).
causality, which we briefly recall as follows. Input causality prevents We can use a direct implementation of threshold PRF [22] or can
the faulty replicas from creating an operation derived from a correct use build a threshold PRF using threshold signatures [17, 83].
client’s but that is delivered (and so executed) before the operation Erasure coding scheme. An (n, m) erasure coding scheme takes
from which it is derived. The problem of preserving input causality as input m data fragments and outputs n (n ≥ m) same-size coded
in BFT atomic broadcast protocols was first introduced by Reiter fragments. This essentially captures the encode algorithm of an
and Birman [79]. The notion was later refined by Cachin et al. [21]
1 Oursyntactic description of threshold encryption and threshold PRF can be made
and recently generalized by Duan et al. [35].
more general and the algorithms are not necessarily non-interactive.
2031
erasure code, but we (intentionally) leave the decode algorithm

d1 d2 d3 d4 d5 d6
undefined.
An (n, m) erasure coding scheme is systematic if the n coded (9,6)
d7 d8 d9
fragments contain the original m data fragments and д = n − m MDS
redundant fragments. Let di (i ∈ [1..m]) denote the data fragments,
and di (i ∈ [m + 1..n]) denote the redundant fragments. We have d1 d2 d3 d4 d5 d6
di (i ∈ [1..n]) to denote all the coded fragments. (10,6) d7-1
d d
An (n, m) erasure coding scheme is maximum distance separable Pyramid d7-2 8 9
(MDS) [69] if and only if all the data fragments can be recovered
from any m-size subset of coded fragments.
A systematic (n, m) erasure coding scheme is linear if each redun- Figure 1: A basic (10,6) pyramid code from a (9,6) MDS code.
dant fragment di (i ∈ [m + 1..n]) is a linear combination of the data
fragments, i.e., di = m j=1 bi j d j , where bi j ’s are coding coefficients.
Í
j=1 b 7j d j and d 7−2 = j=4 b 7j d j , respectively. Clearly, we have

Í3 Í6
Basic pyramid codes. Huang et al. [51] introduced (basic) pyramid
codes to slightly trade space for access efficiency in erasure-coded d 7−1 + d 7−2 = d 7 . They are local (group) redundant fragments, and
storage systems. For practical parameters, the pyramid codes can d 8 and d 9 are global redundant fragments. If the fragment d 1 is not
reduce the access overhead by 50% with around 10% additional available, one can just use 3 fragments (d 2 , d 3 , and d 7−1 ) to recover
storage overhead, compared to MDS erasure codes. d 1 . If there is more than one failure in the local group, one would
Pyramid codes can be efficiently built from any (n, m) system- need to use the traditional MDS decoding algorithm to recover the
atic and MDS code that tolerates arbitrary д = n − m erasures as faulty local fragment. One may need to compute the sum of d 7−1
defined above. Specifically, we divide the m data fragments into L and d 7−2 to recover d 7 if necessary.
equal-size2 disjoint groups Sl (l ∈ [1..L]), each of which contains Fingerprinted cross-checksum. A fingerprinted cross-checksum
m/L data fragments. Next, we keep д1 out of д redundant fragments [45] is data structure used by a server to verify that its fragment
unchanged. These fragments are global redundant fragments. Then, corresponds to a unique original data block. An (n, m) fingerprinted
for each group Sl , we compute д0 = д − д1 group redundant frag- cross-checksum fpcc consists of an array fpcc.cc[ ] of n values and
ments and add them to each group, where the j-th group redundant an array fpcc.fp[ ] of m values. The first array is a cross-checksum [40,
fragment, denoted d j,l , is a projection of the j-th redundant block 60] that contains the n hashes of the n coded fragments. The sec-
dm+j in the original MDS code onto that group, i.e., d j,l is com- ond array holds homomorphic fingerprints of data fragments that
puted as dm+j except setting all the coding coefficients that do not preserve the property of linear codes. Let h be a collision-resistant
correspond to Sl to 0. hash function and H be a hash function modeled as a random or-
This yields an (m + д0 L + д1 , m) systematic and non-MDS code, acle. A homomorphic fingerprinting function fingerprint takes as
which has m data fragments and д0 L + д1 redundant fragments, input a random key and a data fragment and outputs a small field
where each of group Sl has д0 group redundant fragments. It is element. A fragment d is consistent with fpcc for index i ∈ [1..n],
important to note that new code is also linear. if fpcc.cc[i] = h(d) and fingerprint(r , d) = encode(fpcc.fp[1], · · · ,
We briefly describe two useful properties of the basic pyramid fpcc.fp[m]), where r = H(fpcc.cc[1], · · · , fpcc.cc[n]).
codes [51]: (1) An (m + д0 L + д1 , m) pyramid code can tolerate A central theorem in [45, 46] is that for an (n, m) systematic, MDS,
arbitrary д = д0 + д1 erasures; (2) Each equal-size group Sl is an and linear erasure coding scheme, no adversary A can produce two
(m/L + д0 , m/L) MDS code. To decode a data fragment, first try different sets of m fragments such that each fragment is consistent
from the group level. For each group Sl , if the number of the un- with fpcc for its index and they can be decoded into two different
available fragments is less than д0 , any unavailable fragments can data blocks with non-negligible probability.
be recovered. Otherwise, first recover all the unavailable fragments Asynchronous verifiable information dispersal using finger-
from the group level, and then move to global level. One may needs printed cross-checksum. In an asynchronous verifiable informa-
to compute the fragments in the original MDS code that correspond tion dispersal (AVID) protocol [24], a client disperses a block M to n
to group redundant fragments, and then uses the conventional de- servers (where at most f of them might be faulty). The clients can
coding algorithm of an MDS code to recover unavailable fragments. later retrieve the full block M through the servers. The verifiability
Figure 1 uses an example to describe pyramid codes and shows of the protocol ensures that any two clients retrieve the same block.
how they can reduce read bandwidth and system I/O. The example An (n, m)-asynchronous verifiable information dispersal scheme
builds a (10, 6) pyramid code from a (9, 6) linear, MDS code. In is a pair of protocols (disperse, retrieve) that satisfy the following
the (9, 6) MDS code, there are 6 data fragments and 3 redundant with high probability:
fragments. The redundant fragments (d 7 , d 8 , and d 9 ) are linear
combinations of the data fragments. For instance, d 7 can be written • Termination: If a correct client initializes disperse(M), then all
as d 7 = 6j=1 b7j d j , where b7j ’s are coding coefficients.
Í correct servers will eventually complete dispersal disperse(M).
• Agreement: If some correct server completes disperse(M), all
We then divide the data fragments equally (setting L = 2),
correct servers eventually complete disperse(M).
and compute one redundant fragment for each group: d 7−1 =
• Availability: If f + 1 correct servers complete disperse(M), a
2 While there is no need to require m divides L for pyramid codes, in practice one correct client can run retrieve( ) to eventually reconstruct some
almost always uses equal-size sets. Typically, L can be set to 2. block M ′ .
2032
• Correctness: If f + 1 correct servers complete disperse(M), all ACS can trivially lead to asynchronous BFT: each server can
correct clients that run retrieve( ) eventually retrieve the same propose a subset of transactions, and deliver the union of the trans-
block M ′ . If the client that initiated disperse(M) was correct, actions in the agreed-upon vector; sequence numbers can be then
then M ′ = M. assigned to the agreed transactions using any predefined but fixed
Cachin and Tessaro [24] proposed an erasure-coded AVID, which order.
we call AVID-CT, To broadcast a message M, the communication HoneyBadgerBFT in a nutshell. HoneyBadgerBFT essentially
complexity of AVID-CT is O(n|M |). follows Ben-Or et al. [14], which uses reliable broadcast (RBC)
AVID-FP [45] is a bandwidth-efficient AVID using fingerprinted and asynchronous binary Byzantine agreement (ABA) to achieve
cross-checksum. In AVID-FP, given a block B to be dispersed, the ACS. HoneyBadgerBFT cherry-picks a bandwidth-efficient, erasure-
dealer applies an (m, n) erasure coding scheme, where m ≥ f + 1 coded RBC (AVID broadcast) [24] and the most efficient ABA [72] to
and n = m + 2f . Here f is the maximum number of Byzantine realize ACS. Specifically, HoneyBadgerBFT uses Boldyreva’s thresh-
faulty servers that system can tolerate, and n is the total number of old signature [17] to provide common coins for the randomized
servers. Then it generates the corresponding fingerprinted cross- ABA protocol [72]. HoneyBadgerBFT favors throughput over la-
checksum for B with respect to the erasure coding scheme. Next, tency by aggressively batching client transactions. It was shown
the client distributes the erasure-coded fragments and the same that HoneyBadgerBFT can outperform PBFT when the number of
fingerprinted cross-checksum to the servers. Each server verifies servers exceeds 16 in terms of throughput in WANs, primarily be-
the correctness of the fragment that it receives according to the fin- cause HoneyBadgerBFT distributes the network load more evenly
gerprinted cross-checksum and then, roughly speaking, leverages than PBFT [27].
the (much smaller) fingerprinted cross-checksum in place of the
fragment in the original AVID protocol. Different from AVID-CT, to
disperse a message M, the communication complexity of AVID-FP RBC Yes ABA
trans0 ABA0 1 coin
is O(|M |).
No ......
Byzantine reliable broadcast. Byzantine reliable broadcast (RBC), trans1 ABA1 0 coin
also known as the “Byzantine generals’ problem,” was first intro-
Yes ......
duced by Lamport et al. [67]. An asynchronous reliable broadcast trans2 ABA2 1 coin
protocol satisfies the following properties: Yes ......
• Agreement: If two correct servers deliver two messages M and trans3 ABA3 1 coin
M ′ then M = M ′ .
• Totality: If some correct server delivers a message M, all correct
servers deliver M. Figure 2: The HoneyBadgerBFT protocol.
• Validity: If a correct sender broadcasts a message M, all correct
servers deliver M. As illustrated in Figure 2, the HoneyBadgerBFT protocol is com-
Bracha’s broadcast [19], one that assumes only authenticated posed of two subprotocols/phases: RBC and ABA. In the RBC phase,
channels, is a well-known implementation of Byzantine reliable each replica first proposes a set of transactions and uses reliable
broadcast. To broadcast a message M, its communication complexity broadcast to disseminate its proposal to all other replicas. In the
is O(n 2 |M |). Cachin and Tessaro [24] proposed both an erasure- second phase, n concurrent ABA instances are used to agree on an
coded AVID (AVID-CT, mentioned above) and an erasure-coded n-bit vector bi for i ∈ [1..n], where bi indicates that if replica i’s
variant of Bracha’s broadcast — AVID broadcast, which reduces proposed transactions are included.
the cost to O(n|M |) compared to that of Bracha’s broadcast. Note HoneyBadgerBFT proceeds in epochs. Let B be a batch size of
that we explicitly distinguish among AVID-CT and AVID-FP (both client transactions. In each epoch, each replica will propose B/n
of which are verifiable information dispersal protocols) and AVID transactions. Each epoch will commit Ω(B) transactions. To im-
broadcast (a RBC protocol). prove efficiency, HoneyBadgerBFT ensures that each replica pro-
poses mostly disjoint sets of transactions. For this reason, it asks
5 REVIEWING HONEYBADGERBFT replicas to propose randomly selected transactions. To prevent ad-
versary from censoring some particular transaction by excluding
This section provides an overview of HoneyBadgerBFT and related
whichever replicas propose it, HoneyBadgerBFT requires replicas
primitives. We begin by introducing asynchronous common subset
to use threshold encryption to encrypt transactions proposed to
(ACS).
avoid censorship.
Asynchronous common subset. HoneyBadgerBFT uses ACS [14, HoneyBadgerBFT contains four distributed algorithms: a thresh-
21]. Formally, an ACS protocol satisfies the following properties: old signature [17] that provides common coins for ABA, an ABA
• Validity: If a correct server delivers a set V , then |V | ≥ n − f protocol [72] that has expected running time O(1) (completing
and V contains the inputs of at least n − 2f correct servers. within O(k) rounds with probability 1 − 2−k ), a bandwidth-efficient
• Agreement: If a correct server delivers a set V , then all correct reliable broadcast [24], and a threshold encryption [10] to avoid
servers deliver V . censorship and achieve liveness.
• Totality: If n− f correct servers submit an input, then all correct Roughly, the reliable broadcast dominates the bandwidth and
servers deliver an output. guides the selection of batch size. The threshold encryption scheme
2033
and the threshold signature scheme use expensive cryptographic Directly instantiating common coin protocol. Instead of using
operations, and they and the ABA dominate the latency of Honey- a threshold signature to derive the common coins as in HoneyBad-
BadgerBFT. gerBFT and other multi-party computation protocols, we choose to
While HoneyBadgerBFT is the most efficient asynchronous BFT directly use threshold coin flipping. Specifically, we use the scheme
protocol known, HoneyBadgerBFT favors throughput over other due to Cachin, Kursawe, and Shoup (CKS) [22] and implement it
performance metrics (latency, bandwidth, scalability). For instance, again using the P-256 curve that provides 128 bits of security. The
HoneyBadgerBFT has rather high latency, which is particularly threshold PRF scheme is proven secure under the Computational
visible in local area networks (LANs) [89]. This makes it difficult to Diffie-Hellman (CDH) assumption in the random oracle model.
work in latency-critical applications. Indeed, it is desirable to have Enabling more efficient and more flexible erasure coding.
asynchronous BFT protocols that are designed for different goals HoneyBadgerBFT uses an erasure-coding library zfec [93] that sup-
(different performance metrics, different application scenarios). ports Reed-Soloman codes only and supports at most 128 servers.
We integrate the C erasure coding library Jerasure 2.0 [73] with
our BEAT framework. This allows us to remove the restriction
6 BEAT0 that HoneyBadgerBFT can only support at most 128 replicas, use
This section describes BEAT0, our baseline protocol, that uses a set more efficient erasure-coding schemes (e.g., Cauchy Reed-Soloman
of generic techniques to improve HoneyBadgerBFT. Specifically, codes [75]), and flexibly choose between erasure-coding scheme
BEAT0 incorporates a more secure and efficient threshold encryp- parameters to improve performance.
tion, a direct implementation of threshold coin flipping, and more Distributed key generation. Our threshold encryption and thresh-
flexible and efficient erasure-coding support. old PRF are discrete-log based, and BEAT0 and all subsequent BEAT
BEAT0 specification. Instead of using CPA/CCA-secure threshold instances allow efficient distributed key generation [39, 57], which
encryption that does not support labels, BEAT0 leverages a CCA- should be run during setup. The implementation of distributed key
secure, labeled threshold encryption [85] to encrypt transactions generation, however, is outside the scope of the present paper.
while making the ciphertexts uniquely identifiable.
BEAT0 proceeds in epochs (i.e., rounds). Let r the current epoch 7 BEAT1 AND BEAT2 — LATENCY OPTIMIZED
number. Let n be the total number of replicas. Let ThreshEnc = This section presents two latency-optimized protocols in BEAT:
(TGen, TEnc, ShareDec, Vrf, Comb) be a (f +1, n) labeled threshold BEAT1 and BEAT2.
encryption scheme. Let pk and vk be threshold encryption public
BEAT1. Via a careful study of latency for each HoneyBadgerBFT
key and verification key, respectively. Let ski be the private key for
subprotocol, we find that 1) most of latency comes from threshold
replica i ∈ [1..n]. Let B be the batch size of BEAT0.
encryption and threshold signatures, and 2) somewhat surpris-
Each replica i ∈ [1..n] randomly selects a set T of transactions of
ingly, when the load is small and there is low contention, erasure-
size B/n. It then computes a labeled threshold encryption ciphertext
coded reliable broadcast (AVID broadcast) [24] causes significant la-
(lb, c) ← TEncpk (lb,T ) where lb = (r, i). Next, each replica submits
$
tency. To test the actual latency overhead incurred by erasure-coded
the labeled ciphertexts to ACS as input. Each replica i, upon receiv-
broadcast, we implement a variant of HoneyBadgerBFT, HB-Bracha,
ing some labeled threshold ciphertexts (r , j ′, c) from some other
which replaces erasure-coded broadcast with a popular, replication-
replica j, does a sanity check to see if j = j ′ and if there is already
based reliable broadcast protocol — Bracha’s broadcast [19]. We
a different triple for the same r and j before proceeding. Namely,
find that when the client load is small, HB-Bracha outperforms
each replica i only stores and processes one ciphertext from the
HoneyBadgerBFT in terms of latency by 20%∼60%. This motivates
same j and the same r , and will discard ciphertexts subsequently
us to devise BEAT1.
received for the same j and r .
BEAT1 replaces the AVID broadcast protocol in BEAT0 with
After getting output from ACS, a replica i can run ShareDec to
Bracha’s broadcast. It turns out that when the load is small, BEAT1
decrypt the ciphertexts using its secret key ski , and broadcasts its
is consistently faster than BEAT0, though the difference by percent-
decryption shares. When receiving f + 1 valid shares (that pass
age is not as significant as that between HB-Bracha and Honey-
the verification of Vrf), a replica can use Comb to combine the
BadgerBFT. However, when the load becomes larger, BEAT1 has
transactions.
significantly higher throughput, just as the case between HB-Bracha
Efficiently instantiating CCA secure labeled threshold en- and HoneyBadgerBFT.
cryption. We observe that much of the latency in HoneyBadgerBFT
BEAT2. In BEAT0, our use of CCA-secure, labeled threshold en-
is due to usage of pairing-based cryptography, which is much slower
cryption is at the server side, to prevent the adversary from choos-
than elliptic curve cryptography (cf. [71]). We thus implement our
ing which servers’ proposals to include. BEAT2 opportunistically
threshold encryption using the TDH2 scheme by Shoup and Gen-
moves the use of threshold encryption to the client side, while still
naro [85] using the P-256 curve which provides standard 128-bit
using Bracha’s broadcast as in BEAT1.
security. TDH2 is secure against chosen-ciphertext attacks, under
In BEAT2, when the ciphertexts are delivered, it is too late for
the Decisional Diffie-Hellman (DDH) assumption in the random
the adversary to censor transactions. Thus, the adversary does not
oracle model [13].
know what transactions to delay, and can only delay transactions
Jumping ahead, while we use a stronger and functionally more
from specific clients. BEAT2 can be combined with anonymous
complex cryptographic scheme, our experiments show that doing
communication networks to achieve full liveness. BEAT2 addition-
so actually improves the latency of HoneyBadgerBFT greatly.
ally achieves causal order [21, 35, 79], which prevents the adversary
2034
from inserting derived transactions before the original, causally cross-checksum and its fragment. The client can then reconstruct
prior transactions. Causal order is a rather useful property for the transaction.
blockchain applications that process client transactions in a “first More formally, validity, agreement, and totality of the ACS us-
come, first served” manner, such as trading services, financial pay- ing AVID-FP follow directly from the properties of asynchronous
ments, and supply chain management. verifiable information dispersal, just as the case of using reliable
broadcast. The only difference is that the ACS using AVID-FP now
8 BEAT3 — BANDWIDTH OPTIMIZED BFT delivers a fingerprinted cross-checksum. We just need to prove that
STORAGE our ACS is functionally correct. This follows easily from correctness
of asynchronous verifiable information dispersal: if a fingerprinted
This section presents BEAT3, an asynchronous BFT storage system.
cross-checksum is delivered, then the corresponding data (i.e., trans-
BEAT3 significantly improves all performance metrics that we know
action) is retrievable, and all clients are able to retrieve the data
of — latency (compared to HoneyBadgerBFT), bandwidth, storage
and the data was previously proposed by some server.
overhead, throughput, and scalability.
Bandwidth comparison. To order transactions of size B, the com-
Deployment scenarios. Recall that the safety and liveness prop-
munication complexity of BEAT1, BEAT2, and HB-Bracha is O(n2 B),
erties of BFT storage remain the same as those of general SMR,
the complexity of HoneyBadgerBFT and BEAT0 is O(nB), while
with the only exception that the state may not be replicated at each
the communication complexity of BEAT3 is only O(B). This im-
server (but instead may be erasure-coded). BEAT3 can be used for
provement is significant, as it allows running BEAT in bandwidth-
blockchain applications that need append-only ledgers, and specific
restricted environments, allows more aggressive batching, and
blockchains where the consensus protocol serves as an ordering
greatly improves scalability.
service, such as Hyperledger Fabric [7, 87].
BEAT3. BEAT3 achieves better performance by using a novel com- 9 BEAT4 — FLEXIBLE READ
bination of a bandwidth-efficient information dispersal scheme
This section presents a general optimization for erasure-coded
(AVID-FP [45]) and an ABA protocol [72]. In comparison, Hon-
BEAT instances that significantly reduce read bandwidth. For many
eyBadgerBFT, BEAT0, BEAT1, and BEAT2 use a combination of
blockchain applications, particularly data-intensive ones, it is com-
reliable broadcast and an ABA protocol.
mon for clients to read only a fraction of the data block. Additionally,
AVID-FP has optimal bandwidth consumption which does not
for many applications using smart contracts, clients may be inter-
depend on the number of replicas. The bandwidth required to dis-
ested in seeing the first few key terms of a large contract instead of
perse a block M in AVID-FP is only O(|M |), while the bandwidth in
the lengthy, detailed, and explanatory terms.
AVID broadcast (used in HoneyBadgerBFT) is O(n|M |). Technically
Our technique relies on a novel erasure-coded reliable broadcast
speaking, AVID-FP has a much smaller communication complexity
protocol, AVID-FP-Pyramid, that reduces read bandwidth. AVID-
than AVID-CT because replicas in AVID-FP agree upon a small
FP-Pyramid uses pyramid codes [51]. As reviewed in Sec. 4, a
constant-size fingerprinted cross-checksum instead of on the block
(m +д0 L +д1 , m) pyramid code can tolerate arbitrary д = д0 +д1 era-
itself (i.e., the bulk data).
sures. Let n = m +д0 L+д1 . We define for a (m +д0 L+д1 , m) pyramid
Our basic idea is to replace AVID broadcast used in HoneyBad-
code a tailored fingerprinted cross-checksum. Our (m +д0 L +д1 , m)
gerBFT with an (n, m) AVID-FP protocol, where n = m + 2f and
fingerprinted cross-checksum fpcc consists of an array fpcc.cc[ ]
m ≥ f + 1. Accordingly, at the end of the AVID-FP protocol, each
that holds the hashes of all n coded fragments. The second array
replica now stores some fingerprinted cross-checksum and the cor-
fpcc.fp[ ] still contains m values that are fingerprints of the first m
responding erasure-coded fragment. There is, however, a challenge
data fragments, and because pyramid codes are linear, all the finger-
to use the approach. In AVID-FP, a correct replica cannot recon-
prints of coded fragments can be derived by these m fingerprints,
struct its fragment if it is not provided by the AVID-FP client who
just as all the coded fragments can be derived by the original m
proposes some transaction (here, some other replica in our pro-
fragments.
tocol). Namely, as mentioned by Hendricks et al. [45], even with
We say a fragment d is consistent with fpcc for index i ∈ [1..n],
a successful dispersal, only f + 1 correct replicas, instead of all
if fpcc.cc[i] = h(d) and fingerprint(r , d)= encode (fpcc.fp[1], · · · ,
correct replicas, may have the corresponding fragments. However,
fpcc.fp[m]), where r = H(fpcc.cc[1], · · · , fpcc.cc[n]).
ABA expects all correct replicas to deliver the transaction during
We extend the central theorem used in [45, 46] to the case of pyra-
the broadcast/dispersal stage (to correctly proceed). Note that we
mid codes and to the case for fragments. We derive the following
cannot trivially ask replicas in AVID-FP to reconstruct their indi-
new lemma.
vidual fragment or reconstruct the whole transaction, which would
nullify the bandwidth benefit of using AVID-FP. Lemma 9.1. For an (m +д0 L +д1 , m) fingerprinted cross-checksum
We observe that AVID-FP actually agrees on the fingerprinted fpcc, any probabilistic adversary A can produce with negligible prob-
cross-checksum of the transaction. It is good enough for us to ability a target data fragment index (resp., data fragment indexes)
proceed to the ABA protocol once each replica delivers the finger- and two sets of fragments (that may have different sizes) such that
printed cross-checksum. The consequence for BEAT3 is just as in each fragment is consistent with fpcc for its index and they can be
AVID-FP: at least f + 1 correct replicas have their fragments, and decoded into two different data fragments for the target index (resp.,
some correct replicas may not have their fragments. This causes different sets of fragments for the target indexes).
no problem, as the data is retrievable using f + 1 = m correct frag- The target data fragment index(es) may be an index of one of
ments. Each replica just needs to send the client the fingerprinted data fragment, indexes of all data fragments, or any number of
2035
indexes in between. The two set of fragments that A provides can These fragments are then decoded and the resulting block is re-
be of different sizes, and the decoding approaches for two sets may turned.
differ (may it be a group level or global level decoding). Read. To read a single fragment di , one could choose one of the
The proof the lemma is an adaptation to the one due to Hendricks following two options. In the first option, which we term as the
et al. [45, Theorem 3.4]. In proving Theorem 3.4 [45], the key claim is optimistic mode, a client requests from all servers the fingerprinted
that two different sets of m fragments for the same fragment indexes cross-checksum and only the target server i for the fragment. If it
and the same consistent fingerprinted cross-checksum imply that does not receive the fragment in time (set arbitrarily by the client),
at least one fragment from the two sets is different, which is the it queries the servers at the group level that contains the server i,
starting point of their proof. Following the same argument, we can and all servers in the local group should send their fragments. The
show that the probability that two fragments with the same index client will repeat the procedure from the group level until it receives
are different is bounded by ϵ ′ + q · ϵ, where ϵ ′ is the advantage of m + д0 (L − 1) fragments with matching fpcc and then recovers the
attacking the hash function, q is the total number random oracle fragment. In the second, which we term as the balanced mode, a
queries, and ϵ is the probability of the collisions in the fingerprinting client directly queries all servers at the group level, expecting the
function. The proof applies to any linear erasure-coding schemes, fragments from these group level servers.
including pyramid codes. Definition and security. While we could be more general, we pro-
AVID-FP-Pyramid. Now we describe AVID-FP-Pyramid, an asyn- vide a definition for AVID-FP-Pyramid that is specifically tailored
chronous verifiable information dispersal protocol that compared to for our purpose.
AVID-FP, further reduces read bandwidth. Instead of using a conven- An (n, m)-asynchronous verifiable information dispersal scheme
tional MDS erasure code, AVID-FP-Pyramid uses a pyramid code. In is a triple of protocols (disperse, retrieve, read) that satisfy the
an MDS code, m valid fragments can be used to reconstruct the orig- following with high probability:
inal block. In a pyramid code, we need in general m +д0 (L − 1) valid
• Termination: If a correct client initializes disperse(M) then all
fragments to reconstruct the block. Therefore, we have to make sure
correct servers will eventually complete dispersal disperse(M).
that in our new AVID protocol at least m+д0 (L−1)+f servers receive
• Agreement: If some correct server completes disperse(M), all
consistent fragments, of which f servers might be faulty. Moreover,
correct servers eventually complete disperse(M).
one needs to make sure that m + д0 L + д1 ≥ m + д0 (L − 1) + 2f , i.e.,
• Availability: If f + 1 correct servers complete disperse(M),
f ≤ (д0 + д1 )/2, which ensures that the total number of replicas do
a correct client can run retrieve( ) to eventually reconstruct
not overflow.
some block M ′ . Additionally, if f + 1 correct servers complete
Given a pyramid code (n, m) where n = m + д0 L + д1 that can
disperse(M), a correct client can run read(i) where i ∈ [1..m] to
tolerate arbitrary д = д0 + д1 erasures, we construct AVID-FP-
eventually obtain a fragment di .
Pyramid where f ≤ m and f ≤ (д0 + д1 )/2. Specifically, AVID-FP-
• Correctness: If f + 1 correct servers complete disperse(M), all
Pyramid consists of a triple of protocols (disperse, retrieve, read)
correct clients that run retrieve( ) eventually retrieve the same
which are described as follows.
block M ′ . If the client that initiated disperse(M) was correct,
Dispersal. To disperse a block B, a client applies the (n, m) pyramid then M ′ = M. Additionally, if f + 1 correct servers complete
code to generate n fragments {di }i=1 n and the fingerprinted cross-
disperse(M), all correct clients that run read(i) for i ∈ [1..m]
checksum fpcc. The server then sends each server i its fragment di eventually obtain the same fragment di′ . If the client that initiated
and fpcc. disperse(M) was correct, then di′ = di , where di is the i-th data
Upon receiving a disperse message, a server i verifies that the fragment of M.
fragment di is consistent with fpcc. (Concretely, server i checks
if fpcc.cc[i] = h(d) and fingerprint(r , d)= encode (fpcc.fp[1], · · · , Theorem 9.2. AVID-FP-Pyramid is an asynchronous verifiable
fpcc.fp[m]), where r = H(fpcc.cc[1], · · · , fpcc.cc[n]).) If this is true, information dispersal protocol as defined above.
the server stores the fragment and sends an echo message contain- BEAT-FR. Replacing the AVID-FP protocol in BEAT3 with our
ing fpcc (and only fpcc) to all servers. AVID-FP-Pyramid protocol, we obtain a new BFT storage protocol
Upon receiving m + д0 (L − 1) + f echo messages with matching — BEAT-FR which has reduced read bandwidth.
fingerprinted cross-checksum fpcc, a server sends a ready message
containing fpcc to all servers. Corollary 9.3. BEAT-FR is a BFT storage.
If receiving f + 1 ready with matching fingerprinted cross-
Instantiating BEAT-FR: BEAT4. BEAT-FR is a generic asynchro-
checksum fpcc, and if a server does not yet send a ready message,
nous BFT framework that reduces read bandwidth. BEAT4 is an
it sends a ready message to all other servers.
instantiation to BEAT-FR for concrete parameters. In BEAT4, we
Upon receiving 2f + 1 ready messages with matching fpcc, it
set L = 2, m is even, and д0 = 1, which allows us to tolerate one
stores and delivers fpcc.
failure within the local group, and reduces the read bandwidth by
Retrieval. The retrieval protocol is almost the same as that in AVID- 50%. In BEAT4, we have n = m + 2f + 1, m = f + 1, and n = 3m − 1.
FP, with only a parameter difference. To retrieve a block, a client Note that the number of echo messages which a replica has to wait
retrieves a fragment and fingerprinted cross-checksum from each before it can send ready message in BEAT4 is m + f .
server, waiting for matching fingerprinted cross-checksums from
Technique applicability. We comment that our technique pre-
f + 1 servers and consistent fragments from m + д0 (L − 1) servers.
sented in the section is general. While it is described for the setting
of AVID-FP, it can be applied to all erasure-coded asynchronous
2036
verifiable information dispersal and erasure-coded reliable broad- evaluate the polynomial directly, without leveraging faster lookup
cast protocols, including AVID-CT [24] and AVID broadcast [24]. tables. While the implementation can be further improved, we find
Therefore, the technique can be used to improve both erasure-coded that the implementation can already improve all performance met-
BFT storage (BEAT3) and general SMR (BEAT0). rics significantly. We implement fingerprinted cross-checksum in
C, with 3,500 lines of code.
10 IMPLEMENTATION AND EVALUATION Finally, we use Cython [11] to wrap the C code in Python and sup-
10.1 Implementation port functions including Reed-Solomon codes, pyramid codes, ma-
trix generation, coding padding, and fingerprinted cross-checksum.
We utilize the HoneyBadgerBFT prototype as the baseline to imple- The implementation involves around 1,000 lines of code in Python.3
ment six asynchronous BFT protocols, including five BEAT proto-
Threshold cryptography. We use the TDH2 scheme [85] for
cols (BEAT0 to BEAT4) and HB-Bracha. HB-Bracha is implemented
CCA-secure labeled threshold encryption and the threshold PRF
to understand the latency overhead caused by erasure coding. HB-
scheme [22] for distributed coin flipping. We implement both schemes
Bracha replaces the underlying erasure-coded reliable broadcast
using the Charm [2] Python library. We use NIST recommended
(AVID broadcast) with Bracha’s Broadcast [19], with the rest of the
P-256 curve to implement both schemes to provide standard 128-bit
components intact.
security.
Each of the six protocols involves 6,000 to 8,000 lines of code in
Python. The underlying erasure-coding schemes (Reed-Soloman
codes and pyramid codes) and fingerprinted cross-checksum, how- 10.2 Evaluation
ever, are implemented in C. The design and implementation of Overview. We deploy and test our protocols on Amazon EC2 uti-
BEAT is modular, and we have implemented the following building lizing up to 92 nodes from ten different regions in five different
blocks for the protocols. continents. Each node is a general purposed t2.medium type with
Erasure coding support. HoneyBadgerBFT is 100% Python, and two virtual CPUs and 4GB memory. We evaluate our protocols in
uses the zfec library to implement the Reed-Soloman code, an MDS both LAN and WAN settings, where the LAN nodes are selected
erasure code. The zfec library, while popular in Python projects, from the same Amazon EC2 region, and the WAN nodes are uni-
suffers from both efficiency and usability issues: it supports only the formly selected from different regions. We evaluate the protocols
traditional Reed-Soloman code implementation and supports only a under different network sizes (number of replicas) and contention
word size (finite field size, a key tunable parameter in erasure coding levels (batch sizes). For each experiment, we use f to represent the
for efficiency) of 8. Moreover, due to the usage of an erasure-coding network size, where 3f + 1 nodes are launched in total for BEAT0
library zfec [93], HoneyBadgerBFT supports at most 28 replicas. to BEAT3, HB-Bracha, and HoneyBadgerBFT (abbreviated as HB
In BEAT, we instead use Jerasure 2.0 [73], a C library for erasure- in the figures), and 3f + 2 nodes are used for BEAT4. We vary the
coding, to implement the underlying erasure-coding schemes (in- batch size where nodes propose 1 to 20,000 transactions at a time.
cluding Reed-Soloman codes and pyramid codes). Jerasure 2.0 sup- Bandwidth. The protocols mentioned above have rather different
ports a variety of other coding schemes (including Cauchy Reed- communication complexity. To order transactions of size B, the
Soloman codes [75]), and allows fine-grained parameter tuning. communication complexity of BEAT1, BEAT2, and HB-Bracha is
Fingerprinted cross-checksum. We observe that for efficiency O(n 2 B), the communication complexity of HoneyBadgerBFT and
reasons one cannot separate the implementation of fingerprinting BEAT0 is O(nB), while the communication complexity of BEAT3 is
functions from the underlying erasure-coding support. The only only O(B). The consequence for throughput is significant: with the
implementation of fingerprinting is due to Hendricks et al. [45, 46]. same bandwidth, BEAT3 and BEAT4 can process an order of mag-
They implemented their own erasure coding scheme using Rabin’s nitude more batched transactions, leading to significantly higher
information dispersal scheme [77] and the corresponding finger- throughput. Our evaluation, however, does not set the bandwidth
printed cross-checksum using Shoup’s NTL [84]. While their fin- this way, but rather assumes the bandwidth is ample and assumes
gerprinted cross-checksum is efficient, the erasure coding scheme all protocols use the same batch size. The readers should be aware
is rather slow. that BEAT3 and BEAT4 have much higher throughput if using a larger
In contrast, we use GF-Complete [74], the Jerasure’s underlying batch size.
Galois Field library using Intel SSID, to implement the fingerprinted Latency. We first evaluate the latency in the LAN setting with
cross-checksum primitive. Erasure coding schemes have three pa- f = 1, 2, 5, 10, and 15, respectively. We examine and compare the
rameters n, m, and w, where n is the number of fragments (also average latency under no contention where each node proposes a
the number of replicas), m is the number of data fragments (where single transaction (with variable size) at a time and no concurrent
m = f + 1 in our protocols), and w is the word size (the index size of requests are sent by the clients. In the LAN setting, network latency
the Galois Field GF(2w )). It is required that n + m < 2w and there- is relatively small, so the overhead is mainly caused by the protocols
fore n < 2w . The word size w is typically set to be between 4 and themselves. We report our result for f = 1, 2 in Figure 3.
16 for efficiency, and indeed w = 32 is the largest value supported
by Jerasure. However, for our applications, we need to use larger 3 PyECLib [76] is popular python library for erasure-coding: it has a Python interface
w = 64 or 128 for the security of fingerprinted cross-checksum. We but implements C based library, Liberasurecode [61], which allows us to use existing
therefore extend Jerasure to include these large w’s. erasure-coding library such as Jerasure[73] and Intel(R) ISA-L. We choose not to use
PyECLib, primarily because the underlying Liberasurecode has implemented data
The specific fingerprinting function we implemented is the eval- structures that are not necessary for our purpose. We therefore (have to) write our
uation fingerprinting [82]. Currently, we apply Horner’s rule to own wrapper for Jerasure and fingerprinted cross-checksum using Cython [11].
2037
1.5 HB BEAT0 1.47 in Figure 5(a) and the the result in the WAN setting in Figure 5(b).
BEAT1 HB-Bracha Both cases set f = 1. We also show latency vs. throughput in
BEAT2 BEAT3
Figure 5(c).
BEAT4
1.06 We first notice that BEAT0 slightly outperforms HoneyBad-
Latency (Sec)
1 gerBFT in both settings. This is expected since BEAT0 employs

0.8 optimized threshold cryptography. This also matches the result
for the latency under no contention. In comparison, while BEAT1,
0.58
0.54
0.49 BEAT2, and HB-Bracha are latency optimized, they do not outper-
0.5 0.44
form HoneyBadgerBFT in terms of throughput. We observe that in
0.21 0.23 both the LAN setting and WAN setting, BEAT1, BEAT2, and HB-
0.13 Bracha achieve higher throughput than HoneyBadgerBFT when
0.1 0.09 0.080.09
0 the batch size is small. However, when batch size is higher than
f =1 f =2 5000, all the three protocols have 20% to 30% lower throughput
than HoneyBadgerBFT. This is mainly because HB-Bracha con-
Figure 3: Latency of the protocols in the LAN setting under sumes higher network bandwidth, which causes degradation when
no contention. the batch size is large. This underscores the wisdom in designing
Encrypt Consensus Decrypt
HoneyBadgerBFT.
BEAT3 and BEAT4 outperform HoneyBadgerBFT consistently.
HB They also outperform BEAT0, BEAT1, and BEAT2 consistently,
BEAT0 though under low contention in the LAN setting, BEAT1 has larger
BEAT1 throughput than the other protocols. These results also meet our
HB-Bracha
expectation since BEAT3 and BEAT4 are bandwidth optimized.
0 0.05 0.1 0.15 time(s) Again, we stress that we compare the performance of the protocols
under the same batch size. BEAT3 and BEAT4 actually use much
Figure 4: Latency breakdown in the LAN setting with f = 1. lower network bandwidth than the other protocols, and so for the
same bandwidth budget, BEAT3 and BEAT4 (with more aggressive
batching) will achieve much better throughput compared with other
When f = 1, BEAT0, BEAT1, BEAT2, and BEAT3 are around 2× protocols.
faster than HoneyBadger, and when f becomes larger, they are even Scalability. We evaluate the scalability of BEAT0, BEAT3, and Hon-
faster than HoneyBadger. When f = 1, BEAT4 is about as fast as eyBadger by varying f from 1 to 30. We report our comparison
HoneyBadger. This is primarily because BEAT4 has one more node, between BEAT3 and HoneyBadger in Figure 5(d) (without BEAT0,
and the added overhead for the underlying consensus protocols and for ease of illustration). We observe that the throughput for both
threshold cryptography is particularly visible when f is small. As protocols is in general higher when the number of replicas is smaller.
f increases, HoneyBadger is much slower than BEAT4. Meanwhile, Peak throughput for BEAT3 is reached in all the cases when the
the difference between BEAT3 and BEAT4 becomes smaller; when batch size is greater than 15,000. In the most extreme case for our
f is 15, we barely notice the difference between them (not shown). experiment, where f = 30 and batch size is 20,000, the average
The differences among BEAT0, BEAT1, and BEAT2 are rather latency is about 1.5 minutes. As we can see in the figure, BEAT3
small when the batch size is 1, but becomes much more visible outperforms HoneyBadgerBFT in all the cases. However, the dif-
when the batch size becomes larger. However, the difference be- ference between BEAT3 and HoneyBadgerBFT becomes smaller as
tween BEAT1 and BEAT2 is not as large as the difference between the number of replicas grows. This is in part due to the fact that in
HoneyBadger and HB-Bracha. Meanwhile, when the batch size large-scale networks, network latency may dominate the overhead
exceeds 1,000, BEAT0 becomes faster than BEAT1 (not shown). of the protocol. BEAT0 has performance between BEAT3 and Hon-
We further assess the latency breakdown for HoneyBadgerBFT, eyBadger, and again when f increases their difference becomes
BEAT0, BEAT1, and HB-Bracha in order to better understand why smaller.
we have these results. As illustrated in Figure 4, we evaluate the time
for encrypting transactions, consensus protocols, and decrypting
and combining transactions. We find the encryption and decryption
11 LESSONS LEARNED
for BEAT0 and BEAT1 are about three times faster than those in
HoneyBadger and HB-Bracha. In addition, BEAT0 and BEAT1 use We implemented six new protocols (BEAT instances and HB-Bracha).
threshold PRF to produce the common coins for the consensus, While many of these protocols use similar components, maintaining,
and the latency of the consensus is also reduced by about 50%. HB- deploying, and comparing different BEAT instances takes tremen-
Bracha also achieves lower latency than HoneyBadgerBFT due to dous effort. While one of our goals is to make BEAT modular and
the use of latency-optimized Bracha’s broadcast. This also explains extensible, in practice it is still challenging to develop all the vari-
why BEAT1 has lower latency than BEAT0 when the batch size is ants of the protocols. This is in part because even for the same
small. function (e.g., threshold encryption), different APIs need to main-
tained. In fact, changing a small function in a BEAT instance may
Throughput. We evaluate the throughput of the protocols under
need to touch a large number of related functions accordingly.
different contention levels. We present the results in the LAN setting
2038
·104
2.5 4,000
HB BEAT0 HB BEAT0
BEAT1 HB-Bracha BEAT1 HB-Bracha
2 BEAT2 BEAT3 BEAT2 BEAT3
3,000
Throughput (tx/sec)
Throughput (tx/sec)
BEAT4 BEAT4
1.5
2,000
1
1,000
0.5
0 0
0 0.5 1 1.5 2 0 0.5 1 1.5 2
Batch Size ·104 Batch Size ·104
(a) Throughput for f = 1 where the nodes are from the same Amazon (b) Throughput for f = 1 where the nodes are from 4 Amazon EC2
EC2 region. regions in different continents.
3,000
15
HB BEAT0 BEAT1 f =1 f =2
HB-Bracha BEAT2 BEAT3 f =5 f = 15
BEAT4 f = 20 f = 30
Throughput (tx/sec)
2,000
Latency (Sec)
10
1,000
5
0 0
0 500 1,000 1,500 2,000 2,500 3,000 0 0.5 1 1.5 2
Throughput (tx/sec) Batch Size ·104
(c) Latency vs. Throughput in the WAN setting with f = 1. (d) Scalability of BEAT3 and HoneyBadgerBFT. Solid lines represent
the BEAT3. Dashed lines with the same mark represent the result for
HoneyBadgerBFT with the same number of replicas.
Figure 5: Performance of the protocols. (The pictures are best viewed in color.)
On the other hand, we find that perhaps surprisingly, it may cross-checksum and new asynchronous verifiable information dis-
be easier to develop and deploy asynchronous BFT than partially persal, which might be of independent interest.
synchronous BFT, for at least two reasons. First, protocols assuming
partial synchrony rely on view change subprotocols, which are very ACKNOWLEDGMENT
difficult to implement well from our own experience and from the The authors are indebted to our shepherd Haibo Chen and the
fact that a significant number of academic papers choose not to CCS reviewers for their helpful comments that greatly improve our
implement the view change protocols. Second, because of native paper.
robustness against timing and liveness attacks for asynchronous
BFT, we simply do not need to take further measures to ensure
robustness.
12 CONCLUSION
We describe the design and implementation of BEAT, a family of
practical asynchronous BFT protocols that are efficient, flexible,
versatile, and extensible. We deploy and evaluate the five BEAT
protocols using 92 instances on Amazon EC2, and we show BEAT
protocols are significantly more efficient than HoneyBadgerBFT,
the most efficient asynchronous BFT known. We also develop new
distributed system ingredients, including generalized fingerprinted
2039
REFERENCES [39] R. Gennaro, S. Jarecki, H. Krawczyk, and T. Rabin. Secure distributed key genera-
[1] M. Abd-El-Malek, G. Ganger, G. Goodson, M. K. Reiter, and J. Wylie. Fault-scalable tion for discrete-log based cryptosystems. J. Cryptology 20(1): 51–83 (2007)
Byzantine fault-tolerant services. SOSP 2005. [40] L. Gong. Securely replicating authentication services. ICDCS, pp. 85–91, IEEE
[2] J. A. Akinyele, et al. Charm: a framework for rapidly prototyping cryptosystems. Computer Society, 1989.
Journal of Cryptographic Engineering, 3(2):111–128, 2013. [41] G. R. Goodson, J. J. Wylie, G. R. Ganger, and M. K. Reiter. Efficient Byzantine-
[3] Amazon Web Services (AWS). https://aws.amazon.com/ tolerant erasure-coded storage. DSN-DCCS 2004, pp. 135–144, 2004.
[4] Y. Amir, B. Coan, J. Kirsch, and J. Lane. Prime: Byzantine replication under attack. [42] K. M. Greenan, X. Li, and J. J. Wylie. Flat XOR-based erasure codes in storage sys-
IEEE TDSC, 8(4):564–577, 2011. tems: Constructions, efficient recovery, and tradeoffs. IEEE Mass Storage Systems
[5] Hyperledger Whitepaper: An introduction to Hyperledger. https: and Technologies, 2010.
//www.hyperledger.org/wp-content/uploads/2018/08/HL_Whitepaper_ [43] J. L. Hafner. Weaver codes: Highly fault tolerant erasure codes for storage systems.
IntroductiontoHyperledger.pdf USENIX FAST, 2005.
[6] E. Androulaki, C. Cachin, D. Dobre, and M. Vukolic. Erasure-coded Byzantine [44] J. L. Hafner. HoVer erasure codes for disk arrays. DSN, 2006.
storage with separate metadata. OPODIS 2014, pp. 76–90, 2014. [45] J. Hendricks, G. R. Ganger, and M. K. Reiter. Verifying distributed erasure-coded
[7] E. Androulaki et al. Hyperledger Fabric: a distributed operating system for per- data. PODC 2007, pp. 139–146, 2007.
missioned blockchains. EuroSys 2018. [46] J. Hendricks, G. R. Ganger, and M. K. Reiter. Low-overhead Byzantine fault-
[8] P-L. Aublin, R. Guerraoui, N. Knezevic, V. Quema, and M. Vukolic. The next 700 tolerant storage. SOSP 2007, 2007.
BFT protocols. TOCS, vol. 32, issue 4, January 2015. [47] M. Herlihy. Wait-free synchronization. ACM Transactions on Programming Lan-
[9] P-L. Aublin, S. Mokhtar, and V. Quema. RBFT: Redundant Byzantine fault toler- guages and Systems, 13(1):124–149, 1991.
ance. ICDCS 2013. [48] M. Herlihy and J. Wing. Linearizability: a correctness condition for concurrent
[10] J. Baek and Y. Zheng. Simple and efficient threshold cryptosystem from the gap objects. ACM Transactions on Programming Languages and Systems, 12(3):463–
Diffie-Hellman group. GLOBECOM ’03, pp. 1491–1495, 2003. 492, 1990.
[11] S. Behnel, et al. Cython: The best of both worlds. Computing in Science & Engi- [49] M. Herlihy, V. Luchangco, and M. Moir. Obstruction-free synchronization: Double-
neering, 13(2:31–39, 2011. ended queues as an example. Proceedings of the 23rd International Conference on
[12] M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message Distributed Computing Systems, pp. 522–529, IEEE Computer Society, 2003.
authentication. CRYPTO 1996. [50] Y. Hu, H. Chen, P. Lee, and Y. Tang. NCCloud: Applying network coding for the
[13] M. Bellare and P. Rogaway. Random oracles are practical: A paradigm for design- storage repair in a Cloud-of-Clouds. USENIX FAST, 2012.
ing efficient protocols. CCS 93, 1993. [51] C. Huang, M. Chen, and J. Li. Pyramid codes: Flexible schemes to trade space for
[14] M. Ben-Or, B. Kelmer, and T. Rabin. Asynchronous secure computations with access efficiency in reliable data storage systems. ACM Transactions on Storage
optimal resilience. PODC 94. (TOS), Volume 9 Issue 1, March 2013. Earlier version in NCA 2007.
[15] A. Bessani, E. Alchieri, M. Correia, and J. Fraga. DepSpace: A Byzantine fault- [52] C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, and S. Yekhanin.
tolerant coordination service. EuroSys ’08. Erasure coding in Windows Azure Storage. USENIX ATC’12, 2012.
[16] A. Bessani, J. Sousa, and E. Alchieri. State machine replication for the masses [53] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination
with BFT-SMART. DSN ’14. for Internet-scale systems. USENIX ATC 2010.
[17] A. Boldyreva. Efficient threshold signature, multisignature and blind signature [54] IBM Watson Health Announces Collaboration to Study the Use of Blockchain
schemes based on the gap-Diffie-Hellman-group signature scheme. PKC 2003. Technology for Secure Exchange of Healthcare Data. https://www-03.ibm.com/
[18] D. Boneh, X. Boyen, and S. Halevi. Chosen ciphertext secure public key threshold press/us/en/pressrelease/51394.wss
encryption without random oracles. CT-RSA, 2006. [55] IBM Announces Major Blockchain Solution to Speed Global Payments.
[19] G. Bracha. Asynchronous Byzantine agreement protocols. Information and Com- https://www-03.ibm.com/press/us/en/pressrelease/53290.wss
putation 75, pp. 130–143, 1987. [56] Iroha. https://github.com/hyperledger/iroha
[20] M. Burrows. The Chubby lock service for loosely-coupled distributed systems. [57] A. Kate, Y. Huang, and I. Goldberg. Distributed key generation in the wild. IACR
OSDI, 2006. Cryptology ePrint Archive 2012: 377 (2012).
[21] C. Cachin, K. Kursawe, F. Petzold, and V. Shoup. Secure and efficient asynchronous [58] O. Khan, R. Burns, J. Plank, W. Pierce, and C. Huang. Rethinking erasure codes
broadcast protocols (extended abstract). CRYPTO 2001. for cloud file systems: Minimizing I/O for recovery and degraded reads. USENIX
[22] C. Cachin, K. Kursawe, and V. Shoup. Random oracles in Constantinople: Practical FAST, 2012.
asynchronous Byzantine agreement using cryptography. Journal of Cryptology [59] R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: Speculative
18(3), 219–246. Byzantine fault tolerance. SOSP 2007.
[23] C. Cachin and J. Poritz. Secure Intrusion-tolerant replication on the Internet. DSN [60] H. Krawczyk. Distributed fingerprints and secure information dispersal. Proceed-
2002, pp. 167–176, 2002. ings of the 12th ACM Symposium on Principles of Distributed Computing, pp. 207–
[24] C. Cachin and S. Tessaro. Asynchronous verifiable information dispersal. SRDS 218, ACM Press, 1993.
2005. [61] Liberasurecode. https://github.com/openstack/liberasurecode
[25] C. Cachin and S. Tessaro. Optimal resilience for erasure-coded Byzantine dis- [62] L. Lamport. Concurrent reading and writing. Communications of the ACM 11(20),
tributed storage. DSN-DCCS 2006, pp. 115–124, 2006. 806–811, 1977.
[26] B. Calder et al. Windows Azure Storage: A highly available cloud storage service [63] L. Lamport. Time, clocks, and the ordering of events in a distributed system.
with strong consistency. ACM SOSP, 2011. Comm. ACM 21, 7 (July), 558–565, 1978.
[27] M. Castro and B. Liskov. Practical Byzantine fault tolerance and proactive recov- [64] L. Lamport. Using time instead of timeout for fault-tolerant distributed systems.
ery. ACM Trans. Comput. Syst, 20(4): 398–461, 2002. Trans. Prog. Lang. and Systems 6(2):254–280, 1984.
[28] A. Clement, E. Wong, L. Alvisi, M. Dahlin, and M. Marchetti. Making Byzantine [65] L. Lamport. On interprocess communication. Part I: Basic formalism. Distrib.
fault tolerant systems tolerate Byzantine faults. NSDI 2009. Comput. 1, 2, 77–85, 1986.
[29] J. Corbett et al. Spanner: Google’s globally-distributed database. OSDI, 2012. [66] L. Lamport. On interprocess communication. Part II: Algorithms. Distrib. Comput.
[30] Corda. https://github.com/corda/corda 1, 2, 86–101, 1986.
[31] J. Cowling et al. HQ replication: A hybrid quorum protocol for Byzantine fault [67] L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM
tolerance. OSDI 2006. Trans. on Programming Languages and Systems 4(3): 382–401, 1982.
[32] D. Dobre, G. Karame, W. Li, M. Majuntke, N. Suri, and M. Vukolic. PoWerStore: [68] Q. Lian, W. Chen, and Z. Zhang. On the impact of replica placement to the
Proofs of writing for efficient and robust storage. ACM CCS, 2013. reliability of distributed brick storage systems. ICDCS 2005, pp. 187–196, 2005.
[33] S. Duan, H. Meling, S. Peisert, and H. Zhang. BChain: Byzantine replication with [69] F. J. MacWilliams and N. J. A. Sloane. The Theory of Error Correcting Codes. Ams-
high throughput and embedded reconfiguration. OPODIS 2014. terdam, North-Holland, 1977.
[34] Sisi Duan, Sean Peisert, and Karl Levitt. hBFT: Speculative Byzantine fault toler- [70] A. Miller, Y. Xia, K. Croman, E. Shi, and D. Song. The honey badger of BFT
ance with minimum cost. IEEE Transaction on Dependable and Secure Computing, protocols. ACM CCS 16, 2016.
12(1): 58–70, 2015. [71] D. Moody, R. Peralta, R. Perlner, A. Regenscheid, A. Roginsky, and L. Chen. Report
[35] S. Duan, M. K. Reiter, and H. Zhang. Secure causal atomic broadcast, revisited. on pairing-based cryptography. Journal of Research of the National Institute of
DSN 2017. Standards and Technology, 2015.
[36] S. Duan and H. Zhang. Practical state machine replication with confidentiality. [72] A. Mostefaoui, H. Moumen, and M. Raynal. Signature-free asynchronous Byzan-
SRDS, 2016. tine consensus with t < n/3 and O(n 2 ) messages. PODC 2014.
[37] C. Dwork, N. Lynch, and L. Stockmeyer. Consensus in the presence of partial [73] J. Plank and K. Greenan. Jerasure 2.0. http://jerasure.org/jerasure-2.0/
synchrony. J. ACM 35(2): 288–323, 1988. [74] J. Plank, K. Greenan, and E. Miller. Screaming fast Galois field arith-
[38] M. Fischer, N. Lynch, and M. Paterson. Impossibility of distributed consensus metic using Intel SIMD instructions. FAST 2013, 2013. Latest version:
with one faulty process. J. ACM 32(2): 374–382, 1985. http://lab.jerasure.org/jerasure/gf-complete
2040
[75] J. Plank and L. Xu. Optimizing Cauchy Reed-Solomon codes for fault-tolerant We now prove the second part of availability. Following an anal-
network storage applications. NCA 2006. ogous line of the above argument, if a correct server completes
[76] PyECLib. https://pypi.python.org/pypi/PyECLib
[77] M. O. Rabin. Efficient dispersal of information for security, load balancing, and disperse, at least m + д0 (L − 1) correct servers stored consistent
fault tolerance. Journal of the ACM, 36(2):335–348, 1989. fragments. If f + 1 correct servers complete disperse, any client
[78] I. S. Reed and G. Solomon. Polynomial codes over certain finite fields. J. Soc.
Industrial Appl. Math, 1960.
that initiates read(i) for i ∈ [1..m] will receive f + 1 matching
[79] M. K. Reiter and K. Birman. How to securely replicate services. ACM TOPLAS, fingerprinted cross-checksums. If the fragment i happens to be
vol. 16 issue 3, pp. 986–1009, ACM, 1994. available or there is less than д0 failures in the local group, the
[80] M. Sathiamoorthy. et al. XORing elephants: novel erasure codes for big data.
Journal Proceedings of the VLDB Endowment volume 6, issue 5, pp. 325–336, 2013. fragment will be available for the client. Otherwise, another round
[81] F. Schneider. Implementing fault-tolerant services using the state machine ap- of interaction is needed, and the client will obtain m + д0 (L − 1)
proach: A tutorial. ACM Comput. Surveys 22(4): 299–319, 1990. consistent fragments and reconstruct the fragment needed.
[82] V. Shoup. On fast and provably secure message authentication based on universal
hashing. CRYPTO ’96, pages 313–328, 1996. We now prove correctness. We first claim that if some correct
[83] V. Shoup. Practical threshold signatures. EUROCRYPT 2000. server delivers fpcc1 and some correct server delivers fpcc2 , then
[84] V. Shoup. NTL: A library for doing number theory. http://shoup.net/ntl
[85] V. Shoup and R. Gennaro. Securing threshold cryptosystems against chosen
fpcc1 = fpcc2 . The proof is quite “standard” for a quorum based
ciphertext attack. EUROCRYPT ’98. protocol: if fpcc1 is delivered then m + д0 (L − 1) + f servers echoed
[86] SingularityNET. https://singularitynet.io/ fpcc1 , of which at least m +д0 (L − 1) is correct. The same applied to
fpcc2 . As a correct server will only echo once, there are at least 2m +
[87] J. Sousa, A. Bessani, and M. Vukolic. A Byzantine fault-tolerant ordering service
for the Hyperledger Fabric blockchain platform. DSN 2018.
[88] Tendermint core. https://github.com/tendermint/tendermint 2д0 (L − 1) + f servers echoed, which is larger than the total server
[89] H. Turki, F. Salgado, J. M. Camacho. HoneyLedgerBFT: Enabling (note that L ≥ 2 and 2f ≤ (д0 + д1 )). This leads to a contradiction.
Byzantine fault tolerance for the Hyperledger platform. Available:
https://www.semanticscholar.org/ Thus, any block decoded during retrieve or any fragment during
[90] R. van Renesse, C. Ho, and N. Schiper. Byzantine chain replication. OPODIS 2012. read is consistent with the same fpcc. By Theorem 3.4 in [45] and
[91] G. S. Veronese, M. Correia, A. N. Bessani, and L. C. Lung. Spin one’s wheels?
Byzantine fault tolerance with a spinning primary. SRDS 2009.
by Lemma 9.1, the probability that clients do not obtain the same
[92] Walmart, JD.com, IBM and Tsinghua University Launch a Blockchain Food Safety block or fragment(s) is negligible.
Alliance in China. https://www-03.ibm.com/press/us/en/pressrelease/53487.wss
[93] Z. Wilcox-O’Hearn. Zfec 1.5.2. https://pypi.python.org/pypi/zfec
[94] L. Zhou, F. B. Schneider, R. van Renesse. APSS: proactive secret sharing in asyn-
chronous systems. ACM Trans. Inf. Syst. Secur. 8(3): 259–286 (2005)
A CORRECTNESS PROOF
Proof of Theorem 9.2. Termination is simple, as in AVID-FP. If a
correct server initiates disperse, the server erasures codes the trans-
action, and sends fragments and the fingerprinted cross-checksum
to each server. As the server initiating disperse is correct, at least
n − f ≥ m +д0 (L − 1) + f correct servers receive disperse messages,
and send echo messages to all servers. Each server will eventually
receive m + д0 (L − 1) + f echo messages, and then sends a ready
message, if it has not done so. Each correct server will eventu-
ally receive at lest 2f + 1 ready messages, and will then store the
fingerprinted cross-checksum and complete.
Agreement follows exactly as in AVID-FP. If some correct server
completes disperse(M), then the server must have received 2f +
1 ready messages and at least f + 1 ready messages much have
come from correct servers. This means that all correct servers
will eventually receive ready messages from these correct servers.
As our protocol implements the amplification step as in all other
Bracha’s broadcast like broadcast, all correct servers will send ready
messages, and all of them will eventually receive at least 2f + 1
ready messages. Agreement thus follows.
We first prove the first part of availability. In our protocol, if a
correct server completes disperse, it must have received 2f +1 ready
messages, and at least one correct server received m + д0 (L − 1) + f
echo messages. Therefore, at least m + д0 (L − 1) correct servers
stored consistent fragments. According to the property of pyramid
codes, these fragments can be used to reconstruct the original block.
Accordingly, if f + 1 correct servers complete disperse, any client
that initiates retrieve will receive m +д0 (L − 1) consistent fragments
and f + 1 matching fingerprinted cross-checksums. The client can
then decode the fragments to generate some block.
2041

BEAT: Asynchronous BFT Made Practical: Sisi Duan Michael K. Reiter Haibin Zhang

Uploaded by

Document Informationclick to expand document information

Copyright:

Available Formats

BEAT: Asynchronous BFT Made Practical: Sisi Duan Michael K. Reiter Haibin Zhang

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

BEAT: Asynchronous BFT Made Practical: Sisi Duan Michael K. Reiter Haibin Zhang

Uploaded by

Copyright:

Available Formats

Session 10B: Protocols CCS’18, October 15-19, 2018, Toronto, ON, Canada

BEAT: Asynchronous BFT Made Practical

Discussed in New Techniques Features Applicability

erasure code, but we (intentionally) leave the decode algorithm

j=1 b 7j d j and d 7−2 = j=4 b 7j d j , respectively. Clearly, we have

1 gerBFT in both settings. This is expected since BEAT0 employs

You might also like