
DELPHI: A Cryptographic Inference Service for Neural Networks

Pratyush Mishra, Ryan Lehmkuhl, Akshayaram Srinivasan,


Wenting Zheng, Raluca Ada Popa

UC Berkeley

Abstract
Many companies provide neural network prediction services to users for a wide range of applications. However, current prediction systems compromise one party's privacy: either the user has to send sensitive inputs to the service provider for classification, or the service provider must store its proprietary neural networks on the user's device. The former harms the personal privacy of the user, while the latter reveals the service provider's proprietary model.

We design, implement, and evaluate DELPHI, a secure prediction system that allows two parties to execute neural network inference without revealing either party's data. DELPHI approaches the problem by simultaneously co-designing cryptography and machine learning. We first design a hybrid cryptographic protocol that improves upon the communication and computation costs over prior work. Second, we develop a planner that automatically generates neural network architecture configurations that navigate the performance-accuracy trade-offs of our hybrid protocol. Together, these techniques allow us to achieve a 22× improvement in online prediction latency compared to the state-of-the-art prior work.

1 Introduction

Recent advances in machine learning have driven increasing deployment of neural network inference in popular applications like voice assistants [Bar18] and image classification [Liu+17b]. However, the use of inference in many such applications raises privacy concerns. For example, home monitoring systems (HMS) such as Kuna [Kun] and Wyze [Wyz] use proprietary neural networks to classify objects in video streams of users' homes, such as cars parked near the user's house or faces of visitors to the house. These models are core to these companies' business and are expensive to train.

To make use of these models, either the user has to upload their streams to the servers of the HMS (which then evaluate the model over the stream), or the HMS has to store its model on the user's monitoring device (which then performs the classification). Both of these approaches are unsatisfactory: the first requires users to upload video streams containing sensitive information about their daily activities to another party, while the second requires the HMS to store its model on every device, thus allowing users and competitors to steal the proprietary model.

To alleviate these privacy concerns, a number of recent works have proposed protocols for cryptographic prediction over (convolutional) neural networks [Gil+16; Moh+17; Liu+17a; Juv+18] by utilizing specialized secure multi-party computation (MPC) [Yao86; Gol+87]. At a high level, these protocols proceed by encrypting the user's input and the service provider's neural network, and then tailor techniques for computing over encrypted data (like homomorphic encryption or secret sharing) to run inference over the user's input. At the end of the protocol execution, the intended party(-ies) learn the inference result; neither party learns anything else about the other's input. Fig. 1 illustrates this protocol flow.

Figure 1: Cryptographic neural network inference. The lock indicates data provided in encrypted form.

Unfortunately, these cryptographic prediction protocols are still unsuitable for deployment in real world applications, as they require the use of heavy cryptographic tools during the online execution. These tools are computationally intensive and often require a large amount of communication between the user and the service provider. Furthermore, this cost grows with the complexity of the model, making these protocols unsuitable for use with state-of-the-art neural network architectures used in practice today. For example, using a state-of-the-art protocol like GAZELLE [Juv+18] to perform inference for state-of-the-art deep neural networks like ResNet-32 [He+16] requires ∼82 seconds and results in over 560 MB of communication.

Our contribution. In this paper, we present DELPHI, a cryptographic prediction system for realistic neural network architectures. DELPHI achieves its performance via a careful co-design of cryptography and machine learning. DELPHI contributes a novel hybrid cryptographic prediction protocol, as well as a planner that can adjust the machine learning algorithm to take advantage of the performance-accuracy trade-offs of our protocol. Our techniques enable us to perform cryptographic prediction on more realistic network architectures than those considered in prior work. For example, using DELPHI for cryptographic prediction on ResNet-32 requires just 3.8 seconds and 60 MB of communication in the online phase, improving upon GAZELLE by 22× and 9×, respectively.

1.1 Techniques

We now describe at a high level the techniques underlying DELPHI's excellent performance.

Performance goals. Modern convolutional neural networks consist of a number of layers, each of which contains one sub-layer for linear operations, and one sub-layer for non-linear operations. Common linear operations include convolutions, matrix multiplication, and average pooling. Non-linear operations include activation functions such as the popular ReLU (Rectified Linear Unit) function.

Achieving cryptographic prediction for realistic neural networks thus entails (a) constructing efficient subprotocols for evaluating linear and non-linear layers, and (b) linking the results of these subprotocols with each other.

Prior work. Almost all prior protocols for cryptographic prediction utilize heavyweight cryptographic tools to implement these subprotocols, which results in computation and communication costs that are much higher than the equivalent plaintext costs. Even worse, many protocols utilize these tools during the latency-sensitive online phase of the protocol, i.e., when the user acquires their input and wishes to obtain a classification for it. (This is opposed to the less latency-sensitive preprocessing phase that occurs before the user's input becomes available.)

For example, the online phase of the state-of-the-art GAZELLE protocol uses heavy cryptography like linearly homomorphic encryption and garbled circuits. As we show in Section 7.4, this results in heavy preprocessing and online costs: for the popular network architecture ResNet-32 trained over CIFAR-100, GAZELLE requires ∼158 seconds and 8 GB of communication during the preprocessing phase, and ∼82 seconds and 600 MB of communication during the online phase.

1.1.1 DELPHI's protocol

To achieve good performance on realistic neural networks, DELPHI builds upon techniques from GAZELLE to develop new protocols for evaluating linear and non-linear layers that minimize the use of heavy cryptographic tools, and thus minimize communication and computation costs in the preprocessing and online phases. We begin with a short overview of GAZELLE's protocol as it is the basis for DELPHI's protocols.

Starting point: GAZELLE. GAZELLE [Juv+18] is a state-of-the-art cryptographic prediction system for convolutional neural networks. GAZELLE computes linear layers using an optimized linearly-homomorphic encryption (LHE) scheme [Elg85; Pai99; Reg09; Fan+12] that enables one to perform linear operations directly on ciphertexts. To compute non-linear layers, GAZELLE uses garbled circuits [Yao86] to compute the bitwise operations required by ReLU. Finally, because each layer in a neural network consists of alternating linear and non-linear layers, GAZELLE also describes how to efficiently switch back-and-forth between the two aforementioned primitives via a technique based on additive secret sharing.

As noted above, GAZELLE's use of heavy cryptography in the online phase leads to efficiency and communication overheads. To reduce these overheads, we proceed as follows.

Reducing the cost of linear operations. To reduce the online cost of computing the linear operations, we adapt GAZELLE to move the heavy cryptographic operations over LHE ciphertexts to the preprocessing phase. Our key insight is that the service provider's input M to the linear layer (i.e., the model weights for that layer) is known before the user's input is available, and so we can use LHE to create secret shares of M during preprocessing. Later, when the user's input becomes available in the online phase, all linear operations can be performed directly over secret-shared data without invoking heavy cryptographic tools like LHE, and without requiring interactions to perform matrix-vector multiplications.

The benefits of this technique are two-fold. First, the online phase only requires transmitting secret shares instead of ciphertexts, which immediately results in an 8× reduction in online communication for linear layers. Second, since the online phase only performs computations over elements of prime fields, and since our system uses concretely small 32-bit primes for this purpose, our system can take advantage of state-of-the-art CPU and GPU libraries for computing linear layers; see Section 7.2 and Remark 4.2 for details.

Reducing the cost of non-linear operations. While the above technique already significantly reduces computation time and communication cost, the primary bottleneck for both remains the cost of evaluating garbled circuits for the ReLU activation function. To minimize this cost, we use an alternate approach [Gil+16; Liu+17a; Moh+17; Cho+18] that is better suited to our setting of computing over finite field elements: computing polynomials. In more detail, DELPHI replaces ReLU activations with polynomial (specifically, quadratic) approximations. These can be computed securely and efficiently via standard protocols [Bea95].

Because these protocols only require communicating a small constant number of field elements per multiplication, using quadratic approximations significantly reduces the communication overhead per activation, without introducing additional rounds of communication. Similarly, since the underlying multiplication protocol only requires a few cheap finite field operations, the computation cost is also reduced by several orders of magnitude. Concretely, the online communication and computation costs of securely computing quadratic approximations are 192× and 10000× smaller (respectively) than the corresponding costs for garbled circuits.

However, this performance improvement comes at the cost of accuracy and trainability of the underlying neural network. Prior work has already established that quadratic approximations provide good accuracy in some settings [Moh+17; Liu+17a; Gho+17; Cho+18]. At the same time, both prior work [Moh+17] and our own experiments indicate that in many settings simply replacing ReLU activations with quadratic approximations results in severely degraded accuracy, and can increase training time by orders of magnitude (if training converges at all). To overcome this, we develop a hybrid cryptographic protocol that uses ReLUs and quadratic approximations to achieve good accuracy and good efficiency.

Planning an efficient usage of the hybrid cryptographic protocol. It turns out that it is not straightforward to determine which ReLU activations should be replaced with quadratic approximations. Indeed, as we explain in Section 5, simply replacing arbitrary ReLU activations with quadratic approximations can degrade the accuracy of the resulting network, and can even cause the network to fail to train.

So, to find an appropriate placement or network configuration, we design a planner that automatically discovers which ReLUs to replace with quadratic approximations so as to maximize the number of approximations used while still ensuring that accuracy remains above a specified threshold.

The insight behind our planner is to adapt techniques for neural architecture search (NAS) and hyperparameter optimization (see [Els+19; Wis+19] for in-depth surveys of these areas) to our setting. Namely, we adapt these techniques to discover which layers to approximate within a given neural network architecture, and to optimize the hyperparameters for the discovered network. See Section 5 for details.

The overall system. DELPHI combines the above insights into a cohesive system that service providers can use to automatically generate cryptographic prediction protocols meeting performance and accuracy criteria specified by the provider. In more detail, the service provider invokes DELPHI's planner with acceptable accuracy and performance thresholds. The planner outputs an optimized architecture that meets this goal, which DELPHI then uses to instantiate a concrete cryptographic prediction protocol that utilizes our cryptographic techniques from above.

This co-design of cryptography and machine learning enables DELPHI to efficiently provide cryptographic prediction for networks deeper than any considered in prior work. For example, in Section 7 we show that using DELPHI to provide inference for the popular ResNet-32 architecture requires only 60 MB of communication and 3.8 seconds.

Figure 2: DELPHI's architecture. Orange layers represent quadratic approximations while blue ones represent ReLUs.

2 System overview

2.1 System setup

There are two parties in the system setup: the client and the service provider (or server). In the plaintext version of our system, the service provider provides prediction as a service using its internal models via an API. The client uses this API to run prediction on its own data by transferring its data to the service provider. The service provider runs prediction using the appropriate neural network, then sends the prediction result back to the client. In DELPHI, the two parties execute a secure prediction together by providing their own inputs. The service provider's input is the neural network, while the client's input is its private input used for prediction.

2.2 Threat model

DELPHI's threat model is similar to that of prior secure prediction works such as GAZELLE [Juv+18] and MiniONN [Liu+17a]. More specifically, DELPHI is designed for the two-party semi-honest setting, where only one of the parties is corrupted by an adversary. Furthermore, this adversary never deviates from the protocol, but it will try to learn information about the other parties' private inputs from the messages it receives.

2.3 Privacy goals

DELPHI's goal is to enable the client to learn only two pieces of information: the architecture of the neural network, and the result of the inference; all other information about the client's private inputs and the parameters of the server's neural network model should be hidden. Concretely, we aim to achieve a strong simulation-based definition of security; see Definition 4.1.

Like all prior work, DELPHI does not hide information about the architecture of the network, such as the dimensions and type of each layer in the network. For prior work, this is usually not an issue because the architecture is independent of the training data. However, because DELPHI's planner uses training data to optimally place quadratic approximations, revealing the network architecture reveals some information about the data. Concretely, in optimizing an ℓ-layer network, the planner makes ℓ binary choices, and thus reveals at most ℓ bits of information about the training data. Because ℓ is concretely small for actual networks (for example, ℓ = 32 for ResNet-32), this leakage is negligible. This leakage can be further mitigated by using differentially private training algorithms [Sho+15; Aba+16].

DELPHI, like most prior systems for cryptographic prediction, does not hide information that is revealed by the result of the prediction. In our opinion, protecting against attacks that exploit this leakage is a complementary problem to that solved by DELPHI. Indeed, such attacks have been successfully carried out even against systems that "perfectly" hide the model parameters by requiring the client to upload its input to the server [Fre+14; Ate+15; Fre+15; Wu+16b; Tra+16]. Furthermore, popular mitigations for these attacks, such as differential privacy, can be combined with DELPHI's protocol. We discuss these attacks and possible mitigations in more detail in Section 8.2.

2.4 System architecture and workflow

DELPHI's architecture consists of two components: a hybrid cryptographic protocol for evaluating neural networks, and a neural network configuration planner that optimizes a given neural network for use with our protocol. Below we provide an overview of these components, and then demonstrate how one would use these in practice by describing an end-to-end workflow for cryptographic prediction in home monitoring systems (HMS).

Hybrid cryptographic protocol. DELPHI's protocol for cryptographic prediction consists of two phases: an offline preprocessing phase, and an online inference phase. The offline preprocessing phase is independent of the client's input (which regularly changes), but assumes that the server's model is static; if this model changes, then both parties would have to re-run the preprocessing phase. After preprocessing, during the online inference phase, the client provides its input to our specialized secure two-party computation protocol, and eventually learns the inference result. We note that our protocol provides two different methods of evaluating non-linear layers: the first offers better accuracy at the cost of worse offline and online efficiency, while the other degrades accuracy, but offers much improved offline and online efficiency.

Planner. To help service providers navigate the trade-off between performance and accuracy offered by these two complementary methods to evaluate non-linear layers, DELPHI adopts a principled approach by designing a planner that generates neural networks that mix these two methods to maximize efficiency while still achieving the accuracy desired by the service provider. Our planner applies neural architecture search (NAS) to the cryptographic setting in a novel way in order to automatically discover the right architectures.

Example 2.1 (HMS workflow). As explained in Section 1, a home monitoring system (HMS) enables users to surveil activity inside and outside their houses. Recent HMSes [Kun; Wyz] use neural networks to decide whether a given activity is malicious or not. If it is, they alert the user. In this setting privacy is important for both the user and the HMS provider, which makes DELPHI an ideal fit. To use DELPHI to provide strong privacy, the HMS provider proceeds as follows.

The HMS provider first invokes DELPHI's planner to optimize its baseline all-ReLU neural network model. Then, during the HMS device's idle periods, the device and the HMS server run the preprocessing phase for this model. If the device detects suspicious activity locally, it can run the online inference phase to obtain a classification. On the basis of this result, it can decide whether to alert the user or not.

Remark 2.2 (applications suitable for use with DELPHI). Example 2.1 indicates that DELPHI is best suited for applications where there is ample computational power available for preprocessing, and where inference is latency-sensitive, but is not performed frequently enough to deplete the reserve of preprocessed material. Other examples of such applications include image classification in systems like Google Lens [Goo].

3 Cryptographic primitives

In this section, we provide a high-level description of the cryptographic building blocks used in DELPHI; this high-level description suffices to understand our protocols. We provide formal definitions of security properties in Appendix A, and only provide high-level intuitions here.

Garbled circuits. Garbled circuits (GC), introduced in the seminal work of Yao [Yao86], are a method of encoding a boolean circuit C and its input x such that, given the encoded circuit and the encoded input, an evaluator can use a special evaluation procedure to obtain the output C(x) while ensuring that the evaluator learns nothing else about C or x. We now describe this notion in more detail.

A garbling scheme [Yao86; Bel+12] is a tuple of algorithms GS = (Garble, Eval) with the following syntax:
• GS.Garble(C) → (C̃, {label_{i,0}, label_{i,1}}_{i∈[n]}). On input a boolean circuit C, Garble outputs a garbled circuit C̃ and a set of labels {label_{i,0}, label_{i,1}}_{i∈[n]}. Here label_{i,b} represents assigning the value b ∈ {0, 1} to the i-th input label.
• GS.Eval(C̃, {label_{i,x_i}}) → y. On input a garbled circuit C̃ and labels {label_{i,x_i}} corresponding to an input x ∈ {0, 1}^n, Eval outputs a string y = C(x).

We provide a formal definition in Appendix A, and briefly describe here the key properties satisfied by garbling schemes. First, GS must be complete: the output of Eval must equal C(x). Second, it must be private: given C̃ and {label_{i,x_i}}, the evaluator should not learn anything about C or x except the size of C (denoted by 1^|C|) and the output C(x).

Linearly homomorphic public-key encryption. A linearly homomorphic encryption scheme [Elg85; Pai99] is a public key encryption scheme that additionally supports (only) linearly homomorphic operations on the ciphertexts. To give more details, a linearly homomorphic encryption consists of a tuple of algorithms HE = (KeyGen, Enc, Dec, Eval) with the following syntax:
• HE.KeyGen → (pk, sk). HE.KeyGen is a randomized algorithm that outputs a public key pk and a secret key sk.
• HE.Enc(pk, m) → c. On input the public key pk and a message m, the encryption algorithm HE.Enc outputs a ciphertext c. The message space is a finite ring R.

• HE.Dec(sk, c) → m. On input the secret key sk and a ciphertext c, the decryption algorithm HE.Dec outputs the message m contained in c.
• HE.Eval(pk, c1, c2, L) → c′. On input the public key pk, two ciphertexts c1, c2 encrypting messages m1 and m2, and a linear function L (which maps (m1, m2) to a·m1 + m2 for some a ∈ R), HE.Eval outputs a new ciphertext c′ encrypting L(m1, m2).

Informally, we require HE to satisfy the following properties:
• Correctness. HE.Dec, on input sk and a ciphertext c := HE.Enc(pk, m), outputs m.
• Homomorphism. HE.Dec, on input sk and a ciphertext c := HE.Eval(pk, HE.Enc(pk, m1), HE.Enc(pk, m2), L), outputs L(m1, m2).
• Semantic security. Given a ciphertext c and two messages of the same length, no attacker should be able to tell which message was encrypted in c.
• Function privacy. Given a ciphertext c, no attacker can tell what homomorphic operations led to c.

Oblivious transfer. An oblivious transfer protocol [Rab81; Eve+82; Ish+03] is a protocol between two parties, a sender who has as input two messages m0, m1, and a receiver who has as input a bit b. At the end of the protocol, the receiver learns mb. The security requirement states that the sender does not learn anything about the bit b, and the receiver does not learn anything about the string m_{1−b}.

Additive secret sharing. Given a finite ring R and an element x ∈ R, a 2-of-2 additive secret sharing of x is a pair ([x]_1, [x]_2) = (x − r, r) ∈ R² (so that x = [x]_1 + [x]_2), where r is a random element from the ring. Additive secret sharing is perfectly hiding, i.e., given a share [x]_1 or [x]_2, the value x is perfectly hidden.
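For concreteness, the following is a minimal sketch of 2-of-2 additive secret sharing over a prime field in Python (the modulus used here is the 41-bit prime mentioned in Remark 4.2; the helper names are ours, not DELPHI's API):

```python
import secrets

P = 2061584302081  # 41-bit prime from Remark 4.2

def share(x, p=P):
    """Split x into additive shares ([x]_1, [x]_2) = (x - r, r) with [x]_1 + [x]_2 = x (mod p)."""
    r = secrets.randbelow(p)          # uniformly random ring element
    return (x - r) % p, r

def reconstruct(s1, s2, p=P):
    """Recover x from both shares."""
    return (s1 + s2) % p

x = 123456789
x1, x2 = share(x)
assert reconstruct(x1, x2) == x       # either share alone is uniformly random and reveals nothing
```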
Beaver's multiplication triples. Beaver's multiplication triples [Bea95] generation procedure is a two-party protocol that securely computes the following function: sample a, b ← R and return [a]_1, [b]_1, [ab]_1 to the first party and [a]_2, [b]_2, [ab]_2 to the second party. In this work, we will generate Beaver's triples using a linearly homomorphic encryption scheme; we provide further details in Appendix A.

Beaver's multiplication procedure. Let P1 and P2 be two parties who hold [x]_1, [y]_1 and [x]_2, [y]_2 respectively, where x, y are some ring elements. Additionally, let us assume that P1 and P2 also hold a Beaver's multiplication triple, namely, ([a]_1, [b]_1, [ab]_1) and ([a]_2, [b]_2, [ab]_2) respectively. Beaver's multiplication procedure is a secure protocol such that at the end of the protocol, parties P1 and P2 hold an additive secret sharing of xy. We provide details of this protocol in Appendix A, but note here that this protocol can be used to securely evaluate any polynomial.
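The following sketch shows the arithmetic of Beaver's procedure: given shares of x and y and one precomputed triple, the parties exchange only the masked values d = x − a and e = y − b and then locally obtain shares of xy. Both parties and the triple dealer are simulated in one process here, purely to illustrate the algebra; this is not DELPHI's implementation.

```python
import secrets

P = 2061584302081

def share(x, p=P):
    r = secrets.randbelow(p)
    return (x - r) % p, r

def open_value(s1, s2, p=P):
    return (s1 + s2) % p

def beaver_triple(p=P):
    """Stand-in for the triple generation protocol (DELPHI generates triples with LHE; Appendix A)."""
    a, b = secrets.randbelow(p), secrets.randbelow(p)
    return share(a, p), share(b, p), share(a * b % p, p)

def beaver_multiply(x_shares, y_shares, triple, p=P):
    """Return additive shares of x*y given shares of x, y and one Beaver triple."""
    (a1, a2), (b1, b2), (c1, c2) = triple
    (x1, x2), (y1, y2) = x_shares, y_shares
    # The only online communication: each party reveals its share of x - a and y - b.
    d = open_value((x1 - a1) % p, (x2 - a2) % p, p)
    e = open_value((y1 - b1) % p, (y2 - b2) % p, p)
    # xy = ab + d*b + e*a + d*e; the constant term d*e is added by one party only.
    z1 = (c1 + d * b1 + e * a1 + d * e) % p
    z2 = (c2 + d * b2 + e * a2) % p
    return z1, z2

z1, z2 = beaver_multiply(share(1234), share(5678), beaver_triple())
assert open_value(z1, z2) == 1234 * 5678 % P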
4 Cryptographic protocols

In DELPHI, we introduce a hybrid cryptographic protocol for cryptographic prediction (see Fig. 4). Our protocol makes two key improvements to protocols proposed in prior work like MiniONN [Liu+17a] and GAZELLE [Juv+18]. First, DELPHI splits the protocol into a preprocessing phase and an online phase such that most of the heavy cryptographic computation is performed in the preprocessing phase. Second, DELPHI introduces two different methods of evaluating non-linear functions that provide users with trade-offs between accuracy and performance. The first method uses garbled circuits to evaluate the ReLU activation function, while the second method securely evaluates polynomial approximations of the ReLU. The former provides maximum accuracy but is inefficient, while the latter is computationally cheap but lowers accuracy. (We note that below we describe a protocol for evaluating any polynomial approximation, but in the rest of the paper we restrict ourselves to quadratic approximations because these are maximally efficient.)

Notation. Let R be a finite ring. Let HE = (KeyGen, Enc, Dec, Eval) be a linearly homomorphic encryption over the plaintext space R. The server holds a model M consisting of ℓ layers M1, . . . , Mℓ. The client holds an input vector x ∈ R^n.

We now give the formal definition of a cryptographic prediction protocol. Intuitively, the definition guarantees that after the protocol execution, a semi-honest client (i.e., one that follows the specification of the protocol) only learns the architecture of the neural network and the result of the inference; all other information about the parameters of the server's neural network model is hidden. Similarly, a semi-honest server does not learn any information about the client's input, not even the output of the inference.

Definition 4.1. A protocol Π between a server having as input model parameters M = (M1, . . . , Mℓ) and a client having as input a feature vector x is a cryptographic prediction protocol if it satisfies the following guarantees.
• Correctness. On every set of model parameters M that the server holds and every input vector x of the client, the output of the client at the end of the protocol is the correct prediction M(x).
• Security:
  – Corrupted client. We require that a corrupted, semi-honest client does not learn anything about the server's network parameters M. Formally, we require the existence of an efficient simulator Sim_C such that View_C^Π ≈_c Sim_C(x, out), where View_C^Π denotes the view of the client in the execution of Π (the view includes the client's input, randomness, and the transcript of the protocol), and out denotes the output of the inference.
  – Corrupted server. We require that a corrupted, semi-honest server does not learn anything about the private input x of the client. Formally, we require the existence of an efficient simulator Sim_S such that View_S^Π ≈_c Sim_S(M), where View_S^Π denotes the view of the server in the execution of Π.

Figure 3: DELPHI's preprocessing phase.

Figure 4: DELPHI's online phase.

The DELPHI protocol proceeds in two phases: the preprocessing phase and the online phase, and we give the details of both these phases in the subsequent sections.

4.1 Preprocessing phase

During preprocessing, the client and the server pre-compute data that can be used during the online execution. This phase can be executed independent of the input values, i.e., DELPHI can run this phase before either party's input is known.

1. The client runs HE.KeyGen to obtain a public key pk and a secret key sk.
2. For every i ∈ [ℓ], the client and the server choose random masking vectors ri, si ← R^n respectively.
3. The client sends HE.Enc(pk, ri) to the server. The server computes HE.Enc(pk, Mi·ri − si) using the HE.Eval procedure and sends this ciphertext to the client.
4. The client decrypts the above ciphertexts to obtain (Mi·ri − si) for each layer. The server holds si for each layer, and thus the client and the server hold an additive secret sharing of Mi·ri (see the sketch after this list).
5. This step depends on the activation type:
   (a) ReLU: The server constructs C̃ by garbling the circuit C described in Fig. 5. It sends C̃ to the client and, simultaneously, the server and the client exchange labels for the input wires corresponding to ri+1 and Mi·ri − si via an oblivious transfer (OT).
   (b) Polynomial approximations: The client and the server run the Beaver's triples generation protocol to generate a number of Beaver's multiplication triples. (The exact number of triples generated depends on the number of layers that have to be approximated using a polynomial.)
4.2 Online

The online phase is divided into two stages: the setup and the layer evaluation.

4.2.1 Setup

The client, on input x, sends x − r1 to the server. The server and the client now hold an additive secret sharing of x.

4.2.2 Layer evaluation

At the beginning of the i-th layer, the client holds ri and the server holds xi − ri, where xi is the vector obtained by evaluating the first (i − 1) layers of the neural network on input x (with x1 set to x). This invariant will be maintained for each layer. We now describe the protocol for evaluating the i-th layer, which consists of linear functions and activation functions.

Linear layer. The server computes Mi(xi − ri) + si, which ensures that the client and the server hold an additive secret sharing of Mi·xi.

Non-linear layer. After the linear functions, the server holds Mi(xi − ri) + si and the client holds Mi·ri − si. There are two ways of evaluating non-linear layers: garbled circuits for ReLU, or Beaver's multiplication for polynomial approximation:

• Garbled circuits
  1. The server sends the garbled labels corresponding to Mi(xi − ri) + si to the client.
  2. The client evaluates the garbled circuit C̃ using the above labels as well as the labels obtained via OT (in the offline phase) to obtain a one-time pad ciphertext OTP(xi+1 − ri+1). It then sends this output to the server.
  3. The server uses the one-time pad key to obtain xi+1 − ri+1.

• Polynomial approximation
  1. The client and the server run the Beaver's multiplication procedure to evaluate the polynomial approximating this layer. At the end of the procedure, the client holds [xi+1]_1 and the server holds [xi+1]_2.
  2. The client computes [xi+1]_1 − ri+1 and sends it to the server. The server adds [xi+1]_2 to this value to obtain xi+1 − ri+1.

Output layer. The server sends xℓ − rℓ to the client, who adds this with rℓ to learn xℓ.
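The following end-to-end sketch traces one online layer under the invariant above (server holds xi − ri and si, client holds ri and Mi·ri − si), using a toy "activation" that simply squares each element with a Beaver triple so the arithmetic stays visible; the real quadratic approximation is a general degree-2 polynomial, and all masking and message exchange is simulated locally here.

```python
import secrets

P = 2061584302081

def rand_vec(n): return [secrets.randbelow(P) for _ in range(n)]
def mat_vec(M, v): return [sum(a * b for a, b in zip(row, v)) % P for row in M]

def share(x):
    r = secrets.randbelow(P)
    return (x - r) % P, r

def beaver_square(u1, u2):
    """Shares of u^2 from shares (u1, u2) of u, using one freshly dealt Beaver triple."""
    a = secrets.randbelow(P)
    a1, a2 = share(a)
    c1, c2 = share(a * a % P)
    d = (u1 - a1 + u2 - a2) % P                       # opened masked value d = u - a
    return (c1 + 2 * d * a1 + d * d) % P, (c2 + 2 * d * a2) % P   # u^2 = a^2 + 2da + d^2

def online_layer(M, x_minus_r, r, s, Mr_minus_s, r_next):
    """One online layer: linear part over shares, then the toy square activation."""
    server_lin = [(v + si) % P for v, si in zip(mat_vec(M, x_minus_r), s)]  # M(x-r)+s
    client_lin = Mr_minus_s                                                 # M*r - s
    out_client, out_server = [], []
    for u_s, u_c in zip(server_lin, client_lin):       # the two shares of each (Mx)_j
        z_c, z_s = beaver_square(u_c, u_s)
        out_client.append(z_c)
        out_server.append(z_s)
    # Client re-masks its share with r_{i+1}; adding the server's share restores the
    # invariant: the server now holds x_{i+1} - r_{i+1} and the client holds r_{i+1}.
    msg = [(zc - rn) % P for zc, rn in zip(out_client, r_next)]
    return [(m + zs) % P for m, zs in zip(msg, out_server)]

# Quick check of the invariant for a 2x2 layer:
M, x = [[1, 2], [3, 4]], [5, 6]
r, s, r_next = rand_vec(2), rand_vec(2), rand_vec(2)
Mr_minus_s = [(v - si) % P for v, si in zip(mat_vec(M, r), s)]   # from preprocessing
x_minus_r = [(xi - ri) % P for xi, ri in zip(x, r)]              # from the setup phase
srv_out = online_layer(M, x_minus_r, r, s, Mr_minus_s, r_next)
expected = [pow(v, 2, P) for v in mat_vec(M, x)]                 # (Mx)^2, the toy activation
assert [(o + rn) % P for o, rn in zip(srv_out, r_next)] == expected
```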
Hardwired: a random one-time pad key.
Input: Mi(xi − ri) + si, ri+1, Mi·ri − si.
  1. Compute Mi·xi = (Mi(xi − ri) + si) + (Mi·ri − si).
  2. Compute ReLU(Mi·xi) to obtain xi+1.
  3. Compute xi+1 − ri+1 and output OTP(xi+1 − ri+1).

Figure 5: A circuit that computes ReLU.
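In plaintext terms, the functionality that the garbled circuit of Fig. 5 computes is sketched below. This is only an illustration: the real circuit works on bit decompositions of field elements, and the one-time pad and the "negative values live in the top half of the field" encoding shown here are stand-in assumptions.

```python
P = 2061584302081

def relu_circuit(masked_share, client_share, r_next, otp_key):
    """Plaintext functionality of the circuit in Fig. 5.
    masked_share = M_i(x_i - r_i) + s_i   (server's input, provided as garbled labels)
    client_share = M_i * r_i - s_i        (client's input, labels obtained via OT offline)
    r_next       = r_{i+1}                (client's input, labels obtained via OT offline)
    otp_key      = one-time pad key hardwired by the server at garbling time
    """
    Mx = (masked_share + client_share) % P        # step 1: reconstruct M_i * x_i
    x_next = Mx if Mx < P // 2 else 0             # step 2: ReLU, assuming top-half-is-negative encoding
    return (x_next - r_next + otp_key) % P        # step 3: output OTP(x_{i+1} - r_{i+1})
```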
(c) For the secure protocol to generate the Beaver’s
Remark 4.2 (fixed-point arithmetic in finite fields). The dis- triples, Sim runs the corresponding simulator for this
cussion so far assumes arithmetic over a finite ring. However, procedure.
popular implementations of neural network inference perform 3. Online phase. In the preamble phase, Sim receives x − r1 .
arithmetic over floating-point numbers. We work around this It sends x to the ideal functionality (a semi-honest client
by using fixed-point representations of floating-point num- uses the same x as its input) and receives the output y. Sim
bers, and embedding this fixed-point arithmetic in our ring performs the layer evaluation as follows:
arithmetic.
(a) Garbled circuits layer. Sim sends the simulated la-
Concretely, our implementation works over the 41-bit
bels.
prime finite field defined by the prime 2061584302081, and
(b) Polynomial approximation layer. Sim uses the sim-
uses a 11-bit fixed-point representation. This choice of pa-
ulator for the Beaver’s multiplication procedure to
rameters enables a single multiplication of two fixed-point
evaluate the polynomial.
numbers before the result overflows capacity of the prime
field. To prevent values from growing exponentially with the 4. Output layer. Sim sends y − r` to the client.
number of multiplications (and thus overflowing), we use a
In Appendix B, we show that the simulated distribution is com-
trick from [Moh+17] that allows us to simply truncate the
putationally indistinguishable to the real world distribution
extra LSBs of fixed-point values. This trick works even when
using the security of the underlying cryptographic building
the result is secret-shared, albeit at the cost of a 1-bit error.
blocks.
Similarly to Slalom [Tra+19], our choice of prime field also
enables us to losslessly embed our field arithmetic in 64-bit 4.3.2 Server is corrupted
floating point arithmetic. In more detail, 64-bit floating point The simulator Sim, when provided with the server’s input
numbers can represent all integers in the range 2−53 , . . . , 253 . M1 , . . . , M`−1 , proceeds as follows.
Because the online phase of our protocol for linear layers
1. Sim chooses an uniform random tape for the server.
requires multiplication of a fixed-point matrix by a secret
2. In the offline phase:
shared vector, the result is a ∼ 52-bit integer, and hence can
be represented with full precision in a 64-bit floating point (a) Sim chooses a public key pk for a linearly homomor-
number. This enables our implementation to use state-of-the- phic encryption scheme. It then sends HE.Enc(pk, 0)
art CPU and GPU libraries for linear algebra. to the server. In return, it receives the homomorphi-
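A sketch of the fixed-point encoding described above (11 fractional bits embedded in the 41-bit prime field, with truncation after each multiplication). The helper names are ours; the local truncation shown is the plaintext version of the secret-shared trick from [Moh+17].

```python
P = 2061584302081          # 41-bit prime from Remark 4.2
SCALE = 1 << 11            # 11-bit fixed-point scale

def encode(f):
    """Map a float to a field element; negatives wrap into the top half of the field."""
    return round(f * SCALE) % P

def decode(v):
    """Map a field element back to a float (top half interpreted as negative)."""
    signed = v - P if v > P // 2 else v
    return signed / SCALE

def fix_mul(a, b):
    """Multiply two fixed-point values and truncate the extra 11 low-order bits."""
    prod = a * b % P                        # scale is now 2^22: one product fits in the field
    signed = prod - P if prod > P // 2 else prod
    return (signed >> 11) % P               # drop extra fractional bits (up to 1 LSB of error)

x, y = encode(1.5), encode(-2.25)
assert abs(decode(fix_mul(x, y)) - (1.5 * -2.25)) < 2 / SCALE
```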
4.3 Security

Theorem 4.3. Assuming the existence of garbled circuits, linearly homomorphic encryption, and secure protocols for Beaver's triples generation and multiplication procedure, the protocol described above is a cryptographic prediction protocol (see Definition 4.1).

Proof. Below we describe simulators first for the case where the client is corrupted, and then for the case where the server is corrupted. We provide a hybrid argument that relies on these simulators in Appendix B.

4.3.1 Client is corrupted

The simulator Sim, when provided with the client's input x, proceeds as follows:

1. Sim chooses a uniform random tape for the client.
2. In the offline phase:
   (a) Sim receives the public key and the ciphertext HE.Enc(pk, ri) from the client. In return, it sends HE.Enc(pk, −s′_i) for a randomly chosen s′_i from R^n.
   (b) Sim uses the simulator for garbled circuits, Sim_GS, runs it on 1^λ, 1^|C|, and sets the output of the circuit to be a random value. Sim_GS outputs C̃, {label_i}. For the i-th OT execution, Sim gives label_i in both slots as input. It sends C̃ to the client.
   (c) For the secure protocol to generate the Beaver's triples, Sim runs the corresponding simulator for this procedure.
3. Online phase. In the preamble phase, Sim receives x − r1. It sends x to the ideal functionality (a semi-honest client uses the same x as its input) and receives the output y. Sim performs the layer evaluation as follows:
   (a) Garbled circuits layer. Sim sends the simulated labels.
   (b) Polynomial approximation layer. Sim uses the simulator for the Beaver's multiplication procedure to evaluate the polynomial.
4. Output layer. Sim sends y − rℓ to the client.

In Appendix B, we show that the simulated distribution is computationally indistinguishable from the real-world distribution using the security of the underlying cryptographic building blocks.

4.3.2 Server is corrupted

The simulator Sim, when provided with the server's input M1, . . . , Mℓ−1, proceeds as follows.

1. Sim chooses a uniform random tape for the server.
2. In the offline phase:
   (a) Sim chooses a public key pk for a linearly homomorphic encryption scheme. It then sends HE.Enc(pk, 0) to the server. In return, it receives the homomorphically evaluated ciphertext from the server.
   (b) For every oblivious transfer execution where Sim acts as the receiver, it uses a junk input, say 0, as the receiver's choice bit. It receives C̃ from the server.
(c) For the secure protocol for generating the Beaver’s The foregoing is a brief description that omits many details.
triples, Sim runs the corresponding simulator for this Below, we describe how we solved the challenges that re-
procedure. quired solving to adapt NAS to this setting (Section 5.1), our
3. Online phase. In the preamble phase, Sim sends r1 for an concrete choice of NAS algorithm (Section 5.2), and detailed
uniformly chosen r1 . Sim performs the layer evaluation pseudocode for the final algorithm (Fig. 6).
step as follows: 5.1 Adapting NAS for D ELPHI’s planner
(a) Garbled circuits layer. Sim sends a random value
Challenge 1: Training candidate networks. Prior work
back to the server.
[Moh+17; Gil+16; Gho+17; Cho+18] and our own experi-
(b) Polynomial approximation layer. Sim uses the sim-
ments indicate that networks that use quadratic approxima-
ulator for the Beaver’s multiplication procedure to
tions are challenging to train and deploy: the quadratic acti-
evaluate the polynomial. At the last round of this step,
vations cause the underlying gradient descent algorithm to
it sends a random value back to the server.
diverge, resulting in poor accuracy. Intuitively, we believe
In Appendix B, we show that the simulated distribution is that this behavior is caused by these functions’ large and
indistinguishable from the real world distribution using the alternating gradients.
security of the underlying cryptographic primitives. To solve this issue, we used the following techniques:
• Gradient and activation clipping: During training, we mod-
5 Planner ify our optimizer to use gradient value clipping, which
D ELPHI’s planner takes the service provider’s neural network helps prevent gradients from exploding [Ben+94]. In par-
model (as well as other constraints) and produces a new neural ticular, we clip the values of all gradients to be less than
network architecture that meets the accuracy and efficiency 2. We furthermore modify our networks to use the ReLU6
goals of the service provider. At the heart of this planner is an activation function [Kri10] that ensures that post-activation
algorithm for neural architecture search (NAS) that enables values have magnitude at most 6. This keeps errors from
the service provider to automatically find such network ar- compounding during both inference and training.
chitectures. Below we give a high level overview of this key • Gradual activation exchange: Our experiments determined
component and describe how our planner uses it. that despite clipping, the gradients were still exploding
Background: neural architecture search. Recently, ma- quickly, especially in deeper networks that contained a
chine learning research has seen rapid advancement in the area higher fraction of approximations. To overcome this, we
of neural architecture search (NAS) [Els+19; Wis+19]. The made use of the following insight: intuitively, ReLU6 and
goal of NAS is to automatically discover neural network ar- (clipped) quadratic approximations to ReLU should share
chitectures that best satisfy a set of user-specified constraints. relatively similar gradients, and so it should be possible to
Most NAS algorithms do so by (partially) training a number use ReLU6 to initially guide the descent towards a stable
of different neural networks, evaluating their accuracy, and region where gradients are smaller, and then to use the ap-
picking the best-performing ones. proximation’s gradients to make fine-grained adjustments
Overview of our planner. D ELPHI’s planner, when given within this region.
as input the baseline all-ReLU neural network, operates in We take advantage of this insight by modifying the train-
two modes. When retraining is either not possible or unde- ing process to gradually transform an already-trained all-
sirable (for example if the training data is unavailable or if ReLU6 network into a network with the required number
the provider cannot afford the extra computation required and placement of quadratic approximations. In more de-
for NAS), the planner operates in the first mode and simply tail, our training process expresses each activation as a
outputs the baseline network. If retraining (and hence NAS) weighted average of quadratic and ReLU6 activations, i.e.,
is feasible, then the planner takes as additional inputs the act(x) := wq · quad(x) + wr ReLU(x) such that wq + wr = 1.
training data, and a constraint on the minimum acceptable In the beginning, wq = 0 and wr = 1. Our training algo-
prediction accuracy t, and then uses NAS to discover a net- rithm then gradually increases wq and reduces wr , so that
work configuration that maximizes the number of quadratic eventually wq = 1 and wr = 0.
approximations while still achieving accuracy greater than t. This technique also improves running times for the NAS
Our planner then further optimizes the hyperparameters of as it no longer has to train each candidate network configu-
this configuration. In more detail, in this second mode, our ration from scratch.
planner uses NAS to optimize the following properties of Challenge 2: Efficiently optimizing configurations. Re-
a candidate network configuration given t: (a) the number call from above that our planner aims to optimize the number
of quadratic approximations, (b) the placement of these ap- of quadratic approximations, their placement in the network,
proximations (that is, the layers where ReLUs are replaced and the training hyperparameters. Attempting to optimize all
with approximations), and (c) training hyperparameters like of these variables within a single NAS execution results in
learning rate and momentum. a large search space, and finding efficient networks in this

8
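A sketch of the gradual activation exchange in the second bullet above, written against PyTorch. This module is our illustration, not DELPHI's code (the paper only states that the planner is implemented in Python on top of Tune), and the clipped square used for quad(x) is a stand-in for whatever degree-2 polynomial approximation the planner selects.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlendedActivation(nn.Module):
    """act(x) = w_q * quad(x) + w_r * ReLU6(x), with w_q + w_r = 1.

    Training starts at w_q = 0 (pure ReLU6) and gradually moves to w_q = 1
    (pure quadratic approximation), keeping gradients in a stable region."""

    def __init__(self):
        super().__init__()
        self.w_q = 0.0  # weight on the quadratic approximation

    def step_schedule(self, increment=0.05):
        """Shift weight from ReLU6 toward the quadratic approximation."""
        self.w_q = min(1.0, self.w_q + increment)

    def forward(self, x):
        quad = torch.clamp(x * x, max=6.0)   # clipped degree-2 stand-in, in the spirit of ReLU6 clipping
        relu6 = F.relu6(x)
        return self.w_q * quad + (1.0 - self.w_q) * relu6
```

During training, one would call step_schedule() every few epochs so that the network is slowly handed over from ReLU6 to the approximation instead of being retrained from scratch.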
Challenge 2: Efficiently optimizing configurations. Recall from above that our planner aims to optimize the number of quadratic approximations, their placement in the network, and the training hyperparameters. Attempting to optimize all of these variables within a single NAS execution results in a large search space, and finding efficient networks in this search space takes a correspondingly long time.

To solve this problem, we divided up the monolithic NAS execution into independent runs that are responsible for optimizing different variables. For instance, for an architecture with n non-linear layers, for relevant choices of m < n, we first perform NAS to find high-scoring architectures that have m approximation layers, and then perform NAS again to optimize training hyperparameters for these architectures. At the end of this process, our planner outputs a variety of networks with different performance-accuracy trade-offs.

Challenge 3: Prioritizing efficient configurations. Our planner's goal is to choose configurations containing the largest number of approximations in order to maximize efficiency. However, network configurations with large numbers of approximations take longer to train and may be slightly less accurate than networks with fewer approximations. Since the traditional NAS literature focuses on simply maximizing accuracy, using NAS in this default setting results in selecting slower networks over more efficient networks that are just slightly less accurate than the slower ones. To overcome this, we changed the way the NAS assigns "scores" to candidate networks by designing a new scoring function score(·) which balances prioritizing accuracy and performance. Our experiments from Section 7 indicate that this function enables us to select networks that are both efficient and accurate.

  score(N) := acc(N) · (1 + #quadratic activations / #total activations).

Planner(all-ReLU6 neural network N, training data D, accuracy threshold t):
  1. Let the number of non-linear layers in N be n.
  2. Initialize the set of output networks F.
  3. For i in {n/2, . . . , n}:
     (a) Compute the set of best-performing models with i quadratic approximation layers: Si ← PBT(N, D, score(·)).
     (b) Optimize hyperparameters for these models: S′i ← PBT(Si, D, score(·)).
     (c) If for any Nj ∈ S′i the accuracy of Nj is less than t, discard Nj.
     (d) Define Ni to be the network with the maximum score among the remaining networks.
     (e) Set F := F ∪ {Ni}.
  4. Output F.

Figure 6: Pseudocode for DELPHI's planner.
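A compact rendering of Figure 6 in Python. PBT(...) stands in for the population-based-training search described in Section 5.2 (in the real system, Tune's PBT scheduler), and the accuracy and score callables correspond to validation accuracy and score(·); all of these interfaces are assumptions made for the sketch, not DELPHI's actual API.

```python
def score(net, accuracy, num_quad, num_total):
    """score(N) = acc(N) * (1 + #quadratic activations / #total activations)."""
    return accuracy(net) * (1 + num_quad / num_total)

def planner(baseline_net, data, threshold, num_nonlinear, PBT, accuracy):
    """Planner from Fig. 6: maximize quadratic layers subject to accuracy >= threshold."""
    frontier = []
    for i in range(num_nonlinear // 2, num_nonlinear + 1):
        # (a) search for good placements of i quadratic approximation layers
        candidates = PBT(baseline_net, data, num_quad_layers=i)
        # (b) re-run PBT to tune training hyperparameters of those candidates
        tuned = PBT(candidates, data, tune_hyperparameters=True)
        # (c) drop candidates that fall below the accuracy threshold
        tuned = [n for n in tuned if accuracy(n) >= threshold]
        if tuned:
            # (d)+(e) keep the highest-scoring surviving network for this i
            best = max(tuned, key=lambda n: score(n, accuracy, i, num_nonlinear))
            frontier.append(best)
    return frontier
```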
5.2 Choosing a NAS algorithm

The discussion so far has been agnostic to the choice of NAS algorithm. In our implementation, we decided to use the popular population-based training algorithm [Jad+17] because it was straightforward to customize it for our use case, and because it enjoys a number of optimized implementations (like the one in [Lia+18]).

Population-based training (PBT) [Jad+17] maintains a population of candidate neural networks that it trains over a series of time steps. At the end of each time step, it measures the performance of each candidate network via a user-specified scoring function, and replaces the worst-performing candidates with mutated versions of the best-performing ones (the mutation function is specified by the user). At the end of the optimization process, PBT outputs the best-performing candidate network architectures it has found (along with the hyperparameters for training them).

6 System implementation

We implemented DELPHI's cryptographic protocols in Rust and C++. We use the SEAL homomorphic encryption library [Sea] to implement HE, and rely on the fancy-garbling library (https://github.com/GaloisInc/fancy-garbling/) for garbled circuits. To ensure an efficient preprocessing phase, we reimplemented GAZELLE's efficient algorithms for linear layers in SEAL; this may be of independent interest. DELPHI's planner is implemented in Python and uses the scalable PBT [Jad+17] implementation in Tune [Lia+18]. Our implementation is available online at https://github.com/mc2-project/delphi.

Remark 6.1 (reimplementing GAZELLE's algorithms). Riazi et al. [Ria+19] note that GAZELLE's implementation does not provide circuit privacy for HE, which can result in leakage of information about linear layers. To remedy this, they recommend using larger parameters that ensure circuit privacy. (The caveat is that these parameters result in worse performance than using GAZELLE's highly optimized parameters.) Because DELPHI uses GAZELLE's algorithms in our preprocessing phase, we attempted to modify GAZELLE's implementation (https://github.com/chiraag/gazelle_mpc) to use the circuit-private parameters. However, this proved to be difficult, and so we decided to reimplement these algorithms in SEAL, which does support these parameters.

7 Evaluation

We divide our evaluation into three sections that answer the following questions.
• Section 7.2: How efficient are DELPHI's building blocks?
• Section 7.3: Does DELPHI's planner provide a good balance between efficiency and accuracy for realistic neural networks, such as ResNet-32?
• Section 7.4: What is the latency and communication cost of using DELPHI for serving predictions with such neural networks? Does DELPHI's protocol preserve the accuracy of the plaintext model?

7.1 Evaluation setup

All cryptographic experiments were carried out on AWS c5.2xlarge instances possessing an Intel Xeon 8000 series CPU at 3.0 GHz with 16 GB of RAM. The client and server were executed on two such instances located in the us-west-1 (Northern California) and us-west-2 (Oregon) regions, respectively. The client and server executions used 4 threads each. Machine learning experiments were carried out on various machines with NVIDIA Tesla V100 GPUs. Our machine learning and cryptographic protocol experiments rely on the following datasets and architectures:
1. CIFAR-10 is a standardized dataset consisting of (32 × 32) RGB images separated into 10 classes. The training set contains 50,000 images, while the test set has 10,000 images. Our experiments use the 7-layer CNN architecture specified in MiniONN [Liu+17a]. Doing so allows us to compare our protocol with prior work.
2. CIFAR-100 contains the same number of training and test images as CIFAR-10, but divides them up into 100 classes instead of 10. This increased complexity requires a deeper network with more parameters, and so our experiments use the popular ResNet-32 architecture introduced in [He+16]. We note that no prior work on secure inference attempts to evaluate their protocols on difficult datasets like CIFAR-100 or on deep network architectures like ResNet-32.

Whenever we compare DELPHI with GAZELLE, we estimate the cost of GAZELLE's protocols by summing the costs of our re-implementation of the relevant subprotocols for linear and non-linear layers. We do this as there is no end-to-end implementation of GAZELLE's protocol; only the individual subprotocols are implemented.

7.2 Microbenchmarks

We provide microbenchmarks of DELPHI's performance on linear and non-linear layers, comparing both with GAZELLE.

7.2.1 Linear operations

Below we focus on the performance of convolution operations because these comprise the majority of the cost of neural networks' linear operations. The complexity of a convolution is determined by the dimensions of the input and the size and number of convolution kernels, as well as the padding and stride (the latter parameter decides how often the kernel is applied to the input). In Table 1, we evaluate the cost of convolutions used in ResNet-32. The key takeaway is that our online time is over 80× smaller than GAZELLE's, and our online communication is over 150× lower. On the other hand, our preprocessing time and communication are higher than GAZELLE's, but are at most equal to GAZELLE's online time and communication.

Optimized GPU operations. As explained in Remark 4.2, DELPHI's choice of prime field enables DELPHI to use standard GPU libraries for evaluating convolutional layers in the online phase. However, doing so requires copying the layer weights and input into GPU memory, and copying the output back into CPU memory for every linear layer. This copying can have substantial overhead. To amortize it, one can batch convolutions over different inputs together. In Table 2, we report the cost of doing so for batch sizes of 1, 5, and 10. The key takeaway is that, for single convolutions, these costs are over 50–100× lower than the equivalent ones in Table 1, and for batched convolutions, the cost seems to scale sub-linearly with the batch size.

7.2.2 ReLU and quadratic activations

Recall that our protocol for evaluating ReLU activations uses garbled circuits. Our circuit for ReLU follows the design laid out in [Juv+18] with minor additional optimizations. To evaluate quadratic activations, our protocol uses Beaver's multiplication procedure [Bea95], which requires sending one field element from the server to the client and vice versa, and then requires some cheap local field operations from each party. The communication and computation costs for both activations are presented in Table 3.

7.3 DELPHI's planner

To demonstrate the effectiveness of our planner, we need to show that (a) quadratic activations are an effective replacement for ReLU activations, and that (b) the networks found by the planner offer better performance than all-ReLU networks. In our experiments below, we use 80% of the training data to train networks in the planner, and the remaining 20% as a validation set. The planner scores candidate networks based on their validation accuracy, but the final reported accuracy is the test set accuracy.

Quadratic activations are effective. We need to show that not only do networks output by our planner achieve good accuracy, but also that the quadratic activations are not redundant. That is, we need to show that the network is not learning to "ignore" quadratic activations. This is a concern because prior work [Mol+17; Liu+18] has shown that modern neural network architectures can be "pruned" to remove extraneous parameters and activations while still maintaining almost the same accuracy.

We show this point by running our planner in two modes. In the first mode, our planner was configured to find performant networks that used quadratic activations, while in the second mode it was configured to find networks that used the identity function instead of quadratic activations, with the intuition that if the quadratic activations were ineffective, then networks that used the identity function instead would perform just as well. The results of these runs for a varying number of non-ReLU layers are displayed in Fig. 7 (for CIFAR-10) and in Fig. 8 (for CIFAR-100). Together, these results indicate that the networks output by our planner achieve performance that is comparable to that of the all-ReLU baselines. Furthermore, as the number of non-ReLU layers increases, the best-performing networks that use the identity activation function have much worse accuracy than the equivalent networks that use quadratic activations.

Planned networks perform better. To evaluate the ability of our planner to find networks that offer good performance, we run the planner to produce networks with a varying number (say k) of quadratic layers. We then compare the number of ReLU activations in these networks to that in all-ReLU networks (like those supported by GAZELLE). Fig. 9 illustrates this comparison for ResNet-32 on CIFAR-100. We observe that the networks found by our planner consistently have fewer activations than the all-ReLU baseline.
| input (C×H×W) | kernel (N×K×K) | stride & padding | system  | preproc. time (ms) | online time (ms) | preproc. comm. (MB) | online comm. (MB) |
|---------------|----------------|------------------|---------|--------------------|------------------|---------------------|-------------------|
| 16×32×32      | 16×3×3         | (1, 1)           | DELPHI  | 1255               | 17.41            | 10.48               | 0.065             |
|               |                |                  | GAZELLE | —                  | 1255             | —                   | 10.48             |
| 32×16×16      | 32×3×3         | (1, 1)           | DELPHI  | 1285               | 17.17            | 5.24                | 0.020             |
|               |                |                  | GAZELLE | —                  | 1285             | —                   | 5.24              |
| 64×8×8        | 64×3×3         | (1, 1)           | DELPHI  | 2775               | 17.05            | 5.24                | 0.036             |
|               |                |                  | GAZELLE | —                  | 2775             | —                   | 5.24              |

Table 1: Computation time and communication cost of ResNet-32 convolutions in DELPHI.

| input (C×H×W) | kernel (N×K×K) | stride & padding | time (ms), b = 1 | b = 5 | b = 10 |
|---------------|----------------|------------------|------------------|-------|--------|
| 16×32×32      | 16×3×3         | (1, 1)           | 0.34             | 1.07  | 2.75   |
| 32×16×16      | 32×3×3         | (1, 1)           | 0.24             | 0.61  | 1.10   |
| 64×8×8        | 64×3×3         | (1, 1)           | 0.24             | 0.37  | 0.616  |

Table 2: Computation time and communication cost of ResNet-32 convolutions in DELPHI when run on the GPU across different batch sizes b.

| activation function | preproc. time (µs) | online time (µs) | preproc. comm. (kB) | online comm. (kB) |
|---------------------|--------------------|------------------|---------------------|-------------------|
| Quad                | 9.6                | 0.04             | 0.152               | 0.008             |
| ReLU                | 111.9              | 65.0             | 17.5                | 2.048             |

Table 3: Amortized computation time and communication cost of individual ReLU and quadratic activations in DELPHI.

Figure 7: CIFAR-10 accuracy of 7-layer MiniONN networks found by our planner.

Figure 8: CIFAR-100 accuracy of ResNet-32 networks found by our planner.

Figure 9: Number of ReLU activations in ResNet-32 networks found by our planner.

Figure 10: Total execution time on the best planned network (DELPHI) and the all-ReLU baseline (GAZELLE).

7.4 DELPHI's cryptographic protocols

We demonstrate the effectiveness of DELPHI's cryptographic protocol by showing that DELPHI's preprocessing phase and online phase offer significant savings in latency and communication cost over prior work (GAZELLE). Figs. 10 and 11 summarize this improvement for networks found by our planner; we provide a detailed evaluation next.
11
Figure 11: Total communication on the best planned network (DELPHI) and the all-ReLU baseline (GAZELLE), split into preprocessing and online phases, for the MiniONN and ResNet-32 architectures.

Preprocessing phase. Figs. 12a and 13a compare the time required to execute the preprocessing phases of DELPHI and GAZELLE on ResNet-32 on CIFAR-100 and the MiniONN architecture on CIFAR-10, respectively. In both cases, we observe that, on networks that have a large number of ReLU activations, DELPHI's preprocessing time is larger than GAZELLE's. This is because DELPHI needs to additionally perform preprocessing for each linear layer. However, as the number of approximate activations increases, DELPHI's preprocessing time quickly decreases below that of GAZELLE, because garbling circuits for ReLUs is far more expensive than the preprocessing phase for the approximate activations. A similar trend can be observed for communication costs in Figs. 12c and 13c. Overall, for the most efficient networks output by our planner, DELPHI requires 1.5–2× less preprocessing time, and 6–40× less communication.

Online phase. Figs. 12b and 13b compare the time required to execute the online phases of DELPHI and GAZELLE on ResNet-32 on CIFAR-100 and the MiniONN architecture on CIFAR-10, respectively. In both cases, we observe that GAZELLE's use of HE for processing linear layers imposes a significant computational cost. Furthermore, as the number of approximate activations increases, the gap between DELPHI and GAZELLE grows larger. A similar trend can be observed for communication costs in Figs. 12d and 13d. Overall, for the most efficient networks output by our planner, DELPHI requires 22–100× less time to execute its online phase, and 9–40× less communication.

Figure 12: Comparison of DELPHI with GAZELLE on the ResNet-32 architecture: (a) preprocessing time, (b) online time, (c) preprocessing communication, and (d) online communication, each as a function of the number of non-ReLU layers.

Figure 13: Comparison of DELPHI with GAZELLE on the architecture from MiniONN [Liu+17a]: (a) preprocessing time, (b) online time, (c) preprocessing communication, and (d) online communication, each as a function of the number of non-ReLU layers.

Accuracy during protocol execution. As noted in Remark 4.2, DELPHI uses a “truncation trick” from [Moh+17] to avoid overflow when multiplying secret-shared fixed-point numbers. However, ABY3 [Moh+18] demonstrates that this trick can introduce large errors with non-negligible probability. For DELPHI, this can result in erroneous classifications. To verify that DELPHI avoids such errors, we ran the following experiment. For each MiniONN model output by our planner, we first sampled 1000 images from the CIFAR-10 test set, and then counted the number of images that the plaintext model classified differently from DELPHI. We found that, for each model, the two methods agreed over 99.2% of the time, thus indicating that DELPHI mostly avoids these errors.
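As a rough sketch of this agreement check, the snippet below compares the labels produced by the cleartext model and by a protocol run on the same inputs; plaintext_predict, delphi_predict, and test_set are hypothetical stand-ins (not part of DELPHI's released interface) for the plaintext model, an end-to-end execution of the secure protocol, and the CIFAR-10 test images.

    # Sketch of the agreement experiment described above.
    # `plaintext_predict`, `delphi_predict`, and `test_set` are hypothetical
    # stand-ins supplied by the caller; only the counting logic is shown.
    import random

    def agreement_rate(images, plaintext_predict, delphi_predict):
        """Fraction of inputs on which the plaintext model and the secure
        protocol execution output the same class label."""
        same = sum(1 for x in images if plaintext_predict(x) == delphi_predict(x))
        return same / len(images)

    # Example usage (with the stand-ins defined elsewhere):
    # sample = random.sample(test_set, 1000)
    # print(f"agreement: {agreement_rate(sample, plaintext_predict, delphi_predict):.1%}")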

8 Related work
We first discuss cryptographic techniques for secure execution of machine learning algorithms in Section 8.1. Then, in Section 8.2, we discuss model inference attacks that recover information about the model from predictions, as well as countermeasures for these attacks. Finally, in Section 8.3, we discuss prior work on neural architecture search.

8.1 Secure machine learning
The problem of secure inference can be solved via generic secure computation techniques like secure two-party (2PC) computation [Yao86; Gol+87], fully homomorphic encryption (FHE) [Gen09], or homomorphic secret sharing (HSS) [Boy+16]. However, the resulting protocols would suffer from terrible communication and computation complexity. For instance, the cost of using 2PC to compute a function grows with the size of the (arithmetic or boolean) circuit for that function. In our setting, the function being computed is the neural network itself. Evaluating the network requires matrix-vector multiplication, and circuits for this operation grow quadratically with the size of the input. Thus using a generic 2PC protocol for secure inference would result in an immediate quadratic blow-up in both computation and communication.

Similarly, despite a series of efforts to improve the efficiency of FHE [Bra+11; Gen+11; Fan+12; Hal+18; Hal+19] and HSS [Boy+17], their computational overhead is still large, making them unsuitable for use in our scenario.

Hence, it seems that it is necessary to design specialized protocols for secure machine learning, and indeed there is a long line of prior work [Du+04; Lau+06; Bar+09; Nik+13a; Nik+13b; Sam+15; Bos+15; Wu+16a; Aon+16; Sch+19] that does exactly this. These works generally fall into two categories: those that focus on secure training, and those that focus on secure inference. Since secure training is not our focus in this paper, we omit discussing it, and instead focus on prior work on secure inference. Most of these early works focus on simpler machine learning algorithms such as SVMs and linear regression. Designing cryptographic protocols for these simpler algorithms is often more tractable than our setting of inference for neural networks.

Hence, in the rest of this section we discuss prior work that focuses on secure inference over neural networks. This work generally falls into the following categories: (a) 2PC-based protocols; (b) FHE-based protocols; (c) TEE-based protocols; and (d) protocols working in a multi-party model.

2PC-based protocols. SecureML [Moh+17] is one of the first systems to focus on the problem of learning and predicting with neural networks securely. However, it relies entirely on generic 2PC protocols to do this, resulting in poor performance on realistic networks. MiniONN [Liu+17a] uses the
SPDZ protocol to compute linear layers and polynomial approximation activations. Unlike DELPHI, MiniONN generates multiplicative triples for each multiplication in a linear layer; for a layer with input size n, MiniONN requires n² offline and online communication, compared to n for DELPHI.
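For intuition, the sketch below simulates a single Beaver-triple multiplication of two additively secret-shared values within one process; the prime modulus and the way the triple is dealt are illustrative, and this is not MiniONN's or DELPHI's actual code.

    # Sketch: multiplying secret-shared x and y with a preprocessed Beaver
    # triple (a, b, c = a*b mod p). Illustrates why triple-based protocols pay
    # preprocessing per multiplication. The modulus p is illustrative.
    import random

    p = 2**31 - 1  # illustrative prime

    def share(v):
        r = random.randrange(p)
        return r, (v - r) % p

    def beaver_multiply(x_shares, y_shares, triple_shares):
        (x0, x1), (y0, y1) = x_shares, y_shares
        (a0, b0, c0), (a1, b1, c1) = triple_shares
        # The parties open d = x - a and e = y - b; these values hide x and y
        # because a and b are uniformly random.
        d = (x0 - a0 + x1 - a1) % p
        e = (y0 - b0 + y1 - b1) % p
        # Each party then computes its share of x*y locally; party 0 also adds
        # the public d*e term.
        z0 = (c0 + d * b0 + e * a0 + d * e) % p
        z1 = (c1 + d * b1 + e * a1) % p
        return z0, z1

    # Example: shares of x = 7 and y = 9 reconstruct to 63.
    a, b = random.randrange(p), random.randrange(p)
    x_sh, y_sh = share(7), share(9)
    a_sh, b_sh, c_sh = share(a), share(b), share(a * b % p)
    triple = ((a_sh[0], b_sh[0], c_sh[0]), (a_sh[1], b_sh[1], c_sh[1]))
    z0, z1 = beaver_multiply(x_sh, y_sh, triple)
    assert (z0 + z1) % p == 63

A protocol in this style consumes one fresh triple per scalar multiplication, which is why evaluating a linear layer this way scales with the number of multiplications rather than with the input size.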
GAZELLE [Juv+18] is the system most similar to ours: it uses an efficient HE-based protocol for linear layers, while using garbled circuits to compute non-linear activations. However, its reliance on heavy cryptographic operations in the online phase results in a protocol that is more expensive than DELPHI's protocol with respect to both computation and communication (see Section 7 for a thorough comparison).

DeepSecure [Rou+18] and XONN [Ria+19] use garbled circuits to provide secure inference for the restricted class of binarized neural networks [Cou+15] whose weights are all boolean. This restriction enables these protocols to construct a protocol that uses only a constant number of round trips. DeepSecure additionally prunes the input neural network to reduce the number of activations. Ball et al. [Bal+19] have also recently constructed a protocol for secure inference that relies on the garbling schemes of [Bal+16]. Unlike XONN and DeepSecure, the protocol of [Bal+19] supports general neural networks. Despite optimizations, each of these works suffers from large concrete costs because each work performs matrix-vector multiplications inside the garbled circuit.

EzPC [Cha+17], on input a high-level description of a program, synthesizes a cryptographic protocol implementing that program. The compiled protocol intelligently uses a mix of arithmetic and boolean 2PC protocols to increase efficiency.

FHE-based protocols. CryptoNets [Gil+16] is the first work that attempts to optimize and tailor FHE schemes for secure inference. Despite optimizations, the limitations of FHE mean that CryptoNets is limited to networks only a few layers deep, and even for these networks it only becomes efficient when processing a batch of inputs. Recent papers [Hes+17; Bru+18; Bou+18; Cho+18; San+18] develop different approaches to optimize the CryptoNets paradigm, but the resulting protocols still require tens of minutes to provide predictions over networks much smaller than the ones we consider here.

CHET [Dat+19] compiles high-level specifications of neural networks to FHE-based inference protocols. To efficiently use FHE, CHET must replace all ReLUs with polynomial approximations, which harms accuracy for large networks.

TEE-based protocols. There are two approaches for inference using trusted execution enclaves (TEEs): (a) inference via server-side enclaves, where the client uploads their input to the server's enclave, and (b) inference in client-side enclaves, where the client submits queries to a model stored in the client-side enclave.

Slalom and Privado are examples of protocols that rely on server-side enclaves. Slalom [Tra+19], like DELPHI, splits inference into an offline and online phase, and uses additive secret sharing for the online phase. Unlike DELPHI, Slalom uses the Intel SGX hardware enclave [McK+13] to securely compute both the offline and online phases. Privado [Top+18] compiles neural networks into oblivious neural networks, meaning that computing the transformed network does not require branching on secret data. They use the oblivious network
to perform inference inside Intel SGX enclaves. Slalom's implementation indicates that it does not implement linear or non-linear layers obliviously.

MLCapsule [Han+18] describes a system for performing inference via client-side enclaves. Apple uses a client-side secure enclave to perform fingerprint and face matching to authorize users [App19].

In general, most TEE-based cryptographic inference protocols offer better efficiency than protocols that rely on cryptography (like DELPHI). This improved efficiency comes at the cost of a weaker threat model that requires trust in hardware vendors and the implementation of the enclave. Furthermore, because the protocol execution occurs in an adversarial environment, any side-channel leakage is more dangerous (since the adversary can carefully manipulate the execution to force this leakage). Indeed, the past few years have seen a number of powerful side-channel attacks [Bra+17; Häh+17; Göt+17; Mog+17; Sch+17; Wan+17; Van+18] against popular enclaves like Intel SGX and ARM TrustZone.

Protocols with more parties. The discussion above focuses on two-party protocols, because in our opinion secure inference maps naturally to this setting. Nevertheless, a number of works [Ria+18; Wag+18; Tfe; Bar+19] have instead targeted the three-party setting where shares of the model are divided amongst two non-colluding servers, and a client must interact with these servers to obtain their prediction.

8.2 Model leakage from predictions
Prediction API attacks [Ate+15; Fre+15; Wu+16b; Tra+16; Sho+17; Jag+19] aim to learn private information about the server's model or training data given access only to the results of predictions on arbitrary queries.

There is no general defense against prediction API attacks beyond rate limiting and query auditing [Jag+19]. However, there are defenses against specific classes of attacks. For example, one can use differentially private training [Sho+15; Aba+16] to train neural networks that do not leak sensitive information about the underlying training data.

The guarantees of DELPHI are complementary to those provided by any such mitigations. Indeed, with sufficient effort, these techniques can be integrated into DELPHI to provide even stronger privacy guarantees; we leave this to future work.

8.3 Neural architecture search
Recently, machine learning research has seen rapid advancement in the area of neural architecture search (NAS) (see [Els+19; Wis+19] for surveys). The aim of this field is to develop methods to automatically optimize properties of a neural network like accuracy and efficiency by optimizing the hyperparameters of the network. Examples of commonly optimized hyperparameters include the size of convolutional kernels, the number of layers, and parameters of the gradient descent algorithm like learning rate and momentum. In this work, we rely on NAS algorithms only for optimizing the placement of quadratic approximation layers within a network, as ReLU activations were the bottleneck in our system. Common approaches to neural architecture search include those based on reinforcement learning [Zop+17], evolutionary algorithms [Yao99; Ber+13], and random search [Ber+12; Jad+17]. DELPHI's planner uses the Population-Based Training algorithm [Jad+17] to perform NAS. PBT can be seen as a hybrid of the evolutionary algorithm and random search approaches.
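As a simplified illustration of that exploit/explore loop, the sketch below performs one PBT step over a population of candidate configurations; the member structure, scoring function, and the particular perturbation (toggling which layers use the quadratic approximation) are illustrative assumptions, not the planner's actual implementation.

    # Simplified sketch of one Population-Based Training (PBT) step [Jad+17].
    # Illustrative only: the member layout, `evaluate`, and `perturb` are assumptions.
    import copy
    import random

    def pbt_step(population, evaluate, perturb, frac=0.2):
        """Bottom performers clone a top performer's weights and hyperparameters
        (exploit) and then mutate the copied hyperparameters (explore)."""
        ranked = sorted(population, key=evaluate, reverse=True)
        cutoff = max(1, int(len(ranked) * frac))
        top, bottom = ranked[:cutoff], ranked[-cutoff:]
        for member in bottom:
            donor = random.choice(top)
            member['params'] = copy.deepcopy(donor['params'])        # exploit
            member['hyperparams'] = perturb(donor['hyperparams'])    # explore
        return ranked

    # Example perturbation: toggle whether one randomly chosen layer uses the
    # quadratic approximation instead of ReLU.
    def perturb(hp):
        hp = dict(hp)
        flip = random.randrange(hp['num_layers'])
        hp['quad_layers'] = sorted(set(hp['quad_layers']) ^ {flip})
        return hp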

9 Acknowledgements
We thank Liam Li for the suggestion to use the PBT algorithm to perform NAS, Joey Gonzalez for answering questions about PBT, Robert Nishihara for the suggestion to use ReLU's gradients to guide gradient descent, Chiraag Juvekar for providing the code for GAZELLE, and our shepherd Siddharth Garg and the anonymous reviewers for their invaluable feedback. This work was supported by the NSF CISE Expeditions Award CCF-1730628, as well as gifts from the Sloan Foundation, Bakar and Hellman Fellows Fund, Alibaba, Amazon Web Services, Ant Financial, Arm, Capital One, Ericsson, Facebook, Google, Intel, Microsoft, Scotiabank, Splunk and VMware.

References
[Aba+16] M. Abadi, A. Chu, I. J. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang. “Deep Learning with Differential Privacy”. In: CCS ’16.
[Aon+16] Y. Aono, T. Hayashi, L. T. Phong, and L. Wang. “Scalable and Secure Logistic Regression via Homomorphic Encryption”. In: CODASPY ’16.
[App19] Apple. “iOS Security”. https://www.apple.com/business/docs/site/iOS_Security_Guide.pdf.
[Ate+15] G. Ateniese, L. V. Mancini, A. Spognardi, A. Villani, D. Vitali, and G. Felici. “Hacking smart machines with smarter ones: How to extract meaningful data from machine learning classifiers”. In: IJSN (2015).
[Bal+16] M. Ball, T. Malkin, and M. Rosulek. “Garbling Gadgets for Boolean and Arithmetic Circuits”. In: CCS ’16.
[Bal+19] M. Ball, B. Carmer, T. Malkin, M. Rosulek, and N. Schimanski. “Garbled Neural Networks are Practical”. ePrint Report 2019/338.
[Bar+09] M. Barni, P. Failla, V. Kolesnikov, R. Lazzeretti, A. Sadeghi, and T. Schneider. “Secure Evaluation of Private Linear Branching Programs with Medical Applications”. In: ESORICS ’09.
[Bar18] B. Barrett. “The year Alexa grew up”. https://www.wired.com/story/amazon-alexa-2018-machine-learning/.
[Bar+19] A. Barak, D. Escudero, A. Dalskov, and M. Keller. “Secure Evaluation of Quantized Neural Networks”. ePrint Report 2019/131.
[Bea95] D. Beaver. “Precomputing Oblivious Transfer”. In: CRYPTO ’95.
[Bel+12] M. Bellare, V. T. Hoang, and P. Rogaway. “Foundations of garbled circuits”. In: CCS ’12.
[Ben+94] Y. Bengio, P. Y. Simard, and P. Frasconi. “Learning long-term dependencies with gradient descent is difficult”. In: IEEE Trans. Neural Networks (1994).
[Ber+12] J. Bergstra and Y. Bengio. “Random Search for Hyper-Parameter Optimization”. In: JMLR (2012).
[Ber+13] J. Bergstra, D. Yamins, and D. D. Cox. “Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures”. In: ICML ’13.
[Bos+15] R. Bost, R. A. Popa, S. Tu, and S. Goldwasser. “Machine Learning Classification over Encrypted Data”. In: NDSS ’15.
[Bou+18] F. Bourse, M. Minelli, M. Minihold, and P. Paillier. “Fast Homomorphic Evaluation of Deep Discretized Neural Networks”. In: CRYPTO ’18.
[Boy+16] E. Boyle, N. Gilboa, and Y. Ishai. “Function Secret Sharing: Improvements and Extensions”. In: CCS ’16.
[Boy+17] E. Boyle, G. Couteau, N. Gilboa, Y. Ishai, and M. Orrú. “Homomorphic Secret Sharing: Optimizations and Applications”. In: CCS ’17.
[Bra+11] Z. Brakerski and V. Vaikuntanathan. “Efficient Fully Homomorphic Encryption from (Standard) LWE”. In: FOCS ’11.
[Bra+17] F. Brasser, U. Müller, A. Dmitrienko, K. Kostiainen, S. Capkun, and A. Sadeghi. “Software Grand Exposure: SGX Cache Attacks Are Practical”. In: WOOT ’17.
[Bru+18] A. Brutzkus, O. Elisha, and R. Gilad-Bachrach. “Low Latency Privacy Preserving Inference”. ArXiV, cs.CR 1812.10659.
[Cha+17] N. Chandran, D. Gupta, A. Rastogi, R. Sharma, and S. Tripathi. “EzPC: Programmable, Efficient, and Scalable Secure Two-Party Computation for Machine Learning”. ePrint Report 2017/1109.
[Cho+18] E. Chou, J. Beal, D. Levy, S. Yeung, A. Haque, and L. Fei-Fei. “Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference”. ArXiV, cs.CR 1811.09953.
[Cou+15] M. Courbariaux, Y. Bengio, and J. David. “BinaryConnect: Training Deep Neural Networks with binary weights during propagations”. In: NeurIPS ’18.
[Dat+19] R. Dathathri, O. Saarikivi, H. Chen, K. Laine, K. E. Lauter, S. Maleki, M. Musuvathi, and T. Mytkowicz. “CHET: An optimizing compiler for fully-homomorphic neural-network inferencing”. In: PLDI ’19.
[Du+04] W. Du, Y. S. Han, and S. Chen. “Privacy-Preserving Multivariate Statistical Analysis: Linear Regression and Classification”. In: SDM ’04.
[Elg85] T. Elgamal. “A public key cryptosystem and a signature scheme based on discrete logarithms”. In: IEEE Trans. on Inf. Theory (1985).
[Els+19] T. Elsken, J. H. Metzen, and F. Hutter. “Neural Architecture Search: A Survey”. In: JMLR (2019).
[Eve+82] S. Even, O. Goldreich, and A. Lempel. “A Randomized Protocol for Signing Contracts”. In: CRYPTO ’82.
[Fan+12] J. Fan and F. Vercauteren. “Somewhat Practical Fully Homomorphic Encryption”. ePrint Report 2012/144.
[Fre+14] M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page, and T. Ristenpart. “Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing”. In: USENIX Security ’14.
[Fre+15] M. Fredrikson, S. Jha, and T. Ristenpart. “Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures”. In: CCS ’15.
[Gen09] C. Gentry. “Fully homomorphic encryption using ideal lattices”. In: STOC ’09.
[Gen+11] C. Gentry and S. Halevi. “Implementing Gentry’s Fully-Homomorphic Encryption Scheme”. In: EUROCRYPT ’11.
[Gho+17] Z. Ghodsi, T. Gu, and S. Garg. “SafetyNets: Verifiable Execution of Deep Neural Networks on an Untrusted Cloud”. In: NIPS ’17.
[Gil+16] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, and J. Wernsing. “CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy”. In: ICML ’16.
[Gol+87] O. Goldreich, S. Micali, and A. Wigderson. “How to Play any Mental Game or A Completeness Theorem for Protocols with Honest Majority”. In: STOC ’87.
[Goo] Google. “Google Lens”. https://lens.google.com/.
[Göt+17] J. Götzfried, M. Eckert, S. Schinzel, and T. Müller. “Cache Attacks on Intel SGX”. In: EUROSEC ’17.
[Häh+17] M. Hähnel, W. Cui, and M. Peinado. “High-Resolution Side Channels for Untrusted Operating Systems”. In: ATC ’17.
[Hal+18] S. Halevi and V. Shoup. “Faster Homomorphic Linear Transformations in HElib”. In: CRYPTO ’18.
[Hal+19] S. Halevi, Y. Polyakov, and V. Shoup. “An Improved RNS Variant of the BFV Homomorphic Encryption Scheme”. In: CT-RSA ’19.
[Han+18] L. Hanzlik, Y. Zhang, K. Grosse, A. Salem, M. Augustin, M. Backes, and M. Fritz. “MLCapsule: Guarded Offline Deployment of Machine Learning as a Service”. ArXiV, cs.CR 1808.00590.
[He+16] K. He, X. Zhang, S. Ren, and J. Sun. “Deep Residual Learning for Image Recognition”. In: CVPR ’16.
[Hes+17] E. Hesamifard, H. Takabi, and M. Ghasemi. “CryptoDL: Deep Neural Networks over Encrypted Data”. ArXiV, cs.CR 1711.05189.
[Ish+03] Y. Ishai, J. Kilian, K. Nissim, and E. Petrank. “Extending Oblivious Transfers Efficiently”. In: CRYPTO ’03.
[Jad+17] M. Jaderberg, V. Dalibard, S. Osindero, W. Czarnecki, J. Donahue, A. Razavi, et al. “Population Based Training of Neural Networks”. ArXiV, cs.LG 1711.09846.
[Jag+19] M. Jagielski, N. Carlini, D. Berthelot, A. Kurakin, and N. Papernot. “High-Fidelity Extraction of Neural Network Models”. ArXiV, cs.LG 1909.01838.
[Juv+18] C. Juvekar, V. Vaikuntanathan, and A. Chandrakasan. “GAZELLE: A Low Latency Framework for Secure Neural Network Inference”. In: USENIX ’18.
[Kri10] A. Krizhevsky. “Convolutional Deep Belief Networks on CIFAR-10”. Unpublished manuscript. http://www.cs.utoronto.ca/~kriz/conv-cifar10-aug2010.pdf.
[Kun] Kuna. “Kuna AI”. https://getkuna.com/blogs/news/2017-05-24-introducing-kuna-ai.
[Lau+06] S. Laur, H. Lipmaa, and T. Mielikäinen. “Cryptographically private support vector machines”. In: KDD ’06.
[Lia+18] R. Liaw, E. Liang, R. Nishihara, P. Moritz, J. E. Gonzalez, and I. Stoica. “Tune: A Research Platform for Distributed Model Selection and Training”.
[Liu+17a] J. Liu, M. Juuti, Y. Lu, and N. Asokan. “Oblivious Neural Network Predictions via MiniONN Transformations”. In: CCS ’17.
[Liu+17b] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi. “A survey of deep neural network architectures and their applications”. In: Neurocomputing (2017).
[Liu+18] K. Liu, B. Dolan-Gavitt, and S. Garg. “Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks”. In: RAID ’18.
[McK+13] F. McKeen, I. Alexandrovich, A. Berenzon, C. V. Rozas, H. Shafi, V. Shanbhogue, and U. R. Savagaonkar. “Innovative instructions and software model for isolated execution”. In: HASP ’13.
[Mog+17] A. Moghimi, G. Irazoqui, and T. Eisenbarth. “CacheZoom: How SGX Amplifies the Power of Cache Attacks”. In: CHES ’17.
[Moh+17] P. Mohassel and Y. Zhang. “SecureML: A System for Scalable Privacy-Preserving Machine Learning”. In: IEEE S&P ’17.
[Moh+18] P. Mohassel and P. Rindal. “ABY3: A Mixed Protocol Framework for Machine Learning”. In: CCS ’18.
[Mol+17] P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz. “Pruning Convolutional Neural Networks for Resource Efficient Inference”. In: ICLR ’17.
[Nik+13a] V. Nikolaenko, S. Ioannidis, U. Weinsberg, M. Joye, N. Taft, and D. Boneh. “Privacy-preserving matrix factorization”. In: CCS ’13.
[Nik+13b] V. Nikolaenko, U. Weinsberg, S. Ioannidis, M. Joye, D. Boneh, and N. Taft. “Privacy-Preserving Ridge Regression on Hundreds of Millions of Records”. In: IEEE S&P ’13.
[Pai99] P. Paillier. “Public-Key Cryptosystems Based on Composite Degree Residuosity Classes”. In: EUROCRYPT ’99.
[Rab81] M. O. Rabin. “How To Exchange Secrets with Oblivious Transfer”. Harvard University Technical Report 81 (TR-81).
[Reg09] O. Regev. “On lattices, learning with errors, random linear codes, and cryptography”. In: JACM (2009).
[Ria+18] M. S. Riazi, C. Weinert, O. Tkachenko, E. M. Songhori, T. Schneider, and F. Koushanfar. “Chameleon: A Hybrid Secure Computation Framework for Machine Learning Applications”. In: AsiaCCS ’18.
[Ria+19] M. S. Riazi, M. Samragh, H. Chen, K. Laine, K. Lauter, and F. Koushanfar. “XONN: XNOR-based Oblivious Deep Neural Network Inference”. In: USENIX ’19.
[Rou+18] B. D. Rouhani, M. S. Riazi, and F. Koushanfar. “DeepSecure: Scalable Provably-secure Deep Learning”. In: DAC ’18.
[Sam+15] B. K. Samanthula, Y. Elmehdwi, and W. Jiang. “k-Nearest Neighbor Classification over Semantically Secure Encrypted Relational Data”. In: IEEE Trans. Knowl. Data Eng. (2015).
[San+18] A. Sanyal, M. Kusner, A. Gascón, and V. Kanade. “TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service”. In: ICML ’18.
[Sch+17] M. Schwarz, S. Weiser, D. Gruss, C. Maurice, and S. Mangard. “Malware Guard Extension: Using SGX to Conceal Cache Attacks”. In: DIMVA ’17.
[Sch+19] P. Schoppmann, A. Gascon, M. Raykova, and B. Pinkas. “Make Some ROOM for the Zeros: Data Sparsity in Secure Distributed Machine Learning”. ePrint Report 2019/281.
[Sea] “Microsoft SEAL (release 3.3)”. https://github.com/Microsoft/SEAL. Microsoft Research, Redmond, WA.
[Sho+15] R. Shokri and V. Shmatikov. “Privacy-Preserving Deep Learning”. In: CCS ’15.
[Sho+17] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. “Membership Inference Attacks Against Machine Learning Models”. In: S&P ’17.
[Tfe] “TF Encrypted”. https://github.com/mortendahl/tf-encrypted.
[Top+18] S. Tople, K. Grover, S. Shinde, R. Bhagwan, and R. Ramjee. “Privado: Practical and Secure DNN Inference”. ArXiV, cs.CR 1810.00602.
[Tra+16] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. “Stealing Machine Learning Models via Prediction APIs”. In: USENIX Security ’16.
[Tra+19] F. Tramer and D. Boneh. “Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware”. In: ICLR ’19.
[Van+18] J. Van Bulck, M. Minkin, O. Weisse, D. Genkin, B. Kasikci, F. Piessens, M. Silberstein, T. F. Wenisch, Y. Yarom, and R. Strackx. “Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution”. In: USENIX Security ’18.
[Wag+18] S. Wagh, D. Gupta, and N. Chandran. “SecureNN: Efficient and Private Neural Network Training”. ePrint Report 2018/442.
[Wan+17] W. Wang, G. Chen, X. Pan, Y. Zhang, X. Wang, V. Bindschaedler, H. Tang, and C. A. Gunter. “Leaky Cauldron on the Dark Land: Understanding Memory Side-Channel Hazards in SGX”. In: CCS ’17.
[Wis+19] M. Wistuba, A. Rawat, and T. Pedapati. “A Survey on Neural Architecture Search”. ArXiV, cs.LG 1905.01392.
[Wu+16a] D. J. Wu, T. Feng, M. Naehrig, and K. E. Lauter. “Privately Evaluating Decision Trees and Random Forests”. In: PoPETs (2016).
[Wu+16b] X. Wu, M. Fredrikson, S. Jha, and J. F. Naughton. “A Methodology for Formalizing Model-Inversion Attacks”. In: CSF ’16.
[Wyz] “Wyze: Contact and Motion Sensors for Your Home”. https://www.wyze.com/.
[Yao86] A. C. Yao. “How to Generate and Exchange Secrets (Extended Abstract)”. In: FOCS ’86.
[Yao99] X. Yao. “Evolving artificial neural networks”. In: Proceedings of the IEEE (1999).
[Zop+17] B. Zoph and Q. V. Le. “Neural Architecture Search with Reinforcement Learning”. In: ICLR ’17.

A Security properties of our building blocks
Security for garbled circuits requires the existence of a simulator SimGS that, given input 1λ, 1|C|, and C(x), outputs C̃, {labeli}i∈[n] such that this output is computationally indistinguishable to (C̃, {labeli,xi}) generated by GS.Garble.
Security for linearly homomorphic encryption schemes requires the scheme to satisfy the following properties:
• Semantic security. For any two messages m, m′, we require {pk, HE.Enc(pk, m)} ≈c {pk, HE.Enc(pk, m′)}, where the two distributions are over the random choice of pk and the random coins of the encryption algorithm.
• Function privacy. There exists a simulator SimFP such that for every efficient adversary A, every linear function L, and every pair of messages m1, m2, we have that the following
distributions are computationally indistinguishable:

    { (r, r1, r2, c′) :  (r, r1, r2) ← {0, 1}λ,
                         (pk, sk) ← HE.KeyGen(1λ; r),
                         c1 ← HE.Enc(pk, m1; r1),
                         c2 ← HE.Enc(pk, m2; r2),
                         c′ ← HE.Eval(pk, c1, c2, L) }
                  ≈c  SimFP(1λ, m1, m2, L(m1, m2))

B Security proofs
Proof of indistinguishability with corrupted client. We show that the real world distribution is computationally indistinguishable to the simulated distribution via a hybrid argument. In the final simulated distribution, the simulator does not use the weights for the server's model, and so a corrupted client learns nothing beyond the output prediction and the model architecture in the real world.
• Hyb0: This corresponds to the real world distribution where the server uses its input matrices M1, . . . , Mℓ−1.
• Hyb1: This hybrid involves only a syntactic change. In the output phase, the simulator sends y − rℓ to the client, where y is the output of the neural network on input x. Additionally, the simulator uses the knowledge of the client's random tape to begin the evaluation of the i-th layer with xi − ri. Since this is a syntactic change, Hyb1 is distributed identically to Hyb0.
• Hyb2: We change the inputs that the server provides to each OT execution where it acts as the sender. Instead of providing the labels corresponding to 0 and 1 in each OT execution, the server provides labeli,b where b is the input used by the client in that OT execution. Note that in the semi-honest setting, we know b as a result of setting the random tape as well as learning the input of the corrupted client. It follows from the sender security of OT that Hyb2 is indistinguishable from Hyb1.
• Hyb3: In this hybrid, for every layer of the neural network that uses garbled circuits, we generate C̃ using SimGS on input 1λ, 1|C| and C(z), where z is the input that the client uses to evaluate this circuit (this is again known in the semi-honest setting as a result of setting the random tape and knowing the input). Note that C(z) is an OTP encryption and hence is distributed identically to a random string. It follows from the security of the garbled circuits that Hyb3 is indistinguishable from Hyb2.
• Hyb4: In this hybrid, we generate the multiplication triples in the offline phase using the corresponding simulator for Beaver's protocol. It follows from the simulation security of this protocol that Hyb4 is indistinguishable from Hyb3.
• Hyb5: In this hybrid, for every quadratic approximation layer, we use the simulator for Beaver's multiplication procedure. It again follows from the simulation security that this hybrid is indistinguishable to the previous hybrid. Notice that in this hybrid, the server is no longer using xi − ri, si as well as the matrix Mi to evaluate the i-th layer.
• Hyb6: For every homomorphic evaluation in the offline phase, we use the simulator SimFP for the function privacy of HE. Note that SimFP only requires the output Mi · ri − si to generate the homomorphically evaluated ciphertext. It follows from the function privacy of HE that Hyb6 is computationally indistinguishable from Hyb5.
• Hyb7: In this hybrid, we replace the input −s′i given to SimFP with a randomly sampled s′i from Rⁿ (instead of the true value Mi · ri − si). Thus Hyb7 is distributed identically to Hyb6 as si is chosen uniformly at random. Finally, we note that Hyb7 is identically distributed to the simulator's output, completing the proof.

Proof of indistinguishability with corrupted server. We show that the real world distribution is computationally indistinguishable to the simulated distribution via a hybrid argument. In the final simulated distribution, the simulator does not use the user's input, and so a corrupted server learns nothing in the real world.
• Hyb0: This corresponds to the real world distribution where the client uses its actual input x.
• Hyb1: This hybrid involves only a syntactic change. For every layer that is evaluated by garbled circuits, instead of evaluating the circuits, we instead send OTP(xi+1 − ri+1) by using our knowledge of x, the matrices Mi, and the random tape of the server. Similarly, in every quadratic approximation layer, we send a share in the final round such that when the server adds it with its own share it gets xi+1 − ri+1. Because this change is only syntactic, Hyb1 is identical to Hyb0.
• Hyb2: In this hybrid, we change the inputs that the client provides to each OT execution where it is acting as the receiver. Instead of providing the actual inputs, it provides some junk inputs, say 0. It follows from the receiver security of the underlying oblivious transfer protocol that Hyb2 is computationally indistinguishable from Hyb1.
• Hyb3: In this hybrid, we generate the multiplication triples in the offline phase using the simulator for Beaver's multiplication protocol. It follows from the simulation security of this protocol that Hyb3 is indistinguishable from Hyb2.
• Hyb4: In this hybrid, for every quadratic approximation layer of the neural network, we use the simulator for Beaver's multiplication procedure. It follows from simulation security that Hyb4 is indistinguishable from Hyb3.
• Hyb5: In this hybrid, we change the ciphertexts sent by the client in the offline phase. Instead of sending encryptions of ri, the client sends HE.Enc(pk, 0). It follows from the semantic security of the encryption scheme that Hyb5 is computationally indistinguishable from Hyb4.
• Hyb6: In this hybrid, we make the following changes. For every layer that is evaluated by garbled circuits, we send OTP(ri+1) for a randomly chosen ri+1. Similarly, in every quadratic approximation layer, we send a share in the final round that is chosen uniformly at random. Additionally, in the preamble phase, we send a uniformly chosen value r1. Hyb6 is distributed identically to Hyb5. Finally, note that Hyb6 is identically distributed to the simulator's output, completing the proof.