A Systematic Approach to Developing and Evaluating Website Fingerprinting Defenses

Xiang Cai (1), Rishab Nithyanand (1), Tao Wang (2), Rob Johnson (1), Ian Goldberg (2)
(1) Stony Brook University   (2) University of Waterloo
{xcai, rnithyanand, rob}@[Link]   {t55wang, iang}@[Link]

ABSTRACT

Fingerprinting attacks have emerged as a serious threat against privacy mechanisms, such as SSL, Tor, and encrypting tunnels. Researchers have proposed numerous attacks and defenses, and the Tor project now includes both network- and browser-level defenses against these attacks, but published defenses have high overhead, poor security, or both.

This paper (1) systematically analyzes existing attacks and defenses to understand which traffic features convey the most information (and therefore are most important for defenses to hide), (2) proves lower bounds on the bandwidth costs of any defense that achieves a given level of security, (3) presents a mathematical framework for evaluating the performance of fingerprinting attacks and defenses in the open world, given their closed-world performance, and (4) presents a new defense, Tamaraw, that achieves a better security/bandwidth trade-off than any previously proposed defense.

Our feature-based analysis provides clear directions to defense designers on which features need to be hidden. Our lower bounds on bandwidth costs help us understand the limits of fingerprinting defenses and determine how close we are to "success". Our open-world/closed-world connection enables researchers to perform simpler closed-world experiments and predict open-world performance. Tamaraw provides an "existence proof" for efficient, secure defenses.

Categories and Subject Descriptors

C.2.0 [Computer-Communication Networks]: General–Security and protection; K.4.1 [Computing Milieux]: Computers and Society–Privacy

Keywords

Anonymity; Website fingerprinting attacks and defenses

1. INTRODUCTION

Website fingerprinting attacks enable an adversary to infer which website a victim is visiting, even if the victim uses an encrypting proxy, such as Tor. These privacy mechanisms encrypt the content transferred between the web server and client, but they do not effectively hide the size, timing, and direction of packets. A website fingerprinting attack uses these features to infer the web page being loaded by a client.

Website fingerprinting attacks have emerged as a serious threat against web browsing privacy mechanisms, such as SSL, Tor, and encrypting tunnels. At the 2012 Oakland conference, Dyer et al. [5] showed that an attacker could infer, with a success rate over 80%, which of 128 pages a victim was visiting, even if the victim used network-level countermeasures. At CCS 2012, Cai et al. [3] described an attack that could achieve a greater than 75% success rate (out of 100 sites) against numerous network- and application-level defenses.

Published and deployed defenses have high overhead, poor security, or both. The Tor project has incorporated network- and browser-level defenses against fingerprinting attacks into its Tor Browser Bundle [13], but Cai found that they provide no security benefit. Luo et al. proposed HTTPOS [11], a collection of network- and HTTP-level defenses, but Cai's attack showed that it offered little security benefit. Wright et al. proposed traffic morphing [19], but both Dyer and Cai showed that it provides little protection against fingerprinting attacks. In fact, Dyer and Cai surveyed numerous defenses and found them all ineffective. Dyer proposed BuFLO, which offers good security, but at a high bandwidth cost.

This paper addresses several challenges in designing, evaluating, and comparing the performance of website fingerprinting attacks and defenses.

Most attack evaluations have used the artificial "closed-world" model, in which the victim selects one of n websites uniformly at random and the attacker attempts to guess the chosen website based on the observed network traffic. This model has been criticized as unrealistic because, in a real attack, the victim may visit any website in the world [14], potentially making the attacker's task much more difficult. Consequently, some researchers have suggested that website fingerprinting attacks are in fact a paper tiger [14].

In this paper we show how to compute the open-world performance of an attack based on its performance in a closed-world experiment. Thus, researchers can evaluate attacks and defenses in the simpler closed-world model and, using our method, compute open-world performance results. We use this method to compute the open-world performance of our new defense, Tamaraw.

We also investigate the danger that fingerprinting attacks pose in the real world. We find that, without any defense whatsoever, fingerprinting attacks can pose a significant threat to visitors to popular web pages. For example, an ideal attacker against defenseless victims could recognize visits to the 100 most popular websites with a false discovery rate of less than 50%.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@[Link].
CCS'14, November 3–7, 2014, Scottsdale, Arizona, USA.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2957-6/14/11 ...$15.00.
[Link]
Defense    Security (ε)   BW Overhead   Overhead Ratio
Tamaraw    3.4%           199%          31.3
Tamaraw    0.4%           687%          6.9
BuFLO      41.5%          199%          1965
Tor        77.5%          25%           495.2

Table 1: Security and bandwidth overhead of defenses in a closed-world experiment with 800 websites. A defense is ε-secure if no attacker can guess the victim's website with probability more than ε. Note that this security evaluation is attack-independent. An overhead ratio of r indicates that the defense incurred an overhead r times larger than the lower bound on overhead derived in Section 5.1.

Most defense evaluations have attempted to estimate the efficacy of the proposed defense by using the state-of-the-art attacks available at that time. This provides only a lower bound on the security of a defense — future attacks may demonstrate that it is insecure. This approach also makes it difficult to compare defenses that were evaluated using different attacks.

We solve both problems by evaluating defenses against an ideal attacker. In our ideal attack, two websites are distinguishable unless they generate the exact same sequence of network traffic observations. Thus evaluating a defense against an ideal attacker gives the lower bound on the security provided by the defense, and the evaluation results are attack-independent — i.e., future researchers can compare different defenses using the same ideal attack.

Even when two defenses have been evaluated against the same attack, it can be difficult to compare them, since every defense offers a different trade-off between security and overhead. And even if one defense strictly dominates all other defenses in terms of security and efficiency, it is still not clear whether that defense is optimal. How efficient can a defense be while offering a given level of security?

To answer these questions, we develop an abstract model of website fingerprinting attacks and defenses and prove lower bounds on the bandwidth overhead of any defense that offers a given level of security. This enables us to compare defenses with different overhead/security trade-offs by comparing how close they are to the optimal trade-off curve. We can also bound how far defenses are from optimal. Our bounds suggest that, although defenses are getting better, there may still be significant room for improvement.

In order to design efficient and effective defenses, we need to know which traffic features leak the most information about the website being visited. Without this information, we may make the mistake of designing a defense that incurs high cost to hide a low-information feature. For example, most early defenses focused exclusively on packet sizes, but Cai et al. showed that packet ordering contains at least as much information as packet sizes [3]. Therefore, we systematically analyze existing attacks and defenses to understand which traffic features convey the most information and therefore are most important for defenses to hide. Our analysis goes beyond the "black-box" approach of previous work, which published overall attack success rates but did not investigate the reasons attacks and defenses succeed or fail.

Finally, we propose and evaluate a new, provably secure fingerprinting defense. Tamaraw extends and tunes BuFLO to hide the most significant traffic features uncovered by our feature-based analysis. In particular, we find that BuFLO unnecessarily wastes bandwidth hiding the number of upstream packets and does not adequately hide the total number of downstream packets.

We then use the new evaluation techniques described above to evaluate Tamaraw. We show that Tamaraw offers significantly better security than BuFLO in a closed-world setting. Our closed-world evaluation uses an ideal attacker and therefore bounds the success rate that any attacker can achieve against Tamaraw. Table 1 summarizes the evaluation results. We evaluated Tamaraw under two different configurations. One of them offers over 12× better security than BuFLO for the same bandwidth overhead. Table 1 also shows that all the defenses have a significant gap between their performance and the trade-off lower-bound curve, although Tamaraw comes closest by far.

Then, we show how to compute the open-world performance of an attack based on experimental results derived in the closed-world setting. Further, we show that even the optimal attacker, against defenseless victims, suffers from high false discovery rates. Finally, we evaluate Tamaraw in our open-world model and show that it performs significantly better than BuFLO against the optimal attacker.

The rest of this paper is organized as follows. Section 2 reviews website fingerprinting attacks. Section 3 describes our feature-based defense comparison methodology. In Section 4, we survey a set of six previously proposed defenses and four attack features. We then present the results of applying our feature-based comparative methodology to each of these defenses. Section 5 presents our model of fingerprinting attacks, the security/overhead trade-off lower-bound theorems, and the closed-world/open-world connection. Section 6 describes Tamaraw and presents our evaluation results. Section 8 discusses implications of our research.

2. WEBSITE FINGERPRINTING ATTACKS

In general, website fingerprinting (WF) attacks are a subset of traffic analysis attacks. A WF attacker is able to monitor the communications between a client's computer and a private web browsing proxy. The private browsing proxy may be an SSH proxy, VPN server, Tor, or another privacy service. The traffic between the user and proxy is encrypted, so the attacker can only see the timing, direction, and size of packets exchanged between the user and the proxy. Based on this information, the attacker attempts to infer the website(s) that the client is visiting via the proxy by examining various features of the observed traffic. Cai et al. [3] and Chen et al. [4] describe variants wherein an attacker aims to identify a web site instead of a web page. For consistency across the literature, however, we focus on the identification of single web pages. A WF defense is a set of countermeasures that protects the client against a WF attack by obfuscating, or covering, features of the web traffic, making them less distinguishable across different web pages and thus reducing the accuracy of the attack.

A WF attacker is assumed to be local and passive: the attacker taps and observes from only one location, and he is not allowed to add, drop, or change packets. This means that the attacker is weak, but is also resource-light and essentially undetectable. The attacker can prepare for the attack by collecting information about websites in advance. For example, he can visit websites using the same privacy service as the client, collecting a set of website "fingerprints" as a training set, which he later uses to recognize the client's site. These "fingerprints" usually consist of packet sizes, directions, etc. observed between the client and the proxy. Although the packets themselves are encrypted, the packet sequence still carries information about the sizes of the objects in a web page and the order in which they are requested, which is induced by the structure of the web page (the list of resources, their lengths, and which
resources each resource requests). The interaction of multiple connections between a client and a web server may cause randomization in packet ordering but, as we will see later, a successful WF attack will tolerate these randomized differences while learning to distinguish different web pages.

We retain two assumptions that all previous works on WF have made of the attacker. First, the attacker is able to notice exactly when a new page is requested. In other words, the attacker knows which sequence of packets corresponds to a single page. This assumption is sometimes made in a stronger form — that the client's think time always dominates page load time — which implies that a distinct pause exists between any two packet sequences of different page loads. Second, any background activity the client may have will not interfere with the client's traffic. For example, a client will not download a file while visiting a web page; alternatively, such file download packets can be easily discarded by the attacker. These assumptions are used by all previous works on WF as they simplify the problem, though it should be noted that they are advantageous for the attacker.

3. FEATURES AND METHODOLOGY

A classifier succeeds at distinguishing between two classes when it is able to discover a consistent difference between them. This can be viewed as a difference between their features, which characterize a class. Implicitly or explicitly, classification techniques such as WF attacks extract features to classify. Conversely, a successful defense effectively hides these features. In this section, we describe our methodology and use it to evaluate the strengths and weaknesses of different WF defenses.

3.1 Packet Sequences and their Features

In general, packet sequences have four major features: unique packet lengths, packet length frequency, packet ordering, and interpacket timing. Therefore, a packet sequence P can be written as:

P = ⟨(t1, ℓ1), (t2, ℓ2), ..., (tn, ℓn)⟩

In the above, ti is the difference in time observed between packets i and i − 1 (interpacket timing), with t1 = 0; ℓi is the byte length of packet i. The sequence length, |P|, is equal to n. We write Pt and Pℓ for the sequences of only the interpacket times and only the packet lengths, respectively. We indicate the packet length as a positive value if the packet is outgoing and as a negative value if it is incoming.

Unique Packet Lengths: Packet lengths are a simple and strong feature of a web page. GET request lengths are partly determined by the length of the resource name. Incoming packets are almost always sent at the Maximum Transmission Unit (MTU), with the length of the last packet indicating the size of the resource (modulo the MTU).

Most packet lengths of a page are unlikely to change unless resource lengths change. WF attacks almost always consider packet lengths unless they are designed for the Tor scenario, in which case packet lengths are covered. When unspecified, we assume that packet lengths are not hidden from the attacker.

Mathematically, sequences P and P′ are said to have different unique packet lengths iff their sets of packet lengths are different — i.e.,

(∃L ∈ Pℓ | L ∉ P′ℓ) ∨ (∃L ∈ P′ℓ | L ∉ Pℓ)

Packet Length Frequency: Packet length frequency is the number of times each unique packet length occurs. The number of incoming packets at MTU size is a rough estimate of the total size of the page, which changes due to random advertisements and updated content. We will subsequently show that all current WF defenses fail to cover the total traffic size, as doing so is difficult and would necessarily incur a large traffic overhead. Therefore, the poor performance of most early WF attacks [6, 9, 10] may be attributed to the fact that they explicitly discard packet length frequencies.

Suppose nL(Pℓ) is the number of times packet length L appears in Pℓ. P and P′ are said to have different packet length frequencies iff their packet lengths occur at different frequencies, excluding packet lengths that are unique to either P or P′ — i.e.,

∃L | nL(Pℓ) ≠ nL(P′ℓ) ∧ nL(Pℓ) > 0 ∧ nL(P′ℓ) > 0

Packet Ordering: The structure of a page induces a logical order in its packet sequence. As an example, a GET packet for a resource can only be sent once the reference to that resource is received by the client. An attacker may be able to infer information about the content of each packet from observing packet ordering. Further, packet ordering depends on network conditions: it may vary due to bandwidth and latency, and it may be affected by changing the parameters for persistent HTTP connections, pipelining, and so on. Tor developers have implemented a prototype defense based on packet ordering by randomizing request order (see Section 4.1).

We denote by Mℓ the multiset of packet lengths in Pℓ, without ordering. We say that two sequences P and P′ have different packet ordering iff:

Mℓ = M′ℓ ∧ Pℓ ≠ P′ℓ

Interpacket Timing: Interpacket times often reveal the logical relationship between packets. For example, viewed from the client's end, the outgoing server connection SYN and the incoming SYN-ACK will differ by a round-trip time; so will the GET request and the first packet of that resource. If the attacker's tap is near the client, then the attacker can infer that an outgoing packet and an incoming packet cannot be logically dependent if their interpacket time is less than one RTT.

Suppose P and P′ have sequence lengths |P| and |P′|. P and P′ are said to have different interpacket timings iff their timings are different — i.e.,

∃i, 1 ≤ i ≤ min(|P|, |P′|) : (Pt)i ≠ (P′t)i

FACT 1. If P ≠ P′, then they must differ in at least one of the four features above.

This fact demonstrates that our choice of features is, in some sense, complete, in that it captures any difference between two packet sequences. We can therefore claim that successful attacks must expose at least one of those four features between packet sequences, while defenses are perfectly effective if they can hide all four features.

FACT 2. Given any packet sequence P and any subset of the above features, there exists a packet sequence P′ such that P and P′ differ only in this subset of features, with the exception that different packet ordering implies that unique packet lengths and packet length frequencies do not differ.

This fact implies that our features are somewhat independent of each other (except packet ordering). It should therefore be possible to find generators that change one feature without affecting the others, allowing us to pinpoint which features various defenses attempt to cover. It also shows that the features considered in this paper are exhaustive, even for data-dependent features (for example, if the size of a response depends on the content of a cookie).
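Read literally, the four feature differences above are simple predicates over packet sequences. The following Python sketch is our own illustration (the list-of-signed-pairs representation mirrors the definition of P in Section 3.1; none of these function names come from the authors' code):

```python
from collections import Counter

# A packet sequence P is a list of (t_i, l_i) pairs: t_i is the
# interpacket time (t_1 = 0) and l_i the signed byte length
# (positive = outgoing, negative = incoming), as in Section 3.1.

def lengths(P):
    return [l for _, l in P]

def diff_unique_lengths(P, Q):
    # Feature 1: the *sets* of packet lengths differ.
    return set(lengths(P)) != set(lengths(Q))

def diff_length_frequencies(P, Q):
    # Feature 2: some length common to both sequences occurs a
    # different number of times (lengths unique to one side are
    # already covered by feature 1).
    cp, cq = Counter(lengths(P)), Counter(lengths(Q))
    return any(cp[L] != cq[L] for L in cp.keys() & cq.keys())

def diff_ordering(P, Q):
    # Feature 3: same multiset of lengths, but a different order.
    cp, cq = Counter(lengths(P)), Counter(lengths(Q))
    return cp == cq and lengths(P) != lengths(Q)

def diff_timing(P, Q):
    # Feature 4: some interpacket time differs, compared up to the
    # length of the shorter sequence.
    return any(tp != tq for (tp, _), (tq, _) in zip(P, Q))
```

Fact 1 corresponds to the observation that if P ≠ P′, at least one of these predicates must hold; a defense is perfectly effective only if it forces all four to be false.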
3.2 Comparative Methodology

To determine if a defense is able to hide a feature, we apply the defense to two classes, C and C′, which differ only by that feature. Then, we say that a defense is successful in hiding the feature if, after applying the defense, there is no discernible difference between C and C′.

We use a generator G to transform C into C′ by causing a change in some feature of each packet sequence P ∈ C and inserting the output into C′. We parameterize G(v) by v, a non-negative integer such that the greater the value, the more "different" P and P′ = G(v)(P) will be; we require G(0)(P) = P. We design each generator to modify only one specific feature. G(v) operates from the start of the packet sequence. Informally, G(v) is equivalent to G(1) repeated v times, possibly from different starting points in the packet sequence. For all of our generators, this interpretation ensures that v functions as a magnitude. The designed generators are not randomized, and each generator may accept values of v up to 180 or |P|/5, whichever is lower. None of the generators produces a packet length greater than the MTU. We give a textual description of each generator below and mathematically define each generator Gi(v) in Table 2. More details of the generators can be found in our tech report [17].

Small Packet-Length Changes (G1): All packet lengths are increased by v, up to the MTU.

Large Packet-Length Changes (G2): v packet lengths are increased by 1000, up to the MTU.

Packet-Length Diffusion (G3): The lengths of v packets are increased by their position divided by 5, up to the MTU.

Appended Incoming Packets (G4): v incoming MTU packets are appended to the end.

Appended Outgoing Packets (G5): v outgoing packets are appended to the end, their lengths being the lengths of the first outgoing packets of P.

Inserted Incoming Packets (G6): v incoming MTU packets are added, one per 5 packets.

Adjacent Transpositions (G7): v packets are transposed with the previous packet.

Short-Distance Transpositions (G8): v packets are transposed with the packet 4 elements earlier.

Long-Distance Transpositions (G9): v packets are transposed with the packet 19 elements earlier.

Delays (G10): Each packet is delayed by a linearly increasing amount of time, multiplied by v.

3.3 Classification and Experimental Setup

In order to understand the significance of these traffic features for fingerprinting attacks, we focus on distinguishing between two classes:

C = {P1, P2, ..., P400}
C′ = {G(v)(P1), G(v)(P2), ..., G(v)(P400)}

These are the original class (C) and the generator-modified class (C′) with one feature changed. Since our generators operate on packet sequences, the elements of C and C′ are packet sequences. We construct C by connecting to [Link] 400 times with caching disabled. The reason we do so is that C should contain packet sequences of the same page rather than different pages, because WF attack classifiers are designed to tolerate the randomness within the same page while exposing the differences between different pages. A successful classifier should therefore be able to distinguish C and C′ despite the randomness in C (and therefore C′). On the other hand, the elements of C need to differ from each other; this will allow us to measure how sensitive each defense is to the generators' operations. The page we chose has a suitable amount of randomness and has therefore been difficult to classify [18]. We use the first 200 elements of classes C and C′ for training and the last 200 elements for testing our feature classifiers described below. The training and testing sets are denoted Ctrain and Ctest, with the generator-modified sets being C′train and C′test. We use four different feature classifiers, each specializing in differentiating one specific feature.

Unique Packet-Lengths (F1): Given a unique packet length in Ptest, if it is in any packet sequence P ∈ C and not in any P′ ∈ C′, add 1 to the score of class C, and vice versa. This is a modified version of the early Jaccard coefficient classifier introduced in [9].

Packet-Length Frequencies (F2): For training, we count the total number of bytes contained in incoming and outgoing packets, as well as the total number of incoming and outgoing packets, and take the mean and standard deviation over the packet sequences in each class. Each testing packet sequence is then scored using the normal distribution kernel against those four values for each class, with the incoming and outgoing packets scored separately and then multiplied. This classifier is a simplified version of the Naïve Bayes classifier also described in [9].

Packet Ordering (F3): For testing, each packet length in the sequence (discarding direction) is compared to the mean of all training packet lengths at that position. This classifier is derived from the Cross Correlation classifier described in [2].

Interpacket Timing (F4): Classification here is based only on total elapsed time. We use this classifier because G10 is a delay. This reveals whether or not the total load time of a page would still be a useful feature after the defense is applied.

Since the objective is simply to find the difference between the two classes that differ only by a single feature, the above four single-feature classifiers are sufficient.

The defense D is applied to each element of Ctrain and C′train to produce D(Ctrain) and D(C′train); the feature classifier is then trained to distinguish between them. Finally, D is applied to Ctest and C′test, and its effectiveness is measured by the feature classifiers. The effectiveness of the defense D is measured by the value of v before the classifier is able to distinguish between the two classes with reasonable levels of accuracy (.55 and .75) — i.e., the magnitude of difference induced between C and C′ before the classifier can distinguish between them.

Our experimental setup is as follows. We load pages over a 100 Mbps Ethernet connection with the MTU set at 1500. For automated page loading, we used iMacros 9.00 on Firefox 23.0. We collected data with tcpdump and parsed the data into packet sequences with our own code. We implemented all of the classifiers and simulated defenses in this paper in a combination of Python, C++, and C, and all of them are available upon request.

4. COMPARISON OF DEFENSES

In this section, we survey the current state of the art of WF defenses and present the results of our comparative evaluation based on the methodology described in Section 3. We then analyze our findings and try to shed light on previously unexplained results.

Our simulations of each of the listed defenses operate on packet sequences (packet lengths and timings, but no content) rather than raw packets; they do not directly modify packets during page loading. This allows us to observe whether a defense is able to cover a feature modified by a generator.

4.1 Simulated Defenses

We simulate and compare the following network-level fingerprinting defenses:
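Stepping back to Sections 3.2 and 3.3, the comparative methodology reduces to a search over the generator magnitude v. The following Python sketch is our own reconstruction of that loop, not the authors' implementation (which is described only as available on request); `generate`, `defend`, and `train_classifier` are placeholder callables, while the 200/200 split, the accuracy thresholds (0.55 and 0.75), and the bound v ≤ 180 follow the text:

```python
# Sketch of the Section 3.2/3.3 evaluation loop: find the smallest
# generator magnitude v at which a single-feature classifier can
# tell the defended classes apart with a given accuracy.

def classifier_accuracy(classifier, test_c, test_c_prime):
    """Binary accuracy over two equally sized test sets."""
    correct = sum(1 for p in test_c if classifier(p) == "C")
    correct += sum(1 for p in test_c_prime if classifier(p) == "C'")
    return correct / (len(test_c) + len(test_c_prime))

def threshold_v(pages, generate, defend, train_classifier,
                accuracy=0.55, v_max=180):
    """Return the smallest v at which the feature classifier reaches
    `accuracy` on the defended classes, or None if no v <= v_max does
    (rendered as an asterisk in Table 3)."""
    train, test = pages[:200], pages[200:400]      # 200/200 split
    for v in range(1, v_max + 1):
        # C' is C with exactly one feature changed by magnitude v.
        d_train = [defend(p) for p in train]
        d_train_prime = [defend(generate(p, v)) for p in train]
        clf = train_classifier(d_train, d_train_prime)
        d_test = [defend(p) for p in test]
        d_test_prime = [defend(generate(p, v)) for p in test]
        if classifier_accuracy(clf, d_test, d_test_prime) >= accuracy:
            return v
    return None
```

A larger returned v (or None) means the defense covers the generated feature better.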
Feature type             #   Generator name                  Transformation for G#(v)
Unique packet sizes      1   Small packet size changes       For 0 < i ≤ |P|: ℓi ← di · min(|ℓi| + v, 1500)
                         2   Large packet size changes       For 0 < i ≤ v: ℓ5i ← d5i · min(|ℓ5i| + 1000, 1500)
                         3   Diffusing packet sizes          For 0 < i ≤ v: ℓ5i ← d5i · min(|ℓ5i| + i, 1500)
Packet size frequencies  4   Append inbound MTU packets      Repeat v times: P ← A(P, N, (tN, −1500))
                         5   Append outbound packets         For 0 < i ≤ v: P ← A(P, N, (tN, Pout_i))
                         6   Insert inbound MTU packets      For 0 < i ≤ v: P ← A(P, 5i, (t5i, −1500))
Packet ordering          7   Adjacent transpositions         For 0 < i ≤ v: T(P, 5i, 5i − 1)
                         8   Short distance transpositions   For 0 < i ≤ v: T(P, 5i, 5i − 4)
                         9   Long distance transpositions    For 0 < i ≤ ⌊v/5⌋, 0 < j ≤ 5: T(P, 25i − j, 25i − 19 − j)
Interpacket timing       10  Delays                          ∀pi ∈ P: ti ← ti + v · i · 0.020 ms

Table 2: Generators. Packet sequence P = {p1, p2, ..., pN} where pi = (ti, ℓi), t1 = 0. Pout is the sequence of outgoing packets in P, and Pout_i is its ith element (wrapping back to the beginning if i > |Pout|). di is the direction (1 = outgoing, −1 = incoming) of pi. A(P, i, p) appends packet p after pi. T(P, i, j) transposes the packet lengths of pi and pj.
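As a concrete reading of Table 2, three of the generators can be sketched in Python over the signed-length representation of Section 3.1 (our own illustration; the function names are ours, not the authors'):

```python
MTU = 1500

# A packet sequence is a list of (t_i, l_i) pairs with signed lengths
# (positive = outgoing, negative = incoming), per Section 3.1.

def g1_small_length_changes(P, v):
    # G1: every packet length grows by v, capped at the MTU;
    # the direction (sign) d_i is preserved.
    out = []
    for t, l in P:
        d = 1 if l > 0 else -1
        out.append((t, d * min(abs(l) + v, MTU)))
    return out

def g4_append_inbound(P, v):
    # G4: append v incoming MTU-sized packets at the end,
    # reusing the last packet's time t_N.
    t_last = P[-1][0]
    return P + [(t_last, -MTU)] * v

def g7_adjacent_transpositions(P, v):
    # G7: for 0 < i <= v, transpose the *lengths* of packets
    # p_{5i} and p_{5i-1} (1-based in the table; 0-based here).
    out = list(P)
    for i in range(1, v + 1):
        a, b = 5 * i - 1, 5 * i - 2
        if a >= len(out):
            break
        out[a], out[b] = (out[a][0], out[b][1]), (out[b][0], out[a][1])
    return out
```

Note that G7 changes only the order of lengths, leaving the length multiset (and hence features 1 and 2) untouched, which is exactly the independence Fact 2 asks of the generators.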

           Unique packet lengths            Packet length counts             Packet ordering            Timing
Defense    G1        G2         G3         G4        G5         G6         G7        G8      G9       G10
PadM       * * *     * * *      * * *      F2 16 57  F2 3 9     F2 16 57   * * *     * * *   * * *    F4 23 47
PadE       F2 28 91  F3 1 3     F3 59 *    F3 2 27   F2 2 9     F3 1 1     F3 29 *   F3 2 3  F3 4 4   F4 23 47
Wr-Morph   F2 13 74  F2 29 180  * * *      F2 43 72  F2 2 11    F2 32 68   * * *     * * *   * * *    F4 23 47
HTTPOS     F3 29 30  F3 23 50   F3 45 *    F3 6 18   F2 3 9     F3 1 3     F3 179 *  F3 2 *  F3 4 4   F4 23 47
Pa-Decoy   F1 1 68   F1 48 178  F1 34 *    F3 93 *   F2 38 151  F3 1 *     * * *     * * *   F3 14 *  F4 111 146
BuFLO      F2 147 *  * * *      * * *      F2 23 85  F2 78 *    F2 20 94   * * *     * * *   * * *    * * *
Tamaraw    F2 177 *  F2 180 *   * * *      F2 122 *  F2 68 168  F2 82 *    * * *     * * *   * * *    * * *

Table 3: Upper bounds on the quality of the defenses. Results are given in three columns per generator: the first column is the feature classifier that was able to achieve the highest mean accuracy, the second is the value of v for which the feature classifier in column 1 had an accuracy greater than 0.55 for all greater values of v, and the third is similar, but for 0.75; asterisks indicate that these accuracies were not reached for v ≤ 180. Asterisks and higher v values indicate that the corresponding defense is more successful in covering the specific feature. Tamaraw is our BuFLO-based defense presented in Section 6.

Maximum Packet Padding (PadM): Padding adds garbage data to the end of packets in order to obscure their true length. Packet padding schemes have been known in the literature; a large number of different padding schemes were analyzed by Dyer et al. [5] and shown to be ineffective against the attack classifiers described in [9] and [12]. Packet padding schemes are meant to obscure the unique packet length feature. In the PadM defense, all packet lengths are padded to the MTU. The effect of using Tor is similar to PadM, as Tor traffic is delivered in fixed-size cells.

Exponential Packet Padding (PadE): Here, packet lengths are padded upwards to the closest power of 2, but not exceeding the MTU. No extra packets are added in either of the two schemes. PadE is meant to be a bandwidth-cheaper version of PadM.

Traffic Morphing (Wr-Morph): Wright et al. [19] published a WF defense that is meant to obscure packet lengths, known as traffic morphing. Unique among the defenses, Wr-Morph is designed to allow the client to set a target page CT and mimic that page's packet size distribution by modifying packet lengths. In our implementation, we use [Link] as CT, our target morph page, since it is reasonable for the client to attempt to hide her traffic features by using the most commonly accessed page as a decoy.

HTTP Obfuscation (HTTPOS): HTTPOS was presented by Luo et al. [11] as a platform for website fingerprinting defenses. Unlike many other defenses, HTTPOS is implemented entirely on the client side, with no need for support from any proxy or the end server. It does so by using TCP advertised windows, HTTP pipelining, and HTTP ranges in order to control the sizes of both outgoing and incoming packets.

Our simulation is not done at the application layer; rather, we simply go through the packet sequence and split up each incoming packet of size less than the MTU. In the implementation of HTTPOS, this requires a new outgoing packet between the two splits which signals the end server that the client is ready to receive the second split, which costs one round-trip time. In our simulated defense, we assume this can be done without the signal and round-trip time, which is possible with the cooperation of a proxy or the end server.¹ We complete this defense by also padding all outgoing packets to a fixed size of 1500 bytes; the authors describe how outgoing packet padding can be done, but they are not clear as to what they implemented. Our choice gives maximum protection for unique packet lengths, at the cost of extra overhead.

¹This assumption is reasonable for us, as many of the other defenses also require the cooperation of a proxy or the end server.

Background Noise (Pa-Decoy): Background noise can be used to add randomness to each page load in order to make loads less distinguishable. Tor has some background activity that makes fingerprinting more difficult, such as circuit construction, circuit measurement, and stream SENDMEs. Panchenko et al. [12] proposed a simple defense using background noise to defeat their own attack. Whenever a page is loaded, a decoy page is loaded in the background, using its own connections and connection limit so as not to interfere with the connections of the intended page. This defense has a high overhead. We
used a different page from Alexa's top 800 sites as background noise for each of the training and testing elements in order to simulate the intended effect that the attacker cannot predict which page the client picked. Our simulated defense assumes that the decoy page does not interfere at all with the page load of the true page.

Buffered Fixed-Length Obfuscator (BuFLO): After Dyer et al. analyzed a large number of traffic analysis countermeasures and found that efficient defenses failed, they presented their own defense, BuFLO (a Buffered Fixed-Length Obfuscator), and demonstrated its relative success against the attacks of the day [5]. The BuFLO defense is an obfuscator with large bandwidth and time overhead that is applied to both the incoming and outgoing traffic. Packets are sent at fixed intervals with fixed length, and if no data needs to be sent, dummy packets are sent instead. The traffic must continue for a minimum amount of time, after which it may terminate if no more data needs to be sent.

BuFLO is parameterized by three values: d, the fixed size of packets (set to 1500); ρ, the interpacket timing (set to one packet per 20 milliseconds); and τ, the fixed minimum amount of time for which data must be sent (set to 10 seconds). These parameter settings were the strong (and more bandwidth-intensive) choice presented by Dyer et al., which was able to sharply decrease the accuracy of the Panchenko classifier [12].

4.2 Comparative Results

In this section we highlight the conclusions of our comparative evaluation. The full results are given in Table 3. For each defense, we only present the most successful feature classifier. We only use the interpacket timing classifier on G10. We vary v from 1 to 180 and present the minimum value of v for which the feature classifier had a higher accuracy than 0.55 and 0.75 for all greater v. A lower v indicates that the defense was less successful in covering the feature against our classifiers, whereas an asterisk indicates that our feature classifiers could not distinguish the classes with the target accuracy (0.55 or 0.75) for any value of v that we tested. This table also includes our new proposed defense, Tamaraw (presented in Section 6).

Our evaluation shows that PadM covers unique packet length as well as packet orderings. In PadE, changing packet length can cause the lengths to be padded to different powers of 2, which could affect the set of unique packet lengths. However, the number of distinct lengths with PadE is small, so this is unlikely. Consequently, the unique packet length classifier did not detect a difference. Our packet ordering classifier, however, considered the changes in packet lengths to be re-ordering, and therefore managed to defeat PadE. PadM and PadE are ineffective against attacks that use packet length frequencies.

HTTPOS has been shown to be effective against attack classifiers that are strongly dependent on unique packet lengths [9, 11]. However, Cai et al.'s attack succeeds against HTTPOS [3]. This is because HTTPOS attempts to remove unique packet lengths as a feature, but Cai's attack primarily uses packet ordering.

Pa-Decoy and BuFLO are effective against the Panchenko classifier and attacks that are dependent on packet length frequencies [5, 12]; i.e., these defenses partly cover packet length frequencies.

The table exposes several flaws. PadM, PadE, Wr-Morph, and HTTPOS are not designed to cover total transmission size (packet length frequencies), so they would be ineffective against the attacks that leverage them. Pa-Decoy fails to completely cover interpacket timing because it only covers the total transmission time roughly half the time (i.e., when the decoy page takes longer to load than the desired page), which may leak the total page size, a powerful feature. Similarly, BuFLO does not cover total transmission time if it is over 10 seconds at ρ = 0.020 s/packet, which happened quite often. Trying to cover packet length frequency on G4^(v) becomes a race between v and the overhead of BuFLO; a larger v requires a larger setting of minimum time τ to cover it.

Our results are upper bounds on the quality of the defense, as more complicated classifiers could reveal more information. For instance, HTTPOS performs well against F1, but if C′ has larger unique packet lengths than C, then D(C′) will also have larger unique packet lengths than D(C) under HTTPOS. We wrote a classifier specifically to find this difference and achieved an accuracy of 0.99 with G1^(100), compared to 0.5 for F1.

Tor developers want to understand what WF defenses work with Tor [15]. As Tor already covers unique packet length, PadM, PadE, Wr-Morph, and HTTPOS are not meaningful on Tor, as all of these defenses are focused on covering unique packet lengths (although only PadM truly does so). We note in particular that HTTPOS is a platform valuable for its client-only implementation (requiring no cooperation from a proxy or the end server), but Tor bridges can be made to cooperate by implementing a WF defense as a pluggable transport. Pa-Decoy and Dy-BuFLO achieve only limited success at covering packet length frequencies. In short, none of these defenses can be considered a perfect fit for Tor.

5. THEORETICAL FOUNDATIONS

In this section we develop a model of website fingerprinting attacks and defenses, derive lower bounds on the bandwidth overhead of any defense that achieves a given level of security, and show how to derive open-world performance from closed-world experimental results.

5.1 Security vs. Overhead Trade-Off

We focus on understanding the relationship between bandwidth overhead and security guarantees. The overhead required by a fingerprinting defense depends on the set of websites to be protected: a set of similar websites can be protected with little overhead, while a set of dissimilar websites requires more overhead. To derive lower bounds on bandwidth costs, we consider an offline version of the website fingerprinting defense problem, i.e. the defense system knows, in advance, the set of websites that the user may visit and the packet traces that each website may generate. We develop an efficient dynamic program to compute a lower bound on the bandwidth overhead of any fingerprinting defense scheme in the closed-world setting. We will use this algorithm to compute lower bounds on overhead for the websites used in our evaluation (in Section 6.2).

5.1.1 Definitions

In a website fingerprinting attack, the defender selects a website, w, and uses the defense mechanism to load the website, producing a packet trace, t, that is observed by the attacker. The attacker then attempts to guess w.

Let W be a random variable representing the URL of the website selected by the defender. The probability distribution of W reflects the probability that the defender visits each website. For each website, w, let T_w^D and T_w be the random variables representing the packet trace generated by loading w with and without defense system D, respectively. Packet traces include the time, direction, and content of each packet. Since cryptographic attacks are out of scope for this paper, we assume any encryption functions used by the defense scheme are information-theoretically secure. The probability distribution of T_w^D captures variations in network conditions, changes in dynamically-generated web pages, randomness in the browser, and randomness in the defense system. We
assume the attacker knows the distribution of W and T_w^D for every w.

In a closed-world setting, the attacker's goal is to infer W from T_W^D. The optimal closed-world attacker, A, upon observing trace t, outputs

    A(t) = argmax_w Pr[W = w] Pr[T_w^D = t]

If more than one w attains the maximum, then the attacker chooses randomly among them.

Some privacy applications require good worst-case performance, and some only require good average-case performance. This leads to two security definitions for website fingerprinting defenses:

DEFINITION 1. A fingerprinting defense D is non-uniformly ε-secure for W iff Pr[A(T_W^D) = W] ≤ ε. Defense D is uniformly ε-secure for W if max_w Pr[A(T_w^D) = w] ≤ ε.

These are information-theoretic security definitions: A is the optimal attacker described above. The first definition says that A's average success rate is less than ε, but it does not require that every website be difficult to recognize. The second definition requires all websites to be at least ε difficult to recognize. All previous papers on website fingerprinting attacks and defenses have reported average attack success rates in the closed-world model, i.e. they have reported non-uniform security measurements. We will do the same.

To define the bandwidth overhead of a defense system, let B(t) be the total number of bytes transmitted in trace t. We define the bandwidth ratio of defense D as

    BWRatio_D(W) = E[B(T_W^D)] / E[B(T_W)]

This definition captures the overall bandwidth ratio between a user surfing the web while using defense D and a user visiting the same websites with no defense.

5.1.2 Bandwidth Lower Bounds

In this section we derive an algorithm to compute, given websites w_1, . . . , w_n, a lower bound for the bandwidth that any non-uniformly ε-secure fingerprinting defense can use in a closed-world experiment using w_1, . . . , w_n.

To compute a lower bound on bandwidth, we consider an adversary that looks only at the total number of bytes in a packet trace, i.e. an attacker A_S that always guesses

    A_S(t) = argmax_w Pr[B(T_w^D) = B(t)]

Any defense that is ε-secure against an arbitrary attacker must also be at least ε-secure against A_S. If we can derive a lower bound on defenses that are ε-secure against A_S, that lower bound will apply to any ε-secure defense.

We make two simplifying assumptions in order to obtain an efficient algorithm for computing lower bounds. First, we assume that each website has a unique fixed size, s_i. In our closed-world experiments, we found that, for just over half the web pages in our dataset, their size had a normalized standard deviation of less than 0.11 across 20 loads, so we do not believe this assumption will significantly impact the results of our analysis. Second, we assume that the defense mechanism does not compress or truncate the website.

We prove the following theorem in Appendix A:

THEOREM 1. Suppose εn is an integer. Let W be a random variable uniformly distributed over w_1, . . . , w_n, i.e. W represents a closed-world experiment. Suppose D is a defense that is ε-non-uniformly-secure against A_S on distribution W. Then there exists a monotonically increasing function f from S = {s_1, . . . , s_n} to itself such that

    • |f(S)| ≤ εn.
    • Σ_{i=1}^n f(s_i) / Σ_{i=1}^n s_i ≤ BWRatio_D(W).

Intuitively, f represents a mapping from each website's original size (s_i) to the number of bytes that D transmits when loading website w_i.

This theorem enables us to efficiently compute a lower bound on the overhead of any defense that is uniformly or non-uniformly ε-secure in a closed-world experiment on w_1, . . . , w_n. To get a lower bound for non-uniformly ε-secure defenses, we just need to find a monotonically increasing function f : S → S that satisfies |f(S)| ≤ εn and minimizes Σ_{i=1}^n f(s_i).

Such an f is equivalent to a partition S_1, . . . , S_k of S satisfying k ≤ εn and minimizing Σ_{i=1}^k |S_i| · max_{s∈S_i} s. These partitions satisfy a recurrence relation. If S_1, . . . , S_k is an optimal non-uniformly (k/n)-secure partition, then S_1, . . . , S_{k−1} is an optimal non-uniformly ((k−1)/(n−|S_k|))-secure partition of S_1 ∪ · · · ∪ S_{k−1}. Therefore the cost, C(k/n, n), of the optimal f satisfies the recurrence

    C(k/n, n) = n·s_n                                              if k = 1
    C(k/n, n) = min_{1≤j≤n−1} [ C((k−1)/(n−j), n−j) + j·s_n ]      otherwise.

We can obtain a similar bound for uniformly ε-secure deterministic defenses. We say a defense is deterministic if, on each load of website w_i, it always transmits b_i bytes. The following theorem is proven in Appendix A.

THEOREM 2. Let W be uniformly distributed over w_1, . . . , w_n, i.e. W represents a closed-world experiment. Suppose D is a deterministic defense that is uniformly ε-secure against A_S on distribution W. Then there exists a monotonically increasing function f from S = {s_1, . . . , s_n} to itself such that

    • min_i |f⁻¹(s_i)| ≥ 1/ε.
    • Σ_{i=1}^n f(s_i) / Σ_{i=1}^n s_i ≤ BWRatio_D(W).

As with the lower bound on non-uniformly secure defenses, such an f corresponds to a partition S_1, . . . , S_k of S satisfying min_i |S_i| ≥ 1/ε and minimizing Σ_{i=1}^k |S_i| · max_{s∈S_i} s. These partitions satisfy a slightly different recurrence. If S_1, . . . , S_k is an optimal uniformly ε-secure partition of S, then S_1, . . . , S_{k−1} is an optimal uniformly ε-secure partition on S_1 ∪ · · · ∪ S_{k−1}. Thus the cost, C′(ε, n), of the optimal uniformly ε-secure partition satisfies the recurrence relation:

    C′(ε, n) = ∞                                                   if n < 1/ε
    C′(ε, n) = n·s_n                                               if n ∈ [1/ε, 2/ε)
    C′(ε, n) = min_{1/ε ≤ j ≤ n−1/ε} [ C′(ε, n−j) + j·s_n ]        otherwise.

Algorithm 1 shows a dynamic program for computing a lower bound on the bandwidth of any defense that can achieve ε non-uniform security in a closed-world experiment on static websites with sizes s_1, . . . , s_n in time O(n²). We use this algorithm to compute the lower bounds reported in Section 6.2. The dynamic program for computing uniform security lower bounds is similar.

5.2 From Closed to Open World

In this section, we show how to use closed-world experimental results to compute open-world security of defenses and open-world performance of attacks. This makes attack and defense evaluation simpler: researchers need only perform closed-world experiments to predict open-world performance.

In an open-world attack, the defender selects a website, W, according to some probability distribution and generates a trace, T_W^D, corresponding to a visit to that website using some defense, D.
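The recurrence for C(k/n, n) above maps directly onto a small dynamic program. The following Python sketch is ours, not the paper's released code; the function name and the toy site sizes are illustrative. It computes the minimum total padded size over partitions of the site sizes into at most ⌈εn⌉ groups, each group padded to its largest member, which lower-bounds the bytes any non-uniformly ε-secure defense must transmit in a closed-world experiment:

```python
import math

def min_padded_cost(sizes, eps):
    """Cheapest total padded size over partitions of the sorted site
    sizes into at most ceil(eps * n) groups, where every site in a
    group is padded up to the group's largest member."""
    s = sorted(sizes)
    n = len(s)
    k = max(1, math.ceil(eps * n))        # maximum number of groups
    INF = float("inf")
    # C[j][i]: cheapest cost of covering the first i sites with j groups
    C = [[INF] * (n + 1) for _ in range(k + 1)]
    C[0][0] = 0
    for j in range(1, k + 1):
        for i in range(1, n + 1):
            # the last group holds sites l+1..i, all padded up to s[i-1]
            C[j][i] = min(C[j - 1][l] + (i - l) * s[i - 1] for l in range(i))
    return min(C[j][n] for j in range(1, k + 1))

sizes = [10, 12, 40, 45, 100, 110]
lb = min_padded_cost(sizes, eps=0.5)      # at most 3 groups allowed
print(lb, lb / sum(sizes))                # lower bound and bandwidth-ratio bound
```

For these made-up sizes and ε = 0.5, the cheapest 3-group partition is {10, 12}, {40, 45}, {100, 110}, with cost 2·12 + 2·45 + 2·110 = 334, a bandwidth ratio of roughly 1.05 over the unpadded total of 317.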
The attacker's goal is to determine whether W = w*, where w* is a particular website of interest. (It is easy to generalize this definition to situations with multiple websites of interest.)

In the open-world setting, the distribution of the random variable W corresponds to the popularity of different websites among the population of users being monitored in the attack. So, for example, if the fingerprinting attacker is a government monitoring citizens' Tor usage, then W would be distributed according to the popularity of websites among that nation's Tor users.

Any closed-world attack can be used to construct an open-world attack by selecting websites w_2, . . . , w_n and building a closed-world classifier, A, on w*, w_2, . . . , w_n. The open-world classifier is defined as C(t) = 1 iff A(t) = w*.

We can compute the false-positive rate of this open-world attack as follows. Let p* = Pr[W = w*] and p_i = Pr[W = w_i] for i = 2, . . . , n. We can obtain estimates for p*, p_2, . . . , p_n from public sources, such as the Alexa Page-Views per Million database [1]. Let R_n be the average success rate of A in the closed world, i.e.

    R_n = ( Pr[A(T_{w*}^D) = w*] + Σ_{i=2}^n Pr[A(T_{w_i}^D) = w_i] ) / n

Note that R_n is the standard performance metric used in closed-world evaluations. For simplicity, we assume that Pr[A(T_{w*}^D) = w*] = R_n. We also assume that, whenever A misclassifies a trace, there is a 1/n chance that it misclassifies the trace as w*, i.e. that Pr[A(T_W^D) = w* | W ≠ w* ∧ A(T_W^D) ≠ W] = 1/n. Essentially, these two assumptions are equivalent to assuming that w* is not particularly difficult or easy for A to recognize. With these assumptions, we can compute C's false-positive rate:

    FPR(C) = Pr[C(T_W^D) = 1 | W ≠ w*]
           = Σ_{w≠w*} Pr[W = w] Pr[C(T_w^D) = 1] / (1 − p*)
           = Σ_{w≠w*} Pr[W = w] Pr[A(T_w^D) = w*] / (1 − p*)
           = Σ_{i=2}^n Pr[W = w_i] Pr[A(T_{w_i}^D) = w*] / (1 − p*) + (1 − Σ_{i=2}^n Pr[W = w_i]) / (n(1 − p*))
           = ((1 − R_n) / (n(1 − p*))) Σ_{i=2}^n p_i + (1 / (n(1 − p*))) (1 − Σ_{i=2}^n p_i)

With the same assumptions, the true-positive rate of C is

    TPR(C) = Pr[C(T_W^D) = 1 | W = w*] = R_n

The choice of the websites w_2, . . . , w_n used to build A will affect the performance of C in the open world. The choice of websites affects the false-positive rate in two ways: (1) choosing less popular websites tends to increase the false-positive rate since it decreases Σ_{i=2}^n p_i, and (2) choosing more similar websites increases the false-positive rate by reducing R_n. The choice of websites affects the true-positive rate only through R_n. Cai et al. showed that the Alexa top 100 websites were about as similar as 100 randomly chosen websites [3], i.e. that the most popular websites are not particularly similar to each other. Thus it is generally a good strategy to choose w_2, . . . , w_n to be the most popular websites other than w*.

Similarly, the number, n, of websites used to build A affects the false-positive rate in two ways: (1) increasing n tends to increase the false-positive rate by lowering R_n, and (2) increasing n tends to decrease the false-positive rate since it increases Σ_{i=2}^n p_i. Increasing n can only decrease the true-positive rate.

Thus we can tune the false-positive and true-positive rates of C by varying n. Small n will have large true- and false-positive rates. Increasing n will reduce both the false- and true-positive rates. By varying n, we can generate the receiver operating characteristic (ROC) curve of C.

In the real world, visits to w* may be rare. In this case, the false-positive rate can be a misleading metric. A classifier with a low false-positive rate may still be useless if true positives are so rare that they are overwhelmed by false positives. Therefore, we also report true-discovery rates for the open-world attack and defense evaluations in this paper. Given an open-world classifier, C, its true-discovery rate is defined as

    TDR(C) = Pr[W = w* | C(T_W^D) = 1].

Intuitively, the true-discovery rate is the fraction of alarms that are true alarms. The true-discovery rate can be computed from the false-positive and true-positive rates as follows:

    TDR(C) = Pr[W = w*] TPR(C) / ( Pr[W = w*] TPR(C) + Pr[W ≠ w*] FPR(C) )
           = p* R_n / ( p* R_n + ((1 − R_n)/n) Σ_{i=2}^n p_i + (1 − Σ_{i=2}^n p_i)(1/n) )

Algorithm 1: Algorithm to compute a lower bound on the bandwidth of any offline non-uniformly ε-secure fingerprinting defense against A_S attackers.

    function AS-MIN-COST(n, ε, {s_1, . . . , s_n})
        Array C[0 . . . n, 0 . . . n]
        for i = 0, . . . , n do
            C[i, 0] ← 0
        end for
        for i = 0, . . . , n do
            C[0, i] ← ∞
        end for
        for i = 1 → n do
            for j = 1 → n do
                C[j, i] ← min_{1≤ℓ≤i−1} [ (i − ℓ)·s_i + C[j − 1, ℓ] ]
            end for
        end for
        return C[⌈εn⌉, n]
    end function

6. TAMARAW: A NEW DEFENSE

In this section we present a prototype of a new defense, Tamaraw², that can be considered a theoretically provable version of BuFLO.

6.1 Design

Based on the results of our comparative and theoretical study of website fingerprinting defenses, Tamaraw is designed with three guiding principles in mind:

1. Strong Theoretical Foundations: The security of the Tamaraw defense is based on an extension of the concept of optimal partitioning and feature hiding demonstrated in Section 5.1 (against A_S attackers). The relation is shown in Section 6.2.1.

²A tamaraw is a lightweight cousin of the buffalo.
2. Feature Coverage: While Section 5.1 aims to hide only the total transmission size, Tamaraw attempts to hide all features. In fact, Table 3 shows that Tamaraw effectively hides all the features studied in Section 3, with the exception of total downstream transmission size. As in Section 5.1, optimal partitioning requires different sites to be padded to different transmission sizes.

3. Reducing Overhead Costs: BuFLO faces the dilemma that increasing τ will increase its defensive coverage, but also increase its overhead. We address this dilemma and are able to find ways to significantly reduce the overhead of BuFLO in both bandwidth and time.

Tamaraw works as follows. As in BuFLO, traffic is still sent in fixed-size packets at fixed intervals; however, the packet size is set at 750 bytes rather than the MTU. This is done since most outgoing packets are covered by this size (see Appendix: Figure 3) while not incurring unwanted overhead. Further, incoming and outgoing traffic are treated differently. Outgoing traffic is fixed at a higher packet interval, which saves overhead as outgoing traffic is much less frequent. We denote the packet intervals as ρ_out and ρ_in (measured in s/packet). We use experimentation to choose these values of ρ in Section 6.2.

Additionally, BuFLO only attempts to cover total transmission size if the total amount of time for one page is less than τ. This makes the choice of τ especially awkward: increasing τ increases the number of web pages covered by BuFLO, but it also increases the overhead. In Tamaraw, however, the numbers of packets sent in both directions are always padded to multiples of a padding parameter, L.³ This means that if L is large enough, then it is likely that for each web page there exists some other web page that is mapped to the same multiple of L. Suppose the total number of incoming packets is I, where AL < I ≤ (A + 1)L; then we pad to (A + 1)L at the rate ρ_in. We do this separately for outgoing packets as well. Compared to the optimal defense in Section 5.1, the partitions produced are fixed, independent of the dataset.

Even though the differences between BuFLO and Tamaraw are not very large, the impact on security is tremendous: Tamaraw offers a maximum attack-accuracy guarantee, while BuFLO does not.

6.2 Experimental Results

In BuFLO, ρ_out and ρ_in were both 0.02, but it is expected that ρ_out should not have to be as large as ρ_in. As the distinguishability of two different web pages is controlled by the padding parameter L, our objective in the choice of ρ_in and ρ_out is to minimize overhead. We test the bandwidth and time overhead of Tamaraw on Alexa's top 100 pages, loaded 70 times each. We vary ρ_out and ρ_in from 0.005 to 0.16 seconds/packet while using Tamaraw with MTU packet sizes. We present a Pareto curve showing all values for which no other choice of parameters had both a lower time and bandwidth overhead in the Appendix: Figure 4.

Generally, as ρ_in and ρ_out increased, size overhead decreased while time overhead increased. Here we set L to 100, for reasons discussed below. With MTU packets, ρ_out = 0.04, and ρ_in = 0.012, we achieve a 17% decrease of total transmission size overhead from BuFLO's 149 ± 6% to 123 ± 10%, with the time overhead roughly the same, changing from 330 ± 80% to 320 ± 70%.

In order to achieve these same benefits when we use Tamaraw with 750-byte packet sizes, we maintain the same transmission rate. This is achieved by halving ρ_in and ρ_out; i.e., we set ρ_out = 0.02 and ρ_in = 0.006.

In our implementations of BuFLO and Tamaraw, we pessimistically required that the original logical ordering of the real packets must be maintained. For example, if the defense allowed an outgoing packet to be sent, but the next real packet to be sent is an incoming packet, then a dummy outgoing packet is sent, even if there are other outgoing packets waiting after the incoming packet. This is to guarantee that the causal order is preserved: it could be that the subsequent outgoing packets depend on the incoming packet. This rule has a large effect on the bandwidth. A practical implementation could achieve a lower size and time overhead, as re-ordering is possible for both defenses when subsequence is not consequence; our simulations are therefore pessimistic on the overhead, but this rule is necessary to guarantee correctness.

We apply the same defense methodology as in Section 4.2 and give the results in Table 3, presented earlier. We can see that at ρ_out = 0.02, ρ_in = 0.006, and L = 100, Tamaraw is much more successful at protecting all features than the other defenses, even though it cannot achieve a perfect cover of total packet length or frequency either. Looking more closely at the table, we see that Tamaraw is not perfectly successful against generators that significantly change the total transmission size (including the unique packet length generators G1 and G2). As BuFLO sends more dummy outgoing packets than Tamaraw, BuFLO is more able to cover changes in outgoing transmission size (G2, G5), but it is less able to cover changes in incoming transmission size (G1, G4, G6). We next show why Tamaraw is a more effective defense than BuFLO.

6.2.1 An Ideal Attacker

In order to produce an upper bound on the attack accuracy of any classifier on Tamaraw, we evaluate the partitions produced by Tamaraw (partitions were introduced in Section 5.1). The number of partitions is directly linked to the maximum classification accuracy. For a partition of size |S|, the attacker can at best achieve an accuracy of 1/|S| on each site in the partition.

For Tamaraw, the partition is calculated as follows. Let D be Tamaraw where L is set to 0 (no dummy packets appended to the end). Suppose the number of incoming packets for defended packet sequence D(P) is |D(P)_inc| and the number of outgoing packets is |D(P)_out|. Two packet sequences D(P) and D(P′) are the same under Tamaraw if they satisfy:

    ⌊|D(P)_inc| / L⌋ = ⌊|D(P′)_inc| / L⌋,  ⌊|D(P)_out| / L⌋ = ⌊|D(P′)_out| / L⌋.

This is the case even if the attacker is able to observe those packet sequences multiple times with the knowledge that they belong to two pages. We experiment on Alexa's top 800 sites. We only load one instance of each web page and reset the browser state between each page load. By doing this, we eliminate the network variability and make the defense system deterministic, which, as shown in the Appendix, does not reduce the security of the defense. Thus we can soundly use this technique to obtain an upper bound on the success rate of an ideal attacker against this defense. For BuFLO, we consider two packet sequences to belong to the same partition if the total transmission size is the same, as total transmission size is the only observable difference.
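Concretely, the collision rule can be sketched in a few lines (our own illustrative Python; the packet counts below are made up, and the paper's released code is the authoritative implementation). Two defended traces fall into the same partition exactly when their incoming and outgoing packet counts round up to the same multiples of L, and an ideal attacker's accuracy on a group of size m is at most 1/m:

```python
import math

def padded_counts(n_in, n_out, L=100):
    """Round each direction's packet count up to the next multiple of L,
    as Tamaraw's end-of-load padding does."""
    pad = lambda c: L * math.ceil(c / L)
    return pad(n_in), pad(n_out)

def partition(traces, L=100):
    """Group traces by their padded (incoming, outgoing) counts; an ideal
    attacker can do no better than 1/len(group) on each group."""
    groups = {}
    for name, (n_in, n_out) in traces.items():
        groups.setdefault(padded_counts(n_in, n_out, L), []).append(name)
    return groups

traces = {"siteA": (230, 41), "siteB": (278, 63), "siteC": (430, 95)}
for counts, members in partition(traces).items():
    print(counts, members, "best attack accuracy:", 1 / len(members))
```

Here siteA and siteB both pad to (300, 100) incoming/outgoing packets, so an ideal attacker can do no better than accuracy 1/2 on either of them, while siteC sits alone in its group and remains identifiable.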
³It is non-trivial for the proxy to know when to pad, as it does not know when the data stream has ended. One way for the proxy to know this is to set a parameter K, such that if the last K packets were dummy packets, then the traffic is determined to have ended. In our analysis we assume that the client and proxy know when to pad.

6.2.2 Closed-world Performance

Figure 1 shows the non-uniform security provided by Tamaraw and BuFLO against their corresponding bandwidth overheads. The BuFLO points correspond to the BuFLO configurations evaluated by Dyer et al. [5]. For reference, Figure 1 also includes a point
for Tor, for which we use overhead and security measurements reported by Cai et al. against their Ca-DLevenshtein attack [3]. The results show that Tamaraw offers a significantly better security/efficiency trade-off than BuFLO. For reference, at a size overhead of 130%, there are 553 partitions (non-uniform security of 69%) in BuFLO (τ = 9) and 18 partitions (non-uniform security of 2.25%) in Tamaraw. This shows that a design that adheres to the principles of provable lower bounds in Section 5.1 is more suitable for clients.

Figure 1: Non-uniform security (ε) against transmission size overhead for BuFLO, Tamaraw with L = 100, and Tor. [Plot omitted: bandwidth overhead (×100%) versus adversarial accuracy ε, with curves for the non-uniform bound, Tamaraw, BuFLO, and Tor.]

Figure 2: TDR for the Alexa top 100 sites in the open-world when using various defenses against the ideal attacker. [Plot omitted: true-discovery rate versus site index, with curves for no defense, BuFLO vs. the optimal attack, and Tamaraw at 200% and 687% overhead vs. the optimal attack.]

Table 1 shows how close different defenses are to the optimal lower bound curve derived in Section 5.1. The Overhead Ratio of a defense is the ratio between the defense's bandwidth overhead and the lower bound on overhead. Table 1 shows the best overhead ratios that Tamaraw and BuFLO achieved in our experiments. For reference, we also give the overhead ratios for Tor. Tamaraw achieves its best overhead ratio in a high-security/high-overhead configuration that may not be practical for most users. Therefore, we also report Tamaraw's performance in a more feasible configuration with overhead comparable to the best BuFLO configuration in our experiments. Even with this restriction, Tamaraw has an overhead ratio less than a sixtieth of BuFLO's.

6.2.3 Open-world Performance

Figure 2 shows the TDR of the corresponding open-world classifier where the website of interest is w_i, for i = 1, . . . , 100. Note that, even though the open world is the entire internet, our experiment only considers open-world attacks that attempt to recognize visits to one of the 100 most popular websites. The reason is that the TDR of an open-world attack on an unpopular website will be lower than that of an attack on a more popular website. By showing that the TDR becomes extremely low when attacking Tamaraw, even for the first 100 websites, we show that it is extremely low for all websites. The popularity of each website was taken from the Alexa estimated page-views-per-million database [1]. We only need popularity information for the sites used to construct the closed-world classifier; the rest of the sites on the internet are treated as being randomly classified and have, in aggregate, 10⁶ − (p_1 + p_2 + ... + p_800) page-views per million, where p_i is the page-views-per-million of the i-th most popular site. For comparison, for the 100th most popular site in the open world, Tamaraw (with 200% overhead) has a TDR approximately 1/13th the TDR of BuFLO, and Tamaraw (with 687% overhead) has a TDR approximately 1/110th the TDR of BuFLO. These results show that Tamaraw is a significant improvement over BuFLO in the open- and closed-world settings.

7. CODE AND DATA RELEASE

To ensure reproducibility and correctness, all code and data used in this paper are publicly available⁴. This includes: traces of websites loaded, code for all generators, code for all feature-based attackers, and code for all defenses tested (including Tamaraw).

8. CONCLUSIONS

In this paper, we developed and tested a new feature-based comparative methodology that classifies and qualifies WF defenses. This methodology allows us to understand which defenses are able to successfully hide which features, thereby upper-bounding their success rates in defending against attackers that rely heavily on a given feature. This methodology also exposes some flaws of previous defenses.

Our theoretical model clarifies the limits of website fingerprinting defenses. It establishes efficiency bounds that no defense can cross, giving an absolute benchmark for evaluating the efficiency of defenses. The lower bounds on bandwidth costs are surprisingly low, suggesting that it may be possible to build very efficient defenses. We also show that, in some contexts, randomized defenses offer no security or overhead advantage compared to deterministic defenses. This theoretical foundation also provides a framework for comparing schemes which offer different overhead and security trade-offs. Further, it allows conclusions to be drawn about open-world performance of attacks and defenses, based on their closed-world results. This greatly simplifies the experimental setup required to estimate open-world performance of attacks and defenses.

While previous work has shown that current WF defenses are either ineffective or inefficient, and while our work has explained these results using a systematic methodology, we argue that the situation is not hopeless for web browsing clients who desire privacy. We propose a new defense, Tamaraw, that is able to reduce the overhead of BuFLO significantly. Using our methodology, we
Figure 2 shows the TDR in the open-world for the Alexa top show that Tamaraw provides better protection against website fin-
100 sites when they face the ideal adversary and are defended by gerprinting attacks than all previous defenses, in both, the open and
BuFLO, Tamaraw (at 200% and 687% overhead), or no defense, closed-world models.
respectively. To compute these curves, we build an ideal closed-
world classifier on the Alexa top 800 sites. The ideal attacker is 4
[Link]
based on the ambiguity sets described above. We then compute the webfingerprint/
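The open-world TDR computation sketched in Section 6.2.3 can be written in a few lines. This is an illustrative reading of the model, not the paper's released code: we assume each defended site produces one deterministic fingerprint, that sites sharing a fingerprint form an ambiguity set, and that the ideal attacker flags every trace whose fingerprint matches the monitored site's. The names `open_world_tdr`, `fingerprint`, and `pageviews_ppm` are our own.

```python
from collections import defaultdict

def open_world_tdr(fingerprint, pageviews_ppm, site):
    """Estimate the true discovery rate (TDR) for `site` against an
    ideal ambiguity-set attacker.

    fingerprint:   site -> the (deterministic) fingerprint its defended
                   traffic produces; equal fingerprints = one ambiguity set.
    pageviews_ppm: site -> page-views-per-million (Alexa-style weight).
    """
    # Group sites into ambiguity sets by fingerprint.
    sets = defaultdict(list)
    for s, fp in fingerprint.items():
        sets[fp].append(s)
    # The attacker flags every trace in `site`'s ambiguity set, so the
    # TDR is `site`'s share of that set's page views.
    peers = sets[fingerprint[site]]
    return pageviews_ppm[site] / sum(pageviews_ppm[p] for p in peers)
```

Under this model, a defense improves open-world security exactly by enlarging (and re-weighting) the ambiguity sets of monitored pages, which is the effect Figure 2 plots: with no defense every site keeps a unique fingerprint and its TDR is 1, while merging a site with more popular peers drives its TDR down.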
9. ACKNOWLEDGMENTS

We would like to thank Scott E. Coull, Andriy Panchenko, and Kevin P. Dyer for their correspondence with us, which helped us improve the paper. We thank NSERC, ORF, and The Tor Project for funding this project. This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET: [Link]) and Compute/Calcul Canada.

[Figure 3: CDF of outgoing and incoming packet lengths (0–1400 bytes). Caption: Packet lengths observed when loading one instance of each of Alexa's top 800 sites. Packets sized 1500 bytes are discarded.]

[Figure 4: Size overhead (%) vs. time overhead (%), with Dy-BuFLO and Tamaraw (ρ_out = 0.04, ρ_in = 0.012) marked. Caption: The lower-left boundary of the two-dimensional feasibility region of size and time overhead for Tamaraw when varying ρ_out and ρ_in. Our chosen parameters and the overhead of BuFLO on the same data set are marked. The overhead includes the padding mandated by L = 100.]

10. REFERENCES

[1] Alexa — The Web Information Company. [Link].
[2] G. D. Bissias, M. Liberatore, D. Jensen, and B. N. Levine. Privacy Vulnerabilities in Encrypted HTTP Streams. In Privacy Enhancing Technologies, pages 1–11. Springer, 2006.
[3] X. Cai, X. Zhang, B. Joshi, and R. Johnson. Touching from a Distance: Website Fingerprinting Attacks and Defenses. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, pages 605–616, 2012.
[4] S. Chen, R. Wang, X. Wang, and K. Zhang. Side-Channel Leaks in Web Applications: A Reality Today, a Challenge Tomorrow. In Security and Privacy (SP), 2010 IEEE Symposium on, pages 191–206. IEEE, 2010.
[5] K. Dyer, S. Coull, T. Ristenpart, and T. Shrimpton. Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, pages 332–346, 2012.
[6] D. Herrmann, R. Wendolsky, and H. Federrath. Website Fingerprinting: Attacking Popular Privacy Enhancing Technologies with the Multinomial Naïve-Bayes Classifier. In Proceedings of the 2009 ACM Workshop on Cloud Computing Security, pages 31–42, 2009.
[7] A. J. Hoffman and J. B. Kruskal. Integral Boundary Points of Convex Polyhedra. In M. Jünger, T. M. Liebling, D. Naddef, G. L. Nemhauser, W. R. Pulleyblank, G. Reinelt, G. Rinaldi, and L. A. Wolsey, editors, 50 Years of Integer Programming 1958–2008, pages 49–76. Springer Berlin Heidelberg, 2010.
[8] I. Keller and C. Tompkins. An Extension of a Theorem of Dantzig's. Linear Inequalities and Related Systems, Annals of Mathematics Studies, 38:247–254, 1956.
[9] M. Liberatore and B. Levine. Inferring the Source of Encrypted HTTP Connections. In Proceedings of the 13th ACM Conference on Computer and Communications Security, pages 255–263, 2006.
[10] L. Lu, E.-C. Chang, and M. C. Chan. Website Fingerprinting and Identification Using Ordered Feature Sequences. In Computer Security – ESORICS 2010, pages 199–214. Springer, 2010.
[11] X. Luo, P. Zhou, E. W. Chan, W. Lee, R. K. Chang, and R. Perdisci. HTTPOS: Sealing Information Leaks with Browser-side Obfuscation of Encrypted Flows. In NDSS, 2011.
[12] A. Panchenko, L. Niessen, A. Zinnen, and T. Engel. Website Fingerprinting in Onion Routing Based Anonymization Networks. In Proceedings of the 10th ACM Workshop on Privacy in the Electronic Society, pages 103–114, 2011.
[13] M. Perry. Experimental Defense for Website Traffic Fingerprinting. https://[Link]/blog/experimental-defense-website-traffic-fingerprinting, September 2011. Accessed Feb. 2013.
[14] M. Perry. A Critique of Website Fingerprinting Attacks. https://[Link]/blog/critique-website-traffic-fingerprinting-attacks, November 2013.
[15] M. Perry, E. Clark, and S. Murdoch. The Design and Implementation of the Tor Browser [DRAFT]. [Link]torbrowser/design/. Accessed Oct. 2013.
[16] P. Seymour. Decomposition of Regular Matroids. Journal of Combinatorial Theory, Series B, 28:305–359, 1980.
[17] T. Wang and I. Goldberg. Comparing Website Fingerprinting Attacks and Defenses. Technical Report 2013-30, CACR, 2013. [Link]techreports/2013/[Link].
[18] T. Wang and I. Goldberg. Improved Website Fingerprinting on Tor. In Proceedings of the 12th ACM Workshop on Privacy in the Electronic Society, 2013.
[19] C. Wright, S. Coull, and F. Monrose. Traffic Morphing: An Efficient Defense against Statistical Traffic Analysis. In Proceedings of the 16th Network and Distributed Security Symposium, pages 237–250, 2009.

APPENDIX

A. LOWER BOUND PROOFS

Suppose websites w_1, …, w_n have sizes s_1 < s_2 < · · · < s_n. Let S = {s_1, …, s_n}. For any defense D, let p_ij be the probability that D transmits j bytes during a load of website w_i. Since, in a closed-world experiment, each website occurs with probability 1/n, the bandwidth cost of D is

\sum_{j=1}^{\infty} \sum_{i=1}^{n} \frac{1}{n} \, j \, p_{ij}
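The two quantities at the heart of the appendix — the closed-world bandwidth cost above and the non-uniform attacker success probability defined next — are plain sums over the p_ij matrix, and are easy to check numerically. A minimal sketch (the list-of-dicts representation of p_ij is our own choice, not the paper's):

```python
def bandwidth_cost(p):
    """Expected bytes per page load: sum_j sum_i (1/n) * j * p[i][j].
    p[i] maps a byte count j to the probability that the defense
    transmits j bytes when loading website w_i."""
    n = len(p)
    return sum(j * q for row in p for j, q in row.items()) / n

def nonuniform_success(p):
    """Ideal-attacker success: sum_j max_i p[i][j] / n.  For each
    observed byte count j, guessing the likeliest site succeeds with
    probability max_i p_ij."""
    n = len(p)
    observed = {j for row in p for j in row}
    return sum(max(row.get(j, 0.0) for row in p) for j in observed) / n
```

A defense that transmits the same number of bytes for every site scores 1/n here (no better than random guessing), while fully distinguishable byte counts score 1.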
and the non-uniform success probability of A_S is

\sum_{j=1}^{\infty} \frac{\max_i p_{ij}}{\sum_i p_{ij}} \cdot \frac{\sum_i p_{ij}}{n} = \sum_{j=1}^{\infty} \frac{\max_i p_{ij}}{n}

We derive lower bounds on the bandwidth cost of D by computing the matrix of p_ij values that minimizes the above bandwidth-cost function while still satisfying the above security constraint. Recall that, since D is assumed not to compress or truncate web pages, p_ij = 0 for j < s_i.

The overall structure of the proof for non-uniform security is:

• Constrain the structure of the optimal p_ij so that we can formulate the optimization problem as a linear program (see Lemma 1).
• Prove that the linear program has an integral solution, so that the optimal solution is equivalent to a function f : S → S satisfying certain constraints (see Lemma 2).
• Prove that f is monotonically increasing (see Lemma 3).

The lower bound for uniform security is similar. We first prove that there exists a similar function f for any deterministic uniformly secure defense, and then apply Lemma 3.

LEMMA 1. Let p_ij be the probabilities that minimize the bandwidth cost while meeting the security requirement. Then:
1. p_ij = 0 unless j ∈ {s_1, …, s_n}.
2. If p_ij < max_k p_kj, then for all j′ > j, p_ij′ = 0.
3. For all j, p_kj ≤ p_{k+1,j} for k ∈ [1, i], where s_i ≤ j < s_{i+1}.

PROOF. 1. Suppose p_ℓj ≠ 0 where s_k < j < s_{k+1}. Then we can make a more efficient and no less secure defense by replacing p_{i,s_k} with p_{i,s_k} + p_ij for all i and setting p_ij = 0 for all i. This will have lower bandwidth cost because s_k < j. It will not violate the constraint that, for all i and j′ < s_i, p_ij′ = 0, because, if p_ij ≠ 0 before the change, then s_i ≤ j, so s_i ≤ s_k. It will not worsen security because max_i (p_{i,s_k} + p_ij) ≤ max_i p_{i,s_k} + max_i p_ij.

2. Suppose otherwise. Let t = min(max_k p_kj − p_ij, p_ij′). Note that t ≠ 0. Thus we can construct a more efficient and no less secure defense by replacing p_ij with p_ij + t and p_ij′ with p_ij′ − t.

3. Suppose p_kj > p_{k+1,j} for some k ∈ [1, i], where s_i ≤ j < s_{i+1}. This implies that p_{k+1,j} < max_i p_ij. By Item 2, p_{k+1,j′} = 0 for all j′ > j. Thus we must have

\sum_{j′ = s_{k+1}}^{j} p_{k+1,j′} = 1.

This also implies that p_kj ≠ 0. Thus, by Item 2, p_kj′ = max_i p_ij′ for all j′ ∈ {s_k, …, j − 1}. This implies that

\sum_{j′ = s_k}^{j} p_{kj′} > \sum_{j′ = s_{k+1}}^{j} p_{k+1,j′} = 1,

a contradiction.

Since p_ij is non-zero only if j ∈ {s_1, …, s_n}, we can relabel the p_ij to be the probability that the defense transmits s_j bytes during a load of website w_i.

Lemma 1, Item 3 implies that max_i p_ij = p_jj, so the security constraint can be re-written as \sum_{i=1}^{n} p_{ii} ≤ nε.

Now that the security constraint is a linear function of the p_ij variables, we can formulate a linear program for computing the optimal p_ij values:

minimize \sum_{i=1}^{n} \sum_{j=i}^{n} p_{ij} s_j (the bandwidth cost)

subject to the constraints
(a) \sum_{i=1}^{n} p_{ii} ≤ nε (ε non-uniform security)
(b) \sum_{j=i}^{n} p_{ij} = 1 (the p_ij are probabilities)
(c) 0 ≤ p_ij ≤ 1

LEMMA 2. The above linear program has an integral solution.

PROOF. Linear programs with Totally Unimodular (TU) constraint matrices and integral objective functions have integral solutions [7]. We prove that the constraint matrix A (derived from constraints (a), (b), and (c) of the above LP) is TU. To prove the TU-ness of A, it is sufficient to prove the following [8]: (i) every column contains at most 2 non-zero entries, (ii) every entry is 0, 1, or −1, and (iii) if two non-zero entries in any column of A have the same sign, then the rows containing them belong to two disjoint partitions of A.

Since the set of TU matrices is closed under the operation of adding a row or column with at most one non-zero entry [16], we may delete the 2n rows of A corresponding to constraint (c) and prove that the remaining constraint matrix A′ satisfies the TU conditions (i)–(iii).

Observe the following properties of A′:

• There are n rows (WLOG, rows 1 to n) induced by constraint (b). These are such that A_{i,(i−1)n}, …, A_{i,in−1} = 1 for all i ∈ {1, …, n}, and 0 for all other entries. Therefore, each column of the partition B composed of these n rows contains only a single non-zero entry (i.e., +1).

• There is only 1 row (WLOG, row n + 1), induced by constraint (a). This row has the form A_{n+1,j} = 1 for all j ∈ {1², …, n²}, and 0 for all other entries. Each column of the partition C composed of this single row may contain only a single non-zero entry (i.e., +1).

From the above properties, it is clear that the matrix A′ is TU, since each column contains at most 2 non-zero entries (each +1) and A′ may be partitioned into matrices B and C such that condition (iii) is satisfied. Therefore, the matrices A′ and A are TU, and the LP describing A has only integral optima.

In an integral solution of the linear program, all the probabilities are 0 or 1, so the solution is equivalent to a function f : S → S satisfying
• |f(S)| ≤ εn, and
• \sum_{i=1}^{n} f(s_i) / \sum_{i=1}^{n} s_i ≤ BWRatio_D(W).

We now show there is a similar function for any deterministic uniformly secure defense D. Set f(s_i) = b_i, where b_i is the number of bytes transmitted when the defense D loads website w_i. Since D does not compress or truncate websites, we must have b_i ≥ max_{s ∈ f⁻¹(b_i)} s for all i. Observe that we can assume b_i = max_{s ∈ f⁻¹(b_i)} s without harming security or efficiency, so that f : S → S. Thus f satisfies the security constraint min_i |f⁻¹(s_i)| ≥ 1/ε, and \sum_{i=1}^{n} f(s_i) / \sum_{i=1}^{n} s_i ≤ BWRatio_D(W).

LEMMA 3. The mapping function f corresponding to an optimal non-uniformly ε-secure defense, or a deterministic uniformly ε-secure defense, is monotonic.

PROOF. Consider any partition of {s_1, …, s_n} into sets S_1, …, S_k. Let m_i = max_{s ∈ S_i} s. Without loss of generality, assume m_1 ≤ m_2 ≤ · · · ≤ m_k. Now consider the monotonic allocation of traces into sets S_1*, …, S_k*, where |S_i*| = |S_i|. Let m_i* = max_{s ∈ S_i*} s. Observe that m_i* ≤ m_i for all i, i.e., the new allocation has lower bandwidth.

Since the number of sets in the partition and the sizes of those sets are unchanged, this new allocation has the same uniform and non-uniform security as the original, but lower bandwidth. Hence the optimal f must be monotonic.
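Lemma 3's consequence can be made concrete: an optimal deterministic, uniformly ε-secure defense may be pictured as a monotone map f that pads consecutive runs of sorted page sizes up to a common maximum. The sketch below is our own illustration of that bound, not the paper's optimizer; it assumes distinct sizes and simply lets the smallest group absorb any remainder so that every ambiguity set has at least ⌈1/ε⌉ members.

```python
import math

def monotone_defense(sizes, eps):
    """Build a monotone map f (cf. Lemma 3): split the sorted page
    sizes into consecutive groups of at least ceil(1/eps) pages and
    pad every page in a group up to the group's maximum size.
    Returns (f, bandwidth_ratio)."""
    k = math.ceil(1 / eps)              # minimum ambiguity-set size
    s = sorted(sizes)                   # assumes distinct sizes
    m = max(1, len(s) // k)             # number of ambiguity sets
    f = {}
    for t in range(m):
        lo = 0 if t == 0 else len(s) - (m - t) * k
        hi = len(s) - (m - 1 - t) * k
        group = s[lo:hi]
        for x in group:
            f[x] = group[-1]            # pad every page up to the group max
    ratio = sum(f[x] for x in s) / sum(s)
    return f, ratio
```

For sizes 1…6 and ε = 1/2 this pads {1,2}→2, {3,4}→4, {5,6}→6, giving a bandwidth ratio of 24/21 ≈ 1.14 — an example of how low the provable lower bounds on bandwidth overhead can be.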