Understanding Skip Graphs in P2P Networks
Understanding Skip Graphs in P2P Networks
Skip Graphs
James Aspnes* Gauri Shah t
and may cause imbalance in the distribution of nodes of the nodes chosen by an adversary still leaves most
leading to hotspots. Some methods to solve the near- of the nodes in the largest surviving component. Skip
est neighbor problem for overlay networks can be seen graphs can also be constructed without knowledge of
in [HKRZ02] and [KR02]. the total number of nodes in advance. In contrast,
Some of these systems are partly resilient to DHT systems such as Pastry and Chord require a
random node failures, but their performance may be priori knowledge about the size of the system or its
badly impaired by adversarial deletion of nodes. Fiat keyspace.
and Saia [FS02] present a system which is resilient The rest of the paper is organized as follows:
to adversarial deletion of a constant fraction of the we describe skip graphs and algorithms for them
nodes; some extensions of this result can be seen in detail in Section 2. Sections 3 and 4 describe
in [Dat02]. However, they do not give efficient the repair mechanism and fault-tolerance properties
methods to dynamically maintain such a system. for a skip graph. Contention analysis and load
TerraDir [SBK02] is a recent system that pro- balancing results are described in Section 5. Finally,
vides locality and maintains a hierarchical data struc- we conclude in Section 6.
ture using caching and replication. There are as yet
no provable guarantees on load balancing and fault 1.2 M o d e l We briefly describe the model for our
tolerance for this system. algorithms. We assume a m e s s a g e p a s s i n g envi-
ronment in which all processes communicate with
1.1 O u r a p p r o a c h The underlying structure of each other by sending messages over a communication
Chord, CAN, and similar DHTs resembles a balanced channel. The system is p a r t i a l l y s y n c h r o n o u s , i.e.,
tree in which balancing depends on the near-uniform there is a fixed upper bound (time-out) on the trans-
distribution of the output of the hash function. So mission delay of a message. Processes can c r a s h , i.e.,
the costs of constructing, maintaining, and searching halt prematurely, and crashes are permanent. We use
these data structures is closer to the O(logn) costs the term node to represent a process that is running
of tree operations than the O(1) costs of traditional on a particular machine. We assume that each mes-
hash tables. But because keys are hashed, DHTs can sage takes at most unit time to be delivered and any
provide only hash table functionality. Our approach internal processing at a machine takes no time.
is to exploit the underlying tree structure to give
tree functionality, while applying a simple distributed 2 Skip graphs
balancing scheme to preserve balance and distribute A skip list [Pug90] is a randomized balanced tree
load. data structure organized as a tower of increasingly
We describe a new model for a peer-to-peer net- sparse linked lists. Level 0 of a skip list is a linked
work based on a distributed data structure that we list of all nodes in increasing order by key. For each
call a skip g r a p h . This distributed data structure i greater than 0, each node in level i - 1 appears
has several benefits. Resource location and dynamic in level i independently with some fixed probability
node addition and deletion can be done in logarith- p. In a doubly-linked skip list, each node stores
mic time, and each node in a skip graph requires a predecessor pointer and a successor pointer for
only logarithmic space to store information about each list in which it appears, for an average of l i p
its neighbors. More importantly, there is no hashing pointers per node. The lists at the higher level
of the resource keys so related resources are present act as "express lanes" that allow the sequence of
near each other in a skip graph. This may be use- nodes to be traversed quickly. Searching for a node
ful for certain applications such as prefetching of with a particular key involves searching first in the
web pages, enhanced browsing and efficient search- highest level, and repeatedly dropping down a level
ing. Skip graphs also support c o m p l e x q u e r i e s such whenever it becomes clear that the node is not in the
as range queries, i.e. locating resources whose keys current level. Considering the search path in reverse
lie within a certain specified range. There has been shows that no more than 1-~ nodes are searched on
some interest in supporting complex queries in peer- average per level, giving an average search time of
to-peer-systems [HHH+02], and designing a system
that supports range queries has been posed as an O ( logn(l_p~log ~ ) . Skip lists have been extensively
open question. Skip graphs are resilient to node fail- studied [Pug90, PMP90, Dev92, KP94, KMP95] and
ures: a skip graph tolerates removal of a large fraction because they require no global balancing operations
of its nodes chosen at random without becoming dis- are particularly useful in parallel systems [GMM93,
connected, and even the loss of an O ( 1 / l o g n) fraction GMM96, GM97].
386
HEAD TAIL
appears to be necessary for distributed data struc-
tures based on nodes in a one-dimensional space
i "( ";-~)" i l LEVEL 2
linked by random connections whose distribution sat-
isfies certain symmetry properties lADS02]. While
this lower bound requires some independence as-
sumptions that are not satisfied by skip graphs, there
is enough similarity between skip graphs and the class
Figure 1: A skip list. of models considered in lADS02] that an f~(log n) av-
erage degree is not surprising.
We now give a formal definition of a skip graph.
Precisely which lists an element x belongs to is
We would like to use a data structure similar to controlled by a m e m b e r s h i p v e c t o r re(x). We
a skip list to support typical binary tree operations think of rn(x) as an infinite random word over some
on a sequence whose elements are stored at separate fixed alphabet, although in practice, only an O(log n)
locations in a highly distributed system subject to length prefix of m ( x ) needs to be generated on
unpredictable failures. A skip list alone is not enough average. The idea of the membership vector is that
for our purposes, because it lacks redundancy and every doubly-linked list in the skip graph is labeled
is thus vulnerable to both failures and contention. by some finite word w, and an element x is in the list
Since only a few nodes appear in the highest-level labeled by w if and only if w is a prefix of re(x).
list, each such node acts as a single point of failure
whose removal partitions the list, and forms a hot
d('2i)~ i~ 33~!
spot that must process a constant fraction of all
search operations. Skip lists also offer few guarantees 13)..... i
that individual nodes are not separated from their -~~o~ ', ! 65 -il "il
i •
fellows even with occasional random failures. Since MEMBERSHIP .... . )----~ i -~ LEVEL 1
each node is connected on average to only O(1) other VECTOR ! ~-,~'<~ T6 " -" :--~, ' ii "il
". !H 13)----- "~33)-- ~48 2
nodes, even a constant probability of node failures SKIP LIST --!i " .06. . . . . . . . ol
. . - " - :~
will isolate a large fraction of the surviving nodes. \" .... i ~ 13 ~ ( 21 )----( 33 '- -'~ 48 )'---( 75 ?--'i 99 p i I LEVEL 0
Our solution is to define a generalization of a skip ; "06 ]b ~L ~0 Ti qi ij
list that we call a skip g r a p h . As in a skip list,
each node in a skip graph is a member of multiple
linked lists. The level 0 list consists of all nodes in Figure 2: A skip graph with [log N] = 3 levels.
sequence. Where a skip graph is distinguished from
a skip list is that there may be many lists at level To reason about this structure formally, we will
i, and every node participates in one of these lists, need some notation. Let Z be a finite alphabet, let Z*
until the nodes are splintered into singletons after be the set of all finite words consisting of characters
O(logn) levels on average. A skip graph supports in E, and let Z oo consist of all infinite words. We
s e a r c h , i n s e r t , and d e l e t e operations analogous use subscripts to refer to individual characters of a
to the corresponding operations for skip lists; indeed, word, starting with subscript 0; a word w is equal to
we show in Lemma 2.1 that algorithms for skip lists wowlw~ .... Let Iwl be the length of w, with Iwl = oo
can be applied directly to skip graphs, as a skip graph if w E E ~. If Iw I _> i, write w [ i for the prefix of w
is equivalent to a collection of up to n skip lists that of length i. Write e for the empty word.
happen to share some of their lower levels. Returning to skip graphs, the b o t t o m level is
Because there are many lists at each level, the always a doubly-linked list S~ consisting of all the
chances that any individual node participates in elements in order. In general, for each w in ~*,
some search is small, eliminating both single points the doublyqinked list Sw contains all x for which
of failure and hot spots. Furthermore, each node w is a prefix of re(x), in increasing order. We
has O(logn) neighbors on average, and with high say that a particular list Svo is part of level i if
probability no node is isolated. In Section 4 we Iwl = i. This gives an infinite family of doubly-
observe that skip graphs are resilient to node failures linked lists; in an actual implementation, only those
and have an expansion ratio of ~ ( 1 / l o g n ) with n Sw with at least two elements are represented. A
nodes in the graph. skip graph is precisely a family {Sw} of doubly-linked
In addition to providing fanlt-tolerance, having lists generated in this fashion. Note that because the
an f~(logn) degree to support O(logn) search time membership vectors are random variables, each Sw is
387
Inserts can be trickier when we have to deal with A l g o r i t h m 3: I n s e r t for existing node n
concurrent node joins. Before n' links to any neigh- upon receiving (link0p, n', side, level):
bors, it verifies that its join will not violate the skip if side = R t h e n crop = <
graph properties. So if any new nodes have joined the else cmp = >
skip graph between n' and its predetermined neigh- if (n sidelevel).key crop n'.key t h e n
bor, n' will advance over the new nodes if required s e n d (link0p, n', side, level) to n sideleve 1
before linking in the correct place. else
adjust links to add n' as side neighbor
A l g o r i t h m 2: i n s e r t for new node n' s e n d (link0p, n', otherS±de, level) to n sidetevel
if introducer = n' t h e n upon receiving (buddy0p, n', level, val) from side L(R):
nlLo = ±
if m(n)level = ± t h e n
n'Ro = ±
else m(n)level = getCoin 0
nnleve t = ±
if [Link] < n'.key t h e n side = R
else side = L n R l e v e l ---- ±
s e n d (searchOp, n', n'.key, 0 7 to introducer if m(n)level = val t h e n
upon receiving (searchOp, neighbor): s e n d (buddy0p, n, level) to n'
s e n d (linkOp, n', side, O) to neighbor else
level+- 1 if nRlevel(Llevel) # ± t h e n
w h i l e true d o s e n d (buddy0p, n', val, level) to nRleve 1
if n'Llevel_ 1 # ± t h e n (nLlevel)
s e n d (buddy0p, n', level, m(n')level ) to else
s e n d (buddyflp, l , level) to n'
n'Llevel_l
upon receiving (buddy0p, newBuddy, level):
if newBuddy ¢ ± t h e n
send (link0p, n', R, level) to newBuddy 3. If xLi # ±, xLiR~ = x.
else if (n'Rlevel_ 1 # ± ) A (newBuddy = L )
then 4. If xRi # L, x R i L i = x.
s e n d (buddy0p, n', level, m(n')level ) to
n'Rlevel_l 5. If i > O, re(x) [ i = m(xR~_l) [ i and ~tk, k <
upon receiving (buddy0p, newBuddy, level): i , m ( x ) [ i = m ( x R i k l ) I i, then xRi = xR~_ 1.
if newBuddy # I t h e n
6. If i > O, re(x) [ i = m(xL~_l) r i and ~ k , k <
send (linkOp, n', L, level) to newBuddy
1,re(x) [ i = m(xLi~_l) [ i, then x i i = x i ~ _ 1.
else b r e a k
else b r e a k
THEOREM 3.1. Every connected component of the
level6-1evel+ 1
data structure is a skip graph if and only if conditions
nJLlevet = ±
1 - 6 are satisfied.
nIRlevel = ±
3.1 M a i n t a i n i n g t h e i n v a r i a n t Define _l_Li =
± R i = ±. We define conditions 1 - 4 as an i n v a r i a n t
LEMMA 2.3. The i n s e r t operation in a skip graph
for a skip graph as they hold in all states with no
S with n nodes takes expected O(logn) time and
undelivered messages, even in the presence of failures.
O(log n) messages.
Conditions 5 - 6 may fail to hold with failures, but
they can be restored by the repair mechanism. We
3 Repair Mechanism
shall call conditions 5 and 6 as the L and R successor
In this section, we describe a self-stabilization mecha- conditions respectively.
nism that repairs the skip graph in the event of node
and link failures. We first characterize the constraints THEOREM 3.2. With no undelivered messages, the
for an ideal skip graph. Let x be any node in the skip invariant is maintained for a skip graph with node
graph; then for any level i: insertions, deletions and node failures.
1. If x R i # ±, x R i > x.
3.2 R e s t o r i n g skip g r a p h c o n s t r a i n t s The
2. If xLi # ±, xLi < x. successor conditions get violated during insert and
389
• Send (z±pper0pB, x, i) to a.
A l t h o u g h the skip g r a p h constraints m a y get
violated during an insert or a delete operation,
once no messages are pending and provided no C a s e 2: x R i ~ xRik_l, for any k. There are
additional inserts, deletes or failures occur, the three ways to repair this violation depending on w h a t
successor conditions are satisfied. Thus we see t h a t other nodes are present at level i - 1.
the repair mechanism is required to restore the
Case 2a: 3a = xR~_ 1 > xP~ and ~b = xRi+_x < a
successor conditions only in case of node or link such t h a t re(b) r i = re(x) r i.
failures. We consider the possible cases in which the
successor conditions can be violated and provide a x xRi LEVEL i
repair mechanism for the each of those cases. We
7
will concentrate on the repair mechanism for the R
" } zipperOpF
links arid fixing the L links is symmetric. It m a y be : aLi- I _
possible to combine the two mechanisms to improve
z~.pperOpB / ._~__~#___(~__ LEVEL i 1
the performance but we will treat t h e m separately -
for simplicity. L R
There are two cases when the R successor condi- T h e nodes connected to a and x R i at level i - 1
tion is violated: have to be merged together into one ring by sending
the following messages:
1. x R i = xR~_x b u t Sa = x R ik'_ l , k' < k, re(x) I
i = re(a) I i. This case occurs when two nodes s P r o b e level i - 1 to find largest x R i R i k'_ 1 = R < a.
are connected to each other at levels i - 1 and
i, and a new node in inserted between t h e m at • Send ( z i p p e r 0 p F , a, i - 1) to R.
level i - 1 but is pending to be inserted between ktl
• P r o b e level i - 1 to find smallest x R i L i _ 1 = L >
t h e m at level i. If the left neighbor of the new
aLi-1.
node checks its R successor condition at level i
before the insert of the new node at level i is • Send ( z i p p e r 0 p B , aLi-1, i - 1) to L.
completed, it will detect a discrepancy.
Case 2b: ~a = xR~_ 1 < x P ~ , m ( a ) ~ i = m(x) I i
2. xP~ ~ xR~_l, for any k. This case occurs with and xRi+_l ~ xP~.
the failure of any node or link in an ideal skip
graph.
-"" ' ""- LEVEL i
We consider each case in detail and propose a •~ .-4"" ! ")..... xR~
repair mechanism for each violation. ---CZ::- / ! L "'C ~--
: /) :
zippe~OpB d aR/-1 i zipperOpF
C a s e 1: x R i = xR~_l, b u t 3a = x ,~~ri _, kl ,' , t., <
k, m ( x ) r i = re(a) r i. ~ ,,,.fT---- LEVELi-1
M
• Send (z±pper0pB, a, i - 1) to M.
Node a should be inserted into level i and this is
done by sending the following messages1: • Send ( z i p p e r 0 p F , aRi-1, i - 1) to M.
• Send ( z i p p e r 0 p B , x, i) to a.
TD-etails of the zipperOp a l g o r i t h m are given in Algo-
rithms 4 a n d 5. • Send ( z i p p e r 0 p F , xRi, i) to a.
390
Case 2c: 3a < xRi, aRi = 3_. THEOREM 3.3. In the absence of new failures, the
repair mechanism described in Section 3.2 will even-
---~ "~ ~)- - LEVEL i tually restore the violated constraints of a skip graph,
without losing existing connectivity.
6
4 Fault Tolerance
-/'~ -----~---q, L E V E L i -- 1
z:[Link] C~--- (~_')--~2~-----~ ;----~.~--)-- In this section, we describe some of the fault tolerance
R properties of a skip graph. Fault tolerance of related
data structures, such as augmented versions of linked
lists and binary trees, has been well-studied and some
The nodes connected to a and xRi at level i - 1 results can be seen in [MP84, AB96]. The main
have to be merged by sending the following messages: question is how many nodes can be separated from
the primary component by the failure of other nodes,
• Probe level i - 1 to find smallest xRiLik_l = R >
as this determines the size of the surviving skip graph
a.
after the repair mechanism finishes.
• Send (zipper0pB, a, i - 1) to R. We show first that even a worst-case choice
of failures by an adversary that can observe the
A l g o r i t h m 4: z i p p e r 0 p B for node n structure of the skip graph can do only limited
damage. With high probability, a skip graph with
upon receiving <zipper0pB, x, g):
n nodes has an tQ(1/logn) expansion ratio, implying
if nLt > [Link] t h e n
that at most O ( f log n) nodes can be separated from
s e n d (zipper0pB, x, ~) to nL~
else the primary component by ff failures. These results
are described in Section 4.1
trap = nLt
For random failures, the situation appears even
nL~ = x
more promising; our experimental results, presented
XRl ~ n
in Section 4.2, show that for a reasonably large skip
if trap ~ 3_ t h e n
graph nearly all nodes remain in the primary com-
s e n d (zipper0pB, tmp, ~) to x
ponent until about two-thirds of the nodes fail, and
that it is possible to make searches highly resilient to
failure even without using the repair mechanism by
A l g o r i t h m 5: z i p p e r 0 p F for node n use of redundant links.
upon receiving (zipper0pF, x, t):
4.1 A d v e r s a r i a l f a i l u r e s Given a subset A of the
i f nRt <[Link] t h e n
nodes of a skip graph, define 5A as the set of all nodes
s e n d <zipper0pF, x, g) to nRe
that are not in A but that are adjacent to A. Further
else
define ~hA as the set of all nodes that are not in A
tmp = nRt
but are joined to a node in A by an edge at level h.
xL~ = n
Clearly 6A = Uh 5hA and 16AI > maxh 15hAl.
n R g --~ x
The expansion ratio of a set A is ISAI/[AI. The
if trap ~ 3- t h e n
expansion ratio of a graph is the minimum expansion
s e n d (zipper0pF, tmp, g/ to x
ratio of any set A for which 1 < IA[ .< n/2. The
expansion ratio determines the resilience of a skip
graph in the presence of adversarial failures, because
Original Links Unchanged Links separating a set A from the primary component
requires all nodes in 5A to fail. We will show that
skip graphs have f~(1/logn) expansion ratios with
high probability, implying that only O ( f log n) nodes
can be separated by f failures, even if the failures are
New Links carefully targeted.
( z i p p e r f l p messages)
Our strategy for showing a lower bound on the
expansion ratio of a skip graph will be to show that
Figure 3: z i p p e r 0 p operation to merge nodes on the
with high probability, all sets A either have large ~0A
same level.
(i.e., many neighbors at the b o t t o m level of the skip
391
than
s--1 [m+l~ [n--m--l~
Z r ~ I ~ r #~ r--I 1" ¢1
I.-I
q-I
0.4!
............ii.i I
Sketch of proof: Represent each A as a bit-vector m 0.2
¢1
where 1 indicates a member of the set and 0 a N 0
non-member. Then I~0AI is at least the number of 0.1 0,2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.1
Probability of node failure
intervals of zeroes in this bit-vector. The bound in
the lemma is then obtained by bounding the number Figure 4: Size of the largest connected component as
of length n bit-vectors with m ones and at most s a fraction of the surviving nodes with 131072 nodes.
intervals of zeroes. |
~ 0.15
Sketch of proof: The key observation is that for each
b in {0, 1} h, each skip list Sb that contains a member ~ 0.1
of both A and its complement contributes at least one
distinct element to 5hA. We then show that at least .0 0,05
THEOREM 4.1. Let c >_ 6. Then a skip graph with n Figure 5: Failed Searches with 131072 nodes and
nodes has an expansion ratio of at least _c l_o gLa /_2 _n with 10000 messages.
probability 1-0(n5-C), where the constant factor does
not depend on c.
most searches succeed as long as the proportion of
Sketch of proof: The probability bound is obtained failed nodes is substantially less that O(logn). By
by summing the probability of having 5hA too small detecting failures locally and using additional redun-
over all A for which 5oA is too small. For each dant edges, we can make searches highly tolerant to
set A of size m, h is chosen so that the 1 . 2 h small numbers of random faults; some experimental
bound of Lemma 4.2 exceeds m times the expansion results are shown in Figure 5. In these experiments,
ratio. The probabilities derived from Lemma 4.2 are each node x has extra links to its five nearest neigh-
then summed over all sets A of a fixed size m using bors on each side, at every level that it is a member
Lemma 4.1, and the result of this process is summed of. In general, we cannot make as strong guaran-
over all m > clog3/2 n to obtain the final bound, i tees as those provided by data structures based on
explicit use of expanders [FS02, Dat02], but we be-
4.2 R a n d o m f a i l u r e s In our experiments, skip lieve that this is compensated for by the simplicity
graphs appear to be highly resilient against random of skip graphs and the existence of good distributed
failures. As shown in Figure 4, nearly all nodes re- mechanisms for constructing and repairing them.
main in the primary component even as the probabil-
ity of individual node failure exceeds 0.6, and we sus- 5 Load balancing
pect that most of the lost nodes at this stage become In addition to fault-tolerance, a skip graph provides
isolated only because all of their immediate neighbors a limited form of load balancing, by smoothing out
die. hot spots caused by popular search targets. The
For searches, the fact that the average search in- guarantees that a skip graph makes in this case are
volves only O(logn) nodes establishes trivially that similar to the guarantees made for survivability. Just
392
[CDHR02] Miguel Castro, Peter Druschel, Y. Charlie Programming: Sorting and Searching, volume 3.
Hu, and Anthony Rowstron. Exploiting network Addison-Wesley Publishing Company Inc., Reading,
proximity in peer-to-peer overlay networks. In Massachusetts, 1973.
International Workshop on Future Directions in [KP94] P. Kirschenhofer and H. Prodinger. The path
Distributed Computing, Bertinoro, Italy, June 2002. length of random skip lists. Acta Informatica,
[Longer version submitted for publication]. 31(8):775-792, 1994.
[Dat02] Mayur Datar. Butterflies and peer-to-peer net- [KR02] David Karger and Matthias Ruhl. Finding
works. In Proceedings of the lOth European Sympo- nearest neighbors in growth-restricted metrics. In
sium on Algorithms, Rome, Italy, September 2002. Thirty-Fourth ACM Symposium on Theory of Com-
[Dev92] L. Devroye. A limit theory for random skip puting, pages 741-750, Montreal, Canada, May
lists. The Annals of Applied Probability, 2(3):597- 2002.
609, 1992. [MNR02] Dahlia Malkhi, Moni Naor, and David Rata-
[dlB59] Rene de la Briandals. File searching using vari- jczak. Viceroy: A scalabale and dynamic emulation
able length keys. In Western Joint Computer Con- of the butterfly. In Twenty-First ACM Symposium
ference, volume 15, pages 295-298, Montvale, N J, on Principles of Distributed Computing, pages 183-
USA, 1959. AFIPS Press. 192, Monterey, CA, USA, July 2002.
[FRE] FREENET. [Link] IMP84] J. Ian Munro and Patricio V. Poblete. Fault
[Fre60] Edward Fredkin. Trie memory. Communications tolerance and storage reduction in binary search
of the ACM, 3(9):490-499, September 1960. trees. Information and Control, 62(2/3):210-218,
[FS02] Amos Fiat and Jared Saia. Censorship resistant August 1984.
peer-to-peer content addressable networks. In Pro- [NAP] NAPSTER. Formerly, [Link]
ceedings of the Thirteenth Annual ACM-SIAM Sym- [PMP90] T. Papadakis, J.I. Munro, and P.V. Poblete.
posium on Discrete Algorithms, San Francisco, CA, Analysis of the expected search cost in skip lists. In
USA, January 2002. J. R. Gilbert and R. G. Karlsson, editors, SWAT 90,
[GM97] J. Gabarr6 and X. Messeguer. A unified ap- 2nd Scandinavian Workshop on Algorithm Theory,
proach to concurrent and parallel algorithms on bal- volume 447 of Lecture Notes in Computer Science,
anced data structures. In X V I I International Con- pages 160-172, Bergen, Norway, 11-14 July 1990.
ference of the Chilean Computer Society, 1997. Springer.
[GMM93] J. Gabarr6,~C. Mart/nez, and X. Messeguer. [PRR97] C. Plaxton, R. Rajaram, and A. W. Richa. Ac-
Parallel update and search in skip lists. In 13th cessing nearby copies of replicated objects in a dis-
International Conference of the Chilean Computer tributed environment. In Proceedings of the Ninth
Society, 1993. Annual ACM Symposium on Parallel Algorithms
[GMM96] J. Gabarr6, C. Martinez, and X. Messeguer. A and Architectures (SPAA), June 1997.
top-down design of a parallel dictionary using skip [Pugg0] William Pugh. Skip lists: A probabilistic alter-
lists. Theoretical Computer Science, 158(1-2):1-33, native to balanced trees. Communications of the
May 1996. ACM, 33(6):668-676, June 1990.
[GNU] GNUTELLA. [Link] [RD01] Antony Rowstron and Peter Druschel. Pastry:
[HHH+02] Matthew Harren, Joseph M. Hellerstein, Ryan Scalable, distributed object location and routing for
Huebsch, Boon Thau Loo, Scott Shenker, and Ion large-scale peer-to-peer systems. In Proceedings of
Stoica. Complex queries in DHT-based peer-to-peer the 18th IFIP/ACM International Conference on
networks. In 1st International Workshop on Peer- Distributed Systems Platforms (Middlewarc 2001),
to-Peer Systems (IPTPS), Cambridge, MA, USA, Heidelberg, Germany, November 2001.
March 2002. [RFH+01] Sylvia Ratnasamy, Paul Francis, Mark Hand-
[HKRZ02] Kirsten Hildrum, John D. Kubiatowicz, Satish ley, Richard Karp, and Scott Shenker. A scalable
Rao, and Ben Y. Zhao. Distributed object location content-addressable network. In Proceedings of the
in a dynamic network. In Fourteenth A CM Sympo- ACM SIGCOMM, pages 161-170, 2001.
sium on Parallel Algorithms and Architectures, Win- [SBK02] Bujor Silaghi, Bobby Bhattachaxjee, and Pete
nipeg, Manitoba, Canada, August 2002. Keleher. Query routing in the terradir distributed
[JKZ01] Anthony D. Joseph, John Kubiatowicz, and directory. In SPIE ITCOM 2002, August 2002.
Ben Y. Zhao. Tapestry: An infrastructure for fault- [SMK+01] Ion Stoica, Robert Morris, David Karger,
tolerant wide-area location and routing. Technical Frans Ka~shoek, and Hari Balakrishna. Chord:
Report UCB/CSD-01-1141, University of California, A scalable peer-to-peer lookup service for internet
Berkeley, Apr 2001. applications. In Proceedings of SIGCOMM 2001,
[KMP95] P. Kirschenhofer, C. Martlnez, and pages 149-160, 2001.
H. Prodinger. Analysis of an optimized search [ZJK02] Ben Y. Zhao, Anthony D. Joseph, and John D.
algorithm for skip lists. Theoretical Computer Kubiatowicz. Locality-aware mechanisms for large-
Science, 144(1-2):119-220, 26 June 1995. scale networks. In Workshop on Future Directions in
[Knu73] Donald E. Knuth. The Art of Computer Distributed Computing, Bertinoro, Italy, June 2002.