Study on Maintenance Operations in a Chord-based Peer-to-Peer Session Initiation Protocol Overlay Network

Jouni Mäenpää and Gonzalo Camarillo
Ericsson
{[Link], [Link]}@[Link]

Abstract

Peer-to-Peer Session Initiation Protocol (P2PSIP) is a new technology being standardized in the Internet Engineering Task Force. A P2PSIP network consists of a collection of nodes organized in a peer-to-peer fashion for the purpose of enabling real-time communication using the Session Initiation Protocol (SIP). In this paper, we present experimental results obtained by running a P2PSIP prototype in PlanetLab. Our prototype uses the Chord Distributed Hash Table (DHT) to organize the P2PSIP overlay and the Peer-to-Peer Protocol (P2PP) as the protocol spoken between the peers. In the experiments, the performance of the system is studied under different churn rates and using different DHT maintenance intervals.

1. Introduction

Traditional architectures using the Session Initiation Protocol (SIP) [1] have a relatively fixed hierarchy of SIP proxies and user agents. These client-server systems typically employ a proxy-registrar server for every domain. In contrast, in a Peer-to-Peer Session Initiation Protocol (P2PSIP) system, SIP is used in a peer-to-peer environment where traditional proxy-registrar and message routing functions are replaced by a peer-to-peer overlay network [2]. Whereas in traditional client-server SIP, Address of Record (AoR) to contact Uniform Resource Identifier (URI) mappings are stored by a SIP proxy-registrar, in P2PSIP these mappings are distributed amongst the peers in the overlay.

In this paper, the performance of a P2PSIP system is studied through experiments in the PlanetLab network. The goal is to gain an understanding of the characteristics and requirements of P2PSIP. Gaining such an understanding is important, since the characteristics of P2PSIP, and person-to-person communication in general, are different from traditional P2P applications such as file sharing. The paper is structured as follows. In Section 2, an introduction is given to our P2PSIP prototype and to the mechanisms it uses. Section 3 summarizes the results of previous studies. Section 4 describes the experiment setup. In Section 5, the results of the measurements are presented. Finally, Section 6 concludes the paper.

2. P2PSIP prototype

2.1. Peer-to-peer protocol

Peers in a P2PSIP overlay need to use a protocol between them to maintain the overlay and to store and retrieve data [3]. This protocol, which is called the P2PSIP peer protocol, is currently being designed by the P2PSIP working group of the Internet Engineering Task Force (IETF). The peer protocol used in the experiments carried out for this paper is called the Peer-to-Peer Protocol (P2PP) [4]. P2PP is one of the peer protocol proposals submitted to the P2PSIP working group. It is an application-layer binary protocol for creating and maintaining an overlay network.

The peers in a P2PSIP overlay collectively run a distributed database algorithm, which allows data to be stored and retrieved in an efficient manner. This algorithm is used to implement a distributed location function providing the mapping between SIP AoRs and contact URIs. Distributed Hash Tables (DHTs) are one way to implement the distributed database. Both the Chord [5] and Bamboo [6] DHTs have been suggested as good choices for the distributed database algorithm to be used by P2PSIP [3]. The Resource Location and Discovery (RELOAD) [7] P2PSIP peer protocol proposal specifies the Chord DHT as mandatory to implement. For these reasons, we chose Chord as the DHT algorithm for our P2PSIP prototype.
2.2. Chord

Chord [5] is a structured P2P protocol that uses consistent hashing to build a DHT out of several independent nodes. Consistent hashing assigns each peer and data key an m-bit identifier using SHA-1 as the base hash function. These keys are ordered on an identifier circle of size 2^m. On the identifier circle, key k is assigned to the first peer whose identifier equals or follows the identifier of k in the identifier space. The identifier circle is called the Chord ring.

On the Chord ring, each peer n maintains a routing table with up to m entries, called the finger table. The ith entry in the table contains the identity of the first peer s that succeeds n by at least 2^(i-1) on the identifier circle. This peer is called the ith finger. In an N-node network, each node maintains information about only O(log N) other nodes in its finger table.

Chord needs the successor pointer of each node to be up to date in order to ensure that lookups produce correct results as the set of participating nodes changes. This is achieved by periodically running a stabilization protocol in the background. The stabilization protocol uses three operations: successor stabilization, finger stabilization, and predecessor aliveness check. In the successor stabilization procedure, a node n asks its successor for the successor's predecessor p, and decides whether p should be n's successor instead. The purpose of the finger stabilization procedure is to incorporate new nodes into the finger table. Finally, in the predecessor aliveness check, a node simply contacts its predecessor to learn whether the predecessor is still alive.

To increase robustness in the event of node failures, each Chord node maintains a successor list of size r, containing the node's first r successors.

3. Related work

Previous studies of churn in DHTs include [6] and [17]. In [6], the performance of the Bamboo DHT is studied in an emulated 1000-node network. Three factors that affect the behavior of DHTs under churn were identified: reactive versus periodic recovery from failures, calculation of message timeouts, and choice of nearby over distant neighbors.

In [17], the impact of churn on four DHTs, Tapestry, Chord, Kelips, and Kademlia, is studied through simulations in a 1024-node overlay. The main finding is that with the right parameter settings, all four DHTs have similar overall performance.

Unlike [6] and [17], this paper focuses on P2PSIP. Our traffic model is specific to P2PSIP. Also, our results have been obtained by running experiments in PlanetLab instead of using a simulated or emulated network. In contrast to [6], which studies the performance of the Bamboo DHT, we used the Chord DHT. This paper also investigates three different churn rates, unlike [17], which uses only one churn rate. Finally, in addition to studying churn, we also experiment with different maintenance intervals.

4. Experiments

The results presented in this paper were obtained from a real P2PSIP overlay created by nodes running the P2PSIP prototype. The P2PSIP prototype was implemented in the Java programming language. The prototype uses P2PP as the peer protocol and Chord with recursive routing as the DHT. Peer protocol connections run over TCP. The reason TCP was chosen is that both the P2PP [4] and RELOAD [7] P2PSIP peer protocols prefer the use of TCP over UDP. The prototype uses SIP as the call control protocol. SIP uses the P2PSIP overlay as a lookup mechanism to map AoRs to contact URIs.

The experiments were carried out in the PlanetLab network [8]. To run each experiment, the P2PSIP prototype was uploaded to a set of PlanetLab nodes, which then created a global P2PSIP overlay consisting of 500 peers. For instance, a small enterprise could have a 500-peer global internal P2PSIP telephony network. Ideally, we would have used larger overlays, but the number of simultaneously online PlanetLab nodes was a limiting factor. We believe that a 500-node overlay already demonstrates problems that will also affect larger overlays. Each PlanetLab node ran three peers. Peers join the overlay by contacting a bootstrap peer, a PlanetLab node located in France. During their uptime, peers collect measurement data and report the results to a server before leaving the overlay.

4.1. Traffic model

The arrival and departure of users (i.e., churn) is modeled as a Poisson process with three different mean arrival rates: 1/5, 1/10, and 1/30 peers/s. We chose 1/5 peers/s as the highest churn rate because at even higher churn rates, the stability of the network started to suffer. On the other hand, the choice of the lowest churn rate was motivated by the observation that in the Skype network, the median session time is several hours [9], [10]. Graceful node departures are used; before leaving the overlay, a peer sends a Leave message to its first successor and predecessor.

Table 1. Traffic model
Parameter                  | Value
Arrival rate of users      | high 1/5 s, medium 1/10 s, low 1/30 s
Average network size       | 500 peers
Measurement duration       | 3600 s
Busy hour call attempts    | 2.21 calls per user
% of calls to non-buddies  | 33.3%
Size of buddy list         | 22

Table 2. Chord parameters
Parameter             | Value
Finger pointers       | 9
Successor pointers    | 10
Maintenance intervals | 15s, 45s, 90s, 135s, 180s, 270s, 360s
Self-lookup interval  | 60s

Every measurement for a specific maintenance interval at a specific churn rate was repeated 15 times.
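The identifier and finger-table arithmetic described in Section 2.2 can be sketched in a few lines. The following is an illustrative Java sketch, not code from the authors' prototype; the class and method names are hypothetical, but the math (SHA-1 based consistent hashing on a 2^m ring, and the ith finger target n + 2^(i-1) mod 2^m) is exactly what the section defines, with m = 9 matching the nine finger pointers in Table 2.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of Chord's identifier and finger-target arithmetic
// (Section 2.2). Hypothetical helper names; not the authors' prototype code.
public class ChordMath {
    // Hash a peer name or data key to an m-bit identifier using SHA-1,
    // as Chord's consistent hashing does.
    static BigInteger idFor(String key, int m) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(key.getBytes(StandardCharsets.UTF_8));
        // Truncate the 160-bit digest onto the 2^m identifier circle.
        return new BigInteger(1, digest).mod(BigInteger.TWO.pow(m));
    }

    // Target of the ith finger of peer n: n + 2^(i-1) mod 2^m, for i = 1..m.
    static BigInteger fingerTarget(BigInteger n, int i, int m) {
        return n.add(BigInteger.TWO.pow(i - 1)).mod(BigInteger.TWO.pow(m));
    }

    public static void main(String[] args) throws Exception {
        int m = 9; // the experiments maintained nine fingers (Table 2)
        BigInteger n = idFor("peer-42.example.org", m); // hypothetical peer name
        List<BigInteger> targets = new ArrayList<>();
        for (int i = 1; i <= m; i++) targets.add(fingerTarget(n, i, m));
        System.out.println("n = " + n + ", finger targets = " + targets);
    }
}
```

During finger stabilization, the prototype sends a LookupPeer request for one such target; the peer that answers becomes the ith finger.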
In each experiment, a 500-node P2PSIP network was created from scratch. The actual data collection phase begins when the network size reaches 500 users and lasts for an hour. This one-hour period is modeled as a busy hour, and during it the average network size stays at 500 peers.

The SIP traffic exchanged between the peers consists of instant messaging, presence, and Voice over IP (VoIP) calls. Each user is assumed to make 13 VoIP calls per day, as suggested in [11]. Out of these 13 calls, 17% are used to represent the busy hour traffic [12], meaning that the number of busy hour call attempts per user is 2.21. The arrival of calls is modeled as a Poisson process with a mean rate of 2.21 calls per hour. It is assumed that 1/3 of calls are made to users not on the buddy list; users typically call their friends instead of strangers [13]. Only calls targeted to strangers require a P2PSIP lookup operation; in the case of buddies, the contact URI is learned already when making the initial lookup for the buddy to find her presence status. The size of the buddy list of each user is 22, based on the results obtained in [14]. After having joined the P2PSIP overlay, each peer initiates 22 lookup operations to find the contact addresses of the user's buddies. Periodic queries are not sent for offline buddies, but it is assumed that the overlay notifies the user when the buddy becomes online. The traffic model is summarized in Table 1.

4.2. Chord parameters

In Chord, each peer maintains on the order of log N fingers and successors [5]. In the experiments, nine fingers and ten successor pointers were maintained.

According to [15], a Chord network in a ring-like state remains in a ring-like state as long as nodes send Ω(log^2 N) messages before N new nodes join or N/2 nodes fail. Thus, in a 500-node network, at a churn rate of 1/5s, roughly Ω(log^2 N) rounds of stabilization should occur in 1250 seconds (i.e., before 250 nodes leave). However, as we will see, making the maintenance operations too frequent has a negative impact on the performance of the overlay. In the experiments, we studied the performance of the P2PSIP overlay network at three different churn rates with maintenance intervals ranging from 15 to 360s.

When the maintenance timer fires at peer n, peer n performs the successor stabilization, finger stabilization, and predecessor aliveness check operations. The successor stabilization operation of Chord is implemented by using the P2PP ExchangeTable request and response. The finger stabilization operation is implemented by using the P2PP LookupPeer request. When the maintenance timer fires, peer n chooses the next finger interval i that should be fixed and sends a LookupPeer request for the identifier n + 2^(i-1). The peer that answers the request will be made the ith finger of peer n. Finally, the predecessor aliveness check operation is implemented by using the P2PP KeepAlive request.

In addition to the stabilization operations described above, each peer performs a self-lookup operation [15] every 60 seconds. In the self-lookup, a node searches for itself in the network by sending a lookup to its first successor. The purpose of the self-lookup operation is to detect loops; in a loopy network, a self-lookup initiated by node n may never reach node n, but is answered by some other node. The Chord parameters we used are summarized in Table 2.

5. Results

This section presents the results obtained in the measurements. The error bars shown in the figures of this section represent 95% confidence intervals.

5.1. Bandwidth consumption

P2PP traffic consists of maintenance traffic and lookup traffic. By maintenance traffic we refer to traffic used to maintain the overlay, that is, traffic generated by joins, departures, and stabilization operations. In contrast, lookup traffic consists of the P2P lookup operations done by users.

The average number of incoming bytes per second for the peers participating in the overlay is shown in Fig. 1. Due to the symmetric nature of P2PP traffic, the average bandwidth consumption of outgoing traffic is nearly identical to Fig. 1 and is thus not shown separately. The 95% confidence intervals are so small that they are not visible in the figure. Only the bandwidth consumption of P2PP messages is included in the figure; the overhead of lower layers in the protocol stack has been omitted. The average bandwidth consumption is similar for all churn rates; since Chord uses periodic recovery from failures [6], the frequency of maintenance messages does not depend on the churn rate. However, although the amount of maintenance traffic is identical for all churn rates, at the higher churn rates there is more lookup traffic, as can be observed from Fig. 2, which depicts the percentage of maintenance traffic out of total traffic. For instance, at the highest churn rate, 720 peers join the overlay during the one hour measurement period. Each joining user initiates, among other things, 22 lookups for her buddies. These lookups are processed by 1220 peers that are active in the overlay during the measurement period. At the lowest churn rate, only 120 peers join during the one hour period, and the buddy lookups they initiate are processed by 620 peers. Thus, at the highest churn rate, an average peer receives more incoming lookup traffic than at the lowest churn rate. This explains why there is a small statistically significant increase in the bandwidth consumption for each maintenance interval when the churn rate is increased.

From Fig. 1, we can also observe that for all churn rates, there is a statistically significant reduction in the amount of traffic when the maintenance interval is increased from 15s to 45s or to 90s. As the maintenance interval is increased further, the amount of traffic each peer generates per second starts to level off. This is because, e.g., increasing the interval from 270s to 360s reduces the number of stabilization rounds occurring during the one hour period only by three, and thus does not reduce the total amount of traffic significantly, whereas increasing the interval from 15s to 45s causes the number of stabilization rounds to drop by 160, from 240 to 80.

From Fig. 2, one can observe that even at the highest churn rate, 80-95% of the traffic is maintenance traffic, depending on the frequency of maintenance operations. Thus, maintenance traffic clearly dominates over lookup traffic. This is especially clear at the lowest churn rate, for which 93-98% of the total traffic is maintenance traffic.

[Figure 1. Incoming bytes per second (y: bytes/s, 0-120; x: maintenance interval [s]; curves for churn rates 1/5s, 1/10s, 1/30s)]

5.2. Hop count

[Figure 3. Average hop count (y: hop count, 3.0-4.8; x: maintenance interval [s]; curves for churn rates 1/5s, 1/10s, 1/30s)]

Fig. 3 shows the average hop count for each churn rate. We can observe that churn rate has a clear impact on hop count: the average number of hops is always significantly (statistically at the 95% confidence level) larger for the same maintenance interval at higher churn rates than at the lower churn rates. For instance, for the maintenance interval of 90s, the average hop count is 0.5 hops larger at the highest churn rate (1/5s) than it is at the lowest churn rate (1/30s). The reason for the hop count being larger at higher churn rates is that a high churn rate causes the finger table of each node to be less optimal. Each finger that leaves the network introduces a hole to the finger table of at least one other node in the network. Since the finger stabilization process is run periodically, it takes time before the empty finger table entry can be filled with a new finger. The higher the churn rate, the more frequently holes are introduced to the finger tables. And the more holes the finger table of a peer has, the less optimal are the routing decisions the peer makes, since the peer is forced to forward the request being routed to a finger that may be a rather distant predecessor of the target identifier. This has the effect of increasing the average hop count.
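The join and stabilization-round counts quoted in Section 5.1 follow directly from the model parameters: a mean arrival rate multiplied by the 3600 s busy hour gives the number of joins, and the busy hour divided by the maintenance interval gives the number of stabilization rounds. The following back-of-the-envelope check is a hypothetical helper, not code from the paper.

```java
// Back-of-the-envelope check of the join and stabilization-round counts
// quoted in Section 5.1 (hypothetical helper, not the authors' code).
// Assumes the 3600 s busy hour from Table 1.
public class BusyHourMath {
    static final int BUSY_HOUR_SECONDS = 3600;

    // Expected number of joins in the busy hour at a given Poisson mean rate.
    static int joinsPerHour(double arrivalRatePerSec) {
        return (int) Math.round(arrivalRatePerSec * BUSY_HOUR_SECONDS);
    }

    // Number of stabilization rounds a peer runs during the busy hour.
    static int stabilizationRoundsPerHour(int intervalSec) {
        return BUSY_HOUR_SECONDS / intervalSec;
    }

    public static void main(String[] args) {
        System.out.println(joinsPerHour(1.0 / 5));   // highest churn rate: 720 joins
        System.out.println(joinsPerHour(1.0 / 30));  // lowest churn rate: 120 joins
        // Diminishing returns of lengthening the interval:
        System.out.println(stabilizationRoundsPerHour(15));  // 240 rounds
        System.out.println(stabilizationRoundsPerHour(45));  // 80 rounds (drop of 160)
        System.out.println(stabilizationRoundsPerHour(270)); // 13 rounds
        System.out.println(stabilizationRoundsPerHour(360)); // 10 rounds (drop of only 3)
    }
}
```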
[Figure 2. Percentage of maintenance traffic (y: % of maintenance traffic, 70-105; x: maintenance interval [s]; curves for churn rates 1/5s, 1/10s, 1/30s)]

By observing the graph of each churn rate, we can also see that increasing the maintenance interval has the effect of increasing the hop count. As an example, for the high and medium churn rates, lengthening the maintenance interval from 15s to 45s introduces a statistically significant increase in the hop count. Hop count grows as the maintenance operations become less frequent, since at longer maintenance intervals it takes longer before old fingers are replaced with more optimal fingers, and before holes in the finger table are filled. The highest hop count values are observed when a long maintenance interval is used in a high-churn scenario. In that case, holes are introduced to finger tables frequently, and it takes a long time before the empty positions are filled with new fingers.

[Figure 4. Average lookup delay (y: delay [ms], 900-1600; x: maintenance interval [s]; curves for churn rates 1/5s, 1/10s, 1/30s)]

[Figure 5. All failed lookups (y: failed lookups [%], 0-40; x: maintenance interval [s]; curves for churn rates 1/5s, 1/10s, 1/30s)]

5.3. Lookup delay

Fig. 4 shows the average lookup delay for all churn rates. Delays of self-lookup requests are not included in the figure; since self-lookup requests make a 360-degree turn around the Chord ring, their delays are considerably longer than for other lookups. In Fig. 4, the x-axis has been shifted to the right by 5s for the
medium (1/10s) churn rate, since the confidence intervals of that churn rate overlap with those of the other churn rates. The difference in delay between the medium and high churn rates, and between the medium and low churn rates, is in many cases statistically insignificant. However, we can still see from Fig. 4 that regardless of the frequency of maintenance operations, there is always a statistically significant difference between the average delay at the high and low churn rates. The reason why the delay is greater at the higher churn rates is, first, that the higher churn rates have higher hop counts. Second, at the higher churn rates the average load (queue size) at each peer is also higher, meaning that the queuing delay is higher. As an example, for the maintenance interval of 15s, the average forwarding delay at each hop is roughly 40ms more for the highest churn rate than for the lowest churn rate. Third, path failures are more common at higher churn rates. The detection of a path failure requires a timeout at an intermediate node, which has the effect of increasing the average delay.

From Fig. 4, we can also observe that for each individual churn rate, the average lookup delay grows as the maintenance operations become less frequent. As an example, for the 1/10s churn rate, the average lookup delay is significantly higher for the 180s maintenance interval than for the 90s interval. The delay grows first because the average hop count grows, and second because the number of requests experiencing path failures also grows, which has the effect of increasing the delay.

5.4. Failed lookups

Fig. 5 shows the percentage of failed lookups for the three churn rates. By lookups we mean P2PP LookupObject, Self-Lookup, and LookupPeer requests. The differences between the churn rates are statistically significant. At the highest churn rate, the best performance is achieved when the maintenance interval is 15s or 45s (the difference in lookup failure rates for these two maintenance intervals is statistically insignificant), in which case roughly nine percent of all lookups fail. At the 1/10s churn rate, the best performance is achieved when the maintenance interval is 15-90s. In this case, roughly five percent of lookups fail. The lowest churn rate also has the lowest lookup failure ratio. The best performance is achieved when the maintenance interval is 15-180s, in which case, at the minimum, 3.5 percent of all lookups fail. A lookup is considered a failure if it either times out, exceeds the hop count limit, or encounters a path error.

From Fig. 5, we can observe for the two highest churn rates that as the maintenance interval is increased beyond 90s, the number of failed lookups starts to grow steeply. The differences between maintenance intervals are statistically significant. At the lowest churn rate, the performance does not degrade as dramatically, since the number of peers joining and leaving the network between any two consecutive rounds of maintenance operations is considerably smaller than for the higher churn rates. This results in a smaller number of path errors, loops, timeouts, and Time-to-Live (TTL) exceeded errors.

The best level of performance at the highest churn rate is achieved when the maintenance interval is 15-45s; but even in this case, roughly one out of ten lookups fails. This can hardly be considered satisfactory. To help understand the reasons behind failed lookups, Fig. 6 depicts the causes of failures for self-lookup requests at the highest churn rate. The figure shows only self-lookup requests because the rate of self-lookup operations does not depend on the maintenance interval, which makes comparison between different maintenance intervals easy. From the figure, we can observe that for all maintenance intervals, the most common failure type is a path error. A path error occurs whenever the peer to which a message is being routed cannot be contacted, for instance because the peer has already left the network. When the maintenance interval is 45s, roughly 75% of self-lookup failures are caused by path errors. Nearly all path errors occur because the next hop node a peer n selects from its finger or neighbor table for a request has left the network, meaning that the peer cannot establish a connection to the next hop node. Nearly all path errors are forward path errors; path errors that occur because of a reverse path failure, that is, because a response cannot be returned to the previous hop intermediary since that node has left the network, are very rare. For instance, at the highest churn rate, less than 1.5 percent of all path failures are reverse path failures, regardless of the frequency of maintenance operations. This result supports the use of symmetric recursive routing in P2PSIP overlays [16].

Forward path errors occur whenever a finger to which a request is being routed has already left the overlay. Since Chord uses periodic recovery, a finger that leaves the overlay does not inform a remote peer n having it as a finger of its departure. Instead, peer n is responsible for detecting the departure of the finger by itself. Whenever peer n detects that a finger has left while unsuccessfully trying to forward a request to it, a forward path error occurs. However, since the P2PSIP prototype uses TCP as the transport protocol, the departure of fingers that leave the overlay can be detected by peer n from the fact that the TCP connection is closed, provided of course that peer n has established a TCP connection with the finger. However, if the finger leaves the overlay before peer n has established a TCP connection with it, peer n will not learn that the finger has left until a path error occurs. We used a TCP socket connect timeout of 30s.

[Figure 6. Self-lookup failures (y: number of failures, 0-14000; x: maintenance interval [s]; bars broken down by cause: loop, TTL, timeout, path error)]

5.5. Timeouts

Fig. 7 plots the percentage of requests that experience a timeout. All other request types, in addition to lookup requests, are also included. From the figure, we can observe that timeouts are more common at the higher churn rates. The differences between the churn rates are statistically significant, except in the case of the 90s maintenance interval, for which the difference between the low and medium churn rates is statistically insignificant. We can also observe that for each churn rate, a reduction in the rate of maintenance messages results in a higher number of timeouts. As an example, for all churn rates, the percentage of timed out requests is significantly higher for the 180s interval than for the 15s interval. The majority of timeouts are caused by path errors. When an intermediate node along the path of a message detects that the next hop node is unreachable, it will send an error response back to the initiator of the request. However, if it took a long time for the intermediate node to realize that the next hop node is gone, then the request may already have timed out at the initiator, and also at intermediate nodes along the path.

There is no statistically significant difference between the percentages of failed requests for the medium and low churn rates at the 90s maintenance interval. As we will see later, in the case of the medium churn rate, the 90s interval also has a lower number of detected loops and path failures than the other intervals. The low number of timeouts for this interval at the medium churn rate is thus explained by the small number of path errors (as we discussed above, path errors cause the majority of timeouts).

[Figure 7. Percentage of timed out requests (y: timed out requests [%], 0-25; x: maintenance interval [s]; curves for churn rates 1/5s, 1/10s, 1/30s)]
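The mechanism behind most of these timeouts, an intermediate hop discovering that its next hop has churned out and reporting an error back to the initiator, can be sketched as a toy in-memory model. This is an illustrative sketch only: the names are hypothetical, and the real prototype exchanges P2PP messages over TCP rather than walking a precomputed path.

```java
import java.util.List;
import java.util.Set;

// Minimal sketch of forward-path-error handling under symmetric recursive
// routing (Sections 5.4-5.5). Toy in-memory model with hypothetical names;
// not the authors' prototype code.
public class RecursiveRouting {
    // Peers currently alive in the (toy) overlay.
    static final Set<String> alive = Set.of("A", "B", "C");

    // Forward a request hop by hop. The moment an intermediate peer finds
    // its next hop unreachable, an error response travels back along the
    // reverse path to the initiator; this is what turns a silent loss into
    // a reported path error instead of a timeout at the initiator.
    static String route(List<String> path) {
        for (String nextHop : path) {
            if (!alive.contains(nextHop)) {
                return "path-error at " + nextHop; // error response to initiator
            }
        }
        return "delivered to " + path.get(path.size() - 1);
    }

    public static void main(String[] args) {
        System.out.println(route(List.of("A", "B", "C"))); // delivered to C
        System.out.println(route(List.of("A", "X", "C"))); // path-error at X (X has churned out)
    }
}
```

If detecting the dead hop itself takes long (e.g., waiting out a 30 s TCP connect timeout), the error response may arrive only after the initiator has already timed out, which is why the paper attributes the majority of timeouts to path errors.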
45 25

% self-lookups failing due to a path failure


40

35 20
Failed self-lookups [%]

30
15
25 1/5s 1/5s
1/10s 1/10s
20 1/30s 1/30s
10
15

10
5
5

0 0
0 45 90 135 180 225 270 315 360 405 0 50 100 150 200 250 300 350 400
Ma inte nance interva l [s] Maintenance interval [s]

Figure 8. Percentage of failed self-lookups Figure 9. Self-lookup path errors


The percentage self-lookup requests experiencing a
5.6. Failed self-lookups path failure is depicted in Fig. 9. Although initially, the
percentage of path failures decreases significantly as
Fig. 8 shows the percentage of failed self-lookups the maintenance interval is lengthened, it starts to grow
for each churn rate. As can be expected, we can again after a certain point. This is especially visible for
observe that at the higher churn rates, significantly the two highest churn rates; for both of them the
more self-lookups fail. However, when considering the increase in the percentage of path failures is
frequency of maintenance operations, we can observe statistically significant when the maintenance interval is
that it does not seem to pay off to make the increased from 135 to 270s. In our Chord
maintenance interval very small; for each churn rate, a implementation, a peer does not establish a connection
statistically significant reduction in the percentage of to a new finger or to a new neighbor until it has the first
failed self-lookups can be achieved by using an interval message that should be forwarded to that peer. This
longer than 15s. The most common reason why a self- works well at small maintenance intervals since due to
lookup request fails is a path error. The high number of path errors at short maintenance intervals is explained by the fact that when the interval is short, a peer switches its fingers and successors very frequently. If the interval is 15s, each of the nine fingers may change every 135s. When switching to a new finger, the connection to the old finger is closed and the new finger is inserted into the finger table. The disadvantage of changing fingers frequently is that a peer may choose as its new finger a peer that will leave the network shortly. If the new finger leaves the network before the peer has established a TCP connection with it (to deliver the first message), a path error will occur. The benefit of using slightly longer maintenance intervals is that since fingers change less frequently, there is less chance of switching to a finger that will leave the network shortly.

Another reason for the high number of self-lookup path failures at the 15s maintenance interval is that due to the large amount of traffic, some of the peers become overloaded, as will be discussed in Section 5.7. If the successor s of peer n is overloaded, the messages n sends to s may time out, which causes n to remove the successor. The removed successor is replaced by the next successor on the successor list. However, if no TCP connection has yet been established with the new successor, a path error will occur if the new successor leaves the overlay before the first message gets routed to it.

At short maintenance intervals, due to the high amount of traffic, the gap between adding a peer to the routing table and sending the first message to it is very small. However, at longer maintenance intervals, there is less traffic and it will take longer before the first message is routed to a new finger. Thus, it will take longer before the connection is established. Consequently, the probability that the new finger or neighbor has left the network before the moment when it is contacted for the first time grows as the maintenance interval becomes longer. The effect is more pronounced at the higher churn rates.

On the other hand, in the case of the longest maintenance interval, the successor stabilization procedure is only run every six minutes. If the churn rate is high, the chances are that some of the successors leave the network between two consecutive successor stabilization rounds. This will result in more path errors, since messages will get routed to successors that are no longer present in the overlay.

To avoid the situation in which the departure of a peer is not detected until the first message is routed to it, peers could, as an optimization, establish TCP connections to new neighbors and fingers immediately after they have learned about their existence, even before they have any traffic to send to them. Of course, detecting node failures from closed connections is TCP-specific and does not work if a connectionless transport such as UDP is used. We plan to evaluate this approach as part of future work.
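The eager connection setup described in the previous paragraph can be sketched as follows. This is an illustrative model under invented names (EagerFingerTable, fake_connect, and the addresses and port are all made up for the example), not code from the paper's P2PP prototype; a real implementation would wrap socket.create_connection and keep the connection open so that a later close signals the peer's departure.

```python
# Illustrative sketch (not the paper's prototype code): a finger-table
# wrapper that eagerly opens a connection to every newly learned finger,
# so a departed peer is detected at insertion time rather than when the
# first lookup is routed to it. The connect function is injected so the
# idea can be exercised without real TCP sockets.

class EagerFingerTable:
    def __init__(self, connect):
        self.connect = connect      # e.g. a wrapper around socket.create_connection
        self.fingers = {}           # peer id -> open connection handle

    def add_finger(self, peer_id, addr):
        """Insert a finger and connect to it immediately.

        Returns True if the peer was reachable and inserted, False if it
        had already left the overlay (avoiding a later path error).
        """
        try:
            conn = self.connect(addr)
        except ConnectionRefusedError:
            return False            # departed peer never enters the table
        self.fingers[peer_id] = conn
        return True


# Usage with a fake transport: peer B has already left the overlay.
alive = {("10.0.0.1", 6084)}

def fake_connect(addr):
    if addr not in alive:
        raise ConnectionRefusedError(addr)
    return f"conn-to-{addr[0]}:{addr[1]}"

table = EagerFingerTable(fake_connect)
print(table.add_finger("A", ("10.0.0.1", 6084)))  # True: connection opened
print(table.add_finger("B", ("10.0.0.2", 6084)))  # False: departure detected early
```

The design choice the paper points out applies here too: the early failure detection relies on connection-oriented transport semantics and has no direct equivalent over UDP.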
5.7. Detected loops

[Figure 10. Percentage of looping self-lookups. X-axis: maintenance interval [s] (0-405); y-axis: % of self-lookups failing due to a loop; one curve per churn rate (5s, 10s, 30s).]

[Figure 11. Unresponsive successors. X-axis: maintenance interval [s] (0-405); y-axis: number of unresponsive successors; one curve per churn rate (1/5s, 1/10s, 1/30s).]

The percentage of self-lookups failing due to a loop is plotted in Fig. 10. The amount of loops reveals how stable the Chord ring is. We consider a loop to exist in the network if a self-lookup request initiated by peer n is answered by some other peer than peer n itself. From the figure, we can observe that the network is clearly less stable at higher churn rates. As the maintenance interval is increased beyond 135s, the difference in the percentage of self-lookups encountering a loop for the three churn rates is statistically significant. We can also see that for the high and medium churn rates, the amount of instability increases rapidly as the maintenance interval grows. However, initially the percentage of self-lookups encountering a loop decreases for the highest churn rate; as the maintenance interval grows from 15s to 45s, there is a statistically significant drop in the percentage of self-lookups failing due to a loop. The same phenomenon can be observed for the medium churn rate as the interval is increased from 15s to 90s.

Fig. 11 shows the number of successors that fail to respond to ExchangeTable requests, which are used to implement the successor stabilization procedure. A successor is removed if two consecutive ExchangeTable requests sent to it time out. From the figure, we can see that for each churn rate, there is a large number of unresponsive successors in the case of the shortest maintenance interval. In the case of the highest churn rate, the number of unresponsive successors starts to level off beginning from the maintenance interval of 45s. For the medium churn rate, this happens starting from the interval of 90s, and for the lowest churn rate, from the interval of 270s onwards. For the medium and high churn rates, there is a statistically significant drop in the number of unresponsive successors when increasing the maintenance interval from 15s to 45s. E.g. in the case of the highest churn rate, the number of unresponsive successors drops by almost 73%. For the low churn rate, there is a statistically significant drop in the number of unresponsive successors when increasing the maintenance interval from 15s to 90s. Removal of unresponsive successors creates a lot of instability in the network and causes messages to be incorrectly routed. The high number of unresponsive successors explains why the amount of loops decreases initially in Fig. 10 when moving away from the shortest maintenance intervals. The unresponsiveness of the successors is explained by the fact that at the 15s maintenance interval, the queue size of the message dispatcher thread of the P2PSIP prototype becomes very large in some PlanetLab nodes. Such overloaded nodes create a lot of instability in the network.

6. Conclusion

In this paper, the performance of a P2PSIP VoIP and instant messaging system was studied through measurements carried out in PlanetLab.

The average lookup delay in a 500-node P2PSIP overlay was found to be between 1.0s and 1.5s, depending on the churn rate and maintenance interval. Based on these results, we can conclude that the extra delay caused by replacing the centralized location service of SIP with a distributed mechanism is not excessive.

In a P2PSIP overlay where P2PP lookup traffic consists of lookups related to presence, instant messaging and VoIP calls, DHT maintenance traffic clearly dominates over lookup traffic.

As churn increases, the hop count, the lookup delay, and the number of timeouts, detected loops and failed lookups all increase. The same happens when the rate of maintenance messages is decreased. However, what is perhaps more surprising is that making maintenance operations too frequent also has a clearly negative impact on the performance of the overlay. This is because too short a maintenance interval results in some peers becoming overloaded and in too frequently changing finger and neighbor pointers.

The most common cause of failed requests is a path error. Path errors are nearly always forward path
failures. Reverse path failures were found to be rather rare. This result supports the use of symmetric recursive routing in P2PSIP overlays, since it shows that reverse path failures, which have been considered a major disadvantage of symmetric recursive routing, do not in fact constitute a problem.

The number of failed lookups is rather high for all of the churn rates. Since the most common cause of a failed lookup is a path error, an improvement can be achieved by retrying a failed lookup. However, if the lookup is related to e.g. a VoIP call, such retries are undesirable since they increase the call setup delay.

At the highest churn rate and the highest rate of maintenance messages, each peer sends and receives 355 kilobytes of data during one hour. Thus, we can conclude that the bandwidth consumption of a P2PSIP system is rather low compared to traditional P2P applications such as file sharing.
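As a quick sanity check on the 355-kilobyte figure, the implied average bit rate is well under one kilobit per second. The short computation below (ours, not from the paper) shows both the 1000-byte and 1024-byte interpretations of a kilobyte, since the paper does not specify which it uses.

```python
# Average bit rate implied by 355 kilobytes sent (and received) per peer
# per hour, computed for both common definitions of "kilobyte".
for bytes_per_kb in (1000, 1024):
    bits = 355 * bytes_per_kb * 8      # bits transferred in one hour
    rate = bits / 3600                 # average bits per second
    print(f"{bytes_per_kb} bytes/kB: {rate:.0f} bit/s")  # 789 and 808 bit/s
```

Either way, the average maintenance overhead per peer stays below 1 kbit/s, which supports the comparison with file-sharing workloads above.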