P2PSIP Maintenance Operations Study
In Chord, each peer maintains on the order of log N fingers and successors [5]. In the experiments, nine fingers and ten successor pointers were maintained. According to [15], a Chord network that is in a ring-like state remains in a ring-like state as long as nodes send Ω(log² N) messages before N new nodes join or N/2 nodes fail. Thus, in a 500-node network at a churn rate of 1/5s, roughly Ω(log² N) rounds of stabilization should occur in 1250 seconds (i.e., before 250 nodes leave). However, as we will see, making the maintenance operations too frequent has a negative impact on the performance of the overlay. In the experiments, we studied the performance of the P2PSIP overlay network at three different churn rates, with maintenance intervals ranging from 15s to 360s.

5.1. Bandwidth consumption

P2PP traffic consists of maintenance traffic and lookup traffic. By maintenance traffic we refer to traffic used to maintain the overlay, that is, traffic generated by joins, departures, and stabilization operations. In contrast, lookup traffic consists of the P2P lookup operations done by users.

The average number of incoming bytes per second for the peers participating in the overlay is shown in Fig. 1. Due to the symmetric nature of P2PP traffic, the average bandwidth consumption of outgoing traffic is nearly identical to Fig. 1 and is thus not shown separately. The 95% confidence intervals are so small that they are not visible in the figure. Only the bandwidth consumption of P2PP messages is included in the figure; the overhead of lower layers in the protocol stack is not included. Since Chord uses periodic recovery from failures [6], the frequency of maintenance messages does not depend on the churn rate. However, although the amount of maintenance traffic is similar across the churn rates, at the higher churn rates there is more lookup traffic, as can be observed from Fig. 2, which depicts the percentage of maintenance traffic out of total traffic. For instance, at the highest churn rate, 720 peers join the overlay during the one-hour measurement period. Each joining user initiates, among other things, 22 lookups for her buddies. These lookups are processed by the 1220 peers that are active in the overlay during the measurement period. At the lowest churn rate, only 120 peers join during the one-hour period, and the buddy lookups they initiate are processed by 620 peers. Thus, at the highest churn rate, an average peer receives more incoming lookup traffic than at the lowest churn rate. This explains why there is a small but statistically significant increase in the bandwidth consumption for each maintenance interval when the churn rate is increased.

From Fig. 1, we can also observe that for all churn rates, there is a statistically significant reduction in the amount of traffic when the maintenance interval is increased from 15s to 45s or to 90s. As the maintenance interval is increased further, the amount of traffic each peer generates per second starts to level off. This is because, for example, increasing the interval from 270s to 360s reduces the number of stabilization rounds occurring during the one-hour period only by three and thus does not reduce the total amount of traffic significantly, whereas increasing the interval from 15s to 45s causes the number of stabilization rounds to drop by 160, from 240 to 80.

From Fig. 2, one can observe that even at the highest churn rate, 80-95% of the traffic is maintenance traffic, depending on the frequency of maintenance operations. Thus, maintenance traffic clearly dominates over lookup traffic. This is especially clear at the lowest churn rate, for which 93-98% of the total traffic is maintenance traffic.

Figure 3. Average hop count (hop count vs. maintenance interval [s]; curves: 1/5s, 1/10s, 1/30s churn rates)

5.2. Hop count

Fig. 3 shows the average hop count for each churn rate. We can observe that churn rate has a clear impact on hop count: the average number of hops is always significantly (statistically, at the 95% confidence level)
larger for the same maintenance interval at higher churn rates than at the lower churn rates. For instance, for the maintenance interval of 90s, the average hop count is 0.5 hops larger at the highest churn rate (1/5s) than at the lowest churn rate (1/30s). The reason for the hop count being larger at higher churn rates is that a high churn rate causes the finger table of each node to be less optimal. Each finger that leaves the network introduces a hole into the finger table of at least one other node in the network. Since the finger stabilization process is run periodically, it takes time before an empty finger table entry can be filled with a new finger. The higher the churn rate, the more frequently holes are introduced into the finger tables. And the more holes the finger table of a peer has, the less optimal are the routing decisions the peer makes, since the peer is forced to forward the request being routed to a finger that may be a rather distant predecessor of the target identifier. This has the effect of increasing the average hop count.

Figure 1. Incoming bytes per second (bytes/s vs. maintenance interval [s]; curves: 1/5s, 1/10s, 1/30s churn rates)
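The effect of finger-table holes on route length can be illustrated with a toy model (our own sketch, not the prototype's code): on a small identifier ring where greedy Chord-style routing always forwards to the closest preceding finger, removing a single finger forces a detour through a more distant predecessor of the target and adds a hop.

```python
RING_BITS = 6
RING = 2 ** RING_BITS  # toy 64-ID ring in which every ID is a live peer


def dist(a: int, b: int) -> int:
    """Clockwise distance from identifier a to identifier b on the ring."""
    return (b - a) % RING


def route_hops(start: int, target: int, table: dict[int, list[int]]) -> int:
    """Greedy Chord-style routing: at each peer, forward to the finger
    that gets closest to the target without overshooting it."""
    node, hops = start, 0
    while node != target:
        node = max((f for f in table[node] if dist(node, f) <= dist(node, target)),
                   key=lambda f: dist(node, f))
        hops += 1
    return hops


# Complete finger tables: peer n points at n+1, n+2, n+4, ..., n+32.
full = {n: [(n + 2 ** i) % RING for i in range(RING_BITS)] for n in range(RING)}

# The same tables with one hole: peer 0's furthest finger (32) has left.
holed = {n: list(fs) for n, fs in full.items()}
holed[0].remove(32)
```

With complete tables, routing from peer 0 to peer 33 takes two hops (0 → 32 → 33); with the hole, the best remaining finger is 16 and the route lengthens to three hops (0 → 16 → 32 → 33).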
Figure 2. Percentage of maintenance traffic (% of maintenance traffic vs. maintenance interval [s]; curves: 1/5s, 1/10s, 1/30s churn rates)

By observing the graph of each churn rate, we can also see that increasing the maintenance interval has the
effect of increasing the hop count.

Figure 4. Average lookup delay (delay [ms] vs. maintenance interval [s]; curves: 1/5s, 1/10s, 1/30s churn rates)

Figure 5. All failed lookups (percentage of failed lookups vs. maintenance interval [s])

5.3. Lookup delay

Fig. 4 shows the average lookup delay. Self-lookups are not included in the figure; since self-lookup requests make a 360-degree turn around the Chord ring, their delays are considerably longer than those of other lookups. In Fig. 4, the x-axis has been shifted to the right by 5s for the medium (1/10s) churn rate, since the confidence intervals of that churn rate overlap with those of the other churn rates. The differences in delay between the medium and high churn rates, and between the medium and low churn rates, are in many cases statistically insignificant. However, we can still see from Fig. 4 that regardless of the frequency of maintenance operations, there is always a statistically significant difference between the average delays at the high and low churn rates. The delay is greater at the higher churn rates, first, because the higher churn rates have higher hop counts. Second, at the higher churn rates the average load (queue size) at each peer is also higher, meaning that the queuing delay is higher. As an example, for the maintenance interval of 15s, the average forwarding delay at each hop is roughly 40ms higher at the highest churn rate than at the lowest churn rate. Third, path failures are more common at higher churn rates. The detection of a path failure requires a timeout at an intermediate node, which has the effect of increasing the average delay.

From Fig. 4, we can also observe that for each individual churn rate, the average lookup delay grows as the maintenance operations become less frequent. As an example, for the 1/10s churn rate, the average lookup delay is significantly higher for the 180s maintenance interval than for the 90s interval. The delay grows, first, because the average hop count grows. Second, the number of requests experiencing path failures also grows, which has the effect of increasing the delay.

5.4. Failed lookups

Fig. 5 shows the percentage of failed lookups for the three churn rates. By lookups we mean P2PP LookupObject, Self-Lookup and LookupPeer requests. The differences between the churn rates are statistically significant. At the highest churn rate, the best performance is achieved when the maintenance interval is 15s or 45s (the difference in lookup failure rates between these two maintenance intervals is statistically insignificant), in which case roughly nine percent of all lookups fail. At the 1/10s churn rate, the best performance is achieved when the maintenance interval is 15-90s; in this case, roughly five percent of lookups fail. The lowest churn rate also has the lowest lookup failure ratio: the best performance is achieved when the maintenance interval is 15-180s, in which case, at the minimum, 3.5 percent of all lookups fail. A lookup is considered a failure if it times out, exceeds the hop count limit, or encounters a path error.

From Fig. 5, we can observe for the two highest churn rates that as the maintenance interval is increased beyond 90s, the number of failed lookups starts to grow steeply. The differences between maintenance intervals are statistically significant. At the lowest churn rate, the performance does not degrade as dramatically, since the number of peers joining and leaving the network between any two consecutive rounds of maintenance operations is considerably smaller than at the higher churn rates.
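The failure criteria above (timeout, hop-count limit, path error) can be sketched as a simple classifier; the record fields and the hop limit of 10 are our own illustrative assumptions, not values taken from the P2PP specification.

```python
from dataclasses import dataclass


# Hypothetical per-lookup outcome record; the field names are ours.
@dataclass
class LookupOutcome:
    responded: bool   # did a response arrive before the request timer expired?
    hops: int         # number of hops the request traversed
    path_error: bool  # did an intermediate peer report a path error?


HOP_LIMIT = 10  # assumed hop-count limit, not stated in the paper


def classify(outcome: LookupOutcome) -> str:
    """Classify a lookup according to the failure criteria of Section 5.4."""
    if outcome.path_error:
        return "failed: path error"
    if outcome.hops > HOP_LIMIT:
        return "failed: hop count limit exceeded"
    if not outcome.responded:
        return "failed: timeout"
    return "success"
```

For example, `classify(LookupOutcome(responded=False, hops=3, path_error=False))` classifies a lookup that never received a response as a timeout failure.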
(Figure: number of failed lookups per failure cause — Loop, TTL, Timeout, Path error — vs. maintenance interval [s])

The departure of a finger from the overlay can be detected by peer n from the fact that the TCP connection is closed, provided, of course, that peer n has established a TCP connection with the finger. However, if the finger leaves the overlay before peer n has established a TCP connection with it, peer n will not learn that the finger has left until a path error occurs. We used a TCP socket connect timeout of 30s.
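A minimal sketch of the TCP-based liveness check described above, assuming each finger exposes a reachable TCP endpoint (the function and its parameters are illustrative, not the prototype's API; the prototype keeps connections open and reacts to the peer closing them rather than re-connecting per probe):

```python
import socket

CONNECT_TIMEOUT_S = 30  # the paper's TCP socket connect timeout


def finger_alive(host: str, port: int) -> bool:
    """Probe a finger by opening a TCP connection to it.

    A refused or timed-out connect suggests the finger has left the
    overlay; a successfully established connection suggests it is
    still present.
    """
    try:
        with socket.create_connection((host, port), timeout=CONNECT_TIMEOUT_S):
            return True
    except OSError:
        return False
```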
(Figures: Failed self-lookups [%] vs. maintenance interval [s], and a second panel vs. maintenance interval [s]; curves in both: 1/5s, 1/10s, 1/30s churn rates)
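The loop criterion used in this section — a self-lookup that is answered by some peer other than its initiator — can be sketched on a toy successor-pointer ring (our own simplification; real lookups also use fingers):

```python
def self_lookup_answerer(succ: dict[int, int], start: int, max_hops: int = 32) -> int:
    """Follow successor pointers from `start`; with consistent pointers,
    a lookup for the initiator's own ID comes back to the initiator."""
    node, hops = succ[start], 0
    while node != start and hops < max_hops:
        node = succ[node]
        hops += 1
    return node


def has_loop(succ: dict[int, int], start: int) -> bool:
    """A loop exists if the self-lookup is answered by another peer."""
    return self_lookup_answerer(succ, start) != start


# Consistent ring 0 -> 10 -> 20 -> 30 -> 0: the self-lookup returns home.
consistent = {0: 10, 10: 20, 20: 30, 30: 0}

# Stale pointer: peer 30 still points at 10, skipping peer 0 entirely,
# so peer 0's self-lookup circulates without ever reaching peer 0.
stale = {0: 10, 10: 20, 20: 30, 30: 10}
```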
Figure 11. Unresponsive successors (number of unresponsive successors vs. maintenance interval [s])

The high amount of unresponsive successors destabilizes the network and causes messages to be incorrectly routed, particularly at the shortest maintenance intervals. The unresponsiveness of the successors is explained by the fact that at the 15s maintenance interval, the queue size of the message dispatcher thread of the P2PSIP prototype becomes very large on some PlanetLab nodes. Such overloaded nodes create a lot of instability in the network.
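The overload mechanism described above can be illustrated with a back-of-the-envelope sketch (the message counts and service rate are assumed numbers of ours, not measured values): a peer's dispatcher queue grows whenever maintenance messages arrive faster than the thread can service them.

```python
def queue_growth_per_s(interval_s: float, msgs_per_round: float,
                       service_rate: float) -> float:
    """Net dispatcher-queue growth in messages/s (positive => backlog builds)."""
    arrival_rate = msgs_per_round / interval_s  # maintenance arrivals per second
    return arrival_rate - service_rate


# Assumed: 50 maintenance messages per stabilization round, and a slow
# PlanetLab node that services only 2 messages per second.
SLOW_NODE_RATE = 2.0
```

Under these assumptions, a 15s interval yields 50/15 ≈ 3.3 arrivals/s, outpacing the 2/s service rate, so the queue grows without bound; at a 90s interval the arrival rate drops to ≈ 0.56/s and the backlog drains.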
The percentage of self-lookups failing due to a loop is plotted in Fig. 10. The amount of loops reveals how stable the Chord ring is. We consider a loop to exist in the network if a self-lookup request initiated by peer n is answered by some peer other than peer n itself. From the figure, we can observe that the network is clearly less stable at higher churn rates. As the maintenance interval is increased beyond 135s, the difference in the percentage of self-lookups encountering a loop between the three churn rates is statistically significant. We can also see that for the high and medium churn rates, the amount of instability increases rapidly as the maintenance interval grows. However, the percentage of self-lookups encountering a loop initially decreases for the highest churn rate: as the maintenance interval grows from 15s to 45s, there is a statistically significant drop in the percentage of self-lookups failing due to a loop. The same phenomenon can be observed for the medium churn rate as the interval is increased from 15s to 90s.

Fig. 11 shows the number of successors that failed to respond to the ExchangeTable requests that are used to implement the successor stabilization procedure. A successor is removed if two consecutive ExchangeTable requests sent to it time out. From the figure, we can see that for each churn rate, there is a large number of unresponsive successors in the case of the shortest maintenance interval.

6. Conclusion

In this paper, the performance of a P2PSIP VoIP and instant messaging system was studied through measurements carried out in PlanetLab.

The average lookup delay in a 500-node P2PSIP overlay was found to be between 1.0s and 1.5s, depending on the churn rate and the maintenance interval. Based on these results, we can conclude that the extra delay caused by replacing the centralized location service of SIP with a distributed mechanism is not excessive.

In a P2PSIP overlay where P2PP lookup traffic consists of lookups related to presence, instant messaging and VoIP calls, DHT maintenance traffic clearly dominates over lookup traffic.

As churn increases, the hop count, the lookup delay, and the number of timeouts, detected loops and failed lookups all increase. The same happens when the rate of maintenance messages is decreased. However, what is perhaps more surprising is that making maintenance operations too frequent also clearly has a negative impact on the performance of the overlay. This is because too short a maintenance interval results in some peers becoming overloaded and in too frequently changing finger and neighbor pointers.

The most common cause of failed requests is a path error. Path errors are nearly always forward path
failures. Reverse path failures were found to be rather rare. This result supports the use of symmetric recursive routing in P2PSIP overlays, since it shows that reverse path failures, which have been considered a major disadvantage of symmetric recursive routing, do not in fact constitute a problem.

The number of failed lookups is rather high for all of the churn rates. Since the most common cause of a failed lookup is a path error, an improvement can be achieved by retrying a failed lookup. However, if the lookup is related to, for example, a VoIP call, such retries are undesirable, since they increase the call setup delay.

At the highest churn rate and the highest rate of maintenance messages, each peer sends and receives 355 kilobytes of data during one hour. Thus, we can conclude that the bandwidth consumption of a P2PSIP system is rather low compared to traditional P2P applications such as file sharing.

7. References

[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley and E. Schooler: "SIP: Session Initiation Protocol", RFC 3261, Jun. 2002.

[2] K. Singh and H. Schulzrinne: "Peer-to-peer internet telephony using SIP," in Proc. International Workshop on Network and Operating System Support for Digital Audio and Video, Washington, USA, 2005, pp. 63-68.

[3] D. Bryan, P. Matthews, E. Shim, D. Willis and S. Dawkins: "Concepts and Terminology for Peer-to-Peer SIP", Internet-Draft, work in progress, Jul. 2008.

[4] S. Baset, H. Schulzrinne and M. Matuszewski: "Peer-to-Peer Protocol (P2PP)", Internet-Draft draft-baset-p2psip-p2pp-01, work in progress, Nov. 2007.

[5] I. Stoica, R. Morris, D. Liben-Nowell, D.R. Karger, M.F. Kaashoek, F. Dabek and H. Balakrishnan: "Chord: A scalable peer-to-peer lookup protocol for Internet applications," IEEE/ACM Trans. Networking, vol. 11, no. 1, pp. 17-32, Feb. 2003.

[6] S. Rhea, D. Geels, T. Roscoe and J. Kubiatowicz: "Handling churn in a DHT," in Proc. USENIX Annual Technical Conference, Boston, MA, 2004.

[7] C. Jennings, B. Lowekamp, E. Rescorla, S. Baset and H. Schulzrinne: "Resource Location and Discovery (RELOAD) Base Protocol", Internet-Draft draft-ietf-p2psip-base-01, work in progress, Dec. 2008.

[8] L. Peterson, A. Bavier, M.E. Fiuczynski and S. Muir: "Experiences Building PlanetLab," in Proc. 7th USENIX Symp. on Operating Systems Design and Implementation (OSDI), 2006.

[9] S. Guha, N. Daswani and R. Jain: "An experimental study of the Skype peer-to-peer VoIP system," in Proc. 5th International Workshop on Peer-to-Peer Systems (IPTPS), Santa Barbara, CA, Feb. 2006.

[10] D. Rossi, M. Mellia and M. Meo: "A detailed measurement of Skype Network Traffic," in Proc. 7th International Workshop on Peer-to-Peer Systems (IPTPS'08), Tampa Bay, Florida, Feb. 2008.

[11] B. Athwal, F.C. Harmatzis and V.P. Tanguturi: "Replacing Centric Voice Services with Hosted VoIP Services: An Application of Real Options Approach," in Proc. 16th International Telecommunications Society (ITS) European Regional Conference, Porto, Portugal, Sept. 2006.

[12] "Traffic Analysis for Voice over IP", [Link]olutions/TA_ISD.html

[13] C. Cheng, S. Tsao and J. Chou: "Unstructured Peer-to-Peer Session Initiation Protocol for Mobile Environment," in Proc. 18th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC'07), Sept. 2007, pp. 1-5.

[14] B.A. Nardi, S. Whittaker and E. Bradner: "Interaction and Outeraction: Instant Messaging in Action," in Proc. 2000 ACM Conference on Computer Supported Cooperative Work (CSCW), Philadelphia, Pennsylvania, United States, 2000, pp. 79-88.

[15] D. Liben-Nowell, H. Balakrishnan and D. Karger: "Observations on the dynamic evolution of peer-to-peer networks," in Proc. First International Workshop on Peer-to-Peer Systems (IPTPS'02), Cambridge, MA, Mar. 2002.

[16] D.A. Bryan, B.B. Lowekamp and M. Zangrilli: "The Design of a Versatile, Secure P2PSIP Communications Architecture for the Public Internet," in Proc. IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008), April 2008, pp. 1-8.

[17] J. Li, J. Stribling, T.M. Gil, R. Morris and M.F. Kaashoek: "Comparing the performance of distributed hash tables under churn," in Proc. 3rd International Workshop on Peer-to-Peer Systems (IPTPS), San Diego, CA, Feb. 2004.