
Best Practices for Collecting Speed Test Data

Kyle MacMillan, Tarun Mangla, Marc Richardson, Nick Feamster


Department of Computer Science, University of Chicago, USA
{macmillan,tmangla,mtrichardson,feamster}@uchicago.edu

Abstract

In an effort to expand Internet access, local and federal policymakers have sought to use speed test data to determine where to allocate funding. However, drawing accurate and meaningful conclusions about Internet Service Provider (ISP) network performance from speed test data requires careful consideration of the test conditions under which the data was collected. This paper provides a set of recommendations for gathering and analyzing a single speed test measurement. Our recommendations are based on our own analysis of speed test data collected from in-lab, controlled experiments, as well as from a six-month-long deployment across 77 households in Chicago. Based on the results of our analysis, we suggest that the client-server latency, the client device, and the access medium (wireless vs. wired) be logged for each test. These three metadata can be used later to assess the accuracy or utility of the speed test. We also propose that running "paired tests" (running two different speed tests in succession) can help mitigate the effects of test conditions that we cannot measure, such as server load or upstream congestion. Such paired tests are most beneficial for households subscribed to download speeds greater than 500 Mbps.

1 Introduction

Consumer speed test data has become an attractive resource for federal and local officials to evaluate last-mile Internet Service Provider (ISP) performance. In particular, these policymakers have sought to use speed test data to identify neighborhoods and regions that qualify for funding from the Broadband Equity, Access, and Deployment (BEAD) program, which provides $42.5 billion to expand Internet access in underserved areas. The National Telecommunications and Information Administration (NTIA) [21] maintains a National Broadband Availability Map to identify underserved neighborhoods and uses speed test data among other sources. In addition, state and local officials across the country are urging consumers to participate in speed test crowd-sourcing initiatives to help further identify which areas meet the federal funding criteria.

However, there are multiple challenges to evaluating Internet speed using crowdsourced data. First, there are no standardized protocols to measure Internet speed. As a result, there exist multiple tools that each have their own testing protocol and produce different results under similar test conditions. Second, there is no guarantee that a speed test will measure the ISP access link. Speed tests measure the bottleneck link, wherever along the end-to-end path (see Figure 1) that link may be. The bottleneck can shift to the WiFi network inside the user's home but can also appear outside the ISP altogether, occurring at an interconnect or closer to the test server. Thus, a test result indicating poor performance may not be attributable to the ISP.

Given these challenges, drawing accurate conclusions from speed test data is a path fraught with pitfalls. Nevertheless, speed test data remains an invaluable tool in measuring Internet performance and will continue to be used in high-stakes decision making. Therefore, we present a set of recommendations for both gathering and analyzing individual speed test results. In particular, we discuss factors that should be controlled for at test time (when the test is conducted) or post hoc during data analysis¹.

Our recommendations are based on speed test measurement data collected from both in-lab experiments conducted under controlled network conditions and a six-month-long deployment across 77 households in Chicago. While our recommendations are applicable across speed test tools, they are drawn from analyses of two popular speed test tools, Ookla Speedtest (Ookla) [26] and Measurement Lab's (M-Lab's) Network Diagnostic Test version 7 (NDT7) [18]. We chose to analyze Ookla and NDT7 because they are used extensively: Ookla and NDT7 report a daily average of over 10 million [25] and 6 million tests [16], respectively. As a result, Ookla and M-Lab have collectively amassed billions of speed tests [6, 25] and have compiled data sets that have become universal resources for analyzing broadband Internet performance [3, 8, 22, 30].

¹ There are other important issues, including the metrics and methods used to aggregate data from multiple measurements, which we relegate to future work.



Figure 1: End-to-end path for a speed test.

Our key recommendations are summarized below (a sketch of a per-test record capturing these metadata follows the list):

• Record the latency between the client and server at test time. High latency can prevent the tool from saturating the link, thus underestimating the speed. This can occur above 100 ms and 400 ms for NDT7 and Ookla, respectively.

• Record whether the client connected via WiFi or a wired connection. For users subscribed to speeds above 50 Mbps, connecting via WiFi will often cause the tool to underestimate speed, almost always at speeds above 800 Mbps. However, the tool is less likely to underestimate for speeds below 50 Mbps.

• Record the client device. The client device's hardware may become the bottleneck at sufficiently high speeds (1 Gbps).

• To mitigate the effects of non-ISP unobservable bottlenecks, such as the test server and congested transit ISPs, which appear especially in high-speed connections, consider running two different tools in succession and taking the max of the two results.
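To make these metadata concrete, the sketch below shows one way a collection pipeline might attach them to each result. It is illustrative only: the SpeedTestRecord type and its field names are our own invention, not part of Ookla's or NDT7's APIs.

    # Hypothetical per-test record capturing the metadata recommended above.
    from dataclasses import dataclass, asdict
    import json

    @dataclass
    class SpeedTestRecord:
        tool: str             # e.g. "ookla" or "ndt7"
        download_mbps: float  # speed reported by the tool
        upload_mbps: float
        rtt_ms: float         # client-server latency at test time
        access_medium: str    # "wifi" or "wired"
        client_device: str    # e.g. "raspberry-pi-4", "macbook-pro"
        client_app: str       # "native" or "browser"

    record = SpeedTestRecord(
        tool="ndt7", download_mbps=612.4, upload_mbps=28.1,
        rtt_ms=23.0, access_medium="wired",
        client_device="raspberry-pi-4", client_app="native",
    )
    print(json.dumps(asdict(record)))  # one JSON line per test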

2 Related Work

Speed test design. There are two primary ways to measure throughput: (1) packet probing and (2) flooding. Most packet probing techniques send a series of packets and infer metrics like available bandwidth or link capacity based on the inter-arrival packet delay [9, 12, 14, 15, 27]. More recently, Ahmed et al. [1] estimate bandwidth bottlenecks by probing the network using recursive in-band packet trains. However, these techniques can be inaccurate, especially for high-speed networks, due to their sensitivity to packet loss, queuing policy, etc. As a result, most commercial speed tests, including ones offered by both ISPs [2, 7] and non-ISP entities [18, 23, 26], are flooding-based tools that work by saturating the bottleneck link through active measurements.

Evaluating speed tests. Feamster and Livingood [10] discuss considerations with using flooding-based tools to measure speed. They do not, however, conduct empirical experiments to characterize NDT7 and Ookla performance. Similarly, Bauer et al. [4] explain how differences in speed test design and execution contribute to differences in test results. Bauer et al.'s work differs from ours in several ways. First, both Ookla and NDT have seen major design changes in the 12 years since that study. Both tools have updated their flooding and sampling mechanisms, and NDT's latest version (NDT7) uses TCP BBR instead of TCP Reno. Second, they only analyze public NDT data and do not study both Ookla and NDT in controlled lab settings, nor did they conduct paired measurements in the wide area that allow direct comparison of Ookla and NDT, as we do. Complementary to our comparative analysis is work by Clark et al. [6] that provides recommendations on how to use aggregated NDT data, including considering the self-selection bias and other end-user bottlenecks like slow WiFi and outdated modems.

Residential broadband. Goga et al. [11] evaluate the accuracy of various speed test tools in residential networks, yet tools have changed and speeds on residential networks have increased more than 20× since this study ten years ago. Sundaresan et al. [29] studied network access link performance in residential networks more than ten years ago. Whereas our work is more focused on characterizing speed test tools, this work examined network performance differences across ISPs, looking at latency, packet loss, and jitter in addition to throughput. Canadi et al. [5] use publicly available Ookla data to analyze broadband performance in 35 metropolitan regions.



Finally, the Federal Communications Commission (FCC) conducts the Measuring Broadband America (MBA) project [20], an ongoing study of fixed broadband performance in the United States. The FCC uses SamKnows whiteboxes [28] to collect a suite of network QoS metrics, including throughput, latency, and packet loss. Because the MBA project maps broadband Internet performance across different ISPs, they use a single speed test, a proprietary test developed by SamKnows, and do not consider Ookla or NDT7.

3 Factors impacting speed test outcome

An Internet speed test measures the achievable throughput (or data transfer rate) of a network path between the client and the test server (see Figure 1). To do so, most existing speed test tools send as much traffic along the network path as possible, a technique typically referred to as "flooding". The sending rate is controlled by the Transmission Control Protocol's congestion control algorithm, which adapts the sending rate during the test based on its estimation of the available network bandwidth. The tool then estimates the speed based on the throughput achieved during the test. A number of factors can impact the final speed that the tool reports, including the test protocol and elements on the network path, as explained below.

Client platform. The client platform, including the client's device and software, could impact the result of a speed test. In most cases, speed tests are initiated by consumers on devices such as laptops and smartphones. In some cases, speed tests can also be conducted automatically using wrapper software (e.g., Murakami [17], Netrics [13]) installed on dedicated hardware. As for test software, the measurements can be run using a browser or native application, depending on user preference, client device, and the speed test tool implementation.

Access medium. The client device connects to the home router via a wireless or wired link. In practice, most devices connect via WiFi. The WiFi access medium, as we will show later, may introduce a new speed bottleneck into the network path. As a result, conducting a speed test over WiFi can produce lower results than if the test had been conducted using a wired connection.

Test server and end-to-end network path. Before data transfer begins, the speed test tool chooses a test server from its network of test servers. Each tool's network of test servers is unique and is characterized by the tool's server inclusion policy. For instance, any network can operate an Ookla server, though each server needs to pass some quality checks and can be removed if it is empirically determined to under-report throughput. NDT7 servers, by contrast, are operated as a managed infrastructure, owned and operated by a single organization (Measurement Lab). Ookla servers are sometimes "on net" (within the same ISP as the client), although that is neither a requirement nor a guarantee; on the other hand, because NDT7 servers are operated in data centers, they are typically "off net". Servers that are off net result in end-to-end paths that may traverse multiple networks, including transit networks and interconnection points that may introduce bottlenecks.

Test protocol. The test protocol is an important aspect of the test and consists of the following key functions (a sketch of the aggregation step follows this list):

• Server selection. Before beginning the test, the tool must determine the test server to be used. Existing tools typically choose a server based on the client-server latency and client geolocation. For example, Ookla begins by choosing the 10 closest servers to the client, where the client's location can be determined either by IP geolocation or GPS. Ookla then pings each of these servers and picks the server with the lowest latency.

• Flooding mechanism. Having picked the server, the tool begins to transfer data. With the goal of measuring the maximum data transfer rate, the tool will attempt to saturate the bottleneck link. The tools can differ in their saturation mechanisms. We find that Ookla adapts both the number of open TCP connections and the test length in response to changes in the measured throughput over the course of the test, whereas NDT7 opens only a single TCP connection, and the test itself always runs for ten seconds. With that said, the latest versions of both Ookla and NDT7 use WebSockets over TCP.

• Sampling and aggregation. Over the course of the test, the tool periodically samples its sending rate. Once the data transfer is complete, the tool aggregates these samples to determine the final reported speed. As a result, the chosen sampling and aggregation methods can have varying impact on the reported speed. While NDT7 reports the average throughput over the entire test (bytes transferred / test time), Ookla discards lower-speed samples. This decision affects every test because of how the tools increase their sending rates. Each TCP connection begins with a period called "slow start", in which the client and server transfer data at a rate that is slower than the steady-state transfer rate. Furthermore, any transient decrease in sending rate that occurs during the test, triggered by, for example, short-term packet loss, will be captured by the speed reported by NDT7 but not by Ookla. Although the effects of sampling and aggregation differences are comparatively small, they nevertheless impact the reported speed.
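The effect of the two aggregation styles can be illustrated with a small sketch. Assuming a list of periodic throughput samples from a single test, the first function mirrors NDT7-style averaging over the whole transfer, while the second approximates Ookla-style aggregation by discarding the lowest samples. Ookla's exact discard rule is not public, so the 50% cutoff here is an assumption for illustration only.

    def average_throughput(samples_mbps: list[float]) -> float:
        """NDT7-style aggregation: equivalent to bytes transferred / test
        time when samples are evenly spaced."""
        return sum(samples_mbps) / len(samples_mbps)

    def discard_low_samples(samples_mbps: list[float],
                            keep_fraction: float = 0.5) -> float:
        """Ookla-style aggregation (approximated): drop the lowest samples,
        e.g. those from TCP slow start or transient loss, then average.
        The keep_fraction value is our assumption, not Ookla's documented rule."""
        kept = sorted(samples_mbps)[int(len(samples_mbps) * (1 - keep_fraction)):]
        return sum(kept) / len(kept)

    # Slow start depresses the first samples; a transient dip appears mid-test.
    samples = [120, 480, 890, 910, 905, 340, 900, 915, 910, 905]
    print(average_throughput(samples))   # 727.5 Mbps: dips count against the result
    print(discard_low_samples(samples))  # 909.0 Mbps: dips are discarded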



Among these factors, the client platform and access medium depend on the end user and their network setup. The choice of speed test tool determines the testing protocol. The test server and the corresponding client-server network path depend on both the test tool and the client location. Given the number of non-ISP factors that can impact a speed test, blind use of speed measurement data can lead to incorrect conclusions about ISP performance. For example, a low speed result that was caused by poor WiFi could be incorrectly attributed to the ISP underperforming. Therefore, in this paper, we provide recommendations on how to best gather (and analyze) speed test measurements to reduce the likelihood of a non-ISP factor impacting speed test outcomes.

4 Recommendations

In this section, we provide our key recommendations. Wherever needed, we support our recommendations with data collected either from experiments conducted in-lab under controlled network conditions or from a six-month deployment gathering measurement data from 67 households in Chicago.

4.1 Log the client-server latency

Recommendation: Record the latency between the client and server at test time. High latency can prevent the tool from saturating the link, thus underestimating the speed. This can occur above 100 ms and 400 ms for NDT7 and Ookla, respectively.

The measured speed is a function of the network conditions along the end-to-end path. Although many of these conditions, such as the load on each link, are unknown, there are conditions that are measurable and possible to control for. One such factor is the latency between the client and the server. Intuitively, higher latency can delay the feedback process in TCP's congestion control, and thus it may take longer for the test to saturate the link. To understand the threshold at which latency begins to impact the speed test, we conduct a set of controlled experiments that precisely control the network conditions along the network path. To do so, we host both the test client and test server in-lab, allowing us to fix the link capacity while varying the latency between the client and server. We then conduct speed test measurements and log their accuracy. Here, accuracy is defined as the ratio of the reported speed metric to the link capacity. The accuracy is always between 0 and 1; a value of 1 indicates the tool reported the capacity accurately, and lower values indicate underestimation of the link capacity.

Figure 2 shows how the accuracy of the measured download speed decreases as the round-trip time between the client and the server increases. The "Method" indicates what the measured speed would be using different sampling and aggregation techniques: "Reported" indicates the speed reported by the tool, while "Avg" shows the speed as calculated by taking the total number of bytes transferred over the total transmission time. Unsurprisingly, the overall trend is that both Ookla and NDT7 report lower speeds as the latency increases. However, Ookla is not affected by the increase in latency until it exceeds 400 ms. On the other hand, NDT7's accuracy decreases to 90% of the link capacity when the RTT is 100 ms and further to 83% at 200 ms.

The differences between the tools can be explained by the differences in test protocol and TCP's congestion control. Under high latency, TCP takes a longer time to saturate the link due to delayed feedback. Ookla is able to mitigate the latency effect to a large extent because it uses an adaptive test length: it runs the test for a longer duration when the variance in measured throughput is high and thus can saturate the link even under high latency. Moreover, its sampling strategy discards the TCP slow start phase. NDT7, on the other hand, uses a fixed test length and reports average throughput across the entire duration of the test. This leads to Ookla reporting more accurate throughput values than NDT7.

Nevertheless, the latency effects are potentially significant because they imply that if a client test selects a path to a server with high latency, then the reported throughput could be significantly lower. We argue that it would be prudent to discard any speed tests for which the client-server round-trip time exceeds 400 ms for Ookla tests and 100 ms for NDT7 tests. Tests that are conducted when the latency exceeds these thresholds finish before the sending rate has increased enough to saturate the bottleneck link, leading to lower than expected speed. It could also happen that a significant number of tests from a specific region exceed the latency threshold. We argue that such data is indicative of either poor test server infrastructure or poor overall network infrastructure in the area. However, the solution in both cases is not to increase ISP access speeds, but to improve the server footprint in the former case and to improve the network infrastructure in the latter case through increased peering and improved middle-mile connectivity.
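In a post-hoc analysis, this recommendation reduces to a simple filter over logged tests. A minimal sketch follows, assuming each test was stored with tool and rtt_ms fields (a hypothetical schema); the 400 ms and 100 ms cutoffs are the thresholds observed in our experiments.

    # RTT thresholds (ms) above which each tool begins to underestimate,
    # per the in-lab results in this section.
    RTT_CUTOFF_MS = {"ookla": 400.0, "ndt7": 100.0}

    def keep_test(test: dict) -> bool:
        """Keep a test only if the client-server RTT is below the
        threshold for the tool that produced it."""
        return test["rtt_ms"] <= RTT_CUTOFF_MS[test["tool"]]

    tests = [
        {"tool": "ndt7", "rtt_ms": 35.0, "download_mbps": 870.0},
        {"tool": "ndt7", "rtt_ms": 180.0, "download_mbps": 610.0},   # discarded
        {"tool": "ookla", "rtt_ms": 180.0, "download_mbps": 905.0},  # kept
    ]
    print(len([t for t in tests if keep_test(t)]))  # 2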



Figure 2: Download accuracy vs. round-trip latency. Shaded region represents a 95% confidence interval for n = 10 tests. The reported method shows the speeds reported by the tool. The average method is the average data transfer rate during the test.

4.2 Do NOT use WiFi for high-speed links

Recommendation: Record whether the client connected via WiFi or a wired connection. For users subscribed to speeds above 50 Mbps, connecting via WiFi will often cause the tool to underestimate speed, almost always at speeds above 800 Mbps. However, the tool is less likely to underestimate for speeds below 50 Mbps.

When conducting a speed test, users access the Internet via a wireless (e.g., WiFi) or wired (e.g., Ethernet) connection, the choice of which may skew the measurement. Because wireless transmission is more susceptible to interference and signal attenuation, wireless connections can have lower throughput limits than wired connections. As a result, running a speed test over WiFi may shift the bottleneck link from the access link to the wireless connection. Ideally, this shift would be prevented by exclusively conducting speed tests over a wired connection. However, encouraging a wired-only approach across all speed tiers may be excessive.

As part of our deployment, we conducted an experiment to determine how often, and at what speeds, the wireless hop becomes the bottleneck during a speed test in practice. We placed Raspberry Pis (RPis) in 67 households across Chicago from November 2021 to April 2022². Each RPi is connected to the home network via a wired connection and conducts daily NDT7 speed tests. In addition, study participants installed a browser plugin on their laptop that would conduct an NDT7 test between the user's laptop and the RPi. This test indicates the maximum achievable throughput over the wireless connection at that point in time³. Note that the plugin runs the speed test only when there is no browser network activity, to avoid degrading the user experience.

We then compare the result of this (WiFi) speed test with a speed test conducted between the RPi and an M-Lab hosted NDT7 server that occurred within 6 hours of the laptop-to-device test. Although ideally these two tests would be conducted in succession, our experimental setup did not easily facilitate this approach. In comparing these two results, we posit that if the WiFi test result is lower than the wired test result, then the WiFi link would have been the bottleneck had the user conducted a speed test between their laptop and the M-Lab NDT7 server. This framework allows us to determine how often the WiFi link is the bottleneck. We further consider the effect of WiFi by grouping the devices into different speed tiers. We determine the speed tier based on the results of speed tests conducted by that participant over the course of the study.

    Speed Tier (Mbps)   Number of Households   Mean Frequency of Bottlenecks
    <50                 2                      0.01
    50-100              11                     0.16
    100-200             3                      0.41
    200-400             6                      0.72
    400-800             9                      0.84
    >800                6                      0.98

Table 1: Average frequency of WiFi bottlenecks by speed tier. There are at least 30 tests for each household.

Table 1 shows the frequency of speed tests for which the WiFi is the bottleneck, as determined by our experiment setup. We only include households for which there are more than 30 pairs of WiFi and wired tests that fulfill our requirement that they occurred within 6 hours of each other. Our results suggest that the wireless hop becomes the bottleneck in the vast majority of speed tests for households subscribed to speeds greater than 200 Mbps. For households subscribed to plans above 800 Mbps, the WiFi will almost always be the bottleneck link that is measured. Conversely, households with speed plans lower than 100 Mbps are far less likely to experience WiFi bottlenecks. Of particular interest is that the 2 households subscribed to speed tiers lower than 50 Mbps experience a WiFi bottleneck in only 1% of tests. Although the number of households is too low to draw broad conclusions, this result suggests that, for users subscribed to these speeds, it is possible to measure the access link throughput despite the user connecting over WiFi.

² For more details about the deployment, please refer to our past work [19].
³ We assume that the user's laptop is connected to the Internet through WiFi, which is often the case.
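The pairing logic behind Table 1 can be sketched as follows. We assume two lists of (timestamp, speed) tuples per household, one from the laptop-to-RPi WiFi tests and one from the RPi's wired NDT7 tests; a WiFi bottleneck is flagged whenever the nearest wired test within six hours reports a higher speed. The function name and data layout are ours, for illustration.

    from datetime import datetime, timedelta

    def wifi_bottleneck_frequency(wifi_tests, wired_tests,
                                  window=timedelta(hours=6)) -> float:
        """Fraction of WiFi tests whose nearest-in-time wired test (within
        `window`) reported a higher speed, i.e. WiFi was the bottleneck.
        Each test is a (datetime, speed_mbps) tuple."""
        bottlenecked, paired = 0, 0
        for t_wifi, s_wifi in wifi_tests:
            candidates = [(abs(t - t_wifi), s) for t, s in wired_tests
                          if abs(t - t_wifi) <= window]
            if not candidates:
                continue  # no wired test close enough to pair with
            paired += 1
            _, s_wired = min(candidates)  # nearest wired test in time
            if s_wifi < s_wired:
                bottlenecked += 1
        return bottlenecked / paired if paired else float("nan")

    wifi = [(datetime(2022, 1, 5, 10), 320.0), (datetime(2022, 1, 6, 9), 610.0)]
    wired = [(datetime(2022, 1, 5, 12), 930.0), (datetime(2022, 1, 6, 8), 590.0)]
    print(wifi_bottleneck_frequency(wifi, wired))  # 0.5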



Figure 3: NDT7 accuracy on different client types (browser vs. native) for upload tests. Panels: (a) accuracy vs. link capacity [Mbps], (b) accuracy vs. packet loss [%], (c) accuracy vs. round-trip time [ms].

4.3 Log the client hardware and software

Recommendation: Record the client device. The client device's hardware may become the bottleneck at sufficiently high speeds (1 Gbps).

Speed bottlenecks are not limited to links along the end-to-end path. The endpoints of a connection, in our case the client and the server, can sometimes be the limiting factor. As such, it is important to know what device the client is using (e.g., laptop or phone), as well as whether they are using the native or browser test application.

The client device is only likely to become the bottleneck at high speeds. For example, the RPis used in our deployment have a network interface card (NIC) that supports up to 1 Gbps throughput. Thus, the RPi becomes the bottleneck for households subscribed to plans exceeding 1 Gbps. The specific limits, however, can vary based on the device hardware, and hence it is important to log the client hardware.

Past work has found that the client type (browser vs. native) may also impact the accuracy of the speed test [10]. However, we observe minimal effects in our in-lab measurements. We compare the accuracy of the NDT7 native client and NDT7 browser client under different network conditions, varying the link capacity, packet loss, and client-to-server latency. There is no readily available way to conduct the same comparison for Ookla because it is not open source, and we cannot configure the test server in Ookla's current browser client.

Figure 3 shows how the accuracy of the NDT7 browser and native clients varies under different network conditions. We do not observe significant differences between tests conducted over the browser and native clients. As packet loss and latency between the client and server are induced, the difference in median accuracy between the browser and the native client is within 1%. Although there is a small dip in accuracy as the link capacity is increased, the difference is less than 2%. The difference from past measurement results might be due to upgrades in the browser and testing client. Nevertheless, browser-based clients may still become a bottleneck at higher speeds. For instance, Ookla suggests using their native clients instead of browser clients for testing connections with speeds greater than 3 Gbps [24]. Thus, it is good practice to record the client application type used to conduct the test.

4.4 Run paired tests for higher confidence

Recommendation: To mitigate the effects of non-ISP unobservable bottlenecks, such as the test server and congested transit ISPs, which appear especially in high-speed connections, consider running two different tools in succession and taking the max of the two results.

While some bottlenecks are observable (or easy to infer), others are challenging to locate or unobservable (e.g., the test server, transit ISPs). To account for unobservable factors and increase the likelihood of measuring the ISP access link, we propose running paired tests, i.e., two different speed test tools in succession. The "paired test" idea relies on the path diversity achieved by using two different speed test tools, as different tools have different server infrastructure. Running the same tool twice does not yield the same advantage, as the two successive runs could measure the same network path containing a non-ISP bottleneck link.

To make the idea clearer, consider an example of running two successive speed tests using Tool A and Tool B. If Tool A and Tool B report similar throughput values, it is highly likely that they are measuring the same bottleneck link; moreover, the bottleneck lies in the network path common to Tool A and Tool B. On the other hand, if Tool A reports a relatively low speed while Tool B reports a high speed, then the bottleneck link for Tool A was likely upstream, as otherwise Tool B would also have reported a low speed.
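The decision rule in this example can be written down directly. The sketch below takes the max of a pair, as recommended, and adds a rough similarity check; the 10% margin is our own illustrative choice, not a threshold from our measurements.

    def paired_test_speed(speed_a: float, speed_b: float) -> float:
        """Take the max of two back-to-back tests run with different tools;
        the lower result is more likely limited by a non-ISP bottleneck."""
        return max(speed_a, speed_b)

    def interpret_pair(speed_a: float, speed_b: float,
                       margin: float = 0.10) -> str:
        """Heuristic reading of a paired test (the margin is an assumption)."""
        lo, hi = sorted((speed_a, speed_b))
        if lo >= (1 - margin) * hi:
            return "similar: both tools likely measured the same bottleneck"
        return "divergent: the lower result likely hit an upstream bottleneck"

    print(paired_test_speed(940.0, 610.0))  # 940.0
    print(interpret_pair(940.0, 610.0))     # divergent: ...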



Figure 4: The distribution of median speeds across households from Ookla, NDT7, and paired tests by speed tier. Each point in the plot represents the median speed from a single household. Panels: (a) download, (b) upload.

Figure 5: The distribution of 10th percentile speeds across households from Ookla, NDT7, and paired tests by speed tier. Each point in the plot represents the median speed from a single household. Panels: (a) download, (b) upload.

Therefore, we propose running two successive tests using two different tools and taking the maximum of the two reported speed values. The "paired tests" method is not a foolproof way to measure the ISP access link; it only increases confidence. It could very well happen that both tests measure a different upstream bottleneck.

There is also a trade-off in using paired tests, as doing so may ignore transient congestion on the ISP link. More specifically, it could happen that the last mile is the bottleneck for both tests, but due to dynamic link conditions the tests report two different throughput values. Paired tests, in this case, would ignore the lower throughput value. While this could be considered a limitation of paired tests, we argue that a single measurement (paired or not) is in any case not enough to characterize a dynamic link. Ultimately, multiple measurements are needed over time, and paired tests can also detect persistent congestion events at the last-mile ISP link.

Advantage of using paired tests. We now empirically analyze the advantage of using paired tests using the data from our wide-area deployment. We consider the speed test data collected from our deployment of RPis. Each RPi would conduct at least daily paired Ookla and NDT7 tests at random times of day. The RPi is connected directly to the router to avoid any WiFi bottlenecks. Over the course of the deployment, the median number of paired download tests across households was 354, with a minimum of 50 and a maximum of 2,429. For each paired test, we compute the maximum of the two tests. We compare the distribution of individual tools with the maximum results of the paired tests.



To understand the magnitude of the benefit, we first normalize each speed test result. Normalizing allows us to compare results across different households, among which there are different speed tiers. Ideally, we would normalize each result against the offered speed. However, because we do not have access to this information, we instead define the nominal speed for a given household to be the 90th percentile of its results. Using the nominal speed, we compute the normalized speed for each test i as follows:

    \hat{S}_i = S_i / S_{90th}    (1)

where S_i is the speed reported by test i and S_{90th} is the 90th percentile result across all speed tests from that particular household. In addition to using the nominal speed to normalize test results, we use it when assigning households to different speed tiers.
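Equation (1) translates directly into code. A minimal sketch, assuming all speed results for one household are in a list; the nearest-rank percentile computation is our own choice of interpolation.

    def normalize_speeds(speeds_mbps: list[float]) -> list[float]:
        """Normalize each test against the household's nominal speed,
        defined as the 90th percentile of its results (Equation 1)."""
        s = sorted(speeds_mbps)
        nominal = s[min(len(s) - 1, int(0.9 * len(s)))]  # nearest-rank 90th pct
        return [x / nominal for x in speeds_mbps]

    household = [880.0, 910.0, 450.0, 905.0, 930.0,
                 890.0, 915.0, 920.0, 900.0, 925.0]
    print([round(x, 3) for x in normalize_speeds(household)][:3])
    # [0.946, 0.978, 0.484]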
Having normalized each test, we study how the distribution of paired test results differs from the distributions of Ookla and NDT7 results. Although there are several ways to characterize a distribution, we focus on the average (median) and the tail (10th percentile). Using this framework, Figure 5 shows the 10th percentile speed and Figure 4 shows the median speed across households from the distributions of Ookla, NDT7, and paired tests.

Looking first at the median download results (Figure 4a), there is not a significant advantage to using paired tests as opposed to conducting only Ookla tests. Indeed, the median household speed using paired tests is within 1% of the median household speed using Ookla across all speed tiers. It is likely that congestion in an upstream link is a rare enough event that its occurrence is not reflected in the median speed. Instead, its impact is more apparent when looking at the tail of the distribution.

Looking at Figure 5a, we see that the 10th percentile for paired tests is strictly greater than that of both NDT7 and Ookla. This indicates that neither tool always reports a higher speed. Moreover, we observe that the benefit of using paired tests is greatest for users subscribed to download speeds above 500 Mbps. For these users, the median 10th percentile speed of paired tests is 2% and 5% higher than that of Ookla and NDT7, respectively. The difference is more striking for the 25th percentile household, for which the paired tests are 14% and 10% greater than Ookla and NDT7, respectively. As for upload results, the differences are less pronounced. This is unsurprising, as upload speeds are typically much lower than download speeds, so upstream congestion is less likely.

5 Conclusion

This paper presents a best-practice guide to accurately measuring ISP access speeds using consumer speed tests. We categorize the factors that can impact the accuracy of tests as observable and unobservable. We quantify the impact of observable factors under different conditions using data collected from both in-lab experiments and a wide-area deployment. For unobservable factors, we propose taking paired measurements, as they can increase the confidence that a test measures the ISP access link. Future work will consider methods and metrics for aggregating speed data across users and time.



References

[1] Adnan Ahmed, Ricky Mok, and Zubair Shafiq. "Flowtrace: A framework for active bandwidth measurements using in-band packet trains". In: Passive and Active Network Measurement. Springer, 2020.
[2] AT&T. AT&T Internet Speed Test. URL: https://www.att.com/support/speedtest/.
[3] New York State Office of the Attorney General. New York Internet Health Test. URL: https://ag.ny.gov/SpeedTest.
[4] Steven Bauer, David D. Clark, and William Lehr. "Understanding broadband speed measurements". In: TPRC. 2010.
[5] Igor Canadi, Paul Barford, and Joel Sommers. "Revisiting broadband performance". In: Proceedings of the 2012 Internet Measurement Conference. ACM, 2012.
[6] David D. Clark and Sara Wedeman. Measurement, Meaning and Purpose: Exploring the M-Lab NDT Dataset. SSRN Scholarly Paper. Rochester, NY, Aug. 2021.
[7] Comcast. Xfinity Speed Test. URL: https://speedtest.xfinity.com/.
[8] Federal Communications Commission. FTC Takes Action Against Frontier for Lying about Internet Speeds and Ripping Off Customers Who Paid High-Speed Prices for Slow Service. May 2022. URL: https://www.ftc.gov/news-events/news/press-releases/2022/05/ftc-takes-action-against-frontier-lying-about-internet-speeds-ripping-customers-who-paid-high-speed.
[9] Constantinos Dovrolis, Parameswaran Ramanathan, and David Moore. "What do packet dispersion techniques measure?" In: Proceedings IEEE INFOCOM. 2001.
[10] Nick Feamster and Jason Livingood. "Measuring internet speed: current challenges and future recommendations". In: Communications of the ACM (2020).
[11] Oana Goga and Renata Teixeira. "Speed measurements of residential internet access". In: Passive and Active Network Measurement. Springer, 2012.
[12] Ningning Hu and Peter Steenkiste. "Evaluation and characterization of available bandwidth probing techniques". In: IEEE Journal on Selected Areas in Communications (2003).
[13] Internet Equity Initiative. Netrics: Automating Speed Test Measurements. URL: https://github.com/chicago-cdac/nm-exp-active-netrics.
[14] Manish Jain and Constantinos Dovrolis. "Pathload: A measurement tool for end-to-end available bandwidth". In: Passive and Active Measurements (PAM) Workshop. 2002.
[15] Srinivasan Keshav. "A control-theoretic approach to flow control". In: Communications Architecture & Protocols. 1991.
[16] Measurement Lab. BigQuery QuickStart. URL: https://www.measurementlab.net/data/docs/bq/quickstart/.
[17] Measurement Lab. Murakami: Automating Speed Test Measurements. URL: https://github.com/m-lab/murakami.
[18] Measurement Lab. Speed Test by Measurement Lab. URL: https://speed.measurementlab.net/.
[19] Kyle MacMillan, Tarun Mangla, James Saxon, Nicole P. Marwell, and Nick Feamster. "A Comparative Analysis of Ookla Speedtest and Measurement Labs Network Diagnostic Test (NDT7)". In: arXiv preprint arXiv:2205.12376 (2022).
[20] Measuring Broadband America. Apr. 2022. URL: https://www.fcc.gov/general/measuring-broadband-america.
[21] National Broadband Availability Map. URL: https://www.ntia.doc.gov/category/national-broadband-availability-map.
[22] Battle for the Net. Internet Health Test based on Measurement Lab NDT. URL: https://www.battleforthenet.com/internethealthtest/.
[23] Netflix. fast.com. URL: https://fast.com/.
[24] Ookla. 10 Gbps? There's a Speedtest for That. URL: https://www.ookla.com/articles/10-gbps-speedtest.
[25] Ookla. About Ookla SpeedTests. URL: https://www.speedtest.net/about.
[26] Ookla. Speedtest by Ookla - The Global Broadband Speed Test. URL: https://www.speedtest.net/.
[27] Vinay Joseph Ribeiro, Rudolf H. Riedi, Richard G. Baraniuk, Jiri Navratil, and Les Cottrell. "pathChirp: Efficient available bandwidth estimation for network paths". In: Passive and Active Measurement Workshop. 2003.
[28] SamKnows. URL: https://www.samknows.com/.
[29] Srikanth Sundaresan, Walter De Donato, Nick Feamster, Renata Teixeira, Sam Crawford, and Antonio Pescapè. "Broadband internet performance: a view from the gateway". In: ACM SIGCOMM Computer Communication Review (2011).
[30] Pennsylvania State University and Measurement Lab. Broadband Availability and Access in Rural Pennsylvania. URL: https://www.rural.pa.gov/publications/broadband.cfm.
