Best Practices For Collecting Speed Test Data

Our key recommendations are summarized below:

• Record the latency between the client and server at test time. High latency can prevent the tool from saturating the link, thus underestimating the speed. This can occur above 100 ms and 400 ms for NDT7 and Ookla, respectively.

• Record whether the client connected via WiFi or a wired connection. For users subscribed to speeds above 50 Mbps, connecting via WiFi will often cause the tool to underestimate speed, almost always at speeds above 800 Mbps. However, the tool is less likely to underestimate for speeds < 50 Mbps.

• Record the client device. The client device's hardware may become the bottleneck at sufficiently high speeds (1 Gbps).

• To mitigate the effects of non-ISP, unobservable bottlenecks, such as the test server and congested transit ISPs, which appear especially in high-speed connections, consider running two different tools in succession and taking the max of the two results.

2 Related Work

Speed test design. There are two primary ways to measure throughput: (1) packet probing and (2) flooding. Most packet-probing techniques send a series of packets and infer metrics such as available bandwidth or link capacity from the inter-arrival packet delay [9, 12, 14, 15, 27]. More recently, Ahmed et al. [1] estimate bandwidth bottlenecks by probing the network using recursive in-band packet trains. However, these techniques can be inaccurate, especially on high-speed networks, due to their sensitivity to packet loss, queuing policy, etc. As a result, most commercial speed tests, including ones offered by both ISPs [2, 7] and non-ISP entities [18, 23, 26], are flooding-based tools that work by saturating the bottleneck link through active measurements.

Evaluating speed tests. Feamster and Livingood [10] discuss considerations in using flooding-based tools to measure speed. They do not, however, conduct empirical experiments to characterize NDT7 and Ookla performance. Similarly, Bauer et al. [4] explain how differences in speed test design and execution contribute to differences in test results. Bauer et al.'s work differs from ours in several ways. First, both Ookla and NDT have seen major design changes in the 12 years since that study. Both tools have updated their flooding and sampling mechanisms, and NDT's latest version (NDT7) uses TCP BBR instead of TCP Reno. Second, they analyze only public NDT data and do not study both Ookla and NDT in controlled lab settings, nor did they conduct paired measurements in the wide area that allow direct comparison of Ookla and NDT, as we do. Complementary to our comparative analysis is work by Clark et al. [6] that provides recommendations on how to use aggregated NDT data, including accounting for self-selection bias and other end-user bottlenecks such as slow WiFi and outdated modems.

Residential broadband. Goga et al. [11] evaluate the accuracy of various speed test tools in residential networks, yet tools have changed and speeds on residential networks have increased more than 20× in the ten years since that study. Sundaresan et al. [29] studied network access link performance in residential networks more than ten years ago. Whereas our work focuses on characterizing speed test tools, this work examined network performance differences across ISPs, looking at latency, packet loss, and jitter in addition to throughput. Canadi et al. [5] use publicly available Ookla data to analyze broadband performance in 35
Figure 3: NDT7 accuracy on different client types (browser vs. native) for upload tests.
it is possible to measure the access link throughput despite the user connecting over WiFi.

4.3 Log the client hardware and software

Recommendation: Record the client device. The client device's hardware may become the bottleneck at sufficiently high speeds (1 Gbps).

Speed bottlenecks are not limited to links along the end-to-end path. The endpoints of a connection, in our case the client and the server, can sometimes be the limiting factor. As such, it is important to know what device the client is using (e.g., laptop or phone), as well as whether it is running the native or browser test application.

The client device is only likely to become the bottleneck at high speeds. For example, the RPis used in our deployment have a network interface card (NIC) that supports up to 1 Gbps throughput. Thus, the RPi device becomes the bottleneck for households subscribed to plans exceeding 1 Gbps. The specific limits, however, can vary based on the device hardware, and hence it is important to log the client hardware.

Past work has found that the client type (browser vs. native) may also impact the accuracy of the speed test [10]. However, we observe minimal effects in our in-lab measurements. We compare the accuracy of the NDT7 native client and NDT7 browser client under different network conditions, varying the link capacity, packet loss, and client-to-server latency. There is no readily available way to conduct the same comparison for Ookla because it is not open-source, and we cannot configure the test server in Ookla's current browser client.

Figure 3 shows how the accuracy of the NDT7 browser and native clients varies under different network conditions. We do not observe significant differences between tests conducted over the browser and the native client. As packet loss and latency between the client and server are induced, the difference in median accuracy between the browser and the native client is within 1%. Although there is a small dip in accuracy as the link capacity is increased, the difference is less than 2%. The difference from past measurement results might be due to upgrades in the browser and testing client. Nevertheless, browser-based clients may still become a bottleneck at higher speeds. For instance, Ookla suggests using their native clients instead of browser clients for testing connections with speeds greater than 3 Gbps [24]. Thus, it is good practice to record the client application type used to conduct the test.

4.4 Run paired tests for higher confidence

Recommendation: To mitigate the effects of non-ISP, unobservable bottlenecks, such as the test server and congested transit ISPs, which appear especially in high-speed connections, consider running two different tools in succession and taking the max of the two results.

While some bottlenecks are observable (or easy to infer), others are challenging to locate or unobservable (e.g., the test server, transit ISPs). To account for unobservable factors and increase the likelihood of measuring the ISP access link, we propose running paired tests, i.e., two different speed test tools in succession. The paired-test idea relies on the path diversity achieved by using two different speed test tools, as different tools have different server infrastructure. Running the same tool twice does not yield the same advantage, as the two successive runs could measure the same network path containing a non-ISP bottleneck link.

To make the idea clearer, consider an example of running two successive speed tests using Tool A and Tool B. If Tool A and Tool B report similar throughput values, it is highly likely that they are measuring the same bottleneck link. Moreover, the bottleneck lies in the common network path between Tool A and Tool B. On the other hand, if Tool A reports a relatively low speed while Tool B reports a high
Figure 4: (a) Download, (b) Upload. The distribution of median speeds across households from Ookla, NDT7, and paired tests (Max) by speed tier (Mbps). Each point in the plot represents the median speed from a single household.

Figure 5: (a) Download, (b) Upload. The distribution of 10th percentile speeds across households from Ookla, NDT7, and paired tests (Max) by speed tier (Mbps). Each point in the plot represents the median speed from a single household.
speed, then the bottleneck link for Tool A was likely upstream, as otherwise Tool B would also have reported a low speed. Therefore, we propose running two successive tests using two different tools and taking the maximum of the two reported speed values. The paired-test method is not a foolproof way to measure the ISP access link; it only increases confidence. It could very well happen that both tests measure different upstream bottlenecks.

There is also a trade-off in using paired tests, as they may ignore transient congestion on the ISP link. More specifically, the last mile could be the bottleneck for both tests, yet dynamic link conditions cause the two tests to report different throughput values. Paired tests, in this case, would discard the lower throughput value. While this could be considered a limitation of paired tests, we argue that a single measurement (paired or not) is in any case not enough to characterize a dynamic link. Ultimately, multiple measurements are needed over time, and paired tests can also detect persistent congestion events at the last-mile ISP link.

Advantage of using paired tests: We now empirically analyze the advantage of paired tests using the data from our wide-area deployment. We consider the speed test data collected from our deployment of RPis. Each RPi conducted at least daily paired Ookla and NDT7 tests at random times of day. Each RPi is connected directly to the router to avoid any WiFi bottlenecks. Over the course of the deployment, the median number of paired download tests across households was 354, with a minimum of 50 and a maximum of 2,429. For each paired test, we compute the maximum of the two tests. We compare the distribution of individual tools with the maximum results of a paired test.

To understand the magnitude of the benefit, we first normalize each speed test result. Normalizing allows us to compare results across different households, among which there
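The per-household aggregation described above, taking the maximum of the two tools' results within each pair and then summarizing across pairs, can be sketched as follows. The record layout and the Mbps values are hypothetical; only the max-then-summarize logic comes from the text:

```python
from statistics import median

# Hypothetical paired download tests for one household, in Mbps;
# each pair is an (ookla, ndt7) run back-to-back.
paired_tests = [(910.0, 620.0), (885.0, 902.0), (450.0, 940.0)]

# Within each pair, keep the maximum: if either tool avoided a
# non-ISP bottleneck (test server, congested transit ISP), the max
# is the better estimate of the access-link speed.
paired_max = [max(ookla, ndt7) for ookla, ndt7 in paired_tests]

# Summarize per household, e.g. the median across pairs (as in
# Figure 4); a low percentile instead captures the speed the
# household achieves consistently (as in Figure 5).
print(paired_max)          # → [910.0, 902.0, 940.0]
print(median(paired_max))  # → 910.0
```

In the first and third pairs the two tools disagree sharply, which under the reasoning above suggests an upstream bottleneck for the lower-reporting tool; taking the max recovers a plausible access-link estimate in both cases.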