
Accuracy and Coverage Analysis of IP Geolocation Databases

Conference Paper · June 2023


DOI: 10.1109/BalkanCom58402.2023.10167899


All content following this page was uploaded by Abdullah Yasin Nur on 05 July 2023.



Accuracy and Coverage Analysis of IP Geolocation Databases
Abdullah Yasin Nur
Department of Computer Science
University of New Orleans
New Orleans, LA 70148
Email: ayn@cs.uno.edu

Abstract—Identifying the geographical location of Internet hosts is crucial for researchers, governments, and commercial entities. While public and commercial geolocation services are commonly employed for this task, their accuracy in locating Internet hosts remains questionable. This paper studies the accuracy and coverage of four popular geolocation databases: MaxMind, DBIP, IP2Location, and IPGeolocationIO. We assess the consistency and comprehensiveness of these services by analyzing the entire IPv4 space. Furthermore, we investigate the issue at the prefix level since geolocation databases typically provide location data at that level. Finally, we create a ground truth dataset by employing a DNS-based approach and publicly available vendor locations to evaluate the accuracy of the databases. Our findings indicate that these databases provide comprehensive coverage, whereas their accuracy is far from satisfactory. Therefore, it is essential to use the information obtained from these databases cautiously and verify its accuracy before making any decisions based on it.

Index terms—IP Geolocation; Geolocation Databases; Geographic Information Systems; Accuracy; Reliability

I. INTRODUCTION

The Internet is one of the largest human-engineered, decentralized networks of networks, serving billions of people worldwide. It is the primary communication medium for critical infrastructures such as electricity, finance, and transportation. With the emergence of innovative applications and technologies, such as cloud computing and the Internet of Things, it is evident that this trend will only continue to expand in the future.

Internet Protocol (IP) geolocation refers to the process of mapping an IP address to the physical location of the device using that address. It is a challenging task because the IP protocol does not provide any geolocation information. Researchers suggest delay-based geolocation [3, 4] and topology-based geolocation [5]. Delay-based algorithms typically use latency metrics collected from known geographically distributed locations to locate target hosts. Topology-based algorithms extend the delay-based techniques by considering the topology, with the assumption that topologically close addresses are also physically close. Moreover, several commercial geolocation databases on the market combine several methods to increase their accuracy and sell their databases [24, 25].

Identifying the geographical location of Internet hosts is crucial for researchers, governments, and commercial entities. Specifically, geolocation is used by many applications, including content personalization [10], advertising [14], e-commerce [15], content delivery networks [12], credit card fraud protection [13], and law enforcement [11]. Additionally, understanding the geographical characteristics of the Internet infrastructure allows us to utilize resourceful paths during or after natural disasters [6]; improve the inter-domain routing processes [7]; deploy geography-aware network overlays for efficient multimedia communications [18]; predict path latency for service improvement in the Internet [8]; and develop more realistic Internet topology generation tools [19].

IP geolocation services map IP addresses to their physical locations such as a country, city, and/or geographic coordinates. In addition to the challenges of maintaining and updating them, the accuracy of geolocation databases is highly questionable [16, 17], particularly due to the absence of information about the techniques used to construct them. In numerous instances, the geolocation service providers are the only source of information regarding the accuracy of their databases. Some vendors declare accuracy metrics without disclosing the methods used to obtain them.

In this paper, we study the accuracy and coverage of four popular geolocation databases: MaxMind [25], DBIP [24], IP2Location [26], and IPGeolocationIO [27]. Our study reveals that the databases cover the majority of the IPv4 space (more than 85%). However, they exhibit numerous inconsistencies when we compare their results pairwise. We observe that they have a 620 km distance discrepancy on average. The primary challenge in conducting such research is the scarcity of ground truth reference information, specifically a comprehensive and diverse collection of IP addresses with established geographical locations to compare with geolocation databases. We create a ground truth dataset using a DNS-based approach [2] and publicly available vendor locations. The ground truth dataset contains 6,345,323 unique IP addresses with their locations. We use the ground truth dataset to evaluate the accuracy of the databases. Our results show that the mean distance discrepancy, averaged over the four databases, is 376 km.

The rest of the paper is organized as follows. Section II presents the related work. We introduce the details of our ground truth dataset creation in Section III. Section IV demonstrates our experimental results and comparisons. Finally, Section V concludes the paper.
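As a rough illustration of the delay-based idea mentioned above, the sketch below assigns a target to the location of the landmark that measured the lowest round-trip time. This is a deliberately simplified nearest-landmark variant, not the algorithm of any cited work; the landmark names, coordinates, and RTT values are hypothetical.

```python
from typing import Dict, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in degrees


def nearest_landmark_estimate(
    rtt_ms: Dict[str, float],
    landmark_locations: Dict[str, Coordinates],
) -> Tuple[str, Coordinates]:
    """Return the landmark with the smallest measured RTT and its location."""
    if not rtt_ms:
        raise ValueError("at least one RTT measurement is required")
    # The landmark with the lowest latency is assumed to be physically closest.
    closest = min(rtt_ms, key=rtt_ms.__getitem__)
    return closest, landmark_locations[closest]


# Hypothetical landmarks and round-trip times (illustrative values only).
landmarks = {
    "new-orleans": (29.9511, -90.0715),
    "new-york": (40.7128, -74.0060),
    "los-angeles": (34.0522, -118.2437),
}
measured_rtt_ms = {"new-orleans": 41.0, "new-york": 9.5, "los-angeles": 68.2}

name, location = nearest_landmark_estimate(measured_rtt_ms, landmarks)
print(name, location)
```

The estimate can never be better than the distance to the nearest landmark, which is one reason delay-based methods depend on a dense, well-distributed set of measurement points.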
II. RELATED WORK

Mapping IP addresses to their physical location is a significant task for several reasons, including location-based content delivery, advertising and marketing, fraud prevention, and assistance to law enforcement. Content providers can deliver location-specific content to users based on their location using IP geolocation [10]. For instance, news platforms can provide users with news tailored to their specific location based on the users' IP addresses. Additionally, location information helps some companies to deal with copyright and licensing agreements that limit the availability of certain titles based on geographical region. For example, Netflix users in the United States see a different selection of content than those in another country. Businesses can utilize IP geolocation for targeted advertising [14]. Businesses may send localized marketing and promotions to their clients by knowing where they are. For instance, users located in New Orleans may see advertisements for local dining establishments, shops, or tourist attractions. On the other hand, if the users live in a different city, they would likely come across advertisements specific to that area. IP geolocation can be used for detecting and preventing fraudulent activities, such as credit card fraud [13]. Specifically, credit card vendors can utilize geolocation information to detect anomalies and determine whether a transaction is legitimate. Finally, IP geolocation can play a role in law enforcement by providing information about the location of a device accessing the Internet [11]. This information can be used to track down individuals engaging in illegal activities online.

Researchers have suggested several methods for IP geolocation by utilizing network measurement and Internet data mining approaches. Network measurement methods use delay and network topology information, whereas Internet data mining approaches use diverse information mined from the Internet, including WHOIS databases, reverse DNS, and public vendor locations. Padmanabhan and Subramanian [3] provided a technique that involves sending ICMP packets from landmark servers across different geographic locations to the target IP address, where the location of the target IP is then estimated based on the proximity of the closest landmark server in terms of latency. Gueye et al. [4] propose a constraint-based geolocation technique that estimates a position using a sufficient number of distances to some fixed points. Katz-Bassett et al. [5] propose topology-based geolocation by leveraging network topology along with network delay measurements, using traceroute queries from landmark servers to the IP target.

DNS-based geolocation methods use geographic hints encoded in domain names to infer locations. UNDNS [2] is one of the most popular DNS decoders, which is a database of regular expressions that have been manually compiled to extract geographical hints and other relevant details from hostnames. DRoP [20] determines the geographic location of hostnames by utilizing rules that are automatically generated by identifying patterns across all the hostname terms associated with a given domain. Dan et al. [21] provide a machine learning approach to IP geolocation using reverse DNS hostnames. Their method divides the hostname into individual terms, compares them to a geolocation dictionary to create a set of characteristics, and subsequently employs a binary classifier to analyze the hostname and features obtained.

Several research studies have indicated that public and commercial databases offer low-resolution geolocation and are not dependable regarding city-level accuracy. Huffaker et al. [22] utilized a majority vote system across all participating databases to determine the location of an IP address block and evaluated the databases based on the resultant location. Shavitt and Zilberman [23] assessed the consistency of databases by using a ground truth dataset of IP addresses with verified Points of Presence. Gharaibeh et al. [17] analyzed the reliability of router geolocation by using 1.6 million router interface IP addresses and a ground-truth dataset of 16,586 router interface IP addresses. Their findings indicate that the databases' accuracy at the country and city levels needs improvement because they are inadequate for geolocating routers correctly.

This work analyzes the accuracy and coverage of four major geolocation databases. We examine their consistency and coverage using the entire IPv4 space (more than 4 billion IP addresses). Additionally, we analyze the problem at the prefix level since geolocation databases provide locations at the prefix level. Finally, we build a ground truth dataset of 6,345,323 IP addresses and analyze the accuracy of these databases.

III. GROUND-TRUTH LOCATION DATASET

A. Vendor location based Geolocation

Some organizations provide a global research network that allows researchers to develop, deploy, and test new network services and applications on a large-scale, geographically distributed platform. Volunteer organizations worldwide join these types of networks and make the network globally distributed. In this work, we use RIPE Atlas [29] and Measurement Lab (M-Lab) [30] nodes.

RIPE NCC is the regional Internet registry (RIR) for Europe, the Middle East, and parts of Central Asia. They created RIPE Atlas to provide a worldwide collection of probes that gauge the connectivity and reachability of the Internet in real time. Volunteers around the world deploy RIPE probes or RIPE anchors to their own networks. RIPE probes are compact hardware devices powered by USB that users connect to the Ethernet port on their router. At the time of writing, there are 11,981 probes available. RIPE anchors are a combination of RIPE probes with increased measurement capabilities, and regional measurement targets that are part of the more extensive RIPE Atlas network. At the time of writing, there are 802 anchors available. For example, DigitalOcean (AS number 14061) provides an anchor with an IP address of 104.131.160.184, which is located in New York City.

Similarly, Measurement Lab (M-Lab) is an open, distributed platform that provides researchers, developers, and the general public with an easy way to measure and diagnose the performance of their Internet connections. They provide 195 nodes
from 66 different cities around the world. For example, TATA Communications (AS number 6453) provides a node with an IP address of 63.243.240.78, which is located in Los Angeles.

We collect 24,810 IP addresses from RIPE probes, 1176 IP addresses from RIPE anchors, and 1746 IP addresses from M-Lab. We observe 1159 IP addresses in both the RIPE anchor and RIPE probe sets. In total, we collect 26,573 unique IPv4 addresses with their locations.

B. DNS-based Geolocation

Autonomous Systems (ASes) typically encode geographic information in their DNS naming conventions. Although DNS naming usage is not mandatory, it is still one of the most valuable sources of information directly from the ASes. DNS-based geolocation methods use geographic hints encoded in domain names to infer locations. To illustrate, Comcast uses the naming convention "te-0-0-0-5-sur03.chicago302.il.chicago.comcast.net", which denotes the location of Chicago, Illinois.

We use UNDNS, which is a tool for extracting geolocation information from DNS names [2]. In our previous work [1], we updated and improved their key dataset to extend the coverage and accuracy of the DNS names. Note that UNDNS provides city and country names and does not provide coordinates. To obtain coordinates for cities and countries, we use the Google Geocoding API [31].

We use CAIDA's "DNS Names for IPv4 Routed /24 Topology" dataset, which provides DNS names for every routed /24 in the IPv4 address space [28]. In the dataset, we have 39,228,837 unique DNS names. UNDNS was able to obtain a valid geolocation for 6,318,932 DNS entries. In the DNS geolocation list, 182 IP addresses also appear, with the same locations, in the vendor list. Therefore, we obtain a total of 6,345,323 unique IP addresses with their locations in our final ground truth dataset.

IV. EXPERIMENTAL RESULTS

A. Database Overview

In this work, we use one commercial database (IPGeolocationIO) and three freely available databases (DBIP, IP2Location, and MaxMind). Unfortunately, none of the databases share their creation process. Each database contains entries with an IP address block (e.g., the range 8.21.216.0 to 8.21.216.255, or the prefix 8.21.216.0/24) and several useful pieces of information associated with the block, such as country code, city, latitude, and longitude. For example, DBIP has the following entry in their database: "8.21.216.0, 8.21.216.255, NA, US, Louisiana, New Orleans, 29.9511, -90.0715".

Table I shows the general characteristics of the given databases. All four databases use ISO 3166-1 Alpha-2 country codes for representing countries. There are 249 countries in the Alpha-2 representation. IPGeolocationIO contains at least one IP block for each country, whereas MaxMind does not have an entry for five, and DBIP and IP2Location do not have an entry for six countries. For counting the unique coordinates, we rounded the latitude and longitude values to 3 decimal places, which gives 0.1 km accuracy. Our observations show that IPGeolocationIO contains the most unique coordinates, whereas IP2Location has the fewest. Note that these values do not indicate accuracy.

TABLE I: General Characteristics of the Databases
Database         Prefix Count  Country Count  City Count  Unique Coordinates
IPGeolocationIO  4,786,915     249            66,442      891,840
DBIP             3,304,193     243            118,599     364,360
IP2Location      3,123,918     243            72,026      96,879
MaxMind          3,422,806     244            97,735      131,613

B. Coverage Analysis

In this subsection, we analyze the coverage of the four databases regardless of their correctness. Table II presents the missing prefixes and missing IP location counts. Our observations show that IPGeolocationIO misses only 33 prefixes and has no location for 7.451% of the IPv4 space, whereas MaxMind misses 3356 prefixes and has no location for 14.122% of the IPv4 space. Note that we check all possible IP addresses in the IPv4 space, which is 2^32, corresponding to 4,294,967,296 unique IP addresses. Nearly 18 million IP addresses are reserved for private networks, and these IP blocks do not have valid geolocation in the databases. Even though this does not negatively affect their coverage, we put them in the missing IP location class.

TABLE II: Database Coverage
Database         Missing Prefix Count  Missing IP Location Count  Missing IP Location Percentage
IPGeolocationIO  33                    320,025,007                7.451%
DBIP             53                    592,718,656                13.8%
IP2Location      3511                  604,528,896                14.075%
MaxMind          3356                  606,552,311                14.122%

C. Pairwise Analysis for entire IPv4

In this part, we compare each database pair's consistency with respect to distance discrepancy. We check each IPv4 address's location in both databases and calculate the distance between the two locations. Note that the location of the IP address might be incorrect in both databases, correct in one of the databases, or correct in both. For this comparison, our focus is to check the consistency between databases, and the correctness of the location is not considered.

The distance between two locations is calculated using the Haversine formula, which gives the great-circle distance between two points on the surface of a sphere, as suggested in [9]. The Haversine formula requires two latitude and longitude pairs to compute the distance between them, as shown in Equation 1.

\[
\begin{aligned}
a &= \sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos(\varphi_1)\cos(\varphi_2)\sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\arcsin\left(\sqrt{a}\right) \\
d &= R\,c
\end{aligned}
\tag{1}
\]

where $\varphi$ is latitude in radians, $\lambda$ is longitude in radians, $R$ is Earth's radius and $d$ is the great-circle distance between $(\varphi_1, \lambda_1)$
and $(\varphi_2, \lambda_2)$ pairs. The distance corresponds to the shortest distance between two points on the surface of a sphere where the ellipsoidal effects of the earth are ignored.

Table III presents the distance discrepancy between each database pair. In case one of the databases could not locate an IP address, we put it in the NA (not available) class. The table shows that DBIP, IP2Location, and IPGeolocationIO agree with each other more than they agree with MaxMind. Within the 0 to 50 km range, the three pairs among these databases agree on around 56.26% of the IPv4 space on average. However, their pairwise comparisons with MaxMind give 28.15% on average for the same distance range.

TABLE III: Distance Discrepancy (km) Between Database Pairs (All IPv4 Address Space)
Database Pair                  NA           [0-50]         (50-100]     (100-500]    (500-1000]   (1000-5000]    (5000-10000]  (10000-20000)
DBIP - IP2Location             604,538,688  2,365,689,331  133,830,516  534,390,632  151,668,019  341,294,973    148,673,068   14,882,069
                               14.08%       55.08%         3.12%        12.44%       3.53%        7.95%          3.46%         0.35%
DBIP - IPGeolocationIO         592,735,790  2,324,329,277  114,562,202  569,818,688  150,193,737  367,815,423    154,934,574   20,577,605
                               13.80%       54.12%         2.67%        13.27%       3.50%        8.56%          3.61%         0.48%
DBIP - MaxMind                 606,562,103  1,218,031,227  144,166,629  641,207,622  525,602,246  988,410,849    154,080,532   16,906,088
                               14.12%       28.36%         3.36%        14.93%       12.24%       23.01%         3.59%         0.39%
IP2Location - IPGeolocationIO  604,545,793  2,558,782,455  129,227,732  527,565,715  150,778,376  284,573,277    25,043,536    14,450,412
                               14.08%       59.58%         3.01%        12.28%       3.51%        6.63%          0.58%         0.34%
IP2Location - MaxMind          609,290,959  1,287,448,723  152,153,011  597,749,849  469,627,305  1,128,616,275  34,922,087    15,159,087
                               14.19%       29.98%         3.54%        13.92%       10.93%       26.28%         0.81%         0.35%
IPGeolocationIO - MaxMind      606,569,208  1,122,022,035  146,886,889  702,221,727  467,161,664  1,194,610,344  38,839,202    16,656,227
                               14.12%       26.12%         3.42%        16.35%       10.88%       27.81%         0.90%         0.39%

DBIP-IP2Location: We observe that at least one database could not locate around 604 million IP addresses, corresponding to 14.08% of the entire IPv4 space. Moreover, around 2.37 billion IP addresses are located within a 50 km distance, corresponding to 55.08% of the entire IPv4 space. Interestingly, 15.29% of the IP addresses are located more than 500 km apart. The maximum distance discrepancy is 19,727 km for prefix 66.198.44.0/24. DBIP located the prefix in Quito, Ecuador (-0.202, -78.494), whereas IP2Location located it in Singapore (1.289, 103.850).

DBIP-IPGeolocationIO: Our observations show that around 2.3 billion IP addresses are located within a 50 km distance, corresponding to 54.12% of the entire IPv4 space. Compared with the DBIP-IP2Location pair, DBIP-IPGeolocationIO has nearly 6 million more IP addresses in the 10000-20000 km discrepancy range. The maximum distance discrepancy is 19,910 km for prefix 167.114.26.40/29. DBIP located the prefix in Jakarta, Indonesia (-6.176, 106.857), whereas IPGeolocationIO located it in Santander, Colombia (7.124, -73.109).

DBIP-MaxMind: Among its pairings with the other databases, DBIP has the most discrepancy with MaxMind. Only 1.2 billion of the IP addresses are located within a 50 km distance, whereas the overall average discrepancy is 864 km. The maximum distance discrepancy is 19,750 km for prefix 45.138.10.232/30. DBIP located the prefix in Shire of Cocos, West Island (-12.145, 96.821), whereas MaxMind located it in San Jose, Costa Rica (9.933, -84.084).

IP2Location-IPGeolocationIO: Our observations show that this pair has the most agreement, where around 2.56 billion locations are within 50 km. The overall average distance discrepancy is 296 km. The maximum distance discrepancy is 19,732 km for the IP address 168.205.92.34. IP2Location located the IP address in Buenos Aires, Argentina (-34.603, -58.381), whereas IPGeolocationIO located it in Nantong, China (32.078, 121.260).

IP2Location-MaxMind: As we stated above, database comparisons with MaxMind have the most disagreement in locations, with an overall 760 km average distance. Among the MaxMind pairs, this pair has the lowest average distance discrepancy with 684 km. The maximum distance discrepancy is 19,665 km for prefix 161.123.66.0/24. IP2Location located the prefix in Auckland, New Zealand (-36.866, 174.766), whereas MaxMind located it in Rabat, Morocco (34.012, -6.848).

IPGeolocationIO-MaxMind: This pair has the lowest agreement within the 50 km range with only 26.12%. Additionally, its shares of 16.35% in the 100-500 km range and 27.81% in the 1000-5000 km range are the highest among all pairs. The maximum distance discrepancy is 19,899 km for prefix 77.81.118.64/30. IPGeolocationIO located the prefix in Hamilton, New Zealand (-37.763, 175.246), whereas MaxMind located it in Seville, Spain (37.384, -5.970).

D. Pairwise Analysis for Prefixes

Geolocation databases provide geolocation information for IP blocks. For example, MaxMind provides the geolocation of the prefix 1.0.0.0/24 as latitude and longitude [-37.8333, 145.2375]. In this part, we check each database pair's distance discrepancy regarding the prefixes. Figure 1 shows the cumulative distribution function (CDF) of the database pairs with respect to distance discrepancy. In addition, Table IV shows the minimum, first quartile, second quartile (median), third quartile, maximum, mean, and standard deviation of the distribution. Note that -1 represents not available locations. It is evident that the database pairs have a significant discrepancy
with a mean between 511 and 670 km. Additionally, the third quartile (75%) shows us that the discrepancy is between 353 and 715 km, with an average of 517 km.

Fig. 1: Pairwise Databases Distance Discrepancy CDF (zoomed in to 2000 km). [Plot: x-axis Distance Discrepancy (km), y-axis CDF, one curve per database pair.]

TABLE IV: Summary Statistics for Distance Discrepancy in km Between Database Pairs (Prefixes)
Database Pair                  Q0  Q1  Q2   Q3   Q4     Mean    StdDev
DBIP - IP2Location             -1  4   45   353  19728  511.31  1408.89
DBIP - IPGeolocationIO         -1  6   110  508  19910  642.63  1526.06
DBIP - MaxMind                 -1  13  108  513  19750  603.69  1466.04
IP2Location - IPGeolocationIO  -1  7   124  516  19732  596.92  1385.16
IP2Location - MaxMind          -1  15  98   501  19666  569.07  1400.56
IPGeolocationIO - MaxMind      -1  14  166  715  19900  670.53  1445.13

E. Database Coverage and Accuracy over the Ground Truth

This subsection discusses the accuracy of the four databases with respect to the ground truth dataset that contains 6,345,323 unique IP addresses with their locations. To assess the accuracy, we use distance discrepancy as our metric. For each IP address in the ground truth dataset, we check the location provided by each database, then find the distance between the ground truth location and the database location. Figure 2 shows the histogram of the distance discrepancy. We observe that all four databases cover the majority of the IP addresses in the ground truth. DBIP and IPGeolocationIO could not locate 2 IP addresses, IP2Location 14, and MaxMind 19 IP addresses. IPGeolocationIO located 71.05% of the IP addresses (4,508,464) within a 50 km distance. On the other hand, MaxMind has the lowest accuracy within the 50 km range, where MaxMind located 38.97% of the IP addresses (2,472,911). All four databases locate around 25,000 IP addresses with more than 5000 km discrepancy from the ground truth location.

Fig. 2: Databases vs Ground Truth Distance Discrepancy. [Histogram: x-axis Distance Discrepancy (km) in the classes NA, [0,50], (50,100], (100,500], (500,1000], (1000,5000], (5000,10000], (10000,20000]; y-axis Frequency (×10^6); one series per database.]

Fig. 3: Databases vs Ground Truth Distance Discrepancy CDF (zoomed in to 1000 km). [Plot: x-axis Distance Discrepancy (km), y-axis CDF, one curve per database.]

Figure 3 shows the cumulative distribution function (CDF) of each database's distance discrepancy from the ground truth. In addition, Table V shows the minimum, first quartile, second quartile (median), third quartile, maximum, mean, and standard deviation of the distribution. DBIP has the lowest mean at 316.89 km, and MaxMind has the highest mean at 432.2 km. Even though IPGeolocationIO located most IP addresses within a 50 km distance, their distance discrepancy mean is 423.43 km. The reason is that they located 23,215 IP addresses in the 5000-10000 km interval and 14,719 IP addresses in the 10000-20000 km interval. These numbers are the highest among all four databases.

TABLE V: Summary Statistics for Distance Discrepancy in km Between Database and Ground Truth
Database         Q0  Q1     Q2      Q3      Q4        Mean    StdDev
DBIP             -1  0.9    16.53   349.36  19821.64  316.89  777.86
IP2Location      -1  1.9    83.97   514.64  19820.02  423.43  833.45
IPGeolocationIO  -1  2.21   4.98    205.63  19820.56  334.46  1025.41
MaxMind          -1  15.98  146.21  617.88  19834.57  432.20  817.31
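The distance discrepancy metric used throughout Section IV can be sketched in a few lines of Python. The block below is a minimal illustration of Equation 1 and of the distance classes used in Table III and Fig. 2, not the code used for the experiments. The Earth radius of 6371 km is an assumption, since the value of R is not stated, and the function and constant names are hypothetical.

```python
import math
from typing import Optional

# Assumed mean Earth radius in km; the paper does not state the value of R.
EARTH_RADIUS_KM = 6371.0

# Upper edges (km) and labels of the distance classes in Table III.
BIN_EDGES = [50, 100, 500, 1000, 5000, 10000, 20000]
BIN_LABELS = [
    "[0-50]", "(50-100]", "(100-500]", "(500-1000]",
    "(1000-5000]", "(5000-10000]", "(10000-20000)",
]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float,
                 radius_km: float = EARTH_RADIUS_KM) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    c = 2 * math.asin(math.sqrt(min(1.0, a)))
    return radius_km * c


def discrepancy_class(distance_km: Optional[float]) -> str:
    """Map a distance to its class; None means one location was unavailable."""
    if distance_km is None:
        return "NA"
    for edge, label in zip(BIN_EDGES, BIN_LABELS):
        if distance_km <= edge:
            return label
    # Anything beyond the last edge is folded into the final class.
    return BIN_LABELS[-1]


# Coordinates reported for prefix 66.198.44.0/24 (Quito vs. Singapore).
d = haversine_km(-0.202, -78.494, 1.289, 103.850)
print(round(d), discrepancy_class(d))
```

With the assumed radius, the Quito and Singapore coordinates give a distance within a kilometre or so of the 19,727 km maximum reported for the DBIP-IP2Location pair, which falls in the last distance class.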
V. CONCLUSIONS

In this paper, we evaluate the coverage and accuracy of four widely used IP geolocation databases. We examine their consistency and coverage using the entire IPv4 space and at the prefix level. Our pairwise comparisons show that the databases disagree significantly on locations, with a 620 km overall average distance discrepancy. Additionally, we build a ground truth dataset of 6,345,323 IP addresses and analyze the accuracy of these databases. Our results show that most databases provide comprehensive coverage over the IPv4 space. However, our findings indicate that the accuracy of these databases is questionable. Therefore, it is essential to use the information obtained from these databases with caution and verify its accuracy before making any decisions based on it.

ACKNOWLEDGEMENTS

We are grateful to IPGeolocationIO for providing their commercial geolocation database. We are also thankful to CAIDA, MaxMind, DBIP, and IP2Location projects for publicly sharing their datasets.

REFERENCES

[1] A. Y. Nur and M. E. Tozal, "Cross-AS (X-AS) Internet Topology Mapping", Computer Networks, vol. 132, 2018
[2] N. Spring, R. Mahajan, and D. Wetherall, "Measuring ISP Topologies with Rocketfuel", ACM SIGCOMM, 2002
[3] V. Padmanabhan and L. Subramanian, "An Investigation of Geographic Mapping Techniques for Internet Hosts", ACM SIGCOMM, 2001
[4] B. Gueye, A. Ziviani, M. Crovella, and S. Fdida, "Constraint-based Geolocation of Internet Hosts", IEEE/ACM Transactions on Networking, 2006
[5] E. Katz-Bassett, J. P. John, A. Krishnamurthy, D. Wetherall, T. Anderson, and Y. Chawathe, "Towards IP Geolocation Using Delay and Topology Measurements", Internet Measurement Conference (IMC), 2006
[6] J. Wu, Y. Zhang, Z. M. Mao, and K. G. Shin, "Internet Routing Resilience to Failures: Analysis and Implications", ACM CoNEXT, 2007
[7] R. Oliveira, M. Lad, B. Zhang, and L. Zhang, "Geographically Informed Inter-Domain Routing", IEEE International Conference on Network Protocols, 2007
[8] H. V. Madhyastha, T. Anderson, A. Krishnamurthy, N. Spring, and A. Venkataramani, "A Structural Approach to Latency Prediction", ACM IMC, 2006
[9] A. Y. Nur and M. E. Tozal, "Geography and Routing in the Internet", ACM Transactions on Spatial Algorithms and Systems (TSAS), vol. 4.4, 2018
[10] A. Hannak, P. Sapiezynski, A. M. Kakhki, B. Krishnamurthy, D. Lazer, A. Mislove, and C. Wilson, "Measuring Personalization of Web Search", ACM International Conference on World Wide Web, 2013
[11] C. A. Shue, N. Paul, and C. R. Taylor, "From an IP Address to a Street Address: Using Wireless Signals to Locate a Target", Proceedings of the Workshop on Offensive Technologies (WOOT), USENIX, 2013
[12] C. Huang, D. A. Maltz, J. Li, and A. Greenberg, "Public DNS System and Global Traffic Management", IEEE INFOCOM, 2011
[13] J. Akhilomen, "Data Mining Application for Cyber Credit-card Fraud Detection System", Springer Proceedings of the Industrial Conference on Data Mining, 2013
[14] G. C. Bruner and A. Kumar, "Attitude Toward Location-based Advertising", Journal of Interactive Advertising, vol. 7.2, 2007
[15] D. J. B. Svantesson, "E-commerce Tax: How the Taxman Brought Geography to the 'borderless' Internet", Revenue Law Journal, vol. 17, 2007
[16] I. Poese, S. Uhlig, M. A. Kaafar, B. Donnet, and B. Gueye, "IP Geolocation Databases: Unreliable?", SIGCOMM Computer Communication Review, vol. 41.2, 2011
[17] M. Gharaibeh, A. Shah, B. Huffaker, H. Zhang, R. Ensafi, and C. Papadopoulos, "A Look at Router Geolocation in Public and Commercial Databases", ACM IMC, 2017
[18] A. Elmokashfi, E. Myakotnykh, J. M. Evang, A. Kvalbein, and T. Cicic, "Geography Matters: Building an Efficient Transport Network for a Better Video Conferencing Experience", ACM Conference on Emerging Networking Experiments and Technologies, 2013
[19] B. Quoitin, V. Van den Schrieck, P. François, and O. Bonaventure, "IGen: Generation of Router-Level Internet Topologies through Network Design Heuristics", IEEE International Teletraffic Congress, 2009
[20] B. Huffaker, M. Fomenkov, and K. C. Claffy, "DRoP: DNS-based Router Positioning", ACM SIGCOMM Computer Communication Review, vol. 44.3, 2014
[21] O. Dan, V. Parikh, and B. D. Davison, "IP Geolocation through Reverse DNS", ACM Transactions on Internet Technology (TOIT), vol. 22.1, 2021
[22] B. Huffaker, M. Fomenkov, and K. C. Claffy, "Geocompare: a Comparison of Public and Commercial Geolocation Databases", Technical Report, CAIDA, 2011
[23] Y. Shavitt and N. Zilberman, "A Geolocation Databases Study", IEEE Journal on Selected Areas in Communications, vol. 29.10, 2011
[24] DB-IP Geolocation Database - January 2023 - https://www.db-ip.com/
[25] GeoLite2 Geolocation Database - January 2023 - https://www.maxmind.com
[26] LITE Geolocation Database - January 2023 - https://lite.ip2location.com
[27] Ip Geolocation IO Geolocation Database - January 2023 - https://ipgeolocation.io/
[28] The CAIDA UCSD IPv4 Routed /24 DNS Names Dataset - January 2023 - https://www.caida.org/catalog/datasets/ipv4_dnsnames_dataset/
[29] RIPE Atlas - January 2023 - https://atlas.ripe.net/
[30] Measurement Lab (M-Lab) - January 2023 - https://www.measurementlab.net/
[31] Geocoding API - https://developers.google.com/maps/documentation/geocoding
